Post

DynamoDB Table Design Principles - NoSQL Modeling

Abstract

DynamoDB is one of the most popular NoSQL DB storages, that has amazing performance and scaling characteristics and is widely used in many designs. There is mindshift that requires us to switch from traditional RDBMS storages to NoSQL, especially in terms of Data cardinality, denormalization, non-Join queries when planning and using NoSQL storages.

Introduction

The main caveat when using DynamoDB is to properly plan you Table structure and indexes in a way that will reflect to your data access patterns before the Table is created. And this is due to some limits.

First of all structures like LSIs are only possible to be created on the Table creation time, secondly DynamoDB does not support Regular DB schema Migration, that we used to have in RDBMS world (like flyway, liquibase, etc.), so ALTERing the table is not the operation that we can afford.

This post describes the variety of technics for Data Access to DynamoDB and also explains Data Access Patterns planning on the Table design stage.

Target DynamoDB Table Structure

diagram.png

NoSQL DynamoDB Table Structure is generated using MultiCloud Diagrams Framework

Methodology

All access patterns are driven by the proper defined Primary Key. Let’s go through the process of Table planning with overview of each structure. DynamoDB provides following PKs:

  • Simple PK (partition key)
  • Compound PK (partition key & sort key)

Table Structure

dynamo.png

Entity Relationship Diagram

erDiagram
  MOVIES {
    string Actor
    string Movie
    string Genre
    string Year
  }

Operations

  • API level access
  • Query level access
  • Scans

API level Get Item operation

To perform getItem API operation, we must include both existing PK and Sort Key (it is not possible to query by PK if SK exists).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'
# Define the key(s) for the item you want to retrieve
key = {
  'Actor': {'S': 'Tom Cruise'},
  'Movie': {'S': 'Top Gun'}
}

response = dynamodb.get_item(
  TableName=table_name,
  Key=key,
  # Add any optional parameters if needed, e.g., ProjectionExpression, ConsistentRead, etc.
)
item = response.get('Item')

We have a requirement to support following data access patterns to access Table data based on Actor and Movie pairs:

IDAccess Pattern
A1Get All movies of Actor
A2Get movie details with Actor and movie name
A3Get movies of Actor that starts with ‘A’

All this Access Patterns have the same PK and same SK and can be handled from initial Table key design out of the box.

Query

Queries are the most powerful request mechanism in DynamoDB. They allow to access the data only by PK or using compound PK and SK

SK Condition

When querying through (PK and SK) besides standard SK comparison with some particular value, following operations can be used:

  • Equal to
  • Less than or equal to
  • Less than
  • Greater than or equal to
  • Greater than
  • Between
  • Begins with

Using Queries we can handle all these Access Patterns:

IDAccess PatternAccess BYKey ConditionFilter Condition
A1Get All movies of ActorBY actor name“Actor = ‘Tom Cruise’” 
A2Get movie details with Actor and movie nameBY actor name and BY movie name“Actor = ‘Tom Cruise’ AND EQUALS_TO(Movie, ‘Top Gun’)” 
A3Get movies of Actor that starts with ‘A’BY actor name and BY movie name“Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Movie, ‘T’)” 

Projection Expression

Projections allow to limit attributes in the response, here is the example with boto3 library to get only Year and Genre attributes from table.

1
2
3
4
5
6
7
8
9
10
11
12
13
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'

key_condition_expression = Key('Movie').eq('Top Gun') & Key('Actor').begins_with('Tom')
projection_expression = 'Year, Genre'

response = dynamodb.query(
  TableName=table_name,
  KeyConditionExpression=key_condition_expression,
  ProjectionExpression=projection_expression
)

items = response['Items']

LSI

We have more access patterns in requirements - to get Movies based on actor name and genre type, to get actor movies within specific range.

IDAccess Pattern
A4Get movies Actor that have starting from ‘2020’ year
A5Get movies Actor that have Genre ‘Comedy’

Both these access patterns share the same PK as in our Table, but have distinct SK (Year and Genre). Of course, it is possible to query or scan the DataSet and apply Filtering on top, but it will be not efficient. Because DynamoDB first retrieves all data based on PK, SK and on retrieved DataSet applies the Filters Predicate.

Much more effective is to define LSI - it is an index that has same PK as a main Table but distict attribute for SK:

IDAccess PatternAccess BYKey ConditionFilter Condition
A4Get movies Actor that have starting from ‘2020’ yearLSI: BY actor name and BY year“Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Year, 2022)” 
A5Get movies Actor that have Genre ‘Comedy’LSI: BY actor name and BY Genre“Actor = ‘Tom Cruise’ AND EQUAL_TO(Genre, ‘Action’)” 

Scan

This is the most expensive operation to access the Table data. We should always avoid usage of the full table scan, due to:

  • huge negative impact to extra consumed Table Read Capacity
  • too long Time of the scan operation
  • no PK or Index used in operation

But in some cases Scan operation is used: small Table size, no possibility to introduce and consume one more GSI, etc.

1
2
3
4
5
6
7
8
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'

response = dynamodb.scan(
  TableName=table_name,
  # Add any optional parameters if needed, e.g., FilterExpression, ProjectionExpression, etc.
)
items = response['Items']

GSIs

Help to introduce additional Data Access Patterns that are not covered with Primary and Sort Keys, we can introduce and use GSIs.

GSI Inverted Index

We will use technic called Inverted Index - PK(Actor) and SK(Movie) of Movies Table will be flipped in the GSI to PK(Movie) and SK(Actor). GSI automatically updates data when Items are added/updated/deleted in Main Table, there is no need to add extra inserts into Index Table, synchronize them with Main Table - all these are automatically provisioned.

img_10.png

So, after Flipping Movies <-> Actors Tables we have required access pattern that is using non PK attribute of Main Table:

IDAccess PatternAccess BYKey ConditionFilter Condition
A6Get all actors of MovieGSI: BY movie name“Movie = ‘Top Gun’” 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'

# Define the GSI name
index_name = 'GSI1'

# Define the key condition expression and expression attribute values
key_condition_expression = Key('Movie').eq('Top Gun')
projection_expression = 'Year, Genre'

# Perform the query on the GSI
response = dynamodb.query(
  TableName=table_name,
  IndexName=index_name,
  KeyConditionExpression=key_condition_expression,
  ProjectionExpression=projection_expression,
)

items = response['Items']

Entity Chart

Following Entity Chart are describing Table and Index structures:

EntityPKSK
MoviesActorMovie
Movies#LSIActorYear
Movies#LSIActorGenre
Movies#GSIMovieActor

As a result we fully provided Table structure for required access patterns with 1 LSI, 1 GSI:

IDAccess PatternAccess BYKey ConditionFilter Condition
A1Get All movies of ActorBY actor name“Actor = ‘Tom Cruise’” 
A2Get movie details with Actor and movie nameBY actor name and BY movie name“Actor = ‘Tom Cruise’ AND EQUALS_TO(Movie, ‘Top Gun’)” 
A3Get movies of Actor that starts with ‘A’BY actor name and BY movie name“Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Movie, ‘T’)” 
A4Get movies Actor that have starting from ‘2020’ yearLSI: BY actor name and BY year“Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Year, 2022)” 
A5Get movies Actor that have Genre ‘Comedy’LSI: BY actor name and BY Genre“Actor = ‘Tom Cruise’ AND EQUAL_TO(Genre, ‘Action’)” 
A6Get all actors of MovieGSI: BY movie name“Movie = ‘Top Gun’” 
This post is licensed under CC BY 4.0 by the author.