This is one of the best DynamoDB modeling walk-throughs I've seen. DAT401 is too advanced and fast; this one takes us back to the beginning. There are some interesting patterns and explanations, like using PK=SK on the table so that GSI1, which is keyed on the table's PK, doesn't end up with low-cardinality partition keys. At 18:11 there's an interesting use of a prefix to exclude directors from the actor aggregates. Coupling DynamoDB to Amazon Elasticsearch Service at 27:28 with Streams and Lambda was obvious to me (it's a great solution), but it may not be to everyone.
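The prefix trick works because a Query can restrict the sort key with `begins_with`. A minimal in-memory sketch, assuming key attributes named `ID` and `SK` (as in the item quoted later in the thread) and a hypothetical `DIR#` prefix for directors; it only imitates DynamoDB's behaviour and does not call AWS:

```python
from typing import Dict, List

Item = Dict[str, str]

# Hypothetical contents of one movie partition: the movie's own metadata item,
# a director relationship item and an actor relationship item.
MOVIE_PARTITION: List[Item] = [
    {"ID": "MOV#xyz1234", "SK": "MOV#xyz1234", "TITLE": "Blaze"},
    {"ID": "MOV#xyz1234", "SK": "DIR#d00001", "NAME": "Example Director"},
    {"ID": "MOV#xyz1234", "SK": "ACT#ab00345", "NAME": "Steve Zahn", "CHARACTER": "Oilman"},
]


def query(items: List[Item], pk: str, sk_prefix: str = "") -> List[Item]:
    """Imitate a Query on ID = pk AND begins_with(SK, sk_prefix)."""
    hits = [i for i in items if i["ID"] == pk and i["SK"].startswith(sk_prefix)]
    # Query returns items ordered by sort key within the partition.
    return sorted(hits, key=lambda i: i["SK"])


# Only actor relationship items; the director item is filtered out by the prefix.
actors_only = query(MOVIE_PARTITION, "MOV#xyz1234", "ACT#")
```

With boto3's resource API, the equivalent key condition would be `Key("ID").eq(pk) & Key("SK").begins_with("ACT#")` from `boto3.dynamodb.conditions`.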
If you had plotted the different GSIs the same way as the base table and shown how the typical queries (access patterns) are served by them, the entire picture would be clear.
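One way to picture a GSI is to redraw the base items under the index's keys. A minimal sketch of an inverted index (an assumption here, not necessarily the exact GSI1 from the video) whose partition key is the base table's `SK` and whose sort key is its `ID`; the second movie is a hypothetical sample:

```python
from collections import defaultdict
from typing import Dict, List

Item = Dict[str, str]

# Hypothetical base-table items: a movie metadata item (ID == SK) and
# movie-actor relationship items, one per movie the actor appears in.
BASE_ITEMS: List[Item] = [
    {"ID": "MOV#xyz1234", "SK": "MOV#xyz1234", "TITLE": "Blaze"},
    {"ID": "MOV#xyz1234", "SK": "ACT#ab00345", "NAME": "Steve Zahn", "TITLE": "Blaze"},
    {"ID": "MOV#m0000002", "SK": "ACT#ab00345", "NAME": "Steve Zahn", "TITLE": "Another Movie"},
]


def build_inverted_index(items: List[Item]) -> Dict[str, List[Item]]:
    """Group items by SK (index partition key), sorted by ID (index sort key)."""
    index: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        index[item["SK"]].append(item)
    for partition in index.values():
        partition.sort(key=lambda i: i["ID"])
    return dict(index)


GSI = build_inverted_index(BASE_ITEMS)
# The index partition "ACT#ab00345" answers "which movies was this actor in?"
movies_for_actor = [i["TITLE"] for i in GSI["ACT#ab00345"]]
```

Because the metadata item has `ID == SK`, it lands in its own index partition instead of piling up in a shared one.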
How do you ensure, for example, that the actor ID actually exists before creating the relationship between a movie and its actors? In other words, passing a wrong actor ID when creating a movie-actor relationship will succeed and leave the data invalid. So do I have to check that the actor ID exists before creating the movie (PK) / actor (SK) item?
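DynamoDB has no foreign keys, so the check has to be part of the write. One option is a transaction that pairs a `ConditionCheck` on the actor's own item with the `Put` of the relationship item, so both succeed or neither does. A minimal sketch that only builds the request for a low-level boto3 client; the table name `Movies` and the assumption that each actor's metadata item is stored with `ID == SK == actor ID` are hypothetical:

```python
from typing import Any, Dict

TABLE_NAME = "Movies"  # hypothetical table name


def link_actor_to_movie_request(
    movie_id: str, actor_id: str, name: str, character: str
) -> Dict[str, Any]:
    """Build kwargs for client.transact_write_items(**request)."""
    return {
        "TransactItems": [
            {
                # Cancels the whole transaction if the actor item is missing
                # (assumes actor metadata is stored with ID == SK == actor_id).
                "ConditionCheck": {
                    "TableName": TABLE_NAME,
                    "Key": {"ID": {"S": actor_id}, "SK": {"S": actor_id}},
                    "ConditionExpression": "attribute_exists(#pk)",
                    "ExpressionAttributeNames": {"#pk": "ID"},
                }
            },
            {
                # The movie (partition key) / actor (sort key) relationship item.
                "Put": {
                    "TableName": TABLE_NAME,
                    "Item": {
                        "ID": {"S": movie_id},
                        "SK": {"S": actor_id},
                        "NAME": {"S": name},
                        "CHARACTER": {"S": character},
                    },
                    # Refuse to overwrite an existing relationship item.
                    "ConditionExpression": "attribute_not_exists(#pk)",
                    "ExpressionAttributeNames": {"#pk": "ID"},
                }
            },
        ]
    }
```

The request would be sent with `boto3.client("dynamodb").transact_write_items(**request)`; if either condition fails, boto3 raises `TransactionCanceledException`. A plain `GetItem` before the `Put` is simpler but leaves a window in which the actor could be deleted between the two calls.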
Hi, Edin,
Your video was amazing.
I have a question:
What is the best pattern to keep 1-to-N relationship items stored in different partitions consistent?
I.e., if an actor has worked in many movies and we have an item in each movie to represent that relationship, how can we keep all that metadata consistent across all the items?
Thanks!
Thank you, Xavi.
For data consistency, as a general rule, whenever possible we default to eventually consistent approaches. In this example, the first thing we have is unique identifiers, which help to maintain relationships. If we need to update all the movie relationships for a given actor, we’d query the GSI on the actor ID to get all the movies. Then, depending on the nature of the update we need to make (e.g., is it setting a new value or an increment? Is it idempotent?), we could either simply iterate over all the items and update each one, or use conditional updates for idempotency. Both of these approaches are eventually consistent: while the updates are taking place, a query on the actor ID would return some items with old values and some with updated ones. If all the updates have to be made consistently, we’d use the transactional write API (TransactWriteItems). And for a consistent view, we'd also have to use the transactional get API (TransactGetItems).
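The non-transactional approach can be sketched as: take the items returned by the GSI query on the actor ID and issue one conditional `UpdateItem` per item. This minimal sketch only builds request kwargs for a low-level boto3 client; the table name and `NAME` being the denormalized attribute are assumptions:

```python
from typing import Any, Dict, Iterable, List

TABLE_NAME = "Movies"  # hypothetical table name


def rename_requests(
    edges: Iterable[Dict[str, str]], old_name: str, new_name: str
) -> List[Dict[str, Any]]:
    """One UpdateItem request per movie-actor item found through the GSI."""
    requests = []
    for edge in edges:
        requests.append(
            {
                "TableName": TABLE_NAME,
                "Key": {"ID": {"S": edge["ID"]}, "SK": {"S": edge["SK"]}},
                "UpdateExpression": "SET #n = :new",
                # Only touch items that still hold the old value, so re-running
                # the loop after a partial failure does not redo finished work.
                "ConditionExpression": "#n = :old",
                "ExpressionAttributeNames": {"#n": "NAME"},
                "ExpressionAttributeValues": {
                    ":new": {"S": new_name},
                    ":old": {"S": old_name},
                },
            }
        )
    return requests
```

Each request would go through `client.update_item(**request)`. On a retry, items that already hold the new name fail the condition with `ConditionalCheckFailedException`, which the caller can treat as "already done". Until every request has gone through, readers can see a mix of old and new names.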
What if, in your example, the actor's name was misspelled? How can I make sure that updating the actor name in the actor partition also changes it in the movie partition?
Wanted to know this as well. The write side wasn't really explained, and it's missing from these tutorials.
Did you guys ever figure this out?
@@tvguyultra4448 You have to make a tradeoff: either optimize for reads and accept some difficulty with updates and deletes, or not. If you think the actor name will rarely change, or not change at all, you can do what the video suggests. If the name does change often, you could either use a transaction to update the actor name in the actor and movie partitions at once (but transactions have a size limit, so you would still have to deal with that), or just store the actor ID in the movie partition so you only update the actor name in one place.
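The transactional option can be sketched by packing the actor item and every relationship item into `TransactWriteItems` calls. Each call is atomic but capped (100 actions per call at the time of writing, previously 25), so a large fan-out has to be split, and atomicity then only holds within each batch. The table name, the `NAME` attribute and the actor key layout are assumptions; only request dicts are built:

```python
from typing import Any, Dict, List, Sequence

TABLE_NAME = "Movies"  # hypothetical table name
MAX_TRANSACT_ITEMS = 100  # per-call action limit; check current DynamoDB quotas


def rename_actor_transactions(
    actor_id: str,
    edges: Sequence[Dict[str, str]],
    new_name: str,
    max_items: int = MAX_TRANSACT_ITEMS,
) -> List[Dict[str, Any]]:
    """Return kwargs for one or more client.transact_write_items calls."""
    # Actor metadata item first (assumed ID == SK == actor_id), then every
    # movie-actor relationship item found through the GSI query.
    keys = [(actor_id, actor_id)] + [(e["ID"], e["SK"]) for e in edges]
    updates = [
        {
            "Update": {
                "TableName": TABLE_NAME,
                "Key": {"ID": {"S": pk}, "SK": {"S": sk}},
                "UpdateExpression": "SET #n = :new",
                "ExpressionAttributeNames": {"#n": "NAME"},
                "ExpressionAttributeValues": {":new": {"S": new_name}},
            }
        }
        for pk, sk in keys
    ]
    # Each batch is all-or-nothing on its own; separate batches are not.
    return [
        {"TransactItems": updates[i : i + max_items]}
        for i in range(0, len(updates), max_items)
    ]
```

Storing only the actor ID in the movie partition, the other option mentioned above, avoids this fan-out entirely at the cost of an extra read to resolve the name.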
I actually ended up switching to MySQL after a long struggle with DynamoDB, because it didn't give enough flexibility to a startup whose requirements changed all the time, but I'm sure it's great when you know most of your use cases up front.
@@ahmadnabil2441 That’s an interesting case. I would argue that requirements are often not set in stone and change frequently, especially for startups or new projects/products.
Looking at the example shown at 7:53, for instance: would you simply give the PK and SK generic names, e.g., 'SortID' and 'PartitionUUID'?
The traditional pattern that AWS seems to promote is to name the partition key and sort key fields as generically as possible; generally they promote using "pk" and "sk" for the partition key and sort key, respectively. This is because in NoSQL you will likely have multiple entities sharing the same field. For example, a movie title might be the partition key in one item (row), and the next item will have a director's name as the partition key. It seems bad at first, but once you see advanced data modeling, it becomes really powerful to denormalize your data and use generic keys like pk and sk to reference it.
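A minimal sketch of the generic-key idea, assuming attributes literally named `PK` and `SK` and a small prefix table (`MOV` and `ACT` appear in the video's item; `DIR` for directors is hypothetical). Different entity types share the same two key attributes, and the prefix tells them apart:

```python
from typing import Dict

# MOV and ACT match the video's IDs; DIR is a hypothetical director prefix.
PREFIXES: Dict[str, str] = {"movie": "MOV", "actor": "ACT", "director": "DIR"}
ENTITY_BY_PREFIX = {prefix: entity for entity, prefix in PREFIXES.items()}


def make_key(entity: str, raw_id: str) -> str:
    """Compose a generic key value such as 'MOV#xyz1234'."""
    return f"{PREFIXES[entity]}#{raw_id}"


def entity_of(key: str) -> str:
    """Recover the entity type from the prefix before the first '#'."""
    return ENTITY_BY_PREFIX[key.split("#", 1)[0]]


# Three different kinds of item, all stored under the same PK/SK attributes.
items = [
    {"PK": make_key("movie", "xyz1234"), "SK": make_key("movie", "xyz1234"), "TITLE": "Blaze"},
    {"PK": make_key("movie", "xyz1234"), "SK": make_key("actor", "ab00345"), "CHARACTER": "Oilman"},
    {"PK": make_key("actor", "ab00345"), "SK": make_key("actor", "ab00345"), "NAME": "Steve Zahn"},
]
```

Because the attribute names say nothing about the entity, the same table and the same GSIs can serve movies, actors and directors; the meaning lives in the key values.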
What does the "S" in "SNAME" mean?
He referred to it as search name in an earlier video.
Thanks bud, I missed that part.
Edin Zulich...
What does the actual table look like?
What do the Movie-Actor and Movie-Director relationship items look like in the table?
Does the Movie-Actor relationship item have attributes?
I'm stuck at 7:12.
Could you help?
Hi Jose,
The image at 7:12 shows what the table looks like. For example, one item that can be seen in it is:
ID=“MOV#xyz1234”, SK=“ACT#ab00345”, NAME=“Steve Zahn”, TITLE=“Blaze”, CHARACTER=“Oilman”, CO=8.
This item represents a relationship between the movie ID="MOV#xyz1234" (which is the partition key) and the actor represented by SK="ACT#ab00345" (the sort key). The rest of the item's fields are attributes, which means that the relationship does have attributes.
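On the wire, that same item is sent to DynamoDB in its typed AttributeValue format, where `SK` is an ordinary string attribute like any other. A minimal hand-rolled converter covering only strings, integers/Decimals and booleans, for illustration (boto3 ships a complete one, `boto3.dynamodb.types.TypeSerializer`):

```python
from decimal import Decimal
from typing import Any, Dict, Union

Scalar = Union[str, int, bool, Decimal]


def to_attribute_values(item: Dict[str, Scalar]) -> Dict[str, Dict[str, Any]]:
    """Convert a plain dict into DynamoDB's low-level AttributeValue format."""
    typed: Dict[str, Dict[str, Any]] = {}
    for name, value in item.items():
        if isinstance(value, bool):  # check before int: bool subclasses int
            typed[name] = {"BOOL": value}
        elif isinstance(value, str):
            typed[name] = {"S": value}
        elif isinstance(value, (int, Decimal)):
            typed[name] = {"N": str(value)}  # numbers are transmitted as strings
        else:
            raise TypeError(f"{name}: unsupported type {type(value).__name__}")
    return typed


# The item quoted above: ID is the partition key, SK the sort key.
example = to_attribute_values(
    {"ID": "MOV#xyz1234", "SK": "ACT#ab00345", "NAME": "Steve Zahn",
     "TITLE": "Blaze", "CHARACTER": "Oilman", "CO": 8}
)
```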
Thank you, Zulich! You made my day!
@@edinzulich9387 This comment actually helps a lot. I was never sure whether the "SK" attribute was a pseudo-attribute, just to show visually what the sort key points to, or whether the "SK" attribute actually exists in the data record. Thanks!
Great video!
One observation: I think the use of # in the IDs is a bad idea, as most REST implementations use GET with the ID in the URL path. A # there gave me a lot of pain before I realized that if my ID is MOV#1234, only MOV is sent, and the #1234 part is removed by the browser as the fragment part of the URI. Even with URI encoding, in a path like someapi/movie/MOV#123 the #123 part becomes the fragment. I think an underscore (_) might be a better option. I usually never put the ID in the URL path and use the query part of the URI to send it instead, but in AWS APIs this is sort of the default way. So I guess using _ makes much more sense.
You're right. # delimits the fragment part of a URI. Take a look at this: tools.ietf.org/html/rfc3986#section-3 and this www.urlencoder.io/learn/
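The fragment behaviour is easy to reproduce with Python's `urllib.parse`: an unescaped `#` ends the path, while percent-encoding it as `%23` keeps it inside the path segment. The base URL is hypothetical. In browser code, JavaScript's `encodeURIComponent` escapes `#`, whereas `encodeURI` leaves it alone.

```python
from urllib.parse import quote, unquote, urlsplit

BASE = "https://api.example.com/someapi/movie"  # hypothetical endpoint


def naive_url(raw_id: str) -> str:
    """Drop the ID into the path as-is."""
    return f"{BASE}/{raw_id}"


def encoded_url(raw_id: str) -> str:
    """Percent-encode the ID; safe='' also escapes '/' so it stays one segment."""
    return f"{BASE}/{quote(raw_id, safe='')}"


def id_from_url(url: str) -> str:
    """What a server sees as the last path segment, decoded."""
    return unquote(urlsplit(url).path.rsplit("/", 1)[-1])
```

With the naive URL the server only receives `MOV` and `123` becomes the fragment; with the encoded URL it receives the full `MOV#123`.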
You could prepend it internally in your back-end application; I think it's not meant to be accessed directly. Send /123, and prepend MOV in the backend.
@@romyeo7078 this
You generally wouldn't use the hash symbol in the context you're talking about. Instead, you would see a pattern that looks more like this.
In your query string you would pass something like ?movie=1234&actor=9876. Then, in your application code, you would simply capture the movie ID 1234 and query the database using the entity code (MOV is the code used for the movie entity in this example) concatenated with the ID, so now you have MOV#1234. This happens on the backend of your application (or in Lambda functions if you are serverless). You aren't going to (and shouldn't) query the database directly from the client side anyway, so this is how the backend sits between the two.
If for whatever reason you had to pass it in your URL, you would want to URL-encode it anyway, which prevents this issue. And you should be URL-encoding anything you put into your query parameters anyway, no matter what your application is or what technology stack it uses.
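The backend pattern can be sketched under a few assumptions: the handler receives the raw query string, the `MOV` and `ACT` prefixes come from the video's IDs, and the key attributes are `ID` and `SK` as in the item quoted earlier in the thread. The helper names are hypothetical. Rejecting a raw ID that already contains `#` keeps clients from addressing keys of another entity type:

```python
from typing import Dict, List
from urllib.parse import parse_qs


def _single(params: Dict[str, List[str]], name: str) -> str:
    """Return exactly one bare ID for a query parameter, or raise ValueError."""
    values = params.get(name, [])
    if len(values) != 1:
        raise ValueError(f"expected exactly one '{name}' parameter")
    if "#" in values[0]:
        raise ValueError(f"'{name}' must be a bare ID without an entity prefix")
    return values[0]


def edge_key_from_query(query_string: str) -> Dict[str, Dict[str, str]]:
    """Turn '?movie=1234&actor=9876' into the key of the movie-actor item."""
    params = parse_qs(query_string.lstrip("?"), strict_parsing=True)
    movie_id = _single(params, "movie")
    actor_id = _single(params, "actor")
    # The entity prefixes are added server-side; the client never sends '#'.
    return {"ID": {"S": f"MOV#{movie_id}"}, "SK": {"S": f"ACT#{actor_id}"}}
```

The returned dict is in the low-level format expected by the `Key` argument of `client.get_item`.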
I'm starting to get it: the mental model of switching from a relational database to NoSQL is difficult. I would like to see how the data gets stored in ruclips.net/video/nhUtZ7suZWI/видео.html .
Reminiscent of the Six Degrees of Kevin Bacon