Make sure you're interview-ready with Exponent's system design interview prep course: bit.ly/3YTjsjH
Animations to visualize what she is saying would make this video perfect!
I didn't know what database sharding was. This video gave me a good number of topics to research and learn. Thanks for the video!
This video was great. Short. Crisp. To the point.
❤
Great and to the point explanation, No bluff
Thanks
Glad you liked it!
This was an amazingly informative video for getting a high-level overview of what database sharding is, thank you!
Some people are very beautiful with a helping hand , thanku❤
you guys are amazing i recently found your channel i am learning a lot and i am loving it
A few things to add. I prefer partitioning on a key that is guaranteed not to distribute badly, so "first letter of name" is a bad idea. Better to use the record id and group them in blocks of, say, 100k per partition. Then, before storing partitions on different servers, there are a few more things to do first. One is to split modifying queries from read-only queries (which has to be done at the application level) so that a simple read replica (which is trivial to set up in Postgres) can be used. Next is a DB split at the logical level: for example, keep the users' core data on db1 and chat messages on db2. Leaving out foreign keys and using weak references instead, with a periodic cleanup job that resolves broken links, is a good idea; it also eliminates issues when restoring backups that were cut at a bad moment.
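The three ideas in the comment above (fixed-size id ranges, read/write splitting in the application, weak references with a cleanup pass) can be sketched roughly like this. All names here (`partition_for`, `route`, `broken_links`, the `user_id` field) are hypothetical and only for illustration; the routing check is deliberately naive and would, for instance, send a `SELECT ... FOR UPDATE` to the replica.

```python
PARTITION_SIZE = 100_000  # records per partition, the "100k" figure from the comment


def partition_for(record_id: int, size: int = PARTITION_SIZE) -> int:
    """Map a record id to a partition index using fixed-size id ranges."""
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return record_id // size


def route(sql: str) -> str:
    """Pick a target for a statement: plain SELECTs go to the read replica,
    everything else goes to the primary. Assumption: one statement per call."""
    words = sql.split(None, 1)
    first_word = words[0].upper() if words else ""
    return "replica" if first_word == "SELECT" else "primary"


def broken_links(messages: list[dict], known_user_ids: set[int]) -> list[dict]:
    """Return messages whose weak user_id reference no longer resolves.
    A periodic cleanup job could delete or repair these rows."""
    return [m for m in messages if m["user_id"] not in known_user_ids]
```

With ids as the key, partition sizes stay even no matter how names or other attributes are distributed, which is the point the comment makes against "first letter of name".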
Coming from a decade+ of data work with health records, I have to bump this comment. Name, location and birthdate combined still aren't unique. Messing up data with potential tromps like this is straight up lethal in some fields.
Remember, friends: bad data is worse than no data.
The video script explains the basics of database sharding and partitioning in system design. It discusses how sharding can help manage large amounts of data by breaking it up into smaller partitions spread across multiple servers. The script also highlights the advantages and disadvantages of sharding in terms of scalability, performance, and operational complexity.
Key moments:
00:32 Traditional databases encounter limitations with increasing data size, necessitating sharding to enhance scalability and performance.
-Geo-based sharding partitions data based on user locations, reducing latency by routing users to the closest node.
-Range-based sharding divides data by key value ranges, simplifying partition computation but potentially leading to uneven splits.
-Hash-based sharding uses hashing algorithms to evenly distribute data across partitions, reducing hotspots but potentially separating related rows.
-Automatic sharding dynamically manages data partitioning for higher performance and scalability, but manual sharding at the application layer increases development complexity.
03:55 Sharding enables scaling, faster queries, and system availability, but poses challenges like complex management, hot spots, and high operational costs.
-Advantages of sharding include scalability, faster queries, and improved system availability during outages.
-Disadvantages of sharding involve complex data relationships, potential hot spots, and operational costs for maintaining high availability.
Generated by sider.ai
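For anyone who wants to see the range-based vs. hash-based difference from the summary above in code, here is a minimal sketch. The boundary values, the shard count of 4, and the function names are assumptions made up for the example, not anything from the video.

```python
import bisect
import hashlib

# Hypothetical exclusive upper bounds: shard 0 holds keys < 1_000,
# shard 1 holds keys < 10_000, shard 2 holds keys < 100_000, shard 3 the rest.
RANGE_BOUNDS = [1_000, 10_000, 100_000]


def range_shard(key: int) -> int:
    """Range-based: find the shard by looking up the key in sorted boundaries.
    Cheap to compute, but shards can end up uneven if keys cluster."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def hash_shard(key: str, num_shards: int = 4) -> int:
    """Hash-based: hash the key and take it modulo the shard count.
    Spreads keys evenly, but neighbouring keys land on unrelated shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Note that plain modulo hashing remaps most keys when `num_shards` changes, which is one reason resharding is operationally expensive.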
very well described, thanks for sharing.
Great video on sharding, but partitioning wasn't mentioned or discussed.
Just memorize every word and say in the job interview.. unbelievable..
Great analysis, thank you! Could you help me with something unrelated: I have a SafePal wallet with USDT, and I have the seed phrase. (alarm fetch churn bridge exercise tape speak race clerk couch crater letter). How should I go about transferring them to Binance?
Greatly explained, I subbed
Do watch this video with closed captioning on for unintentional comic effect, because for some reason CC does not always know what sharding is, so it sometimes captions it as "sharting". So you get to learn about "manual vs automatic sharting".
Great video!
Awesome explanation.
Thanks!
Thanks for the interesting content! 😍 Just a small off-topic question: 😅 I have these words 🤨. (behave today finger ski upon boy assault summer exhaust beauty stereo over). Not sure how to use them, would appreciate help. 🙏
Thanks for sharing such valuable information! I have a quick question: I have a SafePal wallet with USDT, and I have the seed phrase. (air carpet target dish off jeans toilet sweet piano spoil fruit essay). How should I go about transferring them to Binance?
I would think another potential disadvantage applies if you are using commercial rather than open-source operating systems or databases, where licensing costs also increase as the number of servers increases.
Crystal clear
You did not mention eventual consistency as a drawback of sharding?
Until her hands moved I thought she was an AI robot 😂
Awesome, thanks
Sorry, everyone...
I parted *_and_* sharded 😢
Good video but confusing use of the term 'partition', which is different than 'shard'.
Sharding is also data partitioning. Partitioning can mean different things based on context, similar to “consistency” (which is different in the context of ACID and CAP)
Who is she and how do we get more videos with her?
you're strong
Monolithic Databases??
It sounds like you mixed up partitioning with sharding.
And commodity hardware does not have ECC - don’t run a db on it.
Each partition is stored within the same database server, so it's easier, because sharding requires multiple database servers?
"commodity hardware does not have ECC - don’t run a db on it"
SQLite is a file-based database. It doesn't have to reside in the non-paged part of RAM. High-energy cosmic radiation can corrupt only the volatile memory cells, not the storage.
Also, modern commodity hardware has some level of ECC for CPU cache memory: single-bit ECC support for L2 cache, and multi-bit ECC for L1 cache (at least my 10-year-old Intel i7 has). A whole query operation will probably fit into the CPU cache unless the data size for the columns exceeds the L2 cache size (good luck exceeding that: say L2 cache is 256 KB, and even if only half of it is available for our query operation at this moment with all the data for the columns, it would take more than 100 columns each containing >1000 bytes to surpass that cache boundary. Domains that need queries this large are not a thing on commodity hardware anyway. Hospital billing, hotel management, restaurant billing? Nah).
Take a worst-case memory access time of, say, 100 nanoseconds to fetch the data from RAM to L2 cache. Radiation would have to corrupt those exact memory bits inside the RAM within those 100 nanoseconds during the fetch cycle. Then it will take another 100 or so nanoseconds to write the data back to the disk (a worst-case disk access time of 50 ms, i.e. 0.05 s, is assumed). It's extremely unlikely, almost next to impossible, for that radiation to randomly flip those specific memory cells, out of billions of memory cells in RAM, that belong to the SQLite update/delete query function, which will complete its execution and save the data to disk within about 10 milliseconds at most (including all the overhead of system calls).
SQLite for Desktop is your friend.
However, if you intend to use a client-server database like MySQL etc., then your statement is indeed valid.
Some visualization would have gone a long way
Thanks for the feedback!
Well thanks for reading the script.
😂😂😂
A lot of these YT educators write down the material before speaking to the camera. What’s your point?
Every single youtuber has to be prepared bruh, they can't just speak everything from mind and stutter when thinking :|
It's not a reaction video
😂😂haha
That's all the YouTube I'm watching for today.
You are looking so cute 🥰
Reading from a teleprompter is not teaching!! Sure, it gave me topics that I can look up myself.
A lot of youtube educators have their material scripted before speaking to the camera? What’s your point?
her name pls
It is: NoneOfYourBusiness
am in love with this lady what her id
you got the definition of Sharding wrong. understood you never did sharding in your life.