What is Database Sharding?

  • Published: Dec 22, 2024

Comments

  • @RandomShowerThoughts 2 years ago +18

    Honestly, this might be the most complete and thorough explanation of sharding.

    • @BeABetterDev  2 years ago +1

      Thanks so much for your kind words!

  • @hamadaparis3556 2 years ago +75

    You've simplified your explanation the way Google engineers do when they give lectures. I'm sorry if that sounds strange, but I've realized that people who can simplify complex things really know what they are doing. Awesome, man. Cheers.

  • @Aidanhyland 3 years ago +21

    I am burning through all your videos. You are making me a better SaaS test engineer! Keep up this great work!

  • @dannydatt 3 years ago +3

    Network guy trying to get an understanding in a different field. That's an outstanding walk-through and very much appreciated. Thank you for your work and quality presentation.

  • @abhishekghosh5550 2 years ago +9

    This is seriously such a great video, man. I spent the entire Sunday trying to understand sharding. It's not that I couldn't get started with the concept, but this video made everything clear at the end of the day. Thank you.

  • @v.m.5850 1 year ago

    Watched countless videos and barely understood the concept. Your video, on the other hand, explained everything, along with the pros and cons, super simply. Thanks a ton.

  • @ase713 8 months ago +1

    Dude, this was outstanding! Super helpful and covered everything I needed to know!

  • @yfzhangphonn 1 year ago +1

    The best lesson about database scalability I've found; so easy to understand.

  • @mathisinav4267 2 years ago

    Hands down the best explanation I've seen of database sharding. Excellent!

    • @BeABetterDev  2 years ago

      You're so welcome. Glad you enjoyed.

  • @poloska9471 3 years ago +6

    Dude, you make some really awesome content. Please, please keep making videos! I love the clarity of your speech, voice, and presentation. I can understand and follow along with your videos a lot better than with most other channels. Earned my subscription and likes! Keep killing it, homie!

    • @BeABetterDev  3 years ago +1

      Thank you so much for your kind words and welcome to the channel!

  • @eugeniosp3 3 years ago

    Bro I'll watch anything you make. If you made a video teaching me how to watch paint dry I'd take notes. Keep up the damn good work my mans.

  • @filesopen6188 3 years ago +2

    This video offers a very good explanation, and it also makes a complex topic understandable.

  • @Bhaskarlnm 2 years ago +2

    Daniel, no words... Your playlists' content and videos are amazing. Great, great effort to help people. Kudos to you 👏👏👌👌👌

  • @cd92606 3 years ago +11

    Great video, especially your description about the non-uniformity problem.

    • @BeABetterDev  3 years ago +1

      Thanks Rotary Dialer! Yea the non-uniformity issue is one I've been personally bitten by in the past. Glad you enjoyed the video!
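The non-uniformity problem mentioned above can be made concrete with a small simulation, assuming it refers to data piling up unevenly across shards. The names, their skew and the four alphabet ranges below are hypothetical. The point is that keying shards on a range that follows a skewed distribution overloads one shard, while hashing the same key tends to spread rows evenly.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

# Hypothetical customer names, deliberately skewed toward a few initials.
NAMES = [
    f"{prefix}{i}"
    for prefix, count in (("alice", 40), ("bob", 30), ("anna", 20), ("zoe", 10))
    for i in range(count)
]

def range_shard(name: str) -> int:
    """Range-based: four contiguous alphabet ranges keyed on the first letter."""
    first = name[0].lower()
    if first <= "f":
        return 0  # a-f
    if first <= "m":
        return 1  # g-m
    if first <= "s":
        return 2  # n-s
    return 3      # t-z

def hash_shard(name: str) -> int:
    """Hash-based: a stable digest (not Python's per-process salted hash()) mod N."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

range_load = Counter(range_shard(n) for n in NAMES)
hash_load = Counter(hash_shard(n) for n in NAMES)
print("range:", [range_load[s] for s in range(NUM_SHARDS)])  # [90, 0, 0, 10]
print("hash: ", [hash_load[s] for s in range(NUM_SHARDS)])
```

Hashing evens out the data, but it gives up efficient range queries, because neighbouring keys no longer live on the same shard.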

  • @bharat_arora 3 years ago +3

    Finally found some decent content on this topic. I already had an idea about it and just wanted to revise. Thanks a lot for making these insightful videos.

  • @rjjlucy 3 years ago +4

    With most ~20 min videos, I get tired quickly and close them after 5 min. I can't believe your video was so good that I totally lost track of time and watched all of it.

    • @BeABetterDev  3 years ago +1

      Thank you so much, Jingyi! It's these kinds of comments that keep me motivated to make more content :)
      Stay safe
      Daniel

  • @MohammedMubashshir-q8v 4 months ago

    Awesome explanation of sharding, one of the best videos out there. Thanks brother!

  • @drew4980 3 years ago +7

    Are there any database tools that make this easier? Couldn't someone write some software to create a wrapper around a sharded DBMS that could handle the routing and re-sharding with a given hashing key?

  • @sn-wg9gp 3 months ago

    I've been studying system design for interviews. All the videos hand-wave at sharding: "We would shard the DB across different regions." I had a rough idea that it means splitting the DB into smaller pieces, but nothing concrete.
    Now it makes perfect sense with this amazing video.

  • @saiaussie 9 months ago

    Hey dude, you're a star! Very clear and to the point! I can't thank you enough.

  • @Alexan6548 2 years ago

    Very clear. One of the best tutorials I have ever seen.

  • @devdewboy 1 year ago

    Thanks for the straightforward, easy-to-grasp explanation of sharding. Give this to someone else and we would have gotten a bunch of technical, wordy mumbo-jumbo.

  • @codespace747 9 months ago

    Best video ever made on sharding

  • @Lordnoashi 2 years ago +1

    Amazing explanation, loved it. Thank you, it will help with my future interviews.

  • @ВладимирЛапенков-г1э 3 years ago +1

    Best explanation of sharding I've heard!

  • @JayPatel12928 3 years ago +5

    Watched some of your random videos on sys design, and now I'm hooked. Great content!

  • @wlcheng 2 years ago +1

    Great video! Such a clear explanation of how database sharding works.

  • @mivel9763 2 years ago +1

    I had a hard time grasping what database sharding actually meant, but your video really helped me understand it, thanks! :)

  • @lucasarbex926 2 years ago +2

    Great content, man!! It helped me a lot!! Keep up the good work!

  • @andrewkicha1628 1 year ago

    Great job on this one, I came here to know more about sharding, but I learned lots of useful information before you even dived into the topic ;)

  • @saifmohamed1776 3 years ago +2

    Which is better to start with for database basics:
    - An Introduction to Database Systems (C. J. Date)
    - Database Internals
    If there are any better or recommended books or materials, please mention them.
    Great explanation.

    • @BeABetterDev  3 years ago

      Hi Saif,
      This is a tough question to answer. I would step back for a moment and ask why you are trying to learn about databases. I think the answer will guide how/what to tackle first.
      For example, if you're just planning on using DBs, Database Internals may be a bit overkill (but good to know overall). Could you tell me more about why you're learning DBs, and maybe I can guide you more?
      Thanks,
      Daniel

    • @saifmohamed1776 3 years ago

      @BeABetterDev To be aware of the basics in general at first, like physical and logical concepts,
      and backend topics specifically.
      I'm very grateful for your concern.

    • @BeABetterDev  3 years ago

      Hi Saif,
      I briefly looked at the two resources you mentioned, and I think the better choice is Database Internals. I feel that it is much more modern and covers some of the important aspects of today's database challenges, such as distributed systems and availability. The other book is quite dated, and although I'm sure it would be beneficial, things have changed so rapidly recently that I'm concerned the content will be a bit stale.
      One thing to note is to not get too bogged down with the details. To be a great developer with database understanding you don't always need to understand the low level details. Knowing how things work at a high level with the ability to dive deep when you need to is much more valuable.
      Hope this insight helps and I wish you best of luck on your studies.
      Daniel

    • @saifmohamed1776 3 years ago

      @BeABetterDev Thank you.

  • @eternalnight9453 3 years ago +2

    New here. Loved your talk! Your presentation and teaching is elegant and simple.
    Really appreciate it, thank you!

  • @JamesQQuick 2 years ago +1

    This was awesome. Thanks!

  • @patrick1778 3 years ago

    You are so good at explaining concepts.

  • @ChauDuong1982 3 years ago +3

    Thanks for the videos. Great explanation.

  • @harishbendale6818 1 year ago

    Very clear and simple explanation.

  • @Anton_Rozhanskii 3 years ago +2

    Great explanation, Daniel. Thank you

  • @quang.luu.179 1 year ago

    Good stuff man. I love the clarity you bring to a subject. Subscribed.

  • @tamaraamanda2483 3 years ago

    Prepping for an Amazon TPM interview and this is so helpful!

    • @BeABetterDev  3 years ago

      Thanks, Tamara, and good luck on your interview! Make sure you focus on those leadership principles!

  • @sharonleibel 2 years ago +1

    Great explanations! Thanks! Keep it coming!

  • @alexeykorovko6704 1 year ago

    Very good explanation, thank you.
    One point is not clear: do we really gain availability / fault tolerance if we have an intermediate layer that routes the requests? To me it seems the same, doesn't it?

  • @arikedada 2 years ago

    Great video, I understand what idempotent operations entail, thank you.

  • @RajuGupta-st1hj 2 years ago +1

    Thank you so much for the post.
    Good work.
    Keep it up.

  • @donaldkennedy7993 2 years ago

    Superb explanation of DB scaling, sharding, and W/R databases for a non-DB person ;)

  • @lariskovski 2 years ago

    Thanks!

    • @BeABetterDev  2 years ago

      Thank you so much for your generosity!

  • @peterroger249 2 years ago

    Thank you very much for your great YouTube help. I am new to Excel and chatbots. How can I migrate the Excel database, export it from a Microsoft Azure Web App, and import it into AWS Chatbot? I keep getting errors about a missing QID and others on the AWS Chatbot console. Please show me the fastest way to convert the Excel data and make it compatible with AWS Chatbot.

  • @taniaasim 2 years ago +1

    This is great and super clear. Thank you!

  • @rajt1998 2 years ago +1

    Very well explained. Thank you

  • @jackforcecity 3 years ago +1

    Great job. Very well explained!!!

    • @BeABetterDev  3 years ago

      Thanks so much Jackson! Glad you enjoyed :)

  • @IQUE928 6 months ago

    Incredible explanation, thank you!

  • @rschmidtzalles 3 years ago +1

    Clear and concise. Subscribed.

  • @panggrayta 3 years ago +1

    Wow...!! Great videos, great presentation, great explanation. Thank you, keep sharing!

  • @cyclomiha 9 months ago

    Hmm... how about PITR (point-in-time recovery)? For analytics, you could have a replica of each shard using a multi-master approach, right?

  • @SofiaGoyal 3 years ago +1

    Really good work man... such a detailed video...

    • @BeABetterDev  3 years ago

      Thanks Sofia! Glad you enjoyed :)

  • @santoshlml 3 years ago +2

    Well explained. Thank you!!

    • @BeABetterDev  3 years ago

      You're very welcome Santosh! Glad you enjoyed.

  • @hpandeymail 2 years ago +1

    Very well-formed content... thanks 🙏

  • @kellenstuart4698 3 years ago

    Question: Let's say you shard based on hashing a Guid AccountId. How do you handle queries that do not pass the AccountId? Would you have to rewrite all your ORM code to pass the AccountId? Is this something you have to do from the very beginning designing your app or would you be able to integrate this into legacy code?

    • @BeABetterDev  3 years ago +1

      Hi Kellen,
      Good question. In this case, you may need to do some restructuring of how you access your data. The partition key needs to be something that is identifiable and known as part of every query in order to know which shard to look at. This may not be realistic for applications that are already in production and don't follow this invariant, so you may face additional challenges in making this work.
      Hope this helps, Daniel

    • @shashanksharma7242 3 years ago

      @BeABetterDev Thanks
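The invariant Daniel describes (the partition key must be known for every query) can be illustrated with a tiny router. This is a minimal sketch, not any ORM's or database's API: `ShardRouter`, `MissingPartitionKey` and the in-memory shards are hypothetical. A query that carries the `account_id` goes to exactly one shard; one that doesn't cannot be routed and would have to fan out to every shard instead, or be rejected as it is here.

```python
import hashlib
import uuid
from typing import Optional

class MissingPartitionKey(Exception):
    """Raised when a query cannot be routed because it lacks the partition key."""

class ShardRouter:
    def __init__(self, num_shards: int) -> None:
        # Each "shard" is an in-memory dict of account_id -> row, for illustration.
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, account_id: uuid.UUID) -> int:
        # Hash the GUID's bytes so the mapping is stable across processes.
        digest = hashlib.sha256(account_id.bytes).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def insert(self, row: dict) -> None:
        account_id = row.get("account_id")
        if account_id is None:
            raise MissingPartitionKey("every write must include account_id")
        self.shards[self.shard_for(account_id)][account_id] = row

    def get(self, account_id: Optional[uuid.UUID] = None) -> Optional[dict]:
        if account_id is None:
            # Without the key there is no way to pick a shard: the query would
            # have to be sent to every shard instead, or rejected as here.
            raise MissingPartitionKey("query must include account_id")
        return self.shards[self.shard_for(account_id)].get(account_id)

router = ShardRouter(num_shards=4)
acct = uuid.uuid4()
router.insert({"account_id": acct, "email": "user@example.com"})
print(router.get(acct)["email"])  # user@example.com

rejected = False
try:
    router.get()  # e.g. a legacy query that only filters by email
except MissingPartitionKey:
    rejected = True
print("keyless query rejected:", rejected)  # True
```

Retrofitting this into legacy code mostly means finding every query path that lacks the key, which is the extra challenge Daniel mentions.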

  • @bambooyu5960 2 months ago

    Thank you so much for the great explanation

  • @poketopa1234 7 months ago

    What I always miss in these videos is, doesn’t introducing a routing layer just kick the can down the road? Now you have all traffic going to a singular routing node, which is not scalable and can fail. What happens when you need to scale the routing node?

  • @HemitPatel-s3f 5 months ago

    Is the sharding process explained in this vid the same as in Redis Cluster?

  • @asian1599 3 months ago

    Doesn't the routing layer introduce a single point of failure as well, though?
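Daniel's reply further down (a hash computed on the routing layer avoids a single point of failure) suggests one way around this: make routing a pure, deterministic function that any number of stateless router instances, or the application itself, can evaluate. A minimal sketch, assuming hypothetical shard endpoints and a SHA-1-based hash:

```python
import hashlib

# Hypothetical shard endpoints; in practice these would come from configuration.
SHARDS = ["db-shard-0:5432", "db-shard-1:5432", "db-shard-2:5432"]

def route(customer_id: str, shards: list[str]) -> str:
    """Pure function: the same ID and shard list give the same shard on any host."""
    digest = hashlib.sha1(customer_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

class StatelessRouter:
    """A router instance whose only state is a copy of the shard list."""

    def __init__(self, name: str, shards: list[str]) -> None:
        self.name = name
        self.shards = list(shards)

    def handle(self, customer_id: str) -> str:
        return route(customer_id, self.shards)

# Several interchangeable replicas (e.g. behind a load balancer). If one fails,
# any other gives the same answer, so no single router is critical.
replicas = [StatelessRouter(f"router-{i}", SHARDS) for i in range(3)]
for cid in ["cust-1", "cust-42", "cust-1337"]:
    answers = {r.handle(cid) for r in replicas}
    print(cid, "->", answers)  # a single shard per ID: all replicas agree
```

The catch, raised by paneerlovr later in the thread, is that every replica must switch to a new shard list consistently whenever shards are added.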

  • @chandnisaini9176 2 years ago +1

    Well explained!!

  • @channuangadi7504 1 year ago

    Another complex thing is ID generation (here, the Customer ID): when we shard, we have to make sure duplicate IDs are not generated. Can we have a video on ID generation in distributed computing?
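One way to avoid duplicate IDs without a central counter, sketched here as an assumption rather than something the video covers, is to embed the shard number in the ID itself. Each shard hands out IDs from its own local counter, and the high bits record which shard issued them. The 16/48-bit split and the `ShardIdGenerator` class are illustrative choices.

```python
import itertools
import threading

SHARD_BITS = 16   # up to 65,536 shards (illustrative choice)
LOCAL_BITS = 48   # up to ~2.8e14 IDs per shard

class ShardIdGenerator:
    """Issues 64-bit IDs of the form (shard_id << LOCAL_BITS) | local_counter."""

    def __init__(self, shard_id: int) -> None:
        if not 0 <= shard_id < (1 << SHARD_BITS):
            raise ValueError("shard_id out of range")
        self.shard_id = shard_id
        # In a real database this counter would be durable (e.g. a per-shard
        # sequence); an in-memory counter restarts from 1 after a crash.
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            local = next(self._counter)
        if local >= (1 << LOCAL_BITS):
            raise OverflowError("local counter exhausted")
        return (self.shard_id << LOCAL_BITS) | local

def shard_of(id_: int) -> int:
    """Recover the issuing shard from an ID, which can also serve as routing info."""
    return id_ >> LOCAL_BITS

gens = [ShardIdGenerator(s) for s in range(4)]
ids = [g.next_id() for g in gens for _ in range(3)]
assert len(set(ids)) == len(ids)  # no duplicates across shards
print(ids[:3], shard_of(ids[-1]))  # [1, 2, 3] 3
```

Random 128-bit UUIDs are the other common coordination-free option, at the cost of larger, unordered keys.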

  • @rayprusia4753 3 years ago +1

    Your videos are awesome! Thanks

  • @willemplug3366 1 year ago

    Super clear. Thank you!

  • @socialawareness1643 2 years ago

    I have a question:
    What if a shard returns incomplete information? That is, if a customer queries the DB and a shard returns only part of the info, then what's the use? Why isn't a backup a good option?
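Each shard is expected to hold only its slice of the data, so one shard returning a partial result is by design, not a failure. When a query does not include the shard key, the router has to send it to every shard and merge the partial results, an approach often called scatter-gather. A minimal sketch with hypothetical in-memory shards and field names:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards, each holding a disjoint slice of the customers table.
SHARDS = [
    [{"customer_id": 4, "country": "DE"}, {"customer_id": 8, "country": "US"}],
    [{"customer_id": 1, "country": "US"}, {"customer_id": 5, "country": "FR"}],
    [{"customer_id": 2, "country": "US"}],
]

def query_shard(shard: list[dict], country: str) -> list[dict]:
    """What one shard can answer: only the rows it stores."""
    return [row for row in shard if row["country"] == country]

def scatter_gather(country: str) -> list[dict]:
    """Fan the query out to every shard in parallel, then merge and sort."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: query_shard(s, country), SHARDS)
        merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row["customer_id"])

print([r["customer_id"] for r in scatter_gather("US")])  # [1, 2, 8]
```

A backup, by contrast, is a full copy kept for recovery; it does not split the data or the query load the way shards do.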

  • @AnilKumar-lb3qf 2 years ago

    Excellent presentation, very good explanation 👍👍

  • @yna8588 1 year ago

    Can we scale the storage of a database up and down according to daily requirements using sharding?

  • 1 year ago

    @BeABetterDev What if I were to opt for synchronous replication for my read replicas? Wouldn't that give me a high level of consistency (strong consistency) between the master node and the replica nodes? Also, AWS RDS provides asynchronous replication for read replicas; does that mean it is eventually consistent? If so, and I am building an application that needs strong consistency, should I avoid AWS RDS read replicas? What would be an alternative?
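The trade-off in this question can be shown with a toy model (not RDS itself). With asynchronous replication, a write is acknowledged before the replica applies it, so a read from the replica can be stale. With synchronous replication, the write is acknowledged only after the replica has it, at the cost of higher write latency. The `Primary`/`Replica` classes and the explicit `ship_pending` step are hypothetical stand-ins for a replication stream.

```python
from collections import deque

class Replica:
    def __init__(self) -> None:
        self.data = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value

class Primary:
    def __init__(self, replica: Replica, synchronous: bool) -> None:
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # (key, value) changes not yet shipped

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.synchronous:
            # Acknowledge only after the replica has applied the change.
            self.replica.apply(key, value)
        else:
            # Acknowledge immediately; the change is shipped later (replication lag).
            self.pending.append((key, value))

    def ship_pending(self) -> None:
        """Simulate the asynchronous replication stream catching up."""
        while self.pending:
            self.replica.apply(*self.pending.popleft())

for synchronous in (False, True):
    replica = Replica()
    primary = Primary(replica, synchronous)
    primary.write("balance:42", "100")
    label = "sync " if synchronous else "async"
    print(label, "read from replica:", replica.data.get("balance:42"))
# async read from replica: None   (stale until ship_pending() runs)
# sync  read from replica: 100
```

With asynchronous replicas, one common workaround is to send reads that must see the latest write to the primary and use replicas only for reads that tolerate lag.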

  • @얀고양이-f9h 1 year ago

    What if one of the shard nodes is down? For HA, we still need a replica for each shard node.
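The idea in this comment can be sketched by treating each shard as a small replica set. The router sends traffic to the shard's primary and, if that node is marked down, promotes the first healthy replica. Node names and the boolean health flag are hypothetical. Real systems rely on heartbeats and a consensus or election protocol to decide failover safely.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    healthy: bool = True

@dataclass
class ReplicatedShard:
    primary: Node
    replicas: list[Node] = field(default_factory=list)

    def current_primary(self) -> Node:
        """Return a healthy primary, promoting a replica if the primary is down."""
        if self.primary.healthy:
            return self.primary
        for i, candidate in enumerate(self.replicas):
            if candidate.healthy:
                # Promote: remove it from the replica list and make it the
                # primary; the failed node is simply dropped.
                self.replicas.pop(i)
                self.primary = candidate
                return candidate
        raise RuntimeError("no healthy node left for this shard")

shard = ReplicatedShard(Node("shard1-a"), [Node("shard1-b"), Node("shard1-c")])
print(shard.current_primary().name)  # shard1-a
shard.primary.healthy = False        # simulate the node going down
print(shard.current_primary().name)  # shard1-b
```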

  • @swaroopas5207 3 years ago

    Great video! But how do we handle foreign keys in sharding?
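A common approach, offered here as an assumption rather than something from the video, is to shard related tables by the same key so that rows linked by a foreign key land on the same shard. In this sketch, orders are routed by their `customer_id`, not by their own `order_id`, so the join and the foreign-key check stay local to one shard. Table and column names are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4

def shard_for(customer_id: int) -> int:
    """Both tables use the same function of the same key."""
    digest = hashlib.md5(str(customer_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# shard number -> table name -> rows
shards: dict[int, dict[str, list[dict]]] = defaultdict(lambda: defaultdict(list))

def insert_customer(customer_id: int, name: str) -> None:
    shards[shard_for(customer_id)]["customers"].append(
        {"customer_id": customer_id, "name": name})

def insert_order(order_id: int, customer_id: int, total: float) -> None:
    s = shards[shard_for(customer_id)]  # route by the parent key, not order_id
    # Local "foreign key" check: the parent row must exist on this same shard.
    if not any(c["customer_id"] == customer_id for c in s["customers"]):
        raise ValueError(f"customer {customer_id} not found")
    s["orders"].append({"order_id": order_id, "customer_id": customer_id, "total": total})

def orders_with_customer(customer_id: int) -> list[tuple[str, float]]:
    """A join that touches exactly one shard."""
    s = shards[shard_for(customer_id)]
    names = {c["customer_id"]: c["name"] for c in s["customers"]}
    return [(names[o["customer_id"]], o["total"])
            for o in s["orders"] if o["customer_id"] == customer_id]

insert_customer(7, "Ada")
insert_order(1001, 7, 25.0)
insert_order(1002, 7, 40.0)
print(orders_with_customer(7))  # [('Ada', 25.0), ('Ada', 40.0)]
```

Relationships that cross customers, such as a shared lookup table, do not fit this pattern; such tables are often small enough to copy to every shard, or the constraint is enforced in the application.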

  • @samlinsell900 3 years ago +5

    Vids are awesome, really enjoy them. Interesting that you didn't touch on database design, indexing, maintenance, etc., which often get too little thought, as ways to improve performance. I'd be interested to know why, especially given the cost of scaling in serverless environments.

  • @RitvikOhri10 3 years ago +3

    I was considering partitioning to improve query performance in a large database I was working on. The only issue is that it uses foreign keys, which means we cannot partition it unless it's uniform. So if sharding is a type of partitioning, I'm guessing this method won't work either. Anybody got any tips?

    • @dan_le_brown 1 year ago

      It's been 1 year since you asked; sadly, I haven't gotten an answer for you. However, I am hopeful that you solved your problem and might be willing to share your experience. 🤲

  • @nodrift9503 1 year ago

    Perfect explanation. Thank you

  • @estebanquintana156 2 years ago

    Great explanation. Thank you

  • @simonemariottini1011 3 years ago +1

    Really useful content! Keep it up!

  • @ashishsharma9008 3 years ago

    Which is the better architecture: microservices, or a single database with sharding added later when it scales?

  • @hualiang2182 2 years ago

    Nice tutorial. I wonder, in a real-world scenario, is the routing layer something that sits in the application code, or is it implemented on the database side?

  • @itiscinnamoncafe 1 year ago

    Love longer videos ❤

  • @bajtre 3 years ago +1

    Great explanation!

  • @paneerlovr 3 years ago +1

    How does one maintain redundancy in the router that maps user IDs to a shard? It seems to me like this creates another single point of failure.

    • @BeABetterDev  3 years ago

      Hi Paneer,
      Good question. There are two options. The first is a master table that contains all the mappings. You are correct that this creates a single point of failure, but it is useful from a management perspective if you ever need to re-shuffle your data distribution onto different shards: you can migrate the data and then change the pointer in the mapping table once the migration is complete.
      The other option is using a hash function where the output points to the correct shard. This can be computed on the routing layer and there will be no single point of failure. The problem with this approach is that it gets more difficult to manage reshuffling/migrations.
      Hope this helps

    • @paneerlovr 3 years ago

      @BeABetterDev With option 1, we could just do a simple file-based load into memory that maps user ID to shard. If we ever wanted to add another shard, we would upload a new file with the appropriate mappings and then signal our router service to refresh. The problem with that? When we are going out to multiple shard routers, we worry about one shard router being inconsistent with another during the update. With option 2, you are doing it algorithmically, which sounds great, but when you add a new shard, your hash function goes from, say, X shards to X+1 shards. When we roll this update out to multiple shard routers, we must again ensure there is no inconsistency caused by multiple shard routers running different hashing functions during the upgrade.
      So in either case, we run into the same set of inconsistency issues when changing the number of shards, i.e. when doing exactly what sharding is meant to help with: horizontal scaling.
      Thanks for the nice video!
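The first option in this thread (a mapping table as the authority, with data migrated before the pointer is flipped) can be sketched as a versioned directory. Stamping each change with a version number is one assumed way to address paneerlovr's concern: routers report which version they hold, and the old copy of migrated data is deleted only after every router has caught up. `ShardDirectory`, `Router`, `migrate` and `cleanup_allowed` are hypothetical names, and the sketch ignores writes that arrive during the copy, which real migrations must also handle.

```python
from dataclasses import dataclass, field

@dataclass
class ShardDirectory:
    """Authoritative customer_id -> shard mapping with a version counter."""
    mapping: dict[int, str] = field(default_factory=dict)
    version: int = 0

    def assign(self, customer_id: int, shard: str) -> None:
        self.mapping[customer_id] = shard
        self.version += 1

@dataclass
class Router:
    """Holds a snapshot of the directory and refreshes on request."""
    snapshot: dict[int, str] = field(default_factory=dict)
    version: int = -1

    def refresh(self, directory: ShardDirectory) -> None:
        self.snapshot = dict(directory.mapping)
        self.version = directory.version

    def lookup(self, customer_id: int) -> str:
        return self.snapshot[customer_id]

def migrate(directory: ShardDirectory, stores: dict[str, dict[int, dict]],
            customer_id: int, new_shard: str) -> str:
    """Copy the data, then flip the pointer. Returns the old shard, whose copy
    must be kept until every router has refreshed."""
    old_shard = directory.mapping[customer_id]
    stores[new_shard][customer_id] = stores[old_shard][customer_id]
    directory.assign(customer_id, new_shard)
    return old_shard

def cleanup_allowed(directory: ShardDirectory, routers: list[Router]) -> bool:
    """Old copies may be deleted only once no router uses an older mapping."""
    return all(r.version >= directory.version for r in routers)

directory = ShardDirectory()
stores = {"shard-a": {7: {"name": "Ada"}}, "shard-b": {}}
directory.assign(7, "shard-a")
routers = [Router(), Router()]
for r in routers:
    r.refresh(directory)

old = migrate(directory, stores, 7, "shard-b")
routers[0].refresh(directory)  # rollout in progress: routers[1] is still stale
print(routers[0].lookup(7), routers[1].lookup(7), cleanup_allowed(directory, routers))
# shard-b shard-a False  (the stale router still finds the data on shard-a)
routers[1].refresh(directory)
if cleanup_allowed(directory, routers):
    del stores[old][7]
print(stores)  # {'shard-a': {}, 'shard-b': {7: {'name': 'Ada'}}}
```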

  • @random-characters4162 1 year ago

    God bless you, sir ✌️

  • @thunderriffs2964 2 years ago +1

    Great vid! I have a question. In massive distributed systems (more read intensive than writes) where hits to your database are really expensive and they are using some form of a caching layer which stores the most frequently accessed data - does the problem of routing go away? Because in this case any writes to the database would mean that you’re invalidating the cache, and reads are done from the caching layer, so even though you may have horizontally partitioned dbs below, they don’t really have to worry about how to route the incoming request for data? I hope my query makes sense.

    • @pratikvyas3384 2 years ago

      Same question dude

    • @bmfitzgerald3 2 years ago

      Even if you have a caching layer serving most of your reads, your cache never stores everything in your DB. For this reason you will still need to solve for reading from the db whenever you have "cache misses," which means the need to retrieve the data from the correct shard still exists (and will require mapping/routing).
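bmfitzgerald3's answer can be sketched as a cache-aside read path. The pattern is an assumption, since the comment doesn't name one. Hits are served from the cache, but every miss, and every write, still goes through the shard router. The dict-based cache and shards are hypothetical stand-ins for a real cache cluster and real databases.

```python
import hashlib
from typing import Optional

NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]  # stand-ins for database shards
cache: dict[str, dict] = {}                   # stand-in for a cache layer
stats = {"hits": 0, "misses": 0}

def shard_for(key: str) -> dict:
    digest = hashlib.md5(key.encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % NUM_SHARDS]

def write(key: str, value: dict) -> None:
    shard_for(key)[key] = value  # writes always go to the owning shard
    cache.pop(key, None)         # invalidate so the next read refetches

def read(key: str) -> Optional[dict]:
    if key in cache:
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1
    value = shard_for(key).get(key)  # cache miss: routing is still required
    if value is not None:
        cache[key] = value
    return value

write("customer:7", {"name": "Ada"})
read("customer:7")
read("customer:7")
print(stats)  # {'hits': 1, 'misses': 1}
```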

  • @Anonimus_13 2 years ago +1

    Cool video :) What app do you use for drawing?

    • @BeABetterDev  2 years ago

      Adobe Photoshop and a Veikk drawing tablet!

  • @dushyantchaudhry4654 1 year ago

    Questions:
    1. "Database" is a slightly misleading term. When we say database, don't we really mean the software (RDBMS / NoSQL) that logically organises the data stored on SSDs?
    2. If so, aren't we just splitting the responsibility of the software? I.e., the data is still in the SSD library, right? Only the database management software is loaded on different servers, and each DBMS server is given responsibility for only some of the queries.

  • @skmahaboobbasha6059 3 years ago

    Great video! Please make a video on Ops Manager installation in a production environment.

  • @loaizar95 2 years ago

    Amazing video!! I understood almost everything, and I'm not an IT guy. The only thing I didn't get is the difference between partition mapping and routing :(

  • @r-rtz 1 year ago

    A more interesting concept, though, is how you generate the unique IDs used in sharding / partitioning and ensure their uniqueness.

  • @milequinze 2 years ago +1

    Awesome! Thanks a lot!

  • @trantrongty8065 3 years ago +1

    Thank you, that was really helpful. Great video!

  • @maganzo 1 year ago

    So is sharding for relational databases only? What if the database has more than one table?

  • @kgcpk 3 years ago

    Superb explanation 😍

  • @subhasishhalder4817 2 years ago +2

    How come I didn't find your channel before?

  • @dhriajbhandari 2 years ago

    Great video. Thank you. I just have a question about routing to determine the shards. Is it always necessary? I was thinking that you could just take the modulus of the ID to get the shard number instead (e.g. customer_id = 12, num_of_shards = 4, so the shard would be 12 % 4 = 0). That way you don't have a single point of failure in the router. What are the downsides of this approach compared with a router endpoint?

    • @BeABetterDev  2 years ago +2

      Hi there, this is a great point, thanks for sharing. The problem with using modulus is that it can get difficult to change the assignment of data to shards if you need to re-shuffle your data. With a single table acting as the authority, this can be done trivially.
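The modulus scheme from the question and the drawback Daniel describes can both be made concrete. The sketch reuses the question's own example (customer_id 12 with 4 shards), then counts how many of a hypothetical set of 1,000 sequential IDs would change shards if the shard count grew from 4 to 5.

```python
def modulo_shard(customer_id: int, num_shards: int) -> int:
    """Route directly by id % num_shards; no router table needed."""
    return customer_id % num_shards

print(modulo_shard(12, 4))  # 0, as in the question

ids = range(1000)
moved = sum(1 for i in ids if modulo_shard(i, 4) != modulo_shard(i, 5))
print(f"{moved} of {len(ids)} rows change shards going from 4 to 5")  # 800 of 1000
```

Adding a single shard relocates 80% of the rows, and all of that data must be migrated at once. With a mapping table as the authority, only the rows you explicitly reassign have to move, at the price of maintaining that table.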

  • @HeavensMeat 3 years ago +1

    I know you have had other DynamoDB videos here, but would it be possible to have a more in-depth video dealing with sharding in DynamoDB, and also using it with Python/boto3 vs. the CLI? I know it's not really the same type of sharding per se, but this video reminded me that I am interested in seeing that kind of thing.

    • @BeABetterDev  3 years ago

      Hey HeavensMeat! Your suggestion is a great idea for a new video, thank you! I'll work on incorporating it into my to-do list. Cheers!

  • @Tiparium_NMF 6 months ago

    I love this breakdown, but it does leave me wondering when sharding would be a good vs. a bad idea. The cons seem pretty hefty in comparison to the pros.
    It would have been nice to run through a few specific use cases and when one strategy would be better than another.

  • @studychitchat7535 3 years ago +1

    Nice video! Can you please tell me the software you are using to make this video?

    • @BeABetterDev  3 years ago

      Hi there, thanks for the kind words. I am using Photoshop with a drawing tablet. You can learn more about my approach here: ruclips.net/video/6Fk9xDpJhvk/видео.html

  • @فيافيالتأملمهمةإصلاح

    Great explanation, thank you so much.

  • @legitjimmyjaylight8409 2 years ago

    How about filesystem sharding?