Implementing Vertical Sharding

  • Published: 16 Aug 2024
  • System Design for SDE-2 and above: arpitbhayani.m...
    System Design for Beginners: arpitbhayani.m...
    Redis Internals: arpitbhayani.m...
    Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
    Sign up and get 40% off - app.codecrafte...
    In the video, I explained the importance of sharding in scaling databases, focusing on vertical sharding where tables are distributed across multiple servers. I discussed the transition from monolithic to microservices architecture and how vertical sharding helps in this shift. I detailed the implementation steps of moving tables between database servers, emphasizing the use of tools like Zookeeper for storing meta information and ensuring reactive updates across API servers. The process involved dumping tables, loading them into new databases, setting up replications, and performing a seamless cutover for data consistency.
    Recommended videos and playlists
    If you liked this video, you will find the following videos and playlists helpful
    System Design: • PostgreSQL connection ...
    Designing Microservices: • Advantages of adopting...
    Database Engineering: • How nested loop, hash,...
    Concurrency In-depth: • How to write efficient...
    Research paper dissections: • The Google File System...
    Outage Dissections: • Dissecting GitHub Outa...
    Hash Table Internals: • Internal Structure of ...
    Bittorrent Internals: • Introduction to BitTor...
    Things you will find amusing
    Knowledge Base: arpitbhayani.m...
    Bookshelf: arpitbhayani.m...
    Papershelf: arpitbhayani.m...
    Other socials
    I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
    LinkedIn: / arpitbhayani
    Twitter: / arpit_bhayani
    Weekly Newsletter: arpit.substack...
    Thank you for watching and supporting! It means a ton.
    I am on a mission to bring out the best engineering stories from around the world and make you all fall in
    love with engineering. If you resonate with this then follow along, I always keep it no-fluff.

Comments • 50

  • @d4devotion
    @d4devotion 2 years ago +2

    I have banged my head so many times trying to understand sharding, but could never get it this well. This guy never fails to explain things in such an easy way. I am lucky that I found this channel on YT.

  • @adianimesh
    @adianimesh 2 years ago +3

    Binge watching after a break! So much quality content over the last week. Thanks a lot.

  • @homestaysandcafes
    @homestaysandcafes 1 year ago +1

    Really grateful to God that I found this gem of a channel in time ♥️
    Never worry about views; some gems of music videos stay hidden too, while junk gets 1B views.

  • @jaisamtani303
    @jaisamtani303 1 month ago

    There are actually 2 ways to do vertical sharding:
    1) The one you described, with no downtime, done in real time.
    2) The one with downtime, where T2 is dumped onto DB2 during the downtime window so that replication and the later steps can be skipped.
    At the end of the video you mentioned that large tables are not sharded this way because getting the replica in sync is difficult; for such tables we can use the downtime approach. What if the DB1 shard has two or three tables of almost the same size and all of them are hot? You can take a downtime and do the vertical sharding.
    Also, for companies in the financial domain, this real-time vertical sharding with its 10-15 ms window is not acceptable; they might be using the downtime approach as well!

  • @mukeshmahadev7419
    @mukeshmahadev7419 1 year ago +2

    Arpit bhai, you just rocked it. Absolutely top-level content, with no clutter for even a second.
    This video filled me with confidence that I can handle a database in production.
    Started binge watching your channel.
    Keep making content, Sir.
    One thought that hit me while watching this video: this type of content will catalyse India's transition from being an IT services hub to an IT manufacturing hub 😄

  • @vasusharma1192
    @vasusharma1192 2 years ago +5

    Maybe a dumb question, but here I go.
    If the table renaming step (table to table.bak) is done after firing the Zookeeper update, wouldn't that help reduce the small database downtime (assuming Zookeeper updates happen immediately, without consistency issues)?
    I say this because, if we do this, the second DB server is up anyway and will take requests, and the renaming can happen later. This would also ensure that the replication is completely done.

    • @AsliEngineering
      @AsliEngineering 2 years ago +11

      If we update the config and then rename the table, the tables will diverge, i.e. the new table will get some writes and the old one will also get some writes.
      This would lead to an unresolvable conflict. For example: the old table has rows up to ID 100, and the new table has also caught up to row ID 100. Now you update the config and it takes 100 ms to reflect on all servers, but one of the API servers gets the change in 1 ms.
      So for the remaining 99 ms, both tables are accepting writes, each from a subset of the API servers.
      This would lead to a divergence/conflict.
      Consider an auto-increment ID column. There will come a time when the two tables hold two different rows with the same ID, because two different API servers wrote to the tables in different databases.
      Which is why, to keep consistency and avoid conflicts, we take a minuscule downtime, cut off the traffic, and then send the update.
      Hope that helps.
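
The divergence described in this reply can be reproduced with a small in-memory model. This is only an illustrative sketch: the `Table` class and the two-server scenario are assumptions made for the example, not code from the video, and it ignores the replication stream between the two databases.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """In-memory stand-in for a table with an auto-increment primary key."""
    rows: dict = field(default_factory=dict)
    next_id: int = 1

    def insert(self, payload: str) -> int:
        row_id = self.next_id
        self.rows[row_id] = payload
        self.next_id += 1
        return row_id


def make_synced_tables(n_rows: int = 100):
    """Old and new tables start fully in sync (rows 1..n_rows)."""
    old, new = Table(), Table()
    for i in range(1, n_rows + 1):
        old.insert(f"row-{i}")
        new.insert(f"row-{i}")
    return old, new


def config_first_cutover():
    """Config is updated first, so the old table keeps accepting writes."""
    old, new = make_synced_tables()
    # Server A has already received the new config; server B has not yet.
    id_a = new.insert("written by server A")
    id_b = old.insert("written by server B")
    return old, new, id_a, id_b


old, new, id_a, id_b = config_first_cutover()
conflict = id_a == id_b and old.rows[id_b] != new.rows[id_a]
print(f"server A wrote id {id_a}, server B wrote id {id_b}, conflict={conflict}")
```

Both servers are handed the same next ID by their respective tables, so the two databases end up with different rows under one primary key, which is the conflict the reply calls unresolvable.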

    • @vasusharma1192
      @vasusharma1192 2 years ago

      @AsliEngineering Thanks a lot for the quick reply, that clears everything up. Amazing content btw, no one covers such practical aspects so well. Hats off ✌🏻

  • @Aditya-us5gj
    @Aditya-us5gj 2 years ago

    System design cannot get any more interesting or easy than it is in your videos. Just keep those videos coming every day! I've already set aside a slot in my day to watch them.

  • @6vikas
    @6vikas 2 years ago +1

    Some of the best content on YT for vertical sharding, looking forward to the horizontal sharding video :). One question about joins between tables in 2 databases: do we need to use a host-level join in that case?

    • @AsliEngineering
      @AsliEngineering 2 years ago +2

      We would not join across databases. Joins would happen locally.
      Also, thank you so much for the kind words 🙌
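
If a request still needs data from tables that now live on different servers, one option is to run a local query on each server and combine the results in application code. The reply does not spell this out, so treat the following as an assumption-laden sketch: the `users` and `orders` tables, their columns, and the use of two in-memory SQLite databases as stand-ins for DB1 and DB2 are all hypothetical.

```python
import sqlite3

# Two separate in-memory databases stand in for two database servers.
db1 = sqlite3.connect(":memory:")
db2 = sqlite3.connect(":memory:")

db1.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db1.executemany("INSERT INTO users VALUES (?, ?)", [(1, "asha"), (2, "ravi")])

db2.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)")
db2.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "book"), (11, 1, "pen"), (12, 2, "lamp")],
)


def orders_with_user_names(user_ids):
    """Query each database locally, then combine the rows in the application."""
    if not user_ids:
        return []
    placeholders = ",".join("?" for _ in user_ids)
    users = dict(
        db1.execute(f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids)
    )
    orders = db2.execute(
        f"SELECT id, user_id, item FROM orders WHERE user_id IN ({placeholders}) ORDER BY id",
        user_ids,
    ).fetchall()
    return [(order_id, users[user_id], item) for order_id, user_id, item in orders]


result = orders_with_user_names([1, 2])
print(result)
```

The trade-off is two round trips and no single consistent snapshot across both servers, which is part of why tables that are frequently joined are usually kept on the same server.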

  • @ashishtewari2162
    @ashishtewari2162 1 year ago +1

    Great content Arpit. Very easy to understand.
    Small doubt: why rename the table first and then make the Zookeeper config change? Why not update the config in Zookeeper first and then back up the table? This would reduce the availability loss.

    • @jaisamtani303
      @jaisamtani303 1 month ago

      Bro, assume you have 50 API servers. If you update the Zookeeper config first, its watcher will start updating the API servers. Assume the config update reaches the 1st API server at 1 ms and the 50th API server at 50 ms; until 50 ms, the 50th API server is still writing to DB1, and we will miss those writes on DB2. That leaves the databases inconsistent.
      Whereas if you rename first, DB1 will not take any write operations, as all requests fail with a table-not-found error. Once Zookeeper has updated all API servers after 50 ms, nothing has been written to DB1 during those 50 ms, so DB2 is not missing any data. That gives you a consistent but less available database.
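
The rename-first ordering from this reply can be modelled the same way. The sketch below is hypothetical: the `Database` class, the one-server-per-tick propagation, and the table name `t2` are assumptions chosen to mirror the 50-server example, not a real client or driver.

```python
class TableNotFound(Exception):
    """Raised when a write targets a table that no longer exists under that name."""


class Database:
    def __init__(self):
        self.tables = {}  # table name -> list of rows

    def write(self, table, row):
        if table not in self.tables:
            raise TableNotFound(table)
        self.tables[table].append(row)

    def rename(self, old_name, new_name):
        self.tables[new_name] = self.tables.pop(old_name)


def rename_first_cutover(n_servers=50):
    db1, db2 = Database(), Database()
    db1.tables["t2"] = ["r1", "r2"]
    db2.tables["t2"] = ["r1", "r2"]          # replica has caught up
    databases = {"DB1": db1, "DB2": db2}
    routing = ["DB1"] * n_servers            # each server's local view of the config

    db1.rename("t2", "t2_bak")               # step 1: cut off writes on DB1
    accepted = rejected = 0
    for tick in range(n_servers):            # step 2: config reaches one server per tick
        routing[tick] = "DB2"
        for server in range(n_servers):      # every server attempts one write per tick
            try:
                databases[routing[server]].write("t2", f"s{server}-t{tick}")
                accepted += 1
            except TableNotFound:
                rejected += 1
    return db1, db2, accepted, rejected


db1, db2, accepted, rejected = rename_first_cutover()
print(f"accepted={accepted} rejected={rejected}")
print("DB1 untouched after rename:", db1.tables["t2_bak"] == ["r1", "r2"])
```

Writes from servers that still hold the old config are rejected with an error rather than landing on DB1, so the caller can see the failure and retry; the cost is reduced availability during the propagation window.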

    • @jaisamtani303
      @jaisamtani303 1 month ago

      For financial domain companies, consistency matters most. Assume a credit of 1L (1 lakh) was written into DB1 during those 50 ms. Now DB2 is taking requests and this transaction is lost. Will you be okay with that? So for financial domain companies, consistency is of utmost importance; availability is not.

  • @arunrahullakkapragada2304
    @arunrahullakkapragada2304 1 year ago +1

    One doubt. While copying the bin log to shard 2, we record the last timestamp or ID up to which we copied, right? After that copy is done, we start replication, right?
    CDC or the replication service then catches shard 2 up with live updates.
    What about the updates that happen to the DB while we are copying the bin log?

    • @AsliEngineering
      @AsliEngineering 1 year ago

      Already answered in the video. Still, put some more thought into it and you'll get the answer on your own.

  • @raj_kundalia
    @raj_kundalia 1 year ago +1

    thank you!

  • @DEEPAKKUMAR-wk5pk
    @DEEPAKKUMAR-wk5pk 2 years ago

    you nailed it, man

  • @ramyakrishnan8741
    @ramyakrishnan8741 8 months ago

    Thanks for an amazing video. May I know the difference between federation and vertical sharding?

  • @ujjwalsaini5830
    @ujjwalsaini5830 6 months ago

    Great content. Didn't feel like skipping even for a second. Kudos! Also, one question: how do we go about migrating a huge table from one database server to another? By huge table I mean that the table is big and also takes a huge number of writes.

    • @AsliEngineering
      @AsliEngineering 6 months ago

      Migrating a high-write database from one server to another is done in 6 broad steps.
      0. Take a snapshot
      1. Load it into a new database
      2. Set up replication
      3. Let it catch up
      4. Stop writes for a fraction of a second
      5. Fail over
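
The six steps above can be sketched end to end with in-memory stand-ins. Everything here is an assumption for illustration: `Source`, `Target`, and the list-based change log (a stand-in for a binlog) are hypothetical and do not correspond to any real database API. The key detail is that the snapshot records the log position it corresponds to, so changes made while the snapshot is being copied are replayed from that position.

```python
class Source:
    def __init__(self):
        self.rows = {}
        self.log = []                 # ordered change log, a stand-in for a binlog
        self.accepting_writes = True

    def write(self, key, value):
        if not self.accepting_writes:
            raise RuntimeError("writes are paused for cutover")
        self.rows[key] = value
        self.log.append((key, value))

    def snapshot(self):
        """Return a copy of the rows plus the log position it corresponds to."""
        return dict(self.rows), len(self.log)


class Target:
    def __init__(self):
        self.rows = {}
        self.applied = 0              # log position up to which changes are applied

    def load(self, rows, position):
        self.rows = dict(rows)
        self.applied = position

    def catch_up(self, source):
        for key, value in source.log[self.applied:]:
            self.rows[key] = value
        self.applied = len(source.log)


def migrate(source, target, writes_during_copy):
    rows, position = source.snapshot()        # 0. take a snapshot
    for key, value in writes_during_copy:     #    live traffic continues meanwhile
        source.write(key, value)
    target.load(rows, position)               # 1. load it into the new database
    target.catch_up(source)                   # 2-3. replicate from the snapshot position
    source.accepting_writes = False           # 4. stop writes briefly
    target.catch_up(source)                   #    drain anything that arrived in between
    return "target"                           # 5. fail over: route traffic to the target


src, tgt = Source(), Target()
src.write("a", 1)
src.write("b", 2)
active = migrate(src, tgt, writes_during_copy=[("b", 3), ("c", 4)])
print(active, tgt.rows)
```

In this model the same steps apply regardless of table size; what changes with a larger, hotter table is how long steps 0 to 3 take before the brief pause in step 4.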

    • @ujjwalsaini5830
      @ujjwalsaini5830 6 months ago

      @AsliEngineering So the same strategy is followed whether the table is big or small? Or are there alternative practices that make the migration more efficient?

  • @shivamsrivastava3076
    @shivamsrivastava3076 2 years ago

    Just connecting the dots: is this the same way blob storage (S3/Azure) is scaled when a data node in a bucket gets hot? :)

  • @chiragrajani1606
    @chiragrajani1606 1 year ago

    What about the requests that fail when we rename the table, i.e. the `Table Not Found` part? Failed read requests are acceptable, but those write requests will be lost. Won't that be a consistency issue?

  • @aniruddhkhera510
    @aniruddhkhera510 6 months ago

    Arpit, as always an amazing video, thanks for sharing. I was actually planning to join your Feb cohort but couldn't enroll before registrations closed.
    I have some thoughts on this video; maybe I am missing something. I feel that migrating table t1 from one DB server to another with this approach is kind of over-engineering. I have done such a migration at my previous company, so let me explain my approach.
    1. We don't need to store the metadata about which DB server the table belongs to in Zookeeper or any service discovery. Generally, each app server has a DB configuration file (yaml, xml); we can add and maintain both DB configs in it, and the app server connects to both.
    2. The cutover can happen gradually, with dual writes to the table on both DB servers (a simple code change). Historic data can be migrated from a snapshot of the table.
    3. The final cutover can be done through a flag in a remote config, basically a WIREON/WIREOFF (WOWO) configuration, i.e. turn off the writes to the table on the previous DB server (example: disable.writes.to.xyz := true).
    Let me know your thoughts.
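
The dual-write approach in this comment can be sketched as follows. This is a minimal illustration under stated assumptions: plain dictionaries stand in for the table on each DB server, and `write` and `backfill` are hypothetical helpers; only the flag name `disable.writes.to.xyz` comes from the comment. The sketch also ignores the case where one of the two writes fails, which in a real system can leave the copies out of sync because no transaction spans both servers.

```python
# Remote-config stand-in; the flag name is taken from the comment above.
flags = {"disable.writes.to.xyz": False}


def write(old_db, new_db, key, value):
    """Dual write: always write to the new DB, and to the old DB until switched off."""
    if not flags["disable.writes.to.xyz"]:
        old_db[key] = value
    new_db[key] = value


def backfill(snapshot, new_db):
    """Copy historic rows without overwriting rows already dual-written."""
    for key, value in snapshot.items():
        new_db.setdefault(key, value)


old_db = {"u1": "alice", "u2": "bob"}
new_db = {}

snapshot = dict(old_db)                    # snapshot of historic data
write(old_db, new_db, "u2", "bobby")       # dual writes arrive while the backfill is pending
write(old_db, new_db, "u3", "carol")
backfill(snapshot, new_db)
in_sync = old_db == new_db

flags["disable.writes.to.xyz"] = True      # final cutover: stop writing to the old DB
write(old_db, new_db, "u4", "dave")
print(in_sync, "u4" in old_db, new_db["u4"])
```

Using `setdefault` in the backfill is a design choice for this sketch: rows written after the snapshot are newer than the snapshot copy, so the backfill must not overwrite them.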

  • @vikassrivastava7081
    @vikassrivastava7081 2 years ago +1

    In-depth video! 🙏🏼

    • @vikassrivastava7081
      @vikassrivastava7081 2 years ago

      Arpit bro, can you suggest any book on system design for beginners like me, to go alongside your awesome videos?

  • @vighneshmahale
    @vighneshmahale 2 years ago

    Very Informative!

  • @cnp6501
    @cnp6501 1 year ago

    How is vertical sharding different from partitioning?

  • @notionmakeit2888
    @notionmakeit2888 11 months ago

    How can we get your notes? Please help.

  • @Polly10189
    @Polly10189 1 year ago

    Thanks Arpit, Allah bless you. Top-notch content.
    I have one query:
    Say I have a large DB/table with indexes on some columns. When I partition my data, do the indexes also get partitioned, or do I have to re-create the indexes manually on each partition once it is restored on a different DB instance?

  • @rahulsarkar4206
    @rahulsarkar4206 1 year ago

    How does the watch update the config of the API servers? Are they connected over a websocket? Generally I don't think so. Please explain.

    • @AsliEngineering
      @AsliEngineering 1 year ago

      I cover these granular details in my course, so I cannot answer it here.

  • @kaustavdas1577
    @kaustavdas1577 5 months ago +1

    Price increased 1.8 times in 1 year

    • @AsliEngineering
      @AsliEngineering 5 months ago

      In 2 years. Also, the course has changed significantly. It is much more in-depth than what I used to cover.

  • @sayantankundu4532
    @sayantankundu4532 1 year ago

    Hey Arpit, great video. I have a doubt here.
    You mentioned that the Zookeeper watch will inform the API server when there is a change, but where will the API server store this config information?
    If the API server is not storing the config information, then on every request we need to hit Zookeeper first to get the config, which will surely add latency.

    • @AsliEngineering
      @AsliEngineering 1 year ago

      You don't need to make a network call every time. A local copy of the config is held at the server.
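
The pattern in this reply, a local copy refreshed by a watch callback, might look like the sketch below. `FakeConfigStore` is a hypothetical in-memory stand-in for a store that supports watches; it is not the ZooKeeper client API, and with a real client the callback would be registered through that client's own watch mechanism.

```python
import threading


class FakeConfigStore:
    """Hypothetical in-memory stand-in for a config store that supports watches."""

    def __init__(self, value):
        self._value = value
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)
        callback(self._value)            # deliver the current value on registration

    def set(self, value):
        self._value = value
        for callback in self._watchers:  # push the change to every watcher
            callback(value)


class ApiServer:
    def __init__(self, store):
        self._lock = threading.Lock()
        self._routing = {}               # local copy: table name -> database server
        store.watch(self._on_config_change)

    def _on_config_change(self, routing):
        with self._lock:
            self._routing = dict(routing)

    def db_for(self, table):
        """Answer from the local copy; no network call on the request path."""
        with self._lock:
            return self._routing[table]


store = FakeConfigStore({"t1": "db1", "t2": "db1"})
server = ApiServer(store)
before = server.db_for("t2")
store.set({"t1": "db1", "t2": "db2"})    # t2 moved to db2; watchers are notified
after = server.db_for("t2")
print(before, after)
```

Each request reads only the in-process dictionary, so the config store is contacted when the config changes rather than on every request.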

    • @sayantankundu4532
      @sayantankundu4532 1 year ago

      Thank you Arpit for clarifying it.

  • @pranjalmishra2602
    @pranjalmishra2602 1 year ago

    What if a request needs to connect to two tables present in different DB servers?

    • @AsliEngineering
      @AsliEngineering 1 year ago

      What do you mean when you say "connect"?

    • @pranjalmishra2602
      @pranjalmishra2602 1 year ago

      @AsliEngineering
      I meant: a request comes in that needs some data from a table in DB1 and some other data from a table in DB2.
      I guess I'm still unclear :(

  • @debmalyapan53
    @debmalyapan53 2 years ago

    amazing

  • @AnubhavShrivastava
    @AnubhavShrivastava 2 years ago

    awesome

  • @imdsk28
    @imdsk28 1 year ago

    Massive Like ❤

  • @deepadeshra7195
    @deepadeshra7195 2 years ago

    Maybe a silly question,
    but I am confused about one thing in DB sharding.
    Let's say DB-1 has T1 and T2, with a foreign key relationship between them, and then we move T2 to another database server, DB-2. So T1 is in DB1 and T2 is in DB2. In this distributed scenario, how is data integrity maintained?

    • @AsliEngineering
      @AsliEngineering 2 years ago +2

      You have to drop foreign keys. You cannot have cross-shard foreign keys.
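
Once the foreign key is dropped, one possibility is for the application to check the parent row itself before writing the child row. The reply does not say how integrity is handled afterwards, so this is only an assumed sketch: the `t1`/`t2` schemas and the `insert_child` helper are hypothetical, and two in-memory SQLite databases stand in for DB1 and DB2.

```python
import sqlite3

db1 = sqlite3.connect(":memory:")   # holds T1 (the parent table)
db2 = sqlite3.connect(":memory:")   # holds T2 (the child table, moved to its own server)

db1.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY)")
db1.execute("INSERT INTO t1 (id) VALUES (1)")
# No FOREIGN KEY clause here: the database cannot enforce it across servers.
db2.execute("CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER NOT NULL)")


def insert_child(t1_id):
    """Application-level check that the parent exists before inserting the child."""
    parent = db1.execute("SELECT 1 FROM t1 WHERE id = ?", (t1_id,)).fetchone()
    if parent is None:
        raise ValueError(f"t1 row {t1_id} does not exist")
    cursor = db2.execute("INSERT INTO t2 (t1_id) VALUES (?)", (t1_id,))
    return cursor.lastrowid


child_id = insert_child(1)
try:
    insert_child(99)
    rejected = False
except ValueError:
    rejected = True
print(child_id, rejected)
```

Unlike a real foreign key, this check is not atomic across the two servers: the parent row could be deleted between the lookup and the insert, so the application has to tolerate or clean up orphaned rows.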

    • @deepadeshra7195
      @deepadeshra7195 2 years ago

      @AsliEngineering Thank you :)