How Reddit designed their metadata store to serve 100k req/sec at p99 of 17ms

  • Published: 26 Jul 2024
  • System Design for SDE-2 and above: arpitbhayani.me/masterclass
    System Design for Beginners: arpitbhayani.me/sys-design
    Redis Internals: arpitbhayani.me/redis
    Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
    Sign up and get 40% off - app.codecrafters.io/join?via=...
    Recommended videos and playlists
    If you liked this video, you will find the following videos and playlists helpful
    System Design: • PostgreSQL connection ...
    Designing Microservices: • Advantages of adopting...
    Database Engineering: • How nested loop, hash,...
    Concurrency In-depth: • How to write efficient...
    Research paper dissections: • The Google File System...
    Outage Dissections: • Dissecting GitHub Outa...
    Hash Table Internals: • Internal Structure of ...
    Bittorrent Internals: • Introduction to BitTor...
    Things you will find amusing
    Knowledge Base: arpitbhayani.me/knowledge-base
    Bookshelf: arpitbhayani.me/bookshelf
    Papershelf: arpitbhayani.me/papershelf
    Other socials
    I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
    LinkedIn: / arpitbhayani
    Twitter: / arpit_bhayani
    Weekly Newsletter: arpit.substack.com
    Thank you for watching and supporting! It means a ton.
    I am on a mission to bring out the best engineering stories from around the world and make you all fall in
    love with engineering. If you resonate with this, then follow along; I always keep it no-fluff.
  • Science

Comments • 61

  • @richi12345678910
    @richi12345678910 2 months ago +4

    The Kafka CDC can solve the problem of synchronous write inconsistencies, but not the backfill overwriting. I suspect they might do some kind of business logic or SHA/checksum validation to ensure that they are not overwriting the data during backfilling. Correct me if I'm missing something, bro.

  • @vinayak_98
    @vinayak_98 2 months ago +1

    Hey Arpit, thanks a lot for putting this up. Your writing skills are next level, crisp and crystal clear. Could you please tell me what setup you use for taking these notes?
    Thanks in advance.

  • @nextgodlevel4056
    @nextgodlevel4056 2 months ago +17

    Successfully ruined my upcoming weekend. Have to view all of your videos now 😢

  • @prtk2329
    @prtk2329 2 months ago

    Nice, very informative 👍

  • @raj_kundalia
    @raj_kundalia 1 month ago

    Thank you for doing this!

  • @ankitpandey3724
    @ankitpandey3724 2 months ago +23

    Weekend party ❌ Asli Engineering ✅

  • @AayushThokchom
    @AayushThokchom 2 months ago +2

    Large Data Migration -> Event Driven Architecture
    Also, interesting to learn about Postgres extensions, which are not required if going with a serverless database solution like DynamoDB.

  • @mehulgupta8218
    @mehulgupta8218 2 months ago +2

    As usual, great stuff 🙌🏻

  • @GSTGST-dw5rf
    @GSTGST-dw5rf 1 month ago

    It's great that there are such useful videos on YouTube. Thank you, Minister!

  • @user-ob1zi3jc1r
    @user-ob1zi3jc1r 2 months ago

    How many shards were used to hold those partitions to achieve that much throughput?

  • @namanjoshi5089
    @namanjoshi5089 2 months ago

    Amajeeng

  • @JardaniJovonovich192
    @JardaniJovonovich192 2 months ago

    I guess we don't need both the CDC setup and dual writes; just the CDC setup would suffice to insert the data into the new DB, correct?

  • @sachinmalik5837
    @sachinmalik5837 2 months ago +3

    Hi Arpit, I think you could have gone a bit more in depth, like they have mentioned in their blog: a bit about how they are using an incrementing post_id, which allows them to serve most queries from one partition only. Not complaining at all. Thanks for being awesome as always.
    TLDR; 7 minutes seems a bit short.

    • @AsliEngineering
      @AsliEngineering  2 months ago +2

      I deliberately skipped it because it would have taken 4 more minutes of explanation. I experimented in this video by keeping it very surface level and around the 8-minute mark, and the retention for this one is through the roof 😅
      In the last 15 videos I saw a massive drop in retention numbers when I started explaining the implementation nuances or when the video length went beyond 8 minutes. So I wanted to experiment and test out the hypothesis in this one video.
      Hence you see I did not even inject the ad for my courses or the intro. Jumped right into the topic.
      But yes, given their IDs are monotonic, their batch GETs for media metadata would almost always hit a single partition if they partition the data by ranges.
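
A minimal sketch of the range-partitioning idea discussed in this thread, assuming a hypothetical table, column, and connection string (not Reddit's actual schema): a parent table partitioned by post_id range, with a new partition covering every 90 million IDs.

```python
# Hypothetical sketch: range partitioning on a monotonically increasing post_id.
# Table/column/DSN names are illustrative, not Reddit's actual schema.
import psycopg2

PARTITION_SIZE = 90_000_000  # a new partition covers every 90 million IDs

DDL_PARENT = """
CREATE TABLE IF NOT EXISTS media_metadata (
    post_id  BIGINT NOT NULL,
    metadata JSONB  NOT NULL,
    PRIMARY KEY (post_id)
) PARTITION BY RANGE (post_id);
"""

def ensure_partition(cur, post_id: int) -> None:
    """Create the range partition that covers post_id, if it does not exist yet."""
    lower = (post_id // PARTITION_SIZE) * PARTITION_SIZE
    upper = lower + PARTITION_SIZE
    cur.execute(
        f"""
        CREATE TABLE IF NOT EXISTS media_metadata_p{lower}
        PARTITION OF media_metadata
        FOR VALUES FROM ({lower}) TO ({upper});
        """
    )

if __name__ == "__main__":
    with psycopg2.connect("dbname=media_meta") as conn:
        with conn.cursor() as cur:
            cur.execute(DDL_PARENT)
            # A cron-style job could call this with the current max post_id
            # to pre-create the next range before writes reach it.
            ensure_partition(cur, 1_000_000_123)
```

Because IDs are monotonically increasing, a batch of recent post IDs falls inside one such range, so the lookup touches a single partition.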

    • @SOMEWORKS456
      @SOMEWORKS456 2 months ago +2

      Would appreciate it if you can make another 8-minute video for the details.
      I am here for the meat. Surface-level stuff is OK. But meat.
      No complaints, just stating the opinion of a random lurker.

    • @amoghyermalkar5502
      @amoghyermalkar5502 2 months ago +1

      @@AsliEngineering So, in short, your focus is more on viewers than on better video quality, right?

    • @AsliEngineering
      @AsliEngineering  2 months ago +1

      @@amoghyermalkar5502 If you really feel like it, then please check out the other videos on my channel.
      I go into depth that other people cannot even think about or comprehend.
      Remember, it hurts to put 2 days of effort into a video only for it to be seen by just 2000 people in 7 days.

    • @sachinmalik5837
      @sachinmalik5837 2 months ago +1

      @@AsliEngineering Absolutely. I can understand that. Without naming names, there are so many "tech" creators who are getting 10x the views we got here, but they just never seem to talk about substance.
      I just want you to know that we do appreciate it a lot. I am still trying to read more blogs on my own, so I don't think I am being spoon-fed by watching a video.

  • @guytonedhai
    @guytonedhai 2 months ago +1

    Asli Engineering!

  • @poojanpatel2437
    @poojanpatel2437 2 months ago +1

    Good video. Would appreciate it a lot if you could attach any resources you used in the video, like the Reddit blog that was mentioned, in the description. Would be great if the link is attached there.

    • @AsliEngineering
      @AsliEngineering  2 months ago +1

      Already added in the description. You are just a scroll away from finding that out.

  • @dreamerchawla
    @dreamerchawla 2 months ago

    Hey Arpit… thanks for the video.
    I liked doing partitioning as a policy that runs on a cron. But wouldn't moving data around in partitions also warrant a change in the backend (read path)?
    Or are you saying the backend has been written in a way that it takes partitioning into account while reading the data?

    • @AsliEngineering
      @AsliEngineering  2 months ago

      They use range-based partitioning, so no repartitioning is required. A new range starts every 90 million IDs.

  • @karanhotwani5179
    @karanhotwani5179 1 month ago

    Nice. The Kafka part seemed like over-engineering. They could just verify the hash before writing to the new metadata DB in the syncing phase.

  • @keshavb2896
    @keshavb2896 2 months ago +1

    Why didn't Reddit go for a document DB for this storage, given the structure and access pattern? What do you think about it, @arpit?

    • @AsliEngineering
      @AsliEngineering  2 months ago +2

      In my opinion, familiarity with the stack could be the biggest reason.
      Apart from this, the query pattern here is that most requests hit a single partition (given IDs are monotonically increasing and partitions are created on a range basis). Most KV stores do hash-based partitioning, because of which the lookups need to be fanned out across the shards, which is quite expensive.
      A database that does support range-based lookups per shard is DDB, and that managed offering becomes very expensive at scale.
      These are some of the pointers I could think of. But again, this is a pure guess.
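
A toy illustration of the fan-out point above (shard count and range size are made up, not Reddit's numbers): a batch of consecutive post IDs spreads across many shards under hash partitioning but stays within one fixed-size range under range partitioning.

```python
# Toy comparison of hash-based vs range-based routing for a batch of
# consecutive post IDs. Shard count and range size are made up.
NUM_SHARDS = 16
RANGE_SIZE = 90_000_000

batch = range(1_000_000_000, 1_000_000_050)  # 50 consecutive post IDs

hash_shards = {post_id % NUM_SHARDS for post_id in batch}    # hash-style routing
range_shards = {post_id // RANGE_SIZE for post_id in batch}  # range-style routing

print(len(hash_shards))   # 16 -> the batch GET fans out to every shard
print(len(range_shards))  # 1  -> the batch GET stays on a single partition
```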

  • @kunalyadav4776
    @kunalyadav4776 29 days ago +1

    Since they are storing data as JSON and also scaling the Postgres DB, why did they not go with a non-relational DB like MongoDB, which stores data as JSON and also provides scaling out of the box?

    • @AsliEngineering
      @AsliEngineering  29 days ago

      I have already answered this in another comment, do go through that.
      TLDR; operational expertise and familiarity seem to have taken precedence.

  • @pixiedust56
    @pixiedust56 2 months ago +1

    Arpit - using CDC and Kafka still does not solve the problem of data from the old source overriding data in the new Aurora Postgres during the migration, right?
    What am I missing?
    You will still need a bulk batch job that takes all the archival data from the multiple sources and ingests it into the new Aurora. Using CDC does not solve for that backfill, correct?

    • @2020goro
      @2020goro 2 months ago +1

      CDC can transfer the historical data using snapshots of the existing databases if/when the transaction log is not available for the old data, and then the consumers report any write conflicts into a separate table, which the devs can remediate later on. Hope that answers your question.

    • @AsliEngineering
      @AsliEngineering  2 months ago +1

      The consumers of Kafka have this responsibility. It is not that just adding Kafka solved the problem; the core conflict management is written in its consumer, which does a check-and-set in the database.
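
To make the check-and-set idea concrete, here is a hedged sketch; the topic, table, and field names are invented, and kafka-python/psycopg2 are just convenient stand-ins. A change read from the CDC topic only overwrites the row in the new store when it carries a newer version.

```python
# Hypothetical sketch of the check-and-set consumer: a row coming off the CDC
# topic only overwrites the new store if it is newer than what is already there.
# Topic, table, and field names are invented for illustration.
import json

import psycopg2
from kafka import KafkaConsumer  # kafka-python

UPSERT_SQL = """
INSERT INTO media_metadata (post_id, metadata, updated_at)
VALUES (%(post_id)s, %(metadata)s, %(updated_at)s)
ON CONFLICT (post_id) DO UPDATE
    SET metadata = EXCLUDED.metadata,
        updated_at = EXCLUDED.updated_at
    WHERE media_metadata.updated_at < EXCLUDED.updated_at;  -- the "check" in check-and-set
"""

def run() -> None:
    consumer = KafkaConsumer(
        "media-metadata-cdc",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    conn = psycopg2.connect("dbname=media_meta")
    for msg in consumer:
        change = msg.value  # one CDC event: {"post_id": ..., "metadata": {...}, "updated_at": ...}
        with conn.cursor() as cur:
            cur.execute(UPSERT_SQL, {
                "post_id": change["post_id"],
                "metadata": json.dumps(change["metadata"]),
                "updated_at": change["updated_at"],
            })
        conn.commit()

if __name__ == "__main__":
    run()
```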

  • @anurag-vishwakarma
    @anurag-vishwakarma 2 months ago

    Can you share the notes, please?

  • @nextgodlevel4056
    @nextgodlevel4056 2 months ago +2

    How does PgBouncer minimize the cost of creating a new process for each request?
    Maybe I am wrong; can you tell me how the cost is reduced here?

    • @niteshlohchab9219
      @niteshlohchab9219 2 months ago +2

      Let's say each connection spawns a new process, and killing a connection kills the process. What would you do logically? Think before reading the next line.
      Simply re-use the connections. That's what every database proxy in front usually does, in simple words: the connections are re-used and managed accordingly.

    • @nextgodlevel4056
      @nextgodlevel4056 2 months ago

      @@niteshlohchab9219 Thanks, bro, for the easy and simple explanation; I appreciate it. What I was thinking was that the term "cost" is used for money, but I was wrong. Here, "cost" means scalability and performance, ensuring that each client gets a response as quickly as possible. So, in terms of money, we increase the cost, and in terms of scalability and performance, we decrease the cost.
      If we look at it for the long term in enterprise applications, having a scalable product also increases revenue.
      Let me know if I am correct or not 🙃

    • @AsliEngineering
      @AsliEngineering  2 months ago +1

      Because it does connection pooling, so connections are reused.
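
Not PgBouncer itself, but a minimal sketch of the idea it implements, using psycopg2's built-in pool: connections are opened once and handed out repeatedly, so Postgres does not have to fork a new backend process per request. DSN, table, and query are placeholders.

```python
# Minimal illustration of connection pooling (the idea behind PgBouncer):
# keep a small set of already-open connections and reuse them per request.
from psycopg2.pool import SimpleConnectionPool

# DSN and pool sizes are placeholders.
pool = SimpleConnectionPool(minconn=1, maxconn=10, dsn="dbname=media_meta")

def fetch_metadata(post_id: int):
    conn = pool.getconn()  # borrow an existing connection instead of opening a new one
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT metadata FROM media_metadata WHERE post_id = %s;", (post_id,))
            return cur.fetchone()
    finally:
        pool.putconn(conn)  # return it to the pool instead of closing it
```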

  • @ramannanda
    @ramannanda 1 month ago

    Postgres is still the king :) with extensions it is all you need...

  • @bhumit070
    @bhumit070 2 months ago

    Dayumm

  • @suhanijain5026
    @suhanijain5026 2 months ago +1

    Why are they using Postgres if they are storing it as JSON?

    • @AsliEngineering
      @AsliEngineering  2 months ago +1

      Stack familiarity, plus range-based partitioning support.

  • @code-master
    @code-master 2 months ago

    How will you handle search, since the relevant data might be in partitions several days old? Even if they're using a secondary data store, date/time range-based partitioning or even sharding will not suffice. What do you think?

    • @AsliEngineering
      @AsliEngineering  2 months ago +1

      Why would it be several days older? The migration was a one-time activity.
      Post the switch, writes of media metadata always go to the unified media metadata store.

    • @code-master
      @code-master 2 months ago

      Thank you for your response. Apologies, my question was not clear.
      My question was more related to searching through such a data store where the data is partitioned daily, i.e. partitioned on created_at.
      Let's say you search for a post with term 'X'; the result will ideally require a lookup across several partitions. For example, if there is a relevant post from a year back, we are looking at many partitions. To build the search result, the DB has to load each daily partition.
      Daily partitioning will work well if the lookup is limited to a couple of days back. That's my understanding.

  • @calvindsouza1305
    @calvindsouza1305 2 months ago +1

    What is used over here to write down the notes?

  • @LeoLeo-nx5gi
    @LeoLeo-nx5gi 2 months ago

    Thanks Arpit!! Also, what are your thoughts on using Pandas as a metadata DB? Dropbox had a post about them using Pandas wherein they explained in depth why other DBs were not better for them. (Would like to know your views on it too.)

    • @AsliEngineering
      @AsliEngineering  2 months ago

      I am not aware of this. Let me take a look.

  • @atanusikder4217
    @atanusikder4217 2 months ago

    What is the CDC mentioned here? Please suggest some pointers.

  • @myjourney7713
    @myjourney7713 2 months ago

    How did they check that the reads from the old vs the new database are the same?

    • @AsliEngineering
      @AsliEngineering  2 months ago

      A simple diff would work, given that the final JSON has to be the same since no changes were made to the client.
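
A minimal sketch of what such a comparison could look like during dual reads (the function and store names are placeholders, not Reddit's code): serve from the old store, read the same key from the new store, and log any mismatch after canonicalising both JSON documents.

```python
# Hypothetical shadow-read comparison used during a migration: the old store
# remains the source of truth, and any JSON mismatch with the new store is logged.
import json
import logging

def read_with_comparison(post_id, old_store, new_store):
    old_doc = old_store.get_metadata(post_id)
    new_doc = new_store.get_metadata(post_id)
    # Canonicalise both documents so key ordering does not cause false diffs.
    if json.dumps(old_doc, sort_keys=True) != json.dumps(new_doc, sort_keys=True):
        logging.warning("metadata mismatch for post_id=%s", post_id)
    return old_doc  # keep serving from the old store until cutover
```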

    • @myjourney7713
      @myjourney7713 2 months ago

      @@AsliEngineering if there is any issue at scale, wouldn't it be very hard to debug?

  • @LeftBoot
    @LeftBoot 2 months ago

    How can I use AI to make this sound like my native language?

  • @anand.garlapati
    @anand.garlapati 2 months ago

    Are you saying that Reddit has a unified database per region?

    • @AsliEngineering
      @AsliEngineering  2 months ago

      Unified database implies that the data that was split across multiple services has been moved to one database.
      Now this unified one can be replicated across regions to improve client-side response times.

    • @anand.garlapati
      @anand.garlapati 2 months ago

      What was Reddit's motivation to initially go for a dedicated database per service? Could you please tell how many such services they have?
      Regarding your second point, the Reddit team allowed only reads from the replicated databases and not writes, correct?

  • @shalinikandwal7845
    @shalinikandwal7845 2 months ago

    What is CDC?

    • @AbhisarMohapatra
      @AbhisarMohapatra 2 months ago +1

      Change Data Capture... it means streaming the bin log files of the database.