How Discord Stores TRILLIONS of Messages

  • Published: Sep 27, 2024

Comments • 119

  • @geob7o
    @geob7o 9 months ago +17

    Hats off for the on call team during these processes 😅

  • @areeburrehmankhan1166
    @areeburrehmankhan1166 1 year ago +54

    Amazing content, man. Also, your animations are really helpful in making things more understandable. Keep up the great work.

  • @yadav117uday
    @yadav117uday 1 year ago +267

    You should have discussed how they moved from MongoDB to Cassandra; it was a bigger engineering challenge

    • @gradientO
      @gradientO 1 year ago +8

      Can you briefly explain why NoSQL makes sense in this context instead of relational DBs?

    • @terencepan2232
      @terencepan2232 1 year ago +30

      @gradientO scaling writes in a relational DB is hard, or not as feasible

    • @flygonfiasco9751
      @flygonfiasco9751 1 year ago +7

      @gradientO there's a lot of overhead in maintaining the relationships in a relational database, and there's overhead in ensuring strong consistency

    • @truevelvett
      @truevelvett 1 year ago +4

      @terencepan2232 You can definitely shard, but the operational complexity is insane

    • @kamal-xd7id
      @kamal-xd7id 9 months ago

      Because it's not even covered in the original blog post from Discord.

  • @ivanlee172
    @ivanlee172 1 year ago +11

    One thing missing here is how they set up the monitoring system to get such analysis from the running system, which is also an important part. I would like to hear more about this.

  • @hdrtghkes
    @hdrtghkes 12 days ago

    Discord doing the backend: 👹
    Discord doing the UI/UX: 🥺

  • @king0s
    @king0s 1 year ago +13

    My takeaways and a question:
    ScyllaDB, with C++ under the hood and no GC, was a part of it.
    Request coalescing is another part.
    Selecting and deciding on optimal Google Cloud services, i.e. Persistent Disk, is another part.
    The two-layered RAID setup is another part.
    Modifying the Linux kernel to write to the Persistent Disks and read from the local SSDs was another part. (But I'm wondering how they made the changes to the Linux kernel of the Google VMs? Can anyone help me out here: do they bake the changes into a VM image and deploy the image to the Google VMs?)
    And finally, for the migration, using a data migrator written in memory-safe and highly performant Rust was another great decision.

  • @piyushkatariya1040
    @piyushkatariya1040 1 year ago +26

    The real beauty of ScyllaDB lies in the Seastar framework, which bypasses the OS kernel to achieve close-to-metal speed. Also, Cassandra 5.0, which will be released a few months from now, won't have these issues and will have much higher throughput, as it will be able to compile against the Java 17 SDK and might also run on Java 21, which has the Generational ZGC garbage collector that gives you sub-millisecond latency. Cassandra also has the advantage of letting you define aggregate functions in Java or JS and run them in the Cassandra DB instance itself, rather than fetching all the data into the application server, which is super expensive.

    • @bruterasta
      @bruterasta 1 year ago +2

      In the blog post it was pretty clear that they were tired of GC altogether.

    • @piyushkatariya1040
      @piyushkatariya1040 1 year ago +2

      @bruterasta Generational ZGC is auto-tunable.

  • @flatdinos
    @flatdinos 1 year ago

    Next month I'll be migrating one of my databases. Planning and designing the migration flow is so painful, but this video helps a lot.

  • @nareshgb1
    @nareshgb1 1 year ago

    "discord found themselves in a bit of a pickle"....good one :)

  • @0xmmn
    @0xmmn 1 year ago

    Kudos for making a video that is to the point and easy to grasp, unlike other channels that make long videos to hack the YouTube algorithm.

  • @sharmilak5109
    @sharmilak5109 1 year ago +1

    Hi, your content is worthwhile; it was such a meticulous way of explaining the concepts. Thank you

  • @RakeshBitling
    @RakeshBitling 1 year ago +46

    This channel is underrated... there are actually a lot of useful videos available

    • @areeburrehmankhan1166
      @areeburrehmankhan1166 1 year ago +1

      Yes dude, he has taught me so much, like wow.

    • @wilfredv1930
      @wilfredv1930 11 months ago

      It is not underrated; for the subject, 500k subs is pretty well rated actually.
      It is a very well-known resource.

  • @kiransingh2935
    @kiransingh2935 1 year ago +2

    I have literally never seen a company be happy with choosing Cassandra. I know of engineering orgs who have an entire pod of SREs dedicated to nothing but keeping the Cassandra alive.

  • @JacobSamro
    @JacobSamro 1 year ago +2

    Scylla is a saviour ❤

  • @khiariyoussef3226
    @khiariyoussef3226 1 year ago +16

    I may not have enough experience with such tasks, but how do they make sure they pick the right database that fits their performance requirements (at that scale) ?

    • @peterpeter9230
      @peterpeter9230 1 year ago +18

      Sometimes these companies have experts (maybe one of the developers of such a DB), but at such a scale it's mostly these companies themselves who test whether these DBs can withstand such a high load. So I'd call it precision guesswork. Compare existing solutions in terms of performance etc., maybe run extensive tests, and then just give it a go.

    • @NachitenRemix
      @NachitenRemix 1 year ago +6

      I guess they have TONS of metrics which can give them exactly the info they need on what they do and don't need to optimize

    • @MaulikParmar210
      @MaulikParmar210 1 year ago +6

      They didn't. There are petabyte-scale deployments, used by many organisations, that are larger than what Discord uses. Your DB is only going to be as good as its integration with the system and use case. This case study only shows that Discord lacks expertise or is reluctant to take advice from experienced people. This is true for Twitter and Airbnb, too, when they switched DBs thinking it would help, but it didn't.
      It requires a lot of experience and architectural knowledge of the DB itself to make the right call. In reality, those who make the right call are usually not in the limelight and don't advertise their milestones. So it's not an open book that can be simply explained with diagrams, while implementation details would vary a lot.
      P.S. Even large orgs make mistakes or lack skills. They sometimes need external help, and when they don't, they often run into performance issues that are talked about in the public domain.
      How do they pick it? It's mostly biased towards the team's experience and the decision makers' bias towards tech they know, until they are ready to move on and do all the migrations to new tech. It's always moving and changing, so there's no specific recipe for that.

    • @georgesmith9178
      @georgesmith9178 1 year ago +9

      This actually follows some money logic. First you pick a tool that is good enough and FREE, like Cassandra. If the business develops slowly, then you can get by with this initial solution just fine. But when you have booming growth, as is the case with Discord, the free solution is not designed for that type of scale, so you start to think about high availability, low latency, and so on, and usually end up with some commercial solution that may even be based on your original free solution, but with enterprise features on top. In the case of Discord, they just picked a DB with a better engine, and they leveraged a different storage solution in the cloud, ergo better-performing storage. Of course, the explosive growth guarantees money is no longer an issue: you can afford the enterprise solution and the better-performing storage because customers are paying :). There is your logic :).

    • @alooooooola
      @alooooooola 1 year ago

      I think the fact that they changed the read and write behavior of the SSDs makes the difference. I don't really think changing the DB makes that much difference; yes, it may be better, but not by much without the technique of overriding the SSD behavior. And since this is Discord, which previously migrated from a garbage-collected language to a non-garbage-collected one (Go to Rust), that could also have affected their choice in this case.

  • @tubenzr
    @tubenzr 1 year ago

    What an excellent team, to be able to pull that off with such a humongous amount of data 🥶

  • @SimarMannSingh
    @SimarMannSingh 1 year ago +3

    QUICK QUESTION: I've seen this style of YT video somewhere else too, somewhere in a tech video I remember. Is there a tool you're using to make these kinds of videos? Or did you (or someone you hired) edit the video yourself?

  • @zitronenlolli1
    @zitronenlolli1 1 year ago +1

    great content!!

  • @gigakoresh
    @gigakoresh 1 year ago +4

    Inspiring story. It would be really cool to see how Telegram solves their backend challenges. They have an even greater scale, a smaller team with less funding, and the lowest latency of any messenger. I wish they shared more about their backend; that system is an engineering marvel, just like Discord.

    • @pegasusgemini6541
      @pegasusgemini6541 9 months ago

      Me too, I'm really impressed by how responsive and optimized Telegram is

  • @gus473
    @gus473 1 year ago +1

    💯 Great episode! Interesting takes on this 🐘 project! 😎✌️

  • @BenThatOneGuy
    @BenThatOneGuy 1 year ago +2

    Quality content as always, love this channel

  • @beofonemind
    @beofonemind 1 year ago

    Thanks for this, very helpful.

  • @georgesmith9178
    @georgesmith9178 1 year ago +1

    Awesome content. Please, tell me what you use for the graphics and animations. They are so fluid.

  • @aunghtayoo337
    @aunghtayoo337 1 year ago +1

    "of course. Migration production data is no joke!". Thanks for the quality content.

  • @bjugdbjk
    @bjugdbjk 1 year ago

    Could you make a video on the data services part, which is written in Rust? That would be quite interesting to many folks!!

  • @pm71241
    @pm71241 1 year ago +7

    Hmm... As a user of ScyllaDB, this doesn't seem like the most daunting DB migration scenario I could imagine.
    I mean... had it been any other database than Cassandra, the task would have been orders of magnitude larger.

  • @ketaminefairy
    @ketaminefairy 1 year ago

    Could someone please explain more about the RAID 0 setup? What is the safety net under that? Am I missing or not understanding something?

  • @punkerIII
    @punkerIII 1 year ago

    Thank you for the video.

  • @dougphilips8807
    @dougphilips8807 1 year ago +1

    I am confused about the difference between "Request coalescing" and what your other videos call caching. Since "Request coalescing" is part of the success here, I'd like to understand the difference better, thanks!
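
One way to picture the difference is the following minimal sketch. It is an illustration under stated assumptions, not Discord's data service (which the video says is written in Rust): the `Coalescer` class, `slow_db_read`, and the `"channel-1"` key are all hypothetical names. Concurrent requests for the same key share a single in-flight database read, but nothing is kept once that read finishes, so a later request hits the database again. A cache, by contrast, would keep serving the stored result.

```python
import asyncio


class Coalescer:
    """Share one in-flight fetch among concurrent callers asking for the same key."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical async function: key -> value
        self._in_flight = {}     # key -> asyncio.Task that is currently running

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key: start the real fetch.
            task = asyncio.create_task(self._fetch(key))
            self._in_flight[key] = task
            # Forget the task as soon as it completes: no result is retained,
            # which is what separates coalescing from caching.
            task.add_done_callback(lambda _t, k=key: self._in_flight.pop(k, None))
        # Every caller, first or not, waits on the same task.
        return await task


async def demo():
    calls = []  # records every read that actually reaches the "database"

    async def slow_db_read(key):
        calls.append(key)
        await asyncio.sleep(0.01)  # stand-in for query latency
        return f"messages for {key}"

    coalescer = Coalescer(slow_db_read)
    # Five concurrent requests for the same channel share one read.
    results = await asyncio.gather(*(coalescer.get("channel-1") for _ in range(5)))
    # A request made after the first read has finished triggers a new read.
    later = await coalescer.get("channel-1")
    return results, later, len(calls)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the five concurrent requests cause one database read and the later request causes a second one. A real service would also need to decide how cancellation and errors propagate to the callers sharing a task, which this sketch leaves out.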

  • @delucabruno
    @delucabruno 1 year ago

    kudos to the Discord team 👏

  • @soorkie
    @soorkie 1 year ago

    I love your content. Do we have a discord server for this channel and this community? I would really love to engage more.

  • @touchwithbabu
    @touchwithbabu 1 year ago +1

    Love your content and respect for your efforts

  • @EdwinFairchild
    @EdwinFairchild 1 year ago

    Is Discord the only platform having to store trillions of messages? Or data, for that matter? How have other companies solved it?

  • @RahulGupta_Grahul
    @RahulGupta_Grahul 1 year ago

    I think the boxes at ruclips.net/video/O3PwuzCvAjI/видео.htmlsi=3YXYT_VTw2QqriyG&t=343 are a bit misleading. First, Scylla probably uses both persistent disks for writes and NVMe(s) for reads, so really one box should contain both, with arrows for reads and writes. Secondly, the message service is partitioned, so different writes go to different instances of Scylla instead of a single Scylla master node or central cluster, which is what the triangular placement of Scylla in the diagram suggests.

  • @seeball
    @seeball 1 year ago

    That's incredible

  • @mz7640
    @mz7640 1 year ago

    Can't they use AWS DynamoDB?

  • @repairstudio4940
    @repairstudio4940 1 year ago +1

    To the Discord devs 🥂

  • @jianruan5491
    @jianruan5491 1 year ago

    Cool, I want to try this in my work

  • @DK-ox7ze
    @DK-ox7ze 1 year ago +5

    If the writes were going to the persistent disk with high latency, then how were they made immediately available (with low latency) to all the intended members of that messaging group?

    • @daviduzumaki
      @daviduzumaki 1 year ago

      I'm guessing data was also stored in an LRU cache when it was written to the persistent disk

    • @DK-ox7ze
      @DK-ox7ze 1 year ago

      @daviduzumaki In that case they might again have to deal with the data loss issues they were facing, which were the reason they introduced persistent storage in the first place.

  • @abhishekkumar-ei5hl
    @abhishekkumar-ei5hl 1 year ago

    Please make a video on real-time applications like Figma, Google Docs, or multiplayer games

  • @ayotomiwasalau6373
    @ayotomiwasalau6373 1 year ago

    What pipeline tool did they use for the data migration from Cassandra to Scylla?

  • @5p4rk3r
    @5p4rk3r 1 year ago

    How do you run ScyllaDB in the cloud??

  • @jasonguo7596
    @jasonguo7596 1 year ago

    Is it OK to move older messages to a separate database periodically, so the main database would not be carrying so much data all the time?
    And have a dedicated service to read messages from that database of historical messages.

  • @nishant_singh
    @nishant_singh 7 months ago

    Can you make me a pro at system design?? I just love this subject...

  • @TheSelectmax
    @TheSelectmax 1 year ago +1

    Thanks for the content. A little introduction was missing on what kind of database this is and why it is better, apart from C++ under the hood

  • @carlosm.1233
    @carlosm.1233 1 year ago

    What software do you use to create such presentations?

  • @thememmer
    @thememmer 1 year ago

    What tools do you use to create these videos

  • @palkollar7739
    @palkollar7739 1 year ago

    what do you use for these animations?

  • @joaoguilherme-or1ud
    @joaoguilherme-or1ud 1 year ago

    Show!!!!

  • @imMavenGuy
    @imMavenGuy 1 year ago

    9 days - are you kidding me!

  • @jerkmeo
    @jerkmeo 1 year ago

    That's just awesome sharing. Thanks!

  • @sriharsha580
    @sriharsha580 1 year ago +2

    What is a garbage collector, and how is it useful for DBs?

    • @cherubin7th
      @cherubin7th 1 year ago +5

      Cassandra is written in Java, so the garbage collector is automatically part of it.

    • @RicardoSilvaTripcall
      @RicardoSilvaTripcall 1 year ago +2

      The garbage collector is responsible for checking which variables are no longer needed by the program and removing them from memory. The problem is that the garbage collector has to run every now and then, and this process requires processing power and adds latency to the main application, because most of the time the whole application, or parts of it, has to be stopped to wait for the garbage collector to do its job. Even though this process takes only a few milliseconds, in a high-throughput application this can add up and slow down the whole system.
      Languages like C++ and Rust have "manual" memory allocation and deallocation: the programmer is responsible for taking care of it beforehand, so you don't need a garbage collector at runtime, which results in faster execution...

    • @zakk6182
      @zakk6182 1 year ago

      @RicardoSilvaTripcall appreciate this

  • @2xKTfc
    @2xKTfc 1 year ago

    So much effort for messages that nobody can ever find again anyway 😂Seriously, Discord scrollback might as well be write-only it's so unusable. :(

  • @RohanDas23
    @RohanDas23 1 year ago

    How do they store TRILLIONS of messages? Like everyone else, they use a DATABASE. How's that any different from any other service?

  • @nareshb5
    @nareshb5 1 year ago +2

    1st viewer😂

  • @leonchen8317
    @leonchen8317 1 year ago

    Brother, I love your content, but please, your tone is too flat. It makes me sleepy

  • @zl7289
    @zl7289 1 year ago +5

    I’m just curious, what tools are you using to make this beautiful diagram and fancy effects 😮❤

  • @ariganeri
    @ariganeri 1 year ago +17

    Really good content. Just one point - writes go to both the local SSDs and the persistent disks at the same time. Write-mostly in Linux md means that reads go to the other disks, so in this case the local SSDs.
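
The behavior described in this comment can be modeled with a toy sketch. This is a simplified in-memory illustration, not the Linux md driver itself; the `MirrorMember` and `Mirror` classes and the device names are hypothetical. Every write lands on all members of the mirror, while reads are served from members that are not flagged write-mostly whenever one is available.

```python
class MirrorMember:
    """One side of a toy RAID1-style mirror (hypothetical model)."""

    def __init__(self, name, write_mostly=False):
        self.name = name
        self.write_mostly = write_mostly
        self.blocks = {}   # block number -> data
        self.reads = 0     # how many reads this member served


class Mirror:
    """Toy mirror: writes go to every member, reads avoid write-mostly members."""

    def __init__(self, members):
        self.members = members

    def write(self, block, data):
        # A mirrored write is applied to all members, write-mostly or not.
        for member in self.members:
            member.blocks[block] = data

    def read(self, block):
        # Prefer members not marked write-mostly; fall back to any member
        # only if no preferred member exists.
        preferred = [m for m in self.members if not m.write_mostly]
        source = (preferred or self.members)[0]
        source.reads += 1
        return source.blocks[block]


# Assumed layout from the comment: fast local SSDs mirrored with a
# durable but slower persistent disk that is flagged write-mostly.
local_ssd = MirrorMember("local-ssd-raid0")
persistent = MirrorMember("persistent-disk", write_mostly=True)
mirror = Mirror([local_ssd, persistent])

mirror.write(0, b"hello")
data = mirror.read(0)
```

After the write, both members hold the block, so durability comes from the persistent disk, while the read is served by the local SSD member, which is where the low read latency comes from.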

    • @locfuho3899
      @locfuho3899 1 year ago

      I don't think that is needed in the case of sending and receiving messages. As long as the data synced from the write disk arrives in order, we only need eventual consistency.
      For example, in a group chat, everybody is sending and receiving messages, right?
      As long as the messages are stored correctly in the write DB, they can eventually be synced to the read DBs.
      I mean, there is no need for strong consistency here.

  • @dirac7233
    @dirac7233 1 year ago +2

    Scylla is basically a C++ version of Cassandra. This story is one more proof of Java's inferiority. Why are people still using Java?? 🤷🏽‍♂️

    • @sergiocoder
      @sergiocoder 3 months ago +1

      Because they can't learn C++ :)

  • @nexus888
    @nexus888 2 months ago

    If only Discord would create a good UX. Having many servers is a nightmare to go through...

  • @javisartdesign
    @javisartdesign 1 year ago +1

    Cool, interesting to watch! learning all the time!

  • @rolina.azmitiamedrano5137
    @rolina.azmitiamedrano5137 1 year ago

    👀👌

  • @rhaba
    @rhaba 8 months ago

    Is it possible that there's a typo at 4:48 and following? There are two md0 devices.

  • @vaddimka
    @vaddimka 1 year ago

    Well it doesn't actually tell HOW it stores trillions of messages (like partitioning, cache organization, hot path issues), just a high level overview of the migration and dbms

  • @sagiajaj17
    @sagiajaj17 1 year ago

    How is their solution different from RAID 10? It is essentially a RAID 1 array mirroring a RAID 0 set. Am I missing something?

  • @chrishabgood8900
    @chrishabgood8900 1 year ago

    Joy != rust

  • @leomysky
    @leomysky 1 year ago +1

    Thank you for the video!

  • @pegasusgemini6541
    @pegasusgemini6541 9 months ago

    Waouh!

  • @RahulGupta_Grahul
    @RahulGupta_Grahul 1 year ago

    Nice explainer. Can you also please link the OG blogpost in the description?

  • @jerryking4777
    @jerryking4777 1 year ago

    Wow, can I ask what After Effects template is used for the presentation? I would like to present my work in school.

  • @RisalFajar
    @RisalFajar 1 year ago

    When there's a new message or an edited old message, how does the app receive the changes? Is it by listening to data on the DB (like a WebSocket)?

  • @carlosharris2681
    @carlosharris2681 1 year ago

    Friendly programming noob here: Can someone explain why being garbage-collection-free gives Scylla an advantage over Cassandra? Thanks in advance.

    • @sergiocoder
      @sergiocoder 3 months ago

      To my understanding, without a garbage collector there are no stop-the-world pauses to collect the garbage in the background and therefore more computing resources are spent on the actual work of DB.

  • @NapoleonPosada
    @NapoleonPosada 1 year ago +1

    Amazing. I have read about the features of Scylla, but this is definitive proof that it is an excellent database.

    • @ankeshkapil3129
      @ankeshkapil3129 1 year ago

      But costly to run

    • @hagaiak
      @hagaiak 8 months ago

      @ankeshkapil3129 How is a more efficient database more costly to run? The more performant a DB is, the cheaper it is to run, even if you're talking about small workloads

  • @luckylove72
    @luckylove72 1 year ago

    So they didn't plan to have a data warehouse?

  • @willianrocha8615
    @willianrocha8615 1 year ago

    Only 9 days what a huge achievement

  • @blasttrash
    @blasttrash 1 year ago

    3:28 is that like a cache on query parameters?

  • @AymanDevops
    @AymanDevops 1 year ago

    An amazing video, thanks man.
    I just have a question about the data service part: isn't it similar to Redis, a CDN, or any other cache mechanism? I'm just wondering why they gave it this name and what exactly it does

    • @lord_nikon_010
      @lord_nikon_010 1 year ago +1

      I believe it's not like a CDN or Redis. As I understand it, it is an API acting as an interface between the monolith and the database, decoupling one from the other, which allowed them to use a language (Rust) that is more efficient at handling the high-performance data operations.

  • @quang.luu.179
    @quang.luu.179 1 year ago

    Excellent content.

  • @shinmini99
    @shinmini99 1 year ago

    thanks for content :)

  • @bjugdbjk
    @bjugdbjk 1 year ago

    Rust is a superstar !!

  • @MinhHoang-fu7tt
    @MinhHoang-fu7tt 1 year ago

    Amazing topic!!!

  • @shpluk
    @shpluk 1 year ago +2

    I can agree it is cool technology-wise, but 99.9% of Discord messages are garbage, and who will ever scroll back to see any of them?
    I admire the engineering effort; it just seems to me like kind of a wasted effort

    • @bluebird3131
      @bluebird3131 1 year ago

      Not true at all. If you have message garbage on Discord, it's your server choice you have to question, not the tool or the others... 🥱