Really good content. Just one point - writes go to both the local SSDs and the persistent disks at the same time. Write-mostly in Linux md means that reads go to the other disks, so in this case the local SSDs.
I don't think that is needed in case of sending and receiving messages, as long as data sync from the write disk come in order, we only need eventual consistency. For example, in a group chat, every body sending and receiving messages, right? As long as the messages stored correctly in the write DB, they can eventually be synced to the read DBs. I mean there is no hard consistency here.
One thing missing here is how they setup the monitoring system to have such analysis from the running system, which is also an important part. Would like to hear more about this.
my takeaways and a question: ScyllaDB with C++ under the hood and no GC was a part of it. Request coalescing is another part. Selecting and deciding on optimal Google Cloud services i.e Persistent disk is another part. The two layered RAID setup is another part. Modifying the Linux kernel to do write on the Persistent disks and reads from the local SSDs was another part(But I'm wondering how they made the changes to the Linux kernel of the Google VMs? anyone help me out here, do they bake the changes to a VM image and deploy the image to the Google VMs?) And finally for the migration using a data migrator written in memory safe and highly performant Rust was another great decision.
The real beauty of ScyllDB lies in Seastar framework which bypasses OS kernel to achieve close to metal speed. Also, Cassandra 5.0 which will release in few months from now won't have this issues and will much higher throughput as it will be able to compile against Java 17 SDK and might also probably run Java 21 which has Generational ZGC garbage collector which gives you sub milliseconds latency. Cassandra also have the advantages of defining aggregate function in Java or JS and run in Casandra DB instance itself rather than fetching all data in application server which is super expensive.
@@ankeshkapil3129 How is a more efficient database more costly to run? The more performant a DB is, the cheaper it is to run, even if you're talking about small workloads
QUICK QUESTION: I've seen this style of YT video somewhere else too, somewhere in a tech video I remember. Is there a tool you're using to make these kind of videos? Or did you (or someone you hired) edited the video yourself?
I may not have enough experience with such tasks, but how do they make sure they pick the right database that fits their performance requirements (at that scale) ?
Sometimes these companies have experts (maybe one of the developers of such DB) but at such a scale it's mostly these companies, who test if these DBs can widthstand such a hight load. So I'd call it precision guess work. Compare existing solutions in terms of performance etc., maybe run a extensive tests and then just give it a go.
They didn't - there are Petabyte scale deployments used by many organisations that are larger than Discord uses. Your DB is going to be as good as its integration with the system and use case. This case study only shows discord lacks expertise or is reluctant to take in advisory from experienced people. This is true for Twitter and airbnb, too, when they switched DBs, thinking it would help but didn't. It requires a lot of experience and architectural knowledge of DB itself to make the rught call. In reality tho who make right call are usually not in like light or advertise about their milestones. So it's not an open book that can be simply explained with diagrams while implementation details would vary a lot. P.S. Even large orgs make mistakes or lack skills. They sometimes need external help, and when they dont, they often run into performance issues that are talked about in the public domain. How they pick it? It's mostly biased towards teams experience and decision makers bias towards tech they know until they are ready to move on and do all migrations to new tech, it's always moving, changing so there's no specific reciepie for that.
This actually follows some money logic. First you pick a tool that is good enough and FREE, like Cassandra. If the business develops slowly, then you can get by with this initial solution just fine. But, when you have a booming growth, as is the case with Discord, the free solution is not designed for that type of scale, so you start to think about high-availability, low-latency, and so on, and usually end up with some commercial solution, that may be even be based on your original free solution, but with enterprise features on top. In the case of Discord, they just picked a DB with better engine, and they leveraged a different storage solution in the cloud, ergo a better performing storage. Of course, the explosive growth guarantees money is no longer an issue - you can afford the enterprise solution and the better performing storage because customers are paying :). There is you logic :).
it think the fact that they changed the behavior of read and write of an SSD make the different. I dont really think changing the DB make that much different, yes it may be better but not so much without the technic of override SSD behavior. And since this is Discord, they migrated from a garbage collector language and move to a non-garbage collector language before (Go to Rust), it could also affect their choice in this case.
I have literally never seen a company be happy with choosing Cassandra. I know of engineering orgs who have an entire pod of SREs dedicated to nothing but keeping the Cassandra alive.
I am confused the difference between "Request coalescing" and what your other videos call caching? Since "Request coalescing" is part of the success here I'd like to understand the difference better, thanks!
Inspiring story. It would be really cool to see how Telegram solves their backend challenges. They have an even greater scale, smaller team with less funding and fastest latency of any messenger. I wish they shared more of their backend, that system is engineering marvel, just like Discord.
If the writes were going to the persistent disk with high latency, then how were they made immediately available (with low latency) to all the intended members of that messaging group?
@@daviduzumaki In that case they might have to again deal with data loss issues which they were facing, and the reason why they introduced persistent storage in the first place.
Hmm ... As a user of ScyllaDB. This doesn't seem like the most daunting DB migration scenario, I could imagine. I mean... Had it been any other database than Cassandra, the task would have been magnitudes larger.
An amazing video thanks man, I just have a question about the Data service part, isn't similar to redis or CDN or anyother cache mechanism? I'm just wondered why they called this name and what it does exactly
I believe is not like a CDN or Redis. I understand that is an API acting as an interface between the Monolith and the database, decoupling one form another - which allowed to use a language (Rust) more efficient to handle the high-performance data operations.
Is it ok to move older messages to a separate database periodically? So the main database would not be carrying so much data all the time. And have dedicated service to read messages from that database with historical messages.
I think the boxes at ruclips.net/video/O3PwuzCvAjI/видео.htmlsi=3YXYT_VTw2QqriyG&t=343 are a bit misleading. First, probably scylla uses both persistent disks for writes and NVMe(s) for reads, so, really one box should contain both with arrows for reads and writes. Secondly, the message service is partitioned, therefore, different writes go on different instances of scylla instead of a single scylla master node or central cluster which the triangular placement of scylla in diagram suggest.
Well it doesn't actually tell HOW it stores trillions of messages (like partitioning, cache organization, hot path issues), just a high level overview of the migration and dbms
To my understanding, without a garbage collector there are no stop-the-world pauses to collect the garbage in the background and therefore more computing resources are spent on the actual work of DB.
The Garbage Collector is responsible for checking which variables aren't more needed in the program and removing than from memory, the problem is, the Garbage Collector has to run every now and then, and this process requires processing power and will add latency to the main application running, because for the most of the time, the whole application or parts of it has to be stopped and wait for the Garbage Collector to do it's job, even though this process takes a few milliseconds, in a high throughput application this can add up and slow down the whole system. Languages like C++ and Rust have a "manual" memory allocation and deallocation, the programmer is responsible to take care of it before hand, so you won't need a garbage collector at runtime, what results in faster executions ...
I can agree it is cool technology wise, but 99.9% of discord messages are garbage, and who will ever scroll back to see any of them? I admire the engineering effort it just seems to me kind of wasted effort
Hats off for the on call team during these processes 😅
This channel is underrated.. actually a lot of useful videos are available
Yes dude he has taught me so much like wow .
It is not underrated, for the subject and 500k subs is pretty well rated actually.
It is a very well known resource.
Amazing content man . Also your animation are really helpful in making things more understandable. Keep up the great work.
Really good content. Just one point - writes go to both the local SSDs and the persistent disks at the same time. Write-mostly in Linux md means that reads go to the other disks, so in this case the local SSDs.
I don't think that is needed in case of sending and receiving messages, as long as data sync from the write disk come in order, we only need eventual consistency.
For example, in a group chat, every body sending and receiving messages, right?
As long as the messages stored correctly in the write DB, they can eventually be synced to the read DBs.
I mean there is no hard consistency here.
you should have discussed how they moved from mongodb to cassandra, its was a bigger engineering challenge
Can you brief about why nosql makes sense in this context instead of relational dbs?
@@gradientOscaling writes in relational db is hard or not as possible
@@gradientOthere’s a lot of overhead in maintaining the relationships in a relational database and there’s overhead in ensuring strong consistency
@@terencepan2232 You can definitely shard but the operational complexity is insane
Because its not even written in original blog post from Discord.
One thing missing here is how they setup the monitoring system to have such analysis from the running system, which is also an important part. Would like to hear more about this.
my takeaways and a question:
ScyllaDB with C++ under the hood and no GC was a part of it.
Request coalescing is another part.
Selecting and deciding on optimal Google Cloud services i.e Persistent disk is another part.
The two layered RAID setup is another part.
Modifying the Linux kernel to do write on the Persistent disks and reads from the local SSDs was another part(But I'm wondering how they made the changes to the Linux kernel of the Google VMs? anyone help me out here, do they bake the changes to a VM image and deploy the image to the Google VMs?)
And finally for the migration using a data migrator written in memory safe and highly performant Rust was another great decision.
Those animations look sick!!🔥🔥🔥🔥
The real beauty of ScyllDB lies in Seastar framework which bypasses OS kernel to achieve close to metal speed. Also, Cassandra 5.0 which will release in few months from now won't have this issues and will much higher throughput as it will be able to compile against Java 17 SDK and might also probably run Java 21 which has Generational ZGC garbage collector which gives you sub milliseconds latency. Cassandra also have the advantages of defining aggregate function in Java or JS and run in Casandra DB instance itself rather than fetching all data in application server which is super expensive.
On the blog post it was pretty clear, that they were tired of GC alltogheter.
@@bruterasta Generational ZGC is auto tunable.
I’m just curious, what tools are you using to make this beautiful diagram and fancy effects 😮❤
Hi your contents are worthy, it was such a meticulous way of explanation of concepts,thank you
Amazing. I have read about the features of Scylla but this a definitive proof that is an excellent database.
But costly to run
@@ankeshkapil3129 How is a more efficient database more costly to run? The more performant a DB is, the cheaper it is to run, even if you're talking about small workloads
kudos to make a video to the point and easy to grasp not like other channels doing long videos to hack the youtube algorithm.
next month I'll migrating one of my database, it's so painful to planning and flow designing of migration, but this video help a lot.
QUICK QUESTION: I've seen this style of YT video somewhere else too, somewhere in a tech video I remember. Is there a tool you're using to make these kind of videos? Or did you (or someone you hired) edited the video yourself?
I may not have enough experience with such tasks, but how do they make sure they pick the right database that fits their performance requirements (at that scale) ?
Sometimes these companies have experts (maybe one of the developers of such DB) but at such a scale it's mostly these companies, who test if these DBs can widthstand such a hight load. So I'd call it precision guess work. Compare existing solutions in terms of performance etc., maybe run a extensive tests and then just give it a go.
I guess they have TONS of metrics which cann give them the info they exactly need on what they need and don't need to optimizr
They didn't - there are Petabyte scale deployments used by many organisations that are larger than Discord uses. Your DB is going to be as good as its integration with the system and use case. This case study only shows discord lacks expertise or is reluctant to take in advisory from experienced people. This is true for Twitter and airbnb, too, when they switched DBs, thinking it would help but didn't.
It requires a lot of experience and architectural knowledge of DB itself to make the rught call. In reality tho who make right call are usually not in like light or advertise about their milestones. So it's not an open book that can be simply explained with diagrams while implementation details would vary a lot.
P.S. Even large orgs make mistakes or lack skills. They sometimes need external help, and when they dont, they often run into performance issues that are talked about in the public domain.
How they pick it? It's mostly biased towards teams experience and decision makers bias towards tech they know until they are ready to move on and do all migrations to new tech, it's always moving, changing so there's no specific reciepie for that.
This actually follows some money logic. First you pick a tool that is good enough and FREE, like Cassandra. If the business develops slowly, then you can get by with this initial solution just fine. But, when you have a booming growth, as is the case with Discord, the free solution is not designed for that type of scale, so you start to think about high-availability, low-latency, and so on, and usually end up with some commercial solution, that may be even be based on your original free solution, but with enterprise features on top. In the case of Discord, they just picked a DB with better engine, and they leveraged a different storage solution in the cloud, ergo a better performing storage. Of course, the explosive growth guarantees money is no longer an issue - you can afford the enterprise solution and the better performing storage because customers are paying :). There is you logic :).
it think the fact that they changed the behavior of read and write of an SSD make the different. I dont really think changing the DB make that much different, yes it may be better but not so much without the technic of override SSD behavior. And since this is Discord, they migrated from a garbage collector language and move to a non-garbage collector language before (Go to Rust), it could also affect their choice in this case.
Scylla is a saviour ❤
Cool, interesting to watch! learning all the time!
Awesome content. Please, tell me what you use for the graphics and animations. They are so fluid.
"discord found themselves in a bit of a pickle"....good one :)
great content!!
💯 Great episode! Interesting takes on this 🐘 project! 😎✌️
Thank you for the video!
Nice explainer. Can you also please link the OG blogpost in the description?
Thanks for this, very helpful.
I have literally never seen a company be happy with choosing Cassandra. I know of engineering orgs who have an entire pod of SREs dedicated to nothing but keeping the Cassandra alive.
what an Excellent team to be able pull off that humongous data 🥶
Is it possible that there's a typo at 4:48 and following? There are two md0 devices.
Discord doing the backend: 👹
Discord doing the UI/UX: 🥺
Quality content as always, love this channel
Thank you for the video.
someone please explain more on the RAID 0 setup? What is the safety net under that? Am I missing/not understanding something?
Could you make a video on the Data services part which written in Rust ? That will be a quite interesting to many folks !!
I love your content. Do we have a discord server for this channel and this community? I would really love to engage more.
3:28 is that like a cache on query parameters?
Only 9 days what a huge achievement
"of course. Migration production data is no joke!". Thanks for the quality contents.
I am confused the difference between "Request coalescing" and what your other videos call caching? Since "Request coalescing" is part of the success here I'd like to understand the difference better, thanks!
Amazing topic!!!
wow, can i ask what after effect template is used for presentation? I would like to present my work in school.
What tools do you use to create these videos
kudos to the Discord team 👏
Inspiring story. It would be really cool to see how Telegram solves their backend challenges. They have an even greater scale, smaller team with less funding and fastest latency of any messenger. I wish they shared more of their backend, that system is engineering marvel, just like Discord.
Me too, i'm really impressed on how Telegram is very responsive and optimized
What pipeline tool did they use for the data migration from Cassandra to Scylla?
Love your content and respect for your efforts
What software do you use to create such presentation?
is discord the only paltform having to store trillions of messages? or data for that matter? how have other companies solved it?
I think quite a few like meta companies products : whatsapp, insta, fb. Google : google fpm, gmail, google ads, youtube
what do you use for these animations?
If the writes were going to the persistent disk with high latency, then how were they made immediately available (with low latency) to all the intended members of that messaging group?
I'm guessing data was also stored in an LRU cache when it was written to the persistent disk
@@daviduzumaki In that case they might have to again deal with data loss issues which they were facing, and the reason why they introduced persistent storage in the first place.
Hmm ... As a user of ScyllaDB. This doesn't seem like the most daunting DB migration scenario, I could imagine.
I mean... Had it been any other database than Cassandra, the task would have been magnitudes larger.
Why?
Rust is a superstar !!
An amazing video thanks man,
I just have a question about the Data service part, isn't similar to redis or CDN or anyother cache mechanism? I'm just wondered why they called this name and what it does exactly
I believe is not like a CDN or Redis. I understand that is an API acting as an interface between the Monolith and the database, decoupling one form another - which allowed to use a language (Rust) more efficient to handle the high-performance data operations.
That's just awesome sharing. Thanks!
Is it ok to move older messages to a separate database periodically? So the main database would not be carrying so much data all the time.
And have dedicated service to read messages from that database with historical messages.
excellent conten.
How is their solution different from RAID10? It is essentially raid1 array mirroring a raid0 set. Am i missing something?
I think the boxes at ruclips.net/video/O3PwuzCvAjI/видео.htmlsi=3YXYT_VTw2QqriyG&t=343 are a bit misleading. First, probably scylla uses both persistent disks for writes and NVMe(s) for reads, so, really one box should contain both with arrows for reads and writes. Secondly, the message service is partitioned, therefore, different writes go on different instances of scylla instead of a single scylla master node or central cluster which the triangular placement of scylla in diagram suggest.
When there's a new message or edited old message, how do the app receive the changes? Is it by listening to data on the DB (like a Web Socket)?
That's incredible
If Discord would only create a good UX. Having may servers is a nightmare to go through..
Thanks for the content. A little introduction was missing what kind of database this is and why it is better, except for C++ under the hood
thanks for content :)
To the Discord devs 🥂
So they didn’t plan to have data warehouse?
cool,I want to try in my work
Please make video in realtime application like figma, Google docs, or multiplayer game
Can't they use AWS dynamoDB?
how to run scylla db in the cloud??
Well it doesn't actually tell HOW it stores trillions of messages (like partitioning, cache organization, hot path issues), just a high level overview of the migration and dbms
Friendly programming noob here: Can someone explain why Garbage Collection-free gives Scylla an advantage over Cassandra? Thanks in advanced.
To my understanding, without a garbage collector there are no stop-the-world pauses to collect the garbage in the background and therefore more computing resources are spent on the actual work of DB.
Can you make me a pro at syatem design ?? I just love this subject...
Scylla is basically a C++ version of Cassandra. This story is one more proof of java's inferiority. Why are people still using Java ?? 🤷🏽♂️
Because they can't learn C++ :)
9 days - are you kidding me!
What is a garbage collector, how it is useful for DB’s?
Cassandra is written in Java, so the garbage collector is automatically part of it.
The Garbage Collector is responsible for checking which variables aren't more needed in the program and removing than from memory, the problem is, the Garbage Collector has to run every now and then, and this process requires processing power and will add latency to the main application running, because for the most of the time, the whole application or parts of it has to be stopped and wait for the Garbage Collector to do it's job, even though this process takes a few milliseconds, in a high throughput application this can add up and slow down the whole system.
Languages like C++ and Rust have a "manual" memory allocation and deallocation, the programmer is responsible to take care of it before hand, so you won't need a garbage collector at runtime, what results in faster executions ...
@@RicardoSilvaTripcallappreciate this
Show!!!!
Waouh!
Joy != rust
👀👌
So much effort for messages that nobody can ever find again anyway 😂Seriously, Discord scrollback might as well be write-only it's so unusable. :(
I can agree it is cool technology wise, but 99.9% of discord messages are garbage, and who will ever scroll back to see any of them?
I admire the engineering effort it just seems to me kind of wasted effort
Not true at all. If you have message garbage on Discord, it your server choice you have to question, not the tool or the others... 🥱
How do they store TRILLIONS of messages? Like everyone else, they use DATABASE. Hows that any different of any other services?
1st viewer😂
Brother, I love your content, but please, your tone is too flat. It makes me sleepy