Crisp and very structured unlike a few other videos. Thanks Jordan for amazing content !
Your videos are one of the best source to learn about system design. Appreciate your effort and consistency:)
Notes:
TCP good, just use delay.
Server will do the encoding instead of streamer.
Chunks of different resolutions should be aligned for a smoother user experience.
CDNs and Redis for faster delivery.
For chat, client polling over websockets.
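The aligned-chunks note can be sketched concretely: if every encoder cuts segments at the same stream timestamps, the player can switch resolutions at any segment boundary without re-buffering. A rough Python sketch (the key format and the 1-second segment length are illustrative assumptions, not from the video):

```python
# Hypothetical sketch: aligning chunk boundaries across resolutions so a
# player can switch quality mid-stream. The segment index is derived from
# the stream clock, not from per-resolution encoder state, so every
# resolution cuts its chunks at the same instants.

SEGMENT_SECONDS = 1  # assumed fixed duration shared by every resolution

def segment_index(stream_time_s: float) -> int:
    """All encoders start a new chunk at the same stream timestamps."""
    return int(stream_time_s // SEGMENT_SECONDS)

def chunk_key(stream_id: str, resolution: str, stream_time_s: float) -> str:
    """S3-style key: the same index for 480p and 1080p at the same instant,
    so a client can swap `resolution` in the key and keep playing."""
    return f"{stream_id}/{resolution}/{segment_index(stream_time_s)}.ts"
```

Because only the resolution component of the key differs, switching quality is just requesting the same segment index from a different prefix.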
Great video Jordan. It became a habit of mine 😄 to watch your content and study your system designs. Thanks mate for your work, I appreciate what you're doing.
Babe wake up, new system design just dropped
She may still be sleeping
@@jordanhasnolife5163 That's OK, the system is fault tolerant. I will eventually achieve consistency via an air horn
@@scottmangiapane lol, I see you're the primary in your relationship then and you're asynchronously replicating your sleep status to her
@@jordanhasnolife5163 Oh yeah. And if she's still not up, I'll implement sharting I mean sharding ;)
Thanks for the videos and especially for making the 2.0 playlists. Been a follower since Dec last year and was able to get offers from Amazon and Google. Would have liked to thank you in person if you were still at G. 😄. Keep up the great stuff man!
Congrats man!! You're an absolute legend! And have fun at Google!
The 'encoding server' is a bit hand-wavy, as encoding to multiple formats and resolutions would involve some sort of queue + async workers, but I think you've covered this in other videos.
Also, I would guess encoding is actually done on the streamer's client in some cases, depending on how beefy it is - i.e. not when the streamer is using a mobile device, but probably so for PC clients. No doubt Twitch is happy to offload those costs to streamers whenever possible.
Agree with your second point, especially if they use 2 PCs. I'm actually not sure about the first, as it's pretty imperative that this gets done ASAP. Maybe they do it asynchronously, but with some sort of understanding that it has to get done within 5s.
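The "asynchronous, but with a deadline" idea in this reply can be sketched as a worker step that reports whether it met its latency budget. A minimal sketch, assuming a ~5s budget and a pluggable `encode_fn` (both hypothetical); in a real system this would be an async worker pulling from a queue, with a missed deadline triggering an alert or a lower-quality fallback, since late chunks are useless for a live stream:

```python
import time

ENCODE_DEADLINE_S = 5.0  # assumed budget: a chunk must be encoded within ~5s

def encode_with_deadline(chunk: bytes, encode_fn, deadline_s: float = ENCODE_DEADLINE_S):
    """Run the encode step and report whether it met its latency budget.
    Returns (encoded_bytes, met_deadline)."""
    start = time.monotonic()
    encoded = encode_fn(chunk)
    elapsed = time.monotonic() - start
    return encoded, elapsed <= deadline_s
```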
Great content Jordan! Really Enjoyed it. I kindly request you to also cover Online Bidding Platform (like e-Bay) in one of your videos. Thank you!
Will do eventually!
I watch Jordan's videos more frequently than videos from Jordan's favorite website.
I don't!
Great video Jordan. Does the Redis cache need to have a lock? It's single-threaded, so by its nature, locking can be avoided? Is my understanding correct?
Sounds reasonable to me!
toe reveal stream when?!
great video btw!!
Can we pretty please add Flink somewhere? Maybe to store a user's offset of the last seen video segment, to restart from it when the user starts watching the same stream after a few days. A design without DB=>CDC=>Kafka=>Flink does not look complete these days ;-).
Flink broke up with me but I miss her
Great video! Thank you! One comment: the encoding server will have to write to 4 different places. Either:
- 2PC (two-phase commit): bad
- Write into the metadata cache last: better
And we could have a cron job that cleans up failed jobs in the background?
So I'd say it really only has to write to two places (since the caches should be content that is pulled in, not pushed in).
To avoid two phase commit, we can just write the metadata row once the S3 clip is uploaded. If for some reason there are some orphaned S3 files that's no biggie
@@jordanhasnolife5163 hey dude, if the caches are populated on pull, why does the final diagram seem to depict a flow where the CDN and metadata caches are pushed to on encoding?
@@goldensunliu Probably because I'm a bum and drew the arrow in the wrong direction
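The upload-then-metadata ordering described in this thread can be sketched with in-memory stand-ins for S3 and the metadata table (all names here are hypothetical). The point is purely the write order: the blob lands first, the metadata row last, so readers never see metadata pointing at a clip that doesn't exist, and a cron-style sweep reclaims orphaned blobs:

```python
blob_store = {}   # stand-in for S3
metadata_db = {}  # stand-in for the chunk-metadata table

def publish_chunk(chunk_id: str, data: bytes, resolution: str) -> None:
    # Step 1: make the blob durable first.
    blob_store[chunk_id] = data
    # Step 2: only then write the metadata row that makes it visible.
    metadata_db[chunk_id] = {"s3_key": chunk_id, "resolution": resolution}

def gc_orphans():
    """Background cleanup: delete blobs that never got a metadata row
    (e.g. the metadata write failed after the upload succeeded)."""
    orphans = [k for k in blob_store if k not in metadata_db]
    for k in orphans:
        del blob_store[k]
    return orphans
```

Failure between the two steps leaks a blob but never publishes a dangling pointer, which is why the orphans are "no biggie" and safe to sweep later.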
Hi, You are delivering high quality content, It's very unique and hitting the core problems of each and every design. Can you please make a video on collaborative editing tool like excalidraw.
Thanks! Any reason in particular that you think Excalidraw is challenging? Unlike google docs, I imagine that there aren't enough concurrent edits to make a single leader infeasible here
Any reason not to decouple the 4 parallel steps (S3 upload, HBase upload, Redis update, CDN update) from the encoding server using Kafka? Increased latency seems like one, but is there anything else?
Increased latency is pretty relevant when you need this completed before users are able to get the live stream data. We have no guarantees on when those tasks would complete otherwise.
1) Would you use Zookeeper for the locks on S3 files?
2) Why not use Cassandra also for the metadata store (instead of HBase)? Even with multiple columns for multiple resolutions/formats, the size of the metadata is small compared to the corresponding video chunks themselves, so the "read single column for current resolution" might be an over-optimization here?
1) Why do I need locks on s3 files?
2) Perhaps so. I think the table itself may be read a lot more than it is written to though, so I wonder if using a database that favors writes is the correct move here. Though that may invalidate HBase too haha
@@jordanhasnolife5163 locking would be to handle a thundering herd situation
Thanks for the video!!
I have one question: why don't we partition the chat servers by a combination of video-id/user-id, or even just by user-id? This way we won't have an overloaded chat server for popular streams.
Because we only want to have to read from one place for a given chat
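The trade-off in this exchange can be sketched: partitioning by stream id means every viewer of a stream lands on the same chat server, so reading a room's messages touches one place. A toy sketch with a hypothetical server fleet; the hot-partition problem the question raises is real, and is typically handled by scaling that partition rather than spreading one room across servers:

```python
import hashlib

CHAT_SERVERS = ["chat-0", "chat-1", "chat-2"]  # hypothetical fleet

def chat_server_for(stream_id: str) -> str:
    """Deterministic stream-id -> server mapping: all chat for one stream
    lives in one place, so a read never has to fan out and merge."""
    h = int(hashlib.md5(stream_id.encode()).hexdigest(), 16)
    return CHAT_SERVERS[h % len(CHAT_SERVERS)]
```

Partitioning by user-id instead would spread load, but reading one room would then require querying every server and merging, which is what the reply is arguing against.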
The content is awesome, is the notes published anywhere, can we get it?
Check my channel description
Hey Jordan, great video compared to v1!! BTW what happened to Robinhood? I was in the middle of that video??
Give it 8 hours from now
Given a "stream id, msg id, timestamp", how does Cassandra read the 10 msgs before it? What if the msg is at the head of an SSTable? I feel these videos don't discuss reads, but just focus on how fast Cassandra writes.
How? It's a database. You sort the messages by timestamp. All LSM-based databases read multiple SSTables. These videos don't discuss SSTable reads because I have a full concepts series that does.
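The reply's point is that messages for one stream are clustered (kept sorted) by timestamp, so "the 10 messages before t" is a bounded range scan rather than a hunt through SSTables. An in-memory analogue of one sorted partition (table and column names are assumptions); the equivalent CQL would be roughly `SELECT * FROM chat WHERE stream_id = ? AND ts < ? ORDER BY ts DESC LIMIT 10`:

```python
import bisect

def last_n_before(messages, ts, n=10):
    """`messages` is a list of (ts, msg) tuples sorted ascending by ts,
    as a clustering key keeps them on disk. Returns the n messages
    strictly before `ts`, in order."""
    i = bisect.bisect_left(messages, (ts,))  # first entry at or after ts
    return messages[max(0, i - n):i]
```

A message sitting at the "head" of one SSTable isn't special: the storage engine merges the sorted runs from each SSTable (plus the memtable) into one sorted view before applying the limit.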
15:00 when batman visits his grandma
Hey Jordan, I'm not able to understand the use of CDNs here. Are we going to store that 1-sec chunk in those CDNs? And also, why do we need to store metadata? Can you please clarify these doubts of mine?
Metadata basically just stores the link of the clip in S3, as well as its sequence number and resolution. The CDN is used as a globally distributed cache of the video clips.
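The metadata row described in this reply can be sketched as a small record plus the lookup a player performs (field names and URL format are hypothetical): the row exists so the client can ask "what's the next chunk after the one I just played, at my resolution?" and get back a pointer into S3/CDN:

```python
from dataclasses import dataclass

@dataclass
class ChunkMetadata:
    stream_id: str
    seq: int         # position of this chunk in the stream
    resolution: str  # e.g. "720p"
    s3_url: str      # where the encoded clip actually lives

def next_chunk_url(rows, stream_id, last_seq, resolution):
    """Player lookup: the URL of the chunk after `last_seq`, or None
    if it hasn't been encoded/published yet."""
    for r in rows:
        if (r.stream_id, r.seq, r.resolution) == (stream_id, last_seq + 1, resolution):
            return r.s3_url
    return None
```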
You have used the same db type - columnar db - but different underlying dbs for chunk metadata and the chat db. I guess the reason behind it is you want the chunk metadata to have a consistent db (HBase), while for the chat db you want availability more (Cassandra). Is this the right understanding, or are there other reasons also?
Hey I tried to explain this one more in the video.
Cassandra - as fast write throughput as you're going to get (other than maybe like kafka or something and just using a log, which actually now that I think about it isn't the worst solution here).
HBase - fairly fast for writes, but important for reads since we only want to access a couple columns of our data at once.
You call both of these columnar dbs. To my knowledge, they both use column formats, but HBase uses column-oriented storage, which is a significant difference.
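The row-oriented vs column-oriented distinction the reply leans on can be shown with a toy in-memory illustration (this is a conceptual sketch, not how either database actually lays out bytes): reading one column from a column store touches only that column's array, while a row store must walk every full row.

```python
# Same data, two layouts. Row-oriented: each record stored whole.
row_store = [
    {"seq": 0, "url_480p": "a0", "url_720p": "b0", "url_1080p": "c0"},
    {"seq": 1, "url_480p": "a1", "url_720p": "b1", "url_1080p": "c1"},
]

# Column-oriented: one contiguous array per column.
col_store = {
    "seq": [0, 1],
    "url_480p": ["a0", "a1"],
    "url_720p": ["b0", "b1"],
    "url_1080p": ["c0", "c1"],
}

def read_column_row_store(col):
    return [row[col] for row in row_store]  # scans every whole row

def read_column_col_store(col):
    return col_store[col]  # touches just the one requested array
```

This is why "read a single column for the current resolution" favors the column-oriented layout: the query only pulls the data it asked for.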
Amazing video!
ill keep watchin ur videos till i die or u die
I appreciate that man! I already have no life though so you've paid your debt.
In all seriousness, once you get that job, no need to keep watching - go have fun and socialize :)
@@jordanhasnolife5163 i kinda watch em for fun at this point XDD
Lipsync still missing
Oh. So. Cool.
start streaming on twitch!!!
Feet incoming?!
hahahahaahah deserved for flying on delta (I never step foot inside an airplane)
C'mon man it wasn't even spirit or frontier!
@@jordanhasnolife5163 if it was spirit they would seat you on the jet engine none the less pick your poison
What kind of BS is this?? Where the hell is FLINK
We're on a break 😭