When I started watching I thought I'd quit partway through, but the session was so engaging and interactive that I watched the whole video. Thanks a lot for this.
Hey Gaurav, this is very helpful for freshers and people with 1-2 years of experience in this field, because this is how we deal with upper management. I always get those diagrams and do my implementation based on them, but only now do I understand how they arrive at the conclusion of what needs to be done. Thanks for this. 👍
I searched so many videos for the difference between SQL and NoSQL but never understood the use cases. Now I have a clear picture of the use case for NoSQL. Thanks for this, and keep posting your videos, especially Yogita.
Amazing video!!! Learnt a lot. The parallel workflow thing blew my mind. I thought it could be done later on, maybe post the original upload in a slower way. But that matrix thing was amazing!!
Amazing video....lot of questions were addressed. This duo should do a video series covering other case studies like : stock broker platform , uber , whatsapp etc
Great video! One piece of feedback: I didn't see the 1.2TB figure you calculated being used. A translation into how many servers (with resources like CPU, RAM, disk, IO, etc.) would be needed for the ingestion pipeline and for storage would have been helpful. Some interesting scenarios like thundering herd, or data compression to reduce cost, would also have been of great help. And don't you think putting all the videos in the CDN would be cost heavy? There should be some strategy based on popularity/recency/TTL to upload videos to and remove them from the CDN.
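A rough way to turn the 1.2TB/day figure into a storage estimate is sketched below. Only the 1.2TB/day number and the ~6x multiplier for resolutions and formats come from the discussion; the replication factor, disk size and fill ratio are assumptions picked purely for illustration.

```python
import math

DAILY_RAW_TB = 1.2        # daily upload volume estimated in the video
VARIANT_MULTIPLIER = 6    # ~6x for resolutions/formats, as raised in the thread
REPLICATION_FACTOR = 3    # assumption, not a number from the video


def yearly_storage_tb(daily_raw_tb, variant_multiplier, replication_factor, days=365):
    """Total TB written in `days` days, counting every variant and replica."""
    return daily_raw_tb * variant_multiplier * replication_factor * days


def disks_needed(total_tb, disk_tb=16, fill_ratio=0.8):
    """Disks required if each disk is only filled up to `fill_ratio` (both assumed)."""
    return math.ceil(total_tb / (disk_tb * fill_ratio))


total = yearly_storage_tb(DAILY_RAW_TB, VARIANT_MULTIPLIER, REPLICATION_FACTOR)
print(round(total), "TB per year,", disks_needed(total), "disks")
```

The same shape of calculation could be repeated for ingestion bandwidth or transcoding CPU once per-video numbers are agreed on.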
wow the end-to-end request flow was really smart, as we're just returning the list of metadata it'll be fast and metadata will have actual video link too
Great discussion. Yogita, huge respect. The way you explained the different choices you made is an eye opener for people like me who are going to take the bull by the horns soon. Subscribed to your channel as well. Thank you Gaurav.
I didn't quite get how the queue adds any advantage if the upload is done in a sync way. The video needs to be uploaded at least once anyway. If the video is put into a queue message (or somewhere else) and then copied to S3, more IO and network calls are needed. Since S3 is being used anyway, one can give the user temporary (and restricted) access to the S3 object so that they upload directly to S3 without much security concern, and then simply queue the URL of the uploaded object along with other user metadata. The overall upload API time is then just the time to create an empty S3 object, with the client uploading asynchronously in the background.
Creating something like AWS S3 or Azure Storage or any distributed-file-system-like storage is quite hard, and it doesn't make sense to really create them from scratch unless budget and scale are limited.
Also, there are multiple ways to implement queues: is it a pre-existing queue provided by some cloud platform, or is it built in-house on top of a database? What happens when some node reading a queue message crashes or its power is unplugged? What kind of message-popping mechanism is used when reading messages, so that the system is crash resilient and we do not lose the client's request because the plug was pulled after a message was popped from the queue? Does the queue have a strict FIFO requirement (I don't think it is necessary in this case)? What happens if duplicate messages are enqueued because of retries from the client? Last but not least, we cannot keep the videos forever and they need to be GCed; how frequently will the GC run and remove those videos?
A lot of questions can be asked, and that's what I like about design interviews. Ask anything and everything :)
Side note: I have never used AWS S3; however, I have used Azure Storage extensively and I looked up the S3 equivalents of the Azure Storage features. So please forgive me if I have used AWS terminology wrongly.
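The "client uploads directly, queue only the URL" idea can be sketched as follows. This is a toy stand-in, assuming a made-up HMAC signature rather than S3's real pre-signed URL scheme; `SECRET`, the `example.invalid` host and every function name here are hypothetical.

```python
import hashlib
import hmac
import json
from collections import deque

SECRET = b"demo-secret"    # hypothetical signing key held by the backend
upload_events = deque()    # stand-in for a durable queue


def _sign(bucket, key, expires_at):
    payload = f"{bucket}/{key}:{expires_at}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def make_upload_url(bucket, key, expires_at):
    """Hand the client a time-limited URL for exactly one object."""
    sig = _sign(bucket, key, expires_at)
    return f"https://{bucket}.example.invalid/{key}?expires={expires_at}&sig={sig}"


def is_valid(bucket, key, expires_at, sig, now):
    """What the storage side would check before accepting the bytes."""
    return now <= expires_at and hmac.compare_digest(_sign(bucket, key, expires_at), sig)


def on_upload_complete(video_id, url, user_id):
    """Only the reference travels through the queue, never the video bytes."""
    upload_events.append(json.dumps({"video_id": video_id, "url": url, "user_id": user_id}))
```

The point of the design is that the ingestion API does a cheap signing step and the heavy transfer goes straight to object storage; workers later pick up the small event and fetch the object themselves.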
These are excellent questions. The queue mentioned here has events (just the id, url and some metadata about the video). You need a persistent queue with retries here. Ordering isn't necessary. I'd use something like Apache Kafka for this.
I was thinking along the same lines. Even though a queue can do that job, I felt all those compute-heavy tasks should be pushed to backend batch jobs that enrich the metadata. That way it is easier to maintain the consistency of the system during the upload. During upload the video gets pushed into an S3 bucket, an authenticated URL with a token is generated, and then the URL along with other information is either stored in the DB or pushed to a queue for storage in the DB, and the user gets a response. Once the data is in the DB or another queue, a new event can be triggered that performs the variable bit rate conversion and other needed work behind the scenes and enriches the data. I am just citing an approach, IMHO; it would be helpful if someone could validate this.
"What happens when some node reading the queue message crashes or the power was unplugged?" After a message is read from the queue, a visibility timeout can hide it from the other worker nodes. Once processing is done, the worker must send an acknowledgment to the queue and delete the message. If the queue doesn't receive an ack before the visibility timeout expires, the message becomes available again. This way, if a worker crashes, the message becomes visible to other workers and the system is more resilient.
There should also be a mechanism (S3 provides one) to monitor infrequently accessed videos. Videos that have not been accessed for some time could be moved to Infrequent Access to save cost. Please reply if you have any thoughts on this.
@@letsmusic3341 Yeah, the visibility timeout approach should work. However, there are extreme corner cases where, say, the worker node couldn't finish processing the message in time and the message becomes visible to other nodes. To avoid that, some sort of locking over the message in the queue is needed (which can simply mean extending the visibility timeout again). Even extending the visibility timeout repeatedly from a separate thread until the job is finished can have problems, especially when that thread fails to extend it in time. There can also be a rare clock skew problem: the thread refreshing the lock can have a different notion of time than the queue service and hence might not refresh the visibility timeout at the right moment. But these are extremely rare scenarios. Using external locking with a larger timeout outside of the queue service can help in such scenarios. A large enough timeout with additional buffer time can also help.
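The visibility-timeout behaviour described in these replies can be modelled in a few lines. This is an in-memory sketch with an explicit `now` argument instead of a real clock; the class and method names are made up for illustration and do not mirror any particular queue service's API.

```python
class VisibilityQueue:
    """Toy queue: a received message is hidden until acked or until its timeout lapses."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._messages = {}          # message id -> body (insertion ordered)
        self._invisible_until = {}   # message id -> time it becomes visible again
        self._next_id = 0

    def send(self, body):
        mid = self._next_id
        self._next_id += 1
        self._messages[mid] = body
        return mid

    def receive(self, now):
        """Return the first visible message and hide it for `timeout` time units."""
        for mid, body in self._messages.items():
            if self._invisible_until.get(mid, 0) <= now:
                self._invisible_until[mid] = now + self.timeout
                return mid, body
        return None

    def extend(self, mid, now):
        """Heartbeat from a worker that is still busy with the message."""
        if mid in self._messages:
            self._invisible_until[mid] = now + self.timeout

    def ack(self, mid):
        """Processing finished: delete the message for good."""
        self._messages.pop(mid, None)
        self._invisible_until.pop(mid, None)
```

If a worker dies without calling `ack`, the message simply reappears after the timeout. A missed `extend` has the same effect, which is why downstream processing should tolerate the same message being handled twice.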
@Amitayush Thakur Nice question. @Gaurav Sen to be more clear, upload happens to S3. MetaData about the video and User is passed as event to the queue for further processing.
I personally think that using a queue for uploading is overkill that introduces extra overhead. Since we are already using distributed file storage (S3), we can use its features to let the client upload directly to storage (pre-signed URLs in the case of S3; we can generate these beforehand to reduce latency even further). This skips a few extra hops and reduces upload latency. Once the video is uploaded, we can do extra processing (compression, checking for NSFW/abusive content, converting to different codecs and resolutions) before approving the video. We can configure hooks that fire after the video approval event in order to include the video in listings and feed updates.
Yes, and introduce lifecycle rules into the S3 bucket to transition files to something like, the "Infrequent access" storage type, after 15 days. This would make a world of difference, considering this app is going to upload a hypothetical 1.2TB of videos everyday.
This is a better solution! S3 also supports multipart file upload, which can be useful for larger video files. Besides, we can use S3's object creation event to fire off SQS/SNS/Lambda for downstream processing.
"Once video is uploaded, we can do extra processing (compression, check for NSFW/abusive content, converting to different codecs and resolutions) before approving the video." Re this: if you are uploading large files, they will be chunked. As such, the chunks can be processed in parallel, thereby increasing the overall rate of processing and allowing more videos to be uploaded and processed. E.g. if one chunk is determined to be NSFW, that kills off all trailing chunks; the earlier this is picked up the better (which chunking makes possible).
Very informative, and more like a technical talk than an interview, which is actually very good. A couple of points I was waiting for but which were not discussed: 1. The algorithm or intelligence used (in TikTok or IG) 2. How does a video that was already played not show up again? (It can only be played when fetched through ID/CDN.) 3. How to track whether a user is interested in a profile and play videos from that profile alternately? 4. How can testing be performed with this approach? 5. What are the alternatives to the S3 bucket?
1) Intelligence? Is it for recommendations? Can you please add more details. 2) For that we have implemented a playlist type of thing based on user choice; we keep refreshing it using cron, and we keep these checks there.
4) For testing, we have written tests that upload asynchronously and use ffmpeg for creating chunks, and later test the result in terms of quality and resolution on different screens.
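The "process chunks in parallel and stop early on a bad one" idea might look roughly like this. `looks_nsfw` is a hypothetical placeholder for a real classifier, and the thread pool is just one possible way to run the checks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def looks_nsfw(chunk: bytes) -> bool:
    """Hypothetical stand-in for a real content classifier."""
    return b"NSFW" in chunk


def moderate(chunks, max_workers=4):
    """Return True if every chunk passes; stop scheduling work once one chunk fails."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(looks_nsfw, c) for c in chunks]
        for fut in as_completed(futures):
            if fut.result():
                for other in futures:
                    other.cancel()   # only prevents chunks that have not started yet
                return False
    return True
```

Note that `cancel()` cannot interrupt a check that is already running, so the saving comes from the chunks still waiting in the pool.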
Hi, Thanks for making this video. I feel a few points are not covered 1. What kind/technology of queue to use? 2. What will be the request size? 3. How you will break the video and how you will form the video back? In the case of parallel processing, how will you recognize which chunk belongs to which video? There will be multiple requests for uploading from different users. 4. A bit of protocol discussion would have added some advantage? Also, I have some doubts:- Won't it be good that the ingestion service returns an S3 URL to the user and then the user uploads the video directly using this URL? Is it possible for CDN to directly access S3 and we can skip video service? To return the list of videos and thumbnails we can have a separate service.
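On the question of how to recognize which chunk belongs to which video: one common approach is to tag every chunk with the video id, its index and the total count. The sketch below assumes plain byte chunks of an upload (not independently transcoded segments), and all names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    video_id: str   # which upload this piece belongs to
    index: int      # position of the piece within the upload
    total: int      # how many pieces the upload has
    data: bytes


class Reassembler:
    """Collects chunks from many concurrent uploads, in any arrival order."""

    def __init__(self):
        self._parts = defaultdict(dict)   # video_id -> {index: data}

    def add(self, chunk):
        """Store a chunk; return the full payload once the last piece arrives."""
        parts = self._parts[chunk.video_id]
        parts[chunk.index] = chunk.data
        if len(parts) < chunk.total:
            return None
        del self._parts[chunk.video_id]
        return b"".join(parts[i] for i in range(chunk.total))
```

Because the key is `(video_id, index)`, a retried chunk overwrites its earlier copy instead of corrupting the result.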
Talking specifically about S3: AWS provides a CDN, CloudFront, which can restrict access to your S3 bucket. By allowing only CloudFront endpoints to access the content, the application becomes more secure and responsive.
Thanks a lot Gaurav for this extremely useful video. I must appreciate Yogita for this very detailed system design and component choices right from the queue, S3, CDN, Diff DB's, etc were awesome and especially the processing part of the video via workers. Thank you both!!
How does that happen exactly by the way ? You literally split 1 mb file into three 333kb files and then convert them using any file-format-converter like FFMpeg etc, and then merge again ??
Thanks @gaurav for making such an extremely handy and useful video. Kudos for that. 👍 Can we please have a part 2 of this video that discusses: 1. Exception handling and reporting, 2. A ballpark estimate for each component of this system, 3. What strategy to use a month or a year later to decrease the load on the file system.
I did not quite understand how chunking a video would help. A video file has headers and then frame data. Once the video is ingested we can't simply chunk the blob data (the way we normally chunk text data) without the header. The chunked video would not make any sense without the header.
If the idea was 30mb video => chunked into 10mb + 10mb + 10mb => pass each chunk to converters which convert to a different codec and resolution, we can't do it with basic blob chunking, because the converters in the pipeline cannot work without the header of the original video file, and that header is attached only to the first 10mb chunk. So essentially the pipeline would look something like
```sh
30mb video file = [header]framedata
=
  10mb file [header]framedata ==> converter_fn ===> corrupted file ==\
+ 10mb file framedata         ==> converter_fn ===> corrupted file ===| ==> corrupted file after joining
+ 10mb file framedata         ==> converter_fn ===> corrupted file ==/
```
So AFAIK, if we have to chunk a 30sec video => 10sec + 10sec + 10sec video:
[1] our best shot is to pass it through ffmpeg to chunk the video (which essentially recreates the header with proper values for the frames).
[2] somehow make the client send 3 different videos and play them as 1 video from the UI layer during consumption.
Note: when you think about a video header, it's not like a text HTTP header; it holds general video codec + playback + compression related metadata etc. And then [I guess] each frame would have its own header (inside the frame blocks) => chunking needs to take care of all these facts => hence we need something like ffmpeg.
I probably misunderstood the point Yowgita (apologies if I misspelled the name) made, or there are some techniques she implied which I don't know. In either case, I would love to know if I missed something in my understanding.
Good points. You are really familiar with video file formats! I guess in a real interview the interviewer would not expect candidates to know the stuff you've mentioned; they would probably give bonus points for recognizing that we should chunk videos when they're huge (think YouTube). For TikTok I guess we should be OK just uploading the videos as is.
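Option [1] above, letting ffmpeg do the splitting so that every piece gets its own valid container header, could be driven from Python roughly as below. The helper only builds the argument list for ffmpeg's segment muxer; the file names are placeholders, and with `-c copy` the cuts land on keyframes, so segment lengths are only approximately the requested duration.

```python
def segment_command(src, out_pattern, seconds=10):
    """Build an ffmpeg invocation that splits `src` into playable segments.

    -c copy            : no re-encoding, streams are copied as they are
    -f segment         : use ffmpeg's segment muxer
    -segment_time      : target duration of each segment in seconds
    -reset_timestamps 1: each segment starts at timestamp zero
    """
    return [
        "ffmpeg", "-i", src,
        "-c", "copy", "-map", "0",
        "-f", "segment",
        "-segment_time", str(seconds),
        "-reset_timestamps", "1",
        out_pattern,
    ]


cmd = segment_command("input.mp4", "chunk_%03d.mp4")
print(" ".join(cmd))
```

To actually run it one would pass the list to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed; each resulting `chunk_NNN.mp4` can then go to a converter independently.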
The video is very helpful. Some changes could be made to the ingestion service. Rather than having the ingestion service upload the files to S3, we should make use of S3's multipart feature and have the client upload to S3 directly by returning signed S3 URLs to the client. That way you are not putting load on the ingestion service and it becomes highly scalable. Let AWS take care of the file upload and deal with the bandwidth. Following that design, we do not need to put the videos in different regions. Once the CDN is hooked up to an S3 bucket, it will take care of replicating the data to all the configured regions from which users are supposed to access the video. Let me know what you think about this approach.
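For the multipart idea, the client first has to decide how to cut the file into parts. A minimal planning sketch follows; the 5 MiB minimum part size (except for the last part) and the 10,000-part ceiling are S3's documented multipart limits as far as I know, while the 8 MiB default and the function itself are illustrative.

```python
MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 ceiling on parts per multipart upload


def plan_parts(file_size, part_size=8 * 1024 * 1024):
    """Return a list of (part_number, offset, length) covering the whole file."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the minimum part size")
    parts = []
    offset, number = 0, 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts
```

Each part can then be uploaded, and retried, independently, so a failure near the end costs one part rather than the whole file.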
It is a very good video!!!! Few things to add: 1. Logging is very important 2. Authentication and Authorization 3. Metrics and Reports 4. Containerization & process orchestration (Docker, Kubernetes) 5. API gateway & Microservices
Good discussion guys Thumbs up! I am not expert on video streaming systems but a few questions out of curiosity- 1. Can't we store video in a standard resolution and transform the resolution while streaming down to user? Shouldn't really require too much compute in real-time but we may save a lot on storage. I may be wrong though. 2. Do we need a very effective data life cycle policy as well to archive and trash old videos? And I agree S3 with glacier may be a good choice. 3. What is the plan to make S3 lookup faster? Any considerations for this in Video Metadata? 4. Applications like this have a list of video that users keep swiping back and forth can this fact be used to make performance better by caching next two and last two videos on user device ahead of time? Guys I am really impressed with the problem statement and your approach, just speaking out mind here, hope that's ok :)
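Point 4, caching the next two and last two videos on the device, amounts to a sliding window over the feed. A small sketch, assuming the feed is simply an ordered list of video ids:

```python
def prefetch_window(feed, current, behind=2, ahead=2):
    """Ids worth keeping on the device around the video at index `current`."""
    lo = max(0, current - behind)
    hi = min(len(feed), current + ahead + 1)
    return [vid for i, vid in enumerate(feed[lo:hi], start=lo) if i != current]


feed = ["a", "b", "c", "d", "e", "f", "g"]
print(prefetch_window(feed, 3))   # the two videos on either side of "d"
```

On every swipe the client would diff the new window against what it already holds, fetch the missing ids and evict the rest.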
Thank you for your effort, time, and most importantly its insanely invaluable content. This video proves how intelligent Indians generally are... We Iranians/Persians admire how intellectual and intelligent they are, we simply cannot take our eyes off them, in all honesty not only they are super brilliant minded but also super hard-working people.😊 Lots of love and respect for you all from IRAN ❤️
Liked this very much. One piece of feedback for Gaurav: the protocol used for upload or streaming is an implementation detail and not really required at the system design level. If the candidate knows it, brilliant, but if the candidate doesn't, in the real world they can just read up on the latest streaming solutions and use one. Again, just an implementation detail.
Few points which I think should be discussed to reduce the latency factor i.e. Inter-service communication (especially when the question was asked to retrieve the user-specific videos). We can think of gRPC protocol for inter-service communication. Moreover, handle the complexity of service mesh through service mesh technology like Istio for runtime service discovery. Will the video chunking be done at the client side or server side & how? Does S3 or CDN supports video streaming, if not how are we going to handle it? We can think of MongoDB with GridFS, as it supports binary file streaming. You talked about the pipeline but did not clarify the approach to implement the same. Maybe we can think of the AWS Lambda functions that will be triggered on the file upload event of the S3 bucket. Lambda functions will take care of all parallel activities that were discussed, along with the video Metadata formation like thumbnails. This part was also not discussed that how thumbnails would be created. & many more things which probably is not possible to discuss over 45 mins video. But really enjoyed the session and had a great learning. Thank you for the efforts
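The Lambda-on-upload idea can be sketched as a handler that turns S3 object-created notifications into processing jobs. The event layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`, URL-encoded keys) follows the S3 notification format as I understand it; the job dictionary and the task names are made up for illustration.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Translate S3 'object created' records into processing jobs."""
    jobs = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue   # ignore deletes and other notifications
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])   # keys arrive URL-encoded
        jobs.append({
            "bucket": bucket,
            "key": key,
            "tasks": ["transcode", "thumbnail"],   # hypothetical pipeline steps
        })
    return jobs
```

In a real deployment the jobs would be pushed to a queue or a workflow service rather than returned, since long transcodes may not fit inside a single function invocation.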
Great video, super helpful! 2 quick thoughts I had: 1. I don't think User data needs to be in a SQL database. While it would have the benefits of ACID compliance, the most common access pattern seems to be just getting all the info for one user and also updating fields for one user. So storing it in the same NoSQL document-style DB as video metadata like MongoDB would possibly suffice? It would also make it easier to horizontally scale as well. Don't think it's a critical part of the design of the system though! 2. I'd be curious how we'd want to shard the video metadata DB for it to be performant for the recommendation engine. Good job though, I enjoy the mock interview format!
Yes, and even S3 data storage types. If this app is ingesting 1.2TB of files everyday, it'd make sense to store the raw files in a S3 file storage type that costs less money. Version your videos, and setup lifecycle rules to convert them to "Infrequent access" storage type, after let's say, 15 days.
@@kanuj.bhatnagar I agree with storing the videos in some object storage (S3 or Google Cloud Storage), but applying a lifecycle transition after 15 days is not good. We can easily update the cache to evict older videos, but we should wait at least 180 days before moving objects to the next lifecycle tier, because access charges for infrequently accessed data are high in cloud storage systems.
Sir, I think first that we don't need to divide the video into chunks ourselves; we can use a format like HLS or dynamic MP4 for streaming, which can significantly reduce storage. We also need a DB like ClickHouse for things like the recommendation engine. For video uploading we can use a presigned URL to store the file temporarily, and with this approach we can easily implement an editing pipeline. We can use a saga for the transaction, so that if any step fails we can revert the full process without any manual action. This is my conclusion from observing the request patterns of YouTube and Hotstar. Let me know if you see any problem with this approach 😊
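The lifecycle debate above can be made concrete as a rule definition. The dictionary below follows the shape boto3's `put_bucket_lifecycle_configuration` expects, to the best of my knowledge; the rule id, prefix and day counts are assumptions. One caveat worth checking against the S3 documentation: as far as I know S3 does not allow a transition to Standard-IA before an object is 30 days old, so a 15-day rule would be rejected.

```python
MIN_DAYS_FOR_IA = 30   # believed S3 minimum age before a STANDARD_IA transition


def lifecycle_config(prefix, ia_after_days, glacier_after_days):
    """Build a lifecycle configuration: Standard -> Standard-IA -> Glacier."""
    if ia_after_days < MIN_DAYS_FOR_IA:
        raise ValueError("transition to STANDARD_IA is too early")
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [{
            "ID": "tier-down-old-videos",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            ],
        }]
    }
```

Whether 30 or 180 days is the better threshold depends on the measured access pattern, since the colder tiers charge for retrieval.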
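With HLS, the different resolutions are tied together by a master playlist that the player reads before picking a stream. A minimal generator is sketched below; the bandwidth figures and URIs in the usage example are invented.

```python
def master_playlist(renditions):
    """Render an HLS master playlist listing one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda r: r["bandwidth"]):
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={r["bandwidth"]},'
            f'RESOLUTION={r["width"]}x{r["height"]}'
        )
        lines.append(r["uri"])
    return "\n".join(lines) + "\n"


print(master_playlist([
    {"bandwidth": 2500000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
    {"bandwidth": 800000, "width": 640, "height": 360, "uri": "360p/index.m3u8"},
]))
```

Each URI points at a media playlist that in turn lists the short segments, so the player can switch renditions at segment boundaries.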
Awesome video. Just want to check, at 13:13 she mentioned "We can have multiple S3s in multiple Regions". I believe, S3 is a global service and each bucket name should be unique.
If you are preparing for a system design interview, try get.interviewready.io.
All the best 😁
S3 is not a file storage
Hi. Could you please share the name of the online tool you are using for collaborating?
@@vishal733 Most online meeting services, such as Webex, Zoom, etc., have a whiteboard built in.
I have 2 questions on the final architecture diagram. First, why is raw video sent directly from ingestion to S3? S3 should only take the final processed video after the workers are done, right? Second, why does the arrow go from the different devices to the CDN instead of from the CDN to the devices?
What software is used for drawing in this video?
These kinds of mock discussions on SD are really helpful. They give the viewer a thought process for dealing with such questions. Kindly do more of these kinds of videos ...
Why do u have two spaces around "viewer"
++
Another awesome delivery, thanks Gaurav.
One thought: we increased the storage to ~6x to account for different resolutions and formats, which we can avoid by introducing 2 entities in the system. First, to avoid different formats, we can provide the user a dedicated video player which understands only our format. The second entity is a resolution manager placed before the streaming engine, which can upgrade or downgrade the resolution according to the user's bandwidth or the user's request.
Take Netflix and YouTube as examples: they have their own media players which understand their recording formats. Yes, one extra task will be converting uploaded videos to the application's format at upload time, but that will pay off by saving the 6x storage cost.
Resolution can also be handled at runtime in 2 ways:
- One is to always keep a high-resolution copy and downgrade it at run time before serving it to the user. The downside is a storage increase because of the high-resolution copies.
- Another is to always keep a low-resolution copy for reference, with some pixel pattern files to convert the low-resolution copy to a high-resolution copy at run time. The upside is that we can reduce the cost of the storage system significantly.
For performance in conversion, a dedicated system with predefined resolution converter filters can work.
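The "resolution manager" in this comment boils down to choosing a rendition that fits the measured bandwidth. A small sketch, with an invented bitrate ladder and a safety margin that are assumptions rather than numbers from the video:

```python
# (height in pixels, required bits per second); the ladder is illustrative only
RENDITIONS = [(240, 400_000), (480, 1_000_000), (720, 2_500_000), (1080, 5_000_000)]


def pick_rendition(measured_bps, safety=0.8):
    """Highest rendition whose bitrate fits within a safety margin of the bandwidth."""
    budget = measured_bps * safety
    best = RENDITIONS[0]              # always fall back to the lowest rendition
    for rendition in RENDITIONS:
        if rendition[1] <= budget:
            best = rendition
    return best[0]
```

Whether the chosen rendition is stored ahead of time or produced at run time is exactly the storage-versus-compute trade-off the comment is weighing.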
Brilliant points, thanks!
It would also be good idea to take a look at ffmpeg and "ts" files creation
Yes it is common sense to create your own video player which supports all devices instead of creating 20 formats lol.
@@edwardspencer9397 It's not just about creating an app which can play video. You'll of course have an app. Different formats have different properties. Some have small file sizes but require hardware acceleration to perform well, which may not be available on all devices. So even if you create your own player, it will fall back to software decoding, which will be slow: users will complain about phones getting warm, high battery consumption and sluggish performance. Instead you create different formats that are each optimized for a particular family of hardware. There can always be a basic format as a fallback, but you should cover a large percentage of devices with formats optimized for them.
@@lhxperimental Large percentage of devices is no longer true. Businesses always prefer those who have medium / high end phones/devices capable of hardware acceleration because all the others owning low end phones are mostly poor people who have no intention to spend any money on subscriptions or visit advertisers. So even if a poor guy uninstalls something due to overheating issues it shouldn't be a problem.
Scrolling tiktok for 45 min. - No
Watch whole video for 45 min. - Yes, it's great.
It took me 2+hrs to watch 45min video lol
That was really amazing... like how smoothly she explains the bits and pieces of the problem. Loved it. Learned a lot. Thanks a lot for this content, guys.
You're very welcome!
You both are just too good!! I love the authenticity and simplicity. The actual interview does take this similar course. Keep up the great work.
Few ideas!
- Utilising the fact that most requests are of videos that are in trend, and trends die in ~month or so, instead of storing all the transcoded files, we have a live transcoder, and store the result in a cache (or CDN) with a TTL of ~ month (this time can be decided by data analysis). Twitter did this and were able to save millions on storage costs.
- We can have live websockets with the online users, so that whenever the video is complete we can notify them, and maybe also the users who were tagged, or are very engaged with an account.
- Instead of dividing videos into chunks after receiving the whole video, let the client do the chunking and upload chunks only. This would result in far fewer failures, because if an upload fails after 95% of the video has been sent, you don't need to re-upload the entire file.
- Maybe have caches on top of databases
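The first idea in this list, transcoding on demand and keeping the result only for a limited time, is essentially a TTL cache in front of a transcoder. A sketch with an explicit `now` argument instead of a real clock; the `transcode` callable is a hypothetical placeholder for the real encoder.

```python
class TranscodeCache:
    """Keeps transcoded variants for `ttl_seconds`, re-encoding after they expire."""

    def __init__(self, transcode, ttl_seconds):
        self._transcode = transcode   # callable(video_id, profile) -> encoded data
        self._ttl = ttl_seconds
        self._entries = {}            # (video_id, profile) -> (expires_at, data)
        self.misses = 0

    def get(self, video_id, profile, now):
        key = (video_id, profile)
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]           # still fresh, no transcoding needed
        self.misses += 1
        data = self._transcode(video_id, profile)
        self._entries[key] = (now + self._ttl, data)
        return data
```

The trade-off is that a video requested after its entry expires pays the full transcoding latency again, so the TTL should come from real access data, as the comment suggests.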
S3 also has multiple tiers. You can set a rule to move files to a lower tier after a set time, and to lower tiers still after that.
Agree with chunking the video on the client side!
There should be some questions asked upfront before diving in such as "do we want video searching", "do we want to generate newfeed", "what about video sharing", "are users able to download video", "are users able to follow other people", etc. After that we can focus on what the interviewer is really interested at.
ya i was wondering the same
That would be really a microservices part AFAIK. Scalable architecture is the first goal followed by additive services.
@@ashishprasad1963 correct
Great discussion...The most important parts starts at 19:20 and 38:04 to be specific
Kudos on this interview. So refreshing to see a mock sys design on youtube where the interviewer takes it seriously, challenges, questions and pushes the decisions of the interviewee.👏
Two of my fav YouTubers on system design
I am watching this video after almost 2 years. Thanks for uploading these kind of videos, They are very helpful.
Thank you!
Very detailed, touches very important system design aspects. Gives many pointers for further research!
A zillion Thanks!
By watching this video I have fallen in love with System Design 😅
This is the way to learn how to do system design with respect to requirements
Great take at the design problem. :)
However I'd have a different approach for replication. We're replicating the video in s3 for 2 reasons:
1. Fault tolerance
2. Latency due to geographical location
I'd suggest to replicate to far fewer s3 locations and that too only for (1).
To tackle (2) we can have this approach -->
1. Buffer around 1 second or so of the video on the device upfront.
2. When user starts watching the video, then lazily load the rest of the video in chunks.
The buffering strategy further depends on (to name a few):
1. Device network quality
2. Prediction of potential videos which user might want to watch based on some ranking algorithm
Also, regarding hot video meta data caching:
1. We can cache the api response at cloudfront end.
2. Redis can also be used alternatively.
Redis might be a better approach here because it is distributed and if the video is deleted/modified by the OP then we can update it accordingly.
1. We can cache the api response at cloudfront end. -> AWS has the Global Accelerator for this purpose. It's costly, but if you're ingesting ~1.2TB of videos everyday, you can afford it.
This video is so good. It's so helpful for talking to an engineering manager.
Liar, it's nowhere near real-world projects...!! Although they are really good, it only gives us an idea of an MVP and of how to crack interviews!! Real-world scenarios are much worse and terrifying👻😱!!
Thank you both for putting this together and providing this content openly. This is very helpful for those trying to prepare for this exact type of interview scenario and who might not be familiar with the format. Excellent job!
There should be more sessions like this. It's super helpful. I loved it!
I love this video and got to know atleast at a basic level the system design approach.
Thanks Gaurav Sen & Yogita for informative contents. You guys are great. I was looking for such videos since long time. Finally found one. Thanks again.
Our pleasure!
one of the most valuable content in youtube for young IT engineers
She came really prepared for this question! Didn’t she 😂 she was playing back what she prepped really nicely for this video. Great stuff folks 👍
This was probably the best video so far. Please try to make more such videos
The best mock I saw in my 2 months studying for my interview.
When i started watching i thought ill quit in between but the session was so nice and non boring and interactive that I watched the hole video thanks a lot for this
this video was not on hole, are you sure watched this video only ?
Great video...
The way she used all of her info and Gaurav summarized, it is just great in a short time.
Thank you
One of the best videos to understand system design. Thanks guys
Thank you so much, Gaurav and Yogita. I got to learn a lot from this particular video. Please posting such videos for the community. Thanks again.
I LOVE THIS VIDEO!!! You brought a pro and the back and forth brings that dual insight
i think the integrations of s3/cdn and cache/cdn are something i would like to learn more as a followup. Great video btw!
No interviewer is that humble like gaurav, when we ask for requirements they say you yourself think of it
Very helpful.
Have used all the knowledge gathered so far in the playlist.
Thanks for sharing this discussion!
You're welcome!
Awesome, guys! It is really valuable to see such interview in action. Feels like you are the one who is being interviewed. Good job, thank you! 🤩
Hey gorav, much helpful for the freshers and people with 1-2 years of experience in this field because this is how we deal with upper management, I always gets those diagrams and based on that do my implementation but now only I knew how they come to the conclusion of what needs to be done. Thanks for this. 👍
In so many video I searched the difference between sql and no sql but i didn't understand the use case but I got a clear picture about the use case for the no sql.. Thanks for this keep posting your videos especially yogitha
One suggestion: For video upload, put these tasks into a message queue like Kafka and put workers to work asynchronously
Kafka or something like AWS Kinesis would be helpful if we were streaming something LIVE. In this scenario AWS SQS or RabbitMQ are the correct tools.
Amazing video!!! Learnt a lot. The parallel workflow thing blew my mind. I thought it could be done later on, maybe post the original upload in a slower way. But that matrix thing was amazing!!
Amazing video....lot of questions were addressed. This duo should do a video series covering other case studies like :
stock broker platform , uber , whatsapp etc
ruclips.net/video/vvhC64hQZMk/видео.html
Gaurav sir aap to clean bold ho gaye. Interviewer got impressed throughout. Thanks so much for the efforts.
Great video! One feedback - I didn't see the usage of the 1.2TB data you calculated, I mean a translation of how many servers (with resources like CPU, RAM, Disk, IO, etc) would be needed for ingestion pipeline as well as storage would have been helpful. Also, some interesting scenarios like thundering herd, data compression to reduce cost would have been of great help. And don't you think, putting all the video in the CDN would be cost heavy. Should have some strategy based on popularity/recency/TTL and upload/remove the video from CDN.
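For the 1.2TB/day number, a back-of-the-envelope translation could look like the sketch below. The ~6x multiplier for resolutions and formats is the figure mentioned elsewhere in this thread; decimal units (1 TB = 1,000,000 MB) and a perfectly even upload rate are simplifying assumptions, so treat the results as ballpark only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_INGEST_TB = 1.2      # raw uploads per day, from the video
RENDITION_MULTIPLIER = 6   # ~6x for formats and resolutions (thread estimate)


def average_ingest_mb_per_s(daily_tb: float) -> float:
    """Average ingest rate in MB/s if uploads were spread evenly over a day."""
    return daily_tb * 1_000_000 / SECONDS_PER_DAY


def yearly_storage_pb(daily_tb: float, multiplier: float, days: int = 365) -> float:
    """Storage added per year in PB, counting every stored rendition."""
    return daily_tb * multiplier * days / 1000


if __name__ == "__main__":
    # Roughly 13.9 MB/s on average; real peaks would be several times higher.
    print(f"average ingest: {average_ingest_mb_per_s(DAILY_INGEST_TB):.1f} MB/s")
    # Roughly 2.6 PB of new storage per year at 6x.
    print(f"yearly storage: {yearly_storage_pb(DAILY_INGEST_TB, RENDITION_MULTIPLIER):.2f} PB")
```

So the average ingest bandwidth is modest, and it is the accumulated storage (and whatever share of it sits in the CDN) that dominates the cost. That is why the TTL and popularity-based eviction ideas matter.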
We can use a rating engine for this: replicate a video into a particular region according to its reach in that region, like FB does.
Very good for some one who is interested in designing solutions...hits the basics really hard.
This is my first system Design video that I watch till end 😅
Coincidentally Akamai CDN was down just a few days after this video was uploaded
The interview answer was all over the place. She was dodging from the protocol question for 36 minutes. And Gaurav asked the question finally.
Thanks so much Sen-sei
wow the end-to-end request flow was really smart, as we're just returning the list of metadata it'll be fast and metadata will have actual video link too
Thanks, good video that explains how the world's most popular app works
Excellent video ! Thanks Yogita for putting yourself out there for our benefit.
Many thanks for sharing. It is helpful to see the chain of thoughts, when architecting the solution.
Maza aagaya... Thanks a lot... So much knowledge in a 45 min video.
One of the best video on this channel.
Long time subscriber of Yogita's channel here!
Great discussion. Yogita, huge respect. The way you explained the different choices you took, is an eye opener for people like me who is going to take the bull by horn soon. Subscribed to your channel as well. Thank you Gaurav.
I didn't quite get how the queue was adding more advantage if done in a sync way. The video needs to be anyway uploaded at least once. Since the video is uploaded onto a queue message (or somewhere else) and then copied to S3, more IOs and network calls will be needed. Since S3 is anyway getting used, so one can even give temporary (and restricted) access to the S3 objects so that the user can directly upload to S3 without much security concerns and then simply queue just the URL of the uploaded object, along with other user metadata in the queue. So the overall upload API time will be just creating an empty S3 object with the client uploading in an async way in the background. Creating something like AWS S3 or Azure Storage or any distributed File System like storage is quite hard, and it doesn't make sense to really create them from scratch unless budget and scale are limited.
Also, there can be multiple ways to implement queues: it can be a pre-existing queue provided by some cloud platform, or it can be built in-house on top of a database. What happens when some node reading the queue message crashes or the power is unplugged? What kind of message-popping mechanism is used while reading those messages, to make the system crash-resilient so that we do not lose the client's request because the power plug was pulled out after a message was popped from the queue? Does the queue have a strict FIFO requirement (I don't think it is necessary in this case)? What happens if multiple queue messages are enqueued because of retries from the client? Last but not least, we cannot keep the videos forever and they need to be GCed; how frequently will the GC run and remove those videos?
A lot of questions can be asked, and that's what I like about Design interviews. Ask anything and everything :)
Side Note: I have never used AWS's S3, however, I have used Azure Storage extensively and I could search for Azure Storage equivalents in AWS's S3. So please forgive me if I have wrongly used AWS terminologies.
These are excellent questions. The queue mentioned here has events (just the id, url and some metadata about the video).
You need a persistent queue with retries here. Ordering isn't necessary. I'd use something like Apache Kafka for this.
I was thinking in the same lines, even though Q can do that job I felt all those high computing tasks should be pushed as backend batch jobs that enrich the meta data. That way during the upload it is easier to maintain the consistency of the system. During upload the videos get pushed into an S3 bucket, an authenticated URL with the token is generated and then the URL along with other information is either stored into DB or pushed to Q for storage into DB, and the user is provided a response. Now once the Data is in DB or another Q a new event can trigger that behind the scene performs the variable bit rate conversion and other needed things and enrich the data. I am just citing an approach IMHO, it will be helpful if someone can validate this
What happens when some node reading the queue message crashes or the power was unplugged?
After a message is read from the queue, there can be a visibility timeout during which the message isn't available to other worker nodes. After processing, the worker must send an acknowledgment to the queue that the message is processed and delete it. If the queue doesn't receive any ack before the visibility timeout expires, the message becomes available again. This way, if a worker crashes, the message will be visible to other workers and the system will be more resilient.
There should be an algo (S3 provides it) to monitor infrequently accessed videos. Videos that have not been accessed for some time could be moved to Infrequent Access to save cost.
Please reply if you have any thoughts on this.
@@letsmusic3341 Yeah, the visibility timeout approach should work. However, there are extreme corner cases where, let's say, the worker node couldn't finish processing the message before the timeout and the message becomes visible to other nodes. To avoid this, there is a need for some sort of locking over the message in the queue (which can simply mean extending the visibility timeout again). Even extending the visibility timeout again and again from a different thread until the job is finished can have problems, especially when that thread fails to update the visibility timeout. There can be a rare clock skew problem: the thread updating the locks can have a different notion of time compared to the queue service, and hence might not refresh the visibility timeout at the right time. But these are extremely rare scenarios. Using external locking with larger timeouts outside of the queue service can help in such scenarios. A large enough timeout with additional buffer time can also help.
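The visibility timeout, ack and "extend the timeout" ideas from the last two replies can be sketched as a toy in-memory queue. This is only an illustration of the mechanism: `VisibilityQueue` and its method names are invented here and do not mirror the API of SQS, RabbitMQ or any other real queue service.

```python
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class VisibilityQueue:
    """Toy at-least-once queue: unacked messages reappear after a timeout."""

    def __init__(
        self,
        visibility_timeout: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = visibility_timeout
        self._clock = clock
        self._bodies: Dict[str, str] = {}             # message id -> body
        self._invisible_until: Dict[str, float] = {}  # message id -> time

    def send(self, body: str) -> str:
        message_id = uuid.uuid4().hex
        self._bodies[message_id] = body
        self._invisible_until[message_id] = 0.0  # visible immediately
        return message_id

    def receive(self) -> Optional[Tuple[str, str]]:
        """Hand out one visible message and hide it for the timeout period."""
        now = self._clock()
        for message_id, body in self._bodies.items():
            if self._invisible_until[message_id] <= now:
                self._invisible_until[message_id] = now + self._timeout
                return message_id, body
        return None

    def extend(self, message_id: str, extra_seconds: float) -> None:
        """Heartbeat from a slow worker: keep the message hidden for longer."""
        if message_id in self._bodies:
            self._invisible_until[message_id] = self._clock() + extra_seconds

    def ack(self, message_id: str) -> None:
        """Processing finished: delete the message so it is never redelivered."""
        self._bodies.pop(message_id, None)
        self._invisible_until.pop(message_id, None)
```

Because a message can be delivered twice (slow worker, missed heartbeat), the transcoding step should be idempotent, e.g. keyed by video id so that a second run overwrites instead of duplicating.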
@Amitayush Thakur Nice question. @Gaurav Sen to be more clear, upload happens to S3. MetaData about the video and User is passed as event to the queue for further processing.
I personally think that using a queue for uploading is kind of overkill and introduces extra overhead. Since we are already using a distributed file storage technology (S3), we can use its specific features to let the client upload directly to storage (presigned URLs in the case of S3; we can generate these beforehand to reduce latency even further). This skips a few extra hops, reducing upload latency. Once the video is uploaded, we can do extra processing (compression, checking for NSFW/abusive content, converting to different codecs and resolutions) before approving the video. We can configure hooks that fire after the video approval event in order to include the video in listings and feed updates.
S3 is Not a distributed File System, it's an object storage (binary).
Yes, and introduce lifecycle rules into the S3 bucket to transition files to something like, the "Infrequent access" storage type, after 15 days. This would make a world of difference, considering this app is going to upload a hypothetical 1.2TB of videos everyday.
This is a better solution! S3 also supports multipart upload, which can be useful for larger video files. Besides, we can use S3's object-creation event to fire off SQS/SNS/Lambda for downstream processing.
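To show the idea behind a signed upload URL (the server hands out a short-lived, tamper-proof permission and the client uploads straight to storage), here is a toy version using only the standard library. It is deliberately NOT the real S3 scheme: actual presigned URLs use AWS Signature Version 4 and would come from the AWS SDK. The secret, the host `uploads.example.com` and both function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical; known only to backend and storage


def _signature(object_key: str, expires_at: int, secret: bytes) -> str:
    payload = f"PUT\n{object_key}\n{expires_at}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_upload_url(base_url: str, object_key: str, expires_at: int,
                    secret: bytes = SECRET) -> str:
    """Backend side: issue a URL allowing one PUT to object_key until expires_at."""
    query = urlencode({
        "expires": expires_at,
        "signature": _signature(object_key, expires_at, secret),
    })
    return f"{base_url}/{object_key}?{query}"


def verify_upload_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Storage side: accept only unexpired URLs whose signature matches."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    object_key = parsed.path.lstrip("/")
    expected = _signature(object_key, expires_at, secret)
    return hmac.compare_digest(signature, expected)
```

The upload API then only has to create the URL and enqueue a small event (video id, object key, user metadata) once the object lands, which is the flow several comments here are suggesting.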
Once video is uploaded, we can do extra processing(compression, check for NSFW/abusive content, converting to different codecs and resolutions) before approving the video.
RE this. If you are uploading large files, they will be chunked. as such, these chunked parts can be processed in parallel thereby increasing the rate of processing overall and allowing more videos to be uploaded and processed.
E.g. if one chunk is determined to be NSFW, then that kills off all trailing chunks; the earlier this is picked up the better (which will occur via chunking).
Very informative and was more like technical talks rather than interviews, which is very good actually. Couple of points I was waiting for but was not discuss were:
1. Algorithm on Intelligence used (in Tiktok or IG)
2. How once played video won't show up again? (can only be played when fetched through ID/CDN)
3. How to track whether a user is interested in a profile and play videos from that profile alternatively?
4. How testing can be performed by this approach?
5. What are the alternatives for the S3 bucket?
1) Intelligence?? Is it for recommendations? Can you please add more details?
2) For that we have implemented a playlist type of thing based on user choice; we keep refreshing it using cron, and that's where we keep these checks.
3) A recommendation engine is the solution. For the playing part, we can save smaller chunks of less than 5ms and play one as a demo when the user clicks it.
4) For testing, we have written tests to upload asynchronously and use ffmpeg for creating chunks, and later test them in terms of quality and resolution on different screens.
Hi, Thanks for making this video. I feel a few points are not covered
1. What kind/technology of queue to use?
2. What will be the request size?
3. How you will break the video and how you will form the video back? In the case of parallel processing, how will you recognize which chunk belongs to which video? There will be multiple requests for uploading from different users.
4. A bit of protocol discussion would have added some advantage?
Also, I have some doubts:-
Won't it be good that the ingestion service returns an S3 URL to the user and then the user uploads the video directly using this URL?
Is it possible for CDN to directly access S3 and we can skip video service? To return the list of videos and thumbnails we can have a separate service.
Talking specifically about s3, AWS provides CDN, CloudFront which restricts access to your S3 bucket. By only allowing CloudFront endpoints to give access to the content, this way the applications will be more secure and responsive.
Thanks a lot Gaurav for this extremely useful video. I must appreciate Yogita for this very detailed system design and component choices right from the queue, S3, CDN, Diff DB's, etc were awesome and especially the processing part of the video via workers. Thank you both!!
I'm just 10 minutes in the video and it's already great! Thank you for this! :D
The idea to split the video file to chunks and process them parallel is really interesting and I feel very fundamental in processing input in general.
How does that happen exactly by the way ? You literally split 1 mb file into three 333kb files and then convert them using any file-format-converter like FFMpeg etc, and then merge again ??
This was really nice discussion, AWS has got a good endorsement…. On a lighter note
It was too good! informative. Hoping to see more such videos. Thanks Gaurva and Yogita.
Excellent session very helpful..u guys r actual heroes for dev like us..
Thanks @gaurav for making such a extremely handy and useful video. Kudos for that. 👍
Can we please have a part 2 of this video where you discuss:
1. Exception handling and reporting,
2. Ballpark estimate for each component of this system.
3. What strategy to be used a month or a year after to decrease load on the file system.
Hi first of all thank you both of you so much for sharing how things work .i will.wish for your best future
Fantastic video, guys! Thanks so much for sharing! Very insightful!
Inspired me to think about IT in a significant way for the first time
this video is just so precious . many thanks
Awesome stuff ! Thanks for this, Gaurav !
Very helpful discussion around databases. Thanks Yogita and Gaurav!
I quite did not understand how chunking a video would help. A video file has headers and then frame data. Now once the video is ingested we can't simply chunk the blob data (the way we chunk text data normally) without the header. The chunked video would not make any sense without the header. Now if the idea was to 30mb video => chunked to 10mb + 10mb + 10mb video => and then pass each chunk to converters which convert to different codec and resolution, we can't do it with basic blob data chunking because the converters in the pipeline could not work without the headers of original video file. Because the header will be attached to the first 10mb chunk itself.
So essentially the pipeline should look something like
```sh
30mb video files = [header]framedata
= 10mb file [header]framedata ==> converter_fn ===> corrupted file ==\
+ 10mb file framedata ==> converter_fn ===> corrupted file========= | ==> corrupted file after joining
+ 10mb file framedata ==> converter_fn ===> corrupted file======== /
```
So AFAIK, if we have to chunk 30sec video => 10sec + 10 sec + 10sec video => [1] our best shot here is to pass it through ffmpeg to chunk the video (which essentially recreates the header with proper value with frames) . [2] somehow make the client send 3 different videos and during consumption play it as 1 video from UI layer.
Note: When you think about a video header, it's not like a text HTTP header; it would have general video codec + playback + compression related metadata etc. And then [I guess] each frame would have its own header (inside the frame blocks) => chunking needs to take care of all these facts => hence we need something like ffmpeg.
I probably misunderstood the point Yowgita (apologies if I misspelled the name) had made, or probably there are some techniques she implied which I don't know. In either case, would love to know if I missed something in my understanding.
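For option [1] above, one way to get container-aware chunks is ffmpeg's segment muxer, which writes each piece as a standalone file with its own header. The sketch below only builds the command line; the helper name `build_segment_command` and the file names are made up for the example, and it assumes ffmpeg is installed if you actually run the command.

```python
from typing import List


def build_segment_command(source: str, output_pattern: str,
                          segment_seconds: int) -> List[str]:
    """Build an ffmpeg command that splits source into playable segments."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    return [
        "ffmpeg",
        "-i", source,
        "-c", "copy",              # copy streams as-is, no re-encoding
        "-map", "0",               # keep all streams from the input
        "-f", "segment",           # segment muxer: one valid file per piece
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",  # each segment starts at timestamp 0
        output_pattern,            # e.g. "chunk_%03d.mp4"
    ]


if __name__ == "__main__":
    # Pass the list to subprocess.run(..., check=True) to actually execute it.
    print(" ".join(build_segment_command("upload.mp4", "chunk_%03d.mp4", 10)))
```

With stream copy the cuts land on keyframes, so a "10 second" segment is only approximately 10 seconds. Each resulting file can then go through its own transcoding worker in parallel.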
Excellent points
Good points. You are really familiar with video file formats! I guess in a real interview the interviewer would not expect candidates to know the stuff you've mentioned; they would probably give bonus points for recognizing that we should chunk videos when they're huge (think YouTube). For TikTok I guess we should be OK just uploading the videos as is.
The video is very helpful. Some changes could be made to the ingestion service. Rather than the ingestion service uploading the files to S3, we should make use of the multipart upload feature of S3 and have the client upload to S3 directly by returning signed S3 URLs to the client. That way you are not putting load on the ingestion service and it becomes highly scalable. Let AWS take care of the file upload and deal with the bandwidth. Following that design, we do not need to put the videos in different regions. Once the CDN is hooked up with an S3 bucket, it will take care of replicating the data to all the configured regions where users are supposed to access the video. Let me know what you think about this approach.
Excellent points, thank you!
Thanks Yogita and Gaurav, looking forward to more such videos
This should be in trending .. Awesome ..
It is a very good video!!!!
Few things to add:
1. Logging is very important
2. Authentication and Authorization
3. Metrics and Reports
4. Containerization & process orchestration (Docker, Kubernetes)
5. API gateway & Microservices
Good discussion guys Thumbs up!
I am not expert on video streaming systems but a few questions out of curiosity-
1. Can't we store video in a standard resolution and transform the resolution while streaming down to user? Shouldn't really require too much compute in real-time but we may save a lot on storage. I may be wrong though.
2. Do we need a very effective data life cycle policy as well to archive and trash old videos? And I agree S3 with glacier may be a good choice.
3. What is the plan to make S3 lookup faster? Any considerations for this in Video Metadata?
4. Applications like this have a list of video that users keep swiping back and forth can this fact be used to make performance better by caching next two and last two videos on user device ahead of time?
Guys I am really impressed with the problem statement and your approach, just speaking out mind here, hope that's ok :)
The fourth point is great one and is actually implemented in tiktok, youtube shorts and other shorts streaming platforms.
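The "cache the next two and last two videos" idea from point 4 comes down to a small window computation on the client. A minimal sketch, where `prefetch_window` and the default window sizes are assumptions for illustration rather than what TikTok or YouTube Shorts actually do:

```python
from typing import List, Sequence


def prefetch_window(feed: Sequence[str], current_index: int,
                    ahead: int = 2, behind: int = 2) -> List[str]:
    """Return the ids of videos to keep cached around the one being watched."""
    if not 0 <= current_index < len(feed):
        raise IndexError("current_index is outside the feed")
    start = max(0, current_index - behind)
    end = min(len(feed), current_index + ahead + 1)
    # Everything in the window except the video that is already playing.
    return [feed[i] for i in range(start, end) if i != current_index]
```

On every swipe the client recomputes the window, starts downloading any id that is not cached yet, and evicts ids that fell out of it. The window sizes could be tuned by network quality.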
amazing video...You should do videos like these more often....
40:28 This is what legends waited for- Caching trending videos!
Thank you for your effort, time, and most importantly its insanely invaluable content. This video proves how intelligent Indians generally are...
We Iranians/Persians admire how intellectual and intelligent they are, we simply cannot take our eyes off them, in all honesty not only they are super brilliant minded but also super hard-working people.😊
Lots of love and respect for you all from IRAN ❤️
Instead of uploading files through the API, we can upload directly into S3 using a signed S3 URL.
It would be great to have Yogita interview you in a similar way.
Fabulous video.. Thank you @Gaurav and @Yogitha
wow Yogita is a real pro It was amazing !!!!
super informative , sudoCode effort was really great. Keep making more such content, lets take airbnb as next system.
How are we going to join Mysql & KVs to pick all the videos?
One solution is to have a table which has and then query the kv with video id
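One possible reading of that suggestion, sketched with SQLite standing in for MySQL and a plain dict standing in for the key-value store. The `user_videos` table with `(user_id, video_id)` columns and the helper names are assumptions made for this example; the "join" happens in application code, not in the database.

```python
import sqlite3
from typing import Dict, List


def create_user_videos_table(conn: sqlite3.Connection) -> None:
    """Relational side: only the mapping from a user to their video ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_videos ("
        "user_id TEXT NOT NULL, video_id TEXT NOT NULL, "
        "PRIMARY KEY (user_id, video_id))"
    )


def videos_for_user(conn: sqlite3.Connection,
                    metadata_kv: Dict[str, dict],
                    user_id: str) -> List[dict]:
    """Fetch ids from SQL, then look up each video's metadata in the KV store."""
    rows = conn.execute(
        "SELECT video_id FROM user_videos WHERE user_id = ? ORDER BY video_id",
        (user_id,),
    ).fetchall()
    # Skip ids with no metadata yet (e.g. the video is still being processed).
    return [metadata_kv[video_id] for (video_id,) in rows if video_id in metadata_kv]
```

With a real KV store the per-id lookups would typically be batched into one multi-get call to avoid N round trips.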
Good one, @yogita explained very well.
S3 is Obj Storage, EFS is file storage but anyway got your point from immutability point of view.
Listening to their conversation, surely both have just theoretical knowledge and never worked on something this big.
Liked this very much.. A feedback to Gaurav.
The protocol type used for upload or streaming is an implementation detail and not really required at the system design level. If the candidate knows it, that's brilliant, but if the candidate doesn't, in the real world they can just read up on the latest streaming solutions and use one. Again, just an implementation detail.
Good point!
really enjoyed the session and also learned new things, keep uploading more
A few points which I think should be discussed to reduce the latency factor, i.e. inter-service communication (especially when the question was asked about retrieving the user-specific videos). We can think of the gRPC protocol for inter-service communication. Moreover, we can handle the complexity of the service mesh through a service mesh technology like Istio for runtime service discovery. Will the video chunking be done on the client side or the server side, and how? Do S3 or the CDN support video streaming, and if not, how are we going to handle it? We can think of MongoDB with GridFS, as it supports binary file streaming. You talked about the pipeline but did not clarify the approach to implement it. Maybe we can think of AWS Lambda functions that will be triggered on the file upload event of the S3 bucket. Lambda functions will take care of all the parallel activities that were discussed, along with forming the video metadata, like thumbnails. How thumbnails would be created was also not discussed. And many more things which probably are not possible to discuss in a 45-minute video. But I really enjoyed the session and had a great learning experience. Thank you for the efforts.
Great video, super helpful!
2 quick thoughts I had:
1. I don't think User data needs to be in a SQL database. While it would have the benefits of ACID compliance, the most common access pattern seems to be just getting all the info for one user and also updating fields for one user. So storing it in the same NoSQL document-style DB as video metadata like MongoDB would possibly suffice? It would also make it easier to horizontally scale as well. Don't think it's a critical part of the design of the system though!
2. I'd be curious how we'd want to shard the video metadata DB for it to be performant for the recommendation engine.
Good job though, I enjoy the mock interview format!
Thank you!
1. That's right, a NoSQL database would also work fine here.
2. I would likely shard it by Genre or Language.
This video is amazing guys, great work
In case of video editing is also a requirement. S3 Versioning of files can be helpful. So choosing s3 fits that too. Thoughts?
Yes, and even S3 data storage types. If this app is ingesting 1.2TB of files everyday, it'd make sense to store the raw files in a S3 file storage type that costs less money. Version your videos, and setup lifecycle rules to convert them to "Infrequent access" storage type, after let's say, 15 days.
@@kanuj.bhatnagar I agree with storing the videos in some object storage (S3 or Google Cloud Storage), but applying a lifecycle transition after 15 days is not good. We can easily update the cache to remove older videos, but we should wait at least 180 days before moving to the next lifecycle tier, as access charges for infrequently accessed data are high in cloud storage systems.
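For reference, the lifecycle rules being debated here could be written roughly as below. The dict follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; the rule id, the `raw/` prefix, the day counts and the helper name are assumptions for the example. One caveat on the 15-day idea: as far as I know, S3 does not accept a transition to Standard-IA earlier than 30 days after object creation, so the sketch treats 30 as the minimum.

```python
from typing import Any, Dict

MIN_STANDARD_IA_DAYS = 30  # S3's documented minimum age for Standard-IA (assumed here)


def build_lifecycle_configuration(prefix: str, ia_after_days: int,
                                  archive_after_days: int) -> Dict[str, Any]:
    """Build a lifecycle configuration: Standard -> Standard-IA -> Glacier."""
    if ia_after_days < MIN_STANDARD_IA_DAYS:
        raise ValueError("Standard-IA transitions need at least 30 days")
    if archive_after_days <= ia_after_days:
        raise ValueError("archive transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-down-raw-videos",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }
```

Whether 30 or 180 days is right for the first transition depends on the access pattern; Standard-IA charges per retrieval, which is the concern raised in the reply above.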
thanks for knowledge sharing.
Sir, I think first, we don't need to divide into chunks; we can use a format like HLS or dynamic MP4 for streaming, which can significantly reduce storage. We also need a DB like ClickHouse for things like the recommendation engine. For video uploading we can use a presigned URL to store the file temporarily, and we can easily implement an editing pipeline with this approach. We can use a saga for transactions, so if any step fails we can revert the full process without any manual action. This is my conclusion from observing YouTube and Hotstar request patterns.
Suggest any problem with this approach
😊
Awesome thanks Gaurav and Yogita 👍
I wish my interviewers asked a lot like Gaurav. Normally, they just sit there and keep quiet most of the time :)
Awesome video. Just want to check, at 13:13 she mentioned "We can have multiple S3s in multiple Regions". I believe, S3 is a global service and each bucket name should be unique.