I have a query on the upload service: how are we handling the case where, say, one of the processes in the content processor fails? Are we maintaining the state of each chunk, and if so, how do we work out which chunks need to be retried?
Great video as usual. Does live streaming of cricket matches, say by Hotstar or Disney, use the same process? I am sure there will be different challenges, as we won't have enough time for transcoding, creating lower-quality renditions and then uploading chunks to the CDN. How do we handle the scenario where we have to live stream in very near real time, say with no more than 10 seconds of delay?
For predicting what you would watch tomorrow, is it OK to have the client occupy bandwidth and user device storage just for the prediction? Or is there a mature solution to achieve this on the server or CDN side?
Thanks a lot for making the video! You have provided a holistic view of the design! I have a few questions though. 1) Does Elasticsearch have its own data store, or will it be indexing on Cassandra? 2) There is a link shown between the Elasticsearch cluster and the Recommendation Engine; I am not able to understand its purpose.
Thanks!! To answer your queries: 1. ES would have its own data store for the data and the indexes. 2. The arrow between the Recommendation Engine and Elasticsearch is there because the Recommendation Engine reads from Elasticsearch.
@codeKarle reads from ES? In the video, while explaining choosing between different thumbnails, I think you mentioned that the Recommendation Engine feeds data into ES. Then, when the search service queries, it gets results from ES based on the stored recommendations. I am confused.
Really nice video; it goes into the details of many areas. The details about Open Connect show the effort you have put into the research. Since it's a big video, do you plan to share a summary link like you did with a few other videos?
Before even uploading to S3, don't you think the content processor should first run the content filters? If, say, a video is against the policy, it's not worth uploading it to S3.
Amazing and to-the-point explanation. Just one thing though: is this approach sufficient for any HLD interview, or do we need to go into DB design and APIs as well?
Cool video. One question: those content filters, how do they work? Is it a machine watching video chunks and somehow determining the privacy, nudity, etc. levels? Or will it be YouTube employees watching those chunks and setting the levels/tags for privacy, nudity, legality, etc.?
Thank you for the awesome video. One doubt regarding Open Connect: I doubt it is cost-neutral for Netflix to move hard disks from the CDN to ISP providers, as there can be many more ISPs to put hard disks in than there are CDN locations.
Great video!! Doubt: say, in the original S3, the links are: Format A Resolution 480 Chunk X is L1, Format B Resolution 1080 Chunk Y is L2, and so on. Doubt 1: What will the links be after storing the chunks in the cache (Open Connect) at the ISP? Doubt 2: How will the viewer's client get to know that URL? Doubt 3: The database has the links L1, L2, L3... of the original S3 storage. How will it fetch the links saved in the OC (cache at the ISP)? What is the flow? Like: client - ISP - fetch video service - DB - ISP - cache - client?
Extremely high availability...lot of earthquake everywhere....still you can see videos. It was really funny when you said "You might not want to see videos at that time, but thats a different story" 🤣🤣
The breadth of your videos is absolutely unmatched.
Hey I am an SDE 2 at Amazon. I went through the entire video. Great content man! Very informative. Worth the length.
Amazon SDE-2s are overrated; they don't know basic things.
Is that true, bro? @Markcarleous1903
@Markcarleous1903 where do you work, buddy?
Great video!! However, if you are in a 45-minute interview to design a YouTube-like service, talking about tags, user-shared accounts, or people clicking through pagination to find videos can be seen as diverting from the core requirement. As a candidate you need to strictly focus on, first, how users upload and how the server processes the video formats, and second, how videos are streamed when a user clicks on one. As focused and great as the first part was, the second part was all over the place; if someone were to see just the 2nd half, they might think the question was "design Google search".
It is really useful to watch your videos to get some understanding about the system designs & not only for interviews.
I have a couple of doubts:
1) Why do we need to tag each chunk? For what purpose are we doing the tagging?
2) If the ISP decides to do the caching on its side, how are we planning to collect statistics, and how do we manage access to such cached content?
Very good, bud! Thank you!
Working only on internal applications at a non-big-tech company, without this kind of big data or detailed analytics, you never become aware of the scope/scale and all the test cases for these big platforms out there, or how smartly and efficiently they work.
IMO, the E2E coverage is very broad - which is super helpful! But one needs to skim through it to get the basics of the components. The choice of DB, caching and partitioning could have gone a little deeper, rather than explaining user login or analytics (probably a topic on its own). But definitely very helpful overall.
Your code karle videos are amazing, Sandeep. No BS; comprehensive; pure distilled information. Thank you!!!
The best I have ever seen! Impressive work, Sandeep!
I really enjoy these videos, but in a way it's a little daunting - the analysis/explanation of all your components takes at minimum 1 hour, whereas I think most system design interviews take place E2E within maybe 50 minutes. For example, I'm not sure if login is something you really have to cover at all in this kind of interview, as it's mostly irrelevant and non-specific to this application. Also, I might just be dumb, but I feel like your system architecture design is of much higher quality than other interview resources, but at the same time much more unrealistic (in my opinion). For me, explaining the design choices would be more helpful than walking through the flow of why this architecture was set up this way. You mentioned in a comment below that it would be repetitive, but I think if you explained it in the context of each interview problem it would actually be very helpful. Thanks for all your content though, I am learning a lot!
It must be intentional - since not all people have the same skill level, some might need it to be slow. Use the speed button to double the speed, and it suddenly turns into a 30-minute video.
Hello Sandeep, good job. I liked your plain, no-nonsense way of teaching. I also liked the color-coding of components (services, open source, etc.), which makes it easy to understand. One more thing: you speak slowly and calmly, which helps immensely - it sets the pace, so while we listen we can think as well.
Bhai, you have put in a lot of hard work to explain this. Thank you for sharing this knowledge for free.
Thanks @codeKarle. Your system design videos are gems. These videos will surely gain traction in the future and get their much-deserved views and likes.
Very well done, Sandeep bhai - you are a combination of good looks, good brains, and a good attitude.
Not a bad video. Thanks for this.
Two things I learned:
- The presenter's favorite movies :)
- Why I'm so frustrated by the recommendations in any sort of system. The "AI" thinks that because I like the numbers 2 and 3, and someone else likes 1, 2, and 3, I'm going to like the number 1 as well. However, I don't like non-prime numbers, and the dumb "AI" will never notice that. And because they all copy each other's homework, every recommendation system in the world lets me down.
Great Video. Thanks and kudos to the hard work you have put in.
I just have one request - when using an external component like Cassandra, if you could compare with other alternatives and talk about why you chose this, that will be of great help.
We have done that in some of the videos, in places. Doing it every time would have become repetitive.
You can check out the Databases video. There you would get to know about the alternatives that are available: ruclips.net/video/cODCpXtPHbQ/видео.html
Awesome explanation, sir. I have not seen such a detailed and informative explanation of system design questions anywhere else.
I felt that a good recommendation system should be a functional requirement.
Very nice video, complete in all respects. It covers lots of topics - CDN, search, recommendations, optimization - great content, thoroughly enjoyed. Thanks for sharing the knowledge.
Thanks!! Glad that you find it useful :)
Amazing video like others, thank you Sandeep! One request though, this video does not have a writeup like others. I find that very useful for making notes. Please keep up the good work!!!
All I got to learn was that you're a big-time fan of Mission Impossible! Just kidding... great work! Appreciate all the hard work you've put in to consolidate all this 👍
One suggestion: it would be wonderful if you could break the video into chapters. The content was amazing. Thanks for sharing your knowledge.
This was really good, but I would appreciate it even more if you could speak a bit more about caching and database design with trade-offs (like relational DBs vs Cassandra), how we would cache content in CDNs, etc.
For the Content Processor, instead of storing chunks directly in the CDN, I think the transcoded chunks should be stored in an output S3 bucket, and the CDN can use the output bucket as its origin server.
But wouldn't you prefer to get a file from a CDN rather than from an S3 bucket? What's the benefit?
@Saurabh2816 I think he is not saying that we should get the file from the S3 bucket rather than the CDN. He is saying that instead of the CDN uploader pushing the chunks to the CDN, why don't we first upload to S3? There is an automatic sync available in AWS that keeps all the CDN edges (CloudFront) updated from the origin (the S3 chunks, in this case). I think that would have some pros and cons.
Yes, that would have pros and cons.
Pros:
1) There will be one master source of the chunks in S3 for later retrieval, if required.
2) Leveraging the existing sync process from AWS S3 to AWS CloudFront would be super efficient.
Cons:
1) More redirection and complexity in the process flow. But I don't think it is a major con, as this happens when the user is not waiting for anything.
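To make the S3-as-origin idea concrete, here is a minimal sketch (plain Python; the bucket and domain names are hypothetical). The transcoder writes each chunk to an output bucket under a deterministic key, and the CDN, configured with that bucket as its origin, serves the very same key; the actual upload and CloudFront setup are left as comments.

```python
def chunk_key(video_id: str, fmt: str, resolution: str, index: int) -> str:
    """Deterministic object key for one transcoded chunk.

    The same key is used in the origin bucket and in CDN URLs, so the
    CDN can pull any chunk from S3 on a cache miss.
    """
    return f"videos/{video_id}/{fmt}/{resolution}/chunk-{index:05d}.ts"

def origin_url(bucket: str, key: str) -> str:
    # Direct S3 URL -- normally only the CDN should hit this.
    return f"https://{bucket}.s3.amazonaws.com/{key}"

def cdn_url(cdn_domain: str, key: str) -> str:
    # What the player actually requests; the CDN fills its cache from S3.
    return f"https://{cdn_domain}/{key}"

key = chunk_key("v123", "h264", "480p", 7)
# A real pipeline would now upload the chunk, e.g. with boto3:
#   boto3.client("s3").upload_file("chunk.ts", "my-output-bucket", key)
# and a CloudFront distribution would list that bucket as its origin.
print(cdn_url("cdn.example.com", key))
```

The point of the deterministic key scheme is that nothing needs to be "synced" explicitly: the CDN lazily pulls whatever key a player asks for.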
Great work! Thank you for creating these.
I appreciate all the effort you put into sharing this video.
Hey, thank you so much for all your knowledge sharing. I was able to perform very well in all my interviews. Keep up the good work. More power to you.
Keep rocking!!!
Hey Sandeep, your videos are very detailed and help learn a lot. Thanks for making these videos. Please continue to make more videos as these are really awesome!!!
Questions:
1) Do you put all the data from multiple events and sources into one Kafka queue, or do you use separate Kafka queues? If it is only one Kafka queue, how do multiple consumers decide which messages they want to process? Do they have to do that inside their code, or does Kafka provide that configuration?
2) Instead of Kafka, would Amazon SNS/SQS work in this case? If not, why not?
have you found an answer to this question?
looking for the same
Kafka has the concept of topics. Different events are put into different topics, which are subscribed to by the consumers that need them.
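To make the topic/consumer-group idea concrete, here is a toy simulation (plain Python, not real Kafka client code; the topic and consumer names are made up): a topic is split into partitions by message key, every group subscribed to the topic sees all of its messages, and within one group the partitions are divided among the members.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Kafka hashes the message key to pick a partition (murmur2 in
    # reality; crc32 is enough for illustration).
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def assign_partitions(consumers, num_partitions):
    # Each partition is owned by exactly one consumer in the group;
    # this round-robin split is how Kafka spreads a topic's load.
    owners = defaultdict(list)
    for p in range(num_partitions):
        owners[consumers[p % len(consumers)]].append(p)
    return dict(owners)

# Two independent consumer groups on the same "view-events" topic:
# every group receives all events; within a group, partitions are split.
analytics_group = assign_partitions(["analytics-1", "analytics-2"], NUM_PARTITIONS)
recs_group = assign_partitions(["recs-1"], NUM_PARTITIONS)
```

So consumers don't filter messages in code: they either subscribe to a different topic, or join a group and receive only the partitions assigned to them.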
Kafka is not meant for parallelizing processing on a message-by-message basis, but an AMQP-style broker (RabbitMQ/Amazon SQS) is. So I think the video was wrong in choosing Kafka to load-balance messages among the content processors.
Well explained, in detail. Looking forward to videos on data structures, with Java as the coding language.
Loved it! Though it would have been good if you could sketch the diagram and explain each block at the same time.
Agree with this, in an interview we would be asked to create the diagram live. I would prefer if you would create the diagram on camera while explaining what each component does.
Really awesome content and a great explanation. One request from my side: please increase the video volume or use a mic that provides better sound. In the middle of the video the audio was very low compared to the beginning or the end. Thanks for the useful, awesome content.
Great video. Got a doubt: will the chunks be stored only in the CDN? Is that the best way? Shouldn't we store the processed chunks as well?
Very detailed and informative explanation. Thanks for all the efforts in teaching us system design.
Thank you for the detailed explanations. One recommendation for your videos: you could add some pictures, such as showing the Netflix home page while mentioning collaborative filtering. Adding some color would make the videos more engaging. Thanks again for your effort.
Thanks! Honestly, it looks like it might require a lot of editing effort; if that's not too much, we'll definitely try to do that :)
Awesome video, seamless delivery. Can the same system design be used for an audio streaming service like Spotify or YouTube Music?
I believe it should be usable for audio streaming. The main difference would be your bandwidth and storage requirements, because sound files are inherently different from video files. You may also consider that most users tend to listen to the same songs over and over again, stored locally on their devices, in contrast to YouTube, where most users constantly stream new videos while almost never downloading them locally. This could have implications for bandwidth as well as analytics: if you want to aggregate a user's behavior for recommendation purposes when they mostly listen to their local library (as opposed to streaming), how do you get that information? One way would be to have the client send an event to the server side every time a song is played or completed (this can be done asynchronously; it does not have to be sent immediately after the song is played), where Kafka could route that event to a Spark cluster, with the results stored in a Hadoop cluster for the Recommendation Engine to consume.
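The asynchronous play-event idea above can be sketched as a small client-side buffer (toy code; the field names and flush size are made up): events are queued locally as songs finish and flushed to the backend in batches, so playback is never blocked on the network.

```python
import time

class PlayEventBuffer:
    """Buffers play/complete events on the client and flushes them in
    batches, e.g. to an HTTP endpoint that feeds Kafka on the server."""

    def __init__(self, flush_size: int = 3):
        self.flush_size = flush_size
        self.pending = []       # events not yet sent
        self.sent_batches = []  # stand-in for the network calls made

    def record(self, user_id: str, track_id: str, event: str):
        self.pending.append({
            "user": user_id, "track": track_id,
            "event": event, "ts": time.time(),
        })
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.pending:
            # Real client: POST this batch; the server routes it into
            # Kafka -> Spark -> Hadoop for the recommendation engine.
            self.sent_batches.append(self.pending)
            self.pending = []

buf = PlayEventBuffer(flush_size=2)
buf.record("u1", "song-a", "completed")
buf.record("u1", "song-b", "completed")  # reaches flush_size, auto-flush
```

A real client would also flush on a timer and on app shutdown, so slow listeners' events still arrive.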
Very nice presentation. Thanks very much. A question I have: why do you choose cloud services (like Amazon S3) for some part and decide to use your own stack in other places, e.g., running your own Cassandra cluster?
There is no technical reason for that; I try to use very common solutions that anyone can pick up easily. People are generally comfortable using S3 as a file store, because a lot of companies use it, so it makes the larger picture easy to understand. Any other solution would be equally good :)
My main idea is to show people how they can design a system using solutions they can easily get their hands on. This particular point is explained more in ruclips.net/video/cODCpXtPHbQ/видео.html
Absolutely brilliant. Thank you SO much for this much detail and depth.
Your walkthrough is so much better than those "L8 engineer" or ex-FAANG tutorials. Shame that the accent is a bit thick for me.
Super video bro! This cleared out a lot of questions and helped me understand end to end architecture of video streaming services.
Awesome video. One piece of feedback: the echo from the mic makes it hard to understand; if you can switch mics, it will make it much easier.
Peer-to-peer protocols - used in torrents, DC++, or any file sharing across thousands of machines.
Though it was very long, it was very informative and very detailed. Thanks for it 👍
Awesome and easily understood content!
49:10 I couldn't understand why you used Cassandra for the home page service. You mentioned that it contains information about the user, such as likes, dislikes, etc. But doesn't this information change very often?
For example, if I watch a video on system design today, then I'd like the first row of my home page to contain another system design video. For that to happen, we have to update the Cassandra row. But I've heard Cassandra is not good for updates.
I understand that it is good here because it handles reads very well and is "always on". Cassandra also scales very well with additional data. But we don't need that here, right? There won't be many new user accounts created every minute.
So could we use something like MySQL + a cache (maybe write-back), similar to what you suggested for the user service?
I still don't know how a video is played. I mean, there are so many chunks stored on the CDN server. How does the CDN organize these chunks and merge them into one whole movie? Will it merge them completely before playing, or does the service just load one or a few chunks at a time and give them to the user? And if there is any trouble during playback, how can the service fix it?
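For what it's worth, players generally never merge the chunks back into one file: the client downloads a small manifest (an HLS .m3u8 playlist, for example) listing the chunk URLs in order, then fetches and plays them one or a few at a time, re-requesting a chunk or dropping to a lower-quality rendition if one fails. A minimal sketch of reading such a playlist (the sample manifest below is made up):

```python
SAMPLE_M3U8 = """\
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
chunk-00001.ts
#EXTINF:10.0,
chunk-00002.ts
#EXTINF:8.5,
chunk-00003.ts
#EXT-X-ENDLIST
"""

def segment_urls(playlist: str) -> list:
    # In an HLS media playlist, every non-empty line that doesn't start
    # with '#' is a segment URI; the player fetches them in order.
    return [line for line in playlist.splitlines()
            if line and not line.startswith("#")]

for url in segment_urls(SAMPLE_M3U8):
    # A real player would GET each chunk from the CDN, append it to the
    # decode buffer, and retry or switch rendition on failure.
    pass
```

Because each chunk is an independent request, a playback glitch only costs one small re-fetch, never the whole movie.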
Very detailed explanation, Great work!!!
I am a fresher and tomorrow is my system design interview, but after watching this video I think system design is too much for a fresher... I cleared the 1st round; they gave me the 2-sum problem and deletion of a node. They are offering me 3 LPA with a 1-year bond.
Amazing video, thanks for sharing this. Can you please add a summary for all your design videos? I see you have added one for a few, which helps a lot. Thanks again for all the great work 👍👍👍
Very very nicely explained. Thank you so much!
Great video, thanks !!! I have a few doubts:
1. Why did we choose Cassandra instead of HBase for storing the graph? Tutorials about their internals would be good.
2. Sometimes interviewers do not state the requirements clearly, even after being asked, and there is some miscommunication. How do we resolve this? E.g., some ask how we will scale read/write requests (even after horizontal scaling, caching, CDNs, and sharding in case the data is huge).
3. Is it OK to mention a specific DB? Some could ask about internals we don't know.
Thanks in advance
1. You could use HBase as well. Cassandra and HBase are very similar in terms of performance and use-cases they cater to when looking from a high level. For alternatives for most DBs, you can check this video: ruclips.net/video/cODCpXtPHbQ/видео.html
2. It depends on the case. But it's usually a good idea to state your assumption and then solve for it. Usually, people tend to tell you what they actually expect when you do that.
3. If you have some idea about the DB, you should say a name or a type; for example, you can say that you want a key-value store without explicitly mentioning Redis. But if you have no idea about a DB, don't throw out random names. Also, no one expects you to know the internals of a DB unless you are applying for a DBA role. All you need to know is the use-cases the DB is good at.
Thanks for this video. Do you mean that after processing in the Spark cluster, you will store the data in HDFS (the Hadoop cluster)? In short, what will the Hadoop cluster be used for?
I gave a thumbs up and watched it till the end, will this be rated as a super awesome video
Hi codeKarle, nice video. I'd appreciate it if you could explain more about why you made certain decisions. For instance, why Cassandra? Why use an async pipeline?
Such great videos on system design. Hats off !!! :) :)
Can you please provide a summary for this video, as has been done for several of your other presentations?
Awesome video .. very in-depth touched a lot of different things, great explanation.
Why did you stop creating content? Awesome video explanations!
Should we use any caching for recommendations (in the home page service) as well?
Quite detailed, thanks for your efforts.
Crystal clear explanation!!!
super quality content. Thanks a lot for sharing.
One question here - why did you introduce Spark Streaming between Kafka and the Hadoop cluster in this design? Couldn't the Hadoop cluster directly consume from Kafka and perform the batch processing itself, as opposed to Spark consuming all the data and then sending it to Hadoop in batches?
Awesome video man !! Thank you so much 😍
Beautiful explanation
Very nice presentation. I have a small question: how is playback by the same user account across multiple devices at the same time handled? And is it synchronized? I.e., if I watch a video on my phone for 20 min and then log in on my laptop, it still shows 20 min. I am guessing we regularly send user activity information to the backend, so anytime we log in or the homepage is refreshed, it loads up the user's recent activity. What do you think?
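The guess in that comment can be sketched roughly like this (the class, method names, and heartbeat idea are my assumptions, not from the video): the playing device periodically sends a watch-position heartbeat, the backend keeps the latest position per (user, video), and any other device resumes from it on login.

```python
import time

# Minimal sketch of cross-device resume, assuming the client sends a
# position heartbeat every few seconds and the server keeps only the
# latest position per (user_id, video_id).

class WatchProgressStore:
    def __init__(self):
        self._progress = {}  # (user_id, video_id) -> (position_sec, updated_at)

    def heartbeat(self, user_id, video_id, position_sec):
        """Called by whichever device is currently playing."""
        self._progress[(user_id, video_id)] = (position_sec, time.time())

    def resume_position(self, user_id, video_id):
        """Called on login / homepage load to resume where the user left off."""
        entry = self._progress.get((user_id, video_id))
        return entry[0] if entry else 0

store = WatchProgressStore()
store.heartbeat("alice", "video42", 1180)  # phone, ~19.7 min in
store.heartbeat("alice", "video42", 1200)  # phone, 20 min in
print(store.resume_position("alice", "video42"))  # 1200 -> laptop resumes at 20 min
```

In a real system this table would live in a fast key-value store, and "last write wins" naturally handles two devices playing at once.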
Very nicely explained.
I liked your explanation, but you should try to shorten it a little and make it more bullet-point centric, like you did for Cassandra.
How is inter-service communication handled in these scenarios? Does every request between services go through a load balancer, considering that multiple instances of one or more services may be running at any point in time?
Awesome video .. Covered great depth of this difficult topic ..
Great Content. Super helpful
This is awesome! and in great detail. Thank you for this video & Please do more on the subject. Subscribed!
I have a query on the upload service: how are we handling the case where, let's say, one of the processes in the content processor fails? Are we maintaining the state of each chunk, and if so, how do we determine which chunks need to be retried?
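One common way to handle this (my assumption of how it could work, not something the video specifies) is exactly what the comment suggests: keep a per-chunk status record, have a retry job re-enqueue only the failed chunks, and give up after a retry limit. The state names and class below are hypothetical.

```python
# Hypothetical per-chunk state tracking for the upload/processing pipeline.
# Each chunk moves PENDING -> DONE, or to FAILED until a retry job
# re-processes it; after max_retries it is marked DEAD for manual review.

PENDING, DONE, FAILED, DEAD = "PENDING", "DONE", "FAILED", "DEAD"

class ChunkTracker:
    def __init__(self, chunk_ids, max_retries=3):
        self.max_retries = max_retries
        self.state = {c: {"status": PENDING, "attempts": 0} for c in chunk_ids}

    def record(self, chunk_id, ok):
        """Record the outcome of one processing attempt for a chunk."""
        entry = self.state[chunk_id]
        entry["attempts"] += 1
        if ok:
            entry["status"] = DONE
        elif entry["attempts"] >= self.max_retries:
            entry["status"] = DEAD
        else:
            entry["status"] = FAILED

    def chunks_to_retry(self):
        """What a periodic retry job would re-enqueue."""
        return [c for c, e in self.state.items() if e["status"] == FAILED]

    def video_ready(self):
        return all(e["status"] == DONE for e in self.state.values())

tracker = ChunkTracker(["c1", "c2", "c3"])
tracker.record("c1", ok=True)
tracker.record("c2", ok=False)    # e.g. the content-filter step failed
tracker.record("c3", ok=True)
print(tracker.chunks_to_retry())  # ['c2']
tracker.record("c2", ok=True)
print(tracker.video_ready())      # True
```

The same table doubles as the "is the whole video processed?" check, since the video is only marked playable once every chunk reaches DONE.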
Great content. Thank you Sandeep.
Superb explanation. But please also try to cover how the data is sharded across different servers
Thanks!
Sure, probably in the next ones I'll cover that.
Great video as usual. Does live streaming of cricket matches, say by Hotstar or Disney, use the same process? I am sure there will be different challenges, as we won't have enough time for transcoding, creating lower-quality versions, and then uploading chunks to the CDN. How do we handle the scenario where we have to live stream in very near real time, say with no more than 10 seconds of delay?
Very nice video, very good explanation, thank you so much.
As the traffic is encrypted, how can the ISP cache it?
For predicting what you would watch tomorrow, is it OK to have the client occupy bandwidth and user device storage just for the prediction? Or is there a mature solution to achieve it on the server or CDN side?
Thanks a lot for making the video! You have provided a holistic view of the design!
I have a few questions though.
1) Does Elasticsearch have its own data store, or will it index on top of Cassandra?
2) There is a link shown between the Elasticsearch cluster and the Recommendation Engine. I am not able to understand its purpose.
Thanks!!
To answer your queries:
1. ES would have its own data store to hold the data and the indexes.
2. The arrow between the Recommendation engine and Elastic Search is because the Recommendation Engine reads from Elastic Search.
@@codeKarle reads from ES?
In the video, while explaining choosing between different thumbnails, I think you mentioned that the recommendation engine feeds data into ES. Then, when the search service queries, it gets results from ES based on the stored recommendations.
I am confused.
You are awesome man. More power to you :D
Thanks!! Glad that you liked it :)
Really nice video; it goes into detail in many areas. The details about Open Connect show the effort you have put into this. Since it's a big video, do you plan to share a summary link like you did with a few other videos?
Before even uploading to S3 - don't you think the content processor should first run the content filters? Let's say the video is against policy; then it's not worth uploading to S3.
Amazing and to-the-point explanation. Just one thing though: is this approach sufficient for any HLD interview, or do we need to go into DB design and APIs as well?
Cool video. One question: those content filters, how do they work? Is it a machine watching video chunks and somehow determining the privacy, nudity, etc. levels? Or do YouTube employees watch those chunks and set the levels/tags for privacy, nudity, legality, etc.?
Superb explanation
Glad that you liked it!
@@codeKarle ❤️
Do share it with your friends/colleagues if you found it good 🙂
this video is really helpful
Great explanation and informative content. Keep going 🙂👍
Thank you!! There is a lot more coming your way :)
Thank you for the awesome video. One doubt regarding Open Connect: I doubt it is cost-neutral for Netflix to move hard disks from CDNs to ISPs, as there can be many more ISPs to put hard disks in compared to the number of CDN locations.
great video
Please make a video on Distributed Job Scheduler.
Is it a good idea to just throw a message queue like Kafka in between everything?
The Elasticsearch cluster is shared between two services; is that recommended?
Thanks for very informative video.
@15:09 How does the system handle it if the "content filter" or any other step fails for a chunk of video?
When you say local CDN 1, 2, 3 at 58:50, do you mean different servers (server1, server2, server3) within the same local CDN?
Awesome
Great Video!!
Doubt:
Say, in the original S3, the links are:
Format A, Resolution 480, Chunk X is L1
Format B, Resolution 1080, Chunk Y is L2
And so on...
Doubt 1:
What will the links be after storing the chunks in the cache (Open Connect) at the ISP?
Doubt 2:
How will the viewer client get to know the URL?
Doubt 3:
The database has the links L1, L2, L3... of the original S3 storage. How will it fetch the links saved in the OC (the cache at the ISP)?
What is the flow? Like: client - ISP - fetch-video service - DB - ISP - cache - client?
Great Content 👍. Can you please explain how livestreams are handled?
Excellent video!
You said we are going to upload each chunk to the CDN; if that is true, then why are we aggregating the chunks using Spark again?