Bro you are fantastic. So today I had an interview with MSCI (Morgan Stanley) and they asked me about designing around Ola, where they wanted me to show the cabs around a user as soon as he logs in to the app. I remembered this video of yours and explained how things work and how the mapping service would divide the city into grids and fetch the information. The interviewer was satisfied and I feel I have cracked the round. All thanks to you \m/
so, u joined Morgan stanley ??
Did you finally join MSCI?
bro, don't keep us hanging around... share if you were selected?
@@rishav144 he got banned from applying
No, he did not join Morgan Stanley. He joined Byju's and then Jio.
What an amazing explanation to the most complex of things. Teaching is an art, and it does get exemplified by your teaching style.
Kudos to you man
Very true. He is awesome
Great video on two main dimensions that make it easy to follow and appreciate:
1. Scope of the product is clearly defined
2. Delivery of content is very crisp and to the point
Can't stress how many other videos out there have a catchy title like "Design Uber service" and then go all over the place in terms of what they want to cover and how to deliver the content. This is the best video on Uber system design available on YouTube.
This is the best video for uber system design out there!
It would have been great if you had covered how the request for cab is fanned out to multiple drivers and once any one driver accepts the trip, how the request is dropped from all the other drivers. overall content is very good and helpful. Very good effort.
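One possible way to handle the fan-out question above, as a minimal sketch and not something described in the video: offer the trip to several drivers, let the first accept claim it atomically, and withdraw the offer from everyone else. The `TripOffer` class and its method names are hypothetical, and the in-process lock stands in for whatever atomic conditional write a real shared store would provide.

```python
import threading


class TripOffer:
    """One trip offered to several drivers; the first accept wins (hypothetical sketch)."""

    def __init__(self, trip_id, driver_ids):
        self.trip_id = trip_id
        self.pending = set(driver_ids)   # drivers who currently see the offer
        self.assigned_to = None          # set exactly once, by the winning driver
        self._lock = threading.Lock()    # stand-in for an atomic write in a shared store

    def accept(self, driver_id):
        """Return the drivers whose offer must be withdrawn, or None if this accept lost."""
        with self._lock:
            if self.assigned_to is not None or driver_id not in self.pending:
                return None              # someone else already took the trip
            self.assigned_to = driver_id
            to_cancel = self.pending - {driver_id}
            self.pending = set()
            return to_cancel


offer = TripOffer("trip-1", ["d1", "d2", "d3"])
cancel_for = offer.accept("d2")   # d2 wins; d1 and d3 should get a "trip taken" message
late = offer.accept("d1")         # d1 was too late, so this returns None
```

The same structure also covers a driver who rejects or never answers: that driver simply never calls `accept`, and a timeout on the offer would be needed to re-offer the trip to the next batch.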
You are an amazing teacher. Why have you stopped making such amazing videos? Kindly keep creating content of top quality. You rock
All your videos are very good. They are good for thinking about various requirements, splitting a given problem into multiple services and so on. But I think you haven't been talking about the data model, how it can be represented in a NoSQL vs RDBMS vs graph database, and so on. Many interviewers, if not all, expect that discussion. Hope you can do that in your future videos. Thanks!
Man, you are really good. It seems you have worked at Uber or one of these cab apps, and you have a great understanding of when and where to use different tools and databases. Thanks much! One point, though: these designs are so detailed (including the time it would take to draw each of those layouts and bullet points), and some of these system design interviews are only 45 minutes (which may actually be only 35 minutes after leaving 5 minutes each for intro and outro). We can't cover everything, but we can at least lay out the buffet of knowledge we have and let them pick what they want us to go over.
This video is very good. The best thing is that you have provided summary also with clear diagram.
Thank You very much.
I watched a couple of videos on Uber system design and wasn't satisfied and luckily ended up here. Very good explanation focusing on key components. Thanks bro !
Fantastic bro. Your teaching style and kind of substance you produce is awesome!
Very good explanation...all scenarios covered.
Nice explanation!.. Liked it!
I think the complexity of maintaining redis to map driver id - handler and the reverse mapping also can be avoided.
If we load balance the request from driver based on driver id hash, the request will go to the corresponding socket handler, and when the response comes back say driver gets a ride booked, the driver object will have the id from which corresponding handler can be found. Thereby eliminating the need to store mapping.
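The idea above can be sketched as follows, assuming a fixed list of handlers and a stable hash of the driver id. The helper name `handler_for` and the handler names are made up for illustration.

```python
import hashlib


def handler_for(driver_id: str, handlers: list[str]) -> str:
    """Pick a handler deterministically from the driver id (hypothetical helper)."""
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(driver_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(handlers)
    return handlers[index]


handlers = ["ws-handler-0", "ws-handler-1", "ws-handler-2"]
# Both the load balancer and any service that wants to push to this driver
# can compute the same answer without a lookup table.
print(handler_for("driver-42", handlers))
```

One caveat with plain modulo hashing: when a handler is added or removed, most driver ids map to a different handler, while the live connections stay where they were opened. Consistent hashing reduces that churn, but some record of where a connection actually lives may still be needed during resizing.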
When a new driver comes online and gets connected to a websocket, how does the websocket manager get to know about it, and how does it update the mapping? Is it using a HashMap internally?
Very nicely explained
Very confused between roles of Location service and Map service. Who takes lat long and returns segment id ? Who identifies the list of drivers based on segment?
Good Content.
U a System Design God
"Segment" is essentially a quad tree.
It can also be geohash as well.
In many interviews, people ask about algorithms in the system design round, but I see most of your designs are like: use Kafka for self-paced consumption or decoupling and scaling, Redis as a cache over MySQL. It would be great if you could add the algorithms to each explanation.
Very nice work 👍
Hi, thanks for the content on this channel! Had one question though.. in the customer flow, we will need a load balancer that makes a web socket connection to the cab request service right? Since there can be multiple customer requests at the same time?
excellent ,thank you
Thanks Sandeep ... Such a great video. just one thing so customer also connects with Map service to find out its location + the A(pickup) and B(drop point) point
Why did you choose request response model for location update from driver? I think event driven model is more suitable for this. Please let me know your comments
Very easy to understand & amazing!!
Our college project is not a real project, but it gives some insight. Similarly, this video is just an insight. I think Uber's actual design is more complex.
Brilliant
Cab finder is doing so many things, how do we handle spike in requests for it? Do we enqueue requests and spin up more instances of services?
Hi,
In the video, you said we will keep the driver location in a NoSQL DB. What would be the sharding key for the cabs' location data, so that we can fetch their current location fast? The driver will be continuously moving; a new location feed will come every 4 seconds and the old location feed will be stale data.
Please help.
Had a similar question. From the use case point of view, it makes sense to keep the segment id as the partition key and add the various cabs in the wide columns. But the issue is that a cab's segment keeps changing, so the entry has to be deleted from the older segment too. How do we do this, since Cassandra does not support deletes very well?
I think we are just concerned with the current location of the driver.
So we can keep that in redis with the key being driver_id and the value being lat-lon and segment id.
This in turn can help in deciding if driver has moved to another segment which can further be updated in the redis which also contains segment_id v/s list of driverids.
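The two mappings described above can be sketched with plain dictionaries standing in for Redis, purely as an illustration. The `DriverIndex` class and the segment names are hypothetical; in Redis terms the first map would be a hash per driver and the second a set per segment, updated with HSET, SREM and SADD, ideally inside one transaction.

```python
from collections import defaultdict


class DriverIndex:
    """In-memory stand-in for the two Redis mappings (hypothetical sketch)."""

    def __init__(self):
        self.driver_location = {}                 # driver_id -> (lat, lon, segment_id)
        self.segment_drivers = defaultdict(set)   # segment_id -> {driver_id, ...}

    def update(self, driver_id, lat, lon, segment_id):
        previous = self.driver_location.get(driver_id)
        # Only touch the old segment's set when the driver actually crossed over.
        if previous is not None and previous[2] != segment_id:
            self.segment_drivers[previous[2]].discard(driver_id)
        self.segment_drivers[segment_id].add(driver_id)
        self.driver_location[driver_id] = (lat, lon, segment_id)

    def drivers_in(self, segment_id):
        return set(self.segment_drivers.get(segment_id, set()))


index = DriverIndex()
index.update("d1", 12.97, 77.59, "seg-A")
index.update("d1", 12.98, 77.60, "seg-B")   # d1 moved: removed from seg-A, added to seg-B
```

Because the driver's last known segment is stored next to the location, each ping needs one read to decide whether the segment sets have to change at all.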
Would using a quadtree be a quicker way of determining cabs in the same segment as the rider trying to hail a cab ?
Your way of explaining is fantastic bro,
But please change the mic; the voice would be much clearer
thnx for this video
Great Video, one issue - can you please change your audio system, the audio quality is poor.
Yes Pratyush, we are experimenting with a couple of options these days.
Looks like recording videos is a far more complicated thing than building software :)
@@codeKarle But these are saviours for people who are struggling to learn system design....
How can we scale the websocket handlers, if we have lakhs of drivers live at the same time how can we handle that? We just can't add more and more websocket servers right?
Superb explanation
would it not be easier to put driver locations in a kafka first and then have multiple consumers reading for location service? This way the load will be fairly balanced.
How is redis updated when a driver's segment changes?
Great video! Watched 3 other videos and this one is the best by far. One thing though is you never mentioned what Location Service will write to Kafka. Can you please clarify this?
i think location service pushes events regarding payments and pricing. It was mentioned by the author that location service also calculates pricing and payments for drivers
Location service will send 1. details about route selected between source and destination and the time it takes. It will help improving MapService suggestions. 2. Route taken by driver is different from suggested route. It will help in Fraud Service, driver profiling, driver priority engine and user profiling.
A great video, thanks man!
Only one suggestion, please buy a better microphone :)
Um... what happens when the driver is offline (car crash, no battery, broken phone) or rejects the trip when you try to assign a trip to him? How do we handle the confirmation?
why would you use mysql for user data?
Hi, if I understand correctly, the cab finder sends a message to Kafka, then the location service gets the message and puts another message on Kafka (or does the ACK contain the list?) with the cabs. Then the cab finder sends the notification to all of those cabs and requests an acceptance, and when one of them accepts the trip, the cab finder responds with the cab to the cab request service?
What is this LB doing? In a couple of diagrams you have linked D1 -> WebSocket Handler1. The other endpoint of the WebSocket connection is not clear. It should be the LB, right?
Thanks for the amazing content on the channel! Could you please clarify how the maps service would convert lat-longs to a segment? I was thinking that if the segment size is constant, it would just be a function of lat-long, but with variable segment sizes (for avoiding too many or too few drivers), would you need an index?
I think here geohash can be one option where world is divided into 32 sections which keeps on dividing recursively and we can always derive a geohash from lat long which can serve as an identifier for the segment.
@@uditagrawal6603 Thanks for bringing up geohash -- it makes sense for the drivers to constantly publish their 8 or 9 byte geohash .. the precision for location management can be left to the "map service".. there also won't be a need to translate lat, long to a segment.
Instead of segmenting by road distance, can we segment by time?
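For reference, here is a minimal sketch of the standard geohash encoding discussed in this thread. Each output character carries 5 bits, so every extra character narrows the cell to one of 32 sub-cells. The function name and default precision are chosen here for illustration only.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=6):
    """Encode a lat/lon pair as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid      # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid      # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:    # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 5))  # prints "u4pru"
```

A shorter prefix of the same hash names a larger enclosing cell, which is what makes the prefix usable as a segment id. Note that two nearby points on opposite sides of a cell border can have very different hashes, so a nearby-cab lookup usually checks the neighbouring cells as well.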
How the ride sharing will be implemented? Say, user is trying to book a seat in a shared cab.
Is WebSocket horizontally scalable?
Why can't we use lat-long instead of a segment?
You have mentioned that the location will be updated almost every 5 secs for drivers and Cassandra is a good for this as it will be able to scale. But in your DB video, you have mentioned that noSql should not be used for multiple updates. Can you please clarify?
Different databases (NoSQL or others) are optimized for different kinds of read-write patterns. Cassandra, for example, is great for inserts and reads, decent enough at updates, but bad at deletes. In this scenario of location pings, we are doing an insert every 5 seconds for a driver and then bulk reads by driver id when we need to plot the driver's/trip's trail. So for this scenario Cassandra would be a good choice.
Hope that answers :)
Hi, can anyone help with this question: System design question to design a heatmap. user can input any range of times (in minutes) and I want to be able to see the density of drivers on the map color coded You don't have to worry about the actual color coding part (assume you have some UI that will take in the count input and do the coloring for you).
Hi, thank you for uber hld design.
I am looking for database design. What are table we have to create SQL db and what could be structure for no-sql db? Could you please help to have video on this? Or anyone have link which explain same please help share in comment.
How is the data related to trips distributed? Since it is a distributed system, it will not be feasible to store the data in a single database instance. We need to do some sharding (even in Cassandra we will need to choose a distribution key), so how does the distribution/sharding of data happen?
Thanks for your video.
There are millions of customers; how is this enormous number of connection requests handled?
If Uber stores driver data every 5 seconds, the data stored is insanely huge. Data stored per day would be 30 bytes * (86400/5) * 1M drivers. Around 500 GB per day. Around 190 TB per year just for storing driver location
I agree that this amount of data is huge, but this is primarily analytical data, right? We won't be actively querying it unless we need to do some analytics on it, so this data can be moved to something like Hive, or even older data can be moved to archive cold storage
Do systems like Uber/Ola really use Redis for storing the location-to-driver mapping?
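A quick check of this estimate, as a sketch under the comment's own assumptions (30 bytes per ping, one ping every 5 seconds, 1M drivers online around the clock, decimal units, no replication or index overhead). It comes out to roughly 518 GB per day and about 189 TB per year.

```python
BYTES_PER_PING = 30        # assumed payload size from the comment
PING_INTERVAL_S = 5        # one location ping every 5 seconds
DRIVERS = 1_000_000        # assumes every driver is online all day

pings_per_driver_per_day = 86_400 // PING_INTERVAL_S          # 17,280 pings
bytes_per_day = BYTES_PER_PING * pings_per_driver_per_day * DRIVERS
bytes_per_year = bytes_per_day * 365

gb_per_day = bytes_per_day / 1e9     # decimal gigabytes
tb_per_year = bytes_per_year / 1e12  # decimal terabytes

print(f"{gb_per_day:.1f} GB/day, {tb_per_year:.1f} TB/year")
```

Real numbers would be lower if drivers are online only part of the day and higher once replication is included, so this is only an order-of-magnitude figure.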
Man... the content of your video is truly interesting and is worth to watch. But in 30 minutes you say "OK" about 200 times. I'm trying to focus on WHAT you are saying but my brain is just getting ready for the next "OK" (that I don't want to hear).
Is it possible to get a sound engineer, who could remove all the "OK"s from the sound track and reupload the video in that form?
Seen a lot of videos of Uber design. GeeksForGeeks says its a 'Hard' problem. After watching your video, its not that hard it seems. Easily navigable.
Can you please make HLD of Book my Show ?
Thanks for the great video! I think the WebSocket server stuff adds a bit of unnecessary complexity. This way the web server is no longer stateless. Why can't we store the trip-driver matching information in a database, so that every time drivers send location data, the web server queries the db to find an assigned trip and returns it to the driver if one exists?
That's a write heavy operation which can result in slow queries (high latencies) which can degrade the performance
Can I specify that the system should be Real Time?
great content but please use better microphone.
Hey Sandeep, Great content. Just few doubts :
How do we divide the area into segments as you mentioned?
Why we have different driver service, we could just include one field as role ( user/driver) and utilize same service only?
Typically quad tree is used for dividing the area into segments. Check Google S2 library for details
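To make the quad tree idea concrete, here is a minimal sketch. It is not the Google S2 implementation, and the class, the capacity and the coordinate bounds are illustrative assumptions. A cell splits into four children once it holds more than `capacity` drivers, so busy areas end up with smaller segments.

```python
class QuadTree:
    """Point quadtree: a leaf splits into four once it exceeds `capacity` (sketch)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth   # stops endless splitting on identical points
        self.points = []             # (x, y, driver_id); only leaves hold points
        self.children = None

    def insert(self, x, y, driver_id):
        if self.children is not None:
            self._child_for(x, y).insert(x, y, driver_id)
            return
        self.points.append((x, y, driver_id))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args),  # low x, low y
            QuadTree(mx, y0, x1, my, *args),  # high x, low y
            QuadTree(x0, my, mx, y1, *args),  # low x, high y
            QuadTree(mx, my, x1, y1, *args),  # high x, high y
        ]
        points, self.points = self.points, []
        for px, py, pid in points:
            self._child_for(px, py).insert(px, py, pid)

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def leaf_for(self, x, y):
        """Return the leaf cell (the "segment") that contains the point."""
        node = self
        while node.children is not None:
            node = node._child_for(x, y)
        return node


tree = QuadTree(0, 0, 100, 100, capacity=2)
for x, y, driver in [(10, 10, "d1"), (12, 12, "d2"), (80, 80, "d3")]:
    tree.insert(x, y, driver)
nearby = {pid for _, _, pid in tree.leaf_for(11, 11).points}  # drivers sharing the rider's cell
```

A rider lookup then only scans the drivers in the rider's leaf (and, in practice, the neighbouring leaves) instead of every driver in the city.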
Can I know how some founders of the app did this complex thing by themselves ?
First few years are always monolith and some basic features.
make one for coding platform like leetcode
I hope you must have upgraded your microphone by 2024
It's like sectors of telecom towers 😀 and do visitor stuff , local to the area...only correct usecase of Cassandra is trip history...can we use weighted graphs to calculate minimum distance? shouldn't we draw circles rather than segment because it will be equidistant and all cabs with in that radius can be then selected based on requirement
Speed up the video to 1.5x and thank me later 👍
Good One Buddy ...But Please focus the camera on the board rather than your Face. That will help
minute correction. it's (1,0) on x-axis and (0,1) on y-axis
Hello, Can your company or someone who you may know develop a software like Uber for my rideshare company? please reach out to me.
We can provide consultation, but not the full implementation at this point in time. Please let us know if that works.
Hello Umer, Have you already started development?
The white board is often hidden by the instructor. It would be best if these were slides and the relevant portions are highlighted using markers or shading techniques. The instructor's video can be relegated to a thumbnail except at start and end. Good content but poor presentation.