Whenever I see your videos, it feels like I'm sitting in an office design review meeting with a senior engineer, as you don't throw around fancy words. Really appreciate your content _/\_
Great videos on system design, Sandeep. Thanks! I have one concern about the Airbnb design. You have used Kafka for multiple things, and a few of them don't really require it, or it would actually be a not-so-good design. 1. Hotel onboarding: This wouldn't really require an async service, as I don't see this system as write-intensive. A simple sync REST API should work. 2. Booking of a room: The booking of the room by a user and the removal of the room from the available list should be consistent. I think Kafka should be avoided here too. In both cases, immediate visibility to other users should be handled by a write-through cache policy. Let me know your thoughts. Thank you!
1. Why is Kafka not a good choice? There can be many workflows that we can trigger whenever a hotel is onboarded, like sending a notification to customers, advertisers etc. All of that can be decoupled into a separate service and the service can consume from Kafka
I really like your unassuming way of explaining the essence of your topic. The only suggestion I'd make is to make it a little more relaxed and casual. I would like to see the tension fade away.
Really great video. I have some doubts: - How are we storing the availability table? Are we going to store 365 days of rows for each hotel? That will be inefficient. - Which service will pull data from the Hadoop cluster? Once the analysis is done on the Hadoop cluster, aren't we storing it in an analytics DB? - Shouldn't we store images in S3/GCS first and only then serve them through the CDN?
Excellent points. I had the same question. I think we shouldn't write images directly to CDN. We will store them in an object storage and CDN will cache it based on usage of the files geographically.
In case anyone reads this now: I believe we are dividing the entire geography into parts in terms of data centers anyway, so it won't be a single table holding the data for all the hotels and their room types for 365 days. It will be more like this: if I divide the entire geography into 4 parts where each part has a separate database, the available_rooms table can contain 1 million/4 = 250,000 hotels * 10 room categories * 365 days = ~912 million rows. Of course, this data will be sharded based on hotel IDs, so it will be spread across separate shards.
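A quick way to sanity-check this kind of estimate is to write the arithmetic down. The sketch below uses only the figures from the comment (1 million hotels, 4 regions, 10 room categories, 365 days); the function and constant names are made up for illustration. With those inputs it works out to roughly 912.5 million rows per regional database, before sharding by hotel ID.

```python
# Back-of-the-envelope sizing for one regional available_rooms table.
# All inputs are the commenter's assumptions, not measured numbers.
TOTAL_HOTELS = 1_000_000
REGIONS = 4
ROOM_CATEGORIES_PER_HOTEL = 10
DAYS_STORED = 365


def rows_per_region(total_hotels, regions, categories, days):
    """One row per (hotel, room category, day) in each regional database."""
    hotels_per_region = total_hotels // regions
    return hotels_per_region * categories * days


rows = rows_per_region(
    TOTAL_HOTELS, REGIONS, ROOM_CATEGORIES_PER_HOTEL, DAYS_STORED
)
print(f"{rows:,} rows per regional database")  # 912,500,000 rows per regional database
```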
Great video. I have one question though. How are you generating/updating the available_rooms table? Where's that data coming from to begin with? Thanks!
Very well explained. Just one point of discussion from my end: can we create an inventory-service and keep availability there instead of in the booking-service? This would give us well-defined boundaries and responsibilities. The booking service should just keep the booking information, and when a booking is done, it can update the availability in one single distributed transaction. Let me know if there are any challenges/problems with this approach.
Superb, Sandeep. I am looking for a Coupon management system design. It's commonly asked across companies. On the internet, there is not a single design for this.
First time on your channel, it's pretty good and clearly articulated. One small suggestion is to pause for a few seconds now and then; you are not leaving any breaks.
Thanks brother .. it really helped.. especially hats off for explaining each of the implementation, requirement, use case etc... Please make the next videos longer so that beginners can understand more details.
Great video Sandeep. Quick tip for Product Managers looking at this - I think the first 15 minutes and then 32:20 onwards is pretty relevant. Wanted your thoughts on something: The safest way to interact between the booking service and the search service would be to have a single DB where locks are taken on the respective fields when a booking is made. This would guarantee that no room is double booked ever. That approach seems very very 90s and theoretical. But is there ever a case in your design that the delay between syncing the search and booking service through Kafka can cause a double booking scenario. Are there any other cons you could think about this architecture? Thanks in advance.
Hello, we are using only one DB for booking management, so there will never be a double booking. There can be a case where the search experience shows a hotel as available but, on trying to book, it is shown as unavailable. But this should happen only in very rare cases.
Great content, very nicely explained. Basically I was looking for the scenario where multiple people select the same hotel at almost the same time: how do we solve the race condition in such a huge system, so that the same room is not booked by multiple people?
The Isolation property of ACID deals with that. If you set the isolation level of the booking transactions to Serializable, the database guarantees that the result is the same as if the transactions had run one after another. In the booking scenario, let's say T1 and T2 are two transactions booking the same room for the same dates at the same time. If T1 gets there first, then, depending on the database engine, T2 either has to wait until T1 completes or is aborted when it tries to commit. Either way, once T1 completes, T2 finds that the room it is trying to book is already taken, so it has to roll back. In that case, you send a message to the user that someone else already booked it.
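To make the T1/T2 scenario concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the design from the video: SQLite simply allows one writer at a time, and `BEGIN IMMEDIATE` takes the write lock up front, which stands in for the serialized behaviour described above. The table layout and the `make_db` and `book` names are assumptions for illustration.

```python
import sqlite3


def make_db():
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly with BEGIN/COMMIT/ROLLBACK.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        """CREATE TABLE available_rooms (
               room_id INTEGER NOT NULL,
               date TEXT NOT NULL,
               available_quantity INTEGER NOT NULL
                   CHECK (available_quantity >= 0),
               PRIMARY KEY (room_id, date))"""
    )
    return conn


def book(conn, room_id, dates):
    """Reserve one unit of room_id for every date, or nothing at all."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before checking
    try:
        for d in dates:
            cur = conn.execute(
                "UPDATE available_rooms "
                "SET available_quantity = available_quantity - 1 "
                "WHERE room_id = ? AND date = ? AND available_quantity > 0",
                (room_id, d),
            )
            if cur.rowcount != 1:  # sold out (or no row) for this date
                conn.execute("ROLLBACK")
                return False
        conn.execute("COMMIT")
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

When `book` returns `False`, the caller would show the "someone already booked it" message. In a server database such as MySQL or PostgreSQL the same conditional `UPDATE` inside a transaction gives a similar effect, though lock and retry behaviour differs by engine.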
Excellent video overall. Automatic scalability: How much of it is controlled by the application developer (scripts, architecture, etc.) and how much by the vendor component/cloud service provider (thresholds, metrics, etc.)? I see a component reuse pattern: User Service, Hotel/Cab Service, Booking Service, Search Service (Elastic Search), etc. across your videos. Great to see the modularity being put into action. You talk about trade-offs and justify design choices, which is good and also reinforces the first principles of the components. It is much more than an interview answer and reflects knowledge and experience. The only disappointing thing is the poor voice recording. Sigh!
Very good video. Regarding the last part about dividing DCs by region: it may not work if a user in the USA wants to book, say, a hotel in India. You may want to rethink this as an active-active infrastructure with multi-leader replication and IP anycast for low latency.
Amazing video, thanks for sharing this. Can you please add a summary for all your design videos? I see you have added one for a few, which helps a lot. Thanks again for all the great work 👍👍👍
Thanks for the lucid explanation. I have a couple of questions: 1. How do we maintain consistency in this design. If a room has been booked, it should immediately be blocked. Using Kafka would introduce certain delay, during which another person can book the same room. 2. Can't we use MongoDB for hotel database? A hotel has complex properties, like reviews, amenities etc. Don't you think a no sql database would better fit the use case here?
Very well explained. I liked the way everything is explained, starting from the overall architecture down to the smaller services. Looking forward to more such videos.
@CodeKarlw I like this design, BUT: - Using an RDBMS for the booking lock may not be appreciated by all interviewers: I proposed a very similar architecture in a sr backend engineer interview @ Uber and I was rejected with the reason "He used a RDBMS for performing the reservation lock". It would be interesting to see an alternate implementation. (Redis lock?) - The service crash scenario is completely absent from the design, specifically a crash of the booking service after a payment has succeeded but before the booking status update. - How do you keep the datacenter databases in sync? A shared Kafka topic consumed by a sync service? Thank you for your work anyway!
Great video. One thing that I found missing in this design: when the admin of a hotel adds new rooms, there is no flow that updates room availability in the booking service's DB. Please let me know if I'm missing anything here.
Thanks for the great explanation. Question: Why do you suggest the approach of setting the boolean "isActive" to false, to delete a record rather than using the SQL delete operation?
It's generally not a good idea to delete something from the database. If you need that information for auditing, financial reporting, or just debugging, it'll be helpful to have it in the DB marked inactive. Additionally, delete queries do carry a risk of deleting something accidentally in case of bugs, so I'll always prefer to have a delete/isActive flag and filter for the active records in get queries.
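A minimal sketch of the isActive approach, using Python's built-in `sqlite3` module. The `hotels` table, its columns, and the helper names are assumptions for illustration, not the schema from the video. A "delete" becomes an `UPDATE`, and every read query filters on the flag.

```python
import sqlite3


def make_hotel_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE hotels (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               is_active INTEGER NOT NULL DEFAULT 1)"""
    )
    return conn


def add_hotel(conn, name):
    cur = conn.execute("INSERT INTO hotels (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def soft_delete_hotel(conn, hotel_id):
    # The row stays in the table for auditing, reporting and debugging.
    conn.execute("UPDATE hotels SET is_active = 0 WHERE id = ?", (hotel_id,))
    conn.commit()


def list_active_hotels(conn):
    # Get queries only return rows that have not been soft-deleted.
    rows = conn.execute(
        "SELECT id, name FROM hotels WHERE is_active = 1 ORDER BY id"
    )
    return rows.fetchall()
```

The trade-off is that every read path has to remember the `is_active = 1` filter, so it is often wrapped in a view or a shared query helper.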
Thanks for great System design series. One doubt w.r.t. DB design here: I assume search API will look like search(lat ,long, startdate, enddate, # of rooms) to return list of hotels in the given lat long area. Perhaps we can use Quad Tree type data structure here if we want to reduce latency or update booking ASAP. (1) Available_rooms table has a field for available_quantity, Will it carry 365 entries for date column for say 1 year? If not what is more efficient way to store this info and perform fast search. Also don't you think PK should be room_id+date above for uniqueness. Please correct me if wrong. and then (2) How are we connecting Available_Rooms with hotel.
@theghostwhowalk - I had exactly the same concern as "(2) How are we connecting Available_Rooms with hotel." - Typically there should be an association of hotel in the 'Booking' DB as well. Else how for a given date range would a list of hotels appear based on room availability. @codekarle - Sandeep can you help out here ?
Great tutorial. The one thing which I feel is missing: once the hotel owner adds his hotel to the service, there should be some kind of map segmenting that maps the hotel to a place. I mean, this can't be a general search like Amazon search. If a user searches for a place, how are we going to show all the hotels in that place? For me, the segment approach which you showed in the Maps tutorial would work best here.
That's a great suggestion. One possible solution without breaking this design is to add some tags while adding the hotel and in that service we could also suggest some tags to the owner so that if someone searches hotels near Parvati river we get this suggestion.
I don't understand a few points here - 1. How is the available quantity only a function of room_id? If we are accepting a date range in POST /book, it should be accounted for in the schema. 2. Assuming we have that in the schema, we would also need an efficient query that checks availability based on these three criteria.
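One way to picture what this comment is asking for is a table keyed by (room_id, date) and a range query over it. The sketch below uses Python's built-in `sqlite3` module. The schema, the check-out-exclusive convention, and the `make_db` and `is_available` names are assumptions for illustration, not the schema from the video.

```python
import sqlite3
from datetime import date


def make_db():
    conn = sqlite3.connect(":memory:")
    # The composite primary key doubles as the index for the range query.
    conn.execute(
        """CREATE TABLE available_rooms (
               room_id INTEGER NOT NULL,
               date TEXT NOT NULL,          -- ISO 8601, e.g. 2024-01-31
               available_quantity INTEGER NOT NULL,
               PRIMARY KEY (room_id, date))"""
    )
    return conn


def is_available(conn, room_id, check_in, check_out, rooms_needed=1):
    """True if every night in [check_in, check_out) has enough rooms."""
    nights = (check_out - check_in).days
    if nights <= 0:
        return False
    count, min_qty = conn.execute(
        "SELECT COUNT(*), MIN(available_quantity) FROM available_rooms "
        "WHERE room_id = ? AND date >= ? AND date < ?",
        (room_id, check_in.isoformat(), check_out.isoformat()),
    ).fetchone()
    # A missing row for any night counts as not available.
    return count == nights and min_qty >= rooms_needed
```

ISO-formatted date strings sort in calendar order, so the range condition works on plain text columns. The check alone is not enough to prevent double booking; it would still need to run inside the booking transaction.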
The content is awesome, really helpful to me. Please make videos on geographic information systems like- Google Earth, Street View, Voice recognition systems like- Siri, Alexa
Thank you for the insightful video. Very well delivered. A question though: where do you place users on the waiting list who tried to reserve a fully booked hotel that suddenly becomes available? Do you need a queuing system, or can they be placed in the archival store?
Great video! I think you missed a point: the hotel DB needs to be updated with the decremented available rooms when someone books a room via the booking service, so technically there should be a room-updates service talking to Kafka that decrements the available rooms.
Simple and Nice video. Loved it. However, it did not cover searching of hotels in a radius. It would have been helpful to see how that might have been done.
That would be very similar to this one, just that in that scenario, instead of hotel being the primary entity, it would be a flight ticket. Rest most of the things would remain very similar to Airbnb, be it the search flow, be it the booking flow, be it the analytics. One difference though would be the way we interact with the third parties(Airlines) and do reconciliation. We'll try and do a video on that, but that might take some time :)
We could do that too. There is nothing wrong in that approach as well. I would prefer the redis approach, reason: since we have an easy way to implement this and it's widely used & stable, then why bother building & maintaining a timer.
Nice explanation. I have one doubt here: the Hotel Service stores its data in one MySQL cluster, and the Booking Service stores its data in another MySQL cluster, so how do they stay in sync? I mean the Hotel Service data and the Booking Service data in the DBs.
Thanks for the article. Live updates to the hotel get published to Kafka, which feeds into Elastic Search. But my question is: does Elastic Search hold only the live updates, or is it like a replica of the Hotel DB?
Superb explanation .. thank you for this . One question do you have more explanation on Apache Kafka and Redis as separate topics ? It will be really helpful.
Rarely the specific room number is assigned at Booking time. You just get a confirmation about a King/Twin/Double room and later when you arrive at the property you get to know 12A/12B/3A etc.
To answer it in a generic way, let's say Service A is calling Service B for something. Now if A needs the response of B to proceed further with what it was doing, or if they collectively form a transactional system, then we can use REST/RPC/any synchronous method. But wherever A does not need the response of B, and this is more of a piece of info that B needs to act on, I would rather make it async and keep a queueing solution like Kafka in between. Async RPC/async REST could be used, but then it has its own set of complications, like who does the retry, what if server B is down, circuit breaking, sudden bursts in traffic, etc., and Kafka takes a lot of that complexity away.
Best system design video i have watched till now, i was able to relate to most of the explanation given for choosing the right storage at right place. Precise and to the point.
Thanks for the detailed explanation. I am a newbie to system design. Could you please clarify whether the Hotel Service, Search Service, Booking Service, etc. are different EC2 machines, and whether there would be multiple load balancers supporting each service to scale? Thanks in advance.
At a very high level, kind of yes. They are different services, which means they are different logical components, and it's better to assume that they would be running on different hardware. But that's not a requirement. You can have two services running on one machine, and especially with the increased adoption of container-based technologies, it's more common to have multiple services running on the same machine. By machine I here mean both VMs and physical hardware, and in the case of EC2 you could still have two different EC2 VM instances running off the same physical hardware and multiple containers of your services running on each of those EC2 VMs. For the load balancers, you could have multiple load balancers supporting each service, or you could choose to use a service mesh, which is a more common approach in the container-based world.
Thanks! The same design would work for systems like Ticketmaster, Fandango, Bookmyshow, etc. Just that instead of hotels here, the entity being sold would be a ticket, with certain changes in the data model. But architecturally, it'll be the same.
Really nice video, detailed explanation. I'm wondering how we can get quick Search results with filters like available_quality, date_range and room_classification. Please make a video on this.
Glad I found your channel - the best System Design videos, well structured, organized and easy to follow. Quick question - what if the MySQL master becomes a bottleneck, for example when there is a sporting event and a spike of bookings? The read path can be scaled by adding more slaves, but how about the write path?
Thanks!! One obvious way would be the usual sharding, but that complicates maintenance. I would rather split the servers into multiple groups or rings, with the intent that one group maintains the data for a few countries. In case of planned/predictable spikes, you can throw more hardware at the groups where you expect more traffic for the couple of weeks when the traffic is expected to increase. This ring would not just be on the DB front, but also for the application, since you'll most likely need to scale up the app servers as well. I am assuming here that we know what kind of spikes to expect, but most companies these days have that kind of prediction engine built, accounting for social activities/historic data/etc., so that's a fair assumption I believe. Do share the channel with your colleagues, it helps :)
@@codeKarle Thanks Sandeep! By splitting into multiple groups, we are still looking at having 1 master per group right? Otherwise if we add 2 or more masters we are essentially sharding, right? The question is whether can the spike within one country/city cause write traffic throughput increase to the point that a single master becomes a bottleneck and won't be able to keep up?
To optimize writes sharding+ replication of each shard is one way to support writes and reads. Sharding however is tricky on relational Dbs. Another approach would be active active replication across the cluster with leaderless quorum
For the issue where the TTL expires before the payment success message arrives, why can't we set the TTL long enough (more than the Payment service SLA) for us to be absolutely sure the payment will never succeed later? The con is definitely that the room is blocked for longer than it should be, but we definitely reduce some complexity there. Kudos for the effort.
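The idea in this comment (a hold TTL that is longer than the payment SLA) can be sketched with a tiny in-memory stand-in for a key with an expiry. This is not the Redis API; `HoldStore`, `hold_ttl`, and the safety margin are hypothetical names and parameters used only to show the timing logic.

```python
import time


def hold_ttl(payment_sla_seconds, safety_margin_seconds):
    """TTL for a room hold: payment SLA plus a margin, so a late
    payment success cannot arrive after the hold has expired."""
    return payment_sla_seconds + safety_margin_seconds


class HoldStore:
    """In-memory stand-in for keys with a TTL (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._holds = {}      # booking_id -> expiry timestamp

    def place_hold(self, booking_id, ttl_seconds):
        self._holds[booking_id] = self._clock() + ttl_seconds

    def is_held(self, booking_id):
        expires_at = self._holds.get(booking_id)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._holds[booking_id]  # expired: release the room
            return False
        return True
```

As the comment notes, the cost of the longer TTL is that an abandoned checkout keeps the room blocked for the full SLA plus margin.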
Thank you very much for sharing your knowledge. I have a question regarding search: hotel inventory is associated with time. For example, this particular hotel does not have room now but will have room(s) in the next 2 days. How do you store data in your ElasticSearch to allow searching for rooms given the checkin, checkout time?
What if a customer is booking a hotel and at the same time the hotel delists the room? Since the databases are different, it would definitely create an inconsistency. How are you planning to handle that? Will you be confirming the room against both databases? Users generally spend 5-10 minutes making the payment, so if the hotel is delisted in between, how is that handled?
Great video, thank you! Just a few questions - 1. Why don't you use Redis for the other services too? 2. Why don't you use Cassandra DB for the Hotel service too? 3. The arrow from the MySQL cluster to the archival service is strange to me. How does MySQL send messages? Wouldn't it be better for the booking service, once it is done booking, to send a message / HTTP request to the archival service? 4. The archival service has only one purpose, just to insert data into Cassandra. Isn't that overhead, and shouldn't the logic be in the booking service? Thank you again :)
1. Why don't you use Redis for the other services too? - Usually I prefer to do this when the read-to-write ratio is very high. If there are a lot of writes, then there is not much benefit from caching. Usually these decisions would be data driven, based on real access patterns.
2. Why don't you use Cassandra DB for the Hotel service too? - Cassandra could be used here, but I believe on this kind of dataset you'll have a lot of random queries happening: to fetch a hotel by id, to fetch hotels in a region, to fetch hotels managed by a user, etc. This would mainly be for a lot of internal tools that every company has, which we did not go over in the video, but those tools are always there. That query pattern is not optimal for Cassandra and we'd need a lot of data duplication to build those tools, so MySQL does a better job there.
3. The arrow from the MySQL cluster to the archival service is strange to me. How does MySQL send messages? - Those arrows are just to call out reads and writes. I know it's a bit cluttered there, but MySQL is not sending any messages. Archival should usually be a bulk batch job, not done for each booking, which makes it more optimal since you can run it at low-load times, though there is nothing stopping you from doing it the way you are suggesting as well.
4. The archival service has only one purpose, just to insert data into Cassandra. Isn't that overhead, and shouldn't the logic be in the booking service? - It could be the same service in a small-scale system, but at scale this would cause problems. Think of it this way: when archival begins, it would probably shoot up to near 100% CPU utilization for a few seconds while it is running, unless you are throttling. If that's in the same service, it'll impact other features of the booking service. If you have it as a separate service, you can run it on just two machines, or maybe only one Docker container, while the booking service can scale as per your bookings traffic. Hope that answers your questions :)
Great stuff. Thank you so much. This definitely gives me more than an idea of how to go about designing. QQ. do you have a write up for this like you have for Amazon-system-design.
Good video. Just one concern, how come room id will have quantity, I think room id represents one room and quantity should always be one. We should have another table which maps room id to booking id.
Thanks for sharing. I have questions: - Is the user service co-located with the Booking service? - I guess it's better to have a noun in the API design, e.g. booking instead of book. WDYT?
@Sandeep : Nice discussion. Apart from concerns(which people have already pointed out), that if we connect booking and new hotel posting with kafka, the system will become near real time for search rather than real time which I see can be improved by using SQL DB only to power the search. Also, Kafka has some retention period like 7 days , so powering elastic search from Kafka rather than the SQL DB will result in inaccuracies in listing the hotel. please let me know your thoughts on that.
I think the intention of using Elastic Search here is to support fuzzy word search, and Elastic Search is well suited for that since it is optimised for text search. We could also do it with a SQL DB, but the point here is scalability: when more than 1M concurrent users try to search at the same time, SQL will be overloaded. Also, the Kafka retention period has nothing to do with Elastic Search; it only governs how long Kafka retains the events produced.
Nice Video. I just have a question on inventory. So here we are keeping the hotel rooms inventory in ELK for powering search and also in the MySQL DB (available_room table) for booking service to interact with?
That is for book-keeping purposes, in case a user wants to see an older booking, which happened let's say 2 years ago. That record is not being edited, so you don't need any ACID compliance there.
@23.07, I guess we need to mention which isolation level we use to achieve this. I guess check-and-set or serializable isolation will help with the race condition. Serializable isolation can be implemented with or without locks. It is better to use serializable snapshot isolation (SSI) to get the job done without using locks.
Very nice work brother..it helped us in building our system :) gratitude
Glad that it helped :)
Which system/which company you used this in?
@@codeKarle a homestay app we are going to launch soon :) we are a startup.
That's great to know 🙂
@@codeKarle brother, do you have contacts you could share? I am from a non-IT background but deeply want to make it.
@@rohitkhurana7424 did your start up come out?
Watching in 2024, great video brother, clear high level design and dive into details of the specific parts. Very valuable and informative.
One of the most complete and perfect system designs tutorial I have seen. The breadth and detail is perfect as an HLD.
One of the best system design videos I have ever seen. Hats off to you and the mechanism you told for handling the case of reserving hotel bookings
This channel should have millions of subscribers.. you are highly underrated. Keep doing the good work
Thanks!!
Can't get there without your support.
Do share the channel with your friends/colleagues and hopefully we'll get there :)
Your videos are by far the best system design resources I have found on this platform. Thank you for these valuable materials!
Definitely the best system design video I have seen! very detailed explanation of each part
Thanks!! Glad that you liked it!
A wonderful playlist to binge watch :D More power to you Sandeep :)
Glad that you liked it :)
Great video. I have one question though. How are you generating/updating the available_rooms table? Where's that data coming from to begin with? Thanks!
That's probably from UI/App for hotel to hotel svc right?
Love it how your mic makes you sound like you're speaking from the moon. 😂 Thank you for the video.
thanks for the video!
we should also talk about how we use semaphores when we access shared resources in a microservice setup.
Your videos are very helpful! Please consider uploading a low level design series as well.
Very nice video. very thorough and provided all the necessary information in very easy to understand way. great work.
One of the great System design videos out on youtube. Keep up the good work..
excellent video. very informative and useful. Thanks and kudos to Sandeep.
Really great brother.
Thank you from the bottom of my heart ❤❤❤
Can you please create a video focusing only on the "search hotels". I am sure that will be interesting
I'm wondering how we can get quick search results with filters like available_quantity, date_range and room_classification.
Amazing video, thanks for sharing this. Can you please add a summary for all your design videos? I see you have added one for a few of them, and it helps a lot. Thanks again for all the great work 👍👍👍
Thanks!!
Yeah, that is work in progress. Other summaries would be live soon in a couple of days at www.codekarle.com/
Very nicely explained from high level to low level. Lots of concepts covered in the whole session.
Thanks for the lucid explanation. I have a couple of questions:
1. How do we maintain consistency in this design? If a room has been booked, it should immediately be blocked. Using Kafka would introduce a certain delay, during which another person can book the same room.
2. Can't we use MongoDB for the hotel database? A hotel has complex properties, like reviews, amenities etc. Don't you think a NoSQL database would better fit the use case here?
We can always have the hotel/room metadata in MongoDB.
Very good video and extremely simple explanation, connecting us to the internals of the real world hotel booking.
Very well explained. I liked the way everything is explained, starting from the overall architecture down to the smaller services. Looking forward to more such videos.
@CodeKarle I like this design, BUT:
- Using an RDBMS for the booking lock may not be appreciated by all interviewers: I proposed a very similar architecture in a Sr. Backend Engineer interview @ Uber and was rejected with the motivation "He used an RDBMS for performing the reservation lock". It would be interesting to see an alternate implementation. (Redis lock?)
- The service crash scenario is completely absent from the design, specifically a crash of the booking service after a payment has been successfully performed but before the booking status update.
- How do you keep the databases in the different datacenters aligned? A shared Kafka topic consumed by an alignment service?
Thank you for your work anyway!
Great video. One thing that I found missing in this design:
When the admin of a hotel adds new rooms, there is no flow that updates room availability in the DB of the booking service.
Please let me know if I'm missing anything here.
Your videos are really great for improving architectural concepts.
Thanks!
Glad that you liked them.
Do share these with your colleagues :)
Great explanation and elaboration on the design.
Thanks for the great explanation.
Question: Why do you suggest the approach of setting the boolean "isActive" to false, to delete a record rather than using the SQL delete operation?
It's generally not a good idea to delete something from the database. If you need that information for auditing, financial reporting, or just debugging, it'll be helpful to have it in our DB marked inactive. Additionally, delete queries do carry a risk of deleting something accidentally in case of bugs, so I'll always prefer to have a delete/isActive flag and filter the active records in get queries.
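A small sketch of the isActive approach, again with Python's built-in sqlite3 as a stand-in database. The hotels table, its columns, and the function names are assumptions made for illustration, not the schema from the video.

```python
import sqlite3

def make_hotel_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hotels ("
        " id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
        " is_active INTEGER NOT NULL DEFAULT 1)"
    )
    conn.executemany(
        "INSERT INTO hotels (id, name) VALUES (?, ?)",
        [(1, "Sea View"), (2, "Hill Top")],
    )
    conn.commit()
    return conn

def soft_delete(conn, hotel_id):
    # "Delete" by flipping the flag; the row stays for audits and debugging.
    conn.execute("UPDATE hotels SET is_active = 0 WHERE id = ?", (hotel_id,))
    conn.commit()

def active_hotels(conn):
    # Every normal read path filters on the flag.
    rows = conn.execute(
        "SELECT name FROM hotels WHERE is_active = 1 ORDER BY id"
    )
    return [name for (name,) in rows]

def all_hotels_for_audit(conn):
    # Audit and reporting queries can still see the inactive rows.
    rows = conn.execute("SELECT name, is_active FROM hotels ORDER BY id")
    return list(rows)
```

The trade-off is that every regular query has to remember the is_active filter, which is usually handled in one shared data-access layer.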
Thanks for great System design series. One doubt w.r.t. DB design here:
I assume search API will look like search(lat ,long, startdate, enddate, # of rooms) to return list of hotels in the given lat long area. Perhaps we can use Quad Tree type data structure here if we want to reduce latency or update booking ASAP.
(1) Available_rooms table has a field for available_quantity, Will it carry 365 entries for date column for say 1 year? If not what is more efficient way to store this info and perform fast search. Also don't you think PK should be room_id+date above for uniqueness. Please correct me if wrong.
and then (2) How are we connecting Available_Rooms with hotel.
@theghostwhowalk - I had exactly the same concern as "(2) How are we connecting Available_Rooms with hotel." - Typically there should be an association of hotel in the 'Booking' DB as well. Else how for a given date range would a list of hotels appear based on room availability.
@codekarle - Sandeep can you help out here ?
@@Legendary-Akshit Yes, booking table should be connected with hotel id rather than room id
Amazing walk through. Thank you making this video.
Great tutorial. The one thing I feel is missing: once the hotel owner adds his hotel to the service, there should be some Map-segment kind of step that maps the hotel to a place. I mean, this can't be a general search like Amazon search. If a user searches for a place, how are we going to show all the hotels in that place? For me, the segment approach which you showed in the Maps tutorial would work best here.
That's a great suggestion. One possible solution without breaking this design is to add some tags while adding the hotel and in that service we could also suggest some tags to the owner so that if someone searches hotels near Parvati river we get this suggestion.
I don't understand a few points here -
1. How is the available quantity only a function of room_id? If we are accepting a date range in POST /book, it should be accounted for in the schema.
2. Assuming we have that in the schema, we also need an efficient query that checks availability based on these three criteria.
You can probably utilize range based locks while reading the available state and creating the booking entry.
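One way such a date-range check could look, sketched with sqlite3. It assumes one available_rooms row per (room_id, date) with the composite primary key suggested elsewhere in these comments, and treats the stay as check-in inclusive, check-out exclusive. make_inventory and is_available are hypothetical names, and the range locking mentioned above is not shown.

```python
import sqlite3
from datetime import date

def make_inventory(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE available_rooms ("
        " room_id INTEGER, date TEXT, available_quantity INTEGER,"
        " PRIMARY KEY (room_id, date))"
    )
    conn.executemany("INSERT INTO available_rooms VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def is_available(conn, room_id, check_in, check_out, rooms_needed=1):
    """True if every night in [check_in, check_out) has enough rooms."""
    nights = (check_out - check_in).days
    if nights <= 0:
        return False
    # ISO date strings sort lexicographically, so a range scan on the
    # (room_id, date) primary key covers exactly the requested nights.
    count, min_qty = conn.execute(
        "SELECT COUNT(*), MIN(available_quantity) FROM available_rooms"
        " WHERE room_id = ? AND date >= ? AND date < ?",
        (room_id, check_in.isoformat(), check_out.isoformat()),
    ).fetchone()
    # A missing row for any night counts as "not available".
    return count == nights and min_qty >= rooms_needed
```

The same WHERE clause can drive the decrement inside the booking transaction, so the check and the update see the same set of rows.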
The content is awesome, really helpful to me. Please make videos on geographic information systems like- Google Earth, Street View, Voice recognition systems like- Siri, Alexa
Thank you for the insightful video. Very well delivered. A question though: where do you place users on the waiting list who asked for a fully booked hotel that suddenly becomes available? Do you need a queuing system, or can they be placed in the archival store?
i love you man. Your videos help me a lot in my prep. Keep it up!!
Great video! I think you missed a point where the hotel DB needs to be updated with the decremented number of available rooms when someone books a room via the booking service, so technically there should be a room-updates service talking to Kafka which decrements the available rooms.
Simple and Nice video. Loved it. However, it did not cover searching of hotels in a radius. It would have been helpful to see how that might have been done.
Well explained , to the point explanation , thanks a lot !
This is an awesome explanation.
Requesting you to please prepare one system design video for Flight Ticket booking system also.
That would be very similar to this one, just that in that scenario, instead of hotel being the primary entity, it would be a flight ticket. Rest most of the things would remain very similar to Airbnb, be it the search flow, be it the booking flow, be it the analytics.
One difference though would be the way we interact with the third parties(Airlines) and do reconciliation. We'll try and do a video on that, but that might take some time :)
Explained very well. Please make video on online pizza delivery app system design or online shopping system design.
Excellent tutorial!
very nicely explained. Thank you
Question: why can't we rely on our timer in payment service rather than getting notified from redis?
We could do that too. There is nothing wrong in that approach as well.
I would prefer the redis approach, reason: since we have an easy way to implement this and it's widely used & stable, then why bother building & maintaining a timer.
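To make the expiry idea concrete, here is a sketch of the hold-with-TTL bookkeeping in plain Python. It deliberately does not use a Redis client: the in-memory dictionary stands in for Redis keys with a TTL, and sweep_expired stands in for the expiry notification. The class and method names are invented for this example.

```python
import time

class ReservationHolds:
    """In-memory stand-in for 'booking key with a TTL' in Redis."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._expires_at = {}        # booking_id -> expiry timestamp

    def hold(self, booking_id):
        # Called when the booking is created in "reserved" state.
        self._expires_at[booking_id] = self._clock() + self._ttl

    def confirm(self, booking_id):
        """Payment succeeded: True only if the hold was still live."""
        expires = self._expires_at.pop(booking_id, None)
        return expires is not None and self._clock() < expires

    def sweep_expired(self):
        """Return bookings whose hold ran out, so rooms can be released."""
        now = self._clock()
        expired = [b for b, t in self._expires_at.items() if t <= now]
        for booking_id in expired:
            del self._expires_at[booking_id]
        return expired
```

Maintaining the sweep loop yourself is the work the Redis approach avoids, which is the point being made above.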
Nice explanation. I have one doubt here: the Hotel Service stores its data in one MySQL cluster, and the Booking Service stores its data in another MySQL cluster, so how do they stay in sync? I mean the Hotel Service data and the Booking Service data in the DBs.
Great work!! Please upload more videos
Thanks for article.
Live updates to the Hotel will get published to Kafka and fed into Elastic Search. But my question is: does Elastic Search only hold the live updates, or is it like a replica of the Hotel DB?
It's amazing how you have utilized Kafka to interact with different systems. Can you clarify what happens if Kafka fails? How are we handling that?
Then all hell breaks loose. It's assumed Kafka is highly available through partitioning and replication.
Thank you very much, that was enlightening, God bless you.
absolutely amazing explanation!! Gratitude
Thank you for this presentation. It is well explained.
If possible, request to add ticket booking app design like bookmyshow, please.
Your video is amazing and informative .Thanks :)
Eagerly waiting for your new videos .hope u will make it soon
Superb explanation .. thank you for this . One question do you have more explanation on Apache Kafka and Redis as separate topics ? It will be really helpful.
Awesome video, thanks for your sharing. I learned a lot from this video and others of yours
Hi Sandeep, Great video Content! Cant thank you enough! Could you please also explain the cancellation scenario
Great video. Just wanted to check at which point the Actual Room is assigned to a booking, eg 12A, 12B etc.
Maybe a bit too late to answer this, but this would happen after the acknowledgement for payment has been received.
Rarely the specific room number is assigned at Booking time. You just get a confirmation about a King/Twin/Double room and later when you arrive at the property you get to know 12A/12B/3A etc.
Very nice design presentation. Question: why do you use Kafka instead of just RPCs, say, between the hotel svc and booking svc?
To answer it in a generic way, lets say Service A is calling Service B for something.
Now if A needs the response of B to proceed further with what it was doing, or if they collectively form a transactional system, then we can use REST/RPC/any synchronous method. But wherever A does not need the response of B, and this is more of an info that B needs to act on, I would rather make it async and keep a queueing solution like Kafka in between.
Async RPC/async REST could be used, but that has its own set of complications, like who does the retry, what if server B is down, circuit breaking, sudden bursts in traffic, etc., and Kafka takes a lot of that complexity away.
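A toy illustration of that decoupling, using Python's queue.Queue as a stand-in for a Kafka topic. This is only a sketch of the shape of the interaction: real Kafka consumers track offsets rather than re-queueing messages, and onboard_hotel and drain are made-up names.

```python
import queue

hotel_events = queue.Queue()  # stand-in for a Kafka topic

def onboard_hotel(hotel):
    # Service A: finish its own work, publish the event, and return
    # without waiting for (or knowing about) any consumer.
    hotel_events.put({"type": "hotel_onboarded", "hotel": hotel})
    return "accepted"

def drain(handler):
    """Service B: process whatever has accumulated so far."""
    processed = 0
    for _ in range(hotel_events.qsize()):
        event = hotel_events.get()
        try:
            handler(event)
            processed += 1
        except Exception:
            # B is unhealthy: keep the event so a later drain retries it.
            hotel_events.put(event)
    return processed
```

Service A never sees B's failures or traffic bursts; events simply wait in the queue until B is able to process them.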
Best system design video i have watched till now, i was able to relate to most of the explanation given for choosing the right storage at right place. Precise and to the point.
Ultimate video..hats off to u
Thanks for the detailed explanation. I am a newbie to system design. Could you please clarify whether the Hotel Service, Search Service, Booking Service, etc. are different EC2 machines, and whether there would be multiple load balancers supporting each service to scale? Thanks in advance.
At a very high level, kind of yes.
They are different services, and by that it means they are different logical components, and it's better to assume that they would be running on different hardware.
But that's not a requirement. You can have two services running on one machine, and especially with the increased adoption of container-based technologies, it's more common to have multiple services running on the same machine.
By machine I here mean both VMs and physical hardware, and in case of EC2 you could still have two different EC2 VM instances running off the same physical hardware and multiple containers of your services running on each of those EC2 VMs.
For the Load Balancers, you could have multiple loadbalancers supporting each service, or you could choose to use a Service Mesh which is a more common approach in the container based world.
Fabulous video. Will this work for Ticketmaster or fandango. If not what type of modification we have to make to the architecture
Thanks!
The same design would work for systems like Ticketmaster, Fandango, Bookmyshow, etc. Just that instead of hotels here, the entity being sold would be a ticket, with certain changes in the data model. But architecturally, it'll be the same.
@@codeKarle thank you
Really nice video, detailed explanation. I'm wondering how we can get quick search results with filters like available_quantity, date_range and room_classification. Please make a video on this.
Glad I found your channel - the best System Design videos, well structured, organized and easy to follow.
Quick question - what if the MySQL master becomes a bottleneck, for example when there is a sporting event and a spike of bookings? The read path can be scaled by adding more slaves, but how about the write path?
Thanks!!
One obvious way would be the usual sharding, but that complicates the maintenance process.
I would rather split the servers in multiple groups or rings, with the intent that one group maintains the data for a few countries, and in case of planned/predictable spikes, you can throw more hardware in the groups where you expect more traffic for a couple of weeks when the traffic is expected to increase. This ring would not just be on the DB front, but also for the Application since you'll need to scale up the app servers as well most likely.
I am assuming here that we know what kind of spikes there will be, but most companies these days have that kind of prediction engine built, accounting for social activities, historic data, etc., so that's a fair assumption I believe.
Do share the channel with your colleagues, it helps :)
@@codeKarle Thanks Sandeep!
By splitting into multiple groups, we are still looking at having 1 master per group right? Otherwise if we add 2 or more masters we are essentially sharding, right? The question is whether can the spike within one country/city cause write traffic throughput increase to the point that a single master becomes a bottleneck and won't be able to keep up?
@@alexkorzo6129 Stumbled upon this comment. In my understanding sharding is the way to go. Probably on (country, city, hotel_id)
@@codeKarle Are you talking about consistent hashing here??
To optimize writes sharding+ replication of each shard is one way to support writes and reads. Sharding however is tricky on relational Dbs. Another approach would be active active replication across the cluster with leaderless quorum
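For anyone wondering what "sharding on hotel_id" means mechanically, here is a minimal hash-based router in Python. It is plain modulo hashing, not consistent hashing, and shard_for is a hypothetical helper rather than anything from the video.

```python
import hashlib

def shard_for(hotel_id, num_shards):
    """Map a hotel_id to a shard index in [0, num_shards)."""
    # Use a stable hash (not Python's built-in hash(), which is salted
    # per process for strings) so every app server picks the same shard.
    digest = hashlib.sha256(str(hotel_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

All writes for one hotel land on one master, so a booking transaction never spans shards. The weakness of plain modulo is that changing num_shards remaps most hotels, which is the problem consistent hashing is meant to reduce.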
Very nice work @codekarle. Only thing to improve is that your voice is not very audible. There is some noise.
This is really top notch work. Decent depth as well. Are you SDE3 or Principal?
Very well explained.
For the issue where the TTL expires before the payment success message arrives, why can't we set the TTL long enough (more than the Payment service SLA) for us to be absolutely sure the payment will never succeed later?
The con is definitely that the room stays blocked for longer than it should, but we definitely reduce some complexity there.
Kudos for the effort
Thank you very much for sharing your knowledge. I have a question regarding search: hotel inventory is associated with time. For example, this particular hotel does not have room now but will have room(s) in the next 2 days. How do you store data in your ElasticSearch to allow searching for rooms given the checkin, checkout time?
Great!! well explained.Can you post video on Flight Reservation System?
Golden video❤ , Great
What if a customer is booking a hotel and at the same time the hotel delists the room? Since the databases are different, it would definitely create an inconsistency. How are you planning to handle that? Will you be confirming the room against both databases?
Users generally spend 5-10 minutes making the payment; if the hotel is delisted in between, how is that handled?
The booking table will have an entry with reserved status for that room_id, so the hotel staff won't be allowed to delist it.
Thanks Sandeep ! One request, can you create one design tutorial on Rate limiting !
Every video is very helpful.
This is super great. Thanks.
Excellent design
Really nice work! Bravoooo!
very clean explanation, thank you!
Great video, thank you!
Just a question -
1. Why don't you use Redis for the other services too?
2. Why don't you use Cassandra DB for the Hotel service too?
3. The arrow from the MySQL cluster to the archival service is strange to me. How does MySQL send messages? Wouldn't it be better for the booking service, once it is done booking, to send a message / HTTP request to the archival service?
4. The archived service has only one purpose, just to insert data to Cassandra. Isn't it overhead and the logic should be in the booking service?
Thank you again :)
1. Why don't you use Redis for the other services too? - Usually I prefer to do this when the read-to-write ratio is very high. If there are a lot of writes then there is not much benefit from caching. Usually these decisions would be data driven, based on real access patterns.
2. Why don't you use Cassandra DB for the Hotel service too? - Cassandra could be used here, but I believe on this kind of dataset you'll have a lot of random queries happening: fetch a hotel by id, fetch hotels in a region, fetch hotels managed by a user, etc. This would mainly be for the many internal tools that every company has, which we did not go over in the video, but those tools are always there. That query pattern is not optimal for Cassandra, and we'd need a lot of data duplication to build those tools, so MySQL does a better job there.
3. The arrow from the MySQL cluster to the archival service is strange to me. How does MySQL send messages? Wouldn't it be better for the booking service, once it is done booking, to send a message / HTTP request to the archival service? - Those arrows are just to call out reads and writes. I know it's a bit cluttered there, but MySQL is not sending any messages.
Archival usually should be a bulk batch job, and not done for each booking, and it makes it more optimal, since you can run this at low load times though there is nothing stopping from doing it the way you are suggesting as well.
4. The archival service has only one purpose, just to insert data into Cassandra. Isn't it overhead, and shouldn't the logic be in the booking service? - It could be the same service in a small-scale system, but at scale this would cause problems. Think of it this way: when archival begins, it would probably shoot up to near 100% CPU utilization for a few seconds while it is running, unless you are throttling. If that's in the same service, it'll impact other features of the booking service. If you have it as a separate service, you can run it on just two machines, or maybe one Docker container only, while the booking service scales with your bookings traffic.
Hope that answers your questions :)
Amazing video
Thanks for the detailed content
Maja agya bhai, appreciated
Got a brief insight! Ty🌟
Great stuff. Thank you so much. This definitely gives me more than an idea of how to go about designing. QQ. do you have a write up for this like you have for Amazon-system-design.
Good video. Just one concern, how come room id will have quantity, I think room id represents one room and quantity should always be one. We should have another table which maps room id to booking id.
Nice explanation!
Thanks for sharing. I have questions:
- Is the user service co-located with the Booking service?
- I guess better to have noun in the API design, e.g. booking instead of book. WDYT?
@Sandeep : Nice discussion. Apart from concerns(which people have already pointed out), that if we connect booking and new hotel posting with kafka, the system will become near real time for search rather than real time which I see can be improved by using SQL DB only to power the search. Also, Kafka has some retention period like 7 days , so powering elastic search from Kafka rather than the SQL DB will result in inaccuracies in listing the hotel. please let me know your thoughts on that.
I think the intention of using Elastic Search here is to support fuzzy word search; Elastic Search is well suited to text search because it is optimised for it. We could also do it with a SQL DB, but the point here is scalability: when more than 1M concurrent users try to search at the same time, SQL will be overloaded. Also, the Kafka retention period has nothing to do with Elastic Search; it only controls how long Kafka retains the events that were produced.
Nice Video. I just have a question on inventory.
So here we are keeping the hotel rooms inventory in ELK for powering search and also in the MySQL DB (available_room table) for booking service to interact with?
Great video! Can anyone explain to me why we need Cassandra to store the bookings that have already happened? What is reading from it?
That is for book-keeping purposes, and in case a user wants to see an older booking which happened, let's say, 2 years ago. That booking is no longer being edited, so you don't need any ACID compliance there.
@23.07, I guess we need to mention which isolation level we use to achieve this. I guess check-and-set or serializable isolation will help with the race condition. Serializable isolation can be implemented with or without locks. It is better to use serializable snapshot isolation (SSI) to get the job done without using locks.
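The check-and-set option can be sketched as a single conditional UPDATE, shown here with Python's sqlite3 as a stand-in database. This illustrates check-and-set only, not SSI itself; the table follows the available_rooms schema discussed in these comments, and make_rooms_db and try_book are made-up names.

```python
import sqlite3

def make_rooms_db(quantity):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE available_rooms ("
        " room_id INTEGER, date TEXT, available_quantity INTEGER,"
        " PRIMARY KEY (room_id, date))"
    )
    conn.execute(
        "INSERT INTO available_rooms VALUES (1, '2024-01-10', ?)", (quantity,)
    )
    conn.commit()
    return conn

def try_book(conn, room_id, date):
    """Atomically decrement the quantity only if a room is still free."""
    # The check (available_quantity > 0) and the set (quantity - 1) are
    # one statement, so two racing requests cannot both take the last room.
    cur = conn.execute(
        "UPDATE available_rooms"
        " SET available_quantity = available_quantity - 1"
        " WHERE room_id = ? AND date = ? AND available_quantity > 0",
        (room_id, date),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means someone else got it
```

No explicit lock is held between reading and writing; the loser of the race simply sees that zero rows were updated.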
If you could add the most important api calls for each service in the design diagram , it would be helpful especially for interviews. Thank you
Great suggestion! We'll add that in the future ones.