Thank you for this lecture . Further, I have a query regarding design Problem Statement:- I have subcategories defined under Vehicle Class as :- Bike ,Car, truck, bus Further, there are different variants under each four categories Bike :- sport, standard Car :- sport, standard, electric Truck :- Mini Truck , Power Truck Bus:- Mini Bus, AC Bus I have to calculate the price based on variants. For example Mini Bus price. Question:- How should I define the classes ? 1. Keep only vehicle as a class . 2. Keep Vehicle as a base class and make bike, car, truck , bus as subclasses and they should inherit the base class. 3. make sports bike as class which will inherit bike class which will further inherit vehicle class
Thank you for the video, it was very helpful! Great idea using the CDN. One point I would like to mention: you don't have to send a heartbeat. WebSocket already has events for connection and disconnection, so a heartbeat might be wasteful.
Thanks so much for your time and effort.This section was missing from youtube or sometime they went into too much details which led to a disinterest in the topic. I don't know how, but somehow you are able to make these interesting and just the right amount of depth is explained. Really helpful and much appreciated.
Thanks for the video. As you said, does WhatsApp store each user along with the last seen time in a table? For millions of users? How does the chat history work? How does WhatsApp store the user's contacts in the table?
Very good video, thank you. qq! what if client B is connected to another server? How does server A (that client A is connected) communicate to all other servers and check to see if client B has any online process there?
How would clients connected to different message servers interact? Say A is connected to server 1 and B is connected to server 2. In that case, how would server 1 and server 2 interact?
I am not sure whether WhatsApp uses the PID + data pipe combination. The same could be simplified by using separate queues in a messaging server like RabbitMQ, which can be scaled horizontally, with persistent messaging. So, whenever a message is sent from A to B, it could be put in B's queue. When B connects, the message would automatically be read by B. Once B acknowledges, the message would be removed from the queue. But as you mentioned, there could be a limit on how many connections a server can handle. Likewise, when a message is sent to a group, it can be put in the queue of every user in the group. On encryption: with asymmetric keys, we encrypt with the public key and the receiver decrypts with the private key. So would we end up storing the public keys of all contacts in the local store?
Great video, but I did not understand the media part properly. Can you please explain the justification for using the CDN for media? I think CDN will be useful only if the fraction of shared messages is very high. According to me CDN is like a cache, which will be useful if the number of items stored
Hey, you said at 10:03 that the server never knows the connection to the client and will never try to connect to the client. Let us assume 2 situations. 1. When the client is offline and then goes online, a connection will be established with the message server. What happens to the connection when the client's internet is on but the client is not doing anything in WhatsApp? Will the connection get closed? In that case, how do we receive messages even when we are not actually using the app? If not, is WhatsApp keeping connections live and running for all clients with a live internet connection, even when the app is not being used? Is this client connection information stored somewhere on the message server?
Hi, 2 questions: 1) You said each process has 1 queue. WhatsApp typically has 1M connections per server, so do you think it's feasible to have that many queues? 2) What if the server goes down before sending the message? In that case the queue is lost too, so our messages are lost as well. What are your thoughts?
Awesome one! Thanks! your videos are great! a quick question on this lecture. Say for an instance, if client A resides in a separate country (say US) from client B (say India), to which CDN A will upload the image in this scenario? If it's on US, how will the client B know which CDN url to connect to?
There are two ways you can do it. First way: when a group message is sent by user A, the server process responsible for user A looks up a group-to-users mapping table (gid -> [user-a, user-b, user-c]). Note: you can cache it for faster lookup, and this is not the user-to-pid table shown in the video. Once process A has all the users of the group, it gets the PID for each user and sends the message to the respective user's process via the queue/pipe. If a user is not online, the messages are persisted in the DB. When the offline users come back online, their respective processes (connections) read all the pending messages from the DB and deliver them. Second way: you can have a separate set of dedicated workers that are responsible only for dispensing group messages. When user A sends a group message, process A hands the message over to the dedicated group message workers, and these workers in turn dispense it to all the online members and save it for all the offline group members. If you like, please subscribe and share this video, Thanks
Thanks for the reply. But the counter-question I was asked was: saving the details for all offline users for all messages in the DB will create a lot of rows. Is that still feasible? Can a cache be used for it?
You can use a cache wherever you need to access the data much faster. The DB will create many rows, but that's the best way to persist messages for a long time. Alternatively, you can keep messages in the cache for some time (24 hours), and if the user doesn't come online by then, persist them in the DB (NoSQL, indexed by user id). That way you will only be saving messages for users who don't come online for 24 hours.
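The first way described above can be sketched roughly as follows. This is only an illustration, not WhatsApp's actual code: plain dicts and lists stand in for the group table, the per-user queues, and the DB, and the names `fan_out` and `on_connect` are made up for this example.

```python
def fan_out(group_id, sender, text, groups, online_queues, pending_db):
    """Deliver a group message to online members; persist it for offline ones."""
    for uid in groups[group_id]:          # gid -> [user-a, user-b, ...] lookup
        if uid == sender:
            continue                      # the sender does not get their own message back
        msg = {"from": sender, "group": group_id, "text": text}
        if uid in online_queues:          # user has a live process/queue
            online_queues[uid].append(msg)
        else:                             # offline: keep it until they reconnect
            pending_db.setdefault(uid, []).append(msg)


def on_connect(uid, online_queues, pending_db):
    """When a user comes back online, move their pending messages into a fresh queue."""
    online_queues[uid] = pending_db.pop(uid, [])
    return online_queues[uid]


groups = {"g1": ["user-a", "user-b", "user-c"]}
online_queues = {"user-a": [], "user-b": []}   # user-c is offline
pending_db = {}
fan_out("g1", "user-a", "hello group", groups, online_queues, pending_db)
```

Here `user-b` gets the message in its queue straight away, while `user-c` only receives it after `on_connect("user-c", ...)` runs.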
19:40 So when a new process is spawned when user B is back online, and since you are just storing uid and msg in the DB, and when that process reads msgs for B in the table, how does B know who is the sender of the msg? Shouldn't you be storing the sender_uid too in the db? Because you are sending the msg to B, but since B doesn't have the sender information, in which chat will it show the message?
I think the message itself will be stored with the sender_id on it. The app will parse it into the user and the message. In fact, the message here will be an array with message, id, timestamp, group_name, etc. IMO
Thanks for this education. It is really appreciated. Just want to ask one thing: does WhatsApp use WebSocket to pass messages, since it is not using HTTP for text messages?
I believe sending and receiving text messages when clients are offline is done using a lightweight connection. For the other use case, media files, the HTTPS protocol is used. What prompted this decision, given that the only difference is the payload?
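To make the point about storing the sender concrete, here is a small sketch of what a stored message record could look like. The field names are assumptions for illustration only, not the schema from the video or from WhatsApp.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class StoredMessage:
    # All field names are hypothetical; the point is that the sender travels with the message.
    message_id: str
    sender_uid: str
    recipient_uid: str
    body: str
    sent_at: float                   # Unix timestamp
    group_name: Optional[str] = None


def to_row(msg: StoredMessage) -> str:
    """Serialize the record as it might be written to the pending-messages table."""
    return json.dumps(asdict(msg))


def from_row(row: str) -> StoredMessage:
    """Rebuild the record when the recipient's process reads it back."""
    return StoredMessage(**json.loads(row))


row = to_row(StoredMessage("m1", "user-a", "user-b", "hi", 1700000000.0))
restored = from_row(row)
```

Because `sender_uid` is part of the row, B's client can place the message in the right chat when it is delivered later.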
Somewhat I think the bigger picture is still missing here, user base is definitely useful, then jump into Last seen directly?? What about other important facts like do we need to store all the messages, what about geo distribution, etc..
When you say 'a client connects to the messaging server', what do you mean by connecting to the server? Is it some state or a WebSocket? What is the protocol or tech used here?
I thought there would be many Message Servers (MS), each handling a number of user sockets. Threads listen on the sockets and forward to the destination MS, where a thread picks up the message and forwards it to the destination socket. 1) Each MS knows userId -> socket. 2) The proxy knows user -> MS. 3) each socket has listening on
I don't think WhatsApp is implemented in this way. The key to success of this App is the High Performance Messaging Bus Architecture built by the founders at Yahoo.
Same doubt I was having and was reading through the comments section to see if others feel the same or I am too novice to understand details. Any explanation is appreciated. Thanks!!
Create a new queue for each user? What is that queue? Is it something like SQS or RabbitMQ? Or is it an in-memory queue? Creating a queue for each user sounds insane and impractical.
How is the process for a particular user identified? That is, once a user establishes a connection, how does the process mapping work, and how does it send to another process?
Very good video, just a point that needs more attention is at 14:33 when you talk about threads and process inside the message service. Actually this is not strictly true, actually they use green threads inside the BEAM machine which are totally different and much more scalable than a real CPU thread per connection. For more info I would recommend the reading on en.wikipedia.org/wiki/Green_threads and en.wikipedia.org/wiki/C10k_problem
Hey, thanks for this video. Just need one clarification on the tick marks (sent, delivered, read). Are we saving this information in the database? And one more question: what is the mechanism in the messaging server for finding the right client among billions of users?
Thanks Narenra for the very nice explanation. I have one doubt: will creating a separate process/thread for every active user be a scalable design? At any time, millions of users can be online, so creating millions of threads/processes can cause scalability issues, like high memory/CPU consumption. Please provide your input.
Exactly, the best solution is to actually maintain a single thread and use RAM to store sockets. Basically a giant-ass in-memory data structure which keeps for each user a socket and a small queue. His understanding of "queue" seemed rather muddled, which was disguised as some magic abstraction. Also, the number of websocket connections on a server is limited to the number of unused ports above port 1024, isn't it? That's incredibly limiting.
Can you please help in giving me high-level architecture for the following functional requirements? FR (Notification system): - send an email, SMS, in-app msgs - support all devices - mobile app, web app, tab, etc. - Get promo codes/audio confirmation messages (OTP on a call), media content (mp3, etc.) from exiting service/system, and use those promo codes to send. - expiration of a notification - recurring in nature - schedule a message Basically, I'm looking for a system design for the recurring scheduler.
I think the process/thread handling A will first try to find out whether B's messaging server is the local node or a remote node, based on B's id and the last used location info in the DB, and handle the message accordingly. If B is on a remote node, it will first send the message to that node.
How does the lightweight process that handles the message queue for a particular client keep its queue size within a certain bound? E.g. if some client sends 100s of messages in a burst, the process handling that user's client will be overwhelmed. Will those be offloaded to the DB as well? Or will it stop accepting new messages until the queue is empty? One more question on a similar note: what happens if a client has sent a message and it is in the queue, but the server holding that queue/process dies? In that case, how will clients know to retry sending those messages on reconnect?
Hello Naren, thanks a ton for the video. I have one question: considering the user base of WhatsApp (1B+), I don't think a single load balancer can handle the load of so many connections. There must be multiple load balancers, for example one for each region. Could you please throw some light on this? Also, it would really help if you could discuss DB sizing more. You mentioned WhatsApp manages to serve 10,000 users per messaging server node, so there would be at least 100,000 messaging servers to handle 1 billion live users. What would be the design of the DB to handle the load of 100,000 servers?
Actually, WhatsApp successfully handles up to 2 million connections per server (blog.whatsapp.com/196/1-million-is-so-2011), so please recalculate the number of servers needed. Also, when we say there are 1B live users, we can optimize the connections problem by using GCM/APN to establish connections (lazy connect) only when there are messages at the server!!
Yes, do DNS round robin to the first-level load balancer and then to a second-level load balancer. Make sure the first-level load balancer is online 100% of the time, because of how round robin works. Google also uses DE-CIX, for example, to pipe requests directly to their servers. I am 14 btw so I have no idea about production
Just one question: according to this architecture you are only storing messages in the DB when the other user is not connected, right? Then how are you going to keep track of the history, or should I say the past messages sent or received by a particular user?
1. Can someone tell me if using one thread per user is feasible? Normally we will have a few thousand TCP connections open. A server will process one request at a time: it gets the request, looks up the receiver's connection details, and forwards it. I don't see the need for multiple threads here. 2. Also, why do we need a queue for each receiver? If the user is online, send it right away; otherwise save it in the DB and finish the request.
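The local-versus-remote decision described above could look something like this. It is a minimal sketch under assumptions: `registry` stands in for the "last used location" lookup (uid -> server name), and the dict-based queues, outbox, and DB are placeholders made up for the example.

```python
def route(sender_server, recipient_uid, text, registry, local_queues, outbox, pending_db):
    """Decide where a message goes based on which server the recipient is connected to."""
    target = registry.get(recipient_uid)        # uid -> server the user is connected to
    if target is None:
        # Recipient is offline everywhere: persist for later delivery.
        pending_db.setdefault(recipient_uid, []).append(text)
        return "stored"
    if target == sender_server:
        # Same node: hand over through the in-memory queue.
        local_queues.setdefault(recipient_uid, []).append(text)
        return "local"
    # Different node: forward to that server (an RPC or a message queue in practice).
    outbox.setdefault(target, []).append((recipient_uid, text))
    return "forwarded"


registry = {"user-b": "server-1", "user-c": "server-2"}
local_queues, outbox, pending_db = {}, {}, {}
result = route("server-1", "user-c", "hi", registry, local_queues, outbox, pending_db)
```

In this run `result` is `"forwarded"`, because `user-c` is registered on `server-2` while the sender is on `server-1`.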
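The recalculation asked for here is simple arithmetic. The sketch below just redoes it with the two per-server figures quoted in this thread (10,000 and 2 million), assuming all 1 billion users are connected at once.

```python
def servers_needed(live_users: int, connections_per_server: int) -> int:
    """Ceiling division: the smallest number of servers that can hold all connections."""
    return -(-live_users // connections_per_server)


LIVE_USERS = 1_000_000_000

at_10k = servers_needed(LIVE_USERS, 10_000)      # figure quoted from the video
at_2m = servers_needed(LIVE_USERS, 2_000_000)    # figure quoted from the WhatsApp blog post

print(at_10k, at_2m)  # 100000 500
```

So the estimate drops from 100,000 servers to 500, before accounting for redundancy or for users who are not connected at the same moment.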
Amazing and very simple explanation for such a complex system. I have a question about managing unique ID for every message for million different users. How could they do that? Thanks 😊
You didn't talk about how the encryption is used. If it's symmetric, anyone can decrypt easily if the key is known. If it's asymmetric, how many public and private keys are created, or are the same private and public keys used for all users?
I don't understand what exactly the connection object is. Can you go into more detail? (I know you look it up in the hash userid : connection object.) How does it help you send the request to the target user? And how is its lifecycle handled (e.g. when a user goes offline, or when the system is down)?
Suppose user A sends 10 text and 10 video messages to a group at different times, and user B of that group comes online after a certain period. I would like to know how all the text and media messages are retrieved to user B's phone. Does it keep the same sequential order in which user A posted the messages to the server databases? Or does B retrieve text messages first and media content later, i.e. does it give any priority to text or video/picture messages already stored in the databases?
Awesome video. Few points :-
i) We can use indexing on UID for faster retrievals. O(n) to O(log(n))
ii) Spawning a thread for every user is not efficient. Instead, we can use a dynamic threadpool.
iii) Similarly, having a queue per user is not efficient. Instead, we can have a global queue with object (message, action like send/receive, uid)
iv) Since you are using DB for storing message in case user is not online, we need to implement a disaster recovery mechanism i.e. replication.
v) We can also implement blocked contacts by storing list of blocked UIDs for each user in the DB & we can drop such messages in web server.
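Points ii) and iii) can be sketched together as below. This is only a rough illustration of the idea (one shared queue of events, drained by a bounded pool of workers), not how WhatsApp does it; the `Event` fields follow the (message, action, uid) tuple suggested above, and everything else is made up for the example. Note that `ThreadPoolExecutor` starts worker threads on demand up to `max_workers`, which is one simple reading of a "dynamic" pool.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Event:
    uid: str
    action: str      # "send" or "receive"
    message: str


def handle(event: Event, delivered: list) -> None:
    """Stand-in for real delivery work; records what was processed."""
    delivered.append((event.uid, event.action, event.message))


def drain(events: "queue.Queue[Event]", pool: ThreadPoolExecutor, delivered: list) -> None:
    """Pull everything off the single global queue and hand it to the pool."""
    futures = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        futures.append(pool.submit(handle, event, delivered))
    for f in futures:
        f.result()   # wait for completion and surface any worker exception


events: "queue.Queue[Event]" = queue.Queue()
for i in range(5):
    events.put(Event(uid=f"user-{i}", action="send", message="hi"))

delivered: list = []
with ThreadPoolExecutor(max_workers=2) as pool:   # 2 workers serve 5 users
    drain(events, pool, delivered)
```

Five users are served by two worker threads here, so the thread count no longer grows with the number of users. The trade-off is that ordering across workers is not guaranteed.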
Actually, WhatsApp and other messaging apps use very customized protocols and language features to cater to specific needs, such as Erlang, which is extremely useful where millions of lightweight processes might need to be started and they don't need to share data with each other (everything is immutable). Also, their XMPP-based messaging uses an extremely compressed payload compared to normal XML data transfer. These are implementation details, but they are what eventually makes the design itself.
You are pretty thorough about the subject and it is reflected in the way you teach. A lot of people are usually confused as they have a vague idea about what to speak. This shows that you put effort into preparing for the delivery of the content. Thank you so much for your efforts to make these videos. I am learning a lot.
Clear and better than other WhatsApp design videos on YouTube. Thanks!
the guy has done amazingly! what I don't fully understand is the pid uid approach. All the messaging servers sit behind ALB. If a user session is bound with a pid in a specific server, it will lower the efficiency of load balancing, cuz the user will stay with that server during the conversation session. It's doable, but not ideal, IMO.
That's an excellent explanation. That marks the start of my system design interview preparation. 24-05-2024
I love this content. Just for people who want to read, I got a couple of suggestions here:
I personally would use the heartbeat to keep the process alive. Once the heartbeat stops, or messages fail to be delivered, the process responsible for a client shuts down and writes the shutdown timestamp to the DB, and that is the last seen time. This means far fewer writes to the DB.
For asymmetric encryption one would use the public/private key pairs to encrypt a common secret key (that is how it is usually used in SSL and other protocols), since asymmetric encryption is very expensive in both time and computing resources. You don't want to spend most of your processing power on encryption primitives (even less so on a cellphone).
How do you recognize when heartbeat has stopped?
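One common way to recognize a stopped heartbeat is a timeout: remember when each client last sent one, and treat the client as gone if nothing arrives within some window. A minimal sketch, assuming a 30-second window and using in-memory dicts in place of the DB (the class and method names are invented for this example):

```python
class PresenceTracker:
    """Tracks heartbeats in memory and records last seen only when a client times out."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s
        self.last_beat = {}    # uid -> time of most recent heartbeat (online users only)
        self.last_seen = {}    # uid -> last seen time; stands in for the DB column

    def heartbeat(self, uid: str, now: float) -> None:
        self.last_beat[uid] = now            # no DB write while the user stays online

    def sweep(self, now: float) -> list:
        """Expire users whose heartbeat is older than the timeout; one write per expiry."""
        expired = [uid for uid, t in self.last_beat.items() if now - t > self.timeout_s]
        for uid in expired:
            self.last_seen[uid] = self.last_beat.pop(uid)
        return expired


tracker = PresenceTracker(timeout_s=30.0)
tracker.heartbeat("user-a", now=0.0)
tracker.heartbeat("user-b", now=20.0)
gone = tracker.sweep(now=40.0)   # user-a has been silent for 40s, user-b for 20s
```

After the sweep, `gone` contains only `user-a`, and its last seen time is the time of its final heartbeat. The DB is written once per disconnect rather than once per heartbeat, which matches the suggestion above.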
Well firstly, this video is really good. One part which is missing is you did not consider multiple messaging server instances running.
This actually raises a very important question.
Let's say one server maintains a persistent connection; what if it goes down for some reason?
Nice video btw. However, there are questions that I wanted to ask. When A sends a message to B and B is yet to connect to the server, your explanation said that the message is kept in the global DB until the PID of B is created on the server. When the PID of B is created, the server does a full lookup in the DB to find out which messages have not been received by B. My questions are:
1. Does the server fire a trigger to look up the global database whenever a new PID entry is formed?
2. If the server fires a lookup trigger against the global DB, assuming there may be a million pending messages in it, can looking up without updating or indexing the global DB scale?
3. What is the main core/pattern of every real-life system? Will multiple queries caused by the heartbeat slow down performance?
Steven Candra the design listed in the video is a poor design. I cannot believe he is putting the pid in the database. The thread id is transient.
@@meiliangwhut I don't think he is saying to store the pid in the DB. He is storing the uid of user B and the message for user B. When user B connects and reports that it is connected, we have to find all undelivered messages and send them to user B. Also, even though the pid is transient, it will stay alive while the connection is on. You have to know which messaging server the receiver is connected to
Did not explain the case where sender and receiver clients are connected to two separate "message servers". In that case the message can't be sent to the in-memory queue. A dedicated messaging queue like SQS can be used to solve this.
This is a great explanation 💖💖
Few Questions which I am pondering right now
1. How does the long-running thread/process work in the messaging server? Does it mean that the connection from client to server is not stateless but stateful, and every time it finds the same process? Does that mean that the process is running on a single central machine and not in horizontally scaled pods?
2. Is it the client's responsibility to poll for messages after some time interval, or is it the server's job to push messages to the client? If it's the latter, does that mean that billions of clients (devices) are making a stateful connection to the server? Is the protocol being used here HTTP?
For last seen, instead of a continuous heartbeat, the user can update the column whenever he closes the app and set the column to null whenever he comes online.
Here null specifies that the user is online...
Thanks for the recommendation, YT. I had been searching for videos like yours for a few months without anything particular in mind, but now I'm learning specifically from your videos.
Thanks, that was pretty great. Would love to see an expansion of how encryption would work for the system and the tradeoffs you'd get between symmetric/asymmetric encryption.
Considered :) and Thanks
Excellent brother.. before seeing your videos I never thought about system design. Now I am excited to understand the system design of every application. Thanks for everything!
Great explanation. A question on the last seen design: why do you need a heartbeat sent by the user at regular intervals? While the user is connected to the server, last seen is the current time, and once disconnected you just store that time as last seen. So the current time minus that timestamp gives how long ago the user was last seen. Once the user reconnects to the server, last seen is again the current time.
It would be impossible to use system threads for an application like Whatsapp so it is definitely Erlang's actors
@@shaaradpandey5546 It won't scale.
@@shaaradpandey5546 spring? no. It depends on how many users you need to support. forget about spring, that's 90s tech
Even actors are threads, all you gotta look at is whether they are scaling across servers or on the same server.
Yes... He is explaining the Erlang OTP framework in a simple way... Only super lightweight threading mechanisms will scale like this... WhatsApp itself uses Erlang OTP, I guess.
😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊
Personally, I think: (1) The message receiver and message delivery should be decoupled into 2 separate services, and they could be fully stateless. They can be connected with a message queue service or sit behind service discovery directly. Putting them in a single executable and passing messages between processes is feasible for a very small user base, but it's not scalable at all. Besides, it makes the "Mess SERVER" hard to test. (2) If you have an abstract concept of "channel" or "session" here, then it's easy to forward a message with RPC or write it into the DB. (3) Real-time messaging is time sensitive and has a high request rate. It's not a very good idea to write every message into the DB. Instead, I would say we should only write messages for offline users, into a two-level DB (memory + persistent).
What did I just hear? New thread/process to handle each peer?
We are talking about billions of peers together here!
What is your idea ? Any standard http server handles request processing creating multiple threads unless you are using CGI.
When I saw this video, I had doubts that WhatsApp’s messaging server can handle 10 million concurrent connections. Those were arose on the basis that setup would hit limit of 65k because all the connections to the back end server would be from the load balancer, that means source ip, destination ip, destination port would be same for each connection. Then only variable would be source port. So load balancer won’t be able to create more than 65k connections. But I learnt that this can be solved by making backend server listen on multiple ports due to which we can have 64k * (no. of ports on which backend is listening), which can easily overcome 10 million connections limits (given that throughput of application server is high enough).
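The arithmetic behind this comment can be checked quickly. The sketch below assumes the theoretical maximum of 65,535 source ports per (source IP, destination IP, destination port) combination; the usable ephemeral range on a real machine is smaller, so the real number of listening ports would be somewhat higher.

```python
import math

MAX_SOURCE_PORTS = 65_535   # upper bound on source ports for one (src IP, dst IP, dst port)


def listening_ports_needed(target_connections: int,
                           ports_per_tuple: int = MAX_SOURCE_PORTS) -> int:
    """How many backend listening ports are needed so one load balancer IP can reach the target."""
    return math.ceil(target_connections / ports_per_tuple)


ports = listening_ports_needed(10_000_000)
capacity = ports * MAX_SOURCE_PORTS
print(ports, capacity)  # 153 10026855
```

So roughly 153 listening ports on the backend would be enough for a single load balancer address to open 10 million connections, which is consistent with the "64k * number of ports" reasoning above. Adding more source IPs on the load balancer is another way to widen the same limit.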
Hey, I think some example of a duplex protocol should be mentioned. Just saying "TCP" does not give the complete picture. Mostly the XMPP protocol (built on top of TCP) is used, and not, for example, HTTP.
Very educational. The other very important feature that wasn't addressed here is 'group chat'. I wonder how WhatsApp handles it. If you could shed light on that as well, that would be awesome. Other than that, it's simply awesome. Keep up the great work.
It would be very similar to the one-to-one chat logic, except that WhatsApp stores the users in a group. So, when a message is sent to a group, WhatsApp takes the users in the group and sends it to the corresponding queues through the PIDs. If a user is offline, it is stored in the DB against the user ID.
Hi @Tech Dummies, thanks for this video. But I have one doubt. As you mentioned at 15:47 in this video, for every active client there would be a corresponding thread/process and a queue. My point is: isn't that taking up unnecessary memory just for being active?
Because WhatsApp has billions of users. And if millions are just active (they are not messaging or calling), a process and queue will be created for them too.
Thank You !
I came here to ask the same question. Anyway, do you have any answer/explanation for this, even though you asked the question two years ago? Thanks!
One thing I don't understand is how many threads will be active in the service instances and clarity around thread management.
Also, let's say we have millions of connections on one server: will there be calls to other servers for contacts whose connections are on another server? (If so, this could be the reason for the delay we sometimes experience when contacting some remote contacts.)
Rest is superb!
Thank you for this lecture .
Further, I have a query regarding design
Problem Statement:-
I have subcategories defined under Vehicle Class as :- Bike ,Car, truck, bus
Further, there are different variants under each four categories
Bike :- sport, standard
Car :- sport, standard, electric
Truck :- Mini Truck , Power Truck
Bus:- Mini Bus, AC Bus
I have to calculate the price based on variants. For example Mini Bus price.
Question:- How should I define the classes ?
1. Keep only vehicle as a class .
2. Keep Vehicle as a base class and make bike, car, truck , bus as subclasses and they should inherit the base class.
3. make sports bike as class which will inherit bike class which will further inherit vehicle class
Location sharing is another important use case, but it can be implemented similarly to a text message, since here we send the latitude-longitude data.
Thank you for the video, it was very helpful!
Great Idea using the CDN.
One point that I would like to mention: you don't have to send a heartbeat. With WebSocket there are events, for example connect and disconnect, so a heartbeat might be wasteful.
So, when a user is disconnected, the timestamp at that point is sent to the small DB holding the user's active status, right?
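A minimal sketch of that idea, assuming the server gets connect/disconnect callbacks from whatever WebSocket library it uses. `PresenceStore` and its method names are hypothetical stand-ins for the small status DB, not a real API.

```python
import time
from typing import Callable, Dict, Optional, Set


class PresenceStore:
    """Hypothetical in-memory stand-in for the small DB holding user status."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._online: Set[str] = set()
        self._last_seen: Dict[str, float] = {}

    def on_connect(self, uid: str) -> None:
        # Called from the connection-opened event.
        self._online.add(uid)

    def on_disconnect(self, uid: str) -> None:
        # Called from the connection-closed event: one write per session,
        # instead of one write per heartbeat.
        self._online.discard(uid)
        self._last_seen[uid] = self._clock()

    def is_online(self, uid: str) -> bool:
        return uid in self._online

    def last_seen(self, uid: str) -> Optional[float]:
        return self._last_seen.get(uid)
```

The last seen time is written only once, at disconnect, which is where the saving over a periodic heartbeat write would come from.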
awesome video. Wouldn't a single queue for a server handling multiple client connections be more efficient than 1-1 for the queue to thread/client?
Narendra, I will pass my interview thanks to your tutorials !! I owe you a beer.. or two
All the best:)
Did you pass?
@@somerandomguy000 I am a Software Development Manager at Amazon now !
@@JUSTINHBK007 congrats man!
Thanks so much for your time and effort.This section was missing from youtube or sometime they went into too much details which led to a disinterest in the topic. I don't know how, but somehow you are able to make these interesting and just the right amount of depth is explained. Really helpful and much appreciated.
Thanks :)
Very nice explanation, thank you very much! I would like to see the followup video talking about encryption and audio/video calling service.
So neat and just rightly detailed. I regret finding your channel so late.
Don't get it why we need pair?
to match users and processes
Brilliant! Just the right depth and clarity!
This is really great and straightforward. I liked it better than Tushar Roy's video on the same topic
Thanks for the video. As you said, does WhatsApp store the user along with the last seen time in the table? For millions of users? How does the chat history work?
How does WhatsApp store the user contacts in the table?
You are making complex things so simple. I did not notice how 27 mins passed, and now I feel like I know a lot of things. Thanks a ton.
no words...simple and superb
Very good video, thank you. qq! what if client B is connected to another server? How does server A (that client A is connected) communicate to all other servers and check to see if client B has any online process there?
just loved the way you explain and make it simple to understand!!!great video !! thanks
How would a client connected to different message server interacts. Like A connected to server 1 and B connected to server 2. In that case, How will be the server 1 server 2 interaction.
Thanks, great video. You use a single thread to deal with each user's messages, so how many users could one server handle?
I am not sure whether WhatsApp uses the PID + data pipe combination. The same could be simplified by using separate queues in a messaging server like RabbitMQ, which can be scaled horizontally, with persistent messaging. So, whenever a message is sent from A to B, the message could be put in B's queue. When B connects, the message would automatically be read by B. Once B acknowledges, the message would be removed from the queue. But as you mentioned, there could be a limit on how many connections a server can handle. Likewise, when a message is sent to a group, the message can be put in the queue of every user in the group. As for encryption with asymmetric keys: we encrypt with the public key and the receiver decrypts with the private key. So would we end up storing the public keys of all contacts in the local store?
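The queue-per-receiver flow described above (put in B's queue, B reads on connect, remove on acknowledge) can be sketched roughly like this. It is an in-memory illustration only and does not use the RabbitMQ client API; `UserQueues` and its methods are made-up names.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

Entry = Tuple[int, str, str]  # (message_id, sender_uid, text)


class UserQueues:
    """In-memory stand-in for one persistent queue per user."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[Entry]] = defaultdict(deque)
        self._next_id = 0

    def send(self, sender: str, receiver: str, text: str) -> int:
        # The message waits in the receiver's queue whether or not they are online.
        self._next_id += 1
        self._queues[receiver].append((self._next_id, sender, text))
        return self._next_id

    def peek(self, receiver: str) -> Optional[Entry]:
        # Deliver the oldest message without removing it yet.
        queue = self._queues[receiver]
        return queue[0] if queue else None

    def ack(self, receiver: str, message_id: int) -> bool:
        # Remove the message only once the receiver acknowledges it.
        queue = self._queues[receiver]
        if queue and queue[0][0] == message_id:
            queue.popleft()
            return True
        return False
```

Until `ack` is called the message stays at the head of the queue, so a dropped connection would just mean it gets delivered again on the next connect.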
But for every user will there be separate queue created?
@@nilaysheth3283 Yes. One queue per user because messaging systems are queue intensive.
Great video, but I did not understand the media part properly. Can you please explain the justification for using the CDN for media? I think CDN will be useful only if the fraction of shared messages is very high. According to me CDN is like a cache, which will be useful if the number of items stored
So would you recommend a normal storage space to save the media temporarily(till B downloads it to his phone), something like Amazon S3?
Hey,
You said at 10:03 that the server never knows the connection to the client and will never tries to connect to the client. Let us assume 2 situations
1. When the client is currently offline and now opens up the internet then the connection will be established with the message server . What happens to the connection when the client internet is on but the client is not doing any activity on the Whatsapp.
Will the connection get closed? Then in that case how do we receive messages even when we are not actually using the App.
If not then is Whatsapp keeping all the client connections live and running for all the clients having a Living Internet Connection even for a situation when the App is not being used.
Are these client connection information stored somewhere on with the Message Server?
Great knowledgable explanation. Quick question: Why are we using a different connection HTTP server for media rather not the same connection?
Hi, 2 questions: 1) You said each process has 1 queue. WhatsApp typically has 1M connections per server, so do you think it's feasible to have that many queues? 2) What if the server goes down before sending a message? In that case the queue is lost too, which means our messages are lost as well. What are your thoughts?
Agreed. A distributed message queue should be the way, I believe. Creating a queue per message or per user in the server definitely won't scale.
Awesome one! Thanks! your videos are great!
a quick question on this lecture. Say for an instance, if client A resides in a separate country (say US) from client B (say India), to which CDN A will upload the image in this scenario? If it's on US, how will the client B know which CDN url to connect to?
I found your videos very interesting sir, just I watch them 1.25x speed and then it becomes awesome.
Excellent video. Can you please tell something about how groups messages are handled ? And how offline members text will be managed in that case ?
Two ways you can do:
First way:
When a group message is sent from User A, the process on the server which is responsible for User A will perform a lookup in a group-to-users mapping table (gid->[user-a, user-b, user-c]). Note: you can cache it for faster lookup, and this is not the user-to-PID table shown in the video. Once Process A has figured out all the users of the group, it will try to get the PID for each user and send the message to the respective user's process via the queue/pipe. If a user is not online, the messages are persisted in the DB.
And when the offline users are back online, their respective processes (connections) read all the pending messages from the DB and deliver them.
Second way:
You can have a separate set of dedicated workers which are responsible for dispensing group messages only. When User A sends a group message, Process A hands the message over to the dedicated group message workers, and these workers in turn dispense the message to all the online members and save it for all the offline group members.
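Both ways boil down to the same fan-out step, which can be sketched roughly as below. Plain dicts and lists stand in for the group mapping table, the per-user queues, and the DB; all names here are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, str]  # (group_id, sender_uid, text)


def fan_out(
    sender_uid: str,
    group_id: str,
    text: str,
    group_members: Dict[str, List[str]],       # gid -> member uids
    online_inboxes: Dict[str, List[Message]],  # uid -> queue of that user's process
    offline_store: Dict[str, List[Message]],   # uid -> messages persisted for later
) -> None:
    """Deliver to online members' queues, persist for offline members."""
    message = (group_id, sender_uid, text)
    for uid in group_members.get(group_id, []):
        if uid == sender_uid:
            continue  # do not echo the message back to the sender
        if uid in online_inboxes:
            online_inboxes[uid].append(message)
        else:
            offline_store.setdefault(uid, []).append(message)


def on_reconnect(
    uid: str,
    online_inboxes: Dict[str, List[Message]],
    offline_store: Dict[str, List[Message]],
) -> List[Message]:
    """When a user comes back online, move pending messages into their queue."""
    inbox = online_inboxes.setdefault(uid, [])
    inbox.extend(offline_store.pop(uid, []))
    return inbox
```

In the second way the same `fan_out` would simply run on the dedicated group workers instead of inside Process A.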
If you like, please subscribe and share this video, Thanks
Thanks for the reply. But the counter question asked to me was: saving the details for all the offline users for all messages in the DB will create a lot of rows. Is that still feasible?
Can cache be used for it ?
You can use a cache wherever you need to access the data much faster.
The DB will create many rows, but that's the best way to persist messages for a long time.
Alternatively, you can persist messages in the cache for some time (24 hours), and if the user doesn't come online by then, you have to persist them in the DB (NoSQL, indexed by user ID).
That way you will be saving in the DB only the messages of users who don't come online for 24 hours.
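Here is a rough sketch of that cache-first approach, with dicts standing in for the cache and the DB and the current time passed in explicitly. The class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class PendingMessages:
    """Hold undelivered messages in a cache first; spill to the DB after a TTL."""

    def __init__(self, ttl_seconds: float = 24 * 3600) -> None:
        self.ttl = ttl_seconds
        self.cache: Dict[str, List[Tuple[float, str]]] = {}  # uid -> [(stored_at, msg)]
        self.db: Dict[str, List[str]] = {}                   # uid -> [msg]

    def store(self, uid: str, message: str, now: float) -> None:
        self.cache.setdefault(uid, []).append((now, message))

    def flush_expired(self, now: float) -> None:
        # Run periodically: anything older than the TTL moves to the DB.
        for uid, entries in list(self.cache.items()):
            keep = []
            for stored_at, message in entries:
                if now - stored_at >= self.ttl:
                    self.db.setdefault(uid, []).append(message)
                else:
                    keep.append((stored_at, message))
            if keep:
                self.cache[uid] = keep
            else:
                del self.cache[uid]

    def drain(self, uid: str) -> List[str]:
        # User came online: older (DB) messages first, then cached ones.
        older = self.db.pop(uid, [])
        recent = [message for _, message in self.cache.pop(uid, [])]
        return older + recent
```

Users who reconnect within the TTL never touch the DB at all, which is the row saving being described.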
19:40 So when a new process is spawned when user B is back online, and since you are just storing uid and msg in the DB, and when that process reads msgs for B in the table, how does B know who is the sender of the msg? Shouldn't you be storing the sender_uid too in the db? Because you are sending the msg to B, but since B doesn't have the sender information, in which chat will it show the message?
I think the message itself will be stored with the sender_id on it, and the app will parse it into the user and the message. In fact, the message here will be a record with message, id, timestamp, group_name, etc. IMO
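Something like the record below is what I would picture, purely as a guess at the shape. The field names follow the ones mentioned in the reply; `StoredMessage` and `chat_key` are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredMessage:
    """One row persisted for an offline receiver (assumed shape)."""

    message_id: str
    sender_uid: str
    receiver_uid: str
    body: str
    timestamp: float
    group_name: Optional[str] = None  # None for one-to-one chats


def chat_key(msg: StoredMessage) -> str:
    """Which chat the receiving app should show the message in."""
    return msg.group_name if msg.group_name is not None else msg.sender_uid


def to_wire(msg: StoredMessage) -> str:
    return json.dumps(asdict(msg))


def from_wire(payload: str) -> StoredMessage:
    return StoredMessage(**json.loads(payload))
```

With the sender_uid carried on the record, B's app can pick the right chat even though the server only looked the row up by B's uid.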
Excellent Tutorial. Keep up the good work
Good job and explained well. Just one suggestion it is always good idea to save in db first each message and then transfer it to the other user.
So are you talking about running a process for every client? Which keeps reading from queue?
Thanks for this education. It is really appreciated.
Just want to ask one thing: does WhatsApp use WebSocket to pass messages, since it is not using HTTP for text messages?
I believe sending and receiving text messages when clients are offline is done using a lightweight connection. For the other use case, media files, the HTTPS protocol is used. What prompted this decision, given that the only difference is the payload?
Love your videos! Thank you. And the links in the description are very helpful as well.
Somewhat I think the bigger picture is still missing here, user base is definitely useful, then jump into Last seen directly?? What about other important facts like do we need to store all the messages, what about geo distribution, etc..
What about the low level design? Dont you think if MySQL tables are used they cannot handle queries in millions? Please explain that also.
when you saying 'a client connects to the messaging server', what do you mean by connecting to the server? Is it a state or websocket, what is the protocol or tech used here?
Very useful video, great explanation!
I thought there would be many Message Servers (MS), each handling a number of user sockets. Threads listen on the sockets and forward to the destination MS, where a thread picks up the message and forwards it to the destination socket.
1)each MS knows userId-> socket.
2)proxy know user-> MS
3) each socket has listening on
very nicely explained. Thanks
I thought my dog was barking but it was in the video lol.
Really great videos, Thank you Narendra
Nice Explanation. its my first system design understanding. Its clear
3:30 - Should also clarify with the interviewer - how many messages per day will be sent by a user on average.
Really really really super great explanation bro👍👌👌👌
crisp, in depth and perfect ! thanks for your time
I don't think WhatsApp is implemented in this way. The key to success of this App is the High Performance Messaging Bus Architecture built by the founders at Yahoo.
What happens if the thread is in different message server. Do we need to have network connectivity between those ?
Same doubt I was having and was reading through the comments section to see if others feel the same or I am too novice to understand details.
Any explanation is appreciated.
Thanks!!
Your videos are absolutely simple and to the point..real beauty....hats off, Great work!
@At thanks
0:53 - Group chat is also an important use case of Whatsapp
Create a new queue for each user? What is that queue? Is it something like SQS or RabbitMQ? Or is in-memory Queue? Creating a queue for each user itself sounds insane and impractical.
rabbitMQ
how does a process has been identified for a particular user? As in when user able to achieve a connection how does the process mapping works and how does it sends to other process ?
No scale estimations?
Very good video. Just one point that needs more attention is at 14:33, when you talk about threads and processes inside the message service. This is not strictly true: they actually use green threads inside the BEAM machine, which are totally different from, and much more scalable than, a real CPU thread per connection. For more info I would recommend reading en.wikipedia.org/wiki/Green_threads and en.wikipedia.org/wiki/C10k_problem
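To give a feel for why one lightweight task plus a mailbox per user is not as crazy as one OS thread per user, here is a small sketch using Python's asyncio. This is only an analogy: asyncio tasks are not BEAM processes, and the names and the 10,000 figure are arbitrary choices for the example.

```python
import asyncio
from typing import List, Tuple


async def user_process(uid: int, inbox: asyncio.Queue, delivered: List[Tuple[int, str]]) -> None:
    """One lightweight task per connected user, reading from its own queue."""
    while True:
        msg = await inbox.get()
        if msg is None:  # sentinel: user disconnected
            break
        delivered.append((uid, msg))


async def main(n_users: int = 10_000) -> List[Tuple[int, str]]:
    delivered: List[Tuple[int, str]] = []
    inboxes = {uid: asyncio.Queue() for uid in range(n_users)}
    tasks = [
        asyncio.create_task(user_process(uid, inbox, delivered))
        for uid, inbox in inboxes.items()
    ]
    await inboxes[42].put("hello")  # route one message to user 42's queue
    for inbox in inboxes.values():
        await inbox.put(None)       # shut every task down
    await asyncio.gather(*tasks)
    return delivered


delivered = asyncio.run(main())
print(delivered)  # [(42, 'hello')]
```

All 10,000 tasks here run on a single OS thread, and an idle task costs only a little memory while it waits on its queue.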
Hey Thanks for this video.
Just need one clarification on tick point (sent, delivered,read) .
Are we saving this information in database?
And one more question :
What is the mechanism in the messaging server to find the appropriate client out of billions of users?
Simply Great Work. Awesome.
You don't need heartbeat every after 5 seconds , better once user exits the app , you can update LastSeenTimeStamp .
Thanks Narendra for the very nice explanation. I have one doubt: will creating a separate process/thread for every active user be a scalable design? At any time millions of users can be online, so creating millions of threads/processes can cause scalability issues, like lots of memory/CPU consumption... Please provide your input.
Not quite the processes but Actors. Search for Akka, Orleans.
Thanks @@ursypc . I will explore it definitely.
Exactly, the best solution is to actually maintain a single thread and use RAM to store sockets. Basically a giant-ass in-memory data structure which keeps for each user a socket and a small queue.
His understanding of "queue" seemed rather muddled which was disguised as some magic abstraction.
Also, the number of websocket connections on a server is limited to the number of unused ports above port 1024, isn't it? That's incredibly limiting.
Can you please help in giving me high-level architecture for the following functional requirements?
FR (Notification system):
- send an email, SMS, in-app msgs
- support all devices - mobile app, web app, tab, etc.
- Get promo codes/audio confirmation messages (OTP on a call), media content (mp3, etc.) from an existing service/system, and use those promo codes to send.
- expiration of a notification
- recurring in nature
- schedule a message
Basically, I'm looking for a system design for the recurring scheduler.
at 19:33, what if clients A and B are connected to different messaging server ?
I think the process/thread which is handling A will first try to find out whether B's messaging server is the local node or a remote node, based on B's ID and the last used location info in the DB, and handle it accordingly. If it's on some remote node, it will first send the message to that node.
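A minimal sketch of that decision, assuming some shared directory (the DB or a cache) that maps each online uid to the node holding its connection. The `route` function, the action names, and the node names are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def route(
    receiver_uid: str,
    local_node: str,
    directory: Dict[str, str],  # uid -> node currently holding that user's connection
) -> Tuple[str, Optional[str]]:
    """Decide what the process handling the sender should do with a message."""
    node = directory.get(receiver_uid)
    if node is None:
        return ("store_in_db", None)   # receiver is offline everywhere
    if node == local_node:
        return ("deliver_local", node)  # hand over to the local process/queue
    return ("forward", node)            # send to the remote node first


directory = {"A": "ms-1", "B": "ms-2"}
print(route("B", "ms-1", directory))  # ('forward', 'ms-2')
```

The extra hop in the "forward" case is the only difference from the single-server picture in the video.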
How does light weight process that handle message queue for particular client keep its message queue size within certain bound? E.g. If some client sends 100s of messages in burst then the process handling that user's client will be overwhelmed. Will those be offloaded to DB as well? or will it stop accepting new messages until queue is empty?
one more question on similar note, what happens if client has sent message and it is in queue but the server holding that queue / process dies. In that case how will clients know to retry sending those messages on reconnect?
Man, you need more light in the room, by the way, great video, Thanks!
Can we have a messaging queue when the receiver is offline so that when receiver is online system does not has to query DB
Hello Naren, Thanks a ton for the video.
I have one question, considering the user base of WhatsApp (1B+), I don't think single load balancer can handle the load of so many connections. There must be multiple load balancers for example, one for each region etc. Could you please throw some light on this.
Also, it would really help if you could discuss DB sizing more. You mentioned WhatsApp manages to serve 10,000 users per messaging server node, so there would be at least 100,000 messaging servers to handle 1 billion live users. What would the DB design be to handle the load of 100,000 servers?
Actually, WhatsApp successfully handles up to 2 million connections per server (blog.whatsapp.com/196/1-million-is-so-2011), so please recalculate the number of servers needed. Also, when we say there are 1B live users, we can optimize the connections problem by using GCM/APN to establish connections (lazy connect) only when there are messages at the server!
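Redoing the calculation with both figures quoted in this thread (10,000 and 2 million connections per server), as a quick back-of-the-envelope sketch that assumes all 1B users are connected at the same time:

```python
def servers_needed(live_users: int, connections_per_server: int) -> int:
    """Minimum number of messaging servers, ignoring headroom and redundancy."""
    return -(-live_users // connections_per_server)  # ceiling division


LIVE_USERS = 1_000_000_000

print(servers_needed(LIVE_USERS, 10_000))     # 100000
print(servers_needed(LIVE_USERS, 2_000_000))  # 500
```

So the per-server connection count changes the answer from about 100,000 servers to about 500, before any lazy-connect optimization.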
Yes, do DNS round robin to the first-level LB and then to the second-level LB. Make sure the first-level LB is online 100% of the time, because of how round robin works. Google also uses DE-CIX, for example, to pipe requests directly to their servers. I am 14 btw, so I have no idea about production.
Just one question: according to this architecture you are only storing the messages to DB when the other user is not connected right? then how are you gonna keep track of all the history or should i say the past messages sent or received by any particular user?
1. Can someone tell me if using one thread per user is feasible? Normally we will have a few thousand TCP connections open. A server will process one request at a time: it gets a request, looks up the receiver's connection details, and forwards it. I don't see the need for multiple threads here.
2. Also, why do we need a queue for each receiver? If the user is online, send it right away; otherwise save it in the DB and finish the request.
Why tcp is not used?? Why http is better to send media? Please clarify
Thanks. Is there any messaging framework that I can hands on and play with?
You have not talked abt the various tables included...how the messages are stored in the local DB etc..
I thought its just a Send or Rec App...Gud explained
I wish I could like the video twice or thrice. Super amazing. Taken bottom up approach. Reached complexity with simplicity.
Amazing and very simple explanation for such a complex system. I have a question about managing unique ID for every message for million different users. How could they do that? Thanks 😊
user_id + uuid ?
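A quick sketch of what that suggestion could look like, using Python's standard `uuid` module. The `user_id:uuid` format and the function name are just assumptions for illustration, not how WhatsApp is known to do it.

```python
import uuid


def new_message_id(user_id: str) -> str:
    """Build a message ID locally, with no central counter to coordinate."""
    return f"{user_id}:{uuid.uuid4()}"


first = new_message_id("user-123")
second = new_message_id("user-123")
print(first != second)  # True
```

Since each sender can generate the ID on their own, there is no shared sequence that millions of users would have to contend on.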
Thanks for making things easy brother.
I've a Question
18:00 - what if A's message server process gets interrupted?
Anil 4u I also do have the same question? Want to know whether the messages in server queue will be added to the db ?
You didn't talk about how the encryption is used. If it's symmetric, anyone can decrypt easily if the key is known. If it's asymmetric, how many public and private keys are created, or is the same private and public key used for all the users?
I don't understand what exactly the connection object is. Can you go into more detail? (I know you look it up in the hash of userid : connection object.) How does it help you send the request to the target user? And how is its lifecycle handled (e.g. when a user goes offline, or when the system is down)?
Suppose user A sends 10 text and 10 video messages to a group at different times, and user B of that group comes online after a certain period. I would like to know how all the text and media messages are retrieved to user B's phone. Does it keep the same sequential order in which user A's messages were posted to the server databases? Or does B retrieve text messages first and media content later, i.e. does it give any priority to text or video/picture messages already stored in the databases?
Is the [PID, UID] table a single table that holds all the online users at the same time?
What in case client 1 and 2 are connected to different messaging server?
Thanks man! These videos are really helpful.