WhatsApp system design or software architecture
- Published: Jun 13, 2024
- One of the most important and common questions asked of developers and architects to check their design skills:
Design WhatsApp or provide an architecture for WhatsApp
OR
Learn how to build a real-time messaging system or chat application.
blog.whatsapp.com/196/1-milli...
highscalability.com/blog/2013/...
blog-bhaskaruni.blogspot.com/2...
Disclaimer: System design can vary based on the features. This video explains one possible solution. It doesn't guarantee scale or a fully working system. Instead, this resource should be consumed to understand how such a system can be designed.
Donate/Patreon: / techdummies
You are pretty thorough about the subject, and it is reflected in the way you teach. A lot of people are usually confused because they only have a vague idea of what to say. This shows that you put effort into preparing the delivery of the content. Thank you so much for your efforts in making these videos. I am learning a lot.
Awesome video. A few points:
i) We can use an index on UID for faster retrievals: O(n) to O(log(n)).
ii) Spawning a thread for every user is not efficient. Instead, we can use a dynamic thread pool.
iii) Similarly, having a queue per user is not efficient. Instead, we can have a global queue of objects (message, action like send/receive, uid).
iv) Since you are using a DB to store messages when the user is not online, we need to implement a disaster recovery mechanism, i.e. replication.
v) We can also implement blocked contacts by storing a list of blocked UIDs for each user in the DB, and we can drop such messages in the web server.
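Points ii), iii) and v) above can be sketched together. This is only a minimal illustration, not how WhatsApp does it: the `Envelope` type, the `dispatch` function and the in-memory `blocked` mapping are hypothetical stand-ins for a real message format, a server loop and a DB table.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Envelope:
    """Hypothetical message object placed on the global queue."""
    sender_uid: str
    receiver_uid: str
    text: str


def dispatch(envelopes, blocked, n_workers=4):
    """Push envelopes through one shared queue served by a fixed worker pool.

    `blocked` maps a receiver uid to the set of sender uids it has blocked
    (an in-memory stand-in for the DB table from point v).
    Returns the envelopes that were not dropped.
    """
    q = queue.Queue()
    delivered = []
    lock = threading.Lock()

    def worker():
        while True:
            env = q.get()
            if env is None:  # sentinel: no more work for this worker
                return
            if env.sender_uid in blocked.get(env.receiver_uid, set()):
                continue  # drop messages from blocked senders
            with lock:
                delivered.append(env)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for env in envelopes:
        q.put(env)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return delivered
```

One trade-off of the shared queue: with several workers, messages for the same user may be processed out of order, so a real system would need sequence numbers or per-user partitioning.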
Actually, WhatsApp and other messaging apps use very customized protocols and language features to cater to specific needs, such as Erlang, which is extremely useful where millions of lightweight processes might need to be started and don't need to share data with each other (everything is immutable). Also, their XMPP-based messaging uses an extremely compressed payload compared to normal XML data transfer. These are implementation details, but they are what eventually make the design itself.
Great comment. I guess this video is good to understand how to get to the point where such optimizations are needed
Clear and better than other WhatsApp design videos on RUclips. Thanks!
Excellent, brother. Before seeing your videos I never thought about system design. Now I'm excited to understand every application's system design. Thanks for everything!
crisp, in depth and perfect ! thanks for your time
Thanks, that was pretty great. Would love to see an expansion of how encryption would work for the system and the tradeoffs you'd get between symmetric/asymmetric encryption.
Considered :) and Thanks
Thanks man! These videos are really helpful.
Thanks for the recommendation, YT. I had been searching for your videos for a few months without anything particular in mind, but now I'm learning specifically from your videos.
So neat and just rightly detailed. I regret finding your channel so late.
That's an excellent explanation. That marks the start of my system design interview preparation. 24-05-2024
Just loved the way you explain and make it simple to understand! Great video! Thanks
Love your videos! Thank you. And the links in the description are very helpful as well.
love your videos. Pretty straight forward and informative. Keep it up.
I thought it's just a send-or-receive app... Well explained.
Narendra, I will pass my interview thanks to your tutorials !! I owe you a beer.. or two
All the best:)
Did you pass?
@@somerandomguy000 I am a Software Development Manager at Amazon now !
@@JUSTINHBK007 congrats man!
Awesome video. Wouldn't a single queue per server, handling multiple client connections, be more efficient than a 1-to-1 mapping of queue to thread/client?
This is really great and straightforward. I liked it better than Tushar Roy's video on the same topic
Thank you for the video, it was very helpful!
Great Idea using the CDN.
One point I would like to mention: you don't have to send a heartbeat. With WebSocket there are events, for example connection and disconnection, so a heartbeat might be wasteful.
So, when a user is disconnected, the timestamp at that point is sent to the small DB holding the user's active status, right?
Good job, and explained well. Just one suggestion: it is always a good idea to save each message in the DB first and then transfer it to the other user.
Nice explanation. It's my first time understanding a system design. It's clear.
Well, firstly, this video is really good. One part that is missing is that you did not consider multiple messaging server instances running.
This actually raises a very important question.
Let's say one server maintains the persistent connection. What if it goes down for some reason?
Man, you need more light in the room, by the way, great video, Thanks!
Brilliant! Just the right depth and clarity!
Thanks so much for your time and effort. This topic was missing from YouTube, or sometimes videos went into so much detail that I lost interest. I don't know how, but somehow you are able to make these interesting and explain just the right amount of depth. Really helpful and much appreciated.
Thanks :)
Very nice explanation, thank you very much! I would like to see the followup video talking about encryption and audio/video calling service.
Thanks for the concise video, I like the accent.
Thank you. That's well explained and to the point
I found your videos very interesting sir, just I watch them 1.25x speed and then it becomes awesome.
Nice video btw. However, there are some questions I wanted to ask. When A sends a message to B and B has yet to connect to the server, your explanation said that the message is kept in the global DB until the PID of B is created on the server. When the PID of B is created, the server does a full lookup in the DB to find out which messages have not been received by B. My questions are:
1. Does the server invoke a trigger to look up the global database whenever a new PID entry is formed?
2. If the server invokes a lookup trigger on the global DB, assuming there may be a million pending messages in it, can the system scale if the lookup is done without updating or indexing the global DB?
3. What is the main core/pattern of every real-life system? Will the multiple queries caused by heartbeats slow down performance?
Steven Candra, the design shown in the video is a poor design. I cannot believe he is putting the pid in the database. The thread id is transient.
@@meiliangwhut I don't think he is saying to store the pid in the DB. He is storing the uid of user B and the message for user B. When user B connects and reports that it is connected, we have to find all undelivered messages and send them to user B. Also, even though the pid is transient, it will stay alive while the connection is on. You have to know which messaging server the receiver is connected to.
Your videos are absolutely simple and to the point..real beauty....hats off, Great work!
@At thanks
Can we have a messaging queue for when the receiver is offline, so that when the receiver comes online the system does not have to query the DB?
I love this content. Just for people who want to read, I have a couple of suggestions here:
I personally would use the heartbeat to keep the process alive. Once the heartbeat stops, or messages fail to be delivered, the process responsible for a client shuts down and writes the shutdown timestamp to the DB, and that is the last seen time. This means far fewer writes to the DB.
For asymmetric encryption one would use the public/private key pairs to encrypt a common secret key (that is how it is usually used in SSL and other protocols), since asymmetric encryption is very expensive in both time and computing resources. You don't want to spend most of your processing power on encryption primitives (even less so on a cellphone).
How do you recognize when the heartbeat has stopped?
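One common way to recognize a stopped heartbeat is a timeout: remember when each client last sent a beat, and periodically sweep for clients whose last beat is older than some threshold. The sketch below assumes a hypothetical `HeartbeatMonitor` class, a 30-second timeout, and a dict standing in for the last-seen table; a real server would store wall-clock time rather than a monotonic reading.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per user and expires silent users."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed threshold, not from the video
        self.clock = clock          # injectable so it can be tested
        self.last_beat = {}         # uid -> time of most recent heartbeat
        self.last_seen = {}         # stand-in for the DB last-seen table

    def beat(self, uid):
        """Record a heartbeat; no DB write happens here."""
        self.last_beat[uid] = self.clock()

    def sweep(self):
        """Expire users whose heartbeat is overdue.

        The single 'DB write' per session happens here: the time of the
        final heartbeat becomes the user's last-seen value.
        """
        now = self.clock()
        expired = [uid for uid, t in self.last_beat.items()
                   if now - t > self.timeout_s]
        for uid in expired:
            self.last_seen[uid] = self.last_beat.pop(uid)
        return expired
```

Calling `sweep()` from a timer every few seconds keeps writes to the last-seen store down to one per disconnect, which is the saving the comment above describes.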
Just brilliant Boss, Thank you!!
Hey, I think some example of duplex protocol should be mentioned. Just saying "TCP" does not provide complete picture. Mostly XMPP protocol (built on top of TCP) is used, and not, for example HTTP.
Really great videos, Thank you Narendra
Great, knowledgeable explanation. Quick question: why are we using a different connection (an HTTP server) for media rather than the same connection?
Hi Naren, one doubt: if the thread of the other user is on a different messaging server, how does communication happen between the two nodes handling different connections of the same conversation? Sorry if I am missing something.
the guy has done amazingly! what I don't fully understand is the pid uid approach. All the messaging servers sit behind ALB. If a user session is bound with a pid in a specific server, it will lower the efficiency of load balancing, cuz the user will stay with that server during the conversation session. It's doable, but not ideal, IMO.
Location sharing is another important use case, but it can be implemented similarly to text messages, since here we just send the latitude-longitude data.
19:40 So when a new process is spawned when user B is back online, and since you are just storing uid and msg in the DB, and when that process reads msgs for B in the table, how does B know who is the sender of the msg? Shouldn't you be storing the sender_uid too in the db? Because you are sending the msg to B, but since B doesn't have the sender information, in which chat will it show the message?
I think the message itself will be stored with the sender_id on it. The app will parse the message into the user and the message text. In fact, the message here will be an array with message, id, timestamp, group_name etc. IMO
Thank you!!! I begin to learn System Design!!!
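The question and reply above can be made concrete with a small sketch of an offline-message record that carries the sender's uid. The names `StoredMessage` and `PendingStore` are hypothetical, and the dict keyed by receiver uid only stands in for a DB table indexed on that column.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StoredMessage:
    """One row of the hypothetical pending-messages table."""
    sender_uid: str    # lets B's app place the message in the right chat
    receiver_uid: str
    body: str
    sent_at: float


class PendingStore:
    """In-memory stand-in for a DB table indexed by receiver uid."""

    def __init__(self):
        self._by_receiver = defaultdict(list)

    def save(self, msg):
        """Persist a message for a receiver who is currently offline."""
        self._by_receiver[msg.receiver_uid].append(msg)

    def drain(self, receiver_uid):
        """Return and remove all pending messages, oldest first.

        Called by the receiver's process when it comes back online.
        """
        return self._by_receiver.pop(receiver_uid, [])
```

Because each row keeps `sender_uid`, the client can group drained messages by sender (or by a group id, if one were added) without any extra lookup.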
no words...simple and superb
Very educational. The other very important feature that wasn't addressed here is the 'group chat'. I wonder how WhatsApp handles them. If you could shed light on that as well, that would be awesome. Other than that, it simply awesome. Keep up the great work.
It would be very similar to the one-to-one chat logic, except that WhatsApp stores the users in a group. So, when a message is sent to a group, WhatsApp takes the users in the group and sends it to the corresponding queues through the PIDs. If a user is offline, it is stored in the DB against the user id.
Simply Great Work. Awesome.
What about the low-level design? Don't you think that MySQL tables, if used, cannot handle millions of queries? Please explain that also.
So are you talking about running a process for every client, which keeps reading from the queue?
wow, thanks a lot for the clear explanation.
Very good video, thank you. qq! what if client B is connected to another server? How does server A (that client A is connected) communicate to all other servers and check to see if client B has any online process there?
Very useful video, great explanation!
Nice ! Thanks for sharing
Excellent Tutorial. Keep up the good work
Just one question: according to this architecture, you are only storing messages in the DB when the other user is not connected, right? Then how are you going to keep track of the history, or should I say the past messages sent or received by any particular user?
Your videos are really great.. :) Delivered
Great explanation!! the best system design video so far I have seen in youtube!!
For last seen, instead of a continuous heartbeat, the user can update the column whenever he closes the app and set the column to NULL whenever he comes online.
Here NULL specifies that the user is online...
Hi, thanks dude☺ Can you please make something for DevOps engineer interview prep?
Thanks for your work man!!
Hello Naren, Thanks a ton for the video.
I have one question. Considering the user base of WhatsApp (1B+), I don't think a single load balancer can handle the load of so many connections. There must be multiple load balancers, for example one for each region. Could you please throw some light on this?
Also, it would really help if you could discuss DB sizing more. You mentioned WhatsApp manages to serve 10,000 users per messaging server node, so there would be at least 100,000 messaging servers to handle 1 billion live users. What would be the design of the DB to handle the load of 100,000 servers?
Actually, WhatsApp successfully handles up to 2 million connections per server (blog.whatsapp.com/196/1-million-is-so-2011), so please recalculate the number of servers needed. Also, when we say there are 1B live users, we can optimize the connection problem by using GCM/APN to establish connections (lazy connect) only when there are messages at the server!
Yes, do DNS round robin to a first-level LB and then to a second-level LB. Make sure the first-level LB is online 100% of the time, because of how round robin works. Google also uses DE-CIX, for example, to pipe requests directly to their servers. I am 14 btw, so I have no idea about production.
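The recalculation asked for above is a single ceiling division. The sketch below only reuses the numbers quoted in these two comments (1 billion live users, 10,000 or 2 million connections per server); the function name is made up for illustration.

```python
def servers_needed(concurrent_users: int, connections_per_server: int) -> int:
    """Minimum number of servers, rounding up (integer ceiling division)."""
    return -(-concurrent_users // connections_per_server)


LIVE_USERS = 1_000_000_000

# 10,000 connections per server, as quoted from the video
print(servers_needed(LIVE_USERS, 10_000))     # 100000

# 2 million connections per server, as quoted from the WhatsApp blog post
print(servers_needed(LIVE_USERS, 2_000_000))  # 500
```

These are lower bounds for connection capacity only; they say nothing about headroom, redundancy, or CPU and memory per connection.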
A few questions I am pondering right now:
1. How does the long-running thread/process work in the messaging server? Does it mean that the connection from client to server is not stateless but stateful, and that every time it finds the same process? Does that mean the process is running on a single central machine and not in horizontally scaled pods?
2. Is it the client's responsibility to poll for messages at some time interval, or is it the server's job to push messages to the client? If it's the latter, does that mean billions of clients (devices) are making stateful connections to the server? Is the protocol being used here HTTP?
Hey,
You said at 10:03 that the server never knows the connection to the client and will never try to connect to the client. Let us consider the following situation:
1. The client is currently offline and now turns on the internet, so a connection is established with the message server. What happens to the connection when the client's internet is on but the client is not doing any activity on WhatsApp?
Will the connection get closed? In that case, how do we receive messages even when we are not actually using the app?
If not, is WhatsApp keeping connections live and running for all clients that have a live internet connection, even when the app is not being used?
Is this client connection information stored somewhere on the message server?
Great job, thank you so much 😌
Hi @Tech Dummies, thanks for this video. But I have one doubt. As you mentioned at 15:47 in this video, for every active client there would be a corresponding thread/process and a queue. But my point is: isn't that taking unnecessary space in memory just for being active?
WhatsApp has billions of users. If millions are just active (they are not messaging or calling), a process and queue will be created for them too.
Thank You !
I came here to ask the same question. Anyway, do you have any answer or explanation for this, even though you asked the question two years ago? Thanks!
Hey Thanks for this video.
Just need one clarification on the tick marks (sent, delivered, read).
Are we saving this information in the database?
And one more question:
What is the mechanism in the messaging server to find the appropriate client among billions of users?
Why is there a need for a separate HTTP server to send images or audio files? Why can't we use the message server directly?
I have one observation on this. If we keep the message buffer within the messaging service or the gateway, won't the gateways be overwhelmed since it holds the costly TCP or websocket connection for each client?
It would be impossible to use system threads for an application like WhatsApp, so it is definitely Erlang's actors.
@@shaaradpandey5546 It won't scale.
@@shaaradpandey5546 Spring? No. It depends on how many users you need to support. Forget about Spring, that's 90s tech.
Even actors are threads; all you gotta look at is whether they are scaling across servers or on the same server.
Yes... He is explaining the Erlang OTP framework in a simple way... Only super lightweight threading mechanisms will scale like this... WhatsApp itself uses Erlang OTP, I guess.
😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊😊
What did I just hear? New thread/process to handle each peer?
We are talking about billions of peers together here!
What is your idea? Any standard HTTP server handles request processing by creating multiple threads, unless you are using CGI.
Awesome one! Thanks! your videos are great!
A quick question on this lecture. Say, for instance, client A resides in a different country (say the US) from client B (say India). To which CDN will A upload the image in this scenario? If it's in the US, how will client B know which CDN URL to connect to?
Very nicely explained. Thanks
When you say 'a client connects to the messaging server', what do you mean by connecting to the server? Is it a state or a WebSocket? What is the protocol or tech used here?
Excellent video. Can you please say something about how group messages are handled? And how will texts for offline members be managed in that case?
There are two ways you can do it:
First way:
When a group message is sent from user A, the process on the server responsible for user A performs a lookup in a group-to-users mapping table (gid->[user-a, user-b, user-c]). Note: you can cache it for faster lookup, and this is not the user-to-pid table shown in the video. When process A figures out all the users of the group, it tries to get the PID for each user and sends the message to the respective users' processes via queue/pipe. If a user is not online, the messages are persisted in the DB.
And when the offline users are back online, their respective processes (connections) read all the pending messages from the DB and deliver them.
Second way:
You can have a separate set of dedicated workers that are responsible for dispensing group messages only. When user A sends a group message, process A hands the message over to the dedicated group message workers, and these workers in turn dispense the message to all the online members and save it for all the offline group members.
If you like, please subscribe and share this video, Thanks
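The first way described above can be sketched in a few lines. Everything here is a simplified stand-in: `groups` plays the group-to-users mapping table, `online_queues` plays the per-user process queues, and `pending_db` plays the DB of undelivered messages; none of these names come from the video.

```python
def fan_out(sender, gid, text, groups, online_queues, pending_db):
    """Deliver one group message to every member except the sender.

    groups:        gid -> list of member uids (the mapping table)
    online_queues: uid -> list acting as that user's in-memory queue
    pending_db:    uid -> list of messages stored for offline users
    """
    message = (sender, gid, text)
    for member in groups[gid]:
        if member == sender:
            continue  # the sender already has the message
        if member in online_queues:
            online_queues[member].append(message)   # hand to the user's process
        else:
            pending_db.setdefault(member, []).append(message)  # persist for later


groups = {"g1": ["user-a", "user-b", "user-c"]}
online_queues = {"user-a": [], "user-b": []}   # user-c is offline
pending_db = {}
fan_out("user-a", "g1", "hello group", groups, online_queues, pending_db)
```

After this call, `user-b` has the message in its queue and `user-c` has it in `pending_db`. The second way would move the same loop into dedicated group workers instead of running it in user A's process.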
Thanks for the reply. But a counter-question I was asked was: saving the details of all messages for all the offline users in the DB will create a lot of rows. Is it still feasible?
Can a cache be used for it?
You can use a cache wherever you need to access the data much faster.
The DB will create many rows, but that's the best way to persist messages for a long time.
Alternatively, you can persist messages in the cache for some time (24 hours), and if the user doesn't come online by then, you have to persist them in the DB (NoSQL, indexed by user id).
That way you will only be saving to the DB the messages of users who don't come online for 24 hours.
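The cache-first idea from this reply can be sketched as follows. The `PendingMessages` class and its two dicts are hypothetical in-memory stand-ins for a real cache and a NoSQL store, and timestamps are passed in explicitly so the 24-hour rule is easy to follow.

```python
class PendingMessages:
    """Keeps undelivered messages in a cache, spilling old ones to a DB."""

    def __init__(self, ttl_s=24 * 3600):
        self.ttl_s = ttl_s
        self.cache = {}  # uid -> list of (stored_at, message)
        self.db = {}     # uid -> list of messages (stand-in for NoSQL store)

    def store(self, uid, message, now):
        """Hold a message for an offline user in the cache."""
        self.cache.setdefault(uid, []).append((now, message))

    def spill_expired(self, now):
        """Move messages that have waited at least ttl_s into the DB."""
        for uid in list(self.cache):
            fresh = []
            for stored_at, message in self.cache[uid]:
                if now - stored_at >= self.ttl_s:
                    self.db.setdefault(uid, []).append(message)
                else:
                    fresh.append((stored_at, message))
            if fresh:
                self.cache[uid] = fresh
            else:
                del self.cache[uid]

    def fetch(self, uid):
        """Return everything pending for a user who just came online."""
        older = self.db.pop(uid, [])
        recent = [message for _, message in self.cache.pop(uid, [])]
        return older + recent
```

A user who reconnects within the TTL is served entirely from the cache, so only messages for long-absent users ever create DB rows.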
I have some questions though. 1) How are users identified/distinguished on the sender's device before the message reaches the server?
2) How could I possibly create such a connection in Django?
Awesome explanation !!!
Really really really super great explanation bro👍👌👌👌
How will the pid thing work if you have multiple messaging servers and client B is connected to some other server?
What if clients 1 and 2 are connected to different messaging servers?
How does the lightweight process that handles the message queue for a particular client keep its queue size within a certain bound? E.g. if some client sends hundreds of messages in a burst, the process handling that user's client will be overwhelmed. Will those be offloaded to the DB as well? Or will it stop accepting new messages until the queue is empty?
One more question on a similar note: what happens if a client has sent a message and it is in the queue, but the server holding that queue/process dies? In that case, how will clients know to retry sending those messages on reconnect?
Thanks for this education. It is really appreciated.
Just want to ask one thing: does WhatsApp use WebSocket to pass messages, since it is not using HTTP for text messages?
What happens if the thread is on a different message server? Do we need to have network connectivity between them?
Same doubt I was having and was reading through the comments section to see if others feel the same or I am too novice to understand details.
Any explanation is appreciated.
Thanks!!
The video did not explain the case where the sender and receiver clients are connected to two separate "message servers". In that case the message can't be sent to the in-memory queue. A dedicated messaging queue like SQS can be used to solve this.
This is a great explanation 💖💖
Thanks, great video. You use a single thread to deal with a user's messages, so how many users could one server handle?
At 19:33, what if clients A and B are connected to different messaging servers?
I think the process/thread handling A will first try to find out whether B's messaging server is the local node or a remote node, based on B's id and the last-used location info in the DB, and handle it accordingly. If it's on a remote node, it will first send the message to that node.
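The local-versus-remote check described in this reply can be sketched with a shared session registry that maps each uid to the server holding its connection. The `MessagingServer` class, the `registry` dict and the direct method call standing in for a network hop between servers are all assumptions for illustration.

```python
class MessagingServer:
    """One messaging node that routes via a shared uid -> server registry."""

    def __init__(self, name, registry, cluster):
        self.name = name
        self.registry = registry  # shared: uid -> name of the server it is on
        self.cluster = cluster    # shared: server name -> MessagingServer
        self.inboxes = {}         # uid -> queue for locally connected users
        cluster[name] = self

    def connect(self, uid):
        """Register a client connection on this node."""
        self.registry[uid] = self.name
        self.inboxes[uid] = []

    def deliver_local(self, sender, receiver, text):
        """Put a message on the queue of a user connected to this node."""
        self.inboxes[receiver].append((sender, text))

    def send(self, sender, receiver, text):
        """Route a message: local queue, remote node, or offline."""
        home = self.registry.get(receiver)
        if home is None:
            return "offline"  # caller would persist the message in the DB
        # In a real deployment this would be a network call to the other node.
        self.cluster[home].deliver_local(sender, receiver, text)
        return "local" if home == self.name else "forwarded"


registry, cluster = {}, {}
s1 = MessagingServer("s1", registry, cluster)
s2 = MessagingServer("s2", registry, cluster)
s1.connect("A")
s2.connect("B")
result = s1.send("A", "B", "hello")
```

Here `result` is `"forwarded"` and the message lands in `s2.inboxes["B"]`. The registry is the piece several comments in this thread are asking about: without it, server 1 has no way to know where B is connected.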
Absolutely incredible!
Thanks. Is there any messaging framework that I can hands on and play with?
Hi Naren, how does a user connected to one server communicate with a user connected to a different WhatsApp server?
Great work!🎉
You are making complex things so simple. I did not notice how 27 minutes passed, and now I feel like I know a lot of things. Thanks a ton.
Thanks for making things easy brother.
I have a question.
18:00 - what if A's message server process gets interrupted?
Anil 4u, I also have the same question. I want to know whether the messages in the server queue will be added to the DB.
I am not sure whether WhatsApp uses the PID + data pipe combination. The same could be simplified by using separate queues in a messaging server like RabbitMQ, which can be scaled horizontally, with persistent messaging. So, whenever a message is sent from A to B, the message could be put in B's queue. When B connects, the message would automatically be read by B. Once B acknowledges, the message would be removed from the queue. But as you have mentioned, there could be a limitation on how many connections a server can handle. Likewise, when a message is sent to a group, the message can be put in the queues of all users of the group. As for encryption with asymmetric keys, we encrypt with the public key and the receiver decrypts with the private key. So would we end up storing the public keys of all contacts in the local store?
But will a separate queue be created for every user?
@@nilaysheth3283 Yes. One queue per user, because messaging systems are queue intensive.
How would clients connected to different message servers interact? Say A is connected to server 1 and B is connected to server 2. In that case, how will server 1 and server 2 interact?
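The acknowledge-then-remove behaviour described for a per-user queue can be sketched in plain Python. This is not the RabbitMQ API; `AckQueue` is a made-up class that only shows the idea that a message stays queued until the receiver confirms it.

```python
from collections import deque


class AckQueue:
    """Per-user queue where a message is removed only after it is acked."""

    def __init__(self):
        self._items = deque()  # (message id, message), oldest first
        self._next_id = 0

    def put(self, message):
        """Enqueue a message and return its id."""
        msg_id = self._next_id
        self._next_id += 1
        self._items.append((msg_id, message))
        return msg_id

    def peek(self):
        """Return the oldest unacked (id, message), or None if empty."""
        return self._items[0] if self._items else None

    def ack(self, msg_id):
        """Remove the oldest message if its id matches; report success."""
        if self._items and self._items[0][0] == msg_id:
            self._items.popleft()
            return True
        return False
```

If B's connection drops after `peek()` but before `ack()`, the message is still at the head of the queue when B reconnects, which is the property that makes this approach attractive for delivery guarantees.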
Good Job man,
This was amazing.
Great explanation. A question on the last seen design: why do you need a heartbeat sent by the user at regular intervals? If the user is connected to the server, last seen is the current time, and once disconnected you just store that time as last seen. So the difference between the current time and that timestamp gives the last seen. Once the user reconnects to the server, last seen is the current time again.
I prefer the big whiteboard you used. LOL
Is the [PID, UID] table a single table that holds all the online users at the same time?
How is a process identified for a particular user? That is, when a user establishes a connection, how does the process mapping work, and how does it send to another process?
I'm not sure why we'd even bother with the in-process queue/database for when A and B are both connected to the messaging service. For one, it's unlikely that both A and B would connect to the exact same service instance at the same time. And two, you already have a system for when only one is connected (i.e. write to an external DB and poll when the other comes online), so wouldn't that solution work in general?
this is fantastic.
You didn't talk about how the encryption is used. If it's symmetric, anyone can decrypt easily if the key is known. If it's asymmetric, how many public and private keys are created, or is the same private and public key used for all the users?
Also, what if clients A and B are connected to different servers?
Starting a thread coupled with a queue for each 1-to-1 communication between devices does not seem scalable to me. Why do we need a thread/pid per 1-to-1 communication channel? Why couldn't we rebalance queues across a pool of workers in each instance?