Brief Outline 00:00:59 Notification Service 00:02:12 Problem Requirements/Capacity Estimates 00:03:56 Fan Out Design Pattern 00:06:38 Idempotence 00:08:17 Idempotence Continued 00:11:17 Idempotence Continued 00:12:16 Bloom Filters 00:14:15 Client Connections to Notification Server 00:16:56 Final Diagram - Notification Service Thanks, Jordan~
Love your videos Jordan! What are your thoughts on creating videos where you build simple projects/programs that implement certain concepts from your systems design playlist? I personally can't find other channels that teach these concepts as well as you, and I feel like doing so would be pretty complementary to your already quality content.
great video. thanks for the incredible contribution. 2 questions 1) how do you notify users who aren’t online about unpopular posts. i would assume you need some sort of user specific queue after you fanout. the notifications table is sharded by topic id so it possibly won’t be queried directly for unpopular posts. 2) how do you notify users who are online about popular posts since i don’t see it connected to the web socket flow. i assume you expect them to poll periodically to see if they have any popular posts?
1) when they come back online, they'll basically hit a cache of all notifications meant for them specifically (cache is partitioned by hash of user id), and combine this with popular notifications they were subscribed to 2) yep, just polling!
I'm a bit confused by your question here - I guess I'm confused why nesting would imply that things would be stored any differently than normal. I'm going to assume that if a user is subscribed to a/, it also means they receive notifications for a/b. In such an event, we basically need to a) partition our servers/flink such that all subscriptions starting with a/ topic are on the same node or b) if there are too many subscriptions under the a/* tree to store on one node, we'll have to do some amount of denormalization where for each subscription a/b/c/d, we store all the subscriptions of parent topics on the same node. Many downsides to this approach in terms of write speeds for subscribing to new topics/data consistency, so maybe at that point we'd be better off ditching caching topic subscriptions in flink and instead just running a scatter/gather query to find all subscriptions for a/, a/b/, a/b/c/ and a/b/c/d and merging them together.
@@jordanhasnolife5163 I mean use can subscribe on specific categories like: Europe -> France -> Paris. So, this use should get notification (message) if this message is related to Europe -> France -> Paris (AND condition). How to design it?
Hi, great video, great content. I have a question on Bloom Filter part for reducing DB roundtrip. 1. I'm curious about how do we determine Bloom Filter size. 2. As you mention, Bloom Filter can't guarantee its result. What will the notification server do after receiving the result from Bloom Filter ? Probably we can not do something like "Seen => Not send noti, Not Seen => Send noti"...
1) I think you'd have to try out many in practice and see what gives you the best performance 2) Bloom filters tell us when we have not seen something, which allows us to skip a database read to check if a user has seen that idempotency key before. If there's any question whether they've seen it or not, you've gotta send the notification.
Really appreciate recording and sharing these videos. They are super helpful. For idempotency part, I think we can give each message (per user dimension) one increasing number and keep maximum number on server side. By comparing service and client side's number, we can know if any message is missing or not.
another comment on this awesome video after rewatching it bunch of times. so to conclude, how did u say we achieve idempotency since there seems to be no best option?
Hey Jordan, watched all your videos...blah blah blah... great content love it blah blah blah... funny descriptions blah blah blah... now that all the formalities have been taken care of (lol I promise I'm a fan), could you make the OneNotes for all the Sys Design Interview problems accessible as well please? I love going back to the playlist 1.0 powerpoints for concepts, but it would also be really nice to have these oneNote accessible too - ideally in a way I can clone and scribble lotion and tissue boxes on and oh annotate stuff. Willing to send feet pics if you get the job done before my interviews :P
Haha - appreciate the formalities. Going to do the actual questions themselves when the series is done, have been punting on the concepts. Have some time today, so I'll have to see if I can clutch up.
The design assumes we have full control on the clients and we can control the logic of the clients to have bidirectional communication, but in reality we often don't have control over the clients. Notification to the clients often requires existing infra (e.g. Apple APNs) which we need to integrate with but cannot modify its protocol.
Hi Jordan - thanks for the video. For the fanout - you mentioned Flink needs to reach out to ZK to find which notification server to send the message. I think it will pass the user Id to ZK and get the correct Notification Server to send the message too. But you also mentioned the routing server which will be called in by client to know the correct notification server the client needs to connect. Can we combine the 2 (ZK and routing server) so that both client and Flink can call to find the correct Notification Server based on User ID ?
Thanks for the video, could you please help clarify the connection b/w polling service and topic-subscription table? my understanding is polling service can get all its needed from cache + cassandra?
Agree with the polling service part. I think for the topic subscription the idea is you can get notifications that don't get sent to too many people from there, and then you can poll for the popular ones
Jordan, another great video, thank you! Would you be interested in doing the system design of a Google Sheets like application(not the collaboration functionality that you have already covered in Google Docs) but with features more closely resembling Airtable/Causal, focusing on real-time formula application?
Hi Jordan I have a very basic question. Are we building the notification service for all notifications (from all applications) for all users ? In the final diagram, you have mentioned that flink node receives data from 2 sources. 1. Topic subscriptions of user. Basically topic_id --> userId's. 2. Messages from Notification Queue. topic_id --> message. Using above 2 sources, how does the flink sends the real time messages ? Some messages are user specific right for example one to one pings. Sorry if this sounds silly.
Yeah. I'm not very concerned with one to one pings in this video. If every user is connected to one notification server, we can just figure out where they are connected to and send the message there. In the general case, flink figures out the user IDs the message needs to go to, and via a load balancer/proxy sends it to all of the proper servers which then forward them to users via we sockets
you seem to actually know what youre talking about, its kind of ridiculous how out of thousands of these types of videos every single other one i've seen is either from ex-swe's that seem like they were only interns with how they've "designed" some systems. sincere thank you this was extremely helpful for creating a notification system for a social app im working on for context my needs are pretty basic/simple, notifications for actions, follows, etc. say a user likes a post, my current approach is 1. create the notification row in the notification table, referencing the action 2. in a sorted set with key "user_notifications": use a batch key as the score and the the id of the notification in the database as the value. the batch key looks like "like_postid". 3. on pull request, get all the values in the sorted set and first using the batch key, batch all things with the same key together and increment a count. using the most recent notification id, get the relevant data and attach it to the notification. 4. store the batched notification data in the database and return the batched notification to the user. the final 2 steps hits redis to get all the ids, then it hits the database and redis again to store the batched notifications, and finally returns it to the user. im not sure if this is a good approach at all
Thanks for the video. How feasible would that be to build reverse mapping at scale? Sounds like expensive operation + at what cadence, we will either spent a lot of compute rebuilding revery sharding or there will be a signiy lag beyey users subscribing and starting to recrcey notifications for that topic.
I really don't think it would be super significant, we're really just going through a load balancer and into Kafka- and from there we can receive the message in flink and we're good to go
How does Flink handle (topicID userID) update? What if there is a new topic? What if a new user subscribes to an existing topic? What if a topic is deleted etc?
1) Flink adds a key to the "topic subscriptions" map. 2) Flink adds the userId as a value in the list of users subscribed to that topic. 3) Flink removes the key for that topic from the map.
Can you make alternate design for this, where users are unaware of the notification service, they just receive messages and not subscribed to any topic. B2C users use this service to send notifications to their users
@@jordanhasnolife5163for example if I bought something on Amazon. I get a mail about my purchase. I also get a mail from my credit card company that some amount has been spent. As a user I don’t need to know how companies implement their notifications mechanism.
Why is the subscription change events topic sharded on user id? I expected it to be on topic id as this stream needs to be joined with the events in notification queue which are also partitioned on topic id
we have two kafka queus, one is a CDC from topic subscriptions table which is partitioned by userid, and the other is partition by topicid. Both ingest data to Flink, so it needs to shuffle one or the other, doens´t it?
Storing the idempotency keys on client is a wild idea, and can be easily treated as a red flag. I wouldn't mention this as an option in a real interview.
I'd appreciate if you can back up calling it a "wild" idea. Considering duplicate notifications aren't coming in with that significant of a delay, you can just clear the keys after some amount of time.
1k notification for 1B topics, 1 trillion notifications a day? are you trying to spam the whole earth population by 10 notifications an hour??? We get it, it has to be scalable.
Video Editing is too low 👎👎👎👎 Edit the video in a good style. Add nice b-roll animation when you are speaking or explaining something. This will increase visualization. Put proper subtitles, effects, transitions, and colors in the video. Insert the B-Roll (footage and animation) very well because it is absolutely low in your video. Do all this and then see how the video doesn't turn out good. 🍏
Brief Outline
00:00:59 Notification Service
00:02:12 Problem Requirements/Capacity Estimates
00:03:56 Fan Out Design Pattern
00:06:38 Idempotence
00:08:17 Idempotence Continued
00:11:17 Idempotence Continued
00:12:16 Bloom Filters
00:14:15 Client Connections to Notification Server
00:16:56 Final Diagram - Notification Service
Thanks, Jordan~
Thanks for this work. Really helpful for me
Leon is the real mvp
My third favourite system design RUclipsr just uploaded, sweet!
Hell yeah brother
What are the the other ones?
@@nepo2616 Jordan (with a life), and Jordan (with 50% of a life)
Lol 😂
"I'm just greedy"
I already like this RUclips channel.
Love your videos Jordan! What are your thoughts on creating videos where you build simple projects/programs that implement certain concepts from your systems design playlist? I personally can't find other channels that teach these concepts as well as you, and I feel like doing so would be pretty complementary to your already quality content.
Hey! It's definitely something that I've considered - when I run out of shit to do/post in a bit I may turn in that direction :)
great video. thanks for the incredible contribution. 2 questions
1) how do you notify users who aren’t online about unpopular posts. i would assume you need some sort of user specific queue after you fanout. the notifications table is sharded by topic id so it possibly won’t be queried directly for unpopular posts.
2) how do you notify users who are online about popular posts since i don’t see it connected to the web socket flow. i assume you expect them to poll periodically to see if they have any popular posts?
1) when they come back online, they'll basically hit a cache of all notifications meant for them specifically (cache is partitioned by hash of user id), and combine this with popular notifications they were subscribed to
2) yep, just polling!
Hello, Jordan.
How to store user's subscriptions in DB for example for nested categories? Europe -> France -> Paris -> Events -> etc.?
I'm a bit confused by your question here - I guess I'm confused why nesting would imply that things would be stored any differently than normal.
I'm going to assume that if a user is subscribed to a/, it also means they receive notifications for a/b. In such an event, we basically need to
a) partition our servers/flink such that all subscriptions starting with a/ topic are on the same node
or
b) if there are too many subscriptions under the a/* tree to store on one node, we'll have to do some amount of denormalization where for each subscription a/b/c/d, we store all the subscriptions of parent topics on the same node. Many downsides to this approach in terms of write speeds for subscribing to new topics/data consistency, so maybe at that point we'd be better off ditching caching topic subscriptions in flink and instead just running a scatter/gather query to find all subscriptions for a/, a/b/, a/b/c/ and a/b/c/d and merging them together.
@@jordanhasnolife5163 I mean use can subscribe on specific categories like: Europe -> France -> Paris. So, this use should get notification (message) if this message is related to Europe -> France -> Paris (AND condition). How to design it?
@@povdata I mean Europe France Paris is effectively just a string topic, is it not? I don't see why things are any different here
Hi, great video, great content. I have a question on Bloom Filter part for reducing DB roundtrip.
1. I'm curious about how do we determine Bloom Filter size.
2. As you mention, Bloom Filter can't guarantee its result. What will the notification server do after receiving the result from Bloom Filter ? Probably we can not do something like "Seen => Not send noti, Not Seen => Send noti"...
1) I think you'd have to try out many in practice and see what gives you the best performance
2) Bloom filters tell us when we have not seen something, which allows us to skip a database read to check if a user has seen that idempotency key before. If there's any question whether they've seen it or not, you've gotta send the notification.
@ Many thanks!!! Keep it up man 🔥
Really appreciate recording and sharing these videos. They are super helpful. For idempotency part, I think we can give each message (per user dimension) one increasing number and keep maximum number on server side. By comparing service and client side's number, we can know if any message is missing or not.
Yep!
another comment on this awesome video after rewatching it bunch of times. so to conclude, how did u say we achieve idempotency since there seems to be no best option?
I'd probably just do it on the server and if we send a duplicate notification such is life
Thanks for the video. One question, what's the line between "topic subscritpion DB" and the "Notification polling service" in the end diagram?
If we're gonna poll for notifications we need to know which topic a user is subscribed to
Hey Jordan, watched all your videos...blah blah blah... great content love it blah blah blah... funny descriptions blah blah blah... now that all the formalities have been taken care of (lol I promise I'm a fan), could you make the OneNotes for all the Sys Design Interview problems accessible as well please? I love going back to the playlist 1.0 powerpoints for concepts, but it would also be really nice to have these oneNote accessible too - ideally in a way I can clone and scribble lotion and tissue boxes on and oh annotate stuff.
Willing to send feet pics if you get the job done before my interviews :P
Haha - appreciate the formalities. Going to do the actual questions themselves when the series is done, have been punting on the concepts. Have some time today, so I'll have to see if I can clutch up.
@@jordanhasnolife5163 +1 to OP. Really love your content and access to slides would be super helpful. Thanks a lot in advance!
The design assumes we have full control on the clients and we can control the logic of the clients to have bidirectional communication, but in reality we often don't have control over the clients. Notification to the clients often requires existing infra (e.g. Apple APNs) which we need to integrate with but cannot modify its protocol.
Yeah, but it wouldn't be an interview problem if I just said "hit the APN api" I suppose.
Hi Jordan - thanks for the video. For the fanout - you mentioned Flink needs to reach out to ZK to find which notification server to send the message. I think it will pass the user Id to ZK and get the correct Notification Server to send the message too. But you also mentioned the routing server which will be called in by client to know the correct notification server the client needs to connect. Can we combine the 2 (ZK and routing server) so that both client and Flink can call to find the correct Notification Server based on User ID ?
For sure
Ive been waiting for this video. This is one of the basic services
Thanks for the video, could you please help clarify the connection b/w polling service and topic-subscription table? my understanding is polling service can get all its needed from cache + cassandra?
Agree with the polling service part. I think for the topic subscription the idea is you can get notifications that don't get sent to too many people from there, and then you can poll for the popular ones
Jordan, another great video, thank you!
Would you be interested in doing the system design of a Google Sheets like application(not the collaboration functionality that you have already covered in Google Docs) but with features more closely resembling Airtable/Causal, focusing on real-time formula application?
Hey Harika! I'd have to think about this one a bit, versus whether it's just we have a formula per cell and then it gets applied on all client devices
Hi Jordan
I have a very basic question.
Are we building the notification service for all notifications (from all applications) for all users ?
In the final diagram, you have mentioned that flink node receives data from 2 sources.
1. Topic subscriptions of user. Basically topic_id --> userId's.
2. Messages from Notification Queue. topic_id --> message.
Using above 2 sources, how does the flink sends the real time messages ? Some messages are user specific right for example one to one pings.
Sorry if this sounds silly.
Yeah. I'm not very concerned with one to one pings in this video. If every user is connected to one notification server, we can just figure out where they are connected to and send the message there.
In the general case, flink figures out the user IDs the message needs to go to, and via a load balancer/proxy sends it to all of the proper servers which then forward them to users via we sockets
you seem to actually know what youre talking about, its kind of ridiculous how out of thousands of these types of videos every single other one i've seen is either from ex-swe's that seem like they were only interns with how they've "designed" some systems.
sincere thank you this was extremely helpful for creating a notification system for a social app im working on
for context my needs are pretty basic/simple, notifications for actions, follows, etc. say a user likes a post, my current approach is
1. create the notification row in the notification table, referencing the action
2. in a sorted set with key "user_notifications": use a batch key as the score and the the id of the notification in the database as the value. the batch key looks like "like_postid".
3. on pull request, get all the values in the sorted set and first using the batch key, batch all things with the same key together and increment a count. using the most recent notification id, get the relevant data and attach it to the notification.
4. store the batched notification data in the database and return the batched notification to the user.
the final 2 steps hits redis to get all the ids, then it hits the database and redis again to store the batched notifications, and finally returns it to the user.
im not sure if this is a good approach at all
sure! just remember, if this is an MVP, no need to build for scale that you don't have to accommodate yet!
@@jordanhasnolife5163 i know! im just trying to make sure i have the basics covered, because im not sure if my current approach is appropriate
@@jordanhasnolife5163 i’m also not sure if i should have 2 different tables for notifications (one for batched and one for individual)
Thanks for the video.
How feasible would that be to build reverse mapping at scale? Sounds like expensive operation + at what cadence, we will either spent a lot of compute rebuilding revery sharding or there will be a signiy lag beyey users subscribing and starting to recrcey notifications for that topic.
I really don't think it would be super significant, we're really just going through a load balancer and into Kafka- and from there we can receive the message in flink and we're good to go
Jordan, Can you use the bigger mouse pointer, that way we can track, which box you are talking about? Thanks
I'm using an apple pencil, to my knowledge there is no mouse pointer.
Could you explain a little bit more about what flink operator it use to handle this flink mapping?
Use a MapState on incoming elements and aggregate by key (key is topic, value is list of users subscribed to it)
How does Flink handle (topicID userID) update? What if there is a new topic? What if a new user subscribes to an existing topic? What if a topic is deleted etc?
1) Flink adds a key to the "topic subscriptions" map.
2) Flink adds the userId as a value in the list of users subscribed to that topic.
3) Flink removes the key for that topic from the map.
Can you make alternate design for this, where users are unaware of the notification service, they just receive messages and not subscribed to any topic. B2C users use this service to send notifications to their users
What does this mean "unaware of the notification service". You have to be connected to something?
@@jordanhasnolife5163for example if I bought something on Amazon. I get a mail about my purchase. I also get a mail from my credit card company that some amount has been spent. As a user I don’t need to know how companies implement their notifications mechanism.
Why is the subscription change events topic sharded on user id? I expected it to be on topic id as this stream needs to be joined with the events in notification queue which are also partitioned on topic id
See 19:26, typo on my part
we have two kafka queus, one is a CDC from topic subscriptions table which is partitioned by userid, and the other is partition by topicid. Both ingest data to Flink, so it needs to shuffle one or the other, doens´t it?
Ah maybe a typo on my end, both should be sharded by topicId
@@jordanhasnolife5163 aha! thanks :) love your videos!
You forgot about 2nd requirement - Offline clients must be able to receive notifications later
Throw the notifications to a DB cache per user (so partition based on userId). Same deal as Twitter basically.
Storing the idempotency keys on client is a wild idea, and can be easily treated as a red flag. I wouldn't mention this as an option in a real interview.
I'd appreciate if you can back up calling it a "wild" idea. Considering duplicate notifications aren't coming in with that significant of a delay, you can just clear the keys after some amount of time.
only fans i m 🤣🤣
Thank you.
1k notification for 1B topics, 1 trillion notifications a day? are you trying to spam the whole earth population by 10 notifications an hour??? We get it, it has to be scalable.
You can't handle this scale
Video Editing is too low 👎👎👎👎
Edit the video in a good style.
Add nice b-roll animation when you are speaking or explaining something. This will increase visualization.
Put proper subtitles, effects, transitions, and colors in the video.
Insert the B-Roll (footage and animation) very well because it is absolutely low in your video.
Do all this and then see how the video doesn't turn out good. 🍏
See duplicate comment