Good video! Thinking of applying this "rate limiting" thing to the dudes locked in my basement that keep up my daily leetcode streak.
You should consider evicting them from your basement (and replacing them with fresh entries)
I love your videos Jordan... just amazing... my teacher... ill wish you happy teachers day on teachers day...😊😊
Good discussions. Very helpful in preparing for upcoming interviews.
A few things I don’t understand about the solution:
1) With a distributed rate limiter sharded on userId/IP address like you’ve proposed here, I can’t see the need for read replicas. Every operation is probably a write operation. That’s under the assumption that the vast majority of requests don’t get throttled and thus put a timestamp in a queue. Read replicas would only be helpful when the request is throttled.
So I think if we want 100% accuracy, we can’t do leaderless replication. But actually I would argue that there are a lot of cases where we would prefer speed over accuracy. And if our scale is high enough that requests come in so often that a single machine doesn’t have the I/O to handle them (or the blocking data structures are under too much pressure), then to support higher throughput we would need to compromise accuracy.
By allowing writes to go to multiple machines, we can scale much higher in write throughput; the loss is accuracy. We also then need to share that information between nodes, perhaps with a leader, perhaps not. We could also use gossip-style dissemination.
2) I can’t understand how the cache would work on the LB. This would I presume be for throttled requests only. I suppose the rate limiter could return the earliest time that the next request would succeed and the cache would be valid only until then. Is that the idea?
Another good thing to talk about would be UDP vs TCP here, which again falls into the accuracy-speed tradeoff here.
Overall great discussion here in the video, maybe some of these points could further someone else’s thinking on the problem space while preparing.
1) Fair point - you may not want to read from that replica, but you do still want a replica in case you have to perform failover. Agree with your point regarding a multi-leader/leaderless setup; that can be feasible if our limits are looser.
2) It's just a write-back cache for certain users: we do all of the rate limiting logic like normal, but in the load balancer node, so that we can avoid a network call.
3) Yeah I think introducing UDP in a problem where dropped/unordered data is really not acceptable may end up with us just finding ourselves reimplementing TCP lol, but I think that's a good thing to bring up!
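(As an aside on point 1 above: a minimal sketch of letting every node accept writes and merging counts via gossip, which is essentially a G-counter CRDT. This is Python, and the class, node IDs, and merge rule are illustrative assumptions, not anything from the video.)

# Sketch: each rate-limiter node counts the requests it has seen locally for a given
# user/IP and periodically gossips its counts to peers. The effective count is the
# element-wise max-merge of all per-node counters, so writes are fast and lock-free,
# but the limit is only approximately enforced until gossip converges.
class GossipCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> requests observed by that node for this user/IP

    def record_request(self):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1

    def total(self):
        return sum(self.counts.values())

    def merge(self, peer_counts):
        # G-counter merge: take the max per node, so duplicate gossip messages are harmless
        for node, count in peer_counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def allow(self, limit):
        return self.total() < limit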
Jordan, to avoid locking in the sliding window algorithm, we could split the time window into multiple buckets and store them as a circular array; for example, 1 second can be broken into 10 buckets of 100 ms. The total hits at any point is the sum of the hits in all the buckets. We can use atomic variables to maintain the hits in each bucket and the total hits.
Seems reasonable to me, though note that everything is going to the last bucket anyways, so I'm not sure how much contention you're really avoiding here (you mentioned using atomics in each bucket, so you still need some concurrency control).
@@jordanhasnolife5163 The bucket is decided based on the time (in ms), so it will be distributed across all buckets. Yes, some concurrency control but atomic counters will be much faster than locks.
@@cowboy4mhell if things are coming in sequentially, why would they be distributed to different buckets? The last events are still all going into the same ones.
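(For reference, a minimal single-process sketch of the bucketed counter idea discussed above. Python; the 1-second window split into ten 100 ms buckets comes from the comment, while the class name and slot-reset logic are assumptions, and a real multithreaded server would use atomic counters per bucket rather than plain ints.)

import time

class BucketedWindowLimiter:
    def __init__(self, limit, window_ms=1000, num_buckets=10):
        self.limit = limit
        self.bucket_ms = window_ms // num_buckets
        self.buckets = [0] * num_buckets            # hit counts per slot
        self.bucket_starts = [0] * num_buckets      # start time of the interval each slot holds

    def allow(self, now_ms=None):
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        idx = (now_ms // self.bucket_ms) % len(self.buckets)
        bucket_start = now_ms - (now_ms % self.bucket_ms)
        if self.bucket_starts[idx] != bucket_start:
            # this slot still holds counts from a previous rotation of the ring; reset it
            self.bucket_starts[idx] = bucket_start
            self.buckets[idx] = 0
        # approximate sliding window: sum only the buckets that start inside the window
        window_start = now_ms - self.bucket_ms * len(self.buckets)
        total = sum(c for s, c in zip(self.bucket_starts, self.buckets) if s > window_start)
        if total >= self.limit:
            return False
        self.buckets[idx] += 1
        return True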
I think Redis operations (like INCR etc.) are atomic since by default it runs single threaded and uses atomic hardware ops (like compare and swap). And based on what I read so far, even single threaded redis seems to be pretty performant. Any reason why going multi-threaded with locking might be more performant?
NOTE: I recently discovered your channel and absolutely love your content. Keep it coming 🙌🏽.
I guess we'd have to do some benchmarking in practice! Single threading could very well be fine here, especially if there's gonna be a lot of contention!
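(To make the single-threaded-Redis point concrete, here is a sketch of a fixed-window limiter built on just INCR and EXPIRE using the redis-py client. The key format and the 100-requests-per-minute numbers are assumptions for illustration, not the video's design.)

import time
import redis

r = redis.Redis()  # assumes a Redis instance on localhost

def allow(user_id, limit=100, window_secs=60):
    # One counter per user per fixed window. INCR is atomic inside single-threaded
    # Redis, so concurrent app servers can't race each other on the count.
    window = int(time.time()) // window_secs
    key = f"rl:{user_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_secs)  # let counters from old windows clean themselves up
    return count <= limit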
Amazing, Jordan!
I remember seeing Redis sorted sets used for the sliding window, whereas you use linked lists. Your solution does have better time complexity, but maybe they use sorted sets because requests can reach the rate limiter "out of order"?
Not sure why they would overcomplicate the solution otherwise.
Nor am I!
Nice video, Jordan!
One small question though: in the final picture, where are we going to keep the sliding window logic that kicks out the expired counts? Is it inside the LB, or will we create a new rate-limiting service which uses Redis?
Wherever we're doing the rate limiting - in reality it could very well just be a custom server that we've deployed with our own code and stuff will just live in memory.
Hi Jordan. Great video as usual! I had one question. Can't we use Leaderless replication with partitioning to send a particular userId/IP to the same node everytime?
Sure but then you really just described single leader replication with partitioning lol
I’d like to dig into the memory requirements of the system some more with the added distinction between a fixed window vs sliding window algorithm. I worry that to support a sliding window rate limit our memory requirements would balloon so much as to invalidate the architecture.
Fixed window -- I think your initial estimate holds up, as long as we only need a single interval/process (or maybe a handful) tracked in memory which is responsible for updating all of the token buckets.
Sliding window -- If every active IP/user needs a linked list, then I think we have a problem. The LL needs as many nodes as there can be requests within a window. That might be, say, 1000 if our rate limit is something like 1000 per minute or per 24 hours (seems like a reasonable thing to support). Each node in the LL needs a 4-byte pointer to the next node and 4 bytes for the timestamp. 8 bytes * 1000 * 1 billion alone puts us at 8 TB.
At 8TB+ are we getting outside the realm of a manageable number of rate limit shards? 🤔
I think your estimate is worst case, which doesn't consider the fact that most users will have like at most one node in their linked list.
Even still, 8TB of memory is less than 100 machines, all of which are independent and don't coordinate with one another in any way whatsoever. Seems reasonable to me!
@ the comment that 8TB is reasonable touches on an intuition that I’m kind of struggling with.
Generally, one question we find ourselves asking in almost all system design problems is: “can we fit our database in memory”. To answer this question I’d love to have a rule of thumb, like: if we can fit it in less than X GB or shard it in such a way that it can fit in less than Y shards (each X GB in size) then “yes”, else “no”. But… well… what are X and Y to you?
@@lucasparzych5195 I'd try and say that pretty much any scale is acceptable if you have a large enough budget haha (FAANG). I feel like anything up to a Petabyte could honestly be fine even, if the latency requirements are low enough
@ that’s a new perspective to me and insightful thank you. “If the latency requirements are low enough and cost isn’t an issue just find a way to partition it and we can fit most anything in memory” love it.
One question I have is around managing the partitions in an automated way (adding/removing them as data is added to the system or as traffic needs dictate, automatically updating the load balancer config, actually spinning up the new redis instances, etc.). Do you have any videos that touch on that?
I've seen you reference zookeeper, but that's not actually going to, like, do the rebalancing, right? It's just a shared key/value store? Also I've heard zookeeper is kind of less relevant these days?
@@lucasparzych5195 I don't know what's replacing zookeeper, but no that wouldn't do automatic partition rebalancing for the most part. Either you'd have to use a technology that has it built in, build your own auto rebalancing, or have it done manually.
There are pros and cons to all approaches!
Generally speaking, pre-setting many small partitions from the start makes life easier when it comes to rebalancing.
Great video! In the final design, where is the rate limiting logic implemented? Is it in between the load balancer and the redis database? Or is it implemented on the backend services? Or is it just for us to decide based on the constraints?
I'd say mostly for you to decide based on constraints. There are tradeoffs to each approach between how independently our rate limiting service can scale and the latency at which we can perform rate limiting.
edging is backpressure management
Good point!
Re pimples, I had a pizza face growing up; it's an oil issue. Half a lemon, half a grapefruit, half an orange, a tsp of olive oil (so you don't get an ulcer) in the blender, daily. At night, a dab of toothpaste on each; it'll dry them up. After two weeks, max 1 month, you'll need a new reason to not get some. Very nice videos, I'm enjoying them immensely.
I appreciate it! Yeah I eat a ton of dairy due to lifting (whey protein agh), but what can ya do, I don't really care too much
Just get whey isolate you noob
Regarding the need to use locks to sync access to lists in Redis (for the sliding window case): it might be possible to move the list update logic into a Redis function or use a Redis transaction, which will make execution atomic. Taking into account that Redis is single-threaded, there will be no need to use locks.
Yeah for Redis you're totally right, just wanted to mention this in the general case if someone were to use a multithreaded server.
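(A sketch of the transaction idea with redis-py: an optimistic WATCH/MULTI/EXEC loop around the sliding-window list, so no client-side locks are needed. The key name, limit, and retry structure are assumptions, not the video's design.)

import time
import redis

r = redis.Redis()  # assumes a Redis instance on localhost

def allow_sliding(user_id, limit=100, window_secs=60):
    # Optimistic concurrency: WATCH the key, read the list, then trim + append inside
    # MULTI/EXEC. If another client touches the key in between, EXEC raises WatchError
    # and we retry, with no client-side locks.
    key = f"rl:{user_id}"
    now = time.time()
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch(key)
                timestamps = [float(t) for t in pipe.lrange(key, 0, -1)]
                expired = sum(1 for t in timestamps if t <= now - window_secs)
                if len(timestamps) - expired >= limit:
                    pipe.unwatch()
                    return False  # still at the limit even after dropping expired entries
                pipe.multi()
                if expired:
                    pipe.ltrim(key, expired, -1)  # drop the expired prefix of the list
                pipe.rpush(key, now)
                pipe.execute()
                return True
            except redis.WatchError:
                continue  # key changed under us; re-read and retry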
Jordan, thank you for the great content. Could you please share the slides used for all sys design 2.0 videos?
Hey! I will upload them in bulk when the current series is done :)
Great video. However, this implementation is assuming that all 20 services will be rate limited at the same spot? So this design wouldn't work if each service needed to be rate limited independently?
No, you can shard out the rate limiter per service and do this independently.
Great video, could the linked list be a binary search tree instead? Inserting an element is slower, but when you try to remove from it, it takes O(log n) instead of O(n); you would have to balance the binary search tree once in a while in O(n), but not always.
I believe in our case we're only removing from the head of the linked list, which is why I use that
Hi there, awesome video! I have a doubt. How will the load balancer cache help? The rate limiting data must reset after a specified time frame, so how will the cache be updated unless we let the request pass through the load balancer to the RL servers?
What do you mean it must be reset? The load balancing cache can basically just run the same exact code as the rate limiter would. That way it can act as a write back cache and you can avoid an extra network call.
@@jordanhasnolife5163 Okay got it. Thanks for your response.
How would that work? Like a subset of the requests can just be fulfilled by the load balancer? The rest go on to do the network call?@@jordanhasnolife5163
Big thanks from another Googler 🙏
I watched this video several times (like the rest of the problems) and practiced by recording myself while designing the rate limiter, notification system, Twitter, etc. Luckily, my interviewer from Amazon picked up the rate limiter, and he didn't like it. Didn't tell me what I did wrong, and he had no feedback or clues for me. I explained my choices based on his numbers and requirements (eventual consistency, exponential lock on IPs, partitioning, replicas, Redis choice, and why). Still don't understand what I missed
Well I suppose there are many reasons that it could have been, and it's hard for me to say exactly without all of us seeing the interview!
It's even possible you just had a bad interviewer! This sounds likely too if he didn't ask you any questions during the interview.
Nonetheless, keep up the good work and trust the process! I've failed plenty of interviews too, and all you can do is consider it good practice!
One doubt, at 7:48 you have put the services behind a load balancer. Can a load balancer distribute load among separate services? Isn’t that the job of API gateways?
I guess depends on the implementation but sure I'm fine using the term API gateway
Correction at 14:19: it should be memcache, but you typed 'memcached', which is a persistent storage service based on memcache.
But you can have a multi-master setup too, with a follower of each master. Just need to ensure that the data for one user goes inside a particular shard always. This would solve the user/ip level rate limiting?
You just described partitioning
In the sliding window algorithm, did you say someone could make the design decision of always adding every request to the list EVEN if the request was outside the window and would otherwise be invalid? Could you explain that please, or clarify if that wasn't what you meant?
Ah sorry, no, I think what I meant there is when you hit your rate limit with the sliding window you can:
1) not add events over the limit to the linked list (this will mean that you're only bottlenecked by the existing events in the linked list)
2) add them to the linked list (now you have to wait for all of them to expire before you can make more API calls).
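(A tiny in-memory sketch of those two options, using a Python deque in place of the linked list; the count_rejected flag switches between option 1 and option 2. Names and structure are illustrative assumptions.)

import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, limit, window_secs, count_rejected=False):
        self.limit = limit
        self.window_secs = window_secs
        self.count_rejected = count_rejected  # option 2: throttled requests also extend the wait
        self.events = deque()  # request timestamps, oldest at the left (the "head" of the list)

    def allow(self, now=None):
        now = time.time() if now is None else now
        # expire events that have fallen out of the window (we only ever remove from the head)
        while self.events and self.events[0] <= now - self.window_secs:
            self.events.popleft()
        allowed = len(self.events) < self.limit
        if allowed or self.count_rejected:
            self.events.append(now)
        return allowed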
We don't necessarily need our rate limiter to be available all the time?
- We have the cache at the load balancer
- The rate limiting service is not on the load balancer, so if it is down for some time it won't affect the service as a whole
Yeah, but we do want it up as much as possible to avoid spammers
With user ID and IP you can also add in user system data for a more precise system: MAC address (random MAC on new phones is a troublesome thing), phone make, model, device ID, screen resolution, processor, storage, other device specs, etc. A combination of these three can make it more precise, especially with CGNAT networks (same IP for a whole neighbourhood, common in Asia with ISPs).
Nice insight, thank you!
@@jordanhasnolife5163 I guess Twitter does this to detect bots using VMs to create new accounts for spamming.
Wondering if you have all the iPad notes stored somewhere for quick revision before an interview.
I do yeah - apologies for being a bum in terms of uploading these but I'll get to it soon enough, probably when I finish this series so that I can upload them in batch
@@jordanhasnolife5163 Thanks man, yeah that would be very helpful for sure. Looking forward to these.
Hi, I was wondering if rate limit rules will come into the discussion (for this particular endpoint, this many requests are remaining for this IP). So these configurations need to be stored somewhere, right? Probably a DB?
Also correct me if I am wrong: in Redis, the running counters of the rate limit are stored, right? Like, 5 requests have been exhausted.
Also, how are we refreshing the limits here? Say after a minute has passed, I need to reset the limits, right?
If we're doing fixed size windows, our counts get reset on the new (hour, minute, whatever time window).
If we're using sliding windows, a background process will expire requests that are outside of our window.
Will they ask to implement algorithms in an interview? I have boilerplate for the sliding window in Redis Lua, and I've used nginx's default, which I think is leaky bucket... the implementation isn't hard, just some CLI stuff. In an interview, how likely do you think they would ask for details about the algorithm itself?
I doubt they'd want you to go *that* in depth, but they may ask for some high level details, I'm not entirely sure.
This is a beginner question, but how does sharding with single leader replication work? Does each shard range of databases have their own leader?
Yep!
Thank you for the video! Good learning as always.
One doubt. When you say multi-leader replication setup,
1. Do you mean it in context of rate limiting services ? Like every rate-limiting service node will keep track of its count (CRDTs) and then broadcast it to other nodes. (If yes, then why do we need Redis ?)
Or,
2. Do you mean in context of Redis ?
Either or could work, I believe redis does have CRDT support built in so perhaps one less thing to invent.
Jordan I saw it in the comments too but wouldnt Redis' native support for sorted sets be very powerful for Rate Limiting? Specifically the zremrangebyscore function for removing expired keys in a sliding window?
Also with Redis isnt concurrency not an issue since it is single threaded?
Could you explain more of the sorted set solution for Redis?
Assuming requests come in in order, can't we just use a queue? I guess the sorted set would handle it if there are some out of order requests due to the distributed servers.
Realistically I'm sure this works just fine!
I've never used Redis in my day to day work, but even then I'm hesitant to call out a very specific function of one technology as opposed to going through the overall approach, that way we can recognize many technologies that do what we want!
Yeah queue sounds fine here. I guess everything goes in by timestamp in his implementation and then you remove them based on their timestamp value every x seconds
@@jordanhasnolife5163 Very fair point Jordan!!
@jdxxmxnd - something like this:
KEY="ratelimit:user123"        # example values, assumed for illustration
WINDOW_SIZE=60
RATE_LIMIT=100
CURRENT_TIMESTAMP=$(date +%s)
# Drop entries that have fallen outside the window
redis-cli ZREMRANGEBYSCORE "$KEY" 0 $((CURRENT_TIMESTAMP - WINDOW_SIZE))
# Get the count of requests in the current window
CURRENT_COUNT=$(redis-cli ZCARD "$KEY")
if [ "$CURRENT_COUNT" -lt "$RATE_LIMIT" ]; then
  # If count is less than the limit, add the current timestamp
  redis-cli ZADD "$KEY" "$CURRENT_TIMESTAMP" "$CURRENT_TIMESTAMP"
  redis-cli EXPIRE "$KEY" "$WINDOW_SIZE" # Set expiration for the key
fi
Essentially a sliding window implementation. The benefit here is that Redis is single threaded and gives us a blazing fast way of keeping an index of how often it sees a key - and we don't have concurrency concerns, as only one thread ever modifies the keys.
The key here can be an IP or user ID or session cookie.
I was doing some benchmarking of Redis on my local MBP and boy, I was getting 200K SET operations in a second! With beefy hardware this can be accelerated further.
But of course, as Jordan pointed out, for fault tolerance and further scalability we'd want to have a distributed Redis cache with a single leader.
Even for queuing, Redis has amazing native constructs. Of course it has durability issues, so while it works amazingly for rate limiting and such, it's not a great persistent store - unless you use a fork like MemoryDB or do some periodic checkpointing.
Shouldn't we use RabbitMQ here, as we are not concerned with order?
I just don't see why we need a message broker in the first place, we want synchronous responses here
I don't understand the point of a cache on top of redis. Won't every request require us to update the cache to get the latest count from the redis store? Cache would make sense if it was read heavy I think.
The cache is on the load balancer, so it helps us avoid an additional network call. If it's a write back cache, that means it is the source of truth, hence we don't have to go to the redis store.
Yes but doesn't every cache hit require us to update the rate counter by 1?@@jordanhasnolife5163
@@jordanhasnolife5163 does write back cache just mean that lookup logic lives in LB and rate limiter service is just responsible for writing the data into Redis? so separating the responsibility between LB and Rate Limiter?
@@davidoh0905 It just means that a subset of the rate limiting data will live in the load balancer, as opposed to redis.
which app do you use for the white boarding in your video?
OneNote
I'm curious how you would design a real time bidding system, or multiplayer game server
I do have a video on the game server! Just have to remake it at some point. For bidding, probably some partitioned redis cache on auctionId, you'll have to use atomic operations to increase the bid.
@@jordanhasnolife5163 I will look for the game server vid! Would you use Kafka and Flink to process the bidding requests? Would you stream process it or batch process it? And how would you update and show users the latest bid if it constantly updates? Thanks in advance 🙏🏻
Rate limiting is done at the per-server level, right? So in the estimation, why did you multiply by 20 services? We use 20 different rate limiters, each with the 12 GB limit.
I don't know what you mean "per server", as there are many servers. Yeah, you can always partition, but the amount of data needed to be stored on a cumulative basis is for all services. We have a rate limiting service that scales independently from our application servers.
At this point, you should just double down on the memes man, love that stuff!
also, pp :p
ᕦ(ò_óˇ)ᕤ
Thank you Jordan!
Can you please share the google drive link of all the above notes ?
see substack in my channel description
Thank you for your video.
Thank you!
You don't talk about costs in any of these videos. Cost is a very important aspect which can redefine the entire solution. But good video, you bring new dimensions to my thought process.
Agreed on the cost part, and this is certainly true IRL. Though to be fair, I don't think that most systems design interviews are expecting you to have a concrete idea of the costs of your solutions. Though at a high level, I agree that you should probably have an idea when making designs which areas are costly/could potentially be improved upon.
For atomicity we can use Lua scripts in Redis.
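(For example, a sketch only and not from the video: a check-then-add sliding window as one Lua script invoked from redis-py, so the whole decision executes atomically inside single-threaded Redis. The key format, limit, and window are assumptions.)

import time
import redis

r = redis.Redis()  # assumes a Redis instance on localhost

# KEYS[1] = rate-limit key, ARGV[1] = now, ARGV[2] = window size (secs), ARGV[3] = limit.
# Everything inside the script runs atomically, so no other client can interleave
# between the trim, the count, and the add.
SLIDING_WINDOW_LUA = """
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1] - ARGV[2])
local count = redis.call('ZCARD', KEYS[1])
if count < tonumber(ARGV[3]) then
    redis.call('ZADD', KEYS[1], ARGV[1], ARGV[1])
    redis.call('EXPIRE', KEYS[1], ARGV[2])
    return 1
end
return 0
"""

allow_script = r.register_script(SLIDING_WINDOW_LUA)

def allow(user_id, limit=100, window_secs=60):
    now = time.time()
    return allow_script(keys=[f"rl:{user_id}"], args=[now, window_secs, limit]) == 1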
Shouldn't the Rate Limiter be part of API Gateway?
In practice, probably. Or at least the write back cache part of it should be.
Thanks for this Video too Sir...
dumb question, why don't we put it in front of lb? :|
Yeah fair, in this case I'm kinda envisioning the load balancer more as like an API gateway, where we have to hit it first to route the request to the appropriate microservice, in which case we may need to hit that first before doing rate limiting. But to be honest, yeah you can totally put the rate limiter before a load balancer.
@@jordanhasnolife5163 Thank you. If I was a girl I would start hitting on you.
W vid. very clear.
Can you do goatee tutorial? Thanks
I don't think you want that from me - how about a talking to no women tutorial I'm pro at that
🤣@@jordanhasnolife5163
can you share notes if possible ?
I will get to it eventually! Will post on my channel about it when I do
I've been binge watching your channel and all I hear is Kafka and Flink even though those terms didn't come in this video. Send help.
9K views damn dude you are blowing up
My toilet at least for sure
no flink? wtf. who are you and where’s our boy jordan??
Rare Flink L, Redis paid me off
When are gonna open a discord so we design a feet pic recommendation plateform
Holy moly someone send this guy to Y combinator asap
Considering how sarcastic he can be, lame me thought he is referring to the other "Edging".. Ahem Ahem..
Who said I'm not
golden
Shower
🙇
Rate limiter system design in Hindi : ruclips.net/video/khhe7avsw1g/видео.html
Easy to understand...
I'll let the free market decide
talk is cheap, show me your code.
Code is cheap, show me your design