The hybrid approach suggests: 1. Write updated data only to the local server's cache; do not write to the DB. 2. After some time interval, persist a chunk of the cached data to the DB. But what if the local server crashes between 1 and 2? Isn't the updated data lost forever?
I have one doubt. The definition you gave for write-back should be for write-around. In write-around, we hit the DB first and then update the cache. In write-back, we first update the cache and then wait for some time to bulk write in DB. Please let me know if my understanding is wrong.
Hi @Gaurav, in the write-through policy couldn't this also be an issue: suppose someone performs an update transaction that updates the cache, but when it goes through to the DB some check fails and the transaction is rejected, so we are left with incorrect data in the cache? The same goes for the hybrid model: if, say, 2 out of 10 transactions fail while updating the DB, we will be serving incorrect responses from the cache.
@@deshkarabhishek This is indeed another example of "click bait": a person saying X but - as many others before him - explaining Y, where Y is The Basics and X is The Difficult. The people who use this "click bait" trick are mostly people from India. I'm not saying that all Indian people upload worthless info, some of them are really spectacular - but 100% of the worthless info is from India. With regards to Redis / caching - my guess is that RedisLabs acknowledged this "click bait" problem and uploads extremely good info. (And some of that info is actually made by some ultra intelligent Indians - because when an Indian is intelligent, he / she is extremely intelligent.)
@@larskrenning260 I think you gotta keep in mind that some of what you're seeing is because of the high population and because of the higher proportion of Indians pursuing engineering. :) So I'm not sure you get anything of value from that anecdotal observation.
@@deshkarabhishek Well, that's bad. It will be great if you could share a video with your production experience. May be Gaurav can also learn about 'DISTRIBUTED' cache from you.
@@larskrenning260 lol, are you just jealous? This is YouTube, not a toilet; please behave and inform yourself before commenting such stupid stuff. Comments like yours make me want to throw up; people like you make this world stink. I agree this video was not his best, but you are all here learning from him, and your comment shows how ignorant you are. I would delete it if I were you.
Title: What is Distributed Caching? Explained... There is not a single 'D' in this 'Distributed' explanation. You are talking about 'cache' and its variations in implementation ONLY. All in all, change the title to 'What is caching?'
Awesome overview, thanks. One other possible issue with write-through: it's possible to make the update to the cache and then have the DB update itself fail. Now your cache and DB will be inconsistent.
I have a question: the first point you mentioned is to reduce network calls, but since the cache is a separate system, the network-call minimization stands void, right? So how beneficial is it to use Redis if we are still doing I/O calls? Is it that a DB I/O call is more expensive than a Redis I/O call? I am a bit skeptical about this part.
@Gaurav Sen - How can network calls be reduced with a distributed cache, given that the cache itself is distributed? Why is a distributed cache faster than a database?
Gaurav, what you are describing as a Write Back cache is actually called a Write Around cache. What you describe as the hybrid mechanism is actually called the Write Back cache. In both, the assumption is an asynchronous update, unlike Write Through where the update is synchronous. Might be worth taking this video offline and uploading a corrected version to avoid misleading folks prepping for interviews.
At 3:05, you mention that if we keep storing everything in the cache we might increase our search time. Isn't the cache just key-value entries, with search being an O(1) operation?
It is O(1), but we have limited main memory. Once we run out, we will have to fall back on secondary storage, which is an I/O call. Also, the O(1) assumes very few collisions in the hash buckets. As the number of entries per bucket increases, the search time slows too (this scenario is unlikely, but good to know about).
@@gkcs I agree with your points. That point doesn't come through that clearly in the video; it conveys the impression that the cache itself slows down as it fills with more data within the given memory limit. Hope I am making sense.
Do you implement caching on most systems? It adds complexity; how do you determine whether it is worth the additional development effort? Love the videos, by the way. These are a great learning tool; you do a great job.
One approach I use for consistency is lazy updates. On a DB write, instead of pushing the data back to the caches (which may never get read if a second update comes in), the DB writes the ID to invalidate to a message queue that all caches subscribe to. Then you can implement query-then-cache-on-miss semantics. This way load throughout the system is reduced, with some double-queries occurring if the cache was cleared after a good query due to latency. (This can be eliminated by versioning: take the current timestamp in milliseconds at the time of write and broadcast it, so that a cache only clears itself if its cached version differs from the broadcast version.)
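A minimal in-memory sketch of this lazy-invalidation idea (all names are illustrative; in a real setup the `on_invalidate` call would arrive via a message queue, and versions would be write timestamps):

```python
class VersionedCache:
    """Each cache node keeps (value, version) pairs and only drops an
    entry if a broadcast invalidation carries a newer version, so stale
    broadcasts that race with a fresher read are ignored."""

    def __init__(self):
        self.store = {}  # key -> (value, version)

    def get(self, key, db):
        if key in self.store:
            return self.store[key][0]      # cache hit
        value, version = db[key]           # miss: query, then cache
        self.store[key] = (value, version)
        return value

    def on_invalidate(self, key, version):
        # Drop the entry only if the broadcast is newer than what we hold.
        if key in self.store and self.store[key][1] < version:
            del self.store[key]

# A write to the "DB" bumps the version and broadcasts it to all caches.
db = {"user:1": ("alice", 1)}
caches = [VersionedCache(), VersionedCache()]
for c in caches:
    c.get("user:1", db)

db["user:1"] = ("alice2", 2)               # DB write
for c in caches:
    c.on_invalidate("user:1", 2)           # broadcast via the queue

assert all(c.get("user:1", db) == "alice2" for c in caches)
```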
Is there something like the worker servers having a service dedicated to subscribing to update publications from the database? Wouldn't that keep the caches on all servers updated?
@@gkcs suppose we have n servers, and there is a service running on each of them which subscribes to the database server. The database server also has a service which publishes updates whenever the data gets updated inside the DB. I'm basically trying to apply the publish-subscribe model (MQTT/RabbitMQ...) to keep the caches on the servers updated.
I think you mixed up write-back with write-around cache. Write-back is when you just update the cache and the database gets updated at a later point in time. Write-around is when the DB gets updated first and then the cache gets notified asynchronously about that update.
Is it wise to use Redis pub/sub to invalidate a cache? E.g. each microservice publishes an event to Redis, and subscribers remove or update cache entries based on it.
13:00 Can you please explain why financial data should use write-back and not write-through? You want high consistency there; it's not like a social network where consistency doesn't matter much. Write-through has higher consistency than write-back, does it not?
A label/comment in the video about the change of usage w.r.t. write-back and write-through would help future viewers. I never saw the pinned comment until recently. This could have backfired in an interview.
Gaurav nice video. One comment. Writeback cache refers to writing to cache first and then the update gets propagated to db asynchronously from cache. What you're describing as writeback is actually write-through, since in write through, order of writing (to db or cache first) doesn't matter.
Ah, thanks for the clarification!
Yes, it would be great if you could add a correction comment about the 'write-back cache'. Thanks for the great video!
I agree... a comment in the video correcting this would be a good update.
So Gaurav was also wrong in saying "write-back" is a good policy for distributed systems?
@Gaurav Yes, that would be great. That part was confusing; I had to read about it separately.
Write-through: data is written in cache & DB; I/O completion is confirmed only when data is written in both places
Write-around: data is written in DB only; I/O completion is confirmed when data is written in DB
Write-back: data is written in cache first; I/O completion is confirmed when data is written in cache; data is written to DB asynchronously (background job) and does not block the request from being processed
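These three definitions can be sketched with plain Python dicts (an illustrative toy, not a real cache client; the asynchronous write-back is modeled as a pending queue drained by a "background job"):

```python
cache, db, pending = {}, {}, []

def write_through(key, value):
    cache[key] = value
    db[key] = value                   # confirmed only after both writes

def write_around(key, value):
    cache.pop(key, None)              # cache is bypassed (invalidated here)
    db[key] = value                   # confirmed on the DB write alone

def write_back(key, value):
    cache[key] = value                # confirmed on the cache write
    pending.append((key, value))      # persisted later, off the request path

def flush():                          # the "background job"
    while pending:
        k, v = pending.pop(0)
        db[k] = v

write_back("a", 1)
assert "a" not in db                  # DB not yet updated, request already done
flush()
assert db["a"] == 1
```

The crash risk of write-back is visible here: anything still in `pending` when the node dies never reaches the DB.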
Other variants
1. There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.
2. There are only two hard problems in distributed systems: 2. Exactly-once delivery 1. Guaranteed order of messages 2. Exactly-once delivery
Hahahaha!
@@gkcs A humble suggestion, I think you should have a sub-reddit for the channel, because these are such critical topics [not just for cracking interviews], I'm sure they'd definitely encourage healthy discussions. I think YT's comment system is not really ideal to have/track conversations with fellow channel members.
This is an underrated comment .... 😂😂😂
@@gkcs Can you please give some hints on WHY "out-of-order delivery" is a problem in distributed systems, if the application is running on TCP? Please kindly reply.
@goutham Kolluru, can you please give a hint on WHY "out-of-order delivery" is a problem in distributed systems, if the application is running on TCP? Please kindly reply.
Notes:
In Memory Caching
- Save memory cost - For commonly accessed data
- Avoid Re-computation - For frequent computation like finding average age
- Reduce DB Load - Hit cache before querying DB
Drawbacks of Cache
- Hardware (SSD) much more expensive than DB
- As we store more data on cache, search time increases (counter productive)
Design
- Database (Infinite information) vs Cache (Relevant information)
Cache Policy
- Least Recently Used (LRU) - Top entries are recent entries; remove the least recently used entries from the cache
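A minimal LRU sketch in Python (illustrative only), using `OrderedDict` to track recency:

```python
from collections import OrderedDict

class LRUCache:
    """Reads and writes move an entry to the 'most recent' end;
    when capacity is exceeded, the least recently used entry goes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)        # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

c = LRUCache(2)
c.put("a", 1); c.put("b", 2)
c.get("a")                                # "a" is now most recent
c.put("c", 3)                             # evicts "b"
assert c.get("b") is None and c.get("a") == 1
```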
Issue with caches
- Extra calls - When we couldn’t find entry in cache, we query from database.
- Thrashing - Data moves in and out of the cache without its results ever being used
- Consistency - When update DB, we must maintain consistency between cache and DB
Where to place the cache
- Close to server (in memory)
- Benefit - Fast
- Issue - Maintaining consistency between memory of different servers, especially for sensitive data such as password
- Close to DB (global cache, i.e. Redis)
- Benefit - Accurate, Able to scale independently
Write-through vs Write-back
- Write-through - Update cache, before updating DB
- Not possible for multiple servers
- Write-back - Update DB, before updating cache
- Issue: Performance - When we update the DB, and we keep updating the cache based on that, much of the data in the cache will be fine and invalidating them will be expensive
- Hybrid
- Any update first write to cache
- After a while, persist entries in bulk to database
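The hybrid idea in these notes can be sketched as a write buffer that persists to the DB in bulk (the names and flush threshold are illustrative; note that anything still buffered is lost if the node crashes before flushing):

```python
class HybridCache:
    """Writes land in the cache first; dirty entries are persisted
    to the DB in one bulk write once the buffer fills up."""

    def __init__(self, db, flush_size=3):
        self.db = db
        self.cache = {}
        self.dirty = {}
        self.flush_size = flush_size

    def write(self, key, value):
        self.cache[key] = value
        self.dirty[key] = value
        if len(self.dirty) >= self.flush_size:
            self.flush()

    def flush(self):
        self.db.update(self.dirty)   # one bulk write instead of many
        self.dirty.clear()

db = {}
h = HybridCache(db, flush_size=3)
h.write("x", 1); h.write("y", 2)
assert db == {}                      # nothing persisted yet
h.write("z", 3)                      # buffer full -> bulk persist
assert db == {"x": 1, "y": 2, "z": 3}
```

In practice the flush would also run on a timer, and the buffered window is exactly the data at risk in a crash.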
Nice, but the write-through and write-back parts of the notes are wrong; please correct them (you can check the other comments). Thanks.
Nice notes
I can already hear the interviewer asking "with the hybrid solution: what happens when the cache node dies before it flushes to the concrete storage?" You said you'd avoid using that strategy for sensitive writes, but you'd still stand to lose up to the size of the buffer you defined on the cache in the event of a failure. You'd have to factor that risk into your trade-off. Great video, as always. Thank you!
The world needs more people like you. Thank you!
Cache doesn’t stop network calls but does stop slow costly database queries. This is still explained well and I’m being a little pedantic. Good video, great excitement and energy.
Hi Gaurav, I really like your videos, thank you for sharing! I need to point out something about this video. Writing directly to the DB and updating the cache after is called write-around, not write-back. The last option you provided, writing to the cache and updating the DB after a while if necessary, is called write-back.
Thanks Zehra 😁
I just can't find a better content on YT than this, thanks man!
Dude, you are the reason for my interest in system design. Thanks, and never stop making system design videos!
I watched this video 3 times because of confusion but ur pinned comment saved my mind
thank you sir
I don't know how people can dislike your video Gaurav, you are a master at explaining the concepts.
Great video. But I wanted to point out that, I think what you are referring to as 'write-back' is termed as 'write-around', as it comes "around" to the cache after writing to the database. Both 'write-around' and 'write-through' are "eager writes" and done synchronously. In contrast, "write-back" is a "lazy write" policy done asynchronously - data is written to the cache and updated to the database in a non-blocking manner. We may choose to be even lazier and play around with the timing however and batch the writes to save network round-trips. This reduces latency, at the cost of temporary inconsistency (or permanent if the cache server crashes - to avoid which we replicate the caches)
Thank you so much for these videos! Using this I was able to pass my system design interview.
This man is literally insane in explanation 🔥
Teaching and learning are processes. Gaurav makes it fun to learn about stuff, be it distributed systems or the egg dropping problem.
I might just take the InterviewReady course to participate in the interactive sessions.
Take a bow!
I am actually using a write-back Redis in our system, but this video helped me understand what's happening overall. Great video!
each of ur videos, i watched at least twice lol, thank you!! WE ALL LOVE U! U R THE BEST!
I also watch his videos many times.
At least 4 times to be precise.
Gaurav, what you initially described as write-back at around 10:30 I have seen described as write-around. Write-back is where you write to the cache and get confirmation that the update was made, then the system copies from the cache to the database (or whatever authoritative data store you have) later... be it milliseconds or minutes later. Write through is reliable for things that have to be ACID but it is slower than write back. You later describe what I have always heard as write-back at around 12 and a half minutes
Yes, I messed up with the names. Thanks for pointing it out 😁
@@gkcs so does this mean that write-through is good for critical data (financial/passwords) and write-back/write-around is not?
Description for write back cache is incorrect.
Write-back cache: Under this scheme, data is written to cache alone and completion is immediately confirmed to the client. The write to the permanent storage is done after specified intervals or under certain conditions. This results in low latency and high throughput for write-intensive applications, however, this speed comes with the risk of data loss in case of a crash or other adverse event because the only copy of the written data is in the cache.
Thanks for pointing this out Satvik 😁👍
I believe the description in the video given for write-back cache is actually a write-around cache (according to grokking system design)
What if the cache itself is replicated? Will write-back still have a risk of data loss?
Yes, as per my understanding: write-through cache - when data is written to the cache, it is also modified in main memory; write-back cache - when dirty data (changed data) is evicted from the cache, it is written to main memory, so a write-back cache will be faster. The whole explanation of these two concepts given in this video seems fuzzy.
What you explained as write-back cache is actually a write-around cache. In write-back cache...you update only the cache during the write call and update the db later (either while eviction or periodically in the background).
Nice video Gaurav, really like your way of explaining. Also, the fast forward when you write on board is great editing, keeps the viewer hooked.
If someone explains any concept with confidence & clarity like you do, he/she can seriously rock the interview. Heavily inspired by you & love your system design content. Thanks for the effort @Gaurav Sen
A few other reasons not to store completely everything in cache (and thereby ditching DBs altogether) are (1) durability since some caches are in-memory only; (2) range lookups, which would require searching the whole cache vs a DB which could at least leverage an index to help with a range query. Once a DB responds to a range query, of course that response could be cached.
nice quick video to get an overview. thanks Gaurav. you are helping a lot of people.
Fun part. I was going through 'Grokking The System Design Interview' course, found the term 'Redis', started searching for more on it on youtube, landed here, finished the video and Gaurav is now asking me to go back to the course. Was going to anyway! :)
Hahaha!
Good video around basic caching concepts. I was hoping to learn more about Redis (given your video title)!
amazing clarity, intuitive explanations
Thanks Gaurav, your lecture helped me crack MS. Keep posting videos.
Congrats!
Are you in the Hyd campus?
Bhai. u r a life saver! Brilliant tutoring. Thank you!
Great content. Would love to hear more about how to solve cached data inconsistencies in distributed systems.
The way you explained the concepts is AWESOME.
Can you please create a video that describes DOCKER and Containers in your style?
Thank you for the video. You could have gone a little deeper into how the cache is implemented. What's the underlying data structure of the cache?
Summary
Caching can be used for the following purposes:
Reduce duplication of the same request
Reduce load on DB.
Fast retrieval of already computed things.
Cache runs on fast, expensive hardware (RAM/SSD)
rather than on commodity hardware.
Don't overload the cache for obvious reasons:
It is expensive(hardware)
Search time will increase
Think of two things (you obviously want to keep the data that is going to be used most, so predict):
When will you load data in the cache
When will you evict data from the cache
Cache Policy = Cache Performance
Least Recently Used
Least Frequently used
Sliding Window
Avoid thrashing in Cache
Putting data into the cache and removing it without using it again most of the time.
Issues can be of Data Consistency
What if data has changed
Problems with Keeping cache in Server memory(In memory)
-What if the server goes down(cache will go down)
-How to maintain consistency in data across cache.
Mechanism
Write through
Always write first in the cache if there is an entry and then write in DB.
The second part can be synchronous.
But if you have an in-memory cache on every server, you will obviously run into data inconsistency again.
Write back
Go to the DB, make an update, and check whether the cache has the entry; if so, evict it.
But if unimportant updates keep evicting entries from the cache like this, you can again fall into thrashing.
One can use Hybrid approach as per the use case.
Thanks to @GauravSen
Explained like the candidate I interviewed today.
A lot of information nicely packed for a quick glance. Great work!
Very easy to understand, Gaurav. Thanks a lot!!!
thanks for this quick tutorial :) your English is really good
Correction: INPUTING and OUTPUTTING -> Adding and Removing 5:46
One observation: a cache need not run on expensive hardware; for a cache one would use "memory"-centric instances on the cloud, not SSDs. Also, caches can be used in place of a database if the data size is relatively small and you require high throughput and efficiency.
I think simply saying THANK YOU is far too little for this help!!! Superb video.
Glad to help :)
I mean you can always do more by becoming a channel member 😄
wonderfully explained. thanks
This is everything I needed. I am really looking forward to learning how to create an online game hosting server. I researched a lot on how to do it and didn't get what exactly was happening. Your CDN video was really good 👍. Now I understand how exactly a CDN works and why it uses distributed caching 👍💯
Thank you 😁
The drawback of write-through you explained applies equally to write-back, i.e. I null the value in S1 and the value is still not null in S2. The major thing is: Redis is not a distributed cache. Even their own definition does not include the word "distributed" - "Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster."
Thank you so much..! your videos are really valuable. Really appreciate your effort, sir.!!
You articulate these concepts very well. Thanks for the upload.
Excellent! Great video with tremendous info and design considerations
Very nice presentation . Simple, powerful and fast presentation. Keep up the style
Thank you!
Very informative and concepts explained clearly. Thanks
Amazing Explanation!! Thanks!!
Your System Design videos are very good and helpful, thanks!
hey Gaurav, over the holidays I'll watch your videos day in and day out... so please teach new topics ASAP.
I love listening to you
At 2:45, you say that storing a lot of data on cache leads to increased search times. Can you explain how?
Very well explained !!
Great explanation. You are making my revision so much easier. Thanks!!
Excellent info and presentation - thanks!
Nice explanation, Gaurav. This video covers the basics of caching. In one of my interviews, I was asked to design a caching system for a stream of objects having validity. Could you make a video on this system design topic?
learned a ton in this video thanks so much
I have one doubt regarding the cache policy. Gaurav explained that for critical data we use the write-back policy to ensure consistency. In write-through, one instance's in-memory cache gets updated and others can remain stale.
1) My question is: the same can happen in write-back. One instance's in-memory cache entry gets deleted and we update the DB, but other instances still have that entry, so there is inconsistency in write-back as well. Why do we prefer write-back for critical data when the same issue exists there?
If the answer is to invalidate the entry in every instance's in-memory cache, then the same can be done for write-through, which leads me to question 2.
2) My other question is: we could update every instance's in-memory cache entry and then update the DB. That way consistency is maintained, so why not use this for critical data like passwords and financial information?
It is a really great video. Finally found a detailed video. Thank you for sharing your knowledge!!
@12:48 Would this scenario apply if there are multiple replicas for a service with redis?
You have explained it very nicely. Thanks.
The hybrid approach suggests:
1. Write the updated data only to the local server cache. Do not write to the DB.
2. After some time interval, persist a chunk of the cached data to the DB.
But what if the local server crashes between 1 and 2? Isn't the updated data lost forever?
It is.
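For what it's worth, the hybrid write-back flow above (and the crash window the question points at) can be sketched roughly like this; `WriteBackCache` and the `persist_bulk` callback are illustrative names, not anything from the video:

```python
import threading

class WriteBackCache:
    """Toy write-back cache: writes go to memory first and are
    persisted to the backing store in bulk later. If the process
    crashes before a flush, the dirty entries are lost (the risk
    raised in the comment above)."""

    def __init__(self, persist_bulk):
        self._persist_bulk = persist_bulk  # callback that writes a dict to the DB
        self._cache = {}
        self._dirty = set()
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._cache[key] = value
            self._dirty.add(key)  # marked dirty; DB not yet updated

    def get(self, key):
        with self._lock:
            return self._cache.get(key)

    def flush(self):
        """Persist all dirty entries in one bulk write (step 2)."""
        with self._lock:
            batch = {k: self._cache[k] for k in self._dirty}
            self._dirty.clear()
        if batch:
            self._persist_bulk(batch)

db = {}  # stand-in for the database
cache = WriteBackCache(persist_bulk=db.update)
cache.put("user:1", "alice")
assert db == {}  # crash here and the write is gone
cache.flush()
assert db == {"user:1": "alice"}
```

In a real system the flush would run on a timer or a background worker, and the data-loss window is usually mitigated with replication or an append-only log.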
Nice video, thanks! For the hybrid mode, when S1 persists to the DB in bulk, S2 still has the old data, right? How do we update S2?
I have one doubt. The definition you gave for write-back should be for write-around. In write-around, we hit the DB first and then update the cache.
In write-back, we first update the cache and then wait some time to bulk-write to the DB.
Please let me know if my understanding is wrong.
always watching your videos. topic straight to the point. keep uploading man. thanks always.
Your explanation is awesome. Keep it up!
Thanks!
Very knowledgeable. Nicely explained
Thanks!
My boy looks very energized... keep it up!
😁
The cache isn't stored on an SSD, it's stored in memory, right? At 2:36 you mentioned a cache is stored on an SSD.
Depends on the kind of cache
You continue to offer great content. thank you !
Great video Gaurav!
Thanks code_report 😁
This video was gold. Studying for my Facebook on-site and I need to understand a bit more about how the backend works. Cheers @gaurav sen
Hi @Gaurav, could this also be an issue in the write-through policy: suppose someone performs an update transaction that updates the cache, but when it goes to the DB a check there fails and the transaction is rejected. Then we will have incorrect data in the cache. The same goes for the hybrid model: out of 10 transactions, say 2 fail while updating the DB; we will then be serving incorrect responses from the cache.
This isn't distributed caching; this is simply about caching & Redis...
@@deshkarabhishek This is indeed another example of "click bait": a person saying X but, like many others before him, explaining Y, where Y is the basics and X is the difficult part. The people who use this "click bait" trick are mostly from India. I'm not saying that all Indian people upload worthless info, some of them are really spectacular, but 100% of the worthless info is from India. With regards to Redis/caching, my guess is that RedisLabs acknowledged this "click bait" problem and uploads extremely good info. (And some of this info is actually made by some ultra-intelligent Indians, because when an Indian is intelligent, he/she is extremely intelligent)
@@larskrenning260 I think you gotta keep in mind that some of what you're seeing is because of the high population and because of the higher proportion of Indians pursuing engineering. :) So I'm not sure you get anything of value from that anecdotal observation.
@@deshkarabhishek Well, that's bad. It will be great if you could share a video with your production experience. May be Gaurav can also learn about 'DISTRIBUTED' cache from you.
@@namangarg3933 correct
@@larskrenning260 lol, are you a jealous pig? Because your comment sounds like a nazi who is not potty trained. This is YouTube, not a toilet; please behave and inform yourself before posting such stupid stuff.
Your comment makes me want to go and throw up.
100% crap. People like you make this world stink.
I agree this video was not his best, but you are all here learning from him.
Your comment shows how ignorant you are. I would delete it if I were you.
Title: What is Distributed Caching? Explained...
There is not a single 'D' in this 'Distributed' explanation. You are ONLY talking about the cache and its variations in implementation.
All in all, change the title to 'What is caching?'
@7:55
Awesome overview, thanks. One other possible issue with write-through: it's possible to make the update to the cache and then have the DB update itself fail. Now your cache and DB will be inconsistent.
True 😁
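To make that failure mode concrete, here is a toy sketch; the `FlakyDB` class and its failing write are hypothetical, just to show the inconsistency window when the cache is updated before the DB:

```python
class FlakyDB:
    """Hypothetical DB whose next write can fail,
    e.g. because a constraint check rejects the value."""
    def __init__(self):
        self.data = {}
        self.fail_next = False

    def write(self, key, value):
        if self.fail_next:
            self.fail_next = False
            raise RuntimeError("constraint violation")
        self.data[key] = value

def write_through(cache, db, key, value):
    # Naive ordering: cache first, then DB. If the DB write
    # fails, the cache now holds data the DB never accepted.
    cache[key] = value
    db.write(key, value)

cache, db = {}, FlakyDB()
db.fail_next = True
try:
    write_through(cache, db, "balance:7", 100)
except RuntimeError:
    pass
# Inconsistent: the cache says 100, the DB has nothing.
assert cache.get("balance:7") == 100
assert "balance:7" not in db.data
```

One common mitigation is to write the DB first and only then update (or invalidate) the cache, or to roll the cache entry back when the DB write raises.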
I have a question. The first point you mentioned is to reduce network calls, but as you said we need a separate system, so the minimization of network calls no longer holds, right?
So how beneficial is it to use Redis if we are still making I/O calls? Is it that a DB I/O call is more expensive than a Redis I/O call? I am a bit skeptical about this part.
This is my first video on your channel and I must say that you explain very well! You seem professional, knowledgeable, and you researched your topic well!
Great explanation for caching. I believe you'll go far.
Awesome explanation gaurav. You're cool man. We want a lottt more from you. We admire your ability to explain topics with great simplicity.
@Gaurav Sen - How can network calls be reduced with a distributed cache, where the cache itself is distributed? Why is a distributed cache faster than a database?
Thank you Gaurav, it was a really good explanation
Gaurav, what you are describing as a Write Back cache is actually called Write Around cache. What you describe as the hybrid mechanism, is actually called the Write Back cache. In both assumption is an asynchronous update unlike Write Through where update is synchronous. Might be worth taking this video offline and uploading a corrected version to avoid misleading folks prepping for interviews.
At 3:05, you mention that if we keep storing everything in the cache we might increase our search time. Isn't a cache just key-value entries, with search being an O(1) operation?
It is O(1), but we have limited main memory. Once we run out, we have to fall back on secondary storage, which means an I/O call.
Also, the O(1) assumes very few collisions for hash buckets. As the number of entries per bucket increases, the search time slows too (This scenario is unlikely, but good to know about).
@@gkcs I agree with your points, but that point doesn't come through clearly in the video. It conveys the impression that the cache itself slows down when it is filled with more data within the given memory limit. Hope I am making sense.
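To make the memory-limit point concrete, here is a minimal LRU cache sketch; the class name and the `capacity` knob (standing in for limited main memory) are illustrative, not from the video:

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache: lookups stay O(1), but once capacity is
    exceeded, the least recently used entry is evicted and a
    later read for it falls back to the database."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None  # miss: caller queries the database
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes most recently used
cache.put("c", 3)   # evicts "b"
assert cache.get("b") is None
assert cache.get("a") == 1
```

So each individual lookup stays constant-time; the slowdown the video alludes to is the extra DB round trips (and hash-bucket collisions) once the cache is overfull, not the hash lookup itself.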
Awesome explanation! Thanks
Thank you!
Excellent explanation
Do you implement caching on most systems? It adds complexity; how do you determine whether it is worth the additional development effort?
Love the videos by the way. These are a great learning tool, you do a great job.
Great video, thank you!
One approach I use for consistency is lazy updates. On a DB write, instead of pushing the data back to the caches (which may never get read if a second update comes in), the DB writes the ID to invalidate to a message queue that all caches subscribe to. Then you can implement query-then-cache-on-miss semantics. This way, load throughout the system is reduced, with some double queries occurring if the cache was cleared after a good query due to latency. This can be eliminated with versioning: use the current timestamp in milliseconds at write time and broadcast it, so that a cache only clears an entry if the cached version number differs from the broadcast version number.
Useful :)
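The versioning trick at the end of that comment can be sketched roughly as follows; `VersionedCache` is a hypothetical class, and the message-queue delivery is replaced here by a direct method call:

```python
class VersionedCache:
    """Each entry stores (value, version). An invalidation
    message carries the version of the DB write; the cache drops
    its copy only if its cached version is older, avoiding
    needless clears caused by out-of-order or late messages."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def put(self, key, value, version):
        self._data[key] = (value, version)

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None

    def on_invalidate(self, key, version):
        """Called when an invalidation message arrives from the queue."""
        entry = self._data.get(key)
        if entry and entry[1] < version:
            del self._data[key]  # stale: next read is a cache miss

cache = VersionedCache()
cache.put("user:1", "alice", version=100)
cache.on_invalidate("user:1", version=100)  # same version: keep entry
assert cache.get("user:1") == "alice"
cache.on_invalidate("user:1", version=101)  # newer DB write: drop entry
assert cache.get("user:1") is None
```

In the scheme described, the version would be the write timestamp in milliseconds, broadcast together with the ID on the queue.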
Is there something like these worker servers having a service dedicated to subscribing to publications of database updates? Would that not keep the caches on all servers updated?
I didn't understand your question. Please take an example.
@@gkcs Suppose we have n servers and there is a service running on each of them which subscribes to the database server. The database server also has a service which publishes updates when data gets updated inside the DB. I'm basically trying to use a publish/subscribe model (MQTT/RabbitMQ, ...) to keep the caches on the servers updated.
Great video, very helpful for learning English too
Good video. Thank you. From Canada.
I think you mixed up write-back with write-around cache. Write-back is when you just update the cache and the database gets updated at a later point in time. Write-around is when the DB gets updated first and then the cache gets notified asynchronously about that update.
As we add more data to a cache, why would search time increase? Since we most likely are using key-value pairs, wouldn't retrieval always be O(1)?
Great explanation
Is it wise to use Redis pub/sub to invalidate a cache? Like, each microservice publishes an event in Redis, and the subscribers can then remove or update cache entries based on that.
We do need a pub sub mechanism for this. The pinned comment talks about it :)
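Roughly, that idea can be sketched with an in-memory bus standing in for the Redis pub/sub channel (`Bus` and `ServerCache` are illustrative names; with real Redis you would use `PUBLISH`/`SUBSCRIBE` on a channel instead of direct callbacks):

```python
class Bus:
    """In-memory stand-in for a Redis pub/sub channel."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key):
        for cb in self._subscribers:
            cb(key)

class ServerCache:
    """Per-server in-memory cache that drops an entry whenever
    another service publishes an invalidation for its key."""
    def __init__(self, bus):
        self._data = {}
        bus.subscribe(self._on_invalidate)

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def _on_invalidate(self, key):
        self._data.pop(key, None)

bus = Bus()
s1, s2 = ServerCache(bus), ServerCache(bus)
s1.put("price:42", 10)
s2.put("price:42", 10)
bus.publish("price:42")  # e.g. a microservice just updated the DB row
assert s1.get("price:42") is None
assert s2.get("price:42") is None
```

Note that Redis pub/sub is fire-and-forget: a subscriber that is down misses the message, so real deployments often pair this with TTLs on cache entries as a safety net.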
13:00 Can you please explain why financial data should use write-back and not write-through? I thought you want high consistency there; it's not like a social network where consistency doesn't matter. Write-through has higher consistency than write-back, does it not?
A label/comment in the video about the swapped usage of write-back and write-through would help future viewers. I never saw the pinned comment until recently. This could have backfired in an interview.
Does the global cache also run as an in-memory data store, but deployed in a different cluster (other than the app servers)?