This is a GOLDMINE. Amazed at how the system is laid out. Respect to all the engineers involved.
awesome design and presentation. very helpful. Thx for sharing.
This is awesome! Thanks for sharing. Great talk.
How does the leader cache help reduce a hotspot?
I believe the partitioning of the follower cache helps with the hotspot issue because you are basically dividing the user groups that are supposed to hit the same hotspot into different cache partitions. The leader cache is more for the thundering herd issue. Imagine us-west-2 goes offline for 5 minutes and then comes back online: if there is no leader cache, all of the west coast users will hit multiple follower caches, generating a huge wave of duplicate requests to the TAO DBs. A leader cache with a shorter data eviction time adds another layer that dedupes those duplicate requests, protecting the TAO DBs. Happy to discuss more.
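Roughly, the dedupe works like request coalescing: all concurrent misses for the same key wait on one in-flight fetch instead of each hitting the DB. A minimal sketch of that idea (hypothetical names, not TAO's actual code):

import threading

class LeaderCache:
    def __init__(self, db_fetch):
        self.db_fetch = db_fetch    # function that reads from the backing DB
        self.cache = {}             # key -> cached value
        self.inflight = {}          # key -> Event for a fetch already in progress
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:            # warm hit: the DB never sees it
                return self.cache[key]
            event = self.inflight.get(key)
            if event is None:                # first miss: this caller does the fetch
                event = threading.Event()
                self.inflight[key] = event
                leader = True
            else:                            # duplicate miss: wait for the first fetch
                leader = False
        if leader:
            value = self.db_fetch(key)       # exactly one DB read per key
            with self.lock:
                self.cache[key] = value
                del self.inflight[key]
            event.set()
            return value
        event.wait()
        with self.lock:
            return self.cache[key]

The leader's shorter eviction time keeps this extra layer small while still absorbing the duplicate wave right after a region comes back online.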
How does the write-through cache and/or async replication help with 'see your own writes / write timeliness'?
My understanding is that, in order to see your own writes (given replication/sync lag), requests from the same writer have to be served by the same web server -> cache -> DB chain.
It helps because reads are served immediately from the cache. Also, the web servers that write to the follower cache are in the same zone/ecosystem.
Due to load-balancer-level sticky sessions, the same writer client will use the same follower cache. A write request returns as soon as the client region's cache is populated, so the writer sees their own write pretty much immediately.
But I also do not fully understand the async replication. If client request -> local follower cache update -> local leader cache update -> local DB write is completed, then there is no need to wait for local leader cache update -> master region leader cache update -> master region DB update -> async local region DB replication to happen. Is the DB async replication only done for consistency purposes? If so, can we just use CRDTs?
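To make the read-your-own-writes part concrete, here is a minimal sketch of a write-through follower cache under the assumptions above (sticky routing to one follower, synchronous write to the master region, async replication back to the local DB); the class and variable names are made up for illustration and are not TAO's actual API:

import queue

replication_queue = queue.Queue()      # stands in for async DB replication back to the local region

class FollowerCache:
    def __init__(self, master_db):
        self.cache = {}
        self.master_db = master_db     # master-region database (assumed to be a dict here)

    def write(self, key, value):
        self.master_db[key] = value            # synchronous write to the master region
        self.cache[key] = value                # write-through: local cache updated before returning
        replication_queue.put((key, value))    # local-region DB catches up later, off the write path
        return "ok"

    def read(self, key):
        return self.cache.get(key)             # read-your-own-write served from the local cache

master_db = {}
follower = FollowerCache(master_db)
follower.write("user:42:name", "Ada")
assert follower.read("user:42:name") == "Ada"  # visible immediately despite replication lag

The point is just that the local cache is populated before the write call returns, so the DB-side replication lag never shows up to the sticky-routed writer.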
Simple and Elegant
Well this is a dense talk...
Meep