Jordan has no life
  • Videos: 221
  • Views: 2,284,448
Apache Arrow - A Game Changer? | Distributed Systems Deep Dives With Ex-Google SWE
I used many sources but I found these the most useful:
ruclips.net/video/R4BIXbfKBtk/видео.html&ab_channel=Dremio
ruclips.net/video/OLsXlKb_XRQ/видео.html&ab_channel=Databricks
arrow.apache.org/faq/
arrow.apache.org/docs/python/memory.html#on-disk-and-memory-mapped-files
ursalabs.org/blog/2020-feather-v2/
I wish the ladies were as capable of processing my data as two arrow native servers communicating with one another
Views: 955

Videos

Snowflake - Power With No Tuning | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2K · 1 day ago
event.cwi.nl/lsde/papers/p215-dageville-snowflake.pdf justinjaffray.com/query-engines-push-vs.-pull/ Snowflakes for my paper, snowflake is my personality, snowflakes at the DJ show, they're everywhere really
Spark - Fault Tolerance Made Easy | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2.6K · 14 days ago
people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf The last time I felt any sort of spark in my life was the one I felt eating taco bell at 2am on a sunday
ZooKeeper - Better Than Chubby? | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2.8K · 21 days ago
www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf I have a slight preference towards chubby since I'm chubby myself
Kafka - Perfect For Logs? | Distributed Systems Deep Dives With Ex-Google SWE
Views: 5K · 28 days ago
notes.stephenholiday.com/Kafka.pdf There's no way Kafka is achieving better log throughput than my toilet though
Mesa - Data Warehousing Done Right | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2.1K · 1 month ago
static.googleusercontent.com/media/research.google.com/en//pubs/archive/42851.pdf I'm also shooting out a bunch of data every 5 minutes or so, however unlike mesa no one seems interested in it
Photon - Exactly Once Stream Processing | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2.6K · 1 month ago
static.googleusercontent.com/media/research.google.com/en//pubs/archive/41318.pdf When I do stream processing I tend to also only do it on that particular event exactly once, as I'm afraid to two phase commit
Spanner - The Perfect Database? | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2.9K · 1 month ago
storage.googleapis.com/gweb-research2023-media/pubtools/1974.pdf Spanner waits for the right time, but I won't, interpret that how you will
Systems Design in an Hour
Views: 26K · 1 month ago
Hi all, the slides are in the channel description (see google drive link). I will add timestamps shortly. Thank you all. 2:01 Performing A Systems Design Interview 4:02 Database Fundamentals 14:26 Data Serialization Frameworks 15:47 Replication 27:47 Sharding 37:47 Batch Processing 39:02 Stream Processing 44:12 Other Types Of Storage 53:29 Caching 55:39 Load Balancing 57:34 Systems Design Inter...
Megastore - Paxos ... But Better? | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2K · 2 months ago
www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf My ex used to call me her mega store I don't get it
Percolator - Two Phase Commit In Practice | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2.6K · 2 months ago
www.usenix.org/legacy/event/osdi10/tech/full_papers/Peng.pdf Gonna pop a percolator after this so I can relax
Dremel - Columns Are Better | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2.1K · 2 months ago
Paper: static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf The ladies are always curious about repetitions when it comes to my column. Obligatory: before you ask me about the protocol buffers stuff, read the paper lol. It may help disambiguate things.
Google SSO - Strong Consistency in Practice | Distributed Systems Deep Dives With Ex-Google SWE
Views: 3.7K · 2 months ago
This video is sponsored by Brilliant. To try everything Brilliant has to offer free for a full 30 days, visit brilliant.org/Jordanhasnolife/ . You'll also get 20% off an annual premium subscription. Paper link: www.usenix.org/legacy/event/worlds06/tech/prelim_papers/perl/perl.pdf The only thing single copy semantics, single sign on, and Jordan have in common is that we're all single.
BigTable - One Database to Rule Them All? | Distributed Systems Deep Dives With Ex-Google SWE
Views: 2.7K · 2 months ago
storage.googleapis.com/gweb-research2023-media/pubtools/4443.pdf I've only got one column in my column family but recently a lot of ladies have been requesting read access
Google File System (GFS) - It's Ok To Fail | Distributed Systems Deep Dives With Ex-Google SWE
Views: 5K · 3 months ago
static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf They used to call me the shadow master in my gym locker room
Chubby - Eventual Consistency Is Too Hard... | Distributed Systems Deep Dives With Ex-Google SWE
Views: 4.2K · 3 months ago
MapReduce - Google Thinks You're Bad At Coding | Distributed Systems Deep Dives With Ex-Google SWE
Views: 6K · 3 months ago
Dynamo - Why Amazon Ditched SQL | Distributed Systems Deep Dives With Ex-Google SWE
Views: 15K · 3 months ago
time for a change
Views: 15K · 4 months ago
31: Distributed Priority Queue | Systems Design Interview Questions With Ex-Google SWE
Views: 8K · 4 months ago
30: LinkedIn Mutual Connection Search | Systems Design Interview Questions With Ex-Google SWE
Views: 5K · 4 months ago
29: Amazon Payment Gateway | Systems Design Interview Questions With Ex-Google SWE
Views: 13K · 4 months ago
28: Bidding Platform (eBay) | Systems Design Interview Questions With Ex-Google SWE
Views: 8K · 5 months ago
27: High Throughput Stock Exchange | Systems Design Interview Questions With Ex-Google SWE
Views: 11K · 5 months ago
26: Robinhood Stock Trading Platform | Systems Design Interview Questions With Ex-Google SWE
Views: 8K · 5 months ago
25: Live Streaming (Twitch) | Systems Design Interview Questions With Ex-Google SWE
Views: 7K · 5 months ago
24: Video/Conference Calling (Zoom) | Systems Design Interview Questions With Ex-Google SWE
Views: 7K · 5 months ago
23: Multiplayer Battle Royale Video Game | Systems Design Interview Questions With Ex-Google SWE
Views: 5K · 6 months ago
Insights From an L7 Meta Manager: Interviews, Onboarding, and Building Trust
Views: 52K · 6 months ago
22: Recommendation Engine (YouTube, TikTok) | Systems Design Interview Questions With Ex-Google SWE
Views: 11K · 6 months ago

Comments

  • @costathoughts
    @costathoughts 18 hours ago

    I was recently thinking about Apache Fury vs. protobuf

  • @anjanobalesh8046
    @anjanobalesh8046 18 hours ago

    @9:30 you said if we want to insert at x and x is occupied we do x+1. What if x+1 is out of range? Will we move to the next partition?
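For reference, the linear-probing behavior the question is about can be sketched as a toy open-addressing table. This is an illustration only, and it assumes the probe sequence simply wraps around modulo the table size rather than hopping to another partition (the partition question is a separate, system-level choice):

```python
class LinearProbingTable:
    """Toy open-addressing hash table with linear probing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot holds (key, value) or None

    def put(self, key, value):
        idx = hash(key) % self.capacity
        for step in range(self.capacity):
            # Probe x, x+1, x+2, ... wrapping back to slot 0 at the end
            probe = (idx + step) % self.capacity
            if self.slots[probe] is None or self.slots[probe][0] == key:
                self.slots[probe] = (key, value)
                return probe
        raise RuntimeError("table full")

    def get(self, key):
        idx = hash(key) % self.capacity
        for step in range(self.capacity):
            probe = (idx + step) % self.capacity
            if self.slots[probe] is None:
                return None  # hit an empty slot: key was never inserted
            if self.slots[probe][0] == key:
                return self.slots[probe][1]
        return None
```

So within one table the probe never goes "out of range"; it wraps. Whether a completely full table spills to another partition is a design decision outside the probing scheme itself.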

  • @skullTT
    @skullTT 20 hours ago

    for hot assets like AAPL, GOOG, do they have dedicated pricing servers? For example, one layer-1 AAPL pricing server connects directly to the publisher, multiple layer-2 AAPL pricing servers connect to the layer-1 AAPL pricing server, and a user server connects to any one of the layer-2 pricing servers

  • @tal32123
    @tal32123 21 hours ago

    Right wrist and no tissues... Due to apache arrow?

  • @thanhn2001
    @thanhn2001 22 hours ago

    Thanks for suffering through your sore throat to deliver this great lesson to us

  • @ankur20010
    @ankur20010 23 hours ago

    how easy is it to register a spark consumer with zookeeper? Because spark consumers are configured and managed by the spark driver program. Does the spark controller provide this info via any API? And even if it does, since the number of executors is dynamic, how will flink manage the merge sort of the list?

  • @ankur20010
    @ankur20010 1 day ago

    I think that you can't control which HDFS node to write the top K list. The distribution is handled by HDFS and all that is visible to you is a directory structure. Same directory may be distributed across different data nodes.

  • @mayankchhabra3070
    @mayankchhabra3070 1 day ago

    Hi Jordan, thanks for the amazing video! I had a few questions about the posts DB. 1) As we are using Cassandra here, which is a leaderless replication DB: let's say a user uploads a post and immediately wants to update it. Because we are using leaderless replication, it's possible that the user may or may not be able to read their own writes here. Does it make sense to have some kind of write-back cache, which can provide read-your-writes consistency so that a user can update a very recently uploaded post, and then flush these writes/updates to Cassandra (assuming most users only update a new post within 5-10 minutes of the upload)? 2) Following up on the first question: if we have a write-back cache (assuming a distributed cache) and the cache goes down, the posts would get dropped as they were not committed to the DB. In this case, would a write-ahead log with the cache help us make this more fault tolerant?

  • @dmytro.soltysiuk
    @dmytro.soltysiuk 1 day ago

    I know it’s 1 year old and probably you know this already, but DBs like Postgres use MVCC, so you don’t need to lock rows that you update in your transaction.

  • @ihor4256
    @ihor4256 1 day ago

    correct me if I'm wrong, but we don't need serializable transactions for booking one seat itself (I am not talking about claiming etc.) if we have a lock. Let's suppose that you acquire a distributed lock when you try to create/update the booking. Even if you decide to have a separate payment entity, most probably you don't need to update the balance; creating it with a reference to the booking id is enough. You are going to use a 3rd-party vendor to send the data, so if the data is successfully transferred you just commit the transaction. Atomicity is provided by default. There are no 2 users that can hold the lock at the same time (provided by the consensus algorithm), and you just release the lock when the transaction is committed.

    • @ihor4256
      @ihor4256 1 day ago

      I was wrong in my previous message. Technically, 2 booking requests can occur at the same time when the TTL expires (while you are already booking) and another user decides to acquire the lock (reserved) and then immediately starts booking. This can be fixed using either serializable transactions OR, a better way, constructing the booking id as ticket_id + concert_time + seat_number. Thus, your entire insert transaction would fail if one of those constraints is violated. For updates, you could just use optimistic locking. Am I right?
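The composite-key plus optimistic-locking idea from this thread can be sketched with an in-memory stand-in for a table that has a unique constraint on (ticket_id, concert_time, seat_number) and a version column. All class and field names here are made up for illustration:

```python
class BookingStore:
    """In-memory stand-in for a bookings table with a unique constraint
    on (ticket_id, concert_time, seat_number) plus a version column."""

    def __init__(self):
        self.rows = {}  # composite key -> {"owner": str, "version": int}

    def insert(self, ticket_id, concert_time, seat_number, owner):
        key = (ticket_id, concert_time, seat_number)
        if key in self.rows:
            return False  # "unique-constraint violation": seat already booked
        self.rows[key] = {"owner": owner, "version": 1}
        return True

    def update(self, ticket_id, concert_time, seat_number, owner, expected_version):
        """Optimistic locking: the write succeeds only if nobody else
        bumped the version since we read the row."""
        key = (ticket_id, concert_time, seat_number)
        row = self.rows.get(key)
        if row is None or row["version"] != expected_version:
            return False  # stale read; caller must re-read and retry
        row["owner"] = owner
        row["version"] += 1
        return True
```

With a real database the duplicate insert would raise a constraint error inside the transaction, which is what makes the second concurrent booking fail even without serializable isolation.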

  • @theyruinedyoutubeagain
    @theyruinedyoutubeagain 1 day ago

    At first I thought you rated Meghan _Trainor_ a 10 and I almost threw up

  • @theyruinedyoutubeagain
    @theyruinedyoutubeagain 1 day ago

    Arrow Flight is a bit like capnproto or flatbuffers

  • @baluuspremium-dp3kx
    @baluuspremium-dp3kx 1 day ago

    Hey Jordan! I love your simplified explanation of Advanced concepts. Kudos to your oratory skills!!!! It would be great if you can make a video on Elastic block storage(AWS) or Persistent Disk(GCP). Much Appreciated!!!!!


  • @TheSakox
    @TheSakox 1 day ago

    Dude I loooove arrow and what it has to offer for future of open source data engineering, so glad you also did a video on it!

  • @cunningham.s_law
    @cunningham.s_law 1 day ago

    lets goo

  • @jhguygih
    @jhguygih 1 day ago

    Can we partition on a single node? Like creating partitions in MySQL. What's the benefit, just going directly to the hard drive section where the data is? Because I assume the DB files are already size-limited, and with the B-tree we know where they are

  • @devashishdalvi7527
    @devashishdalvi7527 1 day ago

    0:40 weird correlation indeed 😂

  • @fma654321
    @fma654321 1 day ago

    Hey Jordan! This is super helpful. Thanks so much! Quick question about ensuring jobs only run once - the retry logic via the scheduling node a la timestamps is intentional right? As in, we intend to run the same job more than once so was a bit confused by that concern at around 23:05. Even if an executor goes down, we want to retry that job right? I guess I was bit confused why "running jobs once" is even a concern as that is expected behavior.

  • @Spyrie
    @Spyrie 2 days ago

    Didn't know Kylo Ren did tech stuff

  • @atharvakamble5785
    @atharvakamble5785 2 days ago

    video so good, liked it in the first minute

  • @vetiarvind
    @vetiarvind 2 days ago

    Hey Jordan this is awesome content and you're a great teacher. Listening to it in the gym from India and I can follow everything.

  • @rashminpatel3716
    @rashminpatel3716 2 days ago

    Hey Jordan, thanks for the amazing system design video as usual!! I have one doubt about the usage of Flink. Whenever at least one flink compute node goes down or restarts, the flink job fails and it has to restore the entire state across all the nodes from S3. So, this whole restoration process can take a few minutes, and our message delivery will be delayed for that many minutes, affecting the entire user base. Is that understanding correct?

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      I don't think this would require restoring all nodes from S3. I would just expect no messages to be processed from that partition during that point in time.

    • @rashminpatel3716
      @rashminpatel3716 21 hours ago

      @@jordanhasnolife5163 Yes, it would be great that way, but flink relies on distributed consistent snapshots across nodes. So checkpointing and restoration operate at the job level; they cannot be split at the node level.

  • @shashankpratapwar-wj7xl
    @shashankpratapwar-wj7xl 2 days ago

    I had a hard time reading Alex Petrov's Database Internals book, so I quit after a few days. But this is informative and engaging at the same time. Looking forward to the entire series.

  • @cdgtopnp
    @cdgtopnp 3 days ago

    Chapters please 🙏

  • @VibhorKumar-uo9dd
    @VibhorKumar-uo9dd 3 days ago

    One question regarding the fan-out approach. While pushing posts to each follower, we push them to a particular news feed cache corresponding to that user. My doubt is whether these news feed caches are just another caching layer sharded on user id (let's say 10 caching servers sharded on user id for 100 users), or whether they are specific to the user (100 users, 100 caches in that case)?

  • @yuxuanche2552
    @yuxuanche2552 3 days ago

    Hi Jordan, just wondering if the mutual connection databases in 14:14 are the same as the mutual cache table?

  • @jaeorgjaehpaejh
    @jaeorgjaehpaejh 4 days ago

    Jordan sat alone in his dimly lit room, eyes fixed on the screen. Kafka Streams flowed before him like a slow, seductive dance. His fingers moved over the keyboard, sending commands that made the data bend to his will-smooth, precise, and totally in his control. “Processing this much data feels like... processing my love life,” he chuckled, leaning in closer. “A little messy at first, but once I get my hands on it, everything falls into place... perfectly.” The logs rolled in real time, a rhythm that matched his heartbeat. Who needed real life when the streams responded to him this way? Controlled. Obedient. Alive. Jordan didn’t just process data-he made it swoon.

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      Real or ChatGPT? Either way, well done, I do think about kafka streams a lot

  • @DimitarVandev
    @DimitarVandev 4 days ago

    cool

  • @vinaybabu2635
    @vinaybabu2635 4 days ago

    Hey Jordan, do you have any course on Udemy? If not, I highly recommend you do that

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      I do not! Everything that I post I want to be free! If I'm going to sell something, hopefully it can provide some real utility to you guys :)

  • @srishti2k22-iw5dh
    @srishti2k22-iw5dh 4 days ago

    great

  • @rish.i
    @rish.i 4 days ago

    Thank you Jordan for this series. In the paper, when comparing the 3 partitioning approaches, it's written that for the fixed-size 3rd strategy the membership information stored at each node is reduced by three orders of magnitude. Whereas in the previous paragraphs it's mentioned that the 3rd strategy stores not only the token (server) hashes, as in the 1st, but also which partition is stored on each node. Isn't that contradictory, or am I misunderstanding something? Ideally the third partitioning scheme should contain more membership information per node, assuming they have not changed request forwarding from O(1) hops to log(n) DHT routing like Chord or Pastry, where each node stores only a limited number of nodes' information, sacrificing direct hops.

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      I'm actually not quite sure, but I imagine that for fixed tokens perhaps you can just give each token a name and say which token belongs to which node, rather than explicitly listing out the ranges, which saves a bit of information to propagate (maybe if there's 128 tokens for example, you only need a short to communicate a token name)

  • @ankitgomkale11
    @ankitgomkale11 4 days ago

    Hey Jordan, I just wanted to take a moment to express my deep gratitude. I recently received 5 offers from tier-1 companies, including an offer from Google for a Staff Engineer role. Your videos have been an absolute game-changer for me throughout this journey. I can't thank you enough for the insights and guidance you've shared-it's made a world of difference. Please keep up the amazing work, you're truly making an impact!

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      Hey Ankit! That's unbelievable! Congratulations and keep killing it!! I'm really glad all your hard work paid off :)

  • @skullTT
    @skullTT 4 days ago

    some other videos talked about recommendation engines from the ML aspect: content filtering, collaborative filtering. What is the relationship between them and this embedding approach?

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      I don't know off the top of my head, but I think you'll always need embeddings at this point if you're doing ML

  • @skullTT
    @skullTT 4 days ago

    besides the vector DB, which databases should we choose for the other data, including the entity history DB and the neighbor index? Can we use Cassandra for the history DB because it is write-heavy and append-only, and a KV store for the neighbor index?

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      what do you mean by "entity history"?

    • @skullTT
      @skullTT 1 day ago

      @@jordanhasnolife5163 it is in the slide "Step 1a)" at 13'

  • @mostinho7
    @mostinho7 4 days ago

    11:00 Good summary. Can use a hash index when you want fast reads and writes, but you can't do efficient range queries. The hash index is also kept in memory, not on disk
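That trade-off can be shown with a minimal sketch: a plain in-memory hash map gives O(1) point reads and writes, but a range query has to scan every key, since no key ordering is maintained (unlike a B-tree or an LSM tree with sorted segments):

```python
class HashIndex:
    """Minimal in-memory hash index: O(1) point reads/writes,
    but range queries must scan all keys."""

    def __init__(self):
        self.index = {}

    def put(self, key, value):
        self.index[key] = value  # O(1) amortized

    def get(self, key):
        return self.index.get(key)  # O(1)

    def range(self, lo, hi):
        # No ordering is kept, so this is O(n) over every key in the
        # index, however narrow the [lo, hi] range is.
        return sorted((k, v) for k, v in self.index.items() if lo <= k <= hi)
```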

  • @chaitanyatanwar8151
    @chaitanyatanwar8151 5 days ago

    Thank You!

  • @chaitanyatanwar8151
    @chaitanyatanwar8151 5 days ago

    Thank you! The videos and the discussions in Comments make this channel the best source for system design.

  • @XoPlanetI
    @XoPlanetI 5 days ago

    14:02 isn't it the same UUID and a different timestamp for replacing the message?

  • @RS7-123
    @RS7-123 5 days ago

    another comment on this awesome video after rewatching it a bunch of times. So to conclude, how did you say we achieve idempotency, since there seems to be no best option?

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      I'd probably just do it on the server and if we send a duplicate notification such is life

  • @chaitanyatanwar8151
    @chaitanyatanwar8151 5 days ago

    Thank you!

  • @abcxyz9637
    @abcxyz9637 5 days ago

    Assigning N seats in a section to a group of k people, each requesting {s_1, ..., s_k} seats, is a bin-packing problem. Simply allocating seats in FIFO order may lead to unfairness and sub-optimal allocation (people at the end of the list requesting a higher number of seats will be most impacted, and we want to sell as many tickets as possible). Although it's not practically possible to solve bin-packing in real time, a simple optimization would be to maintain the sum of total seats requested by people for every section. In real time, while the bookings are ongoing, that sum must not exceed N. The actual seat allocation may be done offline, e.g., seats can be finalized after the booking closes, and the users can be sent their tickets via mail/phone [AWS SNS] with actual seat numbers.

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      That's totally fine by me if it's allowed by your interviewer to compute exact seats after the fact!
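The running-sum guard described in this thread might look like the following simplified sketch (class and field names are hypothetical; the exact seat assignment happens offline later):

```python
class SectionCounter:
    """Tracks total seats requested per section so the running sum never
    exceeds the section's capacity N; exact seat numbers are assigned
    offline after booking closes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0
        self.requests = []  # (group_id, num_seats), to be packed offline

    def try_reserve(self, group_id, num_seats):
        if self.reserved + num_seats > self.capacity:
            return False  # would oversell the section
        self.reserved += num_seats
        self.requests.append((group_id, num_seats))
        return True
```

In a real system this counter would have to be updated atomically (a single-row transaction or an atomic counter), since the whole point is that concurrent bookings must not race past N together.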

  • @guitarMartial
    @guitarMartial 5 days ago

    Jordan would HBase be a better alternative to MySQL here? You get write ahead log style indexing which can then be leveraged to build our heap as well and expire elements as they are consumed.

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      I think you'd have to hack at HBase a bit, especially considering our heap is probably bigger than HBase's memtable. Could maybe be an optimization though!

  • @chaitanyatanwar8151
    @chaitanyatanwar8151 6 days ago

    Thank you!

  • @mayankchhabra3070
    @mayankchhabra3070 6 days ago

    If we take the example of building search on top of chats and we partition on chat_id, won't that lead to an uneven distribution of data? Elasticsearch has shards and tries to distribute data evenly across all of them, but if we explicitly route our data to a specific shard (using chat_id in our example) it can lead to uneven distribution across shards, where one chat might be active and another dormant. Just thinking out loud how we would solve this :P (Probably distribute it evenly using some composite key, but that would defeat the purpose of searching chats in just one partition)

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      Using many small partitions and balancing them appropriately I believe tends to be the preferred approach here

  • @hazemabdelalim5432
    @hazemabdelalim5432 6 days ago

    Why not hash on the user id and just store the session state in a separate storage like redis? Sticky sessions will not guarantee even distribution of traffic, and they might be a bottleneck for a very high amount of traffic

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 days ago

      Yeah at the end of the day this is fine too, if you have to deliver a user notification to two servers it's not a big deal

  • @rashminpatel3716
    @rashminpatel3716 6 days ago

    All those channels are more redundant than replicated database !! True that 😂

  • @Jayvil773
    @Jayvil773 6 days ago

    Thank you for the series, finally finished it! You're a literal chad, I can only hope to grow up to be like you one day.

  • @adishgangwal7105
    @adishgangwal7105 7 days ago

    Hi Jordan - In the order service, the second Flink job, which is sharded by product ID: how does it know that it has received all the products of the cart before sending the email to the user?

    • @jordanhasnolife5163
      @jordanhasnolife5163 6 days ago

      I hadn't really made that my intention, and in the diagram we're willing to send multiple emails. If we wanted to do this, we'd probably have to split the products like we do here, include the original order id and the number of products in each message, and then send to one final kafka queue + consumer to aggregate on order id
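The final aggregation step described in the reply could be sketched like this, assuming each per-product message carries the order id and the total number of products in that order (the class and callback names are hypothetical):

```python
from collections import defaultdict

class OrderAggregator:
    """Final consumer that collects per-product messages keyed by order id
    and fires the email callback once, when every product for the order
    has arrived."""

    def __init__(self, send_email):
        self.send_email = send_email
        self.seen = defaultdict(set)  # order_id -> set of product ids seen

    def on_message(self, order_id, product_id, total_products):
        # Using a set makes duplicate deliveries of the same product
        # message harmless: they don't inflate the count.
        self.seen[order_id].add(product_id)
        if len(self.seen[order_id]) == total_products:
            self.send_email(order_id)
            del self.seen[order_id]  # done with this order
```

In practice this consumer would be partitioned by order id on the final kafka topic, so all messages for one order land on the same instance.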

  • @RS7-123
    @RS7-123 7 days ago

    great video, thanks for the incredible contribution. 2 questions: 1) how do you notify users who aren't online about unpopular posts? I would assume you need some sort of user-specific queue after you fan out. The notifications table is sharded by topic id, so it possibly won't be queried directly for unpopular posts. 2) how do you notify users who are online about popular posts, since I don't see it connected to the web socket flow? I assume you expect them to poll periodically to see if they have any popular posts?

    • @jordanhasnolife5163
      @jordanhasnolife5163 6 days ago

      1) when they come back online, they'll basically hit a cache of all notifications meant for them specifically (cache is partitioned by hash of user id), and combine this with popular notifications they were subscribed to 2) yep, just polling!