Amazing content as always, dude. Love how in-depth you go in all of your videos! My favorite channel by far! I've recommended this to several friends.
Thanks Snehil!
As always, great videos. 18:00 - I forgot all that stuff about ES's caching, so thanks for the reminder; gonna reread that part in the ES docs. Great job knowing about Lucene - most of my applicants have no clue about ES, and definitely not that Lucene is not a DB but a search engine (hate the JSON syntax, but what can you do). Again, super fun to listen to your vids and watch this content.
Thanks Mickey!
Will the search service pull the actual documents from the DB once it receives the document IDs from the cache/search index?
Yep!
Dude, you and NeetCode are my go-tos. Great content and clear explanations.
I appreciate it!!
Sshhh Man's been hiding the gun show this whole time. Giga Chad on the low
Gotta do it to compensate for my minuscule peen
If we use the local index (meaning each node stores term -> [doc IDs] and multiple nodes can reference the same term), does this mean we need to query all the nodes to answer a search query? How do we know which nodes have the term we are interested in if we are not partitioning by term?
Yes, you have to query them all and aggregate. It's unfortunate, but there's typically too much data to shard by term as opposed to document.
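To make the scatter-gather concrete, here's a minimal sketch (all names hypothetical): each partition computes its local top-k for a term, and a coordinator merges the partial results into the global top-k.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

class Partition:
    """Hypothetical partition holding a local inverted index over
    its own subset of documents."""
    def __init__(self, inverted_index):
        # term -> list of (score, doc_id), sorted by score descending
        self.inverted_index = inverted_index

    def local_top_k(self, term, k):
        return self.inverted_index.get(term, [])[:k]

def search(partitions, term, k):
    # Scatter: ask every partition for its local top-k in parallel.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda p: p.local_top_k(term, k), partitions)
    # Gather: merge the partial results and keep the global top-k.
    return heapq.nlargest(k, (hit for part in partials for hit in part))

partitions = [
    Partition({"gigachad": [(0.9, "doc1"), (0.4, "doc7")]}),
    Partition({"gigachad": [(0.7, "doc3")]}),
    Partition({}),  # some partitions have no docs for the term
]
print(search(partitions, "gigachad", 2))  # [(0.9, 'doc1'), (0.7, 'doc3')]
```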
@@jordanhasnolife5163 could we first partition by term, then further partition into multiple shards if a single term has too much data?
16:00 - we can also use Debezium (for certain databases); it would write to Kafka, and we'd listen on that topic.
I'll have to look into this! Haven't had the privilege of using Kafka during my career, so I haven't heard of Debezium.
@@jordanhasnolife5163 it’s pretty cool, I used it at my last company. We used debezium to capture changes from the database using the WAL, it would then write to a Kafka topic and we can read off it.
The one downside here is that it writes all the messages into a single topic, and a single partition to ensure ordering. So the approach you mentioned of writing directly to Kafka will allow us to write to multiple partitions if needed (allowing more parallelization)
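For anyone curious, a rough sketch of what consuming those change events looks like, assuming the standard Debezium envelope ("op"/"before"/"after" fields) and the kafka-python client; the topic name and the index_document/delete_document helpers are made up for illustration.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Debezium publishes one change event per row mutation; the envelope's
# "op" field is c (create), u (update), d (delete), or r (snapshot read).
consumer = KafkaConsumer(
    "dbserver1.public.tweets",  # illustrative Debezium topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw) if raw else None,
)

for message in consumer:
    if message.value is None:  # tombstone event after a delete
        continue
    event = message.value["payload"]
    if event["op"] in ("c", "u", "r"):
        index_document(event["after"])    # hypothetical: upsert into the search index
    elif event["op"] == "d":
        delete_document(event["before"])  # hypothetical: remove from the search index
```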
16:00 - exactly, I was thinking the same thing. Typically you write to the source of truth and use a queue to send it out to the various locations.
Today you assume the queue is the source of truth, then you spill into S3.
Don't we need a parser/lexer service between Kafka and the search index that parses the tweets and hashes them to the correct partitions of the search index?
Something like Elasticsearch will do this for us, which is why I don't explicitly include it.
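Roughly, the analysis step Elasticsearch runs on ingest looks like this (a toy sketch, not ES's actual analyzer): tokenize, normalize, drop stop words, then add the doc to an inverted index.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "is", "to"}

def analyze(text):
    # Lowercase, split on non-word characters, drop stop words --
    # a toy version of Elasticsearch's standard analyzer.
    return [t for t in re.split(r"\W+", text.lower()) if t and t not in STOP_WORDS]

def build_inverted_index(docs):
    index = defaultdict(set)  # term -> set of doc IDs
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

index = build_inverted_index({1: "The quick brown fox", 2: "A quick search engine"})
print(index["quick"])  # {1, 2}
```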
liked solely for the description
Hey Jordan, question here:
1. Can we have hierarchical shards, i.e., a sharding strategy that segregates nodes by search terms? E.g., aa -> shard1, ab -> shard2, abc to abz -> shard3, ac -> shard4, and so on.
2. With this strategy, we would write document IDs for terms to particular shards.
3. In case of any hotspotting, we can further split a shard.
4. For extremely popular terms like "trump", we can have them span multiple shards.
5. We can keep track of the shards using a gossip protocol or ZooKeeper.
From what I understood, in our current solution we are doing a scatter-gather, querying each partition and aggregating. If we have hundreds of partitions, we will have to filter out a lot of results. I understand that we can do it in parallel, but if we are only interested in the top 2 results, we will end up fetching the top 2 results locally from every partition and then filtering them down.
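A rough sketch of the routing table point 1 describes (sorted prefix boundaries -> shards), with a naive split operation for the hotspot case in point 3; all names are made up:

```python
import bisect

class RangeRouter:
    """Maps a term to a shard via sorted prefix boundaries, e.g.
    boundaries ["ab", "ac"] mean [.., "ab") -> shards[0],
    ["ab", "ac") -> shards[1], ["ac", ..) -> shards[2]."""
    def __init__(self, boundaries, shards):
        self.boundaries = boundaries  # sorted split points
        self.shards = shards          # len(boundaries) + 1 shard IDs

    def shard_for(self, term):
        return self.shards[bisect.bisect_right(self.boundaries, term)]

    def split_hot_range(self, boundary, new_shard):
        # Hotspot mitigation: insert a new split point, assigning the
        # right half of the hot range to a fresh shard.
        i = bisect.bisect_right(self.boundaries, boundary)
        self.boundaries.insert(i, boundary)
        self.shards.insert(i + 1, new_shard)

router = RangeRouter(["ab", "ac"], ["shard1", "shard2", "shard3"])
print(router.shard_for("aardvark"))       # shard1
router.split_hot_range("abm", "shard4")   # ["ab","abm") stays, ["abm","ac") moves
print(router.shard_for("abz"))            # shard4
```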
1) Yes you can
2) Agreed, but you have to be smart about how this is done. If you think of Twitter, you have a shit ton of tweets with the word "the" in them - how do you smartly partition those?
3) See #2
4) See #2
5) Sure
Agree with your general sentiment, and agree that it theoretically makes more sense. In practice, I believe the majority of search terms that are used are extremely popular anyways, so just keeping a local index of recently published tweets, with some scoring of each local document, works well. (Keep in mind relative scoring is tough when doing global partitioning too: you need access to the whole document to do TF-IDF on a particular search term, meaning the document itself needs to be sent to potentially hundreds of shards on write.)
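For reference, a minimal TF-IDF computation, showing why scoring needs the document's full term counts plus corpus-wide document frequencies:

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    # TF: how often the term appears in this document -- requires the
    # document's full token list, not just "doc contains term".
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # IDF: how rare the term is across the whole corpus.
    docs_with_term = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / (1 + docs_with_term))
    return tf * idf

corpus = [["the", "cat", "sat"], ["the", "dog"], ["the", "bird"]]
print(tf_idf("cat", corpus[0], corpus))  # rare term -> positive score
print(tf_idf("the", corpus[0], corpus))  # "the" is everywhere -> negative, ~no relevance
```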
Great depth on the core search design, discussed with a bit of comedy ;)
Is it recommended to partition the search index by term or by the tweet_id/user-id?
I don't think by term is realistic at Twitter's scale to be honest
@@jordanhasnolife5163 I see, so when a user searches for a term, the request would hit all partitions to get partial results, which are then aggregated?
@@lv0320 Ideally, there should be some partitions that correspond to recent data. That way you don't have to hit all of them.
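Something like time-bucketed partitions, so a query for recent tweets only fans out to the newest buckets (a sketch with made-up names):

```python
from datetime import datetime, timedelta, timezone

def partitions_to_query(now, days_back, partition_for_day):
    """Return only the partitions covering the last `days_back` days,
    instead of fanning out to every partition in the cluster."""
    return [partition_for_day(now.date() - timedelta(days=d)) for d in range(days_back)]

# Hypothetical: one partition group per day of tweets.
now = datetime.now(timezone.utc)
targets = partitions_to_query(now, 3, lambda day: f"tweets-{day.isoformat()}")
print(targets)  # e.g. ['tweets-2024-05-01', 'tweets-2024-04-30', 'tweets-2024-04-29']
```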
Love your content ! Keep up the good work !!
Thanks Anupam!!
Wrote a long comment about how a posting list (the documents containing a term) is implemented as a skip list + encoding, per Lucene99PostingsFormat in the apache/lucene GitHub repo. I was wondering why we can't use a similar idea for follower/following list storage in the news feed problem (from System Design 2). But it's only viable if you either store the data in Lucene (I guess no one does that with this purpose in mind) or have full control over the DB code so you can do such advanced customization on a column (also not practical).
nice guns
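For anyone curious, a toy version of the idea: a sorted posting list with skip pointers lets list intersection jump ahead instead of scanning linearly. (Lucene's real format adds block/delta encoding on top; this is just an illustration.)

```python
import math

class PostingList:
    """Sorted doc IDs with skip pointers every sqrt(n) entries --
    a toy version of what Lucene's postings format does."""
    def __init__(self, doc_ids):
        self.ids = sorted(doc_ids)
        self.skip = max(1, math.isqrt(len(self.ids)))

    def advance(self, pos, target):
        # Use skip pointers to jump past runs of IDs below target.
        while pos + self.skip < len(self.ids) and self.ids[pos + self.skip] <= target:
            pos += self.skip
        while pos < len(self.ids) and self.ids[pos] < target:
            pos += 1
        return pos

def intersect(a, b):
    hits, i, j = [], 0, 0
    while i < len(a.ids) and j < len(b.ids):
        if a.ids[i] == b.ids[j]:
            hits.append(a.ids[i]); i += 1; j += 1
        elif a.ids[i] < b.ids[j]:
            i = a.advance(i, b.ids[j])
        else:
            j = b.advance(j, a.ids[i])
    return hits

cats = PostingList(range(0, 1000, 2))  # docs containing "cat"
dogs = PostingList([7, 8, 500, 998])   # docs containing "dog"
print(intersect(cats, dogs))           # [8, 500, 998]
```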
Interesting, haven't heard of that data structure but would agree that it may be an overoptimization.
Thanks, I work hard on the guns haha
Hey Jordan, do you have any courses on Udemy? If not, I highly recommend you make one.
I do not! Everything that I post I want to be free! If I'm going to sell something, hopefully it can provide some real utility to you guys :)
Hey man, qq: I was wondering if you thought it would be important in an interview to mention how we know which machine holds which partition? I was thinking maybe we could have a distributed search/index service that maintains the partition -> machine mapping, and that mapping could be made consistent across the "search/index service" nodes via a consensus algo or maybe ZK. Does this make sense at all, or am I missing something? Maybe it's the local secondary indexes that take care of the problem I'm describing and I just don't understand 🤷♂️
Like rather than relying on local index, if we knew which machine held which partition couldn’t we just go directly to the correct shard and perform a binary search?
Yes, you would use ZooKeeper or a gossip protocol to keep track of which docs are held on which partition. Though this shouldn't really matter, since we have to query each partition anyways.
Hmm sorry, maybe this is going over my head. Why is it that we need to query each partition if we know exactly the partition that contains the word we’re looking for?
Like say someone is searching the word “gigachad” and we know that machine 1 holds the partition range with that word in it. Couldn’t we go directly to machine 1 and perform a binary search there rather than querying all the shards?
Maybe my understanding is off?
@@neek6327 We aren't partitioning that way here - we are partitioning by groups of document IDs, not by term. While in theory partitioning by term is optimal, the reality is that there are often too many document IDs associated with one term to fit on a given machine, and as a result we have no choice but to use local indexes on a group of documents.
Got it, that makes sense. Thanks 🙏
I liked your videos, new sub!
amazing, very helpful
Thanks giga bro
np gigachad
Did you really just "NOPQRS" your way through to figure out what comes after P?
I am dumb
thank you!
Gigachad42 in da house
I've actually evolved to gigachad43 now
I don't understand most of the things, but thanks for the video.
feel free to elaborate
5:13 lol
Grokking the System Design sucks at this question, ngl - searched for a solution right after reading it.
Another great video! However, if possible, could you please merge this architecture diagram with the one you created at ruclips.net/video/bOhQLr7nbhQ/видео.html? That way, users would be able to see how we extended our architecture to cover the "search" use case. Here we show a log-based message broker (i.e., Kafka), while in the other video we mentioned both the in-memory broker and Kafka in the architecture, and that may confuse people. Thank you very much for sharing your wonderful and quite practical designs.
I think you might find what you're looking for in the twitter video of the 2.0 series!
00:40 lmaooooo, the "day in my life as a software engineer" videos are cringey af
Interviewee: API design is going to be pretty tiny.
Interviewer: How tiny?
Interviewee: You know...
This guy gets it 😙