Thank you for speeding up the moments when you don't talk (drawing), and thanks for your fast and understandable speech, so I don't fall asleep and I'm hooked the whole time. Bravo!
So the crux of this algorithm is smart bit manipulation. Very happy to see the usefulness of CP problems in real life! 😄
Yup!
Great walkthrough. I’ve seen a few others and they didn’t quite click. Watched this and it makes perfect sense. Thanks!
Cheers!
wtf, I spent like 3 weeks reading papers about this structure, and I finally was able to understand it thanks to you.. Thanks!
:D
9:00 *"10 by 4 is 2.25" lol. Talk about professorial performance pressure! JK... Great job Gaurav*
Read a lot about this, but understood very well by watching this. Thanks for covering it in such good depth!
You're like the rockstar of algorithms.
Thanks a lot for the explanation, it was awesome!
You are welcome!
this thumbnail looks really good haha keep it up man! Thanks for these videos
Thanks! 😁
Thank you, Gaurav Sen!
Brotherrrrrr because of u I got placed , thank you so much for your system designing , seriously fan of yours 😍😍😍😍😍
Congratulations 😁
What a fantastic discussion, at a fantastic pace. Thank you!
You explained how this algorithm works better than my Prof!
you nailed it, thanks!
Brilliant video, thanks!
At 8:10, how did you calculate 10, 01, 01 and 00 for those 4 user IDs?
I am assuming a hash function which takes those numbers and outputs 10, 01, 01 and 10. So 00000 and 10011 give the same hash value of 01.
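A minimal sketch of that assumption. The hash function and the two-bit bucket extraction below are hypothetical, chosen only to illustrate the idea that the bucket is a property of the hash, not of the raw user ID:

```python
import hashlib

def hash_bits(user_id: int, width: int = 16) -> str:
    """Hash a user ID into a fixed-width bitstring (illustrative only)."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    value = int.from_bytes(digest[:2], "big")  # keep 16 bits of the digest
    return format(value, f"0{width}b")

def bucket_of(user_id: int) -> str:
    """Take the two leftmost bits of the hash as the bucket index."""
    return hash_bits(user_id)[:2]

# Different inputs can land in the same bucket: two distinct user IDs
# whose hashes share their top two bits map to the same bucket value.
```

The concrete values in the video (10, 01, etc.) are just one possible output of whatever hash function is assumed there.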
Yes, this part was poorly explained; the rest of the video was good though.
@@gkcs but is that based on some logic, or did you just make them up?
@@gkcs and how did you know those 2 numbers would give you 01?
@@asakhala I think it's just an assumption, and the logic does not matter.
Thank you so much! It was really easy to understand and really helpful!
Your videos are amazing, I keep learning a lot every time.
Awesome concise explanation! Do more of these!
Great explanation... Thanks Gaurav.
I didn't understand much about the topic... but I understood the bit manipulation part... 😃
I have used this algorithm in my UDF. But the way you explained it, man... wow... the concepts are clear now.
Thanks!
Amazing explanation, really down to earth and well driven. Thank you very much.
Impressive demonstration! Thanks a lot.
Hey Gaurav.. I just feel like you are covering advanced topics.. I personally expect you to post some videos for a beginner/intermediate coder like me.
Which topics are you looking for?
@@gkcs hashing
10/4 = 2.25 lol what a shitty channel...jk Gaurav, keep up the good work buddy, your videos are extremely helpful!!
Hahaha thanks!
Can someone explain to me why the time complexity is O(1)? If I have 100 bitstrings, I still have to check all 100 to see which one has the most zeroes, making the time complexity O(n). Or am I missing something?
Hi Gaurav, at 4:31 the hint says Bloom filter, but when you explained it, it seemed more like the Flajolet–Martin algorithm with a little bit of Bloom filter. Does that seem right?
Thanks @Gaurav. Does the bucket approach require multiple servers, or do they all sit on the same server?
Dear Gaurav, HyperLogLog uses leading zeroes, not trailing zeroes.
For a random number, either one is good.
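To expand on that: for a uniformly random hash, both counts follow the same geometric distribution, so either works as the "rarity" signal. A rough sketch (the helper names here are my own):

```python
def trailing_zeros(x: int, width: int = 32) -> int:
    """Number of zero bits on the right of x within a `width`-bit word."""
    if x == 0:
        return width
    count = 0
    while x & 1 == 0:
        x >>= 1
        count += 1
    return count

def leading_zeros(x: int, width: int = 32) -> int:
    """Number of zero bits on the left of x within a `width`-bit word."""
    return width - x.bit_length()
```

Seeing k zeros (trailing or leading) has probability about 1/2^(k+1) either way, which is why the 2^max estimate works with both conventions.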
Hi Gaurav, this was an awesome video and I am watching the rest of the system design classes. I am having trouble with why the answer is log log, i.e. 7 bits, at 7:28. The logic I used to justify it is that the values 0-64 are 65 numbers, which can only be expressed with 7 bits. Is this the correct logic?
Thanks again and looking forward to more content.
Thanks!
That's correct, the log of log of N is at most 7. We hence need at most 7 bits to represent this number.
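To make the "7 bits" concrete, a quick check in plain Python. With 64-bit hash values the stored zero-count is at most 64, and storing that count is the log log N space bound:

```python
import math

# With 64-bit hashes, a counter of trailing zeros can be 0..64:
# 65 possible values, so ceil(log2(65)) = 7 bits suffice per counter.
max_zero_count = 64
possible_values = max_zero_count + 1            # 65
bits_needed = math.ceil(math.log2(possible_values))
print(bits_needed)  # 7
```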
I think they additionally use skip lists (sorted sets in Redis) for live updates of the count.
Interesting stuff.
@9:27 shouldn't it be 2.5?
Yes. Messed up there 😅
How do you remove elements from hyperloglog?
What do you mean by "hash the values based on the next two bits from the right side"? Can you please clarify sir?
Hi Gaurav. I've been watching your videos for a month, and before that I had never started programming. Your videos are a little tough for me to understand quickly. Can you provide any links or resources for programming? Also, what did you refer to for programming?
I learnt programming in college, so I had 4 years and teachers to learn.
I'll try and find some links to start programming, and will post them in a future video 🙂
Go with this: teachyourselfcs.com
How can this algorithm be extended when we want multiple metrics (e.g. the number of distinct active users in each region: US, EU, etc.)? I could think of a solution, but I'm not sure this probabilistic algorithm can be extended this way. Instead of storing the max zeroes on each server, we could store a hash map of the max zeroes observed per region, then consolidate for the overall count. Does this work? Also, what happens when a user goes offline? How is the subtraction from the total done?
Why isn't the harmonic mean used here?
So white was your fav color during this period
Lol, I have few colors :p
Maybe a noob question: in the 2nd solution, which has a hashmap where we store users, wouldn't we need to remove users if the connection isn't active anymore? In general, what's the strategy around dead connections?
We could mark the connections as closed, or delete them on the database. I personally prefer marking them closed.
We are eventually using a hash table (buckets per node) of size m for storing the max count. Then how does this imply that we have reduced the space complexity to O(1)? Thank you.
Bro, I have been watching your content for a while now, and let me tell you it's bomb for students getting started with system design. Keep it up, and thanks for the tutorials.
Thanks!
How come the complexity of insertion comes out to O(1) in the second algorithm at 3:00?
Hashmap lookup is order 1: O(1). Barring heaps, most of these data structures have O(1) insertion complexity :D
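For contrast, a sketch of that exact-count approach (names are mine): a hash set gives amortized O(1) insert, but memory grows with the number of distinct users, which is exactly what HyperLogLog avoids.

```python
# Exact distinct count: amortized O(1) per insert, but O(n) space.
active_users: set[int] = set()

def record(user_id: int) -> None:
    active_users.add(user_id)  # amortized O(1) hash insert

def distinct_count() -> int:
    return len(active_users)
```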
Can you please clarify how you make the buckets?
I have a question. Let's say you've gotten user IDs between 1-15. When you represent this range in binary, your smallest will be 0001 and your largest will be 1111. But if we follow the logic of keeping track of the maximum number of 0's we've seen on the right, that will come from 1000, which is 2^3 = 8. But in reality we've seen a range of 15 user IDs, yet we are saying we've seen only 8. Did I understand it wrong?
Why are you ignoring 0000? If you consider that number, the number of zeros would be 4, which gives 2^4 = 16.
You completely ignored the "Hash" function that converts the number into binary. It will not be that straightforward.
Hey Gaurav, this video is great. Could you suggest a database design techniques book? Thanks in advance!
Designing Data-Intensive Applications by Martin Kleppmann is a fantastic reference for the algorithms and storage structures used in popular distributed systems.
good video
Thank you!
Request: would it be possible for you to do a video on 'Design an online food ordering service (Uber Eats/DoorDash/Zomato)' and explain how to integrate it with an existing (Uber) ride-sharing service?
Hi Gaurav, how is the number of zeros in a user ID related to the number of users on a particular box?
I explained this in the video.
Is it necessary to be from an engineering background to get an IT job in India? What do you think about coding bootcamps in India?
Please make a video on the system design of IRCTC.
Gaurav, please add subtitles, my friend. Thanks.
I have only one common question for you: how do you identify these algorithms and find explanations for them? Just curious.
Check out the sources in the description :)
Bro, I have an ambition of getting a developer job at a Canada- or New York-based software company. At the beginning of my university life, I did problem solving. After a year I dropped problem solving. Now I'm learning Node.js and everything related to front-end development. Can you suggest the right path so that I will definitely get a job at an international company? Or would you suggest starting problem solving again? Thanks in advance!!
Somewhat poorly explained is what I felt, compared to the other videos.
You should have explained with hashing first. This is a bit confusing. Cool algorithm though.
I don't think merging is that simple. We need to consider the correlation between the two bucket columns.
But the number of buckets is small.
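For reference, the standard HyperLogLog merge is just an elementwise max over the bucket registers: the max-zeros seen over the union of two streams equals, per bucket, the max of the per-stream maxima, so no correlation bookkeeping is needed for the union. A sketch, assuming plain integer bucket arrays:

```python
def merge(buckets_a: list[int], buckets_b: list[int]) -> list[int]:
    """Merge two register arrays bucket-by-bucket with max."""
    if len(buckets_a) != len(buckets_b):
        raise ValueError("register arrays must be the same size")
    return [max(a, b) for a, b in zip(buckets_a, buckets_b)]
```

The merged array is exactly what you would have gotten by feeding both streams into a single counter, which is why merging is lossless for union (but not for intersection or deletion).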
Something new and good. Made me laugh at 7:32.
😁
I applied this algorithm to my existing dataset using Presto's "approx_distinct" function, and the results were within +3-4% of the actual numbers. And this error rate gradually increased when I applied a "group by" to the same query. Can you throw some light on why?
Interesting. I don't know the internal workings of Presto, especially for group results with approximates.
Would love it if someone could help in the comments. Or I'll try and read up on this soon :)
Gaurav - 10/4 is 2.5. Isn’t it?
Oh damn, sorry and thanks haha!
In the flow, it just happens sometimes. Not a big deal.
Man you are talking like Pratham Mittal ;)
Where is your brother? Are you not making videos with him anymore, after getting the job at Uber?
He is talking about log of log. He had the guts to take it on before we did.
Are you an InterviewBit graduate?
Since no one talks about this, let me make the revelation: 10/4 = 2.5.
@3:02 how is it a hashmap when you have a table in a DB?
We can create an index on the column and assume a fast response on an "exists" query.
@@gkcs So the index would take log n time and not constant, right?
@@myjiosim2208 They can also be stored using hash data structures. UserID as key.
That's O(1) amortized.
09:00 10/4 = 2.5
Hey, 10/4 is 2.5, not 2.25 lol
9:01
Thanks bro, I love your videos. They were really helpful for me in understanding the design of highly scalable applications.
I have a doubt. I have learned about CDNs and Redis cache servers.
In my understanding, both perform similar kinds of functions (at least 70% the same).
Which is better?
Are both needed for a really highly scalable application?
If you have time, could you make a video explaining the differences clearly? 😊😊
Elasticsearch uses this for aggregations 😄😄
Really? Wow...
@@gkcs www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
It uses it for cardinality aggregation. It mentions some HyperLogLog++ algorithm. God knows what HyperLogLog++ is. In fact, it must have started using the ++ version in version 7.3. Before that, plain HyperLogLog.
0-64: can't they be represented by 6 bits? @7:16, it should be 6 bits.
0-63 = 64 numbers in range. 6 bits required.
0-64 = 65 numbers in range; 64 < 65 ≤ 128, so 7 bits required.
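The arithmetic above, as a quick check (the helper name is mine):

```python
import math

def bits_for(distinct_values: int) -> int:
    """Minimum number of bits to represent this many distinct values."""
    return math.ceil(math.log2(distinct_values))

# 0-63 is 64 values -> 6 bits; 0-64 is 65 values -> 7 bits.
```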
Facebook did not create the HyperLogLog process; mathematicians did: en.wikipedia.org/wiki/Flajolet%E2%80%93Martin_algorithm. Please change the title to give them credit!
THIS COMMENT IS FOR MY PERSONAL REFERENCE. TO UNDERSTAND PROPERLY, WATCH THE FULL VIDEO.
requirement 0:13
brute force complexity 1:50
sol1 1:59
problem 3:17
problem statement 4:31
sol 5:15