9:00 *"10 by 4 is 2.25" lol. Talk about professorial performance pressure! JK... Great job Gaurav*
Thank you for speeding up the moments when you don't talk (drawing), and thanks for your fast and understandable speech. I don't fall asleep, I'm hooked the whole time. Bravo!
At 8:10, how did you calculate 10, 01, 01 and 00 for those 4 user IDs?
I am assuming a hash function which takes those numbers and outputs 10, 01, 01 and 10. So 00000 and 10011 give the same hash value of 01.
Yes, this part was poorly explained; the rest of the video was good though.
@@gkcs but is that based on some logic or you just made them up?
@@gkcs and how did you know those 2 numbers will give you 01
@@asakhala I think it's just an assumption, and the specific logic doesn't matter.
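To make the assumption from this thread concrete, here is a minimal sketch. The hash function and the 2-bit width are my own choices for illustration, not the ones used in the video; any roughly uniform hash works, and the exact output values don't matter.

```python
import hashlib

def two_bit_hash(user_id: int) -> str:
    """Hypothetical hash: take the last 2 bits of an MD5 digest of the id."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return format(digest[-1] & 0b11, "02b")

# Different user ids can collide on the same 2-bit value, which is fine:
# the algorithm only relies on the bits being uniformly distributed.
print(two_bit_hash(0b00000), two_bit_hash(0b10011))
```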
So the crux of this algo is smart bit manipulation. Very happy to see the usefulness of CP problems in real life! 😄
Yup!
wtf, I spent like 3 weeks reading papers about this structure, and I finally was able to understand it thanks to you.. Thanks!
:D
this thumbnail looks really good haha keep it up man! Thanks for these videos
Thanks! 😁
Great walkthrough. I’ve seen a few others and they didn’t quite click. Watched this and it makes perfect sense. Thanks!
Cheers!
Read a lot about this, but understood very well by watching this. Thanks for covering it in such good depth!
10/4 = 2.25 lol what a shitty channel...jk Gaurav, keep up the good work buddy, your videos are extremely helpful!!
Hahaha thanks!
What a fantastic discussion, at a fantastic pace. Thank you!
Thank you, Gaurav Sen!
You're like the rockstar of algorithms.
you nailed it, thanks!
Thanks a lot for the explanation, it was awesome!
You are welcome!
Can someone explain to me why the time complexity is O(1)? If I have 100 bitstrings, I still have to check all 100 to see which one has the most zeroes, making the time complexity O(n). Or am I missing something?
Hi Gaurav, at 4:31 the hint mentions a Bloom filter, but when you explained it, it seemed more like the Flajolet-Martin algorithm with a little bit of Bloom filter mixed in. Does that seem right?
Hi Gaurav, this was an awesome video and I am watching the rest of the system design classes. I am having trouble understanding why the answer is loglog(7) at 7:28. The logic I used to justify it is that the values from 0-64 are 65 numbers, which can only be expressed with 7 bits. Is this the correct logic?
Thanks again and looking forward to more content.
Thanks!
That's correct, the log of log of N is at most 7. We hence need at most 7 bits to represent this number.
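A quick toy check of the bit count in code (my own arithmetic illustration, not from the video):

```python
# The counter stores a trailing-zero run length. For a 64-bit hash the
# run is between 0 and 64, i.e. 65 distinct values, so we need enough
# bits to represent the number 64 itself.
max_run = 64
bits_needed = max_run.bit_length()  # 64 = 0b1000000, so 7 bits
print(bits_needed)  # 7
```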
Brilliant video, thanks!
Awesome concise explanation! Do more of these!
Thanks @Gaurav. Does the bucket approach require multiple servers, or do they all sit on the same server?
@9:27 shouldn't it be 2.5?
Yes. Messed up there 😅
Your videos are amazing, I keep learning a lot every time.
Thank you so much! It was really easy to understand and really helpful!
How do you remove elements from hyperloglog?
Brotherrrrrr, because of you I got placed. Thank you so much for your system design videos, seriously a fan of yours 😍😍😍😍😍
Congratulations 😁
You explained how this algorithm works better than my Prof!
Impressive demonstration! Thanks a lot.
Amazing explanation, really down to earth and well driven. Thank you very much.
What do you mean by "hash the values based on the next two bits from the right side"? Can you please clarify sir?
I didn't understand much about the topic... but I understood the bit manipulation part... 😃
How come the complexity of insertion comes out to O(1) in the second algo at 3:00?
Hashmap lookup is order 1: O(1). Barring heaps, most data structures here would have O(1) "insertion complexity" :D
Great explanation... Thanks Gaurav.
Dear Gaurav, HyperLogLog uses leading zeroes, not trailing zeroes.
For a random number, either one is good.
Hey Gaurav.. I just feel like you are covering advanced topics.. I personally expect you to post some videos for a beginner/intermediate coder like me.
Which topics are you looking for?
@@gkcs hashing
I have used this Algorithm in my UDF. But the way you explained it man.. wow.. concepts are clear now.
Thanks!
Why isn't the harmonic mean used here?
Maybe a noob question: in the 2nd solution, which has a hashmap where we store users, wouldn't we need to remove users if the connection isn't active anymore? In general, what's the strategy around dead connections?
We could mark the connections as closed, or delete them on the database. I personally prefer marking them closed.
How can this algorithm be expanded when we want multiple metrics (e.g. the number of distinct active users in each region: US, EU, etc.)? I can think of a solution, but I'm not sure this probabilistic algorithm can be expanded in this way. Instead of storing the max zeroes on each server, we could store a hash map of the max zeroes observed in each region, then consolidate for the overall count. Does this work? Also, what happens when a user goes offline? How is the subtraction from the total done?
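The per-region idea in this comment can be sketched as a dict of independent estimators. This is a hypothetical structure of my own, not from the video, and note that like a Bloom filter the sketch is insert-only: there is no subtraction when a user goes offline, so a fresh sketch per time window is the usual workaround.

```python
import hashlib
from collections import defaultdict

def trailing_zeros(x: int) -> int:
    # Position of the lowest set bit; define 0 -> 0 to avoid infinity.
    return (x & -x).bit_length() - 1 if x else 0

# One max-zeros register per region. Consolidating regions into a
# global figure just takes the max again, so the per-region idea works.
max_zeros = defaultdict(int)

def record(region: str, user_id: int) -> None:
    h = int.from_bytes(hashlib.md5(str(user_id).encode()).digest()[:8], "big")
    max_zeros[region] = max(max_zeros[region], trailing_zeros(h))

def estimate(region: str) -> int:
    return 2 ** max_zeros[region]

def estimate_global() -> int:
    return 2 ** max(max_zeros.values(), default=0)
```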
Can you please clarify how you make the buckets?
I think they additionally use skip lists (sorted sets in Redis) for live updates of the count.
Interesting stuff.
Hi Gaurav, how is the number of zeros in a user ID related to the number of users on a particular box?
I explained this in the video.
So white was your fav color during this period
Lol, I have few colors :p
Please make a video on system design for IRCTC.
We are eventually using a hash table (buckets per node) of size m for storing the max count. How does this imply that we have reduced the space complexity to O(1)? Thank you.
Hi Gaurav. I've been watching your videos for a month, and before that I had never started programming. Your videos are a little tough for me to understand quickly. Can you provide any links or resources for programming? Also, what did you refer to when learning programming?
I learnt programming in college, so I had 4 years and teachers to learn.
I'll try and find some links to start programming, and will post them in a future video 🙂
go with this teachyourselfcs.com
I used this algo on my existing dataset via Presto's "approx_distinct" function, and the results were within +(3-4)% of the actual number. This error rate gradually increased when I applied a "groupby" to the same query. Can you throw some light on why?
Interesting. I don't know the internal workings of Presto, especially for group results with approximates.
Would love it if someone could help in the comments. Or I'll try and read up on this soon :)
Bro, I have an ambition to get a developer job at a Canada- or New York-based software company. At the beginning of my university life I did problem-solving, but after 1 year I gave it up. Now I'm learning Node JS and everything related to front-end development. Can you suggest a real path so that I can get a job at an international company? Or would you suggest starting problem-solving again? Thanks in advance!!
Hey Gaurav, this video is great. Kindly suggest a book on database design techniques. Thanks in advance!!!
Designing Data-Intensive Applications by Martin Kleppmann is a fantastic reference for the algorithms and storage structures used in popular distributed systems.
@3:02 How is it a hashmap when you have a table in a DB?
We can create an index on the column and assume a fast response on an "exists" query.
@@gkcs So the index would take log n time and not constant, right?
@@myjiosim2208 They can also be stored using hash data structures. UserID as key.
That's O(1) amortized.
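The exists-check this thread describes boils down to hash-based membership. A minimal in-memory sketch using a Python set as the hashmap (the DB index discussed in the video is a different, disk-backed structure):

```python
# Set of currently connected user ids; a hash set gives O(1)
# amortized insert and lookup, matching the claim in the thread.
active_users: set[int] = set()

def add_connection(user_id: int) -> None:
    active_users.add(user_id)        # O(1) amortized insert

def is_active(user_id: int) -> bool:
    return user_id in active_users   # O(1) amortized lookup

add_connection(42)
print(is_active(42), is_active(7))  # True False
```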
I have only one common question for you: how do you identify these algorithms and find explanations for them? Just curious.
Check out the sources in the description :)
I have a question. Let's say you've gotten user IDs between 1-15. When you represent this range in binary, the smallest will be 0001 and the largest will be 1111. If we follow the logic of tracking the maximum number of 0's seen on the right, that max will come from 1000, which gives 2^3 = 8. But in reality we've seen a range of 15 user IDs, yet we're saying we've seen only 8. Did I understand it wrong?
Why are you ignoring 0000? If you consider that number, the number of zeros would be 4, which gives 2^4 = 16.
You completely ignored the "Hash" function that converts the number into binary. It will not be that straightforward.
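The hashing step the reply mentions can be sketched like this: a toy Flajolet-Martin estimator, where the hash choice is my own illustration, not the video's. Raw sequential IDs like 1..15 are not uniformly distributed in their low bits, which is exactly why the estimate in the question looked off; hashing first fixes that.

```python
import hashlib

def trailing_zeros(x: int) -> int:
    # Number of trailing zero bits; define 0 -> 0 to avoid infinity.
    if x == 0:
        return 0
    return (x & -x).bit_length() - 1

def estimate_distinct(user_ids) -> int:
    max_zeros = 0
    for uid in user_ids:
        # Hash the raw id into a uniform 64-bit value before
        # inspecting its trailing zeros.
        h = int.from_bytes(hashlib.md5(str(uid).encode()).digest()[:8], "big")
        max_zeros = max(max_zeros, trailing_zeros(h))
    return 2 ** max_zeros

print(estimate_distinct(range(1, 16)))  # a rough estimate, not exactly 15
```

With a single register the estimate is coarse (always a power of two); averaging many buckets, as the video goes on to do, is what tightens it.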
Request, will it be possible for you to do a video on 'Design Online food ordering service (Uber eats/doordash/zomato) and explain how to integrate it with existing (Uber) ride-sharing service'
Are you an interviewbit graduate ?
Bro i have been watching your content for a while now n let me tell you its bomb for students getting started with system design, keep up and thanks for the tutorials.
Thanks!
Is it necessary to be from an engineering background to get an IT job in India? What do you think about coding bootcamps in India?
Gaurav, please add subtitles, my friend. Thanks.
Gaurav - 10/4 is 2.5. Isn’t it?
Oh damn, sorry and thanks haha!
In the flow, it just happens sometimes. Not a big deal.
Something new and good. Made me laugh at 7:32.
😁
You should have explained with hashing first. This is a bit confusing. Cool algorithm though.
Where is your brother? Are you not making videos with your brother anymore, after getting the job at Uber?
He is talking about log of log. He had the guts to take it on before we did.
Hey, 10/4 is 2.5, not 2.25 lol
9:01
I don't think merging is that simple; we need to consider the correlation between the two bucket columns.
but the number of buckets is small
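For what it's worth, merging two sketches is usually described as an element-wise max over the bucket arrays. A minimal sketch, assuming both sides use the same hash function and the same number of buckets:

```python
def merge(buckets_a: list[int], buckets_b: list[int]) -> list[int]:
    """Each entry is the max run of zeros seen in that bucket.

    Taking the per-bucket max yields the same registers as if a single
    sketch had observed both streams, so no correlation term is needed
    for the merge itself (the error analysis is a separate question).
    """
    assert len(buckets_a) == len(buckets_b)
    return [max(a, b) for a, b in zip(buckets_a, buckets_b)]

print(merge([3, 0, 5, 2], [1, 4, 2, 2]))  # [3, 4, 5, 2]
```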
09:00 10/4 = 2.5
good video
Thank you!
Somewhat poorly explained is what I felt, compared to the other videos.
Since no one talks about this, let me do the revelation: 10/4 = 2.5.
Man you are talking like Pratham Mittal ;)
Elasticsearch uses this for aggregations 😄😄
Really? Wow...
@@gkcs www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
It's used for the cardinality aggregation. The page mentions some HyperLogLog++ algorithm; God knows what HyperLogLog++ is. In fact, it must have started using the ++ version from version 7.3. Before that, plain HyperLogLog.
Thanks bro, I love your videos. They were really helpful for understanding how to design highly scalable applications.
I have a doubt. I have learned about CDNs and Redis cache servers.
In my understanding both perform similar kinds of functions (at least 70% the same).
Which is better?
Are both needed for a really highly scalable application?
If you have time, could you make a video explaining the differences clearly? 😊😊
0-64: can't they be represented by 6 bits? @7:16 should be 6 bits.
0-63 is 64 numbers in the range: 6 bits required.
0-64 is 65 numbers, and 64 < 65 <= 128, so 7 bits required.
Facebook did not create the HyperLogLog process; mathematicians did: en.wikipedia.org/wiki/Flajolet%E2%80%93Martin_algorithm. Please change the title to give them credit!
THIS COMMENT IS FOR MY PERSONAL REFERENCE. TO UNDERSTAND PROPERLY WATCH THE FULL VIDEO
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
requirement 0:13
brute force complexity 1:50
sol1 1:59
problem 3:17
problem statement 4:31
sol 5:15