Does anybody else think it would be helpful if the background music were turned off?
Biggest weakness of HLL is dealing with JIRA tickets requesting new data sets from the folks running your query budget into the ground. 😂
This is a great explanation. Using traffic data is much easier to understand, since most real-life deployments I’ve seen are counting quite abstract things (in the trillions per day).
Also, the SF Driver is a nice real-life PEBKAC analogy. :)
This seems like a one-way hash on the entries which outputs a smaller value (with more collisions). E.g., a ten-character input string gets hashed to a digit number and stored. I have no idea how it's actually implemented, but that's what it seems to do, in my opinion.
Here's a writeup by the Redis Hyperloglog author himself: antirez.com/news/75
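For anyone curious how close the "one-way hash" guess above is: here is a minimal sketch of the general HyperLogLog idea, not Redis's actual implementation. The hashes themselves are not stored. Each hash picks one of m small registers, and the register only remembers the longest run of leading zero bits it has seen. The class name TinyHLL, the choice of SHA-1 truncated to 64 bits, and p=10 (1024 registers) are all assumptions made for illustration.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative HyperLogLog sketch (hypothetical class, not a library API)."""

    def __init__(self, p=10):
        # p index bits -> m = 2**p registers. The alpha constant used in
        # count() is the standard approximation for m >= 128, so keep p >= 7.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Assumption: SHA-1 truncated to 64 bits stands in for the hash function.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                  # first p bits choose a register
        rest = h & ((1 << width) - 1)     # remaining bits
        # Rank = position of the leftmost 1 bit in the remaining bits
        # (width + 1 if they are all zero).
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

Adding the same string twice never changes a register, which is why duplicates do not inflate the estimate. With 1024 registers the expected relative error is roughly 1.04 / sqrt(1024), or about 3%.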
Didn't notice any HyperLogLog explanations in the video. Not even why the word Log is repeated twice in the name...
HyperLogLog is in fact the name of an algorithm, and that algo uses O(log log n) memory (not sure though)
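As far as I understand it, the "LogLog" refers to the size of each register: a register stores a count of leading zero bits, which is at most about log2(n), and writing that count down takes about log2(log2(n)) bits. A rough sketch of the arithmetic, where register_bits is a hypothetical helper name and the whole thing is an illustration rather than anything shown in the video:

```python
import math


def register_bits(max_cardinality):
    """Bits one register needs to store a leading-zero rank (illustrative)."""
    # A hash needs about log2(n) bits to tell n distinct items apart...
    hash_bits = math.ceil(math.log2(max_cardinality))
    # ...and a rank between 0 and hash_bits fits in about log2(log2(n)) bits.
    return math.ceil(math.log2(hash_bits + 1))


print(register_bits(10**9))   # 5
print(register_bits(10**18))  # 6
```

So even for a quintillion distinct items, each register stays at a handful of bits, and total memory is just that times the number of registers.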
Cheesy and superb
That's what I aim for! - Justin