The Power Of A Coin Toss In Big Data (CVM Algorithm)
- Published: 5 Oct 2024
- To further enhance your computer science knowledge, go to brilliant.org/... to start your 30-day free trial and get 20% off an annual premium subscription.
If you want to experiment with the algorithm yourself, I've included the Python scripts on my Patreon: / b001io
💬 Discord: / discord
🐦 Follow me on Twitter: / b001io
🔗 More links: linktr.ee/b001io
bro great job with the animation, it could have taken me a whole day to go through the paper
FKIN SICK VIDEO MAN I LOVE THE BEAUTY OF ALGORITHMS (DON'T GET THE COMPLEX MATH THOUGH)
this is related to the Law of Large Numbers in statistics
when you have an experiment like a coin flip with a probability of 0.5, the fraction of heads (and of tails) gets closer and closer to one half as the number of throws grows
this is also how casinos always win money in the long term
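A quick way to see that convergence for yourself (my own Python sketch, not from the video; `heads_fraction` and the fixed seed are just for illustration):

```python
import random

def heads_fraction(n_flips, seed=42):
    """Flip a fair coin n_flips times and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The fraction drifts toward 0.5 as the number of flips grows
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```

With only 10 flips the fraction can be far from 0.5; by 100,000 flips it sits very close to it.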
Amazing video, great job! Not sure I fully got it, but I'll dig deeper. Thanks!
Good job covering a wider range of topics in computer science
Great stuff man
I was fully expecting an Anton Chigurh reference in this video.
Just watched that movie for the first time a couple weeks ago 😂
“What’s the most you’ve ever lost on a coin toss? 🤨”
I don't understand. So if a person watches a YouTube video, the view count goes up by one. But if they come back to watch the video again, there's a chance the coin flip removes them from the memory pool, hence decreasing the view count?
if you're trying to count how many unique people have viewed the video, that is.
@@Huwng1337 6:25 flips a coin. If it comes up one way, the address stays; if the other, it's removed from the memory pool.
It's an approximation. Your example is insanely contrived and shows that the algorithm doesn't make sense for very small inputs; however, the larger the input gets (and the more memory you allocate), the smaller the error becomes.
You need statistical tools like expected value and variance bounds to understand the algorithm. Classic deterministic analysis won't work.
View counts don't count unique viewers; they count all views.
But once that happens, the probability used in future iterations goes down, which in turn means the final estimate of unique visitors is calculated with a higher multiplier.
So while the number of stored visitors goes down, the multiplier goes up, compensating for the loss of stored values in expectation.
(This is why it only really works well with large numbers)
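That "halve the probability, scale up the count" idea is the core of the CVM estimator. Here's a rough sketch of the simplified algorithm in Python (my own code, not the video's Patreon script; the `capacity` and `seed` parameters are illustrative choices):

```python
import random

def cvm_estimate(stream, capacity=400, seed=1):
    """Estimate the number of distinct items in `stream` while
    storing at most `capacity` items (simplified CVM sketch)."""
    rng = random.Random(seed)
    p = 1.0          # current sampling probability
    memory = set()
    for item in stream:
        memory.discard(item)           # forget any stored copy first
        if rng.random() < p:
            memory.add(item)           # re-add it with probability p
        if len(memory) == capacity:    # memory full: flip a coin for
            memory = {x for x in memory if rng.random() < 0.5}
            p /= 2                     # each item, then halve p
    return len(memory) / p             # scale up by the multiplier 1/p
```

For example, a stream containing 10,000 distinct values repeated several times gives an estimate in the right ballpark while never storing more than `capacity` items at once.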
There would also be the option of going one by one through the dataset and counting up whenever the current value isn't already present among those before it. Those earlier values don't need to be stored in memory, because you can just re-iterate over the dataset each time. That keeps memory low and accuracy exact, but at the cost of CPU time, and I guess I/O operations would also be fairly high.
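That exact, low-memory approach would look something like this (a sketch under the assumption that the stream can be re-read from the beginning on demand; `read_stream` is a hypothetical factory returning a fresh iterator):

```python
def count_unique_two_pass(read_stream):
    """Count distinct items with O(1) extra memory by re-reading the
    stream: an item is counted only at its first occurrence."""
    count = 0
    for i, item in enumerate(read_stream()):
        # Re-scan the stream to see if `item` appeared before position i
        seen_before = False
        for j, prev in enumerate(read_stream()):
            if j >= i:
                break
            if prev == item:
                seen_before = True
                break
        if not seen_before:
            count += 1
    return count
```

Exact and memory-free, but O(n^2) comparisons and n full re-reads of the data, which is exactly the CPU/IO cost the comment mentions.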
Loved it. Thanks!
Out of curiosity, what do you animate with? : o
Banger vid 👍
I'm wondering: how does it know an address is a duplicate without iterating through all the stored addresses first?
It all makes sense to me except for that. Wouldn't it still have to look through every stored address in memory to check for a duplicate every time a new address is added?
I guess the memory to scan is smaller if only 50% or fewer of the addresses get stored, but still.
Presumably in both methods the check for whether a value is already stored is done with some sort of hash, so you don't go through the entire memory; you just check whether that value's hash slot is occupied, which is O(1) on average.
So there wouldn't be much difference in speed, since either way it takes roughly (cost of one duplicate check) × (length of the stream) operations.
Rather, the win is that you don't have to store as much data. Notably, the stored addresses could be a lot less than 50% of the stream, although the video doesn't state how low you can go for a given error rate. With large data you might easily be looking at a 100x or greater memory reduction.
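In Python that O(1) duplicate check is just set membership, which hashes the value rather than scanning the stored entries. A minimal sketch of the exact-counting baseline the video compares against (my own illustration):

```python
def count_unique_exact(stream):
    """Exact distinct count: stores every distinct item it sees."""
    seen = set()                # hash set: lookups are O(1) on average
    for item in stream:
        if item not in seen:    # hash lookup, not a linear scan
            seen.add(item)
    return len(seen)
```

The per-item check is cheap in both the exact and the CVM version; the difference is that `seen` here grows with the number of distinct items, while CVM's memory stays bounded.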
What's the most you have lost over a coin toss, b001?
👀
No Country For Old Men reference? I like it😆😆😆
Thank you 😊
YOO @b001 what's your theme? I'm actually freaking out, it looks soo good. I'm searching for it everywhere but I can't find it, what's the theme 😅?
btw great video keep it up😄
SynthWave '84 with the glow turned off
@@b001 thanks man
Where’d you go? I love your videos
One coming soon! Stay tuned!
@b001 what's the font you're using? I've been trying to find it for days but can't 😅
Brass Mono
what's your VS Code theme and how do I get it
Nice video!
does anyone know the name of the theme
Very cool
55 seconds ago is crazy
"55 seconds ago is crazy" aah comment
@@null-0x ""55 seconds ago is crazy" aah comment" ahh comment
Blud thinks he's in Instagram reels
bro just use a long for counting unique visitors
the main problem is detecting duplicates
let's say you had a `long uniqueVisitors = 100;`
this doesn't tell me if specifically visitor A has already visited
a long would work if you were counting total views,
but it doesn't work for counting *unique* visitors.
@CrushedAsian255 it's easier to determine client-side whether a visitor is visiting for the first time: store a cookie that says they've visited before, and if that cookie doesn't exist, send a message to the server telling it to increment the counter.
That only works for websites, though; would it work for other scenarios?
And what if we don't want to store that data on the client side?
1
Hi