The Power Of A Coin Toss In Big Data (CVM Algorithm)

Поделиться
HTML-код
  • Опубликовано: 5 окт 2024
  • To further enhance your computer science knowledge, go to brilliant.org/... to start your 30-day free trial and get 20% off an annual premium subscription.
    If you want to experiment with the algorithm yourself, I've included the Python scripts on my Patreon: / b001io
    💬 Discord: / discord
    🐦 Follow me on Twitter: / b001io
    🔗 More links: linktr.ee/b001io

Комментарии • 48

  • @abhiram_ar
    @abhiram_ar 3 месяца назад +29

    bro great job with the animation, could have take a whole day going through the paper

  • @shubhamnagpal1740
    @shubhamnagpal1740 3 месяца назад +10

    FKIN SICK VIDEO MAN I LOVE THE BEAUTY OF ALGORITHMS(DONT GET THE COMPLEX MATH THOUGH)

  • @claer000
    @claer000 3 месяца назад +9

    this is related to the Law of large numbers in statistics
    when you have an experiment like a coin flip with a prob of 0.5, if you throw it a high number of times, you'll eventually get half of the throws as tails and the other half as heads
    this is also how casinos always win money on the long term

  • @lel7531
    @lel7531 3 месяца назад +2

    Amazing video, great job ! Not sure I really got it fully but I'll dig deeper thanks !

  • @donaldlee8249
    @donaldlee8249 3 месяца назад

    Good job covering a wider range of topic in computer science

  • @Thekingslayer-ig5se
    @Thekingslayer-ig5se 3 месяца назад +4

    Great stuff man

  • @largewallofbeans9812
    @largewallofbeans9812 3 месяца назад +6

    I was fully expecting an Anton Chigurh reference in this video.

    • @b001
      @b001  3 месяца назад +2

      Just watched that movie for the first time a couple weeks ago 😂

    • @Boofski
      @Boofski 2 месяца назад

      “What’s the most you’ve ever lost on a coin toss? 🤨”

  • @manifestasisanubari
    @manifestasisanubari 3 месяца назад +4

    I don't understand. So, if a person watches a RUclips video, one view increases. But if he comes back to view the video again, there's a chance of a coin flip that he will be removed from the memory pool, hence decreasing the viewcount?

    • @Huwng1337
      @Huwng1337 3 месяца назад +3

      if you're trying to count how many unique people that has viewed the video that is.

    • @manifestasisanubari
      @manifestasisanubari 3 месяца назад

      ​@@Huwng1337 6:25 flips a coin. If it's true, it stays. If it's false, it's removed from the memory pool. Removed.

    • @Jardynq
      @Jardynq 3 месяца назад

      It's an approximation. Your example is insanely contrived and shows that the algorithm doesn't make sense for very small inputs, however the larger the inputs get (and the more memory you allocated) the smaller the error becomes.
      You need to use statistical tools like expected value and variance bounding to understand the algorithm. Using classic deterministic analysis won't work.

    • @kmdsummon
      @kmdsummon 3 месяца назад +1

      Watch counts do not count unique watches, they count all of those.

    • @Boofski
      @Boofski 2 месяца назад

      But once that happens the probability used in future iterations goes down, which in turn causes the approximation on the final amount of unique visitors to be calculated with a higher multiplier.
      So whilst the amount of unique visitors goes down, the multiplier in turn gets increased, compensating for this loss in stored values using predictive probability.
      (This is why it only really works effectively with large numbers)

  • @floschy_1
    @floschy_1 3 месяца назад

    There would also be the solution to go one by one threw the dataset and count up whenever the current value isnt already present in those before which dont need to be stored in-memory because you can re-iterate again. That would be at the cost of CPU time though while keeping the memory clear and accuracy high. I guess I/O operations would also be fairly high

  • @dvir-ross
    @dvir-ross 2 месяца назад

    Loved it. Thanks!

  • @tvwatch9669
    @tvwatch9669 3 месяца назад +1

    Out of curiosity, what do you animate with? : o

  • @Boofski
    @Boofski 2 месяца назад

    Banger vid 👍

  • @Boofski
    @Boofski 2 месяца назад

    I’m wondering, how does it know an address is a duplicate without iterating through all stored addresses first?
    It all makes sense to me except for that, wouldn’t it still have to look through every stored address in the memory to find if there is a duplicate every time a new address is added?
    I guess the memory will be smaller if only 50% or less of addresses get stored but still.

    • @CamoEnjoyer
      @CamoEnjoyer 4 дня назад +1

      Presumably in both methods the checking of if it's already in the stored values is done with some sort of hash, so you don't have to go through the entire memory, but rather just check whether the hash of the value is already taken, which is O(1).
      So there wouldn't be much difference in speed, since it would take N=(checking for duplicates) * (length of stream) operations either way.
      Rather it allows you to not store as much data. Notably the stored addresses could be A LOT less than 50%, although the video doesn't specifically state how low you can go for a given error rate. But with large data you might easily be looking at a 100x if not more memory reduction.

  • @settingsun1
    @settingsun1 3 месяца назад +1

    What's the most you have lost over a coin toss, b001?

    • @b001
      @b001  3 месяца назад +1

      👀

    • @echobravo5488
      @echobravo5488 2 месяца назад

      No Country For Old Men reference? I like it😆😆😆

  • @dipeshsamrawat7957
    @dipeshsamrawat7957 3 месяца назад

    Thank you 😊

  • @ZAPEM
    @ZAPEM 2 месяца назад

    YOO @b001 what's your theme I'm actually freaking out it looks soo good, I'm searching for it everywhere but i cant find it what's the theme😅?

    • @ZAPEM
      @ZAPEM 2 месяца назад

      btw great video keep it up😄

    • @b001
      @b001  2 месяца назад +1

      SynthWave’84 turn off glow

    • @ZAPEM
      @ZAPEM 2 месяца назад

      @@b001 thanks man

  • @C_itsNemo
    @C_itsNemo 25 дней назад

    Where’d you go? I love your videos

    • @b001
      @b001  25 дней назад

      One coming soon! Stay tuned!

  • @Blank-nc4yo
    @Blank-nc4yo 2 месяца назад

    @b001 what's the font you're using? I'm trying to find it for days but can't find it 😅

    • @b001
      @b001  2 месяца назад

      Brass Mono

  • @leagueofangels1866
    @leagueofangels1866 Месяц назад

    whats your VS code theme and how to get it

  • @airi2402
    @airi2402 3 месяца назад

    Nice video!

  • @ahmedabdulgawad4758
    @ahmedabdulgawad4758 19 дней назад

    does anyone know the name of the theme

  • @GursimarSinghMiglani
    @GursimarSinghMiglani 3 месяца назад

    Very cool

  • @Sashim1X
    @Sashim1X 3 месяца назад +8

    55 seconds ago is crazy

    • @null-0x
      @null-0x 3 месяца назад +1

      "55 seconds ago is crazy" aah comment

    • @CrushedAsian255
      @CrushedAsian255 3 месяца назад +2

      @@null-0x ""55 seconds ago is crazy" aah comment" ahh comment

    • @jawadbenbrahim5933
      @jawadbenbrahim5933 Месяц назад

      Blud thinks he's in Instagram reels

  • @jimmlmao
    @jimmlmao 3 месяца назад +3

    bro just use a long for counting unique vistors

    • @CrushedAsian255
      @CrushedAsian255 3 месяца назад +6

      the main problem is detecting duplicates
      let's say you had a `long uniqueVisitors = 100;`
      this doesn't tell me if specifically visitor A has already visited
      a long would work if you were counting total views,
      but it doesn't work for counting *unique* visitors.

    • @jimmlmao
      @jimmlmao 3 месяца назад +4

      @CrushedAsian255 it's easier to determine if a visitor is visiting for the first time client side so we just store a cookie that tells us that we have visited before, and if that cookie doesn't exist we can just send a message to the server telling it to increment the counter

    • @AshifKhan-sn6jx
      @AshifKhan-sn6jx 3 месяца назад +6

      That works for only websites but would it work for other scenarios?
      and what if we don't want to store that data on the client side?

  • @MatheoDampfer-nl3no
    @MatheoDampfer-nl3no 3 месяца назад +1

    1

  • @caiwinson3083
    @caiwinson3083 3 месяца назад

    Hi