Understanding Shannon entropy: (2) variability and bits

  • Published: 4 Oct 2024
  • In this series of videos we'll try to bring some clarity to the concept of entropy. We'll specifically take the Shannon entropy and:
    show that it represents the variability of the elements within a distribution, how different they are from each other (a general characterization that works in all disciplines)
    show that this variability is measured in terms of the minimum number of questions needed to identify an element in the distribution (link to information theory; see the sketch after this description)
    show that this is related to the logarithm of the number of permutations over large sequences (link to combinatorics)
    show that it is not in general coordinate independent (and that the KL divergence does not fix this)
    show that it is coordinate independent on physical state spaces - classical phase space and quantum Hilbert space (that is why those spaces are important in physics)
    show the link between the Shannon entropy and the Boltzmann, Gibbs and von Neumann entropies (link to physics)
    Most of these ideas are from our paper:
    arxiv.org/abs/...
    which is part of our bigger project Assumptions of Physics:
    assumptionsofp...
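
    As a rough illustration of the first two points (not taken from the video or the paper), here is a minimal Python sketch that computes the Shannon entropy of a small, made-up distribution; for this particular distribution the value in bits equals the average number of yes/no questions an optimal questioning strategy needs.

    ```python
    import math

    # Minimal sketch (hypothetical distribution): Shannon entropy
    # H = -sum_i p_i * log2(p_i), measured in bits.
    def shannon_entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # For this distribution the optimal yes/no questioning strategy
    # needs exactly H questions on average.
    probs = [0.5, 0.25, 0.125, 0.125]
    print(shannon_entropy(probs))  # 1.75 bits
    ```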

Comments • 17

  • @gcarcassi
    @gcarcassi  3 years ago +12

    DISCLAIMER: I do not get any money from these videos, nor would I want to. I am not even eligible for monetization, and if I were I would turn it off. YouTube put the ads on these videos (it didn't use to); it is out of my control, and I get nothing from them. I find it shameful that YouTube exploits my desire to share my knowledge for free and your desire to learn.

  • @TristoMegistos
    @TristoMegistos 2 months ago

    This and the first video in the series let me understand what entropy really means. I watched a few other videos before and read some articles, but only this one explained things intuitively.

  • @deathraylabs_nature
    @deathraylabs_nature 3 years ago +3

    You are a phenomenal communicator and teacher. Thank you for taking the time to post these videos!

    • @gcarcassi
      @gcarcassi  3 years ago

      Thanks for the kind words! It means a lot!

  • @shinimasud8931
    @shinimasud8931 11 months ago

    Wow, I spent at least 2 hours today researching this, and finally I got it. Thanks a lot!

  • @mukeshjoshi1042
    @mukeshjoshi1042 2 years ago +1

    Very useful information for a beginner learner. I am one of them. Thank you.

  • @geoffrygifari3377
    @geoffrygifari3377 2 years ago

    I think I noticed something in the Huffman coding example with pets (a rough sketch below tries to reproduce the pattern):
    1. Any time a member has a Pᵢ roughly twice as large as another member's, the "chain" of questions drops by one (I view it as the dog! answer being one level below the cat! answer).
    2. But when members have Pᵢ's around the same value, they tend to lie on the "same level" (fish! and reptile!, bird! and small mammal!).
    3. It takes caution to treat cases with OR and AND compared to asking for one element directly.
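
    A rough Python sketch (the probabilities below are hypothetical, not the video's actual pet distribution) of Huffman code lengths that reproduces this pattern: a member with roughly twice another's probability sits one question higher, while members with similar probabilities land on the same level.

    ```python
    import heapq

    # Rough sketch of Huffman code lengths (the "number of questions" per element).
    # The distribution below is hypothetical, not the video's pet example.
    def huffman_code_lengths(probs):
        """probs: dict symbol -> probability. Returns dict symbol -> code length."""
        heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        lengths = {s: 0 for s in probs}
        counter = len(heap)
        while len(heap) > 1:
            p1, _, syms1 = heapq.heappop(heap)
            p2, _, syms2 = heapq.heappop(heap)
            for s in syms1 + syms2:
                lengths[s] += 1  # each merge pushes these symbols one level deeper
            heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
            counter += 1
        return lengths

    # P(A) is twice P(B), which is twice P(C) = P(D)
    print(huffman_code_lengths({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}))
    # -> {'A': 1, 'B': 2, 'C': 3, 'D': 3}: A is one level above B; C and D share a level
    ```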

  • @whatitmeans
    @whatitmeans 2 months ago

    I have seen the full entropy video series, and I think there is another information-theory-based explanation of entropy that, while maybe not as useful as the ones you shared, does give a more intuitive picture of what it tries to achieve: please extend the series to cover the explanation given in the first 10 pages of the paper "Transmission of Information" by R. V. L. Hartley (1927).

    • @whatitmeans
      @whatitmeans 2 months ago

      By the way... the StatQuest channel also has a good video about the meaning of entropy for data science, where he simply explains the counterintuitive "surprise" concept: the weirder scenarios are actually the ones that carry more information... I cannot share the link here, but I invite you to search for it; surely you already know what is said there, but it could give you some ideas to extend this entropy series.
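
      A tiny sketch (not taken from the StatQuest video) of that "surprise", i.e. the self-information -log2(p): the rarer the outcome, the more bits it carries.

      ```python
      import math

      # Self-information ("surprise") of an outcome with probability p, in bits.
      def surprise(p):
          return -math.log2(p)

      print(surprise(0.5))   # 1.0  -> a common outcome is not very surprising
      print(surprise(0.01))  # ~6.6 -> a rare ("weird") outcome carries more information
      ```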

  • @ervaucher
    @ervaucher 3 years ago +1

    Dear Gabriele Carcassi,
    First of all, thank you very much for your videos, they are great.
    Still, something puzzles me. When you say around 08:20 that "Information Theory" (and in particular Shannon's entropy) is about data, not knowledge, I think I understand what you mean. But something doesn't feel quite right about this, because when we follow Shannon's methodology to measure the entropy of a given message, it seems that we always have to have some prior "knowledge" about the probability distribution (or message space) of the set we are looking at in order to do anything with it.
    It seems that, in Shannon's theory, information is always shifting from one point to another, but that, in fact, it is always a matter of information/knowledge. When the entropy of a message (i.e. "the information" contained in that message) falls to zero, it is really that the knowledge has moved from the message to the observer, who now knows the exact distribution of symbols in the message.
    Not sure if you see what I mean by that. But I have the feeling that we often present Shannon's theory as being "blind" to the question of "knowledge", or "meaning", when in fact it always relies on prior knowledge to be applicable and to achieve any kind of measurement.
    Anyway, thank you again !
    Best regards,
    Elliot.

    • @gcarcassi
      @gcarcassi  3 years ago

      Dear Elliot,
      Thanks for your note!
      For the information shifting from point to point, some people like to characterize Shannon entropy as "lack of information" in the sense, if I understand correctly, that you mean. The number of bits in the message represents what needs to be communicated, which is the information the receiver is missing. This makes total sense especially in the context of communication theory, which is what Shannon was doing.
      As for Shannon's theory as being "blind" to "knowledge"... Yes, to interpret and setup any kind of measurement of entropy, you need prior knowledge of the context. That is what I was trying to clarify in the video since I find many physicists miss that. But this is true in general: to really know what the value of a measurement means, you typically need to know the details of the experimental setup.
      When I say that bits are about data and not knowledge, I simply mean the following. If I asked "how big is the trunk of your car?" you may say "5 cubic meters": cubic meters measure volume, so that's an appropriate answer. If I asked "how much do you know about cars?" you can't say "3 kilobytes". That does not follow. If I asked "how much data does the experiment gather every day?" you may say "3 kilobytes". That would be appropriate, and you'd use it to properly size the data pipelines. I guess, in the narrowest sense, this is the point I was trying to make.
      And this is what is kind of frustrating about YouTube: I can't modify the video to add the example!
      Gabriele

    • @ervaucher
      @ervaucher 3 years ago +1

      Dear Gabriele,
      Thanks for your answer, it's an honor and a privilege to discuss this subject with you.
      I'm so glad you came up with this "lack of information" idea, it's exactly what I was writing down while trying to wrap my mind around this Shannon entropy concept. To a certain extent, it seems that the only way to reduce the amount of entropy "inside" a given message is to have all the possible information "outside" of it.
      Anyway.
      Your last paragraph makes your point perfectly clear, and again, my question wasn't aimed at a lack of clarity in your explanation in the first place, it was about a lack of clarity in the echo it had with my own doubts regarding a subject that I'm just starting to discover, and, I hope, starting to understand gradually.
      Sincerely yours,
      Elliot.

    • @gcarcassi
      @gcarcassi  3 years ago

      Dear Elliot,
      Thank you for the kind words! Glad I could help!

  • @abhijithalder7094
    @abhijithalder7094 3 years ago

    Hello...
    I can't figure out how you calculate the avg. no. of questions in the worse and better strategies.

    • @gcarcassi
      @gcarcassi  3 years ago +1

      You multiply the number of questions for each case by the probability of having that case. For example, in the first strategy you will have a square 12.5% of the time, and it will take you 2 questions to identify it. Therefore it will contribute 12.5% * 2 (i.e. 0.125 * 2) questions on average. This is the second term of the sum in the bottom left. You proceed like that for all the terms. Is that better?
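
      A minimal sketch of that calculation (the two strategies below are hypothetical, except that the square case matches the reply above: probability 12.5%, 2 questions, contributing 0.125 * 2):

      ```python
      def avg_questions(cases):
          """cases: list of (probability, number_of_questions) pairs."""
          return sum(p * q for p, q in cases)

      # Hypothetical first strategy: every element takes 2 questions
      # (the square is one of the 0.125 * 2 terms).
      strategy_1 = [(0.50, 2), (0.25, 2), (0.125, 2), (0.125, 2)]

      # Hypothetical better strategy: ask about the likely elements first.
      strategy_2 = [(0.50, 1), (0.25, 2), (0.125, 3), (0.125, 3)]

      print(avg_questions(strategy_1))  # 2.0
      print(avg_questions(strategy_2))  # 1.75
      ```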

    • @abhijithalder7094
      @abhijithalder7094 3 years ago +1

      @@gcarcassi Actually, I forgot to count the last possibility at the time...
      Anyway, thanks for your reply, and also for delivering this kind of content to us...