The Most Important (and Surprising) Result from Information Theory

  • Published: 12 Jul 2024
  • The machine learning consultancy: truetheta.io
    Join my email list to get educational and useful articles (and nothing else!): mailchi.mp/truetheta/true-the...
    Want to work together? See here: truetheta.io/about/#want-to-w...
    Information Theory contains one idea in particular that has had an incredible impact on our society.
    David MacKay's lecture: • Information Theory, Pa...
    SOCIAL MEDIA
    LinkedIn: / dj-rich-90b91753
    Twitter: / duanejrich
    Github: github.com/Duane321
    Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
    SOURCES
    Chapters 1, 9 and 10 from [1] were the primary source. Chapter 7 of [3] was a secondary source. [2] is inclusive of Shannon's 1948 original paper and provided useful historical context.
    [1] D. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press. 2003.
    [2] C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press. 1964.
    [3] T. M. Cover and J. A. Thomas. The Elements of Information Theory, 2nd Edition. Wiley-Interscience. 2006.
    TIMESTAMPS
    0:00 Problem Statement and the R3 Coding Strategy
    2:06 Bit Error Probability and Rate
    3:30 The Trillion Dollar Question
    4:12 Claude Shannon Proves Something Remarkable
    6:11 Sidebar on other Educational Content
    6:49 The Trick
    7:38 Check out David MacKay's Textbook and Lectures, plus Thank You

Comments • 115

  • @minecrafting_il
    @minecrafting_il 6 months ago +6

    So, with f = 0.1, there EXISTS an encoding scheme such that the data transfer rate is only about halved, but the probability of error on any one encoded bit is less than 1/Σ(1000), where Σ is the Busy Beaver function.
    And it only halves the data rate, and practically deletes noise. Math is incredible.
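
    To sanity-check the "about halved" part numerically: a minimal Python sketch, using the binary symmetric channel capacity C = 1 − H2(f) that the video's result rests on (function names are just illustrative):

        import math

        def binary_entropy(f: float) -> float:
            """H2(f) in bits; taken as 0 at f = 0 or 1."""
            if f in (0.0, 1.0):
                return 0.0
            return -f * math.log2(f) - (1 - f) * math.log2(1 - f)

        def bsc_capacity(f: float) -> float:
            """Capacity of a binary symmetric channel with flip probability f."""
            return 1.0 - binary_entropy(f)

        for f in (0.01, 0.1, 0.5):
            print(f"f = {f}: C = {bsc_capacity(f):.3f} bits per channel use")
        # f = 0.1 gives C ≈ 0.531, so the achievable rate really is "only about halved";
        # f = 0.5 gives C = 0 (pure noise), and C -> 1 as f -> 0.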

  • @HufiVideo
    @HufiVideo 8 months ago +20

    To quote my PhD supervisor, Prof. James L. Massey: "Do Communication Engineers Need Information Theory? - Not unless they want to understand what they are doing and to be at the forefront of future advances in communications!"

  • @tommasobanfi8133
    @tommasobanfi8133 9 months ago +125

    I expected you to explain how the encoding works 😢

    • @Mutual_Information
      @Mutual_Information  9 months ago +71

      I get that. Weirdly, Shannon gave a non-constructive proof. He just showed the coding strategies exist, not how to construct them. And coding theory, the theory for constructing coding strategies, is a very deep topic. 3Blue1Brown explains some codes much better than I could, so you could check those out if you haven't already.

    • @thomassynths
      @thomassynths 9 months ago +9

      Numberphile recently published a video on one of these encodings, titled "Error Correcting Curves - Numberphile"

    • @melodyecho4156
      @melodyecho4156 8 months ago +3

      3Blue1Brown's videos on Hamming codes are great for the mathematical explanations
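
      For a taste of what those videos cover, here is a minimal, illustrative Python sketch of the Hamming(7,4) code, which corrects any single flipped bit per 7-bit block (helper names are made up for this example):

        def hamming74_encode(d):
            """Encode 4 data bits as 7; parity bits sit at positions 1, 2, 4 (1-indexed)."""
            d1, d2, d3, d4 = d
            p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
            p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
            p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
            return [p1, p2, d1, p4, d2, d3, d4]

        def hamming74_decode(c):
            """Correct up to one flipped bit, then return the 4 data bits."""
            s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
            s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
            s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
            syndrome = s1 + 2 * s2 + 4 * s4   # 1-indexed error position, 0 if clean
            if syndrome:
                c = c.copy()
                c[syndrome - 1] ^= 1
            return [c[2], c[4], c[5], c[6]]

        word = [1, 0, 1, 1]
        sent = hamming74_encode(word)
        sent[5] ^= 1                           # one bit flips in transit
        assert hamming74_decode(sent) == word  # ...and is corrected on arrival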

    • @leif1075
      @leif1075 8 months ago

      @@Mutual_Information Thanks, can you not do a condensed version of that?

    • @tomasstana5423
      @tomasstana5423 7 months ago +1

      @@leif1075 What? Can you not decide what others want or don't want to watch? Thx.

  • @rcoatney_maths
    @rcoatney_maths 9 months ago +85

    MacKay's book is brilliant. I used it extensively in grad school. I love that all the lectures are online.

    • @Mutual_Information
      @Mutual_Information  9 months ago +11

      Yea! I've long enjoyed the text but I only recently discovered the lectures online. Such a pleasure learning the book from the man himself.

    • @Mutual_Information
      @Mutual_Information  8 months ago +1

      @@jakebrowning2373 In the description.

    • @diegomanrique4716
      @diegomanrique4716 7 months ago

      Fibonacci sequence

    • @diegomanrique4716
      @diegomanrique4716 7 months ago

      Theory information

    • @diegomanrique4716
      @diegomanrique4716 7 months ago

      Quasicrystals

  • @davecool42
    @davecool42 8 months ago +20

    Same reason why it’s possible to encode frequencies up to half of the sample rate. He was a brilliant guy.

    • @manamimnm
      @manamimnm 6 months ago

      Only under perfect conditions; you would want to go higher in sampling frequency if you wish to maintain high fidelity.

    • @Bakexcake1
      @Bakexcake1 6 months ago +1

      I am a little bit confused, as I thought the half-the-sampling-rate thing was from Nyquist. What is the relation to Shannon in this case?
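
      (It's generally credited to both, as the Nyquist-Shannon sampling theorem. A small numpy sketch of why fs/2 is the boundary, using the standard folding identity that a tone at fs − f yields the negated samples of a tone at f:)

        import numpy as np

        fs = 100.0                     # sampling rate, Hz
        n = np.arange(32)              # sample indices
        f = 10.0                       # a tone safely below fs / 2

        low = np.sin(2 * np.pi * f * n / fs)
        high = np.sin(2 * np.pi * (fs - f) * n / fs)   # a 90 Hz tone

        # Sampled at 100 Hz, the 90 Hz tone is indistinguishable (up to sign)
        # from the 10 Hz tone, so content above fs/2 cannot be recovered.
        assert np.allclose(high, -low)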

  • @mattshnoop
    @mattshnoop 8 months ago +7

    Wow. This video did not feel like 9 minutes long. Amazing work

  • @adrianopastore9800
    @adrianopastore9800 9 months ago +34

    I do research in information and communication theory for a living, and I must say this is one of the best introductory videos to Shannon's coding theorem that I have come across. The only thing I would object to is that the mention of "doubling the message length" at 5:07 comes out of nowhere: it might be confusing to newbies, because 1) you haven't talked about "message length" up until then (you only do at the very end), 2) Shannon's theorem does not say anything about how much longer ("doubling"?) the message length ought to be. But nevertheless, congrats for the video's quality and for your channel in general. I would be thrilled to see a version 2.0 of this, perhaps with some extensions and additional explanations. Or even a short series (Hamming codes, error exponents, you name it)

    • @Mutual_Information
      @Mutual_Information  9 months ago +9

      Hey it's very cool to get the eyes of a pro on this. I try to consider that as I'm creating it and that helps keep the standard high. And thank you for your compliment and notes. Some of it is due to me making a rather condensed video.. but I shouldn't let that cut out essential meat. I create reflection notes on each vid, and so this will be included.
      And regarding a follow up video, I don't have solid plans at the moment, but a follow up vid is likely over the longer term. I was thinking about asymptotic equipartition property actually, since understanding that reveals a lot about many info theory proofs.

    • @samanthae.1002
      @samanthae.1002 8 months ago +1

      @@Mutual_Informationsuch respect online brings me joy

  • @LeoDaLionEdits
    @LeoDaLionEdits 8 months ago +1

    Love your videos!

  • @losthalo428
    @losthalo428 7 months ago +5

    Just stumbled upon your channel, you’re amazing dude

  • @lqlaliut897
    @lqlaliut897 9 months ago +7

    Channel capacity reaches its minimum of 0 when f is 0.5 and its maximum of 1 as f tends to 0. Also, it would be interesting to see whether the line between achievable and non-achievable is linear beyond the channel capacity.

    • @Mutual_Information
      @Mutual_Information  9 months ago +9

      lol yea, you can't communicate anything if there is a 50-50 chance of a bit flip.. everything is noise. And regarding your second point, it is linear. That was another component of Shannon's proof. His proof is piecewise like that.

  • @sanchopanza9907
    @sanchopanza9907 8 months ago +1

    Cool video! At the right time too as I'm working on MDS now.

  • @joaopedromarrey5477
    @joaopedromarrey5477 2 months ago +1

    Your channel is excellent. Extremely instructive and the videos are clearly thought out really well

  • @nikoskonstantinou3681
    @nikoskonstantinou3681 9 months ago +5

    The quality of these videos is getting better and better. Keep up this great work!

  • @iamr0b0tx
    @iamr0b0tx 8 months ago +1

    Thanks!

  • @actualBIAS
    @actualBIAS 9 months ago +1

    Great Video! :)

  • @user__214
    @user__214 9 months ago +4

    5:42 This was a great moment! =)

  • @General12th
    @General12th 8 months ago +1

    This is so cool!

  • @kdhlkjhdlk
    @kdhlkjhdlk 8 months ago +2

    Seems like it's because 0.5 is actually the highest error rate, if you're using flips as errors. At 1.0 you're just reversing the message.

  • @Cowtymsmiesznego
    @Cowtymsmiesznego 3 months ago

    My instinct was that we should be able to do much better than what you initially suggested by using something like the checksum on credit cards. The problem is that a checksum just tells you the message is wrong, not how to retrieve the correct one.
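
    To make that limitation concrete, here is a small Python sketch of the credit-card checksum in question (the Luhn algorithm): it can flag a corrupted number, but it carries no information about which digits to fix:

        def luhn_valid(number: str) -> bool:
            """Luhn check: double every second digit from the right,
            subtract 9 from doubles above 9, and require sum % 10 == 0."""
            total = 0
            for i, ch in enumerate(reversed(number)):
                d = int(ch)
                if i % 2 == 1:
                    d *= 2
                    if d > 9:
                        d -= 9
                total += d
            return total % 10 == 0

        print(luhn_valid("79927398713"))   # True: a classic Luhn test number
        print(luhn_valid("79927398714"))   # False: error detected, but not correctable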

  • @JesseGilbride
    @JesseGilbride 8 months ago

    I vaguely remember (from studies at college) something about using XOR to detect data errors
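
    That idea is a single parity bit: XOR all the data bits together and append the result. A minimal sketch showing it detects a single flip but cannot locate it:

        from functools import reduce
        from operator import xor

        def parity(bits):
            """XOR of all bits: 0 for an even number of 1s, 1 for odd."""
            return reduce(xor, bits, 0)

        data = [1, 0, 1, 1, 0, 0, 1]
        sent = data + [parity(data)]   # append the parity bit

        received = sent.copy()
        received[3] ^= 1               # one bit flips in transit

        # Any single flip makes the overall parity odd, so the error is detected...
        assert parity(received) == 1
        # ...but nothing says which of the 8 bits to fix.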

  • @marcusa.ragnos1041
    @marcusa.ragnos1041 6 months ago

    You have all the exact same mannerisms as the channel “Whaddo You Meme” 😂. Love the video!

  • @kaishang6406
    @kaishang6406 1 month ago

    I rewatched the video because I didn't remember any details, then realized afterwards that there aren't any to begin with...
    I'll watch the lectures next.

  • @MuniSreenivasPydi
    @MuniSreenivasPydi 9 months ago +24

    Nice video! For readers who are interested in machine learning, there is a nice (but distant) analogue of this for binary classification. Suppose you are asked to build an AI that classifies whether an image is that of a cat or a dog. What is the best error rate that you can get? Turns out it is independent of the size of the dataset!

  • @karldewet5393
    @karldewet5393 8 months ago

    Can you please do a video on Reed Solomon encoding?

  • @zokalyx
    @zokalyx 9 months ago +11

    Have we found the "optimal" encoding then? Or is it still "out there"?

    • @Mutual_Information
      @Mutual_Information  9 months ago +12

      We've gotten very, very close, but it depends on some assumptions about the channel. There are different types of noise and flip operations. But, to answer what you're looking for, search "Turbo Code".

  • @jamesraymond1158
    @jamesraymond1158 8 months ago +2

    I dimly remember Huffman code. Is that an example of the coding methods mentioned in this video? If yes, it would have been helpful to mention it.

    • @Mutual_Information
      @Mutual_Information  8 months ago

      Huffman codes, for certain distributions over inputs, can achieve the lossless rate limit C. But for other input distributions they can't, and other methods will be more efficient (e.g. arithmetic coding). In retrospect, I agree.. the video should have had more detail on some specific modern coding strategies.
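
      For readers curious what such a code looks like, a compact, illustrative Python sketch of the standard heap-based Huffman construction (on a dyadic distribution like this one, it hits the entropy exactly):

        import heapq

        def huffman_codes(freqs):
            """Build a Huffman code from {symbol: frequency}; returns {symbol: bitstring}."""
            # Heap entries: (total frequency, tie-breaker, {symbol: code-so-far}).
            heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
            heapq.heapify(heap)
            counter = len(heap)
            while len(heap) > 1:
                f1, _, left = heapq.heappop(heap)    # pop the two least frequent subtrees...
                f2, _, right = heapq.heappop(heap)
                merged = {s: "0" + c for s, c in left.items()}
                merged.update({s: "1" + c for s, c in right.items()})
                heapq.heappush(heap, (f1 + f2, counter, merged))  # ...merge and repeat
                counter += 1
            return heap[0][2]

        print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
        # {'a': '0', 'b': '10', 'c': '110', 'd': '111'} -- average length 1.75 bits,
        # exactly the entropy of this distribution.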

  • @pizzarickk333
    @pizzarickk333 8 months ago +3

    Can you please suggest the best books for learning binary or digital systems in general? The books I have don't do a good job of motivating the concepts and explaining them in a "rediscoverable" manner.

    • @Mutual_Information
      @Mutual_Information  8 months ago +1

      "digital systems" is pretty general, so it's hard to say. For information theory, my go-to is "The Elements of Information Theory" by cover and for this, it's the book I showed. Another one is "Understanding Digital Signal Processing", which is for signal processing. These are all big ambitious books FYI.

    • @pizzarickk333
      @pizzarickk333 8 months ago

      @@Mutual_Information thank you!
      Do you have a discord server?

    • @mackenziesalisbury5406
      @mackenziesalisbury5406 8 months ago

      Arduino is a very fun way to learn some low level programming. Also NAND2tetris is a free comprehensive course that explains computers from transistors to logic gates to binary to machine code and more

  • @jamesraymond1158
    @jamesraymond1158 8 months ago +1

    Excellent. So many speakers throw graphs out without properly explaining the axes. Not here!

  • @coltonhenry3636
    @coltonhenry3636 8 months ago

    How does language affect the error correction? If the binary code were completely random, then there is no language. But if there were some standardization or organization within the code, could we use an algorithm to identify the MOST PROBABLE intended sequence? Like if I'm typing an essay and accidentally type "mathimativs" where the word "mathematics" was implied, since there is a standardization of the language, we can error-correct both letters to find the intended word. So my question is: does imposing a language increase the likelihood of error correction?

    • @travcollier
      @travcollier 8 months ago

      Sort of. You can think of speaking the same language as sharing knowledge of the encoding and decoding algorithms. Without that prior knowledge (information), there isn't much you can do error-correction-wise.
      Interestingly, this applies to language learning (linguistics) too. There has to be prior knowledge about the rules of a language, or else it cannot be learned except in the trivial case of a finite language where the learner is given all the possible sentences/statements.
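
      A tiny illustration of that shared-codebook idea, using Python's standard difflib to decode a corrupted word against an agreed vocabulary (the vocabulary here is a made-up stand-in):

        import difflib

        # The shared "codebook": both sides must already know the language.
        vocabulary = ["mathematics", "statistics", "information", "communication"]

        received = "mathimativs"   # two corrupted letters
        print(difflib.get_close_matches(received, vocabulary, n=1, cutoff=0.6))
        # ['mathematics'] -- the redundancy of the shared vocabulary does the correcting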

  • @meowsqueak
    @meowsqueak 8 months ago +1

    Great video! Btw your mic is picking up occasional plosives from your voice at very low frequencies and these are booming quite loud on a decent sound system with a subwoofer.

    • @Mutual_Information
      @Mutual_Information  8 months ago

      Thank you very much, I didn't know that. I've taken down a note to fix it going forward

  • @alleycatsphinx
    @alleycatsphinx 7 months ago +2

    Shannon invented relu!

  • @tokajileo5928
    @tokajileo5928 8 months ago

    what is your degree/scientific background?

  • @khatharrmalkavian3306
    @khatharrmalkavian3306 8 months ago +3

    This isn't a result, it's just a lecture.

  • @Blueyzachary
    @Blueyzachary 8 months ago

    I was about to say in the beginning “but the noise floor…”

  • @marcevanstein
    @marcevanstein 9 months ago +2

    Great video! What's your background, by the way? Seems like you've maybe done a PhD in this kind of thing?

    • @Mutual_Information
      @Mutual_Information  9 months ago +7

      Thank you, much appreciated. And my background is in financial engineering. Practically, I've just been modeling mostly economic data (prices, market dynamics, consumer behavior), and so that made a lot of machine learning relevant. And when studying that, I came across information theory (mostly with Cover and Thomas' text) and have found it useful + interesting.

    • @user__214
      @user__214 9 months ago +1

      @@Mutual_Information I'm a data scientist myself, with no background in info theory, and modelers don't talk much about it. But through effort I found some good sources, and now I see that the relationship between compression and probability theory is fascinating! Info theory has legitimately sharpened my thinking on what it means to predict something or have a good model for it.

    • @Mutual_Information
      @Mutual_Information  9 months ago +1

      @@user__214 Nice to meet a fellow DS! And I'm with you - a big emphasis of MacKay's work was the interplay of info theory + ML.. many interesting things happen at their intersection.

  • @user-vm1hi7bo5s
    @user-vm1hi7bo5s 8 months ago +1

    Honestly, I was expecting a more exponential-looking curve. Is this a simplification? Aren't there more effective error-correcting codes than the one used in the paper? I understand that it's not about the exact coding techniques, but I still can't wrap my head around how they managed to neglect them. The paper should make it clear, I guess.

    • @Mutual_Information
      @Mutual_Information  8 months ago +1

      Interestingly, the paper didn't use any error-correcting codes. It was a non-constructive proof, where they show things exist without giving any examples. And in doing so, they show the achievability curve looks exactly as I showed. It's just flat and then a straight line. Not what anyone expected.
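
      For anyone who wants to trace that boundary themselves: MacKay (Chapter 1) gives the maximum achievable rate at bit-error probability p_b as R(p_b) = C / (1 − H2(p_b)). A small sketch evaluating that expression, assuming the video's plot follows it:

        import math

        def h2(p: float) -> float:
            """Binary entropy in bits."""
            return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

        f = 0.1                  # channel flip probability
        C = 1.0 - h2(f)          # capacity, about 0.531

        for pb in (1e-6, 1e-3, 1e-2, 5e-2):
            print(f"p_b = {pb:g}: max achievable rate ~ {C / (1.0 - h2(pb)):.3f}")
        # As p_b -> 0 the boundary flattens out at C itself: error-free-in-the-limit
        # communication is possible at any rate below capacity.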

    • @user-vm1hi7bo5s
      @user-vm1hi7bo5s 8 months ago

      @@Mutual_Information Ok, thanks for the answer

  • @jakubbelicki5755
    @jakubbelicki5755 9 months ago +2

    Great channel, I like to watch your videos after work but always find them a bit too quiet. Maybe it's something with my hearing.

    • @Mutual_Information
      @Mutual_Information  9 months ago +1

      That's good for me to know. Are other videos, outside of my channel, also quiet?

    • @kennynolan736
      @kennynolan736 9 months ago +4

      Well like he explained, a noisy channel can cause issues!

    • @jakubbelicki5755
      @jakubbelicki5755 9 months ago

      @@Mutual_Information I don't think so. I always remember to overamplify the sound when watching your channel.

    • @Mutual_Information
      @Mutual_Information  9 months ago

      @@jakubbelicki5755 I see, I'll look into it. Thanks!

  • @findstr.s-hi-c._w
    @findstr.s-hi-c._w 3 months ago

    I assume the flip probability is random... and not fixed with Boolean switches.
    This creates the error: chance.
    You're scrambling the message (slightly), not hiding it.
    That creates a receiver error, because the flip is not a fixed switch.
    You would need to use something like a rotor on an Enigma and flip with an algorithm,
    to create "fake" noise.

  • @superuser8636
    @superuser8636 8 months ago

    Shout out to the Hadamard matrix and transform to recover n/2 +1 bits 😊

  • @bazoo513
    @bazoo513 8 months ago

    ~ 4:43 - But, this is the crux of the problem!

  • @wertherquartett
    @wertherquartett 8 months ago

    Google’s ads have made RUclips virtually unwatchable. The advertisers who have signed up for this subterfuge are totally despicable.

  • @event151
    @event151 2 days ago

    It must be acknowledged that Shannon's "Information Theory" is, in fact, merely a quantitative theory of information. The concept of "information" is neither defined nor explained; instead, it is characterized through quantitative parameters and assumptions. It is a common misconception that Shannon invented the concept of information theory. In fact, he developed a mathematical model of communication, which he referred to as "A Mathematical Theory of Communication." This interpretation more accurately reflects the essence and significance of his work. In the field of information theory, Marschak's work on the value of information, such as "Elements for a Theory of Teams" (Management Science, 1955), is also worthy of note.
    However, if we consider the definition of information and its qualitative characteristics, Marian Mazur developed a qualitative theory of information in his book Qualitative Information Theory (Warsaw, 1970), a theory he had initiated as early as 1965.
    His work represents a classical (in the sense of classical logic) treatise on control in systems theory, wherein definitions proceed deductively from the general to the specific. He defines information in terms of predefined concepts of transformations (which could be described as morphisms in category theory) and messages (which could be described as objects in category theory). His work is likely unknown outside the country in which he published. He demonstrated and proved many concepts, including informing, transinforming, pseudo-informing, disinforming, parainforming and metainforming.

  • @ericc6820
    @ericc6820 24 days ago

    woah that’s nuts

  • @jakeaustria5445
    @jakeaustria5445 7 months ago

    @Mutual_Information, I want to ask about E. T. Jaynes' book Probability Theory: The Logic of Science. I just don't know where else to ask, since the web also says the same thing. It is common knowledge that the AND, NOT, and OR logic gates can create all logic operations of an arbitrary number of variables. Thus, {AND, NOT, OR} constitutes an adequate set. However, according to the book, the only single-element adequate sets are {NAND} and {NOR}. Why is this SWITCH logic gate, as I call it, not an adequate set?
    This is its truth table:
    A | B | Switch(A, B)
    F | F | F
    F | T | F
    T | F | T
    T | T | T
    I discovered that the SWITCH gate can simulate AND, OR, and NOT, so it must be an adequate set. But why do the book and Wikipedia not acknowledge this gate as a single-element adequate set?
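
    (A quick brute-force check of the truth table as written, an illustrative Python sketch, shows this gate always returns its first input, i.e. it is the projection onto A:)

        from itertools import product

        def switch(a: bool, b: bool) -> bool:
            """The truth table above: FF -> F, FT -> F, TF -> T, TT -> T."""
            return (a and not b) or (a and b)   # algebraically, this is just `a`

        for a, b in product([False, True], repeat=2):
            assert switch(a, b) == a   # the gate is the projection onto A

        # A projection composed with projections only ever returns one of the
        # original inputs, so this table cannot simulate NOT; the table as
        # written may differ from the gate the question has in mind.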

  • @OhInMyHouse
    @OhInMyHouse 8 months ago

    I didn't understand the "trick". Could anyone explain it to me?

    • @Mutual_Information
      @Mutual_Information  8 months ago +2

      Sorry about that. That's on me. I probably shouldn't have used the word trick. What's surprising is that we can get an arbitrarily small error probability at a reasonably fast rate. The confusing thing is.. how can it be *arbitrarily small*? How is an error prob of 10^-1000 achievable if you only need to double the message length? Well.. such probabilities are only realizable over extremely long messages.. in other words, you have to send very long messages.. and those very long messages are encoded as even longer messages.. (only about 2x longer in the vid). And that's how the rate stays reasonable.

    • @OhInMyHouse
      @OhInMyHouse 8 months ago +2

      So let me see if I understand what you say. The fact that the message length needs to be big enough means that I can group it into L chunks of K bits and then add redundancy to these chunks so that each coded chunk has a length of N bits. Then the addition of that redundancy at the chunk level has the effect of increasing the length of the original message by a factor of 2 (for example), reducing the rate by one half. So the astonishing thing is that I can achieve an arbitrarily small Pb even by adding a small redundancy relative to the total message (instead of L×K bits I will send L×N, such that L×N = 2(L×K)), and in the case of small message lengths I can't use this because the effect of the redundancy will be at the bit level instead of the chunk level. Is that the "trick"?

    • @dekippiesip
      @dekippiesip 8 months ago +1

      @Mutual_Information so essentially you have 2 parameters and not 1: the wanted probability and the message length.
      Could you say that for any given message length N, the best encoding tends to some other value smaller than C? Maybe even 0 for any message of at most N? But it only tends to C as the length of the message tends to infinity?
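
      A sketch of the contrast with the video's R3 strategy (repetition generalized to N repeats over a BSC with flip probability f): the repetition code's bit error does vanish, but only because the rate 1/N vanishes with it, which is exactly what Shannon's theorem avoids:

        from math import comb

        def repetition_error(n: int, f: float) -> float:
            """P(majority vote fails) for an n-fold repetition code over a BSC(f)."""
            return sum(comb(n, k) * f**k * (1 - f)**(n - k)
                       for k in range(n // 2 + 1, n + 1))

        f = 0.1
        for n in (1, 3, 5, 11, 61):
            print(f"R{n}: rate = 1/{n}, bit error ~ {repetition_error(n, f):.2e}")
        # R3 gives ~ 2.80e-02; the error keeps falling with n, but only because
        # the rate 1/n collapses toward zero.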

  • @Avokadik13
    @Avokadik13 8 months ago +1

    The combination of manim graphics and real-life footage looks unusual to me

  • @grumpyoldman1257
    @grumpyoldman1257 8 months ago

    Great ... but can you fix the "... that has gad an ..."?

  • @mohammedbelgoumri
    @mohammedbelgoumri 9 months ago +4

    Error correcting codes?

    • @Mutual_Information
      @Mutual_Information  9 months ago +5

      Indeed, but not specific to any single coding strategy

  • @SchoolofAI
    @SchoolofAI 9 months ago +1

    I identify as 20-something. I am not 40+. That is just a datetime error.

  • @Daniel-Six
    @Daniel-Six 4 months ago +1

    What happened DJ? You quit on us? No new vids in four months! 😐

    • @Mutual_Information
      @Mutual_Information  4 months ago +1

      I'm working on it! It's a fat one. 30+ minutes. And made of solid gold.

    • @Daniel-Six
      @Daniel-Six 4 months ago

      @@Mutual_Information Coolio!

  • @timothyallen6411
    @timothyallen6411 8 months ago +1

    I heard it. You must have had a noisy channel. Try again, and good luck.

  • @sundareshvenugopal6575
    @sundareshvenugopal6575 14 days ago

    Claude Shannon's theory is not in the least bit true. It is at best a very supercilious view of compression coding. I have come up with scores of methods where 2^n bits of information can be losslessly coded in O(n) bits (order of n bits). So c*n bits of data, where c is a very, very small constant, can contain at least 2^n bits of information coded losslessly. Not only is massive lossless data compression a reality, but numbers terabytes in size can be represented and manipulated within a few bytes, with all mathematical operations performed within those few bytes.

    • @ddichny
      @ddichny 13 days ago

      By the pigeonhole principle, it is trivial to prove that no lossless compression algorithm can exist which drastically compresses every possible input -- any compression algorithm must necessarily map some inputs to a larger "compressed" output.
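
      The counting is easy to check (a small Python sketch): there are 2^n distinct n-bit inputs but only 2^n − 1 binary strings of any shorter length, so no encoder can losslessly shrink every input:

        n = 20
        inputs = 2 ** n                             # distinct n-bit messages
        shorter = sum(2 ** k for k in range(n))     # binary strings of length < n
        print(inputs, shorter)                      # 1048576 vs 1048575
        assert shorter == inputs - 1                # always one pigeonhole short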

  • @garcipat
    @garcipat 8 months ago +1

    Sorry, but I learned nothing from this video... Waste of time

  • @pappu10714
    @pappu10714 9 months ago

    Thank you! Love from India.

  • @supercompooper
    @supercompooper 8 months ago +1

    In an Italian accent 🤌 "Galois o' here buddy"

  • @josephkanowitz6875
    @josephkanowitz6875 8 months ago

    ב''ה ("with God's help"), 🪑