The most versatile compression technique? (Burrows-Wheeler Transform)

  • Published: 21 Sep 2024

Comments • 142

  • @graphicode_
    @graphicode_  6 days ago +30

    Hi! Woke up to some comments and thought I should address some very important points here. Thank you for all the discourse! It makes me happy that people are raising concerns/voicing opinions here. More answers to FAQ/important things can be found in the specific video section here: graphicode.simple.ink/
    1. BWT, in itself, is not a compression algorithm. It is more of a transformation (as the name implies) and has to be used in combination with other techniques (such as the RLE discussed in the video) for it to be useful - a small sketch of this follows below.
    2. As with all compression techniques, be it BWT+RLE, Lempel-Ziv, arithmetic coding, Huffman coding, etc., there is a limit to compression. If there weren't, we would be compressing everything into nothingness, which just isn't possible. I see the inaccuracy in the title; it comes with this caveat, and with point (1) as well. I originally wanted to pose "always compressible" as a question, which is why the thumbnail re-emphasises that, but I see why this is an issue. Not sure of a better way to phrase things as of now though...
    Edit: Title has been changed after thinking for some time on how to phrase it
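
    To make point (1) concrete, here is a minimal Python sketch of the transform followed by the naive run-length step from the video (the function names and the "$" sentinel are just for illustration; real tools such as bzip2 add further stages):

      def bwt(s):
          # Append the sentinel, build every rotation, sort them, take the last column.
          s = s + "$"
          rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
          return "".join(rot[-1] for rot in rotations)

      def rle(s):
          # Naive run-length encoding: <count><char> for every run.
          out, i = [], 0
          while i < len(s):
              j = i
              while j < len(s) and s[j] == s[i]:
                  j += 1
              out.append(str(j - i) + s[i])
              i = j
          return "".join(out)

      print(bwt("BANANA"))       # ANNB$AA
      print(rle(bwt("BANANA")))  # 1A2N1B1$2A - longer than "BANANA$", the video's counterexample

    The BANANA case is exactly the limit described in point (2): on short or unrepetitive input the combined result can grow.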

    • @BrandonDyer64
      @BrandonDyer64 6 days ago +4

      The title and thumbnail are still clickbait.

    • @TheWebgecko
      @TheWebgecko 5 days ago

      How to watch that video? It doesn’t seem to work on mobile

  • @warmCabin
    @warmCabin 6 days ago +150

    A few years ago, a teammate's inept implementation of this very algorithm attempted to load 8 copies of an entire Linux distro into my RAM and hardlocked my computer.

  • @MasterHigure
    @MasterHigure 7 days ago +76

    Looks cool, but it still seems like a total coincidence that the final column is more susceptible to run-length encoding. With a few more examples I'm sure it will switch to looking like magic instead.

    • @BridgeBum
      @BridgeBum 6 days ago

      I can see probabilistic arguments as to why it is more likely to get grouping (as we remove the same letters from the group, what remains will have relatively more repetition of other elements) but that's hardly proof. I'd be interested in seeing one as well.

    • @furbyfubar
      @furbyfubar 6 days ago +26

      Yeah, the video glossed over *why* it's more likely to have repeated characters after the transform. So much so that I hit up Wikipedia. It works because, in natural languages, the letter that comes before a specific letter (or short context) is more likely to be the same. The letter-shifting moves the first letter of each letter-pair into the rightmost column and sorts alphabetically on the second letter of each pair.
      The video title being *slightly* clickbaity doesn't help, as this compression algorithm *doesn't* always work: it only works on text where we expect some letters to be more common after specific letters (so it wouldn't work on, say, ciphertext), and it requires a long enough text for it to save characters. I suspect the video title means "always works to decompress", but that should be true of any lossless compression. Of course, no compression algorithm *can* always make the data smaller; if such an algorithm existed we could feed it its own output and somehow infinitely compress data.
      Here's Wikipedia's explanation of why the Burrows-Wheeler transform works, if what I wrote above didn't make enough sense:
      To understand why this creates more-easily-compressible data, consider transforming a long English text frequently containing the word "the". Sorting the rotations of this text will group rotations starting with "he " together, and the last character of that rotation (which is also the character before the "he ") will usually be "t", so the result of the transform would contain a number of "t" characters along with the perhaps less-common exceptions (such as if it contains "ache ") mixed in. So it can be seen that the success of this transform depends upon one value having a high probability of occurring before a sequence, so that in general it needs fairly long samples (a few kilobytes at least) of appropriate data (such as text).
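
      A quick way to see that grouping, using the "the" example from the quoted passage (a rough Python sketch; the sample sentence is made up purely for illustration):

        text = "the cat and the dog share the couch while the fish naps$"
        rotations = sorted(text[i:] + text[:i] for i in range(len(text)))

        # For every rotation that starts with "he ", its last character is the character
        # that comes just before that "he " in the original text.
        print([rot[-1] for rot in rotations if rot.startswith("he ")])
        # ['t', 't', 't', 't'] - identical and adjacent in the last column, so they form a run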

    • @MasterHigure
      @MasterHigure 6 days ago +1

      @@furbyfubar This begs the question of whether it could be advantageous to reverse the string before encoding. What's more common? "The particular phrase XYZ is most often preceded by W", which this algorithm seems to like, or "The particular phrase WXY is most often followed by Z", which would favor reversal.

    • @furbyfubar
      @furbyfubar 6 days ago +6

      @@MasterHigure
      The algorithm only cares about *pairs* of two adjacent characters. Not sure if this means the input text order doesn't matter because it becomes symmetrical, or if there might be a difference between how often "a character can predict the next character" and how often "a character can predict the preceding character"?
      OK, so you *might* have just nerd sniped me into trying to Burrows-Wheeler transform Hamlet and Hamlet in reverse using random webpages....
      The online tools I found were running the transformation server side, so they either didn't respond to long texts or they enforced a character limit. So I ended up testing it on only the first 1000 characters of Hamlet instead, plus the same 1000 characters but backwards. After the transforms I then replaced all instances of repeated characters in the output with two characters. (Probably not fully realistic for compression, but good enough for a test.)
      The result was that the compressed original text input was a single character shorter than the compressed text with the input backwards. So it seems like it at least *can* matter, but probably doesn't matter enough to be worth much further research.
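
      A rough Python sketch of that experiment, for anyone who wants to rerun it locally on a longer text (collapsing every run to at most two characters as a stand-in for what RLE would save; the sample placeholder is meant to be replaced with real input):

        import re

        def bwt(s):
            s = s + "\x00"  # sentinel, assumed not to occur in the text
            rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
            return "".join(rot[-1] for rot in rotations)

        def collapse_runs(s):
            # Replace every run of 3 or more identical characters with just two of them.
            return re.sub(r"(.)\1{2,}", r"\1\1", s, flags=re.DOTALL)

        sample = "paste the first 1000 characters of Hamlet (or any other text) here"
        print(len(collapse_runs(bwt(sample))),        # forward
              len(collapse_runs(bwt(sample[::-1]))))  # reversed input, for comparison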

    • @emilyyyylime-
      @emilyyyylime- 5 days ago +3

      @@furbyfubar The algorithm does consider more than two characters, given that there is enough input data for patterns to arise. This is because the lexicographic sort considers not just the first character of each rotation but all characters in order. Some back-of-the-napkin maths tells me that out of 26³ = 17576 three-letter combinations, if just 10% appear commonly in English, your sample size is still about half as big as it should be to observe the most minimal of effects. I don't see any reason to be beholden to server-side 1000-character limits when you could just run a barebones script locally, even using your own browser. I think I understand peer reviews now.

  • @dinhero21
    @dinhero21 6 days ago +61

    Why does it "always work"?
    There even was a counter-example in the video where BANANA$ (7) got transformed (BWT + RLE) into 1A2N1B$2A (9)
    Also, why does lexicographically sorting all rotations of a string appended with a sentinel character and taking the last character of every rotation make it better at run-length encoding?

    • @AK-vx4dy
      @AK-vx4dy 6 days ago

      @@dinhero21 I think "always works" was about it being reversible while boosting repetitions at the same time (not always ideally, but still)

    • @minamagdy4126
      @minamagdy4126 6 days ago +2

      It is impossible to make a compression algorithm that is guaranteed to output a shorter encoding than the original while remaining 1-to-1. What this manages to do is significantly improve on naive run-length encoding by making it far more likely to have duplicate adjacent letters, without sacrificing injectivity.

    • @graphicode_
      @graphicode_  6 days ago +11

      Hellos! Yep the "always work?" was meant as a question in the title, and not a direct statement. In the video, the BANANA example is intentionally brought as a counterexample to show that it does fail. If I recall in the script I tried to mention that, but I think it could be emphasised more clearly that EVEN a compression algorithm has its limits.
      And as with any compression algorithm, there is always a compression limit :) Just added a pinned comment talking about this too.
      Regarding your second comment, do look at the video block here graphicode.simple.ink/ as I added an explanation there to help answer that. Thank you!

    • @donno2048
      @donno2048 6 days ago +7

      Meant to comment that too, it's literally impossible to create a compression algorithm that shrinks every given input

    • @idle-hands
      @idle-hands 5 days ago +3

      @@donno2048 lossy compression

  • @imabstrong
    @imabstrong 8 days ago +33

    High quality stuff! Keep up the good work

  • @muskyoxes
    @muskyoxes 5 days ago +28

    The ultimate internet compression algorithm is to just decide not to have any ads or tracking. Boom - you've reduced the bits over the wire by 99%

  • @clehaxze
    @clehaxze 7 days ago +28

    Just me or the BGM sounds like Baba Is You?

  • @GameJam230
    @GameJam230 5 days ago +7

    I mean, it can't ALWAYS work; that would violate the pigeonhole principle.

    • @Matyanson
      @Matyanson 3 days ago

      the pigeonhole principle does not disallow compression whose output is always smaller than or exactly equal to the input in size, right?
      Now I wonder if such an algorithm exists

    • @Matyanson
      @Matyanson 3 days ago

      Maybe I came up with a "trivial" one?!
      So every string has an end character "$" at the end, where the data ends.
      Given an input x of length n, follow the instructions:
      create an empty list of repetitions.
      iterate over every character ch:
      1. scan the following characters and count the repetitions of ch. The count is r.
      2. create a tuple of the current ch index and r: [index, r]
      3. if sizeof(characters that would be removed) > sizeof(tuple):
      add the tuple to the list
      remove all but one of the repeating characters from the input
      append the list to the end of the input.
      Examples:
      "aaaaaaaaah$" -> "ah$09"
      "aaaah$" -> "ah$04"
      "aah$" -> "ah$02" is longer -> "aah$"
      "banana$" -> "banana$"
      Decoding "ah$09":
      1. find the end of the data "$"
      2. for every pair of numbers after it:
      a. look at the character at that index (index 0: "a")
      b. insert that character repetitions-1 times there (insert "a" 9-1 times at index 0)
      3. remove the data after the "$" symbol
      -> "aaaaaaaaah$"

    • @Matyanson
      @Matyanson 2 days ago

      Of course, you could improve the algorithm, for example by having a list of number triplets:
      the tuple is [index of character, repetition window, number of repetitions].
      So, for example, a repetition with a window of 2 would be "ananananan"
      "banana$" -> "bana$122" (too long) -> "banana$"
      "bananana$" -> "bana$123"
      window of 12:
      "best of the best of the best of the best!$" -> "best of the best!$0 12 3" (there would not be any spaces needed after $)

    • @Matyanson
      @Matyanson 2 days ago

      Ok, I overlooked that I can't use the character that marks the end $ like this. The end of the data always needs to be marked so the program knows when to stop reading data. Some other character would need to be used in the middle that is not used otherwise... which is a problem.
      What if the string is structured like this:
      The program knows how long the data is so it knows when to stop. Now my idea is:
      If the algorithm decides to include the tuples, add a header before the actual string data like so:
      The symbol would be the maximum possible value of . This means the algorithm is unable to encode a string of maximum . But is able to encode any other value with the output size smaller or equal to the source.

    • @GameJam230
      @GameJam230 2 days ago

      @@Matyanson While it's true that the pigeonhole principle doesn't prevent COMPRESSING the data, decreasing the number of bits you represent the data with can lead to conflicts where two files compress to the same result. Either that must occur for two or more pieces of input data, OR some input data cannot be compressed, and the latter means the algorithm doesn't always work.
      But in the event that a conflict occurs, you need more than one DECOMPRESSION function to return the data to its original state, and if you don't HAVE more than one decompression algorithm, then some conflicting compressed data cannot be returned to its original state. Compressed data that can't be DECOMPRESSED is not useful - you may as well just be passing the data through a hashing function instead of compressing it. So I would say it's far more likely that the second case I mentioned is true: some input data cannot be reduced in size, and therefore the algorithm doesn't "always" work.

  • @Singularity606
    @Singularity606 22 hours ago

    Never change out the cat. He's perfect.

  • @isle_of_violets
    @isle_of_violets 1 day ago

    your animation style is amazing ♥️

  • @_notch
    @_notch 6 days ago +15

    There is no such thing as a compression algorithm that always works. The title is clickbait.
    Edit:
    Proof: Take some data, compress it, then compress the output again. For the result to decompress back to the compressed data (and not to the original data), the encoding of that input cannot be identical to the input itself. If the newly encoded data is smaller, repeat the process with it. The sizes cannot keep shrinking forever, so eventually you will find a counterexample where the output is not smaller than the input.

    • @graphicode_
      @graphicode_  6 days ago +4

      With any compression algorithm, there is indeed a limit to compression - be it AC, Lempel-Ziv or Huffman codes. I like your short and intuitive proof for the others watching, and I just woke up to many comments, so I'll be addressing the common caveats/things to note! Appreciate your comment :D

    • @sdjhgfkshfswdfhskljh3360
      @sdjhgfkshfswdfhskljh3360 5 days ago +1

      I think the explanation can be simplified.
      If you define "always works" as producing strictly smaller output, then iteratively applying the "always working" algorithm would eventually give zero-size and negative-size results, which makes no sense.

  • @xelxen
    @xelxen 7 days ago +6

    Great video! I can see your bright future, keep up the good work :D

  • @roostydoo2
    @roostydoo2 7 days ago +4

    This is a great video for such a small channel, subbed!

  • @jaceg810
    @jaceg810 6 days ago +2

    at 6:14, the bottom most $ is incorrectly written as s. Unless there is a good reason for this?

    • @graphicode_
      @graphicode_  6 days ago +1

      Good catch! Nice eye - yea it is a mistake haha

  • @YTomS
    @YTomS 6 days ago +1

    Awesome video, nice job 🙂.

    • @graphicode_
      @graphicode_  6 days ago

      Thank you for that! Glad you popped by :)

  • @DanielLenrd
    @DanielLenrd 6 days ago +4

    "always works"?
    if that was true, you could repeat this algorithm until you get down to 1 character

    • @LiEnby
      @LiEnby 6 days ago +2

      lol god compression a single bit

  • @jakeaustria5445
    @jakeaustria5445 6 days ago

    Thank You

  • @004307ec
    @004307ec 5 days ago

    ❤ thank you.

  • @coolguyflex
    @coolguyflex 6 days ago +1

    What's with the Baba is you music?

  • @panmelnyk
    @panmelnyk 7 days ago +1

    nice vid bud. I wish you good luck getting some more eyes on your content, you deserve it frnc

  • @edhyjoxenbyl1409
    @edhyjoxenbyl1409 7 days ago +6

    BABA music!!

  • @Asteria-kc9op
    @Asteria-kc9op 1 day ago

    For anyone curious about the music used in the beginning - it's the Baba Is You soundtrack.

    • @fractalinfinity
      @fractalinfinity 34 minutes ago

      Yes, and how do they feel about their music being used here?
      Did they give permission?

  • @isle_of_violets
    @isle_of_violets 1 day ago

    i love the baba music

  • @AK-vx4dy
    @AK-vx4dy 7 days ago +5

    Nice explanation, but I'm concerned about the practical realisation: if we have a block of n chars we must build n+1 words and then sort them, which seems computationally and/or memory expensive.

    • @panmelnyk
      @panmelnyk 7 days ago +3

      always works != should be done.

    • @Zicrus
      @Zicrus 7 days ago +6

      It can be computed very efficiently (I believe in both linear memory and time). The explanation in this video is just the general idea, to help understand it intuitively. You don't actually have to compute the n*n grid.

    • @__Brandon__
      @__Brandon__ 7 days ago

      You can also just split the message into blocks of 1024 bytes and then encode the blocks separately

    • @AK-vx4dy
      @AK-vx4dy 6 days ago

      @@__Brandon__ Yes, but then you lose some compression level, depending on the data of course

    • @AK-vx4dy
      @AK-vx4dy 6 days ago

      @@Zicrus Yes, I understand the purpose of the video, but I wondered if the author knows more ;)
      My intuition says I don't have to, but to sort them without materialising them some other tricks will be needed.
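
      One such trick is to sort rotation *indices* with a comparison that reads characters modulo the length, so the n-by-n grid is never materialised. A rough Python sketch (still up to O(n log n) comparisons of up to n characters each, not the cleverer suffix-array constructions that production implementations use):

        from functools import cmp_to_key

        def bwt_last_column(s):
            s = s + "\x00"  # sentinel, assumed not to occur in s
            n = len(s)

            def compare(i, j):
                # Compare the rotations starting at i and j character by character,
                # without ever building the rotated strings.
                for k in range(n):
                    a, b = s[(i + k) % n], s[(j + k) % n]
                    if a != b:
                        return -1 if a < b else 1
                return 0

            order = sorted(range(n), key=cmp_to_key(compare))
            # The last character of the rotation starting at i is the character before position i.
            return "".join(s[(i - 1) % n] for i in order)

        print(bwt_last_column("BANANA"))  # ANNB, the sentinel, then AA - i.e. ANNB$AA with "\x00" as the sentinel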

  • @YEWCHENGYINMoe
    @YEWCHENGYINMoe 6 days ago +1

    nice

  • @alexeydmitrievich5970
    @alexeydmitrievich5970 6 days ago

    Great video. The cat looks very diligent

  • @Ykulvaarlck
    @Ykulvaarlck 6 days ago

    you should do a video about suffix arrays, ideally sa-is because i've yet to find a simple explanation of it

  • @djelilikejam
    @djelilikejam 3 days ago

    this music is familiar,,, [2 min later] OH WAIT ITS BABA !!!!!

  • @Hi-gf4ts
    @Hi-gf4ts 7 days ago +1

    I loved the video! I find compression techniques fascinating, and I had never heard of the Burrows-Wheeler Transform before. I made an implementation of it in JavaScript that compresses strings, but I ran into a problem. Even though it is consistently better than run-length compression on its own, it almost always seems to produce a string that is bigger than the original string. It only seems to produce a smaller output if the input is highly repetitive. Is this a known issue? Is there a workaround?

    • @ironfreak999
      @ironfreak999 7 days ago +1

      You could run into multiple issues here:
      1. If you store characters A-Z and $, you won't fill up a byte per char (which is probably the default in JS), so you're effectively wasting about 3 bits per character. This can be somewhat mitigated by dense bit-packing (or another encoding) - see the packing sketch after this comment.
      2. Suppose you don't care about those densely packed bytes and just count the characters. Then only certain repetitive patterns will be compressed by this algorithm, as you have observed.
      This is due to the nature of RLE adding a length to each repeated chunk. If there are fewer repeated chunks, you will have more length tokens in your compressed string.
      3. If you try to compress random strings you almost always end up with more data (look up "incompressible string" on Wikipedia, if you want).
      4. There is actually no compression algorithm that always yields less data. Because of this, every compression algorithm will most likely fail for random data and heavily favours repetitive data. This can be shown by counting the unique strings per length. Suppose your string is n chars (bytes) long, so there are 256^n possible strings of that length. If every string were compressed to at most n-1 chars, there must be some pairs of strings which look the same when compressed, since there are fewer than 256^n strings of length at most n-1. These collisions make it impossible to recover the original string.
      So yes, this is a known issue with these kinds of compressions.
      A workaround would be to use a more suitable compression algorithm for the kind of data you want to compress. Usually there are algorithms for each use case, for example images - jpg / videos - h264 / text - DEFLATE.
      If you want to dig more into text compression, Huffman encoding might be another thing to look into.
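
      For point 1, a rough sketch of what dense bit-packing means for a 27-symbol alphabet (A-Z plus "$"), which fits in 5 bits per character instead of a full byte; a real format would also have to store the length so it can be unpacked again:

        ALPHABET = "$ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # 27 symbols, and 2**5 = 32 >= 27

        def pack(text):
            # Concatenate 5-bit codes into one big integer, then serialise it to bytes.
            bits = 0
            for ch in text:
                bits = (bits << 5) | ALPHABET.index(ch)
            return bits.to_bytes((5 * len(text) + 7) // 8, "big")

        print(len("ANNB$AA"), "characters ->", len(pack("ANNB$AA")), "bytes")  # 7 -> 5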

    • @Pystro
      @Pystro 6 days ago +1

      The problem is that all the Z's in the first column (which are all in a single block at the bottom) match with _all_ the different letters that precede Z's in the last column. Which means that the end of the transformed string (assuming it _has_ Z's in the first place) has for example a higher proportion of vowels than usual (because consonants like Z pair with vowels more often than with other consonants). But the *order* of all of those letters that precede Z's is still more 'random' than we'd like.
      To give the two most extreme examples, the block of U's in the first column are where _all_ of the Q's will inevitably appear, but it won't _only_ be Q's. (And similarly for spaces being preceded by all dots in the text, but not exclusively by dots.) And on the other hand, capital letters are _always_ preceded by space characters; unless you spell any words in ALL CAPS. (Well, one of them will be preceded by the sentinel symbol, and maybe some by quotation marks or parentheses.)
      It's why bzip doesn't just do a BWT and then RLE.
      It first uses a *BWT* - in order to turn correlations between di-grams into _higher concentrations_ of letters in certain parts of the transformed string,
      then it uses the *"move-to-front" algorithm* - to turn concentrations of similar letters (and also just higher occurrence rates of certain letters like E's) into higher occurrence rates for symbols with low ID's,
      and then uses *entropy coding* - in order to turn the most frequent symbols (those with low ID's) into shorter bit codes.
      If you're playing around with it, I've always thought that using a custom alphabet (symbol order) might group the symbols even better. Even just one that is known ahead of time to sender and receiver could give significant improvements.
      If you use a custom alphabet, you'd most definitely want to use it for the sorting (especially one where lower-case vowels and lower-case consonants are in separate groups, and all capital letters appear in an uninterrupted block of the alphabet; you can probably do _something_ smart with the space character too).
      And for seeding the move-to-front algorithm in bzip you may want the same custom alphabet, the standard alphabet, or a second custom alphabet. (This one will mainly need to have the symbols in order of their frequency - or actually in the order in which they are most likely to appear in the BWT-transformed string.)
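
      A minimal sketch of the move-to-front step described above, seeded here with a plain a-z alphabet (bzip2's real implementation works on bytes and layers its own run handling on top):

        def move_to_front(text, alphabet="abcdefghijklmnopqrstuvwxyz"):
            # Emit each character's current position in the table, then move it to the front,
            # so recently seen characters get small numbers.
            table = list(alphabet)
            out = []
            for ch in text:
                i = table.index(ch)
                out.append(i)
                table.insert(0, table.pop(i))
            return out

        # Clustered input (the shape a BWT output tends to have) becomes mostly small numbers:
        print(move_to_front("nnnbaaaaa"))  # [13, 0, 0, 2, 2, 0, 0, 0, 0]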

  • @cc12yt
    @cc12yt 6 days ago

    Here before 100k views, the production quality is insane 😎

    • @cc12yt
      @cc12yt 6 days ago +1

      Also, less than 1k subs? Such an underrated channel

    • @graphicode_
      @graphicode_  6 days ago +1

      Thank you! Haha appreciate those kind words

  • @CatCollective----
    @CatCollective---- 6 days ago +3

    I love how you explain comprehensively and graphically, step by step, how the Burrows-Wheeler algorithm scrambles and unscrambles the data!
    I'm interested in how the encryption works, like how two private keys that are never shown to each other can be used to unlock the same locked box of encrypted data. Can you make a video on it?

    • @graphicode_
      @graphicode_  4 days ago

      Sure! Though I think this YouTuber called @SpanningTree actually has a very good video on it already :)

    • @CatCollective----
      @CatCollective---- 4 days ago

      @@graphicode_ True, understanding that the encryption calculates the keys twice privately is easy enough.
      Though I and some others (as shown in the Spanning Tree comments) have a hard time grasping the concept that the result of each private calculation is halfway to the whole and would end up at the same result. I mean, modulo seems like a destructive calculation that points nowhere in the exponent table.. wondering how doing A then B is equal to doing B then A. It's hard for me to explain the confusion myself.
      So I hope that you can fill in the gaps to clarify where most people get lost in the algorithm.

    • @graphicode_
      @graphicode_  3 days ago

      @@CatCollective---- hmm that is interesting! I'll take a look to see if I can make a video out of it

  • @sdjhgfkshfswdfhskljh3360
    @sdjhgfkshfswdfhskljh3360 5 days ago

    There is an easy way to see that no compression algorithm is perfect:
    Just try to compress a file filled with pure random noise.
    *Every* compression algorithm will give output larger than the input.
    If you are getting a smaller file, then your noise is not pure enough.

  • @xcoder1122
    @xcoder1122 6 days ago +3

    BWT is NOT a compression algorithm at all. The video explains that correctly, but the title claims it is; by itself BWT will not make data any smaller. And when combined with RLE and Huffman coding, as in bzip2, it will not always reduce data size.
    RLE needs one character to signal that an RLE sequence will follow (the indicator). The problem is that if every character can be a valid character of the data stream, the indicator must be escaped somehow when you really want that character and not an RLE sequence, and this escaping increases the data size. By using move-to-front encoding you can ensure that the indicator is always the least frequently used character, but if all characters are used about equally often in the data stream that buys you nothing. And if the data stream on top of that has no significant runs, not even after BWT, then the encoding overhead for the indicator will produce an even bigger data stream than your input stream. And as all characters are used about equally often, Huffman coding cannot reduce the size either but would only increase it further.
    So it's easy to construct a data stream that is not compressible with BWT at all, will even grow in size after RLE, and cannot be further reduced by Huffman coding either.
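
    To illustrate the indicator overhead, here is a toy Python run-length coder that reserves one byte value as the escape; literal occurrences of that value must themselves be escaped, so a run-free stream that uses every byte value can only grow (a sketch of the general idea, not bzip2's actual coding):

      ESC = 0xFF  # reserved indicator byte

      def rle_escape(data: bytes) -> bytes:
          out = bytearray()
          i = 0
          while i < len(data):
              j = i
              while j < len(data) and data[j] == data[i] and j - i < 255:
                  j += 1
              run = j - i
              if run >= 4:                       # worth encoding as ESC, count, value
                  out += bytes([ESC, run, data[i]])
              else:                              # copy literally, escaping the indicator byte
                  for _ in range(run):
                      out += bytes([ESC, 0]) if data[i] == ESC else bytes([data[i]])
              i = j
          return bytes(out)

      no_runs = bytes(range(256))                # every byte value once, no repetition at all
      print(len(no_runs), "->", len(rle_escape(no_runs)))  # 256 -> 257: the escape costs a byte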

    • @emurphy42
      @emurphy42 6 days ago

      IIRC it's impossible for any lossless algorithm to compress 100% of inputs, because there are 2^N possible inputs with N bits, but only 2^N - 1 possible outputs with at most N - 1 bits, so something won't be able to fit no matter how clever the mapping is. (And no sneaky "add the extra bits to the filename", because you need room to store the filename too.) Of course many algorithms do a good job on the vast majority of inputs, and lossy algorithms can compress further because it's okay for multiple inputs to map to the same output.
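
      The same counting argument in a couple of lines, for any block size n (here n = 8 bits):

        n = 8
        inputs = 2 ** n                            # distinct n-bit inputs: 256
        shorter = sum(2 ** k for k in range(n))    # all outputs of at most n-1 bits: 255
        print(inputs, ">", shorter)                # so at least two inputs would have to collide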

    • @xcoder1122
      @xcoder1122 6 days ago

      @@emurphy42 Many compression tools will at least not increase data sizes: they detect that the data is not compressible by their compression method and then just add the data uncompressed. That way the data size will at least not grow, but it will not shrink either. Some also implement more than one method and will try to switch to another method, which sometimes can help. But your calculation is correct: of course there will always be an input that cannot be mapped to a smaller output with any given set of methods.

    • @reddragonflyxx657
      @reddragonflyxx657 6 days ago

      @@xcoder1122 Even if a compression tool doesn't compress the data, it has to indicate that the data wasn't compressed, so the decompression tool doesn't interpret it as a compressed data stream.

    • @graphicode_
      @graphicode_  6 days ago

      Yep you are right, I guess BWT in itself is not very useful, it has to be coupled with other things. BWT is more of a... pre-processing/transformation technically, but I was struggling to find the right word to encapsulate "block-sorting compression/transform + RLE" into the title. I did feel quite iffy and expected comments like these to point it out, so I am happy that there are people who notice! I'll add a pinned comment explaining things in more detail
      As with compression, analyzing it from an info theory perspective would be extremely cool and insightful. I'm very glad you brought up HC in this too!

  • @theunknown4834
    @theunknown4834 6 days ago +3

    Random qn, are you Singaporean?

    • @graphicode_
      @graphicode_  6 days ago +3

      Yes haha it's the accent eh?

    • @theunknown4834
      @theunknown4834 5 days ago

      @@graphicode_ Too many times have I heard tuition recordings in Popular

  • @papycoima
    @papycoima 6 days ago +1

    I've seen channels with more subs and less quality! Great work. I found the explanation just a tiny bit unclear at times, but that's probably cuz it's late and I'm half asleep lol

    • @graphicode_
      @graphicode_  6 days ago

      Thanks! Any specific areas that were unclear? Would love the feedback so I can write a better script next time

    • @papycoima
      @papycoima 6 days ago

      @@graphicode_ Just to let you know, I'm someone who needs to know everything down to the atomic level while learning something new :p
      That said, the three main doubts I have are:
      - Why does lexicographical sorting give you an easily-compressible string in the last column?
      - In the reverse process, how does re-sorting to the left and re-filling the last column to the right give you the original grid?
      - How is the quicker version of the reverse process equivalent to the longer one? Specifically, the "jumping back to the i-th occurrence of the letter" part was a bit hard for me to make sense of (see the sketch below).
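
      A minimal Python sketch of that quicker reverse process, using the "i-th occurrence" idea (often called the last-to-first mapping) and assuming a unique "$" sentinel:

        def inverse_bwt(last):
            first = sorted(last)  # re-sorting the last column gives the first column

            def lf(row):
                # The k-th occurrence of a character in the last column marks the same text
                # position as the k-th occurrence of that character in the first column.
                ch = last[row]
                k = last[:row + 1].count(ch)
                seen = 0
                for r, c in enumerate(first):
                    if c == ch:
                        seen += 1
                        if seen == k:
                            return r

            out = []
            row = first.index("$")            # the row whose rotation starts with the sentinel
            for _ in range(len(last) - 1):
                out.append(last[row])         # the character just before the current position
                row = lf(row)
            return "".join(reversed(out)) + "$"

        print(inverse_bwt("ANNB$AA"))  # BANANA$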

  • @sdspivey
    @sdspivey 6 days ago +3

    Ok, now compress "abcdef".

    • @BridgeBum
      @BridgeBum 6 days ago

      Yup, RLE cannot work on all data blocks. That's often why real world implementations use multiple algorithms. Having said that, bytes can only have 256 possible values so longer data blocks can guarantee repetition. It's also why textual data tends to compress better than binary blocks, higher frequencies of individual characters.

    • @wanfuse
      @wanfuse 5 days ago +2

      f

    • @nisonatic
      @nisonatic 5 days ago +1

      @@wanfuse I had heard rumors of the Pay Respect algorithm, but never seen it.

    • @wanfuse
      @wanfuse 5 days ago

      110

    • @evnnxi
      @evnnxi 3 days ago +1

      Rotations of "abcdef&":
      abcdef&
      &abcdef
      f&abcde
      ef&abcd
      def&abc
      cdef&ab
      bcdef&a
      Sorted:
      &abcdef
      abcdef&
      bcdef&a
      cdef&ab
      def&abc
      ef&abcd
      f&abcde
      Last column: f&abcde
      RLE: 1f1&1a1b1c1d1e
      just dont compress this i guess.

  • @6XeNoN9
    @6XeNoN9 7 days ago

    really nice video !

  • @Khusyasy
    @Khusyasy 6 days ago

    ah yes baba soundtrack

  • @StefanReich
    @StefanReich 4 days ago +1

    "i-the" should read "i-th"
    Good work otherwise, nice explanation

    • @graphicode_
      @graphicode_  4 days ago

      Thanks for the catch :D and appreciate it

  • @coderized
    @coderized 5 days ago +1

    1A69N1B1$69A

  • @edems131
    @edems131 7 days ago +3

    I'm 491 sub, great video btw

  • @NakamuraSatou
    @NakamuraSatou 6 days ago

    This is so fun

  • @grahamcracker1234
    @grahamcracker1234 6 days ago

    commenting for engagement

  • @micmal2967
    @micmal2967 3 days ago

    There is no compression algorithm that always works. A compression algorithm that always worked would take all possible strings from length 1 to n and map each one to a new string such that the sum of the lengths of the mapped strings is less than the sum of the originals - which means that at least two strings must be mapped to the same output string

  • @Serizon_
    @Serizon_ 6 days ago +2

    baba is you song!!!!!!!!!!!!!!!!!!!!!!!

    • @luketurner314
      @luketurner314 6 days ago +1

      About to comment the same thing. It's a great game

  • @Jake28
    @Jake28 2 days ago

    "The compression algorithm that always works?". I'm writing this in my comment in case the title of the video is changed later.
    I'm assuming by 'always works', we mean that it always compresses things to be smaller than they currently are. I hold that this is a reasonable assumption to be made from the title of the video. Here's the issue:
    That's impossible.
    If a hypothetical compression algorithm that acted in that way existed, we would be able to compress any message into any arbitrary amount of space. We can't. So, the situation proposed in the title of the video is impossible.
    I noticed the pinned comment after I finished writing this, but I'll still post it anyway. I might as well be a part of the record of people pointing out the inaccuracy, especially considering that despite a correction being posted in the comments, the title remains unchanged. At least add an asterisk or something in there...
    Congrats on making a video geared towards computer science audiences while simultaneously making an error obvious enough that most of them would probably click the video out of spite? I guess that's one way to game the algorithm!

    • @Jake28
      @Jake28 1 day ago

      Oh nice they changed the title. Comment retracted...

    • @graphicode_
      @graphicode_  1 day ago

      Sorry for the delay! Took a while to think of a good substitute and was caught up with work... Anyhow, appreciate your thoughts, as with many others :)) All forms of opinions, good or bad, are important to me

  • @MatthewFearnley
    @MatthewFearnley 1 day ago

    Wow, the BWT is like nothing I've ever seen before. But struggling to understand what's going on in the reverse algorithm...

    • @graphicode_
      @graphicode_  1 day ago +1

      Which section is confusing? If you are interested - feel free to join the discord channel and perhaps we could discuss as a community too!

    • @MatthewFearnley
      @MatthewFearnley 1 hour ago

      @@graphicode_ Hey, thanks for your reply.
      I understand how we can get the first column from lexicographically sorting the given last column. But for me things get kind of hazy from there.
      I'm not hugely into Discord communities, but I appreciate the invitation. If you think it's going to be a better place to ask and explore, maybe I'll take a look.

  • @notwithouttext
    @notwithouttext 3 days ago

    wheelbarrow transform

  • @TheMasonX23
    @TheMasonX23 1 day ago

    "Evil Telecommunications company" is a bit redundant, isn't it?

  • @NeunEinser
    @NeunEinser 7 days ago

    Neat!

  • @logitech4873
    @logitech4873 7 days ago

    Here before 100K views :)

  • @Kurushimi1729
    @Kurushimi1729 6 days ago +3

    What's with the stupid title? A compression algorithm that always works is obviously impossible. Is this for clickbait?

  • @alexdefoc6919
    @alexdefoc6919 3 days ago

    Do the dab 😂❤

  • @StefaanHimpe
    @StefaanHimpe 5 days ago +1

    if it always works you can compress anything to 0 bits - good luck restoring :d. Not worth my time watching....

  • @Stztic
    @Stztic 3 days ago

    Cat, :3
