Seems I'm a couple of years late watching this. Never the less, very interesting stuff! :) Thanks for posting!
My vote is always for middle-out. I can't believe you Hooli guys can't get that figured out :-\
It's been years since I graduated as an engineer, but watching this made me realize I should've taken mathematics even more seriously.
Very interesting, good to see I'm not the only person interested in reducing overheads.
I totally agree. I don't watch most of the videos Google Developer uploads beacuse they're too many, but I watch only the ones I'm interested in.
There's a problem with the delta approach: latency. Sure, you might save a few kilobytes, but each additional request incurs latency. So whatever raw transfer-speed improvement you'd get (which isn't simply linear, because of packet sizes: saving 1 byte inside a packet isn't nearly as important as saving 1 byte that would overflow into a new packet) would almost certainly be eaten up by the latency of those requests, especially given the way consumer connections are throttled on the upstream. (Not that requests for these things would be large enough to run into throttling themselves, but the way the throttling is done matters, especially when the connection is shared, as in a family home.)
This video is really a gem
Really great video, learned a lot and am excited to try-out some of these methods.
Love it! Is there more? Can you please make a playlist for these educational videos?
Very great explanation! Thank you!
Could you please explain why the kiwis in the PNG with the extra two columns of pixels were not compressed? I would've expected the second kiwi to have fit within the 32 KB window, and perhaps part of the third kiwi as well.
Thank you very much.
GOD bless you.
"Minification" reminds me of the code obfuscation which companies used back in the day to distribute source code (back when the world was much more than Windows, OS X and Linux) while making it difficult for the user to read, and -- even before that -- the tokenization performed by the MS BASIC interpreter.
There is a mistake, in that the sorted list does not have the same cardinality as the source set. The correct result would be
[0,1,1,0,1,0,1,1,1,1,1,1]. He wanted an ideal set, [0,1,1,1,1,1,1,1,1,1], to demonstrate the compression. In a real-world example it would be fine to have multiple entries that are the same; those would lead to a value of zero in the delta output, as I showed.
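A minimal sketch of the delta encoding being discussed. The source list here ([0,1,2,2,3,3,4,5,6,7,8,9]) is my reconstruction from the corrected output above; duplicate entries show up as zeros in the delta stream, exactly as described:

```python
def delta_encode(sorted_values):
    """Store each entry as the difference from its predecessor.

    For a sorted list the deltas are small non-negative numbers
    (0 for repeated entries), which compress far better than the
    raw values.
    """
    deltas = []
    prev = 0
    for v in sorted_values:
        deltas.append(v - prev)
        prev = v
    return deltas

def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values
```

With that assumed input, `delta_encode([0,1,2,2,3,3,4,5,6,7,8,9])` yields `[0,1,1,0,1,0,1,1,1,1,1,1]`, matching the corrected result.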
So I guess this history lesson is the reason Chrome is being cautious and hiding brotli behind a flag in Canary, even though brotli was also invented at Google, while Firefox is going ahead and releasing it into the wild in the very release I'm downloading the update for now.
That said, assuming the proxy issues haven't disappeared (though they may be less of an issue now, years later), it does tell me that those bugs won't be as big a problem for bzip2 or brotli in HTTPS traffic, which is a growing share of traffic.
The encode times are a much larger part of latency than I would have thought.
The figure of 50 ms of encode time per 100 KB of data is non-negligible for files that rarely change, like CSS or JS. Serving them as pre-compressed files makes a lot of sense. For CMS frameworks we just need the tools to do it for us.
Is the given Amazon example based on the default gzip DeflateCompressionLevel (6)? And is this measured on an average server, or on a local machine?
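The pre-compression idea above can be sketched with Python's stdlib (the file name in the usage note is hypothetical). Since the encode cost is paid once at deploy time rather than on every request, you can afford the maximum compression level:

```python
import gzip

def precompress(path, level=9):
    """Write path + '.gz' once at deploy time.

    A server can then send the pre-compressed file with
    'Content-Encoding: gzip' instead of paying the encode
    cost on every request.
    """
    with open(path, "rb") as src:
        data = src.read()
    with open(path + ".gz", "wb") as dst:
        dst.write(gzip.compress(data, compresslevel=level))
```

For example, `precompress("bundle.min.js")` would leave a `bundle.min.js.gz` alongside the original for the server to pick up.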
The section around 27:00 is incorrectly stated and misleading -- "GZIP is inflating the smaller file" is /wrong/. Those red numbers are not an indictment of GZIP at all, and do not indicate that GZIP is harmful or "scary". They're an indictment of the genetic-algorithm-based minifier tools, which make the data inherently /less compressible/. In other words, they make GZIP a little bit less helpful, NOT harmful. GZIP is no less of a "silver bullet" by this argument. GA minification is.
I believe this is because people like the idea of being an expert in one thing, but only a few are eager and interested in discovering more about what is said to be important?
📺💬 I have been working with Unreal Engine for 3 years, and the problem we found is that game objects are full in-game sizes; when we move pixels around, that is a lot of data.
🥺💬 I understand; in games too, GZip is involved in compressing pixels, and some functions allow working with the data while it is still in compressed format.
📺💬 Huffman codes assign bit codes ordered by how often each word appears in the context: more frequently occurring data is represented with fewer bits, or given priority, when encoding characters or words.
📺💬 Yui, you should correct the comment first. 🥺💬 True: with Huffman encoding, common words like "a", "an", "the" get the priority bits, but longest-match searching is about finding how many times sequences appear in the context. Huffman is good for reading words, because words don't repeat that much once the table is built, but what about logging, number locations, and their bit representations⁉
🥺💬 I also read about WinZip, which is why DAT-format compression is over 70 percent for some text formats; it supports both the longest-match search and Huffman.
🧸💬 An advantage of GZIP is that you can still work with data while it is in compressed format.
📺💬 Delta compression is where we send patches to update the data on the client, while the client continues working from the original file.
🧸💬 There are applications beyond updating a text file of JSON parameter values: visualization of objects, screen-transform data, mouse pointers, and keyboard input and rules.
🐑💬 It is a good application, but also a security concern. It was invented many years back, but an attacker can replay this transmission by reading the encrypted packets; given today's real-time constraints, long encryption algorithms are helpful with this method.
📺💬 Horizontal delta compression. 🧸💬 By transferring data grouped by field and priority, you can reduce the size of communication packages, because the data replies required for processing arrive at the same time.
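The Huffman idea this thread is circling can be sketched as: count symbol frequencies, then repeatedly merge the two least frequent subtrees, so that frequent symbols end up with shorter codes. A minimal version (not how gzip's DEFLATE builds its tables, just the core principle):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table: frequent symbols get shorter codes."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        # Prefix one subtree's codes with 0 and the other's with 1.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, i, merged))
        i += 1
    return heap[0][2]
```

On `"aaaabbc"`, the code for `'a'` comes out shorter than the code for `'c'`, and no code is a prefix of another, so the bit stream decodes unambiguously.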
It's an official Google channel; I guess a lot of people subscribed over the years, but not many watch every video?
Very great explanation, sir!
This video is amazing.
Great talk!
Doesn't LZMA offer the best web solution, since it has the smallest storage and network footprint and the fastest decoding? I ask because encoding time seems nowhere near as important, since it is done once before deployment.
+snetsjs gzip is about 2x faster at decompression than any LZMA, although LZMA compresses significantly better and so gets the content to you (for decompression) quicker, so there's an awkward tradeoff. Basically, the faster the network, the more attractive gzip is.
+John Jones Oh, I just saw the bit where he provides the numbers, which is why you thought LZMA is faster than gzip. Those stats are garbage. LZMA is way better at compression than gzip, but takes much longer to compress and a little bit longer to decompress. The LPAQ and bzip stats look reasonable, so it's really only LZMA that looks weird to me. It's all about the data you're trying to compress at the end of the day, but these stats are weird.
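The tradeoff is easy to probe with Python's stdlib. This is just a sketch; actual sizes and timings depend entirely on the data, which is the point of the comment above:

```python
import gzip
import lzma

def compare(data):
    """Compress the same input with gzip and LZMA at their highest
    levels, verify both round-trip correctly, and return the two
    compressed sizes for comparison."""
    g = gzip.compress(data, compresslevel=9)
    x = lzma.compress(data, preset=9)
    assert gzip.decompress(g) == data
    assert lzma.decompress(x) == data
    return len(g), len(x)
```

Wrapping the `compress`/`decompress` calls in a timer (e.g. `time.perf_counter()`) on your own payloads is the honest way to settle the speed question for your traffic, rather than trusting a single slide of numbers.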
The idea of making your API results complete hell to deal with to save 100 bytes... yeah, seems like a bad route.
Wow, how about a 90% compression system that is math-based rather than Huffman-based? What effect would that have on the net... video streaming, etc.
From 25:00: how do you sort the combination of digits, since 3 & 2 are each listed twice?
24:50 there are two 2's and two 3's
can we get this in 1080 or higher?
OWWWWwwwww MY HEAD~ My head.... I need ibuprofen & compression.
I was hoping to hear about sdch.
About this newfangled WebP format: yeah, not too great, sounded too good to be true; Wikipedia cites researchers describing much blurrier results than with JPEG, and no significant reduction in size in memory and storage.
Drinking game. Take a drink every time he says "Fantastic".
Good luck.
5:37 - as of 2017, Firefox, Safari and Edge still don't support WebP
11:50 - large sizes also plague native JavaScript. I was working on a small web app - a shop with custom T-shirts - and it got to about 1.1 MB of JavaScript bundle (Browserify), and 700 kB after minification. Most of the size was taken up by the Firebase and React libraries.
Is this solved in HTTP/2?
playing 0:10 over and over to understand how he got such speed
I use chrome everyday, and here is the developer.
Me ( aka super sexy bald guy) trolololol
What about converting the whole data file into a gigantic expandable numerical array, then summing all the values into a gigantic floating-point number, and storing that number in a new file? Or would that not work?
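A quick counterexample shows why it can't work: summing throws away information, so many different files collapse to the same number, and no decoder could ever tell them apart (the pigeonhole principle). A sketch of the proposed scheme:

```python
def sum_bytes(data):
    """The proposed 'compression': reduce a file to the sum of its bytes.

    This mapping is not invertible: distinct inputs can produce the
    same sum, so the original file cannot be reconstructed from it.
    """
    return float(sum(data))

# b"ab" and b"ba" are different files but have the same sum,
# so the stored number alone cannot recover either one.
```

Any lossless compressor must be a reversible mapping; a single number per file simply doesn't carry enough bits to distinguish all possible inputs. (On top of that, a float only holds about 15-16 significant decimal digits, so for large files even the sum itself would lose precision.)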
There is not that much of a win with anything other than gzip. But so much blah blah.
From 25:00
THE BALD GUY FROM BRAZZERS, HAHAHAHA! COOL, I MADE A JOKE :)
I wonder: you've got 400k subscribers but only around 100-1000 views per video. Why is that?
Six down votes? Why?
+Fernando Basso Until now, 12 people came to watch cats...but got bombarded with knowledge
Thanks GoogleDevelopers.
Maybe because of the sheer volume of videos uploaded... people selectively view videos.
I think it's good stuff.
Sounds like it's the middleboxes' problem, not yours. If the browser and server determine what they can use, then all is good; if there is a middlebox in the way, then let it get the complaints.
Also, if I may add, most of the slow page-loading times I see are not due to missing a 2-5% saving in content size, but rather to poor developers using all these frameworks, causing so much overhead, and including 3rd-party stuff (Java, CSS, etc.) from 3rd-party sites. It's horrible.
Great info here. But I cringe every time I hear gif mispronounced. 'Choosy developers choose gif.'
+Ryan Patterson Jraphics interchange format?
+Ryan Patterson Choosy developers choose more than GIF and they also don't pronounce "Graphics" with a "J".... JIF is like a bad dream
+Ryan Patterson It will always be Jif to me.
Yeah, and "jizzip" :P most obviously from JiNU ZIP :P
If all Google developers are that competent, that would explain the constant degradation of YouTube and their other websites :q
That's how the guy who created the format pronounces it... "J"if. So that is the correct way I guess.
Thank you for not pronouncing "gif" like "jif."
gif and Google or jif and Joogle? ;-)
If it were a human, it would be able to drink and drive. lol
18:10
Remove bzip support because of broken proxy software? And you call this idea "smart"? Are you kidding??
399 thousand people wanna be programmers
What a terrible old geezer. I knew it 😠