1:24 that is the best freehand sine wave I've ever seen.
Cosine*
It's exactly the same curve, just shifted to the left by π/2 .....
Look again, he bottomed out the curve well past pi radians. That is a blatant error.
@davidjames1684 lol you would hate to see what I can achieve freehand
David James Oh dear, that will not do. Fire up the guillotine and prep my guillotine dress.
18 years after university, I finally understand what my prof tried to explain....
But it's the best way to get in touch with professionals and get job offers in the field as a junior
With luck and understanding other parts well enough.
doubt you need that to graduate (at least as bachelor) ... . took disney years to figure out how to handle fur of pets :P
Il vaut mieux tard que jamais
Better late than never
@カラスKarasu It is? How so?
I love it when I procrastinate and notice "hey, I need the stuff this guy's explaining in my exam!"
double win
Was exactly my thoughts just before i saw this comment :D
This young gentleman uses paper and pen to explain something so well, much better than many others using fancy cartoons and movies. Thanks!
For anyone interested: this is roughly how the Fourier transform on images works. The three main differences are that you don't do it on 8x8 blocks but on the image as a whole, you consider 'waves' at lots of different angles and not just vertical and horizontal ones, and you also add in sine waves and not just cosine waves.
There are a few steps that he skipped.
First, as many of you might realize, images aren't all going to divide into those nice blocks of 8, so you need to pad the edges with a few pixels most of the time.
Second, he never actually talked about how the DCT is mathematically performed. Basically it's a matrix multiplication between your shifted values and a DCT matrix whose entries are samples of a bunch of cosine waves.
Third, I'm not sure if this is included in their Huffman video or not, but the values are actually stored in 1's complement, which is interesting because they completely ignore 0's. In the coding, there's basically a skip code which can mean either skip every remaining coefficient or skip a run of several.
Fourth, the DC value (top left) isn't just stored separately, it also needs to be encoded in a separate way. Because the values are typically so much larger than the others, you don't actually store the DC value itself, but the difference from the previous block's value. For instance, if the first value is 84 and the second is 85, you store 84 for the first block, then 1 for the second.
This is by far the best video I've ever seen for explaining JPEG, and all of the above isn't really necessary for anyone just curious about how JPEG works, but it's still cool stuff to know imo.
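The fourth point, storing DC differences instead of DC values, can be sketched in a few lines of Python. This is only an illustration of the idea using the 84/85 example from the comment: the function names `dc_differences` and `dc_restore` are made up here, and starting the predictor at 0 for the first block is an assumption of this sketch.

```python
def dc_differences(dc_values):
    """Replace each DC value with its difference from the previous block's DC."""
    diffs = []
    previous = 0  # assumed starting predictor for the first block
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dc_restore(diffs):
    """Undo dc_differences by keeping a running sum."""
    values = []
    previous = 0
    for d in diffs:
        previous += d
        values.append(previous)
    return values


dc = [84, 85, 83, 83]          # made-up DC values of four neighbouring blocks
encoded = dc_differences(dc)
print(encoded)                 # [84, 1, -2, 0]
print(dc_restore(encoded))     # [84, 85, 83, 83]
```

Neighbouring blocks usually have similar average brightness, so the differences tend to be small numbers, and small numbers are cheaper to entropy-code.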
the first is at 14:45
hey man, can you answer me, I need your help
Explains why JPEG lossless cropping never quite matches the original boundaries unless they hit edges of the frame.
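A rough sketch of the padding step mentioned above, in plain Python. The names `padded_size` and `pad_rows` are hypothetical, and filling the extra pixels by repeating the last column and row is just one plausible choice assumed here; an encoder is free to fill them differently since they are cropped away on decode.

```python
def padded_size(width, height, block=8):
    """Round each dimension up to the next multiple of the block size."""
    return width + (-width % block), height + (-height % block)


def pad_rows(pixels, block=8):
    """Pad a list-of-rows grayscale image by repeating its last column and row."""
    width, height = len(pixels[0]), len(pixels)
    new_w, new_h = padded_size(width, height, block)
    rows = [row + [row[-1]] * (new_w - width) for row in pixels]
    rows += [list(rows[-1]) for _ in range(new_h - height)]
    return rows


image = [[x + y for x in range(10)] for y in range(5)]  # 10 wide, 5 tall
padded = pad_rows(image)
print(len(padded[0]), len(padded))  # 16 8
```

Because the stored data is a grid of whole blocks, a crop that keeps the coefficients untouched can only start on a block boundary, which fits the observation about lossless cropping.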
I'm trying to work out step two you listed, applying the DCT, but I'm not getting the correct values. Would you be able to help me out? I'm using Google Sheets to get the SUMPRODUCT of T M T'. For the DC I get -332.7 instead of -370. The AC values are just completely out of whack.
@steelmagnum use the MATLAB Image Processing Toolbox
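For anyone who wants to check the T M T' step without a toolbox, here is a minimal pure-Python sketch. It assumes the orthonormal DCT-II scaling (sqrt(1/8) for the first row, sqrt(2/8) for the rest), which is the scaling under which a flat block's DC value comes out as 8 times the pixel value; the helper names are invented for this example. One thing that may be worth checking in a spreadsheet (a guess, since the sheet isn't shown): T M T' needs true matrix products (MMULT in Google Sheets), whereas SUMPRODUCT multiplies element by element and adds everything up.

```python
import math

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: row u is a cosine of frequency u sampled at n points."""
    t = []
    for u in range(n):
        scale = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        t.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                  for x in range(n)])
    return t


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """2D DCT of a square block computed as T * M * T'."""
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))


# Sanity check: a flat block where every pixel is 118, shifted by -128.
flat = [[118 - 128] * N for _ in range(N)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))  # -80.0, i.e. 8 * (-10); every AC term is ~0
```

A flat block is a handy test case because the expected answer is known in advance: only the DC coefficient should be non-zero.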
These series are fantastic for three reasons: 1) high quality information 2)organized, sequential presentation with examples 3) no youtube fluff
6:11 nice how he did the trick with the pen without even stopping to talk
doe maeries well spotted!
doe maeries This guy has skills. His freehand sinusoids are also pretty impressive.
doe maeries Typical computer scientist. :P Almost half of my fellow students can do this as well. And the drawing of the function? Well... If you are forced to draw them so often in your courses, you just get used to it, I guess. (He is competent and good though, nonetheless, it's just not THAT special in our field of nerds. :p)
I rewatched that part 4-5 times, so awesome that was.
Oh SOB, I didn't even notice it, and I've seen this video a few times; I only caught it because of this comment. Wow, awesome trick, smart guy. He was a little nervous and had an energy release with that; it usually happens.
I am so glad that I clicked on this video out of the search results to learn something about DCT. I have to say that the quality of teaching in this video is simply top-notch. Many other videos out there simply explain how to calculate the DCT, without ever relating it to any practical usage at all. Some of them dwell only in the dark regions of the textbook filled with a lot of formulas.
Okay so during my university course we learned this in 90 minutes. And there are still some bits in this video that we never learned. This is so much better explained than anything we learned or that I could find on the topic online. Very awesome video!
Tried to understand this around a year ago, but Mike really put it in words better than any book I read. Thanks!
I'd love to experiment with changing the quantization numbers and see what weird images that would produce. Like glitch art maybe. :)
I had the same thought after watching this
I bet you could do some really interesting steganography with this
now those arts are being sold as NFT
now those NFTs are worthless
After an online university lecture about JPEG compression, that video sets all the stuff in my head to the right places! Thanks for such a great example of tables with their input/output performed
This is so cool! It's like the Fourier transform but with only the cosine coefficients. In my university analysis class, the prof always said it's used all over computer image and video compression, but never really gave an example, so now that I have one it's really cool to see this at work.
Thanks so much for the clear explanations! I was reading through different papers trying to understand the concept of DCT but always felt a gap here and there. This video gave a super lucid and straightforward understanding in a layman-friendly way.
Extremely well presented. There was a bit about what exactly the AC values represented that I didn't already know and this video didn't skip a beat.
This is by far the best video I have seen on JPEG compression. He explained the process thoroughly.
This guy is so eloquent. He makes JPEG so much easier to understand.
Wow. Just wow. Hands down, the best video for understanding JPEG on the internet! Thank you Sir :)
How did I miss this great lecture about JPEG all these years? Well, YouTube won't recommend these videos, and I searched for JPEG compression and landed here on part 1 and part 2. Amazing :D ....
This man explained this so easily which I was not able to understand through any article/book. Great job!!
I think no one has ever explained frequency transformation as good as this video. Thank you man!
For those wondering what is a *macroblock*, it is a superset of *blocks*.
For instance, in *4:2:0 YCbCr* (subsampling by two both horizontally and vertically), a macroblock is 16x16 pixel², thus containing four 8x8 pixel² Y blocks + one 8x8 Cb block + one 8x8 Cr block.
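The block count above can be reproduced with a tiny Python sketch. The function name `blocks_per_macroblock` and its parameters are made up for illustration; it assumes the macroblock is sized so that each chroma channel shrinks to exactly one 8x8 block after subsampling.

```python
def blocks_per_macroblock(h_sub, v_sub, block=8):
    """Count the 8x8 blocks in one macroblock for given chroma subsampling factors."""
    mb_w, mb_h = block * h_sub, block * v_sub          # macroblock size in pixels
    luma = (mb_w // block) * (mb_h // block)           # Y is kept at full resolution
    chroma = (mb_w // h_sub // block) * (mb_h // v_sub // block)
    return {"size": (mb_w, mb_h), "Y": luma, "Cb": chroma, "Cr": chroma}


print(blocks_per_macroblock(2, 2))
# {'size': (16, 16), 'Y': 4, 'Cb': 1, 'Cr': 1}
```

With 4:2:0 that is six blocks to describe 256 pixels, instead of the twelve needed if all three channels were kept at full resolution.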
Very well done! I've been meaning to learn more about JPEG, and here you come along and explain it very coherently. Thanks for that!
Branch Education just did a fantastic video on jpeg compression but this one is even more fantastic!
Obamna
this is sickly true.
That was such a discreet pen flip
looks more like a twirl to me, not a flip. Also, closer to 6:14, not 6:10.
Bloke talks a bit loud and fast, though. Snowden semi-doppelganger with brow jewel to boot. Sorry, back to the graph now.....
@a.wosaibi
discrete* :)
Brilliant - I've wondered this for 25 years (since meeting the .jpg format in 1996). Another well prepared lecture by Dr Pound
This is the best basic explanation of JPEG compression that I've seen.
What I love about this channel is that it keeps me interested in maths lessons.
I love the by-the-way-I-do-penspinning on 6:13 xD
Omg I paused the video, played it in slow motion a few times, practised the rotation by holding the pen with my other hand and after 10 minutes I did my first successful pen spin. I did not expect to learn that watching this video.
OMG! Mike just drew the most perfect sine curve I've ever seen drawn by hand! Impressive. Most impressive.
and once again this channel comes to the rescue, doing a superb job in explaining a complex concept in an easy manner
Man, this is brilliant. I'm going to put the video to my college students and sit with them to watch it instead of giving the lecture myself.
This is the best explanation for the DCT process I've ever searched.
greetings from Turkey. we will do a jpeg algorithm this year at school. While researching I found this video. You explained it very well. I hope we can succeed too.
14 day for finish...
Very well done. Very clear explanation that included all necessary information to get an understanding of the entire process! I wished my prof would take that as an example of an efficient way of explaining a theory. He could save 50% of his time.
Thank you for saving my course project, Sir.
Love the casual pen spin 6:14 .
Worth waiting for! Thank you for this very enlightening explanation. I realise there's more to it than you showed, but I now have a very good idea of what's going on. It may sound odd to say this, but I think this is an important day in my life. I've been using JPEG since at least 1995, and twenty years later I've finally discovered some of its secrets. It's like having a deep talk with an old friend...
You addressed the meaning of frequency in images, which others completely miss out . Thanks.
6:14 dayum boi that penflip tho
Your knowledge with your great voice makes this subject more interesting
This is so fascinating!!! This video has gotten me more interested in compression than I already am. I love seeing math at work.
this is just awesome. Thank you for explaining JPEG in a compressed form
You explained this 100x better than my prof did in 1/100th of the time... Thank-you for this!
The jpeg videos are probably some of my favorite computerphile videos! Well done! 😊
Very nice explanation. I've had some ideas of how the encoding works, but seeing the cosine chart really clarified it.
9:53 so satisfying when he reveals all the 0s that can be huffman-encoded!
What a nice repetition of forgotten lectures back from college :) Thanks a lot!
So the spectral method with truncation is the common practice in compressing images. Now I understand why we see the dirty patches in highly compressed JPEG images of text or line work, which in fact have different weights (smaller at low frequencies and greater at high frequencies) than ordinary pictures.
Well, that was much easier than reading 5 pages of the book. Thanks.
I always thought it was more complex than this, with JPEG using more systems/divisions or shapes/patterns for the image compression. I never realised the 8x8 sections were using just cosine waves. Wow.
You guys make the most interesting videos on YouTube, all channels considered. High level synthesis. Please keep doing what you do!
Wonderfully done, but it would have been nice if you explained how the coefficient for each 8x8 DCT was calculated. I assume it's just a straight accumulation of each pixel difference on the 8x8 block, hoping for a total of zero, but I'm left wondering.
I used a hex editor to mess with the quantization table of an image, fun times! The picture comes out all weird looking, but once you know how it works you can achieve some interesting effects.
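To get a feel for what editing the quantization table does, here is a toy Python sketch of the quantize/dequantize round trip. Everything in it is illustrative: the 4x4 coefficient values are invented, `make_table` builds a hypothetical table that simply grows with frequency, and real JPEG tables are 8x8 with different numbers.

```python
def make_table(n, strength):
    """Hypothetical quantization table: step size grows with frequency (u + v)."""
    return [[1 + strength * (u + v) for v in range(n)] for u in range(n)]


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply back. The rounding loss is not recoverable."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


def count_zeros(levels):
    return sum(row.count(0) for row in levels)


# Invented DCT coefficients for a small 4x4 toy block.
coeffs = [[-80.0, 12.0, -3.0, 1.0],
          [9.0, -4.0, 2.0, 0.5],
          [-2.0, 1.5, 0.4, 0.2],
          [1.0, 0.3, 0.2, 0.1]]

gentle = quantize(coeffs, make_table(4, 2))
harsh = quantize(coeffs, make_table(4, 10))
print(count_zeros(gentle), count_zeros(harsh))
print(dequantize(harsh, make_table(4, 10)))
```

Larger table entries push more coefficients to zero, which is where the file size savings and the visible artifacts both come from. Swapping in odd values for a few entries is the same kind of experiment as the hex-editor trick.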
very nice series of videos. used them for preparing exams on multimedia systems.
This is by far the best JPEG explanation I found. Thank you!
Incredible, clear and concise explanation. Greetings from Argentina
This is one of my favourite videos on YouTube.
Great explanation of some fairly difficult subject matter. Looking forward to the next part.
Jpeg is like a fastfood worker who drops the bun on the floor and picks it back up "they won't notice"
And you won’t.
@cancername true
It blows my mind how some people can just rattle this stuff off like it's nothing, meanwhile if you asked me what I ate for dinner last night I'd probably have to think for two solid minutes.
Tons of great information! RGB hurts my brain much less than YCbCr..
Dude, maybe the guy teaching in the video doesn't know what he ate yesterday either, but it's the passion that helps people store this much info in their brain.
It's just practice and interest. It's like learning a language. After some time (and work, which most want to skip *g*) you usually get there.
Sir, you have way too much of knowledge. Thanks a lot for such super high-quality knowledge resources that you are providing for free.
Ingenious method to remove quick changes in a channel!
This was very straightforward to understand although I am a Mechanical Engineer !
this application is cool. now I have a better and concrete appreciation for the cosine wave
One of the best videos on computerphile. Thanks for this.
I think that the next video should be about Haar Wavelet Transform and its superiority to the method in this video. It's a shame that newer implementations of JPEG are not more popular than this old method with DCT.
Very Impressive Dr. Mike Pound..
This was a very good and thorough explanation of the encoding. I am currently studying transform theory and signal processing and this was a great complement for further understanding!
There is one step missing between zigzag-reorder and Huffman: the zeros are actually not compressed by Huffman at all, but the block is zero-length encoded first: instead of storing all the 0s, you just store the number of zero elements.
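A simplified Python sketch of that zero-run step. It is not the exact JPEG bitstream format: in baseline JPEG the run is paired with a size category and that pair is itself Huffman-coded, and long runs are capped with a special symbol, both of which are left out here. The names `zigzag_order` and `run_length_ac` and the `"EOB"` marker string are choices made for this example.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # even diagonals are walked bottom-left to top-right, odd ones the other way
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def run_length_ac(ac):
    """Encode AC coefficients as (zeros_before, value) pairs plus an end-of-block marker."""
    last = len(ac)
    while last > 0 and ac[last - 1] == 0:   # find where the trailing zeros start
        last -= 1
    pairs, run = [], 0
    for value in ac[:last]:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    if last < len(ac):
        pairs.append("EOB")                 # everything after this point is zero
    return pairs


print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]

# 63 AC values of a quantized block, already in zigzag order (invented numbers).
ac = [5, 0, 0, -3, 0, 1] + [0] * 57
print(run_length_ac(ac))    # [(0, 5), (2, -3), (1, 1), 'EOB']
```

The zigzag scan matters because it pushes the high-frequency coefficients, which quantization has mostly zeroed, to the end of the list where a single end-of-block marker covers them all.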
He is the coder that rips people off.
It is the first time in my life I actually understand what happens with DCT in JFIF compression. So I am grateful for this video. But WHY does everything have to be shown so fast? Papers, figures, graphics, diagrams are all shown like flash-flash-flash as if this were an action movie. As far as I understand, video time on YouTube does not cost anything, and neither does it require reels of celluloid or film processing. Like I said, I understand how DCT is used, but I had to replay the video several times and pause it often.
the greatest video ever created tbh
Probably one of the best explanation I have ever seen!
DCT is very similar to FFT as it converts samples from a time domain to a frequency domain. MP3 also uses DCT but with windowing.
Mr Spectacals Correct. MP3 uses a variant of the DCT-IV with overlapped windows, called the MDCT, that's applied to the granules of each subband.
Okay okay, guys.. I don't understand mathses that well so help me out... "Windowing" is defining a finite domain of the series of functions, right?
And what is the difference between FFT and DCT exactly? They are both series of sine or cosine waves at different frequencies used to make an approximation of a signal or other function or such.
tried the wikipedia page, but that is not much helpful with those big words :/
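One way to see the FFT/DCT relationship without the big words: the FFT is just a fast algorithm for the discrete Fourier transform (DFT), which uses both cosines and sines and so gives complex numbers. The DCT-II gives the same information as the DFT of the signal glued to its own mirror image, and because that mirrored signal is symmetric, the sine parts cancel. The small Python sketch below checks this numerically; it uses slow textbook sums instead of a real FFT, an unscaled DCT-II, and an invented four-sample signal.

```python
import cmath
import math


def dft(x):
    """Textbook DFT (what an FFT computes, just slower). Returns complex values."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dct_ii(x):
    """Unscaled DCT-II: cosines only, so the result is purely real."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n))
            for k in range(n)]


signal = [3.0, 5.0, 4.0, 1.0]
n = len(signal)

# Glue the signal to its mirror image and take the DFT of the result.
mirrored = signal + signal[::-1]
spectrum = dft(mirrored)

# A half-sample phase shift lines the two transforms up.
from_dft = [0.5 * cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k] for k in range(n)]
direct = dct_ii(signal)

for k in range(n):
    print(k, round(direct[k], 6), round(from_dft[k].real, 6), round(from_dft[k].imag, 6))
```

The second and third columns agree and the last column is zero up to rounding, which is the "only cosines" property in action.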
Great explanation! All the 8x8 printouts worked well on my brain.
I'd really like to see a video like this one explaining how mp3 compression works
Modified DCT is used to compress MP3, AAC etc. audio formats. Didn't know that; I read it on Wikipedia, but since it is essentially a frequency filter, I thought it could be used for audio too. Audio, of course, is 1-dimensional and images are 2D, so the exact same thing can't be applied, thus "modified" DCT or MDCT.
Quote " In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter (PQF) bank."
SquidCaps Yep, that was why the Pentium MMX was made
matsv201
Ah, thanks for that tidbit. If I remember right, it was marketed as MultiMedia Extension or something like that, and it really was considerably faster than non-MMX machines. Made my first album on 188MHz MMX :)
SquidCaps Actually, sound compression using DCT operates in reverse to image compression. JPEG transforms real color values into the frequency domain as explained in the video, but sound is already in frequency, so the DCT transforms it into the real value domain before compression.
Stig Helmer
Again, makes sense, audio is serial data to begin with.
Stig Helmer That's not correct. Uncompressed PCM audio data, as well as raw data fed to the audio card, is in the time domain, not the frequency domain. That's why filtering of audio data is done with FIR and IIR filters in the time domain with convolution. If audio generally was represented in the frequency domain, you would just do filtering with multiplication and an appropriate window.
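As a small illustration of time-domain filtering by convolution, here is a direct-form FIR filter in Python. The sample values and the two-tap averaging filter are invented for the example, and `fir_filter` is a made-up name; samples before the start of the signal are assumed to be zero.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: each output is a weighted sum of the current and past inputs."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:              # inputs before the start count as zero
                acc += tap * samples[n - k]
        out.append(acc)
    return out


pcm = [0, 0, 8, 0, 0, 4, 4, 4]          # toy PCM samples: amplitude over time
print(fir_filter(pcm, [0.5, 0.5]))      # [0.0, 0.0, 4.0, 4.0, 0.0, 2.0, 4.0, 4.0]
```

The filter works directly on amplitudes over time, with no transform involved, which is the point being made about PCM data.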
This guy is pretty good at drawing those waves.
A lot of info compressed into 15 minutes, well done.
If PNG is equally fun to explain then do that next! It would be good to see something lossless in comparison.
Thanks.
George Edwards PNG is unfortunately not as interesting as JPEG.
Dr. Mike I love the way you explain things
terrific video...cleared all the doubts in a flash...thank you sir
Damn, Mike Pound can really draw some freehand cosines
Great work, I have DCT on exam next week and I finally understand it. :)
This guy is really good at explaining really complicated concepts.
College senior here, and I was fascinated by the gist of it. I always wondered how the hell are jpegs so small.
Will there be any talk about PNG files? I guess they have a different encoding dimension for the opacity/alpha levels.
Very good explanation. The sheets with tables helped a lot!
Thanks a bunch! Helped with my exam tomorrow
I actually understood most of that. Things are looking up.
Finally I know the relation between waves and images. Thank you!
Such a complex process but so well explained! awesome video
Never thought it's so complicated, great video!
Awesome video... Thanks for pointing out the important stuff needed to understand how JPEG actually works. :)
Wow! I really very badly needed this explanation. Thanks a ton!
At 6:58 “we calculate the DCT coefficients” which are the weights of each cosine wave, or the amount that each cosine wave contributes to the original image. But the actual calculation is not shown, suddenly a piece of paper with DCT II coefficients just appears with all the numbers.
How are these coefficients calculated??
You can see it at 6:26. Basically, each 8x8 block from the image has a contribution from all the pattern blocks (the ones that have blue borders). How do you find the contribution of each blue block to our 8x8 image block?
Well, you correlate the 8x8 image block with a blue block, and the result will be a number (the coefficient). This coefficient is the "weight" of the blue block in our 8x8 image.
Correlation here means multiplying each element in the 8x8 block by its corresponding element in the blue block and summing the results into one number.
Hope this helps
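The multiply-and-sum described above can be written out directly. This Python sketch assumes the orthonormal DCT-II scaling for the cosine patterns (a choice made here; other scalings only change the numbers by constant factors), and `basis` and `coefficient` are invented names. To make the result easy to check, the test block is built from two known patterns, so the correlation should hand back exactly the weights that went in.

```python
import math

N = 8


def basis(u, v):
    """The 8x8 cosine pattern with vertical frequency u and horizontal frequency v."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [[c(u) * c(v)
             * math.cos((2 * y + 1) * u * math.pi / (2 * N))
             * math.cos((2 * x + 1) * v * math.pi / (2 * N))
             for x in range(N)] for y in range(N)]


def coefficient(block, u, v):
    """Correlate the block with one pattern: multiply element by element, then sum."""
    pattern = basis(u, v)
    return sum(block[y][x] * pattern[y][x] for y in range(N) for x in range(N))


# Build a block from two patterns with known weights: -80 of (0,0) and 5 of (1,2).
flat, ripple = basis(0, 0), basis(1, 2)
block = [[-80 * flat[y][x] + 5 * ripple[y][x] for x in range(N)] for y in range(N)]

print(round(coefficient(block, 0, 0), 6))  # -80.0
print(round(coefficient(block, 1, 2), 6))  # 5.0
print(abs(coefficient(block, 3, 3)) < 1e-9)  # True: that pattern contributes nothing
```

Doing this for all 64 (u, v) pairs gives the full coefficient table shown in the video; the patterns don't interfere with each other, which is why each weight can be measured independently.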
insane hand-drawn waves, wow.
That was really good! I've never been clear on how JPEG did its magic -- now I know. Thank you!!
Great video. Can you also talk about the JPEG 2000 compression algorithm? I heard that it uses the discrete wavelet transform to achieve even higher compression than the DCT.
FANTASTIC explanation.
Very nice video. Mike has got some serious didactic skills. Plus he's prepared!