Still images are fairly easy; the compression is relatively simple. Video is much harder to do quickly in software, especially compression, which is far slower. Computers, and especially smartphones, have ASICs (Application-Specific Integrated Circuits) to do it much more efficiently. Instead of a general-purpose CPU doing the work, a circuit hardwired to do exactly one thing is used, offering massive efficiency gains that are absolutely critical in low-power devices. Software is more flexible and tends to achieve higher compression, or better quality at the same compression, at the expense of processing time and power usage.
As someone who's on YouTube a lot, these still manage to be some of the best videos. The research, the editing, the pace of the lesson. Thank you guys for the hard work.
As somebody who has worked on JPEG implementations, I am still amazed by how perfect they got it on the first try. JPEG has been around for ages, and is still so good; only recently have meaningful replacements emerged, and JPEG still has some benefits over these.
@@allkillhon5209 avif is good too, although not widely supported yet. webp is much better than jpeg, don't really know why it hasn't been adopted widely yet
@@zxbryc JXL is definitely very interesting, although I don't know too much about it. I just think with rising performance in almost everything, and hardware decoders for AV1 & AVIF in new phones it might be a viable option in the future, for fast image loading on bad mobile connections for example
webp is trash though... not much advantage over jpeg and lots of compatibility issues. It's annoying when a site uses webp for its own images but doesn't allow webp image uploads. Windows can't open it out of the box, PS needs a plugin... and that despite the format being out for 12 years. 10% better compression might have been nice 10-15 years ago; today people have 100 Mb/s on their phones and gigabit via fiber, so whether an image takes 1 MB or 1.1 MB doesn't make any difference.
The Discrete Cosine Transform blew my mind! I’m a software engineer and knew a lot of the basics of JPEG compression already but this bit in particular really stood out, learnt something completely new. Great video and high-quality explanation!
There's just nothing like these videos, really. I'm blown away every time. The voice-over and writing are just so far beyond anything else out there. I wish all information could be presented in this calm, clear, and concise manner that also somehow manages to never dumb it down.
I have a degree in computer science and I'm still amazed at how good your channel is! I always wondered how JPEG compression worked. You have illustrated it in super simple terms, and the visualizations have, as always, been awesome! Happy to see more content like this, maybe more details on H.264.
The perfect reason to cancel all school loans, if YouTube is going to tell everyone everything for free. You do know that one day there won't be a pandemic, or a bunch of chaos coming from Trump and his followers, right? One day you're just going to be competing with everyone who knows the same crap as you.
An absolute masterpiece of explanation. I've tried to understand how DCT works in JPEG before, and never understood the dry maths-only explanation. This video changed that and the image tables explained it perfectly. Thanks for filling in a gap that's been in my head for many years!
You were probably confusing it with the GIF format. It stores images with a limited palette, so the number of colours in each image is reduced. But unlike JPEG it allows transparency and animation, so despite the greater file size it was popular on the early web.
@@plashplash-fg6hd Moreover, JPEG is not just an image format, it is a compression algorithm. For example, it can be used inside TIFF images. Damn, what obsolete trash! And it was already obsolete trash at the time of this publication.
Another reason compression works so well is that your brain is constantly filling in the "gaps" in information, especially when watching video. Our eyes are actually worse than you think; the brain does a huge amount of work filling in various gaps. For example, each of our eyes has a blind spot. We usually do not see it, since the brain erases it and fills in the missing information.
Your videos are SPECTACULAR! It's obvious that you put a lot of work into them, and it shows. You seem to intuitively know how to achieve the perfect balance of technical detail and higher-level concepts to enable the viewer to develop a solid overall understanding of the subject matter. THANK YOU!
Great video, just a little feedback on chroma subsampling: the reason the human eye can perceive more luminance resolution than chroma isn't the number of rods vs. cones. The rods are actually inactive in all but the most dimly lit environments, and their density in the fovea (i.e. the center of the visual field on the retina) is actually rather low. So the RGB cone cells are solely responsible for tristimulus color vision. The actual reason for the color resolution difference is that some amount of cell-level processing happens in the eye before the signal reaches the visual nerve, very much analogous to the lightness/chroma separation done in digital images. Also, there is some correlation between the spatial signals of the red, green and blue cones in terms of lightness, but little correlation between them in terms of color. Together, those effects cause the lower chroma resolution of the human visual system.
"..actually inactive in all but the darkest lit environments, and their density in the fovea i.e. the center of the visual field on the retina is actually rather low." So that's why, when I'm in the dark, I see some very dimly lit things more clearly when not looking directly at them, but the moment I move my vision toward them, they disappear?
I'm a photographer and I always wanted to know this. I didn't understand anything. But great video. One day I will watch it again. Hopefully I will, then, understand it.
This is... AMAZING. It's probably almost impossible to explain this even just marginally better. The amount of well-considered effort you go through to make things more clear is extremely impressive. Subbed!
And this is the reason why religions are accepted and exist in the first place: it is more convenient for a LOT of people to think of complex topics as 'magic' that 'just works'...
A remarkable thing to appreciate: how the conversion process to JPEG makes decisions based on how the human eye perceives what it is seeing.
@@lucasrem1870 PNG and JPG are two very different kinds of compression, and as such, good for very different things. PNG is a lossless compression which deals best with "blocky" images, i.e. where there are a bunch of consecutive pixels with the exact same color adjacent to each other. This includes, for example, (rendered) text or (vector, engineering) drawings on a solid color background, or similar ideas. These can be compressed to very small sizes with PNG without any artifacts, while the JPG-compressed counterpart will be worse in quality and in most cases even bigger in file size (than the PNG version). However, compressing natural images, where there are a lot of color gradients and edges, while possible with PNG, is rather "encapsulation" and not really compression, as the file size will not be meaningfully smaller in most cases. This is the territory where JPG is better, provided that you are okay with its lossy nature.
Since the very 1st video I watched, this channel is one of a very few that I clicked the bell button without even thinking twice, each subject is presented with great clarity, beautifully illustrated and great narration as well. well planned, edited and organized by key topics.
I remember when JPEG first came into existence; it would take so long to compress and decompress that you would see the image go from a pixely mess into a nice image. This was the only image format shared across BBSes over dialup, that and GIF. Bitmaps (BMPs) could be found rarely, but they would take a decade to download. This is by far the handiest format for quickly sharing images.
JPEGs going from a pixely mess to the proper image is because the image is a Progressive JPEG and not a Baseline JPEG. Baseline JPEGs are displayed line by line as they are downloaded while Progressive JPEGs contain a number of Level of Detail versions of the image. Progressive JPEGs will load and display the lowest LoD, and then when a higher LoD has finished downloading, it will switch to it, and it does this switching through the higher LoD images until the proper image is downloaded and displayed. Progressive JPEGs are larger than Baseline JPEGs because of the additional LoD images.
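The LoD switching described above can be sketched as a toy pyramid in NumPy (block averaging only, for illustration; real progressive JPEG instead refines DCT coefficients across scans, so this captures just the coarse-to-fine idea):

```python
import numpy as np

def lod_pyramid(img, levels):
    """Coarse-to-fine pyramid built by repeated 2x2 block averaging."""
    lods = [img]
    for _ in range(levels):
        h, w = lods[-1].shape
        lods.append(lods[-1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return lods[::-1]  # coarsest first, like successive "scans"

rng = np.random.default_rng(0)
img = rng.random((16, 16))
scans = lod_pyramid(img, 2)  # 4x4, then 8x8, then the full 16x16

# Each scan replaces the displayed image: upsample to full size,
# and the approximation error can only shrink (nested subspaces).
errors = []
for lod in scans:
    f = img.shape[0] // lod.shape[0]
    approx = np.kron(lod, np.ones((f, f)))  # nearest-neighbour upsample
    errors.append(((approx - img) ** 2).mean())
```

Each successive "scan" strictly reduces the mean squared error, and the last one reproduces the image exactly, mirroring how a progressive display sharpens as more data arrives.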
@@Oshroth Yes. The many LoDs were useful over slower connections, as they quickly showed you roughly what the image was, so you could make a choice without having to wait for the full image to load. This was back when browsing a webpage was faster because pages were optimized for speedy loading of the text and progressive loading of images.
16:28 Well, vector images are a different beast altogether. They aren't rasterised, so _any_ bitmapping format will result in quality loss, even bmp and png. Vectors are made to be scaled and work by having nodes and lines that make up the image. Instead, what you probably meant, was that jpg doesn't perform well on digital images, such as pixel art, or anything with sharp contrast, such as text overlaying on any other image. Jpg also isn't preferred when working with intermediate copies. If you have a photo that you want to use in a video, for example, where it may undergo additional colour grading, you do not want to use a jpg image. This is because any additional grading can highlight artefacts, and because the image would be compressed twice; once by the jpg algorithm and again by the video compression.
@@LucienHughes Yeah, but any format is a bad choice for saving vectors, except for vector formats. You rasterise it for viewing (unless you use a vector display, like an analog oscilloscope), but you can still scale it without losing detail. You can't do that with a bitmapped format, not even png.
I am amazed by the 3D transitions between different 2D text/diagram sections. This form of presenting learning feels closer to learning in VR. I wonder if one day this channel will create VR episodes which we can move around in. Great delivery of a new way of learning!
To me, the compressed example looks more colorful, with more contrast. If anything, the uncompressed image has slightly better dynamic range in the shadows.
As someone who has studied image and video compression at university, I can say this is a GREAT video and it's spot on. It explains the basics in just the right amount of detail to get a general understanding. Would have helped me a lot in the beginning.
Nice, love the detail and simplicity in the explanation of how it works. Small tip for a far-future video idea: accelerometers and gyro/magnetic sensors... I've always loved reading about how they managed to make those sensors.
@@sierra991 It actually uses MEMS at a scale of one or two µm, which is insane (as is most stuff at the nano scale), but at that small scale it needs to be super accurate and still robust. It's just mind-blowing that we've managed to create few-atom-sized transistors for e.g. SSDs or OLED screens, but mechanical components are another level, as they need to move in a very specific way, and the best part is that MEMS technology is "low-cost"... it's easier to make than growing a damn tomato plant -_-
and they say perfection isn't achievable. this is an honestly amazing video, and now I'm going to write a jpg parser and decompressor just for fun because of it. I also do embedded development, and a microprocessor I was working with had hardware jpg decoding - yet I never really thought it could be something this simple, because when I googled it all these complicated terms came up. I didn't stop to think that hardware implementations are usually only feasible with simple algorithms, but now I'll know better in the future. I've also been aching to implement h.264 decoding on an fpga for direct network playback (so I don't need a nasty closed-source-firmware raspberry pi to do it), so I can't wait for that video too! I've been trying to read up on the subject, but good resources are essentially non-existent. this video alone already helped me a lot in that regard, so even if you only release the h264 one in a few months and I'm done with at least a simple software implementation by then - I'll still watch it just because your videos are great, and much of that will still be owed to this very video right here. thank you!!!
"I didn't stop to think that hardware implementations are usually only feasible with simple algorithms." Any algorithm can be implemented in hardware (chip/transistor level), simple or complex. Logic gate structures can form Turing-complete computation, just the same as the general processors that use them. This can also drastically speed up the computation due to fewer clock cycles for the same task (no instruction-set overhead and less data shuffling).
Still had my window up and figured I'd clarify another thing: "yet I never really thought it could be something this simple because when I googled it all these complicated terms came up." The actual implementation is much more complex than this video lets on. It's an excellent video for a general overview, but he skips through pretty much all of the math and even gets a few things wrong (like misusing the term 'vector' when discussing issues with Jpeg compressing a raster... but vector is entirely orthogonal to raster). The actual implementation deals with the math of Fourier series and the Fourier transform, whereby an arbitrary signal can be represented by a summed series of cosine (and sine, for complex) signals with a specific amplitude coefficient and phase. If you write a Jpeg parser, it would likely be beneficial to understand this math, though you can probably implement the algorithm without understanding it, just by working through the format documentation. As for h.264, I'd be surprised if there isn't already an FPGA implementation of that floating around out there. However, it is licensed, so maybe they're protective of direct implementations just floating around, not sure.
This is the clearest, yet most comprehensive, explanation of JPEG compression that I've seen so far. I finally feel like I understand it. Thank you! One small quibble that is unrelated to the topic of JPEG compression: the video suggests that the rod cells in the human eye act something like the "luminance" channel of human vision to the cone cells' "RGB" channel. The reality is quite a bit different. Rod cells are the "low light level" cells of the human eye, and do not contribute much to our visual perception in bright conditions. Indeed, bright lighting conditions, such as those of a regular phone or computer screen, are bright enough to overwhelm the rod cells and stop them from contributing a signal to our vision. They essentially "turn off" in such conditions. Conversely, the cone cells are the "bright light level" cells and do not perform well in low light conditions. Also, the rod cells, though more numerous, do not work in isolation. In low light conditions, the human eye performs a kind of "binning" with rod cells, where it combines the signal from a large number of individual cells to produce a signal that is passed on to the brain. Indeed, this binning can combine the light detection capabilities of so many cells that the resolution we get in low light conditions (where the light-loving cone cells lose sensitivity) is far lower than one would expect given the number of cells involved. Try reading a printed page by moonlight alone and you'll see what I mean. (There's "temporal" binning as well, where the early processing of the eye's signal combines the light over a longer period, up to about 1/6th of a second, to produce a stronger signal, and any movement during that time will "smear" the image.) So, when dealing with anything displayed on a computer screen, it is the "bright light" cone cells which do the heavy lifting. This does not affect the sharpness of our vision, however.
That there are only "about 6 million cone cells" (and astonishingly few "blue" cone cells, BTW) is made up for by the fact that they are largely crammed into a tight area at the center of our vision known as the "fovea". The fovea covers only a few degrees of our vision, so the resolution in that region is quite high. Indeed, I've seen estimates that suggest if we spread the resolution of our fovea over our entire visual field, the result would be the equivalent of a 100 megapixel image or more. (TV manufacturers set their recommended viewing distance based on this sort of calculation, BTW.) Our perception of sharpness over a broad image is created by the fact that our eyes dart around (that is, perform "saccades"), picking up bits and pieces of an image and assembling them in our brains. The parts of the image that our eyes do not land on are "filled in" with plausible sharpness or ignored. Try reading the paragraph above this one while staring fixedly at the period at the end of this sentence. (Don't cheat!) If you're a normal human being you can't do it, despite the fact that the text above this paragraph is quite close to the center of your vision. Your eyes do not have many cone cells outside the range of that period, and there's simply not enough retinal resolution there to make out the fine details of printed letters. You do not sense anything amiss, however, because your brain "fills in" the blurry areas with a "sense" of sharpness. The intensity levels are simply a measurement of each cone cell's response, with quite a bit of "local contrast enhancement" (a very interesting process, which would require many paragraphs of explanation to outline) done by the retina. Incidentally, the cone cells are not truly RGB (red, green, blue), in that they do not have peak sensitivities evenly distributed across the visible spectrum.
They are, instead, more like Y-YG-B (yellow, yellow-green, blue), but, with three different ranges of peak sensitivity, they are able to unambiguously distinguish all the colors of the rainbow. The rest of the video's discussion of our perception of spatial resolution and frequency is on point. It just doesn't have anything to do with the distinction between rod and cone cells.
WOW! All the computation required for a single pixel is amazing, and it's completed so quickly. Just imagine: when we went to the Moon, the rocket scientists relied mostly on their manual mathematical skills and slide rules. A cell phone's processor is a hundred million times more powerful, yet it fits in your pocket. Technology is just amazing! Love this channel!
I'm a programmer and I've never bothered deep diving into jpeg, always treating it like some kind of magic. This just cleared my mind on everything about it in less than 10 minutes. I can now explain it to people, and it's almost worrying how fast this went. Damn, you're good.
Also, you should take a super deep-dive into how Laserdisc works. I get the basics of it, but still would be nice to dig into the actual nuts and bolts of the format. :D
The way you organized your presentation (mentioning a concept and saying you'll come back to it so you can continue in greater detail) was very appealing to my ADHD brain. You held my attention!
Amazing video as always!! The detail of the explanation is precise and technically correct. I think that, once you understand how JPEG works, MPEG should not be very difficult to understand. If you plan on doing videos about electrical engineering topics, I'd love to see one about OFDMA and how it is used in 4G to let hundreds of users connect to an eNodeB at the same time. Can't wait for the next video!
Great video. I basically knew everything, as my research is in video compression. I love that he said video compression is extremely complex while giving the example of H.264, which is quite old now; it's still the most used for sure, but the newest codec, VVC (Versatile Video Coding), is many times more complex than even HEVC, which itself is newer than H.264. My work is in VVC; believe me, it's a nightmare. Don't take the videos you watch on the internet for granted ;)
I am curious to learn about the potential of the new codecs being developed. I tried some Google searches, but it's difficult to find understandable sources of information. Do you have any recommendations to start with?
Interesting, I hadn't heard about VVC until now. As the successor to HEVC, which competes mostly with VP9, is VVC seen as a competitor to AV1 (since AV1 is the successor to VP9)?
No, that's not the case. You're right that the video is compressed, but if the video gets compressed, the low-pixel picture must be as well, so the ratio is still maintained.
A good thing to emphasize about the DCT is that it's also reversible -- you get the same amount of information out as you put in, just arranged in a different way. (I explain this to myself by noticing that the frequency samples are all linearly independent -- none of them can be added to each other to make a third, no redundancy -- so you've switched from one 8x8 block of data to another 8x8 block of data.) This sort of thing (Fourier transforms and their cousins) is super important for signal processing.
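That linear-independence argument can be checked numerically: the rows of the orthonormal 8x8 DCT-II basis matrix satisfy C·Cᵀ = I, so the transpose is the exact inverse and the round trip loses nothing. A minimal NumPy sketch:

```python
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal DCT-II basis: row k samples cos(pi*(2n+1)*k/(2N)).
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0] /= np.sqrt(2.0)  # DC row gets the usual 1/sqrt(2) scaling

# Rows are orthonormal, so C.T is the exact inverse.
assert np.allclose(C @ C.T, np.eye(N))

rng = np.random.default_rng(1)
block = rng.integers(0, 256, (N, N)).astype(float)  # a fake 8x8 pixel block
coeffs = C @ block @ C.T   # forward 2-D DCT
recon = C.T @ coeffs @ C   # inverse 2-D DCT
assert np.allclose(recon, block)  # round trip is lossless
```

The loss in JPEG comes entirely from the quantization step afterwards, not from the transform itself.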
Great explanation on JPEG. I knew it compressed in blocks, and I vaguely knew it averaged color, but seeing this level of detail is amazing. Well done!
I love this channel! Super informative, great presentation, and some of the best content on YouTube. Thanks for making these videos, and please keep up the good work!
Usually videos like this claim to explain something, but they just talk about it and don't explain anything... but THIS video does actually explain the techniques behind it. I love it... those vids are rare and hard to find. Thank you for all the work you put into this.
The JPEG compression algorithm is what made me love image processing. It's one of the greatest examples in computer science where you can see a substantial amount of mathematics working together to do something extremely useful in practice, with effects that are both visually and practically tangible to anyone.
Those parts were probably recorded at a separate time, with a different microphone and/or settings. It's noticeable enough that it honestly kinda distracts me.
@@Sukigu That those parts were recorded at separate times is obvious; I've noticed this with other channels as well. There's an unintentional sound effect on the 'second take' that I can sometimes hear even in movies (like a sort of high-frequency tremolo on male voices), and I always wondered what caused it.
Came across your channel this week. I must say that your explanations are exceptional in the sense that they convey technical details very clearly and also very quickly grab the attention of the viewer. Exceptional, I tell you. Keep it up. You should be used as the benchmark for teachers and lecturers all around the world.
This video is great, and I would really like to see a follow-up video on another popular image format, PNG, which is orders of magnitude better than JPEG for sharp-edged graphics. It's also widely supported, while having different strengths and weaknesses compared to JPEG.
I must say that a 10:1 compression ratio using JPG is quite impressive. For images that are especially prone to artifacting, I just use the highest-quality setting, and that pretty much does the trick. It would have been nice had they included a lossless option in the original JPG for people who want a mathematically identical representation of the original image at 1/2 to 1/3 the size of a 24-bit RGB image -- so basically getting it down to the file size of 8- or 12-bit color while still retaining the original 24-bit color information.
The worst enemy of a photographer who is asked to post-process jpeg images from cell phones and make them look as if the scene was taken with a DSLR with a proper lens and saved as RAW data. A nightmare.
Due to the square/rectangular artifacts associated with lower-quality instances of the format, I'd always assumed JPG worked in some recursive way where it divides the image up into quadrants, throws out the least-used/prominent colors in each, and gets more "aggressive" about throwing out colors with each subdivision of quadrants, as the smaller the area the harder a discrepancy is to notice.
That's another algorithm, called JPEG 2000. It works by subdividing the original image into four quadrants, recursively subdividing the top-left quadrant into four more quadrants, and so on. Each quadrant is described by a weighted combination of wavelets.
@@gianluca.g This is what we call a quadtree. I work heavily with bin/quad/oct trees in my 3D engine. The first box is called the root; the smallest divisions/boxes are the leaves. I made a combination of all of those, which I call a polymorphic tree (polytree for short).
@@jackl2254 As a software engineer myself, I find quadtrees really elegant! JPEG 2000, however, doesn't really use quadtrees; it just recursively divides the top-left quadrant, not the other three. The top-left quadrant gets subdivided, while the other three quadrants are described by some wavelet polynomial (similar to the cosine table in JPEG). OT: I remember binary trees being used in Doom to speed up sector rendering.
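For anyone curious, the quadtree idea from this thread fits in a few lines: a generic quadtree (not JPEG 2000's wavelet scheme) that stops splitting once a region is nearly uniform. The function name and tolerance here are just for illustration:

```python
import numpy as np

def quadtree(img, x, y, size, tol, leaves):
    """Split a square region into four children until it is nearly uniform."""
    region = img[y:y + size, x:x + size]
    if size == 1 or region.max() - region.min() <= tol:
        leaves.append((x, y, size, region.mean()))  # store one flat leaf
        return
    h = size // 2
    for dx, dy in [(0, 0), (h, 0), (0, h), (h, h)]:
        quadtree(img, x + dx, y + dy, h, tol, leaves)

# Flat image with one detailed quadrant: only that corner keeps subdividing.
img = np.zeros((8, 8))
img[:4, 4:] = np.arange(16).reshape(4, 4)  # detail in the top-right quadrant
leaves = []
quadtree(img, 0, 0, 8, tol=0.5, leaves=leaves)
print(len(leaves))  # 19: three big flat leaves + 16 single-pixel leaves
```

The flat areas collapse into a few large leaves while the detailed quadrant gets split down to single pixels, which is exactly why such structures compress "blocky" images so well.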
Back in the early 1990's I designed and worked on 'Slowscan' security systems over dialup modems. This was in the early days before the JPEG standard was published. We used our own Discrete Cosine Transform (DCT) based image compression, empirically tuned for the product's target usage. We actually used a hardware circuit to detect when an 8x8 block had changed enough to mark it as 'to be updated'. Due to the low processor computing power available at the time, the DCT was performed by an INMOS DCT/IDCT transform chip. The compression stage was left to the modem's built-in algorithm, such as V42bis.
Very well explained! Maybe you should improve your voice; it's the only indication that you're not human. I appreciate you using your resources to help!
Can't wait for the x264 and x265 video: how P-frames (predicted), I-frames (intra), and B-frames (bidirectional) work, along with the AQ-Mode, AQ-Strength, block-size, B-Pyramid, etc. options; how well they compress data; and why x264/x265 sometimes compress dark scenes badly, with artifacts, or why x265 tends to blur out certain scenes, failing to apply DCT and quantization well enough to compress them without perceptible detail loss (an area where I believe such codecs need improvement). We really need an in-depth research video on it for the innovation of future technologies!
Those behind-the-scenes signal processing algorithms are amazing. In particular, the transforms taught in college should not rely only on mathematical calculations; there should be more visual explanations like this. Amazing video, by the way!
When you got to step three, I had horrible flashbacks to college, when I learned about Fourier transforms. I can't believe they are used here so often; that actually blows my mind.
I'm glad you mentioned Tom Scott when talking about the Huffman algorithm. I watched that video, and also the one on the interframe compression that makes any confetti video look ugly 😂
I just watched this and realized that you released it just two days after I submitted my bachelor thesis on forensic JPEG algorithms :D It would have been great to have these visualizations back then, but nonetheless, a great and simple explanation with really good animations!
Compression is amazing technology. I remember when The Surge 1 was released. It was only 6 gigs or so, but the game was really large, with many enemy types, a lot of music, and many weapons and armor. I was racking my brain trying to figure out how in the world they could fit such a large game into such a small file size. I am still at a loss as to how they did it. Even with this upload, it still feels like magic haha
Great video! JPEG is actually significantly simpler than I thought, thanks to your explanation. Would be great to have a video outlook into machine-learning based methods and the future of compression.
This is amazing. As a graphic designer, I only knew to use JPEG files for compressing real-life images for social media, to use PNG for transparency, and to use AI/PDF files for vector art... but I never knew how it actually works! Amazing video, truly appreciate it, and you have a new subscriber 😊❤
Digital imagery takes up a great deal of my time for what I do, and I know from practice what file formats and compression do, but I never really looked into how they work. Thank you!
I hope every human being in this modern age watches this. As a designer, it sucks to receive low-resolution images around 300 pixels wide to be used on a fucking A4 poster or a billboard.
I really wish I could bring myself to use Adobe stuff like Photoshop again. I honestly think software-as-a-service should be illegal. The service is the cloud storage provided by Adobe; the software is the software. It's actually on my hard drive, therefore it's not a service, it's a product. It's just companies abusing people's misunderstanding of software and how the cloud works. Slowly boil the frog until everything is considered a service. It makes so much more money...
Make some app, sell it for $10/person, and support that app with updates forever to work with new operating systems and different devices, pay taxes, pay for the office, salaries for employees, marketing, and more... Good luck :)
@@piotrbaranowski1 Imagine a future where Microsoft OneDrive is so "essential" to the OS that you have to pay a monthly bill to use your computer. We are already really close to that as it is. Companies like Microsoft, Apple, Google, and Amazon are boiling the frog to see what they can get away with before people notice. Eventually it will become such a societal norm that no one will blink twice at how delusional SaaS really is.
@@piotrbaranowski1 These companies are just abusing the fact that the average user isn't tech-savvy enough to understand what the 'cloud' and its services really are. Nothing good comes of that besides more dystopian consumerism. People aren't stupid, just kept uninformed about the reality of it; they assume it's too complex and the "tech person's job" to understand. Eventually your Ford truck will need the username and password of your Ford account to start. It's already getting close to that with Tesla... Eventually we won't be able to own things anymore... TV? Nope, sign into your Samsung account to "sync with the cloud"; no Samsung account, your TV won't work. Apple products are already like that... How long before all the major tech companies do that for your "convenience" and all we have left are the shitty off-brands?
@@Decco6306 People were stupid, they are, and they will be. Of course they want your data to push you ads; advertisers pay for it. That's the only reason you could watch this video for free: Google created a free platform for it, and it still pays the creators. If you value privacy, delete your Google account and go to the store for a newspaper or a book describing how JPG compression works. Now, when buying a device, you have access to many services, be it a phone, car, washing machine, or oven with wifi. You can always disconnect from the Internet. The most important thing is to read the terms and the privacy policy. I read them; if they suit me, I accept, and if not, then not.
My question is: how did we decide what data needs to be thrown away in the JPEG compression algorithm if our eyes couldn't perceive any of those data points in the first place? 🤔
We see the numbers in the data, yet we don't see their representation in the image, so we might as well get rid of them (a very, very crude explanation).
Because it is not an image captured by your eyes in the first place, but by a camera device with a sensor. For instance, the human eye cannot see infrared wavelengths, yet we can build sensors that produce images from infrared, because they can detect it. When a camera sensor receives light, it saves data from it; the details it can process and encode might not be the ones humans care most about, but those details might be handy if you want to modify your pictures. In addition, if you want to store image data, you should consider capturing it cleverly: our eye sees light not in a linear way but in a logarithmic way, as many of our sensors do, which means saving data linearly is not interesting. If we perceive that a spot is bright at a level A, you have to increase the intensity of light by more than a factor of two (much more) to get the perception of a spot twice as bright as A.
tl;dr: our eye sees a projection of reality, but sensors can see reality differently. All sensors (digital or organic) only sample a fraction of reality, but each does it in a different way. So when a sensor saves data, we just have to throw away the data it collected that our eye does not care about.
Look into subjects like visual perception, and also look at what 'raw file formats' are in professional cameras; there's a lot of invisible data in raw files that can be 'brought back to life' in post-processing.
It's not that you _can't_ perceive them, it's that we're not as sensitive to some kinds of data as others. A good example is color -- our eyes are more sensitive to some colors than others, yet we can see the whole visible spectrum. What that means is, LEDs can be optimized to produce the colors we see best, and less optimized for the colors we can't see as well, and it looks close enough to perfect to us. Producing perfect light sources is difficult and expensive, so that optimization leads to affordable products. But, it can only be an _optimization,_ not a complete disregard. We would miss cyan if it weren't there, for example. So, in a JPEG file, the high-frequency detail is basically just a way of saying, details that have sharp contrast in nearby pixels. You would have to zoom way in, making those details larger (and hence, lower-frequency) to be able to tell the difference between one that has all of its contrast intact, and one that has been "low-pass filtered" to result in a smaller data set. If you're zooming in (like when an image has a magnifying glass for detail inspection), or blowing it up to a large screen, or printing it on a page in a magazine, then you may need more of that detail despite not being able to see it on your phone's screen or in a 4x6" photo, for example.
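The "low-pass filtering" mentioned above is what JPEG's quantization step does: each frequency coefficient is divided by a table entry and rounded, and the divisors grow with frequency, so the hard-to-see high-frequency detail collapses to zero. A toy sketch; all the coefficient and table values here are made up for illustration, not taken from a real encoder:

```python
# Illustrative JPEG-style quantization: divide DCT coefficients by a
# quantization table and round. Large divisors for high-frequency
# entries push those coefficients to zero. Values are invented.

# A few DCT coefficients, ordered from low to high frequency.
coefficients = [620.0, -48.0, 15.0, -6.0, 3.0, -1.2, 0.8, 0.3]

# Quantization divisors grow with frequency (illustrative numbers).
quant_table = [16, 11, 10, 16, 24, 40, 51, 61]

quantized = [round(c / q) for c, q in zip(coefficients, quant_table)]
print(quantized)  # the high-frequency tail collapses to zeros

# Decoding multiplies back, permanently losing the rounded-away detail.
dequantized = [v * q for v, q in zip(quantized, quant_table)]
print(dequantized)
```

The long runs of zeros produced this way are also what makes the later run-length and entropy coding stages so effective.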
Just some weeks ago I purchased the HEIC plugin for Windows and my mind is blown. The file size is only about one third of that of a JPG file (depending on the content, of course) and I can't even see any artifacts when I zoom in. Most of the time I use Q80 for JPG, but with HEIC, Q40 is absolutely okay even with high-frequency images like dark tree branches and twigs against an almost white sky during winter.
@@frapooch What's your concern? Microsoft has your data anyway once you've created an account. Do you prefer to get your stuff from warez sites? If so, think again!
@@Osmone_Everony But... I remember there's a trick to get it for free (something like a hidden OEM page in the MS Store). Also, you can read those files with some third-party photo viewers.
Hey, really thanks, man! I was told to give a seminar on MPEG conversion, and even though I read many Wikipedia articles and watched many videos, I wasn't able to understand it. But your video about JPEG conversion has given me enough knowledge to explain the concept of MPEG conversion.
Hi, at 9:20, before quantization, I'm having trouble understanding the base image concept. Is one base image itself an 8×8 grid of pixels, meaning each base image has 64 values? Does the size of one base image correspond to the size of one block, matching exactly the same 64 pixels? Is it right to think that each reconstructed block is basically a stacking (sum) of all base images multiplied by their respective constants? Block = b1×c1 + b2×c2 + ... + b64×c64, where b is a base image and c is its respective weight.
Can someone please explain? I don't get it from 8:40. How are you recreating the block using the base images? How do those black/white line/checker combinations map to the pixels, which are currently just values on the black-to-white scale? Does the base image block map across to the 8×8 block, so that somehow the top-left pixel of each block is described by the solid white image and the bottom-right by the fine checkered image? And what does it mean that "the 64-pixel block is transformed into 64 values/constants that represent how much of each base image is used"? What does "how much of each base image" mean?
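The "weighted sum of base images" idea the comments above are asking about can be shown directly: each 8×8 block really is rebuilt as c1·b1 + c2·b2 + ... + c64·b64, where the b's are the 64 cosine patterns and the c's are the DCT coefficients. A toy, unoptimized sketch (the block's pixel values are arbitrary):

```python
# Build the 64 DCT "base images", transform an 8x8 block, then rebuild
# the block as a weighted sum of base images. Toy code, not a codec.
import math

N = 8

def alpha(k):
    # Normalization factor for the orthonormal DCT-II.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def basis_image(u, v):
    # The (u, v)-th 8x8 cosine pattern ("base image").
    return [[alpha(u) * alpha(v)
             * math.cos((2 * x + 1) * u * math.pi / (2 * N))
             * math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for x in range(N)] for y in range(N)]

def dct2(block):
    # Forward 2D DCT: each coefficient is the dot product of the block
    # with the corresponding base image.
    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            b = basis_image(u, v)
            coeffs[v][u] = sum(b[y][x] * block[y][x]
                               for y in range(N) for x in range(N))
    return coeffs

def reconstruct(coeffs):
    # Stack all 64 weighted base images back into one block.
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            b = basis_image(u, v)
            for y in range(N):
                for x in range(N):
                    out[y][x] += coeffs[v][u] * b[y][x]
    return out

# Round-trip check on an arbitrary block of pixel values.
block = [[(x * 7 + y * 13) % 256 for x in range(N)] for y in range(N)]
coeffs = dct2(block)
rebuilt = reconstruct(coeffs)
max_err = max(abs(rebuilt[y][x] - block[y][x])
              for y in range(N) for x in range(N))
print(max_err)  # tiny: the sum of weighted base images is the block
```

So each coefficient answers "how much of this cosine pattern is in the block", and summing the weighted patterns recovers the pixels exactly (before quantization throws some of them away).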
It never ceases to amaze me how much computation goes into things we take for granted and how inconceivably fast it happens over and over again...
Still images are fairly easy; the compression is relatively simple. Video is a lot harder to do quickly in software, especially compression, which is much slower. Computers, and especially smartphones, have ASICs (Application-Specific Integrated Circuits) to do it much more efficiently. Instead of a general-purpose CPU doing it, a circuit that is hardwired to do exactly one thing is used, offering massive efficiency gains, which are absolutely critical for use in low-power devices. Software is more flexible, and tends to be able to get higher compression, or better quality for the same compression, at the expense of processing time and power usage.
What amazes me the most to this day is how we can have phone calls moving at high speeds without an issue. It's like, how fast does it really happen??
@@rockraphlegal That's where processing power matters; that's why some chips are better at image manipulation than others.
No need to be amazed. Just take a simple can of soda: it took years of time and effort to arrive at today's design.
@@weizhen77 that too amazes me :P
This channel researches and (perfectly) presents exactly the things you wonder about in your spare time.
You are extremely right!
Ohhhh so this is where our google data goes!
@@sifanikalsa9790 88
@@sifanikalsa9790 damn this is hot
ruclips.net/video/fx6jtJrvJzU/видео.html
as someone whos on youtube a lot, these still manage to be some of the best videos. The research, editing, the pace of the lesson. Thank you guys for the hard work
As somebody who has worked on JPEG implementations, I am still amazed by how perfect they got it on the first try. JPEG has been around for ages, and is still so good; only recently have meaningful replacements emerged, and JPEG still has some benefits over these.
What is this replacement?
@@eduardmart1237 webp maybe ?
@@allkillhon5209 avif is good too, although not widely supported yet. webp is much better than jpeg, don't really know why it hasn't been adopted widely yet
@@zxbryc JXL is definitely very interesting, although I don't know too much about it. I just think with rising performance in almost everything, and hardware decoders for AV1 & AVIF in new phones it might be a viable option in the future, for fast image loading on bad mobile connections for example
webp is trash though... not much advantage vs jpeg and lots of compatibility issues. Instances where a site uses webp for its own images but doesn't allow webp image uploads are annoying. Windows can't open it out of the box, PS needs a plugin... And that despite the format being out for 12 years.
10% better compression might have been nice 10-15 years ago. Today people have 100 Mb/s on their phones and gigabit via fiber; whether an image takes 1 MB or 1.1 MB doesn't make any difference.
The Discrete Cosine Transform blew my mind! I’m a software engineer and knew a lot of the basics of JPEG compression already but this bit in particular really stood out, learnt something completely new. Great video and high-quality explanation!
Indeed. The DCT is the basis of nearly all image compression algorithms; only the failed JPEG 2000 used a different transform (wavelets).
There's just nothing like these videos, really. I'm blown away every time. The voice-over and writing are just so far beyond anything else out there. I wish all information could be presented in this calm, clear, and concise manner that also somehow manages to never dumb it down.
you mean his compression algorithm is pretty good?
dumb @@YszapHun
I have a degree in computer science and I'm still amazed how good your channel is! Always wondered how JPEG compression worked. You have illustrated it in super simple terms and visualizations have as always been awesome! Happy to see more content like that, maybe more details on H.264.
lol I dont have a degree, but I already knew this.
@@kangarht good for you
@@m.moonsie you look happy and healthy, not me, if you ever cared to aaask
Having a degree means nothing though. It doesn't guarantee that you know even a tenth of what's there in your own field.
@@pedor5965 lol
You cover topics that I've always wanted to know and in the right amount of detail.🥰
Not compressed too much ;)
true that
The perfect reason to cancel all school loans, if YouTube is going to teach everyone everything for free. You do know that one day there won't be a pandemic, right? Or a bunch of chaos coming from Trump and his followers? One day, you're just going to be competing with everyone who knows the same stuff as you.
Exactly so relatable for me
YES SIR
An absolute masterpiece of explanation. I've tried to understand how DCT works in JPEG before, and never understood the dry maths-only explanation. This video changed that and the image tables explained it perfectly. Thanks for filling in a gap that's been in my head for many years!
Up until now, I just thought JPEG lumps some similar colours together to save on file size. This was literally eye-opening. What an algorhythm!
That is like 5% of it. There is arithmetic coding, and other things.
You were probably confusing it with the GIF format. It stores images with a limited palette, so the number of colours in each image is reduced. But unlike JPEG it allows transparency and animation, so despite the greater file size it was popular on the early web.
Tbh, I didn’t even know JPEGs were compressed.
@@plashplash-fg6hd Moreover, JPEG is not just the image format, it is a compression algorithm. For example, one can use it in TIFF images. Damn, what obsolete trash! And it was obsolete trash already at the time of this publication.
@@Micro-Moo Never knew that.
Another factor in why compression works so well is that your brain fills in the "gaps" in information all the time, especially when watching video. Our eyes are actually worse than you think. The brain does a huge amount of work filling in various gaps in information. For example, each of our eyes has a blind spot. We usually do not see it, since the brain erases it and fills in the missing information.
We have built in image upscaler.
@@prateekpanwar646 with great edge detection neuron layer 😁
It also combines 2 separate images with slightly different angles into 1 👀
@@MLWJ1993 like 3d glasses but free
Is there anybody in this world I can trust?
Your videos are SPECTACULAR! It's obvious that you put a lot of work into them, and it shows. You seem to intuitively know how to achieve the perfect balance of technical detail and higher-level concepts to enable the viewer to develop a solid overall understanding of the subject matter. THANK YOU!
Great Video, just a little feedback on chroma subsampling: The reason why the human eye can perceive more luminance resolution than chroma isn't because of the number of rods vs cones. The rods are actually inactive in all but the darkest lit environments, and their density in the fovea i.e. the center of the visual field on the retina is actually rather low. So the RGB cone cells are solely responsible for tristimulus color vision.
The actual reason for the colour resolution difference is that there is some cell-level processing happening in the eye before the signal reaches the visual nerve. This is very much analogous to the lightness/chroma separation done in digital images. Also, there is some correlation between the spatial resolutions of the red, green, and blue cones in terms of lightness, but little correlation between them in terms of colour. Together, those effects cause the lower chroma resolution of the human visual system.
Thanks for your input
No
@@blackpepper2610 What are you saying?
"..actually inactive in all but the darkest lit environments, and their density in the fovea, i.e. the center of the visual field on the retina, is actually rather low." So that's why, when I'm in the dark, I see very dimly lit things more clearly when not looking directly at them, but the moment I move my vision towards them, they disappear?
What's the likelihood of having hexastimulus color vision... like prawns... but in humans? Is it possible to design a baby capable of this?
Never thought that literal ocular biology would play a part in an image file type. Great to know!
I'm a photographer and I always wanted to know this. I didn't understand anything. But great video. One day I will watch it again. Hopefully I will, then, understand it.
This is... AMAZING. It's probably almost impossible to explain this even just marginally better. The amount of well-considered effort you go through to make things more clear is extremely impressive. Subbed!
18 minutes to teach us what our smartphones do in milliseconds. Awesome!
You should try CPUs! Post a video!
It is incredible how complex things that we take for granted on a daily basis can be
And this is the reason why religions are accepted and exist in the first place.
Because it is more convenient for a LOT of people to think about complex topics as 'magic' and 'just works'...
@@temp50 I don’t think any religion says that
@@nickwilson3499 They don't say anything meaningful anyway.
It's remarkable to appreciate how the JPEG conversion process makes decisions based on how the human eye perceives what it is seeing.
I'm here because youtube won't stop recommending me this video!!!!!!!
JPEG stands for “Joint Photographic Experts Group”. It's a standard image format for containing lossy and compressed image data.
We needed an open standard. The CompuServe GIF image format was dominant, PNG was for professional use, and JPG was for review only.
@@lucasrem1870PNG and JPG are two very different kinds of compression, and as such, good for very different things.
PNG is a lossless compression which deals best with "blocky" images, i.e. where there are runs of consecutive pixels with the exact same color adjacent to each other. This includes, for example, (rendered) text or (vector, engineering) drawings on a solid-color background, and similar cases. These can be compressed to very small sizes with PNG without any artifacts, while the JPG-compressed counterpart will be worse in quality and in most cases even bigger in file size (than the PNG version).
However, compressing natural images, where there are a lot of color gradients and edges, while possible with PNG, is more "encapsulation" than real compression, as the file size will not be meaningfully smaller in most cases. This is the territory where JPG is better, provided that you are okay with its lossy nature.
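The "blocky vs. natural" distinction above comes down to how lossless DEFLATE compression (the algorithm behind PNG, exposed in Python as zlib) behaves on repetitive versus noisy data. A rough stdlib-only demo on synthetic byte strings, not a real image pipeline:

```python
# Lossless DEFLATE (zlib, as used by PNG) loves long runs of identical
# bytes and barely shrinks noise-like data. Synthetic data only.
import random
import zlib

random.seed(0)

flat = bytes([200] * 10_000)  # like a solid-color region
noisy = bytes(random.randrange(256) for _ in range(10_000))  # photo-like noise

flat_size = len(zlib.compress(flat, 9))
noisy_size = len(zlib.compress(noisy, 9))
print(flat_size, noisy_size)
# The flat run shrinks by orders of magnitude; random data barely shrinks.
```

This is why PNG crushes screenshots and line art but can't do much with photographs, where JPEG's lossy frequency-domain approach wins.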
Since the very first video I watched, this channel has been one of the very few for which I clicked the bell button without even thinking twice. Each subject is presented with great clarity, beautifully illustrated, and with great narration as well; well planned, edited, and organized by key topics.
I have no prior interest in this whatsoever, but you grabbed my attention so well that I had to watch the entire vid.
I remember when JPEG first came into existence and it would take forever to compress and decompress, so you would see the image go from a pixely mess into a nice image. This was the only image format shared across BBSes over dialup, that and GIF. Bitmaps (BMPs) could be found rarely, but they would take a decade to download. This is by far the most handy format for quickly sharing images.
JPEGs going from a pixely mess to the proper image is because the image is a Progressive JPEG and not a Baseline JPEG.
Baseline JPEGs are displayed line by line as they are downloaded while Progressive JPEGs contain a number of Level of Detail versions of the image. Progressive JPEGs will load and display the lowest LoD, and then when a higher LoD has finished downloading, it will switch to it, and it does this switching through the higher LoD images until the proper image is downloaded and displayed. Progressive JPEGs are larger than Baseline JPEGs because of the additional LoD images.
@@Oshroth Yes. The many LoDs were useful over slower connections, as they quickly showed you roughly what the image was, so you could make a choice without having to wait for the full image to load. This was back when browsing a webpage was faster because they optimized it for speedy loading of the text and progressive loading of images.
16:28 Well, vector images are a different beast altogether. They aren't rasterised, so _any_ bitmapping format will result in quality loss, even bmp and png. Vectors are made to be scaled and work by having nodes and lines that make up the image.
Instead, what you probably meant was that JPG doesn't perform well on digital images, such as pixel art, or anything with sharp contrast, such as text overlaid on another image.
Jpg also isn't preferred when working with intermediate copies. If you have a photo that you want to use in a video, for example, where it may undergo additional colour grading, you do not want to use a jpg image. This is because any additional grading can highlight artefacts, and because the image would be compressed twice; once by the jpg algorithm and again by the video compression.
I think he might have meant when saving a vector image to a bitmap format, to retain alpha.
I think what he meant was that when rasterising a vector graphic (which you literally have to do to view one) JPEG is a bad choice.
@@LucienHughes Yeah, but any format is a bad choice for saving vectors, except for vector formats.
You rasterise it for viewing (unless you use a vector display, like an analog oscilloscope), but you can still scale it without losing detail. You can't do that with a bitmapped format, not even png.
@@DaedalusYoung Distance fields are an interesting compromise between vectors and bitmaps. Some game developers use that for fonts for example.
How about TIFF?
I am amazed by the 3D transitions between different 2D text/diagram sections. This form of presenting learning feels closer to learning in VR. I wonder if one day this channel will create VR episodes which we can move around in. Great delivery of a new way of learning!
The amount of effort you put into your videos is amazing.
the JPG at the beginning has richer contrast and deeper colors
Thank you! I was wondering if anyone else noticed this. It was so immediately apparent
to me, the compressed example looks more colorful and with more contrast. If anything, the uncompressed image has slightly better dynamic range in the shadows
Cameras, when shooting JPG, apply saturation, sharpening, and similar algorithms: things that you would do to a RAW file in editing software.
no xD
As someone who has studied image and video compression at university, I can say this is a GREAT video and it's spot on. It explains the basics in just the right amount of detail to get a general understanding. Would have helped me a lot in the beginning.
Any book to recommend?
Fantastic explanation of discrete cosine transform. This is the best high-level explanation of DCT that I’ve ever seen.
I never thought I could understand the basics of the JPEG algorithm in less than 20 minutes.
Brilliant video!
Nice, I love the detail and simplicity in the explanation of how it works.
Small tip for far future video idea: accelerometers and gyro/magnetic sensors.... I always loved to read on how they managed to make those sensors.
I think they have like 2 pieces of metal that change resistance based on the orientation. Not sure though, I just remember reading that somewhere.
@@sierra991 They actually use MEMS at a scale of one or two µm, which is insane (as is most stuff at that scale), but at that small scale they need to be super accurate and still robust.
It's just mind-blowing that we managed to create transistors a few atoms in size, e.g. for SSDs or OLED screens, but mechanical components are just another level, as they need to move in a very specific way. And the best part is that MEMS technology is "low-cost"... it's easier to make than growing a damn tomato plant -_-
and they say perfection isn't achievable.
this is an honestly amazing video and now I'm going to write a jpg parser and decompressor just for fun because you made it.
I also do embedded development, and a microprocessor I was working with had hardware JPG decoding, yet I never really thought it could be something this simple, because when I googled it, all these complicated terms came up. I didn't stop to think that hardware implementations are usually only feasible with simple algorithms. But now I'll know better in the future.
I've also been aching to implement h.264 decoding on an FPGA for direct network playback (so I don't need a nasty closed-source-firmware Raspberry Pi to do it), so I can't wait for that video too! I've been trying to read up on the subject, but good resources are essentially non-existent. This video alone already helped me a lot in that regard, so even if you only release the h.264 one in a few months and I'm done with at least a simple software implementation by then, I'll still watch it just because your videos are great, and much of that will still be owed to this very video right here.
thank you!!!
" I didn't stop to think that hardware implementation are usually only feasible with simple algorithms."
Any algorithm can be implemented in hardware (at the chip/transistor level), simple or complex. Logic gate structures can form Turing-complete computation, just the same as the general processors that use them. This can also drastically speed up the computation due to fewer clock cycles for the same task (no overhead for instruction sets and less data shuffling).
Still had my window up and figured I'd clarify another thing:
"yet I never really thought it could be something this simple because when I googled it all these complicated terms came up."
The actual implementation is much more complex than this video lets on. It's an excellent video for a general overview, but he skips through pretty much all of the math and even gets a few things wrong (like misusing the term 'vector' when discussing issues with JPEG compressing a raster... but vector is entirely orthogonal to raster). The actual implementation deals with the math of the Fourier series and Fourier transform, whereby an arbitrary signal can be represented by a summed series of cosine (and, for complex signals, sine) terms, each with a specific amplitude coefficient and phase. If you write a JPEG parser, it would likely be beneficial to understand this math, though you can probably implement the algorithm without understanding it, just by working through the format documentation.
As for the h.264, I'd be surprised if there isn't already an FPGA implementation of that floating around out there. However, it is licensed, so maybe they're protective of direct implementations just floating around, not sure.
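The "summed series of cosines" idea from the comment above can be made concrete with a toy 1D orthonormal DCT: the amplitude coefficients are computed from the samples, and summing the weighted cosines back together recovers the signal. The sample values are arbitrary, chosen only for illustration:

```python
# Any 8-sample signal can be written as a weighted sum of 8 cosine
# waves; the weights are its 1D DCT-II coefficients. Toy code.
import math

N = 8
signal = [52, 55, 61, 66, 70, 61, 64, 73]  # arbitrary sample values

def norm(k):
    # Normalization factor for the orthonormal DCT-II.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def coeff(k):
    # Amplitude of the k-th cosine component.
    return norm(k) * sum(
        signal[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        for n in range(N))

amps = [coeff(k) for k in range(N)]

def sample(n):
    # Rebuild sample n by summing the weighted cosines back together.
    return sum(norm(k) * amps[k]
               * math.cos((2 * n + 1) * k * math.pi / (2 * N))
               for k in range(N))

rebuilt = [sample(n) for n in range(N)]
print([round(v, 6) for v in rebuilt])  # matches the original signal
```

The 2D transform JPEG uses is just this applied along rows and then columns of each 8×8 block.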
This is the clearest, yet most comprehensive, explanation of JPEG compression that I've seen so far. I finally feel like I understand it. Thank you!
One small quibble that is unrelated to the topic of JPEG compression:
The video suggests that the rod cells in the human eye act something like the "luminance" channel of human vision to the cone cells "RGB" channel. The reality is quite a bit different.
Rod cells are the "low light level" cells of the human eye, and do not contribute much to our visual perception in bright conditions. Indeed, bright lighting conditions, such as those of a regular phone or computer screen, are bright enough to overwhelm the rod cells and stop them from contributing a signal to our vision. They essentially "turn off" in such conditions. Conversely The cone cells are the "bright light level" cells and do not perform well in low light conditions.
Also, the rod cells, though more numerous, do not work in isolation. In low-light conditions, the human eye performs a kind of "binning" with rod cells, where it combines the signal from a large number of individual cells to produce a signal that is passed on to the brain. Indeed, this binning can combine the light detection capabilities of so many cells that the resolution we get in low-light conditions (where the light-loving cone cells lose sensitivity) is far lower than one would expect given the number of cells involved. Try reading a printed page by moonlight alone and you'll see what I mean. (There's "temporal" binning as well, where the early processing of the eye's signal combines the light over a longer period, up to about 1/6th of a second, to produce a stronger signal; any movement during that time will "smear" the image.)
So, when dealing with anything displayed on a computer screen, it is the "bright light" cone cells which do the heavy lifting. This does not affect the sharpness of our vision, however. That there are only "about 6 million cone cells" (and astonishingly few "blue" cone cells, BTW) is made up for by the fact that they are largely crammed into a tight area at the center of our vision known as the "fovea". The fovea covers only a few degrees of our vision- so the resolution in that region is quite high. Indeed, I've seen estimates that suggest if we spread the resolution of our fovea over our entire visual field, the result would be the equivalent of a 100 megapixel image or more. (TV manufacturers set their recommended viewing distance based on this sort of calculation, BTW.)
Our perception of sharpness over a broad image is created by the fact that our eyes dart around (that is, perform "saccades") picking up bits and pieces of an image and assembling them in our brains. The parts of the image that our eyes do not land on are "filled in" with plausible sharpness or ignored.
Try reading the paragraph above this one while staring fixedly at the period at the end of this sentence. (Don't cheat!) If you're a normal human being you can't do it, despite the fact the text above this paragraph is quite close to the center of your vision. Your eyes do not have many cone cells outside the range of that period and there's simply not enough retinal resolution there to make out the fine details of printed letters. You do not sense anything amiss, however, because your brain "fills in" the blurry areas with a "sense" of sharpness.
The intensity levels are simply a measurement of each cone cell's response, with quite a bit of "local contrast enhancement" (a very interesting process, which would require many paragraphs of explanation to outline) done by the retina.
Incidentally, the cone cells are not truly RGB (red, green, blue), in that their peak sensitivities are not evenly distributed across the visible spectrum. They are, instead, more like Y-YG-B (yellow, yellow-green, blue), but, with three different ranges of peak sensitivity, they are able to unambiguously distinguish all the colors of the rainbow.
The rest of the details in the video which deal with our perception of spatial resolution and frequency is on point. It just doesn't have anything to do with the distinction between rod and cone cells.
Thanks for this extra (huge) bit of information, it was similarly well explained and comprehensive as the video itself.
WOW! All that computation required for a single pixel is amazing, and it's completed so quickly. Just imagine: when we went to the Moon, the rocket scientists relied mostly on their manual mathematical skills and slide rules. A cell phone's processor is a hundred million times more powerful, yet it fits in your pocket. Technology is just amazing! Love this channel!
I'm a programmer and I've never bothered deep diving into jpeg, always treating it like some kind of magic.
This just cleared my mind on everything about it in less than 10 minutes. I can now explain it to people, and it's almost worrying how fast this went.
Damn, you're good.
Every time I remain surprised by the amount of quality and work behind your videos, wow
Also, you should take a super deep-dive into how Laserdisc works. I get the basics of it, but still would be nice to dig into the actual nuts and bolts of the format. :D
This comment is to tell you that I love what you have done. Thank you for including Tom Scott and Blender.
The way you organized your presentation -mentioning a concept and saying you'll come back to it so you can continue in greater detail- was very appealing to my ADHD brain. You held my attention!
Amazing video as always!! The detail of the explanation is precise and technically correct. I think that, once you understand how JPEG works, MPEG should not be very difficult to understand.
If you plan on doing videos about electrical engineering topics, I'd love to see a video about OFDM access, and how it is used in 4G to let hundreds of users connect to an eNodeB at the same time. Can't wait for the next video though!
Great video. I basically knew everything, as my research is in video compression. I love the fact that he said video compression is extremely complex while giving H.264 as an example, which is quite old now; it's still the most used for sure, but the newest codec, VVC (Versatile Video Coding), is many times more complex than even HEVC, which is itself newer than H.264. My work is in VVC; believe me, it's a nightmare. Don't take the videos you watch on the internet for granted ;)
I am curious to learn about the potential of the new codecs being developed. I tried some Google searching, but it's difficult to find understandable sources of information. Do you have any recommendations to start with?
Used to be I followed these things out of curiosity - where does one do that now? 🤔
Interesting, I hadn't heard about VVC until now. As the successor to HEVC, which competes mostly with VP9, is VVC seen as a competitor to AV1 (since AV1 is the successor to VP9)?
@@barbecue1617 ruclips.net/video/dQw4w9WgXcQ/видео.html
"Can you see the difference?" - Actually not, the video is compressed as well
You can if you look closely enough.
No, that's not quite it. You're right that the video is compressed, but then the low-quality picture gets compressed as well, so the ratio between them is still maintained.
And I am watching at 360p ....
incredible, I would never have thought that there is so much complexity behind a jpeg image
The people who invented JPEG, MPEG, and MP3 opened up a realm of possibilities for us. Really magnificent work that should be marked in history.
You can clearly see those H.264 compression artifacts in an MKBHD video where he re-uploaded the same video 1000 times.👍
Good thing to emphasize about the DCT is that it's also reversible: you get the same amount of information out as you put in, just arranged in a different way. (I explain this to myself by noticing that the frequency samples are all linearly independent; none of them can be added together to make a third, so there's no redundancy, and you've just switched from one 8x8 block of data to another 8x8 block of data.) This sort of thing (Fourier transforms and their cousins) is super important for signal processing.
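The linear-independence claim above can be checked numerically: the 8 orthonormal DCT-II basis vectors are pairwise orthogonal (dot product zero) and unit-length, so no frequency pattern is redundant with another. A small stdlib-only sketch:

```python
# Verify orthonormality of the 8-point DCT-II basis: off-diagonal dot
# products vanish, and each basis vector has unit norm.
import math

N = 8

def basis(k):
    # The k-th orthonormal DCT-II basis vector.
    a = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [a * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)]

vectors = [basis(k) for k in range(N)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Largest dot product between distinct basis vectors (should be ~0),
# and the squared norm of each vector (should be ~1).
off_diag = max(abs(dot(vectors[i], vectors[j]))
               for i in range(N) for j in range(N) if i != j)
norms = [dot(v, v) for v in vectors]
print(off_diag, norms)
```

Because the basis is orthonormal, the transform is just a rotation of the data: nothing is lost until quantization deliberately discards coefficients.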
Great explanation on JPEG. I knew it compressed in blocks, and I vaguely knew it averaged color, but seeing this level of detail is amazing. Well done!
I'm so glad that because of some great people out there, this channel exists.👏👏
These videos literally break my brain.
Literally?
Fantastic video. As an electronics engineer it's pretty amazing to see topics such as frequency response being used every day in our lives
I love this channel! Super informative, great presentation, and some of the best content on RUclips. Thanks for making these videos and please keep up the good work!
Usually videos like this claim to explain something but they just talk about it and don't explain anything... but THIS video actually explains the techniques behind it. I love it... videos like these are rare and hard to find. Thank you for all the work you put into this
The JPEG compression algorithm is what made me love image processing. It's one of the greatest examples in computer science where you can see a considerable amount of mathematics used together to do something that is extremely useful in practice, with effects that are both visually and practically tangible to anyone.
I like compressed photos. Every time I see such a small amount of data that can express such rich photo content, I feel very comfortable.❤
as a person working with image processing, this is absolutely fascinating. Also, that is a great explanation of DCT!
Well put together and very structured. I'd be interested to know what caused the change in the audio quality on the voice over (example 7:44 / 7:46).
Those parts were probably recorded at a separate time, with a different microphone and/or settings. It's noticeable enough that it honestly kinda distracts me.
@@Sukigu That part about separate takes is obvious; I noticed this with other channels as well. There is an unintentional sound effect on the 'second take' that I can sometimes hear even in movies (like a sort of high-frequency tremolo on male voices) and I always wondered what caused it.
I never thought that there was so much computing behind each image. I am truly humbled by your work team. Thanks!
This channel really goes in-depth into the tiny things a computer does. Makes you appreciate even more the geniuses who invented these things
Came across your channel this week. Must say that your explanations are exceptional in the sense that they convey technical details very clearly and also very quickly grab the attention of the viewer. Exceptional, I tell you. Keep it up. You should be used as the benchmark for teachers and lecturers all around the world.
This video is great and I would really like to see a follow-up video on another popular image format, PNG, which is orders of magnitude better than JPEG for vector-style graphics. It is so widely supported while having quite different strengths and weaknesses compared to JPEG.
Yes. A similar video on PNG would be great!
filtering plus DEFLATE (LZ77 + Huffman), which squeezes runs of identical pixels extremely well
more widely supported but blur
png is a raster image format and not a vector image format. svg is a vector image format.
@@joshix833 But for being a raster format, it's fantastic at compressing vector-type images. No compression artifacts, and incredibly small file sizes
I must say that a 10:1 compression ratio using JPG is quite impressive. For special images that are easily prone to artifacting, I just use the highest quality setting and that pretty much does the "trick". It would have been nice had they included a widely supported lossless option in original JPG (the standard did define a lossless predictive mode, but almost nothing implemented it) for people that want a mathematically identical representation of the original image, but 1/2 to 1/3rd the size of a 24-bit RGB image. So basically getting it down to the file size of 8- or 12-bit color, but still retaining the original 24-bit color information.
you can compress .pngs though
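A toy illustration of why flat, vector-style graphics compress so well losslessly. Real PNG uses per-row filtering followed by DEFLATE (LZ77 + Huffman), not plain run-length encoding, but the intuition carries over: long runs of identical pixels collapse to almost nothing.

```python
def rle_encode(pixels):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    """Exact inverse: lossless reconstruction of the pixel stream."""
    return [p for p, n in runs for _ in range(n)]

# A 1000-pixel scanline from a flat-colored logo: mostly one color.
scanline = [255] * 990 + [0] * 10
encoded = rle_encode(scanline)
print(len(encoded))                      # 2 runs instead of 1000 samples
print(rle_decode(encoded) == scanline)   # True: nothing was lost
```

A photo, with noise in every pixel, has almost no such runs, which is why JPEG's lossy approach wins there while PNG wins on screenshots and logos.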
Ah, JPEG. The greatest friend of a web designer, the worst enemy of a digital artist.
webp is worse... XD
or fake-transparent png's
@@Mic_Glow omg I really hate those types of pngs like why just why
The worst enemy of a photographer who is asked to post-process jpeg images from cell phones and make them look as if the scene was taken with a DSLR with a proper lens and saved as RAW data. A nightmare.
3:43 - Rare branch education fail - Those are lilies, not tulips! (I love you guys and it's fun to finally catch you on something lol)
Couldn't understand 80% of it, but I am happy someone put so much effort into this for us viewers to understand.
Due to the square/rectangular artifacts associated with lower-quality instances of the format, I'd always assumed JPG worked in some recursive way where it divides the image up into quadrants, throws out the least-used/prominent colors in each, and gets more "aggressive" about throwing out colors with each subdivision of quadrants, as the smaller the area the harder a discrepancy is to notice.
That's another format, called JPEG 2000. It works by subdividing the original image into four quadrants, recursively subdividing the top-left quadrant into another four quadrants, and so on. Each quadrant is described by a weighted combination of wavelets
@@gianluca.g Ah, interesting - thanks for the info!
@@gianluca.g this is what we call a quadtree. i heavily work with bin/quad/oct tree with my 3d engine. the first box is call the root, the smallest division/box are the leaf.
i made a combination of all of those which i call a polymorphic tree (polytree for short)
@@jackl2254 as a software engineer myself I find quadtrees really elegant! Jpeg2000 however doesn't really use quadtrees, it just divides recursively the top left quadrant, not the other 3 quadrants. The top left quadrant get subdivided while the other 3 quadrants are described by some wavelet polinomial (similar to the cosine table in jpeg).
OT: I remember binary space partitioning (BSP) trees being used in Doom to speed up sector rendering.
@@gianluca.g I see, I thought they were all divided and subdivided etc... that seems to be a weird algorithm. heh
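A toy sketch of the "recurse only on the top-left (low-pass) quadrant" structure, using a simple Haar average/difference step. JPEG 2000 actually uses the smarter 5/3 and 9/7 wavelets, not Haar, but the decomposition shape is the same: each pass yields one low-pass quadrant plus three detail quadrants, then recurses on the low-pass part.

```python
def haar_level(img):
    """One 2-D Haar pass: returns (LL, (LH, HL, HH)) quadrants, each half-size."""
    h, w = len(img), len(img[0])
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 4   # average (low-pass)
            lh[i // 2][j // 2] = (a - b + c - d) / 4   # horizontal detail
            hl[i // 2][j // 2] = (a + b - c - d) / 4   # vertical detail
            hh[i // 2][j // 2] = (a - b - c + d) / 4   # diagonal detail
    return ll, (lh, hl, hh)

def decompose(img, levels):
    """Recursively transform only the low-pass quadrant, like JPEG 2000's DWT."""
    subbands = []
    for _ in range(levels):
        img, details = haar_level(img)
        subbands.append(details)
    return img, subbands  # final LL plus the detail quadrants per level

flat = [[128.0] * 8 for _ in range(8)]  # a perfectly flat 8x8 image...
ll, bands = decompose(flat, 2)
# ...puts all its energy in LL; every detail coefficient is zero and compresses away.
print(ll[0][0])  # 128.0
```

Smooth regions end up as near-zero detail coefficients at every level, which is what the entropy coder then squeezes.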
Even though I always knew the basic idea behind compression this video finally gave me some real understanding - well done for a very complex subject
Back in the early 1990's I designed and worked on 'Slowscan' security systems over dialup modems.
This was in the early days before the JPEG standard was published. We used our own Discrete Cosine Transform (DCT) based image compression, empirically tuned for the product's target usage. We actually used a hardware circuit to detect when an 8x8 block had changed enough to mark it as 'to be updated'. Due to the low processor computing power available at the time, the DCT was performed by an INMOS DCT/IDCT transform chip. The compression stage was left to the modem's built-in algorithm, such as V42bis.
Awesome history lesson, thank you for writing this.
That’s amazing… amazes me that humans created all this.
Great content mate. It's truly mesmerizing how complex something seemingly simple can actually be.
Very well explained. Maybe you should improve your voice; it's the only indication that you're not human. I appreciate you using your resources to help!
Can't wait for the x264 and x265 video: how P (predicted), I (intra), and B (bidirectional) frames work, along with options like AQ mode, AQ strength, block size, B-pyramid, etc.; how well they compress data; and why x264/x265 sometimes compress dark scenes badly, with artifacts, or why x265 tends to blur out certain scenes, failing to apply DCT and quantization effectively enough to compress them without perceptible detail loss (an area where I believe such codecs need improvement). We really need an in-depth research video on it for the innovation of future technologies!
Phenomenally well explained as usual!
Those signal-processing algorithms behind the scenes are amazing. The transforms taught in colleges especially should not rely only on mathematical calculations; instead there should be more visual explanations like this. Amazing video, by the way!
When you got to step three I had horrible flashbacks to college, when I learned about Fourier transforms. I can’t believe they are used here; that actually blows my mind
The making of this video is almost more fascinating than the subject itself.
I'm glad you mentioned Tom Scott when talking about the Huffman algorithm; I watched that video, and also the one on the interframe compression that makes any confetti video look ugly😂
0:14 Not with RUclips compression
JPEG amazes the shit out of me.
This guy has the voice of a god of kindness; I could listen to this all day!
I've just seen the video and realized that you released the video just two days after I submitted my bachelor thesis about forensic jpeg algorithms :D Would have been great to have these visualizations back then but nonetheless great and simple explanation with really good animations!
Do I look like I want to know what a jpeg is,
I just want a picture of a god dang hotdog 🌭
Compression is amazing technology. I remember when The Surge 1 was released. It was only 6 gigs or so, but the game was really large, with many enemy types, a lot of music, and many weapons and armor. I was racking my brain trying to figure out how in the world they could fit such a large game into such a small file size. I am still at a loss as to how they did it. Even with this upload, it still feels like magic haha
Great video! JPEG is actually significantly simpler than I thought, thanks to your explanation. Would be great to have a video outlook into machine-learning based methods and the future of compression.
This is amazing. As a graphic designer I only knew to use JPEG files for compressing real-life images for social media, PNG for transparency, and AI/PDF files for vectors... but I never knew how it actually works! Amazing video, truly appreciate it, and you have a new subscriber 😊❤
Digital imagery takes up a great deal of my time for what I do, and I know from practice what file formats and compression do, but I never really looked into how they work. Thank you!
I hope every human being in this modern age watches this. As a designer, it sucks to receive low-resolution images around 300 pixels wide to be used on a fucking A4 poster or a billboard.
I see 240p haunts you as well 😶🌫️
I really wish I could bring myself to use Adobe stuff again, like Photoshop. I honestly think software-as-a-service should be illegal. The service is the cloud storage provided by Adobe; the software is the software. It's actually on my hard drive, therefore it's not a service, it's a product. It's just companies abusing people's misunderstanding of software and how the cloud works. Slowly boil the frog until everything is considered a service. It makes so much more money....
Make some app, sell it for $10/person, and support that app with updates forever to work with new operating systems and different devices, pay taxes, pay for the office, salaries for employees, marketing, and more... Good luck :)
@@piotrbaranowski1 Imagine a future where Microsoft OneDrive is so "essential" to the OS that you have to pay a monthly bill to use your computer. We are already really close to that as it is. Companies like Microsoft, Apple, Google, and Amazon are boiling the frog to see what they can get away with before people notice. Eventually it will become such a societal norm that no one will blink twice at how delusional software-as-a-service really is.
@@Decco6306 Piracy exists, and that's the reason why I rarely pay for software.
@@piotrbaranowski1 These companies are just abusing the fact that the average user isn't tech-savvy enough to understand what the 'cloud' and its services really are. Nothing good comes out of that besides more dystopian consumerism. People aren't stupid, just kept uninformed about the reality of the thing; they assume it's too complex and the "tech person's job" to understand. Eventually your Ford truck will need the username and password of your Ford account to start. It's already getting close to that with Tesla... Eventually we won't be able to own things anymore... TV? Nope, sign into your Samsung account to "sync with the cloud". No Samsung account, your TV won't work. Apple products are already like that... How long before all the major tech companies do that for your "convenience" and all we have left are the shitty off-brands?
@@Decco6306 People were stupid, they are, and they will be. Of course they want your data to push you ads; advertisers pay for it. That's the only reason you could watch this video for free: Google created a free platform for it and it still pays the creators. If you value privacy, delete your Google account and go to the store for a newspaper or a book describing how JPG compression works. Now, when buying a device, you get access to many services, be it a phone, car, washing machine, or oven with WiFi. You can always disconnect from the Internet. The most important thing is to read the terms and the privacy policy. I read them; if they suit me, I accept, and if not, then not.
My question is: how did we decide what data to throw away in the JPEG compression algorithm if our eyes couldn't perceive any of those data points in the first place? 🤔
Numbers
We see the numbers in the data, yet we don't see their contribution in the image, so we might as well get rid of them (a very... very crude explanation)
Because it is not an image captured by your eyes in the first place, but by a camera with a sensor. For instance, the human eye cannot see infrared wavelengths, yet we can build sensors that produce images from infrared because they can detect it. When a camera sensor receives light it saves data, and the details it can process and encode might not be the ones humans care most about.
However, these details might be handy if you want to modify your pictures.
In addition, if you want to store image data, you should capture it cleverly: our eye sees light not linearly but roughly logarithmically, like many of our senses, which means saving the data linearly is wasteful.
If we perceive a spot as bright at level A, you have to increase the light intensity by much more than a factor of two to get the perception of a spot twice as bright as A.
tl;dr: our eye sees a projection of reality, but sensors can see reality differently. All sensors (digital or organic) only sample a fraction of reality, each in its own way, so when a sensor saves data, we can throw away the parts it collected that our eye does not care about.
Look up subjects like visual perception, and also look up 'raw file formats' in professional cameras; there's a lot of invisible data in raw files that can be 'brought back to life' in post-processing.
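On the logarithmic-perception point: it's why stored pixel values are gamma-encoded rather than linear. A minimal sketch, using the common gamma = 2.2 approximation rather than the exact piecewise sRGB curve:

```python
def encode(linear):
    """Map linear light (0..1) to a stored 8-bit code, spending more codes on darks."""
    return round(255 * linear ** (1 / 2.2))

def decode(value):
    """Map a stored 8-bit code back to linear light (0..1)."""
    return (value / 255) ** 2.2

# The middle code value corresponds to only about 22% of maximum linear light:
print(round(decode(128), 2))
```

If codes were assigned linearly instead, half of them would be wasted on bright levels our eyes can barely tell apart, while the darks we're sensitive to would visibly band.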
It's not that you _can't_ perceive them, it's that we're not as sensitive to some kinds of data as others.
A good example is color -- our eyes are more sensitive to some colors than others, yet we can see the whole visible spectrum. What that means is, LEDs can be optimized to produce the colors we see best, and less optimized for the colors we can't see as well, and it looks close enough to perfect to us. Producing perfect light sources is difficult and expensive, so that optimization leads to affordable products. But, it can only be an _optimization,_ not a complete disregard. We would miss cyan if it weren't there, for example.
So, in a JPEG file, the high-frequency detail is basically just a way of saying, details that have sharp contrast in nearby pixels. You would have to zoom way in, making those details larger (and hence, lower-frequency) to be able to tell the difference between one that has all of its contrast intact, and one that has been "low-pass filtered" to result in a smaller data set. If you're zooming in (like when an image has a magnifying glass for detail inspection), or blowing it up to a large screen, or printing it on a page in a magazine, then you may need more of that detail despite not being able to see it on your phone's screen or in a 4x6" photo, for example.
Nice video. I did have to put it on 1.25 speed to keep my attention.
You didn’t ask for this video but it showed up and here you are.
Do I look like I know what a jpeg is?
I just want a picture of a gat dang hot dog
Just some weeks ago I purchased the HEIC plugin for Windows and my mind is blown. The file size is only about one third of that of a JPG file (depending on the content, of course) and I can't even see any artifacts when I zoom in. Most of the time I use Q80 for JPG, but with HEIC Q40 is absolutely okay even with high-frequency images like dark tree branches and twigs against an almost white sky in winter.
you keep the raw meta data!
@@lucasrem1870 I'm not quite sure what you're trying to say with this.
🤔 Buying plugins in MS Store
@@frapooch What's your concern? Microsoft got your data anyway once you've created an account. Do you prefer to get your stuff from wares sites? If so, think again!
@@Osmone_Everony But... I remember there's a trick to get it for free. (Something like a hidden OEM page in the MS Store)
Also, you can read those files with some 3rd-party photo viewers
Can you cover the new high-efficiency image compression (.heic) file format that is more advanced than .jpeg?
.heic is just raw compression, keeping the metadata; that's why it's more advanced.
It just uses the more modern HEVC. Frankly speaking, that is mostly a video codec.
@@lolerie 1980 freaks here!
bro youtube has been recommending this video to me for a month I don't know what but something about this video must be special
Hey, really thanks man! I was told to give a seminar on MPEG conversion. Even though I read many Wikipedia articles and watched many videos I wasn't able to understand it, but your video about JPEG conversion has given me enough knowledge to explain the concept of MPEG conversion
my brain hurts
Hi, at 9:20 before quantization, I'm having trouble understanding the base image concept. Is one base image itself an 8x8 grid of pixels, meaning each base image has 64 values? So does the size of one base image correspond to the size of one block, matching exactly the same 64 pixels? Is it right to think that each reconstructed block is basically a sum of all the base images, each multiplied by its respective constant?
Block = b1·c1 + b2·c2 + ... + b64·c64
where b is a base image and c is its respective weight.
Yes, you absolutely are right. The 8x8 original block is reconstructed by a linear combination of the base images
Can someone please explain, I don't get it from 8:40.
How are you recreating the block using the base images? How do those black/white line/checker combinations map to the pixels, which are currently just values on the black to white scale. And does the base image block map across to the 8x8 block so somehow the top left pixel of each block is described by the solid white image and the bottom right described by the fine checkered image? And what does it mean that "the 64 pixel block is transformed into 64 values / constants that represent how much of each base image is used"? What does is mean "how much of each base image?"
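A sketch might help: each base image is itself an 8x8 grid of 64 pixel values, and "how much of each base image" is just a weight (the dot product of the block with that base image). The block is rebuilt as the pixel-wise weighted sum of all 64 base images. A toy version, using the standard 2-D DCT basis formulas:

```python
import math

N = 8

def basis_image(u, v):
    """The (u, v) base image: an 8x8 product of a horizontal and a vertical cosine."""
    au = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    av = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
    return [[au * av
             * math.cos((2 * x + 1) * u * math.pi / (2 * N))
             * math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for x in range(N)] for y in range(N)]

def coefficient(block, u, v):
    """How much of base image (u, v) is present in the block: their dot product."""
    b = basis_image(u, v)
    return sum(block[y][x] * b[y][x] for y in range(N) for x in range(N))

def reconstruct(coeffs):
    """Block = c1*b1 + c2*b2 + ... + c64*b64, summed pixel by pixel."""
    out = [[0.0] * N for _ in range(N)]
    for (u, v), c in coeffs.items():
        b = basis_image(u, v)
        for y in range(N):
            for x in range(N):
                out[y][x] += c * b[y][x]
    return out

block = [[(x + y) * 10.0 for x in range(N)] for y in range(N)]  # a smooth gradient
coeffs = {(u, v): coefficient(block, u, v) for u in range(N) for v in range(N)}
rebuilt = reconstruct(coeffs)
# With all 64 coefficients kept, the block comes back exactly (up to float error).
```

So the top-left base image (all-white, u = v = 0) carries the block's average brightness, and the fine checkered ones carry sharp pixel-to-pixel contrast; quantization later shrinks mostly those fine-pattern weights.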
do this for png too!
I am really amazed by the research and hard work you put into making this video. Subscribed !!
I always know what channel to go to when I want my brain to melt. Thanks B.E.