Normally they will backport the security fix to whatever version that is in the repository which results in some frankenversion because updating to a newer version will break policy because reasons. Ubuntu does occasionally allow bypassing this and update if there is a very compelling reason via a stable release updates process, debian only does it extremely rarely and are super strict about it. Apparently the version in debian stable is not affected as the code wasn't introduced yet...
I'm not a C developer, but you have an amazing way of explaining what would otherwise be a very dry topic in a fascinatingly entertaining & interesting manner. It's very cool to be able to watch "over your shoulder" and get some insights into the world of code, vulnerabilities, how to mitigate and so on. Keep on doing what you do, it's fab.
I disagree with your takes. This is algorithmic code. These variables can't really have meaningful names. Even if they had meaningful names, you would not understand the complex algorithm anyway. An even bigger problem is that you always want to optimize algorithms, because they take a lot of CPU time, and optimized code becomes less readable. I have written decompression functions in the past, and that's how this type of code always looks. 6:45 - they are using macros to optimize CPU cycles, a common practice in decompression code.
god i hate youtube. I literally reworded my comment 6 times just to not get shadow banned. i had to remove few key words and make my comment less informative
Hello! Can you please add some more detail or give some reading on the topic of optimizing CPU cycles with the macros? I am a beginner in C and haven't written any macros and I am curious why and how they are used here. Also, I am skeptical at all people who say that this is "just bad code".
@@S460-v2q I am not really a C dev, I mostly program in C# (but I had to read a lot of C code). I don't know all the tricks you can do with macros, or where to read about them, but in short, if the CPU sees consecutive instructions which are independent of each other, it can run them in parallel. Instead of writing a loop for i from 0 to 16, in most cases your code will work faster if you unroll this loop. The easiest way to do that is to use macros.
@@Z3rgatul I don't really care if you are a C dev, the important part for me was that you had knowledge about C. For that matter I am a nothing dev lol. I read a little bit about what you said and found the GCC docs containing a pragma for unrolling loops, and it looks cleaner. It was only added in GCC 8 though, so the devs probably didn't have it and had to do it by hand.
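For anyone curious what the two comments above look like in code, here is a minimal sketch. It is not taken from 7-Zip: `STEP`, `REPEAT4`, `sum16_unrolled` and `sum16_loop` are made-up names for illustration. The first function unrolls a 16-element loop by hand with macros, the second leaves it as a loop and asks GCC 8 or newer to unroll it with `#pragma GCC unroll`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macros: each STEP adds one element, and REPEAT4
   pastes four STEPs, so no loop counter or branch is left after expansion. */
#define STEP(i) sum += data[(i)];
#define REPEAT4(base) STEP((base)) STEP((base) + 1) STEP((base) + 2) STEP((base) + 3)

/* Hand-unrolled version: the preprocessor expands this into 16 additions. */
uint32_t sum16_unrolled(const uint8_t data[16])
{
    uint32_t sum = 0;
    REPEAT4(0) REPEAT4(4) REPEAT4(8) REPEAT4(12)
    return sum;
}

/* Plain loop. The pragma is only emitted for GCC 8+; other compilers
   simply compile the loop and apply their own unrolling heuristics. */
uint32_t sum16_loop(const uint8_t data[16])
{
    uint32_t sum = 0;
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#pragma GCC unroll 16
#endif
    for (size_t i = 0; i < 16; i++) {
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same result. Whether the macro version is actually faster depends on the compiler, the flags and the CPU, so it is worth measuring before assuming either way.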
I've written crypto, decompression and other math derived code too. The only reason you always end up with these terrible names is because the mathematicians who invented the algorithm you're implementing didn't give them meaningful names - but if you take the time tracing back the theory behind the RFC or whatever you realize they almost always *could.* It's often better to keep the terrible names anyway so it lines up with the paper, but that doesn't mean they're good names. (Side whinging: why the heck are Wikipedia maths articles so dang hard to read? The actual math textbooks are far easier most of the time, and those are for people that already have a math background!)
😳 That's what all bit packing/compression code looks like to future self. It seems obvious when it's fresh in your head, until you come back to it 3 months later 😱
@@ysakhno most people that follow this channel has no idea how to code, I don't expect more from a channel that just reads news. Making fun of open source projects instead of providing help. Classic youtuber.
I'm not entirely sure what this discussion is trying to tell me. It's a run of the mill overflow bug, could happen to anyone, it was reported and patched. And... that gets a sensational clickbaity video?
@@jeremydbjbjbjb I think it’s more about it’s a run of the mill bug, this is how it works, this is how to avoid it, and a heads up to update. 7zip is everywhere.
I love the YouTube algorithm. I have zero knowledge about programming, have never used 7zip (to my knowledge) and yet I have this video among my recommendations and I watched it. Only thing I understood was "compressing makes large files smaller". You are really entertaining.
@@TheeUnpleasantPeasant algorithm doesn't just throw exactly the videos you've watched at all times It always experiments throwing stuff you might never have watched. I often get small obscure videos in the tens or hundreds of views.
Brings back memories of when we first realised phones were vulnerable to hacking. First we added thousands of null pointer checks then later used fuzzing to uncover hundreds of less blatant vulnerabilities. It was In the order of a year's work for hundreds of developers and testers. One of my contributions was to make it possible to test the code off target, up to that point all testing was running the complete phone on an emulator or even an actual phone against test equipment.
That template code was common in the 80's and 90's when compilers were bad at optimizing code. I expect this code was first written on 486 type systems where every cycle counts.
When you want fast code execution, sometimes you use the define trick. Some compilers have good optimization, but they do not know what you really want to do, so sometimes you still need this trick to speed the code up.
Maybe a very poor attempt at optimization. Pretty much every compiler can out optimize a human these days. If this had been written on a PDP-11 in 1975 when compilers were stupid it could maybe be justified.
So it's a bug in ZStandard implementation, not the 7-zip (LZMA). The ZStandard is not 20 years old btw and 7-zip doesn't have any proprietary codecs from what I remember.
@@XenoCrimson-uv8uz 7zip is an archiver program that supports many compression techniques. But 7zip is also the name of the file format that stores LZMA compressed data. What he's saying is that the bug is in the 7zip archiver's implementation of zstd, not in the 7zip file format (LZMA) or in the zstd algorithm itself.
That "impossible to read code" reminds me of my recent adventure in trying to understand file compression by writing my own deflate implementation. I'm not a mathematician, not a number-theory type at all, and this stuff does my head in. I also usually have to read a minimum of like three or four different explanations of an algorithm to even begin to understand how it works, because most of them are written by people who do understand them, and our brains do not share much space in that particular Venn Diagram, so their explanations make no sense to me at all.
@@nickwallette6201 I have the other brain, apparently. I low-key found the assertion that “this code is unreadable” offensive. Judging readability of code whose purpose you are unfamiliar with is either arrogant or ignorant. That code might be perfectly readable to those who are skilled and familiar with the domain, that is, those who maintain the code. I couldn’t judge in this case because I don’t know either compression or the compilers they are targeting well enough, but there’s also nothing that stands out as indicating that it’s “unreadable.” I found the guy in the video’s arguments for why it’s “unreadable” completely unpersuasive. He’s just casting aspersions for no apparent reason.
I thought this was a new CVE. This is something that was fixed in 7zip months ago. I am already on a version above the one mentioned containing the fix
Not infinite, there just needs to exist q such that len=q*chunk_size mod int_range or whatever type they using. Nvm, that can occur only if chunk_size is not a power of 2 which it probably is
Correct me if I'm wrong, but the vuln, and the repo you visited, are for a fork of 7zip modified by mcmilk to include the zstd algorithm, because 7zip's original author declined to include it. If so, 7zip's creator may appreciate some clarification here.
He really would. But he's Russian, so that's out of the question for American fascists. To me the name on mcmilk's page reads German. That completely destroys the anti-russian sentiment, but it doesn't stop the crowd here. I'm German too by the way, so no hard feelings. Talk is cheap every Open Source maintainer is thankful for support and PRs.
this type of analysis doesn't fall under the umbrella of IT (information technology), it's cybersecurity and reverse engineering which is a different field entirely
You should make a video on all of the tools that people can use to FIND easy to fix bugs, and vulnerabilities in their own projects. Like a "Fix Your Shit Toolkit" that gives you tons of useful things they can run against their own stuff like you just did here.
The beauty of open source software is that the whole premise of many hands make light work and many eyeballs spot flaws and bugs faster is that by scrutinizing and finding problems with it you're actually making it safer and better. While indeed almost all open source projects are maintained by 1 or 2 guys, writing all of the code this is true for almost all software projects. Unlike closed source though, you're welcome to check and improve their work. Just because closed source is "Secret" it doesn't make it any more secure because at the end of the day, the compiled executable code is 100% visible to tinker with and there are very talented reverse engineers who can quickly pick it apart. Look at how DRM gets defeated and how much lengths even Denuvo must go to to try and encrypt the actual machine code haha!
7 zip is an awesome utility, tho. For a basic windows user it may be one of the easiest way of reading folders hidden by malware and stuff like that, even when it seems impossible from basic cmd commands.
@@byAnArgentinian eh, I stopped using 7zip because of all the security vulnerabilities that took a long time to patch. Their code isn’t very good, as this video demonstrates.
How do you read malware hidden files in 7zip ? By just exploring a directory with the 7zip binary ?
Interesting to see this video. I've used 7-zip many times and have found it to be very good. Never had any problems. Having said that, I wouldn't know a code vulnerability if one walked up to me with a flashing neon sign saying "I'M A CODE VULNERABILITY".
Yeah the title was a bit of click bait. The code is hard to understand but its efficient and runs well and was patched. It would have to be a malicious archive to cause all those crashes.
I wish Igor would implement recovery records like Rar does. I still have to use Rar on linux for that purpose (and for work, but I digress), and it's sad to see that 7-zip still is stuck in 2007 on this issue. People actually care about archiving their data, and whenever they find out that 7-zip eats its own header data randomly on file creation and have no clue how to recover their family photos/emails, etc, always breaks my heart. This is never an issue on Rar or standard ZIP/Tar creations, and it shouldn't be on 7-zip
Emm, ZIP literally puts the main header at the end of the ZIP file, just because that makes it easier to overwrite when you add more files. If your file system fails to survive a crash cleanly, and RAR does not create a copy of the file before editing it and then rename it over the old one once it's done, nothing will help you, because the file is gone from your file system. Also, just use tar to archive photos, or zip with compression disabled. If you want to store something very compressible (like plain-text documents) in the same archive, that's not very efficient: if you put everything in a tar (which does no compression, it's just a bunch of 512-byte blocks holding headers and data) and then compress the whole thing with xz (LZMA, the same as 7z) or gzip (exists everywhere, including the web), you'll get more compression out of it, because the compressor (xz, gzip) will compress the whole tar archive (an "archive" is just a bunch of files stored closer together than on the file system, since file systems usually store each file in chunks of about 4 KB for optimization reasons), not each file individually.
@@JohnSmith-vd8nn if a single bit of data bitflips (I.E, 1 becomes 0 because of hardware degradation anywhere in the archive) on 7zip, you have no chance of recovery nor any chance of extracting because it's one long stream. Rar on the other hand, with no recovery record, will just skip the file affected and extract everything else. Rar with a 5% recovery record, will go right ahead and fix the bitflip, and will extract everything just like it was
@@JohnSmith-vd8nn the 'eating header data' part is mostly seen in enterprise environments. I have seen this issue firsthand at work, which caused us to switch from 7zip to RAR and gzipped tarballs, but I know some friends of friends who have said they had that issue too, and friends of those friends' friends, and etc, so not an entirely isolated incident
My guess is that the macros are used for speed reasons. They avoid the overhead of a function call, and were commonly used in speed critical code before compilers could reliably use inline to do the same thing. We used to do this back when I learned C in the 80's but I doubt that many people do it now since it makes the code far less readable.
@55s - puzzling claim that 7z is proprietary. you should have corrected that error on sight instead of repeating it ;-p 7z format is public domain. that's freer (sic) than open source!
7:00 Why do Macro Programming? Because when the project was first written, C didn't have _inline_ functions and the compiler didn't just inline stuff on its own.
Yeah its been like this since about 2012. I daily fuzz ^^ [ pen test] my own applications just to see vulnerabilities. Kali Linux is invaluable since about 2013. That's when everything computer software, for me, got easy: kali linux is my daily driver. I remember before kali, I had to actually scrape the forums, android sites, hackintosh sites, freebsd forums, archlinux sites, torrents, and talk to white hatters to find information that would give me a glimpse or name of some dungeon program. That all ended with kali + github + youtube + twitter. That and search github for fuzzers and analyzing programs and plugins. It literally takes like 30 minutes or less to find exploits with the right hardware.
@2:36: The difference between Byte and unsigned. In the 7zip realm, Byte is an unsigned char (8 bits wide), unsigned is an unsigned int (32-bits wide, on x86-64). The change of width does not really affect outcome, since the overflow is checked by the inserted lines 1313 & 1314 (the actual bug fix). The type change is probably done so that the test at line 1313 is done on two variables of the same width, to avoid a compiler warning.
@@0xhhhhff 7zip has acquired a spaghetti-like structure throughout the years. I haven't read all of the diffs in the code, but it's likely that similar boundary checks were added in as many as a dozen (or two dozens) similar loops in the code.
@@0xhhhhff I have to say that I do have an issue with his explanation @11:05. If you pause @6:57, you'll see the whole code, that is COPY_PREPARE, COPY_CHUNK, and CopyLiterals(). The two commented lines above CopyLiterals() are the contract: len != 0 and len
@@jacobstamm Oh I mean, people tend to shift the conversation to "Can we get a RCE out of this?" when the CVE has base CVSS score of like 7 or below. It's kinda weird, I don't see it that way, RCE shouldn't be a goal imo, you can cause DOS or do serious damage, that should be a concern.
if you haven't done an episode lately on code browsers and browsing/browser features...always nice for a refresh. EDIT: also, GREAT video on not just instrumenting your code but what it's for. It's stuff like that tells me I'm subbed to the right channel. If you want to go deeper, if you haven't talked about generating symbol files for use with trace/logic analyzers, that's pretty amazing stuff - especially for people just starting, very visual demonstration of the internals in process of embedded debug/test. It becomes even more useful in test/validation automation ;-) thanks again for spending your time putting up nice videos that teach people actual skills and thought processes behind them.
In this context, I think "integer underflow" is not quite the correct term for the title of the CVE. Any integer operation that would lead to "wrapping around" would be considered overflow, regardless of in what direction you're going. Generally, "underflow" is a term reserved for inaccuracies in floating-point operations.
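On the terminology point above: the C standard itself describes unsigned arithmetic as being reduced modulo one more than the largest value the type can hold, so wrapping below zero and wrapping above the maximum are the same mechanism. A tiny sketch, with `wrap_sub` and `wrap_add` as made-up names:

```c
#include <limits.h>

/* Unsigned subtraction: 0u - 1u wraps around to UINT_MAX. */
unsigned wrap_sub(unsigned a, unsigned b)
{
    return a - b;
}

/* Unsigned addition: UINT_MAX + 1u wraps around to 0. */
unsigned wrap_add(unsigned a, unsigned b)
{
    return a + b;
}
```

So `wrap_sub(0u, 1u)` gives `UINT_MAX` and `wrap_add(UINT_MAX, 1u)` gives `0`, both well-defined. The same operations on signed integers would be undefined behavior when they go out of range.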
I started using adb, then gdb, 45 years ago on V7 then first gen BSD. With the incredible power of modern software dev tools, it's somehow heartwarming that gdb still matters... that said, the entire sequence you use seems like it should be used before any production software is released.
7-Zip ZStandard is not an official 7zip product but instead a modified version of 7zip to support additional archive formats such as ZStd, LZ4, LZ5, Lizard.
I thought that was the case. Gotta love the claim of 1000’s of crashes too, I’ve used 7zip for probably 15 years and think I’ve only encountered one or two while doing other things and leaving it extracting in the background
He's talking about main 7-Zip. The 7-Zip ZStandard fork has a different implementation of Zstd, which is not affected by this CVE. Main 7-Zip only supports Zstd decompression, and that is the code that had the CVE.
Old code has lots of macros because back in the day, you couldn’t trust the compiler to actually inline. So everyone had to know how to write macros and used them extensively. So they got used in ways that were terrible ideas.
Heavy use of macros is generally because compilers weren't always great, and you found profiling results which indicated that there was performance to be had. So you rewrote things using macros to get that delicious inline code. Unfortunately, macros are sticky, you generally won't later get profiling results which tell you that your compiler can now do all that for you, because the non-macro code paths no longer exist and the profiler doesn't see macros. For the MOST part, people don't go crazy with C macros just on a whim.
7-Zip added a zstd decoder recently. It's not 20 years old. The author wanted to write it on his own rather than use a library, and couldn't yet make an encoder.
If you look at the preprocessed code, the macros should expand to the code they're defined with anywhere you use them. The idea is that it's like making a function call without having to actually make a function call, so it doesn't have to allocate a stack frame and all that shit. Whether or not it makes any sense to do that depends on how often the code is being run, and it probably makes less sense on modern computers. I think nowadays the preferred way to do this is to use the inline keyword.
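To make the macro vs. inline comparison from the comments above concrete, here is a small sketch. Nothing here is from the 7-Zip source: `MIN_MACRO`, `min_u32` and the two counting functions are hypothetical names. It shows one well-known cost of macros: because the text is pasted in, an argument with a side effect can run more than once.

```c
#include <stdint.h>

/* Function-like macro: the preprocessor pastes the argument text in
   at every place the parameter appears. */
#define MIN_MACRO(a, b) ((a) < (b) ? (a) : (b))

/* Inline function: arguments are evaluated exactly once and type-checked.
   The compiler is free to inline the body, so there may be no call at all. */
static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Counts how often the argument expression runs when passed to the macro. */
uint32_t macro_evaluations(void)
{
    uint32_t calls = 0;
    uint32_t r = MIN_MACRO(++calls, 10u); /* ++calls appears twice after expansion */
    (void)r;
    return calls;
}

/* Same experiment with the inline function. */
uint32_t inline_evaluations(void)
{
    uint32_t calls = 0;
    uint32_t r = min_u32(++calls, 10u); /* ++calls runs once, before the call */
    (void)r;
    return calls;
}
```

`macro_evaluations()` returns 2 and `inline_evaluations()` returns 1. You can see the expansion yourself with `gcc -E`, which prints the preprocessed source.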
If ever you encounter a PySR generated structure... You will never forget it. It is like a code from the year 3000. A must if you haven't seen it. It is beyond human.
3:24 "I was realizing that this code is impossible to read" Dear gods, Silicon Valley season 2 was right when the Dinesh and Gilfoyle tried to make sense of the original compression library code without the help of Richard XD
Where I used to work, 7-zip was on all our windows laptops. It was widely used to extract packages of logs that were zip’s, cpio’s, tgz’s, and xz’s within zips.
@11:20 "len" does not have to start out at 0, to allow this bug to overwrite unintended memory. It is sufficient for that variables value to be less-than COPY_CHUNK_SIZE.
I play your videos for gaining knowledge, I stay for the background noise/white noise that they become when I eventually lose track. My sleep quality improved.
algorithmic code is always hard to understand. First you must understand the algo, next the code and the tricks are used, and after that you might be able to understand it and change things. Even for the developer itself it's hard to understand when revisiting the code after a while. Same is true for graphics engines, or even small javascript-animations. Cool video showing about using c, fuzzing & the debugger!
You're so clean, succinct, and clear as a narrator. It'd be amazing if you did a tutorial series on learning to use some of these tools like the fuzzer, something to teach developers how to be their own red team?
ZDI only pay for exploitable bugs, so the person who found this would have been able to exploit this or at the least show control of execution flow. Great video!
The biggest question was not asked - why do they *use their own zstd decompressor?* Why do they rewrite decompressors for open-source standards, instead of using libraries that have been extensively fuzz-tested? Yes, 20 years ago it was fine - many libraries were not portable, not optimized or not open source (like RAR), but why now?
Many of the most widely used open source projects are riddled with places where signed integer variables are used to hold values that can never legitimately be negative. The real surprise for me is that vulnerabilities like this aren’t found more frequently.
I could follow that fine just from the brief view you gave on screen. Nothing wrong with it. You have some serious snob issues if you think you need to verbose name a temp loop variable or limited scope variable that does not span a page. From what I can see, that code is perfectly readable.
I learned more watching this in 6 min than I did in 4 years as a "security engineer". Really helps to have people teach instead of hide information from you..
It seems it wasn't affected, as the code was not present. Look for it in the Debian security tracker (CVE-2024-11477). The stable 7zip was not affected either, because the bug was introduced in v24.01 and the Debian stable package currently uses 22.01. LMAO and it really shows that stable is actually better for security concerns (older versions with known bugs are also patched for security reasons when needed.)
What's odd is that they called it an RCE, while it's not directly an RCE, as 7-Zip itself does not interact with network in any way by itself to be directly exploitable, and there's no known software that is vulnerable because of it
@@LowLevelTV where did they get it though? Written entirely from scratch? Because if this bug exists somewhere else, like in libzstd, then we're in big trouble. Arch, for example, opted to compress everything with this algorithm in the mainline repo. I bet many distros have done this as well.
@@rogo7330 Yes, Igor Pavlov wrote the Zstd implementation used in 7zip himself, for whatever reason. So other Zstd implementations that are done to spec aren't affected.
The reference implementation of zstd has already been security audited as far as I know. The comments at the top of 7-zip's implementation say that it was written based on the spec. It doesn't use any of the code from the reference implementation.
Of all the criticisms, you are complaining about names like "src" and "inSize"? These are very common names in programming that are intuitive and unambiguous when talking about processing the contents of a file. I agree that "b0" and "b1" are not good names, but the examples you used are completely reasonable. What would you consider a minimum acceptable variable name?
There's an absolute massive ton of programs that bundle their own portable version of 7zip along with the rest of the program. Sometimes there's even multiple nested bundles of 7zip. So presumably all of those would need to be patched too, which is never going to happen.
Which typically run once, and only on their own data. The only thing that could interfere with that is a program already running on a system or a user, which could do the same things anyway.
(Macros are often used because it saves the overhead of the stack push and function call. It runs significantly faster if it's called millions of times. But it's only useful if the macro is used in many places, if used in only one place then it makes no sense. )
8:22 - I'm sorry but I was just exploding with laughter at that xD TRRRRRRRRRRRRR It's interesting how we take software that's available to everyone for granted as "supposed to work, no harm".
Was surprised when you recreated the context using crash file and gdb. the crash file is same as core dumped? That is superb. Never seen a way to recreate a crash context in any "modern/high level languages". We just simulate it in our head. I mostly do php, js and little C++. And debugging tools in these is garbage.
Looks like the COPY_CHUNKS macro needs to defensively check for less than zero rather than assume the len var will land at zero (false), just in case the buffer size isn't divisible by CHUNKSIZE. Big assumption in COPY_CHUNKS that it is given a matching buffer and chunksize. However, maybe the COPY_PREPARE macro is doing that check and setting len to something appropriate, a bit hard to read ...
This quirk of C, that you can treat an integer as a boolean in an 'if' test, has resulted in other bugs than this one. Why not just test explicitly for > 0, e.g. while ((len -= COPY_CHUNK_SIZE) > 0), and avoid the possibility of going past the boundary by more than part of one chunk size?
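One caveat on the `> 0` idea above: it only helps if `len` is a signed type. Other comments in this thread say the values are unsigned, and for an unsigned value `> 0` is exactly the same test as `!= 0`, so the wrap still happens. A rough sketch under assumptions: `COPY_CHUNK_SIZE` is set to 16 here, the function names are made up, the loop body only counts iterations instead of copying, and `cap` exists purely so the demo stops.

```c
#include <stdint.h>

#define COPY_CHUNK_SIZE 16u /* assumed value, for illustration only */

/* Suggested test with an unsigned len: 8 - 16 wraps to a huge positive
   number, so "> 0" stays true and the loop runs until the cap. */
uint32_t iterations_unsigned(uint32_t len, uint32_t cap)
{
    uint32_t n = 0;
    do {
        n++; /* stands in for copying one chunk */
    } while ((len -= COPY_CHUNK_SIZE) > 0 && n < cap);
    return n;
}

/* Same test with a signed len: 8 - 16 is -8, so the loop stops. */
uint32_t iterations_signed(int32_t len, uint32_t cap)
{
    uint32_t n = 0;
    do {
        n++;
    } while ((len -= (int32_t)COPY_CHUNK_SIZE) > 0 && n < cap);
    return n;
}

/* Unsigned alternative: compare before subtracting, so len never wraps.
   Any remainder smaller than a chunk has to be handled separately. */
uint32_t iterations_guarded(uint32_t len)
{
    uint32_t n = 0;
    while (len >= COPY_CHUNK_SIZE) {
        n++;
        len -= COPY_CHUNK_SIZE;
    }
    return n;
}
```

With `len = 8` and `cap = 1000`, the unsigned version runs all 1000 iterations, the signed version runs once, and the guarded version runs zero times.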
Go check out Docker's security features! Secure your applications with docker scout: dockr.ly/4g4UdDJ
never!!! 😀🥰🤫 j/k
ok
I have a weird question: What keyboard do you have? It's sounds so amazing.
@@PewPewPew_viper I have a 33-year-old mechanical one. Still works.
Mitsumi yellow switches. 65 gram push for click.
@@GOOGLE-IS-EVIL-EMPIRE oooo nice
1:01 7zip is opensource. 7z is open format.
7za and p7zip open-source
I’m open source drinking 7up
I’m closed-source playing 7Sins
@@AdrianDX I'm proprietary and will tell you nothing
Open format is new to me, what’s it?
most of the time you ask "I don't know why they did that" the answer is: "That's the way it was done then (20 years ago)"
Can you blame him? It was before he was born
@@mattymattffs but he should know, because he is making Videos about it
@@mattymattffs Yes, we can blame him for not doing a good enough job at researching the topic of the video and related topics to give a comprehensive and, most importantly, ACCURATE representation of what is going on.
@@satsubatsu347 You're taking a joke about him being incredibly young very seriously.
Well...I was around 20 years ago and I ask "why did they do that?". The language didn't change that much in 20 years. Sloppy programmers were even worse than they are now, and they are bad today. Don't get arrogant and think we're better today because we are not. We actually have far more nefarious ways of screwing up; some intentional.
11:17 If `COPY_CHUNK_SIZE` is greater than 1, `len` doesn't necessarily need to be zero. This is a do-while loop, which checks only for non-equality with zero. Since the values are unsigned, subtracting from any sufficiently small `len` (specifically, 0 ≤ `len` < `COPY_CHUNK_SIZE`) will cause the same underflow and out-of-bounds writes.
Yeah good catch I realized this after I published
Interslavic "integer podtok" (integer underflow)
Yeah, came to comments to mention this. Checking for '
This would have been true, if not for the surrounding code. The initialization code _ensures_ `len` is always a multiple of `COPY_CHUNK_SIZE`. The only way `len` can become 0 (which is technically still a multiple of `COPY_CHUNK_SIZE`) at the start of the loop is if `len` is ridiculously big (so adding `len += (COPY_CHUNK_SIZE - 1);` on line 1520 and then masking off the lower bits on the next line would yield 0). Then of course `COPY_CHUNK_SIZE` is subtracted from zero at the end of the loop, and all trouble ensues.
What they did wrong is that they went with the do-while loop instead of a for-loop, which checks _before_ entering the loop, then updates the variable at the end. (Or they could at least have used the 'normal' while loop if they were being weird.)
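Here is a small sketch of the mechanism described in the comment above. It is not the actual 7-Zip code: the chunk size of 16 is an assumption, the function names are made up, the "copy" is replaced by a counter so nothing is written out of bounds, and `cap` is there only so the demo terminates.

```c
#include <assert.h>
#include <stdint.h>

#define COPY_CHUNK_SIZE 16u /* assumed value; must be a power of two */

/* Round len up to a multiple of COPY_CHUNK_SIZE (add, then mask off the
   low bits). If len is within COPY_CHUNK_SIZE - 1 of UINT32_MAX, the
   addition wraps and the result is 0. */
uint32_t round_up_len(uint32_t len)
{
    assert((COPY_CHUNK_SIZE & (COPY_CHUNK_SIZE - 1u)) == 0u);
    len += COPY_CHUNK_SIZE - 1u;
    len &= ~(COPY_CHUNK_SIZE - 1u);
    return len;
}

/* do-while: the body runs once before len is ever tested. Starting from
   len == 0, the subtraction wraps to 0xFFFFFFF0 and the loop keeps going. */
uint64_t do_while_chunks(uint32_t len, uint64_t cap)
{
    uint64_t n = 0;
    do {
        n++;                    /* stands in for copying one chunk */
        len -= COPY_CHUNK_SIZE; /* 0 - 16 wraps around */
    } while (len != 0 && n < cap);
    return n;
}

/* Pre-checked while loop: len == 0 never enters the body. */
uint64_t while_chunks(uint32_t len)
{
    uint64_t n = 0;
    while (len != 0) {
        n++;
        len -= COPY_CHUNK_SIZE;
    }
    return n;
}
```

`round_up_len(0xFFFFFFF8u)` comes out as 0. Feeding that 0 into `do_while_chunks` runs until the cap, while `while_chunks(0)` does nothing. For ordinary multiples of the chunk size the two loops do the same number of iterations.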
@@TanerH Haha.. I feel silly sometimes about checking for "
It's funny that you used afl-gcc (named after American Fuzzy Lop, a domesticated rabbit species) to go down a rabbit hole...
I actually did not have a clue that it was a rabbit, I only knew of it as the software fuzzing tool, this is awesome :D
So it's the furry version of gcc? xD
TOP TIER COMMENT
As a rabbit enthusiast I approve of this.
@@unconnectedbedna All versions of GCC are furry.
The first part is so hilarious - the discrepancy in level of expertise between 'compressing is making LARGE files SMALLER' and then jumping around at lightning speed between these different tools and codes and reports.. incredible.
"remember guys, faster things have MORE air resistance and slower things have LESS"
"So anyway here are the aerodynamic properties of an F-22 fighter jet"
LOL
Typical code written by mathematicians, it's impenetrable to software engineers but it sort of makes sense within the context of specs and papers underlying it.
Yeah, this looks like it was implemented straight from a research paper.
@@lbgstzockt8493 I mean that's not even the problem. I'm sure the research paper didn't say use a bunch of macros and insane spacing and formatting. Naming something 'b' is far less egregious than all that craziness, especially if it really does tie back to a 'b' in the paper. Then it's perfectly understandable.
@@InfiniteQuest86 no one is saying the problem is that it's from research, they're saying those sorts of problems are common in research code.
@@SianaGearz And most of the time this is real paper! Lucky you if you have a scanned pdf.
@@InfiniteQuest86 Yeah, I personally attribute it to the fact that LaTeX is a macro language, so most researchers are way too comfortable with the macro paradigm and want everything to be a macro (even if it means that it will be unreadable and unmaintainable, since they are accustomed to turdy code that has 20 pages of asterisks on how to circumvent issues and bugs)
7zip devs: Patched in v24...
Meanwhile: Debian, Ubuntu, Mint maintainers: v23
I'm on Debian and `7zip` is on version `22.01`. Meanwhile, I have `p7zip-full`, which is on version `16.*`, but in their versioning scheme, so...
classic package manager moment
Things like that are why I stopped using stable distros. Sooner or later you'll find a problem that only exists because a package is out of date, it's a law of nature.
Don't have this problem anymore... I use Arch, BTW :3
Normally they will backport the security fix to whatever version that is in the repository which results in some frankenversion because updating to a newer version will break policy because reasons. Ubuntu does occasionally allow bypassing this and update if there is a very compelling reason via a stable release updates process, debian only does it extremely rarely and are super strict about it.
Apparently the version in debian stable is not affected as the code wasn't introduced yet...
I'm not a C developer, but you have an amazing way of explaining what would otherwise be a very dry topic in a fascinatingly entertaining & interesting manner. It's very cool to be able to watch "over your shoulder" and get some insights into the world of code, vulnerabilities, how to mitigate and so on. Keep on doing what you do, it's fab.
thank you so much, that is very kind :)
Didn't know that there are 7-zip haters until I read newest comments. It is by far the best of its kind that is truly free.
Me neither. I've even recently begun to love it
I didn't know such a kind of hater could exist.
I've used 7zip over WinZip for years just because they don't heckle me.
Winrar is just better
@@emilydavidson8844 ?
7z? Should have used the xz utils, much safer.
jia tan? 👀
the test files ensure its safety
@@LowLevelTV lmaoooo was about to say that
legendary comment!
To be fair everything is safer until it is not. Bad actors are bad actors and they can be in any project.
Same with mistakes in coding.
I disagree with your takes. This is algorithmic code. These variables can't have a name. Even if they would have meaningful names you will not understand the complex algorithm anyway.
And even bigger problem is you always want to optimize algorithms, because they are really taking a lot of CPU. Optimized code becomes less readable.
I have written decompression functions in the past, and that's how this type of code always looks.
6:45 - they are doing macros to optimize CPU cycles. common practice for decompression algorithms code
god i hate youtube. I literally reworded my comment 6 times just to not get shadow banned. i had to remove few key words and make my comment less informative
Hello! Can you please add some more detail or give some reading on the topic of optimizing CPU cycles with the macros? I am a beginner in C and haven't written any macros and I am curious why and how they are used here. Also, I am skeptical at all people who say that this is "just bad code".
@@S460-v2q I am not really a C dev, I mostly program in C# (but I have had to read a lot of C code). I don't know all the tricks you can do with macros, or where to read about them, but in short: if the CPU sees consecutive instructions which are independent of each other, it can run them in parallel. Instead of writing a loop for i from 0 to 16, in most cases your code will run faster if you unwind this loop. The easiest way to do that is to use macros.
@@Z3rgatul I don't really care if you are C dev, the important part for me was that you had knowledge about C. For that matter I am a nothing dev lol. I read a little bit about what you said and found the GCC docs containing a pragma for unwinding loops and looks cleaner. It has been added to gcc 8 though so probably the devs didn't have it and had to do it by hand.
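A rough sketch of what "unwinding a loop with macros" can look like, next to the plain loop it replaces. The names `COPY4`, `copy16_unrolled` and `copy16_loop` are invented for this example and are not taken from 7-Zip; whether the unrolled version is actually faster depends on the compiler and CPU, so treat it as an illustration of the technique only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro: copies four bytes starting at offset i.
   The statements are independent of each other, so the CPU is free
   to overlap them. */
#define COPY4(dst, src, i)               \
    do {                                 \
        (dst)[(i) + 0] = (src)[(i) + 0]; \
        (dst)[(i) + 1] = (src)[(i) + 1]; \
        (dst)[(i) + 2] = (src)[(i) + 2]; \
        (dst)[(i) + 3] = (src)[(i) + 3]; \
    } while (0)

/* Hand-unrolled: 16 byte copies with no loop counter or branch. */
static void copy16_unrolled(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
    COPY4(dst, src, 0);
    COPY4(dst, src, 4);
    COPY4(dst, src, 8);
    COPY4(dst, src, 12);
}

/* The straightforward loop; a modern optimizer may well unroll or
   vectorize this on its own. */
static void copy16_loop(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < 16; i++) {
        dst[i] = src[i];
    }
}
```

Both functions produce the same result; the only difference is who does the unrolling, the programmer or the compiler.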
I've written crypto, decompression and other math derived code too. The only reason you always end up with these terrible names is because the mathematicians who invented the algorithm you're implementing didn't give them meaningful names - but if you take the time tracing back the theory behind the RFC or whatever you realize they almost always *could.*
It's often better to keep the terrible names anyway so it lines up with the paper, but that doesn't mean they're good names.
(Side whinging: why the heck are Wikipedia maths articles so dang hard to read? The actual math textbooks are far easier most of the time, and those are for people that already have a math background!)
3:24 "This code is impossible to read" Oh good, I'm not as stupid as I thought I was. XD
What are you talking about? *All* C code looks like this. It is _impossible_ to write it differently. What names the variables have is irrelevant.
@@ysakhno uhm... just no.
Decent naming scheme and proper structs & macros usage can lead to very readable C code, it's just hard to find.
That does not mean what you think, you could be at the bottom tier to begin with... just kidding 🤣🤣🤣
😳 That's what all bit packing/compression code looks like to future self. It seems obvious when it's fresh in your head, until you come back to it 3 months later 😱
@@ysakhno most people that follow this channel have no idea how to code, I don't expect more from a channel that just reads news. Making fun of open source projects instead of providing help. Classic youtuber.
TL;DR:
* Download the latest 7-zip and you'll be okay
Dang it, you spoiled half of this channel videos
Is forgetting to zip your pants a memory corruption bug?
No it is an illegal overflow of variable size lol
I can pen test but first I need to cut out your corrupted content. If I can't get access, I'll use the backdoor.
I’m worried about what casting would do in this circumstance 💀
Might want to fix that dangling pointer.
yes unfortunately the Paffendorf video had a buffer overflow and overwrote the memory block i was using to store zipPants()
I'm not entirely sure what this discussion is trying to tell me. It's a run of the mill overflow bug, could happen to anyone, it was reported and patched. And... that gets a sensational clickbaity video?
@@jeremydbjbjbjb I think it’s more about it’s a run of the mill bug, this is how it works, this is how to avoid it, and a heads up to update. 7zip is everywhere.
I agree to the first sentence. I know English reasonably well, but this was a dialect i have no clue about. Why am i here?😯
I love the YouTube algorithm. I have zero knowledge about programming, have never used 7zip (to my knowledge) and yet I have this video among my recommendations and I watched it. Only thing I understood was "compressing makes large files smaller". You are really entertaining.
It is pretty funny how much prior context you need to understand almost every word in this video
This just proves the algorithm stinks lol
Same 🤣
same :D
@@TheeUnpleasantPeasant algorithm doesn't just throw exactly the videos you've watched at all times
It always experiments throwing stuff you might never have watched. I often get small obscure videos in the tens or hundreds of views.
Brings back memories of when we first realised phones were vulnerable to hacking. First we added thousands of null pointer checks then later used fuzzing to uncover hundreds of less blatant vulnerabilities. It was In the order of a year's work for hundreds of developers and testers. One of my contributions was to make it possible to test the code off target, up to that point all testing was running the complete phone on an emulator or even an actual phone against test equipment.
4:09 ermmm...its the GNU Compiler Collection, akshually
AKSHUALLY
didn't they just rename it? IIRC GCC used to be a compiler for a lot of languages, now it supports like 4
Also, "G-N-U" instead of "Gnoo".
no? unless it was retroactively added(which is possible), even versions from 1999 call it "GNU Compiler Collection" @@tcscomment
@@SomebodyHere-cm8dj I honestly have no idea.
What am I doing here.
I like turtles
i ask that myself everyday i wake up.
Aliens are real
I don't belong here
I don't care if it hurts.
That template code was common in the 80's and 90's when compilers were bad at optimizing code. I expect this code was first written on 486 type systems where every cycle counts.
Actually Zstd was created rather recently (~2016) at Facebook
Nope, 7zip is much newer than that, and zstd even newer.
@@jsrodman just because the program is newer doesnt mean the programmer isnt
old habits die hard (and can be passed down)
When you want fast code execution, you sometimes use the #define trick. Some compilers optimize well, but they don't know what you really want to do, so sometimes you still need this trick to speed the code up.
Maybe a very poor attempt at optimization. Pretty much every compiler can out optimize a human these days. If this had been written on a PDP-11 in 1975 when compilers were stupid it could maybe be justified.
So it's a bug in ZStandard implementation, not the 7-zip (LZMA). The ZStandard is not 20 years old btw and 7-zip doesn't have any proprietary codecs from what I remember.
yeah its a bug in their implementation of 7zip, not the Zstd spec.
@LowLevelTV I am confused, you agree and say the opposite?
You agree that its a bug in Zstd implementation not in 7-zip, then say its a bug in 7zip?
@@XenoCrimson-uv8uz i think he agreed with the no proprietary codecs?
@@XenoCrimson-uv8uz 7zip is an archiver program that supports many compression techniques. But 7zip is also the name of the file format that stores LZMA compressed data.
what he's saying is that the bug is in the 7zip archiver's implementation of zstd, not in the 7zip file format (LZMA) or in the zstd algorithm itself.
@@XenoCrimson-uv8uz He means it's a bug in 7-Zip's implementation of Zstd rather than the Zstd spec itself.
That "impossible to read code" reminds me of my recent adventure in trying to understand file compression by writing my own deflate implementation. I'm not a mathematician, not a number-theory type at all, and this stuff does my head in. I also usually have to read a minimum of like three or four different explanations of an algorithm to even begin to understand how it works, because most of them are written by people who do understand them, and our brains do not share much space in that particular Venn Diagram, so their explanations make no sense to me at all.
@@nickwallette6201 I have the other brain, apparently. I low-key found the assertion that “this code is unreadable” offensive. Judging readability of code whose purpose you are unfamiliar with is either arrogant or ignorant.
That code might be perfectly readable to those who are skilled and familiar with the domain, that is, those who maintain the code. I couldn’t judge in this case because I don’t know either compression or the compilers they are targeting well enough, but there’s also nothing that stands out as indicating that it’s “unreadable.”
I found the guy in the video’s arguments for why it’s “unreadable” completely unpersuasive. He’s just casting aspersions for no apparent reason.
I thought this was a new CVE. This is something that was fixed in 7zip months ago. I am already on a version above the one mentioned containing the fix
6:55 the reason they're doing that there is in the name - it's hand optimized to enable vectorization!
len doesn't have to be 0 to cause a crash, it just needs to not be a multiple of COPY_CHUNK_SIZE, right?
Exactly right, as far as I can see, I thought the same thing.
That's one of the things I learned early on, never loop down to x == 0, loop down to x
yes, but I don't think you can exploit it then since you would have an infinite loop
Not infinite, there just needs to exist q such that len = q*chunk_size mod int_range, or whatever type they're using. Nvm, that can occur only if chunk_size is not a power of 2, which it probably is
@@OhhCrapGuy Correct.
Correct me if I'm wrong, but the vuln, and the repo you visited, are for a fork of 7zip modified by mcmilk to include the zstd algorithm, because 7zip's original author declined to include it. If so, 7zip's creator may appreciate some clarification here.
He really would.
But he's Russian, so that's out of the question for American fascists.
To me the name on mcmilk's page reads German. That completely destroys the anti-russian sentiment, but it doesn't stop the crowd here.
I'm German too by the way, so no hard feelings. Talk is cheap every Open Source maintainer is thankful for support and PRs.
@LiveWireBT OK, Lennart.
@@LiveWireBT Russia is loved by American fascists though.
@@LiveWireBT the fascists do love Russia tho lol
2:37 `const Byte ptr` is an unsigned byte (8 bits), `const unsigned sym` is an unsigned int (32-bits)
I've been in IT for more than 2 decades and still get blown away by this level of security analysis.
yeah he's pretty impressive . Does anybody know what his actual job is , or was ?
this type of analysis doesn't fall under the umbrella of IT (information technology), it's cybersecurity and reverse engineering which is a different field entirely
@@sammxn-w2v I'd rather my IT staff had this level of expertise rather than just be capable of configuring a server
@@sammxn-w2v cyber security also known as IT security most definitely falls under IT
@@sammxn-w2v so does reverse engineering specifically reverse engineering of computer programs the very thing he does. how is either of them not IT?
I've been waiting for 8zip to drop for decades
3:46 Obfuscation by programmer
truest, most efficient security measure
@@sanjaycse9608 me when I name my Java custom named query Journal.query and then place it in the publisher class
You should make a video on all of the tools that people can use to FIND easy to fix bugs, and vulnerabilities in their own projects. Like a "Fix Your Shit Toolkit" that gives you tons of useful things they can run against their own stuff like you just did here.
The beauty of open source software is that the whole premise of many hands make light work and many eyeballs spot flaws and bugs faster is that by scrutinizing and finding problems with it you're actually making it safer and better. While indeed almost all open source projects are maintained by 1 or 2 guys, writing all of the code this is true for almost all software projects. Unlike closed source though, you're welcome to check and improve their work. Just because closed source is "Secret" it doesn't make it any more secure because at the end of the day, the compiled executable code is 100% visible to tinker with and there are very talented reverse engineers who can quickly pick it apart. Look at how DRM gets defeated and how much lengths even Denuvo must go to to try and encrypt the actual machine code haha!
7 zip is an awesome utility, tho. For a basic windows user it may be one of the easiest way of reading folders hidden by malware and stuff like that, even when it seems impossible from basic cmd commands.
@@byAnArgentinian eh, I stopped using 7zip because of all the security vulnerabilities that took a long time to patch. Their code isn’t very good, as this video demonstrates.
How do you read malware hidden files in 7zip ? By just exploring a directory with the 7zip binary ?
The built explorer of 7zip shows (well) hidden folders and files under Windows as if you were using linux
what folders are still hidden after you enable showing hidden files/folders in the file explorer settings?
@declspecl oh believe me there's WAY more than that lol
Interesting to see this video.
I've used 7-zip many times and have found it to be very good. Never had any problems.
Having said that, I wouldn't know a code vulnerability if one walked up to me with a flashing neon sign saying "I'M A CODE VULNERABILITY".
Yeah the title was a bit of click bait. The code is hard to understand but its efficient and runs well and was patched. It would have to be a malicious archive to cause all those crashes.
7-zip is awesome. Didn't know it's just two people!
I wish Igor would implement recovery records like Rar does. I still have to use Rar on linux for that purpose (and for work, but I digress), and it's sad to see that 7-zip still is stuck in 2007 on this issue. People actually care about archiving their data, and whenever they find out that 7-zip eats its own header data randomly on file creation and have no clue how to recover their family photos/emails, etc, always breaks my heart. This is never an issue on Rar or standard ZIP/Tar creations, and it shouldn't be on 7-zip
I'm not familiar with this 7zip issue. Would you please expand on it?
Emm, ZIP literally puts the main header at the end of the ZIP file, just because that makes it easier to overwrite when you add more files. If your file system fails to handle a crash safely, and if RAR does not create a copy of the file before editing it and then rename it over the old one once it's done, nothing will help you, because the file is gone from your file system. Also, just use tar to archive photos, or zip with compression disabled. If you want to store something very compressible (like plain-text documents) in the same archive, compressing each file individually is not very efficient: if you put everything in a tar (which does no compression, it's just a bunch of 512-byte blocks holding headers and data) and then compress the whole thing with xz (LZMA, the same as 7z) or gzip (exists everywhere, including the web), you'll get more compression out of it, because the compressor (xz, gzip) compresses the entire tar archive rather than each file individually. (An "archive" is just a bunch of files stored more closely together than on the file system, since file systems usually store each file in chunks of 4 KB for optimization reasons.)
@@JohnSmith-vd8nn if a single bit of data bitflips (I.E, 1 becomes 0 because of hardware degradation anywhere in the archive) on 7zip, you have no chance of recovery nor any chance of extracting because it's one long stream. Rar on the other hand, with no recovery record, will just skip the file affected and extract everything else. Rar with a 5% recovery record, will go right ahead and fix the bitflip, and will extract everything just like it was
@@JohnSmith-vd8nn the 'eating header data' part is mostly seen in enterprise environments. I have seen this issue firsthand at work, which caused us to switch from 7zip to RAR and gzipped tarballs, but I know some friends of friends who have said they had that issue too, and friends of those friends' friends, and etc, so not an entirely isolated incident
Just stop using 7zip for visual media
My guess is that the macros are used for speed reasons. They avoid the overhead of a function call, and were commonly used in speed critical code before compilers could reliably use inline to do the same thing. We used to do this back when I learned C in the 80's but I doubt that many people do it now since it makes the code far less readable.
@55s - puzzling claim that 7z is proprietary. you should have corrected that error on sight instead of repeating it ;-p 7z format is public domain. that's freer (sic) than open source!
7:00 Why do Macro Programming? Because when the project was first written, C didn't have _inline_ functions and the compiler didn't just inline stuff on its own.
Even with inline functions there's no guarantee that the compiler will inline the code. Macro programming on the other hand guarantees it.
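To make the macro-versus-inline trade-off concrete, here is a small sketch (all names are made up for the example). The macro is always pasted into the call site, as the comment above says, but it also evaluates its argument once per occurrence in the macro body. The `static inline` function evaluates its argument exactly once, and the compiler decides whether to inline it.

```c
#include <assert.h>

/* Textual substitution: always expanded in place, but the argument
   expression appears twice, so side effects happen twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* An ordinary function the compiler may (or may not) inline. */
static inline int square_inline(int x)
{
    return x * x;
}

/* Helper with a visible side effect: bumps a counter and returns it. */
static int next_value(int *counter)
{
    assert(counter != 0);
    *counter += 1;
    return *counter;
}
```

Calling `square_inline(next_value(&c))` bumps the counter once, while `SQUARE_MACRO(next_value(&c))` bumps it twice, which is one reason heavily macro-based code is harder to reason about.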
I've been out of the loop for a long, long time... But you can fuzz like that now?! That's insane... And also, scary and awesome.
Yeah this made me feel ipd and outdated
Yeah its been like this since about 2012. I daily fuzz ^^ [ pen test] my own applications just to see vulnerabilities. Kali Linux is invaluable since about 2013. That's when everything computer software, for me, got easy: kali linux is my daily driver. I remember before kali, I had to actually scrape the forums, android sites, hackintosh sites, freebsd forums, archlinux sites, torrents, and talk to white hatters to find information that would give me a glimpse or name of some dungeon program. That all ended with kali + github + youtube + twitter. That and search github for fuzzers and analyzing programs and plugins. It literally takes like 30 minutes or less to find exploits with the right hardware.
I'm not a good programmer but I learned a lot by following and understanding your process flow. Very well presented.
Having ads in browser as a cyber security channel is crazy
Why? Do ads bring security issues?
I've thought they are just annoying.
@@AsdAsd-n6u Yes, blocking javascript in different websites is crucial
@@emerjay348 why don't browsers do this by default then?
Most do @@shotnothing3419
@@shotnothing3419 user experience, removing JavaScript can break websites
@2:36: The difference between Byte and unsigned. In the 7zip realm, Byte is an unsigned char (8 bits wide), unsigned is an unsigned int (32-bits wide, on x86-64). The change of width does not really affect outcome, since the overflow is checked by the inserted lines 1313 & 1314 (the actual bug fix). The type change is probably done so that the test at line 1313 is done on two variables of the same width, to avoid a compiler warning.
So the location he said the bug is, isn't actually where it is?
@@0xhhhhff 7zip has acquired a spaghetti-like structure throughout the years. I haven't read all of the diffs in the code, but it's likely that similar boundary checks were added in as many as a dozen (or two dozens) similar loops in the code.
@@0xhhhhff I have to say that I do have an issue with his explanation @11:05. If you pause @6:57, you'll see the whole code, that is COPY_PREPARE, COPY_CHUNK, and CopyLiterals(). The two commented lines above CopyLiterals() are the contract: len != 0 and len
Love this story format, man~~ I also love when you're excited to share stuff. When you're excited, I'm excited.
Every. CVE. is. Not. RCE. 😭
@@xanaxity but every RCE can be a CVE
@@EvilGPT Nowadays, with user level utilities (like 7z) RCE is always a chain of CVEs.
Oh yehhhhh
@@xanaxity “every CVE is not RCE” means “no CVE is RCE”. What you meant was “not every CVE is RCE”.
@@jacobstamm Oh I mean, people tend to shift the conversation to "Can we get a RCE out of this?" when the CVE has base CVSS score of like 7 or below. It's kinda weird, I don't see it that way, RCE shouldn't be a goal imo, you can cause DOS or do serious damage, that should be a concern.
if you haven't done an episode lately on code browsers and browsing/browser features...always nice for a refresh.
EDIT: also, GREAT video on not just instrumenting your code but what it's for. It's stuff like that tells me I'm subbed to the right channel. If you want to go deeper, if you haven't talked about generating symbol files for use with trace/logic analyzers, that's pretty amazing stuff - especially for people just starting, very visual demonstration of the internals in process of embedded debug/test. It becomes even more useful in test/validation automation ;-) thanks again for spending your time putting up nice videos that teach people actual skills and thought processes behind them.
In this context, I think "integer underflow" is not quite the correct term for the title of the CVE. Any integer operation that would lead to "wrapping around" would be considered overflow, regardless of in what direction you're going. Generally, "underflow" is a term reserved for inaccuracies in floating-point operations.
Correct.
I started using adb, then gdb, 45 years ago on V7 then first gen BSD. With the incredible power of modern software dev tools, it's somehow heartwarming that gdb still matters... that said, the entire sequence you use seems like it should be used before any production software is released.
Thank you for the reminder to update 7zip
As a coder for more than 25 years I am truly impressed abt what you are doing!
7-Zip ZStandard is not an official 7zip product but instead a modified version of 7zip to support additional archive formats such as ZStd, LZ4, LZ5, Lizard.
7zip recently added native support for decompression at least, I'm not sure which one is referred here in the video
@@ytxzwthe one from the GitHub at 2:02
I thought that was the case. Gotta love the claim of 1000’s of crashes too, I’ve used 7zip for probably 15 years and think I’ve only encountered one or two while doing other things and leaving it extracting in the background
He's talking about main 7-Zip. The 7-Zip ZStandard fork has a different implementation of Zstd, which is not affected by this CVE. Main 7-Zip only supports Zstd decompression, and that is what had the CVE
@@TheStolenBattenberg Yes, and you aren't throwing billions of purposefully corrupted inputs at it.
Of course you don't encounter those crashes.
Old code has lots of macros because back in the day, you couldn’t trust the compiler to actually inline. So everyone had to know how to write macros and used them extensively. So they got used in ways that were terrible ideas.
Heavy use of macros is generally because compilers weren't always great, and you found profiling results which indicated that there was performance to be had. So you rewrote things using macros to get that delicious inline code. Unfortunately, macros are sticky, you generally won't later get profiling results which tell you that your compiler can now do all that for you, because the non-macro code paths no longer exist and the profiler doesn't see macros. For the MOST part, people don't go crazy with C macros just on a whim.
7-Zip added a zstd decoder recently. It's not 20 years old. The author wanted to write it on his own rather than use a library, and couldn't yet make an encoder.
If you look at the preprocessed code, the macros expand to the code they're defined with anywhere you use them. The idea is that it's like making a function call without actually making a function call, so it doesn't have to allocate a stack frame and all that shit. Whether it makes any sense to do that depends on how often the code is run, and it probably makes less sense on modern computers. I think nowadays the preferred way to do this is to use the inline keyword
Some language compilers will optimize and inline automatically. For example C# can do this.
I love your videos, everytime I realise how MUCH I still don't know. It's just amazing.
If ever you encounter a PySR generated structure... You will never forget it. It is like a code from the year 3000. A must if you haven't seen it. It is beyond human.
In the past, Winrar had a similar issue and with the help of it an attacker can execute his program directly
Watching the channel more than a year. Worth subscribing !
My personal bet is there isn't an exploit. It's not "good", but the CVE system has thoroughly proven itself to be unreliable.
Care to elaborate?
I just found your channel from my algorithm and this was such a great video! You do an amazing job explaining things and I will be subbing!
3:24 "I was realizing that this code is impossible to read"
Dear gods, Silicon Valley season 2 was right when the Dinesh and Gilfoyle tried to make sense of the original compression library code without the help of Richard XD
Where I used to work, 7-zip was on all our windows laptops. It was widely used to extract packages of logs that were zip’s, cpio’s, tgz’s, and xz’s within zips.
it is just the best..
@11:20 "len" does not have to start out at 0 to allow this bug to overwrite unintended memory. It is sufficient for that variable's value to be less than COPY_CHUNK_SIZE.
not divisible by COPY_CHUNK_SIZE
However, I'm not sure if that would be exploitable, because you somehow have to leave this infinite loop eventually.
@@snygg1993 Correct. I think you could only leave the infinite loop if COPY_CHUNK_SIZE was not a power of 2.
@@mattrogers6646 It might still take "a few" overflows until you eventually hit zero 😁
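The power-of-2 argument in this thread can be checked on a toy scale. The sketch below uses an 8-bit counter instead of the real type so that the whole range is only 256 values; `steps_to_zero` is a hypothetical helper that mimics the do-while (subtract first, then test for zero) and gives up after one full trip around the range.

```c
#include <assert.h>
#include <stdint.h>

/* Returns how many subtractions it takes for len to hit exactly 0
   when it wraps modulo 256, or -1 if it never does. */
static int steps_to_zero(uint8_t len, uint8_t chunk)
{
    assert(chunk != 0);
    for (int steps = 1; steps <= 256; steps++) {
        len = (uint8_t)(len - chunk);   /* wraps modulo 256 */
        if (len == 0) {
            return steps;
        }
    }
    return -1;   /* a full cycle passed without reaching zero */
}
```

With a power-of-2 chunk of 16, a `len` that is not a multiple of 16 keeps the same remainder forever and never reaches zero, while a starting value of 0 needs 16 steps, which is one full wrap of the 8-bit range. With an odd chunk such as 3, every starting value eventually lands on zero after "a few" overflows.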
I play your videos for gaining knowledge, I stay for the background noise / white noise that they become when I eventually lose track. My sleep quality improved.
12:35 It's winget update to you! lol
algorithmic code is always hard to understand. First you must understand the algo, next the code and the tricks are used, and after that you might be able to understand it and change things. Even for the developer itself it's hard to understand when revisiting the code after a while.
Same is true for graphics engines, or even small javascript-animations.
Cool video showing about using c, fuzzing & the debugger!
You're so clean, succinct, and clear as a narrator. It'd be amazing if you did a tutorial series on learning to use some of these tools like the fuzzer, something to teach developers how to be their own red team?
Such a nice and cool demonstration. I speak a very tiny bit of C based languages, but I still got an idea what was going on. Thanks!
It's always a missing length check before memcpy
ZDI only pay for exploitable bugs, so the person who found this would have been able to exploit this or at the least show control of execution flow. Great video!
1:10 - since when 7z is proprietary?
The biggest question was not asked - why do they *use their own zstd decompressor?* Why they rewrite decompressors of initially open-source standards, instead of using libraries that were extensively fuzzy-tested? Yes, 20 years ago it was fine - many libraries were not portable, not optimized or not opensource (like RAR), but why now?
There's nothing I dislike more than trying to parse someone else's badly written code 😫
"this is bad, dumb code, and more importantly it's bad dumb code that doesn't make any sense here" ~~ Valve employee Kyle on TF2's code.
Let's be fair though, just because you can't read someone else's code doesn't mean it's badly written. You could also just be dumb.
One of your best videos, very informative.
Good job!
wait, 7Zip is a mess? I thought it was the best compress/decompress tool
Many of the most widely used open source projects are riddled with places where signed integer variables are used to hold values that can never legitimately be negative. The real surprise for me is that vulnerabilities like this aren’t found more frequently.
if you're worried about macro programming, take a look at QEMU's object model. It's a giant mess, I can't withstand C anymore
Zig for the rescue ! Or Rust.
@lolilollolilol7773 zig does not enforce safe code unfortunately
@@lolilollolilol7773 i'd rather Zig.
@@lolilollolilol7773 Rust FTW
@@lolilollolilol7773 zig does not enforce safety
after seeing the code for 7zip in this video. I now understand why Winrar decided to make the 30 day trial never expire
I could follow that fine just from the brief view you gave on screen. Nothing wrong with it. You have some serious snob issues if you think you need to verbose name a temp loop variable or limited scope variable that does not span a page. From what I can see, that code is perfectly readable.
Damn, Docker is sponsoring you? For me this is on another level
I learned more watching this in 6 min than I did in 4 years as a "security engineer"
Really helps to have people teach instead of hide information from you..
Your channel is called low level, but it is one of the highest level of content on this platform
Do we know if p7zip package contains this vulnerability?
It seems it wasn't affected, as the code was not present. Look for it in the Debian security tracker (CVE-2024-11477). The stable 7zip was not affected either, because the bug was introduced in v24.01 and the Debian stable package currently uses 22.01. LMAO and it really shows that stable is actually better for security concerns (older versions with known bugs are also patched for security reasons when needed.)
No
What's odd is that they called it an RCE, while it's not directly an RCE, as 7-Zip does not interact with the network in any way by itself to be directly exploitable, and there's no known software that is vulnerable because of it
True and i’m not sure how this could be used in practice so I think it is a nothing burger
WWIV will be over before someone gets hacked by 7zip.
I like your funny words magic coding man
This can't be just maintained by just 2 people. This needs to be a much bigger project with a much bigger crowd. Like holy crap.
I always like a good vulnerable code bath. 4:55
I came here for this comment
This exact video was my life for a few years as a product security incident response analyst.
Does this affect Zstd as well?
Or was 7zip using their own implementation?
Zstd would be a very big target, like Xz.
nope not the Zstd spec, just their implementation of Zstd.
@@LowLevelTV Where did they get it, though? Was it written entirely from scratch? Because if this bug exists somewhere else, like in libzstd, then we're in big trouble. Arch, for example, opted to compress everything with this algorithm in the mainline repo. I bet many distros have done this as well.
@@rogo7330 It's a bug in the zstd decoder implementation of 7zip, which was written from scratch by Igor Pavlov.
It's unrelated to libzstd.
@@rogo7330 Yes, Igor Pavlov wrote the Zstd implementation used in 7zip himself, for whatever reason. So other Zstd implementations that are done to spec aren't affected.
The reference implementation of zstd has already been security audited as far as I know. The comments at the top of 7-zip's implementation say that it was written based on the spec. It doesn't use any of the code from the reference implementation.
Of all the criticisms, you are complaining about names like "src" and "inSize"? These are very common names in programming that are intuitive and unambiguous when talking about processing the contents of a file. I agree that "b0" and "b1" are not good names, but the examples you used are completely reasonable. What would you consider a minimum acceptable variable name?
There's an absolute massive ton of programs that bundle their own portable version of 7zip along with the rest of the program. Sometimes there's even multiple nested bundles of 7zip. So presumably all of those would need to be patched too, which is never going to happen.
Which typically run once, and only on their own data. The only thing that could interfere with that is a program already running on a system or a user, which could do the same things anyway.
you mean a "UnZipMe.exe"?
(Macros are often used because it saves the overhead of the stack push and function call. It runs significantly faster if it's called millions of times. But it's only useful if the macro is used in many places, if used in only one place then it makes no sense. )
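For comparison, a small sketch of the trade-off, using made-up names (`SQUARE_MACRO`, `square_inline`, `next_value` are hypothetical, not from 7-Zip). Compilers will typically inline a small `static inline` function when optimizing, though that is not guaranteed, so the speed argument for macros is weaker than it used to be. A macro also expands its argument textually, so side effects can run more than once:

```c
/* Hypothetical macro: the argument text is pasted in twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Hypothetical function equivalent: the argument is evaluated once. */
static inline int square_inline(int x)
{
    return x * x;
}

/* Helper with a visible side effect: bumps the counter on every call. */
static int next_value(int *counter)
{
    return ++(*counter);
}

/* Returns how many times next_value ran when passed through the macro. */
static int calls_via_macro(void)
{
    int counter = 0;
    (void)SQUARE_MACRO(next_value(&counter));
    return counter;
}

/* Returns how many times next_value ran when passed to the function. */
static int calls_via_inline(void)
{
    int counter = 0;
    (void)square_inline(next_value(&counter));
    return counter;
}
```

Here `calls_via_macro()` reports two calls and `calls_via_inline()` reports one, which is the kind of surprise that makes macro-heavy code hard to audit.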
8:22 - I'm sorry but I was just exploding out of laughter on that xD TRRRRRRRRRRRRR
It's interesting how we take for granted that software available to everyone is "supposed to work, no harm".
Was surprised when you recreated the context using the crash file and gdb. Is the crash file the same as a core dump? That is superb. I've never seen a way to recreate a crash context in any "modern/high level" languages. We just simulate it in our heads. I mostly do PHP, JS and a little C++, and the debugging tools in these are garbage.
"typically when you compile code you use this thing called GCC"... Or msvc or clang and only specifically when compiling C or C++ code
I understood nothing about this, but I support your enthusiasm!
docker sponsors youtubers? damn
I'm conditioned to sponsors basically being malware, I'm not sure how to respond now.
@@hellcoreproductions lol
Looks like the COPY_CHUNKS macro needs to defensively check for less than zero rather than assume the len var will land at zero (false), just in case the buffer size isn't divisible by CHUNKSIZE. Big assumption in COPY_CHUNKS that it is given a matching buffer and chunksize. However, maybe the COPY_PREPARE macro is doing that check and setting len to something appropriate, a bit hard to read ...
This quirk of C, that you can treat an integer as a boolean in an 'if' test, has resulted in other bugs than this one. Why not just test explicitly for > 0, e.g. while ((len -= COPY_CHUNK_SIZE) > 0), and avoid the possibility of going past the boundary by more than part of one chunk size?
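One caveat on the `> 0` idea, assuming `len` is unsigned as discussed elsewhere in the thread: an unsigned value can never be less than zero, so `> 0` behaves exactly like `!= 0` after a wrap-around. A rough sketch of the loop shapes follows. It is not the 7-Zip code. The value of `COPY_CHUNK_SIZE`, the `ITERATION_CAP` guard and the function names are assumptions for illustration, and the "copy" is replaced by a step counter so nothing is written out of bounds:

```c
#include <stddef.h>

#define COPY_CHUNK_SIZE 16u   /* assumed value, for illustration only */
#define ITERATION_CAP 1000u   /* hypothetical cap so the demo terminates */

/* Shape of the loop under discussion: body first, test afterwards. */
static unsigned steps_do_while(size_t len)
{
    unsigned steps = 0;
    do {
        steps++;                       /* stands in for copying one chunk */
        if (steps == ITERATION_CAP)
            break;                     /* runaway loop detected */
    } while ((len -= COPY_CHUNK_SIZE) != 0);
    return steps;
}

/* Same loop with the suggested "> 0" test: no different for unsigned len. */
static unsigned steps_do_while_gt(size_t len)
{
    unsigned steps = 0;
    do {
        steps++;
        if (steps == ITERATION_CAP)
            break;
    } while ((len -= COPY_CHUNK_SIZE) > 0);
    return steps;
}

/* Check before entering the body, and only subtract what is available. */
static unsigned steps_checked_first(size_t len)
{
    unsigned steps = 0;
    while (len >= COPY_CHUNK_SIZE && steps < ITERATION_CAP) {
        steps++;
        len -= COPY_CHUNK_SIZE;
    }
    return steps;
}
```

With `len = 0`, both do-while variants run until the cap, because `0 - COPY_CHUNK_SIZE` wraps to a huge value, while the checked-first loop does no work at all. With `len = 32`, all three take two steps. The checked-first version in this sketch skips a trailing partial chunk, which real code would have to handle separately.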