The PC I'm watching this on happens to have a Zen 2 CPU, so I tried the exploit code in a VM. Sure enough, I found strings that definitely belong to the host machine, including HTTP header data for my browser fetching this very video. So I just hypervisor-leaked myself with this. Interesting and scary stuff.
Shows how public cloud is a dead on arrival concept for anything that handles sensitive data. Look at the results of sandsifter and tell me that there aren't instructions there that have even worse security implications.
Your package manager of choice should automagically install the new microcode. Fun fact: the microcode can be patched without rebooting. Microcode should be applied as early as possible; it should be one of the first things your OS does when booting, and if you install a new version you shouldn't wait for the next reboot to apply it.
As always, it depends. If an update provides additional CPU flags or features, like IBRS (Indirect Branch Restricted Speculation) or SSBD (Speculative Store Bypass Disable), then you'll need to reboot to make those available.
@@dealloc Huh interesting... I thought microcode had to be applied on every boot (as the cpu is missing storage to store it). Did I get that wrong, or is there some point that breaks new features? Like something only being patchable in like 16 bit mode or whatever?
Apparently the fix for this causes a 54% performance drop for MariaDB. I don't think it should be installed automatically for that reason. The admin should decide what is more important.
@@_sneer_ That's Inception, CVE-2023-20569. This is Zenbleed, CVE-2023-20593. Different bugs. Inception is for Zen 3 and 4, Zenbleed is for Zen 2. Zenbleed was only published 5 days ago (I think most operating systems released their patches on the same day or the day before), so I don't think people have measured the speed of the fix yet.
@@_sneer_ I think with past CPU vulnerabilities the mitigations were enabled by default but could be disabled if you needed the performance and were sure you were unaffected by the security issues
So: Humans have a thing they want to do, and they optimize the heck out of that and turn it into code. Compiler takes the code and after optimizing the heck out of it, turns it into assembly. Assembler then takes it, optimizes the heck out of it and turns it into machine code. The CPU takes the machine code, _optimizes the heck out of that_ and runs it in ... uh, an architecture and micro-architecture specific execution pipeline which takes every shortcut and guess it can? WCPGW? Yeah, a breeding ground for bugs (all mentioned levels, actually). Again, thanks for covering Zenbleed. This video was even more interesting than the previous one (I actually thought the assembly part would be boring, but oh no...) I'm out of vocabulary, and my brain is out of breath. I assumed all registers a CPU has are fixed in the chip, but there goes that old piece of knowledge. (I knew about the zero register in RISC-V but that's used differently.)
I know that CPUs have cache, but for some reason I've never thought about how it might be used. "heap", renamed registers and pointers in HW, that's something I've never even started to think about. But now when I think about the parallel/out-of-order/speculative execution demands it all suddenly fits together. It actually makes sense!
If they provide the hardware as a service, but not access to said hardware (i.e. to secure yourself from vulnerabilities present in hardware), they are responsible for making sure that hardware is up to date with any security fixes and should disclose any vulnerabilities that may be applicable. Whether or not they were fixed. What's good is that in their FAQ they do acknowledge Zenbleed and Downfall: > We apply all currently released stable patches and microcode updates. Depending on the update the mitigation becomes either active immediately or is activated as part of the continous maintenance and update cycle of our platform.
Some parts, like register renaming, aren't really needed to understand this bug. It's a higher-layer abstraction (relative to the register file) that isn't really involved here. But on the other hand, I think it's worth explaining some more details. For example, what is the expected state after the side effects are rolled back when speculation fails? Or where does a freed register point to after rollback, what variants can we have, etc.
Hey, your old video about SIM cards popped up in my recommended earlier and it really intrigued me to see how something I never even thought about worked, in detail. I was just wondering after watching that video how printers and scanners communicate with devices, through USB and wifi, as that technology has been around for so many years that there will surely be relics of programming from that time still used today, and something like that would be so interesting to me and I'm sure many others as well. Thanks!
An awesome explanation of how Zenbleed might have worked in practice! I really liked the way you dissected and explained each instruction and the relation of the "strcmp" libc function with AVX registers. Great explanation and the way it was presented. Thank you for making such educational videos.
About the merge optimization: if what you do (for example SQRTSS) does not affect the upper 96 bits of a destination register, it can bypass result merging (validation of the result). Result merging is when you do an operation that doesn't modify all bits, so the processor has to merge the result back to validate and use it. It combines your 32 bits (for example) with the old data you didn't touch (96 bits) to create the new value that you asked for (128 bits, the full register). With merge optimization for things like SQRTSS, the processor intrinsically knows how many 0s there are in the register at all times and does not need to validate how many 0s there are, so it doesn't need to merge the results, which speeds up processing greatly.
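To make the merge step a bit more concrete, here is a minimal Python sketch that models a 128-bit register as a plain integer. The function names are made up for illustration, and this is only a mental model of the idea described above, not a claim about how the hardware actually implements it.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


def merge_low32(old_reg: int, new_low: int) -> int:
    """Write a 32-bit result into the low lane and keep the old upper 96 bits."""
    upper96 = old_reg & (MASK128 & ~MASK32)  # needs the OLD register contents
    return upper96 | (new_low & MASK32)


def write_low32_known_zero(new_low: int) -> int:
    """If the upper 96 bits are already known to be zero, no old data is needed."""
    return new_low & MASK32


# Made-up example values: some old data in the top lane, a new low result.
old = (0xDEADBEEF << 96) | 0x11111111
merged = merge_low32(old, 0x22222222)
```

The point of the second function is that it never reads `old_reg` at all, which is roughly the dependency the optimization is said to remove.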
A detailed video "What is microcode" would be nice, because it's there especially to fix CPU issues. I guess this topic would be interesting even for people who are already very familiar with computer science. Not sure how easy it is to make such a video, because I think it's especially interesting precisely because it's a very complex topic. I guess microcode doesn't just implement all the more complex instructions, it likely also handles the structures mentioned as "register file" and "register allocation table". I was surprised when I learned that the microcode isn't static and needs to be loaded each time to patch the CPU. That the BIOS applies the initial update on start and that the OS can update it again, like when the BIOS is outdated.
The issues themselves often exist in the microcode, as all of these complex instructions are implemented there (and frankly all the less advanced ones too), and not actually in the CPU itself. At a rough level, the microcode does the actual work of your decoded instruction. We're talking code like: set register 1 to the ALU side A, register 27 to ALU side B, set the ALU carry to zero, set the ALU flag to do an addition, store the result in register 7, set flags such as negative, zero, overflow, and carry, and jump to the next instruction and increase the IP, and you have something like an add instruction. Assembly is kinda just like Python or Java bytecode, but with a low-level CPU interpreter: the microcode. And here we just have a bit of "if zeroflag is set, don't do this code".
@@genstian As far as I know the microcode exists specifically to have something to fix CPU issues. It was added to patch the CPU in case something isn't working as it should, so that the CPU can still be used reliably. So of course it's designed so that if there is a bug, it is likely in the microcode, for most or maybe even all instructions. An issue in the hard-wired transistor logic would be a nightmare. _I can imagine that some of the simpler and first instructions, which have been present since the first x86 CPU and so are well known, are maybe not in microcode, or only in a rudimentary way._
@@maxmustermann7397 Every single x86 instruction maps to either a single uop (like those very simple instructions that have been present since forever), or up to 4 per instruction in the decoding table (Agner's data from 2015, like those "do this and that" instructions), or a full microcode function. And of course, some instructions do have hardware uops and simply map to those 1:1. Modern CPUs have like 20,000 different ones, even full-blown hardware instructions for SHA or RSA, but any assembly instruction can be overridden in microcode. A fairly large permanent microcode for instructions has been present since the 486.
@@maxmustermann7397 microcode can be *updated* to fix some CPU issues, but it was not created for that. think of it as a kind of firmware: everything you do with a modern CPU is going through the microcode from day zero because the hardware itself doesn't actually run x86 machine code. without microcode, your CPU is useless, so you don't even have a chance to worry about CPU bugs.
@@maxmustermann7397 In Ben Eater's breadboard CPU series, his CPU uses microcode to specify how each instruction is actually implemented. It's basically a sequence of states for the various control lines in the CPU. For example, the add instruction could look like this in microcode:
1. register 1 output enable, ALU input A enable
2. register 2 output enable, ALU input B enable
3. ALU add, ALU output enable, register 1 input enable
And if there were some bug in this implementation, a microcode update could fix the bug. It could also add completely new features though, for example it might add a second add instruction that operates directly on memory rather than registers. Microcode in modern CPUs likely works in a similar way, but of course way more complex with all the optimizations they make.
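The three add steps can be sketched as a tiny Python model. Everything here (the `TinyCPU` class, the step names, the 8-bit width) is invented for illustration in the spirit of a breadboard CPU. It is a toy, not how any real x86 microcode is written.

```python
class TinyCPU:
    """Toy 8-bit CPU with two registers and two ALU input latches."""

    def __init__(self) -> None:
        self.regs = {"r1": 0, "r2": 0}
        self.alu_a = 0
        self.alu_b = 0


# Each micro-step stands for one set of control lines being enabled.
def r1_out_alu_a_in(cpu: TinyCPU) -> None:
    cpu.alu_a = cpu.regs["r1"]


def r2_out_alu_b_in(cpu: TinyCPU) -> None:
    cpu.alu_b = cpu.regs["r2"]


def alu_add_r1_in(cpu: TinyCPU) -> None:
    cpu.regs["r1"] = (cpu.alu_a + cpu.alu_b) & 0xFF  # wrap at 8 bits


# The "microcode ROM": instruction name -> sequence of micro-steps.
MICROCODE = {"ADD": [r1_out_alu_a_in, r2_out_alu_b_in, alu_add_r1_in]}


def execute(cpu: TinyCPU, instruction: str) -> None:
    for step in MICROCODE[instruction]:
        step(cpu)


cpu = TinyCPU()
cpu.regs["r1"], cpu.regs["r2"] = 200, 100
execute(cpu, "ADD")
```

In this model a "microcode update" is just assigning a different step list to `MICROCODE["ADD"]`, or adding a new key for a new instruction, without touching `TinyCPU` itself.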
Tavis is a legend. His research into stepping out of VMs, into other adjacent VMs and even into the host hypervisor, was invaluable in bringing those championing that virtualisation increased security back down to Earth. He thankfully proved that virtualisation essentially created brand new attack surfaces.
If registers are essentially all stored together in the same memory collection, it seems that the idea of a register is more fluid. Thus, could we see a day where there's a CPU that uses a different paradigm, other than registers?
if you want to throw away all the compilers and software stacks in the world, yes (since no level of rewrite would make them compatible with a new architecture that does not use registers, unlike for example translating from x86 to ARM or RISC-V). You could implement C in the CPU itself directly (the heap) without the CPU needing registers. You would lose the ability to run low level assembly and compilers for all other languages, but you could do it. Oh yes, you would also need to rewrite all microcode / BIOS as well. The change would be as big as going from traditional to quantum computing. Everything reset to point 0. But sadly no benefit, just downsides (CPUs would lose the ability to do low level optimisations like in this video, and they would not be language agnostic anymore, or you would need to run code that has been compiled into the intermediary language. Such ones already exist, but here we are talking about worldwide usage, not a specialized use case).
Thanks for the detailed breakdown of how this exploit works, or bug is triggered. And fixed. :) Gosh, we're going to be in trouble when AI is used to accelerate such bug discovery and exploit construction.
The sad part about your videos is I understand it all at a basic level due to how well you explain things, but I usually forget the contexts too eventually 😅😢 Would love to make a joke about x86 but ARM is proven to have bleed bugs as well.
@@yarmgl1613 "highness" of level is very relative... It almost makes no sense to talk about it. Compared to minecraft command block language, C is definitely not mid level - it's very very low level. Compared to machine code, even assembly is relatively high level, let alone C. Again not mid level but very high level.
Copying data and forgetting to modify the correct copy. Getting deep in a nested operation and forgetting to set modified data back to the correct values. Using memory after freeing it. This video helped me realize that CPU designers are just regular programmers making the same mistakes as the rest of us.
Good video but the end... if you don't know how to update the microcode, then you shouldn't admin your own publicly accessible server. You can't blame your lack of knowledge on companies. There are so many parameters, they can't inform you about everything. Do you know how many CVEs are released every month? I personally don't expect my real-estate agent to teach me how to fix everything in my house. Or my car dealer to teach me how to service the car I bought.
Excellent vid thank you. That was very well explained. So programmers just need to make sure avx instructions aren't used in processing of security sensitive data to avoid this issue?
First, the choice of instructions is not yours, but the library writer's. On program startup, the fastest variant of each routine available on the CPU is selected. Second, this bug is just in a specific generation of the processor. There should exist a microcode update removing the problem. It just wasn't installed here at the time of video creation. If you avoid all complex stuff in your code, there is some probability you introduce other bugs by doing so. And then there is the question of how much paranoia is healthy or if you are paranoid enough.
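The "fastest variant is selected at startup" idea can be sketched like this in Python. The function names and the feature-set argument are hypothetical, and the "wide" variant is only a stand-in for a vectorized routine. Real C libraries do this kind of selection natively, not like this.

```python
def strlen_scalar(data: bytes) -> int:
    """Byte-at-a-time scan for the first NUL byte."""
    n = 0
    for b in data:
        if b == 0:
            break
        n += 1
    return n


def strlen_wide(data: bytes) -> int:
    """Stand-in for a vectorized variant: scans 32 bytes per step."""
    for off in range(0, len(data), 32):
        idx = data.find(b"\0", off, off + 32)
        if idx != -1:
            return idx
    return len(data)


def select_strlen(cpu_features: set):
    """Pick an implementation once, based on what the CPU reports."""
    return strlen_wide if "avx2" in cpu_features else strlen_scalar


# The caller only ever sees `my_strlen`, not which variant is behind it.
my_strlen = select_strlen({"sse2", "avx2"})
```

This is why "just don't use AVX in my code" is not really in the application programmer's hands: the caller uses one name and the selection happens underneath.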
Don't know what OS your minecraft server is running, but microcode updates should be provided via the regular updating process in your package manager for most Linux distros.
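If you want to see which microcode revision is currently loaded, x86 Linux exposes a `microcode` field per core in `/proc/cpuinfo`. Here is a small Python sketch that parses it. The sample text and its revision value are made up for illustration, and the sketch deliberately does not say which revision contains the Zenbleed fix; check your vendor's or distro's advisory for that.

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Return the set of microcode revisions found in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))
    return revisions


# Made-up sample in the shape of /proc/cpuinfo (two cores, same revision).
SAMPLE = """processor\t: 0
model name\t: Example CPU
microcode\t: 0x1234abc
processor\t: 1
model name\t: Example CPU
microcode\t: 0x1234abc
"""

found = microcode_revisions(SAMPLE)
```

On a real machine you would pass in the contents of `/proc/cpuinfo` instead of `SAMPLE`. More than one value in the returned set would mean the cores are not all on the same revision.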
Hetzner has any kind of hyperthreading disabled unlike some other cloud providers. They also do CPU pinning so your registers are just seen by you using this bug. This bug is only relevant for you if you run code from untrusted third parties on your machine. I find: "This is your machine" a reasonable stance in this situation as the abstractions they sell you (this is your machine, no one else can read your registers) is not violated.
I've been trying to switch to an alternative to x86 since… forever. I have a server on arm64, but the big.LITTLE 2/4 cores are still too weak for anything serious. I've got my hands on an Apple M1, which is *extremely* fast, but the support for a serious operating system is still in alpha. My dream is a 16 core ARM A73 ❤
Yeah I really couldn't stand the constant recommendations of G-Mail and Google Docs and the new Pixel 7 Pro during the explanation of Assembly instructions. How can I even be sure that YMM registers are actually 256 bits wide and not just Google's way of telling me that the new Pixel watch can now generate an EKG right from your wrist?
no dude, and you should know that I am a firm advocate and long time respecter of your work. Did I not reach out with a suspected zero day? Not saying you shouldn't have a new life and new format and even make some money, just saying things have changed. Also FWIW this is an awesome bug and Tavis is a hero for finding it. The world still exists outside the Google bubble, in case you forget @@LiveOverflow
I think the Linux kernel already has a Zenbleed patch, so if you're running a modern distro on that dedicated server, rebooting into an updated kernel should be good enough to protect it. Ideally there already shouldn't be many ways for people to get into the machine and run these commands. Famous last words given Minecraft was a huge hitter with Log4j...
Wouldn't the kernel or microcode update need to be applied to the host machine? If Hetzner informed you, does patching your provided KVM emulated CPU fix anything? A bad actor could still run their VM with an unpatched version. Edit: I realized now your Minecraft does not run on a Cloud server.
An ISA is really just an interface. Behind the scenes CPUs can do literally anything as long as the "clients" get the expected end-result. And CPU vendors really ran with it. If I understand what you people just said correctly, they went and hardcoded a full-on, hardware-level dynamic JIT recompiler between the CPU and our software to rework the suboptimal machine code spewed out by our compilers. And a part of me worries they are doing things this way explicitly to conceal the implementation details of their silicon at the expense of performance and security. Which is unacceptable. Really, isn't it more logical to let CPU vendors make the compilers, have them release these as open-source software, and throw away this microcode VM bullshit? Then all these "hardware" bugs would reveal their true identity as plain old software bugs. CPUs could just be told all the static analysis assumptions that are figured out at compile time and implement things like hyperthreading and speculative exec at that level, as instruction set extensions to be directly managed by the OS and userspace software. That way, systems could just... selectively opt not to do any combination of these on a thread by thread basis, instead of globally disabling CPU features. CPUs could disable and enable hyperthreading, speculative exec, register renaming, all of that stuff on the fly based on instant OS scheduler management decisions and the requirements of userspace. And knowing the legacy of x86, at what point should we just pull an Apple and drop the x86_64 ISA altogether and just tank the perf hit on backwards compatibility for a few years in exchange for better security and performance potential down the road? When are AMD and Intel planning to develop a brand-new ISA for PCs, if at all? It doesn't necessarily have to be RISC-based, but maybe they'd be able to take back that segment of the server and high-performance computing space that has since been taken over by ARM-based processors and GPUs...
Maybe even convince Apple to get back on the Intel train, hmm? I mean, they don't even have to stop making x86-based CPUs, necessarily. They could still be making them for the people that still need them (I mean, modern enterprise motherboards with 90s I/O are still being made, alongside compact little SOMs that use modern manufacturing techniques to run DOS, Windows 98 and all the decades-old, pre-Internet software and hardware that only works there, and never needed security updates because they are total data islands that never interact with foreign information). But then they could freeze the performance level of this ISA instead of continuing to desperately look for ways to speed up this old behemoth.
Every major CPU and GPU vendor seems to have experimented with VLIW instruction sets. It turns out that writing efficient compilers for them is a pain in the butt. The CPU and GPU vendors may still be using VLIW under the hood: only they brought the compiling in-house.
I honestly love that Zenbleed exists. This will mean lots of cloud providers offloading Epyc Rome chips in massive quantities. it'll be a great time for building or upgrading a homelab very soon
@@jamesphillips2285 it comes with performance penalties of up to 50% depending on workload. Many cloud providers will not see that as an acceptable trade-off
Wouldn't that just leak only the older/younger bits of stored data? So if there was a string "ABCDEFG", wouldn't that just leak "DEFG"? And also, aren't all passwords stored (and compared) as hashes and present as text only during the hashing operation?
I think raw passwords were used just for illustration purposes. Saying "password hash" would have been a little more cumbersome. Also, passwords may not be the only sensitive data a machine is processing. You may be signing messages with the VM's private key. It may also allow information to leak out of the sandboxes that things like web browsers are now using.
So the registers are kinda like how we normally store files on a disk: a file on a file system that keeps track of where that file is physically stored in this massive pool of sectors, aka the disk. Correct me if that's a stupid analogy.
Maybe I missed something, but what really is the attack target for this type of attack? How do you recognize that the leaked data represents some password, instead of some useless data? Btw. I really think they should at least provide convenient means to fix hardware/firmware issues like that. It is at best a questionable service.
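The file-system analogy can be sketched in Python as a table that maps register names to slots in a shared pool. This is a grossly simplified toy: the class and method names are invented, and real CPUs free old slots only at specific points (such as when instructions retire) and track much more state.

```python
class RenameTable:
    """Toy model: architectural register names point into a shared pool."""

    def __init__(self, num_physical: int) -> None:
        self.physical = [0] * num_physical      # the pool ("sectors")
        self.free = list(range(num_physical))   # unused slots
        self.mapping = {}                       # name -> slot ("file table")

    def write(self, arch_reg: str, value: int) -> int:
        slot = self.free.pop(0)                 # each write gets a fresh slot
        old = self.mapping.get(arch_reg)
        self.physical[slot] = value
        self.mapping[arch_reg] = slot
        if old is not None:
            self.free.append(old)               # recycled, but NOT erased
        return slot

    def read(self, arch_reg: str) -> int:
        return self.physical[self.mapping[arch_reg]]


table = RenameTable(num_physical=4)
first_slot = table.write("rax", 1)
second_slot = table.write("rax", 2)
```

Note that after the second write the old slot still holds its stale value. As with deleted files on a disk, "freed" does not mean "wiped", which is the flavor of problem this whole class of bug lives in.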
For example if you target leaking data from /etc/shadow. You know the line starts with root:, so you look for a leak that contains this. And you repeat this slowly leaking additional characters.
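As a rough sketch of that filtering step in Python: given a pile of leaked chunks, keep only the ones containing a known marker, then use the tail of what is already known to stitch on more bytes. The chunk data below is entirely made up, and the helper names and the `overlap` parameter are hypothetical; this only illustrates the pattern-matching idea, not the actual exploit.

```python
def find_candidates(leaked_chunks, marker=b"root:"):
    """Keep only chunks that contain the known marker."""
    return [c for c in leaked_chunks if marker in c]


def extend_prefix(known: bytes, leaked_chunks, overlap: int = 4) -> bytes:
    """Extend `known` using a chunk that contains its last `overlap` bytes."""
    tail = known[-overlap:]
    for chunk in leaked_chunks:
        pos = chunk.find(tail)
        if pos != -1 and len(chunk) > pos + len(tail):
            return known + chunk[pos + len(tail):]
    return known


# Made-up "leaked" fragments, mostly noise.
chunks = [b"\x00\x00garbage", b"xxroot:$6$ab", b"$6$abcdef"]

candidates = find_candidates(chunks)
step1 = extend_prefix(b"root:", chunks, overlap=5)
step2 = extend_prefix(step1, chunks, overlap=4)
```

Most chunks are useless noise. The known prefix is what lets you tell the interesting fragments apart and order them.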
I'm assuming that if proper security measures are taken, a simple hashing of passwords for example, that could mitigate the effect this issue could have. Please correct me if I'm wrong.
This is a vulnerability concerning multi-user systems: user A can exploit this to get information about user B's running processes. User B can of course make the information in his processes less valuable, but the issue remains that user A can leak user B's information.
Yeah, as long as the hashing algorithm is a security one then all this issue is going to do is leak the hash of the password and not the password itself.
This is not going to work as the password could still be leaked before being hashed. Mitigations like this would require application developers to put immense amounts of effort into obfuscating their own private memory and probably even more performance detriments than mitigations in the kernel or microcode.
@@telceantiberiu7715 The password being leaked before being hashed is outside the scope of the problem. We are only answering whether hashing the password will mitigate the issue.
I don't think speculative execution works the way you explain it. Modern CPUs can execute all branches up to a certain point, and return only the validated path later. There's no "could have executed" path; all paths run until you can validate the correct one. That is the essential problem with speculative exec, and the feature should be removed in all CPU architectures, otherwise these bugs will keep surfacing, and patching a few registers to zero after a vulnerability publication does not fix the underlying problem. More hacky patches lead to more complexity and more undefined behavior.
I'm not an expert. But everything I have read about speculative execution bugs, it has always been an important part to craft assembly in a way that specific code is speculatively executed. Which means not everything is speculatively executed. Where have you read the "all paths run until you can validate the correct one"?
The researcher is really irresponsible. Releasing working code of the undetectable exploit, before AMD has issued a patch, is extremely stupid. People will be vulnerable for a few months now and no malware detector can spot this.
SLIM VIDEO UPDATE! For over a year I accidentally recorded my video stretched in width. It's fixed now
i actually didn't notice it
FatOverflow is no more
😂 Yeahhhh! Great!))))
I want a video about what the hell is micro code and how it can fix these issues.
@@RohitKumarAnkam modern CPUs aren't truly x86 CPUs. they have their own internal non-x86 machine code, and expose that to the world through an x86 abstraction.
the x86 abstraction layer isn't fully hard wired into the chip, and can be reprogrammed by flashing a new firmware onto the CPU.
there can be bugs in the underlying hardware, or in the x86 abstraction layer. the x86 layer can often be patched to work around either type of bug, or fix them.
CPU manufacturers don't often post full listings of their CPU firmwares, or fully document their microcode implementation layer. they often end up being a proprietary trade secret.
Note to self: the parts of microarchitecture that made my brain hurt when learning about them are where all the worst vulnerabilities live
we made a pact with the devil the moment we decided that the cpu needed to try to be smart about what it runs, rather than the program being smarter about the way it runs on the cpu.
I've always said that this stuff is bs and I'd rather we use less performant but more barebones cpus, but it is what it is.
it's because not even the CPU architects and engineers can fully understand it themselves.
@@Reiikz I think that people that buy cpus based on benchmarks wouldn't do it...
@@Reiikz with my job in HPC, I'd be a damn broker for the devil for the amount of deals I'd do for him to keep the insane BS that CPUs need to go faster. We've had to go down real far to get the performance we need.
@@naterthan5569 We refer to this kind of vulnerability as Eldritch horror
It always makes my mind bend that the CPU manufacturers just made a smaller compiler/software stack inside of hardware to run the actual software we see faster.
Well, consider that there can potentially be multiple models of CPUs running on the same type of motherboard, and vice versa, so it's not hard to say there's at least one layer of abstraction inside the CPU to ensure cross-platform compatibility.
And when there's one, there can be many...
It's not a pre-processor within the hardware. x86 ASM is still a "high" level language, i.e. it's not machine code. There's an x86 ASM compiler that will compile these instructions down to your CPU op codes. This is where the pre-processor runs and the binary is generated. This is all still in software land.
@@KieranDevvs He's referring to the CPU microcode, which is indeed a tiny CPU inside the CPU that contains the code that tells how to run the instruction in the architecture's instruction set. It's mind blowing indeed
@@alexcani4957 If he's talking about CPU microcode then that doesn't make sense either. Microcode isn't for performance, it's to add dynamic compatibility and to simplify certain routines. It takes longer to decode a microcode instruction and execute the actual machine code because you're going through an abstraction.
@@KieranDevvs I agree with your point about the performance, but I usually choose to believe that people don't believe that the processor runs actual assembly code...
If everybody would just use the same password, we wouldn’t have to worry about this.
This would still end in xkcd standards.
No security issues if there's no security 🤷🏻‍♂️
yeah totally. Have no idea why ppl collectively spend enormous amount of effort to try to solve something this easily solvable
I only allow 8-char passwords in my software, so they're all immune to zenbleed.
Problem -> solution
You can feel the satisfaction of the person who optimized strcmp to use avx instructions. And also the satisfaction of the person who came up with that zero flag in the register file.
And then the satisfaction of Tevis who came up to use these creatively 😂
@@JohannaMueller57Point it out, then.
This video/series is amazing! I knew the basics of assembly before, but learning about Zenbleed and other similar vulns has always intimidated me, because they seem so complex at first (and they are!).
This is the first time I feel like I understand a hardware vulnerability of this type. Thank you for this explanation :)
In my opinion, services that rent out hardware should absolutely inform and educate their customers. Imagine if a vehicle manufacturer had a critical issue with their product and a rental service just rented out the vehicles without informing people or fixing it. I realize it's not a great metaphor but hopefully you get the gist of my perspective.
Yes and no. You see, investing in good communication with your customers costs money, and not all companies can afford it. They'll end up with a basic newsletter like "this week, we patched the microcode on all our machines to work around the latest known CPU flaws you may have heard of in the news, bla bla blah". But this kind of behavior is detrimental because such a company would be cherry-picking bugs. What happens when the company is vulnerable to something they didn't inform you about? Only the biggest companies (OVH) can inform 24/7 about the hundreds of vulnerabilities discovered each week and the actions taken. Most choose to stay silent and act behind the scenes when most needed.
So AMD should inform everybody.
@@ABaumstumpf AMD does not know who is the end user of a server that hosts an AMD cpu - they might inform Hetzner, but not the customers of Hetzner
No need to inform in this case. You don’t update the microcode per se. The firmware is packaged as a binary blob that your kernel applies at runtime. Even if the server company did not know about the exploit, the moment they update the kernel (most servers are Linux servers) and microcode packages, they are protected.
Hardware rental services in any industry can't have it both ways. Either it's my property that I am liable for during a given time, or it's their property that they are liable for and that I am paying for the privilege to use. If the liability is on the customer, then the customer should get the rights needed to avoid it!
This is the best LiveOverflow video so far. This is the kind of content that those of us who want a lower-level look at issues are after. Thanks a lot, fab
The PC I'm watching this on happens to have a Zen 2 CPU, so I tried the exploit code in a VM.
Sure enough, I found strings that definitely belong to the host machine, including HTTP header data for my browser fetching this very video.
So I just hypervisor-leaked myself with this. Interesting and scary stuff.
Shows how public cloud is a dead on arrival concept for anything that handles sensitive data.
Look at the results of sandsifter and tell me that there aren't instructions there that have even worse security implications.
@@magfal Even ignoring things like this, many providers are simply not GDPR-compliant
Your package manager of choice should automagically install the new microcode.
Fun fact: microcode can be patched without rebooting. It should be applied as early as possible, as one of the first things your OS does when booting, and if you install a new version you shouldn't wait for the next reboot to apply it.
As always, it depends. If an update provides additional CPU flags or features, like IBRS (Indirect Branch Restricted Speculation) or SSBD (Speculative Store Bypass Disable), then you'll need to reboot to make those available.
@@dealloc Huh interesting... I thought microcode had to be applied on every boot (as the cpu is missing storage to store it). Did I get that wrong, or is there some point that breaks new features? Like something only being patchable in like 16 bit mode or whatever?
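For anyone wondering which microcode revision is actually active after an update: on Linux x86 machines, `/proc/cpuinfo` carries a `microcode` field per logical CPU. Below is a minimal sketch of reading it. The function name and the sample text (including the revision number in it) are made up for illustration; nothing here says which revision contains a given fix.

```python
def microcode_revisions(cpuinfo_text):
    """Collect the distinct microcode revisions found in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only lines of the form "microcode : 0x..." are of interest.
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))
    return revisions


# Hypothetical sample text; on a real machine, read /proc/cpuinfo instead.
SAMPLE = (
    "processor\t: 0\n"
    "model name\t: Example CPU\n"
    "microcode\t: 0x8701021\n"
    "processor\t: 1\n"
    "model name\t: Example CPU\n"
    "microcode\t: 0x8701021\n"
)

if __name__ == "__main__":
    print(sorted(hex(r) for r in microcode_revisions(SAMPLE)))
```

If the set contains more than one value, the cores are not all running the same revision, which would be worth looking into.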
Apparently the fix for this causes a 54% performance drop for MariaDB. I don't think that it should be installed automatically for that reason. The admin should decide what is more important.
That's Inception, CVE-2023-20569. This is Zenbleed, CVE-2023-20593. Different bugs. Inception is for Zen 3 and 4, Zenbleed is for Zen 2. Zenbleed was only published 5 days ago (I think most operating systems released their patches on the same day or the day before), I don't think people have measured the speed of the fix yet.
.@@_sneer_
@@_sneer_ I think with past CPU vulnerabilities the mitigations were enabled by default but could be disabled if you needed the performance and were sure you were unaffected by the security issues
So: Humans have a thing they want to do, and they optimize the heck out of that and turn it into code. Compiler takes the code and after optimizing the heck out of it, turns it into assembly. Assembler then takes it, optimizes the heck out of it and turns it into machine code. The CPU takes the machine code, _optimizes the heck out of that_ and runs it in ... uh, an architecture and micro-architecture specific execution pipeline which takes every shortcut and guess it can? WCPGW? Yeah, a breeding ground for bugs (all mentioned levels, actually).
Again, thanks for covering Zenbleed. This video was even more interesting than the previous (I actually thought the assembly part would be boring, but oh no...) I'm out of vocabulary, and my brain is out of breath. I assumed all registers a CPU has are fixed in the chip, but there goes that old piece of knowledge. (I knew about the zero register in RISC-V but that's used differently.)
I know that CPUs have cache, but for some reason I've never thought about how it might be used. "heap", renamed registers and pointers in HW, that's something I've never even started to think about. But now when I think about the parallel/out of order/speculative execution demands it all suddenly fits together. It actually makes sense!
This was effing GREAT! Man, I feel like I could talk with Tavis for hours!
Great job, both of you!
I'm really liking the use of "goto boring" in the C code at 5:11. I always avoid goto, but maybe sometimes I shouldn't
This is why software like Bochs is important: it lets you be sure that code will be executed exactly as expected. But it is slow, of course.
When you started explaining the "shadow world", it reminded me of quantum superposition
Thank you for the explanations in this video. The complexity and the mechanisms under the hood are blowing my mind.
If they provide the hardware as a service, but not access to said hardware (i.e. to secure yourself from vulnerabilities present in hardware), they are responsible for making sure that hardware is up to date with any security fixes and should disclose any vulnerabilities that may be applicable. Whether or not they were fixed.
What's good is that in their FAQ they do acknowledge Zenbleed and Downfall:
> We apply all currently released stable patches and microcode updates. Depending on the update the mitigation becomes either active immediately or is activated as part of the continuous maintenance and update cycle of our platform.
Some parts, like register renaming, aren't really needed to understand this bug. It's a higher-layer abstraction (relative to the register file) that isn't really involved here. On the other hand, I think some more details would be worth explaining: for example, what the expected state is after side effects are rolled back when speculation fails, or where a freed register will point after rollback, what variants we can have, etc.
He explained it because register renaming is important for the bug. That behavior is what confuses the CPU when it runs vzeroupper.
Hey, your old video about SIM cards popped up in my recommended earlier and it really intrigued me to see how something I never even thought about worked, in detail.
I was just wondering after watching that video how printers and scanners communicate with devices, through USB and wifi, as that technology has been around for so many years that there will surely be relics of programming from that time still used today, and something like that would be so interesting to me and I'm sure many others as well.
Thanks!
An awesome explanation of how Zenbleed might have worked in practice! I really liked the way you dissected and explained each instruction and the relation of the "strcmp" libc function to AVX registers. Great explanation, and great presentation.
Thank you for making such educational videos.
Absolutely loved the video. The way you explained it... I am thankful.
For merge optimization: if what you do (for example SQRTSS) does not affect the upper 96 bits of a destination register, it can bypass result merging. Result merging is needed when an operation doesn't modify all bits: the processor has to combine your new 32 bits (for example) with the old data you didn't touch (96 bits) to create the new value you asked for (128 bits, the full register). With merge optimization, for things like SQRTSS the processor intrinsically knows at all times whether the upper bits of the register are zero, so it doesn't need to check them and doesn't need to merge the result, which speeds up processing greatly.
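The merge-versus-bypass idea can be sketched as a toy model. This is only an illustration under loud assumptions: the class, the `upper_zero` flag and the function name are invented here, and real hardware tracks this in the rename stage, not in anything resembling Python objects.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


class ToyRegister:
    """128-bit register plus a flag recording that the upper 96 bits are zero."""

    def __init__(self, value=0):
        self.value = value & MASK128
        self.upper_zero = (self.value >> 32) == 0


def write_low32(reg, result):
    """Write a 32-bit scalar result and report whether a merge was needed."""
    low = result & MASK32
    if reg.upper_zero:
        # Upper bits are known to be zero: nothing to preserve, no merge.
        reg.value = low
        return "bypassed"
    # Otherwise keep the untouched upper 96 bits and merge in the new low 32.
    reg.value = (reg.value & ~MASK32 & MASK128) | low
    return "merged"


if __name__ == "__main__":
    clean = ToyRegister(5)
    dirty = ToyRegister((0xABC << 32) | 5)
    print(write_low32(clean, 9), hex(clean.value))
    print(write_low32(dirty, 9), hex(dirty.value))
```

The point of the flag is that the fast path never has to look at the old register contents at all.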
That code overlay at 7:18 to show it's the same code is pure gold
This video opened up a whole world to me. Had no idea cpus and registers are now like mini compilers
A detailed video "What is microcode" would be nice, because it's there especially to fix CPU issues. I guess this topic would be interesting even for people who are already very familiar with computer science. Not sure how easy it is to make such a video, because what makes it interesting is that it's a very complex topic. I guess microcode doesn't just implement all the more complex instructions, it likely also handles the structures mentioned as "register file" and "register allocation table".
I was surprised when I learned that the microcode isn't static and needs to be loaded each time to patch the CPU. The BIOS applies the initial update on start, and the OS can update it again, like when the BIOS is outdated.
The issues themselves often exist in the microcode, as all of these complex instructions are implemented there (and frankly all the less advanced ones too), and not actually in the CPU itself. At a rough level, the microcode does the actual work of your decoded instruction. We're talking code like: set register 1 to ALU side A, register 27 to ALU side B, set the ALU carry to zero, set the ALU flag to do an addition, store the result in register 7, set flags such as negative, zero, overflow, and carry, increase the IP and jump to the next instruction, and you have something like an add instruction.
Assembly is kinda just like Python or Java opcodes, but with a low-level CPU interpreter: the microcode.
And here we just have a bit of "if zeroflag is set, don't do this code".
@@genstian As far as I know, the microcode exists specifically to have something to fix CPU issues. It was added to patch the CPU in case something isn't working as it should, so that the CPU can still be used reliably. So of course it's designed so that if there is a bug, it is likely in the microcode, for most or maybe even all instructions. An issue in the hard-wired transistor logic would be a nightmare. _I can imagine that some of the simpler, earliest instructions, which have been present since the first x86 CPU and so are well known, are maybe not in microcode, or only in a rudimentary way._
@@maxmustermann7397 Every single x86 instruction maps to either a single uop (like those very simple instructions that have been present since forever), or up to 4 per instruction in the decoding table (Agner's data from 2015, like those "do this and that" instructions), or a full microcode function. And of course, some instructions do have hardware uops and simply map to those 1:1; modern CPUs have like 20,000 different ones, even full-blown hardware instructions for SHA or RSA, but any assembly instruction can be overwritten in microcode. A fairly large permanent microcode for instructions has been present since the 486.
@@maxmustermann7397 microcode can be *updated* to fix some CPU issues, but it was not created for that. Think of it as a kind of firmware: everything you do with a modern CPU goes through the microcode from day zero, because the hardware itself doesn't actually run x86 machine code. Without microcode your CPU is useless, so you don't even have a chance to worry about CPU bugs.
@@maxmustermann7397 in Ben Eater's breadboard CPU series, his CPU uses microcode to specify how each instruction is actually implemented. It's basically a sequence of states for the various control lines in the CPU. For example, the add instruction could look like this in microcode:
1. register 1 output enable, ALU input A enable
2. register 2 output enable, ALU input B enable
3. ALU add, ALU output enable, register 1 input enable
And if there were some bug in this implementation, a microcode update could fix the bug. It could also add completely new features though, for example it might add a second add instruction that operates directly on memory rather than registers.
Microcode in modern CPUs likely works in a similar way, but of course way more complex with all the optimizations they make.
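The three-step add above can be played through in a few lines. This is a toy sketch in the spirit of a breadboard CPU, not how any real x86 microcode works: the control-line names (`R1_OUT`, `ALU_A_IN`, ...) are hypothetical, and the 8-bit width is an assumption chosen to keep the example small.

```python
# Each micro-step is the set of control lines asserted during that step.
ADD_MICROCODE = [
    {"R1_OUT", "ALU_A_IN"},           # step 1: register 1 -> ALU input A
    {"R2_OUT", "ALU_B_IN"},           # step 2: register 2 -> ALU input B
    {"ALU_ADD", "ALU_OUT", "R1_IN"},  # step 3: A + B -> register 1
]


def run_microcode(microcode, r1, r2):
    """Run the micro-steps on two 8-bit registers and return (r1, r2)."""
    state = {"R1": r1 & 0xFF, "R2": r2 & 0xFF, "A": 0, "B": 0}
    for signals in microcode:
        bus = 0
        # First, whichever unit has its output enabled drives the bus.
        if "R1_OUT" in signals:
            bus = state["R1"]
        if "R2_OUT" in signals:
            bus = state["R2"]
        if "ALU_ADD" in signals and "ALU_OUT" in signals:
            bus = (state["A"] + state["B"]) & 0xFF
        # Then, every unit with its input enabled latches the bus value.
        if "ALU_A_IN" in signals:
            state["A"] = bus
        if "ALU_B_IN" in signals:
            state["B"] = bus
        if "R1_IN" in signals:
            state["R1"] = bus
    return state["R1"], state["R2"]


if __name__ == "__main__":
    print(run_microcode(ADD_MICROCODE, 2, 3))
```

The "hardware" here is `run_microcode`; the behaviour of the instruction lives entirely in the `ADD_MICROCODE` table, which is why swapping the table is enough to fix a bug or define a new instruction.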
This was super interesting, thank you!
This is crazy! Thank you sooo much for an update on this!
Really good second part, thank you for the really good explanations and providing additional background knowledge. Learned so much from it!!
Tavis is a legend. His research into stepping out of VMs, into other adjacent VMs and even into the host hypervisor, was invaluable in bringing those championing the idea that virtualisation increased security back down to Earth.
He thankfully proved that virtualisation essentially created brand new attack surfaces.
WOW! Amazing! I have just discovered that a whole computer exists inside a CPU!!!! Mind blowing!!!!
If registers are essentially all stored together in the same memory collection, it seems that the idea of a register is more fluid. Thus, could we see a day where there's a CPU that uses a different paradigm, other than registers?
If you want to throw away all the compilers and software stacks in the world, yes (since no amount of rewriting would make them compatible with a new architecture that does not use registers, unlike for example translating from x86 to ARM or RISC-V). You could implement C in the CPU itself directly (the heap) without the CPU needing registers. You would lose the ability to run low-level assembly and compilers for all other languages, but you could do it. Oh yes, you would also need to rewrite all microcode / BIOS as well. The change would be as big as going from traditional to quantum computing. Everything reset to point 0. But sadly no benefit, just downsides (CPUs would lose the ability to do low-level optimisations like in this video, and they would not be language agnostic anymore, or you would need to run code that has been compiled into an intermediary language; such ones already exist, but here we are talking about worldwide usage, not a specialized use case).
@@marsovac tldr: not going to happen? xD
A CPU without registers is just a rock. Even stack machines need registers to store the stack pointer and program counter.
zen2_leak_pepo_unrolled is my favorite Twitch emote
Holy crap, great explanation! Changed the way I think about modern CPUs.
I can't thank you enough for this video. The explanation blew my mind, but it is the best explanation that I have found.
Wooow, if he made a C or asm learning tutorial I would definitely follow it to the letter, this is great
this is so simple that it makes me feel good about my personal level of skill
You are an amazing teacher, thanks for sharing this!
Thanks for the detailed breakdown of how this exploit works, or bug is triggered. And fixed. :)
Gosh, we're going to be in trouble when ai is used to accelerate such bug discovery and exploit construction.
Thank you for the amazing work on this video.
Awesome video! Thanks
The sad part about your videos is I understand it all at a basic level due to how well you explain things, but I usually forget the contexts eventually 😅😢
Would love to make a joke about x86 but ARM is proven to have bleed bugs as well.
My man really called a CPU manual "Really high level" 💀💀💀
Well, compared to the electric fields that bend and twist and flip and flop inside the physical transistors, it kind of is high level 😅
It's all relative. C was a high level language at some point
@@bene5431 I'd say C is mid level
@@yarmgl1613 "highness" of level is very relative... It almost makes no sense to talk about it. Compared to Minecraft command block language, C is definitely not mid level, it's very very low level.
Compared to machine code, even assembly is relatively high level, let alone C. Again not mid level but very high level.
Op code - machine code - assembly - c
Copying data and forgetting to modify the correct copy. Getting deep in a nested operation and forgetting to set modified data back to the correct values. Using memory after freeing it. This video helped me realize that CPU designers are just regular programmers making the same mistakes as the rest of us.
So why is there any data in that register anyway? Shouldn't that be cleared during the context switch? Or is the clearing also optimized away?
How do they access the CPU? Through the existing backdoor?
Good video but the end... if you don't know how to update the microcode, then you shouldn't admin your own publicly accessible server. You can't blame your lack of knowledge on companies. There are so many parameters, they can't inform you about everything. Do you know how many CVEs are released every month?
I personally don't expect my real-estate agent to teach me how to fix everything in my house. Or my car dealer to teach me how to service the car I bought.
do we already know how to disable those mitigations once they are in?
you can update your microcode by updating the `amd-ucode` package on your system and rebooting
I'd like to know why it doesn't work on Zen 3. I assume it's because the silicon architecture changed and the bug is no longer present?
This might be your best video ever
All I can say is thank god Ormandy uses his powers for good.
"root@minecraft": glad to see you're still a minecraft channel
Excellent vid thank you. That was very well explained. So programmers just need to make sure avx instructions aren't used in processing of security sensitive data to avoid this issue?
First, the choice of instructions is not yours, but the library writers'. On program startup, the fastest variant of each routine available on the CPU is selected.
Second, this bug is just in a specific generation of the processor. There should exist a microcode update removing the problem. It just wasn't installed here at the time of video creation.
If you avoid all complex stuff in your code, there is some probability you introduce other bugs by doing so. And then there is the question of how much paranoia is healthy or if you are paranoid enough.
Don't know what OS your minecraft server is running, but microcode updates should be provided via the regular updating process in your package manager for most Linux distros.
My Ubuntu-based distro provided an AMD-specific microcode update just the other day, so I'm assuming that's what it was
Hetzner has any kind of hyperthreading disabled unlike some other cloud providers. They also do CPU pinning so your registers are just seen by you using this bug.
This bug is only relevant for you if you run code from untrusted third parties on your machine. I find: "This is your machine" a reasonable stance in this situation as the abstractions they sell you (this is your machine, no one else can read your registers) is not violated.
how far does register renaming go back? i am looking at the z80 with suspicion!
I've been trying to switch to an alternative to x86 since… forever.
I have a server on arm64, but the big.LITTLE 2/4 cores are still too weak for anything serious.
I've got my hands on an Apple M1, which is *extremely* fast, but the support for a serious operating system is still in alpha.
My dream is 16 core ARM A73 ❤
Once upon a time watching you was not just watching adverts from google.
does the quality of the video suffer from this being sponsored by google?
Yeah I really couldn't stand the constant recommendations of G-Mail and Google Docs and the new Pixel 7 Pro during the explanation of Assembly instructions. How can I even be sure that YMM registers are actually 256 bits wide and not just Google's way of telling me that the new Pixel watch can now generate a EKG right from your wrist?
no dude, and you should know that I am a firm advocate and long time respecter of your work. Did I not reach out with a suspected zero day? Not saying you shouldn't have a new life and new format and even make some money, just saying things have changed. Also FWIW this is an awesome bug and Tavis is a hero for finding it. The world still exists outside the Google bubble, in case you forget @@LiveOverflow
23:23 My God! Who allowed this thinking it would be a good idea?
how did u go from me thinking "Oh ya that guy who rly can hack in minecraft" to "Like the smartest guy i know about exploitation" :0
Is Zen+ affected as well? My 3200G is Zen+
I feel like microcode updates should be the hoster's responsibility. It's their hardware after all.
I think the Linux kernel already has a Zenbleed patch, so if you're running a modern distro on that dedicated server, rebooting into it should be good enough to protect it. Ideally there shouldn't be many ways for people to get into the machine and run these commands in the first place. Famous last words given Minecraft was hit hugely by Log4j...
Wouldn't the kernel or microcode update need to be applied to the host machine? If Hetzner informed you, does patching your provided KVM emulated CPU fix anything?
A bad actor could still run their VM with an unpatched version.
Edit: I realized now your Minecraft does not run on a Cloud server.
He says it's a dedicated server, so no emulated CPU but real hardware. I would however expect hosting providers to keep the AGESA up to date
Them saying you're fully responsible for your rented server is stupid.
In the example at 9:01, wouldn't the lower part of the register be at the right, and not at the left? MSBs are stored to the left
I can't believe nobody called their vulnerability REKTBLEED yet. Dibs!
Amazing video.
An ISA is really just an interface. Behind the scenes CPUs can do literally anything as long as the "clients" get the expected end-result. And CPU vendors really ran with it. If I understand what you people just said correctly, they went and hardcoded a full-on, hardware-level dynamic JIT recompiler between the CPU and our software to rework the suboptimal machine code spewed out by our compilers. And a part of me worries they are doing things this way explicitly to conceal the implementation details of their silicon at the expense of performance and security. Which is unacceptable.
Really, isn't it more logical to let CPU vendors make the compilers, have them release these as open-source software, and throw away this microcode VM bullshit? Then all these "hardware" bugs would reveal their true identity as plain old software bugs. CPUs could just be told all the static analysis assumptions that are figured out at compile time and implement things like hyperthreading and speculative exec at that level, as instruction set extensions to be directly managed by the OS and userspace software. That way, systems could just... selectively opt not to do any combination of these on a thread by thread basis, instead of globally disabling CPU features. CPUs could disable and enable hyperthreading, speculative exec, register renaming, all of that stuff on the fly based on instant OS scheduler management decisions and the requirements of userspace.
And knowing the legacy of x86, at what point should we just pull an Apple and drop the x86_64 ISA altogether and just tank the perf hit on backwards compatibility for a few years in exchange of better security and performance potential down the road? When are AMD and Intel planning to develop a brand-new ISA for PCs, if at all? It doesn't necessarily have to be RISC-based, but maybe they'd be able to take back that segment of the server and high-performance computing space that has since been taken over by ARM-based processors and GPUs... Maybe even convince Apple to get back on the Intel train, hmm? I mean, they don't even have to stop making x86-based CPUs, necessarily. They could still be making them for the people that still need them (I mean, modern enterprise motherboards with 90s I/O are still being made, alongside compact little SOMs that use modern manufacturing techniques to run DOS, Windows 98 and all the decades-old, pre-Internet software and hardware that only works there, and never needed security updates because they are total data islands that never interact with foreign information). But then they could freeze the performance level of this ISA instead of continuing to desperately look for ways to speed up this old behemoth.
Every major CPU and GPU vendor seems to have experimented with VLIW instruction sets. It turns out that writing efficient compilers for them is a pain in the butt. The CPU and GPU vendors may still be using VLIW under the hood: only they brought the compiling in-house.
Maybe it's a feature and not a bug, just not published.
I prefer the shadow version of liveOverflow
I honestly love that Zenbleed exists. This will mean lots of cloud providers offloading Epyc Rome chips in massive quantities. it'll be a great time for building or upgrading a homelab very soon
Sounds like they just need to get a BIOS update that patches the microcode.
@@jamesphillips2285 it comes with performance penalties of up to 50% depending on workload. Many cloud providers will not see that as an acceptable trade-off
Updating microcode is just an amd package in your Linux distribution.
Wouldn't that just leak only the older/younger bits of stored data?
So if there was a string "ABCDEFG", wouldn't that just leak "DEFG"? And also, aren't all passwords stored (and compared) as hashes and present as text only during the hashing operation?
I think raw passwords were used just for illustration purposes. Saying "password hash" would have been a little more cumbersome. Also, passwords may not be the only sensitive data a machine is processing. You may be signing messages with the VM's private key. It may also allow information to leak out of the sandboxes that things like web browsers are now using.
A similar thing was found on the Apple M1: M1RACLES (CVE-2021-30747)
So the registers are kinda like how we normally store files on a disk
a file on a file system that keeps track of where that file is physically stored in this massive pool of sectors aka the disk
Correct me if that's a stupid analogy
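The file-system analogy can be made concrete with a toy rename table: architectural names are the "file names", and the physical register file is the pool of "sectors". Everything below is a simplified sketch with invented names (`RenameTable`, `write`, `read`); a real CPU frees old slots at retirement and handles many details this ignores.

```python
class RenameTable:
    """Maps architectural register names to slots in a physical register file."""

    def __init__(self, physical_slots):
        self.file = [0] * physical_slots        # physical register file
        self.free = list(range(physical_slots)) # slots not mapped to any name
        self.map = {}                           # architectural name -> slot index

    def write(self, name, value):
        """Give `name` a fresh slot holding `value`; return the slot index."""
        new = self.free.pop(0)
        old = self.map.get(name)
        self.file[new] = value
        self.map[name] = new
        if old is not None:
            # The old slot is released, but its contents are not erased,
            # much like sectors of a deleted file.
            self.free.append(old)
        return new

    def read(self, name):
        return self.file[self.map[name]]


if __name__ == "__main__":
    table = RenameTable(4)
    first = table.write("ymm0", 111)
    second = table.write("ymm0", 222)
    print(table.read("ymm0"), table.file[first], first in table.free)
```

The stale value left behind in a freed slot is what makes the analogy useful here: anything that still points at a released slot sees whatever happens to be stored there.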
22:52 A zero flag instead of written zeros? This shook me
22:03 Uber nerd glasses push.
IPMI can be used to update those, I think
Is there supposed to be "advertisement" written top right all the time?
yeah, it's sponsored by google
@@LiveOverflow Ah, thanks. I thought it was a mistake. :) Great video!
CPU design is fantastically interesting.
Corporate doesn't deserve protection
wow this one was loaded with info, thanks
It's a black box. Everybody knows what MSRs are and needs no explanation.
Maybe I missed something, but what really is the attack target for this type of attack? How do you recognize that the leaked data represents some password, instead of some useless data?
Btw. I really think they should at least provide convenient means to fix hardware/firmware issues like that. It is at best questionable service.
For example if you target leaking data from /etc/shadow. You know the line starts with root:, so you look for a leak that contains this. And you repeat this slowly leaking additional characters.
@@LiveOverflow Ah, of course. Thanks for the clarification!
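The "look for a known prefix, then extend it" idea from the reply above can be sketched like this. The function name, the `step` parameter and the sample chunks are all made up for illustration; real leaked data is far noisier, and this is not the tooling used in the video.

```python
from collections import Counter


def extend_prefix(leaks, known, step=4):
    """Extend `known` by the most common `step` bytes seen right after it."""
    counts = Counter()
    for chunk in leaks:
        pos = chunk.find(known)
        if pos != -1:
            start = pos + len(known)
            extension = chunk[start:start + step]
            if extension:
                counts[extension] += 1
    if not counts:
        return known  # nothing useful leaked this round
    return known + counts.most_common(1)[0][0]


# Fabricated chunks standing in for leaked register contents.
SAMPLE_LEAKS = [
    b"\x00\x00root:$6$a",
    b"GET /index.html",
    b"xxroot:$6$abcd",
    b"root:$6$a",
]

if __name__ == "__main__":
    print(extend_prefix(SAMPLE_LEAKS, b"root:", step=3))
```

Repeating the call with the longer prefix recovers the target piece by piece, while unrelated chunks are simply ignored.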
fab explaining fab vulnerability.😅
Let’s goooo
I wish this video was shorter and more to the point. I feel like it repeats a lot and takes longer than needed to explain certain concepts.
I read yumm yummm yummmm
If you need physical access to a machine, like with 99% of the other recent "hacks", then this is just more drama.
Yeah, well, you don't. At all. Did you even watch the video?
Real fact: your CPU can run 30 percent slower after the fix for this bug
im assuming if proper security measures are taken, a simple hashing of passwords for example, could mitigate the effect this issue could have, please correct me if i'm wrong
This is a vulnerability concerning multi User systems. If user A can exploit this to get information of User B's running processes. User B can of course make the information in his processes less valuable, but the issue remains, that user A can leak User B's information.
Yeah, as long as the hashing algorithm is a security one then all this issue is going to do is leak the hash of the password and not the password itself.
This is not going to work as the password could still be leaked before being hashed. Mitigations like this would require application developers to put immense amounts of effort into obfuscating their own private memory and probably even more performance detriments than mitigations in the kernel or microcode.
@@telceantiberiu7715 The password being leaked before being hashed is outside the scope of the problem. We are only answering whether hashing the password will mitigate the issue.
We need an open CPU architecture and deterministic execution.
Speed should be achieved via software optimizations, not hardware.
Brought to you by Intel lol
Still nothing compared to what Intel has had to deal with in the past
Hello all, I am 7 minutes late ❤😂
First! Saludos from Spain, love your channel
I don't think speculative execution works the way you explain it. Modern CPUs can execute all branches up to a certain point, and return only the validated one later. There's no "could have executed" path, all paths run until you can validate the correct one.
That is the essential problem with speculative exec, and the feature should be removed in all cpu architectures, otherwise these bugs will keep surfacing and patching a few registers to zero after a vulnerability publication does not fix the underlying problem. More hacky patches leads to more complexity and more undefined behavior.
I'm not an expert. But everything I have read about speculative execution bugs, it has always been an important part to craft assembly in a way that specific code is speculatively executed. Which means not everything is speculatively executed. Where have you read the "all paths run until you can validate the correct one"?
The researcher is really irresponsible. Releasing working code of the undetectable exploit, before AMD has issued a patch, is extremely stupid. People will be vulnerable for a few months now and no malware detector can spot this.
why do you think the code was released before the issue got patched?
@@LiveOverflow because consumer CPUs won't be getting a patch for another month or two