Is pointer signing enough? It's still possible to do an arbitrary function call:
1. Find an exploitable function.
2. Somehow read a signed pointer to that function.
3. Overwrite the stack with a valid frame mimicking a call to that function.
4. Overwrite LR with our signed pointer.
5. Return.
I once coded an array of pointers to functions. It was tricky. I'm sure pointer authentication will increase the complexity another notch or two. At least pointer authentication is an option at this time.
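For anyone curious, a minimal sketch of that kind of construct (the handler names are made up):

    #include <stdio.h>

    /* a dispatch table of function pointers, indexed by opcode */
    static void op_add(void) { puts("add"); }
    static void op_sub(void) { puts("sub"); }
    static void (*handlers[])(void) = { op_add, op_sub };

    int main(void) {
        handlers[1](); /* indirect call through the table: exactly the kind
                          of pointer a PAC-style scheme would want to sign */
        return 0;
    }

Indirect calls through tables like this are precisely what forward-edge pointer authentication is meant to protect.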
The whole point is the hacker wants the program to return to an address they specify. If you put the original pointer back then that won't happen. You also can't just put the "signature" (top bits) on top of your own pointer as that will fail the check.
This would require a separate data structure/memory location to store return addresses in. Forth does something similar with its distinction between the data and return stacks. Say the programmer was holding a function pointer in a stack variable and calling that later: having a separate return stack isn't going to stop the user from getting an arbitrary-execute primitive if there's a buffer overflow on the stack. I don't think the extra security potential it adds outweighs the performance impact it might have.
@@imciviled So I'm not an expert on this at all but I don't think you've described the mechanism correctly. My understanding is there is no separate pointer storage. The pointers are in the usual places, simply with additional decoration (hashed high bits). This decoration is applied by a dedicated CPU instruction when requested, and when the function return pointer is popped with the authenticating version of that instruction, the hardware verifies the decorated pointer and throws an exception if it doesn't pass. You are right that this will certainly impose performance restrictions simply due to the extra work needed to compute and verify decorated pointers. There will also likely be cache locality impacts.
@Masq_RRade@@nowave7 Offtopic: there is no such thing as "it's just bytes, you can interpret it however you want" in C, because of the strict aliasing rule, at least not without using functions that implicitly create objects, like memcpy.
Seems like this would interfere with the NaN boxing that many dynamically typed languages use in order to store pointers and other data types inside the payload portion of NaN floating point numbers (basically turning 64-bit floats that are NaN into enums that store both type information and value).
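For context, a minimal sketch of the trick being described, assuming 48-bit user-space pointers (the constant and names are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* a double is NaN when all exponent bits are set and the mantissa is
       nonzero, which leaves ~51 payload bits to smuggle a pointer into */
    #define NANBOX_QNAN 0x7FF8000000000000ULL

    static uint64_t box_ptr(void *p) {
        return NANBOX_QNAN | (uint64_t)(uintptr_t)p; /* fits in low 48 bits */
    }

    static void *unbox_ptr(uint64_t v) {
        return (void *)(uintptr_t)(v & 0x0000FFFFFFFFFFFFULL); /* strip NaN bits */
    }

    int main(void) {
        int x = 42;
        uint64_t boxed = box_ptr(&x);
        printf("%d\n", *(int *)unbox_ptr(boxed)); /* prints 42 */
    }

A PAC signature living in those same top bits would collide with the NaN tag, which is exactly the conflict raised above.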
20 years? We've had something more secure since 1963, and we're now starting to see it in consumer CPUs with CHERI and ARM Morello. Much stronger than simply signing a pointer, capability hardware can actually enforce memory safety (and even capability safety), and when well designed it's even faster than a traditional MMU.
Yep, CHERI is a great approach. As I said above, add to that separating return stacks from data stacks (which I think should have been done long ago) and you eliminate most issues. Separate return stacks are not a new idea either. They tend to make things slightly more complicated, but not outrageously so. I think there is little excuse not to use them these days. And yet...
That's one simple approach! There are a range of protected stack models that have worked well. The technique on the B5500 is inspirational even if it doesn't directly translate. The BiiN/i960 used a simple model that matched RISC register window usage too. @@joseoncrack
Someone explain what I am not understanding. Can this not just be implemented across all architectures when the code for physical and virtual memory is written? Why is this specific to ARM?
While the operating system kernel plays an important role in memory mapping, most of the heavy lifting is done by the memory management unit in the CPU. This is essential for good performance. That means that some features need support in the MMU hardware to be effectively implemented.
The really secure way is to hold return pointers in registers instead of on the stack. I'm too lazy to search for the source, but I'm pretty sure Intel and ARM are both working on this.
certainly makes sense, but also only helps for the immediate return of the current function. All layers above would still need to store their return pointers on the stack.
Another mitigation is implementing a shadow stack that only holds saved return addresses. Calling a routine pushes both to the actual stack and the shadow stack, returning from a routine pops it from the stack and checks against the shadow stack. This is already implemented by Intel processors
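A conceptual sketch of that mechanism in plain C (real implementations, like the shadow stack in Intel CET, keep this region hardware-protected; everything here is illustrative):

    #include <stdlib.h>

    #define SHADOW_MAX 4096
    static const void *shadow[SHADOW_MAX];
    static int shadow_top;

    /* conceptually emitted on every call: keep a second copy of the return address */
    void shadow_push(const void *ret) { shadow[shadow_top++] = ret; }

    /* conceptually emitted on every return: compare against the data-stack copy */
    void shadow_check(const void *ret) {
        if (shadow[--shadow_top] != ret)
            abort(); /* return address on the ordinary stack was tampered with */
    }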
5:40 - it’s pretty common for the virtual & physical addresses to share the bottom N bits (i.e. they aren’t remapped), where N relates to the page size, as in 2^N. Your example is different - maybe you could have used 0x9e90 vs 0xface in the physical example? Actually for a 4K page size that’s just the 0xe90 part (12 bits). #JustThinkingOutLoud
Everyone in security: You should never use gets.
People from other languages: So they're going to fix it, or deprecate it, right?
Security people: :)
Other people: .... RIGHT?!?!?
A failed authentication causes a program crash. If an attack crashes your program 9 times out of 10, the likelihood that the vulnerability is detected and fixed before any real damage is done is extremely high.
I wonder if 128-bit machines are going to come sooner than we think. If mitigations like signed pointers become a thing, then the more bits in addresses etc., the better. The more bits for encryption, the better, right? Or I guess, the better the algorithm, since EC needs fewer bits than RSA, right? Like 256 bits upwards?
Unlike normal encryption, pointer encryption is not the first line of defense. A single wrong guess crashes the program, so any attack that doesn't always work will be detected very quickly and the vulnerability in the code can be fixed. This means that the encryption doesn't have to be very strong. The downside of using more bits is of course that at the very least one needs more silicon area on the CPU chip. In addition larger hashes take longer so there can be a performance penalty. As long as the hash is quick enough that doesn't matter because memory access is slow compared to other CPU tasks, but if the time rises above a certain amount of clock cycles there will be an impact. (Kind of like putting an extra object into a box that still has space is no issue, but if the extra object becomes too big you need to get a bigger box)
I wonder how this handles code where the programmer, for example, has an array of some struct referenced by a char pointer and adds 3*sizeof(the_struct) to some position in the array to get the bytes of the 4th struct instance after the current pointer.
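Here's that arithmetic as a plain C sketch. As I understand the default pac-ret scheme, only return addresses are signed, so ordinary data-pointer arithmetic like this is unaffected (that caveat is my reading, not something from the video):

    #include <stdio.h>
    #include <string.h>

    struct rec { int id; char name[12]; };

    int main(void) {
        struct rec table[8];
        memset(table, 0, sizeof table);
        table[4].id = 99;

        char *base = (char *)&table[1];  /* byte pointer into the array */
        struct rec *p = (struct rec *)(base + 3 * sizeof(struct rec)); /* table[4] */
        printf("%d\n", p->id);           /* prints 99 */
        return 0;
    }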
I'm curious about how that signing works. Maybe it works against overwriting pointers using byte-by-byte or number-over-pointer writes. But I doubt it can save us from situations where code already writes pointers based on input. It sounds more like randomization of memory layout, but with more predictable behavior.
Firstly, as someone has said, I wonder what kind of impact this has on code execution. Because this is entirely done in hardware, from what I've read, I think the performance might not be that bad. However, I also KNOW that this kind of pointer authentication has already been successfully bypassed. Moreover, if you can modify the return address, there's a high likelihood that someone can modify retaa so that it returns without authentication.
About your last point: retaa is an instruction located in the part of memory where the program is stored which is usually write-protected. While the actual return address is simply a value on the stack. If an attacker manages to remove write protection on the program they already have arbitrary code execution and don't need to mess with returns anymore.
The buffer is stored on the stack, the same part of memory where the return address is located. However the instructions are stored in a different (usually write-protected) part of the memory so they can't be changed by this kind of attacks.
Maybe I'm missing something, but what prevents an attacker from exploiting something else to get their pointer signed? The clang API has to result in actual machine code; why can't they execute the same code? I suppose that if currently, the most common path to exploitation is rop gadgets, then you can't start the execution of those via retaa. But I'll be surprised if no other path exists.
@bryankadzban1159 If an attacker gets the ability to execute code, this becomes an irrelevant measure. However, this is supposed to prevent an attacker from gaining execution in the first place. If the way an attacker wants to get execution of their code is to change a pointer to anything else, this prevents it. The attacker hasn't yet gained execution to ask the system to sign the pointer; they just have the ability to write an essentially infinite amount of data that they want to use to manipulate the way execution happens. They can't pre-sign the pointer, due to the key necessary for signing being part of the system. So in the end, if implemented correctly, an attacker never gains access.
I don't agree. I feel we need a new executable format entirely, with a new kernel memory layout that is more secure. PE/COFF and ELF both came out around 1992 or 1993. And about 64-bit: aren't our CPUs 48-bit or 57-bit for practical reasons?
Does the program crash if someone tampers with it? Then it would be a quick way to exploit and crash other programs, like anti-malware… If not, 29 bits is not enough, as you could just keep guessing, and within a few minutes you'd likely gain an overflow.
The program crashes. But an attacker can only crash the program that has the vulnerability, not other programs. They could crash this program anyway if they wanted to.
I don't get it. I thought all of this was prevented by using virtual memory? ARM is used for lots of embedded systems, so maybe there is some benefit there where you don't already have segmentation faults.
Promoting a USB key is kind of hilarious when you think of the attack surface they have themselves... and the fact that a compromised system can possibly use them anyway.
How can you disable NX with a simple write? You would need to write to the page tables in kernel memory, which won't work unless you already exploited some other bug…
Modern CPUs and their NPUs allow direct memory access, so they seem like a perfect attack vector for privileged hardware (inside the CPU) that can run code easily... like from a website doing some NN inference locally via WebNN or something.
Memprotect is broken and will be fixed… With all pages handling user input permanently marked as non-executable, combined with a memory-safe language, there is much security to be gained. Hashing pointers needs hardware AND compiler support; it's going to take time.
So the MAC of the pointer is stored in a memory address, hmmm. Let's just compute our own MAC for our malicious pointer and store it in that memory address. Problem solved, hack continues as normal. Just a speedbump.
Wouldn't it be simpler to just create an instruction for assigning pointer ranges (and an instruction for removing those ranges)? The CPU could store the addresses in a dedicated cache, or dedicate part of the existing cache to them. It already has something for page ranges, so why not something for pointers within those pages?

Personally I would like dynamic memory from malloc/new/etc. to only be accessible via dedicated functions like memcpy or something. For example, if you create a list with malloc, the memory given should not be directly addressable; instead, to get any element of the list you'd have to use memcpy to copy it into a stack variable.

I'm currently designing an arena allocator with just such a property: it doesn't hand out addresses, but offsets from the base address. My malloc replacement, made for testing purposes, just adds the base address back, because I can tell the arena on creation AND on growth that I don't want it to move. At the moment it uses actual memory, since I need to iron out bugs, but later, once I've got it working, I'll convert it to a variant that uses something similar to /proc/self/mem. If I could make access to the file/pages require a key, then I could protect them from both externally controlled reads and writes. The problem would be how debuggers could work on it then 🤔
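A minimal sketch of the offset-handle idea described above (all names are hypothetical; a real design would add growth and bounds checks):

    #include <stdlib.h>
    #include <string.h>
    #include <assert.h>

    typedef struct { char *base; size_t used, cap; } arena;
    typedef size_t handle; /* an offset into the arena, not an address */

    static handle arena_alloc(arena *a, size_t n) {
        assert(a->used + n <= a->cap);
        handle h = a->used;
        a->used += n;
        return h;
    }

    /* the only way to touch arena memory: copy in and out by handle */
    static void arena_write(arena *a, handle h, const void *src, size_t n) {
        memcpy(a->base + h, src, n);
    }
    static void arena_read(arena *a, handle h, void *dst, size_t n) {
        memcpy(dst, a->base + h, n);
    }

    int main(void) {
        arena a = { malloc(1024), 0, 1024 };
        handle h = arena_alloc(&a, sizeof(int));
        int in = 7, out = 0;
        arena_write(&a, h, &in, sizeof in);
        arena_read(&a, h, &out, sizeof out);
        free(a.base);
        return out == 7 ? 0 : 1;
    }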
It's called CHERI and it's still being developed. On 64-bit it extends pointers to 128 bits and encodes some properties (length and allowed access) into the pointer itself. Unfortunately there's a ton of poorly written shovelware that assumes "void *" is the same as "unsigned long", and lots of code that needs to be recompiled.
29 bits isn't exactly a "strong" MAC. It's better than nothing, I suppose. MACs are awesome though. Stick one on the end of a public-facing ID and you can reject requests with inaccurate/mistyped/brute-forced information without ever hitting your database, since a forged or mistyped ID fails the MAC check 99.99999% of the time. Can be applied to license codes, API keys, or any sort of ID that you distribute to another party or system.
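A toy sketch of that pattern (the keyed hash below is a stand-in and NOT cryptographically secure; a real system would use something like a truncated HMAC-SHA256):

    #include <stdio.h>
    #include <stdint.h>

    /* stand-in keyed hash: FNV-1a over secret||id, truncated to 32 bits */
    static uint32_t toy_mac(const char *secret, const char *id) {
        uint64_t h = 1469598103934665603ULL;
        for (const char *p = secret; *p; p++) { h ^= (uint8_t)*p; h *= 1099511628211ULL; }
        for (const char *p = id; *p; p++)     { h ^= (uint8_t)*p; h *= 1099511628211ULL; }
        return (uint32_t)h;
    }

    int main(void) {
        char token[64];
        /* issue: append the tag to the public ID */
        snprintf(token, sizeof token, "user1234-%08x", toy_mac("server-key", "user1234"));
        printf("issued: %s\n", token);
        /* on each request: recompute the tag and reject mismatches
           before ever touching the database */
        return 0;
    }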
grab a yubikey at yubi.co/lowlevellearning-2024 and secure yourself online with two-factor authentication. thanks again Yubico for sponsoring today's video!
Don't forget to tell people to NOT use honey on that nice little affiliate link lmao. Liked the vid!
Isn't the Yubikey 5 series vulnerable to side channel attacks? 😄
ninjalab.io/wp-content/uploads/2024/09/20240903_eucleak.pdf
Isn't the Yubikey 5 series vulnerable to side-channel attack? (Ninjalab's EUCLEAK).
Edit: I see the firmware version is 5.7, where the vulnerability has been patched.
@@StephenKingston To my knowledge that requires physical access. You likely already know this, but for others reading my comment who are unaware: a system that is in use, and therefore at least partially decrypted (but likely effectively unencrypted), is basically hacked if a malicious actor gains physical access. There is very little you can do. So basically, in a situation where you could physically grab the Yubikey and take it, you could also digitally copy it.
What would I need to do if I lost a Yubikey? What if I didn't know that I had lost it?
So we are going to essentially compute hashes on addresses every single time we want to use said pointer? I wonder what kind of impact on speed it could have?
Probably a very small impact: each memory access already involves things like virtual-to-physical address translation and caching, so they can probably squeeze hashing in there without increasing latency.
@@nowave7 If it's built into the cpu instruction set, theoretically it could be done with little or no performance impact.
I think you can already try it out on ARM devices (or Apple Macs) with pointer authentication
Shouldn’t it have none? I only understand the theory and haven’t implemented this yet, but isn’t this akin to putting a primitive data type onto a pointer?
Good point. Profile it and check, I'd love to know the results.
Security is always a tradeoff, so you have to think if it's worth it or not.
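If you want to try it out and profile it, a minimal experiment (assuming an AArch64 toolchain; the -mbranch-protection=pac-ret flag exists in both GCC and Clang):

    /* pac_demo.c -- on AArch64, compile with:
         clang -O1 -mbranch-protection=pac-ret -S pac_demo.c -o -
       and look for paciasp plus retaa (or autiasp+ret, depending on
       the target) in the prologue/epilogue of this function */
    #include <stdio.h>

    void greet(const char *name) {
        char buf[32];
        /* a non-leaf function with a stack buffer, so the compiler
           protects the saved return address */
        snprintf(buf, sizeof buf, "hi %s", name);
        puts(buf);
    }

    int main(void) { greet("world"); return 0; }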
And there I was thinking, after reading the title, using "signed" pointers meant to allow them to have negative values, and could not think how that would be feasible.
This makes more sense.
me too ! signed for signature
there is no such thing as a signed pointer..... it is a binary number, a pointer is just a collection of bits..., you could designate any bit a stupid name...
like "sign bit", or "hacker defeat bit", but it means FA, because it is only down to the WAY YOU interpret those bits that gives them intrinsic functionality.
you could take that "pointer" or signed pointer, then put it into a box that says ASCII characters in 8-bit groups. does that now mean the pointer has magically transitioned into ASCII?
nope, it means you just redefined how you want it interpreted.
now let's say you wrote a shitty ASCII interpretation routine that only assumes 6 bits are passed and the top 2 are always zero bits.... and that "magic hacker proof pointer" potentially just became your worst nightmare
Lol, lmao even
Actually, since existing x86-64 implementations don't implement a full 64 address bits and the architectural specification requires that all unimplemented bits have the same value for a pointer to be valid, x86-64 pointers are effectively signed in the positive/negative sense. If you consider them unsigned, the set of valid pointers consists of two separated subsets at opposite ends of the address space. If you consider them to be signed, it's all one set in the middle of the address space (immediately on either side of zero).
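To illustrate, a quick canonical-form check for the common 48-implemented-bit case (a sketch; the exact bit count varies by CPU):

    #include <stdint.h>
    #include <stdbool.h>

    /* a 48-bit-VA pointer is canonical iff bits 63..47 are all equal,
       i.e. the value sign-extends from bit 47 (assumes arithmetic
       right shift, true on mainstream compilers) */
    static bool is_canonical(uint64_t p) {
        int64_t s = (int64_t)(p << 16) >> 16; /* sign-extend from bit 47 */
        return (uint64_t)s == p;
    }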
lol, same.
At this point we should disallow everyone who is not a level 3 magician access to a c compiler
Ken Thompson hack. There goes the rabbit hole.
Good old IBM Days
No, but maybe you want to limit the number of people that write software that gets distributed, especially OS stuff. You could put resources into checking the software. There is a lot of private and educational use of software.
Or maybe we should help people become a level 3 magician instead of scaring them away from foundations. In the long term this seems more sustainable
Ah yes, so then C just dies with the wizards, because novices are not allowed to learn it with practical knowledge.
You got sponsored by Yubico ? 😳😳 bro they don’t even put out ads, You basically bagged the most legit sponsor any YTber has ever had.
The audio in this vid is ever so slightly out of sync, just FYI
I was so distracted by this, not because it bothers me, but because I was trying to figure out if I'm just stupid or if it's out of sync. I felt something wasn't quite right, but I was too stupid to actually figure out if it was.
@@robshaw2639 I just noticed, god damn
Thank God I haven't noticed it yet with my headache.
It's 60fps and my phone can't handle it. I wish YouTubers would stop the trend of posting 60fps at phone resolutions.
@@dennisestenson7820 are you sure? Because I watched it on a computer. I don't think it has anything to do with what you're saying. I think it's just like 50ms out of sync.
Thanks, I will talk about this in my next Bumble date
I think you meant Grindr.
FurryMate
will wait for screenshot
6:30 - So ARM is limiting their virtual address space per program? 29 bits of a 64-bit address space being reserved seems like it would mean you can use up to just over 34 billion addresses. While that may seem like a lot for most programs, I do wonder if it is kinda a "640k is all you'll need" sort of situation.
A simple solution is to disallow modifying pointers altogether. On the AS/400, for example, *only* the platform firmware can hand out pointers. You can't create pointers out of thin air, and you can't modify them. Once you perform arithmetic on a pointer, it's no longer a pointer but just a number. In addition, pointers are typed, and they are tagged with an owner and with access restrictions.
Another nice feature of the AS/400 is that only the OS can compile to native code. You cannot write native code, and you can't compile to native code. You can only compile to intermediate code, the OS then compiles to native code. And the intermediate code is "safe", i.e., it contains no instructions that could violate memory safety, pointer safety, or type safety.
This idea also exists in Microsoft Research's Singularity OS. All code is delivered as high-level byte code, together with a manifest that contains a list of all the privileged actions the code needs to perform. The compiler will first prove that the code only uses the privileges listed in the manifest, otherwise it will reject the code. Only then will it compile it to native code. And just like OS/400, the compiler is a privileged OS service that cannot be invoked by a user.
Singularity uses what they call SIPs (Software-Isolated Processes) for process isolation. In fact, all of Singularity runs in Ring 0 in a single address space. Why? Because the language itself already guarantees stronger isolation properties than the CPU and MMU can anyway, so why bother? Each SIP has its own object space with its own garbage collector. There are no instructions in the byte code which can access memory or manipulate pointers. All data exchange between SIPs is through message-passing. Messages are only ever owned by one SIP: before sending, they are exclusively owned by the sender, after sending, they are exclusively owned by the receiver - there is no shared memory, ever.
SIPs define their messaging protocols statically, which allows the compiler to generate code that actually operates on a shared memory region. So, you effectively get the safety and semantics of message-passing shared-nothing concurrency with fully isolated processes, but with the performance characteristics of shared-mutable-memory concurrency using lightweight threads.
Sadly, because of backwards-compatibility requirements with C, approaches like this never see the light of day.
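A toy C sketch of the single-owner message idea described above (purely illustrative: Singularity enforced this statically in the language, not at runtime like here):

    #include <stdlib.h>
    #include <assert.h>

    typedef struct { int payload; } msg;

    /* "sending" transfers ownership: the sender's handle is nulled,
       so only one side can ever touch the message */
    static msg *send_msg(msg **owner) {
        msg *m = *owner;
        *owner = NULL;
        return m;
    }

    int main(void) {
        msg *mine = malloc(sizeof *mine);
        mine->payload = 5;
        msg *theirs = send_msg(&mine);
        assert(mine == NULL); /* sender has lost access */
        free(theirs);
        return 0;
    }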
So according to this bullshit... it's not possible to "hack" MS programs, which we all know is nonsense... RING 0 has been hacked so many times it's now more open than a clown's pocket.
Plus your fantasy makes ABSOLUTELY NO correct assumptions about "glitching"
How do arrays work if you're not allowed to do pointer arithmetic?
This is fascinating - have you got a recommended article I can google to read up on this? Or if you’ve written a blog article or something on it perhaps?
The concept (although Microsoft) is very interesting- taking the lowest level of computing and just whacking the OS right on top of it sounds like (for most these days) an absolute nightmare.
If what you’ve stated is anywhere near accurate then it’s an interesting approach and I want to understand it and the caveats that come with it (some of which you’ve listed already)
@@TheBleggh As I understand it, indexing an array requires a system call. This call will check that the offset is valid. The way the architecture works, this isn't as slow as it sounds. You don't have address spaces or context switching any more, iirc.
It's possible to do this and be compatible with C. The CHERI project out of Cambridge and ARM Morello both accomplish this.
As for performance, your CPU is already doing this check when it does virtual address translation. If the objects are finely grained enough, it's basically free.
I worked on the z/OS kernel for many years; this problem was solved by verifying pointers, making sure something in the operating system points to them. No storage is copied into directly.
Do you think it's still in use today or has it been replaced with hardware pointer verification?
@@YolandaPlayne It is still in use today. That code was written in the early 80s.
@@thecodemachine I've worked with compilers that use it, compiling code from the 80's. Lots of hidden undocumented features in the compiler made modifying the code error-prone, with no way to figure out why other than trial and error.
The video is more about memory safety within a single execution context, not really about kernel/user memory space isolation
The boundaries aren't really clear with modern processors. There are multiple contexts that could be considered kernel and user space, each with their own security features and simultaneous threads.
I’m using this channel to understand spoken English, but idk why nobody in my native language (Spanish) is talking about these features. That’s cool man, you really know a lot of this stuff.
I appreciate your content buddy
You can even put a Hash in it that has an expiration time, so dangling pointers don't stick around for days and days and days.
how are you gonna define that time?
and where are you gonna get it...
ever had your motherboard battery run out?
@@stevesteve8098 It only has to be valid for the length of a kernel function call. It is not unreasonable to require kernel functions to re-get memory if a bad return code is returned.
So a stack alloc
Not as cool as my 42GB-consuming jvm tho.
I once found a weakness in this where you can use tail call optimisation and exfiltration and longjmp (if you can line all those up) as a pointer signing widget. Tail call optimisation causes your choice of return address to be rewritten with a predictable corruption of a correct signature before being passed to the next function, but this doesn't cause a fault, yet. If you can then exfiltrate that address then you can fix it yourself to get your pointer with a correct signature. After that you need to avoid the exception, which you can do via longjmp(). Next time around use the pointer you exfiltrated last time, with the correction, and return normally rather than via longjmp() to the destination of your choice.
Does anyone else ever fantasize about what we could do with our computers if half of our development time and half of the computational energy didn't have to be dedicated to security?
I fantasize about the day when base64 becomes obsolete. Also hex, but that would require bytes to be 4bits wide
No. What I fantasize about is what we could do with our computers if "hello world" would compile to less than ten gigabytes (with 29345 files in tow, all irreplaceably ESSENTIAL!) these days. And yes, this is hyperbole, but it unfortunately holds true for ANYTHING more complex than "hello world".
Simple fix. Switch to temple OS! No security, the user is king! But ya, we also have inbuilt protected memory addresses and stuff for operating systems.
@@AttilaAsztalos roll everything yourself. I built my own graphics framework to replace SDL2 in my FOSS projects. One of my builds dropped from 8 MB to 2 MB. I replaced GLM (which is practically an industry standard for graphics programming) with a custom linear algebra library and gained a 30x FPS boost in my UI. Granted, I don't need several thousand FPS for something that isn't a game engine, but it still amazes me how inefficient GLM was considering how widely used it is and how easy it was to reimplement.
@@vrclckd-zz3pv I would if I could. But I'm a chip monkey, not a code monkey, and condemned to remain so (yes I did try). On the other hand, the full-3D full-parametric CAD I use almost daily is a SINGLE FILE, and is, in total, smaller than ten MEGABYTES. It's called "Solvespace", in case you were wondering... Now, maybe, you understand why half-terabyte abominations make me mad.
Random bits on an integer isn't really encryption and doesn't stop anything from looking for pointers with undefined behavior.
sign != encrypt
Pointer tricks like this are terrible. Back when I was porting Linux userland apps to aarch64 we noticed certain apps and libraries abuse the upper bits of 64-bit pointers. The problem was 64-bit ARM was enabling 52-bit addressed memory... Some supercomputer somewhere required that much, but I digress... We were not able to compile things like Mozilla's software because of these clever hacks. These are nice until one day the entire address space is taken by memory. Yes people, even on 64-bit machines not the entire 64 bits are given over to memory, because that's crazy talk... But every 12 years or so, it seems, two more bits go to memory. These kinds of tricks will eventually be gone.
on big endian systems like IBM Power with tagged memory, the tricks do not eat into allocatable memory like that, only one bit per many bytes to hash the page table; in little endian the address is the 'wrong' way around so there's more overhead I think
@foobarf8766 that's not how I remember it working: not the big endian IBM machines with tagged memory, but rather abusing pointers. Most 64-bit architectures simply don't use the full 64 bits; it's 48 bits, 52 bits, etc., so 12 or 16 bits are left over for any number of tricks, such as pointer hashes, but it could be anything. Like I mentioned before, when we raised the bit boundary from 48 to 52, it suddenly broke certain things, because a handful of people had realized they could do pointer tricks. Perhaps this is where IBM Power BE would help, with memory tags? Incidentally, I was also on the team that bootstrapped IBM POWER9 little-endian to Linux, but I don't remember the tag stuff; that's possibly because it simply wasn't broken by things such as this idea of pointer hashes... We didn't have to fix anything in the port.
This makes sense. E2K uses 128-bit descriptor "addresses" for secure mode, 64-bit addresses for normal mode, and 32-bit for x86 compatibility. And E2K has separate 4-bit tags per 64 bits of data/address.
A stack limit register might have been an even better option. Imagine a hidden register that could be modified only by a special, for example, "branch, link and limit" instruction family, to point to the address of the return address value in the stack; any access beyond it that is not by a 'ret' instruction would generate an exception. Should be cheap to implement at the hardware level and transparent to use, but hell to implement in a backwards-compatible way, since having an extra value alongside the return address would definitely be an ABI-breaking change.
sounds like a lot of overhead
I was thinking it would be baked into the compiler, and it is, just not for all the use-cases. It really looks clunky for, let's call them, user-introduced pointers, for lack of a better term.
I was referring to the execution speed but yeah that too. Although I'm sure they'll improve the programmer experience over time..
compute power scaling is much bigger than memory speed, it probably doesn't hurt much + accelerator on chips doing hard work
for my favorite government backdoor
it's just one more assembly instruction; it might be a few cycles slower than the return call that doesn't check, but we are probably wasting way more cycles per call doing other kinds of checks to mitigate this problem. Resource-hungry malware like Denuvo is being installed into hardware to stop these kinds of exploits; if we can do away with any of those, we end up reclaiming wasted overhead.
Mint video
Isn't this similar to what CHERI is doing? I believe CHERI is a bit more exhaustive and/but requires the software to be aware of it
Correction at 5:34: the PML4, PDPT, PD, PT tables are NOT inside the CPU! The PDB inside the CR3 register is the only thing inside the CPU that bootstraps the Virtual2Physical address translation process (apart from the general purpose register containing the virtual memory address). All the tables are inside RAM, except for the TLB cache.
On x86 it's in CR3 (in both 32-bit and 64-bit mode, iirc); on ARM it's in another register (the TTBRs).
Going to be a woot woot when we reach a million! Low Level, you've made a great community!
Once that buffer overflow has been patched, another one will crop up somewhere. That's unfortunately life.
lol, authentication overflow. or maybe someone will write code that is turing complete using only authenticated pointers, like the movfuscator
Cryptographic authentication of pointers... that's, um, interesting. Refusing to make fundamental changes to the architectures we still use leads to all kinds of delirious approaches.
Completely separating return stacks from data stacks would solve 90% of all security issues due to stack corruption. You could still corrupt data, but not make the code go where it shouldn't. And if you add to that proper validation of your inputs everywhere it's at least moderately critical, you solve probably 99% of issues. There will always be this annoying 1% left, but it may be deluded to think we can get to 100% anyway. And, these changes would imply a bit more work for sure, both on CPU design and on software, so we prefer "enhanced status quo" (= do everything as usual, but with new tools).
one thing people often don't know is the absolute minimum requirement to jump to an arbitrary address in x86 (I'm aware of other methods, but this one is very simple): all you need is 6 bytes, equivalent to "push [32-bit address]; ret;" in assembly. this jumps to an arbitrary address, as "ret" is surprisingly more complicated than one may expect
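For the curious, those 6 bytes spelled out (0x68 is the push imm32 opcode and 0xc3 is ret; the address here is a placeholder):

    #include <stdint.h>

    /* push 0xdeadbeef ; ret  -- six bytes total */
    static const uint8_t gadget[6] = {
        0x68,                   /* push imm32 */
        0xef, 0xbe, 0xad, 0xde, /* 0xdeadbeef, little-endian */
        0xc3                    /* ret: pops the pushed value into EIP */
    };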
This has been a 'solved' problem on IBM Power for decades: in big-endian mode there are memory tagging extensions that use a hashed page table. Some other great stuff in Linux, like wipe/fill on free and alloc, works great on Power too. In little endian the overhead becomes heavier, because the check bits go at the 'wrong' end of your address, so you need 'software' solutions, like XOR-ing, or putting the check bits where they eat into allocatable memory. It's good that Arm is bringing this feature back into general purpose processors; it's been a thing on IBM Power for like 30 years.
5:38 Cool Babe Face
😂 I see what you did there
Thanks for the video - It’s a very interesting concept.
My only hot take question is theoretical and perhaps naive since I’m not well versed in security -
Since we expect compiled executables to be portable between machines that are of identical platform and spec, wouldn’t the executable require the decryption keys to be bundled in? And if yes, how would those be protected?
Holy shit... the signing is in Ring 0? That sounds mind-blowing
I'm just learning, but would it be possible to limit the input that the user is able to provide, to prevent overflow before it can happen?
Yeah! That's one of the techniques that can be used to avoid buffer overflows: make sure you only copy as much into the buffer as space it has available.
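For instance, a minimal sketch of the bounded-copy version of the gets example from the video:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char buf[64];
        /* fgets never writes more than sizeof buf bytes, unlike gets */
        if (fgets(buf, sizeof buf, stdin)) {
            buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
            printf("read: %s\n", buf);
        }
        return 0;
    }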
Yes. However, he's just using the obviously-unsafe gets() as an example. Real-life code is much more complicated, and mistakes happen, it's not always that easy. For example many low-level protocols are of the form "2 bytes encoding the length, followed by data" Now if the attacker lies about the length of the data... There are tons of other ways to trick software to read more (or less) than it should.
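The length-prefixed case above, with the check an implementation must not forget (a sketch; the wire format is hypothetical):

    #include <stdint.h>
    #include <string.h>

    /* parse "2-byte big-endian length + data" into out[cap] */
    static int parse_frame(const uint8_t *pkt, size_t pkt_len,
                           uint8_t *out, size_t cap) {
        if (pkt_len < 2) return -1;
        size_t claimed = ((size_t)pkt[0] << 8) | pkt[1];
        /* the attacker controls 'claimed': check it against BOTH what
           was actually received and the destination buffer's size */
        if (claimed > pkt_len - 2 || claimed > cap) return -1;
        memcpy(out, pkt + 2, claimed);
        return (int)claimed;
    }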
You still have the problem that the CPU assumes that you are returning to the same place you came from. A buffer overrun hack abuses the CALL and RET machine code operations. When the CPU executes a function return, it assumes the top 8 bytes of the stack are the correct address to return to and will go there, and doesn't check if the value of those bytes have changed since the last CALL instruction wrote them there.
@@williamdrum9899 very well explained
What's the font you're using tho?
If I remember correctly, the IBM AS/400 and its descendants have done this for decades. Their pointers are extremely long.
Disclaimer: my memory comes from the dim and distant past. Please correct me if I am mistaken.
I have been under the impression that we don't access physical addresses anymore? That raises the question: do raw pointers perform a jump instruction to the address they point to, or do they move the instruction pointer to said address?
I want my pointers to have a tracking id and signature verification... heck throw in the insurance too!
@lowleveltv at what point does the pointer get signed? Is there an opportunity to change the pointer at a stage before it gets signed?
JWT is going to be brought to the whole next level lol
The pointer values are typically constrained to effective 48 bits by forcing the most significant bits to be all zeroes or all ones.
Not pointers, virtual addresses
“Babyface” got me
...Wouldn't shift-arithmetic on the pointer (sig = iptr >> 35) reveal it? And if the signing algo is public knowledge (it _will_ be whether it's intended or not), you could do shenanigans like (iptr = getSig(badptr)
@@WackoMcGoose you would need the secret key to produce a valid tag on attacker-controlled pointer data.
It's stored in the actual value of the pointer; you only ever have access to the virtual memory addresses of your program. You ask to sign 00000000, the value in the return register; the MMU maps that to 00001024, where the first four bits are reserved for things other than addressing. It sets the signature in the bits it doesn't need, but in your program you still read 00000000 thanks to the MMU.
If you don't use the mmu then you probably could, supposing the instruction is even supported in that mode, but it wouldn't make much sense to use real mode and care about this.
Ahh, I wish the title of the video was properly explored.
Naturally, it does a great job explaining what pointer signing is about, but I wish there was a discussion section in the video focusing on adoption and whether it ever makes sense to sign *all* pointers.
Cool topic though!
would a shadow stack achieve something similar?
Yes but there's a performance penalty since it's an optional feature. ARM Trustzone is a parallel system that is mostly invisible to the executing environment.
in the video it looked like it required additional code and generated custom instructions in the asm? it also sounded like it does more work on the chip to sign, verify, and authorize all the pointers? a shadow stack is just a compare of pointers in a separate hardware pointer stack and doesn't require additional code, iirc?
@@nightshade427 Yes it's "just" that but it's not a separate system. It's built into the chip and it's dark silicon unless it's used.
That means it's sometimes using features that are LOTO until a normal thread can use them, so that's the performance penalty, which also precludes side-channel attacks.
In that sense, doesn't it mean the "malicious pointer" still exists somewhere?
So that malicious function can still have a signed pointer?
How does signing the pointer help in that case?
When it comes to these types of vulnerabilities, C/C++ is always mentioned, but I wanted to know: can programs written in Rust, Python, or JavaScript (Node.js/Deno/Bun) be hacked by changing pointers? Or do those languages already use some sort of protection and don't need more than what comes out of the box?
Hey, stupid question that you probably won't see, but somewhat relevant to the video: how much of a security risk is CUDA allowing you to create a memory buffer that is visible to both the GPU and CPU? I'm asking because (from my understanding) it allocates memory and directly gives access to the physical address of the memory pointer, so that both the GPU and CPU know exactly where the data is.
The GPU driver is then supposed to keep this memory range allocated (cudaMallocHost) until you deallocate it (cudaFreeHost); however, I remember a while back a bug where the driver never deallocated the memory even if we deallocated it in the program.
The result was that my GPU benchmark program was able to crash a 128 GB computer with 2 Xeons and 2 top-of-the-line GPUs (at the time) in SLI, split across 2 NUMA banks. It also worked on a standard laptop. The benchmark specifically targeted memory transfers between CPU & GPU, and running it enough times, it was able to reserve most of the memory for itself even after we stopped the process.
I followed the RAII principle, using classes for all of the GPU-related memory acquisition and the standard library for everything else. I even managed to make a template that allocates the correct type of memory for a CPU lambda that you want to send to the GPU, but the issue was still there afterwards.
You can use IOMMU to ensure that the CPU and the GPU see the same addresses. Support is improving!
afaik Microsoft's kernel allocator doesn't actually zero on free, but that's optional on Linux too, you need to turn it on
Is pointer signing enough? It's still possible to do an arbitrary function call:
1. Find an exploitable function.
2. Somehow read signed pointer to that function.
3. Overwrite the stack with a valid frame mimicking call to that function.
4. Overwrite LR to our signed pointer.
5. Return.
I once coded an array of pointers to functions. It was tricky. I'm sure pointer authentication will increase the complexity another notch or two. At least pointer authentication is an option at this time.
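For readers who haven't met the construct, a minimal toy of such a table (not the commenter's code):

```c
#include <stdio.h>

static void cmd_add(void) { puts("add"); }
static void cmd_del(void) { puts("del"); }

int main(void) {
    /* an array of function pointers: exactly the kind of indirect
       call target that pointer authentication protects */
    void (*commands[])(void) = { cmd_add, cmd_del };
    commands[0]();  /* indirect call through the table */
    return 0;
}
```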
I feel like I am missing something. Would it not be possible to "fix" the pointer by overwriting it with the original address before "retaa"?
The whole point is the hacker wants the program to return to an address they specify. If you put the original pointer back then that won't happen. You also can't just put the "signature" (top bits) on top of your own pointer as that will fail the check.
This would require a separate data structure/memory location to store return addresses in. Forth does something similar with its distinction between the data and return stacks.
Say the programmer was holding a function pointer in a stack variable and calling it later. Having a separate stack isn't going to stop an attacker from getting an arbitrary-execute primitive if there's a buffer overflow on the stack. I don't think the extra security it adds outweighs the performance impact it might have.
@@imciviled So I'm not an expert on this at all, but I don't think you've described the mechanism correctly. My understanding is there is no separate pointer storage. The pointers are in the usual places, simply with additional decoration (hashed high bits). This decoration is added on request, and when the function return pointer is popped with the secure version of that instruction, the decorated pointer is verified and an exception is thrown if it doesn't pass. You are right that this will impose performance costs simply due to the extra work needed to compute and verify decorated pointers. There will also likely be cache locality impacts.
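If you want to see the decoration requested explicitly, here is a hedged sketch using Clang's arm64e <ptrauth.h> intrinsics; it assumes a ptrauth-enabled toolchain and isn't necessarily how the video's compiler invocation does it:

```c
#include <ptrauth.h>

typedef void (*handler_t)(void);

handler_t sign_handler(handler_t raw) {
    /* embeds a MAC over the pointer value into its unused top bits,
       using instruction key A and a zero discriminator */
    return ptrauth_sign_unauthenticated(raw, ptrauth_key_asia, 0);
}
```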
1:02 that is not a pointer to data, that is a function pointer!
It's all just bytes
@@Masq_RRade With different access permissions...
@@Masq_RRade Well sure, even an address is data, it's all data, it just depends how you interpret it. :)
@@roeetoledano6242 The page can be read/execute, it can be read/write, doesn't matter. It's all data. It's all just bytes
Offtopic: there is no such thing as 'it's just bytes, you can interpret it however you want' in C, because of the strict aliasing rule; at least not without using functions that implicitly create objects, like memcpy.
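A minimal illustration of that point, with memcpy as the sanctioned way to reinterpret bytes:

```c
#include <stdint.h>
#include <string.h>

uint32_t bits_of(float f) {
    /* casting, as in *(uint32_t *)&f, violates strict aliasing;
       memcpy copies the bytes with well-defined behavior */
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```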
what is your colorscheme in vim? I thought it was kanagawa but yours is more readable than mine. I mean eye friendly. Cool video. take care!!!
Seems like this would interfere with the NaN packing that many dynamically typed languages use to store pointers and other data types inside the diagnostic-information portion of NaN floating point numbers. (Basically turning 64-bit floats that are NaN into enums that store both type information and a value.)
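For context, a rough toy of that trick; the masks assume the common layout with 48 usable pointer bits, and real engines differ:

```c
#include <stdint.h>

#define QNAN_BITS   0x7FF8000000000000ULL  /* exponent all ones + quiet bit */
#define PTR_PAYLOAD 0x0000FFFFFFFFFFFFULL  /* low 48 bits carry the pointer */

/* box a pointer into the payload of a quiet NaN */
uint64_t box_ptr(void *p) {
    return QNAN_BITS | ((uint64_t)(uintptr_t)p & PTR_PAYLOAD);
}

/* recover the pointer from the boxed value */
void *unbox_ptr(uint64_t v) {
    return (void *)(uintptr_t)(v & PTR_PAYLOAD);
}
```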
20 years? We've had something more secure since 1963, and we're now starting to see it in consumer CPUs with CHERI and ARM Morello. Much stronger than simply signing a pointer, capability hardware can actually enforce memory safety (and even capability safety), and when well designed it's even faster than a traditional MMU.
Yep, CHERI is a great approach. As I said above, add to that separating return stacks from data stacks (which I think should have been done long ago) and you eliminate most issues. Separate return stacks are not a new idea either. They tend to make things slightly more complicated, but not outrageously so. I think there is little excuse not to use that these days. And yet...
That's one simple approach! There are a range of protected stack models that have worked well. The technique on the B5500 is inspirational even if it doesn't directly translate. The BiiN/i960 used a simple model that matched RISC register window usage too. @@joseoncrack
Someone explain what I am not understanding. Can this not just be implemented across all architectures when the code for physical and virtual memory is written? Why is this specific to ARM?
While the operating system kernel plays an important role in memory mapping, most of the heavy lifting is done by the memory management unit in the CPU. This is essential for good performance. That means that some features need support in the MMU hardware to be effectively implemented.
When they inject instructions, couldn't they just remove the authentication instructions, since they're changing code paths anyway?
The really secure way is to hold return pointers in registers instead of on the stack. I'm too lazy to search for the source, but I'm pretty sure Intel and Arm are both working on this.
certainly makes sense, but also only helps for the immediate return of the current function. All layers above would still need to store their return pointers on the stack.
Another mitigation is implementing a shadow stack that only holds saved return addresses. Calling a routine pushes both to the actual stack and the shadow stack, returning from a routine pops it from the stack and checks against the shadow stack.
This is already implemented by Intel processors
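A single-threaded toy of the same idea in software; hardware versions such as Intel CET keep the shadow copy where ordinary writes can't reach it:

```c
#include <assert.h>
#include <stddef.h>

static void *shadow[1024];
static size_t shadow_top;

/* on call: save a second copy of the return address */
void shadow_push(void *ret_addr) {
    assert(shadow_top < 1024);
    shadow[shadow_top++] = ret_addr;
}

/* on return: the value popped from the real stack must match */
void shadow_check(void *ret_addr) {
    assert(shadow_top > 0 && shadow[--shadow_top] == ret_addr);
}
```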
5:40 - it’s pretty common for the virtual & physical addresses to share the bottom N bits (i.e. they aren’t remapped), where N relates to the page size, as in 2^N. Your example is different - maybe you could have used 0x9e90 vs 0xface in the physical example? Actually for a 4K page size that’s just the 0xe90 part (12 bits). #JustThinkingOutLoud
didn't the Xbox 360 have something similar, where the cache was used to hold a signature of the memory to detect tampering?
Without watching the video: the MSB will become the sign bit. When the pointer is then passed around, chaos will ensue if the MSB is 1.
Can't believe we're still talking about buffer overflows 25 years into the 21st century
seee? hehe. that was a dumb joke.
As long as your CPU blindly dereferences the stack pointer and jumps there you will have this problem
@@williamdrum9899
we've had 40 years to learn to NOT do that
Reminds me of the IBM 7040 which could set a word without updating the parity bit. It was used to detect uninitialized variables.
Everyone in security: You should never use gets.
People from other languages: So they're going to fix it, or deprecate it, right?
Security people: :)
Other people: .... RIGHT?!?!?
*_0xc00lbabeface_* you got me xD
16 bits is very small for a MAC, what's to stop the attacker from just brute forcing it?
A failed authentication causes a program crash, and a blind guess at a 16-bit tag succeeds only once in 65,536 tries. If an attack crashes your program essentially every time it runs, the likelihood that the vulnerability is detected and fixed before any real damage is done is extremely high.
btw what neovim setup are you using? the colorscheme looks great
I wonder if 128-bit machines are going to come sooner than we think. If mitigations like signed pointers become a thing, then the more bits in addresses etc., the better. The more bits for encryption, the better, right? Or I guess the better the algorithm, since e.g. EC uses fewer bits than RSA, right? Like 256 bits upwards?
I heard that 128-bit is very inefficient, which is why they don't exist today
Unlike normal encryption, pointer encryption is not the first line of defense. A single wrong guess crashes the program, so any attack that doesn't always work will be detected very quickly and the vulnerability in the code can be fixed. This means that the encryption doesn't have to be very strong.
The downside of using more bits is of course that at the very least one needs more silicon area on the CPU chip. In addition larger hashes take longer so there can be a performance penalty. As long as the hash is quick enough that doesn't matter because memory access is slow compared to other CPU tasks, but if the time rises above a certain amount of clock cycles there will be an impact. (Kind of like putting an extra object into a box that still has space is no issue, but if the extra object becomes too big you need to get a bigger box)
I wonder how this handles code where the programmer, e.g., has an array of some struct referenced by a char pointer and adds 3*sizeof(the_struct) to some position in the array to get the bytes of the 4th struct instance after the current pointer.
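The pattern being described, roughly (the struct and function names are made up):

```c
#include <stddef.h>

struct record { int id; double value; };

/* step a byte pointer over three whole records to reach the
   fourth one after the current position */
struct record *fourth_after(char *cursor) {
    return (struct record *)(cursor + 3 * sizeof(struct record));
}
```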
I'm curious about how that signing works. Maybe it works against overwriting pointers byte-by-byte or writing a number over a pointer. But I doubt it can save us in situations where code already writes pointers based on input.
It sounds more like randomization of the memory layout, but with more predictable behavior.
ARM already has a similar spec called "memory tagging extension"; what is the advantage of this approach vs MTE?
Lmao time to create a "memory traversal hashing algorithm"
Dig down to every memory address and hash them to check
yeyeh, yubikey is awesome, but the services that don't allow you to register a second device as backup just suck.
Firstly, as someone else has said, I wonder what kind of impact this has on code execution. Since this is entirely done in hardware, from what I've read, I think the performance might not be that bad.
However, I also KNOW that this kind of pointer authentication has already been successfully bypassed.
Moreover, if you can modify the return address, there's a high likelihood that someone can modify retaa so that it returns without authentication.
About your last point: retaa is an instruction located in the part of memory where the program code is stored, which is usually write-protected, while the actual return address is simply a value on the stack. If an attacker manages to remove write protection on the program, they already have arbitrary code execution and don't need to mess with returns anymore.
@entcraft44 That's a good point about execute versus write. Although, I do this all the time on the x86 architecture with WriteProcessMemory().
The Intel iAPX432 from 1981 and Capability based memory protection from the 1970s has entered the chat
What would stop the hacker just overflowing the buffer to replace `retaa` with `ret`?
The buffer is stored on the stack, the same part of memory where the return address is located. The instructions, however, are stored in a different (usually write-protected) part of memory, so they can't be changed by this kind of attack.
Maybe I'm missing something, but what prevents an attacker from exploiting something else to get their pointer signed? The clang API has to result in actual machine code; why can't they execute the same code?
I suppose that if the most common path to exploitation currently is ROP gadgets, then you can't start executing those via retaa. But I'll be surprised if no other path exists.
@bryankadzban1159 If an attacker gets the ability to execute code, this becomes an irrelevant measure.
However, this is supposed to prevent an attacker from gaining execution in the first place.
If the way an attacker wants to get their code executed is to change a pointer to something else, this prevents it.
The attacker hasn't yet gained execution, so they can't ask the system to sign the pointer; they just have the ability to write an essentially unlimited amount of data that they want to use to manipulate the way execution happens.
They can't pre-sign the pointer, because the key necessary for signing is part of the system. So in the end, if implemented correctly, an attacker never gains access.
when i was younger I thought for a while that signed integers have a signature (:
I don't agree. I feel we need a new executable format entirely with a new kernel memory layout that is more secure.
PE/COFF and ELF both came out around 1992 or 1993.
about 64-bit: aren't our CPUs' address spaces effectively 48-bit or 57-bit for practical reasons?
Isn't this the same exact mechanism targeted by the PACMAN exploit for M1 macs? Did ARM fix the vulnerability?
Does the program crash if someone tampers with it? Then it would be a quick way for exploits to crash other programs, like anti-malware…
If not, 29 bits is not enough, as you could just keep guessing, and within a few minutes you'd likely gain an overflow
The program crashes. But an attacker can only crash the program that has the vulnerability, not other programs. They could crash this program anyway if they wanted to.
Signed pointers seem like a great idea, if only you didn't have to do it yourself in the source code. A compiler extension would be better.
I don't get it. I thought all of this was prevented by using virtual memory? ARM is used for lots of embedded systems, so maybe there is some benefit there where you don't already have segmentation faults.
The promotion of a USB key is kind of hilarious when you think of the attack surface they have themselves... and the fact that a compromised system can possibly use them anyway.
Who controls the key?
I need all my gigacycles.
MTE (memory tagging extension) and memseal (make memory essentially unmodifiable) are better approaches to exploit protection in my opinion
Which theme is he using in vim?
See MIT's PACMAN attack paper from 2022.
How can you disable NX with a simple write? You would need to write to the page tables in kernel memory, which won't work unless you already exploited some other bug…
Remapping the stack using mprotect
@@rodneynsubuga6275 please explain how mprotect on the user stack gives access to the kernel page tables
modern CPUs and their NPUs allow direct memory access, so they seem like a perfect attack vector: privileged hardware (inside the CPU) that can run code easily... like from a website doing some NN inference locally via WebNN or something.
mprotect is broken and will be fixed… With all pages handling user input permanently marked as non-executable, plus a memory-safe language, there is much security to be gained. Hashing pointers needs hardware AND compiler support; it's going to take time.
So the MAC of the pointer is stored at a memory address, hmmm. Let's just compute our own MAC for our malicious pointer and store it at that address. Problem solved, hack continues as normal. Just a speed bump.
stdio means standard input/output
Rebrand of a rebrand ?
Wouldn't it be simpler to just create an instruction for assigning pointer ranges (and an instruction for removing those ranges)? The CPU could store the addresses in a dedicated cache, or dedicate part of the existing cache to them. It already has something for page ranges, so why not something for pointers within those pages? Personally I would like dynamic memory from malloc/new/etc. to only be accessible via dedicated functions like memcpy or something. For example, if you create a list with malloc, the memory given should not be directly addressable; instead, to get any element in the list you'd have to use memcpy to copy it into a stack variable.
I'm currently designing an arena allocator with just such a property: it doesn't hand out addresses but offsets from the base address. My malloc replacement, made for testing purposes, just adds the base address back, because I can tell the arena on creation AND on growth that I don't want it to move. At the moment it uses actual memory, since I need to iron out bugs, but later, once I've got it working, I'll convert it to a variant that uses something similar to /proc/self/mem. If I could make access to the file/pages require a key, then I could protect them both from externally controlled reads/writes. The problem would be how debuggers would be able to work on it then 🤔
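A stripped-down sketch of such an offset-handing arena; all names are illustrative, not the commenter's actual code:

```c
#include <stddef.h>

typedef struct {
    char  *base;   /* fixed backing memory */
    size_t used;
    size_t cap;
} arena_t;

/* hand out an offset from base instead of a raw pointer */
size_t arena_alloc(arena_t *a, size_t n) {
    if (a->used + n > a->cap) return (size_t)-1;  /* out of space */
    size_t off = a->used;
    a->used += n;
    return off;
}

/* resolving an offset is the only way to touch the memory */
void *arena_at(arena_t *a, size_t off) {
    return a->base + off;
}
```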
It's called CHERI and it's still being developed. On 64-bit it extends pointers to 128 bits and encodes some properties (length and allowed access) into the pointer itself. Unfortunately there's a ton of poorly written shovelware that assumes "void *" is the same as "unsigned long", and lots of code that needs to be recompiled.
Am I a traitor if I turned my back on Rust in favor of C++? I've heard it's going to get compiler features similar to a borrow checker.
That's the drawback of going from CISC to RISC. This could easily be handled hardwired or in microcode. Now we have to deal with all this shit.
Okay, but did you see the Reddit post of the guy that bypassed his Yubikey with a paperclip and a piece of aluminum foil?
Interestingly, I first heard about this at university, in a security course. Didn't really go much deeper than this, tbf.
What if RET wasn't effectively "JMP [RSP]"?
so many reasons to hate on C, and all valid. I always thought in the early '80s that compiled BASIC should have been better backed.
Very cool information.
excellent video.
29 bits isn't exactly a "strong" MAC. It's better than nothing, I suppose.
MACs are awesome though. Stick 'em on the end of a public-facing ID and you can reject requests with inaccurate/mistyped/brute-forced information 99.99999% of the time without needing to check your database. Can be applied to license codes, API keys, or any sort of ID that you distribute to another party or system
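As a hedged sketch of that, using OpenSSL's HMAC() with a truncated tag, much like the pointer scheme's short MAC (tag_valid is an illustrative name):

```c
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <string.h>

/* verify the short tag appended to a public-facing ID; real code
   should use a constant-time compare such as CRYPTO_memcmp */
int tag_valid(const char *id, const unsigned char *tag, size_t tag_len,
              const unsigned char *key, int key_len) {
    unsigned char full[EVP_MAX_MD_SIZE];
    unsigned int full_len = 0;
    HMAC(EVP_sha256(), key, key_len,
         (const unsigned char *)id, strlen(id), full, &full_len);
    return tag_len <= full_len && memcmp(full, tag, tag_len) == 0;
}
```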