Note that this does not apply to all architectures but is specific to x86 (and maybe some others). For example, on Arm, calls (the "branch and link" instructions bl/blr) put the return address into a register (x30, the link register). Thus functions must push the return address onto the stack themselves before calling another function (leaf functions that call nothing else can skip this, since the register holding the return address is never overwritten). The return on Arm is just a normal branch to a register, but encoded in a different way such that the CPU knows it is a return, not just an ordinary indirect jump (this enables microarchitectural optimizations like a return stack buffer to speed up execution). I really liked the video though :) It would be nice if you could talk about out-of-order CPUs at some point, since understanding (at least roughly) how modern CPUs work prevents some optimization pitfalls one may fall into when thinking only about a "traditional" von Neumann computer. And it is the gateway drug to the world of microarchitectural optimizations, side channels, and transient execution attacks :D
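Rough sketch of what that looks like on AArch64 (simplified; function names made up, standard AAPCS64 assumed):
my_func:                        // non-leaf: must save the return address itself
    stp x29, x30, [sp, #-16]!   // push frame pointer and link register
    bl  helper                  // bl overwrites x30 with the return address
    ldp x29, x30, [sp], #16     // restore them
    ret                         // branch to x30, hinted to the CPU as a return
leaf_func:
    ret                         // a leaf never clobbered x30, nothing to save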
In MIPS... there's a jump and link (jal), which saves the address of the following instruction into register 31 ($ra)... and it's the responsibility of the subprogram to take care of it (pushing it to the stack if it needs to call another function and popping it back). By convention, registers 4-7 ($a0 to $a3, inclusive) are used for arguments, registers 8-15 ($t0-$t7) are temporaries and may be modified by inner function calls, while registers 16-23 ($s0-$s7) are to be preserved at the return (if you really need to use them, you should push their values... and pop them back when you are done). Finally, registers 2 and 3 ($v0 and $v1) are the convention for the return values.
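So a non-leaf MIPS function ends up doing something like this (o32-style sketch, delay slots ignored for clarity, helper is a made-up callee):
func:
    addiu $sp, $sp, -4
    sw    $ra, 0($sp)      # save return address; the jal below would clobber it
    jal   helper
    lw    $ra, 0($sp)
    addiu $sp, $sp, 4
    jr    $ra              # jump back through the restored return address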
I took a course in IBM 360 (370?) assembly in 1982 and my second favorite instruction was BAL - branch and link. My favorite was ZAP - zero and add packed (packed decimal), because, well, ZAP.
@@matheusjahnke8643 That convention is the o32 ABI; there's also the n32/n64 ABI, which defines a slightly different usage for the registers:
Arguments are now $4-$11 ($a0-$a7)
Temporaries are $12-$15 ($t4-$t7)
The callee-saved registers are the same, $16-$23 ($s0-$s7), plus the others ($gp, $sp, $fp)
And the return registers are also the same, $2 and $3 ($v0 and $v1)
The Arm calling convention expects the callee to preserve registers r4 to r12. Compiler people know this, so they do the extra work of pushing and popping those registers in the callee if it needs extra registers. r0 is usually used for the return value, and the first few registers (r0-r3) are used for passing parameters.
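Something like this (32-bit Arm sketch; my_func and helper are invented names):
my_func:
    push {r4, lr}      @ save a callee-saved register plus the return address
    mov  r4, r0        @ keep the first argument across the call
    bl   helper        @ bl clobbers lr
    add  r0, r0, r4    @ result goes back in r0
    pop  {r4, pc}      @ restore r4 and return by popping straight into pc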
Gotta admit that Assembly was very challenging for me, I think the most challenging task in my life, because it feels so alien. However, I'm very slowly starting to make sense of it. This is a testament to perseverance. No matter how difficult you think something is, continue. Take your time, try multiple times, look for different sources, whatever, but don't quit, ever. You'll get there, and after some time you'll look back and realise how much you've learned. Thank you for this awesome video and for helping me come a little bit closer to really understanding assembly.
Assembly is very easy if you start thinking about it in terms of physical circuits and the mosfet transistors and all their electric fields and depletion regions. Basically remember that it's all physics in the end, and assembly occupies some intermediate step.
It’s a lot easier to comprehend when you learn it alongside computer architecture and organization. That's why a lot of colleges combine the two classes nowadays; it's usually called "Computer Organization and Assembly" or something along those lines.
@@fackyoutube8452 very true, I remember going through the whole organization of the basic 8086 arch, the registers and their uses, etc., and then it made much more sense what each instruction was doing, where it was storing stuff, and how it was storing and updating things.
Basic understanding of assembly is important when using any language that compiles directly to assembly. A lot of programmers, even good ones, don't have a clue as to how some of this stuff works. My basic recommendation is to learn to read some assembly by taking a peek at the generated output of a simple program. Look up each instruction one at a time and work out what is happening. It is amazing to me how many programmers don't know some of this stuff.
I learned a lot of assembly by doing exactly this, but then going in, cleaning it up, adding comments, and gutting half of it, because gcc can generate some really inefficient assembly. Seriously, I've seen gcc do this:
mov %eax, %ebx
mov %ebx, %eax
multiple times. Still haven't figured out what possible purpose that could have.
A few things to add. Floating point values on x86-64 are returned in the first scalar slot of the xmm0 register. In general, the way data is transferred between caller and callee is defined by what are called "calling conventions". There are a bunch, and they are determined by the hardware architecture and the operating system, among other things. Fascinating stuff really.
Calling conventions are something you have to keep in mind when you are calling a function from a different programming language. It is quite fascinating to delve into. Another term worth looking into is ABI, or Application Binary Interface.
If the language is higher level, there's likely a calling convention layered on top of the architecture's convention, especially if it's a multiplatform higher level language.
The calling convention (stdcall, fastcall, cdecl) also controls how arguments are passed to functions and who's responsible for cleaning up the stack after returning
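On 32-bit Windows you can see that difference right in the prototypes (MSVC-style sketch; function names are made up):
int __cdecl   add_c(int a, int b);   // caller removes the arguments after the call
int __stdcall add_s(int a, int b);   // callee removes them itself (via ret 8)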
Do a video about stack unwinding on ARM next. I had a FreeRTOS fw that was running on ARM, and would dump the contents of the stack into the UART. We used this to reproduce the call stack. It was an interesting task and I learned a lot. Could honestly be a direct follow-up to this video.
Oooh. I know this one. It's the function epilogue. Functions get compiled into some bits; the call instruction pushes the current program counter onto the stack (and the prologue then sets up the frame), and return pops that value back when the function finishes. This is why recursive function calls can overrun the stack.
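The prologue/epilogue pair looks roughly like this on x86-64 (AT&T syntax, simplified sketch; my_func is a made-up name):
my_func:
    push %rbp           # prologue: save the caller's frame pointer
    mov  %rsp, %rbp     # establish this function's frame
    sub  $16, %rsp      # reserve space for locals
    movl $42, -4(%rbp)  # locals live at negative offsets from rbp
    leave               # epilogue: mov %rbp,%rsp then pop %rbp
    ret                 # pop the return address into rip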
Call stack has a limited size on most platforms. On Linux the default is 8 MB. So if a program uses more than that then a stack overflow error will occur.
I remember the 6502 architecture had a hardware stack of 256 bytes (page 1 of memory). Return addresses were pushed/popped as two bytes. Exceeding this stack just wraps the stack pointer around; any "Stack Overflow" fatal error came from software detecting it. Not sure how newer architectures do it; I imagine there is a stack in RAM which could be any size or location. Don't forget that the operating system also uses the stack.
@@darylcheshire1618 On modern systems every thread has its own call stack. They're placed at randomized addresses to make it harder for attackers to find them.
@@darylcheshire1618 Modern systems typically put the stack in the upper portion of a process's memory; if your stack gets too big it can start to grow down toward the heap, but this doesn't usually happen except with a botched recursion or the like.
More detail:
1. strlen can keep the i variable in rax and does not need a stack frame at all.
2. Even though bp is supposed to point to the start ("top") of the stack frame, in practice compilers often omit the frame pointer and use it as just another general-purpose register.
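For point 1, an optimizing compiler could plausibly emit something frameless like this (hypothetical sketch, AT&T syntax):
strlen:
    xor  %eax, %eax          # i = 0 (also clears all of rax)
.loop:
    cmpb $0, (%rdi,%rax)     # str[i] == '\0' ?
    je   .done
    inc  %rax
    jmp  .loop
.done:
    ret                      # result is already in rax; no frame, no bp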
An empty string is not empty; it's an array of char that ends with '\0', the character that marks the end of the string. So "abc" is in fact 4 bytes ('a', 'b', 'c', '\0'). And after the call it says "The string is __ bytes long": bytes, not characters.
@@MT-cf2ms that's true. but calling the function `strlen` and giving it different semantics than the C library's strlen function (which returns 0 for an empty string) is cursed behavior. calling it something like `bytelen` would be a bit more acceptable. Or changing that text and fixing the bug would have worked too.
The stack is upside-down, at least on the x86 architecture. SP actually starts at the top of the memory segment (whatever is assigned by the OS), then decrements with push/call instructions and increments with pop/ret, so data is actually stored in reverse order. This is why overrunning a stack-based buffer can overwrite the return address (which is stored at higher memory addresses than local variables) and be exploited in buffer overflow attacks.
I couldn't help but notice that the strlen() implementation shown in the video has an off-by-one error. It always returns the string length plus one (i.e. it includes the null character).
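For reference, a fixed version would look something like this (sketch; renamed so it doesn't collide with the libc one):
#include <stddef.h>

size_t my_strlen(const char *s)
{
    size_t i = 0;
    while (s[i] != '\0')   /* stop AT the terminator, don't count it */
        i++;
    return i;              /* returns 0 for "" */
}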
also interesting (and useless) fact: when returning a struct from a C function (one that's not gonna fit into the 32/64-bit eax/rax register), the compiler allocates the struct on the caller's stack and passes a pointer to it as an extra hidden argument to the called function (the callee), which is also what one would typically do in C when wanting a function to "return" multiple values. At the asm level both do the same thing.
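In other words, the compiler effectively rewrites the first form below into the second (illustrative C; types and names made up):
typedef struct { long a, b, c, d; } Big;   /* too big for rax */

Big make_big(void) {                 /* what you write */
    Big r = {1, 2, 3, 4};
    return r;
}

void make_big_lowered(Big *hidden_ret) {   /* roughly what the ABI does */
    hidden_ret->a = 1; hidden_ret->b = 2;
    hidden_ret->c = 3; hidden_ret->d = 4;
}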
@@SirusStarTV which functions? If you're referring to libc ones, it might be because they were written before the optimisation discussed above was implemented. Just speculation.
Yes. Yes, I'm sure. I've designed several language compilers and interpreters. And the runtime architecture presented is not the only one possible. E.g. you could pass parameters and return values on a separate data stack and have a return stack for the return addresses - just to name one.
One thing I'd like to point out is that if you see something like "RET 8" in assembly, it is NOT the same as "return 8;" in C. In assembly, a number after RET tells the CPU how much to adjust the stack before/after returning (I forget which)
It adjusts the stack after popping the return address into IP on 80x86 systems. Also, whether the value passed back is in a register or on the stack can be language- and compiler-dependent even on 80x86 systems.
As Alex said. When the calling code needs to get more parameters to a function than the CPU can hold in its registers, one strategy is to put them onto the stack before calling (another one is to put them somewhere in memory and have a pointer to that address in a register). So the stack would hold the parameters, then the return address, then the function's local variables. When returning, those extra parameters need to be removed from the stack, too. The calling code can do that manually (it put them there, so it knows how much to remove) or the function can use a RET that can do that and let the CPU do that for free.
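A sketch of the callee-cleans variant (32-bit x86, AT&T syntax; caller and func are made-up labels):
caller:
    push $2              # second argument
    push $1              # first argument
    call func            # pushes the return address too
    ret                  # note: no "add $8, %esp" cleanup needed here
func:
    mov  4(%esp), %eax   # first argument, just above the return address
    add  8(%esp), %eax   # plus the second
    ret  $8              # pop the return address, then drop the 8 arg bytes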
That depends on the calling conventions. Some architectures have a “frame pointer” in addition to the stack pointer; IIRC the frame pointer would save the value of the stack pointer just after a jump to subroutine. Then no matter how much data the function pushed onto the stack all the CPU has to do on a return is copy the frame pointer back to the stack pointer to get back to where the return address is on the stack.
Would love to see some kind of reverse engineering series in which you go step by step, reversing a real-life application and explain every difficulty that you come across.
We did something similar to that at uni. We were given C code and had to turn it into MIPS (the instruction set used by the PlayStation 1's CPU). For small programs it's okay, but for a real-life application written in a high-level language like JS or Python it would take weeks, months or even years, I imagine... Our C was really simple, for example figuring out whether a year is a leap year. Assembly is fun, you can just "go to" whichever line you like lol. I found it really enjoyable, but C runs almost as fast, so cost-to-benefit-wise it isn't worth coding in assembly. Check out RollerCoaster Tycoon from 1999, as it's coded almost entirely in assembly, which is pretty cool :)
@@FreeWithinMe I've learned assembly but I barely understand C, and often with C, I have to figure out how to get the toolchain working with makefiles etc... it's a huge pain. I'm not sure which route to take since most people learn C before asm and I ended up doing the opposite
I don’t want to program in asm anymore if mov rax, 0 is faster than xor rax, rax. Seriously, since when is moving an immediate value to a register faster than a logical operation? How fucked up is the x64 ISA now?
@@YarisTex actually, it's not really about speed; if anything, the xor is at least as fast. Partly it's about saving bytes, since `xor eax, eax` takes fewer bytes to encode than `mov eax, 0`. But the deeper reason compilers prefer it even at `-O3` (where the compiler largely disregards code size) is register renaming. The architectural `eax` register is actually backed by a pool of dozens of physical registers which get assigned dynamically at runtime to work around false register dependencies in your code, and this increases execution speed DRASTICALLY. CPUs recognize `xor eax, eax` as a zeroing idiom: it breaks dependencies and can be handled at the rename stage without even occupying an execution unit, while `mov eax, 0` doesn't always get that special treatment.
Read the title, but have not yet watched the video... It pops a value off the stack and writes that value to the instruction pointer. In the case the return statement is paired with a value, IIRC, the value is stored in memory and a specific register/pointer/flag is set to indicate there was a value returned from the subroutine. In general, at least; different architectures might handle things a little differently. After watching the video... I was more or less correct. What I referred to as "the instruction pointer" is sometimes a counter in different architectures, but I covered that at the end with the "in general..." bit. My explanation of the returned value was vague/broad because it's been a while since I've dealt with low level code. Good to freshen up every so often.
Getting into this video i thought "doesn't assembly just jump to functions with CALL and get back with RET by reading the return address from the stack?" And I was right apparently
Out of love and interest for a specific game I dove deep into its assembly code using debuggers, even though I had no previous knowledge about assembly. I learned so many things just by observing, which was a wonderful and fun task, and it's nice to see that these things are actually universal, at least for x86.
I've found most asm languages easy to learn once you've learned any one of them. While they have different instructions etc, they broadly work in a similar fashion.
@@jemmerl Don't know if YouTube is glitching right now but my reply seems to be gone. I am talking about Maniaplanet, the platform for the Trackmania 2 games.
I believe that for m68k the instruction is not "call" but rather "jsr" or "Jump to SubRoutine". I used to make my own games for Sega Genesis when I was 12 years old 😅
The best way to learn this stuff is to implement your own (sort of) programming language. I never completed mine, but you have to think about goto instructions to implement ifs, loops, and return statements, and about pushing parameters to functions.
I built a toy compiler for university. The bugs you run into can be FUN: like when I was indexing the stack upside down because I forgot which direction the stack was growing 😂
I absolutely love low level stuff. It's sad that we only learn the basics of ARM and x86 at uni. Great video. (and yes, I was sure I knew how it worked :P)
There's a very valid reason why we develop high-level abstractions on top of low-level architectures like x86 and ARM: to solve more complex problems that would take forever to accomplish using assembly
I totally agree. I used to be a teaching assistant in my old uni and we had a class where we covered assembly. It was so much fun. It was so annoying that the Uni did more advanced architecture classes for final year undergrads only after I left 😢
@@anirudhkumar9139 i'm more interested in the system behind all these abstractions than in programming anything useful with them 😄 because i feel stupid trying to use them, like, what do they even do behind the scenes!? if i don't understand it i can't properly utilize them.
Direct access to memory is fun, you can invent drawing algorithms yourself instead of relying on others. Of course it's easier to use those shape and lines drawing functions but you would forever be speculating on how they work (if you're curious).
It's also not quite correct to say the return value goes into a register. It's more correct to say "it goes into the register, if it fits". Keep in mind you can return structs or other potentially very large data types. In those cases, space is actually allocated on the stack frame of the *calling* function, which is where the return value is copied to. Returning huge values is usually avoided, because copying large amounts of memory around is not cheap, but it is fully supported.
Hrm... The "are you sure" assertion made me expect some interesting edge cases, but it didn't even mention how floats are returned, not to mention stuff like tail calls, returning values larger than registers, why larger epilogs are sometimes needed, or link registers on ARM. Apparently I was more sure than the clickbait title. 😕
The (are you sure?) part clickbaited me haard. Last semester at uni we were implementing a full pipeline compiler for a subset of C; lexer -> parser -> syntax analyzer -> assembler; and it was a very satisfying process with a nice final result which I would recommend to anyone who is curious what happens under the hood of everything. Of course you can only grasp so much, but the general concepts are all there.
Yes I do. And a much simpler way to understand not just “return” but every other component of a high-level language (like arrays, function stacks, etc) is just learn MIPS.
@@williamdrum9899 My first processor was the 6502. I quickly learnt to program it in hex without opcodes, as the computers I generally used had no assembler nor disassembler.

The 6502 has some interesting quirks. Its little-endianness speeds up calculated addressing: it adds the offset to the LSB whilst the MSB is loading, and only needs the extra cycle to increment the MSB if a page boundary is crossed. Due to its synchronous memory accessing, whilst the 6502 is deciding whether the MSB needs incrementing, it loads the byte at the address computed so far; if the MSB needed changing, it throws this away and loads the correct byte. It is this synchronous memory access that lets a read be a cycle shorter when no page boundary is crossed, but to prevent memory being splatted, a write always waits until the correct address is confirmed.

The incorrect address is the same offset in the previous page of memory. Normally this may not be a problem, but if that page contains memory-mapped IO, a side effect can be to trigger hardware that reacts to a mere memory access (the Apple ][ had some such memory locations). A write does a read before the write, so writing to such memory locations results in a double access.

The bug in the indirect JMP instruction is that the LSB of the pointer address is incremented with no check for wrap-around from 0xFF to 0.

For a JSR it also stores the return address as the address of the last byte of the instruction, so that when an RTS is executed, after loading the return address it needs to increment it to reach the next instruction. This is down to the "pipelining" the 6502 does as part of its synchronous memory access, whereby it finishes off the previous instruction while it loads the next one.
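The indirect-JMP page-wrap bug in one line (classic illustrative example):
    JMP ($10FF)   ; pointer low byte is read from $10FF, but the high byte
                  ; comes from $1000 rather than $1100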
It helps that I studied two computer science courses (based on the well-known book by Patterson and Hennessy) where I had to learn the basics (architecture and assembly) of the MIPS architecture. I don't know the details of x86 assembly though.
It's a bit different, in that there are fewer registers and more specialized instructions. Also there are far fewer limitations on addressing modes (you can use register offsets for indexing, unlike MIPS).
When you call a function, the return address is pushed onto the stack, as well as any variables being passed into the function. You jmp to the address where the function lives, execute the code there, and the last few lines (of machine code) of the function should pop all of the locals off the stack and jmp back to the saved return address. Now let's see how wrong I am (have not watched the video yet).
@@ayoubedd The only assembly code I've learned is LC-3 (stupid oldschool). I can't remember the exact order, but I do remember we moved the program counter, as well as the local vars for the function, directly into our registers (of which we only had 7). It's good to review.
Abstract machines almost always work this way. In real-world processor architectures there are a myriad of variations, mostly regarding the use of registers (because they are faster than a memory-based stack, although there are architectures where the top of the stack is aliased to a register bank). However, to make this useful to viewers sufficiently "naive" not to know this already, an explanation of the concept of a stack (or LIFO) as a data structure is in order.
I think the idea will 99% of the time be the same for all architectures, but they all execute it a bit differently. For example, MIPS assembly uses 'jal' as the function call. That instruction writes the return address into the $ra register, then writes the function's address into PC. Then the 'return' would be 'jr $ra', which writes the $ra value into PC :) (basically a jump, but using the value stored in $ra as the address)
I believe the 6502 would use JSR, jump to subroutine, and push the PC onto the stack. RTS, return from subroutine, would grab whatever 2 values were on the stack and jump to that address.
I just got done with a computer architecture class where we learned about the risc-v architecture. 2 minutes into the video I realized I already knew how return works lol. Nice video
Just finished my Assembly class, so I did in fact know how "return" works:
push pc onto the stack
frame pointer stores the location of pc
... code ...
return values go in r0-r3 registers
use the frame pointer to bring pc back
Thank you for this. I tried to elaborate on the semantics of `return` (and the case of Rust's implicit return for expressions) to my classmates but failed to wrap the concept in words.
The "are you SURE?" had me doubt if I actually knew. But nothing new here. Pre-stored return addresses cannot work either for recursively called functions. What I do wonder about is how the return of multiple values work in languages that allow it. Would the calling function allot some space beyond the base pointer for the "out" variables?
@@gideonz74b Ah, I am sorry, I was thinking of high-level languages. While I don't know for sure what it actually is in terms of assembly, I guess they probably return something similar to what they return for a class. Likely an address that points to the tuple data.
@@Beliar_83 The whole point of this video is how the return statement in a higher language actually works under the hood. So, any other higher level (compiled) language would need to have a pattern to translate tuples to. A pointer to an object is highly inefficient, as it would require memory allocation. Therefore, I mentioned that I imagined that the compiler reserves some room on the stack to place the return tuple into, but this stack space must belong to the caller, obviously. It is possible, because the compiler knows the return type compile time, and it is the same for all paths exiting the function. The return type is defined in the signature of the function.
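Right; in C you'd model the tuple as a small struct and let the compiler pick registers or caller-reserved stack space depending on its size (illustrative sketch):
typedef struct { int quot, rem; } DivResult;

DivResult divmod(int a, int b)
{
    DivResult r = { a / b, a % b };
    return r;   /* small enough to travel back in registers on x86-64;
                   bigger tuples go via a hidden pointer to caller stack space */
}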
Yes I do. return tells the architecture to pop the stored program counter off the stack and replace the current program counter with it, while also moving any return values into register(s) specified by the architecture as return registers... *video* Yes I am sure. Yay for learning ARM in my youth and never using it.
OK, that's a good introduction to dynamic call stacks, but that "are you SURE?" in the title misled me into believing there was something there that would surprise programmers like me who already think we know how return works. At least you could have explained the purpose of the frame pointer (bp), because I still don't know why it exists.
The frame pointer is used to save the value the stack pointer had on entry to the subroutine (right after the return address and old frame pointer were pushed). This way, no matter how much extra data the function pushes onto the stack, the epilogue knows exactly where to pop back to on return, _and_ the function can use it as a fixed base reference for any call parameters passed to the function (which were pushed to the stack before the JSR).
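Concretely, on 32-bit x86 (AT&T syntax sketch; func is a made-up label):
func:
    push %ebp            # save the caller's frame pointer
    mov  %esp, %ebp      # anchor this frame
    sub  $8, %esp        # locals
    mov  8(%ebp), %eax   # first stack argument (above saved ebp + return addr)
    mov  %eax, -4(%ebp)  # a local variable
    mov  %ebp, %esp      # unwind all locals in one move
    pop  %ebp
    ret                  # return address is now right on top of the stack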
Some constructive criticism, if welcome: in my experience, people who need or want this concept explained often don't understand what a stack is, or why that type of data structure is used here. I think you could do a follow-up about how a stack generally works, that it's useful because it models nested calls naturally, and perhaps a bit about why it's also preferred because physical machines can easily implement it in hardware.
I was able to follow this reasonably well because of how carefully and methodically you speak, but I laughed when you said "don't worry, it's not complicated"
I got interested in assembly as a kid while learning to program. Never wrote anything in asm, but I learned how some C calls are broken into instructions, and the whole concept of the stack and registers. So my intuition was basically the same thing that this video laid out. Just wasn't sure if the return value was going on the stack somehow or a register, but the register made more sense.
The reality is that often in Assembly you're free to implement these conventions however you like. Some architectures like x86 have a "call" instruction that makes you jump to a particular address and also puts the return address on the stack, but really you could just jump and store the return address in a register instead. The reason to have conventions, however, is that if your program is calling a C function it is often expected to follow the "C convention" (in the early days a "Pascal convention" was common on Apple Computers). And some architectures like MIPS just declare what the convention for programmers are, but it isn't really enforced. Some instructions may assume that certain values are stored in certain places though. But Assembly is absolute freedom.. make as big of a mess as you like!
Maybe you should add that ... the calling convention (and I include the value return here) is just that. A convention. When writing assembly yourself, nothing stops you on completely ignoring this, like abusing MMX registers for storage of ... e.g. a global pointer to your hand-written memory manager, using r14 as the first parameter (now that'd be weird), jumping into the middle of another function, manually altering the stack pointer and returning from an entire callchain at once, you name it. The drama of debugging is only limited by your creativity 😄
This is really only valid for a tiny subset of cases, and leaves out an explanation of the different calling conventions and architectures, which have different return patterns. On top of that oversimplification, your implementation of strlen() is actually wrong. Forget best practices; at the very least return a correct result and use the correct function signature:
size_t strlen(const char *s)
{
    const char *t = s;
    while (*t)
        t++;
    return t - s;
}
Basically, return and function calling in general works as if the called function had an implicit return address parameter. It's filled by the caller and used by the callee. Named return value optimization (NRVO) does the same for the result: the caller makes space for the result and the callee has an implicit return parameter that is bound to the address of the space for the result.
How does it return to the right place... Ah yes. The short version: pointer on the stack. A lot of code-injection attacks work by overwriting the return address so it lands in a long run of NOP instructions (a NOP sled) that slides execution into the injected payload, somewhere other than where the program intended to go. That essentially boils down to treading into unsafe memory via a buffer overrun. The only reason I know the return statement and how it works as well as I do is that I've had formal programming education, and one of my instructors demonstrated how a lot of keywords worked, what they actually do, and how they have been historically abused to get at memory that otherwise doesn't belong to the current program.
On a 6502, I wanted to copy a disk volume to a RAM disk; the disk was then ejected, leaving the RAM disk as the only boot device. I wanted to reboot the machine, and this was achieved by a small machine-language procedure which threw away the return address, causing the reboot.
Funny enough, for the longest time when beginning to learn about OOP, "return" was the most difficult thing for me to wrap my head around, as previously while learning, "echo"ing was the "alternative" used instead of returning.
Guess a quick solution would be to create a small if statement every time a user inputs an index, to confirm it's within the length of the array. A bit tedious, but a safeguard nonetheless.
What if the function is declared as void in its declaration, but then you actually return something with a different declaration at link time? Did I explain what I mean? Like, you define your void function in one object file, and in another you declare a non-void function with the same name, and then link them; how does it handle that?
Noticed this when I tried to learn about game development: whenever I attached a debugger to the process, when I approached the ret instruction, the address where the function was called from was at the top of the stack.
Is there a subject or topic to study that goes more in depth into this lesson? There is learning assembly but how about learning how the cpu works and runs a program. All the stuff covered in this video. What would be a good resource?
I think Chibiakumas is a good resource (look it up on youtube.) The CPU in general terms operates a lot like a BASIC interpreter from the 80s. Now, there are some more complex things going on in modern CPUs, but in general your assembly code is executed from the start to the finish, from the top of your source code document going down, one line at a time. Chibiakumas' videos are more in the context of retro game consoles than computers, which do have a few differences to be aware of. For example, on a Game Boy or Sega Genesis, your "main()" isn't going to return since there's nothing to return to. Trying to return from main on one of those systems is more than likely going to crash, as you're just taking whatever junk is on top of the stack and blindly GOTOing it.
I watched because you asked if I was sure. 😊 I was but that's not a problem; this was nice and concise. Also nice to hear, at the end, something pentesters and CTF folks should appreciate.
Yes, I do know because I am a computer scientist, and not just a so-called programmer. I have taken low level classes like computer architecture and organization aka assembly.
as someone who has an on and off interest in implementing esolangs, creating control stacks in software is a really common bit of boilerplate. Even a language as simple (and devoid of conventional subroutines) as Brainf**k may need to know several potential jump points simultaneously in order to function properly. It's really interesting seeing these things I typically think of as snippets of C code being performed completely "on the metal" as it were.
Great video, with very simple but very clear explanation! Even though I knew most of this, it clarified certain terms for me (stack frame, as the term for the fraction of the stack related to 1 function/subroutine scope). Maybe I am going to learn ASM some day, at least the x86_64 version. At the moment I know a wide range of languages, varying between more functional to more imperative and from OOP to procedural, but I have never really done ASM. Although I have quite seriously played around with the IL assembly language from DotNet, which is certainly more abstract than real ASM, but a lot lower level than DotNet's C# language.
4:25 Not just CPUs have calling conventions! Even shell scripts have the convention of setting the `REPLY` shell variable to the return value of a function/subroutine. There's another convention to print the return value to stdout, which requires the caller to call the function within a subshell, then capture the printed value in an arbitrary variable. The former is faster, but it has global side effects. The latter is more akin to the functional paradigm, but it's much slower because of I/O and subshells.
TLDR; The result is put in the accumulator register and the return address is pushed to the stack on a function call and popped out on function completion
when i started learning programming, i saw functions without return and understood them with a level of simplicity, but when i started learning about functions with "return", i wondered: where does the return value get stored?
Without watching the video, I’m guessing that the memory location of the function call is pushed onto the stack and when return is called the location is popped from the stack and the program jumps back to where it left off. I’m pretty sure that’s how it works in assembly, so in a compiled language it should be the same.
It's unfortunately also the reason why buffer overrun attacks exist. If an array stored on the stack is indexed out of bounds, a hacker can overwrite the return address with the address of any function he wants and "return" to it, even if the program hasn't been there before.
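The classic shape of that bug, for illustration only (don't write code like this):
#include <string.h>

void vulnerable(const char *input)
{
    char buf[16];
    strcpy(buf, input);   /* no bounds check: a long input keeps writing past
                             buf, over the saved frame pointer and then the
                             return address sitting higher on the stack */
}                         /* "return" now jumps wherever the attacker wrote */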
I mean.. there is the 6502 and z80, how about some 6800 too?
The journey is harder than it seems.
Hardship builds a man.
Assembly is very similar to BASIC in terms of program flow
The same is true for any language that compiles into some intermediate form, like JVM bytecode, BTW.
@@nightfox6738 This is why I never trusted compilers. I've seen them do amazing things but also really dumb things. But that's a new low lol
When you see "WINAPI" when you program in windows it's macro alias for "__stdcall" calling convention. Same as "CALLBACK" or "APIENTRY".
Great one. It is easy to forget this when not thinking about it directly and remembering this helps greatly in reverse engineering efforts.
teehee ty
@@LowLevelTV bro I love you 💋💋💋
before watching this video i didnt actually know what "return" did, but after watching it, i now know that i dont know what it does
8 MB is actually very generous, using more than that is quite difficult.
I actually knew this! Proud of myself.
same, from ben eater's videos iirc
Your strlen will return 1 for an empty string.
Write your tests, kids!
pre and post increments should be considered harmful
what confuses me most is the presence of a second semicolon right past the only one required, to my knowledge. what is its point?
I learned about stack smashing a few weeks ago. It feels so crazy that I actually understand this stuff now.
Return value optimization. This is why you shouldn't be afraid to return structs.
Instead of returning structs, most functions I've seen require you to explicitly pass a pointer to a struct that gets filled in with information.
Sick video, wish you made it before my compilation exam lol
Thanks for sharing all this quality content!
My name is not jeff
MYNAMEEHJEGF
And this is not the code report
my name, will never be Jeff either
Pfp says PP, but Names PK??
my name's not shane kid
why did i watch this i knew it
@@trevinbeattie4888 EBP register in x86
Love this, exploring more than just the driving of the car, seeing the people and process behind the product.
How about a video on stack unwinding and C++-like exception handling?
it pays to learn assembly.........
What was the game, out of curiosity?
I literally had to write in a final exam yesterday about what the CALL and RET instructions do in 8086 assembly
Can confirm, about a year ago I was trying to make a genesis game. But I couldn't get the sound to work
u must be a genius
Turing Complete is also a good game to learn that kind of stuff
lmao
Doing that you would understand and gain more skills even in interpreted languages like python.
love your content man! As an aspiring malware analyst, your content is always enlightening and fun to watch! Long live low level languages!
1:35 a good compiler would remove those unused functions, bypassing the posed problem 😀
Yes, I definitely understand how return works. I've learned nearly everything I know about assembly from Ben Eater. He is quite an excellent teacher.
I learned mostly from Chibiakumas
I learned about computers from Ben Eater and redstone
I love his breadboard computer series. It taught me so much about computer architecture.
I learned 6502 Assembly first, haha. Took me about a year to "get it" but now I can learn basically any assembly language no problem.
@@williamdrum9899my first processor was the 6502. I quickly learnt to program it in hex without op codes as the computers I generally used had no assembler nor disassembler.
The 6502 has some interesting quirks. Its little endianness speeds up processing of calculated addressing: it adds the offset to the LSB whilst the MSB is loading, and only needs the extra cycle to increment the MSB if a page boundary is crossed. Due to synchronous memory accessing whilst the 6502 is deciding and incrementing the MSB if necessary, it loads the byte at the address found so far. If the MSB needed changing it throws this away and loads the correct byte. It is this synchronous memory access that allows a read to be a cycle shorter if a page boundary is not crossed, but to prevent memory being splatted over a write always waits until the correct address is confirmed before writing.
The incorrect address is the same offset in the previous page of memory. Normally this may not be a problem, but if that page contains memory mapped IO a side effect can be to trigger hardware that relies on just a memory access (the Apple ][ had some such memory locations). A write does a read before the write so writing to such memory locations results in a double access.
The bug in the JMP (I) instruction is due to the LSB being incremented but no check being made for overflow to 0 from 0xFF.
For a JSR it also stores the return address as the last byte of the instruction, so that when a RTS is executed, after loading the return address it needs to increment it for the next instruction. This is something to do with the "pipelining" the 6502 does as part of its synchronous memory access whereby it is finishing off the previous instruction when it loads the next instruction.
I understand it better, considering that I studied two computer science courses (based on the well-known book by Patterson) and had to learn the basics (architecture and assembly) of the MIPS architecture. I don't know the details of x86 assembly though.
Which book are you referring to?
It's a bit different, in that there are fewer registers and more specialized instructions. Also there are far fewer limitations on addressing modes (you can use register offsets for indexing, unlike MIPS).
When you call a function, the return address is pushed onto the stack, as well as any variables being passed into the function. You jump to the address where the function lives, execute the code there, and the last few instructions (machine code) of the function should pop all of the locals off the stack and jump back to the saved return address.
Now let's see how wrong I am (I haven't watched the video yet).
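You can actually watch the saved return address from C, with the caveat that __builtin_return_address is a GCC/Clang extension, so this is a compiler-specific sketch rather than portable C:

    #include <stdio.h>

    void callee(void) {
        /* address the call instruction pushed, i.e. where ret will go */
        printf("will return to %p\n", __builtin_return_address(0));
    }

    int main(void) {
        callee();   /* prints an address just past this call site */
        return 0;
    }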
@@ayoubedd The only assembly I've learned is LC-3 (stupidly old-school). I can't remember the exact order, but I do remember we moved the program counter, as well as the local vars for the function, directly into our registers (of which we only had 7).
It's good to review.
Definitely one of your best: easy to follow but quite an intense video!
Abstract machines almost always work this way. In real-world processor architectures there are a myriad of variations, mostly regarding the use of registers (because they are faster than a memory-based stack, although there are architectures where the top of the stack is aliased to a register bank).
However, to make this useful to viewers sufficiently "naive" not to know this already, an explanation of the concept of a stack (or LIFO) as a data structure is in order; see the sketch below.
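Something like this minimal C sketch would cover it (push/pop/MAX are made-up names): a stack is just "last in, first out", which is exactly the order nested calls return in.

    #include <stdio.h>

    #define MAX 16
    static int stack[MAX];
    static int top = 0;                /* index of the next free slot */

    static void push(int v) { stack[top++] = v; }
    static int  pop(void)   { return stack[--top]; }

    int main(void) {
        push(1); push(2); push(3);
        printf("%d\n", pop());         /* 3: last in, first out */
        printf("%d\n", pop());         /* 2 */
        printf("%d\n", pop());         /* 1 */
        return 0;
    }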
I think the idea will be the same 99% of the time for all architectures, but they all execute it a bit differently. For example: MIPS assembly uses 'jal' as the function call. That instruction writes the return address into the $ra register, then writes the function's address into PC. The 'return' is then 'jr $ra', which writes the $ra value into PC :) (basically a jump, but using the value stored in $ra as the target address)
I believe the 6502 would use JSR, jump to subroutine, and push the PC onto the stack. RTS, return from subroutine, would pull two bytes off the stack and jump to that address.
It's mostly the same except most CISC computers use the top of the stack to store the return address
1:39 Fun fact: both double quotes in functionA point inwards towards the text, but in functionB they both point right.
1:36 I was thinking of bl and blr (PowerPC asm)
I just got done with a computer architecture class where we learned about the RISC-V architecture. 2 minutes into the video I realized I already knew how return works lol. Nice video
This is more complicated than I assumed it would be. I'm glad I clicked on this so that I could understand it better.
Just finished my Assembly class, so I did in fact know about how "return" works.
push pc onto the stack
frame pointer stores the location of pc
... code ...
return values go in r0-r3 registers
use the frame pointer to bring pc back
Returning function calls was the first concept I really understood without explanation! Lol!
Thank you for this. I tried to elaborate on the semantics of `return` (and the case of Rust's implicit return for expressions) to my classmates but failed to wrap the concept in words.
The "are you SURE?" had me doubt if I actually knew. But nothing new here. Pre-stored return addresses cannot work either for recursively called functions. What I do wonder about is how the return of multiple values work in languages that allow it. Would the calling function allot some space beyond the base pointer for the "out" variables?
The languages I can think of return multiple values as a tuple.
@@Beliar_83 A tuple doesn't fit in a register.
@@gideonz74b Ah, I am sorry, I was thinking in high-level languages. While I don't know for sure what it actually is in terms of assembly, I guess they probably return something similar to what they return for a class: likely an address that points to the tuple data.
@@Beliar_83 The whole point of this video is how the return statement in a higher-level language actually works under the hood. So any other high-level (compiled) language would need a pattern to translate tuples to. A pointer to an object is highly inefficient, as it would require memory allocation. That's why I imagined that the compiler reserves some room on the stack to place the return tuple into, but this stack space must belong to the caller, obviously. It is possible, because the compiler knows the return type at compile time, and it is the same for all paths exiting the function; the return type is defined in the signature of the function.
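To make that concrete, here's a hedged C sketch (struct DivMod and divmod are invented names): a small two-field struct stands in for a tuple, and under the System V AMD64 convention a struct of 16 bytes or less like this comes back in the RAX:RDX register pair, with no stack or heap traffic at all.

    #include <stdio.h>

    struct DivMod { long quot, rem; };    /* a poor man's tuple */

    struct DivMod divmod(long a, long b) {
        struct DivMod r = { a / b, a % b };
        return r;                         /* fits in two registers */
    }

    int main(void) {
        struct DivMod r = divmod(17, 5);
        printf("%ld remainder %ld\n", r.quot, r.rem);
        return 0;
    }

Anything bigger than that falls back to the caller-reserved stack space described above.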
I spent a decent amount of this video shouting STACK!!! at my monitor
Yes I do. return tells the architecture to pop the stored program counter off the stack and replace the current program counter with it, while also moving any return values into register(s) specified by the architecture as return registers...
*video*
Yes I am sure.
Yay for learning ARM in my youth and never using it.
You made me question my knowledge, but it is exactly what I learned at uni. So yes, I'm sure.
OK, that's a good introduction to dynamic call stacks, but that "are you SURE?" in the title misled me into believing there was something there that would surprise programmers like me who already think we know how return works. At least you could have explained the purpose of the frame pointer (bp), because I still don't know why it exists.
It's used to retrieve function parameters and local variables.
The frame pointer is used to save the value of the stack pointer at the time of the jump to subroutine; i.e. it points to the return address on the stack. This way no matter how much extra data is pushed onto the stack the CPU knows exactly where to pop back to on return, _and_ the function can use it as a base reference for any call parameters passed to the function (which would have been pushed to the stack before the JSR.)
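You can see the frames themselves from C too, assuming GCC or Clang (__builtin_frame_address is an extension) and an unoptimized -O0 build so every call keeps a conventional frame:

    #include <stdio.h>

    void nested(int depth) {
        /* each frame sits at a lower address: the stack grows down */
        printf("depth %d: frame at %p\n", depth, __builtin_frame_address(0));
        if (depth < 3)
            nested(depth + 1);
    }

    int main(void) {
        nested(0);
        return 0;
    }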
Some constructive criticism, if welcome: in my experience, people who need or want this concept explained often don't understand what a stack is, or why that type of data structure is used here. I think you could do a followup about how a stack generally works, why it's useful because it models a stack trace naturally, and perhaps a bit about why it's also preferred because physical machines can easily implement it in hardware.
I was able to follow this reasonably well because of how carefully and methodically you speak, but I laughed when you said "don't worry, it's not complicated"
I got interested in assembly as a kid while learning to program. Never wrote anything in asm, but I learned how some C calls are broken into instructions, and the whole concept of the stack and registers. So my intuition was basically the same thing that this video laid out. Just wasn't sure if the return value was going on the stack somehow or a register, but the register made more sense.
The reality is that often in Assembly you're free to implement these conventions however you like. Some architectures like x86 have a "call" instruction that makes you jump to a particular address and also puts the return address on the stack, but really you could just jump and store the return address in a register instead.
The reason to have conventions, however, is that if your program is calling a C function it is often expected to follow the "C convention" (in the early days a "Pascal convention" was common on Apple Computers). And some architectures like MIPS just declare what the convention for programmers are, but it isn't really enforced.
Some instructions may assume that certain values are stored in certain places, though. But assembly is absolute freedom... make as big a mess as you like!
Can't wait to watch this another five times while I try to figure out which basics I'm missing to get it.
Maybe you should add that... the calling convention (and I include the value return here) is just that: a convention. When writing assembly yourself, nothing stops you from completely ignoring it, like abusing MMX registers to store, e.g., a global pointer to your hand-written memory manager, using r14 as the first parameter (now that'd be weird), jumping into the middle of another function, manually altering the stack pointer and returning from an entire call chain at once, you name it. The drama of debugging is only limited by your creativity 😄
Thank you for the good explanation; as a self-taught dev it really helps deepen my knowledge! Keep it up please 👑
Loved the video :) Watched it to make sure I remembered what I learned from Assembly and Comp Arch courses
This is really only valid for a tiny subset of cases, and it leaves out an explanation of the different calling conventions and architectures that have different return patterns. On top of that oversimplification, your implementation of strlen() is actually wrong. Forget best practices; at the very least return a correct result and use the correct function signature:

    size_t strlen(const char *s) {
        const char *t = s;
        while (*t)
            t++;
        return t - s;
    }
Basically, return and function calling in general works as if the called function had an implicit return address parameter. It's filled by the caller and used by the callee.
Named return value optimization (NRVO) does the same for the result: the caller makes space for the result and the callee has an implicit return parameter that is bound to the address of the space for the result.
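A rough C sketch of that rewrite (invented names; `make` is what the source says, `make_desugared` mimics by hand what the compiler emits):

    #include <stdio.h>
    #include <string.h>

    struct Result { char text[128]; };

    /* what the source says: */
    struct Result make(void) {
        struct Result r;
        strcpy(r.text, "built in the callee");
        return r;
    }

    /* roughly what the compiler emits: the return value becomes an
       implicit out-parameter pointing into the caller's frame */
    void make_desugared(struct Result *ret) {
        strcpy(ret->text, "built in the callee");
    }

    int main(void) {
        struct Result a = make();    /* &a is passed behind the scenes */
        struct Result b;
        make_desugared(&b);          /* the same thing, spelled out */
        printf("%s / %s\n", a.text, b.text);
        return 0;
    }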
strlen is wrong. It should subtract 1 at the end or initialize i at -1.
Got my sub with that witty function
How does it return to the right place... Ah yes, the short version: a pointer on the stack. A lot of code-injection attacks work by overwriting that stored return address so that execution lands somewhere the attacker chose - often sliding down a series of NOP instructions into the injected payload. That essentially boils down to treading into unsafe memory via a buffer overrun.
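The classic vulnerable pattern looks something like this illustrative sketch: strcpy does no bounds checking, so a long enough input runs past buf, over the saved frame pointer, and into the saved return address above it (stack canaries, ASLR, and non-executable stacks all exist to make exactly this harder).

    #include <stdio.h>
    #include <string.h>

    void vulnerable(const char *input) {
        char buf[16];
        strcpy(buf, input);      /* no length check: the bug */
        printf("%s\n", buf);
    }

    int main(int argc, char **argv) {
        if (argc > 1)
            vulnerable(argv[1]); /* attacker-controlled data */
        return 0;
    }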
The only reason I know the return statement and how it works as well as I do is that I've had a formal programming education, and one of my instructors demonstrated how a lot of keywords worked, what they actually do, and how they have historically been abused to get at memory that doesn't otherwise belong to the current program.
It’s cool to see this video and already know how return works.
On a 6502, I wanted to copy a disk volume to a RAM disk; the disk was then ejected, leaving the RAM disk as the only boot device. I wanted to reboot the machine, and this was achieved by a small machine-language procedure which threw away the return address, causing the reboot.
In ARM, IP is R12 and PC is R15, so calling IP the "instruction pointer" is a bit misleading if you don't specify what machine you are targeting.
Truly, this is one of the things that happens in a computer
Funnily enough, for the longest time when I was beginning to learn OOP, "return" was the most difficult thing for me to wrap my head around, as previously while learning, "echo"ing was the "alternative" used instead of "return".
Turns out I really knew how return works
Although I had a general idea of how it worked, it was nice to learn a few extra details about assembly, thanks! Great video!
Guess a quick solution would be to add a small if statement every time a user inputs an index, to confirm it's within the length of the array. A bit tedious, but a safeguard nonetheless. Something like the sketch below.
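For example, this hypothetical helper (safe_get and the other names are made up):

    #include <stdio.h>
    #include <stddef.h>

    /* validate a user-supplied index before using it */
    int safe_get(const int *arr, size_t len, size_t i, int fallback) {
        if (i >= len)
            return fallback;     /* or report an error */
        return arr[i];
    }

    int main(void) {
        int data[4] = { 10, 20, 30, 40 };
        printf("%d\n", safe_get(data, 4, 2, -1));   /* 30 */
        printf("%d\n", safe_get(data, 4, 9, -1));   /* -1: out of bounds */
        return 0;
    }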
What if the function is declared as void in its declaration, but you actually return something under a different declaration at link time? Did I explain what I mean? Like, you define your void function in one object file, declare a non-void function with the same name in another, and then link them - how does it handle that?
The declared function will store the return in RAX but then the caller will just ignore it :)
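A two-file sketch of that situation: mismatched declarations like this are undefined behavior on paper, since C only checks types per translation unit and the linker matches by name alone, but on a typical x86-64 ABI it behaves exactly as described - RAX is set, then ignored.

    /* callee.c */
    int f(void) { return 42; }   /* really puts 42 in RAX */

    /* caller.c */
    void f(void);                /* lies about the signature */
    int main(void) {
        f();                     /* RAX is set, then never read */
        return 0;
    }

    /* cc callee.c caller.c && ./a.out */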
Noticed this when I tried to learn about game development: whenever I attached a debugger to the process and approached the ret instruction, the address where the function was called from was at the top of the stack.
Is there a subject or topic to study that goes more in depth on this lesson? There's learning assembly, but what about learning how the CPU works and runs a program - all the stuff covered in this video? What would be a good resource?
I think Chibiakumas is a good resource (look it up on youtube.) The CPU in general terms operates a lot like a BASIC interpreter from the 80s. Now, there are some more complex things going on in modern CPUs, but in general your assembly code is executed from the start to the finish, from the top of your source code document going down, one line at a time. Chibiakumas' videos are more in the context of retro game consoles than computers, which do have a few differences to be aware of. For example, on a Game Boy or Sega Genesis, your "main()" isn't going to return since there's nothing to return to. Trying to return from main on one of those systems is more than likely going to crash, as you're just taking whatever junk is on top of the stack and blindly GOTOing it.
the stack! This is why infinite recursion will give you a stack overflow error
That's right. RAM is finite, and every time you recurse you use more.
@@williamdrum9899 why have RAM if you don't use all of it lmao
@@sentjojo It's ok to use all of it, but not more than you have
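A minimal sketch of that failure mode, assuming an unoptimized -O0 build (an optimizer may turn the tail call into a loop and hide the overflow):

    #include <stdio.h>

    static unsigned long depth = 0;

    void recurse(void) {
        depth++;
        recurse();               /* no base case: frames pile up */
    }

    int main(void) {
        recurse();               /* eventually overflows the stack */
        printf("never reached (%lu)\n", depth);
        return 0;
    }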
But that's exactly how I expected it to be done... Why is that surprising?
0:22
In R, we don't have to explicitly call *return.* The function returns the value of the last expression it evaluated.
Best videos I've seen to answer the question, "How does my computer work?"
The ret instruction: it uses the address on the stack to set the rip register and jump back to the caller.
I watched because you asked if I was sure. 😊 I was but that's not a problem; this was nice and concise. Also nice to hear, at the end, something pentesters and CTF folks should appreciate.
Yes, I do know because I am a computer scientist, and not just a so-called programmer. I have taken low level classes like computer architecture and organization aka assembly.
as someone who has an on and off interest in implementing esolangs, creating control stacks in software is a really common bit of boilerplate. Even a language as simple (and devoid of conventional subroutines) as Brainf**k may need to know several potential jump points simultaneously in order to function properly. It's really interesting seeing these things I typically think of as snippets of C code being performed completely "on the metal" as it were.
Thanks for the video. I learned this back at uni but totally forgot about it lol. Always good to refresh one's memory on low-level stuff.
Great video, with very simple but very clear explanation! Even though I knew most of this, it clarified certain terms for me (stack frame, as the term for the fraction of the stack related to 1 function/subroutine scope).
Maybe I am going to learn ASM some day, at least the x86_64 version. At the moment I know a wide range of languages, varying between more functional to more imperative and from OOP to procedural, but I have never really done ASM. Although I have quite seriously played around with the IL assembly language from DotNet, which is certainly more abstract than real ASM, but a lot lower level than DotNet's C# language.
4:25 Not just CPUs have calling conventions! Even shell scripts have a convention of setting the `REPLY` shell variable to the return value of a function/subroutine. There's another convention of printing the return value to stdout, which requires the caller to invoke the function in a subshell and capture the printed value in an arbitrary variable.
The former is faster, but it has global side effects. The latter is more akin to the functional paradigm, but it's much slower because of the I/O and subshells.
TL;DR: the result is put in the accumulator register, and the return address is pushed onto the stack on a function call and popped off on function completion.
I figured it would save the return address at the call, and moving the stack-related pointers actually explains the lifetimes of stack variables quite well.
Yup! The way the stack frame is constructed is the reason local variable scope exists. Thanks for watching!
I really like the handshake between Arnold and a dude who's been pushing too many pencils as the image for an agreement!
When I started learning programming I saw functions without return and understood them with a level of simplicity, but when I started learning about functions with "return" I wondered: where does the return value store itself?
Yes I understood this already before watching the video.
Without watching the video, I’m guessing that the memory location of the function call is pushed onto the stack and when return is called the location is popped from the stack and the program jumps back to where it left off. I’m pretty sure that’s how it works in assembly, so in a compiled language it should be the same.
Update: HOLY SHIT I WAS RIGHT LET’S GOOOOOOOO
I actually guessed this is how it's done. Such a simple but brilliant way to handle it
It's unfortunately also the reason why buffer overrun attacks exist. If an array stored on the stack is indexed out of bounds, a hacker can overwrite the return address with the address of any function he wants and "return" to it, even if the program hasn't been there before.
Funny how $pc and $ip can both be confusing names for the same thing, considering those acronyms are also two major things relevant to computers