Let us keep peeling that onion...
Until we reach the electrons and the channel turns into a physics channel
@ravikumarmistry the ultimate goal
@ravikumarmistry 🤣🤣🤣 Indeed
merch!
Ahh yes, so we can cry/tear up even more. And I like to chop my onions instead, so some days I'll code a React component and then an hour later glue together some relay switches to make some gnarly XOR gates.
Everyone who works in software should take an 8085 microprocessor course and physically enter the programs in hex code using the training kits. Trust me, it unlocks something in the brain.
I do DSP assembly, and nothing can replace writing assembly and seeing the signal being generated on an oscilloscope.
@harrytsang1501 I really envy you guys, compared to working on Flutter app development.
I can also recommend completing the game SIC-1, where you write code using only one instruction (subleq). The second half of the challenges is mind-blowing because of the self-modifying code.
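For anyone wondering what a one-instruction machine even looks like, here is a minimal sketch in Python. It is my own toy interpreter, not how SIC-1 itself works: the memory layout, the `run_subleq` name, and the "negative program counter halts" convention are all assumptions made for illustration. Each instruction is three cells `a b c`: subtract `mem[a]` from `mem[b]`, then jump to `c` if the result is less than or equal to zero.

```python
def run_subleq(program, max_steps=10_000):
    """Run a subleq program on a copy of `program` and return the final memory.

    Assumed convention (not taken from SIC-1): a negative program counter halts.
    """
    mem = list(program)
    pc = 0
    for _ in range(max_steps):
        # Stop when pc is negative or a full 3-cell instruction no longer fits.
        if pc < 0 or pc + 2 >= len(mem):
            break
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]                      # the single operation: subtract
        pc = c if mem[b] <= 0 else pc + 3     # ...and branch if result <= 0
    return mem


# Hypothetical program: Y = Y + X, using a scratch cell Z.
# Cells 0..8 hold three instructions; cells 9, 10, 11 hold X, Y, Z.
X, Y, Z = 9, 10, 11
ADD_PROGRAM = [
    X, Z, 3,    # Z = Z - X        -> Z = -X
    Z, Y, 6,    # Y = Y - Z        -> Y = Y + X
    Z, Z, -1,   # Z = 0, then jump to -1 (halt)
    7, 5, 0,    # data: X = 7, Y = 5, Z = 0
]

result = run_subleq(ADD_PROGRAM)
print(result[Y])
```

Since code and data share one list, a program can subtract from its own operand cells, which is the self-modifying trick mentioned above.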
How do I get this course
Any recommendations?
I love the japanese accent at 06:21 !! very nice.
I used to code ARM assembler back in the 1990s. It was such a joy to code vs. x86. The conditional execution bits were just awesome: no need to jump over multiple bits of code.
"gcc is the same": no, by default gcc is a symbolic link to clang on macOS.
Can you use "-masm=intel" on x86_64 next time, 'cause AT&T syntax is quite ugly.
"-O3 can break things sometimes": no, it is not allowed to break things. If "-O3" breaks things, then it is either a compiler bug (which is unlikely) or you simply did something that is not allowed by ISO/IEC 14882.
I think he confused -O3 with -Ofast.
I'll be happy to see you talk about the instruction set architecture (ISA), then different microarchitectures, then the logical building blocks, MOSFETs then boom silicon wafers, etc as you dive deeper.
gcc on Mac is an alias for clang if you have the Xcode command-line tools installed. You need to install gcc separately with Homebrew and run it under its versioned name (e.g. gcc-14).
Building a NES emulator and debugging games made me appreciate Assembly and 80s game dev so much more!!
I did a bucketload of assembly in college 25 years ago, not sure about it being beautiful.
A course on assembly would be a game changer 😊
Man, you made me want to learn the language. I learned the very basics while being entertained.
x86 registers also inherit the registers of the previous sizes: AH/AL (8-bit) -> AX (16-bit) -> EAX (32-bit) -> RAX (64-bit).
It's just that GCC chose to do it another way, so you did not get the same code.
But you can move EAX if you want to move the lower half of the RAX register (or AX for the lowest 16 bits).
GCC probably did it differently because its way is faster.
There's a difference in the number of cycles required to move data, and GCC or any other optimizing compiler will try to choose the fastest path whenever possible. Microarchitecture-level optimizations are so integrated into the toolchain that they are hardly discussed or taught unless you're doing that work yourself. The fact that very few people work at the metal level limits exposure to this knowledge, and it shows on the software side as well.
P.S. The instruction-to-opcode mapping is usually hidden on CISC machines; different implementations of an architecture take different approaches to the same problem, varying in the number of clock ticks needed to perform the same operation.
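To make the register nesting concrete, here is a rough Python model of RAX and its overlapping views. The `Rax` class and its method names are made up for illustration. The behaviour it encodes is the documented x86-64 rule that a 32-bit write zero-extends into the full 64-bit register, while 8-bit and 16-bit writes leave the upper bits alone.

```python
MASK64 = (1 << 64) - 1


class Rax:
    """Toy model of RAX and its overlapping sub-registers (hypothetical helper)."""

    def __init__(self, value=0):
        self.rax = value & MASK64

    # Reads: each name is just a window onto the same 64 bits.
    @property
    def eax(self):
        return self.rax & 0xFFFFFFFF

    @property
    def ax(self):
        return self.rax & 0xFFFF

    @property
    def ah(self):
        return (self.rax >> 8) & 0xFF

    @property
    def al(self):
        return self.rax & 0xFF

    def write_eax(self, value):
        # A 32-bit write zero-extends into the full 64-bit register.
        self.rax = value & 0xFFFFFFFF

    def write_ax(self, value):
        # A 16-bit write leaves bits 16..63 untouched.
        self.rax = (self.rax & (MASK64 ^ 0xFFFF)) | (value & 0xFFFF)

    def write_al(self, value):
        # An 8-bit write leaves bits 8..63 untouched.
        self.rax = (self.rax & (MASK64 ^ 0xFF)) | (value & 0xFF)


reg = Rax(0x1122334455667788)
views = (reg.eax, reg.ax, reg.ah, reg.al)

reg.write_ax(0xABCD)
after_ax = reg.rax      # upper 48 bits preserved

reg.write_eax(1)
after_eax = reg.rax     # upper 32 bits cleared

print([hex(v) for v in views], hex(after_ax), hex(after_eax))
```

The zero-extension on 32-bit writes is one reason a compiler often prefers the 32-bit forms even when the variable is smaller.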
Fun watch! Would love to see a video on compiler optimizations, they can be very spooky at times.
Hi @HusseinNaser, I beg to disagree... ARM also has similar variants, like adds, adc, movs and several others layered on top of the main instructions such as mov. The difference between ARM and x86 is that ARM has no in-memory operations, unlike x86, so on ARM you have to bring the value from memory into a register with a load, operate on it, and then store it back. This helps keep the pipeline clear, since most instructions then take no more than 1 cycle, and that's why ARM is so efficient. A 6-stage pipeline in ARM is comparable to a 20-stage pipeline in x86.
Hey Hussein :)
Thanks for your amazing videos and courses.
Off-topic question: why did you set it up so that your videos can't be played in the background? I would like to listen to your spoken videos while doing something else (cycling, for instance), without having to keep the app open on my phone all the time.
Isn't that a YouTube problem? Do videos in general play in the background?
@cheebadigga4092 yes, videos from other channels play in the background
He has a podcast channel (Backend Engineering Show) on Spotify as well, you can try that.
@cheebadigga4092 it requires YouTube Premium for background play.
and why is your comment shown as 3 wk ago, if the video is published only 15 hr ago 😅
subscribed. this was awesome. i will have to check out the os course.
How about RISC V processors??
is this AT&T syntax? 🤔
Can you do a video explaining why ARM is better than x86 in terms of performance per watt, heat, etc.? (explaining from the code or in-depth hardware side)
I'm very fascinated by this, but the videos I saw all speak at a high level and don't show examples from the code or CPU behavior.
Every instruction means circuitry. A lot of that circuitry sits waiting, e.g. 10 different movs on x86 but only 1-2 on ARM. 10 circuits burn a lot of power.
@erkintek it's also because ARM has less complexity and can use smaller transistor nodes, which are inherently more energy efficient. x86 is always a couple of generations behind ARM in terms of fabrication process; Intel being stuck on 14nm for like 5 gens was memed to hell and back lol.
logic gates coming up next
That mov keyword, is it related to move semantics by any chance? Asking for a friend.
Nope. Just the same name. Btw, in assembly it actually copies the operand, but the name stuck.
Can someone elaborate on this strange alignment in the ARM version?
The compiler was probably aligning the stack for cache purposes.
"The ARMv7-M architecture guarantees that stack pointer values are at least 4-byte aligned." That comes from the reference manual as well
There's no strange alignment. The ARM64 compiler will try to allocate at least 64 bits per register (the stack is a virtual register for the compiler, and ARM64 uses 64-bit regs) and align the stack to 128 bits, as potential 128-bit loads and stores (ldp, stp) require at least 128-bit alignment. You can probably use "-fconserve-stack" on GCC; however, you should never expect that if you write "a = b + 1", the compiler will actually emit an add instruction, as the optimizer will perform various tricks.
@brice.rhodes Cache can explain some of it. I know the alignment may be 4, but the first or last long (64-bit) variable was at 8 and the next variable was at 20, so 16 bytes for an 8-byte variable seems strange.
@pikachulovesketchup666 That's what I'm asking about; potential 128-bit loads explain the alignment somewhat. Do those commands load or store multiple registers at once? I k
@AK-vx4dy 32-bit ARM has stm, ldm, stmia, ldmia, etc. with multiple registers (this can be used to implement e.g. a memset operation); AArch64 has only ldr/str with a single register and ldp/stp with a register pair, an optional offset and increment/decrement of the address. AFAIR 32-bit ARM still requires 64-bit alignment if you load and store multiple regs (I used it mostly in kernel-mode exception handler entry/exit code). The compiler will usually emit e.g. "stp x0, x1, [sp]" as a "push" of 2 registers; however, the stack must be aligned to 128 bits, otherwise an alignment exception may occur. The code in the video is unoptimized; otherwise the entire code would just be "mov x0, #something" and then "ret".
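The "16 bytes for an 8-byte variable" observation in this thread is what you would expect from rounding the frame up to a 16-byte boundary, which the AArch64 procedure call standard requires for the stack pointer. Here is a small Python sketch of that rounding only. The `align_up` helper is a hypothetical name, and the real frame layout a given compiler picks can differ from this.

```python
def align_up(size, alignment=16):
    """Round `size` up to the next multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding (alignment - 1) and masking off the low bits rounds upward.
    return (size + alignment - 1) & ~(alignment - 1)


# Assumed local-variable sizes in bytes, purely for illustration.
for locals_size in (8, 16, 20):
    print(locals_size, "->", align_up(locals_size))
```

So a single 8-byte local still costs a 16-byte adjustment of sp, and 20 bytes of locals cost 32.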
Some assembly and a smattering of philosophy :)
Where can one learn Assembly?
Donald E. Knuth. The art of computer programming.
Sample programs and exercises are in the fictional MIX and, later, MMIX assembly languages for ideological reasons. So this is a high-quality programming course with everything in assembly. Actually, a big part of Volume 1 is the math needed for performance estimation, e.g. generating functions. You can hardly find a better introduction to assembly programming.
MMIX is fictional, but nowadays that gives you the bonus of simulators with performance measurements. Each operation has a declared cost, in oops and mems, and the simulator can measure that.
Notes on the MIX to MMIX migration: MIX is quite an old machine. Each byte is 6 bits, and each word contains one sign bit and 5 bytes, 31 bits in total. Memory is 4000 MIX words for both program and data, which is roughly 15.5 KB of 8-bit bytes. MMIX is less esoteric: bytes are 8 bits, memory addressing is 64-bit, and registers are 64-bit. The migration from MIX to MMIX is happening slowly, and the most recent editions still have not incorporated all the changes. To work with MMIX, you need TAoCP itself, then Volume 1 Fascicle 1, MMIX - "A RISC computer for the new millennium". It is supposed to be incorporated into Volume 1, but hasn't been yet. Then you need Martin Ruckert's «The MMIX Supplement: Supplement to The Art of Computer Programming Volumes 1, 2, 3 by Donald E. Knuth». Fascicle 1 only contains the description of MMIX. The MMIX Supplement contains sample programs that are supposed to replace the MIX sample programs in Volumes 1-3. Volume 4A and the following ones are in MMIX from the beginning.
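The MIX memory size is easy to sanity-check with a few lines of Python. This assumes the binary flavour of MIX described above (6-bit bytes); the constant and function names are mine, chosen only for the example.

```python
BITS_PER_MIX_BYTE = 6      # binary MIX, as described in the comment above
MIX_BYTES_PER_WORD = 5
SIGN_BITS_PER_WORD = 1
MIX_MEMORY_WORDS = 4000


def mix_word_bits():
    """Bits in one MIX word: five 6-bit bytes plus a sign bit."""
    return MIX_BYTES_PER_WORD * BITS_PER_MIX_BYTE + SIGN_BITS_PER_WORD


def mix_memory_in_octets():
    """Total MIX memory expressed in ordinary 8-bit bytes."""
    total_bits = MIX_MEMORY_WORDS * mix_word_bits()
    return total_bits / 8


print(mix_word_bits(), "bits per word")
print(mix_memory_in_octets(), "eight-bit bytes in total")
```

That comes out to 31 bits per word and 15,500 eight-bit bytes for the whole machine, program and data together.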
Assembly Language for x86 Processors by Kip Irvine
Prison Break is unreal! Great video getting under the bonnet.
I learned Motorola 68k processor assembly language....
6:26 I thought I had the ability to translate into Japanese
x86 is built on decades of cruft. ARM was designed by Sophie Wilson and Steve Furber, with a design philosophy based on simplicity and efficiency.
Interesting
حيو
Real programmers represent asm into binary and read it that way.
ruclips.net/video/cOYK3nbpa2w/видео.htmlsi=r4vNzksLHyhNWbMX&t=619
You said you don't understand why the immediate is first moved to the register only to then be moved into the stack address.
x86-64 supports a 64-bit absolute addressing mode only for load/store operations. It's why loading the immediate into a register first is required.
Omg, there are two different syntaxes for x86 assembly. If you choose Intel syntax, you get just one mov keyword. Please, please do more research before making a video. I would love to watch you, but you make me angry for this reason :(
1 - compiler explorer
2 - meaningless video, you should compile with -O2 optimisations
a total waste of my time...
Apart from the fake accent he tries to use, the knowledge you share is amazing. I kindly request that you stop using the fake accent and focus on better communication, as it's extremely annoying.
What fake accent lmao. That's just how he talks. He has the typical accent of people of his ethnicity/nationality. Don't tell me you thought he's Indian.