The Magic Of ARM w/ Casey Muratori

  • Published: 23 Jan 2025

Comments • 448

  • @MattGodbolt
    @MattGodbolt 1 month ago +140

    I'm indeed very lucky to have been born with this awesome family name :) Thanks for the shout out! 😊

    • @JayDee-b5u
      @JayDee-b5u 1 month ago +6

      What gave you the inspiration to create this tool? And are you a one man team?

    • @MattGodbolt
      @MattGodbolt 1 month ago

      @JayDee-b5u it's a story I've talked about before, basically speed in trading software. I explain it in the Microarch Club podcast at some point. I'm luckily not a one-man team; there's a small group of volunteers who help.

    • @valentinrafael9201
      @valentinrafael9201 13 days ago +2

      Actual GOAT right here. Love you mate!

    • @user-m-lev
      @user-m-lev 10 days ago +2

      Thank you for that tool. In the modern world the distance between low level and high level grows fast, and what you did is just amazing.

  • @MrWalrus3451
    @MrWalrus3451 3 months ago +446

    Hour and a half with Casey? YES!

    • @Anteksanteri
      @Anteksanteri 3 months ago +1

      You sound like an anime girl and I'm all for it 👍

  • @hariangr
    @hariangr 3 months ago +475

    I love Casey: well spoken, knowledgeable, easy to follow even for a non-native English speaker (edit: I AM not a native English speaker, sorry for the confusion). Technical enough yet relatively easy to understand.

    • @pablomelana-dayton9221
      @pablomelana-dayton9221 3 months ago +21

      Smart yet humble, good combo and makes for good teachers

    • @mattmurphy7030
      @mattmurphy7030 3 months ago +6

      @@pablomelana-dayton9221 he's not very humble

    • @CaseyChesshir
      @CaseyChesshir 3 months ago +1

      yeah i like him for another reason too

    • @brod515
      @brod515 3 months ago +9

      @@mattmurphy7030 he actually is.

    • @pepesilvia4564
      @pepesilvia4564 3 months ago +4

      ??? hearing this guy has been the most infuriating experience this week. He just CAN'T get to the point holy.... he kept rambling; I'm at minute 21 of the video and he STILL hasn't gotten to the point he wanted to make at minute 3 of the video. He reminds me of the boomer engineers I work with who just ramble and complain and never get anything done.

  • @grimm_gen
    @grimm_gen 3 months ago +250

    I could listen to Casey talk for DAYS and not be bored

    • @RealGrandFail
      @RealGrandFail 3 months ago +3

      DA : Once you know the stuff, you will get bored. It's like a machine on repeat.

    • @grimm_gen
      @grimm_gen 3 months ago +16

      @@RealGrandFail I feel like once you know the stuff, the joy comes from teaching others!

    • @RealGrandFail
      @RealGrandFail 3 months ago +2

      @@grimm_gen totally agree 💯

    • @ravenecho2410
      @ravenecho2410 3 months ago

      mollyrocket is his youtube handle (his wife does a children's novel, I think, if I remember the lore correctly?), he has several amazing vids on there!

  • @yayinternets
    @yayinternets 3 months ago +173

    Casey is the best. He's forgotten more than I know. And I'm just a bit behind him, staring down 30 years as a Software Engineer.
    I am in awe of how verbally articulate he is over such a wide range of knowledge, in depth. Both wide and deep knowledge plus articulacy is a very rare gift and puts you at the top of the top in Engineering.
    I've had the good fortune of working directly with several "Distinguished Engineers" over my career, and Casey has all of the same qualities.
    Humble, incredibly articulate down to a very detailed level across a wide range of subjects, doesn't talk in absolutes, knows to mention some of the tradeoffs, and knows when he is getting into areas where he might lean on someone else for specific expertise.
    They are the best people to work with, and they know how to work with people at different levels without being patronizing or giving you imposter syndrome.
    Casey is definitely in that class of Engineering, and it's always a treat how well he and Prime work together despite coming from very different backgrounds.
    Well done as always, gentlemen! I learned so much from this video that I had to come back and edit my original comment to add much more.

  • @nickwilson7241
    @nickwilson7241 3 months ago +68

    Casey is my favorite of your guests. Always love when he's on

  • @Goras147
    @Goras147 2 months ago +7

    Wanted to put this on as background; turned out I can sit on my toilet for a whole 1.5 hrs just listening to this.
    Very informative! Thank you Primeagen and Casey!

    • @aviral2759
      @aviral2759 27 days ago +1

      hope your legs recovered and you're not wheelchair-bound

  • @namelessbeast4868
    @namelessbeast4868 3 months ago +36

    Casey is such a great guest! I always learn so much when I watch these videos

  • @getattrs
    @getattrs 3 months ago +30

    I remember back when I was in high school trying to get into game dev, I found Casey's GJK video. Reading the paper was way over my head with academic language and math symbols, but his walkthrough helped me implement it and EPA. It really helped me see that stuff that seemed untouchable (papers, cryptic code, abstract code) was understandable if you broke it down, took it step by step, and tried to visualize it.
    I wish I had more teachers like him back in school, or more material like his available back then. Kids these days are really lucky to have content like this available almost effortlessly.

    • @thesenamesaretaken
      @thesenamesaretaken 3 months ago +5

      It's both a blessing and a curse. Great learning materials are out there and readily available if you know where to look, but knowing where to look is the hard part, with low quality or outright hostile content often winning at SEO and pushing down the gems.

    • @Loanshark753
      @Loanshark753 2 months ago

      The issue of junk search results is only growing, hopefully soon we get hypergoogle.

  • @rdustinlane
    @rdustinlane 3 months ago +32

    As an embedded engineer, this was so great to listen to. It's hard to find good content in the embedded domain.

    • @mareksicinski3726
      @mareksicinski3726 3 days ago

      i thought that, especially recently, it was common in embedded to just write in assembly, and that assembly is specifically pretty common there; casey said 'only in specific domains of embedded'

  • @Angel-Fish
    @Angel-Fish 3 months ago +19

    That was FANTASTIC!!! Pretty nostalgic too. I was lucky enough to build my 286, 386, & 486 computers back in the day when they came out. If they'd kept that naming convention, I wonder if the latest computer would be a 10086 or 20086 by now.... I totally had an assembly course in college. It's good to know "nobody" writes that stuff nowadays. If you still do, then consider yourself nobody.

    • @r.k.vignesh7832
      @r.k.vignesh7832 3 months ago

      You'd be happy to know that there is a new Intel 285 chip coming out soon! The Core Ultra 9 285 has 24 cores and is among the highest tier of the upcoming Arrow Lake chips.

    • @Angel-Fish
      @Angel-Fish 3 months ago

      @@r.k.vignesh7832 Yikes! I almost thought they went backwards. 286 was short for the 80286 processor... looks like the 285 is short for 285K (285,000). Not sure if those numbers are a true apples to apples comparison but at least they are headed in the right direction. 😅

    • @r.k.vignesh7832
      @r.k.vignesh7832 3 months ago

      @@Angel-Fish The K is used to distinguish chips w/ unlocked multipliers from the standard ones. There will also be a 285 non-K. This would have been the Core i9 15900(K) with last year's naming scheme, but they changed it for some reason. Probably to confuse us even more.

  • @TerenceKearns
    @TerenceKearns 3 months ago +8

    I love the way Casey explains stuff. I learned so much just from his preamble.

  • @jonathanhoffstadt1366
    @jonathanhoffstadt1366 3 months ago +74

    It’s time to reboot the “Jeff and Casey” show with the new “Prime and Casey” show.

    • @jesusmgw
      @jesusmgw 2 months ago +2

      I would love to see Jeff interact with Prime too. And throw in Jon Blow there too.

  • @Bvngee
    @Bvngee 3 months ago +12

    Casey just seems like such a wonderful human being.

  • @AndrewTSq
    @AndrewTSq 3 months ago +69

    People seem to forget that both Intel and AMD already had RISC CPUs in the early '90s. One of Sega's most popular arcade games used the Intel i960 (Sega Rally, yeaaaahhhh)

    • @MartialBoniou
      @MartialBoniou 3 months ago +9

      True. I still have an i860 in my NeXTcube. At some point, Intel also made an ARM CPU: the XScale.

    • @AndrewTSq
      @AndrewTSq 3 months ago +4

      @@MartialBoniou oh, a NeXT cube :O I love that design. Whenever I drew a computer I made it look like a NeXT Cube :) . I had forgotten about the XScale actually lol :)

    • @ВладиславДараган-ш3ф
      @ВладиславДараган-ш3ф 3 months ago +3

      OMG, we only have the i9 today and there was already an i960 in the '90s

  • @ra_0403
    @ra_0403 3 months ago +10

    So the processors that fetch multiple instructions in one cycle are called superscalar. And they can either be in-order execution or out-of-order execution. When it is out of order, they undergo register renaming (using a map and a free list of physical registers) to resolve dependencies (other than true dependency), and get dispatched into a buffer (Register Update Unit) where they wait until their operands are ready. A group of instructions get picked from this RUU and is executed since all the dependencies are resolved. Then, there is an in-order commit for the instruction at the head of the RUU. So we get in-order dispatch, out-of-order exec and in-order commits
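The rename flow described in this comment can be sketched in a few lines. This is a toy model, not any real core's implementation: the register counts, naming scheme, and `Renamer` class are all invented for illustration.

```python
# Toy sketch of register renaming: architectural registers are mapped to
# physical registers via a rename map, and a free list supplies fresh
# physical registers, so false (write-after-write / write-after-read)
# dependencies disappear while true dependencies are preserved.
from collections import deque

class Renamer:
    def __init__(self, num_arch=4, num_phys=8):
        # initially, architectural register i lives in physical register i
        self.map = {f"r{i}": f"p{i}" for i in range(num_arch)}
        self.free = deque(f"p{i}" for i in range(num_arch, num_phys))

    def rename(self, dst, srcs):
        # sources read the current mapping (true dependencies preserved)
        renamed_srcs = [self.map[s] for s in srcs]
        # the destination gets a fresh physical register from the free list
        new_dst = self.free.popleft()
        self.map[dst] = new_dst
        return new_dst, renamed_srcs

r = Renamer()
# r0 = r1 + r2  -> the write goes to a fresh physical register
print(r.rename("r0", ["r1", "r2"]))   # ('p4', ['p1', 'p2'])
# r0 = r3 + r3  -> reuses the *name* r0 but gets another fresh register,
# so both writes to r0 can be in flight at once
print(r.rename("r0", ["r3", "r3"]))   # ('p5', ['p3', 'p3'])
```

A real core would also reclaim physical registers back onto the free list at in-order commit, which is omitted here.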

  • @hunterleeves131
    @hunterleeves131 3 months ago +5

    Casey’s performance aware programming course is so rad, this dude rules

  • @cryptonative
    @cryptonative 3 months ago +129

    Casey is better than wikipedia

    • @Deadshadows
      @Deadshadows 3 months ago +7

      No doubt

    • @grendel6o
      @grendel6o 3 months ago +4

      Most things are

    • @asdfghyter
      @asdfghyter 2 months ago

      @@grendel6o nah, wikipedia is way better than e.g. most social media, including youtube comments. wikipedia is also way better than many youtube videos, especially when it comes to stuff like accuracy

    • @grendel6o
      @grendel6o 2 months ago +1

      @@asdfghyter Get a degree in aerospace engineering and try to use Wikipedia for anything related.

    • @asdfghyter
      @asdfghyter 1 month ago

      @@grendel6o try to use social media for that :P
      anyways, i have found the wikipedia articles for the scientific topics i have studied helpful, though not very pedagogical. i have no idea about the situation for aerospace engineering, but in general it's an encyclopedia, not a textbook on advanced topics. if you're looking at wikipedia for course material for any graduate level courses, you're using it wrong

  • @MrHaggyy
    @MrHaggyy 3 months ago +4

    This was a great talk from Casey, especially off the top of his head. There is one thing I would like to add about "the ARM ISA": there is not just one, but a bunch of them. The most important ones are Cortex-A, -M, and -R. Their main difference is how you attack performance requirements from a (discrete) math point of view.
    Cortex-A is the general-compute approach. They are designed to run an OS and are used as CPUs in phones, mobile, or AI clusters. Their goal is pure compute power, even at the cost of determinism or safety, with things like branch prediction, chunkwise caching, etc.
    Cortex-R is for realtime applications like the ABS/ESP in a car, a flight controller, or the primary control of a power/production plant. They are designed to guarantee a computation within a certain timeframe, provide redundancy, private memory for certain things, etc.
    Cortex-M is for microcontrollers. In very broad terms they are a hybrid of R and A. They can map a few realtime features while still doing some general compute when necessary. They are a great choice for a car door with the window control and a few buttons.
    Intel used to have different sets with x8150, x82 etc., but the portfolio narrowed down to what is known as x86 today, while ARM diversified from the original ARMv1 / ARMv2 chip. They are also roughly the same age; they just grew in different industries.

  • @Summanis
    @Summanis 3 months ago +21

    Ian Cutress did an interview with Jim Keller and has a clip that would make a great supplement to this titled "Jim Keller: Arm vs x86 vs RISC-V - Does it Matter?".

  • @bentomo
    @bentomo 3 months ago +22

    One big extra power burn in x86-64 devices is that the platform is desktop and laptop with expandable RAM. You need more voltage to drive big RAM sticks that are further away, and ARM has always been on embedded with soldered-down RAM. Intel just demonstrated with Lunar Lake chips with soldered RAM on the laminate that saving the memory-controller voltage puts them a LOT closer to Apple Silicon in terms of performance per watt. You could bucket a big thing like RAM config in Casey's business explanation. REALLY good explanation from Casey!

    • @-_James_-
      @-_James_- 3 months ago +3

      ARM was developed as a desktop CPU though, and that's where it started. On the desktop.

    • @bentomo
      @bentomo 3 months ago

      @@-_James_- thanks for the correction. It wasn't until 1992 that the Apple Newton was a mobile device with an ARM CPU in it.

    • @lugaidster
      @lugaidster 3 months ago +1

      To be fair, mobile atom CPUs used in cellphones of the era were using embedded dram too.

    • @Loanshark753
      @Loanshark753 2 months ago

      First there are cores developed by ARM UK and GPUs developed by ARM Norway; then there are third-party designs by Qualcomm and Apple.

    • @-_James_-
      @-_James_- 2 months ago

      @@Loanshark753 Intel had some ARM designs for a while too after they acquired them from DEC.

  • @tengstrand
    @tengstrand 3 months ago +14

    This was a great one. I spent thousands of hours programming the 6502, M68000, and M68020 back in the ’80s and ’90s. It was a lot of fun, but nowadays I’m quite happy to be coding in higher-level languages, especially my favourite - Clojure. Still, I sometimes miss the days of programming in Assembly and C. There was something special about having complete control over everything running on the machine.

    • @kippie80
      @kippie80 3 months ago

      Yep, past few years, been filling in and expanding knowledge and capability in assembly, for fun

    • @iraniansuperhacker4382
      @iraniansuperhacker4382 3 months ago

      Assembly is still pretty fun, it's just a lot of instructions to keep track of. I've messed around with doing a basic X11 hello world and it was almost 1000 lines.

    • @toby9999
      @toby9999 3 months ago +1

      Same for me... 6502 and 68000. I still prefer lower-level coding. Most of my work is with legacy C code and C++.

  • @CaptTerrific
    @CaptTerrific 3 months ago +56

    I had no idea Godbolt was named after a Mr. Godbolt!!!! He just took the #1 spot on the "best surnames of all time" list from my friend Mr. Goldhammer

  • @UnidimensionalPropheticCatgirl
    @UnidimensionalPropheticCatgirl 3 months ago +63

    0:58 Prime being hilarious while ruffling a lot of feathers completely by accident.
    The RISC-V guys really don't like being called CISC, even though it essentially turns into one the moment you include any of the common high-perf extensions.

    • @cyuria
      @cyuria 3 months ago +6

      I think there's not really a solid boundary between RISC and CISC, but I reckon RISC-V at least does it well by splitting the entire ISA into extensions with individual purposes, as opposed to having extensions hacked on with new versions or whatever. I believe the beauty of RISC-V is that you can create tailored chips for a specific application. For example, you might slap on a bunch of vector and parallelisation extensions but leave out stuff like atomics to get a low-power, efficient GPU (ofc the technology isn't really developed to that point, but that's the theory anyway). So RISC-V is really good for specialised chips, as opposed to necessarily desktop CPUs, which are pretty much always going to devolve into CISC anyway at some point.

  • @Karn0010
    @Karn0010 2 months ago

    Finally got time to sit and watch this. I absolutely love these chats with Casey, I always learn so much. He is an amazing teacher and I'm glad there are people out there like him. I'm so glad Prime has him on and that Casey wants to be on as well. Can't wait for the next lesson.

  • @mike200017
    @mike200017 3 months ago +8

    Great talk! A good follow-up topic might be the memory model differences because (1) it's one of the major differences an actual programmer might hit when porting code from x86 to ARM, and (2) I would imagine it has power consumption implications since x86 chips are required to do more possibly useless work to keep caches coherent.

  • @JohanStrandOne
    @JohanStrandOne 3 months ago +3

    Casey has literally flipped my approach to web performance on its head. Love it!

  • @spidermancrawlingtheweb
    @spidermancrawlingtheweb 3 months ago +16

    "I can't believe we're doing all of this just to run JavaScript"
    lmao

  • @JackWse
    @JackWse 3 days ago

    That intro was the most thoroughly agreed-upon and understandable statement I've heard in a long time lol.. and I've only seen one appearance of this guy.. It is kind of amazing that the real bar for entry, when it comes to being able to distribute this kind of knowledge, is simply having somebody who also knows how to talk to people, or at least wants to try lol.. That's usually where it falls apart.

  • @kutto5017
    @kutto5017 3 months ago +7

    Used the BBC micro B at school.... It was the business.... The RISC based Archimedes was on the horizon and it was truly from another universe 😊. It was so far ahead it was indescribable in the late 80s... It was a jump from 8 bit to 32... That's pretty massive.... Price tag to match.....

    • @TheSulross
      @TheSulross 16 days ago

      I jumped on getting an MC68K Mac when it came out, thinking this was the leading edge in both software and choice of CPU (for the personal computer market). Little did I know what our cousins across the pond were cooking up (and, of course, Apple today has landed on using the CPU family that they created back then).

  • @voltflake
    @voltflake 10 days ago

    Casey and Prime are such a great content combo, i can't describe how much i love it

  • 20 days ago

    My favorite episodes literally the ones that include Casey! Thank you both!

  • @wmouse
    @wmouse 3 months ago +10

    I've been following Casey since he started Handmade Hero and I love the dynamic between you two.

  • @damirkekez4692
    @damirkekez4692 3 months ago +2

    Another Casey video, this is just what I needed to make my day.

  • @gurdeepgss
    @gurdeepgss 3 months ago +39

    i am 30 minutes in and i think i can listen to casey for 10 hours. 👍🏽

  • @MrWalrus3451
    @MrWalrus3451 3 months ago +6

    Casey is so powerful flip actually zoomed in when he said it.

  • @matthewoldham4804
    @matthewoldham4804 2 months ago

    The amount of preamble here was v precisely calibrated - I’ve never looked at assembly at all, but followed every point made, expertly done!!!

  • @JohnFrancisShade
    @JohnFrancisShade 3 months ago +3

    Love The Primeagen’s priorities on display! ❤

  • @tikabass
    @tikabass 2 months ago +1

    I didn't think much about ARM until I had to program data transfer using DMA. The ARM DMA subsystem is a marvel to behold, a fine piece of art.

  • @TurtleKwitty
    @TurtleKwitty 3 months ago +10

    About the ARM chip using zero power: if memory serves, the anecdote is that the input power of the clock signal for the display was enough to power the rest of the chip.

    • @ControversialOpinion
      @ControversialOpinion 3 months ago

      That's how I remember it. Or was it current on the data pins? Something like that. Not electric fields though, never heard of that. And doesn't really make sense, either. :D

    • @TurtleKwitty
      @TurtleKwitty 3 months ago

      @@ControversialOpinion input signals in general most likely yeah, might have a variation of which input depending on where you heard it from haha

    • @-_James_-
      @-_James_- 3 months ago +3

      It was voltage leakage from the support chips that provided enough power for the first ARM samples to run without any dedicated power supply of their own.

  • @robertlawson4295
    @robertlawson4295 3 months ago +9

    Power consumption is a byproduct of the electronics design (transistor architecture) and NOT ANY firmware or software characteristics. That's why the first ARM chip just happened to be able to operate using stray electric currents from peripheral components on the PCB. That wasn't on purpose but something that was discovered by accident. Well, that sort of discovery now becomes a desired "feature" to pursue on purpose and here we are.

    • @SimonAyers
      @SimonAyers 3 months ago +5

      That is true. However, energy = power × time, so if a process takes longer to execute it can consume more energy even at a lower power draw. For a particular application, a lower-power device is therefore not guaranteed to be more energy-efficient.
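The energy-versus-power point above can be made concrete with trivial arithmetic (the wattages and durations below are invented purely for illustration):

```python
# Energy = power x time: a lower-power chip that takes longer on the same
# task can still burn more energy overall.
def energy_joules(power_watts, seconds):
    return power_watts * seconds

fast_chip = energy_joules(10, 2)   # 10 W for 2 s -> 20 J
slow_chip = energy_joules(4, 6)    #  4 W for 6 s -> 24 J
print(fast_chip, slow_chip)        # 20 24: the "low power" chip used more energy
```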

    • @Microphunktv-jb3kj
      @Microphunktv-jb3kj 28 days ago

      @@SimonAyers i heard somewhere that sending a message on facebook is like having a 40-60W lightbulb turned on for 3 hours.
      claiming that languages and the code we write have no effect is complete bs....
      what consumes more power, a loop which runs in C or a loop which runs in python :DD
      "not any firmware or software"
      myths like this are probably why software sucks these days, everything getting slower and slower...

    • @mareksicinski3726
      @mareksicinski3726 3 days ago

      well, software can be written inefficiently

  • @ARKSYN
    @ARKSYN 3 months ago +9

    You guys really need to just start a podcast. The chemistry is great: Casey is a black hole of knowledge and Prime keeps the mood lighthearted and fun.

    • @mareksicinski3726
      @mareksicinski3726 3 days ago

      idk, i'd rather he elaborate on the embedded-systems use of assembly; it seems to be less common now but is still pretty common

  • @user-mikesmith
    @user-mikesmith 3 months ago +13

    For variable-length instruction decoding on Intel, the CPU doesn't necessarily need to decode what the compiler generated; it can theoretically decode something else.
    The CPU executes what is in the instruction cache, and the move from memory to the instruction cache is slow. In theory you could remove variable-length instructions on the fetch into the instruction cache and give the CPU fixed-length microcode instructions.

    • @UwU-f2a
      @UwU-f2a 3 months ago +2

      That has cons. Intel CPUs are designed to execute legacy x86 instructions, and these are inherently variable-length. Converting instructions into fixed-length microcode would require a significant architecture overhaul, impacting compatibility with existing software and instructions. Intel CPUs already have optimizations like the micro-op cache. This cache holds decoded uops for reuse, reducing the need to repeatedly decode instructions from memory, which already achieves a similar goal of reducing decoding overhead by reusing pre-decoded instructions.

    • @KF1847VM2
      @KF1847VM2 3 months ago +2

      > For the variable length instruction decoding on Intel, the CPU doesn’t necessarily need to decode what the compiler generated, it can theoretically decode something else.
      No. The incoming instruction stream, regardless of whether it is variable or fixed length, has to be decoded as is.
      > The CPU executes what is in instruction cache and the move from memory to instruction cache is slow.
      As slow as the memory system can operate at, provided that software does not interfere by making things worse - which sadly is a common case. Without reuse caching is not faster than directly running off memory.
      > In theory you could remove variable length instructions on the fetch to instruction cache and give the CPU fix length microcode instructions.
      In practice this is what various platforms did and continue to do in various forms for several decades. What gets fed into the core from the instruction stream perspective is very different to what is actually being acted upon internally.
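The decoding tradeoff this thread is arguing about can be illustrated with a toy model. The opcode-to-length table here is entirely invented (real x86 length decoding involves prefixes, ModRM, SIB, and immediates and is far more involved); the point is only that variable length serializes boundary-finding while fixed width does not.

```python
# Hypothetical toy ISA: the length of each instruction depends on its first
# byte, so you cannot know where instruction N+1 starts until you have
# looked at instruction N. That data dependence is what makes wide parallel
# decode of a variable-length stream hard.
LENGTHS = {0x01: 1, 0x02: 2, 0x04: 4}  # made-up opcode -> length table

def boundaries_variable(code):
    """Walk the byte stream sequentially, returning instruction start offsets."""
    offsets, i = [], 0
    while i < len(code):
        offsets.append(i)
        i += LENGTHS[code[i]]
    return offsets

def boundaries_fixed(code, width=4):
    """With fixed-width instructions the offsets are known up front,
    so many decoders can start in parallel."""
    return list(range(0, len(code), width))

var_code = bytes([0x02, 0x00, 0x01, 0x04, 0x00, 0x00, 0x00])
print(boundaries_variable(var_code))   # [0, 2, 3]
print(boundaries_fixed(bytes(12)))     # [0, 4, 8]
```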

  • @michaelk__
    @michaelk__ 3 months ago +1

    As someone who did some ARM assembly writing for learning and such, this was really cool to listen to.

  • @olivierdulac
    @olivierdulac 3 months ago +1

    Thank you Casey, it's always a treat to learn from you.

  • @yourposer
    @yourposer 3 months ago +4

    i 💜 Casey Muratori's deep dives

  • @Whatthetrash
    @Whatthetrash 3 months ago +1

    Thank you for going slowly to make sure that you don't leave anyone behind, Casey! Thank you!

  • @ilu1994
    @ilu1994 3 months ago +2

    Love to see Casey, please come on more often!

  • @Kniffel101
    @Kniffel101 2 months ago +1

    56:00 There's a great 3-part video interview with Sophie Wilson on the channel "Charbax".
    If I remember correctly, she talks about the low-power ARM stuff in one of those.

  • @Ahsan_Fazal
    @Ahsan_Fazal 3 months ago +7

    ANOTHER CASEY VIDEO!!! ❤🎉

  • @benb3928
    @benb3928 3 months ago

    Thank you for introducing the Godbolt compiler explorer for those of us who didn't know it. Having done some x86, PIC, and other chip assembly programming in school long, long ago (that I hardly remember), this is a great primer for demystifying low-level instructions. There is a small hang-up I'd love to get his take on for clarity. I seem to recall that x86 had a much, much larger instruction set, with machine instructions that would take 10-20 cycles to execute, while the more basic (Motorola etc.) chips did not; the more basic chips used, AFAIR, only the accumulator to perform operations (with few exceptions), while x86 allowed a subset of instructions to perform operations entirely within CPU registers without touching the accumulator value. Even ops like addition to direct memory locations were possible (beyond the CPU registers), whereas basic chips would have to move those values from memory to registers, perform the add op, and the result would have to be moved back from the accumulator to the original mem location.
    All this to say, the idle power draw of the extra transistors that x86 needs to perform ops on so many working registers was significantly higher, and as a result the x86 arch was not as power efficient over the long periods where it doesn't use those extra functions. Is that still the case, or is the ARM arch now as "bloated" as x86, with a similar transistor count in the same ballpark order of magnitude?

  • @JOHNSMITH-ve3rq
    @JOHNSMITH-ve3rq 3 months ago +2

    Lmao prime bailing to deal w the kid is brilliant. Love it

  • @DaveSmith-s6e
    @DaveSmith-s6e 21 days ago

    You cannot understand assembly without understanding the Von Neumann architecture, and most programmers don't go into the detail of how it works. That's the big benefit that learning assembly gives you: a fundamental understanding of how the processor is actually processing. Once that clicks, working with data in registers at an address level through pointers becomes the most natural feeling in the world.

  • @GTXDash
    @GTXDash 2 months ago

    Man. This guy is so good at explaining things that even someone such as myself, who doesn't code, can understand them.

  • @eliasepg
    @eliasepg 3 months ago

    I loved this talk, I learnt a ton, and it helps me understand everything so much better.

  • @backendtower6580
    @backendtower6580 1 month ago

    Casey my man, still looking great bro. This reminds me to go back to Handmade Hero. Thanks Prime for this video man; you made this possible.

  • @asdfghyter
    @asdfghyter 2 months ago

    this is absolutely fantastic! very informative!

  • @myentertainment55
    @myentertainment55 3 months ago +17

    Low level programming but in simple language.
    What a treat!
    ❤❤❤

    • @dogman_2748
      @dogman_2748 3 months ago +1

      I wouldn't say LLL talks in an overly complicated way

  • @seanvandermolen7287
    @seanvandermolen7287 1 month ago

    With power, he touches on the fact that there isn't much of a difference between the ISAs, which I think is true. To expand on that: ARM was built around an SoC design with just the requirements for the device, while x86 was designed for a generalized desktop PC, so the chips also included PCI connections and buses for expansion. Each PCI connection requires die space and consumes energy.

  • @ravenecho2410
    @ravenecho2410 3 months ago

    As someone from a data science/machine learning background, I always have no idea where Casey is going, but I always love to come along on the adventure and I always learn something new -- pulling up the web tool and following and playing along really helps with this video!
    Casey's channel is "molly rocket" btw, it always escapes my brain and then I remember -- in case you are looking for it u.u

  • @liquidpebbles
    @liquidpebbles 2 months ago

    Casey is the GOAT. I can't get enough

  • @TheBadFred
    @TheBadFred 3 months ago +5

    In the early 80s, if you had a Commodore 20/64 8-bit with a MOS6510 and your programs had to run full speed, there was nothing but assembler.

    • @michaelday341
      @michaelday341 3 months ago

      I bought "Creating Arcade Games on the Commodore 64," and I think I also bought a machine language book, too. Sadly, I didn't get very far with either book. But, I remember the excitement I had finding out that books like that existed, because I really wanted to program games. Too bad I didn't have the skills that others did.

    • @TheBadFred
      @TheBadFred 3 months ago +1

      @@michaelday341 Basic was better than nothing.

    • @eavdmeer
      @eavdmeer 3 months ago

      Exactly this got me into assembler on the C64. Pure performance poverty 😂 Not even a compiler. Just writing code directly in my Power Cartridge monitor.

  • @CjqNslXUcM
    @CjqNslXUcM 3 months ago +9

    x86 is like utf8 and ARM is like utf16
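The quip holds loosely: UTF-8 and x86 are both variable-width encodings, while AArch64's fixed 4-byte instructions are actually closer to UTF-32 (UTF-16 itself is variable-width because of surrogate pairs). A quick check of the encoding widths in Python:

```python
# UTF-8 code points take 1-4 bytes depending on the value, loosely like
# x86's 1-15 byte instructions; UTF-32 is always 4 bytes per code point,
# like AArch64's fixed 4-byte instructions.
for ch in ["a", "é", "€", "𐍈"]:
    print(ch, len(ch.encode("utf-8")), len(ch.encode("utf-32-le")))
# a 1 4
# é 2 4
# € 3 4
# 𐍈 4 4
```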

  • @xenmax
    @xenmax 2 months ago

    About the ARM no-power anecdote: there is an interview with one of the engineers who worked on the first ARM chip at Acorn (ARM used to be Acorn RISC Machine) in which he explains that when they first tested the first prototype of the chip, they measured 0 mA of current going into the power rails. They soon realized the power rails were disconnected, but the chip was working anyway because current was flowing in through other pins in the package. It doesn't mean the chip used virtually no power, only that it used so little that the input signals and capacitors alone were enough for it to work, without the power rail connected to anything.

  • @myca9322
    @myca9322 3 months ago

    i'm sure this won't get seen, but around 1:17:39, when discussing instructions getting added because they are used commonly, etc.: is it possible an adaptive ISA could solve these issues? i guess this is something like the instruction cache that was mentioned, but i'm imagining something that would keep commonly-cached instructions around and somehow build them into the instruction set as it's being used... is this possible? impractical? am i misunderstanding something?

  • @marble_wraith
    @marble_wraith 3 months ago +5

    Godbolt sounds like a man who is blazingly fast!

  • @maddada
    @maddada 3 months ago +6

    Casey is right that it's not the ISA that mostly affects efficiency. Intel Lunar Lake is an example of how x86 can match or even beat ARM in terms of low power - while keeping backwards compatibility.
    Intel and AMD just needed to prioritize low power, and Apple + Qualcomm finally gave them a real reason to.
    Lunar Lake has similar performance, heat, and battery runtime numbers to the M3 and Snapdragon. See Just Josh's Lunar Lake video for more about this.
    However, ARM is better positioned, since it's more open and more competition is happening there to get the best performance per watt.

  • @KvapuJanjalia
    @KvapuJanjalia 3 months ago +14

    1:17:08 I remember when Intel invented new instructions specifically for XML parsing. I would not be surprised if we see JSON parsing instructions in the next i9 or something.
    EDIT: I exaggerated quite a bit: SSE4.2 text processing instructions are general purpose, not intended for XML processing only.

    • @poteitogamerbr2927
      @poteitogamerbr2927 3 months ago

      Seriously? I tried to google it to find which instructions do this, but found nothing. Do you have sources?

    • @KvapuJanjalia
      @KvapuJanjalia 3 months ago +7

      @@poteitogamerbr2927 SSE4.2 text processing instructions: PCMPESTRI, PCMPESTRM, PCMPISTRI and PCMPISTRM. I guess when they were introduced, XML was the new hotness, and these were marketed accordingly. Looks like they actually are general purpose and can be used for JSON processing too.

    • @mss664
      @mss664 3 months ago +3

      @@KvapuJanjalia Those are really just for string searching. You can use them to implement for example strpbrk. And they have a variant for null-terminated strings.
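For readers curious what those instructions compute: below is a plain-Python sketch of the strpbrk-style search mentioned above. The function name and the scalar loop are illustrative; the actual PCMPISTRI instruction performs this "first byte matching any of a set" comparison against 16 bytes of input per step.

```python
def strpbrk_index(haystack: bytes, accept: bytes) -> int:
    """Return the index of the first byte of haystack that appears in
    accept, or -1 if none does - the scalar equivalent of what the
    SSE4.2 string instructions compute 16 bytes at a time."""
    accept_set = set(accept)
    for i, b in enumerate(haystack):
        if b in accept_set:
            return i
    return -1

# A JSON or XML tokenizer might use this to skip to the next structural byte.
print(strpbrk_index(b'  "key": 42', b'{}[]":,'))  # index 2, the quote
```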

    • @poteitogamerbr2927
      @poteitogamerbr2927 3 months ago

      @@KvapuJanjalia thanks, it seems very cool. I wonder whether compilers like gcc actually optimize, say, C code into those instructions, since they are very specific, or whether you must call them directly.

    • @morosis82
      @morosis82 3 months ago +1

      ​@@poteitogamerbr2927 that might depend on a couple of things.
      As far as I understand, if it's a fairly widely supported instruction then your compiled binary may contain it with a fallback for a chip that doesn't support it.
      If it's quite specific you might need to let the compiler know through flags to include it.

  • @benoitlaine1131
    @benoitlaine1131 1 month ago

    How great can a podcast be! Thx

  • @SergioStankevich-ef2mf
    @SergioStankevich-ef2mf 2 months ago

    I adore the Casey streams and the rabbitholes²

  • @renecouture3719
    @renecouture3719 3 months ago

    Lot of knowledge and history here! Sounds like ARM instructions are a better design, I'll keep it in mind

  • @przemekkobel4874
    @przemekkobel4874 2 months ago

    56:00 A guy in a documentary I saw said they forgot to connect the Vcc rail, but the first Acorn RISC Machine chip was able to run on current passing through the pull-up resistors (the components that stabilize bus state).

  • @jasonchen-alienroid
    @jasonchen-alienroid 3 months ago +3

    Ex-system architect here. Instruction sets are not the issue; it's how the chip is architected. As an ex-BIOS engineer who worked on APM and ACPI and later specialized in power management on ARM devices, the difference in how the two architectures approach design is night and day.
    One example of why instructions don't matter: when I was a BIOS engineer, I worked in x86 asm. When I worked on ARM, I mostly used C/C++. Only rarely did I have to use JTAG and debug in asm, and that's almost never the issue.
    On the power implementation approach, x86 treats it almost as an afterthought. The ARM platforms I worked on literally think of every possible way to improve power in every iteration.

    • @Freshbott2
      @Freshbott2 2 months ago

      Great to come across someone who’s really familiar with it. For Intel - WHY is it an afterthought? Don’t they have as much to gain from the same?
      But by the original notion - isn’t it expensive to run all this fancy decode outside the core when modern compilers just aren’t using the breadth of x86? Surely that’s a whole bunch of transistors ARM just doesn’t need to contend with?

    • @jasonchen-alienroid
      @jasonchen-alienroid 2 months ago

      @@Freshbott2 I didn't work for Intel, but I suspect it's purely politics. They had an ARM license back in the days when they did the PXA270, and they knew how it worked. The fact that they sold it off and didn't apply much of it to their own architecture (at least from an external POV) suggests they just didn't care for it enough. I'd assume they were making so much money on the server side that they didn't care about the ARM threat.
      On fancy decode, it's not that expensive to run outside (just think about how mobile works). It's also not that complex to add these to compilers (maybe back in the day, if they had added that to gcc). Or the ISA can have prefetch hints to indicate that a certain kind of workload needs to be offloaded to the correct component/core.
      Again, just think about how mobile works. It has all the features of a PC in a SoC.

    • @lyingcat9022
      @lyingcat9022 11 days ago +1

      But can the layout of the individual instructions themselves be tailored to an optimization in the CPU architecture itself? For instance, I was looking at the layout of some of the RISC-V instructions, and the size and bit layout of the operands, destinations, and instruction fields seemed kind of random in certain instructions. I can't think of one off the top of my head, but I remember reading that the choice of bit layout was somehow beneficial for the physical configuration of the CPU components. With this in mind, are the instruction set and the architecture in a sense coupled in terms of performance or power efficiency?

    • @jasonchen-alienroid
      @jasonchen-alienroid 11 days ago +1

      @@lyingcat9022 There are lots of factors in layout. If you compete in Asia, like MediaTek, you might use that as a competitive advantage; the trade-off is the hours of hard work from your engineers. I'd say these days we are fairly modularized, in the sense that the subsystems are large enough and want less interference from one another, so layout's effect on performance probably isn't the top priority compared to other factors like thermals... but that's just my observation.

  • @skilz8098
    @skilz8098 1 month ago

    Very nice presentation with an excellent guest appearance. Yet as much as you guys did cover within this, it's still only the tip of the iceberg.

  • @burkskurk82
    @burkskurk82 3 months ago +4

    I can’t shake the feeling that this discussion becomes second guessing after some 40 mins. It’d be good to invite Jim Keller on the show.

  • @chaitanyakumar3809
    @chaitanyakumar3809 3 months ago

    If you get Casey on again for a similar topic, I think reading through and discussing David Chisnall's article "There's No Such Thing as a General-Purpose Processor: And the belief in such a device is harmful" would be interesting -- he goes into things like the energy impact of complex decoding machinery.

  • @avidessauer154
    @avidessauer154 3 months ago +3

    You should have someone on to talk about the difference in memory models (x86 strong, arm/riscv weak).
    Also worth touching on how the C11 memory model's adoption has made far more software compatible with weak memory models.

  • @BenniK88
    @BenniK88 3 months ago +1

    Love it, the content we need. Thx ❤

  • @skeleton_craftGaming
    @skeleton_craftGaming 3 months ago +1

    Fun fact: the A in ARM originally stood for Acorn, the makers of the BBC Micro... The first ARM chips were literally Acorn asking how they could make a sequel to the BBC Micro [or one of its successors; I'm not British, or a computer historian for that matter] 😊

  • @_vdm_
    @_vdm_ 3 months ago +1

    Love these videos with Casey

  • @mlv60
    @mlv60 2 months ago

    i cant get enough of casey talking about computers ❤

  • @steffennilsen2132
    @steffennilsen2132 3 months ago +1

    I learn so much from this, quality content

  • @FunwithBlender
    @FunwithBlender 3 months ago +1

    "I only look at it occasionally" lol after that knowledge bomb

  • @CoderDBF
    @CoderDBF 3 months ago +1

    Thank you, Casey.

  • @rev0lu7ion
    @rev0lu7ion 3 months ago

    i love hearing casey talk about anything

  • @jordanjackson6151
    @jordanjackson6151 2 months ago

    SO glad this is finally up. ARM is on my to 'RUN' list. It's apparently effective at reading Malware. I've been spoiled by Lua, Python, JavaScript and so on.

  • @FunwithBlender
    @FunwithBlender 3 months ago +1

    love Casey! he has a big brain

  • @ehh54
    @ehh54 3 months ago +12

    Love the opening 😂

  • @nick4uBB
    @nick4uBB 3 months ago

    That was great. I learned a lot - thank you!

  • @PavelAslanov
    @PavelAslanov 3 months ago +1

    Maybe I don't understand all the details, but I think the memory model is way more important in the limit. x86 is way more restrictive in how it can reorder memory accesses (atomic operations always behave as memory_order_seq_cst); in spirit it is very similar to the GIL in Python. ARM is free to do much more reordering, and given how slow memory access is, I can see how this difference could bring a substantial edge in performance.

  • @dukemo6551
    @dukemo6551 5 days ago

    My favorite comment was "I think you taught me something and I didn't come here to learn" lololol😂

  • @kippie80
    @kippie80 3 months ago

    It is lower power because it has a lower transistor count AND fewer switching transitions per productive computation - initially. Then, yes, the trend to lower voltages and the physical layout of transistors. Still, those initial design constructs count. Also, switching to Thumb mode is a way to power down extra circuitry in the chip. Power is burned when a transistor transitions.
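That last point is the standard first-order model of dynamic power, P = alpha * C * V^2 * f. The values below are illustrative, not measurements of any real chip, but they show why lowering voltage pays off quadratically:

```python
# First-order dynamic power model: P = alpha * C * V^2 * f, where
#   alpha = activity factor (fraction of transistors switching per cycle)
#   C     = switched capacitance, V = supply voltage, f = clock frequency.
def dynamic_power(alpha, capacitance, voltage, frequency):
    return alpha * capacitance * voltage ** 2 * frequency

baseline    = dynamic_power(0.1, 1e-9, 1.2, 3e9)  # illustrative values
undervolted = dynamic_power(0.1, 1e-9, 0.9, 3e9)  # same chip, lower voltage

# Voltage enters squared, so dropping 1.2 V -> 0.9 V cuts dynamic power
# to (0.9/1.2)^2 = 56.25% of baseline before touching the frequency.
print(f"{(1 - undervolted / baseline) * 100:.2f}% saved")  # 43.75% saved
```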

  • @muayyadalsadi
    @muayyadalsadi 3 months ago

    54:23 At minute 54, orthogonal memory access still hasn't been mentioned. 1:01:09 Besides the decoders, the orthogonal memory addressing modes are why x86 needs more transistors to implement.

  • @benoitlaine1131
    @benoitlaine1131 1 month ago

    Doesn't the CPU now do part of what a compiler did in yesteryear, when it converts complex "assembly" instructions into µops?

  • @nexovec
    @nexovec 3 months ago +1

    Legendary video with a mandatory algorithm boosting comment from me.

  • @garethlagerwall
    @garethlagerwall 2 months ago

    Love these discussions

  • @martinrodriguez1329
    @martinrodriguez1329 2 months ago +1

    What I take from this is that x86 comes from a very old place where instructions didn't take more than just 2 bytes, but as time went by, the need for bigger instructions led to a solution designed for backwards compatibility, which made instructions take more clock cycles just to figure out what you're trying to do. ARM, on the other hand, decided (probably due to experience) to keep a fixed size for instructions, chosen to be large enough, thus making them all take the same time to decode - which I would assume is 1 clock cycle.
    The other thing I take from this is that there's not a big necessity for better CPUs, and the companies are relying on programmers wasting resources so that we need better products due to that inefficiency and the marketing can keep going, which is... concerning.
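The decode difference described above can be sketched with two toy instruction streams (the encodings are invented for illustration, not real x86 or ARM): with a fixed width, instruction boundaries are known up front, while variable-length boundaries only emerge as you decode.

```python
# Fixed 4-byte words (ARM-style): instruction N starts at byte 4*N, so a
# wide decoder can slice out many instructions without inspecting any of them.
def fixed_boundaries(stream: bytes, width: int = 4):
    return list(range(0, len(stream), width))

# Variable length (x86-style): an instruction's length is only known after
# (partially) decoding it, so boundaries must be discovered one by one.
# Here we pretend the first byte of each instruction encodes its length.
def variable_boundaries(stream: bytes):
    offsets, i = [], 0
    while i < len(stream):
        offsets.append(i)
        i += stream[i]
    return offsets

print(fixed_boundaries(bytes(12)))                           # [0, 4, 8]
print(variable_boundaries(bytes([2, 0, 5, 0, 0, 0, 0, 1])))  # [0, 2, 7]
```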

  • @SETHthegodofchaos
    @SETHthegodofchaos 3 months ago

    Great stuff! More Casey please :)

  • @skilz8098
    @skilz8098 1 month ago

    From my own understanding, the original ARM processor design was motivated by the goal of having it engineered by a small team of talented people as opposed to a large one. This was one of the major pushes for the RISC design, compared to Intel's CISC design of the time. Once they reached that stage of development, the project leader pushed the engineers to reduce the heat output of every individual part of the underlying circuitry and logic. He didn't want any additional cooling components; he wanted it manufactured on small, cheap, simple plastic substrates without any kind of heat sink, at a cost of about $0.04-0.60 per chip as opposed to $20.00 per chip. That was also a huge influence on the original design. The engineers then had to go and measure the voltages, amps, and watts for every single path and connected component within the chip. This was a huge task and ended up being an engineering feat in its own right. That is what I know about the history of ARM from the BBC days; AFAIK the BBC Micro originally used the 6502, as opposed to the Motorola, the Z80, or the 8086 of the early 80s.

  • @Nightwulf1269
    @Nightwulf1269 3 months ago

    Well... x86 has around 1600 instructions, ARM around 150, and RISC-V (GC) around 40... but that's not the sole deciding factor. On RISC-V the instructions are no longer human-readable (if that's even possible) in their hexadecimal form; they are optimized so that the instruction decoding logic can be as simple as it can get. So if we compare those, let's compare comparable things.
    But other than that detail, fantastic video and great knowledge shared by Casey! Thank you very much!
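To make the "optimized for the decoder, not for humans" point concrete, here is a small Python sketch that pulls the fields out of a 32-bit RISC-V R-type instruction. The bit positions follow the published base encoding; keeping the register fields at the same positions across formats is part of what keeps the decode logic simple.

```python
# Field layout of a RISC-V R-type instruction (from the base ISA spec):
#   opcode = bits 6:0, rd = 11:7, funct3 = 14:12, rs1 = 19:15,
#   rs2 = 24:20, funct7 = 31:25
def decode_rtype(word: int) -> dict:
    return {
        "opcode": word & 0x7F,
        "rd":     (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x07,
        "rs1":    (word >> 15) & 0x1F,
        "rs2":    (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
    }

# 0x002081B3 encodes "add x3, x1, x2": meaningless in hex to a human,
# but each field falls out with one shift and one mask.
print(decode_rtype(0x002081B3))
```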