The Truth About The Fast Inverse Square on N64 | Prime Reacts

  • Published: 26 Nov 2024

Comments • 193

  • @capsey_
    @capsey_ 11 months ago +645

    The fact that the N64 is not bounded by CPU is so funny to me. Instead of punching down on Sega because 64 > 32, their marketing team could've just said "our CPU is so fast it's fucking useless"

    • @0-Kirby-0
      @0-Kirby-0 11 months ago +108

      It also creates such a deeply fascinating problem space, where you not only optimise for binary size over cycle count, which is already unusual, but you're specifically worried about fitting as much computation as possible into a single cache-load, so you don't have to go back to ram and disturb the renderer.
      It feels oddly multithread-y, where you bulk-copy what you want to work on into your own little bucket, so you don't have to acquire-release as much, except it happens with the instructions themselves, not just the memory being worked on.
      Absolutely nuts.

    • @Takyodor2
      @Takyodor2 11 months ago +11

      @@0-Kirby-0 Is it really unusual? Copying data from memory is a couple of orders of magnitude more expensive than your average floating-point calculation (with respect to energy and latency), so I'd expect this type of optimization to be sort of common?

    • @Trisslotten96
      @Trisslotten96 11 months ago +10

      The same kind of applies to modern CPUs as well.

    • @samphunter
      @samphunter 11 months ago +18

      @@Takyodor2 it is unusual. Your CPU will usually predict what instructions it will need ahead of time and load them long before the loading slows anything down. That is why branchless is a more common optimization nowadays (avoid guessing what will run next)
      That said, it is common for data to be optimized to fit cache. There is a video about why linked lists are almost always slow which explains this.

    • @kreuner11
      @kreuner11 11 months ago +6

      They really flunked on the RAM and texture VRAM; if those had used a different, more sensible architecture, the N64 might have been far ahead of the competition

  • @KazeN64
    @KazeN64 11 months ago +533

    The second column becomes irrelevant because all the instructions to run the entire graphics thread can be cached at the same time, so we have no cache misses = no memory used (basically)

    • @AlbertBalbastreMorte
      @AlbertBalbastreMorte 11 months ago +25

      We are not worthy

    • @zeckma
      @zeckma 11 months ago +13

      this needs to get pinned

    • @konberner170
      @konberner170 11 months ago +2

      Bravo!

    • @marcelocardoso1979
      @marcelocardoso1979 11 months ago +19

      I'm still baffled at how you managed to fit an entire renderer in 12kB. Absolute genius!

    • @SICunchained
      @SICunchained 11 months ago +1

      Thanks for the awesome vids man. Love watching your shit. Hope to be able to achieve your knowledge and application of math someday. ^.^

  • @kuhluhOG
    @kuhluhOG 11 months ago +98

    This video in a nutshell:
    Do you still have questions?
    [ ] No, I understood everything.
    [x] No, I didn't understand enough to even begin phrasing questions.

  • @triplebog
    @triplebog 11 months ago +244

    As a graphics engineer, this is one of the only primeagen videos that actually makes sense

    • @khhnator
      @khhnator 11 months ago +21

      ikr! im a game/systems dev and most of the time i completely go "what is this react stuff" to primagen stuff

    • @ramy701
      @ramy701 11 months ago +7

      can you recommend any resources / books for someone wanting to get into graphics engineering ? :)

    • @headecas
      @headecas 9 months ago

      @@ramy701 dot

  • @bertram-raven
    @bertram-raven 11 months ago +45

    This reminds me of the developer who optimised drum-memory execution by scattering the instructions across the drum in such a way as the read-head was exactly over the next instruction just as it was required. He also added jump instructions which went nowhere so the motion of the drum would flatten loops and reuse code in a way which "increased" drum capacity - effectively storing 6.2KB of code in 6KB.
    This is next-level optimisation.

  • @Entropy67
    @Entropy67 11 months ago +27

    7:50 he rewrote it so that it would all fit on a single page, it's tiny, the CPU will never remove it from cache. That means that we never cache miss. I think

  • @isodoubIet
    @isodoubIet 11 months ago +50

    The reason the number of instructions becomes irrelevant is that, as far as the cpu running is concerned, what matters is the number of cycles. The number of instructions matters _only_ inasmuch as you're having to stream all that stuff from memory into the instruction cache, since the size of the machine code is roughly proportional to the number of instructions (this is not necessarily the case for monstrosities like x86 architecture, but it is the case for the n64 which has a MIPS processor with a very typical RISC pipeline). Since the entire renderer code fits into the cache anyway, you can have as many instructions as you like, and all you're comparing is how many cycles are being spent on running the instructions themselves.

    • @jeremylakeman
      @jeremylakeman 11 months ago +2

      And that's because the primary thing he's trying to optimise here is memory bus usage. Since the "GPU" needs to use it for texture mapping (etc) to improve frame rates.

  • @Tabu11211
    @Tabu11211 11 months ago +20

    My favorite programmer streamer covering my favorite niche coder?! What a good day!

  • @magfal
    @magfal 11 months ago +69

    11:52 the aggregate improvements he's made would have ensured a release of Mario 64 2nd adventure or something if Nintendo mysteriously received them back in the day.
    A 3X improvement would have enabled enough of a difference in capability to justify it.
    Also check out the Portal 64 project.

    • @NoNameAtAll2
      @NoNameAtAll2 2 months ago

      portal64 failed the anonymity and got lawyer-ed

    • @magfal
      @magfal 2 months ago

      @@NoNameAtAll2 I would love for someone like the GOG team to use their experience and contacts to broker a deal. It's not like its existence is diminishing value for any of the parties involved, or that clearing things up contractually would take long if all parties are positive and constructive.
      It would be free advertisement for all involved parties, if they could all agree to a narrow licence that would allow it to move forward as a non-commercial project (labeling the patreon funding the development as unrelated subscription revenue from the development streaming and videos).

  • @TheTrienco
    @TheTrienco 11 months ago +52

    Not thought about 20us in a long time? That depends on what you're working on. Granted, 20us over a full frame probably won't matter much, but for a function you call a lot, that can add up quickly. If you think about it: for 100fps, your frame budget is 10ms.. for AI, physics, game logic, rendering and whatever else needs to be done. That's only 500 of those 20us.

    • @someonespotatohmm9513
      @someonespotatohmm9513 11 months ago +2

      In the micro electronics industry it is also quite common to see u or n as prefix.

    • @prgnify
      @prgnify 11 months ago +12

      @@someonespotatohmm9513 I've done a bit of embedded work, when he said "I don't think any of us has thought of 20us in a long time" I felt invisible.

    • @TrueHolarctic
      @TrueHolarctic 11 months ago +2

      @prgnify tbh we don't think about 20us a lot. That's just one eternity. That sweet sweet 100MHz clock

    • @prgnify
      @prgnify 11 months ago +2

      @@TrueHolarctic Yes, the first thing I thought of was a joke from the embedded programming subreddit that played with the concept of 20us being an eternity to us and completely indifferent for most others

    • @MikkoRantalainen
      @MikkoRantalainen 6 months ago +1

      20 µs is quite an important amount of time even for web programming if you're computing a JSON response that consists of 500 elements. If you take 20 µs per element, it will be 10 ms for the whole list already.

  • @Mempler
    @Mempler 11 months ago +17

    "The Mario64 command is my favourite linux command"

    • @halle0327
      @halle0327 6 months ago

      glad I’m not the only one that says this

  • @whamer100
    @whamer100 11 months ago +13

    god i LOVE Kaze, his contributions to the sm64 romhacking community are next level

  • @zebraforceone
    @zebraforceone 11 months ago +10

    I'm pretty sure that second column becomes irrelevant because the instruction set and its operating memory are stored in the cache, as opposed to operations across the memory bus

  • @zweitekonto9654
    @zweitekonto9654 11 months ago +6

    his brain was in so much shambles, he forgot the outro.

  • @glauco_rocha
    @glauco_rocha 11 months ago +4

    as rob pike said: never tune for performance until you have measured, and even so, don't do it until the code you're measuring OVERWHELMS the rest of the code.

    • @MikkoRantalainen
      @MikkoRantalainen 6 months ago

      I mostly agree but if you make *everything* super slow then your code ends up really bad until one single part overwhelms all the rest. I think it's better to decide the required latency at the start and then do whatever it takes to keep the final performance at least as good as your decided latency.
      For a game, the decided latency could be 1/30 or 1/60 seconds to match typical displays. For a web service, the target latency could be 50 or 100 ms.
      Then you know when you have to start optimizing: whenever your existing code cannot meet the latency requirement. What to optimize? The part that overwhelms everything else at that time.

  • @chhihihi
    @chhihihi 11 months ago +16

    Caching and cache misses are easy concepts to understand and will dramatically increase the speed of your code. I strongly recommend you check them out.
    Because there's a ton of distance between the CPU and RAM, cache is blazingly fast, and keeping everything in cache greatly reduces the number of cycles (time) it takes to complete a set of instructions. What keeps you in cache is having few enough instructions and data to fit. Think of it as a primary resource and RAM as a backup.

    • @entertain8648
      @entertain8648 11 months ago

      what about apple chips that has ram close to cpu?

    • @snooks5607
      @snooks5607 11 months ago

      @@entertain8648 still order(s) of magnitude difference, even apple's unified memory can have 100-400ns latency for a fetch depending on multiple things like TLB lookups, where as on-die L1 cache access has cost just couple cycles ever since they were introduced to PCs with 486DX in 1989 (on a modern ~4GHz machine about 1ns)

    • @Takyodor2
      @Takyodor2 11 months ago +3

      @@entertain8648 It's still not _in_ the CPU. Analogy: going to the grocery store in your own town is faster than the one in the next closest town, but both are many times slower than accessing your fridge. (The hard drive would be on the moon in this analogy)

    • @entertain8648
      @entertain8648 11 months ago

      @@Takyodor2 well I understand what kind of difference that is
      What I am asking is if anyone knows how Apple's design saves the situation

    • @Takyodor2
      @Takyodor2 11 months ago +3

      @@entertain8648 It doesn't save the situation, just makes the huge performance hit slightly less huge.

  • @tornoutlaw
    @tornoutlaw 11 months ago +21

    20ms per frame...does this mean an N64 could run Mario64 in 60fps?

    • @traister101
      @traister101 11 months ago +40

      Yep. Kaze has a rom that does 60fps on a native N64.

    • @chainingsolid
      @chainingsolid 11 months ago +2

      1000/60 = ~16ms so almost..

    • @MrAbrazildo
      @MrAbrazildo 11 months ago +1

      @@chainingsolid 1000 ms / 20 ms = 50 FPS.

    • @binguloid
      @binguloid 11 months ago +8

      guys he said microseconds not milliseconds

    • @MrAbrazildo
      @MrAbrazildo 11 months ago +3

      @@binguloid Yeah, and that was a performance gain, not the entire time for the frame.

  • @starleaf-luna
    @starleaf-luna 2 months ago +2

    hahaha! the comment about "your site runs slower on a faster processor than Mario 64" does not apply, because my website is just CSS, HTML and barely any JS just to not have to split the navigation bar across multiple files to make it easier to update! (there's probably a better way to do that, but still, most of it is just CSS and HTML. the JS runs just once on page load.)

  • @Emil_96
    @Emil_96 11 months ago +2

    Ocarina of Time is just pure nostalgia and it'll always have a spot in my heart

  • @Kiyuja
    @Kiyuja 11 months ago +6

    yeah Kaze always does great content

  • @SianaGearz
    @SianaGearz 11 months ago +4

    N64 is a unified memory gal, there is no VRAM, there's just RAM. The whole RAM is connected to the GPU (semi-custom chip) and when the CPU (off the shelf chip) asks for something from RAM over the CPU bus, the GPU has to stop whatever it's doing to serve that memory request for the CPU.
    This is why verbose code that consists of a ton of individually fast machine instructions is generally BAD, since instruction loads use up your RAM bandwidth and slow down the GPU rendering.
    Except when you have a pass in your rendering or other code that munches a lot of data while using all the same code, and all the code is laid out contiguous across a handful of pages of memory (as to avoid cache lines fighting for tag space) and all those pages of code end up stored in the CPU's onboard cache for all but the first iteration, then there's no memory hits. You'd call this cache L1 today, but L2 or L3 doesn't exist so it's just Cache. Then you don't care about how verbose the code is in each function, you just need the whole code for that whole pass to fit.
    So for code that is more sprawling and expected to be uncached, such as gameplay code, you want it to be as terse as possible; while for code that can be formulated as an L1-resident pass or kernel, you can actually make it verbose in places.
    I myself program on another classic device which also only has L1, so i'm learning from Kaze a lot. I had also been cursed by a very sharp and strict numerics professor 20 years ago, it took me YEARS of hard work to pass that exam, so well i suppose i'm getting a lot more than just hype from floating point trickery, even though i'm more rusty than i'd like to be.

    • @MikkoRantalainen
      @MikkoRantalainen 6 months ago

      One could also explain N64 as having no RAM at all but being able to run CPU instructions from VRAM if GPU is stopped while that's happening. And there's 16 kB cache in the CPU itself so as long as you can keep everything in that space, the GPU doesn't need to be stopped.

  • @jerichaux9219
    @jerichaux9219 11 months ago +2

    I see you and I both have mastered the ancient knowledge of almost-kind-of-remembering-floating-point-formats-but-not-really.

  • @Kavukamari
    @Kavukamari 11 months ago +3

    getting some much needed neck nodding exercises in on this video, Prime is gunna be swole soon

    • @bozoc2572
      @bozoc2572 11 months ago +1

      He's clueless

  • @tacokoneko
    @tacokoneko 11 months ago +14

    i hope that some day kaze can improve the Linux kernel for N64 because right now it could already theoretically run any Linux program .. as long as it fits in 8 MB of RAM alongside everything else. It would be incredible if he could install a web server and then we can really build react for N64

    • @blarghblargh
      @blarghblargh 11 months ago

      n64 already runs doom.
      sometimes you gotta stop and ask why :D

  • @MrAbrazildo
    @MrAbrazildo 11 months ago +8

    7:08, in old hardware, the engine instructions/data didn't fit entirely in the cache. So, depending on how many instructions an action takes, the CPU had to seek the RAM, which tends to be 100x slower (maybe less in a console). On modern hardware, all instructions/data are in the cache, which has much more memory than they require, for an old game. However, RAM is still used even nowadays, for multimedia stuff: images, video, audio, textures and other things larger than 64 KB. The optimization for these large things aims to load part of the RAM into VRAM (GPU cache memory), at a moment the user doesn't care about, like a loading scene - i.e. God of War's Kratos passing through some rocks. Sometimes this is used for loading from files to RAM too.
    11:58, but he is doing it for modern hardware, isn't he? The video's goal is just to explain why Quake's alg. is not meant for all cases.
    13:00, the sad truth is that these pointer transformations are UB (undefined behaviour). That's why the guy commented it as "evil": he just wanted to get his job done, leaving the comment for the future masochist who will deal with the potential nasty bug. UB means the operation is not standardized. So, the app may someday start crashing or giving wrong values (out of nowhere!), if anything changes from the original setup: hardware, OS, any imaginable protocol that interacts with the game. Not even old C had an expected action for that, as far as I heard.
    13:52, in math, a minus exponent means that the number is divided. So, x*0.5 == x / 2 == x*2^(-1). Instead of multiplying the whole number, it's possible to change its exponent, by sum or subtraction, which are faster operations.

    • @isodoubIet
      @isodoubIet 11 months ago +3

      No he runs this stuff on original hardware

    • @isodoubIet
      @isodoubIet 11 months ago +3

      As for it being UB, technically it is, but so many people do it that I don't think most compilers take advantage of it. C++ has only recently added a non-UB way to do it in C++20 with std::bit_cast. Before, the only way to do it was with memcpy, which would defeat the purpose.

    • @tannerted
      @tannerted 11 months ago

      Why are these pointer transformations UB? Casting to a different type doesn't change anything about the underlying bit representation. So cast and then shift and then cast back is just fine and deterministic according to the C spec. Am I wrong? (I might be; please teach me if I have something wrong) I feel like this is done all the time in embedded systems and OSs

    • @MrAbrazildo
      @MrAbrazildo 11 months ago

      @@isodoubIet - Are you saying that Nintendo didn't try to pack the data into the cache? This seems absurd. I can't imagine a game being made that way. It sounds so amateur.
      - I've forgotten about this bit_cast. I need to study C++20 deeply.
      - Because memcpy would be slower?

    • @MrAbrazildo
      @MrAbrazildo 11 months ago

      @@tannerted I heard this in a presentation. I don't read standards. But I also heard that C unions are now well defined. Since they were used for type punning, maybe this is now valid C - _UB in C++, because there are other resources, such as this std::bit_cast_ .

  • @alexaneals8194
    @alexaneals8194 11 months ago +4

    The problem when you optimize for a CPU or GPU is that you should comment which CPU and GPU it was optimized for. Later versions may break your optimization or may offer options that are far better. If the code isn't commented, then when someone goes in to make changes, they don't know whether the optimization still applies or if it should be changed, without spending a few cycles trying to figure out why the optimization was done. The same principle applies to higher level optimizations. And a note to my past self: this includes personal projects.

    • @Minty_Meeo
      @Minty_Meeo 11 months ago

      Sure, but Kaze's SM64 codebase basically only works on MIPS-GCC and only on N64 with all of the inline asm and illegal code it uses to go fast. He is way beyond the point of cross-compiling his mod.

  • @Tobsson
    @Tobsson 11 months ago +2

    I watched so many videos of him. I'm not even 0.0000000000000001% smarter or more knowledgeable since then, but it sure sounds cool.

    • @JohnSmith-ox3gy
      @JohnSmith-ox3gy 9 months ago

      "I like your funny words, magic man." -JFK

  • @keyboard_g
    @keyboard_g 11 months ago +5

    Outside of the tiny texture cache and the ram latency (high bandwidth, bad latency), the N64 was a computational beast for the time.
    Nothing else was close.

    • @skilz8098
      @skilz8098 11 months ago

      I don't know about that. The PS1 and the Sega Dreamcast were both impressive machine architectures too.

    • @jc_dogen
      @jc_dogen 11 months ago +4

      @@skilz8098 Dreamcast was the next generation, and the N64 runs at 3x the clock speed of the PS1

    • @skilz8098
      @skilz8098 11 months ago

      @@jc_dogen Yeah, but the PS1 was a breakthrough in its day. It wasn't the first "CD-ROM" type console because there was the Sega CD "eh" then the Sega Saturn which was okay around the same time as the PS1. Panasonic even had their own, I think it was the 3DO but it didn't go over so well. The PS1 with its capabilities and affordable cost plus all of the available game titles made it very successful.
      Here's the 90s in a nutshell
      *Sega MegaDrive/Genesis - 88 (cart)
      Commodore 64 - 90 (cart)
      Neo Geo - 90-91 (cart)
      SNES - 90 (cart)
      Philips CD-i - 91 (disc)
      Sega CD - 91 (disc addon to the mega drive)
      3DO - 93 (disc)
      Jaguar - 93 (cart/disc addon in 95)
      Sega 32X (cart-addon to the Genesis)
      Neo Geo CD - 94 (disc)
      Sega Saturn - 94-95 (disc)
      PS1 - 94-95 (disc)
      N64 - 96 (cart)
      3DO M2 - 98 (disc)
      Dreamcast 98-99 (disc)
      *PS2 - 2000 (CD/DVD)
      Some of them were good, some were okay, some were flops. Some were great.
      For me, the SNES and the PS2 were some of the best consoles. The Genesis was decent, the PS1 was very good. The N64 and Dreamcast were both good. Some of them I never played on, such as the 3DO, CD-i, Neo Geo or the Commodore. The Sega Saturn was okay, it had potential but was inferior to other consoles. The Sega CD was a nice concept but didn't go over too well, and the 32X was a major bust. And it was around this time that PC gaming started to become a commonplace thing too.
      They were definitely the Good old days. I kinda jumped ship from SNES to PS when Final Fantasy dropped Nintendo and migrated to the PS1. And then games such as Resident Evil 1 & 2, Silent Hill, Parasite Eve, Castlevania: Symphony of the Night, Tony Hawk, Cool Boarders, etc... The PS1 then later the PS2 just took over. The SNES was a very popular and long favorite with many great titles... but eventually the PS2 became the champ. I still have both my SNES and my PS2. I don't have my original Atari, NES, Genesis, or PS1 anymore but I still have my PS1 games and I use my PS3 for that. I stopped getting into the "console" fad after the PS2 and only picked up the PS3 used about 2 years ago just for a select few titles and to be able to use my PS1 games. I still have Diablo for PS1. The only game I'm really wanting or missing for my PS1 collection is Ogre Battle.
      And as for Dreamcast being next Gen... kind of. It only came out 2-3 years after N64 just before the PS2 released. So the N64 had a head start on them. And it wasn't until 2001 until Nintendo came back with the Gamecube. So that's basically the 90s in a nutshell. Well from about 1988 - 2002.
      There were a few other consoles but not really worth mentioning as some of the were really obscure or niche console markets.
      But yeah as for performance and being a console with 3D graphics on a Cartridge in 64 bit, yes the N64 was a very nice machine. Mario Kart 64, Bomberman 64, FZero was decent but wasn't as good as FZero on the SNES, yet the FZero version for the Gamecube was great. Then you had Metroid. I could go on... What can I say, I've been gaming since Warlords, Pitfall, Circus Circus, Space Invaders, Asteroids, Missile Command, Breakout, Pacman, and much more... Been at it since the early 80s.

    • @jc_dogen
      @jc_dogen 11 months ago +2

      @@skilz8098 text dump bro. lmao
      but I agree, the dreamcast was in-between gens, though it was still a very big jump. I would also say the n64 was (mostly) much more powerful than the ps1. Some aspects made this less obvious (cartridges, very small 4K texture memory), and the hardware had some serious problems that ate into its performance (rambus latency and bandwidth problems) that were probably just mistakes. But, at the end of the day, performance sapping features like sub-pixel accurate rendering, perspective correct texturing, texture filtering, and z-buffering were only possible because of the extra power it had.

    • @skilz8098
      @skilz8098 11 months ago

      @@jc_dogen Well as the saying goes a picture is worth a 1,000 words, and I have about 1,000 pictures in mind, lol...

  • @nonetrix3066
    @nonetrix3066 11 months ago

    I think the cooler thing is that many of the things he mentions in other videos were not known at the time the game was made, so we can take more advantage of the hardware today than we could have ever dreamed of in the 90s

  • @madmax2069
    @madmax2069 11 months ago

    Kaze just shows how much potential game consoles (in this case the N64) have that was never reached in their active lifetime (active meaning still manufactured, sold and supported by the manufacturer). This is something that the modding and homebrew communities are good for: figuring out every little aspect of the hardware inside a game console, making better SDKs, fixing bugs and issues in the games, optimizing code for said games. Heck, just look at the person making Portal for the N64.

  • @xdanic3
    @xdanic3 11 months ago

    FINALLY! I've been waiting for you to react to kaze for a while now! And after this we could have a kaze reacts to ThePrimeTime, but you reacting first was more expected, now I gotta watch the video 👀

  • @BeamMonsterZeus
    @BeamMonsterZeus 11 months ago +1

    As an amateur astrophysicist, I always knew the N64 was a universal anomaly, but not for the reasons discovered here

  • @noxlupi1
    @noxlupi1 11 months ago

    The Fast Inverse Square Root, was ahead of its time, a long time ago.

  • @n00blamer
    @n00blamer 11 months ago

    You guys... the Devil's in the details but the underlying maths is quite simple: reciprocal is negation of the exponent, 1/(n ^ m) is n ^ -m. sqrt(n ^ m) is n ^ (m / 2), and these can be combined into: n ^ (m * -0.5) == 1.0 / sqrt(n ^ m). The code gets a good initial value and Newton-Raphson iterations converge.

  • @yxyk-fr
    @yxyk-fr 11 months ago

    There's Newton's algorithm (already devised by the Babylonians). And then there's Newton-Raphson iteration, which converges like crazy...

  • @JimWitschey
    @JimWitschey 6 months ago

    Kaze Emanuar has maybe the strangest career of any programmer alive

  • @antoniogarest7516
    @antoniogarest7516 11 months ago +2

    Prime and Kaze
    Subscribed

  • @i3looi2
    @i3looi2 5 months ago

    When that guy decides to invent another JS Framework after I just settled on my JS Framework of choice
    "BgRcky: fuck you"

  • @dominikmuller4477
    @dominikmuller4477 4 months ago

    to be fair, using the fast inverse square root and then inverting it is a dumb way to go about it. A fairer comparison would be to make an analogous fast square root algorithm. The same floating point magic that turns 1/sqrt(x) into (-1/2)* (int x) + magic number would also support turning sqrt(x) into (1/2)* (int x) + different magic number.

  • @jelliott3604
    @jelliott3604 9 months ago

    It's that there is a threehalfs label for .. 3 halves .. but the magic-number offset just gets a WTF!

  • @scottbuffington5964
    @scottbuffington5964 10 days ago

    Floating points are easy to understand. Just say yes, may I have another until the end of time; or until you don't hear a response. =]

  • @rayanmazouz9542
    @rayanmazouz9542 11 months ago +4

    apparently Mario 64 wasn't even compiled with optimization enabled

    • @RuySenpai
      @RuySenpai 11 months ago +2

      This is a myth and Kaze himself has a video on it.

    • @felixjohnson3874
      @felixjohnson3874 11 months ago

      @@RuySenpai Well, it's not; if he does have a video and it says what you're saying, it's wrong. I'd love to confirm that but I can't find the video you're talking about, so I can't.
      We literally have reverse engineered the source code and, using the compiler they used at the time, we can generate byte-for-byte the same code... when optimizations are disabled.
      The PAL version IS optimized, but it's already running about 16% slower anyway.

    • @RuySenpai
      @RuySenpai 11 months ago +1

      @@felixjohnson3874 got it confused, it wasn't kaze it was modern vintage gamer who made the video.
      It was my bad, it isn't a myth that USA sm64 had compiler optimizations off, but it's overstated how significant it is.

    • @Bobbias
      @Bobbias 11 months ago

      @@RuySenpai That's primarily because compilers of the time didn't have great optimizations to begin with. Even without platform specific optimizations, modern compilers can do far more optimization than the compilers back in the day could, so whether or not it was optimized was not as big an issue back then as it would be now.
      That said, even with modern optimizations, that's not going to magically buy you a ton more performance. Kaze's massive performance improvements come from the fact that he's basically rewritten the entire game engine (with some of the only untouched code being the actual movement physics and such) with performance in mind.

    • @jc_dogen
      @jc_dogen 11 months ago

      @@RuySenpai no, he made a video explaining why it was reasonable for the compiler optimizations to be turned off

  • @apollolux
    @apollolux 11 months ago

    Sweet, it's a Kaze reaction! :)

  • @ThatJay283
    @ThatJay283 9 months ago +1

    5:10 "you should be ashamed of yourself" - yup. back before i knew any better, i started a react app with a backend in nodejs, express, and typescript. i thought it'd be the "easy way" and instead i just ended up with a nightmare of react components and pointless middle steps that are too late to leave out now. and on top of all of that, it also runs like shit.

  • @alfiegordon9013
    @alfiegordon9013 11 months ago +2

    Lets all love Kaze

  • @TheHackysack
    @TheHackysack 11 months ago +1

    shoutouts to simpleflips

  • @dsdy1205
    @dsdy1205 7 months ago

    10:33 I nearly spat out my drink

  • @csabaczcsomps7655
    @csabaczcsomps7655 11 months ago

    Know your data and now we're dancing.

  • @guilhermeraposo6080
    @guilhermeraposo6080 11 months ago

    I think about how nice an extra 20us would be every time the wife and I are erm... Playing SM64

  • @skilz8098
    @skilz8098 11 months ago +1

    I have over a billion transistors that are all rated with a 0.05 nano second propagation delay. One of them is working at 0.09 nano seconds. One of my logic gates is slower than the rest and it is the source of all my bottlenecks. I want a full refund! LOL!!!!

  • @lukasoliverleo3730
    @lukasoliverleo3730 11 months ago

    I never expected anyone to react to Kaze

  • @jorge28624
    @jorge28624 10 months ago

    2:54 we have come full circle lol

  • @Bliss467
    @Bliss467 11 months ago

    Game dev on limited hardware is truly next fuckin level

  • @v2ike6udik
    @v2ike6udik 11 months ago

    2:44 MAN DOWN, MAN DOWN!

  • @stevez5134
    @stevez5134 11 months ago

    this is the one from only 3 weeks ago??? anyways great stuff

  • @bertram-raven
    @bertram-raven 11 months ago

    Adding my own piece of magic from the 1970s.
    a%=b
    b%=a
    a%=b
    Works for all types, structures, and cache swaps.

    • @n00blamer
      @n00blamer 11 months ago

      If the % is exclusive-or then that would swap in-place.

    • @MikkoRantalainen
      @MikkoRantalainen 6 months ago

      @@n00blamer Typically XOR would use syntax a^=b because usually % means remainder for a division.

    • @n00blamer
      @n00blamer 6 months ago +1

      @@MikkoRantalainen That is why I assumed OP mistakenly used % when he meant ^

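  The three-line trick above, taken with `^` (XOR) as the replies suggest, looks like this in C. This is a sketch for illustration, not any commenter's actual code:

  ```c
  #include <assert.h>
  #include <stdint.h>

  /* In-place swap via XOR, no temporary variable needed.
     Caveat: if a and b point to the same object, it zeroes it out. */
  static void xor_swap(uint32_t *a, uint32_t *b) {
      *a ^= *b; /* a now holds a XOR b */
      *b ^= *a; /* b now holds the original a */
      *a ^= *b; /* a now holds the original b */
  }

  int main(void) {
      uint32_t x = 3, y = 7;
      xor_swap(&x, &y);
      assert(x == 7 && y == 3);
      return 0;
  }
  ```

  On modern CPUs a plain temp-variable swap is usually at least as fast, since the three XORs form a serial dependency chain; the trick mostly mattered when registers were scarce.
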
  • @rodrigoqteixeira
    @rodrigoqteixeira 2 months ago

    7:36 if code stored cache code no need read from ram if code no read from ram renderer able use ram so game faster
    In readable words: if the code is stored in the cache instead of RAM, the renderer can use RAM while the CPU is computing, because the CPU no longer needs to read from RAM; the code is in cache, which doesn't occupy the RAM bus, and in the end is faster.

  • @HrHaakon
    @HrHaakon 8 months ago

    My backend runs smooth and it has something like 150 MHz (I have a few millicpus in the cluster to play with) of Xeon time.
    So a lot more than the N64, but since we moved to a cloud-based platform, CPU time got expensive.

  • @michaelrobb9542
    @michaelrobb9542 9 months ago

    Cause he can't stop hearing the music. 9:04.

  •  11 months ago

    I remember a QNX kernel fit in the CPU cache, weird times, isn't it?

  • @MikkoRantalainen
    @MikkoRantalainen 6 months ago

    5:00 100% agreed!

  • @jim0_o
    @jim0_o 7 months ago

    A Link to the Past was the pinnacle of 2D Zelda (and adventure/exploration games at the time), Ocarina of Time was the base-point of 3D Zelda games, Majora's Mask was a good look back at the gritty darker Zelda games. (they all had darkness but Zelda 2 (Link), Majora's Mask and Twilight Princess were different) so both ALttP and MM were better game-wise but Ocarina of Time was groundbreaking... now back to the remaining 80% of the video.
    Edit: AFAIK the mask guy (The Happy Mask Salesman) was a kind of Deus Ex Machina, i.e. he was the "hand of god" that started and ended everything; if you follow the story it's all his fault (he brought the mask to Clock Town, getting it into the hands of the Skull Kid, but he is also the one that bugs you to get it back). This is probably also why he is designed to look like the creator of the Zelda series... now back to 50% of the video...

  • @blipojones2114
    @blipojones2114 11 months ago

    "link to the past" is hard, just started playing and am pretty hard stuck

  • @mfc1190
    @mfc1190 11 months ago +4

    Ocarina of Time was so much better than Majora's Mask, like who TF said that

    • @billynasir3146
      @billynasir3146 11 months ago

      Majora's Mask is way more alive and open-world despite having fewer dungeons

    • @623-x7b
      @623-x7b 11 months ago

      The Ocarina of Time is better than GTA 5. GTA 5 is better than Majora's mask

  • @BeamMonsterZeus
    @BeamMonsterZeus 11 months ago

    I admit I've only beat OoT a few times and MM a few as well. I was more into Goldeneye, all of this at age 4-9 btw I'm a babby

  • @bernicefenton
    @bernicefenton 11 months ago

    "take a moment and consider your life and what you've built... versus this... you should be ashamed of yourself" 😂 not so harsh

  • @AdhirRamjiawan
    @AdhirRamjiawan 11 months ago +2

    I'm ashamed i'm part of the bigger problem :'(

  • @FastVideoProdInNash
    @FastVideoProdInNash 11 months ago

    2:24 😂😂😂😂😂😂
    That was crazy.

  • @BustinJustin951
    @BustinJustin951 9 months ago

    "Majoro's Mask" 🙄

  • @wesleymcbob
    @wesleymcbob 11 months ago

    Yeesss Kaze is the greatest

  • @lMINERl
    @lMINERl 11 months ago

    5:16 im ashamed😢

  • @tgirlshark
    @tgirlshark 11 months ago

    OMG I LOVE KAZE

  • @StingSting844
    @StingSting844 11 months ago +3

    This is a guy who gives imposter syndrome to the ones we get imposter syndromes from

  • @andrewtfluck
    @andrewtfluck 11 months ago

    Kaze is awesome 😎

  • @remigoldbach9608
    @remigoldbach9608 11 months ago +1

    John Carmack is a beast (sqrt approximation in Quake)

    • @jc_dogen
      @jc_dogen 11 months ago +3

      wasn't his code though

  • @ikirachen
    @ikirachen 4 months ago

    MDK2 FTW :)

  • @oserodal2702
    @oserodal2702 11 months ago +1

    Doesn't the original code of the fast inverse square root also technically have undefined behaviour?

    • @ea_naseer
      @ea_naseer 11 months ago +1

      yeah, he talked about the fact that since he's not going for accuracy they probably would never divide by zero, so no undefined behaviour like the original.

    • @isodoubIet
      @isodoubIet 11 months ago +4

      @@ea_naseer The UB is not in any division by zero, it's in accessing a pointer through a different type. That violates the C (and C++) aliasing rules. I don't think it's an actual problem on any real compiler since it's such a common thing to do (and C++ only recently added a standard way to do it), but technically it's UB according to the standard.

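  For reference, the well-known Quake III routine the thread is discussing, but with the bit reinterpretation done through memcpy instead of a pointer cast, which sidesteps the strict-aliasing UB mentioned above. A sketch, not the original code:

  ```c
  #include <assert.h>
  #include <math.h>
  #include <stdint.h>
  #include <string.h>

  /* Fast inverse square root. memcpy between a float and a same-sized
     uint32_t is well-defined; dereferencing a float* through a long*
     (as the original did) violates the C aliasing rules. */
  static float fast_rsqrt(float x) {
      uint32_t i;
      float y = x;
      memcpy(&i, &y, sizeof i);          /* reinterpret float bits as integer */
      i = 0x5f3759df - (i >> 1);         /* magic constant, initial guess */
      memcpy(&y, &i, sizeof y);          /* back to float */
      y = y * (1.5f - 0.5f * x * y * y); /* one Newton-Raphson refinement */
      return y;
  }

  int main(void) {
      /* With one refinement step the result is within ~0.2% of 1/sqrt(x). */
      float approx = fast_rsqrt(4.0f);
      assert(fabsf(approx - 0.5f) < 0.01f);
      return 0;
  }
  ```

  Any modern compiler turns the memcpy calls into plain register moves, so this costs nothing over the cast version while staying within the standard.
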
  • @cherubin7th
    @cherubin7th 10 months ago +1

    Majora's Mask is my favorite.

  • @cweasegaming2692
    @cweasegaming2692 11 months ago

    AHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

  • @TheIridescentFisherMan
    @TheIridescentFisherMan 11 months ago

    My boy was like " DAMN 20 MICRO SECONDS??? THATS WAY MORE THAN I THOUGHT ".

  • @BigDaddyMort
    @BigDaddyMort 11 months ago +1

    Incorrect, sir! The best Super Mario game was on the SNES: Super Mario RPG: Legend of the Seven Stars.

    • @B1GL0NGJ0HN
      @B1GL0NGJ0HN 11 months ago

      New Switch remake is a lot of fun!

  • @connorskudlarek8598
    @connorskudlarek8598 11 months ago

    Prime playing Ocarina of Time makes me wonder if he's ever played the Randomizer or Online Multi-player Randomizer with his kids.

  • @o0shad0oo
    @o0shad0oo 11 months ago

    Fast inverse square root *

  • @aeaehow
    @aeaehow 7 months ago

    15:30

  • @jordixboy
    @jordixboy 11 months ago

    Could someone explain what cycles refer to in the measurements? The amount of operations the CPU has to do? (What's the difference with instructions? cycles != instructions?) Or is it the amount of times it has to request/store data in registers/RAM?

    • @ricardoamendoeira3800
      @ricardoamendoeira3800 11 months ago +1

      Many instructions take several cycles to finish. Interacting with RAM is a good example.
      One cycle is generally the time needed for the shortest CPU instruction to run.

  • @notapplicable7292
    @notapplicable7292 11 months ago

    Honestly, considering Prime apparently did embedded programming at some point in his career, he seems to know very little about it.

  • @chronxdev
    @chronxdev 11 months ago

    Yep, he cooked my noodle

  • @CatherineBert
    @CatherineBert 11 months ago

    I don't know who you are, but I'm interested in the video you are showing.
    Enjoy this evidence you're stealing this content's views. Never seen your channel before.

  • @sa_lowell
    @sa_lowell 11 months ago

    I absolutely hate that I laughed at the vector normies thing. I'm done.

  • @Georgggg
    @Georgggg 11 months ago +1

    TL;DR: this all became irrelevant 25 years ago

  • @Valerius123
    @Valerius123 8 months ago

    Was Majoras Mask better than Ocarina Of Time? I didn't think so for the longest time but in objective hindsight... yeah, probably.

  • @MemeConnoisseur
    @MemeConnoisseur 11 months ago

    Nintendo suing his ass, how dare he make a good mario game

  • @Raven-fu1zz
    @Raven-fu1zz 11 months ago

    I think his talent would be really useful in making a compiler or IDE for new-age video games; modern video games are needlessly using so many resources, unfortunately

    • @MadaraUchihaSecondRikudo
      @MadaraUchihaSecondRikudo 11 months ago +3

      Remember that all of this is for a game that runs on very specific custom hardware whose entire spec is known and consistent. This would be a lot more difficult today even if you discount PC (which has so many different processors with so many different features, cache sizes, hardware optimizations, etc.) and just go for modern consoles. It essentially became impossible to do these kinds of optimizations by hand a while back at any real scale, and to be fair the compiler will do many of those optimizations for you (e.g. SIMD, branch prediction, etc.); instruction cache in particular isn't that big of an issue anymore.
      What you still need to be aware of today is memory allocation. You generally want to be CPU bound and not memory bound - and the primary reason high-level languages are generally slower than low-level ones is that it's harder to track and control what memory you allocate and when. If you're smart about allocating memory and working with values you've already loaded (as they're cached), you're 95% of the way there, which is generally more than enough.

  • @catskinner6
    @catskinner6 11 months ago

    Ocarina of Time >>>>>>>>>> Mario64 Fact

  • @bobanmilisavljevic7857
    @bobanmilisavljevic7857 11 months ago

    I guess it's True when they say,
    D = 8
    8==D