From what I've heard somewhat consistently, the reason the O2 optimization wasn't used was because of how early the toolchain was - they didn't know how stable the flag was and decided not to use it. Though I'm still really new to the sm64 decomp scene, so I could be 100% wrong in literally every aspect
Working in software development, I can say that flags get missed or mis-set pretty frequently due to human error. No idea if that is the case with Mario 64, just saying that it isn't as unlikely as it sounds. Also, I find it impressive that the majority of the game works very well without optimizations.
As MVG said, most of the libraries that SM64 uses have O2 and even O3 optimization, and the rest is just a small part of the code, which Nintendo probably opted to keep unoptimized because SM64 was a launch game and back then the developer tools had a few bugs, some of them having to do with the optimizers, and they wanted to be safe. If you disable O2 and O3 in the libraries, then you'll see how the game is actually going to run (very badly)
Early N64 compilers sometimes introduced bugs relating to doing math on the 3D geometry when using O2 (collision detection was the main victim here), which was later addressed, but Mario 64 was a launch title and did not benefit from these fixes. O2 also suppressed any warnings about potential issues - this made it very difficult to debug and this is why Super Mario 64 devs would have not used it.
iirc sm64 was a launch title for the n64, and the developer tools could also have been not mature enough for even -O2 or -O1 to optimise correctly. They probably picked -g to produce predictable unoptimised output, to make sure the game didn't crash on actual user copies initially. I also wonder how the updated sm64 version (the one released in Japan) compares in terms of performance
This is actually what happened: the N64 development kit was still not complete at the time of the release of Super Mario 64, so the developers were wary of doing any hardcore optimizations.
4:16 "O2 is considered quite safe." I've actually managed to write a piece of code that would behave wrongly when compiled with the -O2 flag just a short while back. Granted, I was compiling for a somewhat more exotic piece of hardware (the ESP8266), but still. I had to downgrade to -O1 because of that. :(
@@LightTheMars @Sullivan I was making an LED blink and used something along the lines of "brightness = (millis() % 1024) < 512 ? 0 : 1023" and it didn't work (nothing UB about it as far as I can tell). Changing the 0 to some greater number would magically make it work, but I couldn't go below (I believe) 13. I can't remember too many details but changing the ternary operator to an if statement didn't do anything. Going down to -O1 was the only thing that I could do to make it work. I didn't look at the compiled assembly to find out what the issue was, though it might have been interesting if I had the time to do it.
@@MrOmegatronic brightness in this case is just a variable that I use later on to set the duty cycle of a PWM output. There weren't really any loops that could get unrolled.
Hmm... not sure what could've caused it, then, @@ThePC007. That's the only thing that comes to mind, unless it's a case of extremely subtle UB that the compiler assumes isn't there, or a slight bug in the compiler, or something. Strange.
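For the curious, here's a self-contained reconstruction of the kind of code described above; since the original ran on an ESP8266, the Arduino millis() call is replaced with a hypothetical stand-in so this sketch compiles anywhere:

    #include <stdint.h>
    #include <stdio.h>

    /* stand-in for Arduino's millis(); just advances a fake clock */
    static uint32_t millis_stub(void) {
        static uint32_t fake_ms = 0;
        return fake_ms += 128;
    }

    int main(void) {
        /* the expression reported to misbehave at -O2: blink by switching
           between duty 0 and 1023 every 512 ms */
        for (int i = 0; i < 16; i++) {
            uint32_t brightness = (millis_stub() % 1024) < 512 ? 0 : 1023;
            printf("brightness = %u\n", (unsigned)brightness); /* real code: analogWrite() */
        }
        return 0;
    }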
according to MVG it was likely due to the tools being in a very early version and therefore being too unstable/a pain to work with, so they just left it on the default debug settings since that was at least working
This video was really interesting, this game is one of my all time favorites. I still have my original save file. It's wild to see people learning new things about the game so many years later.
In that era it was pretty common for all sorts of software. Processor speeds were very limited and graphical hardware provided relatively little assistance (esp on IBM compatible PCs, but some other computers were better in this regard). In any case when you're up against a lot of hardware limitations you have to be very efficient with your code. Also, if you go back to the early 80s I'm not sure how efficient compilers were. I know they had optimizations but I have no idea how they compare to today. I remember PC games in the 80s that didn't even run under DOS. They would just boot right off the floppy directly. If you weren't trying to read DOS filesystems it wasn't like that OS did much for a game anyway, other than allow it to be run from a hard drive. Actually, I think even MS Flight simulator booted straight from floppy without DOS in that era.
ASM in that era wasn't really that time consuming. C-code is almost the same in terms of complexity, and back then a C-Compiler could run for days if not months over your entire project whereas an Assembler would be much, much faster.
My thought would be that it wasn't optimized because the game was definitely being developed at the same time as the hardware, and optimizing was a secondary concern much like how Fire Emblem Awakening had a lack of feet because they didn't know what the final device would look like. Especially if some revision along the lines had already caused issues.
This is one of the best explanations I've had for the nuances of compilation. I had a vague understanding of the process of code compilation but never fully enough to really see the whole picture.
I had a couple of friends that worked at different games companies during the 90s (Rare, and Interactive Studios), both working on N64. From memory they told me that despite its on-paper specs, the N64 was terrible at drawing anything that didn't or couldn't cull, and doing so would massively slow down the console. This can be seen with anything that required redrawing the entire screen - e.g. full screen water or transparencies, or drawing large objects over the world. The friend at Rare had no PS1 experience to compare against, but the other friend did. It might have been related to the anti-aliasing mode, which was not well optimised for objects over objects. The dithered overlay shown at 6:32 would not have suffered this as it's just a 1-bit overlay; it's also not perspective correct, or filtering a texture to that (likely 2D) polygon. So it would inherently be a faster option. Good video though, really interesting to hear about little quirks like this that fell through the gaps. The same friends used to compare stories of Nintendo QA, which for the time was very very picky, but obviously didn't extend to looking at the code and understanding the compiler.
Very well done video! I love this kind of technical stuff. Cracking open games is kinda my thing. There were some cool secrets to be found inside of the code in Aladdin for Sega Genesis, too.
My guess: they didn't have time to test if the optimized version had any errors, so they decided to play it safe and release the unoptimized version which was tested because it would be a disaster if there were game-ruining errors in the optimized version.
Fun fact about the Mega Man X4 clip you showed at 6:32- the dithering present there is exclusive to the Sega Saturn version of the game. In the PS1 version, this part and other parts in the game like it use proper transparency instead!
Now that Nintendo has Super Mario 3D All-Stars for the 35th anniversary of Mario, I'm wondering if it's going to run well and if it's optimized better for the Switch. We know the graphics look way better, keeping the lower polygon counts but at a higher resolution.
It's true that -O2 has been generally considered pretty safe BUT that's only if you're on a *well-supported platform*. For custom CPUs or weird variant chips it can often blow up in your face. Also code that accidentally depends on undefined behavior can work fine without optimization but utterly fall down with optimization turned on. Finally, debugging optimized code is much more of a pain, and unless you really need the code to be optimized, shipping the same version that you debug with can have its benefits.
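A textbook example of that "accidentally depends on undefined behavior" point, as a sketch: signed overflow. Unoptimized, the loop usually wraps negative and exits; at -O2 a compiler is allowed to assume signed arithmetic never overflows and may turn this into an infinite loop:

    #include <stdio.h>

    int main(void) {
        /* i doubles each iteration until it exceeds INT_MAX: undefined
           behavior */
        for (int i = 1; i > 0; i += i)
            ;
        puts("done");   /* at -O2 this line may never be reached */
        return 0;
    }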
Decompiling the code like that is a fucking godlike achievement. I cannot convey my level of absolute fucking mad respect to the people that did that. Madlads, truly
I heard in another video that it wasn't optimized yet because the compiler wasn't mature enough yet and even -O2 was broken. They even went back and tried the old compiler version on this same source code and it actually did crash somewhere.
@@zabotheother423 floating point optimization can cause weird "glitches", like the differences already seen between Virtual Console and real hardware. (Subtle errors where removing a few instructions increases or decreases the math accuracy, leading to values drifting over a long time.)
@@kneesnap1041 It didn't compile with -O3. The compiler says "wrong file format" right at the end. Tried with -O2 and it worked fine, so it's not my setup that's making it fail.
It was fun, smart, and cool that Rare put easter eggs in their games. Diddy Kong Racing introduced Banjo-Kazooie and Conker. Banjo-Kazooie introduced Conker with an image inside Grunty's ship in "Rusty Bucket Bay".
I remember one time when trying to do this glitch Mario’s head model got super stretched out but his eyes stayed the same size, it’s the stuff of nightmares
I remember reading something like this when looking into the Large Address Aware patch for Skyrim on PC. My memory of this is fuzzy, but it was something about him decompiling the exe to find that Bethesda didn't use some of those optimization flags for the PC version of the game. It's apparently a direct port of the console versions. It also put a copy of the textures in main RAM... for... reasons, I guess... Seems they made the game on PC, converted it to Xbox 360/PS3, then converted THAT version back to PC without re-enabling the flags that you can't use on consoles. The guy couldn't post any proof though, as he would have to upload a modified version of the game... Interesting anyway.
@@azazellon you literally need to beat this specific star in order to beat the game casually. Unless you were a speedrunner god when you were a child you had to play this stage in order to beat the game.
@@diegomedina9637 Maybe he had some weird version of the game? I had a Mario Kart DS cartridge which was straight up missing a stage. I know it might be extremely unlikely and rare
My guess would be: since building with optimizations takes longer than regular builds (which are used during development), and they were running on a tight schedule, it was just too big a risk to enable the optimizations relatively close to the release.
This was a really cool video, MattKC-san. Thank you. I always felt that parts of the game were a bit slow, but I didn't realize it was a bottleneck issue. O_O
As someone who’s been in the game industry for 26 years for me the answer is pretty obvious: they had to f***ing ship the NTSC version and had no time to test O2 thoroughly. When you’re shipping on cartridges you can’t take any chance of a game being buggy or crashing, however slim, especially when you’re Nintendo and it’s your flagship title for a new system. The PAL version comes out later, more time for testing, and more of an incentive to do so due to it being indeed naturally slower due to 50hz.
The simplest and most likely explanation: they made a mistake with the flag. Could be a typo, could be just forgetting to turn it on. And even the PAL version having it on is easily explained by them releasing that one nearly a year later (June 23 1996 Japan, September 29 1996 USA, March 1 1997 EU). Since for PAL they needed it to run slower to match their N64s and TVs, they happened across the mistake that was missed even when translating for USA. By then, being a physical release, it would be easier to fix it for PAL and ignore it for other ones than to recall it or release new copies and tell people the mistake.
It’s not uncommon to purposefully set the compiler at -O0 for production code, especially in threaded programs written by lots of people. A missing “volatile” keyword somewhere can make some optimizations unsafe.
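A minimal sketch of that missing-volatile case. The flag is meant to be set asynchronously (by an interrupt or another thread); without volatile, -O2 may load it once into a register and spin forever:

    static int done = 0;   /* should be: static volatile int done */

    /* imagine this runs from an interrupt handler or another thread */
    void signal_done(void) { done = 1; }

    void wait_for_completion(void) {
        while (!done)
            ;   /* legal for the optimizer to rewrite as while (1) */
    }

    int main(void) {
        signal_done();          /* called in-line here only so the demo terminates */
        wait_for_completion();
        return 0;
    }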
I'd love for a follow-up with some more in-depth tests. There is a GameShark code for a lag counter. There also is one to "deblur" SM64, removing the AA filter that causes a lot of lag. Then there is overclocking the N64. I'm wondering if a combination of the recompiled version, the removal of AA, and an OCed N64 could eliminate the slowdown entirely. I know the removal of AA helps the framerate quite a bit, but no one has tested OCing the N64.
with a buggy compiler not even directly designed for the Nintendo 64 (called Ultra 64 at the time) hardware that had the chance of introducing bugs just because you enabled optimizations
With appendices, "The Importance of Your Toolchain Being Tested on Your Target Platform", and "The Importance of Your Target Platform Existing Long Enough to Have a Fully Functional Toolchain". ;3
Ooooh. I thought the title said "super show" so I was wondering when this submarine talk was going to translate into the super Mario bros super show beginnings.
I was about to say "N64 didn't use C, PSX was the first" but after looking, it appears C was used a lot on N64, ad-hoc, as C usually is I suppose. There wasn't an official C kit, unlike the Playstation dev kit.
Would it be possible to compile the decompiled version with a C compiler from prior to when the game came out? That way we can see if the compiler was to blame back then at making buggy code.
Ugh. Two things. One, I wish you had talked about the sub in DDD having dynamic collision as part of the reason it was so slow, but I guess you'd need to be more familiar with the codebase, or to have talked to someone on the decomp project, to have known that. Two, the EverDrive isn't reading directly from the SD card while you are playing; when you start a game it loads the ROM onto a piece of dedicated memory on the cart that has at least the same access time as an original cartridge ROM chip. The way cart ROM is accessed is such that it is always accessed at the same speed, usually measured in a number of CPU cycles.
Did the GCC compiler contain a -O2 back in 1995? It could just be possible that the optimization engine did not exist, or if it did, it was not stable. Remember, it would have needed to have been developed for the MIPS R4300i core. I know a lot of stuff for those branches was not introduced until 1999.
1:30 this is one of the most impressive feats of programming grit and toughness I have ever seen done. The guys responsible for the "perfect" copy of SM64, truly my hat is off. Have a golden star ;) Oh that's right you got all 120 already!
While I know it's not the exact same thing, there is a world boss in WoW that, at one point, was causing massive frame lag and latency issues when you got within (roughly) 50 yards of her and it was made even worse if you were in a raid group. Yep, even if you weren't in a raid and approached the area, the entire game would lag in both latency and frames, causing major issues with trying to take her down (and that doesn't even begin to touch on how OP she was). After a couple months, suddenly everything was ok with no major latency or lag and I wonder if Blizz made an oops compiling the code, but had to "ship it out" anyway to get the patch out on time.
It could be different for the Nintendo 64, since they had one specific hardware target, but typically speaking, C compiles to assembly, not directly to machine code. Otherwise great video, super interesting!
The main theory that was discussed during the project, afaik, was that the developers were tight on time and just released a debug build. It makes sense, since there are multiple inaccessible debug features and unused objects present in the ROM.
Inaccessible features are ubiquitous, especially in physically distributed games, where there is no benefit to optimizing for space you don't need, as the storage won't be shared with anything else.
That makes sense ~ honestly the game does feel like it was rushed in some ways. The back of Princess Peach's Castle is just an invisible wall, and the two rear pillars are textures you can walk through. Especially when you look at some earlier builds from 1995, they changed so many things in such a short amount of time.
I often wonder how the game might've looked, and what file size it may have been, if they had been given more time - say, another year or two.
@@godlyBlade But what about the fact that no compiler optimization was used?
@@thenurseinblack1810 if they shipped a debug build, they probably didn't feel safe releasing the optimized version without testing it a little, and decided it was better to just release a stable debug version without compiler optimizations. Might sound crazy, but think about it: nobody is gonna know that you didn't use compiler optimizations, at least not for a while, and once the truth comes out, most people (non-technical people/casual players) aren't going to care. But if the game crashes under some circumstance because of the compiler optimizations, people are going to notice you did something wrong, and they are going to care
The crunch is ancient
This is not entirely true. The CPU is almost never fully utilized. The mistake with the DDD lag is making the collision triangles dynamic rather than permanent. O2 compilation only mitigates this issue. It could be entirely solved by making the collision from the sub permanent (and I did that in SM64 Multiplayer, for example).
Most of the lag in SM64 comes from the GPU, which this doesn't change, so O2 compilation doesn't change anything for most of the game.
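For anyone wondering what dynamic vs. permanent collision means in practice, here's a rough sketch of the cost difference; every name and number here is invented for illustration, not actual SM64 code:

    #include <string.h>

    #define SUB_TRIS 512   /* made-up triangle count for the sub */

    typedef struct { float v[3][3]; } Tri;

    static Tri model_space[SUB_TRIS];   /* sub mesh in local coordinates */
    static Tri world_space[SUB_TRIS];   /* what collision checks run against */

    /* hypothetical helper standing in for a real matrix transform */
    static void transform(Tri *dst, const Tri *src) {
        memcpy(dst, src, sizeof *dst);
    }

    /* dynamic collision: the whole mesh is re-transformed every frame,
       even though the sub sits still while Mario swims around it */
    void update_dynamic(void) {
        for (int i = 0; i < SUB_TRIS; i++)
            transform(&world_space[i], &model_space[i]);
    }

    /* permanent collision: the same work, paid once at level load */
    void load_permanent(void) {
        for (int i = 0; i < SUB_TRIS; i++)
            transform(&world_space[i], &model_space[i]);
    }

    int main(void) {
        load_permanent();
        return 0;
    }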
This comment needs to be pinned
Hi man love your ROMs!
So you're saying if the sub never left there would be no lag?
@@SSDARKPIT no
How did you learn how to make all those ROMS or fangames Kaze?
> talks about C
> shows C++ code and docs
*angry programmer noises*
RuRo is the binary example at the beginning even real?
@@daddysneck Probably not, but it would be compiler specific anyway.
Yeah lol
Solix No, a simple program to print “Hello World!” would be a good bit bigger than that
William
system.out.println("Hello World!");
love how over 20 years later we're still finding stuff about this game
Wow your icon takes me back
24 years
kirby 64 this and OOT
This was one of the first 3D platformer games ever and it is still one of the best. Controlling Mario in this world gives the player an undeniable sense of weight, momentum, and intention that no other game has ever quite captured again.
@@shadoxfm8822 he said over 20 years
A theory I've heard is that it was developed so early in the N64's lifetime that the toolchain wasn't ready, and any optimization at all could have introduced bugs. As a result, to ensure that it worked at all times, even if the game had to slow down to do it, they disabled the optimization.
Modern Vintage Gamer has a video about it. That's actually the most likely reason.
They also forgot to disable the debug flags like dumbasses
@@aurastrike there's definitely nothing dumb about the way this game was developed. unfortunately they were short on time and nearing the deadline, so removing any debug flags would require extra testing that they just didn't have time for; releasing the game in a state they knew was functional was a perfectly reasonable decision imo
This is rad and all, but you can't just casually talk about the fact that SM64 is basically OPEN SOURCE now. We're going from modding to forking bois
Nintendo gonna shut that shit down. Kaze Emanuar, or whatever the guy's handle is, who produces hacks, constantly gets his ROMs taken down pretty quickly these days.
@@calebb7012 They can't really do anything to stop it, only slow it down. No amount of legal action will permanently erase the decomp from existence
I wish we could do what we did with Doom to Mario 64
Someone should create a super sonic 64
Go clone the repo now while you still can.
You normally turn off optimizations when you're debugging code so the compiler doesn't mess with it. My only guess is the game was on a tight deadline, and it was the early-to-mid 90s, so they didn't have time to test if -O2 was bug-free. PAL came a year after, so they had time to test, or knew it was safe by then.
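To illustrate why optimizations and debugging don't mix, a small sketch; the exact transformation is compiler-specific, but collapsing this loop into a constant is typical -O2 behavior:

    #include <stdio.h>

    int main(void) {
        int x = 0;
        for (int i = 0; i < 1000; i++)   /* at -O2, often folded to x = 499500;
                                            breakpoints in the body never hit,
                                            and i may not exist at all */
            x += i;
        printf("%d\n", x);
        return 0;
    }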
Well that's kind of true. They actually tested the hell out of it. But it's important to remember the circumstances. The N64 was a stripped down version of cutting edge hardware, and it had a lot of unique mechanisms and features developers had never used before. On top of that, almost no one had worked on a full 3D game before, especially not one with so much riding on it as Mario 64. On top of even that, since it was a launch game there was no development history or even a development kit, as the N64 was still being designed, so there was a lot of guesswork and trial and error involved.
A lot of optimization of Mario 64 was actually done by hand so that they could very carefully observe the effects and immediately catch any bugs.
Not to mention Nintendo was entirely relying on Mario 64 to sell the console and was not doing well financially at the time. Mario 64 having some kind of game breaking bug at launch would have been absolutely catastrophic.
Also it's not that there is no O2 optimization, there is plenty. Just only in the areas deemed safe.
And talking about if O2 is bug free is oversimplifying. There’s two questions:
1) Are there bugs in the compiler?
2) Are there bugs in the game that might be triggered by turning on optimizations, such as race conditions, or undefined behavior.
We can’t even answer these questions. We can’t answer the first one because we don’t have a copy of the compiler. And we can’t answer the second one, because if there were bugs that might’ve caused undefined behavior, the people doing the decompilation would have instead written the safe version of the code rather than the buggy version.
The fact that the PAL version is optimized, makes me think there may have been a compiler bug. Something that was breaking the game in odd ways. Maybe it was even fixed near release, but they didn’t have time for enough testing after fixing the compiler.
I prefer the DS version. Lots more stars, and you can play as Yoshi, Luigi, and Wario besides Mario.
Mario floats
Luigi goes invisible.
Wario has the iron suit.
Yoshi breathes fire.
Also, the lag was fixed.
"Almost perfect source code!"
2020: Ya want source code kids?
It’s sad that we can’t compile the game cause it’s not really the complete source
I'd scan it for viruses if I were you.
@@Sparkette >scanning .c files which when compiled output a .n64 file
flarn2006 source code is human readable, and you can run the game on an N64 emulator anyway, so if there are any viruses, I believe as long as your OS has protected memory then you're pretty good tbh.
@@Sparkette Ah yes, the infamous N64 viruses. Though I heard that if you just download with an incognito browser, the viruses can't legally infect your computer.
It's interesting to see what one minor tweak could enhance in a game.
*THAT TORNADO IS CARRYING A CAR!*
@@FIRSTEBITOS *WHAT???!!!*
It makes me think of the AI fuck up in Aliens: Colonial Marines.
I don't get or see it being slower where ?
@@oceanix91 Sonic 06 meme son, the best kind of memes.
Someone should combine this modification with the one that improves Mario’s model geometry deformation and release it as kind of an unofficial patch
Nah.
@@kane00000 yes
@@kane00000 yes
@Kane Williams yes
I don't think there's a reason to change the geometry. A lot of what exists currently is used in speedruns and patching it out would be stupid at best. I think that just telling Nintendo to re-release SM64 with the updated PAL version code would be fine (the same way Banjo Kazooie has a 1.0 version and a 1.1 version).
The EverDrive (and every other flashcart for other systems) loads the game's ROM to on-board memory. It's not streamed off the SD card.
The N64 has only 4MB of RAM. Mario 64 alone is 8MB, so they wouldn't load the entire game onto the RAM. And Mario 64 is one of the smaller cart sizes, the biggest was 64MB.
So, actually, the games are pretty much constantly loading from the cart. Carts are just so fast that it's like transferring from area of RAM to another.
@@SpeedfreakUK He means the on-board memory of the EverDrive. The cart has 64MB of SRAM to store the currently selected ROM.
Lord Mordington ah gotcha
Pretty sure the reason flash carts aren't typically accepted for speedrunning is because it's hard to check if the runner is running a hacked rom.
Correct
If anyone is watching this now, Super Mario 64 source code got leaked. So now we can have actual confirmation about the compilation discussion. Hopefully MattKC covers this.
oof I wish. Unfortunately the code is still not available as the leaked source only contains a very small portion of it, the rest is just precompiled code in the form of object (.o) files used to assemble the game. Also a lot of files are zeroed.
@@enigmatico6209 We do pretty much have the source code though. Look up the SM64 decompilation project. It's a project to decompile SM64 into fully readable C. It recompiles into exactly the same ROM as the release ROM. I know that's not the exact same thing, and some structure, comments, ambiguous code, constants, etc are impossible to reverse to the actual original version. But given that it was a compiler from 25 years ago with optimizations not fully enabled, it's pretty damn close. Also almost (or all by now?) all of it is now commented, variable names have been brought back, structure, etc. So while we may not have *the* source code to SM64, we have source code for SM64.
@@lost4468yt I know about the decompilation project (wasn't it also mentioned in this video?). But unfortunately it's also not the original source code.
@@enigmatico6209 Sure but it has come a long way since this video. It's very documented now. If you just go and look through the git you'd swear it's the original source code.
Yeah as I said it's not *the* source code. But it's functionally, and even practically, equivalent now. The only thing you could really hope to glean from the original source code vs the decompiled one would be the comments, variable names, and maybe a slight change in some of the architectural choices in the code. But other than those you wouldn't discover anything else at all from the actual source code vs the decompiled source code.
@@lost4468yt No amount of work on the decompilation project will give the answer to why the optimizations were left out. The only thing we could hope for is either one of their developers saying something (unlikely) or a leak of some documents from back then discussing the matter.
My jaw dropped when I figured out that somebody decompiled the assembly code back into c. I have trouble just reading the assembly code.
i had trouble reading the c code until I actually learned c (partially from the decomp itself)
asm scares me
But my guy, it goes one step further. Assembly gets assembled into machine code, which is even more bare than assembly. So they were doing both the disassembly and decompilation.
The amazing thing is: it's MIPS64 assembly. As with all assembly languages you need to have intimate knowledge of the hardware in order to understand how to read it. So not only is this guy really good with the MIPS architecture (which isn't a very common architecture), he had enough time and resources to go through the entire sm64 machine code and decompile it
It was translated from machine code, which is usually represented in hex. It is the literal byte contents of the ROM. Assembly language is actually represented in key words and instructions, like C, just at a much lower level of detail. Assembly language has to be compiled just like any other language.
Translating assembly language to C is pretty hard but most people could do it. Translating hex code to C is borderline impossible. Especially C code that compiles down exactly to the same ROM. That is nothing short of amazing.
Machine code and assembly are a 1:1 transform. Yes, machine code is nigh impossible for humans to read, but translating it back to human-approachable assembly is trivial (for a computer).
They're just different representations of the same information.
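To make the 1:1 mapping concrete, here's a decoder for one real MIPS word, 0x27BDFFE8, which any disassembler renders as addiu $sp, $sp, -24 (a typical function prologue); the field layout comes straight from the MIPS I-type instruction format:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
        uint32_t insn = 0x27BDFFE8;              /* addiu $sp, $sp, -24 */
        uint32_t op   = insn >> 26;              /* opcode 9  = addiu   */
        uint32_t rs   = (insn >> 21) & 0x1F;     /* register 29 = $sp   */
        uint32_t rt   = (insn >> 16) & 0x1F;     /* register 29 = $sp   */
        int16_t  imm  = (int16_t)(insn & 0xFFFF);/* immediate  = -24    */
        printf("op=%" PRIu32 " rs=%" PRIu32 " rt=%" PRIu32 " imm=%d\n",
               op, rs, rt, imm);
        return 0;
    }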
6:34 As you said, transparency is hard; transparency in 3D is even harder. With transparency you still have to render it and everything behind it, unlike with an opaque object. The N64 was fill-rate limited, so this was costly. Also, the order you rendered in mattered, because transparency blending has to happen in a specific order, so you could not use only the z-buffer. There are many tricks you can do to speed this up, but it depends on the use case.
If you only have one layer of transparency, it gets considerably easier, though. First render all opaque objects, then render the transparent object on top. I don't know SM64 very well but I assume that's what they did.
@@IkeFoxbrush They likely tried this back then. A lot of ideas for optimization were still new back then. Hell, Z-fighting and face culling were still a discussion back then. Sorting your renderer before pushing data out to the GPU was a lot more expensive back then. The extra step for transparent objects would need more attention than a simple priority queue that checked Delta-Z from camera.
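The approach being discussed, as a generic two-pass sketch; the draw call is a stub, since a real renderer would be submitting commands to the RDP/GPU:

    #include <stdlib.h>

    typedef struct {
        float dist;           /* distance from camera */
        int   transparent;
    } Surface;

    static void draw(const Surface *s) { (void)s; }  /* stub draw call */

    static int far_to_near(const void *a, const void *b) {
        float da = ((const Surface *)a)->dist;
        float db = ((const Surface *)b)->dist;
        return (da < db) - (da > db);   /* sort descending: farthest first */
    }

    void render(Surface *s, size_t n) {
        /* pass 1: opaque surfaces in any order; the z-buffer sorts them */
        for (size_t i = 0; i < n; i++)
            if (!s[i].transparent)
                draw(&s[i]);

        /* pass 2: transparent surfaces back-to-front, because blending
           against the framebuffer is order-dependent */
        qsort(s, n, sizeof *s, far_to_near);
        for (size_t i = 0; i < n; i++)
            if (s[i].transparent)
                draw(&s[i]);
    }

    int main(void) {
        Surface scene[] = { {10.0f, 0}, {5.0f, 1}, {8.0f, 1} };
        render(scene, 3);
        return 0;
    }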
It's important to keep in mind as well that the N64 at the time was cutting edge exotic hardware that worked very differently than developers were used to, and had a lot of features none had ever used before. Also, since it was a launch title there was no development kit, as the exact specifications were still being worked on until the last moment.
So during the development of Mario 64 there was a lot of guesswork and trial and error involved. A lot of optimization was actually done by hand so that any bugs or logic breaks could be caught immediately. Not to mention most of the O2 optimization is actually enabled in the initial release. It's only the ones most likely to break game code that were not.
Not to mention the N64 even now is very troublesome to work with, from its unified RAM that is super fast but also super laggy, to its tiny texture cache. Even now emulating that weird hardware is still very tricky.
This blows my mind, I always found it weird the sub in DDD lagged so bad, I figured it was just something like the sub being some last minute addition to the game thrown in via some unconventional method. I would have never guessed it was because of a compiling oversight or poor decision.
Realistically these aren't related at all; it lags because Nintendo chose to build the map like that and just deemed the frame rate drop acceptable.
A funny story overall though
@@tiernanmccarthy well yes, but still, if they had compiled it correctly it wouldn't lag.
So they didn't do that right anyway
@@tiernanmccarthy They might've done testing with compiler optimisations, noticed it ran fine and assumed it would be releasing with the optimisations
@@r033cx Enabling compiler optimization would increase the likelihood that an out-of-bounds glitch would cause memory clobbers in ways that might corrupt the contents of save game states or have other undesirable consequences. Compiler writers take the attitude that when the Standard doesn't specify how implementations should process non-portable code or code that receives erroneous input, all possible behaviors should be regarded as equally useful. In the C language invented by Dennis Ritchie, many actions are characterized as having "machine-dependent" behavior, and code which is written to exploit that in cases where the Standard imposes no requirements but the target platform is known can be more efficient than would be possible with code that was written to be "optimization proof".
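A toy version of that out-of-bounds "memory clobber" point; what actually gets corrupted is compiler- and platform-specific, which is exactly the problem:

    #include <stdio.h>

    int main(void) {
        int health   = 8;
        int coins[4] = {0, 0, 0, 0};

        coins[4] = 99;   /* off-by-one write past the array: undefined behavior */

        /* At -O0, locals sit in fixed stack slots, so the stray write lands
           somewhere consistent and the bug can look harmless. At -O2,
           variables live in registers or get reordered, so the same bug
           can clobber something else entirely. */
        printf("health = %d\n", health);
        return 0;
    }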
@@ShadowGaro Or they had no time to do it, because it had to be released at the last minute.
I'll be honest, I never really thought about it too much. When I played SM64, I was too amazed by how nice it looked to notice if it slowed down too much in certain areas, like most of the water levels. Even when I did notice, I thought that was a design choice for the level itself, since the speed of gameplay seems to match the speed of the music. Of course, this is coming from a non-speedrunner's point of view.
I think this is something you only realize after years of technological improvement in video games. I think at one point this was natural for everyone.
Hey, that's a good point. It probably wouldn't have been noticed back in the modding scene of the past.
@@BeretBay Yeah, it's the same with resolutions, back in the day they looked perfectly fine.. But as we've gotten older and can see higher resolutions and frame rates we now see how awful the games looked in their native resolutions and frame rates..
It's not just slowdown, it's actually slow motion, which affects the movement and control of the game - a very cool, different look and feel compared to standard gameplay. I've always enjoyed it.
In all honesty I thought the slowdown was intentional, I was only able to notice it on water levels and just thought "you can't swim as fast as you walk, so you move slower in-game to reflect that". It was only recently that I played an emulator and noticed the framerate drop.
If the PAL Version has these Compiler Optimizations, maybe the Japanese Shindou Edition has them too since that came after the PAL release. Did someone check on that?
Any SM64 version released after US uses -O2
The code says explicitly: if its (the code's, essentially) version is 'eu' then set the optimization flag to -O2, else go with -g. So no, I think PAL is the one and only that uses it.
You're wrong. The decomp doesn't support Shindou
69 likes, make a wish!
Keep in mind this -O2 "if" must've been added to the decompilation makefile based on observations of the PAL binary being different and optimized, since an NTSC binary wouldn't have any traces left of unapplied flags from the makefile
Me: I don't remember slow down in the Pal version I played...
Also Me: Oh yeah.... 25fps
Lol
25*
yeah it's 25
Your comment about SD card speeds is irrelevant, since the game is loaded onto the EverDrive's RAM, which will act just like a real cart. It's not streaming from the SD card throughout the game.
Exactly. The main reason to frown upon custom ROMs is it's hard to verify they're indeed identical, and that some smart ass didn't just load a patched ROM into his flashcart. 6 months from now, the reverse-engineering team might find and fix more bugs, they might be using a different compiler version that is better at optimizing.
@@dkosmari what the fuck are you talking about
@@Porygonal64 What language do you speak, if not English?
@@Porygonal64 I legitimately had to laugh when I read your comment.
Ever heard of checksums?
"A very uneducated without any substantial evidence" theory is also Game Theory.
How much better would the world be if Matthew Patrick never "graced" us with his presence on YouTube?
I wouldn’t say they are all like that. Some of them are yes, but he’s made some good videos. And the main purpose is just to be entertaining, and some are very believable and evidenced.
I imagine the bigger reason flash carts would be banned in speedruns, besides faster load speed, is because you literally could just tweak the source code for the game and recompile to add in some subtle cheats in your favor. It would be like playing a speedrun with an action replay attached; a bit suspicious.
That's why you have to manufacture carts that can't be rewritten to (if possible) with the proper ROM loaded on; a MD5 hash could be used to ensure a valid cart/game.
@Lassi Kinnunen I dont think checking if the cart is valid was ever a problem in the speed running community. To just give it a different category just splits things even more
Make the game not run w/o internal checksum, hash of which is shown on screen at boot.
@@verve1858 Then a cheater makes the game display the expected hash as a fixed message instead, and can change anything else without being noticed
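The verification idea in sketch form; a toy FNV-1a hash stands in for MD5 here, and as the reply above points out, the comparison only proves anything if it's performed by something the runner can't modify:

    #include <stdint.h>
    #include <stdio.h>

    /* toy stand-in for a real digest like MD5/SHA-1 */
    static uint32_t rom_hash(const uint8_t *rom, size_t len) {
        uint32_t h = 2166136261u;        /* FNV-1a offset basis */
        for (size_t i = 0; i < len; i++) {
            h ^= rom[i];
            h *= 16777619u;              /* FNV-1a prime */
        }
        return h;
    }

    int main(void) {
        /* first four bytes of a big-endian (.z64) N64 ROM header */
        uint8_t rom[] = { 0x80, 0x37, 0x12, 0x40 };
        /* for the demo, the "known-good" digest is computed in place; a
           real verifier would compare against a database of retail dumps */
        uint32_t expected = rom_hash(rom, sizeof rom);
        if (rom_hash(rom, sizeof rom) == expected)
            puts("ROM matches the known-good dump");
        return 0;
    }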
There is no loading benefit, as the game is loaded to RAM beforehand, and it cannot load any faster: the games also weren't coded to do anything during loading but wait the exact programmed amount of time, by which point the data was ready.
To answer the question of "why didn't Nintendo use optimizations."
The guys responsible for reverse engineering SM64 fixed a number of bugs in the code. When you write incorrect (the technical term is Undefined Behavior, or UB for short) code, it might appear to work correctly by sheer accident; when compiled in debug mode, the compiler adds a bunch of extra "safety" code (such as zero-initializing every variable, flushing every variable update back to memory), precisely to help the programmer track down bugs. When optimizing UB code, disastrous results can happen; and the more you optimize UB code, the worse the result. Nintendo likely just rushed the game out to meet the Japanese deadline, and couldn't afford to fix all the problems, that no doubt the compiler was screaming at them with endless warnings about. The debug mode was enough to shield the final code from all the coding errors, so that was the chosen quick fix.
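A tiny example of the kind of bug that debug-style code generation can paper over; whether a given debug build actually zero-fills stack slots is compiler-specific, so treat this as illustrative:

    #include <stdio.h>

    int scaled(int x) {
        int result;        /* uninitialized when x <= 0: reading it is UB */
        if (x > 0)
            result = x * 2;
        return result;
    }

    int main(void) {
        /* Unoptimized builds often hand back a fresh (frequently zeroed)
           stack slot, so the bug hides. At -O2 the value may come from
           whatever register is handy, or the optimizer may assume the
           x <= 0 path can't happen at all. */
        printf("%d\n", scaled(-5));
        return 0;
    }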
Unlikely. The Spaceworld 1995 build is also unoptimized, with the DDD lag as the indicator. It's been there from the start.
@@Mobius14 What's unlikely? The guy leading the RE effort explicitly mentioned on the Discord the various UB bugs they fixed. As Kaze posted in another comment, there's more that creates lag than the compiler optimizations: mainly, the inefficient use of the graphics hardware.
Interesting you can hardly blame them either game was still a masterpiece
@@dkosmari You said "The debug mode was enough to shield the final code from all the coding errors, so that was the chosen quick fix," which implies they turned it on last minute. It had been enabled since Spaceworld 1995.
@@Mobius14 Debug mode is used throughout development, because every time you change the code there's a chance of introducing bugs, or revealing old ones that went unnoticed. Notoriously, the audio code isn't built in debug mode, possibly because it was developed separately and was a more mature code base; it got built separately without debug flags and was just linked into the game, which still wasn't working properly without debug.
I didn't mean to imply they were one day from shipping the game, and somebody said "let's try enabling debug flags", and it suddenly ran without crashing. Software in general is developed in debug mode as the norm. It's just that they relied too much on the extra safety created by debug code, and couldn't afford to fix the code in time for launch.
What about the Virtual Console version of this game, since that doesn't have slowdown? How does it compare to O2 on the N64?
The Wii VC ROM is identical, bit for bit.
Virtual Console just runs better than original n64 I assume
@@Wyatt_James it isn't
ZEROTWOOOOOO oh wait wrong video. hehe, sorry
@@Wyatt_James ...Kinda. There's a change the VC makes to the ROM after loading it, and some of the emulation is faulty.
I always wondered why I don't have any memories of DDD being as laggy on PAL (which I played as a kid) as it is on the NTSC version I run on emulators nowadays.
BTW, the reason why this wouldn't be allowed in a speedrun is simply because it isn't official hardware or software.
9:23
"All I can say is google"
shows DuckDuckGo
Hehehe
Well, Google is so ubiquitous for searching the internet that it has pervaded colloquial language to just mean "searching the internet". Thus it's completely valid to say "I'm gonna google that" and then use some other search engine.
Says... C
(Shows Qt C++)
Hardly the C89 that SM64 uses.
@@kenziemac130 that was so frustrating to watch haha
@@janikarkkainen3904 Isn't that a problem for Google, because then their name is no longer trademark protected or something like that?
@@osparav I feel like he could have just shown the reverse engineered source code for SM64... For some reason he went out of his way to show the incorrect language.
Also, the makefile you showed being changed was not from Nintendo; that conditional was put in there by the decomp project so that builds of a given version (EU or otherwise) would result in a ROM identical to the one found on the corresponding retail cartridge.
I swore the title said “Nintendo’s big mistake that made Super Mario Super Show”. I just woke up.
Eat your arms and then again
Do the mario
I am god and then you know
Eat the dario
Do do do do do
Just like that
Joker DIC.
"Hey paisanos! It's the Super Mario Bros. Super Show!"
MISTAKES WERE MADE
pasta power
Talks about C, shows C++ code and docs.
nice.
I c..
also not a single & pointer
@@gnaurai6251 & is a reference or address.
@@pow9606 & declares a reference when used in a declaration, and is the address-of operator (it gets you a pointer) when used on a variable.
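In C (which is what SM64 is written in), & on a variable only ever means address-of:

    void example(void) {
        int x = 5;
        int *p = &x;  /* address-of operator: p now points to x */
        *p = 6;       /* write through the pointer; x is now 6 */
        /* A declaration like "int &r = x;" (a reference) exists only in
           C++, which is another giveaway that the code shown on screen
           wasn't C. */
    }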
And it's Qt
From what I've heard somewhat consistently, the reason the O2 optimization wasn't used was that the flag was so early in its development; they didn't know how stable it was and decided not to use it. Though I'm still really new to the sm64 decomp scene, so I could be 100% wrong in literally every aspect
Working in software development, I can say that flags get missed or mis-set pretty frequently due to human error. No idea if that is the case with Mario 64, just saying that it isn't as unlikely as it sounds. Also, I find it impressive that the majority of the game works very well without optimizations.
1:14
Shoutouts to Simpleflips
oh my aching tentacles
Brush your teeth bitch
The fucking moment I see a mario pop outta the pipe and he's in a fucking wheelchair I'm gonna fart my soul alright?
As MVG said, most of the libraries that SM64 uses have O2 and even O3 optimization; the rest is just a small part of the code, which Nintendo probably opted to keep unoptimized because SM64 was a launch game, the developer tools back then still had a few bugs (some of them in the optimizers), and they wanted to be safe. If you disable O2 and O3 in the libraries, then you'll see how the game would actually run (very badly)
Early N64 compilers sometimes introduced bugs relating to math on the 3D geometry when using O2 (collision detection was the main victim), which was later addressed, but Mario 64 was a launch title and did not benefit from those fixes. O2 also suppressed warnings about potential issues; this made debugging very difficult, and it's why the Super Mario 64 devs would not have used it.
later games were made with a new compiler, likely for Windows
iirc sm64 was a launch title for the n64, and the developer tools could also have been not mature enough for even -O2 or -O1 to optimise correctly
they probably picked -g to produce predictable unoptimised output to make sure the game doesn’t crash on actual user copies initially
i also wonder how the updated sm64 version (the Shindou edition released in Japan) compares in terms of performance
this is the correct answer
Yeah my understanding is that they were using a shitty broken version of gcc where O2 didn't work.
This is actually what's true: the N64 development kit was still not complete at the time of Super Mario 64's release, so the developers were wary of doing any hardcore optimizations.
4:16 "O2 is considered quite safe." I've actually managed to write a piece of code that would behave wrongly when compiled with the -O2 flag just a short while back. Granted, I was compiling for a somewhat more exotic piece of hardware (the ESP8266), but still. I had to downgrade to -O1 because of that. :(
Perfectly spec-faithful code or something UB related?
@@LightTheMars @Sullivan I was making an LED blink and used something along the lines of "brightness = (millis() % 1024) < 512 ? 0 : 1023" and it didn't work (nothing UB about it as far as I can tell). Changing the 0 to some greater number would magically make it work, but I couldn't go below (I believe) 13. I can't remember too many details but changing the ternary operator to an if statement didn't do anything. Going down to -O1 was the only thing that I could do to make it work. I didn't look at the compiled assembly to find out what the issue was, though it might have been interesting if I had the time to do it.
Knowing nothing whatsoever about how brightness is used, @@ThePC007, I'd guess loop unrolling.
@@MrOmegatronic brightness in this case is just a variable that I use later on to set the duty cycle of a PWM output. There weren't really any loops that could get unrolled.
Hmm... not sure what could've caused it, then, @@ThePC007. That's the only thing that comes to mind, unless it's a case of extremely subtle UB that the compiler assumes isn't there, or a slight bug in the compiler, or something. Strange.
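For anyone following along, the setup being described boils down to roughly this (an Arduino-style reconstruction of the snippet above; the pin number is a placeholder, and the ESP8266 core's analogWrite range is 0-1023 by default):

    #define LED_PIN 2  /* placeholder pin number */

    int brightness = 0;

    void loop(void) {
        /* off for ~half a second, full brightness for ~half a second */
        brightness = (millis() % 1024) < 512 ? 0 : 1023;
        analogWrite(LED_PIN, brightness);  /* sets the PWM duty cycle */
    }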
according to MVG it was likely due to the tools being in a very early version and therefore too unstable / a pain to work with, so they just left the default debug settings since those were at least working
and they didn't have time to test whether the bugs in the compiler actually affected the game or not, so they just went with -g to be safe
i expected you to compile the game in O3
This video was really interesting, this game is one of my all time favorites. I still have my original save file. It's wild to see people learning new things about the game so many years later.
"Up until then their games had been written in Assembly".... oh..... that's.. uhm.... time consuming I guess...
This comment is amazing. Just sayin
In that era it was pretty common for all sorts of software. Processor speeds were very limited and graphical hardware provided relatively little assistance (esp on IBM compatible PCs, but some other computers were better in this regard).
In any case when you're up against a lot of hardware limitations you have to be very efficient with your code.
Also, if you go back to the early 80s I'm not sure how efficient compilers were. I know they had optimizations but I have no idea how they compare to today.
I remember PC games in the 80s that didn't even run under DOS. They would just boot right off the floppy directly. If you weren't trying to read DOS filesystems it wasn't like that OS did much for a game anyway, other than allow it to be run from a hard drive. Actually, I think even MS Flight simulator booted straight from floppy without DOS in that era.
@@RichFreeman Hell, PS1 games used assembly too, even though they were mostly written in C.
ASM in that era wasn't really that time consuming. C-code is almost the same in terms of complexity, and back then a C-Compiler could run for days if not months over your entire project whereas an Assembler would be much, much faster.
It's the only way to fly, lad. Well, on classical processors. Might be fairly pointless on modern processors.
Now I understand why I don't remember any noticeable slowdown in my Mario 64. The whole game was always slowed down, but optimized.
Maybe they mean that they played the PAL version: slower overall, but fewer frame dips?
My thought would be that it wasn't optimized because the game was being developed at the same time as the hardware, and optimizing was a secondary concern, much like how Fire Emblem Awakening lacked feet because they didn't know what the final device would look like.
Especially if some revision along the lines had already caused issues.
This is one of the best explanations I've had for the nuances of compilation. I had a vague understanding of the process of code compilation but never fully enough to really see the whole picture.
03:16
So they were able to recreate the source code?
That blows my mind!
I had a couple of friends who worked at different game companies during the 90s (Rare, and Interactive Studios), both working on N64. From memory, they told me that despite its on-paper specs the N64 was terrible at anything that didn't cull or couldn't cull, and that this would massively slow down the console. This can be seen with anything that required redrawing the entire screen, e.g. full-screen water or transparencies, or drawing large objects over the world. The friend at Rare had no PS1 experience to compare against, but the other friend did. It might have been related to the anti-aliasing mode, which was not well optimised for objects over objects. The dithered overlay shown at 6:32 would not have suffered this, as it's just a 1-bit overlay; it's also not perspective correct, or filtering a texture onto that (likely 2D) polygon. So it would inherently be a faster option.
Good video though, really interesting to hear how little quirks like this fell through the cracks. The same friends used to compare stories of Nintendo QA, which for the time was very, very picky, but obviously didn't extend to looking at the code and understanding the compiler.
Very well done video! I love this kind of technical stuff. Cracking open games is kinda my thing. There were some cool secrets to be found inside the code of Aladdin for the Sega Genesis, too.
I misread the title and I thought this video was about a glitch in SM64 that created the Super Mario Bros. Super Show
My guess: they didn't have time to test whether the optimized version had any errors, so they decided to play it safe and release the unoptimized version, which had been tested, because it would have been a disaster if there were game-ruining errors in the optimized version.
I read this as "Super Mario 64 Super Show" at first.
If only...
10 mins and 13 secs compressed into a single sentence: Nintendo released a debug build.
Thanks, this guy doesn't know what he's talking about anyway
Nintendo might have released a debug build*
Or they made a release build but chose not to use O2 because of UB in their code.
@@Guztav1337 I don't think O0 would be that bad. The build almost certainly contained debug assertions & other binary bloating.
I love the decompilation too! I used it to create my Detective Luigi hack, which runs super well on everdrive! Good video explaining all this.
7:11 So a regular Game Theory then.
Fun fact about the Mega Man X4 clip you showed at 6:32- the dithering present there is exclusive to the Sega Saturn version of the game. In the PS1 version, this part and other parts in the game like it use proper transparency instead!
Now that Nintendo has the 35th-anniversary Mario 3D All-Stars, I'm wondering if it's going to run well and if it's optimized better for the Switch. We know the graphics look way better; it keeps the lower polygon count but at a higher resolution.
It's true that -O2 has been generally considered pretty safe BUT that's only if you're on a *well-supported platform*. For custom CPUs or weird variant chips it can often blow up in your face.
Also code that accidentally depends on undefined behavior can work fine without optimization but utterly fall down with optimization turned on.
Finally, debugging optimized code is much more of a pain, and unless you really need the code to be optimized, shipping the same version that you debug with can have its benefits.
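A classic made-up example of that second point: signed overflow is UB, and -O2 exploits it.

    /* Unoptimized, the loop ends once i wraps past INT_MAX; at -O2 the
       compiler may assume "i + 1 > i" always holds (signed overflow is
       UB), fold the test to true, and never exit the loop. */
    int spins(int start) {
        int i, n = 0;
        for (i = start; i + 1 > i; i++)
            n++;
        return n;
    }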
>mario 64 will be 30 years old in 6 years
>it will be a quarter of a century old next year
oh
Ok boomer
Decompiling the code like that is a fucking godlike achievement. I cannot convey my level of absolute fucking mad respect for the people that did that. Madlads, truly
Speed runners: Damn this lag!
Me: This game is fun!
1:24 “for years, if not decades” Christ, I felt that
3:34 so uh, about that
I heard in another video that it wasn't optimized because the compiler wasn't mature enough yet and even -O2 was broken. They even went back and tried the old compiler version on this same source code, and it actually did crash somewhere.
1:50 Hehe that's not C
Ikr
And now Kaze Emanuar has rewritten SM64's code to run a smooth 60fps at all times.
"Almost as if nintendo leaked the source code themselves"
Wow this guys a genius, why didnt we think of that
2020 and still there's new content about this game
Impressive...
Now I'm wondering what would happen if you compiled it using O3... Would it run? Would it even compile?
Try it out! There are instructions for how to compile it if you look.
Make a video and comment back for notifications bump
I would guess it would. Usually O3 does things like unrolling loops and changing floating point instructions. Probably would still work just fine.
@@zabotheother423 floating point optimization can cause weird "glitches", like the differences already seen between Virtual Console and real hardware (subtle errors where removing a few instructions increases or decreases the math accuracy, leading to values drifting over a long time)
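A tiny illustration of the kind of floating-point difference meant here (made up, not from SM64): under aggressive optimization (e.g. contraction at higher -O levels), an expression like a * b + c may compile to a single fused multiply-add, which rounds once instead of twice, so optimized and unoptimized builds can disagree in the last bits.

    double mad(double a, double b, double c) {
        return a * b + c;  /* may become one FMA instruction when optimized */
    }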
@@kneesnap1041 It didn't compile with -O3. The compiler says "wrong file format" right at the end. Tried with -O2 and it worked fine, so it's not my setup that's making it fail.
Ah, some of the world's most popular programming languages: Sea, Sea++, and Sharp Sea.
I can't even imagine how a decompiler is even coded
listen closely to the part where he said it was decompiled
they did it by hand
@@sodiboo that sounds even worse.
It was fun, smart, and cool that Rare put easter eggs in their games. Diddy Kong Racing introduced Banjo-Kazooie and Conker, and Banjo-Kazooie teased Conker with an image inside Gruntilda's ship in "Rusty Bucket Bay".
I remember one time when trying to do this glitch Mario’s head model got super stretched out but his eyes stayed the same size, it’s the stuff of nightmares
I remember reading something like this when looking into the Large Address Aware patch for Skyrim on PC. My memory of this is fuzzy, but it was something about someone decompiling the exe and finding that Bethesda didn't use some of those optimization flags for the PC version of the game. It's apparently a direct port of the console versions. It also put a copy of the textures in main RAM... for... reasons, I guess... Seems they made the game on PC, converted it to Xbox 360/PS3, then converted THAT version back to PC without re-enabling the flags that you can't use on consoles. The guy couldn't post any proof though, as he would have to upload a modified version of the game... Interesting anyway.
Why is it that I don't remember anything about this submarine, this feels new
Me too. What is this level I never bothered to find?
It's mandatory to beat the game though, maybe your older brother/sister beat it for you? lol
@@taxtapyinc.2921 I've literally never seen this level but I know I've beaten the game o-o
@@azazellon you literally need to beat this specific star in order to beat the game casually. Unless you were a speedrunner god when you were a child you had to play this stage in order to beat the game.
@@diegomedina9637
Maybe he had some weird version of the game? I had a Mario Kart DS cartridge which was straight up missing a stage. I know it might be extremely unlikely and rare
My guess would be: since building with optimizations takes longer than regular builds (which are used during development), and they were running on a tight schedule, it was just too big of a risk to enable optimizations relatively close to the release.
This was a really cool video, MattKC-san. Thank you. I always felt that parts of the game were a bit slow, but I didn't realize it was a bottleneck issue. O_O
"I'm not even gonna get into what Super Mario 64 is, everyone knows what Super Mario 64 is" and if you don't, it's Super Mario on the Nintendo 64
This was really informative. Nice.
As someone who’s been in the game industry for 26 years for me the answer is pretty obvious: they had to f***ing ship the NTSC version and had no time to test O2 thoroughly. When you’re shipping on cartridges you can’t take any chance of a game being buggy or crashing, however slim, especially when you’re Nintendo and it’s your flagship title for a new system. The PAL version comes out later, more time for testing, and more of an incentive to do so due to it being indeed naturally slower due to 50hz.
1:57
talks about c
shows c++ website
I like how RE videos like this generate so many truthful, insightful comments! Thanks for sharing/showing off your knowledge!
The simplest and most likely explanation: they made a mistake with the flag. Could be a typo, could be just forgetting to turn it on. And even the PAL version having it on is easily explained by them releasing that one nearly a year later (June 23 1996 Japan, September 29 1996 USA, March 1 1997 EU). Since for PAL they needed it to run slower to match their N64s and TVs, they happened across the mistake that was missed even when translating for USA. By then, being a physical release, it would be easier to fix it for PAL and ignore it for other ones than to recall it or release new copies and tell people the mistake.
Yeah this is absolutely something that could fall through the cracks when putting together a submission
It’s not uncommon to purposefully set the compiler at -O0 for production code, especially in threaded programs written by lots of people. A missing “volatile” keyword somewhere can make some optimizations unsafe.
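A minimal sketch of that missing-volatile hazard (illustrative only):

    #include <signal.h>

    volatile sig_atomic_t done = 0;  /* drop "volatile" and -O2 may break this */

    void on_signal(int sig) {
        (void)sig;
        done = 1;  /* set asynchronously, e.g. from a signal handler */
    }

    void wait_for_done(void) {
        while (!done) {
            /* volatile forces a fresh read of done on every pass; without
               it, the optimizer may hoist the load and spin forever */
        }
    }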
Now I know why I never had those huge slowdowns in JRB. I always had a PAL copy.
Well, your entire game was slowed down
I'd love a follow-up with some more in-depth tests. There is a GameShark code for a lag counter. There's also one to "deblur" SM64, removing the AA filter that causes a lot of lag. Then there is overclocking the N64. I'm wondering if a combination of the recompiled version, the removal of AA, and an overclocked N64 could eliminate slowdown entirely. I know removing AA helps the framerate quite a bit, but no one has tested overclocking the N64.
Thank you, this was incredibly interesting!
The PAL version actually runs smoothly. I guess they had time to realize O2 was not harmful.
Alternate title: "The Importance of Understanding your Toolchain (ft. Nintendo)"
with a buggy compiler not even directly designed for the Nintendo 64 hardware (called Ultra 64 at the time), which had a chance of introducing bugs just because you enabled optimizations
With appendices, "The Importance of Your Toolchain Being Tested on Your Target Platform", and "The Importance of Your Target Platform Existing Long Enough to Have a Fully Functional Toolchain". ;3
Ooooh. I thought the title said "super show" so I was wondering when this submarine talk was going to translate into the super Mario bros super show beginnings.
This comment is just to help this video get big, cause it's interesting.
Phew, for a moment I thought "wait.. I don't remember it dropping frames that hard", until the section on the EU/PAL version came XD
"Speedrunning edition" loool
I was about to say "N64 didn't use C, PSX was the first" but after looking, it appears C was used a lot on N64, ad-hoc, as C usually is I suppose. There wasn't an official C kit, unlike the Playstation dev kit.
link for the new optimized mod, please?
Try Kaze Emanuar's 60fps romhack
Would it be possible to compile the decompiled version with a C compiler from before the game came out? That way we could see whether the compiler back then was to blame for producing buggy code.
Ugh. Two things. One, I wish you had talked about the sub in DDD having dynamic collision as part of the reason it was so slow, but I guess you'd need to be more familiar with the codebase, or to have talked to someone on the decomp project, to have known that. Two, the Everdrive isn't reading directly from the SD card while you are playing; when you start a game, it loads the ROM onto a piece of dedicated memory on the cart that has at least the same access time as an original cartridge ROM chip. Cart ROM is accessed in such a way that it is always read at the same speed, usually measured in a number of CPU cycles.
You're absolutely right!
Now I'm wondering how we could enhance the Nintendo DS port of Super Mario 64.
Did the GCC compiler even have -O2 back in 1995? It could be that the optimization engine did not exist, or if it did, it was not stable. Remember, it would have needed to have been developed for the N64's MIPS core. I know a lot of the stuff in those branches wasn't introduced until 1999.
Thanks for showing us the decompilation project! I always dreamed about seeing source code of n64 games! SO AWESOME!!!
1:30 this is one of the most impressive feats of programming grit and toughness I have ever seen done.
The guys responsible for the "perfect" copy of SM64, truly my hat is off.
Have a golden star ;) Oh that's right you got all 120 already!
While I know it's not the exact same thing, there is a world boss in WoW that, at one point, was causing massive frame lag and latency issues when you got within (roughly) 50 yards of her and it was made even worse if you were in a raid group. Yep, even if you weren't in a raid and approached the area, the entire game would lag in both latency and frames, causing major issues with trying to take her down (and that doesn't even begin to touch on how OP she was). After a couple months, suddenly everything was ok with no major latency or lag and I wonder if Blizz made an oops compiling the code, but had to "ship it out" anyway to get the patch out on time.
It could be different for the Nintendo 64, since they had one specific hardware target, but typically speaking, C compiles to assembly, not directly to machine code.
Otherwise great video, super interesting!
6:31 - Just like Super Mario World's water. At least, that's what a *_Beta64_* video told me.