I'm not familiar with this, but looking at the code of the VS Code extension, it seems like all platforms use the Core-V Toolchain for RISC-V, aside from Linux on arm64. When it comes to the compiler, the ARM cores seem to be using the arm-gnu-toolchain version 13.2.rel1, which is based on GCC 13.2. For RISC-V, it's not so clear, but I believe it should be based on GCC version 13.2 for Windows, and 13.3 for other OSes. There are no open issues or pending commits regarding this on the repo at this point.
8:54 It always takes time for software updates for new hardware to reach everything downstream, be it Linux distributions or VSCode extensions, etc.
I am surprised that if you rebuild the compiler and load the latest SDKs, the VS Code extension does not pick these up. Definitely a big shame; let's hope the extension gets updates very soon.
@@GaryExplains Thanks for the explanation, let's hope for an update soon. A detailed video on building and using the latest tool chain would be appreciated.
Gary, it's ALWAYS been all about the quality of the compiler. If you've been around for so long, how is this a revelation? For example, historically Intel's proprietary compiler was for a long time outclassing gcc on x86 on Linux, though things have likely changed since I last checked. Likewise, LLVM-based compilers will also produce code that performs differently from GCC's, and then there are compilers like Zig built on top of LLVM.
It isn't a revelation that the compiler is key, it is a revelation that different tools in the same ecosystem can be so radically different. The compiler in vs code should be the same as the compiler in the command line environment.
@@GaryExplains Not sure how things are in windows land, but on Linux, there is no "compiler in vscode", as you mentioned yourself in the video, the vscode plugin is just there to make life easier by downloading the required toolchain, you could still run that very same compiler toolchain from the command line. It all comes down to the toolchain and the specific compiler used in each toolchain.
@mksln I don't want to be contrary, but when you use the Pico extension with VS Code on Linux, the extension downloads the SDK and toolchain into a private folder; it doesn't use the compiler installed at the system level. So, on Linux there is a compiler in VS Code in this context.
Did you use the same compiler options in both situations? By default VSCode does a debug build with optimisation disabled, but if you're building from the command line it'll default to a release build.
You can change the build type to Release by running “cmake build -DCMAKE_BUILD_TYPE=Release” from the terminal in VSCode - we will change the extension to default to release builds in future
@@GaryExplains Luke's right about the current compiler for RISC-V, but the debug options will affect both RISC-V and ARM significantly. We are currently only using the upstream Windows build, but will build our own version soon (we didn't do it originally because it takes so long on GitHub Actions to build!)
@gordonhollingworth5281 Thanks for the info. I tested the debug/release flags when using VS Code and it does indeed make a difference (as we would expect). VS Code is building debug binaries by default. But the RISC-V Whetstone test went from 1.9 to 5.6 without the new compiler and to 25.0 with the new RISC-V compiler. PS. I am about to email you, it would be good if we could chat about some of these things offline. I would greatly appreciate if you could reply. Thx.
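Since the debug/release default came up above, one low-effort check is to have the firmware report how it was compiled. This is only a minimal sketch, assuming a GCC- or Clang-based toolchain (both predefine `__OPTIMIZE__` when optimisation is enabled); `build_is_optimized`, `build_has_ndebug` and `print_build_info` are hypothetical helper names, not part of the Pico SDK.

```c
#include <stdio.h>

/* Returns 1 when the compiler was invoked with optimisation enabled.
 * __OPTIMIZE__ is predefined by GCC and Clang when optimising;
 * other compilers may not define it (assumption: GCC-based toolchain). */
int build_is_optimized(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when NDEBUG is defined, which CMake's default Release
 * flags for GCC normally arrange via -DNDEBUG. */
int build_has_ndebug(void)
{
#ifdef NDEBUG
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: call once before the benchmark starts. */
void print_build_info(void)
{
    printf("optimized: %s, NDEBUG: %s\n",
           build_is_optimized() ? "yes" : "no",
           build_has_ndebug() ? "yes" : "no");
}
```

Calling `print_build_info()` before a benchmark run makes it obvious whether a surprising score came from an unoptimised build rather than from the compiler version.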
If you are serious about this kind of development, you set up your toolchain. Your own. On Linux. Then you are good forever, tweaking whenever you need to. Why not use the easy method? Use everything set up on Windows? Well, this video tells you why. If you are just playing, sure, VS Code is fine. I tore my hair out when I was developing Palm apps on Windows until I got so fed up I gave up on developing on Windows. For the last 15 years, any issues I encountered on Linux are my own. Rather than some obscure issues that used to take forever to find out, when I was developing on Windows. Mind you, if you are developing for Windows, sure, use Windows. You have no other choice. My condolences. Otherwise, heck, use MacOS. That is still better than developing on Windows.
@@theelmonk You a Linux bigot? A real question as I have used so many operating systems I don't care about who/what/how it's made. I contributed to Linux back in the 90's. None of my code is used anymore. My desktop is Windows. I can have all the gnu utilities, gcc everything. My Linux system is not located for direct physical access for flashing, so, easier to flash from my desktop/laptop.
@@theelmonk Linux is the least hassle for you. For others, maybe not. That's why I asked. The most convenient for me would be my Windows desktop/laptop. I have to run windows based software that will not run in a VM.
@@jfseaman1 I agree a lot of it is what you're used to. But if you don't expose yourself to alternatives you won't find better. And if you don't want to run a VM, just use another machine (as a developer you'll have half a dozen unused, right ?). You can run all of Linux remote so you lose no desk space.
@@GaryExplains ahh ! I see, my first language isn't English so maybe that's why I didn't hear it correctly. Anyway, I think we all understood what was meant 🙂
As far as I'm concerned MS, with the introduction of the self-contained PC in the 1980s that had no easy access to the outside world, cost a generation or more of makers and programmers. Now they're at it again - make it easy, but make it (relatively) dumb.
That's hardware floating point vs software emulation. Unsurprising. But most microcontroller tasks don't need FP at all, or need very little of it e.g. for updating a status display on an LCD or something, and speed is unimportant. An FPU adds a lot of transistors on a simple low end in-order CPU.
Considering oceantoo doesn't use any floating point math, why are you using it for benchmarks on what is ostensibly a video about floating point hardware / library / optimization? I felt like this was misleading/confusing.
I think it was a video about the performance differences between compilers (and their libraries). Floating point is one of the major differences, but not the only one.
I know i should be grateful for things getting faster, better and free open source, but the size of the core-v riscv stuff to build your own compiler... 3.6GB... I mean, really? My poor RPi 4 is still, after some couple of hours, struggling with the compilation of the compiler. The hardware giveth and the source code taketh away...
Welcome to the wonderful world of software development. Some 25 years ago it also took a few hours to compile the complete Linux system for our newly developed ARM10 tablet. Those were the days that someone would check code carefully before starting a compiler session 😁
One big thing: GCC 14.1 cortex-m33 codegen is utterly horrible at O2/O3/Ofast. Redundant instructions, all other kind of headscratchers. No attempts whatsoever to use any DSP instructions, etc, even when you try to spoon feed it a suitable pattern. Have to use intrinsics directly to extract proper performance; not nice. Newest clang does quite a bit better, but it also doesn't use the full instruction set. For 16-bit math code it often compiles something that takes 3-5x more CPU cycles than it should! 32-bit math 2-3x vs hand optimized. Ugh. RP2350 is a powerhouse, but at least ARM Cortex-M33 compilers available could do a lot better.
Note that my examples were things like alpha blending (graphics composition), texture mapping, audio processing etc. Generic non-mathy code is usually fine with clang.
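For readers wondering what kind of code is meant here, this is a rough sketch of an alpha-blending inner loop in plain C. It is an illustration only: `blend_channel` and `blend_row` are hypothetical names, the one-byte-per-channel buffer layout is an assumption, and nothing here claims to reproduce the exact code the commenter measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel:
 * result = (src*alpha + dst*(255-alpha)) / 255, rounded to nearest. */
uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    uint32_t acc = (uint32_t)src * alpha
                 + (uint32_t)dst * (255u - alpha)
                 + 127u;                 /* rounding term */
    return (uint8_t)(acc / 255u);
}

/* Blend a row channel by channel (assumed layout: one byte per
 * channel, the same alpha applied to the whole row). */
void blend_row(uint8_t *dst, const uint8_t *src, size_t n, uint8_t alpha)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = blend_channel(src[i], dst[i], alpha);
    }
}
```

Loops of this shape are built from small multiply-accumulate steps on narrow integers, which is where DSP-style instructions or intrinsics could in principle help; comparing the generated assembly across compilers is the way to see whether they are actually used.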
I wonder why they added those crappy/useless RISC-V cores? It would be much more useful if they added more RAM and WiFi instead. Only 2 UARTs is very low for an 80 pin MCU too.
The biggest reason is probably testing the waters. If they go over well, it will lower per chip costs a little. Even better, they could potentially open source everything from the ground up which fits well with their values. I’d imagine there’s also more freedom in design. They can much more easily tailor their RISCV design to their exact needs where ARM would need to be paid a small fortune for a completely customized core variant.
While I'm more enthusiastic than you about the availability of the Risc V cores, it does seem an odd decision to put them both in and yet in a way that makes only using one at a time possible. Feels like a waste of silicon. Maybe cheaper than creating and testing 2 chips, admittedly. I guess somewhere it's explained (I think it somehow doesn't just leave the unused cores dead but reuses part of them). Would be interested in a deep description of the design choices. Of course, we could probably argue forever on what the 'spare' space could be used for. You'd like more ram, I'd like a faster USB interface. Doubtless as many preferences as there are users !
Those crappy RISC-V cores that were designed part-time by a single R Pi engineer, and are as fast as or faster than Arm's flagship Cortex-M33 designed by a large team?
Gary, this is somewhat the same feeling that I got when I started working with the RP2040 back in the day after working with an Arduino (Bluefruit Feather). Adafruit docs were SO MUCH BETTER, and the software worked WAY MORE RELIABLY without requiring you to run obscure commands, or be an expert in CMAKE / NINJA. And code examples were not anything like "//okay so we do it here like so and now THIS works but everything else does not. Good luck making it work for your use case!". Apparently this is a controversial take ¯\_(ツ)_/¯
Only for the remaining Office users who haven't caught up yet. Most of us have been there for years. I think I changed to it (from RISCiX) around 1995.
FYI everyone. Doom doesn't use floating point; it uses fixed point integer arithmetic. So... RISC-V should crush it. 😘
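For anyone who hasn't met fixed point before, here is a minimal sketch of 16.16 arithmetic in C. The names `fixed_t`, `FRACBITS` and `FRACUNIT` follow the convention used in the Doom source, but the functions below are an illustrative reimplementation under that assumption, not code copied from the game.

```c
#include <stdint.h>

typedef int32_t fixed_t;            /* 16 integer bits, 16 fraction bits */
#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)    /* represents 1.0 */

/* Convert an integer to fixed point (caller keeps |i| below 32768). */
fixed_t fixed_from_int(int32_t i)
{
    return (fixed_t)(i * FRACUNIT);
}

/* Multiply: widen to 64 bits, then drop the extra 16 fraction bits.
 * Right-shifting a negative value is implementation-defined in C;
 * GCC and Clang perform an arithmetic shift. */
fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> FRACBITS);
}

/* Divide: pre-scale the numerator so the quotient keeps 16 fraction
 * bits. The caller must ensure b != 0. */
fixed_t fixed_div(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * FRACUNIT) / b);
}
```

Everything here is integer multiply, shift and divide, so no FPU or soft-float library is involved; how fast it runs on a given core then depends on that core's integer multiplier and divider.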
IMHO, this is a compiler (and floating point libraries) issue and not related to the OS on which you are running the compiler. You should get the same results on Windows using the same compiler/libraries. The advantage on Linux might be that it is easier to install (or compile) the best compiler version. I hope Raspberry Pi updates the VSCode extension soon to install this compiler version on all platforms.
GL building cross platform toolchain on windows.
@@panjak323 It's possible, but a lot more headaches than on Linux. Not worth anyone's time.
Yes, while I find VScode and Windows unproductive they are unlikely to slow down execution speed. The problem is that they are defaulting to inferior code generation. I would want to know what aspect of that toolchain is a bad choice.
My understanding from the video is that it's the choice of cg options that was allowed by choosing features manually, perhaps generating code that's only good for the 2350 and not generically, but I may have misunderstood. I don't understand otherwise why the developers of the vscode-based toolchain have chosen such a poor solution.
@@panjak323 I do it every single day ... on WSL.
If you look at the RP2350 datasheet, single precision FP is included in the M33 as an extension, whereas double precision is added as an extra coprocessor with limited functionality. According to Table 112 in the datasheet, this coprocessor is slower than a full hardware implementation, but it's still an order of magnitude faster than a software implementation (6 vs 70-90 cycles for an add operation, and 49 vs 130-650 for a sqrt).
Single precision should be noticeably faster, still.
It'll be interesting to compare with both kinds of operations separately!
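To put numbers on the cycle counts quoted above, here is a small sketch that turns them into speedup ratios. The figures are taken from the comment as written (which attributes them to Table 112), not re-checked against the datasheet; `speedup` and `print_dp_speedups` are hypothetical helper names.

```c
#include <stdio.h>

/* Speedup of the coprocessor over software:
 * software cycles divided by coprocessor cycles. */
double speedup(double software_cycles, double coprocessor_cycles)
{
    return software_cycles / coprocessor_cycles;
}

/* Print the ratios for the cycle counts quoted in the comment. */
void print_dp_speedups(void)
{
    printf("add:  %.1fx to %.1fx\n", speedup(70, 6), speedup(90, 6));
    printf("sqrt: %.1fx to %.1fx\n", speedup(130, 49), speedup(650, 49));
}
```

On those figures the add comes out at roughly 12x to 15x, while sqrt ranges from under 3x to about 13x, so "an order of magnitude" holds for add and for the slow end of the software sqrt.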
Which I did in my RP2350 FPU video, but I didn't feel the need to do that again in this video as it is more about the compilers.
@@GaryExplains You're right! I completely forgot about that😅. The single precision benchmark would probably show some substantial uplift as well, particularly for RISC-V, but whether it does or not doesn't undermine the point of the video. Thanks!
Excellent research on these hardware and software options Gary! 🙌🤣
Thanks for the update, Gary! My Pico 2 arrived yesterday in the mail. Time to test my floating point code. 👍
"263 Whetstones I get", Enlightened you are, Master Gary!
Hit like a Dragon Ball transformation to lvl 5 😅
The RP2350 has for the M33 a FPU (SP) and custom hardware for DP-floats (see the RP2350-datasheet). And of course you can use vscode with your self-compiled tool-chains.
Can the hazard cores also use the math coprocessor?
@@StanleyPinchak No, see RP2350 datasheet, page 100 .
@@deterdamel7380 but vscode is the official software or not?
@@deterdamel7380 couldn't someone write a library, so 1 hazard core could use 1 arm core as a floating point coprocessor?
@@joseph9915 Due to the overhead it makes no sense.
The surprising RISC-V bonus shows that they can drop ARM, and move the extensions (CoProcessors) and security features to the RISC-V cores. RISC-V seems to be a good option.
Great work Gary!
Would you mind sharing the details for your GCC build for the M33?
This is good to know. I did not feel like installing VS-Code. I like to know what is going on in the compile phase so I like to be in control, or at least be able to see what is happening. The Raspberry Pi Pico extension hides too much for my liking so I guess I'll stick to the command line tools or use the Eclipse IDE for this.
What about redoing the RP2040 scores with the command line compiler?
👏🏽👏🏽👏🏽 Thanks for this, Gary!!! YES!!! The compiler matters-- A TON!!!
Wow, this is good stuff and a whole lot of work. Thanks!
(Testing is good. Even better is to have a verifiable hypothesis before testing commences, and to investigate after testing if the results do not confirm the hypothesis.)
Great and surprising video. My RP2350 devices arrived yesterday. I got two Pico 2 boards and 2 Pimoroni Tiny RP2350s. On the compiler issue, I imagine in the near future someone will update the VS Code extension and get it on par with the SDK. I just started some dev on a Pico 2 last night in MicroPython in VS Code. Still some rough edges there too. In the 1980s and 90s I worked on some optimizing C/C++ compilers, and I also do cross-platform builds in Linux or Mac. Your video is a reminder how critical each component of the build chain is in achieving the best results.
I've been using a Docker image for my toolchain which uses Debian Linux. Thanks for this - I expect I'll have to review my Dockerfile to make sure I get the best performance.
How are the Single Precision Floating point benchmarks for each of the platforms? And could you squeeze more performance out of the RP2040 Cortex M0 when using a command line compiler?
You should be able to still use VS Code but then select a different compiler with the CMake extension tools, or any other way that you can use to change the compiler in CMake.
This has F-all to do with VS Code, which is just a text editor, or with Windows. It's just which compiler is doing the compiling. (Now, it may be that the optimised compiler is harder to get working on Windows, I don't know. But once you have gcc running, it can compile a new version of itself.)
No, VS Code is a lightweight IDE, and he's talking about the official extension for PICO. He just shortens it to "VS Code", since he's already explained that he's talking about the extension, and any intelligent person should be able to follow.
Interesting results.
I've just got a board too and ran some benchmarks (SinglePrecision whetstone from STM32Duino examples) and got the same results with the VSCode extension on Windows and the manually installed compiler on Linux (WSL2).
Something else that I find interesting is that the RP2350 got a ~1.3x better score than a STM32H5 at the same frequency of 250MHz.
Gary has been explained to ;p
something i feel would be popular is a small, simple, as cheap as possible RP chip requiring basically no external components to use. maybe to replace ATMega chips now that they're crazy expensive.
5v power input, usb, internal flash, QFN32 or something. maybe using that RISC-V core
The Platform IO extension in VS Code was easy to setup for the Pico SDK and works very well.
Thank you Gary! You explain like no other.
How does the power consumption compare between arm and r5 for similar compute tasks?
That could be a very good question. And not only peak consumption, but metered from beginning to end of task and normalised to be able to compare the numbers.
Could you also compare the code size of ARM vs RISC-V? If one binary takes much more space than the other, that could be a major difference in some projects.
So that also puts the RP2040 results in question. Did you rerun the tests for that also using the optimized compiler? Some time ago I installed the toolchain for that on a Raspberry Pi 4, because it isn't Microsoft (I know VS Code got installed, I don't use it). Your videos on how to compile using that toolchain are really good btw.
Yes, he did. Did you not watch the whole video?
@@nonchalanto He may have, but he didn't say he did, and if he did it did not make any difference. He only said he recompiled the RP2350 ARM cores. Actually no improvement is what I would expect. The RP2350 has been around long enough for the VS Code Extension to be optimized. A little patience and I guess they will optimize it for RP2350 with both core types.
I was very curious about this as well - just to see the real numbers there. It wouldn't surprise me that vsCode'd be slower also with the old rp2040
@@john_hind I would hope that if you say I rerun the tests and show three different graphs those would be the ones you rerun ..
I would like to see the re-run of 2040 as well.
Nice deep dive! Turns out little details matter :D
Is the manual installation on for example MAC for the SDK and toolchain documented somewhere ?
A StarFive VisionFive killer, even without VScode sorting out its compile flags. The Hazard3 cores may be slower than the Cortex M33 and SiFive, but they're there, as a freebie, for those who want to tick the played with RISC-V box, and for just a fraction of the $5 SBC you can use as a standard Pico; rather than spending $100 on a separate, barely supported board.
Pretty sure that will get updated soon. VS code will have the best compiler available soon enough.
This is more along the lines of what i expected a hardware fpu would do compared to a software emulated library. I suspect with some tuning it could approach at least 50x over software.
2x better than the Pico may not be worth the upgrade, but 26x is another level.
Thanks for your follow up video!
Couldn't the owners of the Vscode extension follow your compiler build instructions and fix the issue while maintaining all the ease of use features it adds? Can you not use the same build file under Windows and get the same results? I am new, so forgive me if these are dumb questions.
Yes they could and may well do so in an update but until then there is a work around for the early birds that have already got the RP2350. Might be the extension developer is waiting for theirs to show up so they can do some testing before they release an update. Remember it was only announced less than a couple weeks ago and most of the people that have one already and are doing videos on them have prerelease versions that they got so they could make these close to release to drive interest. Give it a couple months and I would expect issues like this will have disappeared.
Hopefully someone can improve this inside of the VSCode extension? This just seems like the ones bundled with the extension have sub-optimal compiler configurations.
I wonder how/if this same thing applies with rust on the RP2350. Does the default compiler work this well or not?
I once ran into compiler overhead when using an interrupt with a 30x right shift to use the 2 msb. But I found a clever trick that took 150 instructions used for that and reduced it to 5 to 7. Basically I first used a trick to extract the most significant byte likely 2 instructions. Then used a never touched instruction called swap which is sometimes called a nibble swap. 3 instructions total at this point, and 2 right shifts to go for a minimum of 5 instructions and a maximum of about 7. Best case is it's 30x faster!!!! This was on the atmega328p BTW. Arduino uno board.
You don't have to do it like this. I suspect you have a 32 bit value from which you wanted to extract the 2 most significant bits; you can create a struct with 16 fields, where each field represents 2 bits. Now you map your 32 bit variable to this struct and extract the 2 bits using the fields into a uint8_t. The compiler will automatically generate the right code for you.
In assembly perhaps the most optimized version could be just extracting the most significant byte and right shifting it 2 times with roll over and zeroing the rest of the upper 6 bits with AND.
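The two ideas in this exchange can be sketched in C as follows. This is only an illustration with hypothetical function names: it uses shifts rather than the bit-field struct, because bit-field ordering is implementation-defined, and it makes no claim about the exact instruction counts any particular compiler will produce.

```c
#include <stdint.h>

/* Plain version: shift the 32-bit value right by 30.
 * On a core with a barrel shifter this is typically cheap; on an
 * 8-bit AVR the compiler may expand it into a multi-byte shift loop. */
uint8_t top2_shift(uint32_t v)
{
    return (uint8_t)(v >> 30);
}

/* Byte-first version: take the most significant byte, then do the
 * remaining shift within 8 bits. Shifting by 6 leaves only bits 7..6,
 * so no extra AND mask is needed here. */
uint8_t top2_bytewise(uint32_t v)
{
    uint8_t msb = (uint8_t)(v >> 24);
    return (uint8_t)(msb >> 6);
}
```

Both functions return the same value for every input; whether the second form actually saves instructions on the ATmega328P is something to confirm in the generated assembly (for example with `avr-objdump -d`).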
As the RiscV architecture matures there will be hardware for floating point also.
@@fredrikbergquist5734 RiscV does have floating point extensions as well. Most general purpose chips have them, but RaspberryPi did not include them on the Hazard3 design, or at the very least in this implementation.
I think it makes sense, as adding hardware floating point would increase the die size, and if you need the feature, you can just use the M33 cores.
Wow, we need to know what difference is between the sdk setup script and the vscode extension...
On 2024-09-11 a new version (0.16.0) of the VS Code Extension was released. Among the changes seem to be upgrades to both the ARM and RISC-V toolchains. I haven't run any tests, but thought it worth mentioning.
Yes, I made a new video about it already 👍
Didn’t even know there was a VSCode extension. I installed the SDK once about 18 months ago, and at that time the instruction booklets told me to do it via the command line.
Kind of crazy improvements, so why is VScode so borked?
@GaryExplain, is the test code available for download? What compiler version and options were you using? I was re-running a Whetstone test I adapted (some time ago) to the RP2040 and was surprised to see it jump from 10 to 30 MIPS after updating the SDK and tools. It turns out that the optimizer was throwing away a huge part of the test! I had to turn off the inlining of a function in the procedure call testing ("module 8" in the original comments) to get a more meaningful result.
Yes, the code is in my GitHub repo. Just search for "Gary explains GitHub"
@@GaryExplains Looks like your code is about the same as mine. One thing you may try is to add a "__attribute__ ((noinline))" before "void P3" in the Whetstone test. This part of the test ("module 8" in the comments) should test procedure calls, but the optimizer will throw it all away if you don't add this attribute. This makes a huge difference in the end result (as I mentioned, the RP2040 jumps from 10 to 30 MIPS when this part is thrown out by the compiler). My overall impression is that the Hazard3 core at 150 MHz is just a little faster than the M0+ running at 125 MHz.
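As a rough sketch of what that suggestion looks like in code: the function below is a hypothetical stand-in for the benchmark's P3 procedure (its name, the constant and the loop around it are assumptions for illustration, not the code from the repo). `__attribute__((noinline))` is a GCC/Clang extension that only stops the call from being inlined; it does not by itself guarantee that the optimiser keeps every call.

```c
/* Hypothetical stand-in for the benchmark's P3 procedure.
   The attribute asks GCC/Clang not to inline calls to it. */
__attribute__((noinline))
void p3_sketch(double x, double y, double *z)
{
    const double t = 0.5;        /* assumed constant, for illustration only */
    double x1 = t * (x + y);
    double y1 = t * (x1 + y);
    *z = (x1 + y1) / 2.0;        /* result is written through the pointer */
}

/* Calls the procedure n times, the way a procedure-call test would,
   and returns the last result so the work is observable. */
double run_calls(int n)
{
    double z = 0.0;
    for (int i = 0; i < n; i++) {
        p3_sketch(1.0, 1.0, &z);
    }
    return z;
}
```

Without the attribute, an optimising build is free to inline the body and then fold the whole loop away, which is the effect described above.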
So, can't you change the compiler options in VS Code to use the SDK and toolchain? I think this can be done, but it has been a while since I have played in that area.
Did you try compiling the rp2040 via the command line? I've never used vscode, just notepad++ and the command line tool suite.
I'm not familiar with this, but looking at the code of the VS Code extension, it seems like all platforms use the Core-V toolchain for RISC-V, aside from Linux on arm64. When it comes to the compiler, the Arm cores seem to be using arm-gnu-toolchain version 13.2.rel1, which is based on GCC 13.2. For RISC-V it's not so clear, but I believe it should be based on GCC version 13.2 for Windows and 13.3 for other OSes. There are no open issues or pending commits regarding this on the repo at this point.
Did re-running the results on the RP2040 change its results as well?
Impressive!
I think the RP2350 FPU only supports single precision floating point, so it's probably using a lot of software to deal with double precision numbers.
It also has hardware-assisted double precision. For example, float64 addition takes 6 cycles or so.
@@az09letters92 thanks, didn't know.
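Since single and double precision take different paths on this chip, it can matter which one a piece of C code actually ends up using. The sketch below (function names are made up for illustration) shows a standard C detail that is easy to miss: an unsuffixed literal such as `0.5` has type `double`, so mixing it with a `float` promotes the arithmetic to double precision.

```c
/* The literal 0.5 is a double, so x is converted to double, the
   multiply is done in double precision, and the result is converted
   back to float on return. */
float half_promoted(float x)
{
    return x * 0.5;
}

/* The f suffix makes the literal a float, so the multiply stays
   in single precision. */
float half_single(float x)
{
    return x * 0.5f;
}

/* Sums an array using a float accumulator, keeping every addition
   in single precision. */
float sum_single(const float *values, int count)
{
    float total = 0.0f;
    for (int i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Whether the difference is measurable on a given board depends on the compiler and its libraries, which is the point of the video.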
What happens if you compile and use the command line on Windows?
8:54 It always takes time for updates to all the software for new hardware to reach everyone, be it Linux distributions or VSCode extensions, etc.
No.
And how does it compare to an ESP32-S3?
Great job
I am surprised that if you rebuild the compiler and load the latest SDKs, the VS Code extension does not pick these up. Definitely a big shame. Let's hope the extension gets updates very soon.
The vs code extension installs everything including the toolchain in a private directory, not on the path or at a system level.
@@GaryExplains Thanks for the explanation. Let's hope for an update soon. A detailed video on building and using the latest toolchain would be appreciated.
So where's the URL for the compiler rebuild?
Eh?
@@GaryExplains Ahh found it, Latest Raspberry Pi Pico-series C/C++ SDK - Page 30
@@simonbooth4888 TY! Would've been nice to have included that and some other sources in the description. Still, a great piece of research
@@guiorgy Now the challenge is building it for Windows and replacing / adding to the default versions of toolchains
@@simonbooth4888 Or you could just use the one that already works
Gary, it's ALWAYS been all about the quality of the compiler. If you've been around for so long, how is this a revelation? For example, historically Intel's proprietary compiler was for a long time outclassing GCC on x86 on Linux, though things have likely changed since I last checked. Likewise, LLVM-based compilers will also produce code that performs differently from GCC's, and then there are compilers like Zig built on top of LLVM.
It isn't a revelation that the compiler is key, it is a revelation that different tools in the same ecosystem can be so radically different. The compiler in vs code should be the same as the compiler in the command line environment.
@@GaryExplains Not sure how things are in windows land, but on Linux, there is no "compiler in vscode", as you mentioned yourself in the video, the vscode plugin is just there to make life easier by downloading the required toolchain, you could still run that very same compiler toolchain from the command line. It all comes down to the toolchain and the specific compiler used in each toolchain.
@mksln I don't want to be contrary, but when you use the Pico extension with VS Code on Linux, the extension downloads the SDK and toolchain into a private folder; it doesn't use the compiler installed at the system level. So, on Linux there is a compiler in VS Code in this context.
Did you use the same compiler options in both situations? By default VSCode does a debug build with optimisation disabled, but if you're building from the command line it'll default to a release build.
Where do you set those options in the Pico extension? Also, how do you explain Luke's comment?
You can change the build type to Release by running “cmake build -DCMAKE_BUILD_TYPE=Release” from the terminal in VSCode - we will change the extension to default to release builds in future
@@GaryExplains Luke's right about the current compiler for RISC-V, but the debug options will affect both RISC-V and ARM significantly. We are currently only using the upstream Windows build, but will build our own version soon (we didn't do it originally because it takes so long on GitHub Actions to build!)
The official getting started guide says to install the VS Code extension. Is that not correct anymore? What's the best way to code on a Windows machine?
@gordonhollingworth5281 Thanks for the info. I tested the debug/release flags when using VS Code and it does indeed make a difference (as we would expect). VS Code is building debug binaries by default. But the RISC-V Whetstone test went from 1.9 to 5.6 without the new compiler and to 25.0 with the new RISC-V compiler.
PS. I am about to email you, it would be good if we could chat about some of these things offline. I would greatly appreciate if you could reply. Thx.
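One way to check which kind of build a binary actually came from is to have the program report it. The sketch below relies on two things outside this thread: GCC and Clang predefine `__OPTIMIZE__` when optimisation is enabled, and CMake's stock Release flags for GCC include `-DNDEBUG`. The function names are hypothetical, and where the printed line ends up depends on how stdio is set up on the board.

```c
#include <stdio.h>

/* Returns 1 if the compiler was run with optimisation enabled.
   __OPTIMIZE__ is predefined by GCC and Clang at -O1 and above. */
int built_with_optimisation(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if NDEBUG was defined at compile time, as it is with
   CMake's default Release flags for GCC. */
int built_with_ndebug(void)
{
#ifdef NDEBUG
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary of the build settings (hypothetical helper). */
void print_build_info(void)
{
    printf("optimised: %s, NDEBUG: %s\n",
           built_with_optimisation() ? "yes" : "no",
           built_with_ndebug() ? "yes" : "no");
}
```

Calling `print_build_info()` once at startup, before the benchmark runs, makes it harder to compare a debug build against a release build by accident.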
Bit of a shocker thx
Thank you... And why isn't there a "$Thanks$" button on YT?
🤷♂️
If you are serious about this kind of development, you set up your toolchain. Your own. On Linux. Then you are good forever, tweaking whenever you need to. Why not use the easy method? Use everything set up on Windows? Well, this video tells you why. If you are just playing, sure, VS Code is fine. I tore my hair out when I was developing Palm apps on Windows, until I got so fed up I gave up on developing on Windows. For the last 15 years, any issues I encountered on Linux have been my own, rather than the obscure issues that used to take forever to track down when I was developing on Windows.
Mind you, if you are developing for Windows, sure, use Windows. You have no other choice. My condolences. Otherwise, heck, use MacOS. That is still better than developing on Windows.
Is it really a mess ?
Wow 😮 !
Wow, just wow. Well, I guess I'll have to figure out the command line toolchain. I wonder if the command line toolchain on Windows would equal Linux?
Why would you want to make life harder for yourself, though ?
@@theelmonk You a Linux bigot? A real question, as I have used so many operating systems I don't care about who/what/how it's made. I contributed to Linux back in the 90's. None of my code is used anymore.
My desktop is Windows. I can have all the gnu utilities, gcc everything. My Linux system is not located for direct physical access for flashing, so, easier to flash from my desktop/laptop.
@@jfseaman1 I use various, but Linux is the least hassle
@@theelmonk Linux is the least hassle for you. For others, maybe not. That's why I asked. The most convenient for me would be my Windows desktop/laptop. I have to run Windows-based software that will not run in a VM.
@@jfseaman1 I agree a lot of it is what you're used to. But if you don't expose yourself to alternatives you won't find better. And if you don't want to run a VM, just use another machine (as a developer you'll have half a dozen unused, right ?). You can run all of Linux remote so you lose no desk space.
6:54 "pretty insignificant" euuh, I think you meant the opposite ???
I actually said "pretty err significant" not insignificant. 🤦♂️
@@GaryExplains ahh ! I see, my first language isn't English so maybe that's why I didn't hear it correctly. Anyway, I think we all understood what was meant 🙂
As far as I'm concerned MS, with the introduction of the self-contained PC in the 1980s that had no easy access to the outside world, cost a generation or more of makers and programmers.
Now they're at it again - make it easy, but make it (relatively) dumb.
26x at a similar clock speed, hot damn!
That's hardware floating point vs software emulation. Unsurprising. But most microcontroller tasks don't need FP at all, or need very little of it e.g. for updating a status display on an LCD or something, and speed is unimportant. An FPU adds a lot of transistors on a simple low end in-order CPU.
Doom when ?
WDYM? google "RP2040 Doom" - you don't need a speed boost to run it.
It already happened at Defcon
Anything can happen when people start working on it, rather than just asking questions.
🤔Does this work using WSL?
I have successfully compiled the RISC-V gcc in WSL, but I have no RP2350 board.
Now the vs code guys will see this video and update vs code.
Let me know when it installs on Windows as easily as it does on a RPi. Until then, I’m not using it
optimisation is key
Considering oceantoo doesn't use any floating point math, why are you using it for benchmarks on what is ostensibly a video about floating point hardware / library / optimization? I felt like this was misleading/confusing.
I think it was a video about the performance differences between compilers (and their libraries). Floating point is one of the major differences, but not the only one.
What @zephsmitg3499 said
I know I should be grateful for things getting faster, better and free open source, but the size of the core-v riscv stuff to build your own compiler... 3.6GB... I mean, really? My poor RPi 4 is still, after some couple of hours, struggling with the compilation of the compiler. The hardware giveth and the source code taketh away...
I hope you have an NVMe drive!
@@GaryExplains Nope! My poor micro SD card is crying. :)
Welcome to the wonderful world of software development.
Some 25 years ago it also took a few hours to compile the complete Linux system for our newly developed ARM10 tablet. Those were the days when someone would check code carefully before starting a compiler session.
😁
i hope you are reporting all of this to the VS Code extension team -- or at least that they are listening 😉
The VS Code extension is made by Raspberry Pi and I have been told it has all been fixed. I will test next week and make a follow up video.
One big thing: GCC 14.1 Cortex-M33 codegen is utterly horrible at O2/O3/Ofast. Redundant instructions, all other kinds of headscratchers. No attempts whatsoever to use any DSP instructions, etc., even when you try to spoon-feed it a suitable pattern. You have to use intrinsics directly to extract proper performance; not nice.
Newest clang does quite a bit better, but it also doesn't use the full instruction set. For 16-bit math code it often compiles something that takes 3-5x more CPU cycles than it should! 32-bit math 2-3x vs hand optimized.
Ugh.
RP2350 is a powerhouse, but at least ARM Cortex-M33 compilers available could do a lot better.
Note that my examples were things like alpha blending (graphics composition), texture mapping, audio processing etc. Generic non-mathy code is usually fine with clang.
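To give an idea of the kind of code meant by "alpha blending", here is a minimal integer-only sketch in portable C. It is not the commenter's code: the function names, the 8-bit alpha convention and the rounding are assumptions chosen for illustration, and it uses no intrinsics, so it is exactly the sort of plain pattern the compiler is left to optimise on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Blends one 8-bit channel: alpha = 255 gives src, alpha = 0 gives dst.
   The +127 rounds to nearest before the divide by 255. */
static inline uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    uint32_t acc = (uint32_t)src * alpha + (uint32_t)dst * (255u - alpha);
    return (uint8_t)((acc + 127u) / 255u);
}

/* Blends a row of n channel values from src over dst, in place. */
void blend_row(uint8_t *dst, const uint8_t *src, uint8_t alpha, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = blend_channel(src[i], dst[i], alpha);
    }
}
```

Inspecting the assembly generated for `blend_row` at different optimisation levels is one way to see how much of the instruction set a given compiler actually uses.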
So basically:
Linux makes everything better, unless you use VSCode on it, which naturally makes things worse again since it's a microshaft product 😂
There is a follow up video, you should watch it, VS Code is now the better solution.
I wonder why they added those crappy/useless riscV cores? It would be much more useful if they added more ram and wifi instead. Only 2 uarts are very low for an 80 pin mcu too.
wifi is reputedly coming later this year. I'd guess the risc-V cores are there as a hedge in case it gets significantly more popular.
The biggest reason is probably testing the waters. If they go over well, it will lower per chip costs a little. Even better, they could potentially open source everything from the ground up which fits well with their values.
I’d imagine there’s also more freedom in design. They can much more easily tailor their RISCV design to their exact needs where ARM would need to be paid a small fortune for a completely customized core variant.
@@rdavis43 beggars can't be choosers.
While I'm more enthusiastic than you about the availability of the Risc V cores, it does seem an odd decision to put them both in and yet in a way that makes only using one at a time possible. Feels like a waste of silicon. Maybe cheaper than creating and testing 2 chips, admittedly.
I guess somewhere it's explained (I think it somehow doesn't just leave the unused cores dead but reuses part of them). Would be interested in a deep description of the design choices. Of course, we could probably argue forever on what the 'spare' space could be used for. You'd like more ram, I'd like a faster USB interface. Doubtless as many preferences as there are users !
Those crappy RISC-V cores that were designed part-time by a single R Pi engineer, and are as fast as or faster than Arm's flagship Cortex-M33 designed by a large team?
Lesson - Integers should be all you use.
Gary, this is somewhat the same feeling that I got when I started working with the RP2040 back in the day, after working with an Arduino (Bluefruit Feather). Adafruit docs were SO MUCH BETTER, and the software worked WAY MORE RELIABLY without requiring you to run obscure commands, or be an expert in CMAKE / NINJA. And code examples were not anything like "//okay so we do it here like so and now THIS works but everything else does not. Good luck making it work for your use case!". Apparently this is a controversial take ¯\_(ツ)_/¯
Fantasy and dreams 😂. Others call it embedded programming.
Great that it worked for you. Just don't make the mistake of assuming it's the same for everyone else.
This is the year of Linux.
So it's 2040 now?
Only for the remaining Office users who haven't caught up yet. Most of us have been there for years. I think I changed to it (from RISCiX) around 1995.