Looking to upgrade my audio setup with a Blue Yeti USB microphone! If you'd like to support the channel, you can buy me a coffee here: ko-fi.com/thecodinggopher.
One problem with LLVM is that its focus on general optimisations hurts small 8- and 16-bit platforms such as AVR or MSP430. These processors in particular do not have multiple-shift instructions (i.e. shifts by more than one bit at a time). However, LLVM will still apply generic optimisations that depend on the existence of such instructions. The result in many cases is code generation that is much larger and slower than the equivalent GCC-generated code. As a former contributor to the LLVM project, some years ago I proposed a solution to that problem. However, it involved changes to the generic optimisation passes and was not accepted by the developer community. Nobody has fixed this so far. As a result, LLVM still generates code that is significantly worse than GCC's for said small platforms.
That's interesting! Are you sure about this? Because this seems trivially fixable, given how much more extensible LLVM (and thus the clang/clang++ compilers) is than the GNU compiler "platform", no? Judging by your comments, this seems to be an area of low-hanging fruit for contributions?
@@simonfarre4907 The problem that I encountered is that the LLVM developer community is highly reluctant to introduce target-dependent variations into code that is not part of the specific backends.
It's understandable, because the front end is supposed to generate the same IR code, while applying generic optimizations, regardless of target. Specific backends are responsible for target-dependent optimizations and for translating the IR code into the targeted assembly code.
However, in practice the generic code is highly influenced by the main targets' capabilities, namely x86-64 and ARM64, and many optimizations that should technically be considered target-specific are coded as generic just because (most?) targets will support them.
What happens is that simpler targets such as 8-bit or 16-bit processors can't efficiently translate some pieces of the IR to their ISAs because some instructions are simply missing; the most flagrant example is the lack of multiple-shift instructions (see the sketch after this comment).
You are absolutely right that fixing this should be easy. In fact, my code submissions were not complicated at all, as they mostly consisted of skipping generic code transformations that would result in multiple shifts, guarded by a compiler command-line option, which could also be made the default depending on the architecture.
This worked fine as per my tests on the AVR and MSP430 targets, and it was not strictly target-specific, because it would apply to any architecture that does not support multiple shifts.
However, since this involved parametrising the generic LLVM transformations, it was rejected.
As an alternative, I was told to "reverse" the already-applied optimisations on the IR code during the target-specific passes. I actually spent some time looking at it, but found that it was very difficult or required a large amount of code. The reason is that LLVM will apply several code transformations in a row, one over another, until no further transformations are possible. This often creates IR code that is very hard or impossible to "reverse" to its original state.
In this case, you may end up with parts of the IR making heavy use of multiple shifts introduced by generic optimizations, and there's no way to figure out what to do to remove them. Simply skipping the transformations that created the shifts in the first place, by contrast, would enable the backend, or even the generic optimizer itself, to apply alternative optimizations that can actually be translated into efficient target assembly code. (I can go into further detail and examples if you are really interested.)
In the end, I gave up on trying to "reverse" the undesired "optimizations", not without some frustration, as my solution to the problem was already implemented and working - just not acceptable to the LLVM community.
So, as said, the LLVM compiler still lags /significantly/ behind GCC when compiling for small targets, which is the reason it is not generally used for them.
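To make the shift problem concrete, here is a minimal C sketch (function names are made up; the behaviour described in the comments reflects the AVR/MSP430 situation the comment describes):

```c
#include <stdint.h>

/* AVR and MSP430 only have one-bit shift instructions (e.g. AVR's
   LSL/LSR/ASR), so a shift by a runtime count must be lowered to a
   loop: roughly "while (n--) x >>= 1;". */
uint16_t shift_by_n(uint16_t x, uint8_t n) {
    return x >> n;
}

/* Generic IR-level passes can also *introduce* shifts the source never
   had, e.g. canonicalizing a power-of-two division into a shift: */
uint16_t div_by_64(uint16_t x) {
    return x / 64;   /* commonly canonicalized to x >> 6 */
}

/* On x86-64/ARM64 the shifted form is one cheap instruction; on an
   8/16-bit target it expands to several one-bit shifts or a loop,
   which is the size/speed penalty discussed above. */
```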
@@simonfarre4907 Open source spaces are very political. Even if the change is trivial, if it touches something that is "core" to the project, it will inevitably meet a lot of resistance.
@@dancom6030 Sad but true. No wonder corporations tend to move faster - they are far less allergic to change.
@dancom6030 I wouldn't say they are very political. I've contributed to a few different open source projects, and one has to be mindful (and respectful! Incredibly important!) of a whole host of things. There may be legacy reasons why changes are not accepted, or they may not serve a stated goal, etc. Painting it with a broad brush (though that may not have been your intention) and calling it "very political" isn't necessarily fair, even though, of course, such projects exist as well.
I should probably have skipped calling it trivially fixable as that probably gave my question the wrong impression.
6:30 Autovectorization does not split tasks across CPU cores; it just utilizes SIMD instructions for better performance.
Good correction
@@TheCodingGopher OK, now, what did you really mean when you mentioned autovectorization?
good catch!
@@DeVibe. Auto-vectorization is the term he wanted. It means using wide vector registers to process large chunks of data rather than byte by byte. Think of a C++ vector, how it's just an array. That's how the term is used here: vectorizing the data into an array and performing multiple operations on that array at once.
@@stolenlaptop Thank you for your explanation. I know what vectorization is, but I'm still not sure whether he meant that or mere parallelization across CPU cores, as he said. It's still ambiguous.
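For readers following along, a minimal sketch of what auto-vectorization actually does (function name made up): with optimizations enabled (e.g. -O3), GCC and Clang can compile the loop below to SIMD instructions (SSE/AVX on x86-64, NEON on ARM64) that process several elements per instruction, all on a single core - no threads or extra CPU cores involved.

```c
#include <stddef.h>

/* The restrict qualifiers tell the compiler the arrays don't overlap,
   which makes the loop straightforward to vectorize. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```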
It must be emphasized that the type of licence a compiler uses (how the compiler source is licensed) does not dictate the licence of the target software it compiles. A GPL'ed compiler can compile software that is completely proprietary or completely open.
Do you mean doesn't dictate?
@@dancom6030 sorry, I meant "does not". My first post threw an error, so I had to retype and post it, but I made a mistake
@@ivanmaglica264 you're all good, my man. Stuff like this happens all the time. I just wanted to clarify.
Just recently GCC actually announced Rust support with their own compiler frontend, called gccrs. Honestly, that shocked me, as I never thought it would happen. I personally would still prefer LLVM though.
Yeah, the launch of gccrs was definitely unexpected. It’s still experimental and needs a special flag to work - probably aiming to support platforms LLVM doesn’t. I still prefer LLVM too; it’s tough to beat its optimizations / mature ecosystem. Curious to see how gccrs evolves, though :)
How does Rust compile in the Linux kernel? The Linux kernel uses GCC. Can it mix with LLVM-compiled modules?
first time I'm hearing of this. gonna jump down the rabbithole
@@ivanmaglica264 Yes, the final compilation step is usually the linking step, which works at the binary level and can link arbitrary code together independently of the language/compiler. The Linux kernel can also load modules, which are also just binaries and which can be compiled on their own.
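A hedged sketch of that point - two translation units built by different compilers and linked into one binary (file names are made up; the commands in the comment are the usual ones, though details vary by platform):

```c
/* Build and link, mixing compilers:
 *   gcc   -c part_a.c -o part_a.o
 *   clang -c part_b.c -o part_b.o
 *   gcc part_a.o part_b.o -o mixed
 * Both emit standard object files following the platform's C ABI, the
 * same property that lets the kernel load separately built modules. */

/* part_b.c - compiled with clang */
int from_clang(int x) { return x * 2; }

/* part_a.c - compiled with gcc */
#include <stdio.h>
int from_clang(int x);   /* resolved at link time */
int main(void) { printf("%d\n", from_clang(21)); return 0; }
```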
I thought that merely using GPL'ed software is free. Therefore you can use GCC for closed-source projects, since that's just use. If someone were making changes to GCC and giving/selling binaries of this without the ability to get the modified sources without additional pay, that would be a violation of the GPL. I don't see why this would be a concern for closed-source projects, unless you are making compilers: either for hardware with an exotic CPU, or for offloading to GPUs or other parallel-computing devices. In that case it can be true: someone wants money for making a compiler for a really specific need and does not provide source code.
Also, from my Gentoo experience: compiling KDE Plasma with GCC + O3 + LTO works, but compiling with Clang + O3 + ThinLTO doesn't: Plasma cannot start lol, a lot of segfaults.
You’re spot on that using GPL-licensed software, like GCC, in a closed-source project doesn’t violate the GPL. The license only kicks in if you modify / redistribute the compiler itself. So, as long as you're just compiling your code with GCC and not distributing a modified version of it, you're in the clear.
The real concern pops up in niche scenarios, as you mentioned (e.g. writing custom compilers). In those cases, companies may need to tweak the compiler, and if they plan to distribute it without sharing their modifications, that’s when the GPL becomes an issue.
And wow; Gentoo + KDE Plasma + GCC + O3 + LTO? Respect; I've heard horror stories about builds breaking under aggressive optimizations. Your experience with Clang / thin-LTO crashing Plasma sounds like a classic case of "just because it compiles doesn’t mean it runs." Sometimes, the combo of optimizations / specific toolchains can push things just over the edge.
Thanks for sharing.
I kinda knew all this already, but it was still excellent to hear it explained in such a clear and systematic way. A really good and useful video!
Glad it was helpful!
But one can add a different front end to GCC as well.
I think I should just use both and see which produces the best binary, in performance terms - not that I expect much difference.
Great video! Subscribed
❤
For what little it's worth, I've often found that `gcc` optimizes roughly on par with `clang`; there are a few things it does slightly better and a few it does slightly worse. Of course, I would still recommend that anyone creating their own language write the parser and lexer from scratch rather than depending on existing code.
That's a great point about licensing! Could you delve deeper into how the switch to LLVM's permissive license has impacted FreeBSD's development?
It's pretty common to write your own lexer and parser, process the language semantics yourself, and then dump IR into something like LLVM to finish up the optimization and code generation.
@@__Brandon__ Yeah, it is, but I still say it shouldn't be. I only update my system maybe once every five years, and I've had at least 20 different languages refuse to build from source because my LLVM wasn't a high enough version number. The C3 binaries all refused to work for me because of a version mismatch with my GLIBC, and `cmake` refused to generate a makefile, so I had to write one myself. Although, I suppose it's also mostly an indictment of crap build systems and poor project development, because as gigantic as Rust is, it has built from source perfectly each time I've ended up doing it.
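As an illustration of the hand-rolled lexer stage mentioned a couple of comments up, here is a minimal sketch (all names invented); a parser would then consume these tokens and emit IR for something like LLVM to optimize and lower:

```c
#include <ctype.h>
#include <stdio.h>

typedef enum { TOK_INT, TOK_PLUS, TOK_MINUS, TOK_STAR, TOK_SLASH, TOK_EOF } TokKind;
typedef struct { TokKind kind; long value; } Token;

/* Pull the next token off the input string, advancing *p past it. */
Token next_token(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
    switch (**p) {
        case '\0': return (Token){TOK_EOF, 0};
        case '+':  (*p)++; return (Token){TOK_PLUS, 0};
        case '-':  (*p)++; return (Token){TOK_MINUS, 0};
        case '*':  (*p)++; return (Token){TOK_STAR, 0};
        case '/':  (*p)++; return (Token){TOK_SLASH, 0};
    }
    if (isdigit((unsigned char)**p)) {
        long v = 0;
        while (isdigit((unsigned char)**p)) v = v * 10 + (*(*p)++ - '0');
        return (Token){TOK_INT, v};
    }
    fprintf(stderr, "unexpected character '%c'\n", **p);
    (*p)++;
    return (Token){TOK_EOF, 0};
}
```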
I don't really understand the difference between LLVM IR and GIMPLE. There's any number of different languages which target GIMPLE.
LLVM IR is a low-level IR used across many compilers for optimization / JIT compilation. GIMPLE is a simplified IR that is specific to GCC - mainly for breaking down complex code for easier optimization. LLVM IR targets multiple platforms; GIMPLE mainly serves GCC’s internal needs. Hope that makes sense.
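For intuition, here is the same trivial C function with hand-written approximations of each IR in the comments (not actual tool output, though the flags named are the real ones for dumping it):

```c
int add(int a, int b) { return a + b; }

/* LLVM IR, roughly what `clang -S -emit-llvm` prints:
 *
 *   define i32 @add(i32 %a, i32 %b) {
 *     %sum = add nsw i32 %a, %b
 *     ret i32 %sum
 *   }
 *
 * GIMPLE, roughly what `gcc -fdump-tree-gimple` prints:
 *
 *   add (int a, int b)
 *   {
 *     int D.1234;
 *     D.1234 = a + b;
 *     return D.1234;
 *   }
 *
 * LLVM IR is a stable, typed, target-neutral language consumed by many
 * frontends and JITs; GIMPLE is GCC's internal three-address form, not
 * meant as a public interchange format. */
```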
It's confirmed unfortunately
YouTube is all about shitty clickbaity shining thumbnail content going viral while precious channels like this remain a hidden gem 💎
Subscribed ✅
Thank you for this comment 🚀!
Made my day :)
@TheCodingGopher cheers 🥂
I do get a kick out of how the thumbnail says "battle of the compilers" and then it ends up being a very informative, mostly objective, look at each compiler's strengths and weaknesses.
Thanks for sharing
🙏
Thanks for the information.
My pleasure
Nice video, but you didn't say much (or anything, really) about SSA (static single assignment form). This, and analogous transformations such as continuation-passing style in the functional world, are important tools for parallelisation!
Thanks for the comment! Follow-up video on SSA incoming
@@TheCodingGopher Excellent!
Great vid, thanks for sharing !
I know so little about low-level code; this is quite useful. Btw, I am quite attracted to C and to creating systems that interact with the OS but still deliver some kind of complex, high-level software. Compilers, assembly - that is pretty much the limit of the range I find interesting in development. Could someone give me advice on what to learn? I've unfortunately never had the chance of diving into these waters.
Thanks in advance !
Glad you liked the video!
For low-level systems, start with C and systems programming; learn about OS internals, memory management, hardware interaction, etc. Build a simple compiler, study assembly, and explore tools like Lex, Yacc, and LLVM. I would also recommend some books - e.g. "The C Programming Language" and "Computer Systems: A Programmer's Perspective" are great. If you have time, try hands-on projects like writing a basic OS or kernel.
Thanks for watching :)
@ Well thanks for these recommendations, I will definitely try to do that, I just need to find the time 😅
May I ask you one more question ?
For the tools you mentioned, do you have a milestone, some arbitrary point that if I were able to reach, I could use them reliably? (Like, let's take an example I know: in JavaScript, if you've "reached" a good understanding of promises and everything before, you are capable of doing almost every task, just maybe not very well...)
I know from experience that learning is far from a straight road, but would you have a general guess? It helps so much to know where you are heading 😁
Anyway, thank you very much, for the vid and the answer! Have a great night/day!
@@basilenordmann7356 My pleasure, and happy to help.
For tools like Lex, Yacc, and LLVM, you'll feel "ready" when you’ve got a solid handle on C, memory management, and OS basics. A good milestone is when you can build a simple compiler / interpreter in C and understand (at a low level) how it turns code into assembly. For Lex / Yacc, knowing regular expressions and grammar concepts is pretty important. With LLVM, you’ll want to be comfortable with compilers and machine code. Once you can build a basic compiler or parser, you’ll be good to go - and this is a fair milestone to achieve (see the small sketch below).
Have a great night yourself :)
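A tiny sketch of that "basic parser" milestone - a recursive-descent interpreter for + and * over integers (entirely illustrative):

```c
#include <ctype.h>
#include <stdio.h>

static const char *src;   /* cursor into the expression being parsed */

static void skip_ws(void) { while (isspace((unsigned char)*src)) src++; }

static long parse_primary(void) {          /* an integer literal */
    skip_ws();
    long v = 0;
    while (isdigit((unsigned char)*src)) v = v * 10 + (*src++ - '0');
    return v;
}

static long parse_term(void) {             /* '*' binds tighter */
    long v = parse_primary();
    for (skip_ws(); *src == '*'; skip_ws()) { src++; v *= parse_primary(); }
    return v;
}

static long parse_expr(void) {             /* '+' binds looser */
    long v = parse_term();
    for (skip_ws(); *src == '+'; skip_ws()) { src++; v += parse_term(); }
    return v;
}

int main(void) {
    src = "2 + 3 * 4";
    printf("%ld\n", parse_expr());          /* prints 14 */
    return 0;
}
```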
another great LLVM video! it’s always a good day when you post a new vid :)
My pleasure; thank you for watching!
Great video! I use gcc not by choice but because rust+llvm is broken on ibm power5 arch and results in illegal instructions (SIGILL), rust has been an absolute nightmare on some old IBM hardware. You are right about gcc being better integrated!
6:07 The permissive license of LLVM is the entire reason it replaced GCC in the FreeBSD project, as the GPL license is incompatible.
Also, even though you aren't forced to share source code changes, most companies do, to avoid technical debt.
I thought technical debt refers to the future need for refactoring that's incurred when speed is prioritized over correctness. So in a sense, it's a debt to your future self, not society.
@@adiaphoros6842 I think the insinuation is that sharing the code is a way to avoid having to potentially come back later to fix it, because someone may decide to improve it in the mainline version. And then you can just use that version as a company.
Great video you've got there, a short way to explain how those two monsters work on the inside without making it too complicated.
Thanks for the kind words :)
These two rival compilers are themselves giant! How can size be reduced, e.g. with -Os or with garbage collection of inaccessible (dead) code?
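For what it's worth, the usual GCC recipe for the size side of that question looks like the sketch below (the flags are real GCC/binutils options; the file and function names are made up): -Os picks size-friendly optimizations, and the section flags let the linker's --gc-sections discard unreferenced functions - the "garbage collection of dead code" the comment mentions.

```c
/* Build:
 *   gcc -Os -ffunction-sections -fdata-sections -c app.c -o app.o
 *   gcc -Wl,--gc-sections app.o -o app
 */
int used(void)   { return 42; }
int unused(void) { return 0; }   /* in its own section; discarded by --gc-sections */

int main(void) { return used(); }
```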
well my friend just corrected me yesterday that it's risc-v ("five or 5") not risc-v ("V"). Now I am confused😄. Btw really organized content so liked and subbed.
Thanks for watching! It's RISC-"five". My mistake - and I stand corrected (some other commenters have mentioned this)
great catch!!
@@TheCodingGopher Before I knew it was V for the Roman five, I liked to tell myself it was RISC V for Vendetta, born into this world to topple the dominion of x86 and Arm.
@@Kollum I like that. Petition to rebrand it to Risc 'V'?
have you looked at cranelift or QBE?
I'm familiar with QBE. Definitely worth taking a look for something lightweight, with its "90% of LLVM's optimization power at 10% of the complexity" philosophy. Cranelift seems to have fast, JIT-friendly compilation (i.e. would be solid for envs like Wasmtime).
QBE is very interesting, for the reasons @TheCodingGopher explained. I'm planning to research the Hare language, which uses it, as a safer and more modern alternative to C - although C23 looks pretty good, I must say.
Underrated video. Very informative.
Why is the letter "M" slightly higher than the "V" in "LLVM" in the preview???? Anyways, thanks for the video!
Super perceptive. I just noticed that myself. Be right back, asking my graphic designer (me) what he's been up to. Thanks for watching!
"Rainman" quality level of OCD there, my friend.
Probably Java is not a good example of an input to the GCC pipeline, as Java compiles into its own bytecode to be executed by the java virtual machine runtime (jvm).
Correct, and good callout. Java source code is typically compiled by the Java Compiler (javac) into Java bytecode - which is then executed by the Java Virtual Machine (JVM). It's separate from the GCC pipeline (which is mainly used for compiling C, C++, etc. into native machine code).
I will point out though - that there are tools that allow Java to interact with native code (e.g. the Java Native Interface or compilers that generate native binaries from Java code) - but yes, Java is not a standard input for GCC.
Thanks for watching.
But there exists GCJ, which compiled Java to native... At least it was a thing 10 or 15 years ago, able to compile Eclipse and other big programs. I think it was removed from the GCC compiler collection of main languages a few years ago.
@@TheCodingGopher There exists GCJ, although a bit abandoned these days
If i would comment with words it would be too much, so here you go, an emoji: ❤️
❤
Idk, for a moment I thought it had something to do with Gucci and LVMH.
LLVM is not a full compiler on Windows unless you install the gcc version from MSYS2. It does not even have a libc and cannot build itself. So gcc and openwatcomv2 on Windows (PellesC or VSStudio if you need proprietary) are the Open Source kings.
What about LLVM on Linux?
@@bojidarvladev4890 I think there llvm-libc is mostly complete. But I do not use Linux these days so please consult other sources too.
Microsoft has their own in-house compiler called MSVC. I believe it stands for Microsoft Visual C compiler. It is heavily integrated with DLL files and, more importantly, it is the foundation upon which DirectX is built. LLVM currently exists as a cross-platform equivalent to Visual C's capabilities. LLVM is heavily used in Vulkan code (the open-source competitor to DirectX) as well as a replacement for OpenGL, which has deeper roots with GCC than LLVM, being a much older rendering framework. LLVM is rapidly becoming popular, especially when it comes to compiling C++ code. Personally, I find Clang a more ambitious and unpredictable compiler; I have heard many Linux devs complain about the breakneck pace at which it changes, often breaking package builds without much warning after a compiler update. I have experienced it firsthand as an end user - it is a frequent hit-or-miss occurrence. It boils down to release scheduling not being very cohesive.
@@josephlh1690 I mentioned that above, for MSVC (Visual Studio), which is not a proper C compiler, unlike PellesC. LLVM on Windows needs MSVC for its compilation and as a way to provide tooling and header files. There is a possibility that this will change, but until then gcc is the undisputed king. As for the LLVM breakage, the terralang guys have also experienced that firsthand.
1) LLVM isn't a compiler, Clang is.
2) Even though Clang doesn't have libc++ on Windows, Clang tends to use the standard library used on the OS to produce ABI-compatible binaries.
So on Windows it uses the MSVC standard library by default, and on Linux the GCC standard library (libstdc++).
Because of this you DON'T need MSYS2 to compile, link, or debug (on VS Code you need the CodeLLDB extension to debug); those three elements are inside the LLVM installer from the LLVM website.
You could get MSYS2 if you want more libraries from a package manager. Also, let's not forget that installing Clang from the MinGW32/64 environment makes it use the old MSVCRT runtime, while installing it from the Clang32/64 environment makes it use the new UCRT runtime - which I recommend for better compatibility across C versions.
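A small hedged sketch of that setup (file name made up; the flags are standard Clang options): the LLVM installer's clang targets the MSVC ABI and runtime by default, and a MinGW-style target can be requested explicitly.

```c
/* Default MSVC target:   clang hello.c -o hello.exe
 * MinGW-style target:    clang --target=x86_64-w64-mingw32 hello.c -o hello.exe */
#include <stdio.h>

int main(void) {
    puts("hello from clang on Windows");
    return 0;
}
```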
When a machine makes a machine, it needs no human rule set for its code. Its target audience is another machine. GCC is for humans, not for machines.
I have no idea how I am supposed to interpret this comment.
What kind of job do you guys have to be on top of this stuff? I'm talking about half the comment section.
btw risc-v is actually "risc five"
I realized this after recording the audio. Thanks for the pointer :)
Why not both 😏
I'd like that
👌🏻👍🏻
This video would have made more sense a few years ago; Apple has somewhat moved away from Clang, and Google even more so, so development is slowing down quite a bit.
Excellent
Thank you!
I wish LLVM was copyleft but here we are. Permissive licenses are very prone to abuse and only hold up because of PR...
Apache license is the reason why LLVM exists
@leonardeuler4 Imagine craving for future vendor lock-in
Licenses don't really matter if you don't want your code to be "abused".
There are countless companies ignoring copyleft licensing and getting away with it...
I wish Linux were permissively licensed, because the license is the only reason FreeBSD is still popular, and it scatters the efforts of the open source community, forcing it to maintain two large OSes instead of one.
@@nikitf7777 So that moneybags can exploit that into the "PlayStation"s and "macOS"es of the future with 0 respect for your contributions?
Frankly, I still don't understand the differences; to me both explanations are extremely similar.
The difference in the shown compilation flow diagrams is partially artificial: a compiler has so many components that there are many ways to group them in an overview. Both have CPU-specific optimizations (e.g. to allocate registers), but you may list them under the main optimizer (there is a lot of overlap/reuse in that logic anyway) or under the specific CPU backends (which in reality share a lot of code). So the flow between components will be organized a bit differently, but by far most effort goes into components which exist in both compilers - in any compiler: the drawings make them look more different than they are.
LLVM is important and widespread but neither it nor gcc can be called the future of coding. The overwhelming volume of coding is not in languages that use the compilers (although they may use products of these compilers). The majority of coding is in Python, JavaScript, and Java, with .NET close behind. (Of course, gcc and llvm are vital tools for the production of runtimes for these languages)
Could you elaborate on how the roles of LLVM and GCC in runtime production influence the development and performance of high-level languages like Python, JavaScript, and Java?
Huh? It's not that important lol. The key is the process of making a chip; there is a lot of chemistry, physics, math, and mechanics behind it. Inheritance is a way to make products cheaper, so we could spend less to reach the same wealth while still having time for other fields. A lot of stuff still needs to be researched, but instead people choose laws to protect their revenue - is that a failing of the education system? Salary and income are a trap of the modern world, because of the monopoly of the previous generation; it could have been avoided if people didn't make one person too powerful. But they did, through politics and relationships, and it created an economic situation where a normal person can't achieve the wealth they expect, even though tech could be the bridge to the things they want. And instead of fixing the law, they choose to compete for a small piece of the pie of the entire market. lol
Programming is just an application step; it is also the last step of product development. You will see GCC win on the price factor in the market if India, China, Africa, and Southeast Asia join in. There are many ways to reach the same result, so I don't think the compiler is what matters. What truly matters is macro-optimization in the value chain between manufacturer, programmer, and program.
Cheers!
????
RISC vee?
Please stop calling this processor architecture RISC-V[i]; it's RISC-five (Roman five) - the fifth implementation of the most basic and "mainstream" RISC architecture, of which /early/ MIPS is the first-generation implementation. And there's nothing "exotic" in it, really - its core set is basic, with about 50 instructions or so.
Good callout on pronunciation. As for being 'exotic,' the architecture itself has a minimal core set, but what sets it apart is its open-source nature / flexibility, which allow for custom extensions. IMO, that’s what makes it 'exotic' compared to more traditional ISAs like x86 or ARM