CPU Micro Architecture Levels Are Not Real

  • Published: 22 Dec 2024

Comments • 198

  • @thomasburette9129
    @thomasburette9129 5 days ago +150

    What a nightmare. Imagine being the bedrock of the entire software architecture, then looking down and realising the hardware under you is quicksand.

    • @bigpod
      @bigpod 5 days ago +12

      Have you ever seen ARM? It's even worse.

    • @oserodal2702
      @oserodal2702 5 days ago +11

      @@bigpod For all its shortcomings, ARM actually handles this specific situation marginally better.

    • @bigpod
      @bigpod 5 days ago +10

      @@oserodal2702 Really? Take 3 random ARM CPUs: what are the chances that their instruction sets are completely different, with just a common base?

    • @leftybot7846
      @leftybot7846 4 days ago +6

      @@bigpod If things continue to go in this direction we'll soon be back in the days when code could not be executed on another machine. True cyberpunk dystopia.

    • @bigpod
      @bigpod 4 days ago +4

      @@leftybot7846 Well, I foresee even worse compatibility on RISC-V; at least ARM kinda mandates a common base.

  • @Archbtw_
    @Archbtw_ 5 days ago +186

    at this point i'm surprised devs can even agree on the meaning of the word "standard"

    • @jamesbeebe2870
      @jamesbeebe2870 5 days ago +12

      What are words even in this day and age

    • @autarchprinceps
      @autarchprinceps 5 days ago +12

      That's because it isn't a standard, it is a grouping for simplification.

    • @hubertnnn
      @hubertnnn 4 days ago

      They can't. It reminds me of the situation where GitHub replaced the keyword "master" with "main" to not upset some minority.
      The effect was that our entire automation chain broke down and I had to spend 2 days figuring out why 20 applications had started failing at the same time for no apparent reason. It took me so long because "main" and "master" look so similar that neither I nor 3 other developers spotted the difference for a few hours.
      A lot of money was lost that day.

    • @nectarinetangerineorange
      @nectarinetangerineorange 4 days ago +5

      We know what 'standard' means....
      Whatever we pretend to do in front of management

    • @jamesbeebe2870
      @jamesbeebe2870 4 days ago +2

      @@nectarinetangerineorange I like this definition lol

  • @nintendoeats
    @nintendoeats 4 days ago +57

    A point: you are talking about Instruction Set Architecture, not microarchitecture. The ISA defines what instructions and registers the CPU has. The microarchitecture is how a specific CPU implements the ISA that it supports.

    • @stefanalecu9532
      @stefanalecu9532 4 days ago +5

      To give a concrete example relevant to the video: the ISA is x86_64, while a microarchitecture would be Skylake or Nehalem. Similarly, AMD implements the same ISA, but an example of a microarchitecture would be Zen 4 or Excavator or K6.

  • @MechMK1
    @MechMK1 5 days ago +102

    I don't think CPU microarchitecture levels are a good idea, precisely because of what Linus said: they're not a strict linear progression. There exist concurrent CPUs by the same manufacturer which include different feature sets. The only correct way to handle this situation is to query specifically which features an individual CPU supports and use those features on the fly.
    This also does not cause worse performance. Querying CPUID on startup is quick, and installing function pointers for the calls that use those features is fast too.

    • @olnnn
      @olnnn 4 days ago +13

      That was literally what these levels were first designed for - to be used with the glibc-hwcaps mechanism, which loads different versions of a library depending on the CPU. The levels add some simplified baselines instead of having a million different files for each variation of CPU instructions.

    • @kuhluhOG
      @kuhluhOG 4 days ago +9

      @@olnnn The only problem is that the levels give the impression it's linear, that features are added along a single line.
      In reality, it's a tree (or something even more complicated),
      and even more so when modern CPUs remove certain extensions.

    • @nobodyimportant7804
      @nobodyimportant7804 4 days ago +6

      @@olnnn Leave it to the GNU project to overcomplicate things, but if it was "designed" for glibc, then the kernel has no reason to follow suit.

    • @jonathanbuzzard1376
      @jonathanbuzzard1376 4 days ago +5

      Compiling to v3 gives a 5 to 10% improvement in performance. I work in the HPC space and there are substantial performance benefits to getting your compiler options right. For most people not flogging the CPU at 100% for days on end over hundreds of nodes it is not worth it though.

    • @kuhluhOG
      @kuhluhOG 4 days ago +8

      @@jonathanbuzzard1376 The argument is not that these optimizations aren't desirable, but that the way they are being done (in "levels") is misguided, since the levels imply a linear progression when there isn't such a thing.

  • @ninele7
    @ninele7 5 days ago +55

    x86_64v3 is mostly fine. The only problem is that some fairly recent mobile CPUs, as you mentioned, don't support AVX, so they fall into x86_64v2. There are no x86_64v2/v3 CPUs. All hybrid CPUs support x86_64v3. And on all hybrid CPUs the advanced instructions aren't available when E-cores are enabled (and since they aren't advertised, no one is using them).
    The problems start with v4. While AVX512 was being developed, the consensus was that all cores would sooner or later include 512-bit vector support. But Intel couldn't implement 512-bit vectors in small cores in an efficient way, so they came up with a new spec, AVX10, which makes 512-bit vector support optional. Now v4 is only universally supported on Zen 4+ and Intel server CPUs (which is quite a lot). But the problem is that there will be new CPUs which support lots of great instructions above x86_64v3 but don't support x86_64v4. And no one knows what to do about it.
    Maybe we should just drop the v4 level and create something in its place that will sooner or later be universally supported.

    • @jamesbeebe2870
      @jamesbeebe2870 5 days ago +11

      No we need the AVX512, I need it for my PS3 "backups" to run right lol

    • @oscarsmith3942
      @oscarsmith3942 5 days ago +8

      v4 is fine as long as you don't care that Intel releases new chips that suck. Sure there will be new CPUs that fall in between V3 and V4 (just like Atom falls between V2 and V3), but it's not Linux's fault that Intel doesn't know how to implement an instruction set that they designed.

    • @No-mq5lw
      @No-mq5lw 5 days ago +4

      I guess we could create minor point revisions (e.g. x86-64v3.1) like what ARM does, but I don't expect that to make anything clearer.
      Walking back AVX-512 from its own major version would 100% help greatly.

    • @ninele7
      @ninele7 5 days ago +4

      @@jamesbeebe2870 With fast enough CPUs RPCS3 will run fine even without AVX512. So while it improves performance now, in 4-5 years it won't be necessary.

    • @niteriderevo9179
      @niteriderevo9179 5 days ago +7

      Sorry, but I agree with the Linux kernel's creator: the whole 'x86 levels' thing is a hard no, confusion at best. The CPUID register already tells you what's available and what isn't. The 'levels' thing implies that there is a whole generational subset of instructions that has to be fully present, and that really isn't a thing given how both Intel and AMD pick and choose which x86[-64] ISA add-ons are supported.

  • @MonochromeWench
    @MonochromeWench 4 days ago +19

    E-cores and P-cores having different instruction sets is not a problem with feature levels but an unnecessary complication caused by Intel. Intel wants to say they have AVX512, but only half the CPU actually supports it. Get rid of feature levels and the problem still exists, as those CPUs only support AVX512 sometimes. This is an Intel problem that screwed the v4 feature level. We can just say that all Intel CPUs with E-cores are v3 and the problem goes away.

  • @YouSeeKim
    @YouSeeKim 4 days ago +3

    This seems like a reincarnation of the browser detection versus feature detection debate, from back when a lot of emerging features were landing in some browsers while older tech like Internet Explorer 6 still had a not-insignificant market share and polyfill libraries could only get you so far.
    Browser detection is seen as bad practice, but that is easier to say when it can all be done at runtime. So it will be interesting to see what best practice prevails in the Linux community with the constraints they have to work with.

  • @autarchprinceps
    @autarchprinceps 5 days ago +13

    I think Arch with ALHP provides a pretty good solution for most of this. The base libraries of Arch are v1 (or, in special cases, complex enough to select the vector extension to use themselves, largely for CPU video encoding and the like), and if your CPU supports it you can add each feature level on top with ALHP. Whatever package still compiles properly with it will be available at that feature level, and you use the newest feature level available within both the package's and your CPU's limitations. That way no one with old hardware gets left out, and you still get the speedup for pretty much 99% of packages. Win win.
    The primary issue remaining is indeed if you have a big.LITTLE feature-difference situation as you describe. But that is just plain a stupid idea. It even breaks compiling locally with -march=native, or from a JIT. You'd have to pin software using the new features to P-cores, or compile each program for each core type and then pin each process to a core type. What a nightmare of a design. Even if they are not properly faster on them, the efficiency cores should fundamentally have to support the same features, or else I don't know how Intel thinks any software will properly be able to make use of them. big.LITTLE is a pretty good feature on ARM, but Intel seems to just have issue after issue with it.

  • @SlinkyD
    @SlinkyD 5 days ago +18

    8:10 Found that out the hard way 5 years ago. Still dealing with it now on a different machine.
    I just wanna compile a program to run cus I got the code. Not wait until "hopefully" it makes it into a repo cus somebody else built it, or a damn flatpak. Good code I know compiles on somebody else's machine won't on mine cus of that one/four things that my CPU ain't got but "should" have.
    I see why mom used to get computers from her job in the 90s and NEVER bought a new one. She "need it to work, not waste my fucking time". Best computer lesson after "garbage in, garbage out".

  • @IlluminatiBG
    @IlluminatiBG 5 days ago +10

    This reminds me of the time when we were writing JavaScript for specific versions of a browser. At some point you realize: why not check for features rather than versions? Of course, for CPUs to do that the CPUID instruction needs to provide that information; hopefully it does, so compilers can use it for optimization instead of levels.

    • @uis246
      @uis246 5 days ago +3

      -march=native uses CPUID

    • @Daktyl198
      @Daktyl198 5 days ago +6

      Distros have to maintain large repos of compiled software. The fewer copies of a program they have to compile, the better. Runtime checks are never going to have the same performance as being optimized during compiling/linking. It makes far, far more sense for distros to check CPU features at install time and then assign a V2/V3 repo and mirrors depending on the result than to compile all of their packages with a hundred runtime flags and thus codepaths, none of which are particularly fast and all of which balloon the size of the binary.

    • @mk72v2oq
      @mk72v2oq 4 days ago

      @@uis246 No, it doesn't. It just sets the march to whatever CPU model you are currently running.
      $ gcc -Q --help=target -march=native | grep 'march'
      -march= znver4

    • @uis246
      @uis246 4 days ago

      @mk72v2oq it also sets cache sizes and supported instructions.

  • @MixMastoras
    @MixMastoras 5 days ago +31

    Guess what! There was a time when some PCs didn't have all the latest SSE instructions! This stuff happens all the time but the burden is always on the developers to strictly limit or widely support all architectures and their instruction sets!

    • @IbilisSLZ
      @IbilisSLZ 4 days ago +3

      There was a time when not everyone had a math coprocessor ;P

    • @AccSwtch50
      @AccSwtch50 3 days ago

      Now is the time when not everyone has all the AVX extensions.

    • @elmariachi5133
      @elmariachi5133 3 days ago

      @@IbilisSLZ There was a time when computers ran on water! Can you go earlier? xD

  • @Linuxdirk
    @Linuxdirk 4 days ago +6

    A 4 years old chip is v2 ...
    Looking at my 10 years old CPU.
    [chuckles] "I'm in danger"

    • @AnEagle
      @AnEagle 4 days ago

      If it's an Intel CPU, it may well be v3; the 4000 series supports it.

  • @Daktyl198
    @Daktyl198 5 days ago +8

    V3 and prior are really good for distros to compile against as a "group" of CPUs. It's trivial to query a CPU at install time to determine if it contains the feature set required for V3, V2, or V1 and then assign the proper repositories/mirrors based on the results, as CachyOS does and as Fedora is planning to do. CachyOS proves that there is performance to be gained (sometimes quite a bit) simply by compiling a program with support for newer instructions.
    V4 is a mess, but even just using V3-compiled packages on a V4 CPU is far, far better than compiling the program with no instructions newer than basic SIMD. Using V3 as the baseline with V2 repos for "classic" CPUs detected on install seems fine as well in Ubuntu's case.

  • @forivall
    @forivall 4 days ago +3

    Torvalds is still making Monty Python references... What a champ

  • @RandomGeometryDashStuff
    @RandomGeometryDashStuff 5 days ago +5

    05:14 Does a program that uses AVX2 sometimes work and sometimes crash depending on what the CPU scheduler decides?

    • @hubertnnn
      @hubertnnn 4 days ago +3

      Kinda, yes. That's exactly what was happening on older versions of Windows when Intel first released those CPUs.

  • @mohammedgoder
    @mohammedgoder 4 days ago +6

    Dude, this isn't microarchitecture. This is about the ISA. I was thinking you were going to do a deep dive on microarch. Fix the title.
    Microarch is the physical layout of the chip.
    The ISA is the interface.

    • @somenameidk5278
      @somenameidk5278 4 days ago +2

      The sources shown in the video call it microarchitecture levels.

    • @mohammedgoder
      @mohammedgoder 4 days ago +3

      @somenameidk5278 I'm not surprised.
      People in the industry misuse words all the time.

  • @volodumurkalunyak4651
    @volodumurkalunyak4651 4 days ago +4

    5:28 NO, you don't. Alder Lake, Raptor Lake and Arrow Lake are 100% x86-64v3 (official Intel specs; some Alder Lake parts could have AVX-512, aka x86-64v4, unlocked if E-cores are disabled). Intel currently gives NO WAY to have an asymmetric ISA across desktop/laptop CPUs. Windows doesn't support x86-64 v2/v3 or v3/v4 CPUs, so Intel doesn't allow that configuration.

  • @nobodyofconsequence6522
    @nobodyofconsequence6522 4 days ago +11

    CPU microarch levels are a good compromise between never ever deprecating and letting old hardware hold us back vs building a unique package for every sodding CPU family. I like them. I think we should be creating a new one roughly every 5 to 10 years based on "every CPU released in the last decade supports this instruction, right? Yes? Yes! Great, add it to v6 or whatever".

    • @fireztonez-teamepixcraft3993
      @fireztonez-teamepixcraft3993 4 days ago +4

      Have you watched the video? Because he clearly explains what the issue is with the microarchitecture levels as they are implemented.
      I'm not an expert, but in my opinion the best way would be to simply check whether the technology in question is supported: if it is, enable it; if not, disable it. CPU architecture and technology change with time; new technology is implemented, older tech is deprecated or completely abandoned. To me, microarchitecture levels seem more limiting than beneficial in the end, especially when exactly what is supported changes from CPU to CPU and not linearly.
      It could work if Intel and AMD had a hard-set standard for what will and will not be supported by their next CPU generation, but this is not the case at all. Both brands also have a tendency to rename older low- or mid-tier CPUs with new names, so you could get a brand new CPU with the new naming convention built on a 5-year-old architecture, just to make things even more complicated.

    • @thewhitefalcon8539
      @thewhitefalcon8539 4 days ago +2

      You can define compatibility levels but they have nothing to do with the CPUs, only your package builds. You can say we build four versions of the package: one for the latest consumer desktop generation, one that supports all CPUs back to 2019, one that supports all CPUs back to 2005, and one that supports all CPUs back to the 386. That's fine.

    • @stefanalecu9532
      @stefanalecu9532 4 days ago +1

      If x86_64-v4 is such a shitshow already, wait until we get to x86_64-v5, that will be so much fun

    • @nobodyofconsequence6522
      @nobodyofconsequence6522 4 days ago

      @@fireztonez-teamepixcraft3993 "I'm not an expert, but in my opinion the best way would be to simply check if the technology in question is supported and if it is, enabled it, if not just disabled it."
      This works if you compile your own packages. Most people don't. I fucking don't. Compiling is a heavy workload. A little package like ffmpeg takes 2 minutes and 52 seconds and I have a Ryzen 9 3900. A browser like chromium will take 2 hours and eat 32 gigs of ram in the process. You do not build your own browser. It's not worth it. Any time that would be saved from the more optimized binary over the 1-2 weeks the binary stays current will easily be eaten by the compiling process.
      So that leaves anyone with a brain stuck with a precompiled binary. And nobody is going to build a separate binary for every CPU family. Why the fuck would anyone build "chromium for haswell" and "chromium for broadwell" and "chromium for skylake" and "chromium for coffeelake" and "chromium for bulldozer" and "chromium for zenv1" and "chromium for zenv3", each build taking 2 hours of compute on a decent CPU when they could just build "chromium for the lowest common denominator, who cares about your cpu's fancy features nerd?". The people who'd even heard of the difference are in the hundreds.
      x86_64_v3 is the next lowest common denominator. It would be ridiculous to have 20 different versions of the chromium binary which would take a total 40 hours to build. Anyone who cares is probably building their own packages anyway. But x86_64_v3? That's most CPUs from the past few years! You may find a distro willing to build its entire set of packages just one more time. Like they did when i386 was a supported architecture. That's much more reasonable an ask.
      Yes I watched the video. I've also been through gentoo and back and know from that experience exactly why your suggestion is absolutely pie in the sky. Nobody is offering to build you a special snowflake package designed to run perfectly on your specific hardware. Your options are x86_64 baseline, x86_64_v3 if you're on a really forward thinking distro (which is still like the common denominator of 10 years ago) or build your own damn packages, and enjoy not using your computer ever.

  • @nivayu
    @nivayu 5 days ago +4

    I first thought that the CPU levels were defined similarly to Vulkan's:
    basically, define a common set of feature flags and minimum available resources (like minimum memory cache size).
    Over time, as hardware and software advance, add new levels with the now generally available modern features, which software developers can then target.
    Having a level clearly defined as a feature set would mean it's just a mapping from the standard to the list of feature flags it contains. For example, the kernel would internally keep using feature flags as it does today, and the CLI at the very beginning would translate the chosen feature set into the defined list of feature flags.
    But in the case here, the feature sets aren't clearly defined, if I understood correctly, which makes them pretty much useless.

  • @guildpilotone
    @guildpilotone 4 days ago +3

    Forgive my ignorance here - can user apps still be compiled to use "advanced" instructions (given that your CPU has them) independently of what the kernel is compiled for?

    • @atiedebee1020
      @atiedebee1020 4 days ago +3

      Yes

    • @guildpilotone
      @guildpilotone 4 days ago +1

      @atiedebee1020 Tanx. I knew that used to be true for PPC Macs, but not sure w/Linux.

  • @hoefkensj
    @hoefkensj 4 days ago +5

    gentoo level : does -march=native and -O3 count lol

    • @Winnetou17
      @Winnetou17 4 days ago +1

      Gentoo FTW! I like how so many problems/issues are not a problem/issue in Gentoo.

  • @esra_erimez
    @esra_erimez 4 days ago +2

    I'm glad you asked, E5-2699 v4

  • @lesh4357
    @lesh4357 4 days ago +2

    I'm sure someone (maybe the processor manufacturer) could produce a config (feature definition) file for each individual type/model/release of processor. If they feel that is too much work for the multi-billion dollar chip they are selling, then they could have a base file for features that all processors so far support, then a feature set per processor model with + or - entries for both 64-bit and 32-bit modes (or any x-bitness that may come along).
    They should release this file at the same time as they release a new processor.
    If it's not a makefile itself, it could be used to automate the production of anything else needed.
    This "level" thing is creating confusion, and I'm sure it will lead to situations where features of a particular processor are not utilized because it sits BETWEEN categories/levels!

    • @GrzesiekJedenastka
      @GrzesiekJedenastka 4 days ago

      It exists. That's CPUID, and it isn't a "config", just something your CPU reports if you ask it. This levels talk exists not because you can't optimize for a specific CPU, but because most people DON'T optimize for a specific CPU, as that'd require recompiling the entire OS on every computer. Having well-defined levels would allow developers and distributions to create builds that work on all CPUs supporting a given level and can be easily downloaded.

    • @lesh4357
      @lesh4357 3 days ago

      @@GrzesiekJedenastka What I was talking about is a file in a standard format that can be used to automate the process of building optimized kernels for any specific CPU type.
      You would not need to be on a machine with that type of CPU.
      If you want/need an optimized kernel, it could be used during the install process, either by choosing or building or linking to get an optimized kernel.

    • @GrzesiekJedenastka
      @GrzesiekJedenastka 3 days ago

      @@lesh4357 What would that be for? You can (in theory, though probably in practice as well) target any capability set when compiling, not only your own. The help of CPU makers is not needed here at all.
      Thing is, other than some specific use cases, this is completely useless. It doesn't solve the problem of vendors wanting to offer optimized builds, because you would *not* build and distribute one program a thousand times, for _every CPU ever released_ separately.

  • @onceuponaban
    @onceuponaban 4 days ago +2

    We can now add "CPU architectures" to the list of things that don't actually exist, next to fish, adjectives, and trees.

  • @GnBst
    @GnBst 4 days ago +1

    "Tis a silly place" sums up the entire concept. CPUID has been around forever. Packages, kernels, etc. get compiled with these requirements included, and that has worked for decades (Intel MMX comes to mind). No MMX, you don't run that compiled package; that's all there is to it. The AVX extensions and their hit-and-miss implementation over the last few generations make this a problem. The whole i386/i686 architecture differentiation was pretty clear-cut in comparison (although it did suffer the same problem, with CPUs that were i386 being sold well after most i686 offerings were no longer in production, e.g. the AMD Geode). I would also add that with both Intel and AMD being involved in the classification process, this sounds like an easy way for them to assist in the unnecessary demise of old hardware in order to force sales of new stuff. They need to stop forcing the old stuff into obsolescence and instead entice me to buy newer hardware by actually bringing something better to the table. Running Rocky 9.5 on an Ivy Bridge E5-2670v2 already gives me a warning during boot that it "may not be supported in a future version".

  • @vilijanac
    @vilijanac 4 days ago +1

    In the future there will be a probisto just to figure out what hardware you have, so an adequate kernel can be built.
    Then you can choose to download any distro.

  • @Wkaelx
    @Wkaelx 4 days ago +1

    Just to know: does the v1, v2, v3, v4 in the Xeon line of processors mean the microarch version? Like, a Xeon 2666 v2 only supports v2 stuff and a Xeon 2666 v3 supports the v2 and v3 stuff?

    • @marcosmagalhaes6174
      @marcosmagalhaes6174 4 days ago +3

      Not at all. The v's in the Xeon line are something else entirely.

    • @Wkaelx
      @Wkaelx 4 days ago +2

      @@marcosmagalhaes6174 Thank you bro, I couldn't find any relevant info online at all.

  • @complexacious
    @complexacious 4 days ago

    There are even issues amongst the supported features. I have an ancient Atom that supports SSE2/3 and some extensions but when I run ffmpeg with certain encoders it tells me that it's specifically avoiding using certain instructions because they are slower than the code path without them. Runtime evaluation just seems to be a necessary evil. Us lazy programmers dream of being able to add some compiler flags and get significant speedups across the board, but more features just always winds up meaning more testing and more if_this_then_that overrides to fix specific edge cases.

  • @mercuriete
    @mercuriete 4 days ago +4

    I use Gentoo BTW...
    march=native

    • @PanduPoluan
      @PanduPoluan 4 days ago +1

      Hell, yea! "-march=native" squad, all the way!

    • @mercuriete
      @mercuriete 4 days ago

      @PanduPoluan One question.
      I saw you use Gentoo.
      If gcc is doing loop unrolling and vectorization with -O3...
      could this end up creating AVX512 instructions?
      Could this end up with your system only executing on Intel P-cores?
      Or does -march=native not report AVX512 unless it's available on all cores?
      (I use AMD Ryzen but I am curious)

    • @PanduPoluan
      @PanduPoluan 3 days ago

      @mercuriete Ehehe I'm on AMD as well so I'm not sure how it will end up on Intel hybrid.

  • @mercuriete
    @mercuriete 4 days ago +3

    So if I understand correctly...
    If you have a modern Intel CPU and your gcc is doing loop unrolling and vectorization, you could end up with a system that only works on P-cores?
    That's a little bit crazy, because Gentoo defaults to march=native -O3

    • @mohammedgoder
      @mohammedgoder 4 days ago +1

      It's a feature not a bug.
      Although, I heard that newer Intel E cores are getting vector extensions.

  • @KeefJudge
    @KeefJudge 5 days ago +2

    I like the levels thing as a concept, but it has got very messy, because AMD/Intel don't subscribe to it and expose each CPU feature support separately, mixing them up for each CPU model, so it's much more of an abstraction than a standard.
    I remember coding for D3D back in the day where you checked the caps bits for whether a particular GPU supported each feature, except back then the driver would often lie to you and say "Yeah, we do this", when in reality the driver was sometimes emulating the feature slowly in software, or would say the GPU had full support for something when it would only work in certain cases.
    On modern GPUs you just test for D3D level 12.0 or 12.1 (or the Vulkan equivalent), and it's much easier for the programmer cos you know (barring driver bugs) that it'll work, though this only works because AMD/Nvidia/Intel are all on board with it as a standard, unlike the CPU microarchitecture levels.

    • @rawrrrer
      @rawrrrer 4 days ago +2

      I remember Raymond Chen's book "The Old New Thing" has a section on GPU drivers inappropriately implementing D3D. It goes into detail about how those drivers attempted to cheat WHQL.

    • @complexacious
      @complexacious 4 days ago +1

      I suppose it's not that different really, when "supported" and "fast" are two very different things. It's fine for broad-brush things like RT where you can just offload the decision onto the player to enable it or not. But for the most part the old "if gpu == atiragexl then dothis(); else if gpu == tnt2 dothat(); else dosomethingelse();" only really went away due to a two-party system where one is a far more equal party than the other, meaning most devs simply stopped caring if stuff was slow on AMD GPUs and Intel was simply not supported. Now that Intel is a bigger player again and nVidia is slowly losing dominance due to outpricing the market, you'll start to see players taking gamedevs to task when things are slow on other GPUs, leading to a return to board-specific optimisation and all the troubles that causes.

  • @DelticEngine
    @DelticEngine 4 days ago +1

    What CPU do I have? My main machine has a pair of AMD Opteron 6380 processors in it. I know it's archaic, but it works. For the most part it does the job, even if not as quickly as a current system might be. One of the stumbling blocks is the AVX512 extensions. It seems in some situations there are significant performance gains to be had, but otherwise the gains are marginal at best. I would be very interested in a video exploring CPU instructions and extensions so that an informed decision could be made.

    • @crayzeape2230
      @crayzeape2230 4 days ago +1

      You have FMA4 instructions on the 6380 too, they won't be turned on by any of the version levels. It's a crazy mess.

    • @Winnetou17
      @Winnetou17 4 days ago +1

      I remember when Rocket Lake, aka Intel's 11th gen, appeared with AVX512: in the applications that were able to use AVX512 it was literally 5-6 times faster than the rest of the CPUs. But those apps were quite niche, things you'd use at a specific workplace or for some simulation I think.

  • @3lH4ck3rC0mf0r7
    @3lH4ck3rC0mf0r7 5 days ago +3

    Maybe the correct thing to do here would've been a "Compile for the given CPU feature dump" flag. Something like -march/mtune=cpuid.bin
    If you wanna pick what featureset you wanna support and to what extent, nothing would be more flexible. If you wanna come up with any kind of profiles or generalizations, you can. If you wanna compile code that would only work in a specific CPU like -march=native but don't wanna do the compile _on_ that CPU, you could.
    But that's on GCC to implement, not the Linux kernel.

    • @hubertnnn
      @hubertnnn 4 days ago +1

      I think it's already supported; at least I remember seeing something similar in Gentoo's compiler documentation.
      The whole discussion was about what the default should be, not what the user can set themselves, and whether there should be a few default presets.

    • @MonochromeWench
      @MonochromeWench 4 days ago +2

      That is pretty much what -march does when you give it a specific CPU to use. Or you can go and specify all the individual instruction extensions you want to enable in long form.
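As a hedged illustration of that (assuming GCC 11+ or a recent Clang on an x86-64 host; older compilers don't know the x86-64-v* names), you can ask the compiler which feature macros a given -march level turns on, without compiling anything:

```shell
#!/bin/sh
# Dump the predefined macros for each x86-64 level and spot-check a few
# well-known feature macros. Falls back gracefully if the compiler or
# target doesn't accept the level names (pre-GCC-11, or a non-x86 build).
cc="${CC:-gcc}"
for arch in x86-64 x86-64-v2 x86-64-v3; do
    if macros=$("$cc" -march="$arch" -E -dM -xc /dev/null 2>/dev/null); then
        count=$(printf '%s\n' "$macros" | grep -cE '__(SSE4_2|AVX|AVX2|FMA|BMI2)__')
        echo "$arch enables $count of the spot-checked feature macros"
    else
        echo "$arch: not accepted by $cc (old compiler or non-x86 target)"
    fi
done
```

On an x86-64 GCC, each successive level should report more of the spot-checked macros; the exact counts depend on the compiler version.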

  • @zhongj
    @zhongj 4 days ago +1

    What level does the Core 2 Duo P7350 fall under? Is it the same for GPUs as well?

    • @russjr08
      @russjr08 4 days ago +2

      No clue about the core 2 duo, but GPUs don't have an ISA level in this manner (to my knowledge that is) - rather usually you'd query and check which OpenGL/Direct3D/Vulkan version the GPU/Driver supports which is the equivalent for this situation.

  • @MrJackfurry
    @MrJackfurry 3 days ago

    So true so true. Love all these logical informative videos

  • @13thravenpurple94
    @13thravenpurple94 4 days ago

    Brilliant video! Thanks a ton 👍

  • @jouniosmala9921
    @jouniosmala9921 4 days ago +2

    Architecture levels are a good idea poorly executed. However, hardware manufacturers should AGREE to keep designing new stuff to fit new levels, instead of abandoning them.
    Basically, Intel should have kept ISA level support in a way that future mainstream CPUs would be a superset of previous-generation CPUs: either kept big-only cores, or added wider registers (not execution units) to little cores. Going back on AVX-512 support seems an absolutely horrendous decision. Just to be clear: you could get the main benefit of AVX-512 with 128-bit execution units and 512-bit registers. The main advantage of AVX-512 is the conditional execution per channel, combined with scatter and gather, with scatter and gather implementations being capable of handling an element per cache port per cycle, with the conditional bit checked BEFORE issuing the load or store to the memory subsystem. Those three things combined are what's really needed to expand vectorization of algorithms. And all of those exist even in Skylake-X. (And a minimal reasonable implementation of that standard wouldn't harm the little cores, as the goal isn't to add maximum flops per cycle with wider vectors but to add ISA support at the existing width; the downsides of AVX-512 are mostly around the power management effects of very wide execution, particularly multiplication.)

  • @sjzara
    @sjzara 4 days ago +1

    I assumed that high and low efficiency cores on the same CPU would have the same instruction sets.

    • @alexturnbackthearmy1907
      @alexturnbackthearmy1907 4 days ago

      Problem is that they are basically 2 different processors in one. Even different architectures.

  • @the_real_bitterman
    @the_real_bitterman 4 days ago

    You missed that openSUSE also provides v3-optimized packages for Tumbleweed, while openSUSE Aeon (or just Aeon) is the only openSUSE variant which will automatically install them on supported hardware.

  • @VarriskKhanaar
    @VarriskKhanaar 4 days ago +1

    I have the Zen1/Zen2 options in CachyOS. I imagine those are more targeted based upon AMD's generational architecture.

  • @beatadalhagen
    @beatadalhagen 4 days ago +1

    Same fun in the 32-bit era, 'optional' instructions and all.

  • @kevinpaulus4483
    @kevinpaulus4483 4 days ago +1

    I used to try to build the smallest Linux kernel for Slackware that I could (got it to 1.2 MB back in the day...) to obtain the best performance I thought I could get, but it took a lot of time. And sorry Gentoo guys -- or who's still left -- ricing is bad, mmmkay. That was the consensus and the joke: have fun spending weekends for 0.5-2% more performance. However, that was then and this is now, and there are a lot of new SIMD/vector and virtualisation instructions that have appeared since that time, and not using them is giving away an edge that has real costs (performance, electricity, ...). There should, in my opinion, be easier distro tools to recompile the desktop part of a distro, just like the HPC guys have different toolchains with exotic enabled extensions here and there, and even proprietary compilers (icc for example) for some of the differing nodes. And what can be done with good and fast run-time CPU detection?

  • @TheUAoB
    @TheUAoB 5 days ago +1

    It was always a terrible idea. It seemed to all begin when I was on Phoronix criticising the practice of assuming specific extension support instead of writing code with conditional support, and pointed out how it wasn't as much of an issue on ARM due to the architecture level support. I suggested what was happening was de facto x86-64 architecture levels, but didn't think that was a good idea.
    That seemed to trigger this all to start. Sorry!

  • @rars0n
    @rars0n 4 days ago +3

    x86 v(-1): CPUs that support MMX and 3DNow!.

  • @FAYZER0
    @FAYZER0 5 days ago +1

    Levels would be great if they weren't defined in retrospect. In order for them to work you would have to rid the world of the current ones and instead set up levels agreed upon moving forward. Problem is, that would basically make them useless until all the non-compliant CPUs die, and we don't want everything to become e-waste. So, yeah, it's just generally a bad idea, or at least a flawed one. I do understand why we want them, as it is nice to make sure that new features actually get taken advantage of in cases where they do speed things up (even though benchmarks show that is not an easy calculus). But hey, I use CachyOS now that it's stable on NVIDIA, so I can use v3 on my Alder Lake while still using the AUR. Gentoo is just frustration.

  • @bigpod
    @bigpod 5 days ago +1

    So the amd64 architecture has the same problem as ARM, just per generation rather than per CPU

  • @needsLITHIUM
    @needsLITHIUM 3 days ago

    I have a laptop with an Intel N4020, which is v2, and it's from 2019 in an ASUS laptop from 2021. It came with Windows 11 (ugh) and just being able to get to the BIOS from W11 is a PitA. The splash screen is completely hidden by default, so you HAVE to set up an online Microsoft account user in Windows just to be able to get to the god damn boot menu by holding Shift as you reboot - staying offline it just tells you to try again when you have a network connection and prompts you to shut down. And as soon as I did that I put MX Linux KDE on there. My desktop has better specs, so I really just need the laptop for watching movies on flights/in hotels on vacation, or being able to use amp sims to play guitar in scenarios where my actual amps aren't viable, which the laptop can do in Linux.
    On Windows, I would imagine the same tasks are doable, save for the amp sims - my fiancee has the same laptop, and I tried. Neural Amp Modeler kinda works, but Audio Assault Amp Locker/Bass Locker, Neurontube Debut, and ToneLib GFX all crap out on the N4020 on Windows. I can't even get Guitarix to work in WSL on my main machine. She had various distros of Linux on her laptop for 2 years, trying Feren OS, then Kubuntu, then finally settling on MX Linux KDE, same as me, then put Windows 11 back on it just because she was curious how bad it is, and she kept forgetting to update her packages to the point things would nag at her or break, and she kept forgetting to set up auto-update. Now she rarely uses the laptop, lol. She doesn't hate Windows 11 as much as she thought she would, but at the same time she still doesn't really like it, especially compared to Debian- and Ubuntu-based Linux or Windows 7/10, because even when she sets scheduled updates it just does it whenever it feels like it, and Windows is slower than Linux on that hardware. Now she's stuck in a spot where she doesn't like Windows because 11 is a hassle, but Linux requires too much attention for her, and she's just annoyed.

  • @patw1687
    @patw1687 4 days ago +1

    One of my desktops is an old i7 3rd Gen. It works like a champ.

  • @tagKnife
    @tagKnife 4 days ago

    Something needs clearing up here. Intel and AMD were not involved in the creation of the levels. In fact, levels were not even created for x86-64. Levels were created by ARM, as their instructions are clearly separated into these level subsets. It was GCC/glibc that decided to take ARM's levels and apply them to x86-64.

  • @hubertnnn
    @hubertnnn 5 days ago +1

    Those versions that you are mentioning are not called v1, v2, ...
    They are called extensions; the ones I can remember off the top of my head are MMX, AVX, and SSE.
    edit:
    I see, they added some weird standard, and Linus Torvalds said exactly what I think about it.

  • @serras_
    @serras_ 4 days ago +1

    At the end of the day all I want is a binary that uses my hardware to the utmost of its ability, and not to feel like I have to (potentially) leave performance on the table to support 'legacy' (for lack of a better term?) CPUs.
    Is the naming convention bad? Kinda.
    Is mfrs still putting out 'modern' CPUs with cut-down feature sets stupid? Absolutely.
    Do I want to compile everything with -march=native instead of using a precompiled binary, because the above 2 points are stupid? Absolutely fuckin not.

  • @cheako91155
    @cheako91155 5 days ago

    I'm surprised there isn't an intel/amd split... are they working together?

    • @alexturnbackthearmy1907
      @alexturnbackthearmy1907 4 days ago

      Always has been. And then they implement different instructions in "AVX-512"...

  • @SlyEcho
    @SlyEcho 4 days ago +2

    cmpxchg = compare and exchange

  • @nullplan01
    @nullplan01 4 days ago

    Why does the kernel even care about AVX or AVX512? It is compiled with -mgeneral-regs-only, meaning those extensions can never be used.

    • @complexacious
      @complexacious 4 days ago

      It's two different discussions smashed into one. The kernel part was just about how compiling a 32-bit kernel takes on compiler flags you might not want due to host-specific feature sets, and setting a -march= by default was proposed, which led to Linus saying it's a mess and let's not, but then backing down partially and saying maybe we should at least use generic x86_64, which is reasonable. AVX is just a good example of why x86_64 feature sets are confusing, but it's not specifically related to the kernel.

  • @JonBrase
    @JonBrase 4 days ago +1

    Really, if manufacturers are going to implement big.little architectures, they need to make sure that all of the cores used have the same feature support. If the P cores support AVX-512, the E cores had better do so too. The E core implementation can be dog💩, it can work at one uop per cycle and take 100 uops to implement one AVX-512 instruction and have a 1-port register file for the 512 bit registers and whatever you need to shave space, but it had better run the instructions, however slowly. And when an AVX-512 instruction is retired, it needs to set a bit in some control register so that the scheduler can see that this is an AVX-512 process and only schedule it on P-Cores after the current timeslice.

  • @mathgeniuszach
    @mathgeniuszach 4 days ago +1

    What's the point of even having new CPU features if you don't make it possible for developers to consistently rely on all recent chips having those features? They're not gonna build software for it, so it may as well not exist.

    • @GrzesiekJedenastka
      @GrzesiekJedenastka 4 days ago

      I mean, you can still query for support at runtime, and programmers do that for tasks that are performance-critical. But it does bar the compiler from optimizing the code as it would see fit, so yeah.
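A minimal sketch of that kind of runtime query, assuming a Linux /proc/cpuinfo and spot-checking only one representative flag per level (the real psABI levels require longer feature lists than this):

```shell
#!/bin/sh
# Rough runtime check: infer the highest x86-64-v level this machine could
# plausibly run by looking for one representative flag per level.
# Linux-only, and deliberately NOT the full psABI feature list - just an
# illustration of the query a program or installer can make before picking
# which binaries to use. On non-x86 or non-Linux hosts it reports baseline.
flags=" $(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2) "
level="x86-64 (baseline)"
case "$flags" in *" sse4_2 "*)  level="x86-64-v2" ;; esac
case "$flags" in *" avx2 "*)    level="x86-64-v3" ;; esac
case "$flags" in *" avx512f "*) level="x86-64-v4" ;; esac
echo "highest plausible level: $level"
```

Compiled code would do the same thing with CPUID (e.g. GCC's __builtin_cpu_supports), but the shape of the check is identical.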

  • @GegoXaren
    @GegoXaren 4 days ago

    I run an FX-8350...
    Works well enough in most games...
    Though having my games on a spinny boi (HDD) is not great... The CPU waits more than it should in some games, causing stuttering.

  • @Rohambili
    @Rohambili 2 days ago

    14:43 Bruda, I have all of them you can imagine...

  • @KeinNiemand
    @KeinNiemand 4 days ago +1

    AVX10 is going to make this even more of a mess than AVX512: not every CPU that supports AVX10 will support the 512-bit instructions, as AVX10 instructions come in 128/256/512-bit widths.
    I guess AVX10 and APX will be v5, but most CPUs probably won't support all of v5, and we could get CPUs with AVX10 256-bit support and APX but no AVX512.

  • @asmod4n
    @asmod4n 5 days ago +1

    Can't they just make an installer which picks the right thing and downloads it? You can detect all of that at runtime of a program.

    • @GrzesiekJedenastka
      @GrzesiekJedenastka 4 days ago

      If distros targeted all CPUs then there would be way too many "things" for it to make sense. So no, they can't. That's what we need some standardization for.

  • @stephenreaves3205
    @stephenreaves3205 3 days ago

    This makes Gentoo with `-march=native` look sane. EDIT: Since you asked, I'm still rocking my i7 4790k

  • @ai-spacedestructor
    @ai-spacedestructor 4 days ago

    My CPU is an Intel i7-8700 because that is good enough for VR, will probably be enough for a few more years, and is more powerful than some of the newer CPUs that cost more.

  • @elalemanpaisa
    @elalemanpaisa 3 days ago

    The idea in general would be great, but only if, and really only if, the full distro were to embrace it; otherwise it is just... meh. The biggest chunk is still userland, not the kernel; on modern machines, who would care...

  • @NFvidoJagg2
    @NFvidoJagg2 4 days ago +1

    Unless AMD and Intel come together and get strict about what the different levels are, it doesn't do any good.

  • @deadeye1982a
    @deadeye1982a 4 days ago

    Fun fact: the definitions of Industry 1.0, 2.0, 3.0 and 4.0 came after the inventions. World War 1, 2, [3]... same.

  • @foznoth
    @foznoth 4 days ago

    Nice to see Linus is a Monty Python fan.

  • @linuxguy1199
    @linuxguy1199 4 days ago +1

    CPU microarchitecture levels sound awful. Here's a better idea: just provide an argument like amd64-num, where num is substituted with the hexadecimal value of the required CPUID bits. So something that requires SSE (0x0080), AVX512 (0x1000), and MMX (0x0020) would be in the amd64-10A0 packages.
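The proposed encoding can be sketched in a few lines of shell; note the bit values below are the commenter's hypothetical ones, not real CPUID leaf bits:

```shell
#!/bin/sh
# Sketch of the proposed package-name encoding: OR the per-feature bits
# into one hex tag naming the required feature set. The bit assignments
# (0x0020 for MMX, etc.) are the comment's made-up values, not actual
# CPUID feature bits.
mmx=$((0x0020)); sse=$((0x0080)); avx512=$((0x1000))
tag=$(printf 'amd64-%04X' $(( mmx | sse | avx512 )))
echo "$tag"   # -> amd64-10A0
```

A package manager would then install a binary whose tag is a bitwise subset of the machine's own feature word.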

  • @blinking_dodo
    @blinking_dodo 5 days ago +1

    Optimize for local machine WOULD be nice...

    • @Poldovico
      @Poldovico 4 days ago +2

      You can just do that if you want to. The downside is you have to compile your own stuff.

  • @medicalwei
    @medicalwei 4 days ago +1

    Meanwhile RISC-V...
    RV64IMAFDC

  • @_Jayonics
    @_Jayonics 3 days ago

    I went down the same rabbit hole around compiling for Intel P and E core architectures over multiple weekends. Asking whether you can run code compiled for instructions present on the P cores but not on the E cores, whether it would run, what happens if you enforce thread affinity...
    There's still no answer on my Stack Overflow post. Another poop 💩 from Intel

  • @rashidisw
    @rashidisw 4 days ago +1

    I'm in favor of a [Supported CPUID] list.

    • @GrzesiekJedenastka
      @GrzesiekJedenastka 4 days ago

      So basically software vendors define their own "levels"? I mean yeah that could work.

  • @StephenMcGregor1986
    @StephenMcGregor1986 5 days ago +1

    On Intel it doesn't make sense because Intel

  • @DrewWalton
    @DrewWalton 4 days ago

    Like many things in the tech world: good concept, horrendous implementation.

  • @bleack8701
    @bleack8701 4 days ago

    Someone call the code of conduct crew. Linus called something idiotic without provocation

  • @jonathanbuzzard1376
    @jonathanbuzzard1376 4 days ago

    You need AVX2 for v3, AVX is not enough

  • @medicalwei
    @medicalwei 4 days ago

    The CPU actually using: Apple A17 Pro (sorry)

  • @siyiabrb8388
    @siyiabrb8388 4 days ago +2

    Completely broken is right, breaking "legacy" systems and maintaining different repos for v2...v3... for 1% performance gain is not worth it.

    • @GrzesiekJedenastka
      @GrzesiekJedenastka 4 days ago +1

      It can be significant, and it can be worth it. If I bought the whole CPU why can't I use the whole CPU? Whether older systems will still be supported depends on the vendor - I am sure Debian will support base x86-64 for the next 20 years, remember that Debian 13 will still support 32-bit x86, and who knows, maybe 14 will too.

  • @NikoNemo
    @NikoNemo 4 days ago

    Really...

  • @jan_harald
    @jan_harald 4 days ago

    hey stupid
    I love you
    also lol, even though my cpu is reported to be v4 compatible, enabling the unofficial v4 repos for arch made every program segfault
    but v3 works fine, from same maintainer

    • @somenameidk5278
      @somenameidk5278 4 days ago

      Segfault? Odd, I would assume they would terminate with SIGILL (illegal instruction).

  • @knghtbrd
    @knghtbrd 3 days ago

    I have all four levels. 😅

  • @nicholasbrooks7349
    @nicholasbrooks7349 4 days ago

    interesting

  • @alex-oc1wo
    @alex-oc1wo 4 days ago

    Meanwhile, CachyOS repo users be like 😅😅😂

  • @VerbenaIDK
    @VerbenaIDK 5 days ago

    I have a Ryzen 7 4800HS

  • @rj7250a
    @rj7250a 5 days ago +3

    Linus says it's unofficial, when these arch levels were developed with help from Intel and AMD.
    I don't get it

    • @polinskitom2277
      @polinskitom2277 5 days ago +2

      Linus is getting a bit too old, and should be replaced so that Linux can survive the future to be honest.

    • @JEM_Tank
      @JEM_Tank 5 days ago +8

      He's saying they are unofficial because the hardware people haven't stuck to them as they should, thus they aren't following their own standard they created

    • @Poldovico
      @Poldovico 4 days ago +3

      Linus probably assumed they were unofficial (possibly based on how poorly the CPUs being sold map onto the levels) and was mistaken. The levels are official, just ill-defined.

    • @rj7250a
      @rj7250a 4 days ago

      @JEM_Tank oh, I understand.
      I mean, it's Intel's fault for not creating a 256-bit encoding of AVX512 earlier; now they will create AVX10.

  • @DJDocsVideos
    @DJDocsVideos 4 days ago

    Architecture levels have been stupid from the get-go, something only a marketing drone could have come up with. Also, your explanation is not completely correct, as it's perfectly possible for software to detect CPU features at compile time, and there is no technical reason to come up with more-or-less dumb groups. You can actually do it at runtime too, albeit with bigger binaries and for little gain.

  • @realmwatters2977
    @realmwatters2977 4 days ago

    intel atom 32-64

  • @gr33nDestiny
    @gr33nDestiny 4 days ago

    Why can't code use v3.2 or v4.2.1, etc.? I'm guessing they thought of that?

  • @TheRedFoxPlayz
    @TheRedFoxPlayz 4 days ago

    The 10th-gen Celeron and Pentium CPUs are 'v2' because these chips lack the AVX/AVX2 instruction set. Even though they are recent enough, they cannot run an Ubuntu compiled for x86_64-v3, which is ridiculous.

    • @Winnetou17
      @Winnetou17 4 days ago +1

      I wouldn't say it's ridiculous. Not all new chips really need to support all the instructions that have appeared. At least theoretically, having fewer things included (or, to rephrase it, removing things that aren't really needed) will make the chip smaller, cheaper and more efficient.
      In practice this might be at the cents level but, hey, I still say it's a decision that Intel or whoever should be allowed to make.
      In ARM-land, it's like making Cortex-M0 (which is REALLY barebones) chips. Those are still manufactured today.

    • @TheRedFoxPlayz
      @TheRedFoxPlayz 4 days ago +1

      ​@@Winnetou17 Intel's decision is understandable; that wasn't what triggered me. It's the 'Let's optimize for x86_64-v3' approach, given the number of devices out there. x86_64-v2 would offer a performance improvement over regular x86_64 without excluding chips that were released in the past five years or so. They are effectively turning those chips into e-waste (at least for desktop use cases).

    • @Winnetou17
      @Winnetou17 4 days ago +1

      @@TheRedFoxPlayz Ooh, sorry, my bad.
      But isn't Ubuntu also shipping non-v3 ISOs? I thought they only offered that as an extra option.
      Though come to think of it, I think Fedora also did something like this, and without the non-v3 part. Frankly, I'm baffled why Fedora is so recommended for new users. To me it seems a distro for people who already know what Fedora's limitations are, which are quite subtle. Anyway, rant off.

  • @JassonCordones
    @JassonCordones 5 days ago

    AMD Ryzen 5 7600

  • @3amael
    @3amael 4 days ago

    You can always count on engineers to overcomplicate things. Just take the hit and overhaul the ISA every 10 years... yeah, it will be painful, but progress is never easy. Or am I being stupid (tm) and missing something?

  • @majoraxehole
    @majoraxehole 5 days ago +3

    Any real Linux enjoyer is going to be on AMD processors. Will they ever do weird shit like Intel? No idea, hope not

    • @floppa9415
      @floppa9415 5 days ago +7

      Are you high? AMD Bulldozer had nonsense like an FPU shared between two cores.

    • @Poldovico
      @Poldovico 4 days ago +3

      Intel and AMD both participated in defining the levels, and they also both release CPUs that don't map neatly onto a level.