Two Decades of Hardware Optimizations Down The Drain

  • Published: May 12, 2024
  • Credits:
    Christian Mutti's original blog post: chrs.dev/blog/clean-code-rust/
    Rust code used in the video: gist.github.com/chrsmutti/698...
    Casey Muratori's Video: • "Clean" Code, Horrible...
    Clean code philosophy Gist: gist.github.com/wojteklu/73c6...
    x86_64 Instruction Set Reference: www.felixcloutier.com/x86/
    Rust mascot Ferris the crab: www.rustacean.net/assets/rust...
    Icons: primer.github.io/octicons
    Photo of Robert C. Martin (CC BY SA 4.0): en.wikipedia.org/wiki/File:Ro...
    I Interviewed Uncle Bob - ThePrimeagen: • I Interviewed Uncle Bob
    Backing soundtrack "Keys Of Moon - Somewhere in the Clouds" is under a Creative Commons (CC BY 3.0) license.
    • 🕯️ Free Relaxing Medit...
    Code used to render this video: github.com/lavafroth/videos/t...
  • Science

Comments • 419

  • @lavafroth
    @lavafroth  18 days ago +237

    Errata:
    0:30 - I misspelled "understandable"
    2:08 - missed "accum1" variable declaration (thanks @morels)
    6:25 - Technically the addresses are offsets from the image base (thanks @qwendolyn5421)

    • @xClairy
      @xClairy 17 days ago +18

      Thought that was intentional

    • @vilian9185
      @vilian9185 17 days ago +2

      @@xClairy lmao same

    • @TheOriginalDuckley
      @TheOriginalDuckley 17 days ago +8

      Genuinely thought it was just a funny ass mistake; as a programmer myself, I know it's super easy to make them, and it's so obvious once you see it!

    • @FruchtcocktailUndCo
      @FruchtcocktailUndCo 17 days ago +7

      understandable.

    • @vilian9185
      @vilian9185 17 days ago +4

      @@FruchtcocktailUndCo have a nice day

  • @undergrounder
    @undergrounder 17 days ago +806

    See? It’s not me, it’s the compiler.

    • @monad_tcp
      @monad_tcp 17 days ago +46

      In this case it's the compiler. C# is able to make much better optimized code that actually uses SIMD for those cases.
      That's the problem with using vtables in statically compiled languages: they basically can't optimize that.
      If you're going to rely heavily on OO patterns, don't use C++ or Rust; use C# or Java, they're much better optimized for that.

    • @TapetBart
      @TapetBart 17 days ago +18

      @@monad_tcp compile-time polymorphism, on the other hand, is based.

    • @monad_tcp
      @monad_tcp 17 days ago +12

      @@TapetBart ironically, Objective-C was good at that.
      C++ can do it sometimes (that is, if you don't use virtual calls).
      The real problem is that inlining code across polymorphism is a hard problem.
      When you use templates in C++ it can do it by creating lots of copies; your binary gets huge, but it's fast.

    • @mihneabuzatu784
      @mihneabuzatu784 17 days ago +3

      @@monad_tcp how could C# apply SIMD in the dynamic dispatch example without knowing what the area function does at compile time? Or are you talking about applying it at runtime?

    • @animarain
      @animarain 17 days ago

      Best comment ever!! 🤣

  • @12q8
    @12q8 17 days ago +543

    This is a very niche example, but it shows the point.
    I was taught everything Clean Code at uni as well. I remember asking one of my professors the same question: "isn't clean code less performant?"
    And his answer was yes, but it is also easy to read, understand, and modify, and computers are so fast nowadays that, in most cases, the performance impact is negligible.
    In real-life situations, what I've seen in industry is the intuition experienced and knowledgeable developers exhibit in knowing when clean code should be applied over performance and when performance is needed over clean code.
    Something my previous manager taught me.

    • @thisguy.-.
      @thisguy.-. 17 days ago +34

      Not trying to dispute your experience, as I'm not even employed yet. But I think the example does heavily apply to extensible libraries, something which is more significant to the industry than you give it credit for here. They must keep a minimal performance overhead at every step since they're the backbone of every app in existence. In this example the polymorphic "clean code" is - as stated - great for functionality with external crates, but for internals, enums are practical not only for performance's sake but also for usability.
      For end-user applications, sure, computers are fast and you probably should just not care. But if we apply that logic to all areas of the industry, then that's how we get the mess that is Electron, npm, and embedded apps that lag for several seconds every time you press a button on a remote. Like you said, it's on the developer to know when they need to write good code or not, but writing performant code is much less niche than you made it out to be here.

    • @liquidsnake6879
      @liquidsnake6879 17 days ago +31

      Depends on what you're doing: if performance is critical enough to you, then knowledge and application of direct assembly language is a requirement even to this day. No compiler can be trusted to be performant in all scenarios. For the overwhelming majority of programs, even a 5x slowdown in a particular calculation produces no significant user-noticeable difference in the whole final product.
      But obscure code that is hard to modify and maintain produces daily headaches for your team and for your users as your team struggles to keep the product stable. That's why clean code matters and is superior to concerns of performance most of the time. Of course that's not ALL of the time, and that's why we still get paid and haven't been taken over by AI (and probably won't).

    • @12q8
      @12q8 17 days ago +7

      @@thisguy.-.
      I meant the example of creating a billion shapes and the performance benchmark based on adding up the areas.
      Though it does deliver the point.
      I don't think the extensible-libraries point applies completely, since there are generally many libraries with their own pros and cons, and new ones get made a lot. One thing you'll see once you start working is how diverse the requirements can get, and many times you'll have to read through the source code to figure out why something is not doing what it should (and then discover the devs added some automagical stuff that changes the params some 4 functions deep), and tracing that is made way easier with clean code rather than code obfuscated for performance purposes.
      I've also had to write some projects that used the decorator pattern, and the source code being readable and documented helped me understand how everything works, and knowing a library inside out helps avoid so many headaches and looking up/asking AI what is happening.
      Another thing you'll find is how many workarounds exist for the various problems you've listed, which usually boils down to figuring out what a command does, and asking yourself "do I need it to do all of this?"
      One such case is monorepos and pulling them. Pulling doesn't just download the source code, but the entire git tree, with all its commits and branches, and you really don't need that. With some reading of the docs and a few handy Medium articles, you can bring that down from 30 minutes to mere seconds.
      You probably also had professors forcing you to read the man pages and documentation, and thought it was useless when you could just look things up or ask ChatGPT, but what they were trying to teach you is to be comfortable reading through the docs, because you'll use many libraries, and to know how to use them there won't be CS classes or office hours; you'll have to read the docs.
      I didn't learn this at uni, because I really thought it was a waste of time, but mostly through personal experience hopping over to daily-driving Linux and using libraries at work for scripts that use Docker and Kube.

    • @taragnor
      @taragnor 17 days ago +16

      Clean code is generally the rule, with the exception being performance-critical sections. And sometimes the benefit of the clean code is worth it. Here we see an extremely simplified example. A shape with one method associated with it, taking the area, with 4 more permutations of shapes. Blow that up to 12 different shapes with 20 different potential methods, and your code becomes a nightmare to read if you want to rely exclusively on enums and match statements. The thing is that clean code standards are mostly for large projects, so it will usually seem nonsensical to apply them to simple toy programs. Seems simple when it's just taking the area of a few shape types, but in real projects you'd probably have rotate, draw, scale, move_vertex, detect_intersection, and all manner of other functions that go with those shapes. And what at first seems "not that bad" can turn into a bloated-looking mess in a hurry.

    • @andreasvaldma8428
      @andreasvaldma8428 17 days ago +7

      Yeah. For example, when you're writing web services, most of the time is spent not on computation but waiting for I/O. In that case it's unreasonable to prefer performance. If you're writing a game engine, it's a different story.

  • @sdjhgfkshfswdfhskljh3360
    @sdjhgfkshfswdfhskljh3360 17 days ago +284

    Readability is just yet another variable to optimize for, like RAM and CPU consumption.
    Sometimes you optimize for humans, sometimes for computers - it depends on what is more important in that specific place in the code.

    • @cerulity32k
      @cerulity32k 12 days ago

      Exactly. Low-level and heavy math code is super optimized for computation; just look at Q_rsqrt.

    • @tykjpelk
      @tykjpelk 1 day ago +1

      Right. Code that does the wrong thing really fast because you didn't catch a bug isn't very successful.

  • @amongussuss341
    @amongussuss341 17 days ago +19

    Because of the heavies and the two decades in the title I thought this was about TF2

  • @scotmcpherson
    @scotmcpherson 17 days ago +256

    This is why I practice what I call Clean Enough Code... not just because you lose access to the hardware, but also because there are some software optimizations that just aren't "clean" by Uncle Bob's definition.

    • @Carltoffel
      @Carltoffel 16 days ago +44

      It really depends on how critical the code is. Usually, most of the runtime is spent in a tiny fraction of the code base.
      And keep in mind: Fast is better than slow, but slow is better than unmaintainable.

    • @scotmcpherson
      @scotmcpherson 15 days ago

      @@Carltoffel I am assuming you read "clean enough"?

  • @andrewtran9870
    @andrewtran9870 18 days ago +229

    Insanely high quality content for how small the channel is. Complex concepts are explained simply, visuals are clean and work astonishingly well to help demonstrate the topics

  • @R.B.
    @R.B. 17 days ago +82

    Clean code isn't about writing the fastest code, it is about writing maintainable code first and foremost. You pointed out the disadvantage of using enums, as a derived class wouldn't be able to inherit the area method from its parent, and that is precisely why it is less desirable. There is a time and place where optimization is necessary in a way that will break clean code, but for most programming the algorithms are not often the performance bottleneck anymore. If you've written something where that is the bottleneck, you should ensure you have your validation test cases and then work on the optimization.

    • @georgeweller1
      @georgeweller1 14 days ago +9

      Yeah, but hotshot kids aren't interested in being good developers, they want to be 10x con artists who hop from interview to interview and write shitty blog posts about how everyone else is an idiot and they alone have figured out the truth.

    • @BaremetalBaron
      @BaremetalBaron 14 days ago +15

      The real problem isn't performance, the problem is that "clean code" LEADS to unmaintainable messes and the advice is simply BAD. I don't avoid clean code because it's "slow", I avoid clean code because that much indirection and abstraction leaves you constantly digging through over-engineered glue code trying to figure out where anything is, often splitting a simple set of steps into multiple methods in multiple classes across multiple files, so that you have to maintain this deep call graph in your working memory, scrolling around, and tabbing through files, for something that could all fit on screen inline in order.
      It also becomes a source of insidious bugs, because, if something is extracted into a function that doesn't need to be, you now have to worry about when it can be called, what can call it, and under what pre-conditions that produces valid results, whereas inline code simply flows from one statement to the next and the state and flow of execution are obvious (barring say, goto shenanigans).

    • @R.B.
      @R.B. 13 days ago +5

      @@BaremetalBaron there are over-engineering problems for sure, but there's also a balance which can be struck while still allowing for modular design. If you're building a class which isn't sealed, then concrete implementations, as described, will make maintenance more difficult. It sort of depends on what you're building. For anything significant in size, abstraction allows you to deconstruct the problem into manageable-sized chunks.

    • @delphicdescant
      @delphicdescant 13 days ago +2

      Maintainability doesn't need to be the #1 priority.
      If you're writing some dull enterprise software "solution" for some mind-numbing corporate web junk, then sure, prioritize maintainability if you want.
      But that's not everybody. So it would be nice to stop hearing the so-called "clean code" doctrine applied so universally.

    • @BaremetalBaron
      @BaremetalBaron 13 days ago +8

      @@R.B. I'm not arguing against abstraction, I'm arguing against the specific recommendations of Uncle Bob's "Clean Code" on how to factor your codebase. The recommendations lead to over-factoring, which actually makes it *harder* to understand the system as a whole as it gets larger, because it increases the surface area of the code.

  • @lucasmontec
    @lucasmontec 18 days ago +191

    Only the insane or juniors really believe any code/architecture idea is to be followed blindly and everywhere, to its core. Clean code was created in a large-application context, with up to hundreds of developers working on the same system at the same time. The idea is to make people able to actually write code together, not to generically "write better code". The title is misleading and the content doesn't justify your point. You wrote really simple code, code that requires almost no architecture, for a very resource-intensive task, with only one developer working on it, in a modern language that is intended for performance. This is like coding an HTTP server in assembly to argue that Apache is bad, or writing an FPGA image processor to say that OpenCV is slow. You can indeed write VERY EXPENSIVE, hard-to-maintain, ugly and super fast code, most people can, but that code is not at all what most people work with every day, which is very far away from the bottlenecks of any system. Most code can and should be slower, if it can be fixed and adapted faster. That's exactly what clean code and clean architecture are about... making software soft, easy, adaptable. The computer will always understand you, even if your orders are stupid. People won't. People are also much more expensive.

    • @lavafroth
      @lavafroth  18 days ago +41

      Part of the problem is coming up with a proper middle ground. Blind use of clean code can prevent the compiler from catching places where it can optimize your code (including enabling SIMD). Of course, if traits were completely useless, languages wouldn't ship them, but keeping track of these layers of abstraction is also mentally taxing.

    • @adamhenriksson6007
      @adamhenriksson6007 17 days ago +8

      Also, "clean code" adds boilerplate complexity and structure to keep the code from getting spaghetti-like. One thing I noticed is that by doing the easiest possible thing (the most primitive, CPU-friendly thing, that is), not only is everything easy to understand since there is less structure and less code, but it is also easier to create new features, features can be created with less code, and changes are easier and faster to make since the program is simpler, smaller and has less boilerplate.
      Simplicity has a multiplicative snowballing effect that simplifies and improves the efficiency and output of all future work, which also further snowballs.
      Imagine this but the exact opposite... This has been my exact experience with all pattern-heavy OOP development during my 5+ uni years and all OOP production codebases I've seen so far that rely on OO patterns.
      Also, I don't even care about DRY anymore if it means I can avoid using the worst programming concept ever imagined, "abstract" 🤮

    • @0xO2
      @0xO2 17 days ago +2

      From my POV, all such clean big apps get rewritten or abandoned anyway.

    • @lucasmontec
      @lucasmontec 17 days ago +4

      @@0xO2 your POV is not science. Most professional enterprise applications are clean or "cleaner". Most of the backends in the world are like that too.

    • @lucasmontec
      @lucasmontec 17 days ago +4

      @@lavafroth I'm not sure how it works in Rust, but at least in Java (where Clean Code was created) and C#, it's not at all taxing. A layer there is a directory, a folder. There are 3 layers. Not taxing at all. Not only that, you can ignore the layering; you can focus on other aspects. You can extract methods to have smaller units, for example, and counter the stack frames by asking the compiler to inline. You can extract variables from inside if conditions, so developers can read your code without having to read comments or interpret complex instructions. You can minimize class sizes, making classes do only one major thing... Again, you don't need to do all of this at once, nor force any of this where it doesn't fit. It's just 80/20 or 90/10! You don't need performance in most of your application, and the readability will pay much more than performance. If you need super high speeds somewhere, you can then just write that performance-critical part closer to the metal, even calling SIMD or other parallel instructions directly. I work in game dev, and the code where we need to talk to the GPU is NEVER clean. It's usually ugly, yet organized, but always hard to read. It needs to be really fast. My whole comment is about context. Architecture is always contextual. It's like precision in machining: it's costly, slower and not necessary everywhere. Still, precision is fundamental and allows machines to work longer, faster and with less stress. No one grinds every face of every part to a few thou though. For web apps, the layering is well defined, and actually, for most applications that have outputs it's usually: model, business logic, presentation. If those layers don't make sense for your app, maybe your app doesn't need them indeed. Maybe your code is not big enough; most systems and apps are not big enough for such considerations. Just keep in mind the problems it's trying to solve.

  • @exotic-gem
    @exotic-gem 17 days ago +269

    I'm surprised the loop "unrolling" into 4 SIMD accumulators isn't done by the compiler directly; seems like something it should be able to figure out.
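    For reference, the hand-unrolled version being discussed is roughly this shape (a minimal sketch, assuming a Shape type with an area() method; the names are not the exact gist code):

        fn total_area_sum4(shapes: &[Shape]) -> f32 {
            // Four independent accumulators break the serial dependency chain,
            // which lets the compiler keep several SIMD lanes busy at once.
            let (mut a0, mut a1, mut a2, mut a3) = (0.0f32, 0.0f32, 0.0f32, 0.0f32);
            let mut chunks = shapes.chunks_exact(4);
            for c in &mut chunks {
                a0 += c[0].area();
                a1 += c[1].area();
                a2 += c[2].area();
                a3 += c[3].area();
            }
            // Fold in the leftovers when the length isn't a multiple of 4.
            let rest: f32 = chunks.remainder().iter().map(|s| s.area()).sum();
            (a0 + a1) + (a2 + a3) + rest
        }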

    • @Hardcore_Remixer
      @Hardcore_Remixer 17 days ago +33

      Hey, I'm surprised it figured out that it can do SIMD by itself. Until now I've been doing it myself using Intel's intrinsics.

    • @Jason9637
      @Jason9637 17 days ago +184

      It's actually not allowed to, since reordering floating-point operations can change the result.

    • @agsystems8220
      @agsystems8220 17 days ago +61

      Floating-point arithmetic is not strictly associative on computers. Imagine the case where you have a list alternating between x and negative x, for a precise x, and an odd number of elements. If you add them in a line you stay near zero, and keep all the precision. If you split it off into 4 accumulators, each of them will not stay near zero, and you will lose precision.
      This isn't loop unrolling, because loop unrolling maintains the ordering of the loop. You are talking about telling the CPU that it is an associative fold of a map, and that it can do the adds however it sees fit. Corner cases prevent that.
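      A tiny illustration of why the compiler isn't free to regroup the sum on its own (the values are chosen to force the rounding; any IEEE 754 f64 behaves this way):

          fn main() {
              let (a, b, c) = (1.0e30_f64, -1.0e30_f64, 1.0_f64);
              // Same three numbers, different grouping, different result:
              assert_eq!((a + b) + c, 1.0); // 0.0 + 1.0
              assert_eq!(a + (b + c), 0.0); // b + c rounds back to b, then a + b cancels
          }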

    • @catgirlQueer
      @catgirlQueer 17 days ago

      @@Jason9637 just have some fun safe math optimizations! (-funsafe-math-optimizations) it'll be fine

    • @LucasSantos-ji1zp
      @LucasSantos-ji1zp 17 days ago +12

      @@Jason9637 I wonder if the compiler would do this optimization if we enabled relaxed floating-point operations with compiler flags.

  • @max_ishere
    @max_ishere 15 days ago +12

    I can't believe you had TF2 and Rust in the thumbnail

  • @KiraSlith
    @KiraSlith 11 days ago +4

    IMO the most interesting thing to come out of this video is actually in the comments, namely the diversity in what people actually consider "clean code" while assuming they're talking about the same thing. Some see "clean code" the same way I do: it's all about laying out code and functions in blocks of self-explanatory code with plain text variables. Others see "clean code" as a matter of using the simplest code possible to achieve the desired output, or using the fewest lines, or modularizing the code into meta-packages, or just using or not using specific functions, or some arbitrary combination.

  • @yuack2398
    @yuack2398 17 days ago +37

    I write and run programs for HPC. In my field, polymorphism is still used in some highly optimized programs, which are designed to run on thousands of compute nodes in a very efficient manner. In this case, they do not use those clean code things deep inside the code, as it is not compatible with optimization, but it is still a good solution for managing other parts of the programs, where performance doesn't matter.

  • @mikkelens
    @mikkelens 17 days ago +51

    Polymorphism does not inherently mean dynamic dispatch, and it disappoints me to see you not mentioning the static dispatch you can get with impl Trait in Rust, which in many cases is the perfect solution to a lot of these problems, potentially even being more efficient than grouping types using an enum. The only reason you avoided this was because you can't use impl Trait as a stand-in for multiple arbitrary implementations at the same time (they need to be in a vector with the same internal element size), but this is only something you did for your benchmark without arguing why you needed it. Why do the shapes in your implementation need to be in the same vector? It seems incredibly arbitrary to me, and it kind of ruins the narrative you're going for in the video, at least to me. This video is very well produced, and I think you can make great things (although you need a better microphone), but I do wish you were more critical of your "assumptions" if you will, even if the video concept was based on other literature. I really wish you dove into the type-level SIMD that is experimental in Rust right now but is a way better solution to your problem of choice.
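    For readers unfamiliar with the distinction, static dispatch looks roughly like this (a sketch, assuming the video's trait is called Shape and has an area() method): the generic function is monomorphized per concrete type, so there is no vtable lookup inside the loop, at the cost of only working on homogeneous collections.

        fn total_area<S: Shape>(shapes: &[S]) -> f32 {
            // One copy of this function is generated for each concrete S,
            // so s.area() is a direct, inlinable call.
            shapes.iter().map(|s| s.area()).sum()
        }

        // Usage is limited to one concrete type per slice, which is exactly
        // the limitation raised above:
        // let circles: Vec<Circle> = vec![];
        // let total = total_area(&circles);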

    • @lavafroth
      @lavafroth  17 days ago +21

      True, nightly Rust SIMD is a better solution. Perhaps I'll make a more in-depth video in the future. Thank you for the criticism.

    • @yuitachibana8829
      @yuitachibana8829 17 days ago +1

      One reason I can think of for using a Vec is to handle entity processing in a simulation/game where ordering matters between different types of entities.

    • @WelteamOfficial
      @WelteamOfficial 16 days ago

      @@lavafroth what do you think about his suggestion to use static dispatch?

    • @curatedmemes9406
      @curatedmemes9406 15 days ago

      @@yuitachibana8829 YUI????

    • @asdfghyter
      @asdfghyter 14 days ago +2

      @@yuitachibana8829 This is exactly why Entity Component Systems are more performant in game engines, since they don't place all objects in a big bag of dynamically dispatched trait objects

  • @Ruzgfpegk
    @Ruzgfpegk 13 days ago +2

    As far as I remember, the "Clean Code" book is more of a list of common issues arising in software development, associated with the "rules" Robert C. Martin found to avoid them.
    And not all the rules are meant to be applied to all projects.
    What's more important is to regularly check that we aren't "footgunning" ourselves or our colleagues, either in the present or the future, by recognizing in advance that we're heading in a tumultuous direction, and to act in time by applying one of the rules if it's applicable.
    I think I remember one of the takes being like "organizing code can have an efficiency cost, but if you organize better the vast majority of your code that is not speed-critical, then everybody wins".
    The example here would be a speed-critical part, so it wouldn't need to follow every "Clean Code" paradigm closely.
    What negates two decades of hardware optimization is more often a lack of skills (not having the right indexes on a database, not caching what could be cached, not profiling the code, not using the right tool for the right job, …) and knowledge (not having experienced lower-level programming and not having any idea of how the code runs in the end) than following a set of ideal rules too closely.

  • @X39
    @X39 17 days ago +21

    I think you have a general misunderstanding here. Clean code does not work against performance, as it operates under a different rule of software development:
    Rule 2: Software must be maintainable, except where it collides with the first rule.
    Rule 1 being: a program must be correct. If performance is a requirement (e.g. a render target of 120fps must be reached, but boxing would introduce too much latency), abandoning clean code principles is hence a must, where a violation of rule 1 would be introduced otherwise.
    For all other cases though, performance is part of rule 3: programs must be efficient, except where it violates rule 1 or 2.
    Clean code hence is very relevant, because most applications do not have a relevant target speed beyond "nice to have". If speed is required for specific sections, such that the correctness of the program would otherwise be hindered, violating some or all principles of clean code is a must.

    • @MayoiHachikuji88
      @MayoiHachikuji88 13 days ago

      I don't believe in "maintainable" software. For any given problem, there's an efficient solution, if requirements change, 90% of it will have to be thrown away.
      "Maintainable" software leads to legacy nightmares. It's quite literally easier to create something new. This garbage industry is plagued by sunk cost fallacy that could be avoided if idiots who are investing money into this realized that in the long run it will be cheaper to tailor N programs to N problems than to make one program with M amount of parameters that can solve N problems.
      Ironic that you mention games, because games are proof that this works, video game companies literally throw away 90% of old game code and write new code for new games most of the time, renderer is the only part that really cannot be thrown away because it's an already solved problem and there's nothing to rewrite.

    • @delphicdescant
      @delphicdescant 13 days ago

      Ok, but what if someone says your rules #2 and #3 should be swapped? Whether or not your rules as written came from some very highly-respected source, they are still up for debate.

    • @X39
      @X39 13 days ago +1

      @@delphicdescant They are the basic rules for our very industry. There ain't no debating.
      Software must be correct, maintainable and fast.
      You can't have fast, maintainable, correct, as your software then ain't running.
      Correct, fast, maintainable simply makes no sense.

    • @delphicdescant
      @delphicdescant 13 days ago +4

      @@X39 > correct, fast, maintainable simply makes no sense.
      Why not? You're just stating "this is the way it is, and nobody can argue." So basically just 100% dogma and 0% rationale.
      The "industry" shouldn't be in the practice of following magical spells and reciting doctrines.

    • @TheMelnTeam
      @TheMelnTeam 13 days ago +2

      Maintainable is pretty important to sustaining both correct and fast if the software will see updates.
      If you're sure it won't and you're right, it's perfectly maintainable.
      Picking fast over maintainable will put correctness and fast on a timer. Sooner or later, probably sooner, you will need new software.
      But perhaps this is an acceptable sacrifice in some contexts.

  • @noobdernoobder6707
    @noobdernoobder6707 16 days ago +9

    "unederstandable" is a beauty in itself.

  • @Blacksun777
    @Blacksun777 17 days ago +10

    If not using clean code made you "automatically" program with knowledge of compiler/HW optimizations, then yes. But I don't. I find performance issues arise not from inefficient for loops or the like, but much more from a complex interplay in the system architecture. The only place I needed optimizations like the ones shown was when doing benchmark optimizations on toy examples.
    Actually, it would be interesting to see how to take advantage of these HW optimizations - more of them, and also in different language types (interpreted, intermediate-code-based ones).

  • @zerotwo7319
    @zerotwo7319 17 days ago +120

    If your project values speed, you create that pattern, write documentation, and that will be your modular clean code. It is all about your needs and not some random guy.

  • @doom-child
    @doom-child 17 days ago +12

    This is really well done. You made some pretty detailed stuff make intuitive sense.
    It sounds like English might not be your first language, so I thought I'd let you know about something that really tripped me up in your pronunciation. The word "register" is pronounced with the stress on the first syllable, as in "REG-ister", not the second. I thought for the first couple of times that you were saying "resistor" (where the stress is on the second syllable, as in "re-SIS-tor"). English is so weird.

  • @agsystems8220
    @agsystems8220 17 days ago +29

    Arguably a better option would be to have a cleverer structure than a vector, one that partitions its elements by type. When you ask it to map this structure to area, it can look at the type's v-table once for each block (rather than per shape), run the same code on each shape in the block, and then zip along accumulating the output once they are all done. In the dumb loop version we are resolving the code for each individual shape at run time, despite them all falling into a small number of buckets that could easily be batched, with the code for each bucket inlined and optimised. The compiler has to do this because the functions it may be calling might not even exist yet.
    The next level would be to use a macro to implement the traits in question for vectors (as a newtype, for scoping reasons) of each specific type as a mapping. The code just needs to apply a map to the vector, and the compiler will unroll the loop and parallelise. Then our smart structure would contain a set of blocks that all implement the trait (using the v-table), but each individual block will run vectorised code over itself in a concrete way. Unfortunately the compiler doesn't seem to create vectorised versions of functions by default, so we need to hack it a bit. It also means we need to prespecify the concrete types that might exist, so we are not quite as extensible as we would like, but adding a new type would just be adding one line implementing the macro on a new shape type.
    This toy example is easy to beat, because the area functions are essentially the same, the code is small enough that maintenance is easy, and the functions are tiny so that the overhead becomes the major factor. It doesn't need to worry about scaling problems that might come later. I would sort of reject your assumption that the code presented is optimal though, because you have fallen into a trap that abstraction steers you away from: when presented with a stream of objects of different types, you generally want to split it into streams based on type and handle them concretely. What you have done is collapsed the types (which encode information for optimisation) into one mega type that you are attempting to handle concretely. Here it is possible, and pretty efficient, but it doesn't generalise and is hard to maintain.
    The problem isn't that the types are abstract; it is that you are mixing abstract types with a concrete loop and expecting it to perform well. What you should be doing is abstracting away the loop into something that lets you leverage the type information in a smarter way.
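    A minimal sketch of the "partition by type" idea (struct and field names are assumptions, not the video's code): each block is homogeneous, so every inner loop is monomorphic and easy for the compiler to vectorize, and adding a shape type only means adding a field and an impl.

        struct Shapes {
            circles: Vec<Circle>,
            squares: Vec<Square>,
            rectangles: Vec<Rectangle>,
            triangles: Vec<Triangle>,
        }

        impl Shapes {
            fn total_area(&self) -> f32 {
                // No per-element dispatch: each sub-loop works on one concrete type.
                self.circles.iter().map(|c| c.area()).sum::<f32>()
                    + self.squares.iter().map(|s| s.area()).sum::<f32>()
                    + self.rectangles.iter().map(|r| r.area()).sum::<f32>()
                    + self.triangles.iter().map(|t| t.area()).sum::<f32>()
            }
        }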

    • @MayoiHachikuji88
      @MayoiHachikuji88 13 days ago

      It would be more clever to stop being lazy and stop trying to push compile-time decisions to runtime.
      For example, in a raytracer, let's say you support spheres and planes. Well, okay, you also support collections of planes for things like cubes. Your planes can also differ, triangle vs trapezoid and so on...
      Tell me, why do we need this v-table bullshit for what should be a few separate handwritten loops?

  • @GLeD101
    @GLeD101 13 days ago

    The box isn't just slower because of the deref; it's largely because of cache misses caused by loss of locality.

  • @Rudxain
    @Rudxain 14 days ago +2

    I agree with the "moral of the story", but we should remember that premature optimization is bad, especially if there's no profiling/benches! @WhatsACreel talked about branchless programming, and why some "clever" optimizations that remove conditional execution might confuse the compiler, causing it to emit suboptimal code.
    I recommend Michael Abrash's "Graphics Programming Black Book: Special Edition", where he uses the greatest common divisor as an example: if we find the GCD by following the verbatim definition, it runs in O(2^n) (where n is the bit length), but if we use the Euclidean algorithm it runs in O(n^2) (same as multiplication). Both algorithms are simple and readable, but Euclid still wins.
    I'm aware those examples have nothing to do with polymorphism, but they do impact performance.
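    For concreteness, the two approaches contrasted there look roughly like this (a sketch in Rust; Abrash's book works in assembly, and the function names here are made up):

        // Straight from the definition: try every candidate divisor.
        // Assumes both inputs are non-zero.
        fn gcd_naive(a: u64, b: u64) -> u64 {
            (1..=a.min(b)).rev().find(|d| a % d == 0 && b % d == 0).unwrap_or(1)
        }

        // Euclid: gcd(a, b) = gcd(b, a mod b). Far fewer iterations, still readable.
        fn gcd_euclid(mut a: u64, mut b: u64) -> u64 {
            while b != 0 {
                (a, b) = (b, a % b);
            }
            a
        }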

  • @DeathSugar
    @DeathSugar 14 days ago +1

    There are two major notes here: specialized algorithms over well-known data types will always outperform algorithms over generalized types. You can always write assembly which (sometimes) will be more efficient than the compiler's output. But would it be maintainable in the long run? No. That's why clean code.
    You can write pretty clean code using iterator APIs instead (the map, filter, reduce approach); it provides a pretty efficient baseline with auto-vectorization (which means it will use SIMD, if possible). But there are a lot of kinds of tasks which aren't reducible and will not be optimized, so you should at least make them maintainable and fast enough, because generally you will almost always have some room to squeeze out some more juice.

  • @L1n34r
    @L1n34r 13 days ago +2

    I think it's important to remember the "why" behind approaches like clean code. Why do it? Because it reduces the time spent looking for bugs, having new team members understand the codebase, or changing / adding onto existing functionality. That's the why. Why not do it? Well, if you want to squeeze every drop of performance out of a specific critical section of code. So if your product is complex accounting software that needs to work with the least amount of bugs possible while requirements change once a year and older sections of code constantly need to be revised to support new accounting legislation, then maybe clean code is a great idea. If you're working on a video codec library, maybe speed is more important. I would say the vast majority of software written does not need to run as fast as it's possible for it to run, but it does need to be bug-free, easily maintainable, and written in a timely manner.

  • @Markov39
    @Markov39 15 days ago +1

    I think "clean code" is also more testable, which is very important for any product.

  • @TehGettinq
    @TehGettinq 16 days ago +2

    Btw, for library design in Rust: you can still use enum dispatch (the technique you showed using an enum instead of a dyn trait) and allow users of the library to implement it for their own types. You simply need to have a variant that contains a Box of some T that impls the trait (here T would be the type the lib's user implements himself). That way the library user can extend it (it's still slightly less flexible).
    A problem with enum dispatch is that when you have too many types implementing the trait/functionality, it becomes harder to manage.
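    A rough sketch of that escape-hatch pattern (all names are made up for illustration):

        trait Area {
            fn area(&self) -> f32;
        }

        enum Shape {
            Circle { r: f32 },
            Square { side: f32 },
            // Escape hatch: downstream crates can still plug in their own types.
            Custom(Box<dyn Area>),
        }

        impl Shape {
            fn area(&self) -> f32 {
                match self {
                    Shape::Circle { r } => std::f32::consts::PI * r * r,
                    Shape::Square { side } => side * side,
                    // Only this arm pays for dynamic dispatch.
                    Shape::Custom(inner) => inner.area(),
                }
            }
        }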

  • @evlogiy
    @evlogiy 17 days ago +23

    If I encountered these split accumulators in code, I would be very angry because, in some cases, they can make the code slower instead of faster, and readability is significantly compromised. Trying to coax optimizations out of the compiler implicitly never ends well in my experience. If you need to force the compiler to apply SIMD instructions, do it explicitly by using std::intrinsics::simd.
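    If the goal is to make the vectorization explicit rather than coaxed out with accumulator tricks, nightly's portable SIMD (the std::simd module behind the portable_simd feature) is one option. A rough sketch, with the caveat that the API is unstable and may have shifted:

        #![feature(portable_simd)]
        use std::simd::f32x4;

        fn sum(values: &[f32]) -> f32 {
            let mut acc = f32x4::splat(0.0);
            let mut chunks = values.chunks_exact(4);
            for c in &mut chunks {
                acc += f32x4::from_slice(c);
            }
            // Horizontal add of the four lanes, plus the leftover tail.
            acc.to_array().iter().sum::<f32>() + chunks.remainder().iter().sum::<f32>()
        }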

    • @lavafroth
      @lavafroth  17 days ago

      std::intrinsics::simd is arguably better, but not everyone uses nightly.

    • @JerehmiaBoaz
      @JerehmiaBoaz 17 days ago +4

      @@lavafroth No it isn't; avoid hand-coding optimizations (which also negatively impact code readability) if the compiler can achieve the same. You've benchmarked your optimization, so it can only be slower on a different platform, but that's also true for a hand-coded SIMD optimization, so the argument is moot unless you take it to mean that you shouldn't optimize at all and produce clean code instead.

    • @felixjohnson3874
      @felixjohnson3874 16 days ago

      I did a quick search because it seemed odd that the Rust compiler wasn't doing this automatically, and it seems like crates like "slipstream" might be a better middle ground here.

    • @JerehmiaBoaz
      @JerehmiaBoaz 15 days ago +1

      @@felixjohnson3874 Creating 4 partial sums and adding them together could result in precision loss. If the floats range from very small to very large and they're sorted, the 4 partial sums will all have very different magnitudes (the first will contain a very small number while the last one will contain a very large number), and if you add those together you'll lose precision. IOW the compiler is correct here.

    • @felixjohnson3874
      @felixjohnson3874 15 days ago +2

      @@JerehmiaBoaz yes, but we're talking a precision loss that is magnitudes away from being relevant in 99.9% of cases. This optimization should, at most, be a compiler flag, not require alterations of the codebase to nudge it to do what we want. The cases where this precision loss matters are both literal and figurative rounding errors, and in every other case it's an optimization about as 'free' as optimizations can get yet yields massive returns.
      As a general rule, if you need to restructure your code in a functionally meaningless way to nudge the compiler to do what it already can and should be doing, that's a pretty significant problem. The vast majority of use cases will not be written with performance consciousness, so as many free or near-free optimizations as possible should be applied or made standard practice. If every user application ran 10% faster, life would be a hell of a lot better overall, but those applications weren't written intentionally to be fast because they didn't need to be. Getting those 'non-performance-critical' applications to benefit as seamlessly as possible from every available optimization is important, because those are the applications that have the easiest pickings and still yield meaningful QoL improvements for the end user. I mean, that's basically the philosophy behind Rust at its core: "Dedicated people can make fast code; dedicated people can make safe code; how do we let non-dedicated people make fast and safe code?"

  • @chrsmtt
    @chrsmtt 18 days ago +18

    Absolutely incredible! You made a complicated topic very easy to grasp, keep it up!
    Thanks for covering it in such detail.

  • @qu765
    @qu765 17 days ago +37

    Uncle Bob is a strong supporter of the idea that reducing costs for developers is better than reducing costs for servers,
    which is less true now that Google spends more on compute than on developers,
    and it also does not apply at all to anything front end.

    • @b.6603
      @b.6603 17 days ago +15

      Hahahaha it ABSOLUTELY applies to frontend.
      If you think it doesn't, it means you have never worked on a big frontend.
      Of course in the frontend resources are much more constrained. But the way to get performance is by avoiding stuff like unnecessary repainting and blocking the event loop, not by avoiding an extra function call.

    • @SKULDROPR
      @SKULDROPR 16 days ago +3

      I don't do front end, so I wouldn't know. From what I can tell, front end sometimes has to run on very low-power devices, like a cheap TV, a Chromecast or even a fridge, for example. Performance would probably matter in these cases, wouldn't it?

    • @qu765
      @qu765 7 days ago

      @@SKULDROPR yeah, that's what I'm saying

  • @thomashamilton564
    @thomashamilton564 17 days ago +8

    Note that the XMM registers are 128-bit, while most modern CPUs also have access to the YMM registers, which are 256 bits (AVX2); Rust doesn't use the YMM registers by default, for portability, as you mention. Would be interesting to see what speedup you get (if any) from using target-cpu=native when compiling. Also note that, generally speaking, you don't need to unroll loops to get SIMD instructions; it seems kinda random and bad that the compiler didn't do it for you in this case.
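    For anyone who wants to try that, the usual way to opt in is through RUSTFLAGS (or the equivalent rustflags entry in .cargo/config.toml), e.g.:

        RUSTFLAGS="-C target-cpu=native" cargo bench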

    • @1e1001
      @1e1001 17 days ago

      And if you have AVX-512 you also get the zmmN registers, which are 512 bits.

  • @user-tv6sw3vt9q
    @user-tv6sw3vt9q 3 days ago +1

    Clean code is optimization for the human operator and, by extension, human-operated constructs.

  • @foxiewhisper
    @foxiewhisper 17 days ago +2

    Every so often, the YT algo brings up gems like these. Love these early videos from new hungry creators. +1 sub.

  • @xeviusUsagi
    @xeviusUsagi 18 days ago +11

    insane quality and explanation!
    if you keep this up, I can bet this channel will grow quite a lot 💪

  • @JH-pe3ro
    @JH-pe3ro 17 days ago +3

    There's a contrast to be made between "Clean Code" methods and "Thinking Forth". Clean Code is about managing the complexity you are presented with by, in essence, shuffling around the papers so that it looks nice. It exists within the reality of having a huge codebase with a lot of code that is usually executed just once during the program's lifespan to configure something, and therefore it never has a performance problem. Thinking Forth - and most of the ideas of Forth - is about defining down the problem until you don't have a complex problem, so you need to throw less hardware at it and you understand that hardware better.
    So when we are presented with something like x86 SIMD floating-point instructions, it's already "too complex to be good Forth". You would be advised to design a fixed-point solution instead, since you can make equivalent fixed-point numeric code that is more accurate for a given range, and use less silicon.

  • @felix30471
    @felix30471 17 days ago +2

    Eh, I'd say that "debunking" is a bit too strong of a word here.
    Don't get me wrong, I absolutely believe that the performance costs of dynamic dispatch vs static dispatch are something worth talking about and knowing. They should be kept in mind, but they aren't the only factor that decides what is right for a particular situation.

  • @rafaels9790
    @rafaels9790 14 days ago +1

    Kind of a bait and switch, as the title implies a dunk on Clean Code. Other than that, you do fantastic visualizations and easy-to-understand explanations throughout the video. Very good work!

  • @l4zycod3r
    @l4zycod3r 17 days ago +15

    "Premature optimization is the root of all evil" - the cost of dynamic dispatch is basically negligible when compared to heavy computation code. So this is an example of a very specific and unusual case where the actual function is too simple. In general it is better to write working code, profile it, find bottlenecks and optimize, if it is really a concern.

    • @asdfghyter
      @asdfghyter 14 days ago

      There are definitely cases where you can easily accidentally get dynamic dispatch in the tight performance-critical loop of a system in ways that are difficult to fix afterwards. One common example is when using OOP for a game engine. In this case, there is a risk that every single rendered object performs dynamic dispatch several times in every frame, which can have a significant performance impact when you have tons of tiny objects.
      If you have done this, you can't really fix this issue by making small tweaks, since it's built into the core of the design of your entire system. Your only options here are to either just cope with the issue and try to optimize other places or to make an entirely different system.
      This is exactly the issue that Entity Component Systems were made to solve. In addition to reducing dynamic dispatch, they also improve cache locality over OOP based game engines, by placing similar data together and they reduce branch prediction misses by handling one kind of object at a time instead of working with a big pile of mixed objects.
      In the end, you could still use polymorphism with an ECS, but you would move the polymorphism higher up, so it's not used inside a tight loop and so you don't get arrays of polymorphic objects

  • @Andrew90046zero
    @Andrew90046zero 13 days ago +1

    To me, "clean code" is a broader idea about putting more care into how code is written so that you're not staring at some function for 2 hours trying to figure out wtf it does, because the author tried to be smart and do manual optimizations that were only relevant 20+ years ago.
    Compilers now do a lot of the boring optimizations that are important, so we can focus on writing code the way we think about it in our brains.

  • @Hector-bj3ls
    @Hector-bj3ls 17 days ago +22

    Just wanted to mention that Christian Mutti's blog was not the original. I'm sure many people have talked about this before, but specifically the blog mentions a video by Casey Muratori: ruclips.net/video/tD5NrevFtbU/видео.html
    I think you should add it to the list of sources in the video description.

  • @PikeBot
    @PikeBot 13 days ago +1

    I can’t believe someone made a sequel video to Muratori’s garbage fire, a software talk so wrong - and so confident in its complete wrongness - that listening to it made me the angriest I’ve ever been in living memory.

  • @codercommand
    @codercommand 17 days ago

    Great video, but why did you not explain the third version? What's the major difference between an enum wrapping structs and structs that contain a constant/value/enum? I'm curious to know why one is faster than the other.

    • @lavafroth
      @lavafroth  16 days ago

      You're basically using a lookup table, as described. So in the ideal case, the floating-point numbers are laid out packed next to each other in the struct. This makes it easier to load them into registers (look ma! no more vtables).
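      For readers who skipped the video, the table-driven version being described is roughly this (field and constant names are assumptions):

          #[derive(Clone, Copy)]
          enum Kind { Square, Rectangle, Triangle, Circle }

          struct Shape {
              kind: Kind,
              width: f32,
              height: f32,
          }

          // area = coefficient * width * height, with the coefficient picked by kind.
          const COEFF: [f32; 4] = [1.0, 1.0, 0.5, std::f32::consts::PI];

          fn area(s: &Shape) -> f32 {
              COEFF[s.kind as usize] * s.width * s.height
          }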

  • @Dr-Zed
    @Dr-Zed 17 days ago +1

    Incredible video. I loved the visual SIMD explanation!

  • @chasebrower7816
    @chasebrower7816 15 days ago +1

    IMO this is a matter of applying the right tool for the job. The vast majority of code you write in application-level software has zero performance implications, so long as it is written with the least bit of competence. The reason is that the few operations that have non-negligible performance cost are generally off-loaded to libraries that can handle that sort of thing. Clean Code principles are best applied to application development, not core framework/library/embedded/low-level development.
    If you were to look at any of the apps I've worked on recently, and you picked out random functions from my code, you could probably lengthen the execution time of that function by 100x and the user wouldn't even realize. In this case, I strongly prefer every bit of readability.

  • @gustawbobowski1333
    @gustawbobowski1333 14 days ago

    Beautiful motion design. Great vid.

  • @george-broughton
    @george-broughton 1 day ago

    I've known about radare2 for YEARS and this video was the one hurdle to actually getting to grips with it and understanding how to use it lmao

  • @monsieurouxx
    @monsieurouxx 17 days ago +5

    Meh. I'm caricaturing, but this talk is more or less "don't use clean code in the GPU pipeline and in assembly". I feel like you're missing the point of clean code principles.

  • @rules1874
    @rules1874 15 days ago

    why the fuck am I getting recommended Rust, I've never made a program in my life. Algo has blessed you bruh.

  • @SKULDROPR
    @SKULDROPR 16 days ago

    Pretty trippy how the compiler automatically handles changing things to SIMD so effectively. Always pays to have a look at the asm when performance is concerned. I am also mindful of what gets put on the heap too, as illustrated in this video. I often think, "How can I do this while keeping things close to the CPU?", when performance is the primary concern, of course.

  • @alexstone691
    @alexstone691 14 days ago

    Writing fast code from the start should fit under "premature optimization".

  • @asifzamanpls
    @asifzamanpls 13 days ago

    Very interesting. I noticed a similar performance boost when benchmarking loops for some database code a while ago but I had no idea it was due to SIMD extensions. I guess I should dig into the generated assembly more often.

  • @AbhayKumar-gl5hh
    @AbhayKumar-gl5hh 17 days ago

    Can I know the font you used to show the code?

  • @ImaskarDono
    @ImaskarDono 15 days ago

    This example works because the operation is very similar between shapes. If the method implementations were very different, with many instructions, there would be much less difference.

  • @Gell-lo
    @Gell-lo 13 days ago +1

    I'm here because of the thumbnail tbh. But looks cool.

  • @ralfmimoun2826
    @ralfmimoun2826 17 days ago +1

    In "classic" programming languages, the first optimization would be to switch the loops: make "for shape..." the outer loop and "b.iter" the inner loop. I'd be surprised if that would not help here, too. And as long as it gives you the same result, it is a valid optimization.

  • @shauas4224
    @shauas4224 14 days ago

    We are not even gonna talk about the Burst compiler. That is blowing my mind every time I use it.

  • @ddre54
    @ddre54 15 days ago

    Great content and interesting insights. It would be interesting to see the same benchmark analysis in C, C++ or Java.

  • @addmoreice
    @addmoreice 17 days ago +1

    Reg. Is. Ter.
    Not Regist. Er.
    Other than that annoyance, this was really well done.

  • @gotoastal
    @gotoastal 13 days ago

    You can remove the `flake-utils` dependency from this flake by inlining the loop. This will save you an entire dependency and, if you include the lockfile, always nets fewer lines of code.

  • @tonik2558
    @tonik2558 13 days ago

    Adding in the fast_fp crate allows you to keep the clean code while also being faster than the hand-optimized version:
    running 6 tests
    test ma::tests::corner_area ... bench: 7,085 ns/iter (+/- 10)
    test ma::tests::corner_area_ff ... bench: 3,587 ns/iter (+/- 13)
    test ma::tests::corner_area_sum4 ... bench: 3,636 ns/iter (+/- 287)
    test ma::tests::total_area ... bench: 7,085 ns/iter (+/- 6)
    test ma::tests::total_area_ff ... bench: 2,903 ns/iter (+/- 31)
    test ma::tests::total_area_sum4 ... bench: 3,641 ns/iter (+/- 53)

  • @TheAlexgoodlife
    @TheAlexgoodlife 17 days ago +2

    Really clean animations! What do you use to make them?

    • @lavafroth
      @lavafroth  17 days ago +2

      I use Manim (by 3blue1brown).

  • @SteinGauslaaStrindhaug
    @SteinGauslaaStrindhaug 6 days ago

    The fact that you needed to use 4 accumulators to "trick" the compiler into using optimisations seems more like a problem with the compiler than anything. If you had written it in a functional way in a language that either is purely functional, has some kind of syntax to indicate that a particular function is purely functional, or has a compiler that is simply good enough to detect that a function is in fact purely functional, and then written the loop using a standard sum or reduce function, the compiler should know that applying a pure function over a collection is always parallelizable.
    When using a more manual loop construct like a for loop or an iterator, like in this Rust code, the compiler has to make a lot of assumptions about your intent to know if it is parallelizable or not; but when you map a pure function over a collection, it's a clear sign of intent that you don't care about the order of operations, only that it's applied to all of the elements.

  • @ddystopia8091
    @ddystopia8091 17 days ago +2

    With DOD you would have each shape in its own array and work on one array at a time, with no conditionals whatsoever.

  • @dawre3124
    @dawre3124 17 days ago +1

    I have never programmed in Rust. Coming from C, it would be cool to see the difference between default compiler flags, O3, Ofast and O3/Ofast + march=native for the different versions. I don't think default C compilers would change float ops without the Ofast flag, actually.

  • @user-dc9zo7ek5j
    @user-dc9zo7ek5j 12 days ago

    I think I have seen the same video before; this one has animation while the previous one did not. As other commenters said, clean code does not have to be slow. Clean code does not have to be bloated or have 1 function per file or be way too abstract with 5 layers of indirection. Most developers blindly apply what someone told them at the bootcamp, but bootcamps are way too short to show most nuances of development. Here is the blindly followed result: view model, controller, view-model-to-controller DTO, IService, ServiceImpl, controller-DTO-to-service DTO, IRepo, RepoImpl, service-DTO-to-repo DTO. This is not only unreadable, but also hard to follow and extend, and multiple times slower than it needs to be, because devs add mappers and ORMs which make those indirections pointless, and the code is laid out in such a "maybe" way that the compiler cannot do much about it. Trimming? AOT? Inlining? Contrary to popular belief, abstraction and layers in some scenarios improve performance, because they allow developers to work with the bigger picture. For example, making the most optimal function for reading one file, modifying it, and writing to another file will perform worse than one that caches the read part and has buffer abstractions which wait before writing bytes. You might say, well I can optimize that, but when the logic is way too intertwined, it will be hard to see the bigger picture.

  • @asdfghyter
    @asdfghyter 14 days ago

    I was wondering if the specific number 4 made a difference, since there are exactly four shapes and they get added separately to their respective accumulators, but looking at the generated assembly I'm guessing that it wasn't relevant.

  • @shroomer3867
    @shroomer3867 13 days ago +1

    Meanwhile I'm here, happy that my code even runs to begin with...

  • @i-am-linja
    @i-am-linja 17 days ago +5

    I didn't know that guy's name but I instantly recognised his face. He's the guy who advocated for _one_ programming language for _every application._ I will never take one word he ever says seriously.

  • @dexus340
    @dexus340 15 days ago

    I believe SSE and SSE2 are included in the x86-64 definition, so those extensions *should* be present on all 64-bit x86 CPUs.

  • @VanStabHolme
    @VanStabHolme 12 days ago

    I did some tests as well and I found out that:
    - The plain loop is always the slowest, since the compiler doesn't have much context to work with
    - The SIMD-hinted loop is the fastest *if* you're pre-allocating everything correctly
    - Iterators are both idiomatic *and* fast, since they are specifically optimized by the compiler and have a size_hint method to automagically pre-allocate for you
    All of this assuming static enums; dynamic dispatch should be avoided and is poorly optimized by the compiler. Iterators are the slowest (even slower than a plain loop) when it comes to dynamic dispatch.

  • @zorrozalai
    @zorrozalai 17 дней назад

    If you just want to know the total area of circles, you can sum up the r^2 values, and multiply the sum by PI. It should further speed up the code.
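    Something like this (untested sketch; the Circle type is made up for the example):

    struct Circle { radius: f32 }

    fn total_circle_area(circles: &[Circle]) -> f32 {
        let sum_r2: f32 = circles.iter().map(|c| c.radius * c.radius).sum();
        std::f32::consts::PI * sum_r2 // one multiply by PI instead of one per circle
    }

    fn main() {
        let circles = vec![Circle { radius: 2.0 }, Circle { radius: 3.0 }];
        println!("{}", total_circle_area(&circles));
    }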

  • @zell4412
    @zell4412 17 дней назад

    what font do u use here ?? 👀

  • @whitenoisyfurball766
    @whitenoisyfurball766 14 дней назад +1

    *unsolves the expression problem*

  • @afsinbaranbayezit6663
    @afsinbaranbayezit6663 18 дней назад +33

    Nice video bro, but I disagree with the conclusion. What I've realized as I get more and more experience is that a smart but inexperienced engineer usually optimizes code whereas an experienced engineer optimizes deadlines, scalability, reusability, and maybe most importantly their sanity lol.
    The moment your project becomes a little more complicated than a hobby project, not writing "clean code" makes it extremely difficult to develop, maintain, and especially modify. Working on a code base that doesn't focus on clean code feels like re-inventing the wheel for every little bugfix or new feature.
    At the end of the day we need to recognize that code is only a tool for creating a product. So instead of focusing on how we can use our tools in the most optimal way, we need to focus on what the product we are developing requires. If it requires efficiency, we do that. If it doesn't require efficiency, we can still focus on it for fun if we have the time. But in my experience, prematurely prioritizing efficiency on a serious project always ends badly.
    It is extremely difficult for a developer to break through their habit of premature optimization though. It just feels really bad initially lol.
    Regardless, the video was really well-made, especially for a channel of your size. Good luck with the channel 👍

    • @lavafroth
      @lavafroth  18 дней назад +7

      Indeed, I personally think that one should have a very very compelling reason to build an abstraction. I see a lot of folks reach out for abstractions before even getting started. They end up with BuilderObjectFactoryFactory's.

    • @chrishenk4064
      @chrishenk4064 18 дней назад +2

      Summed up my thoughts! I was surprised to be recommended from such a small channel but I see why, keep it up and you'll go places with it.
      By coincidence, I had a really compelling example of this at work this week. We're rewriting some of our software in Rust, and a lot of stink gets made over small items. I think a big part of it is that Rust is more up front about static vs dynamic dispatch than other languages, and the younger people are getting excited to go optimize. Most of the time, these are 1%-type differences.
      Today though, I just finished a 7088% throughput improvement to our web service! How? No joke, almost all of it just came from switching from IIS Hostable web core to hyper.
      What I've really come to enjoy about Rust is that it doesn't make me choose between abstractions and performance. If you're pedantic enough, everything can be static. In most cases, that's overkill. But for certain things, such as library code, it makes a massive difference to do the boilerplate and tedium. Tower is an example: the types are really obnoxious, but it's a big part of why our hyper-based implementation has so little overhead.
      What I would encourage people to consider is: "is the overhead or the actual work larger?". Avoiding Box with chained math, iterators, etc. is a big deal percentage-wise. Loading forms or database queries? Not so much. If you are mostly overhead, and you structure things so inlining can't happen, you get these penalties like in the video. That is worth keeping in mind, and it's why you'll see people really cringe at boxed iterators. For those of us using Rust, we get the best of both. I still get my nice iterator abstraction, and the compiler gets its chance to optimize.
      Also, I agree people are too quick to whip out "design patterns" and your BuilderObjectFactoryFactory's arise. But I think crates like tower demonstrate the power of zero-cost abstractions in functional programming. If you use generics with traits and dependency inversion, it's clean and fast.

    • @Alice-zj2gm
      @Alice-zj2gm 18 дней назад +6

      @@lavafroth Definitely agree on your point about the overuse of abstraction sometimes, but I think that's kind of a separate issue where a desire to have clean/modular code ends up becoming the problem it was trying to solve. There's no silver bullet approach to development either way IMO. Every time you build something, you're making a series of tradeoffs (time/money/maintainability/performance/etc.).
      Two big reasons that would make me consider an abstraction at the start though:
      1. If you're writing code that does things like making network calls or interacting with a database, you might want to use mocks in your unit/integration tests. It's a lot easier to pass in mock db clients, and things of that sort if you're dealing with an abstraction layer like interfaces.
      2. It can make future changes/features/fixes a ton faster as you can often more easily change the implementation for something without needing to go and update the code in all the areas that use it.
      I've inherited a couple codebases from colleagues that tried to optimize the performance of everything all the time with similar tricks where the performance gain didn't end up mattering very much, and those codebases are a nightmare to maintain. Our current team rewrote large sections of them to prioritize maintainability, stability, and scalability first. There was a slight drop in performance, but nobody noticed compared to the rest of the overhead we can't control. Regressions/bugs are way down, and adding new features is so much faster. So far, feedback from our new developers has been positive and onboarding time has also improved dramatically. I think some people right now are just getting really stuck on the idea of more performance always being better, and that's really all we want to avoid.
      I've also worked with codebases where there was an insistence on everything being ultra abstracted/modular, and you end up with those "BuilderObjectFactoryManager" cases with terrible performance or build systems because of the insane amount of code that is effectively duplicated or not necessary. My guess would be that at some point there was a trend to overcorrect in the pursuit of "clean code", and now we're seeing the pendulum swing back towards a mindset of performance optimization first (lots of articles on this topic lately, especially with Rust/Zig). I think we will likely also overcorrect and see an increase in codebases that are similarly hard to maintain and work on because of it in the next 3-5 years.
      It's not a one solution for everything kind of problem. Some people totally go overboard with too much abstraction, and some with performance chasing. I also personally feel that a strong reliance on polymorphism (specifically overuse of inheritance and subclasses) tends to quickly become the enemy of both clean/maintainable code and performance. I almost always prefer structuring packages/code to not rely on inheritance and instead use additional member structs or a more functional approach whenever possible, but that's probably a more subjective opinion.
      The purpose of the code and where it runs also have a massive weight when making decisions on what to prioritize in tradeoffs.
      I'm just sharing my braindump on the pros/cons of both that I've personally experienced. Your video was excellent and I think it shines in prompting people to put a little more time into thinking about the performance of what we write and not overusing certain "clean code" patterns.

    • @oscarfriberg7661
      @oscarfriberg7661 17 дней назад +1

      ⁠@@Alice-zj2gm Your brain dump aligns with my own experiences. Premature optimization is as bad as premature abstraction.
      Start with a simple solution. Avoid the temptation to over engineer it. Most bad code comes from over engineering.

    • @davidmartensson273
      @davidmartensson273 17 дней назад +1

      @@oscarfriberg7661 So true. I almost always prioritize readability over optimized when writing code.
      Once I find areas that really do need to be optimized I will do that but I would never ever start out by trying to optimize something since it makes refactoring it harder and many times once I get to the optimization part I find better solutions to cut computational cost by better understanding exactly what is causing the problem, maybe optimizing the loop is the wrong thing, maybe I could calculate the sum when I build the list as a separate value and skip the whole extra loop altogether.
      That saves even more performance than making the loop faster.
      The idea behind clean code is not to always sacrifice performance but to make sure the code can be read and maintained over time, which for the vast majority of projects is a very, very important feature.
      It does not matter how optimized code is if you have to scrap it the first time you need to change it, when changes are expected to come regularly.
      Sure there are cases where you really need to go ballistic with optimizations, but if you really need to go that route, make sure you really understand all the implications of that optimization.
      I saw another similar "optimization" video doing much the same thing; the problem was, it only ever works for lists whose length is divisible by 4, or by whatever number of separate steps you add.
      With any other number it either skips some values in the end or crashes.
      And if you add checks, suddenly the unrolled loop is slower, so then you need to expand it to handle everything up to the last group of four and then have extra code add in the remaining items, and the code ends up quite a lot more complex and much more prone to mistakes by the next person trying to maintain it.

  • @CalgarGTX
    @CalgarGTX 15 дней назад +1

    I would argue that with the way most (on the business side anyway) dev projects are being run these days, with devs coming on and getting off the project left and right, it's more important to have a codebase that is clean/maintainable/understandable by the common-denominator dev than a super optimized-to-hell-and-back codebase that only the first guy who wrote it can understand, leaving you stuck with dead code once he's gone.
    That's a sad thing to say but I've seen it happen way too many times. The quality of the average dev these days is imo quite low. And I'd rather have a project where whatever dev resource is available can actually work on fixing bugs or extending it to support a new use case/feature, than have them throw their hands up because they don't understand what the hell they are looking at and might break more things than they fix when touching anything.
    On a more philosophical level, I would have thought it was the compiler's job to turn whatever human-written code it's given into performance-optimized machine code?

  • @martandrmc
    @martandrmc 17 дней назад

    The title made me think you were gonna talk about Spectre and how speculative execution will ultimately have to be dropped, but I was surprised that it was about SIMD instead!

  • @TechnologyRules
    @TechnologyRules 14 дней назад

    Thank you so much for this video.

  • @MarkTomczak
    @MarkTomczak 16 дней назад

    This is a very good breakdown of a specific optimization. I would say that in general, the meta-reasoning here is "If you are going to optimize, you have to care about what the compiler actually does." The key idea here is "What code do I have to write to allow the system to take advantage of the SIMD features on my CPU?"
    And more generally, that might not even be the goal. Depending on the workload, it is possible that you want to fix this problem by bundling up all these transformations into GPU workloads. But that's another optimization where knowing how to take heterogeneous data and homogenize it is useful.

  • @TTOO136
    @TTOO136 14 дней назад

    This is really really interesting, wanted to comment to boost engagement :)

  • @t0rg3
    @t0rg3 17 дней назад +2

    Is it possible that you completely missed the point of “make it run, make it clean, make it fast”? Yes, some of these abstractions are costly, but for the most part it doesn’t matter. For the one tight inner loop you will still want to use any optimization strategy in the book.

  • @Bobo-ox7fj
    @Bobo-ox7fj 14 дней назад

    Love the "bugger it, computers are quick nowadays, nobody will care if my tiny utility eats six gigs of ram, has an insane memory leak on top of that and needs a quad core processor to run without lagging" approach to programming... err, I mean the clean code approach.

  • @JG-nm9zk
    @JG-nm9zk 15 дней назад +1

    resistor? register?

  • @liamh1621
    @liamh1621 14 дней назад

    The only thing Uncle Bob optimises for is book sales and speaking events

  • @ferdynandkiepski5026
    @ferdynandkiepski5026 17 дней назад

    At this point most if not all CPUs have AVX2. The proper way is to use something like cfg-if to check the register width. But for code run on a known target CPU, whether it's your own machine or a server, setting an explicit target-cpu flag in Rust would make use of the available instructions. The only case where you should worry about portability is CI/CD release builds; for the rest of the code it's safe to assume target-cpu=native, as cross compiling is rare. And if someone does it, they know what they're doing.
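    A rough sketch of the compile-time check I mean (the Shape type and the constants are made up; cfg(target_feature = ...) is plain std Rust, cfg-if just makes chains of these nicer to write):

    #[derive(Clone, Copy)]
    struct Shape { w: f32, h: f32 }

    fn area(s: &Shape) -> f32 { s.w * s.h }

    // true only when built with e.g. RUSTFLAGS="-C target-cpu=native" on an AVX2
    // machine, or with -C target-feature=+avx2
    #[cfg(target_feature = "avx2")]
    const LANES: usize = 8; // 256-bit registers hold 8 f32s

    #[cfg(not(target_feature = "avx2"))]
    const LANES: usize = 4; // baseline x86-64 only guarantees SSE2: 128 bits, 4 f32s

    fn total_area(shapes: &[Shape]) -> f32 {
        // LANES drives how many independent accumulators we keep
        let mut acc = vec![0.0f32; LANES];
        for chunk in shapes.chunks(LANES) {
            for (a, s) in acc.iter_mut().zip(chunk) {
                *a += area(s);
            }
        }
        acc.iter().sum()
    }

    fn main() {
        let shapes = vec![Shape { w: 2.0, h: 3.0 }; 1003];
        println!("{}", total_area(&shapes));
    }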

  • @Davy-oq9pn
    @Davy-oq9pn 17 дней назад

    What's the code font? it looks great

  • @madks13
    @madks13 14 дней назад

    A bit late since the video just popped up on my feed, but I do have an argument: a 10M+ LoC project I am working on currently.
    All I want to point out is that Clean Code is a tool, and like all tools, you need to use it when appropriate.

  • @IS2511_watcher
    @IS2511_watcher 17 дней назад

    Why use dyn Trait where impl Trait would be sufficient? dyn Trait is basically throwing away all the compile-time coolness of Rust.
    "But Vec doesn't work"? So use the trait system again: do impl Iterator. (The signature is more complicated actually, but that's fine.)
    I'm really confused why a Rust programmer wouldn't try the compile-time approach first. IMO dynamics are basically a last-chance instrument for when you can't be bothered to properly do generics and traits, or when you *require* a uniform structure with dynamics, but even then you're better off using an enum... Traits are still useful even in the enum case: just make the enum implement the trait and make the function still accept impl. Make it as generic as it can be without sacrificing performance.
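    A sketch of what I mean (Shape and Area here are placeholders, not the video's exact code): the enum still implements the trait and the function stays generic, so there's no vtable and the call can be inlined:

    trait Area {
        fn area(&self) -> f32;
    }

    enum Shape {
        Circle { radius: f32 },
        Rect { w: f32, h: f32 },
    }

    impl Area for Shape {
        fn area(&self) -> f32 {
            match self {
                Shape::Circle { radius } => std::f32::consts::PI * radius * radius,
                Shape::Rect { w, h } => w * h,
            }
        }
    }

    // generic + impl Trait instead of Vec<Box<dyn Area>>: no vtable, calls can inline
    fn total_area<'a, T: Area + 'a>(shapes: impl Iterator<Item = &'a T>) -> f32 {
        shapes.map(|s| s.area()).sum()
    }

    fn main() {
        let shapes = vec![Shape::Circle { radius: 1.0 }, Shape::Rect { w: 2.0, h: 3.0 }];
        println!("{}", total_area(shapes.iter()));
    }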

  • @pvtcit9711
    @pvtcit9711 15 дней назад +1

    Get onto a project with many libraries, apps and APIs, built over years by many developers that have long since left, and try to add new functionality, and you'll be wishing the code was clean code. That's the point.

  • @-syn9
    @-syn9 17 дней назад

    There's still more performance to be had here, you currently have to unpack the struct within the loop (array of structs). If you had laid out the vector from the beginning as a struct containing 3 vecs, you could save some cycles unpacking (struct of arrays)
    (haven't tested this, interesting video, wish it went a bit deeper)
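    Roughly the layout I mean (untested sketch; type and field names are made up, not the video's code):

    // struct-of-arrays: each field gets its own contiguous Vec, so the hot loop
    // streams through plain f32 arrays instead of unpacking a struct per element
    struct Shapes {
        coeffs: Vec<f32>, // per-shape constant, e.g. PI for circles, 1.0 for rects
        widths: Vec<f32>,
        heights: Vec<f32>,
    }

    fn total_area(s: &Shapes) -> f32 {
        s.coeffs
            .iter()
            .zip(&s.widths)
            .zip(&s.heights)
            .map(|((c, w), h)| c * w * h)
            .sum()
    }

    fn main() {
        let s = Shapes {
            coeffs: vec![std::f32::consts::PI, 1.0],
            widths: vec![1.0, 2.0],
            heights: vec![1.0, 3.0],
        };
        println!("{}", total_area(&s));
    }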

  • @glitchy_weasel
    @glitchy_weasel 14 дней назад

    What a fantastic video! So my takeaway is that clean code is not the best approach for number crunching - like the core of a simulation solver, a renderer, or similar. Rather, it is better to use clean code in places where functionality can be enhanced by it, rather than hindered. Thoughts? Also, funny thumbnail btw.

  • @ukyoize
    @ukyoize 18 дней назад +7

    Seems like the compiler isn't smart enough

    • @mgord9518
      @mgord9518 18 дней назад +8

      Yeah, compilers should just use magic because people can't be bothered to understand how a computer actually works

    • @YourMom-rg5jk
      @YourMom-rg5jk 18 дней назад

      @@mgord9518 man is enlightened

    • @connork9654
      @connork9654 18 дней назад

      computers aren't smart at all

    • @gregwholesome9202
      @gregwholesome9202 17 дней назад +1

      I guess by that logic you shouldn't use a compiler at all and should just manually compile Rust into machine code by hand and actually learn how computers work.
      Jokes aside, we expect compilers to be smart and do "magic" stuff because that's exactly why they exist in the first place: to translate human-readable code as well as possible into highly optimized machine code.

    • @mgord9518
      @mgord9518 17 дней назад +1

      @@gregwholesome9202 Yes, but there should be reasonable assumptions as far as how advanced the optimizations are.
      If I do something in an inherently slow way, such as allocating memory for thousands of objects and using them individually, I cannot reasonably expect the compiler to understand my intent and have my code converted to an AoS with a single large allocation.

  • @henry-js
    @henry-js 17 дней назад

    What font is that? I like it

  • @oligreenfield1537
    @oligreenfield1537 17 дней назад

    Casey Muratori, the creator of Handmade Hero, already made a video about how bad abstractions are when performance is the main focus. Never confuse the compiler.

  • @anyalei
    @anyalei 16 дней назад

    Super insightful video! I'd argue it's a tradeoff between maintaining/developing software and its performance. If performance optimisation takes utmost priority, clean code, maintainability, readability etc go out the window. It's just not a concern. Likewise, if you're developing software in a large team of not-entirely-illuminated programmers, sticking to abstractions to keep the code from devolving into a tightly coupled mess, and you can just throw more compute at the problem, then performance optimisation isn't super relevant. Truly outstanding software does both, but we all know that's just not what most companies are aiming at. Software is as shitty as it can get away with, and the _astounding_ hardware performance increases just get eaten up. It's tragic, really. A spreadsheet is just about as snappy these days as it was in 1998, because we added layers of VMs that it has to run in.

  • @phitsf5475
    @phitsf5475 15 дней назад

    Who needs optimisation when we have hardware? I will have my spaghetti and I will eat it too

  • @colejohnson2230
    @colejohnson2230 17 дней назад

    I wonder if one of the built in reduction methods like fold would enable the same performance boosts without the added code smell
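    A fold version for comparison (sketch with a made-up Circle type). I suspect it hits the same wall as the plain loop: the compiler won't reassociate f32 additions on its own, so a single running accumulator is still one long dependency chain:

    struct Circle { radius: f32 }

    fn total_area(circles: &[Circle]) -> f32 {
        circles
            .iter()
            .fold(0.0, |acc, c| acc + std::f32::consts::PI * c.radius * c.radius)
    }

    fn main() {
        let circles = vec![Circle { radius: 1.0 }, Circle { radius: 2.0 }];
        println!("{}", total_area(&circles));
    }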

  • @simpson6700
    @simpson6700 14 дней назад +1

    I thought this was related to Team Fortress 2...

  • @Sylfa
    @Sylfa 13 дней назад +3

    More money has been lost due to bugs, and failed projects, than has ever been lost on the code running slowly.
    You only spend a few seconds writing the code, and spend the rest of the project's lifetime maintaining it. Being unable to locate bugs because it's "optimized", when you lack any performance metric that shows it *matters*, is going to hurt, financially, until the end of the project.
    Premature optimisation is the root of all evil; writing "fast code" instead of clean code is premature optimisation in a nutshell.
    You're *not* supposed to optimize the code that runs once every 24 hours and finishes in seconds. You're supposed to optimize the code that runs in a nested loop continuously.
    Starting with clean code gives you the base you need to measure; then you can look at whether there's a high-level optimisation that can be done, such as swapping from bubble sort to quicksort. Only then do you do hacks and other optimisations to speed things up. And document them properly, of course.

  • @Boz1211111
    @Boz1211111 4 дня назад

    Can someone explain? I have no software knowledge but I really wish to know why hardware optimization went down the drain and how it is gonna affect everyone.

  • @bpz10
    @bpz10 17 дней назад

    Great video!