I made it FASTER // Code Review

  • Published: 6 Sep 2024

Comments • 739

  • @TheCherno
    @TheCherno  2 years ago +531

    Thank y'all for watching! ❤️ Do you want to see more optimization/performance stuff? Also don't forget that the first 1,000 people to use this link will get a 1 month free trial of Skillshare: skl.sh/thecherno12211

    • @martiananomaly
      @martiananomaly 2 years ago +2

      Nice

    • @mr.mirror1213
      @mr.mirror1213 2 years ago +18

      I want more optimization stuff so we can implement it in our own work

    • @art0nsec
      @art0nsec 2 years ago +12

      I definitely would like to see more performance related stuff

    • @BlenderhilfeDe
      @BlenderhilfeDe 2 years ago +4

      This topic is so interesting and you explain it very well :) thx!

    • @karelnemec6770
      @karelnemec6770 2 years ago +3

      Hi :), just stop asking and record additional video RIGHT NOW!!! :). Your skills are incredibly insane. Pleeease mooooooore... 😅

  • @vale1286
    @vale1286 2 years ago +4376

    Man, I never thought you would pick my project! Definitely made my day!

    • @vale1286
      @vale1286 2 years ago +714

      Just a couple of comments:
      - I am puzzled by the slowdown in accumulate! I remember testing it against a raw for loop in simpler use cases and it even turned out a few % faster. Granted, it was on gcc, so it could be MSVC-related, but somehow that would be equally surprising :\
      - I don't like the whole 100% object-oriented theme of the project either, but I decided to follow the original book in that regard because I wanted to do everything in a weekend, and changing the whole architecture would have taken way longer :)

    • @TheCherno
      @TheCherno  2 years ago +529

      Glad you saw it! Was about to send you an email asking if you had these performance issues as well, I had a feeling it might be compiler/library-related. Thanks for sending in your code!

    • @vale1286
      @vale1286 2 years ago +139

      @Harry Byrne Well, the goal was to follow the original book and I wanted to be done within the weekend! So this was my attempt to get familiar with the topic in a reasonable amount of time :)
      I'd like to solve the big issues first (design a better scene model, get rid of all the unnecessary hierarchy and so on...) before getting on the GPU, which is a completely unexplored land for me!

    • @vale1286
      @vale1286 2 years ago +65

      @@riidefi1575 An unwanted copy of the shared_ptr would explain a lot (with the refcount in the control block being atomically updated for every ptr in the container...)
      But the lambda takes the arguments as const&, so I think it is up to the library implementer to make sure things are forwarded correctly to the function, right? Maybe this is why I never had problems with the gcc and clang libraries... definitely gonna check this out later on

    • @vale1286
      @vale1286 2 years ago +28

      ​@Harry Byrne Yes, I am! I'll gladly take any help I can get :)

  • @mCoding
    @mCoding 2 years ago +1147

    Fun fact! std::accumulate actually did have a bug in it that was fixed in C++20! (See the #if _HAS_CXX20 clause at 24:59.) std::accumulate was originally meant to accumulate small objects, so it passes things by value. As of C++20 it moves the values, since they could be large. I'd be curious to see the performance improvement just from compiling with C++20! (A rough sketch of the wording change follows this thread.) As a side note, I'd say that this operation, finding the closest object, doesn't really accumulate anything, so although accumulate can do the job, I would argue that a for loop expresses the intent better anyway.

    • @scoreunder
      @scoreunder 2 years ago +34

      As someone who does FP a lot, I feel like it does accumulate: it accumulates the minimum distance for a ray bounce. But maybe I'm just mind-poisoned by HOFs?

    • @mCoding
      @mCoding 2 years ago +58

      @@scoreunder Hahah, I wouldn't say you're _poisoned_ by functional programming, perhaps enlightened? Of course, with higher-order functionals you can build very unexpected/general things out of some basic building blocks. But from a more grounded perspective, it seems unlikely that a highly optimized algorithm could remain optimized for an arbitrary functional as input. Personally, I find that if the lambda you pass in is larger in size than the algorithm itself, then it is highly likely you will find a performance hit waiting for you on the other end. I most often prioritize readability, and I also find that when the intent of something becomes unclear (perhaps a _clever_ lambda passed to an algorithm), the optimizer will also not do very well. Don't get me wrong though, I'd love for everyone to take the opportunity to learn some category theory.

    • @fromgermany271
      @fromgermany271 2 years ago +14

      Why std::accumulate at all if you want to do std::for_each??
      Typical „I don’t understand the new stuff, so I stick to the old“.
      He even thinks „range based loop“ is different from „old style for loop“. If so, he should get a working compiler.

    • @VastyVastyVoid
      @VastyVastyVoid 2 years ago +20

      @@fromgermany271 I may be misunderstanding what you mean, but several modern C++ books (notably Accelerated C++) stress the semantic value of std::accumulate. In other words, std::for_each doesn't tell us the intent of the code; std::accumulate does (it's a summation).

    • @fromgermany271
      @fromgermany271 2 years ago +14

      @@VastyVastyVoid
      I meant: he has a problem with std::accumulate and says „too slow" and „the classic loop is faster".
      I'd say accumulate implies a „sum", which does not really fit with „parallel". But I cannot see why a „sum" should be needed here at all.
      In general, though, saying „I don't understand something in the stdlib, so I just fall back to C-style" is not uncommon, but it doesn't belong in talks.
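
A rough sketch of the C++20 change described above (simplified hand-rolled versions, not the actual library source): with an expensive-to-copy accumulator such as the renderer's std::optional<HitRecord>, the pre-C++20 form copies it on every element, while the C++20 form moves it.

```cpp
#include <utility>

// Pre-C++20 wording (roughly): the accumulator is copied into the call each time.
template <class It, class T, class Op>
T accumulate_cxx17(It first, It last, T acc, Op op) {
    for (; first != last; ++first)
        acc = op(acc, *first);            // copies acc on every iteration
    return acc;
}

// C++20 wording (roughly, per P0616): the accumulator is moved into the call.
template <class It, class T, class Op>
T accumulate_cxx20(It first, It last, T acc, Op op) {
    for (; first != last; ++first)
        acc = op(std::move(acc), *first); // no copy of the accumulator
    return acc;
}
```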

  • @LB767
    @LB767 2 years ago +581

    You can tell this is a physicist's code because they're used to waiting weeks for simulation results... 😂

    • @colonthree
      @colonthree 9 months ago

      Malappropriation of physics, then, as programming is also physics.

    • @TBButtSmoothy
      @TBButtSmoothy 9 months ago +2

      too much philosophy, not enough practicality @@colonthree

    • @mad_circuits
      @mad_circuits 8 months ago

      Ahh, physicists bashing, I'm in. 😂

  • @Revoker1221
    @Revoker1221 2 years ago +894

    I for one would love to see more optimisation stuff. I'm currently in the process of learning how to write high-performance code, so learning how to self-evaluate a program's performance, via either code inspection or using tools to develop metrics, will be huge for me.
    That said, I'm just happy to see a Cherno video in my sub box either way, so thanks for taking the time to create and upload this video.

    • @TimeFadesMemoryLasts
      @TimeFadesMemoryLasts 2 years ago +24

      Btw, a really nice approach that helps is to use a very slow chip that restricts you a lot, like for example an Arduino, where you get hit hard if you code inefficiently.

    • @xeridea
      @xeridea 2 years ago +11

      A big part of optimization is similar to this video, finding slow parts, and cutting out unnecessary overhead. Giving the CPU less work. That and keeping cache sizes and layout in mind, which is easier to consider if using simpler methods with less abstraction. Modern desktop CPUs tend to max out at 32MB L3 cache, but it is better to try to stay within 8MB.

    • @epiicSpooky
      @epiicSpooky 2 years ago +5

      It's mostly what you see here - use a profiler, because often there's something bad but hidden somewhere non-obvious. (It's often not worth even guessing like he did, start with the profiler to avoid biases.) Avoid unneeded copies. Avoid replicating the same expensive work. And in C/Rust, prefer stack over heap.
      It's much much further down the line that it makes sense to start worrying in more detail about cache implementation details.
      But don't forget readability is critical. Your future self will appreciate it.

    • @tadghhenry
      @tadghhenry 2 years ago +1

      @@TimeFadesMemoryLasts Sorry, I'm missing something: do you mean running with the whole IDE or just the compiled code itself?

  • @gracicot42
    @gracicot42 2 years ago +433

    For the slowness of `std::accumulate`, I'm sure it's down to the copy of the `std::shared_ptr` inside the HitRecord, which does atomic operations to keep the refcount up to date. In C++20, it's not copying it anymore. The best would be to not use a shared pointer at all. (See the sketch after this thread.)

    • @DimiterStanev
      @DimiterStanev 2 years ago +14

      That's what I thought too, and I made a little proof myself by cheating and making it a plain pointer for a bit; then std::accumulate was still fine (C++17), and with C++20 and std::move() on it, even better.

    • @yokebabjr3866
      @yokebabjr3866 2 years ago +8

      Yes, and we could also have used std::reduce with the C++17 parallel algorithms to get the code vectorized easily

    • @sephorusFR
      @sephorusFR 2 years ago +51

      This should be pinned. std::accumulate is in no way the problem. shared_ptr is.

    • @Narblo
      @Narblo 2 years ago +2

      Could also be the lambda call, where the replacement is just a loop. The jump may mess with the cache unless the compiler is good enough to inline the lambda.

    • @DimiterStanev
      @DimiterStanev 2 years ago +5

      @@Narblo It's not that, it's really lots of "lock xadd" (atomic inc/dec due to shared_ptr copying). Granted, that was on my machine, with my RAM, my CPUs, etc. - it may be better on others, but maybe even much worse (with, say, NUMA configs). For an in-house game editor we had to disable one of the CPUs to avoid that, as we've also been guilty of overusing shared_ptr a bit - though not like this.
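
A minimal illustration of the shared_ptr cost being discussed in this thread; the HitRecord/Material types below are hypothetical stand-ins for the project's types.

```cpp
#include <memory>

struct Material;  // hypothetical material type, as in the ray tracer

// Hypothetical hit record holding an owning pointer, similar to the book's design.
struct HitRecord {
    std::shared_ptr<Material> mat_ptr;
    double t = 0.0;
};

void copies_are_not_free(const HitRecord& hit) {
    HitRecord copy = hit;   // copying the shared_ptr does an atomic increment
                            // (and later an atomic decrement) of the refcount,
                            // e.g. a `lock xadd` on x86 as mentioned above
    (void)copy;
}

void borrowing_is_free(const HitRecord& hit) {
    const Material* mat = hit.mat_ptr.get(); // non-owning observation: no refcount traffic
    (void)mat;
}
```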

  • @nkronert
    @nkronert 9 months ago +34

    With 7.5 minutes of rendering time, the objective "Raytracing in a weekend" had already been achieved by a large margin 😊

  • @dertobusch5720
    @dertobusch5720 2 years ago +294

    Hello Yan,
    I would love to see you optimize this code. The fact alone that replacing one single function with a for loop can yield such a performance boost shows how good this code is. It is definitely a great starting point for a video about optimization. Given that the accumulate function was written to be used in cases like this, it being so slow isn't something that a coder should have to worry about. It's a language-related problem. You have already started talking about playing to the strengths of the hardware architecture. This performance boost was possible because the coder didn't know what the accumulate function actually does. You might even go as far as to explain the different features of 32-bit and 64-bit processors and assembly language and then go back to explaining how C++ as a language actually utilizes them. That would be highly informative, albeit really time consuming, but this very code example shows how easy it can be to go from a working piece of software to a really satisfying experience for both coders and users if you are aware of the inner workings of your tools. In short: please do it. I love your content, by the way, and have been a subscriber for a long time. All the best to you and your loved ones from Germany.

    • @phantom_stnd
      @phantom_stnd 2 years ago +26

      It's not the function, but the copy of the shared pointer that causes the slowdown

    • @RucadiDev
      @RucadiDev 2 years ago +3

      @@phantom_stnd Exactly this.

    • @adamp9553
      @adamp9553 2 years ago +4

      Only the real bottlenecks need to be optimized. I think Yan tackled that well enough via the profiler. And hand-written assembly is diminishing returns these days compared to intrinsics and modern compilers that automatically find intrinsic equivalents.

  • @simonbaxter8001
    @simonbaxter8001 2 years ago +249

    These aren't just code reviews, but absolute masterclasses on why one approach wins over another. I've learnt more in these 38 minutes than I've learnt in the last 5 years!

    • @yohannes2kifle
      @yohannes2kifle 2 years ago +40

      What have you been doing the last 5 years? Did you have internet?

    • @simonbaxter8001
      @simonbaxter8001 2 years ago +18

      @@yohannes2kifle Too busy churning out products and applications! 😉

    • @svnhddbst8968
      @svnhddbst8968 2 years ago +5

      Also learning the methodology of diagnosing inefficiencies: the way he does it here for a C++ program still applies to all languages - "break it down into verbose and deliberate steps to see which step is causing the problem".

  • @cinad.nayeri6833
    @cinad.nayeri6833 2 years ago +70

    Please do an optimization video on this code! It would be very entertaining and educational for everybody! I personally would love to see that video!!!

  • @wedusk
    @wedusk 2 years ago +104

    In my experience, when changing a single part of the code can speed things up so much, the problem is almost always some kind of superfluous copy somewhere. Also, an additional optimization video would be really nice!

    • @skilz8098
      @skilz8098 2 years ago +2

      Less unnecessary copying and more use of move semantics or references to objects definitely helps improve performance and efficiency, making the code pre-optimized from the start.

  • @capsbr2100
    @capsbr2100 2 years ago +68

    This is exactly the content I wanted to see. Old-school coder myself, I struggle a bit with learning modern C++ stuff in my free time, so it's great to see that raw, quick comparison. The explanation and clarity from Cherno is hard to match in any other material you might find on the web. Loving it. ❤

  • @GreenClover0
    @GreenClover0 2 years ago +21

    The speed at which you analyzed the whole project is impressive! A raytracer is one of the projects that I want to do in the future. At the same time, I really love optimization videos, so I cannot wait to see the continuation :^)

  •  2 years ago +4

    I would love to see you go through and refactor/optimize the code (get rid of the std usage where it's superfluous, etc). Someone also mentioned C#, and using that to achieve similar results would be really cool. Span<T> was added not too long ago to help with stack allocations without having to use unsafe and stackalloc.

  • @ZeroUm_
    @ZeroUm_ 2 years ago +12

    I think it would be an amazing series: take this one codebase, with one episode going deeper on why virtuals were bad in this case and de-OOPing it, a longer one bringing misc stuff closer to the metal, and a final one doing it on a GPU.

  • @mad_circuits
    @mad_circuits 8 months ago +2

    10:02 It is so funny to see your cam video freezing while rendering that slow raytracing image. 😂 It's like being a party host kicked out of his own house.

  • @MrFedX
    @MrFedX 2 years ago +16

    This was a great video! As a person who just rewrote a pure C++ project in a more C-style C++ I can relate. I was taught that the explicitness of C is inherently bad and it was an eyeopener to realize this is not the case.

    • @danjohnson6800
      @danjohnson6800 9 months ago

      Very true! C is blazingly fast, C++ is well organized. There is a performance tradeoff in the organizing, forcing memory swaps etc., so I would always do a back-of-the-envelope performance calculation to be sure I wasn't wasting CPU. I started in C in '86 when we didn't have CPU to spare! So performance has always been in mind, and the success of many projects came down to that as they grew in size and complexity.

  • @beardyman
    @beardyman 2 years ago +51

    Please do more of this!!

  • @henne9707
    @henne9707 9 months ago +3

    "I could have *written* a ray tracer in that amount of time..." GOLDEN! :)

  • @BoyBaykiller
    @BoyBaykiller 2 years ago +77

    Hey Cherno. Question:
    I know this channel is mainly focused on C++, but I was wondering if you are also accepting code that is written in, for example, C# and uses some wrapper for OpenGL?
    Edit: I finished the video, and yes, watching you make this raytracer faster would be cool.

    • @TheCherno
      @TheCherno  2 years ago +59

      Of course - I've got quite a bit of C# experience as well, and anything graphics related is always good!

    • @YoTengoUnLCD
      @YoTengoUnLCD 2 years ago +14

      @@TheCherno hey man, how about rust?

    • @iXNomad
      @iXNomad 2 years ago +7

      @@YoTengoUnLCD Rust is dead, it shall never surpass C++. It's just overhyped.

    • @YoTengoUnLCD
      @YoTengoUnLCD 2 years ago +6

      @@iXNomad who asked?

    • @iXNomad
      @iXNomad 2 years ago +8

      @@YoTengoUnLCD it's a joke

  • @skilz8098
    @skilz8098 2 years ago +6

    No, the optimization part is very interesting, helpful and useful. I think continuing down this road is a great course of action to help others improve their own code bases / frameworks. It should give them insight into how to analyze and profile their own code, what to look for, and how to optimize or improve its performance. Great video!

  • @pcarey5001
    @pcarey5001 2 years ago +2

    Can you do a reaction to the new Matrix Awakens PS5 experience? I'm curious what your take is on the engine demo from a programmer's point of view.

  • @RachelMant
    @RachelMant 2 years ago

    With the std::unique_ptr wrapping a buffer like that, the point of it, and what you're getting here, is exception safety and a guarantee of no use-after-free or forgotten delete call during object destruction: two things that, given the multi-threaded nature of this project, are quite important to get right and impossible to do with raw pointers.

  • @gracicot42
    @gracicot42 2 years ago +1

    I love a unique pointer wrapping an array, but only if it's necessary. Normally I would use a vector, except when I'm implementing something much lower level.
    In my codebase you'll see some `std::unique_ptr` used to implement a vector of a runtime-defined structure, mainly to contain all the uniforms for a list of objects to render, or to initialize a structure of buffer objects for OpenGL.

  • @jameswhitehead74
    @jameswhitehead74 2 years ago +10

    I'd love to see both of your own fully optimised version of this implementation as well as a GPU based "ray tracing in a weekend". I think they would both be fascinating.

  • @elirannissani914
    @elirannissani914 2 years ago +3

    33:52 Yesssss you king

  • @MrAlbinopapa
    @MrAlbinopapa 2 years ago

    Since you asked for opinions about the unique_ptr array, here's how I handle it.
    If I need an array where I only push/pop from the back, I use std::vector.
    If I need a buffer, I make a buffer class and still use std::unique_ptr as the data holder. This lets me focus on adding functionality without having to write all the extra code like deleting the copy ctor and copy assignment operator, nor do I have to write the move ctor or move assignment operator, nor a destructor. With std::vector, you get copy and move for free, but then you have to write the two lines to delete the copy ctor and copy assignment operator. With a naked pointer, you have to write all of those yourself, including the destructor. So not using std::unique_ptr for a heap-allocated array is more work, even if utility wrappers like this aren't the bulk of your program.
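
A minimal sketch of the kind of buffer wrapper described above, built on std::unique_ptr<T[]>; the Buffer name and interface are hypothetical, not from the reviewed code.

```cpp
#include <cstddef>
#include <memory>

// Minimal fixed-size buffer built on std::unique_ptr<T[]>.
// Copying is implicitly disabled (unique_ptr is move-only), moves and the
// destructor come for free; only the size needs to be carried alongside.
template <typename T>
class Buffer {
public:
    explicit Buffer(std::size_t size)
        : m_data(std::make_unique<T[]>(size)), m_size(size) {}

    T*       data()       { return m_data.get(); }
    const T* data() const { return m_data.get(); }
    std::size_t size() const { return m_size; }

    T&       operator[](std::size_t i)       { return m_data[i]; }
    const T& operator[](std::size_t i) const { return m_data[i]; }

private:
    std::unique_ptr<T[]> m_data;
    std::size_t m_size;
};

// Usage: an image buffer of RGBA bytes.
// Buffer<uint8_t> pixels(width * height * 4);
```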

  • @siddharthasarmah9266
    @siddharthasarmah9266 2 years ago +5

    8:51 a pointer to an array looks like a double pointer to me...

  • @DarkZeros
    @DarkZeros 2 years ago

    The problem is not std::accumulate.
    It is the fact that iterating an accumulate over std::optional makes a "copy" of it on every pass of the accumulate.
    Accumulate should be used with trivially copyable/movable elements, or while ensuring the elements handle the move properly.

  • @sirenti9384
    @sirenti9384 2 years ago +4

    I want a Cherno who does such videos with the Rust language 🦀. But I still love it.

  • @ivolol
    @ivolol 2 months ago

    A possible suggestion for the video format, in case you want a more complete exploration without it being a 2-hour video:
    Record your thoughts at the beginning, middle, and end of the process.
    Beginning - overall impressions of what the weeds will look like, guesses at what needs to change, etc.
    Middle - what the weeds *actually* look like, things that have and haven't surprised you so far. Examples of transforms done. Also kind of fun to see whether you can guess where the 'middle' will be and whether you're accurate about that by the end.
    End - obviously get to discussing the final results and overall learnings

  • @AhmedSaeed97
    @AhmedSaeed97 2 years ago +5

    Thoughts on the new UE5 Matrix demo?

  • @ericmueller8505
    @ericmueller8505 2 years ago

    He mentioned not liking the usage of unique_ptr for a buffer, and preferring vector instead.
    vector gives you a size, which is nice and a bit less implicit, and resize() is a bit more terse. Though there is an overload of make_unique for arrays that takes an array size (e.g. make_unique<T[]>(1024) or whatnot). It didn't look like that was used in the given code, but it's fairly terse too.
    One big difference in some contexts is that unique_ptr is a much more lightweight template than vector. I had some heavily templated code with a fixed but dynamically sized array where switching from vector to unique_ptr cut the compile time in half. This code didn't look to be heavily templated, so it likely doesn't matter, but it's useful to know.

  • @anthonysteinerv
    @anthonysteinerv 2 years ago +2

    It's almost 2022, and we're still waiting for the build system with git configuration video. Sadge.

  • @user-cz1ex7kf2g
    @user-cz1ex7kf2g 2 years ago +3

    Bakko: It's multithreaded 😎
    Cherno: 👁👄👁

  • @weeb3277
    @weeb3277 2 years ago +2

    Keep going. Let's see how far you can take it!

  • @giuseppecapasso8558
    @giuseppecapasso8558 2 years ago +4

    glad to hear other people from Italy!

  • @vigneshungrapalli2267
    @vigneshungrapalli2267 2 years ago +1

    In most cases, I would avoid a std::unique_ptr<any_type[]> instance and instead use a std::vector<any_type>. The only case where I would prefer unique_ptr is if the `any_type` is `bool`: for most implementations std::vector<bool> is space-optimized, which creates issues when we need a raw pointer into it to pass on.

    • @zrneely
      @zrneely 2 years ago

      One downside to vector is that it allows easy resizing on the fly, which may not be something you want. If you need to keep a long-lived pointer to something inside the buffer, using a vector may not be ideal because if someone pushes to it, it may result in the internal data storage being copied to a new location and then freed, leaving you with a subtle use-after-free bug. That's still possible if you use a unique_ptr, but it will be much more obvious because you have to explicitly replace the entire buffer, as opposed to just calling push_back or emplace_back.
      I don't think there's anything wrong with using unique_ptr for a situation like this; it's basically a safer version of a C-style pointer-to-buffer. A vector serves a different purpose, as a dynamically-sized collection of items. If you don't need the resizing capabilities, why use the more complex type?
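
A small sketch of the reallocation hazard described in the reply above (illustrative only):

```cpp
#include <cstdint>
#include <vector>

int main() {
    std::vector<std::uint8_t> pixels(1024);
    std::uint8_t* row = pixels.data();   // long-lived pointer into the buffer
    row[0] = 255;                        // fine: the buffer hasn't moved yet

    pixels.push_back(0);                 // may reallocate: `row` is now dangling,
                                         // and using it would be a subtle use-after-free

    // A fixed-size buffer (unique_ptr<T[]>, or a vector that is sized once and
    // never grown) keeps such pointers valid for the buffer's whole lifetime.
    return 0;
}
```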

  • @OMGclueless
    @OMGclueless 2 years ago +1

    I use unique_ptr<T[]> quite a lot. It's a nice pattern for when you want a non-resizable vector with the size specified at runtime. The API is somewhat worse than a vector's, but you can certainly wrap it in a nicer type (like an owning version of the buffer helper you describe in this video).

  • @372leonard
    @372leonard 2 years ago +11

    About the accumulate, I don't know what's going on either, but it seems like lambda functions just absolutely destroy performance when called very often. There must be some overhead in starting up, finding or setting up lambda functions. Would be interesting to know what happens under the hood with lambda functions. Maybe they can't be inlined by the compiler?

    • @vale1286
      @vale1286 2 years ago +4

      Yeah, at this point that would be my guess as well, and this might be MSVC-related. I never saw anything like this on gcc and clang... Visual Studio also ships with clang IIRC, so it should be easy for me to see if that's the case

    • @user-sl6gn1ss8p
      @user-sl6gn1ss8p 2 years ago +1

      @@vale1286 let us know if you do : )

    • @zynius
      @zynius 2 years ago

      They are inlined. The issue is the optional in conjunction with accumulate. A bug fixed when compiling in C++20 mode.

  • @NPSJoker
    @NPSJoker 2 years ago +1

    You should do a video on how to navigate VS, like shortcuts and what not. Your ability to navigate VS like it's vim is impressive

  • @ArkDEngal
    @ArkDEngal 2 years ago +2

    I love watching optimization. It helps my mind negotiate code decisions so I do optimizations by default

  • @remmoze
    @remmoze 2 years ago +11

    "raytracing in a weekend" is actually how long it takes to render the full image on an average computer

    • @Exotic69420
      @Exotic69420 10 months ago

      😂😂😂😂😂🎉

  • @bruderdasisteinschwerermangel
    @bruderdasisteinschwerermangel 11 months ago

    I think wrapping a uint8_t[] inside a unique_ptr is really good because unique_ptr will guarantee that the memory is freed. However, in this use case there should be no downside to just using std::vector. Unless you resize the vector it will behave the same way, and it has the benefit that you don't need to store the size elsewhere.
    The only use case where you can't use vector for something like that is if your element type is not copy-constructible (IIRC the template instantiation of vector would fail) or if you want a vector of bools, because y'know, vector<bool> is some cursed template specialization that should never have been brought into the standard.
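
A small illustration of why std::vector<bool> is awkward as a raw buffer, compared to a vector of plain bytes (illustrative only):

```cpp
#include <cstdint>
#include <vector>

int main() {
    std::vector<std::uint8_t> bytes(8);
    std::uint8_t* p = bytes.data();   // fine: contiguous storage, usable as a raw buffer

    std::vector<bool> bits(8);
    // bool* q = &bits[0];            // won't work: the bool specialization is bit-packed,
                                      // so there is no contiguous bool storage to point at
    auto ref = bits[0];               // a proxy reference object, not a bool&
    ref = true;

    (void)p;
    return 0;
}
```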

  • @Ninkjeboi
    @Ninkjeboi 2 years ago +1

    Congratulations on 420k subscribers. Celebrate it with a big fat blunt.

  • @AlexDanut
    @AlexDanut 2 years ago +1

    The optimization episodes are fun lol more please

  • @nanman_chief
    @nanman_chief 2 years ago +3

    21:38 24:54
    Hi Cherno!
    I would always use a loop in this case, rather than overengineering it with std algorithms. But I would say that std::accumulate is fine for C++ compilers, don't be afraid of it. When it's compiled in debug mode it will not be optimized, but it works well under the release setting. With the release setting / -O2 flag, it does get inlined and unrolled into a loop.

    • @llothar68
      @llothar68 1 year ago

      But developers should always use debug mode for everything. Don’t get into this mentality.

  • @rouslanngouyamsa9055
    @rouslanngouyamsa9055 2 years ago +2

    It will be good to see you optimize that further.

  • @rrutter81
    @rrutter81 1 month ago

    As a dev of 20 years, it's always fun to watch optimizations. It's very satisfying to watch

  • @Gramphimcs419
    @Gramphimcs419 2 years ago +3

    Incredibly interesting! If you do end up making a follow up video, then I personally would like to see your take on CPU cache coherency.

  • @nullptr6249
    @nullptr6249 2 years ago +4

    I'd love to see you make a follow-up video

  • @urugulu1656
    @urugulu1656 2 years ago +1

    24:00 best measure for code quality during review: wtfs per second...

    • @vale1286
      @vale1286 2 years ago +1

      I feel like I could have done a better job of maximizing that metric.
      Granted, all my wtfs are in a single function call so my wtf density is through the roof.

  • @beewyka819
    @beewyka819 2 years ago +4

    I THINK the problem with using accumulate here is it returns by value instead of by reference, so that's a LOT of copies being made

  • @Arzen84
    @Arzen84 9 months ago

    I would love to see you further improving this code!

  • @hstrinzel
    @hstrinzel 2 years ago +3

    DEFINITELY A LIKE! Big! THANK YOU! The point of OPTIMIZING vs OOP reminds me of John Carmack's wild "inverse square root" trick in Quake III. Do the math, not the objects...

  • @tgha4209
    @tgha4209 2 years ago +1

    Inspirational! I'd love to see your optimization process

  • @marilynman
    @marilynman 2 years ago +3

    You know things are slow when the camera stutters XD.

  • @paherbst524
    @paherbst524 2 years ago +2

    I'd be interested in the optimization part. It would be interesting to dig in to see what exactly was slowing it down before.

  • @sajon_
    @sajon_ 2 years ago +2

    Go faster!! See how much improvement you can make! Would love to see that! And I would also have loved to know the render time using the improved code with the sample rate and bounce count set to the original values, 500 and 50 I think...

  • @yaboinachin7198
    @yaboinachin7198 2 years ago +1

    The frozen cherno just makes this video so much funnier than it should be

  • @thewhitefalcon8539
    @thewhitefalcon8539 2 years ago +1

    Time can be attributed to the wrong line of code when profiling an optimized build. If the profiler can't figure out which specific line of code is spending time because the optimization mixed up all the lines, then it could just point to the overall function.

  • @noiamhippyman
    @noiamhippyman 2 years ago +2

    Watching you read through somebody else's code like it's a kindergarten book is so crazy to me. I forget how to read my own projects after a month of inactivity.

  • @maxofcourse
    @maxofcourse 2 years ago

    I don’t know what you’re saying, but you say it so well that I feel like I have to keep watching.

  • @likskirtspleetscreen
    @likskirtspleetscreen 2 years ago

    "...certainly he wasn't lying about the multithreading"... you cracked me up

  • @kuhluhOG
    @kuhluhOG 2 years ago

    25:20 I don't know how expensive it is to copy a sphere, but I would currently guess that is the culprit.
    Yes, when accumulate calls the lambda, it copies the sphere.
    As others said, this changed a bit in C++20.

  • @kagonkhan2586
    @kagonkhan2586 2 years ago +3

    I would love a series revolving around optimization of generic code (not just Hazel)!

  • @AshenCrowler
    @AshenCrowler 11 months ago

    24:40 is me whenever I read someone else's code these days. I'm so glad I'm not the only one!

  • @siddhartamorionsuarez9017
    @siddhartamorionsuarez9017 10 months ago +1

    14:16 Never seen a programmer dropping bars so hard

  • @4ndreas_hadj396
    @4ndreas_hadj396 2 years ago +3

    That was a very interesting code review to watch! Man, your understanding of this topic is impressive, keep up the quality work

  • @Anglave
    @Anglave 2 years ago +2

    Taking the step to running on the GPU would be very interesting.
    I'd definitely watch a video (or series) of you coding up a more optimized implementation.

  • @daniridao
    @daniridao 2 years ago

    Started following you recently because of a friend's recommendation. I'd love to see more optimization videos. Loving your channel!!

  • @Klorel123
    @Klorel123 2 years ago

    I only ever read one book about C++, but never used what I learned in actual software.
    I am so impressed that he is able to grasp what's going on so quickly. Reading someone else's code is so hard for me.

  • @thomaskruppi4434
    @thomaskruppi4434 2 years ago +1

    Thanks, Cherno! As always, great video.
    Definitely interested to see you further optimize the code and squeeze out better performance.
    Also, I would love to see you port this to a compute shader.

  • @jackle3002
    @jackle3002 2 years ago

    This has been an excellent bloody episode, Cherno! Please continue with this guy's project! So, so, so informative and practical!

  • @asandax6
    @asandax6 2 years ago

    These videos are amazing. Thanks to watching them I'm learning a bit about how to optimise my code, and I don't even use C++.

  • @rikschaaf
    @rikschaaf 2 years ago +2

    Mixing aggregation with such a deep level of recursion might cause this kind of overhead, and for loops are way cheaper it seems, at the very least in C++.

  • @krown9777
    @krown9777 2 years ago

    yess bro iron out this project in another video. love hearing you optimize stuff and seeing this sort of thing from you!! awesome vid man

  • @yubinguo8284
    @yubinguo8284 2 years ago +1

    I would love it if you decide to make an additional optimization video. It is really important to optimize the code when doing a project oneself.

  • @MansonMamaril
    @MansonMamaril 2 years ago +1

    "I think I could've WRITTEN a raytracer in that time..."

  • @Thalhammer1996
    @Thalhammer1996 2 years ago +1

    In case someone wonders what's going on: std::accumulate is the wrong function in this case because it assigns the returned value to temp_value on every iteration. So you are copying all the data in HitRecord on every iteration (unlike in the explicit loop). I am not sure why, but it looks like default assignment operators/constructors don't show up in the profiler. std::optional is really just a bool + sizeof(T) and constructs T in place. Its performance should be identical to (or better than) using an "invalid state" value.

    • @vale1286
      @vale1286 2 years ago

      You are the second one to immediately spot the problem with the accumulate. Really my bad for not even thinking about it, but that should really have been a raw loop.
      Also, you're spot on about optional. I don't think it can introduce much overhead, but Yan is right in noting that this leads to more branches. But maybe that's because of the design rather than optional itself

    • @Thalhammer1996
      @Thalhammer1996 2 years ago

      @@vale1286 He is right that it adds extra branches, but I disagree that it's a problem. The optional is already in cache and nearby pixels have identical results, making the if essentially a no-op because the branch predictor is going to hit in most cases. There are a number of things I don't like about the code (e.g. a chunk size of 1 pixel is way too small, since you'll get bottlenecked by the queue, and most of the shared_ptrs could be unique_ptrs or variants), but in the case of the optional usage I really think the improvement in code clarity outweighs the (very minor) performance impact.

    • @vale1286
      @vale1286 2 years ago

      @@Thalhammer1996 that was the main reason I went for optional in the first place, I made a vow to never use function arguments as outputs :)
      For the rest I agree 100%, as I wrote elsewhere I didn't rework the architecture from the book because I wanted to keep this within a weekend.
      Now that I had more time to think I have a few ideas on how to redesign the scene (and variants are one of the things I want to try) so maybe I'll work on it this weekend

    • @Thalhammer1996
      @Thalhammer1996 2 years ago

      @@vale1286 You'll probably wanna get rid of the vast majority of virtual functions. When going after performance you want to give the compiler as much explicit information as possible. Since e.g. the number of materials is going to be limited (most materials are just a variation of parameters of a base material), there's no point in using a virtual function there. Instead you want a variant of all material types and std::visit (see the sketch after this thread). Virtual functions are bad for a number of reasons, the most important being cache misses and preventing inlining.

    • @vale1286
      @vale1286 2 years ago

      @@Thalhammer1996 yeah that's exactly why I was thinking about variants. I used visit once in another project and I liked it
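
A minimal sketch of the variant + std::visit approach suggested above; the material types and the scatter_weight function are hypothetical placeholders, not the project's actual interface.

```cpp
#include <type_traits>
#include <variant>

// Hypothetical material types standing in for the book's virtual Material hierarchy.
struct Lambertian { /* albedo ... */ };
struct Metal      { /* albedo, fuzz ... */ };
struct Dielectric { /* index of refraction ... */ };

using Material = std::variant<Lambertian, Metal, Dielectric>;

// Closed-set dispatch: no vtable indirection, and the compiler can inline each case.
// scatter_weight is a placeholder for whatever per-material work the renderer does.
double scatter_weight(const Material& m) {
    return std::visit([](const auto& mat) -> double {
        using T = std::decay_t<decltype(mat)>;
        if constexpr (std::is_same_v<T, Lambertian>)  return 0.5;
        else if constexpr (std::is_same_v<T, Metal>)  return 0.9;
        else /* Dielectric */                         return 1.0;
    }, m);
}
```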

  • @neo-mashiro
    @neo-mashiro 2 years ago

    Thank you for delivering such a masterpiece of a video, I love it so much and learned a lot! I am just as motivated as you: planning to write my own raytracer using a compute shader next month!

  • @Dave_thenerd
    @Dave_thenerd 2 years ago +3

    Reasons std::accumulate is slow here:
    - Use of std::shared_ptr (gets copied before C++20)
    - Use of std::optional (requires branching and often uses heavily padded memory, with types up to double their size)
    - Lambdas are not always inlined
    - Likely no vectorization because of the function call to the lambda
    I would also try manual vectorization (with raw arrays or std::vector) out of curiosity. Probably could render at 40+ frames/second. (A sketch of a vectorization-friendly data layout follows this thread.)

    • @XiaoGou469
      @XiaoGou469 2 years ago

      also, use AVX vectorization.
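
A sketch of the kind of structure-of-arrays layout that makes this sort of vectorization straightforward, whether done by the compiler or by hand with intrinsics; the Spheres layout and the kernel below are illustrative, not the project's code.

```cpp
#include <cstddef>
#include <vector>

// Structure-of-arrays sphere data: contiguous floats, no pointers, no virtuals.
// Laid out like this, a simple loop over all spheres is easy for the compiler
// to auto-vectorize (and easy to vectorize manually with AVX if needed).
struct Spheres {
    std::vector<float> cx, cy, cz, radius;
};

// Hypothetical kernel: squared distance from a point to every sphere center.
void center_dist_sq(const Spheres& s, float px, float py, float pz,
                    std::vector<float>& out) {
    const std::size_t n = s.cx.size();
    out.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float dx = s.cx[i] - px;
        const float dy = s.cy[i] - py;
        const float dz = s.cz[i] - pz;
        out[i] = dx * dx + dy * dy + dz * dz;
    }
}
```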

  • @bluereino
    @bluereino 2 years ago

    I understand none of the code, but I am intrigued by how well you are able to read through the code and fix things.

    • @ClayWheeler
      @ClayWheeler 2 years ago

      This is exactly how video game developers feel every time anyone reports a bug in the game.

  • @matspatpc
    @matspatpc 2 years ago +3

    It would be interesting to see the assembly code from the profiling. I suspect there's something "interesting" happening when calling the lambda function (as someone who writes compiler code and looks at generated code a lot, I suspect this is what happens!)

  • @shawn576
    @shawn576 2 years ago

    Very interesting video. I would like to see more videos where code is checked for issues related to speed, crashing, freezing, etc. A lot of us are just hacking stuff together with no real understanding of why problems occur, such as a window freezing for 10 seconds while performing an operation.

  • @SaySaeqo
    @SaySaeqo 2 months ago +1

    I would love to see how you are optimizing it to its fullest potential

  • @sugar1930
    @sugar1930 2 years ago

    OH my gosh, I remember watching you and learning some programming from you when I was still in primary school. You made me pursue programming, and suddenly seeing one of your videos in my feed brought back a ton of memories of me trying to figure out Java as a tiny kid. Now I'm studying cybersecurity and software development, working my way through!

  • @JannisAdmek
    @JannisAdmek 2 years ago

    I would love that additional video, I love optimization/profiling videos!

  • @MicahJohn21659
    @MicahJohn21659 2 years ago

    22:10
    CMake projects generate with a RelWithDebInfo configuration by default, hope that saves you this headache in the future.

  • @Christoph603
    @Christoph603 2 years ago +1

    Optimizations would be great to see, there is so much you can learn from that!

  • @TroySchrapel
    @TroySchrapel 2 years ago +15

    I think the result has changed. The old code was summing (accumulating) the HitRecord. You're just returning the last one.

    • @TheCherno
      @TheCherno  2 years ago +15

      Summing HitRecords doesn't really make sense - they're structs with no real way to "sum" them. The std::accumulate in the original implementation was simply choosing between two HitRecords and either keeping the old one or replacing the result, depending on the ray hit distance.
      Additionally I ran an image diff after recording this video (just to make sure) - aside from noise and other random scattering built into the renderer, the results are identical.

    • @TroySchrapel
      @TroySchrapel 2 years ago +6

      @@TheCherno Thought it was odd it didn't trigger a compile error. Turns out std::accumulate relies on the lambda to do the accumulating by passing in the accumulated value (temp_value). That value gets passed in, you're meant to add to it and return the accumulated value for each iteration. Obviously, "our" lambda isn't doing that. As you were. :)

    • @TroySchrapel
      @TroySchrapel 2 years ago +7

      @@TheCherno Have done some testing. Using the lambda in a standard for loop gets the same slow result - so it's not std::accumulate. However, changing the lambda to return an empty std::optional on no hits fixes it. Change it back to returning temp_value... slow. Really strange.

    • @szaszm_
      @szaszm_ 2 years ago +1

      @@TroySchrapel Maybe we're losing time by copying HitRecord all the time? The lambda takes the accumulated value by const ref, then returns it by value, so for every iteration, there is an unnecessary copy of a non-empty std::optional.

    • @TroySchrapel
      @TroySchrapel 2 years ago +3

      @@szaszm_ Yeah, it is. Specifically the shared_ptr mat_ptr. Replacing that with a regular pointer also fixes the issue. Also, replacing all instances of std::optional with std::shared_ptr fixes the issue.
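
For reference, the shape of the raw closest-hit loop that replaces the accumulate, sketched here with simplified stand-in types (a Hittable returning std::optional<HitRecord>, as in vale's version); the running best is overwritten in place instead of being passed through a by-value accumulator.

```cpp
#include <memory>
#include <optional>
#include <vector>

struct Ray {};                         // hypothetical stand-ins for the project's types
struct HitRecord { double t = 0.0; };

struct Hittable {
    virtual ~Hittable() = default;
    virtual std::optional<HitRecord> hit(const Ray& r, double t_min, double t_max) const = 0;
};

// Keep only the closest hit; the current best only shrinks the search window,
// and the optional is moved rather than copied through an accumulator.
std::optional<HitRecord> closest_hit(const std::vector<std::unique_ptr<Hittable>>& world,
                                     const Ray& r, double t_min, double t_max) {
    std::optional<HitRecord> closest;
    double closest_t = t_max;
    for (const auto& object : world) {
        if (auto rec = object->hit(r, t_min, closest_t)) {
            closest_t = rec->t;
            closest = std::move(rec);
        }
    }
    return closest;
}
```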

  • @andreanderson626
    @andreanderson626 2 years ago

    More optimizing of ray tracing. I appreciate how you got the language overhead out of the way and made the logic closer to the metal.

  • @mlecz
    @mlecz 2 years ago

    Fantastic video. I would love to see you optimize this code step by step.
    Great job, Cherno :)

  • @nabeelsherazi8860
    @nabeelsherazi8860 2 years ago

    LOVE the optimization videos, please make more!

  • @MattieBW
    @MattieBW 2 years ago +3

    I love how your recording fps dropped to 1 when you rendered the image lol

  • @CoDEmanX47
    @CoDEmanX47 2 years ago

    Would like to see a follow-up video on optimizing the code, including checking the assembly, and if the compiler doesn't make use of SIMD already, something about that for the hot path as well.

  • @archkral
    @archkral 2 years ago

    Hey Cherno, first video I've seen, but I like your approach to code improvement. Hope you kept a similar style since then :D

  • @Omnifarious0
    @Omnifarious0 1 month ago

    8:59 - There are cases where I might use a unique_ptr in that way, but I agree, in almost all cases I'd prefer vector. Maybe he didn't think of using resize on the vector to pre-allocate all the space or something. Though there is a problem there, because resize will insist on initializing all the memory too, which might not be what you want. Then again, it seems like you could just use reserve instead and add the elements one by one. Hmmm...
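
A small sketch of the initialization trade-offs mentioned above (illustrative only; make_unique_for_overwrite requires C++20):

```cpp
#include <cstdint>
#include <memory>
#include <vector>

void allocation_options(std::size_t n) {
    std::vector<std::uint8_t> a(n);       // allocates and zero-initializes n bytes
    std::vector<std::uint8_t> b;
    b.reserve(n);                          // allocates only; size() stays 0, elements
                                           // are appended one by one later

    auto c = std::make_unique<std::uint8_t[]>(n);                // also zero-initializes
    auto d = std::make_unique_for_overwrite<std::uint8_t[]>(n);  // C++20: leaves the
                                                                 // bytes uninitialized
    (void)a; (void)c; (void)d;
}
```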

  • @ChrisPepper1989
    @ChrisPepper1989 2 years ago +1

    It's really, really key to use profilers for this kind of thing. More often than not, "optimized code" is a completely different beast to what the rest of your project will/should look like.
    You can end up writing some really hard-to-read/ugly stuff, so it's really important to only do it in the hot zones. Also, "premature optimization is the root of all evil" :p I've seen code that starts using bit shifting and all sorts of tricks "to be faster" and then in the very next code block reallocates an array inside a for loop that could have happily lived outside it :p
    But it's an interesting art, particularly on GPUs. You would be surprised how often you can replace an if statement with pure maths that achieves the same result and is so much faster without the conditional. (This is true of standard programming too, but more applicable in GPU stuff.)
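
A tiny illustration of the "replace an if with pure maths" idea, written in C++ for brevity rather than shader code; the shade functions are hypothetical.

```cpp
#include <algorithm>

// Branchy version: the kind of per-pixel `if` that hurts on GPUs (and sometimes CPUs).
float shade_branchy(float n_dot_l) {
    if (n_dot_l < 0.0f)
        return 0.0f;
    return n_dot_l;
}

// Branchless equivalent: pure math, same result, no conditional.
float shade_branchless(float n_dot_l) {
    return std::max(n_dot_l, 0.0f);   // on GPUs this typically maps to a single max op
}
```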

  • @matttrimner155
    @matttrimner155 2 years ago

    Yes! You should do an entire series on this! Like the lowest level you can reasonably get. Math and all would be awesome!