Just realised I somehow managed to leave out an absolutely crucial piece of info: the code examples in the second half of the talk are specifically in C#, hence the way it stores structs and classes differently in arrays. In C/C++ the same principle does apply, but it's the difference between an array of classes, vs an array of pointers to classes 🙂
Thank you! Honestly I owe everything I spoke about in this talk to folks who came before me and put the effort into getting the knowledge out there. Casey Muratori, Mike Acton, Andrew Kelley, Jon Blow to name a few 🙂
@@nicbarkeragainSame list for me, however who is Andrew Kelley? If he has any unique DoD content please share. Found the famous Acton 2014 talk in October 2021 and it absolutely changed my life and coding, watched it tens of times. AWESOME channel you have. So sick of OOP slow bloated cross platform BS corporate programming culture. I want to be part of a counterreaction.
@@paulcosta8297Andrew Kelley is the creator of the Zig programming language. He has a Data-Oriented Programming talk that demonstrates practical examples of how to apply the techniques and shows how using those techniques made big performance improvements in the Zig compiler.
I'm not sure why it was recommended either given the topic is pretty niche 😅 Glad you found it interesting though, I started down the rabbit hole myself a while back and it makes me happy to think this video might be a small part of other folks' journey!
Watching this geniunely assuming it had at least 100k views. Super clear and well put together talk that more people need to see. Often people see optimisation as just the annoying thing you do at the end to squeeze an extra bit of performance. However, 90% of the optimisation can be done for free by just not programming it to begin with in a way that pretends the machine's limitations dont exist.
Agree, one of the things that really surprised me when I was first learning to program in a more performant way was: "wow, it's really this easy?". So many performance own-goals in modern software development practise...
Nic, thanks so much for posting such quality content. You have a real talent for teaching and breaking down concepts in fun, easy-to-follow and engaging ways. I‘ve watched lots of talks by Casey Muratori, Jonathan Blow, Mike Acton and the like on similar topics, but your presentation is now the one I would recommend as the best material on DOD on the internet, period. Also, your work on Clay proves you practice what you preach! It‘s a quality piece of software, thank you for your efforts! Wish you the best of luck with growing the channel and I hope you have fun making these videos :)
Thank you so much! Yes these videos are super fun to create, and I've got a much better process for doing it these days. Would you believe that this video was recorded in a single 52 minute take that took me weeks to memorize? 😅
@ I didn‘t realize it was a single take until now 🤯 And yeah, I‘ve given a couple of programming talks, so I have some idea how much work goes into a polished presentation. For example, I was discussing Brian Will‘s „Object Oriented Programming is Bad“ (which you also seem to take some inspiration from?) and even with the source material it took me several weeks to put together something cohesive that meets people where they are (in my company specifically). Anyway, love all of your examples that show actual code. I know it’s really hard to come up with something that isn’t too trivial but also not so complicated that it can’t be understood within a slide or two!
@@guywithcurlyhair I've never seen it, but I have a lot of respect for Brian Will so I'm going to go check it out right now 👍 And yeah the code examples / benchmarks were by far the most time consuming part, as Casey & Prime have been talking about recently, micro benchmarks are notoriously hard to get good results from and it was very fiddly to isolate specific effects that I wanted to demonstrate. I was very lucky that my work paid for me to make this talk because I gave it at the new zealand game developers conference, but doing it just for my personal stuff it's hard to justify putting so much effort into a single video haha
Even though I knew most of things mentioned here, I never checked actual numbers and also found few new things. Thank you for this video. It would be really helpful for me when I started programming and was even now.❤
That's awesome to hear. I was totally radicalised the first time I saw "Data Oriented Design and C++" by Mike Acton. It seriously changed my life. I wish I had been taught that way from the ground up. The whole reason I made this talk in the first place was to give at the new zealand game developers conference, and I was hoping to give some of the younger folks there that same feeling and inspiration.
This is great, one of the best explanations of data-oriented programing/structure optimization that I have seen. I saw this on my homepage, so I think you have been blessed by the algorithm.
I'm always surprised when such a niche topic get recommended to people 😅 glad you found it useful, it took quite a long time for the concepts to become clear and intuitive for me, so in some ways this video was an attempt to help short circuit that process for other people!
With the GTA, the main problem was not 10mb string but the scanf implementation. When they use it to parse number like scanf(«%d… for some reason it was using strlen every time. So basically every scanf was calling strlen and strlen is O(n)
You're right, but the point I was more trying to get across was - likely at initial implementation or even release time it wasn't a noticeable issue (even _with_ calling strlen every single time), and it was only after the string grew in size that the overall performance degraded significantly. Strings are just full of performance foot guns like that which don't seem so bad but can come back to bite you later 🙂
@@nicbarkeragain kinda. I agree you need to avoid strings when you can, but most of the time you use them, so just you need to know how it actually works. More awareness, especially in gamedev
@@delphifeel That is something I 100% agree with. More awareness is so important. Most of the things I mentioned in this talk don't actually make any noticeable difference at the small scale. But the problem is if you don't understand them at all, eventually you'll write something large scale without realising how slow things can get.
9:35 Considering not only Big O of an algo, but also the memory space is good and all, however true gains will come from exploring the question of "How fast can this chef chop an onion?" along several problem-specific dimensions, such as "can the cutting be done by several chefs?" (paralellisation), "is cutting more onions beforehand feasible and more efficient?" (batching/preloading/caching), "does the rate of onions chopped differ if the chef has been at it four hours on end?" (long-term system stability), or even "does it have to be chopping, maybe we can grate onions instead?" (out-of-the-box solutions). Big O will laregely be irrelevant if you know you will only ever need a handful of onions chopped, and startup time will be the only real cost.
The importance of cache locality is such that in many cases, just doing a linear search over continually stored data is often faster than doing a binary search, moreso if there are few elements (by few I mean less than a couple thousands) just because continuous reading is easy to predict and the CPU will prefetch the data correctly
Another name for the out of band technique is Existential Processing, it is explained in Data-Oriented Design by Richard Fabian. Also this book is really really good
Just curious, how do you then quickly know if an enemy is in frenzy? You'd need to check if it's in the swapback array? Or you'd create a separate datastructure to store that information? Seems a bit more annoying to manage right? Or you'd just add that flag in the entity itself, just for convenience, even though that might slow down the code? Loved the video, I always wanted a concrete example of data driven design applied to game code architecture. This is gold. Jblow & Casey would love it.
Since the frenzied enemies are already processed separately, there should hopefully be no reason to check whether they are in frenzy or not. You just do all the things you need to do to the enemies while processing them
@@felixp535 In that case, I think that collision checking would be handled by the process entities function, which could then apply the extra damage to itself and delete the projectile
You raise a great point. Optimisations done this way are far from ergonomic. You'll want to look into ECS game engines like Bevy Game Engine, their Query system of doing things allows you to have higher flexibility.
Great video! This is how one should program from the get-go. I know there are some that would say "Don't optimized at the start!" or whatever....but this IS proper programming, it's just how you should do it because you are now a better programmer. Also, what's your thoughts on the "array of structs" vs "struct of arrays"?
Great to hear, and I agree - I think there is a bit of a myth that performance only matters for high end games. It's important in non game software too!
Amazing channel I just discovered ! What about compiler optimization ? I would expect it to at least the swap variables of a struct for better data alignement
You would expect that, but in C family languages, the answer is not by default 🙂 The term is referred to as "packing" - rust does it by default, and you can use an annotation for C compilers, but yeah it's not a default thing!
Honestly an amazing video. I feel like I have learned so much in 50 minutes, and I don't even do game dev. However I have a quick question for the "out of band" technique, wouldn't it open a lot of complexity into managing how elements are moved between the arrays? I feel like a struct with a state property would be more maintainable in the long run if more states get added, not entire sure though.
Glad you enjoyed it! That type of technique feels a bit intimidating if you haven't used it before, for sure. Funnily enough, what I find happens is often the code actually gets more simple overall when structured like this. You have a functions that do something like processLiveEnemies(array) and processDeadEnemies(array) and the actual code inside each now doesn't need to include any code like `switch (enemy.state) {` etc. You can just write functions that do one thing, which I find end up significantly easier to read and understand. You also don't have to worry about things like, default values of the state enum (should an enemy always be "alive" by default when created? how about "dead'? is there a third state? etc.), and it also gives you immediate access to some useful emergent bookkeeping data (e.g. questions like "how many enemies are currently alive" now become trivial because it's just array.length, you don't need to iterate through enemies and compare states to count them)
@@nicbarkeragain And they are technically ordered by state if you consider each array as a single one. And you can even decide their order by going through an array before another. Very cool.
It's a very useful thing to know, and these principles actually do apply to almost every programming language and environment. Not just C & C++, but garbage collected languages, and dynamic scripting languages too!
Is your Swabback implementation available anywhere? I was thinking about swapbacks as soon as you started talking about the removeall operation but I couldn't quite wrap my head around how to make the bulk removal process performant. Is it really as simple as keeping track of the numberRemoved and swapping the (n - numberRemoved)th element, stopping iteration at (n - numberRemoved) or are there fancy tricks that make the bulk remove faster?
It's very simple to implement yourself, a removeSwapback function would be removeSwapback(index) => array[index] = array[length - 1] 🙂 Bulk processing isn't neccessarily going to improve the speed of a swapback array tremendously, as processing all the swaps one after another will likely be the same speed as processing them individually. The reason it improves the ordered list removal so much, is that you only have to traverse the array, copying and repacking everything once at the end, rather than after every item removal!
This video shows gnerally good techniques on how to approach these scenarios. I have implemented these myself several times. Worth exploring what happens if your data need to be sorted, do you sort every frame? or use an ordered data structure? These scenarios are often left out of scope in this type of talk. Additionally with the out of band change you loose ability to quickly check if the unit is dead or in frenzy mode, you now need to look into a container to see if the entity id is in there. The work to maintain the lists consistent increase as well. Stuff to consider. In other note: I think John Carmack is not the author of the fast inverse squared root algorithm found on Quake III Arena. According It to Wikipedia it was Greg Walsh.
Wouldn't optimizations turned on solve the struct order problem? I'm still new to C, so I don't know how much the compiler can compansate for this sort of thing.
Due to pointer arithmetic, no C compiler will automatically re order structs for you (could break code). There are annotations for some compilers that you can use, but they're opt in 🙂
@nicbarkeragain Thanks, I couldn't imagine something so simple could give such performance increase. Is that what the Crash Bandicoot devs did to be able to fit the game in the disc?
This was absolutely excellent. One question however on the part about alignment at 35:24: Is this not an optimisation that the APT/JIT compiler or runtime allocator would be normally make for you? I.e. the compiled bytecode/machine code would have these struct fields reordered. Or is this structural preservation something specific to C#/Unity?
I rarely comment but this video is so well put together, you are an excellent communicator and set the bar for topics like this Although I hate how u say cache 😂
Great talk! I didn't really get the "avoid last minute decision making". How would you handle enemies changing state with "Out of band" technique? You would need to add and remove enemies from the lists? So if an enemy in enemyNormal list goes into fenzy state, would you remove this enemy from the enemyNormal and add it to the enemyFrenzy? So in other words, if the enemies change their state, you would need to add/remove quite often. I might have misunderstood that example, could you please elaborate on that?
You actually understood correctly 🙂 Often while it may seem like you would be moving data around quite often, it's important to just think about the absolute numbers that you're likely to be dealing with. Even in a game with lots of units at high scale, a frame is only 16ms - a very small amount of time in gameplay terms. It's usually unlikely that you would have more than a single digit number of enemies "changing state" in one single frame, which means the cost of those types of updates would be negligible compared to increased iteration speed that you are benefiting from every single frame.
Glad you hear you found it useful! Also worth checking out the mike acton & andrew kelley videos in the description if you have time, they are awesome!
Such a good resource! Thank you. Below is my latest headscratcher hehe, would love your thoughts as a comment or video. Either if you have insights or just you also find it intriguing :-) SDL3 has a new API "SDL_Properties" that they use to cut down the public facing API (based on "really fast hashmaps"). (Looking at the source for SDL_Process is a good intro!) My gut feeling was "wait what", because ... having what amounts to a C implementation of dynamic dictionaries a la Lua or JS that are heap allocated to make the API surface area less noisy seems ... unintuitive from a C perspective. Why not structs with zero/null as defaults on the stack? I would love your take on it. The obvious answer is that cold codepaths is a good place to optimize for API "leniency". But still. Using a heap hashtable to pass arguments to functions? Does it actually accomplish its goal? Strings as keys - really? All the best! /Ivan
I agree with the others, great talk! I reckon with a better thumbnail and title you would have got a lot more views. Considering that + the duration doesn't make it the most appealing video to click on, but the content itself is really good 👍
Yes, in this case specific to C#. However even in languages like C++ where you can choose to make the distinction between array of classes vs array of pointers to classes, it's important to be aware of the performance tradeoffs of choosing one or the other - especially because using an array of pointers can be very convenient (i.e. pointer to base class for polymorphism, etc)
A tiny insignificant correction, It's a common mistake to believe Moore's law relates to computational power, but it actually relates to the number of transistors on a chip. And I believe it still holds up today.
Ah yes, you're right - thanks for reminding me. I always forget that we've just ended up using it as a proxy for "general compute performance" - whatever that means 😅
dod isnt really new, but i've never really had the time to implement it for something that's .... less specific. like enemies, it seems like during early development you'd run into millions of issues related to how brittle your code is.
Whilst I agree with the most optimizations, the last two do not make sense for me - for one, you need to constantly reshuffle data (enemies) between multiple arrays; for two, the number of states will make these arrays unmanageable even in a small game; and for three, it is way harder to grasp what is going on with all those arrays when programming game logic, as opposed to an enum field in the enemy struct
There are certainly situations where performance isn't going to be your priority, but I think it's important at least to be aware that these kind of techniques exist 🙂
I agree in the majority of cases, however it's worth mentioning that there are certain cases where it's very important to control the specific layout of data in a struct - when the struct itself is larger than 1 cache line. A packed but poorly designed layout in that case can result in two cache misses per iteration which is very painful 😅 But yes in a general sense I agree with you, which is why languages like Rust have adopted it as the default.
You're right - I did think about mentioning it but the video is already quite long, and I wanted to focus on things that can apply in the general case. Understanding the cache and the cost of memory access can still make a big difference even in an environment with a garbage collector.
I don't really think your point at 31:12 is quite accurate. Both classes and structs are stored the exact same way in memory, honestly it looks to me like you just heap-allocated the class instances, but stack-allocated the struct-instances. The only difference between structs and classes is the default visibility of member attributes and functions, whether you're storing a pointer to the instance or the actual instance has to do with how you initialize it, not which keyword you use.
Yes I somehow forgot to mention before the second half that the code is specifically C# (I originally targeted this talk towards Unity devs), which automatically changes the storage type depending on the class / struct distinction. I filmed this talk as a single 52 minute take, and on this particular runthrough I just forgot to mention it 😅 The point still generalises to storing an array of structs / classes vs an array of pointers to structs / classes.
My thought is that the performance metrics and optimization tools don't reflect any of this. Why aren't these issues front and center of any of the tools (that I'm aware of). All those pretty red and bold bar graphs should be this. Only this. Once you've got rid of this, only then should the tools show you other issues. Hammer home that this is first nowadays. You get to optimize graphics once you've finished getting the onions chopped. Don't like chopping onions? Why are you trying to run a restaurant?
There is no conversation to be had about performance benefits of this approach, but there are reasons why not everything is made this way. I like seeing optimization as a topic that is being brought back, but things like keeping reference to an enemy becomes impossible with this and there are so many cases where some things are just cumbersome to do, so it really really depends on the application you are trying to develop. Of course the struct packing is always a good idea, but unfortunately the rest is viable only if you don't care about any inter-dependencies which is sometimes just not possible and the moment you try to push these concepts that are just not compatible with ECS-like systems, you'll start losing the benefits while adding more headaches to your development journey. All in all nice overview, but most of the time I see videos that are "introducing" something and leave out the part why it's not the standard silver bullet.
I mean you're absolutely undeniably correct, but personally for a lot of these optimizations (like struct packing) I thought that the compiler would do it automatically. Great video tho!
Nice. Really. But afaik your diagram of arrays of classes wasn't right unless maybe it's an array of references. This was C++, no? Language choice is a big deal here. Boxed types are bad for cache. A fav example is sorting references then traversing the array.
Sounds silly, but this is a 52 minute talk I recorded in a single take after practicing a few times, and on this particular run through I someone forgot to mention that the examples are in C# 🤦 This talk was originally designed to help Unity devs understand lower level concepts a little better, but yes as you mentioned the same concept applies to array of structs / classes vs array of pointers (boxed)
@@nicbarkeragain Thanks. It really is a masterful job. If you gave it a title like "Total disaster due to dark programming pitfall" and showed a plane crash in the thumbnail, it may get the views it deserves 🙂.
4:12 Dynamic rendering + the new antialiasing they've been adding are so beyond garbage. The upscaling generally reduces clarity far more than lowering resolution, while the new antialiasing leave artifacts far worse than base aliasing. These settings only have been included in games in the last few years and they are the default settings in general. I just want fxaa + low model + high texture. Only settings I'll play on a game.
I certainly think they've created the wrong kind of incentives for developers, for sure. I think they're used as a bit too much of an "escape hatch" if you get to ship time and you're only hitting 10fps at 4k 😅
Just realised I somehow managed to leave out an absolutely crucial piece of info: the code examples in the second half of the talk are specifically in C#, hence the way it stores structs and classes differently in arrays. In C/C++ the same principle does apply, but it's the difference between an array of classes, vs an array of pointers to classes 🙂
Time to discuss record struct in this optimization scenario!
This video/channel are criminally underrated
I'm more just happy that people are enjoying the videos, the numbers don't matter 🙂
Was that a combination of a compliment and a jab of the grammatical error referencing plural “knife”? 😂
@@larsoevlisen hahaha 110% a compliment! the level of quality and presentation are superb!
This video is criminally under-viewed. It should be seen by all programmers.
Thank you! Honestly I owe everything I spoke about in this talk to folks who came before me and put the effort into getting the knowledge out there. Casey Muratori, Mike Acton, Andrew Kelley, Jon Blow to name a few 🙂
@@nicbarkeragainSame list for me, however who is Andrew Kelley? If he has any unique DoD content please share. Found the famous Acton 2014 talk in October 2021 and it absolutely changed my life and coding, watched it tens of times. AWESOME channel you have. So sick of OOP slow bloated cross platform BS corporate programming culture. I want to be part of a counterreaction.
@@paulcosta8297Andrew Kelley is the creator of the Zig programming language. He has a Data-Oriented Programming talk that demonstrates practical examples of how to apply the techniques and shows how using those techniques made big performance improvements in the Zig compiler.
Firstly I got recommend your video about Clay. Now RUclips pushed this gem to me. Well done, Nic!
Glad you enjoyed it! This one took a lot of effort to make 😁
@@nicbarkeragain and it was well worth it
I absolutely need more talks like this from you on other game engine architecture topics.
I'm definitely planning a lot more educational videos like this in the new year 😁
I'm not sure why RUclips recommended me this, but now I think I've found a new interest. Thank you so much for the video!
By "found a new interest", I mean learning how deep the optimisation rabbit hole goes. Also really fun to understand how the code is actually run.
I'm not sure why it was recommended either given the topic is pretty niche 😅 Glad you found it interesting though, I started down the rabbit hole myself a while back and it makes me happy to think this video might be a small part of other folks' journey!
I have a feeling this channel is about to get a lot of foot traffic - the quality is insane
Thank you, I'm glad you enjoyed the video 🙂
Watching this geniunely assuming it had at least 100k views. Super clear and well put together talk that more people need to see. Often people see optimisation as just the annoying thing you do at the end to squeeze an extra bit of performance. However, 90% of the optimisation can be done for free by just not programming it to begin with in a way that pretends the machine's limitations dont exist.
Agree, one of the things that really surprised me when I was first learning to program in a more performant way was: "wow, it's really this easy?". So many performance own-goals in modern software development practise...
This was gold, the perfect density of information and pace for me, so good
Glad you enjoyed it! The pacing was something I put a lot of work into, so I'm glad it shows.
Nic, thanks so much for posting such quality content. You have a real talent for teaching and breaking down concepts in fun, easy-to-follow and engaging ways.
I‘ve watched lots of talks by Casey Muratori, Jonathan Blow, Mike Acton and the like on similar topics, but your presentation is now the one I would recommend as the best material on DOD on the internet, period.
Also, your work on Clay proves you practice what you preach! It‘s a quality piece of software, thank you for your efforts!
Wish you the best of luck with growing the channel and I hope you have fun making these videos :)
Thank you so much! Yes these videos are super fun to create, and I've got a much better process for doing it these days. Would you believe that this video was recorded in a single 52 minute take that took me weeks to memorize? 😅
@ I didn‘t realize it was a single take until now 🤯
And yeah, I‘ve given a couple of programming talks, so I have some idea how much work goes into a polished presentation.
For example, I was discussing Brian Will‘s „Object Oriented Programming is Bad“ (which you also seem to take some inspiration from?) and even with the source material it took me several weeks to put together something cohesive that meets people where they are (in my company specifically).
Anyway, love all of your examples that show actual code. I know it’s really hard to come up with something that isn’t too trivial but also not so complicated that it can’t be understood within a slide or two!
@@guywithcurlyhair I've never seen it, but I have a lot of respect for Brian Will so I'm going to go check it out right now 👍
And yeah the code examples / benchmarks were by far the most time consuming part, as Casey & Prime have been talking about recently, micro benchmarks are notoriously hard to get good results from and it was very fiddly to isolate specific effects that I wanted to demonstrate. I was very lucky that my work paid for me to make this talk because I gave it at the new zealand game developers conference, but doing it just for my personal stuff it's hard to justify putting so much effort into a single video haha
This is actually a really well thought out explanation of performance optimization that was concise and clear as day. Thank you for making this.
Thanks, I think it took about 6 months in total from starting to collect notes to building the final talk, so I'm glad it came out well 🙂
Outstanding.
Thank you!
The kitchen analogy is the best analogy I have ever seen and I'll use it from now on every time I'm describing anything happening in a cpu.
Even though I knew most of things mentioned here, I never checked actual numbers and also found few new things. Thank you for this video. It would be really helpful for me when I started programming and was even now.❤
That's awesome to hear. I was totally radicalised the first time I saw "Data Oriented Design and C++" by Mike Acton. It seriously changed my life. I wish I had been taught that way from the ground up. The whole reason I made this talk in the first place was to give at the new zealand game developers conference, and I was hoping to give some of the younger folks there that same feeling and inspiration.
@nicbarkeragain That is really cool.
This is great, one of the best explanations of data-oriented programing/structure optimization that I have seen.
I saw this on my homepage, so I think you have been blessed by the algorithm.
I'm always surprised when such a niche topic get recommended to people 😅 glad you found it useful, it took quite a long time for the concepts to become clear and intuitive for me, so in some ways this video was an attempt to help short circuit that process for other people!
There are no words that would adequately describe how great this video is
So glad to hear you enjoyed it! I owe so much to the people who's videos taught me - Mike Acton, Casey Muratori, Andrew Kelley, Jon Blow etc
With the GTA, the main problem was not 10mb string but the scanf implementation. When they use it to parse number like scanf(«%d… for some reason it was using strlen every time. So basically every scanf was calling strlen and strlen is O(n)
You're right, but the point I was more trying to get across was - likely at initial implementation or even release time it wasn't a noticeable issue (even _with_ calling strlen every single time), and it was only after the string grew in size that the overall performance degraded significantly. Strings are just full of performance foot guns like that which don't seem so bad but can come back to bite you later 🙂
@@nicbarkeragain kinda. I agree you need to avoid strings when you can, but most of the time you use them, so just you need to know how it actually works. More awareness, especially in gamedev
@@delphifeel That is something I 100% agree with. More awareness is so important. Most of the things I mentioned in this talk don't actually make any noticeable difference at the small scale. But the problem is if you don't understand them at all, eventually you'll write something large scale without realising how slow things can get.
100 percent
9:35 Considering not only Big O of an algo, but also the memory space is good and all, however true gains will come from exploring the question of "How fast can this chef chop an onion?" along several problem-specific dimensions, such as "can the cutting be done by several chefs?" (paralellisation), "is cutting more onions beforehand feasible and more efficient?" (batching/preloading/caching), "does the rate of onions chopped differ if the chef has been at it four hours on end?" (long-term system stability), or even "does it have to be chopping, maybe we can grate onions instead?" (out-of-the-box solutions). Big O will laregely be irrelevant if you know you will only ever need a handful of onions chopped, and startup time will be the only real cost.
Amazing video, very informative, well presented and inspirational, thank you!
Thank you! My programming career was incredibly influenced by great talks that I saw, so I hope I can pass that tradition on to others.
Awesome video! Very clean display of info
Glad you enjoyed it!
The importance of cache locality is such that in many cases, just doing a linear search over continually stored data is often faster than doing a binary search, moreso if there are few elements (by few I mean less than a couple thousands) just because continuous reading is easy to predict and the CPU will prefetch the data correctly
Always start with a good old linear scan, build an acceleration structure if it gets too slow 😁
Great talk! And I think a wonderful introduction to the topic.
Love the restaurant metaphore!
Thank you! I'm surprised at how much mileage I got out of it haha
what an amazing video, thank you very much for this !!
Thank you for putting this video together!
Great stuff! Deserves way more views!
You're welcome, great that you got something useful out of it 🙂
Another name for the out of band technique is Existential Processing, it is explained in Data-Oriented Design by Richard Fabian. Also this book is really really good
Thanks for the recommendation, I'll have to check the book out!
Another winner! Thanks Nic👍
Glad you like it 😁 this one took a very long time to make (probably 80+ hours just to research and write the talk alone)
The detailed and in-depth explanation was impressive, truly mind-blowing.
So freaking good. Thank you this is awesome.
Thanks Will, glad you enjoyed it.
Now it clicked for me - i watched andrew kellys "Data oriented design in practice", but this video presenta the idea much better!
That talk was a huge inspiration for this video!
Absolute gem, great to find you in this mess.
Just curious, how do you then quickly know if an enemy is in frenzy? You'd need to check if it's in the swapback array? Or you'd create a separate datastructure to store that information? Seems a bit more annoying to manage right? Or you'd just add that flag in the entity itself, just for convenience, even though that might slow down the code?
Loved the video, I always wanted a concrete example of data driven design applied to game code architecture. This is gold. Jblow & Casey would love it.
Since the frenzied enemies are already processed separately, there should hopefully be no reason to check whether they are in frenzy or not. You just do all the things you need to do to the enemies while processing them
@@riku5543 Sure but what if later in production designers make it so there is a weapon or spell that deals +25% damage to frenzy units?
@@felixp535 In that case, I think that collision checking would be handled by the process entities function, which could then apply the extra damage to itself and delete the projectile
You raise a great point. Optimisations done this way are far from ergonomic. You'll want to look into ECS game engines like Bevy Game Engine, their Query system of doing things allows you to have higher flexibility.
Really enjoyed this talk. Thanks for posting!
Thanks for watching it!
Great video! This is how one should program from the get-go. I know there are some that would say "Don't optimized at the start!" or whatever....but this IS proper programming, it's just how you should do it because you are now a better programmer.
Also, what's your thoughts on the "array of structs" vs "struct of arrays"?
enjoyed this video. thanks
This is incredible stuff! Insightful for non gamedev too.
Great to hear, and I agree - I think there is a bit of a myth that performance only matters for high end games. It's important in non game software too!
Awesome video!
Amazing channel I just discovered !
What about compiler optimization ?
I would expect it to at least the swap variables of a struct for better data alignement
You would expect that, but in C family languages, the answer is not by default 🙂 The term is referred to as "packing" - rust does it by default, and you can use an annotation for C compilers, but yeah it's not a default thing!
amazing video, you have earned a subscriber
Great to hear that you enjoyed it! It took a very long time to design, research and memorise so I'm glad people are getting some value out of it haha
This talk was immensely helpful.
Thank you very much for putting it on RUclips, as I can now refresh my memory every so often.
Glad you enjoyed it 🙂
Honestly an amazing video. I feel like I have learned so much in 50 minutes, and I don't even do game dev. However I have a quick question for the "out of band" technique, wouldn't it open a lot of complexity into managing how elements are moved between the arrays? I feel like a struct with a state property would be more maintainable in the long run if more states get added, not entire sure though.
Glad you enjoyed it! That type of technique feels a bit intimidating if you haven't used it before, for sure. Funnily enough, what I find happens is often the code actually gets more simple overall when structured like this. You have a functions that do something like processLiveEnemies(array) and processDeadEnemies(array) and the actual code inside each now doesn't need to include any code like `switch (enemy.state) {` etc. You can just write functions that do one thing, which I find end up significantly easier to read and understand. You also don't have to worry about things like, default values of the state enum (should an enemy always be "alive" by default when created? how about "dead'? is there a third state? etc.), and it also gives you immediate access to some useful emergent bookkeeping data (e.g. questions like "how many enemies are currently alive" now become trivial because it's just array.length, you don't need to iterate through enemies and compare states to count them)
@@nicbarkeragain And they are technically ordered by state if you consider each array as a single one. And you can even decide their order by going through an array before another. Very cool.
Thanks a lot, that's mind changing
Glad you found it useful!
I was just reading about this in 100 mistakes in Go. About locality of reference and predictability.
It's a very useful thing to know, and these principles actually do apply to almost every programming language and environment. Not just C & C++, but garbage collected languages, and dynamic scripting languages too!
Is your Swabback implementation available anywhere?
I was thinking about swapbacks as soon as you started talking about the removeall operation but I couldn't quite wrap my head around how to make the bulk removal process performant.
Is it really as simple as keeping track of the numberRemoved and swapping the (n - numberRemoved)th element, stopping iteration at (n - numberRemoved) or are there fancy tricks that make the bulk remove faster?
It's very simple to implement yourself, a removeSwapback function would be removeSwapback(index) => array[index] = array[length - 1] 🙂
Bulk processing isn't neccessarily going to improve the speed of a swapback array tremendously, as processing all the swaps one after another will likely be the same speed as processing them individually.
The reason it improves the ordered list removal so much, is that you only have to traverse the array, copying and repacking everything once at the end, rather than after every item removal!
This video shows gnerally good techniques on how to approach these scenarios. I have implemented these myself several times.
Worth exploring what happens if your data need to be sorted, do you sort every frame? or use an ordered data structure? These scenarios are often left out of scope in this type of talk.
Additionally with the out of band change you loose ability to quickly check if the unit is dead or in frenzy mode, you now need to look into a container to see if the entity id is in there. The work to maintain the lists consistent increase as well. Stuff to consider.
In other note: I think John Carmack is not the author of the fast inverse squared root algorithm found on Quake III Arena. According It to Wikipedia it was Greg Walsh.
Maintain a separate redundant datastructure for existence checks. Data duplication is a huge principle in DoD
Wouldn't optimizations turned on solve the struct order problem? I'm still new to C, so I don't know how much the compiler can compansate for this sort of thing.
Due to pointer arithmetic, no C compiler will automatically re order structs for you (could break code). There are annotations for some compilers that you can use, but they're opt in 🙂
@nicbarkeragain Thanks, I couldn't imagine something so simple could give such performance increase. Is that what the Crash Bandicoot devs did to be able to fit the game in the disc?
What about if you used recursion...
This was absolutely excellent. One question however on the part about alignment at 35:24: Is this not an optimisation that the APT/JIT compiler or runtime allocator would be normally make for you? I.e. the compiled bytecode/machine code would have these struct fields reordered. Or is this structural preservation something specific to C#/Unity?
C family languages will not automatically pack structs for you, but other languages will (Rust for example, I'm not sure about java)
That's good to know, thank you for the response :)
I rarely comment but this video is so well put together, you are an excellent communicator and set the bar for topics like this
Although I hate how u say cache 😂
Tomayto, tomahto 😂
Great talk! I didn't really get the "avoid last minute decision making". How would you handle enemies changing state with "Out of band" technique? You would need to add and remove enemies from the lists? So if an enemy in enemyNormal list goes into fenzy state, would you remove this enemy from the enemyNormal and add it to the enemyFrenzy? So in other words, if the enemies change their state, you would need to add/remove quite often. I might have misunderstood that example, could you please elaborate on that?
You actually understood correctly 🙂 Often while it may seem like you would be moving data around quite often, it's important to just think about the absolute numbers that you're likely to be dealing with. Even in a game with lots of units at high scale, a frame is only 16ms - a very small amount of time in gameplay terms. It's usually unlikely that you would have more than a single digit number of enemies "changing state" in one single frame, which means the cost of those types of updates would be negligible compared to increased iteration speed that you are benefiting from every single frame.
damn good stuff
Thanks!
At the garbage of multiple useless programming videos, it's a diamond. In one hour explained so much about how cpu works ❤
Glad you hear you found it useful! Also worth checking out the mike acton & andrew kelley videos in the description if you have time, they are awesome!
intead of using structs as larges
a talk about Zig would be nice ;)
I would have to learn a lot more about it first! I've used it and understand the basics, but there's a lot of depth there.
Such a good resource! Thank you. Below is my latest headscratcher hehe, would love your thoughts as a comment or video. Either if you have insights or just you also find it intriguing :-)
SDL3 has a new API "SDL_Properties" that they use to cut down the public facing API (based on "really fast hashmaps").
(Looking at the source for SDL_Process is a good intro!)
My gut feeling was "wait what", because ... having what amounts to a C implementation of dynamic dictionaries a la Lua or JS that are heap allocated to make the API surface area less noisy seems ... unintuitive from a C perspective. Why not structs with zero/null as defaults on the stack?
I would love your take on it. The obvious answer is that cold codepaths is a good place to optimize for API "leniency". But still. Using a heap hashtable to pass arguments to functions? Does it actually accomplish its goal? Strings as keys - really?
All the best!
/Ivan
I didn't even realize this video was an hour long, that felt like 20 minutes
Time flies when you're having fun 😁
I agree with the others, great talk! I reckon with a better thumbnail and title you would have got a lot more views. Considering that + the duration doesn't make it the most appealing video to click on, but the content itself is really good 👍
All technical stuff about class vs struct is super specific to language.
Yes, in this case specific to C#. However even in languages like C++ where you can choose to make the distinction between array of classes vs array of pointers to classes, it's important to be aware of the performance tradeoffs of choosing one or the other - especially because using an array of pointers can be very convenient (i.e. pointer to base class for polymorphism, etc)
A tiny insignificant correction, It's a common mistake to believe Moore's law relates to computational power, but it actually relates to the number of transistors on a chip. And I believe it still holds up today.
Ah yes, you're right - thanks for reminding me. I always forget that we've just ended up using it as a proxy for "general compute performance" - whatever that means 😅
dod isnt really new, but i've never really had the time to implement it for something that's .... less specific. like enemies, it seems like during early development you'd run into millions of issues related to how brittle your code is.
Whilst I agree with the most optimizations, the last two do not make sense for me - for one, you need to constantly reshuffle data (enemies) between multiple arrays; for two, the number of states will make these arrays unmanageable even in a small game; and for three, it is way harder to grasp what is going on with all those arrays when programming game logic, as opposed to an enum field in the enemy struct
There are certainly situations where performance isn't going to be your priority, but I think it's important at least to be aware that these kind of techniques exist 🙂
@nicbarkeragain in these situations I wonder why compiler won't at least suggest these optimizations
They need to start using Analog CPUs
You need to tell all of this to RIOT GAMES DEVELOPERS.
wow. what I have watched
Welcome to the dark side 😎
Declaration order space optimizations should be performed by the compiler, not the programmer.
I agree in the majority of cases, however it's worth mentioning that there are certain cases where it's very important to control the specific layout of data in a struct - when the struct itself is larger than 1 cache line. A packed but poorly designed layout in that case can result in two cache misses per iteration which is very painful 😅
But yes in a general sense I agree with you, which is why languages like Rust have adopted it as the default.
all concepts alpply but using language with GC is already a huge problem, kinda surprised you didnt mention it.
You're right - I did think about mentioning it but the video is already quite long, and I wanted to focus on things that can apply in the general case. Understanding the cache and the cost of memory access can still make a big difference even in an environment with a garbage collector.
I don't really think your point at 31:12 is quite accurate. Both classes and structs are stored the exact same way in memory, honestly it looks to me like you just heap-allocated the class instances, but stack-allocated the struct-instances. The only difference between structs and classes is the default visibility of member attributes and functions, whether you're storing a pointer to the instance or the actual instance has to do with how you initialize it, not which keyword you use.
Yes I somehow forgot to mention before the second half that the code is specifically C# (I originally targeted this talk towards Unity devs), which automatically changes the storage type depending on the class / struct distinction. I filmed this talk as a single 52 minute take, and on this particular runthrough I just forgot to mention it 😅
The point still generalises to storing an array of structs / classes vs an array of pointers to structs / classes.
My thought is that the performance metrics and optimization tools don't reflect any of this. Why aren't these issues front and center of any of the tools (that I'm aware of). All those pretty red and bold bar graphs should be this. Only this. Once you've got rid of this, only then should the tools show you other issues. Hammer home that this is first nowadays. You get to optimize graphics once you've finished getting the onions chopped. Don't like chopping onions? Why are you trying to run a restaurant?
There is no conversation to be had about performance benefits of this approach, but there are reasons why not everything is made this way. I like seeing optimization as a topic that is being brought back, but things like keeping reference to an enemy becomes impossible with this and there are so many cases where some things are just cumbersome to do, so it really really depends on the application you are trying to develop. Of course the struct packing is always a good idea, but unfortunately the rest is viable only if you don't care about any inter-dependencies which is sometimes just not possible and the moment you try to push these concepts that are just not compatible with ECS-like systems, you'll start losing the benefits while adding more headaches to your development journey. All in all nice overview, but most of the time I see videos that are "introducing" something and leave out the part why it's not the standard silver bullet.
If you don't know how to parse a 10MB string fast then you should question your abilities.
I mean you're absolutely undeniably correct, but personally for a lot of these optimizations (like struct packing) I thought that the compiler would do it automatically. Great video tho!
Nice. Really. But afaik your diagram of arrays of classes wasn't right unless maybe it's an array of references. This was C++, no? Language choice is a big deal here. Boxed types are bad for cache. A fav example is sorting references then traversing the array.
Sounds silly, but this is a 52 minute talk I recorded in a single take after practicing a few times, and on this particular run through I someone forgot to mention that the examples are in C# 🤦 This talk was originally designed to help Unity devs understand lower level concepts a little better, but yes as you mentioned the same concept applies to array of structs / classes vs array of pointers (boxed)
@@nicbarkeragain Thanks. It really is a masterful job. If you gave it a title like "Total disaster due to dark programming pitfall" and showed a plane crash in the thumbnail, it may get the views it deserves 🙂.
4:12 Dynamic rendering + the new antialiasing they've been adding are so beyond garbage. The upscaling generally reduces clarity far more than lowering resolution, while the new antialiasing leave artifacts far worse than base aliasing. These settings only have been included in games in the last few years and they are the default settings in general. I just want fxaa + low model + high texture. Only settings I'll play on a game.
I certainly think they've created the wrong kind of incentives for developers, for sure. I think they're used as a bit too much of an "escape hatch" if you get to ship time and you're only hitting 10fps at 4k 😅
Your eyes look like stones.