Unity Code Optimization - Do you know them all?
- Published: 22 May 2024
- Find out which common Unity optimizations truly make a difference. In this video I go over a bunch of interesting optimization tips and give you my recommendations on each.
Keep your code running fast with as little garbage allocation as possible by learning from these important concepts.
Benchmarks (let me know if you get weird results!): tarodev.itch.io/unity-benchmarks
Code: github.com/Matthew-J-Spencer/...
Linq garbage allocations: www.jacksondunstan.com/articl...
Some topics covered:
SendMessage, Vector3.Distance, Find Objects, NonAlloc, Camera.main, Linq vs loop and StringBuilder
❤️ Become a Tarobro on Patreon: / tarodev
=========
🔔 SUBSCRIBE: bit.ly/3eqG1Z6
🗨️ DISCORD: / discord
✅ MORE TUTORIALS: / tarodev
0:00 intro
0:12 SendMessage
1:22 Extern call caching
3:02 Vector3.Distance vs SqrMagnitude
4:20 Find Objects
6:44 NonAlloc
9:34 Camera.main
10:51 Linq vs loops
13:16 Stringbuilder
14:20 Order of operations
Interesting results! Whenever I teach optimization, I always begin with "Measure, don't guess". Measure for your platform. Measure in real world situations. Start with readable code, measure and then fix to meet a frame budget. Everything else is premature optimisation because you can't guess what optimisations the compiler will do.
I like this and fully agree.
There's a saying... you get what you measure.
You can pretty much always guess that array iteration is faster than linked-list iteration.
And pretty much guess that GetComponent in Update is kinda bad
In the Vector3.Distance vs sqrMagnitude example, have you tried ((Random.insideUnitSphere - Random.insideUnitSphere) * MULTIPLIER).sqrMagnitude? I think it is a bit faster: one multiplication and one subtraction should be faster than two multiplications and one subtraction, right? (This is just my opinion.)
In my case I'm working with a compiler that literally doesn't optimize anything; this creates code that runs several orders of magnitude slower than a regular MonoBehaviour.
Love to see actual tests instead of people just repeating the same things they heard elsewhere. Good job
"Actual" means on different devices with different CPU architectures and different compiler options?
Are your clients (that is, gamers) playing in the Unity editor? If so, these "tests" apply to them. Anyone who has actually developed at least one game will tell you to check optimizations in a release build on the target device only.
Regarding the Vector3.Distance function, you're also making 2 calls to Random.insideUnitSphere, which definitely makes 1 call to sin and 2 to cos, which are also both expensive operations. But then you're calling it twice. The 6 calls to sin/cos probably dwarf the time it takes to make 1 sqrt call. That's probably why the sqrt and squared versions take about the same time: the 6 trigonometric computations dominate.
You're right. The simplified code I posted on screen shows more predictable results, but even then... not enough for me to use the less readable option. (900k iterations, 6ms and 4ms)
Also you should be comparing it to MIN_DIST*MIN_DIST
@@television1088 if we cache MIN_DIST*MIN_DIST it will be the same thing
@@davibergamin5943 That's not something that would ever be cached except for artificially improving benchmarking.
@@Tarodev Your simplified code is still hiding the call to operator-, which *may* have an impact on performance. Even if Vector3 is stack-allocated, it can trigger a call instruction to operator- and the constructor. By manually writing a Vector3.SqrDistance(Vector3 a, Vector3 b) you can get a ~40% gain in performance.
Really nice insight on code performance! I would recommend having an "average" result for each test in milliseconds, so the more you test, the more stable the number you get to compare.
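A minimal sketch of the two suggestions in this thread: a hand-written squared distance that skips the temporary vector from operator-, and a comparison against the squared threshold. `VectorUtil`, `SqrDistance` and `IsWithin` are hypothetical names, not Unity API, and the ~40% figure quoted above has not been re-measured here.

```csharp
using UnityEngine;

public static class VectorUtil
{
    // Hypothetical helper (not part of Unity): squared distance computed
    // component by component, so no intermediate Vector3 is constructed.
    public static float SqrDistance(Vector3 a, Vector3 b)
    {
        float dx = a.x - b.x;
        float dy = a.y - b.y;
        float dz = a.z - b.z;
        return dx * dx + dy * dy + dz * dz;
    }

    // True when a and b are closer than minDist. The threshold is squared
    // instead of taking a square root of the distance.
    public static bool IsWithin(Vector3 a, Vector3 b, float minDist)
    {
        return SqrDistance(a, b) < minDist * minDist;
    }
}
```

Whether this beats Vector3.Distance by a useful margin depends on the platform and build type, so measure it in a release build before giving up the more readable call.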
Damn! Why didn't I think of that. Great suggestion
Hey Tarodev, avid fan here.
I've been studying game dev at a local college here (it's unlike US colleges) and while we learned a lot of game-oriented programming, your more "classic" programming stuff is just enormously helpful because you stay so relevant to video game programming.
Really wanted to thank you for that, it's super enriching and I learn a lot from your videos.
If you ever make a series that goes more in depth on programming than getting components, while still staying relevant to video game programming, you have a fan waiting.
Thx alot for sharing all that knowledge my friend!
The order of operations bit is really handy, it's really impressive to see the difference that just reorganizing your math can make on computing time. Thanks!
I love a good caesh!
Speaking of - I've heard the idea of a "Runtime Set" in a few talks regarding scriptable objects. I think it's a very nice alternative to finding objects by/of
Super interesting! Thanks for taking the time to put this together and share it 😊.
I was just researching for it and you made it just in time! Thanks!
SUCH a gem of a video! Currently making my game on Early Access and this helps. Glad to use all recommended functions already ^_^ Thank you for sharing!
This unity scene is so good for teaching these optimisations... Great quality content as always! Keep going, thanks for your work!
Super helpful tests, would love to see more of this
Certainly didn't know a lot of these and they are really useful! The time complexity test also helps to understand your point quite a bit more, rather than simply trusting your word.
This is great! Hope you make more of these optimization tests in the future.
Last one completely caught me off-guard. It'll be hard to erase a 10 year old habit! Great video btw, thx.
Love this type of video. Code optimization in Unity isn't covered enough on YouTube
I LOVED this video, I just started Unity and you are my favorite channel so far.
Quality content focused on Unity (not generic programming stuff) and Game Dev.
I did not even know that Unity makes external calls to C++
I would be interested in a follow up video focusing on differences across platforms. Maybe look at Browser vs Windows vs Mobile.
Great video! Just remember everyone, optimization is the last thing to do! Do not try to optimize everything that you make at the start. I always suggest to make a sloppy prototype and re-write an optimized version of whole code later on.
I cannot heart this comment enough. I was planning on adding a prefix to this video regarding pre-optimization but I guess it slipped my mind.
@@Tarodev Thanks for the heart :P
I disagree, it's the same as doing multiplayer games, it's something that is easier/faster/better to do from the beginning.
@@umapessoa6051 I disagree. It's not the same. Multiplayer sets architectural constraints on your code, so you should do at start. Optimization should be as a result of measurement. You cannot guess what the compiler will do.
@@RobLang when your game grows big enough it's pretty hard to change the base of your code to optimize it, and you'll probably lose a lot of time doing it. You can't predict what the compiler will do, but you can know for sure that some functions/ways of doing things shouldn't be used at all when there are more optimized options.
Great video! It's especially good to see the StringBuilder hype confirmed. I'd love to see the performance of the string Contains() method compared to alternatives like IndexOf(), RegExp, and Linq.
This is really interesting to see! I've been talking about a few of these recently so it's good to have a clear example to point to!
Thank you for an excellent video!
I would be interested to see you run these tests in a build and not in the editor. I find the performance is usually different once you build and run.
Super helpful!! You put so much thought into this. You are killing it.
Being new to Unity this was fantastic. It also provided some context to how expensive calls actually were. Needless to say I will avoid the game object and type find calls like crazy!
I just found your channel and I love it. Thank you!
Code optimization is one of the things I planned to improve this month. VERY BIG THANKS!!
In regards to the NonAlloc, I've done some testing with this and it can yield significant performance improvements depending on how you use it. For example, if you perform an OverlapSphere and there are 100 colliders then it will process 100 colliders. However, if you use OverlapSphereNonAlloc and set it to 10 then it will only process 10 colliders and disregard the rest. This can be handy, for example, if you want to improve performance at the cost of accuracy.
EDIT 1: Just had a go at your WebGL build. Using Vector3.Distance is almost twice as slow for me with 900k loops (12ms for Distance vs 7ms for sqrMagnitude).
EDIT 2: Just had a go with the order of operation. Even with 900k loops I would almost always get 6ms for each one. But every now and then FxFxV would jump to 9ms whilst the other two would stay on 6ms or jump to 7ms. For the most part, I don't think order of operation has a significant impact.
The NonAlloc collider limit is great. I use it for grounding and use an array of size 1.
As for the results... It seems WebGL is just insanely varied from pc to pc and browser to browser. My webgl distance comes out to 6ms and 4ms, so sqrMagnitude wins, but not by enough to not use the far more readable Vector3.Distance. Order of operation comes out identical for me on both webgl and il2cpp.
@@Tarodev I would guess that Order Of Operation might be one of those things the compiler can detect and optimize away.
@@unitydev457 that's actually a test I'd like to run... If there's 1000 colliders and you limit it to 10, how much faster is that
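As a rough sketch of the buffer-limit idea from this thread: Physics.OverlapSphereNonAlloc writes into an array you supply and returns how many entries it filled, so the array length caps the number of reported colliders. The class name `NearbyScanner`, the radius, the buffer size of 10 and the `HandleHit` method are all assumptions for illustration.

```csharp
using UnityEngine;

public class NearbyScanner : MonoBehaviour
{
    // Assumed value for illustration.
    [SerializeField] private float radius = 5f;

    // Allocated once and reused on every query. Its length (10 here) is the
    // maximum number of colliders a single query can report.
    private readonly Collider[] _hits = new Collider[10];

    private void FixedUpdate()
    {
        // Returns the number of colliders written into _hits.
        int count = Physics.OverlapSphereNonAlloc(transform.position, radius, _hits);

        // Only indices below count are valid for this query; later slots
        // may still hold results from an earlier frame.
        for (int i = 0; i < count; i++)
        {
            HandleHit(_hits[i]);
        }
    }

    // Hypothetical placeholder for whatever the game does with a hit.
    private void HandleHit(Collider hit)
    {
    }
}
```

A buffer of length 1, as mentioned above for grounding checks, follows the same pattern. How much time a small buffer saves when many colliders overlap is exactly the open question raised in this thread, so it is left unmeasured here.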
this is awesome, I love this kind of low level things to improve code. thanks and you're welcome to do more videos like this
Thanks for sharing. Even if you have an idea of what is potentially better, it's nice to see some testing results. The small things like VxFxF make sense if I think about it, but I doubt I would have even thought about it until you mentioned it. I remember using strings in my item tooltip and switching over to StringBuilder. Never looked back!
You covered everything we were wondering about. Thanks for this video!
-good UI scene btw.
Better than looking at some boring numbers in a console :)
If foreach is faster in your tests, that's actually because it is implemented in a different assembly which has been compiled in release mode, whereas your for loop is compiled in debug mode, which is generally way slower. This is why, when compiled to WebGL, the for loop is actually faster!
I was wondering how a for loop could be slower than a foreach as well
And foreach actually generates more garbage
been looking for a video like this, taro you champ
Great video! Thanks, I was surprised with the results!
Finally I can focus on just writing simple working code rather than constantly stressing over ways to make it performant. The docs and the web put too much emphasis on avoiding certain functions because they're slower. Thanks for making this awesome test 🔥
This one is great to know for performance and not many use this properly:
Caching WaitForSeconds and other yield return objects (like WaitForEndOfFrame etc.) in a coroutine!
You can cache the variable as private WaitForSeconds wait = new WaitForSeconds(1f); (the field name is arbitrary).
Without doing this you'll add stuff to the garbage collector every so often, which can add up!
Bonus nitpick: you can change both transform.rotation and .position by calling SetPositionAndRotation, then the calls are merged into 1?
I remember years ago when I first decompiled WaitForSeconds to discover it being a class and not a struct. I was shocked and began caching them ever since. Nice tip!
Also yup, just decompiled SetPositionAndRotation and it is indeed just one extern call.
Be very careful with this. You may be tempted to make a central area with different cached WaitForSeconds instances. Or maybe you would make a dictionary for caching them too.
The problem with that is if multiple coroutines are using the same WaitForSeconds, the timing will be completely wrong due to the internal timer being reset when the routines exit. I find this often at my job.
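A minimal sketch of both tips from this thread, under stated assumptions: the WaitForSeconds is a private per-instance field used by a single coroutine (which sidesteps the sharing concern raised above), and position and rotation are written with one SetPositionAndRotation call. The class name `TickMover` and the movement itself are made up for illustration.

```csharp
using System.Collections;
using UnityEngine;

public class TickMover : MonoBehaviour
{
    // Allocated once per component instead of once per loop iteration.
    // Kept private and used by a single coroutine only.
    private readonly WaitForSeconds _oneSecond = new WaitForSeconds(1f);

    private void Start()
    {
        StartCoroutine(Tick());
    }

    private IEnumerator Tick()
    {
        while (true)
        {
            // One call instead of assigning transform.position and
            // transform.rotation separately.
            transform.SetPositionAndRotation(
                transform.position + Vector3.up,
                Quaternion.identity);

            // Reuses the cached object, so this line allocates nothing.
            yield return _oneSecond;
        }
    }
}
```

Writing `yield return new WaitForSeconds(1f);` inside the loop would behave the same but allocate a new object every second.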
Fascinating, never knew transform wasn't a direct property and that behind the scenes it was going off and doing stuff, another quick win for caching
Useful comparisons, thanks for the video!
Man, you rock! Thank you for your course!!! I've learned so much!!!
Thank you sir. I will need to make a quick reference for these. The last one was unexpected as I didn't think the order actually mattered.
this is useful. thanks. I would like to see you test optimization on more code.
I haven't watched the video yet but I'm sure it will enhance my knowledge and benefit me. Thanks, please keep up this quality content.
Dude, yes, love vids like these and you're like the only one on the planet doing them. Much love.
(cries in cache) Thanks for the great video dude, I'd love to see more videos like this with more unity specific things, like nesting GameObjects vs. having them live free and wild in the hierarchy, nesting canvases, and other best practices for performance
Awesome one !
Great refresher :D
Bro. I'm actually watching all your videos and they are all just too good for someone like me. Like seriously. I never comment under videos but literally what you show is so damn useful lmao thanks for everything
Would like to see interpolated string performance added in the string builder list.
Order of operation is the only one I did not know about. Glad you added it too.
God DAMN this is exactly my kinda video. You're incredible!! Thanks for the hard work
Really liked this format for the video
i have no idea why, but hearing your voice just makes feel soothed.
Awwww ❤️
Awesome video. Im waiting for ECS comparison
Great video! Haven't known them all so thanks for the knowledge!
About the cached for-loop, I'll add and say that it is faster because the non-cached version, List.Count invokes the get method on the Count property each iteration
I'm not certain, but I remember back in my C# business programming days running benchmarks in debug mode rather than prod builds gave significantly different results - and I suspect that's what you're seeing with your WebGL build
the compiler can't optimise so well when it is doing a debug build -
At the time (a few releases of C# ago!) for example, Linq was massively slower in a prod build but much more competitive (than a foreach) in debug - however, subsequent releases improved Linq significantly.
The long and the short is - don't optimise or measure on a dev build
Also - on the StringBuilder front - it's really insignificant for almost all real world cases - especially something like a game - I'd wager that concatenating the word 'score' with score.ToString() 60 times a second makes almost zero impact on a prod build...
You're right about the editor/debug build. I made a build using IL2CPP to test. Obviously every result is much quicker, but here are the tests which did backflips:
Order of operation ends up the same across the board
SqrMagnitude actually does come out a bit faster (Vector3.Distance: 18. sqrMagnitude: 13. 900k iterations)
Nice work, thank you a lot for this video!
This is a great video, and some of your results make me feel better about optimizing some of my code; it can actually make a difference!
In regard to NonAlloc methods, I believe the Unity 2022 documentation states that they will be deprecated. Instead, something like Physics2D.BoxCast has two overload methods - one that takes an array as a parameter, like in your example, and another that takes a List - that actually grab all colliders, not just the first one. I would be curious how these perform, and especially if the List version is notably slower and/or produces any garbage.
That's interesting... I do prefer how the NonAlloc version works as it returns a count also. Good change IMO.
You've become my sensei for Unity. You do the things we all need!
Another great video, appreciate the insights 😁
Very nicely put video ! Good job.
I’ve never used “SendMessage,” but I think the intent is more of a poor-man’s publisher-subscriber pattern, in which case “GetComponent” isn’t the analog. But C# events, UnityEvents, or a custom observer pattern would probably always be a better choice. “SendMessage” is a bit unique, in that the publisher and subscriber are completely decoupled: the publisher doesn’t know if anyone received the message and the receivers don’t know who sent it. Sounds sloppy and unnecessary, but there’s a possibility the problem is our lack of imagination for when that would be useful.
You are correct! Send message is definitely an Observer pattern not Get Component. But still crappy and slow! And unsafe!
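For comparison, here is a minimal sketch of the C# event alternative mentioned above. The class names `Health` and `HealthBar`, the `Damaged` event and the `OnDamaged` handler are hypothetical; in a real project each MonoBehaviour would live in its own file.

```csharp
using System;
using UnityEngine;

public class Health : MonoBehaviour
{
    // Publisher side: it does not know who, if anyone, is listening.
    public event Action<int> Damaged;

    public void TakeDamage(int amount)
    {
        // Checked at compile time, unlike the string lookup in
        // SendMessage("OnDamaged", amount).
        Damaged?.Invoke(amount);
    }
}

public class HealthBar : MonoBehaviour
{
    // Assigned in the Inspector (assumption for this sketch).
    [SerializeField] private Health health;

    private void OnEnable()
    {
        health.Damaged += OnDamaged;
    }

    private void OnDisable()
    {
        // Unsubscribe so a disabled or destroyed bar is not called later.
        health.Damaged -= OnDamaged;
    }

    private void OnDamaged(int amount)
    {
        // Update the UI here.
    }
}
```

Unlike SendMessage, the subscriber here needs a reference to the publisher, so the decoupling is one-directional.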
I learned more than a thing or two. Thanks!
wow, great video! always stressed about optimizations
Regarding the Vector Distance thing:
It is important to note that Vector3.Distance and sqrMagnitude do NOT produce the same result.
The result of Vector3.Distance produces the magnitude of the direction vector between the supplied vectors whereas sqrMagnitude is that same magnitude squared.
This can be useful if you want to compare distances, e.g. to see which of two positions is further away, because you don't really care about the actual distance but rather which of the two is greater. If you however need to get the actual distance, you have to use either Vector3.Distance or magnitude, sqrMagnitude will be incorrect.
The Linq vs For was one I was looking for. I was annoyed by a friend refactoring my for loops into Linq behind me because "Cleaner". It made my day to see that concrete proof I was right.
This is super helpful! thank youuu!
Loved it! Do more videos like this, please! Frame Debugger... Profiler... Physics... would be awesome♥
Nice video! For order of operations, you can achieve the same result by grouping the same types in parentheses like : Vector3 stillFast = x * (a * b).
Good idea
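A small sketch of the grouping trick from the comment above, assuming `v` is a Vector3 and `a`, `b` are floats (the class and method names are hypothetical). Note that several commenters measured no difference between the variants in release builds.

```csharp
using UnityEngine;

public static class ScaleExample
{
    public static Vector3 Ungrouped(Vector3 v, float a, float b)
    {
        // Evaluated left to right as (v * a) * b: two vector-by-float
        // multiplications, each scaling three components.
        return v * a * b;
    }

    public static Vector3 Grouped(Vector3 v, float a, float b)
    {
        // One float multiplication first, then a single
        // vector-by-float multiplication.
        return v * (a * b);
    }
}
```

The two results can differ in the last bits because floating-point multiplication is not associative, which rarely matters for gameplay code.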
Nice content as always !
I did not know about the order of operations. Very interesting!
Damn! Worked like a charm! Thank you soooo much!
This channel quickly became my favorite game dev channel, with lots of useful information that no other dev talks about and things I didn't know existed or how to use. I would like to see a PlayFab tutorial on multiplayer games, matchmaking and so on. It would really help, since hardly anyone covers it, those who do only show the basics without going deep, and the documentation isn't useful.
I did a recent one on LootLocker, but it's not a step by step tutorial, more of an overview.
Can't wait to see your other videos.
eventually it all snapped into place and I started learning how to add all the effects, titles, motion text. It was pretty cool to see my
Great video!! Thanks for making it!! It's very helpful :D
Very interesting, mainly confirming what I was finding online to optimize our games.
Regarding the last point, the Order of operations, it seems to have 0 impact on WebGL build when running your benchmark, probably because it's optimized at compile time (3ms for 900k on all tests).
Good stuff!
Phat info dump, thanks for putting in the effort.
As always, very good and useful video tut :-)
This is a great one. Thanks mate.
Thanks for that last one, I've never seen that anywhere before.
Awesome video 👍! Did not know about the order of operation ! Now, need to refactor all my code XD.
Please don't. I just tried 900k iterations on a standalone build and they all came out even. It's so tiny to not even matter :)
excellent content so glad your channel found me
Great video and app to show off these things. Would like to see another video with it running outside the editor on a smattering of target platforms.
"send message is not a good practice"
and here I was sitting and thinking i was doing a neat job using send message to optimize my code.
thanks for the tip!
Release builds will have vastly different perf than what you see in the editor (debug). Great vid.
Nice tutorial, I'll test it right now!
It would be interesting to see Array vs List vs HashSet. There are quite a lot of things that would need benchmarking for all 3 though, like Add, Contains, Remove, foreach speed and maybe even more useful things I missed.
Loved it, very informative.
Dude your content is by far the best in this space, thank you!
Got any insight into for loops vs Linq in regards to garbage collection? I'm a massive Linq fan but when building a multiplayer server, garbage collection is a concern. I've been too lazy to run my own tests but after seeing you do it, I might just do that.
Almost every Linq query generates garbage, which is generally not a problem if it's outside critical zones. I'd say use them freely and optimize if you find it's causing problems.
Bro, you deserve way more subs! Cheers 🍻
Thanks brother
Another great video. Thank you, I really enjoy your channel and have implemented a lot of your recommendations in my current project☺. One question: is it worth using Vector3's Set Method instead of creating a new Vector3 in Methods that are called a lot?
Thank you so much Sensei! You are a blessing!
Regarding FindObjectOfType, I actually made a component that I use regularly called Fetchable, which has a string id/name field and a static dictionary. Then you can do Fetchable.Fetch(idName) to find the specific object. This of course makes it a one-to-one name to object, but I find it a really useful alternative to any Find method (assuming it's a 1-to-1, which is often the case)
I read somewhere that they store objects with tags in an extra lookup, so they only have to iterate over that one instead of iterating over every object, so it makes sense that it is much faster
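The comment above describes the component only in words, so the following is one possible sketch of it, not the commenter's actual code. Registering in OnEnable/OnDisable and returning null for unknown ids are assumptions.

```csharp
using System.Collections.Generic;
using UnityEngine;

public class Fetchable : MonoBehaviour
{
    // Unique name set in the Inspector (assumption for this sketch).
    [SerializeField] private string id;

    // One entry per id, shared by all Fetchable instances.
    private static readonly Dictionary<string, Fetchable> Registry =
        new Dictionary<string, Fetchable>();

    private void OnEnable()
    {
        if (!string.IsNullOrEmpty(id))
        {
            // A later object with the same id replaces the earlier one.
            Registry[id] = this;
        }
    }

    private void OnDisable()
    {
        // Remove the entry only if it still points at this instance.
        if (!string.IsNullOrEmpty(id)
            && Registry.TryGetValue(id, out Fetchable current)
            && current == this)
        {
            Registry.Remove(id);
        }
    }

    // Dictionary lookup instead of a scene-wide search.
    // Returns null when no enabled object has that id.
    public static Fetchable Fetch(string idName)
    {
        Registry.TryGetValue(idName, out Fetchable found);
        return found;
    }
}
```

Usage would look like `Fetchable.Fetch("MainDoor")`, where "MainDoor" is a made-up id; the lookup cost does not grow with the number of objects in the scene.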
Great video as always. I wish you would've touched upon string interpolation though when talking about StringBuilder (e.g. $"Hello, {name}"). Since I found out about it, I pretty much completely stopped using the + operator and also started using it in StringBuilder's Append() function. Would've loved to see your thoughts and some benchmark results on this.
The comment section is very positive and downright encouraging! Love it!
Cool ! Everything is clearly clear !
seriously helped thank you!!
IT'S ALWAYS THE UNDERRATED VID THAT'S LEGIT! THANK YOU!
Very interesting! Edge case for SendMessage: DLL plugins. I recently had to implement an iOS native camera access permission plugin and the simplest solution to get communication going was to use SendMessage.
Oh yeah? Were interfaces a no go?
Thank you so much!!! It did work and took less than 5 minutes!
... What worked?
Thank you man for sharing this stuff