Simulation Parallelization | Steam Revolution Game Devlog #9

Josiah Manson

Просмотров 3,8 тыс.

Добавить в
- Мой плейлист
- Посмотреть позже
Поделиться

HTML-код

Размер видео:

Показать панель управления

Автовоспроизведение

Автоповтор

Опубликовано: 14 июн 2024
In this video I talk about how I parallelized the simulation code in my game Steam Revolution. I measured accumulated times after running 1M simulation ticks. This makes the math nice so that the number of seconds taken equals the number of microseconds per simulation tick. In the world map with 214 trains, parallelizing my code made the time go from 43.2 s to 17.7 s on my computer, a 2.4x speedup.
Speeding up the simulation via multi-threading is challenging for a few reasons.
1. The duration of each simulation step is very short, so the overhead of parallelization is difficult to overcome. Each simulation step is also composed of multiple interdependent stages that further reduce the time per parallel section. There are 6 parallel steps per tick, in 43 us (7 us per step).
2. The simulation is not trivially parallelizable, because trains interact with each other and with shared resources such as sections of track and industries.
3. Results of the simulation must be deterministic and identical between serial and parallel versions. Re-running a simulation should give the same score every time to make playing the game fun. Also, server verification of results requires the server and client results to match.
I will give an in-depth description of the challenges to making each part of the simulation parallel and the strategies I used to overcome them. Although I’m specifically describing my solutions to my problems, they may provide inspiration for parallelizing your own code.
My game is a blend of OpenTTD and Zachtronics games like SpaceChem or Infinifactory. Compared to other train games, my game's focus is more on optimizing static levels for good scores rather than showing how a transportation empire progresses over time. In this series I document my progress developing a game from scratch with no pre-made engine; one could call it handmade.
Timestamps:
00:00 - Intro
03:10 - Overview of game
03:48 - Benchmark hardware
05:20 - Benchmark times
09:10 - Simulation overview
11:40 - Benchmark simulation steps
13:35 - Overhead sources
21:38 - Update track occupancy
23:12 - Propose location
28:55 - Wait and cargo
39:30 - Commit movement
42:45 - Collision detection
44:38 - Add cars and trains
45:16 - Update industries
45:48 - Cargo loaders
47:32 - Remove smoke
47:55 - Update game state
48:35 - Revisit timings
50:55 - Conclusion
Игры

Комментарии • 27

@NongBenz 29 дней назад ⁺⁴
Good video, interesting how multi track train networks and signals are a convenient analogy for code parallelization and concurrency
@vectorlua8081 Месяц назад ⁺³
Hooray! More really informative content, stuff that makes sense and is actually useful!
@krystofjakubek9376 26 дней назад ⁺¹
Nice technical video! Always a pleasure learning about how you overcome challenges
@IstasPumaNevada 25 дней назад ⁺¹
Interesting programming information and a train game. I'm going to subscribe to follow along with. :)
@QuiranPup 5 дней назад
I clicked solely for trains, also, small world
@braspatta Месяц назад
Pretty cool content!
@AJMansfield1 2 дня назад
With the existence of crossover points between single-threaded vs different core selection strategies, would it make sense to add code to switch strategies at runtime? (And do things like, occasionally run a short burst of steps using other maybe-suboptimal strategies to gather statistics on the current CPU's performance with the current sim state.)
@morthim 5 дней назад
mmm
i'm not sure if you want feedback, but something you may not be doing is massive rendering reduction.
the only things you need to actually simulate are on screen.
if something isn't on screen then you can replace it with a passive timer. an active timer is one that counts, a passive timer is one that says "check this at a specific time"
an active timer, like game frames, need to update every frame. but you can reduce the amount of calculation by reducing calculations to what is possible.
in your train simulation, your trains could either have overpasses/tunnels which would cause conflicting routes to not block one another.
conversely, if the tracks overlap and the trains can collide and derail sorta like rollercoaster tycoon. then you need to have a more accurate check.
again if it isn't possible, then trains not on the screen going back and forth are like the hands on a clock. just like the hand of the clock goes from one source to a destination, from top to bottom of the clock, there is phase associated with travel. and like the hand itself has width you need to have the phase which is associated with the locomotive, but also another value for the caboose. that way when you pan over the map you can trigger them to pop into rendering. how much wider than the screen you need to start rendering is something you can customize.
"you can't parrellelize removing smoke"
. you can parrellize removing smoke.
first option is to remove smoke as particles. you have one 'static' field which you have the locomotive drag around.
seccond option, you keep the smoke particles, and make them more transparent every tick. and then garbage collect them when they are too old- when their transparency is >100%.
whether it is worth it is another topic. it will out perform single thread (simd decrement), but may not be worth the time to implement.
"trains wait at signals"
that is a very real way of approaching it. actual trains do this exact thing.
but you can also have fewer conditional jumps by getting rid of the segments where there wont be a potential collision. at a station or intersection, a collision can occur so you need to actually sim it. but for segments of straight track you can back propagate by prioritizing updates based on the phase of the trains. the trains closest to stations or intersections get simulated first, and this allows you to find out first whether colissions occur. and where trains need to stop and wait. also by having the trains which may collide processed closer together, you may be able to spin off a child thread of how the collision should be handled.
a big drawback with this approach is if you want trains to be able to speed and derail.
"you can use a graph"
you can, and that mostly made sense, but i'm not sure if it is right. you have a number of things which are, and a number of things which can happen, and a number of things which do happen.
the map simply is. the trains have a limited number of things they can do, and don't need updates other than that. if a train is going to run out of fuel in the middle of the track, you know that will happen before they leave the station. so you don't need to loop it partially, but set a loop which stops the train when it runs out of fuel by getting to the end point.
beyond those things which will happen always, there are branches of different states that can occur. a train going to a station presumably can go to either of the 'terminals' those represent branching options. but also there could be need for the train to wait for a turn if all the terminals are filled. similarly you have branching paths for all the tracks if you have collisions. do you want the tracks themselves to have the branching path or do you want the trains being near one another to have the branch?
the fewer branching paths the faster the code will run. the former approach is realistic, while the seccond approach is event driven.
how many times you have to run a loop creates geometric work. if a train collision or derailment is an event and it gets triggered once, you might get some lag, but the average cost to generate a frame will be largely uneffected. but if every frame every train asks if it has collided with something, then that is work. if you have 60 conditional jumps a seccond, one per frame, that is 60 times the cost of a conditional jump, every seccond. if you have the assumption that abberant behavior will notify with an event, then the only time that section of code is referenced with a branch is when it is relevant.
going back to the branching of trains into stations, does it matter which lane the trains go into? and if it does sometimes, are you able to say that a specific slot is only able to be gone into by a specific company, brand, color, whatever? if it doesn't matter, and isn't on screen, it doesn't need to be simulated.
as for the third option, the things which do happen. you don't want to optimize for the average as much as the features. having a good framerate when things are normal is fine, but the amount of chug and lag that happens when the special things that happen occur is more felt. in a train game the lag may add to the atmosphere. but the events being quick, will be a main source of the sense of professionalism.
"3d position"
the idea of 3d position in the game is probably really really bad. yea the thing you are simulating is 3d, but that extra dimension for a train game overcomplicates things. under what circumstances will 3d position matter? you could make it matter for fuel, but that can be done in a slightly different way. and you can make it matter for collisions, and there aren't going to be enough collisions to make the additional dimension worth it. you can have overpasses and underpasses simply bipass the t-bone collision check.
@josiahmanson 5 дней назад
I'm glad that you enjoyed the video.
Your main assumption that I can reduce the accuracy of the simulation based on where the camera is located is incorrect. The simulation must be identical between runs, because that is the scoring mechanism of the game and also is necessary to determine if the solution to a level was correct. If any train crash happens, then the solution is a failure and simulation immediately stops. Your assumption about allowing inconsistent results seems to bleed into a more than one of your suggestions.
You are correct that actually the smoke removal can be parallelized, and I have a follow-up video that describes how I did that.
I'm unsure why you think that simulation of 3D trains is significantly more expensive than 2D. Most of the logic is completely independent of the dimensionality.
I also think that you are misunderstanding the time scales involved. There is nothing slow enough to cause lag like you describe. Perhaps you thought I was talking about times in milliseconds when I was talking about microseconds which are 1000 times shorter?
@AJMansfield1 2 дня назад
Movement could be improved by representing position as an explicit polynomial of elapsed simulation time, instead of in the implicit Euler or Verlet form you have with each tick's instant position and velocity.
Most trains processed each tick continue to follow the same second-order trajectory; so the terms of the explicit polynomial that describes that motion don't change. By using that explicit polynomial as the train's data representation, only the minority of trains with a discontinuity on that tick need their data written to and synchronized (e.g. start, stop, enter/exit coast, enter/exit turn). And, the process of converting from explicit to implicit is faster than even an L1 memory stall, just one FMA per order. (Also, not that it particularly matters for a game, but explicit motion is also a good way of eliminating integration error.)
@AJMansfield1 2 дня назад
IIRC, Factorio also uses explicit motion to track clusters of items on conveyor belts (though, just a first-order polynomial), and reported significant TPS improvements for large factories in their devlogs.
@josiahmanson 2 дня назад
@@AJMansfield1 That is an interesting suggestion. At this point I think I am satisfied with the speed so don't think I will try it. Honestly, this optimization work has been a multi-month rabbit hole that has prevented work on the actual gameplay stuff.
I agree that a higher order integrator could handle things well if trains were either on a constant slope or in free-fall, but I'm not sure it would work in the actual tracks I have, which are subdivided into actually quite small segments. There are about 2-4 segments per grid square that are drawn on the terrain. Each segment has a different slope and curvature, which affect train motion, so I don't think trains will actually coast for very long until they hit a discontinuity. Especially because the length of segments is not a constant, and a discontinuity for any of the train cars would need a full physics update in your scheme. This would likely be once per tick anyhow.
But, I will keep in mind your idea for any more feely moving physics objects I may have in future games.
@michaeljackson1147 24 дня назад ⁺¹
3:10 The logo has a typo "Steam Revloution" lol
@josiahmanson 23 дня назад
lol. oops. thanks for catching that!
@delphicdescant 22 дня назад
Typos increase engagement!
@Kanonenwind 28 дней назад
I do have a question:
Have you tried batching modifications of shared state? Especially cases like spawning puffs of smoke and noises seem like they should be trivially changeable to storing them in a local list in the thread first, then locking the global list if any were created and just shoving either the list object or each element of the local list into it.
It may be benefitial because it means you only need to lock/atomically insert at most once per worker, or totally worse because of the extra code and copies, so I'd love to hear if you have benchmarked this!
@josiahmanson 28 дней назад ⁺¹
Good question. I think I mentioned in the video, but the sounds actually are already batched per train over several simulation ticks and are consumed by the main thread during the render update. So, sounds have no locks during the simulation.
For smoke puffs I could have done the same thing, but it doesn't cost much performance as it is since the puffs aren't generated all that often (like maybe 1 in 10 frames and only if a train is moving). So, it barely shows up in profiling, a quick test just now showed 0.143 s out of a total 18.344 s run time was spent acquiring the smoke lock. But you are right, that I could probably avoid that overhead.
Thanks for the suggestion. Maybe I will implement that.
@josiahmanson 27 дней назад ⁺¹
I gave smoke puffs the same treatment as sounds. I store them only in the train now, and there is no reason to have a global array to collect the smoke in, because the renderer can directly use the train arrays.
I also removed the per-tick stage of the simulation where I remove smoke, because this only needs to happen once per rendered frame. At the fastest rate that is once per 1000 ticks, which also reduces overhead some. I also made my benchmarking code (that doesn't render) match that to remove finished smoke every 1000 ticks.
Rough timings of the best run over like 5:
17.68 before
17.13 after train smoke optimization
There is a variance of up to a second between timings, so it is hard to know precisely how much it helped, but it is clearly somewhat better by on the order of half a second.
I could in theory do similar for the cargo loaders, but since those affect the sim I will still need to touch them once per tick so the benefits won't be as large. I might try that next.
@Kanonenwind 27 дней назад
Nice to hear back the results!
Yeah, I didn't expect a big benefit out of the smoke spawning because it didn't seem like something that would happen too often, so I'm pretty amazed if it really is half a second in the end. I mainly mentioned it because this is a common technique in entity component systems to allow multithreaded systems to spawn new entities.
You could probably use it in other steps of your simulation, but once one needs more than "many threads create values for one accumulation without order" you may need to add a serial fixup step or something like that, so the smoke seemed like low hanging fruit!
And on the audio, you may have mentioned it, I mainly listened to the video over two days when I ate or was on break so I don't remember all the details 😅.
@josiahmanson 27 дней назад ⁺²
@@Kanonenwind Ya, I wouldn't expect you to remember every detail. I think the main benefit from the smoke change was not about the locking, but about my being able to skip the step in most sim ticks and do a big batch update every 1000 ticks. Most ticks there is no change in state for the smoke puffs, so repeatedly processing them was wasteful. Also there is no harm in letting smoke remain in the array past its expiration so skipping the tick it should be removed is safe.
These things aren't true for cargo loaders, so I don't expect anywhere near the same benefit.
@lynnwilliam Месяц назад
Ireland!!!!!
@lynnwilliam Месяц назад
What game is this ?
@josiahmanson Месяц назад ⁺¹
It is a game I am working on as my hobby, called Steam Revolution.
@lynnwilliam Месяц назад
@@josiahmanson I love how you have a deep understanding a parallel processing and parallel programming and simulation. I have the same it is so hard to teach in a YT video. but you did a great job.
Where can.I download your software?
@josiahmanson Месяц назад ⁺²
@@lynnwilliam Thanks for the compliment and your interest. The game hasn't been released, so isn't available for download yet.
@lynnwilliam Месяц назад
@@josiahmanson can you setup maybe a reddit subreddit and link that in all the YT videos about this? Would love to hear what other programmers think. What you are doing is amazing by the way,
@josiahmanson Месяц назад ⁺²
@@lynnwilliam That is a good suggestion. I'll have to consider it.

Следующие

Автовоспроизведение

Parallelization Addendum | Steam Revolution Game Devlog #10