Parallelization Addendum | Steam Revolution Game Devlog #10

Поделиться
HTML-код
  • Опубликовано: 27 май 2024
  • In my previous 2 videos, I talked about how I built a parallelization library, then used that to make my game run faster. As a result of feedback and some more thinking, I made some more improvements that I wanted to talk about, which resulted in a 11.5% performance improvement.
    My game is a blend of OpenTTD and Zachtronics games like SpaceChem or Infinifactory. Compared to other train games, my game's focus is more on optimizing static levels for good scores rather than showing how a transportation empire progresses over time. In this series I document my progress developing a game from scratch with no pre-made engine; one could call it handmade.
  • ИгрыИгры

Комментарии • 29

  • @josiahmanson
    @josiahmanson  22 дня назад +3

    After making the video, I realized that actually the fence I described at 14:20 isn't needed. I had forgotten that after a worker thread finishes its work, it will signal that it is done in the parallel for library I wrote by performing an atomic increment to numWorkersFinished. This atomic RMW operation acts as a memory fence, meaning that the write to the global addingTrainsDone has to complete after the thread finishes.
    My commentary about fences would have been correct otherwise I think, if the worker threads had continued without making an atomic operation. So, it is something to consider in other cases, and is hopefully illustrative of the type of memory inconsistencies that can happen.

    • @SneedsFeeduckAndSeeduck
      @SneedsFeeduckAndSeeduck 14 дней назад

      A "strong" read as per C++ memory model requires a RMW if you want to make sure. But depending on the ISA, an atomic read could also be sufficient already (not sure whether relaxed or acquire or seq_cst makes a difference regarding liveness of the value; I assume it's ISA-specific).

  • @removechan10298
    @removechan10298 13 дней назад +1

    HOLY ALGORITHM BATMAN!!!! THIS IS THE GAME I'VE BEEN WAITING FOR!!!!!!!!!!!
    I was playing OPENTTD THE OTHER DAY, LOOKED UP AND 4 HOURS HAD GONE BY!

  • @Ray-gs7dd
    @Ray-gs7dd 21 день назад +3

    I really love optimizations. It's amazing how seemingly small things can change performance by a huge margin. And then there is the more niche things one would not think would do anything, but does quite a lot. Just seeing the time used for any given operation go down is really satisfying.

  • @WuddupDok
    @WuddupDok 2 дня назад

    Super appreciate this video! Good luck and keep em coming 🎉

  • @williammedeiros2699
    @williammedeiros2699 18 дней назад +1

    The algorithm unexpectedly recommended these videos to me, and I was really impressed by the game's concept and the mechanics you’ve implemented. It's a fantastic game.
    I have a suggestion that might be a bit ambitious, but it would be interesting if the trains needed to refuel (e.g., steam locomotives requiring new water and fuel sources). It would add another layer of strategy to solving the levels and managing resources.
    Watching this game, I couldn't help but think about the potential for even more realism. For instance, incorporating real-world terrain heights, requiring players to build tunnels for straighter paths and faster movement at the cost of money, or considering slopes when laying down tracks to affect train speeds.
    I understand you mentioned not wanting too many levels, but I’d love to see an endless mode where industries continually appear, and I have to keep everything connected. If an industry runs out of products or can't send them out, the player would lose. This idea might sound like a blend of Mini Metro and Mini Motorways but with much more complexity. Imagine adding boats and encompassing the whole world-it's a fun thought experiment!
    I hope to see some updates soon. This is a really cool game, and I would definitely pay to play and see its development progress. You should consider crowdfunding to support further development.

    • @josiahmanson
      @josiahmanson  17 дней назад

      Thanks for the interest and support. You may be happy to know that some of your suggestions already exist in the game. The trains do consume fuel, and there are 3 types of train engines. Wood powered, coal powered, and oil powered. Those engines get progressively faster, but also the fuels for them get rarer. Trains need to periodically stop by a fuel source, but you can also have a train carry fuel to storage areas for other trains to pick up. As you said, my hope is that this adds an interesting layer of complexity without being too annoying.
      I don't have tunnels because those would be a bit hard to design, draw, and edit. However there are bridges, which basically have the same set of costs and benefits. Also slopes do affect the train speed, as do curvatures of the track.
      In terms of endless mode, that is a very different design and would require a very different architecture for the simulation, so would be impractical to add at this late stage of development. Also, that is what most other train games like OpenTTD or Maschinky do, so would lose the distinctiveness of my game. That said, the world map right now is large, and has an element of what you are talking about because regions unlock as you complete levels, and that causes you to have to adapt to the new industries that became accessible.

  • @0zux45
    @0zux45 22 дня назад +2

    I do really enjoy your explanations and the entire journey for optimization!

  • @nexttonic6459
    @nexttonic6459 22 дня назад +2

    You can force CPU frequency to stay at 1 setting, if you disable on intel speed stepping and on amd... umm well I can't remember and I'm AMD user xD but you can do it in the bios "turbo boosts, etc. These kinds of settings define it."

  • @Kunteki
    @Kunteki 19 дней назад

    hmm, sounds to me you might want to think more with time in mind, specifically elapsed time since last update cycle, or a fixed update cycle limited to a certain number of cycles per second.
    ... or both ^^.
    this is something that game makers in the mid 90's discovered when processors started to rapidly speed up between generations, but they wanted the game to run at a certain pace, on any computer, as to not have it become nightmarishly hard, to beat since your balling pc would run at 200+ fps vs your friends squalid 80 on their old mashine they had.
    this also made multiplayer a thing since both could be running at a agreed max fps ^^.
    hope this gave some ideas or at least made you think a bit, since this can have you completely ignore the processor speed of execution, for certain things that might be timing sensitive :)

  • @ivlaox7359
    @ivlaox7359 14 дней назад

    Hello. Great series, it gives a lot of insight into gamedev and optimizations. I haven't developed any games, but I have a bit of experience when it comes to parallel and concurrent code. At the beginning of the video you said you needed to access a global array from different parallel processes and it requires a lock. Batching is cool, but I think it might be even faster to make it a lock-free list if you don't need random access there.

    • @josiahmanson
      @josiahmanson  14 дней назад

      As it turns out, a global array was not needed, and each train could have its own array. Thus, no locking was needed. It also made parallel processing of the arrays easier.

  • @TomaszRyszkowski
    @TomaszRyszkowski 20 дней назад +1

    What the heck? This game looks amazing! I love Open TTD and this game looks exactly how I imagined Open TTD if it wasn't held back by the pretty old engine limitations. I hope that in this game there will also be similiar depth when it comes to the actual city side of the transport system, as so far i've only seen really major cities represented which doesn't really look that cool. I'd love for there to also be small villages attatched to all of those factories which also generate small amount of passangers so that there can be more than just 1 approach when it comes to the gameplay - but it's your game and those just are my thoughts

  • @ITR
    @ITR 20 дней назад +2

    If all smoke disappears after the same amount of time has passed since their insertion, you might be able to use a circular buffer instead of actually modifying the array.

    • @ITR
      @ITR 20 дней назад +1

      On the topic of benchmarking you mention at the end of the video, I think what everyone does is just take the average time over a lot of different runs that have the same conditions, so I wouldn't be surprised if there isn't a way around that. If there are any frames that take particularly long to run you can try storing the starting conditions from those and rerun that specific frame over and over as a benchmark instead. Might also be worth testing on some slightly older computers too, if you can find somebody that has some

    • @josiahmanson
      @josiahmanson  20 дней назад +1

      That is a good point about circular buffers! You are absolutely right that there is a bound on the number of puffs of smoke and a circular buffer would work. I expect very little speed gain from this, since it is already fast and amortized over several ticks, but may be worth trying. Thanks for the suggestion.
      Ya, this benchmarking problem is a bit frustrating.

    • @ITR
      @ITR 19 дней назад +1

      @@josiahmanson Well, it's one copy operation less.You could probably also get away any time calculation on the smoke if you're doing some, since you know they'll be removed in order

  • @Kanonenwind
    @Kanonenwind 22 дня назад

    Nice to see you could find an even better solution!
    Removing elements from an array in parallel (stream compaction) is one of those fascinating problems I've spent hours on implementing. I believe parallel prefix sum is still the best algorithm we know, at least it's what I use in culling.

    • @josiahmanson
      @josiahmanson  22 дня назад

      Yes, thanks for your comments! 🙂
      Prefix sum is a good solution if you have a TON of processors, but has a couple downsides. One is that you need a destination array in addition to the source array, and you also need an auxiliary array of the sums. If you have a GPU or a super computer and a huge array, it can be pretty fast though because it scales with the number of processors. What is also nice about prefix sum is that order is preserved.
      For a smaller number of processors like on a typical desktop, I think you might be better off assigning each processor a contiguous section of the array, and compacting the array in-place within that chunk. Then each processor can do a single atomic fetch-add to a counter to find a range to copy to in a destination array. This doesn't preserve order, but many use cases (such as mine) don't need that. With a bit more work you could probably make an order-preserving approach that works similarly.
      I'm not sure how to efficiently parallelize compacting entirely in-place, and not have a separate destination or auxiliary array. This seems tricky.

    • @Kanonenwind
      @Kanonenwind 22 дня назад

      @@josiahmanson I know, I didn't want my comment to be too long, but I almost included the remark that this is probably a waste of time because I doubt it can be done faster than single threaded, maybe offloaded to a background thread, in your case. I've only ever used it for GPU occlusion culling where there is no choice, code has to be run by groups of 32 threads.
      Another thought I just had while writing this comment was that, should you ever plan to release your acceleration code as a library, it would be nice to be able to run independent code on the different groups of threads.
      For instance, cleaning dead elements out of a global list not accessed otherwise in that moment might be something that could then handed off to threads in another cluster/little cores to be processed in the background while the local/big cores perform some work.
      The latency of handing work to slower/further away cores would (hopefully) be masked by the bigger chunk(s) of work performed on the local group.
      It's probably also a pretty unnecessarily complex idea for little to no performance gain!

  • @lachee3055
    @lachee3055 19 дней назад +1

    microseconds or miliseconds?

    • @josiahmanson
      @josiahmanson  19 дней назад +1

      microseconds. i measure the times by running 1 million simulation ticks. number of seconds elapsed equals number of microseconds per tick.

  • @DellariSafire
    @DellariSafire 19 дней назад +2

    we able to play the game yet?

    • @josiahmanson
      @josiahmanson  19 дней назад +2

      Thanks for asking! Sorry, it hasn't been released yet.

    • @DellariSafire
      @DellariSafire 19 дней назад +2

      @@josiahmanson oof. Tell me when the demo is out!

    • @josiahmanson
      @josiahmanson  19 дней назад +2

      @@DellariSafire I'll for sure make a video announcing the release whenever that happens, so subscribe to the channel and turn on notifications, and you will learn for sure when it gets released. ;-)