Curiosity Killed the AI Airplane (Unity ML-Agents)

  • Published: 7 Nov 2024

Comments • 22

  • @complexobjects
    @complexobjects 4 years ago +5

    I've been working with your Penguin agent and I definitely learned a huge amount from that. Thanks for sharing.

  • @robosergTV
    @robosergTV 4 years ago +1

    Just bought my very first Udemy course! Thank you for your work. Not that I needed a tutorial about ML agents, but more as a thank you for your YT videos and also to learn about game logic; the course seems like a very nice skeleton for my future ML experiments (racing, etc.). Just wanted to ask: in that course, do agents only learn on the same race track? If yes, do you plan to make a YT video or update the Udemy course to show how to generate race tracks on the fly, so that the agent learns on random tracks to help it generalize better and not overfit on the same track? (So a new track every agent.done().) Thanks again.

    • @ImmersiveLimit
      @ImmersiveLimit  4 years ago

      Thank you so much! In the course, I only teach how to train on one racetrack, but we train on 4 racetracks simultaneously, so it would be easy to expand to at least 4 static tracks. I’ll have to consider how I would create random tracks. It seems challenging to make a whole track procedurally, but maybe segments could be effective too. Thanks for the suggestion and support.

    • @user-gd2oc2sq3b
      @user-gd2oc2sq3b 4 years ago

      Maybe not generate the whole track but reshuffle existing elements, making it unique
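
      A rough way to picture the segment-reshuffling idea (a hedged sketch, not code from the video or the course; the segment names and function below are made up): build each training track by randomly ordering a small pool of hand-made segments, so the agent sees a new layout every reset.

```python
# Hedged sketch of "reshuffle existing elements": pick a random ordering of a
# fixed pool of hand-made track segments for each new episode. In Unity these
# names would map to prefabs; here they are just placeholder strings.
import random

SEGMENT_POOL = ["straight", "left_turn", "right_turn", "s_curve", "hairpin"]

def random_track(num_segments=12, seed=None):
    rng = random.Random(seed)
    # Keep the first and last segments straight so the spawn point and the
    # finish line stay predictable.
    middle = [rng.choice(SEGMENT_POOL) for _ in range(num_segments - 2)]
    return ["straight"] + middle + ["straight"]

# Example: two different layouts drawn from the same segment pool.
print(random_track(seed=1))
print(random_track(seed=2))
```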

  • @Procuste34iOSh
    @Procuste34iOSh 4 years ago

    Great video! The update on the new version 14 was great! Have you tried playing around with the decision period of an agent? Does a smaller decision period induce longer training? I found that setting the decision period to 1 on the PushBlock example made the algorithm not learn, which is why I ask. Keep it up!

    • @ImmersiveLimit
      @ImmersiveLimit  4 years ago +1

      Thank you! I honestly haven’t experimented with it. I bet it depends on the task. I think I was reading about the DeepMind Capture The Flag research and saw that they used a decision period of 4 or 5, so I figured that was a good number to use and never questioned it.

    • @Procuste34iOSh
      @Procuste34iOSh 4 years ago

      Immersive Limit All right, thank you!

    • @nabila.8647
      @nabila.8647 4 years ago +1

      It depends on the task, but mainly you want to avoid making decisions too frequently, because the reward signal doesn't get enough time to properly evaluate each decision, which results in chaotic behavior.
      That's what I learned after a lot of experimentation. (See the rough sketch after this thread for an illustration of what the decision period does.)
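
      To make the decision period concrete outside of Unity, here is a hedged, generic sketch (not the ML-Agents API): the agent only picks a new action every `decision_period` environment steps and repeats the previous action in between, so each decision is credited with the reward accumulated over several steps. With a period of 1, every single step needs its own credit assignment, which is one plausible reason the PushBlock run struggled.

```python
# Rough sketch (not the ML-Agents API): what a "decision period" means in a
# generic step loop. The agent only picks a new action every `decision_period`
# environment steps and repeats the previous action in between, so each chosen
# action is evaluated over several steps of accumulated reward.
import random

def run_episode(env_step, choose_action, decision_period=5, max_steps=500):
    action = None
    accumulated_reward = 0.0
    total_reward = 0.0
    for step in range(max_steps):
        if step % decision_period == 0:
            # A new decision is requested; the reward gathered since the last
            # decision is what gets credited to the previous action.
            action = choose_action(accumulated_reward)
            accumulated_reward = 0.0
        reward, done = env_step(action)  # repeat the same action between decisions
        accumulated_reward += reward
        total_reward += reward
        if done:
            break
    return total_reward

# Toy usage: a dummy environment and a random policy, just to show the call shape.
if __name__ == "__main__":
    dummy_env = lambda action: (random.random(), False)
    random_policy = lambda last_reward: random.choice(["left", "right", "none"])
    print(run_episode(dummy_env, random_policy, decision_period=1))  # decides every step
    print(run_episode(dummy_env, random_policy, decision_period=5))  # decides every 5 steps
```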

  • @sabrango
    @sabrango 4 years ago +3

    Curiosity may be the future of learning!
    Since its algorithm can help with learning another task in future training, the model could move to another environment and learn a different task with the existing algorithm, without building one from zero.

  • @vcorner6033
    @vcorner6033 4 years ago +2

    thanks for your explanation!

  • @user-gd2oc2sq3b
    @user-gd2oc2sq3b 4 years ago

    It's good to know. Thanks!
    BTW, are you planning to make a course or video on more advanced usage of ml-agents? I would love it.
    Something that uses full vision input (meaning not just rays) with a CNN and RL. I wanted to participate in the Obstacle Tower Challenge, but it was a bit too complex to start with. If there were a course, I would start with that. Thanks again!

    • @ImmersiveLimit
      @ImmersiveLimit  4 years ago

      I’ve got some stuff that I’m working on, may take a while before it is ready. 🙂

  • @combine_soldier
    @combine_soldier 4 months ago

    9:00 - you can turn on the gizmos in the Game view too

  • @ThomasJeff4s0n
    @ThomasJeff4s0n 4 years ago

    Why doesn't my TensorBoard have a "Curiosity" chart? Also, what is the difference between curiosity and the beta parameter?

    • @ImmersiveLimit
      @ImmersiveLimit  4 years ago

      I think it does if you expand one of the collapsed sections. I’m not sure on the second one, the docs would probably be helpful though. 🙂
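
      On the second question: curiosity and beta live in different places. Curiosity is an optional intrinsic reward signal configured under reward_signals, while beta is PPO's entropy-regularization strength (it pushes the policy toward more random actions, rather than rewarding novel states). Below is a hedged sketch of roughly what that part of a trainer config looked like around ML-Agents 0.14; the behavior name is made up and exact keys vary between releases.

```python
# Hedged sketch of the relevant part of an ML-Agents trainer config (circa
# release 0.14); exact key names can differ between releases, so treat this as
# illustrative rather than authoritative.
import yaml  # pip install pyyaml

airplane_config = {
    "AirplaneLearning": {            # hypothetical behavior name
        "trainer": "ppo",
        "beta": 5.0e-3,              # entropy regularization: higher -> more random actions
        "reward_signals": {
            "extrinsic": {
                "strength": 1.0,
                "gamma": 0.99,
            },
            # Curiosity is a separate *intrinsic* reward signal; removing this
            # block (or lowering "strength") reduces its influence.
            "curiosity": {
                "strength": 0.02,
                "gamma": 0.99,
                "encoding_size": 256,
                "learning_rate": 3.0e-4,
            },
        },
    }
}

print(yaml.safe_dump(airplane_config, sort_keys=False))
```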

  • @sheilafrancl1423
    @sheilafrancl1423 4 years ago +2

    Perhaps the planes were doing the same thing I did - crash on purpose so you don't have to fly back to get the checkpoint. It's faster to respawn, and then you're right in front of it (unless that wasn't the case for the agents).

    • @ImmersiveLimit
      @ImmersiveLimit  4 years ago +2

      During training they start over with zero reward when they crash, so that strategy wouldn’t work. It does work in race mode, though, but they wouldn’t know that, so I suppose that gives you an advantage!

    • @sheilafrancl1423
      @sheilafrancl1423 4 years ago +1

      @@ImmersiveLimit I need that advantage lol

  • @juleswombat5309
    @juleswombat5309 2 years ago

    So, what is the explanation? This seems to suggest that we should not use curiosity. Is that really the advice, or is the advice that we need to be careful to reduce the strength of intrinsic rewards? A better strategy might be to disable the curiosity networks and the intrinsic rewards once the external rewards are consistently good (roughly the idea sketched after this thread).
    Without clear guidance on hyperparameter tuning, we are all just thrashing around in the dark and not developing good experience.

    • @ImmersiveLimit
      @ImmersiveLimit  2 years ago +1

      I think the takeaway is just to not assume that curiosity (or any other technique really) will always work well. I always encourage experimentation because I definitely don’t know the best way for your unique case. I probably don’t even know the best way for my own projects. 🙂
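
      The "turn curiosity off once extrinsic rewards are consistently good" idea isn't a built-in ML-Agents switch as far as I know, but the logic is easy to sketch generically. Everything below (class name, thresholds, decay factor) is made up for illustration:

```python
# Generic sketch (not ML-Agents internals): gate the curiosity bonus on how well
# the agent is already doing on the task reward. Names and thresholds here are
# made up for illustration.
from collections import deque

class CuriosityGate:
    def __init__(self, strength=0.02, target_extrinsic=0.8, window=100):
        self.strength = strength            # initial intrinsic-reward weight
        self.target = target_extrinsic      # "consistently good" threshold
        self.recent = deque(maxlen=window)  # rolling window of episode returns

    def combined_reward(self, extrinsic, intrinsic):
        # Standard weighted sum of extrinsic and intrinsic reward.
        return extrinsic + self.strength * intrinsic

    def end_episode(self, episode_return):
        self.recent.append(episode_return)
        if len(self.recent) == self.recent.maxlen:
            average = sum(self.recent) / len(self.recent)
            if average >= self.target:
                # Extrinsic reward is consistently good: fade curiosity out.
                self.strength *= 0.5
```

      In practice you might approximate this more coarsely, for example by resuming training with a lower curiosity strength in the config once the extrinsic reward curve has plateaued at a good value.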

  • @AnoTheRock69
    @AnoTheRock69 4 years ago +1

    hmm Interesting