The BEST Q-Learning example! | The Mountain Car Problem

  • Published: 5 Jan 2024
  • The algorithm for this video comes from the book Reinforcement Learning by Richard S. Sutton and Andrew G. Barto.

Comments • 4

  • @iony_mikler
    @iony_mikler 2 months ago

    This is very cool progress! Do you have a code repo for your learning?

    • @marcuskoseck98
      @marcuskoseck98  1 month ago +1

      Honestly, I have a bunch of code for various projects stored on my computer. I need to organize it, and eventually I will upload it.

  • @AsmageddonPrince
    @AsmageddonPrince 1 month ago

    I don't feel like I understand the principle from your video. What is the purpose of partitioning the state into tiles? How and when are they assigned a Q-value, and when is it modified? Are the Q-values just zero during the first epoch? Does this work for larger state spaces? Does the agent really learn anything substantial from a replay of a 40k-step epoch?

    • @marcuskoseck98
      @marcuskoseck98  1 month ago

      I partition the state into tiles to make a function that relates states to Q-values. Think of it this way: I need a relationship between states and future returns, and there is no obvious function I can write down to do the job. Instead, I break the state space into squares (partitions) and assign each square a random Q-value. This is the initialization. As the algorithm learns, each stored value becomes more representative of the actual Q-value. This method doesn't work for larger state spaces; at that point, you would want to use a neural network. For this specific reinforcement learning problem, 40k steps can be helpful in the beginning for exploration. If your algorithm is still taking 40k steps after a few thousand epochs, that's a sign your parameterization may be incorrect. Hope this helped!
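
      A minimal sketch of the approach described above, assuming Gymnasium's MountainCar-v0 and a 20x20 grid over (position, velocity). The grid size, learning rate, discount, and exploration rate here are illustrative assumptions, not the video's actual settings:

      ```python
      import numpy as np
      import gymnasium as gym

      # Illustrative hyperparameters (assumed, not from the video).
      N_BINS = 20      # partitions (tiles) per state dimension
      ALPHA = 0.1      # learning rate
      GAMMA = 0.99     # discount factor
      EPSILON = 0.1    # epsilon-greedy exploration rate
      EPISODES = 5000

      env = gym.make("MountainCar-v0")
      low, high = env.observation_space.low, env.observation_space.high

      def discretize(obs):
          """Map continuous (position, velocity) to a grid-cell index."""
          ratios = (obs - low) / (high - low)
          cell = (ratios * N_BINS).astype(int)
          return tuple(np.clip(cell, 0, N_BINS - 1))

      # Random initialization: each tile starts with an arbitrary Q-value,
      # which the updates below gradually push toward the true value.
      rng = np.random.default_rng(0)
      Q = rng.uniform(-1.0, 0.0, size=(N_BINS, N_BINS, env.action_space.n))

      for episode in range(EPISODES):
          obs, _ = env.reset()
          state = discretize(obs)
          done = False
          while not done:
              # Epsilon-greedy action selection over the current tile's values.
              if rng.random() < EPSILON:
                  action = env.action_space.sample()
              else:
                  action = int(np.argmax(Q[state]))

              obs, reward, terminated, truncated, _ = env.step(action)
              next_state = discretize(obs)

              # Standard Q-learning update (Sutton & Barto, Section 6.5).
              target = reward + GAMMA * np.max(Q[next_state]) * (not terminated)
              Q[state + (action,)] += ALPHA * (target - Q[state + (action,)])

              state = next_state
              done = terminated or truncated

      env.close()
      ```

      One caveat: Gymnasium truncates MountainCar-v0 episodes at 200 steps by default, so episodes of 40k steps as discussed above imply a raised limit, e.g. gym.make("MountainCar-v0", max_episode_steps=40_000).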