Aww, man, now I can't win every game of "set the video compression settings more optimally than your opponent" anymore. 😔
I know this video is a year old, but I have to say it was extremely well crafted. At no point did I ever feel lost while watching. I actually came here from watching two other recent videos, both related to #SOME. I really enjoy how you don't use a ton of buzzwords or read out textbook definitions and pass them off as "explanations".
Hey, this is a great video!
I noticed you made a mistake in the spreadsheet regarding the UCB calculations: the parentheses in the square root calculation are placed differently from where they should be.
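In case it helps anyone checking their own spreadsheet: I'm assuming the video uses the classic UCB1 formula (the exact exploration constant may differ), and the detail that matters is that the whole log(parent visits) / child visits ratio sits inside the square root. A minimal sketch in Python:

```python
import math

def ucb1(child_value_sum: float, child_visits: int,
         parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Classic UCB1 score for one child node in MCTS.

    Note that the entire log(parent_visits) / child_visits ratio sits
    inside the square root -- that's the parenthesis placement the
    comment above is pointing at.
    """
    if child_visits == 0:
        return float("inf")  # always try unvisited children first
    exploitation = child_value_sum / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```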
This video is gold.
Are you sure that the input to the dynamics network is the actual game state and not the representation network's output?
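For what it's worth, in the MuZero paper the dynamics network does take the representation network's output (the hidden state) plus an action, not the raw game state. A rough sketch of the unrolled inference, with h, g, and f as placeholder callables for the three networks (illustrative names, not the video's code):

```python
def plan(observation, actions, h, g, f):
    """Unroll MuZero's learned model along a sequence of actions.

    h, g, f stand in for the representation, dynamics, and prediction
    networks respectively (placeholder names for illustration only).
    """
    hidden_state = h(observation)        # representation: observation -> hidden state
    outputs = []
    for action in actions:
        policy, value = f(hidden_state)  # prediction reads the hidden state
        # The dynamics network takes the *hidden* state plus an action,
        # never the raw game state, and returns the next hidden state + reward.
        hidden_state, reward = g(hidden_state, action)
        outputs.append((policy, value, reward))
    return outputs
```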
Thank you 😊
Ok, so I know this video is old, but how do you train a network like this to know which states are valid, and which states are winning, losing, or draw states?
At 20:40 or so, the video shows the process of turning gameplay into training data. In the very first round of training, the goal is to get the agent to know what the valid moves are, so the goal policy is all legal moves (the game engine told us these at each step) with more or less equal probability (the tree search has no idea what is good or bad). We also want to start teaching the representation and dynamics networks "how good" a state or action is. For board games (with a win or loss at the end only), we take the reward from the end of the game (again, provided by the game engine) and feed it to earlier states; this effectively pretends that each play along the way was "winning" or "losing".
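To make that concrete, here's a rough sketch of how one finished game could be turned into training targets for that very first round. This is just my reading of it in Python, not the video's actual code, and the function and variable names are made up:

```python
def first_round_targets(trajectory, final_reward, num_actions):
    """Turn one finished game into (policy_target, value_target) pairs.

    trajectory  : list of (observation, legal_moves, player) tuples
                  recorded from the game engine during play
    final_reward: +1 / -1 / 0 from the game engine, as seen by player 0
    Assumes a two-player, zero-sum board game with reward only at the end.
    """
    examples = []
    for observation, legal_moves, player in trajectory:
        # Policy target: equal probability over the legal moves,
        # zero everywhere else (the tree search knows nothing yet).
        policy_target = [0.0] * num_actions
        for move in legal_moves:
            policy_target[move] = 1.0 / len(legal_moves)

        # Value target: the end-of-game result, flipped so it is
        # always from the perspective of the player to move.
        value_target = final_reward if player == 0 else -final_reward

        examples.append((observation, policy_target, value_target))
    return examples
```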
Let's suppose we played randomly for 1000 games and generated that training data. We train the agent, it starts to learn what moves are "good", and we then use *that* agent to play some more. This time, the agent's tree search will be better (it has learned a little bit about how to win), so the policies that are produced for training are a bit more focused towards the good moves. Play another 1000 games, and then re-train the agent on that data (I think the MuZero authors sample training data with a skew towards later generations [when the agent was better], but I could be misremembering); there's a rough sketch of that outer loop below.
I realize that was a long response, but in short: The game engine tells us about wins/losses and legal moves and we use some clever tricks to turn that into training data to teach the model how to look ahead better.
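And if it helps, here's roughly what that outer self-play/re-train loop looks like. Again, this is only a sketch: sample_skewed, play_one_game, and train are placeholder names, not anything from the video or the paper:

```python
import random

def sample_skewed(buffer, batch_size=256):
    """Sample a minibatch, biased towards more recent (stronger) games."""
    # Weight each example by its position in the buffer, so newer
    # examples are proportionally more likely to be drawn.
    weights = list(range(1, len(buffer) + 1))
    return random.choices(buffer, weights=weights, k=batch_size)

def training_loop(num_generations, games_per_generation, play_one_game, train):
    """Alternate self-play and re-training, as described above.

    play_one_game(agent) -> list of training examples from one game
    train(agent, batch)  -> an updated agent
    Both are placeholders for whatever implementation you have.
    """
    agent = None  # generation 0: effectively random play
    replay_buffer = []
    for _ in range(num_generations):
        for _ in range(games_per_generation):
            replay_buffer.extend(play_one_game(agent))      # self-play
        agent = train(agent, sample_skewed(replay_buffer))  # re-train
    return agent
```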
What amazing content, thank you so much for sharing it!!! 👏
👏🏻