AI beats us at another game: STRATEGO | DeepNash paper explained
- Published: 7 Jul 2024
- DeepMind made an expert-level Stratego bot. We explain how they program an unexploitable AI player, and we go into more detail on their model-free Reinforcement Learning method and how they approach a Nash equilibrium with Regularized Nash Dynamics.
► Sponsor: NVIDIA: 👉 nvda.ws/3HpWbzX Sign up for the GTC spring 2023 for FREE!
Google Form to enter DLI credits giveaway: forms.gle/DMPc4G22tnqbMWGCA
Check out our daily #MachineLearning Quiz Questions: / aicoffeebreak
➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring....
📜 DeepNash paper: Perolat, J., De Vylder, B., Hennes, D., Tarassov, E., Strub, F., de Boer, V., Muller, P., Connor, J.T., Burch, N., Anthony, T. and McAleer, S., 2022. Mastering the game of Stratego with model-free multiagent reinforcement learning. Science, 378(6623), pp.990-996. www.science.org/doi/pdf/10.11...
📖DeepNash blog post (DeepMind): www.deepmind.com/blog/masteri...
💻 Open-source implementation of the DeepNash algorithm on a GPU-accelerated game that can run on consumer hardware: github.com/baskuit/R-NaD
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Edvard Grødem, Vignesh Valliappan, Mutual Information, Mike Ton
Outline:
00:00 DeepNash from DeepMind
01:05 NVIDIA - GTC 2023 [Sponsor]
02:26 RL is hard for Stratego
03:35 How Stratego works
04:43 Why RL for solving it
06:22 Model-free RL - Nash equilibrium
07:45 Technical details of DeepNash
08:22 DeepNash architecture
10:13 R-NaD: Regularized Nash Dynamics explained
13:46 Finetuning
15:04 Results
15:54 Bluffing
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: / aicoffeebreak
Ko-fi: ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: / aicoffeebreak
Twitter: / aicoffeebreak
Reddit: / aicoffeebreak
RUclips: / aicoffeebreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Music 🎵 : Cru - Yung Logos
Video editing: Nils Trost
😍
Great video, very in-depth. And thank you for including my implementation in the description!
This paper has gone mostly unnoticed by the broader RL community, even though it should have myriad applications.
Also, I think it's a good idea for researchers to review the regularization code in my repo/OpenSpiel.
In my opinion, it differs from what is detailed in the paper.
Another awesome video, Letitia! After hearing about DeepNash some months ago, I tried to read the paper, but ultimately failed to absorb much of it. Your explanation here makes it a lot clearer.
The Methods section of the paper indicates that the discretization parameter during the play-phase of the game was n=16. The actions are sorted from highest-probability to lowest, and then each action's probability is rounded *up* to the nearest multiple of 1/n, discarding the remaining weights once a sum of 1 is reached.
My hypothesis on why discretization is used: it ensures that at most only the top-n moves are considered. They may have empirically found that whenever the agent made moves outside the top-n, it was more often than not a mistake.
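To make the scheme described above concrete, here is a small Python sketch of that discretization step as I understand it from the comment (sort actions by probability, round each weight up to the nearest multiple of 1/n, stop once a total of 1 is reached); it is an illustration, not DeepNash's actual code:

```python
import numpy as np

def discretize_policy(probs, n=16):
    """Round sorted action probabilities up to multiples of 1/n,
    discarding the remaining weights once the rounded mass sums to 1."""
    order = np.argsort(probs)[::-1]   # highest-probability actions first
    out = np.zeros_like(probs)
    remaining = n                     # probability budget in units of 1/n
    for i in order:
        units = min(int(np.ceil(probs[i] * n)), remaining)
        out[i] = units / n
        remaining -= units
        if remaining == 0:
            break                     # remaining (low-probability) actions get 0
    return out

policy = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
print(discretize_policy(policy))      # only the top actions keep any weight
```

Note how the lowest-probability action is dropped entirely: its weight never fits into the leftover budget, which matches the "at most the top-n moves" intuition above.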
Ah, it's a ranking, now I got it, thanks! 🫱
Another hypothesis is that discretization was motivated by memory considerations. In some MCTS implementations, the policy distribution can be the dominant contributor to each node's memory footprint, and thus to the entire system's memory. Without discretization, the policy distribution requires something like 4*N bytes, if using float32 representations. With discretization, each weight can be packed into log_2(n)=4 bits, and so you can pack all n of them into n*log_2(n) = 64 bits, or just an 8-byte word. You would need the index-information as well, which can be represented as a bit-mask of N bits, giving you a total footprint of N/8 + 8 bytes, which is about 32x less than the naive 4*N byte representation.
@@dshin83 It's almost certainly because of memory. Giving a bigger decision space should theoretically eventually lead to a better outcome, although it would make the training and memory requirements significantly greater.
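The arithmetic in the memory hypothesis above can be checked with a short sketch (the constants n=16, float32, and the N-bit index mask are taken from that comment, not from the paper's actual storage layout):

```python
import math

def naive_footprint_bytes(N):
    """One float32 probability for each of the N possible actions."""
    return 4 * N

def packed_footprint_bytes(N, n=16):
    """Discretized storage: n weights of log2(n) bits each,
    plus an N-bit mask marking which actions received weight."""
    weight_bits = n * int(math.log2(n))   # 16 * 4 = 64 bits = 8 bytes
    mask_bits = N                         # one bit per possible action
    return mask_bits / 8 + weight_bits / 8

N = 1024
print(naive_footprint_bytes(N))           # 4096 bytes
print(packed_footprint_bytes(N))          # 136 bytes
```

For large N the ratio tends to 4N / (N/8) = 32, matching the "about 32x less" estimate above.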
nice. may use this for titan's battalion
I am a learner of RL. To my understanding, for Stratego the action output is a huge vector for each state; even worse, the length of this vector would virtually need to be variable. So if actions are spit out by a U-Net, it has to be a fixed-size vector processed via softmax. This makes the probability of invalid moves a very small non-zero value, which might still get picked by an ε-greedy-like process when exploring the state space. To avoid this problem, the authors may have done the discretization hack to only keep actions with larger probability. This is just a WILD GUESS; I need to read this interesting paper.
Thanks for your thoughts. 🤔 Shouldn't thresholding alone do the trick already? Why also discretize?
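On the invalid-move point raised above: a common workaround (illustrative sketch, not DeepNash's actual implementation) is to mask illegal actions by setting their logits to -inf before the softmax, so they get exactly zero probability instead of a tiny non-zero one:

```python
import numpy as np

def masked_softmax(logits, legal_mask):
    """Softmax over a fixed-size action vector where illegal actions
    are forced to exactly zero probability via -inf logits."""
    masked = np.where(legal_mask, logits, -np.inf)
    shifted = masked - masked.max()   # subtract max for numerical stability
    exp = np.exp(shifted)             # exp(-inf) == 0, so illegal moves vanish
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
legal = np.array([True, False, True, True])
print(masked_softmax(logits, legal))  # second entry is exactly 0
```

With masking, even an ε-greedy-style sampler can never draw an illegal move, since its probability is exactly zero rather than merely small.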
First of all: I am writing with the same voice you had while making this video. Get well soon.
Secondly: Why doesn't this year's NVIDIA conference do an RTX giveaway like in past years?
Thirdly: This is the most interesting topic; it made me study DL and change my field of work. I get very excited about every game solution. Poker and Go were very impressive. I remember a poker bot played 2 sessions against 10 top players, and most of them remarked on its "balanced game" and "unexpected bluffs". I totally get it: when the solution search space is not limited by the human brain and prior knowledge, gaming agents can come up with ideas that no human would. This is inspiring. I am waiting for the gamification of real problems, like energy-grid optimisation at a global scale (for example, for all of Europe), or working with researchers and policy makers on global warming / economic crisis mitigation. Leave games for fun.
1. Thanks! 🤧
2. They do RTX giveaways, but not with my small channel. Go to Yannic's or Louis's channel for fat RTX cards. ;)
3. I do not think you have to wait if you are talking about game-RL for real problems. DeepMind did matrix factorization, plasma stabilisation for fusion reactors, AlphaFold2, etc.
It's just that we still need to wait until this model-free RL method they presented for Stratego gets applied to something more interesting. :)
@@AICoffeeBreak Who is Louis? I only know you and Yannic, so Louis must be less popular.
@@harumambaru Then check out What's AI. :)
Get well soon, Ms Coffee Bean! 🤗
Thanks! 🤧
Could anyone suggest a good resource to read/learn about game theory? I would like to gain intuition for why these learning dynamics work.
👀
I still have no idea how the AI learned to use information well enough to bluff; you'd have to understand what your opponent would play, knowing you're chasing with an unrevealed piece near enemy territory. I'm also still confused as to how this is unexploitable. If red saw that blue (DeepNash) took the deep way in to chase the 8, he could have called the bluff, because there's no way blue would risk their marshal that easily; they should know it might be a spy trap. So arguably a smart human may have outplayed DeepNash, unless it actually was the marshal and DeepNash double-bluffed. In the end it's all mind games, and I don't know how DeepNash can maneuver through them, because scenarios like that one seem like a 50/50 if both players understand the risks.
But can it do even more complex turn-based strategy games like Civ?
🤫, or you'll wake up the DeepMind scientists and they'll target Civilization games next.
Now honestly, I expect it to be played by RL agents in the future, just give it a few years.
@@AICoffeeBreak Too bad DeepMind doesn't tend to release much; if they did, these kinds of games might finally end up with actually good AI.
Softmax can approximate any categorical distribution whose support matches the softmax's support... in other words, it is mathematically impossible for a softmax to output a probability of exactly 0 or 1 (as it uses exponentials), so it's fair to just discretize the outputs at deployment time (something similar is done in LLMs, called top-k sampling or beam search).
I understand that for 0 and 1, but why do it for 0.7168 too (clamp it to 0.7)? 🫠
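For comparison with the LLM technique mentioned above, here is a minimal sketch of top-k truncation (illustrative, not any specific library's implementation). Note the contrast with DeepNash's scheme: top-k zeroes out the tail and renormalizes, but does not round the surviving probabilities to a grid like 1/16:

```python
import numpy as np

def top_k_filter(probs, k=3):
    """Keep only the k most probable actions, zero out the rest,
    and renormalize, as in top-k sampling for LLM decoding."""
    out = np.zeros_like(probs)
    top = np.argsort(probs)[::-1][:k]   # indices of the k largest weights
    out[top] = probs[top]
    return out / out.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(top_k_filter(probs, k=3))   # tail mass redistributed over the top 3
```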
AI beats us at another game: STRATEGO
Next:
- Car moved faster than the fastest human ran.
- Crane lifted more weight than the strongest man.
...
We are lost. 😱
I don't think the interesting thing here is that the bot performs well at the game, but rather the methodology they used to overcome the challenges this specific board game poses, e.g. the incomplete information and the huge number of possible states.
🤣