Parables on the Power of Planning in AI: From Poker to Diplomacy: Noam Brown (OpenAI)

Paul G. Allen School

56K views

Published: Feb 5, 2025
Distinguished Lecture Series
Title: Parables on the Power of Planning in AI: From Poker to Diplomacy
Speaker: Noam Brown (OpenAI)
Date: Thursday, May 23, 2024
Abstract: From Deep Blue in 1997, to AlphaGo in 2016, to Cicero in 2022, games have long been used as a way to measure the frontier capabilities of AI systems and gain algorithmic insights that have wider applications. In this talk, I will cover research breakthroughs in games including poker, Go, and Diplomacy, and in particular highlight the key role that search/planning algorithms have played in all of these achievements. I will then point to potential future applications of this research to improving machine learning models more broadly.
Bio: Noam Brown is an AI researcher at OpenAI investigating reasoning and self play. He co-created Libratus and Pluribus, the first AIs to defeat top humans in two-player no-limit poker and multiplayer no-limit poker, respectively. Noam was also the lead research scientist for Cicero, the first AI to achieve human-level performance in the natural language strategy game Diplomacy. He has received the Marvin Minsky Medal for Outstanding Achievements in AI, was named one of MIT Tech Review's 35 Innovators Under 35, and his work on Pluribus was named by Science as one of the top 10 scientific breakthroughs of 2019. Noam received his PhD from Carnegie Mellon University, for which he received the AAMAS Victor Lesser Distinguished Dissertation Award, the AAAI ACM-SIGAI Dissertation Award, and the CMU School of Computer Science Distinguished Dissertation Award.
This video is closed captioned.

Comments • 58

@harriehausenman8623 4 months ago ⁺¹⁸
Why can't I shake the feeling that someone just explained o1-preview to me without ever mentioning it 🤔 Thank you! 🙏
@ericchang9568 3 months ago ⁺²
A ton of planning to roll out N CoTs :)
@user-pt1kj5uw3b 2 months ago
He's been explaining what they're doing without saying it for a while. It's awesome
@谢安-k6t 1 month ago ⁺¹
As he mentioned, it's the "higher-level broad discussion" about o1, where he pointed out the direction to go. Details of o1 and o3 are the "future research" that he would not be open about.
@rylieweaver1516 4 months ago ⁺³
This is awesome. I like how he explained the generator-verifier gap. This will be huge for AI safety and reliability in addition to performance.
@kedaibiao 10 days ago
Good video!
@DistortedV12 4 months ago ⁺⁵¹
the architect of Cicero and "scaling inference time compute."
@windmaple 4 months ago ⁺¹¹
Well, the talk actually took place in May if you look at the description. So he kind of hinted at o1 3 months ago
@DistortedV12 4 months ago ⁺⁶
@@windmaple ik my point exactly.. probably told UW to not release it until now
@tmchen3440 4 months ago
😢😮t😢 Pignll
@triplea657aaa 4 months ago ⁺¹³
Would love if some of these papers were in the description for easy reference!
@CameronHarrisdemont 3 months ago
1:26
@omadDev 4 months ago ⁺¹
Very interesting lecture. Thank you!
@patruff 4 months ago ⁺³¹
Never underestimate search. -Waldo
@smicha15 4 months ago ⁺¹
Oh my god brilliant.
@brianpalmer967 4 months ago ⁺¹
And that's how we know you're a 90s kid!
@RaviAnnaswamy 4 months ago ⁺²
Search means finding a series of actions that leads from the current state to an end state that you would like, or alternatively avoiding potentially bad states for you in the future
@heykike 4 months ago
So basic algebra counts as search?
@ankitkumarpandey7262 4 months ago ⁺⁹
The way AI is progressing is so closely related to evolution... just at a much faster time scale.
@brandonbodily2101 4 months ago ⁺¹
"It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change." - Charles Darwin
@JustinHalford 4 months ago ⁺²⁸
The trillion dollar question - can search with foundation models generalize beyond objectively verifiable domains like math, coding, and games?
@clray123 4 months ago ⁺³
The answer is no because the models, including the search-based ones, require correctly scored training data to begin with. Where is this scoring supposed to come from for other domain, which cannot be easily simulated, and in which scoring the solution correctly is a big part of the problem? That is the core question for our AI hypesters (which they will avoid at all cost as it makes the whole house of cards collapse).
So far their only proposition for image recognition and language modeling tasks specifically has been to hire thousands of underpaid workers to do all the scoring for them. The difficulty here is that scoring in real-life domains cannot be done by low-paid labor slaves. That is, if it can be done at all: in many cases experts cannot analytically explain their expertise, yet they can intuitively take "correct" actions, based on a life-long experience, using their own "neural nets" locked up in their brain.
@JustinHalford 4 months ago ⁺²
@@clray123 I think that you’re underestimating the odds of AI acquiring aesthetic taste at the level of talented people via clever math/algorithms. We’ve already seen art and writing contests won by AI. To me, the actual question is when, not if.
@clray123 4 months ago
@@JustinHalford Art and writing contests won by AI (any examples?) would really mean nothing - the recipe for success in such a contest would be to just copy someone else's great work and declare yourself the winner. We already know that AI is good at imitation, if the thing to be imitated exists in a million examples that can be interpolated across, but we also know that a great art forger does not make a great artist.
@clray123 4 months ago ⁺¹
I think you are overestimating the odds of AI acquiring anything, really. What we call "emergent" abilities are really the result of being able to pick relevant signal from humongous amounts of training data. I am talking about situations where no such training data is available.
@JustinHalford 4 months ago ⁺⁴
@@clray123 have you heard of move 37? With sufficient compute and generalized self play, we will see many more examples of move 37 in a variety of domains.
@sergiocayuqueov 4 months ago ⁺¹
Interesting 💡🚀
@marbin1069 4 months ago ⁺¹
And this is how o1 was born.
@RaviAnnaswamy 4 months ago ⁺⁶
His points on why people didn’t prioritize search are very illuminating.
The broader lesson here is that trained, distilled knowledge is pattern recognition and good for perceptual tasks, whereas adding search and exploration (as in GOFAI) is necessary for cognitive tasks.
I think there might be one more step: distilling the patterns discovered via search back into perceptual precepts, which I think is what happens in grandmaster play in chess and in geniuses such as Newton or Ramanujan.
Whether o1 already does this, similar to AlphaZero, I do not know, as I am typing this halfway through the lecture.
@masterchief7301 4 months ago ⁺¹
So, it'd be a loop of creating new patterns as it encounters novel situations.
@DistortedV12 4 months ago ⁺¹
We cognitive scientists have known about this for a long time as well: "System 1" and "System 2."
@RaviAnnaswamy 4 months ago
@@DistortedV12 yes, I am aware of that and read Kahneman's great book on the topic too, but what is fascinating is how watching human players beat the System 1 version of their bot forced them to add search
@FamilyYoutubeTV-x6d 4 months ago
@@DistortedV12 cool
@hypercube717 4 months ago
Interesting
@ieltshome 3 months ago
I'm a newbie here and I noticed Noam uses the terms planning and search interchangeably. So, in a sense, can RAG be considered planning? After all, it does a search and improves the quality of the answer. Correct me if I am mistaken.
@Eriiiiiiiick 4 months ago
COOL
@fil4dworldcomo623 4 months ago
I have been listening for a while now. Though I agree that enabling search is a big factor for GenAI intellect, it's still not clear from the poker context why. I can only assume you taught the model to read people's faces and then search their historical game records to know when they are bluffing and when they really do have a strong hand?
@fil4dworldcomo623 4 months ago
@@erikfast9764 Thank you Erik, it keeps the excitement in the game then as that makes AI beatable by confusing it with irrational behaviour. But when AI becomes unbeatable, it must not have any hand in any game as it will kill the game.
@lesmoe524 4 months ago
@@fil4dworldcomo623 AI has already been beating online poker since about 2013. Playing irrationally does not matter: the AI plays defensively, aka "GTO", and doesn't mind if you never bluff or if you bluff every hand; it will still play exactly the same way (that's why all the pros talk about using "GTO strategy"). Live poker will always be a thing, but even then you could have a device that tells you how to play like a bot.
@patruff 4 months ago
TGI MCTS
@JimJordan1753 4 months ago ⁺⁴
He always hates going into depth on how he made the poker model
@clray123 4 months ago ⁺²
And rightly so, because it's not the talk where he is supposed to throw around mathematical formulae mixed with arcane poker rules and assume that everyone in the audience can follow.
@JimJordan1753 4 months ago ⁺¹
@@clray123 “always”
@samkee3859 3 months ago
What are you implying? I’m dense
@ericchang9568 3 months ago
Is the poker bot making money on the internet right now?
@Z-dv3zx 4 months ago ⁺²
many of these papers don't exist... did an LLM create these slides wtf
@twoplustwo5 3 months ago
$150 for poker bot - crazy
@sucim 4 months ago ⁺⁵
"I started grad school in 2012" but looks like he started grad school in 2025
