00:01 Unsupervised learning involves learning from unlabeled data
01:27 The video discusses reinforcement learning in Tic-Tac-Toe.
04:06 Understanding the state details and policy in reinforcement learning.
06:47 Importance of subscribing and value of association
08:57 Reinforcement learning involves updating the value of the current state based on the difference between the next and current values.
11:15 Computer learning to play Tic-Tac-Toe
13:19 The video discusses reinforcement learning in Tic-Tac-Toe
15:20 Reinforcement learning has various applications and benefits.
When we compare states S3 and S4 at 11:45, we need to remember that it's our opponent's move (the one playing O's). Therefore we shouldn't update our value function, because it's the X-player's value function. What is good for X is not good for O, right?
How do we know how much exploitation versus exploration is needed to reach an optimal solution for a particular problem?
Rather than deciding the amount of exploitation and exploration randomly, is there a fixed way to find which combination or selection works best to find the optimal solution?
There is no such fixed way and one has to do trial and error for each given problem.
@pythonpal-pinc Alright. Thank you sir.
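To make the trial-and-error concrete: one common heuristic is an ε-greedy schedule, exploring a lot early and exploiting more as training goes on. A minimal sketch, assuming a learned value table over states; the decay constants and function names are my own choices, not anything from the video:

```python
import random

def choose_next_state(values, candidate_states, epsilon):
    """Pick the next state epsilon-greedily from a learned value table."""
    if random.random() < epsilon:
        return random.choice(candidate_states)                       # explore
    return max(candidate_states, key=lambda s: values.get(s, 0.5))   # exploit

# Typical decay schedule: start almost fully exploratory, settle near 5% exploration.
epsilon, decay, min_epsilon = 1.0, 0.999, 0.05
for episode in range(10_000):
    # ... play one game here, choosing moves with choose_next_state(...) ...
    epsilon = max(min_epsilon, epsilon * decay)
```

How fast ε should decay still has to be tuned per problem, which is the trial and error mentioned above.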
Does this formula derive from the Bellman equation in some way?
Q(s, a) = R(s) + λ · Q(s', a')
where s and a are the current state and action, s' and a' are the next state and action, and R is the reward.
Exactly right
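For anyone who wants the two rules side by side: the update described in the video (around 08:57) is a temporal-difference backup on state values, V(s) ← V(s) + α·(V(s') − V(s)), while the formula in the question is the Bellman-style target for Q-values. A tiny illustrative sketch, with α and λ chosen arbitrarily:

```python
# Value update used in the video (TD-style backup on state values):
#   V(s) <- V(s) + alpha * (V(s') - V(s))
def td_update(values, s, s_next, alpha=0.1):
    values[s] = values[s] + alpha * (values[s_next] - values[s])

# Bellman-style target from the question, with reward R(s) and discount lambda:
#   Q(s, a) <- R(s) + lambda * Q(s', a')
def q_target(reward_s, q_next, lam=0.9):
    return reward_s + lam * q_next
```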
This is too general, please don't expect much from it. Don't watch if you have an exam tomorrow :)
I have an exam in the next 30 minutes 😂😂
The example given in the video contained 3^9 possible states. In real-world applications like these, the numbers can grow very large. So instead of going through every possibility, which can be very time- and energy-consuming, is it possible to model our approach so that it considers only those samples/states that are much more likely to occur, and then proceed with reinforcement learning from there?
Yes, for problems with a large number of states, sampling from the state space instead of enumerating it is one option; this is done using Monte Carlo methods, which are also very popular in other areas of science and engineering. Another option is to use Deep Learning, which learns intelligent ways to represent the states and thereby reduces the effective size of the state space.
During training, does the model learn values for every possible state?
For a small number of states, that is surely a possibility, but it fails to work when the state space is very large, in which case it is generally advisable to use Deep Learning, which learns intelligent ways to represent the states, thereby reducing the effective size of the state space.
@Prashant Shukla 16139 Tic-Tac-Toe is too small a problem to appreciate this. Think of a game like Go, where the number of possible states is too large and regular reinforcement learning would just take forever to learn! Check this for a detailed explanation : nikcheerla.github.io/deeplearningschool/2018/01/01/AlphaZero-Explained/
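To illustrate the sampling idea in the replies above: a first-visit Monte Carlo estimate only assigns values to states that actually appear in played-out episodes, so the huge unvisited part of the state space is never touched. A rough sketch, assuming each episode is a (list of visited states, terminal reward) pair; all names here are illustrative, not from the video:

```python
from collections import defaultdict

def monte_carlo_values(episodes, gamma=1.0):
    """First-visit Monte Carlo: average the discounted terminal reward seen
    after the first visit to each state. Only sampled states get a value."""
    returns = defaultdict(list)
    for states, final_reward in episodes:
        seen = set()
        for t, s in enumerate(states):
            if s not in seen:
                seen.add(s)
                steps_to_end = len(states) - 1 - t
                returns[s].append((gamma ** steps_to_end) * final_reward)
    return {s: sum(g) / len(g) for g_key, g in returns.items() for s in [g_key]}

# Example with opaque state ids: two sampled episodes, one win and one draw.
episodes = [([0, 3, 7], 1.0), ([0, 4, 8, 7], 0.5)]
print(monte_carlo_values(episodes))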
How do we know if the RL algorithm converges on Tic-Tac-Toe? Which criteria should we look at?
Assuming the opponent is a perfect player, we could say our RL agent has converged to the optimal policy if all of the games end in a tie.
Does anyone have the code for this tic tac toe example?
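I haven't seen official code posted with the video, but here is a rough self-contained sketch of the idea it describes: a value table over board states, ε-greedy move selection for X, a random opponent playing O, and a TD-style backup through the visited states after each game. All constants and the simplified end-of-game backup are my own choices, not the instructor's:

```python
import random

EMPTY, X, O = 0, 1, 2
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None

def value(values, board):
    return values.get(board, 0.5)   # unseen states start at a neutral 0.5

def play_one_game(values, epsilon=0.1, alpha=0.2):
    board = (EMPTY,) * 9
    history = [board]               # boards as seen by the X player
    player = X
    reward = 0.5                    # default outcome: draw
    while True:
        moves = [i for i, c in enumerate(board) if c == EMPTY]
        if not moves:
            break                   # board full -> draw
        if player == X:
            candidates = [board[:i] + (X,) + board[i + 1:] for i in moves]
            if random.random() < epsilon:
                board = random.choice(candidates)                        # explore
            else:
                board = max(candidates, key=lambda s: value(values, s))  # exploit
            history.append(board)
        else:
            i = random.choice(moves)                # random opponent plays O
            board = board[:i] + (O,) + board[i + 1:]
        w = winner(board)
        if w is not None:
            reward = 1.0 if w == X else 0.0
            break
        player = O if player == X else X
    # crude backup: pin the last X state to the final reward, then TD-update backwards
    values[history[-1]] = reward
    for s, s_next in zip(reversed(history[:-1]), reversed(history[1:])):
        values[s] = value(values, s) + alpha * (value(values, s_next) - value(values, s))

values = {}
for _ in range(20000):
    play_one_game(values)
print("learned values for", len(values), "states")
```

A stronger version would let the agent play both sides (or self-play) instead of a random opponent, but this shows the mechanics discussed in the video.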
Sir, if I train an RNN for a task and add this concept of a reward from one state to another, is that considered reinforcement learning?
Whether it is considered reinforcement learning or supervised learning depends on the actual task. In supervised learning, the algorithm learns to map inputs to outputs. In reinforcement learning, the algorithm learns to perform a certain task (e.g. navigating an environment or playing a game). Neural Networks are certainly used for Reinforcement Learning, and the combination is called Deep Reinforcement Learning: www.baeldung.com/cs/reinforcement-learning-neural-network
@pythonpal-pinc oh! Thank you sir
So, are the values of the states determined recursively after reaching the end state? If that is the case, isn't that similar to training the weights and biases of each node in an ANN?
But yes, I do understand that here we deal with the states directly.
At the fundamental level, it surely boils down to function approximation, but the application, approach, and models are different. It's like using the same language for all our communication: the basic words and grammar remain the same, and what changes is the outward form.
Can Reinforcement Learning be considered the ML analogue of closed-loop control systems (because both take into account feedback from the previous state)?
In conventional control systems, there is no learning involved. But a lot of modern day control systems use Reinforcement Learning for automatic control.
thanks
Very well explained !! Thank you 👍
Can reinforcement learning be used for feature extraction? And to what extent is it better than other approaches?
I am not sure how one would do that but am open to suggestions!
Waste of time and data
total waste of time