00:01 Unsupervised learning involves learning from unlabeled data
01:27 The video discusses reinforcement learning in Tic-Tac-Toe.
04:06 Understanding the state details and policy in reinforcement learning.
06:47 Importance of subscribing and value of association
08:57 Reinforcement learning involves updating the value of the current state based on the difference between the next and current values.
11:15 Computer learning to play Tic-Tac-Toe
13:19 The video discusses reinforcement learning in Tic-Tac-Toe
15:20 Reinforcement learning has various applications and benefits.
When we compare states S3 and S4 at 11:45, we need to remember that it's our opponent's move (the one playing O's). Therefore we shouldn't update our value function, because it's the X-player's value function. What is good for X is not good for O, right?
How do we know how much exploitation versus exploration is needed to reach an optimal solution for a particular problem?
Rather than deciding the amount of exploitation and exploration randomly, is there a fixed way to find which combination or selection works best to find the optimal solution?
There is no such fixed way and one has to do trial and error for each given problem.
@pythonpal-pinc Alright. Thank you sir.
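To make the trial-and-error concrete: one common heuristic is an ε-greedy schedule, exploring a lot early and exploiting more as training goes on. A minimal sketch, assuming a learned value table over states; the decay constants and function names are my own choices, not anything from the video:

```python
import random

def choose_next_state(values, candidate_states, epsilon):
    """Pick the next state epsilon-greedily from a learned value table."""
    if random.random() < epsilon:
        return random.choice(candidate_states)                       # explore
    return max(candidate_states, key=lambda s: values.get(s, 0.5))   # exploit

# Typical decay schedule: start almost fully exploratory, settle near 5% exploration.
epsilon, decay, min_epsilon = 1.0, 0.999, 0.05
for episode in range(10_000):
    # ... play one game here, choosing moves with choose_next_state(...) ...
    epsilon = max(min_epsilon, epsilon * decay)
```

How fast ε should decay still has to be tuned per problem, which is the trial and error mentioned above.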
Does this formula derive from the Bellman equation in some way?
Q(s, a) = R(s) + λ · Q(s', a')
where s and a are the current state and action, s' and a' are the next state and action, and R is the reward.
Exactly right
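For anyone who wants the two rules side by side: the update described in the video (around 08:57) is a temporal-difference backup on state values, V(s) ← V(s) + α·(V(s') − V(s)), while the formula in the question is the Bellman-style target for Q-values. A tiny illustrative sketch, with α and λ chosen arbitrarily:

```python
# Value update used in the video (TD-style backup on state values):
#   V(s) <- V(s) + alpha * (V(s') - V(s))
def td_update(values, s, s_next, alpha=0.1):
    values[s] = values[s] + alpha * (values[s_next] - values[s])

# Bellman-style target from the question, with reward R(s) and discount lambda:
#   Q(s, a) <- R(s) + lambda * Q(s', a')
def q_target(reward_s, q_next, lam=0.9):
    return reward_s + lam * q_next
```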
This is too general, please don't expect much from it. Don't watch if you have an exam tomorrow :)
I have an exam in the next 30 minutes 😂😂
The example given in the video contained 3^9 possible states. In real-world applications like these, the numbers can grow very large. So instead of going through every possibility, which can be very time- and energy-consuming, is it possible to model our approach so that it considers only those samples/states that are much more likely to occur, and then proceed with reinforcement learning from there?
Yes, for problems with a large number of states, sampling from the state space instead of enumerating it is one option; this is done using Monte Carlo methods, which are also very popular in other areas of science and engineering. Another option is to use Deep Learning, which learns intelligent ways to represent the states and thereby reduces the effective size of the state space.
During training, does the model learn values for every possible state?
For a small number of states, that is surely a possibility, but it fails to work when the state space is very large, in which case it is generally advisable to use Deep Learning, which learns intelligent ways to represent the states, thereby reducing the effective size of the state space.
@Prashant Shukla 16139 Tic-Tac-Toe is too small a problem to appreciate this. Think of a game like Go, where the number of possible states is too large and regular reinforcement learning would just take forever to learn! Check this for a detailed explanation : nikcheerla.github.io/deeplearningschool/2018/01/01/AlphaZero-Explained/
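To illustrate the sampling idea in the replies above: a first-visit Monte Carlo estimate only assigns values to states that actually appear in played-out episodes, so the huge unvisited part of the state space is never touched. A rough sketch, assuming each episode is a (list of visited states, terminal reward) pair; all names here are illustrative, not from the video:

```python
from collections import defaultdict

def monte_carlo_values(episodes, gamma=1.0):
    """First-visit Monte Carlo: average the discounted terminal reward seen
    after the first visit to each state. Only sampled states get a value."""
    returns = defaultdict(list)
    for states, final_reward in episodes:
        seen = set()
        for t, s in enumerate(states):
            if s not in seen:
                seen.add(s)
                steps_to_end = len(states) - 1 - t
                returns[s].append((gamma ** steps_to_end) * final_reward)
    return {s: sum(g) / len(g) for g_key, g in returns.items() for s in [g_key]}

# Example with opaque state ids: two sampled episodes, one win and one draw.
episodes = [([0, 3, 7], 1.0), ([0, 4, 8, 7], 0.5)]
print(monte_carlo_values(episodes))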
How do we know if the RL algorithm converges on Tic-Tac-Toe? Which criteria should we look at?
Assuming the opponent is a perfect player, we could say our RL agent has converged to the optimal policy if all of the games end in a tie.
Does anyone have the code for this tic tac toe example?
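I haven't seen official code posted with the video, but here is a rough self-contained sketch of the idea it describes: a value table over board states, ε-greedy move selection for X, a random opponent playing O, and a TD-style backup through the visited states after each game. All constants and the simplified end-of-game backup are my own choices, not the instructor's:

```python
import random

EMPTY, X, O = 0, 1, 2
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None

def value(values, board):
    return values.get(board, 0.5)   # unseen states start at a neutral 0.5

def play_one_game(values, epsilon=0.1, alpha=0.2):
    board = (EMPTY,) * 9
    history = [board]               # boards as seen by the X player
    player = X
    reward = 0.5                    # default outcome: draw
    while True:
        moves = [i for i, c in enumerate(board) if c == EMPTY]
        if not moves:
            break                   # board full -> draw
        if player == X:
            candidates = [board[:i] + (X,) + board[i + 1:] for i in moves]
            if random.random() < epsilon:
                board = random.choice(candidates)                        # explore
            else:
                board = max(candidates, key=lambda s: value(values, s))  # exploit
            history.append(board)
        else:
            i = random.choice(moves)                # random opponent plays O
            board = board[:i] + (O,) + board[i + 1:]
        w = winner(board)
        if w is not None:
            reward = 1.0 if w == X else 0.0
            break
        player = O if player == X else X
    # crude backup: pin the last X state to the final reward, then TD-update backwards
    values[history[-1]] = reward
    for s, s_next in zip(reversed(history[:-1]), reversed(history[1:])):
        values[s] = value(values, s) + alpha * (value(values, s_next) - value(values, s))

values = {}
for _ in range(20000):
    play_one_game(values)
print("learned values for", len(values), "states")
```

A stronger version would let the agent play both sides (or self-play) instead of a random opponent, but this shows the mechanics discussed in the video.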
Sir, if I train an RNN for a task and add this concept of a reward from one state to another, is that considered reinforcement learning?
Whether it is considered reinforcement learning or supervised learning depends on the actual task. In supervised learning, the algorithm learns to map inputs to outputs. In reinforcement learning, the algorithm learns to perform a certain task (e.g. navigating an environment or playing a game). Neural Networks are certainly used for Reinforcement Learning, and the combination is called Deep Reinforcement Learning: www.baeldung.com/cs/reinforcement-learning-neural-network
@pythonpal-pinc oh! Thank you sir
So, are the values of the states determined recursively after reaching the end state? If that is the case, isn't that similar to training the weights and biases of each node in an ANN?
But yes, I do understand that here we deal with the states directly.
At the fundamental level, it surely boils down to function approximation, but the application, approach, and models are different. It's like using the same language for all our communication: the basic words and grammar remain the same, and what changes is the outward form.
Can Reinforcement Learning be considered the ML analogue of closed-loop control systems (because both take into account feedback from the previous state)?
In conventional control systems, there is no learning involved. But a lot of modern day control systems use Reinforcement Learning for automatic control.
thanks
Very well explained !! Thank you 👍
Can reinforcement learning be used for feature extraction? And to what extent is it better than other approaches?
I am not sure how one would do that but am open to suggestions!
Waste of time and data
total waste of time