How lucky I am... It is a great lecture. It is fun and so understandable since it is well explained.
Thank you for sharing it with all of us.
Maybe it's just my humble impression, but I think examples like Bandits or Blackjack are not very intuitive for someone who is just getting into RL, but they are always used as canonical because they appear in Sutton & Barto 🤔
Unfortunately, almost all tutorials, lectures, etc. are based on the Sutton & Barto book... which is... well... not very creative, to put it nicely. The book itself is not as good as it should be as an RL bible (for me there is too much historical background and proxy discussion with other RL "fathers"). Still waiting for another "bible" on this topic that would be much more practical and less "academic".
Thank you so much for the amazing lecture!
This was a great lecture, thank you :)
15:27 Why are function approximators optimized with the mean squared error (L2) loss by default? Banach's fixed-point theorem uses the L-infinity norm, which is closer to an L1 error function.
1:32:41 "Inception"
I really appreciate the lecture and the effort, but all the formula derivations need to be done much more rigorously.
Is there any proof that TD converges to the maximum likelihood estimate of the Markov model, for the given data? If so could anyone direct me to it, please?
Hi!
Have you found it?
Thanks in advance.
@juanmoreno9633 Hello!
No, I have not. If you do find it, please let me know. Thank you.
@ayoghes2277 How about this? link.springer.com/content/pdf/10.1023/A:1022632907294.pdf
It's good that this doesn't stray toward anything directly military-related at all.