Think of learning rate schedulers as setting the 'overall pace' of learning, while optimizers like Adam fine-tune the speed for each parameter during training. The scheduler steps in after an epoch or fixed interval to adjust the global learning rate (the big picture), and Adam works within that to handle mini-batch updates. It's like having a coach decide how long your practice sessions are (scheduler), while you decide how intensely to train each muscle during the session (optimizer).
With a learning rate that is too small, training can get trapped in a local minimum with no escape. With a learning rate that is too large, it may never settle into the global minimum because it keeps "stepping" over it.
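To make that division of labour concrete, here is a minimal PyTorch sketch, assuming a toy model and dataset (the names `net`, `loader`, and the StepLR settings are placeholders, not anything from the video): Adam adapts each parameter's update on every mini-batch, while the scheduler shrinks the global learning rate once per epoch.

```python
import torch
from torch import nn, optim

# Toy model and data, just for illustration (assumed, not from the video).
net = nn.Linear(10, 2)
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(5)]

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=1e-3)                          # per-parameter adaptation
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)  # global pace

for epoch in range(30):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        optimizer.step()   # Adam: fine-tunes each parameter's step within the current LR
    scheduler.step()       # scheduler: adjusts the global LR once per epoch
```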
Thank you, these videos are very helpful :)
What helpful videos! Thank you! :)
Been tinkering for a day (and night lol)....
Your training set has only 10 items, with 10 class labels assigned to them.
You go through the list and make a prediction for every element. Everything is predicted wrong, because your network is fresh off the shelf and doesn't yet know what you want from it.
If an item is predicted wrong, backprop only once and then go to the next item. Skip items that have been predicted correctly.
For every item in the dataset there is a count that gets incremented every time you have to backprop, and the learning rate will be count * 0.9 (really). For a correct prediction, the count resets to 1.
So... items with a low success rate get trained every time you loop through your data, but at a monster rate, while successful items, or items that only need occasional training, get a lower learning rate.
The overall process is done when no backprop happened inside the loop (see the sketch below). 😘 And this is my criterion: it has to learn incredibly fast at a high framerate while doing all the other stuff, like extracting unknown objects from my desktop screen, all things that Johnny-Boy has never seen before. 😁
Training should not be the bottleneck of AGI. I mean... am I really that clever (lol)? Why not run me on a 500 MHz computer? It should be doable 😎
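A rough sketch of the loop described above, assuming a tiny PyTorch classifier and a list of (input, label) pairs. The 10-item dataset, the count * 0.9 rule, the skip-if-correct step, and the stop-when-no-backprop criterion come from the comment; the model, data, loss, and plain SGD update are placeholders.

```python
import torch
from torch import nn, optim

# Placeholder model and 10-item toy dataset (assumed for illustration).
net = nn.Linear(4, 10)
data = [(torch.randn(4), torch.tensor(i)) for i in range(10)]
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.9)

counts = [1] * len(data)                   # per-item counter, reset to 1 on a correct prediction

for sweep in range(1000):                  # safety cap so the sketch always terminates
    did_backprop = False
    for i, (x, y) in enumerate(data):
        logits = net(x)
        if logits.argmax() == y:           # correct prediction: skip it and reset its counter
            counts[i] = 1
            continue
        for g in optimizer.param_groups:   # per-item "monster" LR = count * 0.9
            g["lr"] = counts[i] * 0.9
        counts[i] += 1
        did_backprop = True
        optimizer.zero_grad()
        loss_fn(logits.unsqueeze(0), y.unsqueeze(0)).backward()
        optimizer.step()                   # backprop only once for this item, then move on
    if not did_backprop:                   # a full pass with no backprop: training is done
        break
```

Note that a per-item learning rate of count * 0.9 is already huge by normal standards, which is exactly the "monster rate" effect described, so expect this kind of loop to be stable only on very small problems.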
Another phenomenon with a constant learning rate is that the last elements in the list finish training first, because the network seems to forget about the beginning of the list. Learn something and forget the other items! That's not what I want!
It seems that "dynamic monster learning", where the LR is constantly fluctuating, doesn't have as much trouble with forgetting, so it's more independent of the ordering of the list!
But if you have stuck with learning, you have probably discovered the answer, huh 😁
you are so pretty..