Not sure if I understood - "Stochastic Gradient Descent (Mini-Batch) introduces noise" - how so? Mini-batches will have less noise compared to FB Gradient Descent, won't it?
It has more noise because the gradient computed at each iteration (weight update) is only an average over one mini-batch, so it differs from the true gradient computed over the whole dataset. As we make many of these small weight updates in SGD throughout the epoch, the mini-batch gradients should average out to something close to the true gradient (as seen in the graph at ruclips.net/video/SftOqbMrGfE/видео.html), but each individual estimate is not exactly the same. This per-iteration difference in the weight updates is the noise she's referring to.
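If you want to see this in numbers, here's a minimal NumPy sketch (my own example, not from the video, all names made up): it compares the full-batch gradient of a simple squared-error loss with a few mini-batch estimates at the same weight, so you can see the estimates scatter around the true gradient.

```python
# Sketch: mini-batch gradient noise on a toy linear-regression loss.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: y = 3*x + a little noise
N = 10_000
X = rng.normal(size=(N, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=N)

w = np.array([0.5])  # current weight, deliberately far from the optimum

def gradient(Xb, yb, w):
    """Gradient of mean squared error (1/n) * sum((Xb @ w - yb)^2) w.r.t. w."""
    residual = Xb @ w - yb
    return 2.0 * Xb.T @ residual / len(yb)

full_grad = gradient(X, y, w)          # "true" gradient over the whole dataset
print("full-batch gradient:", full_grad)

batch_size = 32
for i in range(5):
    idx = rng.choice(N, size=batch_size, replace=False)
    mb_grad = gradient(X[idx], y[idx], w)   # noisy estimate from one mini-batch
    print(f"mini-batch gradient {i}:", mb_grad)

# Each mini-batch gradient differs from the full-batch one, but averaged over
# many batches they land close to it -- that per-batch difference is the noise.
```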
this was very helpful, thank you :)
Great video! Absolutely helpful!
@5:02, what are the x and y axis representing. I'm pretty sure the y-axis is loss but x-axis?
so the optimal batch size is the largest one that fits within the main memory and or gpu memory?
Great video! Thanks for sharing!
Really nice video, clear explanation. Thank you ;)
Great explanation!
are those three required to train a model?
great explanation, thank you very much
Beautiful voice, sister
helpful
holy great
noice