This animation is really great for a small channel like this
Great job! I like that you were even able to talk about some of the different types of gradient descent algorithms, a tall task for 3 minutes.
Same here. I studied applied mathematics, so I have to get up to speed on this rather quickly; I find these videos to be excellent.
It's appealing to see these visual explanations after learning the concept!
Just a correction: 2:30 is mini-batch stochastic gradient descent, since we are iterating over batches.
Very comprehensive and short, love it! Quick and concise!
Thanks so much!
Best explanation and visualization I've seen. You have incredible talent. Please keep making more.
I really liked the video and the visuals, but I think it would be better without the "generic music" in the background.
Thank you for taking the time to post your feedback, this is very useful for the growth of this channel!
Thank you for the clear explanation
Honestly didn't really help with my questions, but I didn't expect a 3 minute video to answer them. This was very well done, the visualization was great, and everything it touched on (while brief) was concise and accurate. Subbed.
The first video where I got a clear and precise understanding of the topic.
Wow. You have such a talent for explaining things so well compared to the rest of the YouTube sphere. I hope you will continue to bless us with your talents.
Hello. Thanks for this great video.
One small note: I believe the variant of Gradient Descent you explained at 2:20 is called Mini-Batch Gradient Descent, which uses a random subset of the training dataset.
Stochastic Gradient Descent is the one that uses just one training record in each iteration.
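For reference, the two update rules look roughly like this (a minimal NumPy sketch, not the video's code; `grad(w, X, y)` is a hypothetical function returning the gradient of the loss on the given samples):

```python
import numpy as np

def sgd_step(w, X, y, grad, lr=0.01):
    # Stochastic GD: one randomly chosen training record per update.
    i = np.random.randint(len(X))
    return w - lr * grad(w, X[i:i+1], y[i:i+1])

def minibatch_step(w, X, y, grad, lr=0.01, batch_size=32):
    # Mini-batch GD: a random subset of the training data per update.
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    return w - lr * grad(w, X[idx], y[idx])
```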
Yeah, I have the same doubt.
Great job! Well done!
Thanks a lot!
Thanks for the amazing explanation and visualization
It mixes very well the theory and a practical example.
Wow, this is so well and intuitively explained!
My professor said “there is no excuse for gradient descent” when conjugate gradient is so easy to implement
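For anyone curious, conjugate gradient really is short to write down for the quadratic case (minimizing (1/2)xᵀAx − bᵀx, i.e. solving Ax = b with A symmetric positive definite). A textbook sketch, not anything from the video:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8):
    # Solve A x = b for symmetric positive-definite A, i.e. minimize
    # the quadratic (1/2) x^T A x - b^T x, without ever forming A^{-1}.
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x            # residual = negative gradient
    p = r.copy()             # first search direction
    rs_old = r @ r
    for _ in range(len(b)):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p    # next A-conjugate direction
        rs_old = rs_new
    return x
```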
Keep up the good work! This video and the whole channel are amazing!
Bravo 👏🏻
I looooooooooove when the background music stops. It tells you to open your eyes and focus your ears: a really important revelation is coming...
Brilliant work 👍
Could you please make a video explaining how you made this video? That would be very VERY helpful. I've always wanted to use Blender to make animations like yours, but couldn't make heads or tails of it. Most Blender tutorials (and I've seen more than 100 videos) showcase heavy-duty animations which have nothing to do with mathematical explanations, i.e. how to make animations for maths-related videos. Yours is the first video in which I've seen such a thing. Please consider my request and kindly make a video tutorial about it (for the Blender part).
Hey Amit, I am definitely planning to make a video about my workflow, and in particular, how I make the animations. So stay tuned for that! :)
@VisuallyExplained Will definitely wait for it. Thanks for considering it, I appreciate it a lot.
Huge thanks, it would be fantastic (fellow teacher here:-)
This channel is a literal gift from god.
Bravo, my friend!
I have only seen one video and it is helping me a lot! Keep going!
Great video, but at around 1:30 my heart dropped; it felt like a scary movie since it was all dark, lol.
Thanks for this, the visualization helps a lot!
Amazing explanation and visualization
Excellent vid. Do you have a video about PPO in RL?
Excellent video!
Great video! I was wondering how this works if there are multiple minima when the data has high dimensionality?
It works just fine with multiple minimum points in a high dimension. As long as you configure your hyperparameters (learning rate, batch size, etc) correctly, you should have no problem converging to a "decent" minimum.
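A toy illustration of that point (not the video's example): plain gradient descent on a small non-convex function, where different starting points land in different, but all reasonable, local minima.

```python
import numpy as np

# Toy non-convex objective with several local minima.
f      = lambda x: np.sin(3 * x) + 0.1 * x**2
f_grad = lambda x: 3 * np.cos(3 * x) + 0.2 * x

def gradient_descent(x0, lr=0.05, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * f_grad(x)     # step against the gradient
    return x

# Different starting points converge to different (but "decent") minima.
for x0 in (-2.0, 0.5, 3.0):
    x_star = gradient_descent(x0)
    print(f"start {x0:+.1f} -> x = {x_star:+.3f}, f(x) = {f(x_star):+.3f}")
```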
Helpful for revising the topic :)
Really nice animation, explanation, and content. Thank you very much for sharing! :)
Thanks for the visualization, it really helped.
Proximal GD next please!
Fucking incredible explanation in just 3 minutes... wow!
It is basically using the principle of induction to create a cardinality symmetry.
How do you make these videos?
Manim??
Very smart. But I still need another video presenting differentiation, to help understand the slope and opposite-direction idea in 2D. This one is clear, though. I also like the small-step demonstration.
great video, thanks!
For vanilla GD, are you not supposed to divide by the number of samples in the data before performing the update? Or do you just take the sum of this 'accumulated gradient'?
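For what it's worth, both conventions show up in practice; averaging just rescales the step by 1/N, so summing with learning rate η is equivalent to averaging with learning rate N·η. A minimal sketch of the averaged full-batch update (hypothetical names, assuming a per-sample gradient function `grad`):

```python
import numpy as np

def full_batch_gd_step(w, X, y, grad, lr=0.1):
    # Average the per-sample gradients, then take one step.
    # Summing instead of averaging is equivalent to using lr * N.
    per_sample = np.stack([grad(w, X[i], y[i]) for i in range(len(X))])
    return w - lr * per_sample.mean(axis=0)
```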
Beautiful 👍
Can you please post a link or the titles of materials (books) on this topic that one can go through? I really need to learn this topic. Thank you.
Great idea! Boyd's book is a good starting point (page 463 of web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf). I will try to add more references to the video description in the future.
Thank you
Great video, would you consider some topics in numerical analysis, like Gaussian quadrature???
Which software did you use to make those animations?
I used Blender3D (with Python) for all 3D scenes. The rest is a combination of After Effects and the Python library manim.
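Not the author's actual code, but to give a flavor of the manim part, here is a minimal sketch (assuming a recent manim Community Edition where `Axes.plot` is available) that animates gradient descent steps on f(x) = x²:

```python
from manim import Scene, Axes, Dot, Create, BLUE, YELLOW

class GradientDescent1D(Scene):
    def construct(self):
        axes = Axes(x_range=[-3, 3], y_range=[0, 9])
        curve = axes.plot(lambda x: x**2, color=BLUE)
        self.play(Create(axes), Create(curve))

        x, lr = 2.5, 0.2
        dot = Dot(axes.c2p(x, x**2), color=YELLOW)
        self.add(dot)
        for _ in range(10):
            x -= lr * 2 * x  # gradient step on f(x) = x^2, f'(x) = 2x
            self.play(dot.animate.move_to(axes.c2p(x, x**2)), run_time=0.4)
```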
What is the eta (η)?
I was about to ask the same thing. He suddenly introduced it into the cost function as a parameter, then never talked about what it meant.
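Presumably the symbol in question is η (eta), the learning rate, i.e. the step size in the standard gradient descent update:

```latex
% Gradient descent update: eta controls how far each step moves
% in the direction of steepest descent (opposite the gradient).
x_{k+1} = x_k - \eta \, \nabla f(x_k)
```

A small η means many small, cautious steps; a large η moves faster but can overshoot the minimum.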
Interesting topic and comparison. Since you are using information from past iterations, it would be very illustrative to include a quasi-Newton method in your comparison, for example BFGS.
Thanks, and great suggestion!!!
Great Animation buddy.. Cool..
Thank you! Cheers!
Huge thank you!
My fav video about this
How does this actually apply in reverse, though? How do you apply this?
This is amazing
Is this the same as Newton's method, or the Newton-Raphson method?
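They are related but not the same: gradient descent uses only the first derivative (the gradient), while Newton's method also uses the second derivative (the Hessian) to rescale the step:

```latex
% Gradient descent vs. Newton's method for minimizing f:
x_{k+1} = x_k - \eta \, \nabla f(x_k)                              % gradient descent
x_{k+1} = x_k - \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)    % Newton step
```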
Wow, 😊❤️ love it
Thank you! Cheers!!!
awesome
Amazing !!!
What is gradient descent trying to find?
It's trying to minimize the cost/error of a learning model.
Adam usually works quite well.
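For reference, Adam keeps running averages of the gradient and its square to scale each step; a minimal sketch of one Adam update (the standard formulas, not anything from the video):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # g: current gradient; m, v: running first/second moment estimates;
    # t: 1-based step count used for bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```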
Isn't it awesome in a simplified way! I was just implementing OLS for vanilla linear regression, to train a model with some weights and a bias, and this video popped up while I was doing some stuff with the matrix and dot product. I love mathematics!!! One thing: when we have the OLS algorithm directly, why do we need to implement OLS with gradient descent again? And then what's the use of having the OLS algorithm separately? Is it because of the volume of the data points?
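One way to see the trade-off (a toy sketch, not from the video): the closed-form OLS solution needs one linear solve over all the data at once, while gradient descent only needs repeated cheap passes, which scales better as the data volume grows and also works for models with no closed-form solution.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

# Closed-form OLS (normal equations): one solve, but needs all of X at once.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the same least-squares loss: cheap repeated updates.
w_gd, lr = np.zeros(3), 0.1
for _ in range(1000):
    w_gd -= lr * X.T @ (X @ w_gd - y) / len(y)

print(w_ols)  # both should be close to w_true
print(w_gd)
```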
magnificent 🔥😧
simplex method please
great video
Great comment, thanks!
I don't get it, lol.
Good!
dope
veeeeery nice
As you might know I studied this topic in London…
I obviously aced it.😂
Wow
I love you!
And how the fuck do I get the η??
房山
Damn calculus.
I understand nothing.