Thank you for your video! Love the analogies with the blind folded hiker and the ball, really makes sense to me now!
You are the BEST teacher. Thank you!!! All the best for you sir Sreeni.
Thank you for explaining the concepts so clearly.
Hello there, I really like the way you explained the concept.
Very good video. Learned how optimisers function in just 8 minutes.
Your videos are great. Thanks a lot!
Or when we talk about optimisation, are we talking about finding the best parameters? E.g. similar to how it's done with hyperparameter tuning for RF, DT, etc.?
I love your content!
Thanks
Is it possible, please, to attach a link to the research paper that introduced the Adam optimizer?
YOU ARE A LIFE SAVER !!!
I am glad you think so :)
@@DigitalSreeni it would be great if you could show us how to combine different sets of features, like GLRLM with CNN features or LBP, or how to use multiple classifiers on a specific feature set. Thank you for all the good work; my classmates and I come to your channel whenever we're stuck, and we always learn something from you.
Hello sir, this is very informative for beginners. If possible, please also make a tutorial on stacked denoising autoencoders for intrusion detection.
Noted
Sir, please make a tutorial on image processing and segmentation with deep learning.
I have a bunch of videos on deep learning; please look for them on my channel.
Hi Sreeni, thanks for the video. Regarding the default values, in the TensorFlow description of Adam, they wrote "The default value of 1e-7 for epsilon might not be a good default in general. For example, when training an Inception network on ImageNet a current good choice is 1.0 or 0.1". Does it make sense to test several values here?
Also, I wondered whether it makes sense at all to pass a learning rate schedule to Adam?
I am not sure why 1e-7 would not be a good default for epsilon. This hyperparameter is just there to prevent division by zero. A value of 1.0 is very large and would only be used in special cases. There are a lot of hyperparameters you can worry about, but for a typical application epsilon is not one of them. If you are engineering your own networks, like coming up with an Inception-like network, you can tune your own parameters. Still, I am not sure whether they explained why 1.0 was a better value than 1e-7, and if so, by how much it improved their results.
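To make it concrete, here is a small numpy sketch of just the Adam step formula (my own illustration, not TensorFlow's actual code): epsilon only matters when the running second moment v_hat is close to zero.

```python
import numpy as np

def adam_step(m_hat, v_hat, lr=0.001, eps=1e-7):
    """Size of one Adam parameter update: lr * m_hat / (sqrt(v_hat) + eps)."""
    return lr * m_hat / (np.sqrt(v_hat) + eps)

# With a normal gradient history, eps is negligible:
print(adam_step(0.5, 0.04))           # ~ lr * 0.5 / 0.2
# When v_hat is ~0, eps is all that prevents division by zero:
print(adam_step(0.5, 0.0))            # huge step, limited only by eps
print(adam_step(0.5, 0.0, eps=1.0))   # a large eps clamps the step to ~ lr * m_hat
```

So a large epsilon like 1.0 effectively caps the step size whenever the second-moment estimate is tiny, which may be why it helped in their special case.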
Hi Sreeni, I am a beginner with Python, just learning the ropes. I thought every ML model tries to reduce the error anyway (e.g. linear regression by fitting the line and reducing the residuals). So what do we need optimizers for then? I don't get it. Can anyone explain?
Optimizers are what minimize the loss function (the error). For example, in linear regression your goal is to minimize the mean squared error. But what does the job of minimizing this error? How does the system know whether the error is increasing or decreasing when the parameters change? You can use the gradient descent optimizer for this task. The optimizer calculates the gradient of the loss function, updates the parameters by taking a step in the opposite direction of the gradient, and repeats the process until convergence or a maximum number of iterations is reached.
Basically, optimizers use different algorithms to update the model's parameters (e.g., the weights of a neural network).
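As a sketch (my own toy example, not code from the video), here is plain gradient descent fitting a line y = w*x + b by minimizing the mean squared error:

```python
import numpy as np

# Toy data from the line y = 2x + 1
x = np.linspace(0, 1, 50)
y = 2 * x + 1

w, b, lr = 0.0, 0.0, 0.5   # initial parameters and learning rate
for _ in range(2000):
    y_pred = w * x + b
    # Gradients of MSE = mean((y_pred - y)^2) with respect to w and b
    grad_w = 2 * np.mean((y_pred - y) * x)
    grad_b = 2 * np.mean(y_pred - y)
    # Step in the opposite direction of the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # converges toward w = 2, b = 1
```

The loop is the optimizer: it is the piece that actually moves the parameters downhill on the loss surface. Fancier optimizers (momentum, Adam, ...) change how the step is computed, not the overall idea.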
@@DigitalSreeni thanks Sreeni. I understand that optimizers are there to reduce the error whenever the parameters are changed, and that in this case it is done with the gradient descent optimizer. It's just quite theoretical; I always need to see the context and numbers behind it. Anyway, I may just have another look at the video. Thanks!
Great explanation
Glad you liked it
2:25 Doesn't TF transform the equations used for the input into their respective derivatives? That's mathematically different from probing two points.
It's a simplification, bro.
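Right: frameworks like TensorFlow compute exact gradients via automatic differentiation, and the two-point "probing" in the video is just an analogy. A quick numpy comparison (my own toy example) for f(x) = x²:

```python
def f(x):
    return x ** 2

def numerical_grad(f, x, h=1e-5):
    """Finite-difference estimate: probe two nearby points."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 3.0
analytic = 2 * x                 # exact derivative of x^2, as autodiff would give
probed = numerical_grad(f, x)    # approximation from two probes
print(analytic, probed)          # both near 6.0; the probe has a tiny error
```

The probing picture is still a useful mental model of what the gradient tells the optimizer, even though the computation is done symbolically/automatically rather than by sampling points.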
Nice explanation.
Excellent explanation, thank you so much. One question, however: so you are saying that when I use the Adam optimizer I don't have to explicitly define the learning rate, right? But what happens when I do optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)? What does that mean? My understanding is that the Adam optimizer starts with a learning rate of 5e-5 and takes it from there. Is that so? TIA.
Great question
The Adam optimizer will still perform its adaptive moment estimation, adjusting the learning rate for each parameter based on the first and second moments of the gradients. However, it will use your specified learning rate (5e-5 in this case) as the base step size for these adaptations.
This approach lets you control the initial scale of the updates while still benefiting from Adam's adaptive properties. It's particularly useful when you have domain knowledge or empirical evidence suggesting that a specific learning rate works well for your problem.
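To illustrate (a simplified single-parameter numpy sketch of the Adam update, not TensorFlow's implementation), the learning_rate you pass in scales every step, while the moment estimates adapt the direction and relative size:

```python
import numpy as np

def adam_minimize(grad_fn, w0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-7, steps=5000):
    """Minimal Adam loop for a single scalar parameter."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g ** 2   # second moment (uncentered variance)
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # lr scales every update
    return w

grad = lambda w: 2 * (w - 3.0)   # gradient of (w - 3)^2, minimum at w = 3
print(adam_minimize(grad, w0=0.0, lr=0.01))    # gets close to 3.0
print(adam_minimize(grad, w0=0.0, lr=5e-5))    # same direction, but much slower
```

So a tiny base learning rate like 5e-5 does not disable the adaptation; it just shrinks every adapted step, which is why such values are common for fine-tuning.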
@@jeevan88888 makes sense. Thank you!
Hi sir, could you upload the slides for all the videos you have posted?
Is hinge a loss function or an optimiser?
Thank You so Much
Phenomenal
Thanks
Sorry, I don't understand the role of the optimizer. We know the whole objective function is differentiable; I thought we just move in the opposite direction of the derivative. Why did you say that the optimizer keeps testing directions? Thanks!
The role of the optimizer is to adjust the weights and biases such that the loss gets minimized. Maybe this video helps fill some gaps in your understanding? ruclips.net/video/KR3l_EfINdw/видео.html
@@DigitalSreeni I'll take a look. Thanks for answering