134 - What are Optimizers in deep learning? (Keras & TensorFlow)

  • Published: 17 Nov 2024

Comments • 39

  • @andreeamada3316
    @andreeamada3316 3 years ago +7

    Thank you for your video! Love the analogies with the blindfolded hiker and the ball, really makes sense to me now!

  • @Sophiazahra-c7t
    @Sophiazahra-c7t 1 year ago +1

    You are the BEST teacher. Thank you!!! All the best to you, sir Sreeni.

  • @rshaikh05
    @rshaikh05 11 months ago +1

    Thank you for explaining the concepts so clearly.

  • @blaisofotso3439
    @blaisofotso3439 1 year ago +1

    Hello there, I really like the way you explained the concept.

  • @zeshn100
    @zeshn100 2 years ago

    Very good video. Learned how optimisers work in just 8 minutes.

  • @ulrikaakre8816
    @ulrikaakre8816 2 years ago +1

    Your videos are great. Thanks a lot!

  • @fun_stuff_and_games
    @fun_stuff_and_games 1 year ago

    Or when we talk about optimisation, are we talking about finding the best parameters? E.g. similar to how it's done with hyperparameter tuning for RF, DT, etc.?

  • @fvviz409
    @fvviz409 3 years ago +2

    I love your content!

  • @MohammedHAzeez
    @MohammedHAzeez 3 months ago +1

    Could you please attach a link to the research paper that describes the Adam optimizer?

  • @ayaalsalihi6001
    @ayaalsalihi6001 2 years ago

    YOU ARE A LIFE SAVER !!!

    • @DigitalSreeni
      @DigitalSreeni  2 years ago +1

      I am glad you think so :)

    • @ayaalsalihi6001
      @ayaalsalihi6001 2 years ago

      @@DigitalSreeni It would be great if you could show us how to combine different sets of features, like GLRLM with CNN features or LBP, or how to use multiple classifiers on a specific feature set. Thank you for all the good work; my classmates and I come to your channel whenever we're stuck, and we always learn something from you.

  • @nirmalanarisetty8852
    @nirmalanarisetty8852 4 years ago +2

    Hello sir, this is very informative for beginners. If possible, please also make a tutorial on stacked denoising autoencoders for intrusion detection.

  • @samarafroz9852
    @samarafroz9852 4 years ago

    Sir, please make a tutorial on image processing and segmentation with deep learning.

    • @DigitalSreeni
      @DigitalSreeni  4 years ago

      I have a bunch of videos on deep learning; please look for them on my channel.

  • @manuelpopp1687
    @manuelpopp1687 2 years ago

    Hi Sreeni, thanks for the video. Regarding the default values, in the TensorFlow description of Adam, they wrote "The default value of 1e-7 for epsilon might not be a good default in general. For example, when training an Inception network on ImageNet a current good choice is 1.0 or 0.1". Does it make sense to test several values here?
    Also, I wondered whether it makes sense at all to pass a learning rate schedule to Adam?

    • @DigitalSreeni
      @DigitalSreeni  2 years ago +1

      I am not sure why 1e-7 would not be a good default for epsilon. This hyperparameter is just there to prevent division by zero. A value of 1.0 is very large and would only be used in special cases. There are a lot of hyperparameters you can worry about, but for a typical application epsilon is not one of them. If you are engineering your own networks, like coming up with an Inception-like architecture, you can tune this parameter yourself. Still, I am not sure whether they mentioned why 1.0 was a better value than 1e-7 and, if so, by how much it improved their results.
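
      For the second part of the question, passing a learning rate schedule to Adam is supported in Keras; a minimal sketch, assuming TensorFlow 2.x (the decay numbers below are arbitrary illustration values, not recommendations):

      import tensorflow as tf

      # Exponential decay: multiply the learning rate by 0.96 every 10,000 steps
      schedule = tf.keras.optimizers.schedules.ExponentialDecay(
          initial_learning_rate=1e-3,
          decay_steps=10_000,
          decay_rate=0.96)

      # A schedule object can be passed wherever a fixed learning rate is accepted;
      # epsilon can be overridden at the same time if you want to experiment with it.
      optimizer = tf.keras.optimizers.Adam(learning_rate=schedule, epsilon=1e-7)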

  • @fun_stuff_and_games
    @fun_stuff_and_games 1 year ago

    Hi Sreeni, I am a beginner with Python, just learning the ropes. I thought every ML model tries to reduce the error anyway (e.g. linear regression by fitting the line and reducing the residuals...). So what do we need optimizers for then? I don't get it. Can anyone explain?

    • @DigitalSreeni
      @DigitalSreeni  1 year ago

      Optimizers are what minimize the loss function (the error). For example, for linear regression your goal is to minimize the mean squared error. What does the job of minimizing that error? How does the system know whether the error is increasing or decreasing when the parameters are changed? You can use the gradient descent optimizer for this task. The optimizer calculates the gradient of the loss function, updates the parameters by taking a step in the opposite direction of the gradient, and repeats the process until convergence or until a maximum number of iterations is reached.
      Basically, optimizers use different algorithms to update the model's parameters (e.g., the weights of the neural network).
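
      To make that concrete, a minimal NumPy sketch of plain gradient descent minimizing the mean squared error of a one-variable linear fit (the toy data, learning rate, and step count are made up for illustration):

      import numpy as np

      # Toy data that roughly follows y = 2x + 1
      x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
      y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

      w, b = 0.0, 0.0   # parameters to learn
      lr = 0.05         # learning rate (step size)

      for step in range(2000):
          error = (w * x + b) - y
          grad_w = 2 * np.mean(error * x)   # d(MSE)/dw
          grad_b = 2 * np.mean(error)       # d(MSE)/db
          w -= lr * grad_w                  # step opposite to the gradient
          b -= lr * grad_b

      print(w, b)  # ends up close to 2 and 1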

    • @fun_stuff_and_games
      @fun_stuff_and_games 1 year ago

      @@DigitalSreeni Thanks Sreeni. I understand that optimizers are there to reduce the error whenever the parameters are changed, and in this case it's done with the gradient descent optimizer. It's just quite theoretical; I always need to see the context and numbers behind it. Anyway, I may just have another look at the video. Thanks!

  • @Glovali
    @Glovali 4 years ago

    Great explanation

  • @wiczus6102
    @wiczus6102 3 years ago

    2:25 Doesn't TF transform the equations used for the forward pass into their respective derivatives? That's mathematically different from probing two points.
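
    For context, this is the automatic differentiation the comment refers to; a minimal sketch, assuming TensorFlow 2.x, where tf.GradientTape returns the exact derivative rather than a two-point finite-difference estimate:

    import tensorflow as tf

    x = tf.Variable(3.0)

    with tf.GradientTape() as tape:
        y = x ** 2 + 2.0 * x      # y = x^2 + 2x

    # Exact derivative dy/dx = 2x + 2 = 8 at x = 3, not a numerical approximation
    dy_dx = tape.gradient(y, x)
    print(dy_dx.numpy())          # 8.0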

  • @rajkumarbatchu4763
    @rajkumarbatchu4763 3 years ago

    Nice explanation.

  • @himalayakilaru
    @himalayakilaru 2 years ago

    Excellent explanation. Thank you so much. One question, however. You are saying that when I use the Adam optimizer I don't have to explicitly define the learning rate, right? But what happens when I do: optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)? Now what does that mean? My understanding is that the Adam optimizer starts with a learning rate of 5e-5 and takes it from there? Is that so? TIA.

    • @jeevan88888
      @jeevan88888 7 days ago +1

      Great question

    • @jeevan88888
      @jeevan88888 7 days ago +1

      The Adam optimizer will still perform its adaptive moment estimation, adjusting the learning rates for each parameter based on the first and second moments of the gradients. However, it will use your specified learning rate (5e-5 in this case) as the starting point for these adaptations.
      This approach lets you control the initial scale of the updates while still benefiting from Adam's adaptive properties. It's particularly useful when you have domain knowledge or empirical evidence suggesting that a specific learning rate might work well for your problem.
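
      A minimal sketch of the two cases being discussed (assuming TensorFlow/Keras; the tiny model is a placeholder for illustration). In Keras, Adam's default learning rate is 0.001, and passing learning_rate only changes that base step size, not the per-parameter adaptation:

      import tensorflow as tf

      # Default: Adam starts from learning_rate=0.001 and adapts per parameter
      default_adam = tf.keras.optimizers.Adam()

      # Explicit: Adam starts from 5e-5 instead, but still adapts per parameter
      small_adam = tf.keras.optimizers.Adam(learning_rate=5e-5)

      model = tf.keras.Sequential([
          tf.keras.layers.Input(shape=(10,)),
          tf.keras.layers.Dense(1),
      ])
      model.compile(optimizer=small_adam, loss="mse")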

    • @himalayakilaru
      @himalayakilaru 6 days ago

      @@jeevan88888 makes sense. Thank you!

  • @minhaoling3056
    @minhaoling3056 2 years ago

    Hi sir, could you upload the slides for all the videos you posted?

  • @asraajalilsaeed7435
    @asraajalilsaeed7435 1 year ago

    Is hinge a loss function or an optimizer?

  • @vrushalipawar2727
    @vrushalipawar2727 3 years ago

    Thank You so Much

  • @seroshmannan
    @seroshmannan 3 years ago

    Phenomenal

  • @thebiggerpicture__
    @thebiggerpicture__ 3 years ago

    Sorry, I don't understand the role of the optimizer. We know the whole objective function is differentiable. I thought we just move in the opposite direction of its derivative. Why did you say that the optimizer keeps testing directions? Thanks!

    • @DigitalSreeni
      @DigitalSreeni  3 years ago +1

      The role of the optimizer is to adjust the weights and biases such that the loss gets minimized. Maybe this video helps fill some gaps in your understanding: ruclips.net/video/KR3l_EfINdw/видео.html
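
      For a concrete picture, a minimal sketch of a single optimization step in TensorFlow (the tiny model and random batch are placeholders for illustration): the tape computes the gradient of the loss, and the optimizer uses it to adjust the weights and biases so the loss decreases.

      import tensorflow as tf

      model = tf.keras.Sequential([
          tf.keras.layers.Input(shape=(4,)),
          tf.keras.layers.Dense(1),
      ])
      optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
      loss_fn = tf.keras.losses.MeanSquaredError()

      x = tf.random.normal((8, 4))   # toy input batch
      y = tf.random.normal((8, 1))   # toy targets

      with tf.GradientTape() as tape:
          loss = loss_fn(y, model(x))

      # Move the weights and biases a small step opposite to the gradient of the loss
      grads = tape.gradient(loss, model.trainable_variables)
      optimizer.apply_gradients(zip(grads, model.trainable_variables))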

    • @thebiggerpicture__
      @thebiggerpicture__ 3 years ago +1

      @@DigitalSreeni I'll take a look. Thanks for answering