(9/12) MobileNets: MobileNetV2 (Part 2)

  • Published: 30 Jan 2025

Comments • 11

  • @learningwithkan7208 2 years ago

    Can I ask why lost information can be recovered if ReLU is applied to a high-dimensional activation? And what happens when it is applied to a low-dimensional activation?

    • @zardouayassir7359 2 years ago

      Welcome, brother. I'm not sure how I can help, because I explained the authors' justification for exactly this question in that same video.

    • @learningwithkan7208 2 years ago

      @zardouayassir7359 Thank you ^^

  • @reactorscience 2 years ago

    Great explanation. Just one question: if ReLU destroys the important information for negative inputs, why use it at all? Wouldn't it be better to use an activation function that returns a non-zero output for negative inputs? That way we could reduce the number of dimensions and wouldn't need to do all of this.

    • @zardouayassir7359 2 years ago +1

      * As far as I understand from the paper, what destroys information is not ReLU per se but rather non-linear activation functions in general.
      * An activation function is non-linear; the non-linearity skews the layer's activation data and causes information loss. Recall that the lost information can be recovered if ReLU is applied to a high-dimensional activation (a toy illustration of this is sketched after this reply).
      * The ReLU example I explained serves as a good intuition for why non-linearity destroys the data.
      * There is a formal mathematical explanation of this phenomenon in the paper's supplemental material.
      Apologies for the late answer. Good luck!
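
      Here is a tiny NumPy sketch of that recovery argument (my own illustration, loosely modelled on the spiral experiment in the paper's Figure 1, not the authors' exact code; the spiral data, the random matrix T and the least-squares recovery are assumptions made for the sake of the example). 2-D points are embedded into n dimensions with a random matrix, ReLU is applied, and we then check how well a linear map can recover the original points:

      # Toy check: how much of a 2-D signal survives ReLU applied after a random
      # embedding into n dimensions? We measure how well a linear map can recover
      # the original points from the ReLU output.
      import numpy as np

      rng = np.random.default_rng(0)

      # A 2-D "manifold of interest": points on a spiral.
      t = np.linspace(0.5, 4 * np.pi, 1000)
      X = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)

      for n in (2, 3, 5, 15, 30):
          T = rng.standard_normal((2, n))            # random embedding R^2 -> R^n
          H = np.maximum(X @ T, 0.0)                 # ReLU in the n-dimensional space
          W, *_ = np.linalg.lstsq(H, X, rcond=None)  # best linear recovery of X from H
          err = np.linalg.norm(H @ W - X) / np.linalg.norm(X)
          print(f"n = {n:2d}   relative recovery error = {err:.3f}")

      The printed error shrinks markedly as n grows: in a low-dimensional space ReLU irrecoverably zeroes out part of the signal, while in a high-dimensional space the same information is still preserved across the remaining channels.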

    • @reactorscience 2 years ago

      @zardouayassir7359 Understood. Thank you!!!!

  • @Kovu392 2 years ago

    Why don't they use a different activation function, such as ELU or leaky ReLU, to prevent the information loss?

    • @zardouayassir7359 2 years ago +1

      * As far as I understand from the paper, what destroys information is not ReLU per se but rather non-linear activation functions in general, so ELU or leaky ReLU would not remove the problem.
      * An activation function is non-linear; the non-linearity skews the layer's activation data and causes information loss. Recall that the lost information can be recovered if ReLU is applied to a high-dimensional activation.
      * The ReLU example I explained serves as a good intuition for why non-linearity destroys the data.
      * There is a formal mathematical explanation of this phenomenon in the paper's supplemental material.
      Good luck! (What the paper does instead of swapping the activation is sketched after this reply.)
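
      MobileNetV2's answer is not a different activation: it keeps ReLU6 only in the expanded, high-dimensional space and drops the non-linearity after the final 1x1 projection (the linear bottleneck). Below is a minimal PyTorch sketch of such an inverted residual block (simplified; the expansion ratio and layer ordering follow the paper, everything else is an assumption made for illustration):

      import torch
      import torch.nn as nn


      class InvertedResidual(nn.Module):
          def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
              super().__init__()
              hidden = in_ch * expand_ratio
              self.use_residual = stride == 1 and in_ch == out_ch
              self.block = nn.Sequential(
                  # 1x1 expansion: low-dim bottleneck -> high-dim space, with ReLU6
                  nn.Conv2d(in_ch, hidden, 1, bias=False),
                  nn.BatchNorm2d(hidden),
                  nn.ReLU6(inplace=True),
                  # 3x3 depthwise convolution in the high-dim space, with ReLU6
                  nn.Conv2d(hidden, hidden, 3, stride, padding=1, groups=hidden, bias=False),
                  nn.BatchNorm2d(hidden),
                  nn.ReLU6(inplace=True),
                  # 1x1 projection back to a low-dim bottleneck: no activation here,
                  # so the information squeezed into few channels is not destroyed
                  nn.Conv2d(hidden, out_ch, 1, bias=False),
                  nn.BatchNorm2d(out_ch),
              )

          def forward(self, x):
              out = self.block(x)
              return x + out if self.use_residual else out


      # Quick shape check: stride 1 and equal channel counts enable the residual path.
      x = torch.randn(1, 24, 56, 56)
      print(InvertedResidual(24, 24)(x).shape)  # torch.Size([1, 24, 56, 56])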

  • @omidiravani361 3 years ago

    Great explanation, thank you.

  • @khaingsuthway932 3 years ago

    Thank you 😊