Why do we need Cross Entropy Loss? (Visualized)

  • Published: 1 Oct 2024
  • In this video, I've explained why binary cross-entropy loss is needed even though we have the mean squared error loss. I've included visualizations for better understanding.
    #machinelearning #datascience
    For more videos please subscribe -
    bit.ly/normaliz...
    Support me if you can ❤️
    www.paypal.com...
    www.buymeacoff...
    Derivation of MSE loss and BCE loss -
    • Maximum Likelihood Est...
    Animation tool by 3blue1brown -
    / @3blue1brown
    Facebook -
    / nerdywits
    Instagram -
    / normalizednerd
    Twitter -
    / normalized_nerd

Comments • 132

  • @manfredmichael_3ia097
    @manfredmichael_3ia097 3 years ago +42

    Best explanation I could find! This channel is gonna be big.

  • @mazharmumbaiwala9244
    @mazharmumbaiwala9244 2 years ago +7

    It was really great pointing out that it's the gradient that matters more than the actual loss value.
    Great video, keep it up

  • @sunitgautam7547
    @sunitgautam7547 3 years ago +12

    The video production quality along with the explanation is really nice. Keep making such great content, your channel is bound to gain traction very rapidly. :)

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      Thanks man...Another interesting video is on the way!

  • @kvnptl4400
    @kvnptl4400 11 months ago +2

    This is the one video you need for the "Cross Entropy Loss" keyword. Straight to the point.

  • @lelouchlamperouge5910
    @lelouchlamperouge5910 14 days ago

    Best video to understand cross entropy out there, I was struggling a bit until I found this one.

  • @anubhavsharma3642
    @anubhavsharma3642 3 years ago

    3blue1brown... but it's literally a brown guy in this case. Loved the videos man...

  • @veganath
    @veganath 1 year ago

    Thank you, your explanation was perfect....

  • @whenmathsmeetcoding1836
    @whenmathsmeetcoding1836 1 year ago

    Loved it thanks for making this video

  • @ИринаДьячкова-и5ф
    @ИринаДьячкова-и5ф 4 years ago +3

    Thanks, this was really helpful! Though I had to put it on 0.75 :D

  • @matteoocchini3119
    @matteoocchini3119 3 years ago +1

    good Video!

  • @balltomessi8515
    @balltomessi8515 1 day ago

    Great explanation 🫡

  • @TheQuantumPotato
    @TheQuantumPotato 4 years ago +4

    Your explanations are great! Thanks for the vids!

    • @NormalizedNerd
      @NormalizedNerd  4 years ago +1

      Thanks for watching!

    • @TheQuantumPotato
      @TheQuantumPotato 3 years ago

      @@NormalizedNerd I just wanted to come back and let you know - I got a distinction in my MSc (I did my thesis on GANs for tabular data) and your vids were a huge factor in helping me achieve this! So thank you!

    • @NormalizedNerd
      @NormalizedNerd  3 years ago +1

      @@TheQuantumPotato Your comment just made my day! Best wishes for your future endeavors 😊

    • @TheQuantumPotato
      @TheQuantumPotato 3 years ago

      @@NormalizedNerd I am currently working on a medical computer vision project - so it’s all going well! Thanks again, I look forward to watching more of your vids

  • @yiqian2977
    @yiqian2977 3 years ago +2

    Thanks for this helpful video! This delivers a clear visual explanation that my professor didn't do

  • @juliokaro
    @juliokaro 7 months ago

    Great video. Thanks!

  • @jocemarnicolodijunior2851
    @jocemarnicolodijunior2851 29 days ago

    Amazing explanation

  • @halihammer
    @halihammer 2 years ago

    OK great, but what if the number inside the log is zero? For example, when the ground truth is 1 but my model predicts zero? I'm having trouble understanding this. I'm trying to build an XOR multilayer perceptron, but 1, 0 are not good inputs: if an input is zero, the weight update for the corresponding weight is impossible. I tried -1 and 1 as inputs and labels, but then the loss function doesn't work. I'm using the sigmoid activation function and have two hidden neurons and one output neuron, but it does not work. Maaaaan, I'm going crazy with this ML stuff.
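
    A minimal sketch of the usual workaround for the log(0) issue raised above: clip the predicted probability away from exactly 0 and 1 before taking the log. This is not from the video; the function name and the epsilon value are illustrative choices.

        import numpy as np

        def bce_loss(p, p_hat, eps=1e-7):
            # Keep p_hat strictly inside (0, 1) so log() never receives 0.
            p_hat = np.clip(p_hat, eps, 1.0 - eps)
            return -(p * np.log(p_hat) + (1 - p) * np.log(1 - p_hat))

        # Ground truth 1, prediction 0: a large but finite loss instead of infinity.
        print(bce_loss(1.0, 0.0))   # ~16.1 rather than inf

    Most frameworks do something equivalent internally (or compute the loss directly from logits), which is why training usually does not blow up in practice.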

  • @mathewhobson
    @mathewhobson 1 month ago

    this made sense! thank you

  • @anirudhbhattacharya1749
    @anirudhbhattacharya1749 3 years ago +1

    Is there any difference between BCE & weighted cross entropy loss function?

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      Yes. In the 2nd one there's an extra weight term. The value of weight is different for each class.
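
      For reference, one common way to write the weighted variant described here (the symbols $w_0$ and $w_1$ are illustrative class weights, not notation from the video):

          $L = -\big[\, w_1 \, p \log \hat{p} + w_0 \, (1 - p) \log (1 - \hat{p}) \,\big]$

      Plain BCE is recovered when $w_0 = w_1 = 1$; giving the rarer class the larger weight is a standard way to handle class imbalance.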

  • @SP-db6sh
    @SP-db6sh 3 years ago

    Speechless! Paid courses fail to deliver these concepts; only an experienced data scientist can.

  • @lancelotdsouza4705
    @lancelotdsouza4705 2 years ago

    very good explanation

  • @pedramm.haqiqi1022
    @pedramm.haqiqi1022 2 years ago

    Amazing explanation

  • @TejasPatil-fz6bo
    @TejasPatil-fz6bo 3 years ago

    Nicely explained....I was struggling to decode it

  • @vikashbhagat6867
    @vikashbhagat6867 3 years ago

    Very beautiful video, I liked it a lot. Keep it up :)

  • @taewookkang2034
    @taewookkang2034 3 years ago +2

    This is such an amazing video! Thanks!

  • @sujitha3335
    @sujitha3335 3 years ago

    wow, great explanation

  • @EduardoAdameSalles
    @EduardoAdameSalles 4 years ago +1

    Congrats from Brazil!

  • @yashdeshmukh4404
    @yashdeshmukh4404 2 years ago

    How do you make such animations? What softwares do you use?

    • @lusvd
      @lusvd 2 years ago

      It's called manim

  • @somritasarkar6608
    @somritasarkar6608 3 years ago

    great explanation

  • @nimishamanjaly1048
    @nimishamanjaly1048 3 years ago +1

    great explanation!

  • @wodkawhatelse8781
    @wodkawhatelse8781 3 years ago

    Very good explanation, you got a new subscriber

  • @Shaswatapal
    @Shaswatapal 2 months ago

    Excellent

  • @TheRohit901
    @TheRohit901 1 year ago

    awesome video, thanks for explaining it so well! keep it up.

  • @patrickjane276
    @patrickjane276 2 years ago

    Subscribed just b/c of the YouTube channel name!

  • @varunahlawat9013
    @varunahlawat9013 1 year ago

    There is a point that I felt was missing: I've read on websites that the cross-entropy loss helps reach the global minimum more quickly.

  • @firstkaransingh
    @firstkaransingh 2 years ago

    Excellent 👍

  • @vinuvs4996
    @vinuvs4996 2 years ago

    very nice

  • @mojtabazare3606
    @mojtabazare3606 2 years ago +1

    I love the math, calculus and all the visualizations that come in this video. Great job

  • @eclypze_
    @eclypze_ 9 months ago

    Amazing explanation! just what I was looking for :)
    great job man!

  • @tobiasweingarten2737
    @tobiasweingarten2737 3 years ago +1

    What a great explanation, thank you!
    One question: don't we want the derivative to be zero when the model performs as well as it possibly can, i.e. when p̂ always equals p? Using binary cross-entropy, the loss function has derivatives of +1 and -1 at its intersections with the x-axis...

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      Ideally yes, but for functions with log terms, it's not possible to achieve a derivative of 0 right?
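
      As a quick check of the numbers mentioned above, take the per-example BCE with a hard label $p \in \{0, 1\}$. For $p = 1$ the loss is $L(\hat{p}) = -\log \hat{p}$, so $L'(\hat{p}) = -1/\hat{p}$, which equals $-1$ at $\hat{p} = 1$; for $p = 0$, $L(\hat{p}) = -\log(1 - \hat{p})$ and $L'(\hat{p}) = 1/(1 - \hat{p})$, which equals $+1$ at $\hat{p} = 0$. So the slope at the optimum is $\pm 1$ rather than 0, exactly as observed in the question.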

  • @Eta_Carinae__
    @Eta_Carinae__ 1 year ago

    Another way of looking at it is that L2 is already way too punishing where outliers are concerned; hence we use L1, so cross-entropy is likely to exacerbate the issues already found in L2.

  • @govindnarasimman6819
    @govindnarasimman6819 2 years ago

    You can use CE in regression too, if you quantize the target into, let's say, n classes. Otherwise, if you are interested in MAPE, you can use the MAPE loss or (1+log) compression.
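
    A minimal sketch of the quantization idea above, assuming a one-dimensional continuous target and equal-width bins (the variable names and values are illustrative):

        import numpy as np

        y = np.array([0.12, 3.4, 7.8, 2.2])                   # continuous regression targets
        n_classes = 4
        edges = np.linspace(y.min(), y.max(), n_classes + 1)   # equal-width bin edges
        y_class = np.digitize(y, edges[1:-1])                  # class index 0..n_classes-1 per sample
        # y_class can now be fed to an ordinary softmax + cross-entropy classifier.

    How fine to make the bins is a trade-off: more classes preserve more of the target's resolution but leave fewer examples per class.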

  • @mautkajuari
    @mautkajuari 1 year ago

    Amazing video, totally understood the concept

  • @willliu1306
    @willliu1306 3 years ago

    Thanks for sharing your insights ~

  • @sorvex9
    @sorvex9 3 years ago +7

    I am a big fan of how you remove all the unnecessary steps in the formulas in order to explain it as simply as possible. Very nice!

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      Thanks! Yeah, this way you get to know the essence.

    • @_sonu_
      @_sonu_ 1 year ago +1

      No, those steps are too important

  • @shaktijain8560
    @shaktijain8560 2 years ago

    Undoubtedly the best explanation of cross-entropy loss I have found on YouTube.

  • @angelinagokhale9309
    @angelinagokhale9309 3 years ago +1

    Very nicely explained! Your video helped me a lot in my classroom discussion today. Thank you very much.

    • @NormalizedNerd
      @NormalizedNerd  3 years ago +1

      Really glad to hear that!

    • @angelinagokhale9309
      @angelinagokhale9309 3 years ago

      @@NormalizedNerd And my students enjoyed that explanation. I'll surely share your channel link with them.

    • @NormalizedNerd
      @NormalizedNerd  3 years ago +2

      @@angelinagokhale9309 Omg! I thought you were attending the class as a student! Really happy to see other educators appreciating the videos :)

    • @angelinagokhale9309
      @angelinagokhale9309 3 years ago

      @@NormalizedNerd And well, as a teacher (or better still a facilitator) of the subject, I am a student first. There is just so much to keep learning! And I enjoy it :)

  • @sheldonsebastian7232
    @sheldonsebastian7232 3 years ago

    Noice Explanation

  • @studyaccount9662
    @studyaccount9662 3 years ago

    Brilliant insight thank you so so much!!!

  • @jefferybenzos5879
    @jefferybenzos5879 3 years ago

    I thought this was a great video!
    Can you explain how this generalizes to multi-class classification problems or link me to a video where I can learn more?
    Thank you :)
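
    For anyone else wondering: the standard multi-class generalization (textbook material, not specific to this channel) is categorical cross-entropy over a softmax output,

        $L = -\sum_{c=1}^{C} y_c \log \hat{y}_c$

    where $y$ is the one-hot ground-truth vector and $\hat{y}$ the vector of predicted class probabilities; with $C = 2$ this reduces to the binary form discussed in the video.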

  • @Ajeet-Yadav-IIITD
    @Ajeet-Yadav-IIITD 2 years ago

    Amazing explanation!! Thanks!

  • @nidcxl4223
    @nidcxl4223 3 years ago

    great video man

  • @severlight
    @severlight 3 years ago

    Right on point! Thanks

  • @HeduAI
    @HeduAI 3 years ago

    Such an awesome explanation! Thanks!

  • @sENkyyy
    @sENkyyy 3 years ago

    very well done!

  • @rizwanhamidrandhawa8090
    @rizwanhamidrandhawa8090 3 years ago +1

    Awesome video!

  • @ionut.666
    @ionut.666 2 years ago

    Nice video! It is possible to train a neural network for a classification task using MSE. For binary classification, we can use two output neurons, one for each class, and train the network with the MSE loss. When you want to compute the classification accuracy, you can do it just as in the usual classification setup: the predicted class is the index of the largest output in the final layer. Any idea why this works?

  • @yawee12
    @yawee12 3 years ago

    Is the curvature of the gradient the only reason we prefer CEL over MSE? Does this mean that MSE would still work but just converge more slowly, needing more data to train on?

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      Yes, the slope is an important point. There's another thing... CEL arises naturally if you solve the classification problem using the maximum likelihood method. More about that here: ruclips.net/video/2PfGO753UHk/видео.html
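
      Sketching the maximum-likelihood argument referred to here (presumably the one in the linked video; stated from standard texts): model the label as Bernoulli with success probability $\hat{p}$, so a single example with label $p \in \{0, 1\}$ has likelihood $\hat{p}^{\,p}(1 - \hat{p})^{\,1 - p}$. Maximizing the log-likelihood $p \log \hat{p} + (1 - p)\log(1 - \hat{p})$ over the dataset is the same as minimizing its negative, which is exactly the binary cross-entropy loss.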

  • @christiaanpretorius05
    @christiaanpretorius05 3 years ago

    Nice video and visualisation

  • @1paper1pen63
    @1paper1pen63 3 years ago

    How do we come up with this formula for binary cross-entropy loss? Is it linked to any proof? It would be a great help.

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      I did a video about it -> ruclips.net/video/2PfGO753UHk/видео.html

  • @mathandrobotics
    @mathandrobotics 4 years ago

    Wow, very helpful.
    Thank you for your graphical presentation.

  • @ant1fact
    @ant1fact 3 years ago

    Even as a mathematically handicapped person I can understand it now fully. Bravo!

  • @ahmed0thman
    @ahmed0thman 3 years ago

    Very neat and clear, the best I've ever found.
    Thanks a lot ❤.

  • @danielsvendsen8808
    @danielsvendsen8808 3 years ago

    Awesome video, thank you very much! :)

  • @pumplove81
    @pumplove81 3 years ago

    brilliant ..also delightful bong accent :)

  • @b97501063
    @b97501063 4 years ago

    Brilliant

  • @TheFilipo2
    @TheFilipo2 3 years ago

    Superb!

  • @basics7930
    @basics7930 3 years ago

    good

  • @usmannadeem18
    @usmannadeem18 3 years ago

    Awesome explanation! Thank you!

  • @majidlotfi4622
    @majidlotfi4622 3 years ago

    Best explanation I could find!

  • @yorranreurich596
    @yorranreurich596 3 years ago

    Great explanation, Thank you :)

  • @jamesang7861
    @jamesang7861 3 years ago

    Thank you!!!!

  • @coralexbadea
    @coralexbadea 3 years ago

    Really good explanation, good job !

  • @popupexistence9253
    @popupexistence9253 3 years ago

    AMAZING!

  • @irok1
    @irok1 3 years ago

    That's an amazing introduction.

  • @wedenigt
    @wedenigt 3 years ago

    Awesome explanation - keep up the good work 👍

  • @hanchen2355
    @hanchen2355 3 years ago

    Your explanation helps me a lot!

  • @erikpalacios7238
    @erikpalacios7238 3 years ago

    Thanks!!!!!!!

  • @PoJenLai
    @PoJenLai 3 years ago

    Well explained, nice!

  • @ArvindDevaraj1
    @ArvindDevaraj1 4 years ago

    nice

  • @cboniefbr
    @cboniefbr 4 years ago

    Great video!

  • @earthpatel365
    @earthpatel365 3 years ago +1

    3blue1brown copy :/

  • @UCSAdityaKiranPal
    @UCSAdityaKiranPal 1 year ago

    You took this explanation to the next level man! Great analysis