Neural Networks Part 5: ArgMax and SoftMax

  • Published: 22 Jul 2024
  • When your Neural Network has more than one output, it is very common to train with SoftMax and, once trained, swap SoftMax out for ArgMax. This video gives you all the details on these two methods so that you'll know when and why to use ArgMax or SoftMax.
    NOTE: This StatQuest assumes that you already understand:
    The main ideas behind Neural Networks: • The Essential Main Ide...
    How Neural Networks work with multiple inputs and outputs: • Neural Networks Pt. 4:...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying my book, The StatQuest Illustrated Guide to Machine Learning:
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    RUclips Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    2:02 ArgMax
    4:21 SoftMax
    6:36 SoftMax properties
    9:31 SoftMax general equation
    10:20 SoftMax derivatives
    #StatQuest #NeuralNetworks #ArgMax #SoftMax
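
    As a rough illustration of the idea described above (this is not code from the video; the raw output values are made-up numbers similar to the iris example discussed in the comments), a minimal NumPy sketch of the two functions:

      import numpy as np

      def softmax(raw):
          # exponentiate each raw output value, then divide by the sum so the
          # results are all between 0 and 1 and add up to 1
          e = np.exp(raw)
          return e / e.sum()

      def argmax_one_hot(raw):
          # set the largest raw output value to 1 and everything else to 0
          out = np.zeros_like(raw)
          out[np.argmax(raw)] = 1.0
          return out

      raw = np.array([1.43, -0.40, 0.23])  # raw outputs for setosa, versicolor, virginica
      print(softmax(raw))                  # -> approximately [0.68, 0.11, 0.21]
      print(argmax_one_hot(raw))           # -> [1. 0. 0.]

    Because the ArgMax output is flat (its derivative is 0), the usual pattern is to train with SoftMax and only swap in ArgMax to classify new observations.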

Comments • 229

  • @statquest
    @statquest  2 года назад +9

    The full Neural Networks playlist, from the basics to deep learning, is here: ruclips.net/video/CqOfi41LfDw/видео.html
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @mrglootie101
    @mrglootie101 3 года назад +29

    Can't wait for "cross entropy clearly explained!" BAM!

  • @AndruXa
    @AndruXa Год назад +10

    universities offering AI/ML programs should just hire a program manager to sort and prioritize Josh Starmer's YT videos and organize exams

  • @cara1362
    @cara1362 3 года назад +15

    The video is so impressive especially when you explain why we can't treat the output of softmax as a simple probability. Best tutorial ever for all the explanations in ML!!!

  • @iReaperYo
    @iReaperYo 2 месяца назад +1

    nice touch at the end. I didn't realise the use for ArgMax until you said it's nice for classifying new observations

  • @bryan6aero
    @bryan6aero 3 года назад +2

    Thank you! This is by far the clearest explanation of SoftMax I've found. I finally get it!

  • @201pulse
    @201pulse 3 года назад +2

    I just want to say that YOU are awesome. Best educational content on the web hands down.

    • @statquest
      @statquest  3 года назад

      Thank you very much! :)

  • @Aman-uk6fw
    @Aman-uk6fw 3 года назад +8

    No words for you man, you are doing a very great job, and I totally fell in love with your music and the way you teach. Love from India ❤️

    • @statquest
      @statquest  3 года назад +1

      Thank you very much! :)

  • @karansaxena96
    @karansaxena96 2 года назад +3

    Your way of explaining things made me subscribe to you. Love to see topics explained in a simple yet funny way. Keep up the great work. And also.... *BAM*

    • @statquest
      @statquest  2 года назад +1

      Thank you very much! BAM! :)

  • @AlbertHerrandoMoraira
    @AlbertHerrandoMoraira 3 года назад +33

    Your videos are awesome! Thank you for doing them and continue with the great work! 👍

    • @statquest
      @statquest  3 года назад +1

      Thank you very much! :)

  • @ishanbuddhika4317
    @ishanbuddhika4317 3 года назад +3

    Hi Josh,
    Your explanations are super awesome!!! You break down barriers to statistics!!! They are also super creative :). Many thanks! Please keep it up. Thanks again. BAM!!!

    • @statquest
      @statquest  3 года назад +1

      Thank you! BAM! :)

  • @factsfigures2740
    @factsfigures2740 3 года назад +5

    Sir the way you teach is exceptionally creative
    thanks to you, my deep learning exam went well

    • @statquest
      @statquest  3 года назад +3

      TRIPLE BAM!!! Congratulations!!

  • @menchenkenner
    @menchenkenner 3 года назад +7

    Hey Josh, needless to say, your videos and tutorials are amazingly fun! Can you please create a video series on Shapley values? Those are widely used in practice.

    • @statquest
      @statquest  3 года назад +2

      Thanks for your support and I'll keep that topic in mind! :)

  • @aswink112
    @aswink112 3 года назад +1

    Thanks Josh for the crystal clear explanation.

    • @statquest
      @statquest  3 года назад

      Glad it was helpful!

  • @NicholasHeeralal
    @NicholasHeeralal 2 года назад +2

    Your videos have been extremely helpful, thank you so much!!

  • @srishylesh2935
    @srishylesh2935 Год назад +1

    Josh. Hands down genius. I'm crying.

  • @lucarauchenberger628
    @lucarauchenberger628 2 года назад +2

    this is all so well explained! just wow!

  • @drccccccccc
    @drccccccccc 2 года назад +1

    You deserve a professor title!!! Fantastic

  • @patriciachang5079
    @patriciachang5079 3 года назад +5

    A thousand thanks for the explanation! Your explanation is much easier to understand compared to my lecturers'! Could you make some videos about cost functions? :)

  • @coralkuta7804
    @coralkuta7804 Год назад +1

    Just bought your book! It's AMAZING!!! Your videos too :)

    • @statquest
      @statquest  Год назад +1

      Thank you so much! :)

    • @coralkuta7804
      @coralkuta7804 Год назад +1

      @@statquest I'm spreading your existence to all of my student friends ✌️

  • @jijie133
    @jijie133 3 года назад +1

    Predicted probabilities, probability calibration. Great video.

  • @haadialiaqat4590
    @haadialiaqat4590 2 года назад +1

    Excellent video. Thank you for explaining so well.

    • @statquest
      @statquest  2 года назад +1

      Glad it was helpful!

  • @user-se8ld5nn7o
    @user-se8ld5nn7o 2 года назад +1

    Hi! First of all, absolutely amazing video!

  • @ilkinhamid1072
    @ilkinhamid1072 3 года назад +1

    Thank You for awesome explanation

    • @statquest
      @statquest  3 года назад

      Glad it was helpful!

  • @faycalzaidi6459
    @faycalzaidi6459 3 года назад +2

    Hello JOSH,
    thank you very much for this beautiful explanation.

  • @palsshin
    @palsshin 2 года назад +1

    amazing as always!!

  • @hangchen
    @hangchen Год назад +1

    11:06 The best word of the century.

  • @amiryo8936
    @amiryo8936 Год назад +1

    Lovely video 👌

  • @weisionglee360
    @weisionglee360 Год назад +1

    First, thank you for your amazingly well-planned and prepared course videos! They are invaluable! A question about the SoftMax function: it seems to me that, for a single output, SoftMax() will always return the value 1, so it can't be used for backpropagation, no?

    • @statquest
      @statquest  Год назад

      If you only have a single output from your NN, then you wouldn't use Softmax to begin with. However, when you have more than one output, then the derivative works out. For details, see ruclips.net/video/M59JElEPgIg/видео.html ruclips.net/video/6ArSys5qHAU/видео.html and ruclips.net/video/xBEh66V9gZo/видео.html
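
      A quick way to see why (a minimal sketch, not from the video): with a single raw output z, SoftMax is e^z / e^z, which equals 1 no matter what z is, so its derivative with respect to z is 0 and backpropagation has nothing to work with.

        import numpy as np

        def softmax(raw):
            e = np.exp(raw)
            return e / e.sum()

        print(softmax(np.array([2.7])))    # -> [1.]
        print(softmax(np.array([-13.0])))  # -> [1.]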

  • @junaidbutt3000
    @junaidbutt3000 3 года назад +1

    Great video as always Josh! Just to clarify something about the discussion around the 9:38 timestamp, you're taking i =1 (Setosa) as an example right? When updating all of the parameter values via backpropagation, we would need to compute the softmax derivatives for all i and with respect to all output values - is that correct? So we would also require the derivative for the softmax value Virginica with respect to raw values for setosa, versicolor and virginica and also the derivative for the softmax value Versicolor with respect to raw values for setosa, versicolor and virginica?

    • @statquest
      @statquest  3 года назад +1

      Yes, that is correct.
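
      For reference, those nine derivatives form the SoftMax Jacobian; the standard result (with p_i the SoftMax output for class i, z_j the raw output value for class j, and delta_ij equal to 1 when i = j and 0 otherwise) is, in LaTeX:

        \frac{\partial p_i}{\partial z_j} = p_i \left( \delta_{ij} - p_j \right)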

  • @gurns681
    @gurns681 2 года назад +1

    Fantastic vid!

  • @francismikaelmagueflor1749
    @francismikaelmagueflor1749 Год назад +1

    low key kinda proud that I did the derivative before you even asked where it came from xd

  • @martynasvenckus423
    @martynasvenckus423 2 года назад

    Hi Josh, thanks for the great video as always. The only thing I wanted to ask about is the argmax function. The way you describe it implies that argmax returns a vector of 0s (with a 1 in the position of the maximum value) which is of the same length as the input vector. However, the way argmax works in the numpy or pytorch libraries is by returning a scalar value indicating the position instead of a vector. Given this difference, what is the true behaviour of argmax? Thanks

    • @statquest
      @statquest  2 года назад

      In both cases, argmax identifies the element with the largest value.
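
      A small sketch of the two conventions (assuming NumPy; the numbers are made up):

        import numpy as np

        raw = np.array([1.43, -0.40, 0.23])

        idx = np.argmax(raw)    # library convention: the index of the largest value
        one_hot = np.zeros_like(raw)
        one_hot[idx] = 1.0      # the video's convention: 1 for the largest value, 0 elsewhere

        print(idx)      # -> 0
        print(one_hot)  # -> [1. 0. 0.]

      Either way, the result picks out the same element.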

  • @travel6142
    @travel6142 2 года назад +1

    Thank you for this video. I understood the logic behind softmax. While backpropagating from the loss to softmax and then from softmax to the raw input, for example for setosa we have 3 derivatives (as you mentioned in the video). After calculating them (the derivatives of setosa with respect to the 3 classes), what do we do? Do we sum them up? Or multiply, or ... ?

    • @statquest
      @statquest  2 года назад +1

      See: ruclips.net/video/xBEh66V9gZo/видео.html

    • @travel6142
      @travel6142 2 года назад +1

      @@statquest I will check it, thank you!

  • @elemenohpi8510
    @elemenohpi8510 5 месяцев назад

    Thank you for the video. Quick question, as far as I understood, argmax and softmax are applied to the outputs of the last layer. Couldn't we use Argmax but train the network with back propagation with the outputs before argmax is applied?

    • @statquest
      @statquest  5 месяцев назад

      Yes, and that is often the case.

  • @dianaayt
    @dianaayt 9 месяцев назад

    Hi! Does softmax have any limitations? It seems too good to be true, and when that happens it usually isn't good haha. I've seen some mentioned, like being sensitive to outliers, but I don't quite understand why. Is it when the raw numbers contain an outlier?

    • @statquest
      @statquest  9 месяцев назад

      What do you mean by "too good to be true"? What seems too good to be true about the softmax function?

  • @zhenhuahuang291
    @zhenhuahuang291 3 года назад

    Could you do some videos of R or SAS for Neural Networks using ReLU and Softmax activation functions?

    • @statquest
      @statquest  3 года назад

      I plan on doing one in R soon.

  • @janeli2487
    @janeli2487 Год назад

    Hey @StatQuest,
    I am a bit confused about the ArgMax function and why its derivative is 0. The argmax function that I used in Python returns the index of the max value, which I would assume is different from the ArgMax function you mentioned here. What is the explicit form of the ArgMax function in your video?

    • @statquest
      @statquest  Год назад

      Regardless of whether your function sets the largest output to 1 and everything else to 0, or just returns the index of the largest output and ignores everything else, the output is constant until the threshold is met, then switches at that point (it is discontinuous), and is then constant again. Thus, either way, the derivative is 0.

  • @tianchengsun3767
    @tianchengsun3767 2 года назад

    It looks like softmax is very similar to logistic regression? Correct me if I am wrong. Could you give a brief explanation? Thank you so much

    • @statquest
      @statquest  2 года назад

      It's quite different. Logistic regression doesn't just take a bunch of random values and convert them into "probabilities". For details, see: ruclips.net/p/PLblh5JKOoLUKxzEP5HA2d-Li7IJkHfXSe

  • @porkypig7170
    @porkypig7170 Год назад

    I’m getting 0.11 (rounded), not 0.10, as the softmax for versicolor using this calculation: e^-0.4/(e^1.43+e^-0.4+e^0.23)
    Is it correct? Just double checking to make sure I’m making the right calculations
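
    For anyone double-checking that arithmetic (assuming raw output values of 1.43, -0.4, and 0.23), a quick script:

      import math

      raw = {"setosa": 1.43, "versicolor": -0.40, "virginica": 0.23}
      total = sum(math.exp(v) for v in raw.values())
      for name, value in raw.items():
          print(name, math.exp(value) / total)
      # setosa ~ 0.684, versicolor ~ 0.110, virginica ~ 0.206

    With those raw values, versicolor does round to 0.11 at two decimal places, so the video's 0.10 likely just reflects rounding of the values shown on screen.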

  • @alonsomartinez9588
    @alonsomartinez9588 Год назад

    It would be good to remind people what 'e' is in this vid as well as what its value is! People could mistake it for the error of the network or for entropy.

  • @beshosamir8978
    @beshosamir8978 Год назад

    Hi Josh, I have some doubts here. Why did we need to use softmax at all in training? Why didn't we continue to use SSR, as in the main backpropagation idea? Is there a problem with SSR that forced us to transform the output into something else to work with?

    • @statquest
      @statquest  Год назад

      SoftMax allows us to use Cross Entropy as a loss function, which I believe makes training easier when there are multiple classifications.

  • @qingfenglin
    @qingfenglin 7 месяцев назад +1

    Thanks!

    • @statquest
      @statquest  7 месяцев назад

      Thank you so much for supporting StatQuest! TRIPLE BAM!!! :)

  • @anshulbisht4130
    @anshulbisht4130 Год назад

    Hey josh ,
    Q1) If we are classifying N classes, does our NN give us N-1 decision surfaces?
    Q2) When we get our query point Xq, do we pass it through all decision surfaces and get the value predicted by each surface?

    • @statquest
      @statquest  Год назад

      A1) See: ruclips.net/video/83LYR-1IcjA/видео.html
      A2) See A1.

  • @joaoperin8313
    @joaoperin8313 Год назад +1

    We need to minimize SSR for regression problems using a Neural Network -> when we have a quantitative response.
    We use SoftMax, ArgMax and Cross Entropy for classification problems using a Neural Network -> when we have a qualitative response. I think it's something along these lines...

    • @statquest
      @statquest  Год назад +1

      Yep, that's pretty much the idea.

  • @Anujkumar-my1wi
    @Anujkumar-my1wi 3 года назад

    I know that a feedforward neural net with 1 hidden layer is a universal approximator, but can you tell me why we use a nonlinear activation function in the 2nd hidden layer of a neural net with 2 hidden layers? The neurons in the 1st hidden layer have already learned nonlinear functions with respect to the inputs, and the 2nd hidden layer is just doing a linear combination; a linear combination of nonlinear functions of the inputs is still a nonlinear function, so why do we use an activation function in the 2nd layer of a 2-layer neural net?

    • @statquest
      @statquest  3 года назад

      I think the more activation functions we have, the more flexibility we have in the model.

    • @Anujkumar-my1wi
      @Anujkumar-my1wi 3 года назад

      @@statquest I mean we could use the second layer just for a linear combination of the nonlinear functions (learned by the previous layer's neurons) in order to learn a more complex nonlinear function, but this won't provide as much flexibility as applying an activation function to the linear combination.

  • @breakingBro325
    @breakingBro325 9 месяцев назад

    Hello Josh, really nice video, could I ask you what software you used to create the video? I want to take notes by using the same thing you used and learn some presentation skills from it.

    • @statquest
      @statquest  9 месяцев назад

      I give away all of my secrets in this video: ruclips.net/video/crLXJG-EAhk/видео.html

  • @MADaniel717
    @MADaniel717 3 года назад

    How do I tune the other weights and biases altogether?

    • @statquest
      @statquest  3 года назад

      Like this: ruclips.net/video/IN2XmBhILt4/видео.html ruclips.net/video/iyn2zdALii8/видео.html ruclips.net/video/GKZoOHXGcLo/видео.html ruclips.net/video/xBEh66V9gZo/видео.html

  • @averagegamer9513
    @averagegamer9513 Год назад

    I have a question. Why is the softmax function necessary? It seems like you could directly calculate probabilities between 0 and 1 summing to 1 without the exponential function, so why do we use it?

    • @statquest
      @statquest  Год назад

      Sure, there are other ways you could solve this problem. However, the SoftMax function has a derivative that is relatively easy to compute, and that makes it relatively easy to work with in terms of using Backpropagation.

    • @averagegamer9513
      @averagegamer9513 Год назад +1

      @@statquest Thanks for the explanation, and great video!

  • @ritwikpm
    @ritwikpm 9 месяцев назад

    We minimise cross entropy (= - log likelihood) to fit both Neural Networks and Logistic Regression. Logistic regression can also theoretically converge to different parameter estimates based on initial weights - just like neural networks. But we still consider their output to be a representation of probability - specifically because they are fit to maximise log likelihood. Why can't similar logic be applied to Neural Network classification. The parameter estimates might vary, but as long as we are maximising log likelihood (and minimising the most common loss cross entropy), are we not predicting probabilities...?

    • @statquest
      @statquest  9 месяцев назад

      To be honest, I don't really know. But if I had to guess, it might have something to do with the fact that Logistic Regression fits a relatively simple and easy to understand shape to the data that doesn't allow non-linearities in the sense that the predicted probabilities don't start low, then go up and then go low again. In contrast, neural networks have no limit on the shape they can fit to the data and allow all kinds of non-linearities.

  • @csmatyi
    @csmatyi 2 года назад

    what happens when you run the NN with softmax and 2 outputs have the same value?

    • @statquest
      @statquest  2 года назад

      Then they'll have the same softmax output.

  • @mountaindrew_
    @mountaindrew_ Год назад

    Is SSR used mainly for single output neural networks?

    • @statquest
      @statquest  Год назад

      it depends on what you are predicting.

  • @rachelcyr4306
    @rachelcyr4306 3 года назад +1

    Do you have anything on soft max logistic regression????

  • @AnujFalcon
    @AnujFalcon 2 года назад +1

    Thanks.

  • @brahimmatougui1195
    @brahimmatougui1195 11 месяцев назад

    But sometimes we need to give probabilities along with the model prediction, especially for multiclass prediction. If we cannot trust the probabilities (8:11) given by the model, what should we do? In other words, if I want to assign probabilities to each class provided in the output, how would I go about doing it?

    • @statquest
      @statquest  11 месяцев назад +1

      These "probabilities" follow the definition of "probability" (they are between 0 and 1 and add up to 1) - so if that is good enough, then you are good to go. However, if you want to use them in a setting where you can interpret them as "given these input values, 95% of the time the species is X", then you should use a different model. Possibly logistic regression would be a better fit.

    • @brahimmatougui1195
      @brahimmatougui1195 11 месяцев назад +1

      @@statquest Thank you for your prompt answer

  • @gummybear8883
    @gummybear8883 2 года назад

    Does anybody know what the equivalent of argmax is in TensorFlow's activation arguments? They only have softmax in there.

    • @statquest
      @statquest  2 года назад

      There's probably a base "max" function in Python or numpy you could use.
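
      In practice, because argmax has a derivative of 0, it isn't offered as a trainable activation; the usual pattern (a hedged sketch, not specific to any one library) is to keep softmax as the output activation and apply argmax to the predictions afterwards, for example with np.argmax (tf.argmax does the same thing on tensors):

        import numpy as np

        # softmax outputs from a trained model for two new observations (made-up numbers)
        probs = np.array([[0.68, 0.11, 0.21],
                          [0.05, 0.90, 0.05]])

        predicted_class = np.argmax(probs, axis=1)  # index of the largest value in each row
        print(predicted_class)                      # -> [0 1]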

    • @gummybear8883
      @gummybear8883 2 года назад

      @@statquest Thanks for the suggestion, Josh. I bought your new sketch book and I think it is very clever. I thought it would have been much better if the book cover were hardbound. Overall, thank you for making these videos.

    • @statquest
      @statquest  2 года назад +1

      @@gummybear8883 Thanks! I would have loved to have made a hardback edition, but I'm self-publishing and it was not an option.

  • @shivamkumar-rn2ve
    @shivamkumar-rn2ve 2 года назад +1

    BAM, you cleared all my doubts

  • @luciferpyro4057
    @luciferpyro4057 3 года назад

    What does e stand for in the softmax equation? Did I miss something?
    Is "e" supposed to represent Euler's number = 2.7182818284590452353602874713527... ?

    • @statquest
      @statquest  3 года назад

      'e' is Euler's number. 'e', and the natural log (log base 'e'), are used throughout machine learning (and statistics) because their derivatives are so easy to work with.

    • @luciferpyro4057
      @luciferpyro4057 3 года назад +1

      @@statquest Thanks

  • @andrewdunbar828
    @andrewdunbar828 Год назад

    Does the output range depend on the activation function? Looks like ReLU but I think it can't happen with sigmoids.

    • @statquest
      @statquest  Год назад

      The output range of what?

    • @andrewdunbar828
      @andrewdunbar828 Год назад

      @@statquest The output nodes. Right at the start around 1:40

    • @statquest
      @statquest  Год назад +1

      @@andrewdunbar828 Because the activation functions are in the middle, and after them we multiply those values by weights and add biases that, in theory, could be anything, we could definitely end up with numbers > 1 and < 0 even if the activation functions were sigmoids. For example, if the last bias term before the output for setosa was +100, then we could easily end up with output values > 100.

    • @andrewdunbar828
      @andrewdunbar828 Год назад

      @@statquest Hmm I have much to learn (-:

  • @Anujkumar-my1wi
    @Anujkumar-my1wi 3 года назад

    I want to know, in pure mathematics, do neurons learn functions with certain superpositions, widths, heights, and slopes (controlled through the weights and biases) such that when we combine them we get an approximation of the function we're trying to approximate?

    • @statquest
      @statquest  3 года назад

      Neural Networks are considered "universal function approximators".

    • @Anujkumar-my1wi
      @Anujkumar-my1wi 3 года назад

      @@statquest I mean they approximate a function by learning certain simpler functions with certain superpositions, slopes, heights, and widths (controlled by the weights and biases), so that when we combine them we get an approximation of the function we're trying to approximate?

    • @statquest
      @statquest  3 года назад

      @@Anujkumar-my1wi To be honest, I'm probably the worst person to ask about these sorts of things. I know that, through weights and biases, we create a wide variety of non-linear functions that are added together to create a complicated function that approximates the training data. However, I'm not sure that's what you're looking for.

    • @Anujkumar-my1wi
      @Anujkumar-my1wi 3 года назад

      @@statquest No , i just wanted to ask whether that's the way a neural net works mathematically.

    • @statquest
      @statquest  3 года назад

      @@Anujkumar-my1wi I'm still a little confused, because mathematically, Neural Networks do exactly what I describe in these videos. I'm not dumbing down the math, this is the real deal, so what you see here is what Neural Networks do mathematically.

  • @abhishekm4996
    @abhishekm4996 3 года назад +2

    Thanks..🥳

  • @fndpires
    @fndpires 2 года назад +1

    Come on people, buy his songs, subscribe to the channel, thumbs UP, give him some money! Look what he's doing. HUGE DAMN!

    • @statquest
      @statquest  2 года назад +1

      Thanks for the support!!! :)

  • @shubhamtalks9718
    @shubhamtalks9718 3 года назад

    Why not just normalize the raw output values? What is the benefit of exponentiating first and then normalizing?

    • @statquest
      @statquest  3 года назад +1

      I believe that the exponentiation ensures that the SoftMax function will be continuous for all input values.

    • @shubhamtalks9718
      @shubhamtalks9718 3 года назад

      @@statquest Will it be discontinuous if we do normalization of raw output values?

    • @statquest
      @statquest  3 года назад +1

      @@shubhamtalks9718 If two of the 3 outputs are 0, then we'll get ArgMax, and that's no good.

    • @shubhamtalks9718
      @shubhamtalks9718 3 года назад +1

      @@statquest BAM!!! Got it. Thanks.

  • @naughtrussel5787
    @naughtrussel5787 10 месяцев назад +1

    Cute bear next to formulae is the best way to explain math to me.

  • @hunterswartz6389
    @hunterswartz6389 2 года назад +1

    Nice

  • @srewashilahiri2567
    @srewashilahiri2567 2 года назад

    If we start with different values for weights and biases then why will the optimum values be different if we have a global minimum for each through gradient descent? What am I missing?

    • @statquest
      @statquest  2 года назад

      There are lots of local minimums that we can get stuck in, and there may be several that are almost as good as the global minimum.

    • @srewashilahiri2567
      @srewashilahiri2567 2 года назад +1

      @@statquest Did some reading and got your point completely....thanks for the videos...not sure if learning ML could get any easier or better!

    • @statquest
      @statquest  2 года назад

      @@srewashilahiri2567 bam!

    • @yashikajain5997
      @yashikajain5997 2 года назад

      @@statquest Whether we get stuck in a local minimum would depend on the cost function? If we use cross-entropy as the loss function, then because it is a convex function, it will definitely converge to the global minimum. And, in this case, can we trust the accuracy of these 'probabilities'?
      This is what I am thinking, please correct me if I am wrong.
      Thank You

    • @statquest
      @statquest  2 года назад +1

      @@yashikajain5997 Unfortunately it's not that simple. Cross-Entropy, like SSR, is convex in very simple situations, but the entire Neural Network is non-linear with respect to the parameters so regardless of the loss function, we can end up with a strange shape that has local minima that we can get stuck in.

  • @YuriPedan
    @YuriPedan 3 года назад

    Somehow "Part 4 Multiple inputs and outputs" video is not available for me :(

    • @statquest
      @statquest  3 года назад

      Thanks for pointing that out. I've fixed the link: ruclips.net/video/83LYR-1IcjA/видео.html

    • @YuriPedan
      @YuriPedan 3 года назад +1

      @@statquest Thank you very much!

  • @jennycotan7080
    @jennycotan7080 7 месяцев назад +1

    That pirate joke!
    Moving on in the fields of Maths...

  • @BillHaug
    @BillHaug Год назад +1

    I saw the thumbnail and the pirate flag and immediately knew where you were going haha.

  • @Kagmajn
    @Kagmajn Год назад +1

    nice

  • @howardkennedy4540
    @howardkennedy4540 3 года назад

    Why is the versicolor softmax value +0.10 vs -0.10? The math indicates a negative value.

    • @statquest
      @statquest  3 года назад

      SoftMax values are always positive and between 0 and 1. Can you explain how you got a negative value?

    • @howardkennedy4540
      @howardkennedy4540 3 года назад +1

      @@statquest I misunderstood your notation and missed your comment on e raised to the power. My apologies.

  • @pranjalpatil9659
    @pranjalpatil9659 2 года назад +1

    I wish Josh taught me all the maths I've ever learned

  • @nelsonmcnamara
    @nelsonmcnamara 5 месяцев назад

    Hello comment section. Would anyone know, or can anyone point me in the right direction, if I actually want the Probability (no quotes) instead of the "Probability"?
    Imagine I am predicting the probability of the Red Sox winning or Kim winning the Presidential Election; how would I approach that?

    • @statquest
      @statquest  5 месяцев назад

      If you want real probabilities, then you don't want to use a neural network. Instead, consider using something like linear regression ruclips.net/video/nk2CQITm_eo/видео.html or logistic regression ruclips.net/video/yIYKR4sgzI8/видео.html

  • @Janeilliams
    @Janeilliams 2 года назад

    Can you show or share the Python implementation?

  • @EEBADUGANIVANJARIAKANKSH
    @EEBADUGANIVANJARIAKANKSH 3 года назад +1

    let say i have the chance to increase ur subscriber,
    I will make it to 1M (small BAM!), {10^0}
    no no I will change it to 10M (BAM!) {10^1}
    but I guess ur channel should have at least 100M subs (Double BAM) {10^2}
    0, 1, 2 denotes the Standard of BAM!
    Jokes apart,
    I really think this is one of the most useful channels I have ever seen. I like the way he structures his videos to explain the concepts. Sometimes even my professors look at these videos for reference. That's how good the channel is!!!!!

  • @deniz.7200
    @deniz.7200 22 дня назад +1

    Bro, just create a course out of these videos with some additional content (like a bootcamp). I think that would sell very well :)

    • @statquest
      @statquest  22 дня назад

      Thanks! I'm currently putting it all (plus a few bonus things) in a book right now and I hope to have it out by the end of the year.

  • @bingochipspass08
    @bingochipspass08 2 года назад +1

    Not all heroes wear capes!

  • @Rictoo
    @Rictoo 5 месяцев назад

    I have a question! At 3:35 you say "ArgMax will output 1 for any other value greater than 0.23" - but shouldn't it be "greater than 1.43", because ArgMax points to the value that is the highest in the set of outputs? Another related question: is the intuition that, if we know the true value of Virginica (e.g., if the training sample was truly Virginica), then if the ArgMax output is 0 for Virginica on that training example (because we predicted it wrong), we essentially "wouldn't know how to get to the right answer", because we have no slope pointing towards the right answer? We're just told "You're wrong. Not telling you _how_ wrong, just wrong." which isn't helpful for learning.

    • @statquest
      @statquest  5 месяцев назад +1

      At 3:34 I say "> 0.23", because 0.23 is the second largest number, and any number larger than it, will be the one selected by argmax. If, instead, I had said "> 1.43", then nothing would be selected, since 1.43 is the largest number and nothing is larger.
      And your intuition for the second part is correct.

    • @Rictoo
      @Rictoo 5 месяцев назад

      Ohhh, thanks. Now I understand that the Argmax function you're plotting there is the Argmax of the Setosa class, not Versicolor (I think?). I was initially under the impression it was for the Versicolor class.@@statquest

  • @austinoquinn815
    @austinoquinn815 Год назад

    Why do we bother applying either of these? Can't we just train with the raw outputs rather than using softmax, and just take the highest-valued node as the answer rather than argmax?

    • @statquest
      @statquest  Год назад +1

      That's a valid question, and the answer has to do with how softmax feeds into Cross Entropy, and cross entropy is easier to train with than the raw output values. For details on all of this, see: ruclips.net/video/6ArSys5qHAU/видео.html
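
      For reference, the cross entropy loss that SoftMax feeds into (the standard form, where p_i is the SoftMax output for class i and y_i is 1 for the observed class and 0 for the others) is, in LaTeX:

        \text{Cross Entropy} = -\sum_{i} y_i \log(p_i) = -\log(p_{\text{observed class}})

      Because the p_i come from SoftMax, they are strictly between 0 and 1 (for finite raw output values), so the log is always defined and the derivatives stay well behaved.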

  • @yourfutureself4327
    @yourfutureself4327 Год назад +1

    💚

  • @julescesar4779
    @julescesar4779 3 года назад +1

  • @Itachi-uchihaeterno
    @Itachi-uchihaeterno 6 месяцев назад

    More videos , Autoencoders and GANs

    • @statquest
      @statquest  6 месяцев назад +1

      I'll keep those topics in mind.

  • @phoenixado9708
    @phoenixado9708 2 года назад +1

    So where's hardmax and hardplus

  • @AdrianDolinay
    @AdrianDolinay 2 года назад +2

    Great thumbnail lol

  • @tuananhvt1997
    @tuananhvt1997 Год назад

    >Setosa, Versicolor, Virginica
    I notice that reference 🤔

    • @statquest
      @statquest  Год назад

      I'm not sure I understand what you are getting at.

  • @CreativePuppyYT
    @CreativePuppyYT 3 года назад

    You forgot to add this video to the machine learning playlist

    • @statquest
      @statquest  3 года назад +1

      Thanks! I'm still in the middle of the neural network series of videos. Hopefully when they are done (in a few weeks) I'll get the playlists organized properly.

  • @alrzhr
    @alrzhr Год назад

    This guy is different :)))

  • @alternativepotato
    @alternativepotato 3 года назад +1

    heh, setosas value after softmax is 0.69

  • @felipe_marra
    @felipe_marra 8 месяцев назад +1

    up

  • @ayushupadhyay9501
    @ayushupadhyay9501 2 года назад +1

    Bam bam bam

  • @Xayuap
    @Xayuap Год назад +2

    ¡ B A M ! 😳

  • @Alchemist10241
    @Alchemist10241 2 года назад

    6:33 This teddy bear eats raw outputs, digests them using Vitamin e (not E) and then sh*ts them between flag zero and flag one. 😁

  • @BlackHermit
    @BlackHermit 2 года назад +1

    Arrrrrrrg! .)

  • @Anonymous-tm7jp
    @Anonymous-tm7jp 10 месяцев назад +1

    AAAARRRRRGGGG!!! mAx😂😂

  • @charansahitlenka6446
    @charansahitlenka6446 Год назад

    at 6:51 softmax takes 1.43 and gives out 0.69, heavy sus

  • @terjeoseberg990
    @terjeoseberg990 8 месяцев назад +3

    Nobody likes derivatives that are totally lame. Especially gradient descent.

  • @allyourcode
    @allyourcode 3 года назад

    ArgMax and SoftMax seem rather pointless since you can already tell which classification the NN is predicting from its raw output; just look for the greatest output. SoftMax is just going to lull people into the false sense that the outputs are probabilities. In reality, there is nothing super special about its choice of the exp function to force everything to be positive (plus a normalization factor to force everything to add up to 1). Any (differentiable) function f where f(x) >= 0 would have worked just as well as exp.

  • @jijie133
    @jijie133 3 года назад +1

    toilet paper. so funny.

  • @Salmanul_
    @Salmanul_ 8 месяцев назад +1

    Thanks!