ROC and AUC in R

  • Published: 1 Dec 2024

Comments • 375

  • @statquest
    @statquest  3 years ago +8

    You can get a copy of the code from the StatQuest GitHub, here: github.com/StatQuest/roc_and_auc_demo/blob/master/roc_and_auc_demo.R
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @falaksingla6242
      @falaksingla6242 2 years ago

      Hi Josh,
      Love your content. Has helped me to learn a lot & grow. You are doing an awesome work. Please continue to do so.
      Wanted to support you but unfortunately your Paypal link seems to be dysfunctional. Please update it.

    • @ryanmckenna2047
      @ryanmckenna2047 1 year ago

      The code would not run when I downloaded it from github

    • @statquest
      @statquest  1 year ago

      @ryanmckenna2047 What part didn't run? I just re-ran it and it worked fine.

    • @ashishdayal172
      @ashishdayal172 1 year ago

      did u make this in python too??

    • @statquest
      @statquest  1 year ago +1

      @ashishdayal172 Not yet.

  • @EdySold
    @EdySold 1 year ago +4

    Complex things in simple and understandable language. I have never met a better teacher!

  • @ripsu100
    @ripsu100 6 years ago +62

    "The only man who never makes mistakes is the man who never does anything."
    Thank you ;)

    • @statquest
      @statquest  6 years ago +2

      No, thank you! Your comment was very helpful and spared me a lot of future embarrassment. The video was only seen by 100 or so people (not 1,000s) before you pointed out the error.

  • @marcianocaliman8601
    @marcianocaliman8601 5 years ago +24

    Dude, your videos are great. I have never found anything so clear on the internet. Congratulations!!!

  • @rigae2
    @rigae2 1 year ago +4

    Your explanation of the process and logic behind each function and line are so helpful. I hope you'll make more of these videos. Thank you so much, this content is uniquely valuable.

  • @QuantumQuasar91
    @QuantumQuasar91 2 months ago +1

    Just to let you know that I found your channel via Claude and I am not disappointed! 91 videos left BAM BAM !

    • @statquest
      @statquest  2 months ago +1

      You're making great progress! :)

  • @yvnasu5714
    @yvnasu5714 6 months ago +2

    Crazy how good you are at explaining. You explain the little things I always start to struggle with other teachers/tutors! Thank you so much for these Videos

    • @statquest
      @statquest  6 months ago

      Happy to help!

  • @happygolucky4350
    @happygolucky4350 3 years ago +2

    These are the best videos. When I need to relax, I watch your videos

    • @statquest
      @statquest  3 years ago

      Glad you like them!

    • @happygolucky4350
      @happygolucky4350 3 years ago

      @statquest If you have two output neurons in an ANN (for a two-class classification problem, {1,0; 0,1}), is it okay to build the ROC just by comparing the output of any one of those neurons with its corresponding target?

    • @happygolucky4350
      @happygolucky4350 3 years ago

      Thanks Josh, I changed it to {1,0} as output as the AUC for the two neurons {1or0} in the {1,0;0,1} architecture were not the same.

  • @marcoventura9451
    @marcoventura9451 2 years ago +10

    Impressive video. Theory and examples with software are the best way to learn. There is a lot going on in this video, one of the best ever. Thank you, Josh, and greetings from Italy. Happy new year to you, your loved ones, and all the people who follow your amazing lessons.

  • @kumarrishabh8904
    @kumarrishabh8904 4 years ago +4

    Such an awesome channel I came across! ....gonna share it with everyone under my umbrella !!! You are doing really great bro!

  • @SurrenderPink
    @SurrenderPink 5 years ago +1

    Best song ever, Josh. StatQuest keeps gettin’ better and better! Many thanks.

  • @rylieedwards2641
    @rylieedwards2641 2 years ago +1

    Great explanation of everything including each parameter in the graphs. Loved it!

  • @ManyBadVids
    @ManyBadVids 1 year ago +2

    The silly songs, the calm voice and the bams give this a vibe as if the course were narrated by Forrest Gump.
    Love it.

  • @meenakshidevi5425
    @meenakshidevi5425 4 years ago +3

    Hey..... Love the way you present ❤️

    • @statquest
      @statquest  4 years ago

      Thank you so much 😀

  • @zainabkhan2475
    @zainabkhan2475 2 years ago +1

    I thank God I found this channel 2 years ago... 😇

  • @benguo661
    @benguo661 2 years ago +3

    Thank you sooooo much Josh! You are a life saver!!😄

  • @xyliu3758
    @xyliu3758 10 months ago +2

    hey bro, i love your videos so much, please hang in and i will continue to support you!

    • @statquest
      @statquest  10 months ago

      Thank you very much!

  • @geocarvalhont
    @geocarvalhont 5 years ago +1

    Hey Josh, ty again. While studying, I reproduced everything using R Colab (really recommend it for anyone studying Josh's R code).

  • @famin7794
    @famin7794 5 months ago +1

    You solve my headache. Thanks a lot

    • @statquest
      @statquest  5 months ago

      Happy to help!

  • @esan120au
    @esan120au 1 year ago +1

    Thanks for your wonderful and detailed videos!

    • @statquest
      @statquest  1 year ago

      Thank you so much for supporting StatQuest! BAM! :)

  • @arike9289
    @arike9289 3 years ago +1

    Good job and well-done. I like your style of teaching, it's great!!!

  • @archowdhury007
    @archowdhury007 4 years ago +2

    Wonderful tutorial!!.....thank you so much Josh :)

  • @qurrataayunkartika1496
    @qurrataayunkartika1496 3 years ago +1

    waaa.. i'm so thankful found this video. Thanks a lot. Stay healthy cool people :)

  • @marco1anziano84
    @marco1anziano84 2 years ago

    I mean, the stats tutorial is indeed very well done, but the intro song was already enough to make me immediately click on the like button.

  • @horseheadmd6844
    @horseheadmd6844 3 years ago +2

    Thank you for this informative video. It helped me a lot. Great work!

  • @peterh5960
    @peterh5960 3 years ago +1

    Incredibly helpful, thank you!

  • @odearjafter9426
    @odearjafter9426 2 years ago +1

    thank you for such an informative tutorial

    • @statquest
      @statquest  2 years ago

      Glad it was helpful!

  • @justchiful
    @justchiful 4 years ago +1

    Dear, I have enjoyed your video. Very much clarity of thought.

    • @statquest
      @statquest  4 years ago

      Thank you so much 🙂

  • @vivektanwar628
    @vivektanwar628 7 months ago +1

    YOU ARE MARVELOUS,EXTRAORDINARY .I WISH YOU COULD HAVE EXPLAINED IN PYTHON

    • @statquest
      @statquest  7 months ago +1

      One day I will.

  • @thuli5209
    @thuli5209 3 years ago +1

    Thank you sooooo much for your lessons. Super helpful

  • @Zahumny
    @Zahumny 5 years ago +1

    Thank you for helping me with my credit risk class :)

  • @akshay_up
    @akshay_up 6 years ago +1

    You are amazing man, thanks for the video and keep making more videos like these. BAM!!

    • @statquest
      @statquest  6 years ago

      Double BAM!!!! Thanks for the encouragement! :)

    • @bitclear670
      @bitclear670 5 years ago

      Double Bam!!

  • @JulioCCavalcanti
    @JulioCCavalcanti 3 years ago +1

    You are amazing, man! Thanks!!!

  • @ogunsadebenjaminadeiyin2729
    @ogunsadebenjaminadeiyin2729 5 years ago +1

    Thanks man, very clear and helpful

  • @AromaVancouver
    @AromaVancouver 1 year ago +1

    Keep up the good work .. Thank u🤩

  • @nalliwok
    @nalliwok 1 year ago +1

    Thank you so much for this video!

  • @michalispapadopoulos5090
    @michalispapadopoulos5090 2 years ago +1

    Thanks a lot sir! You are very helpful!

  • @PeterKidd-s5c
    @PeterKidd-s5c 1 year ago

    Hey Josh, great videos on ROC curves, your teaching is refreshingly concise and clear. I just have one question that I hope you could expand on. When we first generate 100 samples from a normal distribution, why do we need to sort them from low to high? And what would the dangers be if we didn't do this?
    Thanks for the great content!

    • @statquest
      @statquest  1 year ago

      What time point in the video, minutes and seconds, are you asking about?

    • @PeterKidd-s5c
      @PeterKidd-s5c 1 year ago

      @statquest roughly around 2:55

    • @statquest
      @statquest  1 year ago

      @PeterKidd-s5c Technically, you don't need to sort them, but it makes it easier to look at the data. When we print out the values for the "obese" variable at 4:11, the output is way easier to interpret because the values for weight were sorted.
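
The point about sorting can be checked with a minimal sketch. It assumes simulated weights in the spirit of the video; the seed, mean, standard deviation, and variable names below are illustrative assumptions, not values confirmed in this thread. The same rows are fitted once unsorted and once sorted by weight.

```r
# Simulated data (seed, mean, and sd are assumptions for illustration).
set.seed(420)
num.samples <- 100
weight <- rnorm(n = num.samples, mean = 172, sd = 29)  # deliberately NOT sorted
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)

unsorted <- data.frame(weight = weight, obese = obese)
sorted <- unsorted[order(unsorted$weight), ]           # same rows, ordered by weight

fit.unsorted <- glm(obese ~ weight, data = unsorted, family = binomial)
fit.sorted <- glm(obese ~ weight, data = sorted, family = binomial)

# Row order only changes how the data prints, not the fitted model.
print(all.equal(coef(fit.unsorted), coef(fit.sorted)))
```

Because every weight keeps its own label, reordering the rows should leave the model, and therefore any ROC curve built from it, unchanged.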

  • @b1ndaboymetz
    @b1ndaboymetz 3 years ago +1

    VERY helpful - thank you!

    • @statquest
      @statquest  3 years ago

      Glad it was helpful!

  • @yulinliu850
    @yulinliu850 6 years ago +1

    Many Thanks Josh!

  • @timstone5168
    @timstone5168 4 years ago +1

    That ROC you had really tied the room together

  • @pitiwatkittiwimonchai4656
    @pitiwatkittiwimonchai4656 2 years ago +1

    So good Thanks for the video

    • @statquest
      @statquest  2 years ago +1

      Glad you enjoyed it!

  • @harmagician1
    @harmagician1 3 years ago +1

    Bam! Good tutorial.

  • @gregorsamsa3290
    @gregorsamsa3290 4 years ago +2

    Please make more Videos with R! :)

  • @dorothymartin2477
    @dorothymartin2477 2 years ago +1

    Hi Sir, your videos are very helpful. Hope that you can make a video on mean decrease Gini of Random Forest

  • @tojama
    @tojama 3 years ago

    Great again! I would be interested to see how to make combined ROCs for, say 2-4 different biomarker candidates. This would be to see if their combined use would result in higher AUCs than that of individual markers.

  • @jethrogauld7437
    @jethrogauld7437 2 years ago +1

    Great video thanks

  • @PunmasterSTP
    @PunmasterSTP 8 months ago +1

    Ah, the pirate's favorite programming language!

  • @ainiaini4426
    @ainiaini4426 2 years ago +1

    Hahaha.. That cute confession that you have a hard time remembering what sensitivity and specificity mean made me laugh.. Because it is so confusing to me also.. These really are confusion metrics🤣

    • @statquest
      @statquest  2 years ago +1

      Since I had so much trouble remembering about sensitivity and specificity, I wrote a little song to help me out: ruclips.net/user/shortsPWvfrTgaPBI

    • @ainiaini4426
      @ainiaini4426 2 years ago +1

      Wow.. You are so creative at making things easy.. I am impressed!

  • @KarenCruz-tx5nh
    @KarenCruz-tx5nh 4 years ago

    You are the savior of the little humans we are, thank you god! I have a silly question, sometimes you use

    • @statquest
      @statquest  4 years ago

      R is funny about the "

  • @harip9454
    @harip9454 1 month ago

    Hi, this video solved a puzzle for me. I was searching how to make a ROC curve using R. Now, can u demonstrate how to calculate the other statistics like precision, negative predictive value, accuracy etc, using R, and how to plot these

    • @statquest
      @statquest  1 month ago

      I'll keep those topics in mind.
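
Until such a video exists, here is a minimal base R sketch of how precision, negative predictive value, and accuracy could be computed from a logistic regression at one threshold. The simulated data and the 0.5 threshold are assumptions chosen for illustration.

```r
# Simulated data (seed, mean, and sd are assumptions for illustration).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)
glm.fit <- glm(obese ~ weight, family = binomial)

threshold <- 0.5  # hypothetical probability threshold
predicted <- ifelse(glm.fit$fitted.values > threshold, 1, 0)

# The four cells of the confusion matrix.
tp <- sum(predicted == 1 & obese == 1)
fp <- sum(predicted == 1 & obese == 0)
tn <- sum(predicted == 0 & obese == 0)
fn <- sum(predicted == 0 & obese == 1)

precision <- tp / (tp + fp)          # also called positive predictive value
npv <- tn / (tn + fn)                # negative predictive value
accuracy <- (tp + tn) / num.samples

print(c(precision = precision, npv = npv, accuracy = accuracy))
```

Repeating the calculation over a grid of thresholds and plotting each metric against the threshold would be one way to visualize them.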

  • @DailyKosia
    @DailyKosia 4 years ago +1

    Thank you very much!

  • @vishakhakumar3854
    @vishakhakumar3854 4 years ago +1

    great video!!

  • @Davidravaux
    @Davidravaux 6 years ago +1

    Thank you so much!
    Would you consider making a video about limited dependent variable models (tobit, heckman...)?
    It will be very helpful for us! All the best.

    • @statquest
      @statquest  6 years ago

      OK. I'll put it on the to-do list, but it will be a while before I get to it.

    • @Davidravaux
      @Davidravaux 6 years ago +1

      Thank you! This is a short bibliography about the topic:
      J. Scott Long, Regression Models for Categorical and Limited Dependent Variables
      Alfred DeMaris, Regression With Social Data: Modeling Continuous and Limited Response Variables
      Wooldridge, Introductory Econometrics
      I can share you the books if needed.

    • @statquest
      @statquest  6 years ago

      @Davidravaux OK. However, just know that my to-do list is huge (it has about 200 things on it - I get about 3 or 4 requests every day), so it might take me a long time to get to it. However, if a lot of people start asking for a certain topic, that topic gets moved closer to the top of the to-do list. So, if you know of a ton of people interested in this subject, you should have them add to this comment.

    • @Davidravaux
      @Davidravaux 6 years ago

      Ok, I totally understand, thank you for clarifying.

  • @farawayscity
    @farawayscity 5 years ago +1

    Great video! Very helpful. BTW, there is a discrepancy between this clip and the code shared on your website about the object roc.df (line 78). Nothing has been assigned to the object yet, so running line 78 gives an error message. Overall, very clear and handy. Thank you!

    • @statquest
      @statquest  5 years ago +1

      Thanks for catching that! The problem had to do with how wordpress interprets the ">" and "

    • @farawayscity
      @farawayscity 5 years ago +1

      @statquest I see. Good to know! Thank you~ :>

  • @NitsT01
    @NitsT01 5 years ago +4

    You gotta stop saying BAM!!! it's really funny :D

  • @dylanz52
    @dylanz52 5 years ago +3

    Great video! One quick question. Do you know how to plot ROC-AUC graph for SVM and adaboost?

  • @nabilmahmoud608
    @nabilmahmoud608 5 years ago +1

    This video is absolutely amazing! but how can i determine the threshold/cut off weight from threshold probability that decides whether the subject is obese or not using code and not by direct extrapolation from the logit curve?
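
One way to get a cut-off weight in code, sketched under the assumption of a single-predictor logistic regression like the one in the video: invert the fitted logistic curve. Since logit(p) = intercept + slope * weight, the weight at probability p is (logit(p) - intercept) / slope. The simulated data and the 0.5 probability are illustrative assumptions.

```r
# Simulated data (seed, mean, and sd are assumptions for illustration).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)
glm.fit <- glm(obese ~ weight, family = binomial)

prob.threshold <- 0.5  # hypothetical probability threshold picked from the ROC curve
b <- coef(glm.fit)

# qlogis() is the logit function, so this solves the fitted equation for weight.
weight.cutoff <- (qlogis(prob.threshold) - b[["(Intercept)"]]) / b[["weight"]]

# Sanity check: the model's predicted probability at that weight.
check.prob <- predict(glm.fit, newdata = data.frame(weight = weight.cutoff),
                      type = "response")
print(c(weight.cutoff = weight.cutoff, predicted.prob = as.numeric(check.prob)))
```

This only works directly when there is one predictor; with several predictors, a probability threshold corresponds to a boundary rather than a single value.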

  • @lalita3853
    @lalita3853 5 years ago +2

    Thank you sir

  • @anoriginalnick
    @anoriginalnick 6 months ago +1

    Excellent video

  • @andreabernal414
    @andreabernal414 2 years ago +1

    YOU DA BEST

  • @songuihamedkone6600
    @songuihamedkone6600 5 years ago +1

    Thanks a lot !

  • @PaoloItalyanca
    @PaoloItalyanca 3 months ago +1

    Thumb up for the "number-of-exclamation-points-on-the-BAM" track record.

  • @galan8115
    @galan8115 4 years ago +1

    For anyone searching for how to make the plots square: put par(pty = "s") before running the line that draws the graph. And if you get huge margins in your graph, you need one more argument in roc(), which is asp = NA. You can also easily print your AUCs in the plot with print.auc = TRUE.
    My code looks like this:
    roc(df_mod_cand$clase, mod_cand$fitted.values, plot = TRUE, asp = NA, col = "red", lwd = 3, legacy.axes = TRUE, print.auc = TRUE)

  • @SS-cp1cm
    @SS-cp1cm 4 years ago +1

    thank you soooo much!!!

  • @Leo-wd8vq
    @Leo-wd8vq 6 years ago +7

    thank you for your video. btw, can you make one for python?

    • @statquest
      @statquest  6 years ago +7

      I'll work on it. I'm doing a lot more Python coding these days, so it makes sense.

    • @ccuny1
      @ccuny1 4 years ago +1

      @statquest A year later, I suddenly wake up to StatQuest. Python implementation please. Perhaps SciKit Learn also has built-in computations for these and other metrics. I'll check...

  • @rahulg1504
    @rahulg1504 3 years ago

    Many thanks Josh, you are doing a great job.
    In my study, I would like to calculate and plot pROCs for a couple of maxent scenarios and glm model scenarios using 1000 iterations and a 5% omission error using pROC package in R, would be really grateful if you can guide me a bit. Thanks in advance.

    • @statquest
      @statquest  3 years ago

      Let me know how it goes! :)

    • @rahulg1504
      @rahulg1504 3 years ago

      @statquest May I get the R code for the scenario I mentioned? I am still trying to figure out how to prepare data from the maxent output and then use it with the pROC package to calculate and plot AUCs. I am a relative newbie in R. Theory-wise I think I am pretty clear, but I am struggling with the code and commands to get this job done with the pROC package.

    • @statquest
      @statquest  3 years ago

      @rahulg1504 The code for this video is here: github.com/StatQuest/roc_and_auc_demo/blob/master/roc_and_auc_demo.R

  • @christelleleitzingerphd7491
    @christelleleitzingerphd7491 3 years ago

    Thanks for the video and explanations! What statistical test would you use to compare 2 ROC curves?

    • @statquest
      @statquest  3 years ago

      There are a bunch of options. This tool (in R) implements them: bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-77
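
As a hedged illustration of one of those options, the pROC package provides roc.test(), which can compare two ROC curves built on the same observations with DeLong's test. The simulated data and the second, noisier marker below are hypothetical and only there to give the test two curves to compare.

```r
library(pROC)

# Simulated data (seed, mean, and sd are assumptions for illustration).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)

# Hypothetical second marker: weight plus a lot of noise, so it should be weaker.
noisy.weight <- weight + rnorm(n = num.samples, mean = 0, sd = 60)

roc.weight <- roc(obese, weight)
roc.noisy <- roc(obese, noisy.weight)

# DeLong's test for two correlated (paired) ROC curves.
test.result <- roc.test(roc.weight, roc.noisy, method = "delong")
print(test.result)
```

roc.test() also offers other methods (for example bootstrap); which one is appropriate depends on the study design.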

  • @sanazomidi5536
    @sanazomidi5536 4 years ago +1

    Thank you

  • @juanlb1105
    @juanlb1105 3 years ago

    The Answer to the Ultimate Question of Life, the Universe, and Everything :D F*cking loved the reference (hope it is not casual ^_^´)

  • @davidstivenarboledaprado8731
    @davidstivenarboledaprado8731 6 months ago

    Hello, thanks for the video, really useful. In this example you come up with a method to classify obese and not obese. What about when you don't know a threshold for the initial classification of obese or not obese? Does the pROC function test different thresholds?

    • @statquest
      @statquest  6 months ago +1

      That's the whole idea of an ROC graph to begin with - it's used to determine the optimal threshold.

  • @animeshkansal7746
    @animeshkansal7746 5 years ago +1

    AUC for Logistic regression is more than AUC for RF, but if you consider only corner most points for both, RF does better, so who is the winner in this case ?

    • @statquest
      @statquest  5 years ago

      Which corner are you looking at? I don't see RF doing better in either one. Or are you looking at the very edges?

    • @animeshkansal7746
      @animeshkansal7746 5 years ago +1

      StatQuest with Josh Starmer
      At the north west corner
      Rf at a point has better tpp and fpp
      So isn’t rf better than logistic regression?

    • @statquest
      @statquest  5 years ago

      @animeshkansal7746 North east? You are right. RF is a little better up there. This is a good example of when a Partial AUC might be more informative.

    • @animeshkansal7746
      @animeshkansal7746 5 years ago +1

      Thank you so much, your videos are really great

    • @statquest
      @statquest  5 years ago

      @animeshkansal7746 Thanks!

  • @brianhung24241111
    @brianhung24241111 5 years ago +1

    I am a big fan of yours! Can you make a survival analysis video?

    • @statquest
      @statquest  5 years ago

      Yes! I will make one this spring. Many people have asked for this topic, so it is at the top of my to-do list.

  • @elmonovagales2929
    @elmonovagales2929 5 years ago +1

    savior

  • @mzw90
    @mzw90 5 years ago

    Thank you for the video. It was very easy to follow. May I know how do i obtain optimal cut off points using the ROC curve?

    • @statquest
      @statquest  5 years ago

      I answer that question in my video that explains ROC and AUC: ruclips.net/video/4jRBRDbJemM/видео.html

    • @mzw90
      @mzw90 5 years ago

      @statquest Thank you for your reply! I was actually wondering how to interpret the threshold numbers seen at 09:51. After head(roc.df), you get a list of TPP, FPP and thresholds. For example, in the 2nd row (TPP 100, FPP 97.77), what does a threshold of 0.01349 mean?
      I also have a separate question, I am curious if it is always necessary to always create a linear model first for the ROC curve? For example I am comparing the ROC curves of age and co-morbidities against non-cancer mortality, do I have to create a linear regression for age using glm()?
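
On the second question, a small base R sketch may help. For a single predictor, the fitted probabilities from glm() are a monotone transformation of the raw values, so the ranking of samples, and therefore the AUC, is the same either way. The simulated data and the helper auc.rank() (a rank-based AUC, equivalent to the Mann-Whitney statistic) are assumptions introduced here for illustration.

```r
# Hypothetical helper: AUC from ranks (Mann-Whitney U divided by n.pos * n.neg).
auc.rank <- function(labels, scores) {
  r <- rank(scores)
  n.pos <- sum(labels == 1)
  n.neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# Simulated data (seed, mean, and sd are assumptions for illustration).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)
glm.fit <- glm(obese ~ weight, family = binomial)

auc.raw <- auc.rank(obese, weight)                    # raw predictor, no model
auc.fitted <- auc.rank(obese, glm.fit$fitted.values)  # fitted probabilities

print(c(raw = auc.raw, fitted = auc.fitted))
```

The difference is in the thresholds: with the raw predictor they are in the predictor's own units, while with fitted values they are probabilities. A model becomes necessary once several predictors need to be combined into one score.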

  • @jarrodshingleton9642
    @jarrodshingleton9642 3 years ago

    Didn't realize you also did videos with R code. I am now truly out of a job, as my students are completely covered. Maybe I could take up the guitar...what...he does that also....dammit.

  • @mathiasschmidt93
    @mathiasschmidt93 2 years ago

    Great video! I was wondering if it is possible to plot this graph in a Multinomial Logistic Regression?

    • @statquest
      @statquest  2 years ago

      Hmmm...I'm not sure.

    • @mathiasschmidt93
      @mathiasschmidt93 2 years ago

      @statquest Ah okay, what about a multiple logistic regression? Any ideas about that one?

    • @statquest
      @statquest  2 years ago

      @mathiasschmidt93 As long as your predicted value is binary, it shouldn't matter how many variables you use to make predictions - the process is the exact same as illustrated in this video. To see how it is done in R, see: ruclips.net/video/qcvAqAH60Yw/видео.html

  • @arndong
    @arndong 4 years ago +1

    thanks you bro

  • @UncleLoren
    @UncleLoren 4 years ago +2

    Here's a counter-intuitive trick that helps me keep the two straight:
    SENSITIVITY = True POSITIVE Rate, even though the term does not have a P but does have an N.
    SPECIFICITY = True NEGATIVE Rate, even though the term does not have an N but does have a P.

  • @AravindHan008
    @AravindHan008 5 years ago

    I have been following Python for data science so far and got stuck after seeing this video. A great teacher like you is using R instead of Python, so what should I do? Which one is best for data science, and for the future, R or Python? Kindly let me know and enlighten me.
    thanks in advance ..! little BAM

    • @statquest
      @statquest  5 years ago +1

      They are both very useful. Python is a great language used in a lot of different situations and has a lot of good machine learning libraries. In contrast, R is very useful for doing statistics.... So I would recommend learning both if you have time.

  • @thiemodinger540
    @thiemodinger540 1 month ago

    Thank you very much for a great video. Is there a way to get the sensitivity and specificity data when setting the threshold manually to a certain level (e.g. a weight cut-off for obese of 30g)? Thanks

    • @statquest
      @statquest  1 month ago

      If you already have a threshold in mind, you can just calculate everything directly by running your data through the model with that threshold.
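
A minimal sketch of that direct calculation in base R. The helper sens.spec.at() and the cut-off of 180 are hypothetical; the cut-off is applied straight to the measured weights, so no model is needed.

```r
# Hypothetical helper: classify as positive when the predictor exceeds the cutoff.
sens.spec.at <- function(cutoff, predictor, labels) {
  predicted <- predictor > cutoff
  c(sensitivity = sum(predicted & labels == 1) / sum(labels == 1),
    specificity = sum(!predicted & labels == 0) / sum(labels == 0))
}

# Simulated data (seed, mean, and sd are assumptions for illustration).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)

manual.cutoff <- 180  # hypothetical weight cut-off chosen by hand
print(sens.spec.at(manual.cutoff, weight, obese))

# The extremes behave as expected: everything positive or nothing positive.
lowest <- sens.spec.at(-Inf, weight, obese)
highest <- sens.spec.at(Inf, weight, obese)
```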

  • @curvesettermcatprep1400
    @curvesettermcatprep1400 4 years ago

    Love your content! Quick q: from a conceptual standpoint, are you just testing the hypothesis that the underlying distribution of the weights (which you defined as a gaussian) is not a uniform distribution

    • @statquest
      @statquest  4 years ago

      ROC graphs give us a sense of how accurate our models are given different thresholds for making decisions. For more details, see: ruclips.net/video/4jRBRDbJemM/видео.html

  • @chelseyzhao2178
    @chelseyzhao2178 3 years ago

    Loved the video! How do you relate the threshold back to the data? I.e. make a statement like the threshold between obese and not obese is 140lb

    • @statquest
      @statquest  3 years ago +1

      First, you find the threshold you are interested in (these are in roc.df), then we look at weight associated with the largest glm.fit$fitted.values < the threshold. For example, if the threshold is 0.5, then the weight is: max(weight[glm.fit$fitted.values < 0.5])
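
The expression in the reply can be run end to end as follows. The simulated data are an assumption, and weight.above is an extra value added here to show the two observed weights that bracket the threshold.

```r
# Simulated data (seed, mean, and sd are assumptions for illustration).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)
glm.fit <- glm(obese ~ weight, family = binomial)

prob.threshold <- 0.5  # hypothetical threshold read off roc.df

# Heaviest sample still classified as "not obese" at this threshold...
weight.below <- max(weight[glm.fit$fitted.values < prob.threshold])
# ...and the lightest sample classified as "obese".
weight.above <- min(weight[glm.fit$fitted.values >= prob.threshold])

print(c(below = weight.below, above = weight.above))
```

Any weight between those two values separates the samples in the same way, so the cut-off could be reported as either bound or as their midpoint.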

  • @kreitzberg09
    @kreitzberg09 6 years ago +56

    Yeah sure 420 made better looking data 😁😂🤣😄🤗

  • @marynapolyakova8722
    @marynapolyakova8722 3 years ago

    Thank you for your great lectures! The thresholds that you derive here are between 0 and 1. Can we translate these thresholds to the actual cut-off values?

    • @statquest
      @statquest  3 years ago

      In these examples, the thresholds are the actual cut-off values. In other words, if the logistic regression predicts that the probability that a mouse is obese is 0.9, then we would compare that to the threshold that we obtained from the ROC graph to make a final classification.

  • @anikshah8796
    @anikshah8796 4 years ago

    Thanks for the videos Josh! I have a question about AUC. Even though in this video the AUC for random forest is lower than for logistic regression, isn't the forest a better alternative here, as there exists a threshold that generates a higher true positive rate for the same false positive rate compared to logistic regression? This makes the significance of AUC subjective in comparison.

    • @statquest
      @statquest  4 years ago

      What you have to do is pick a range of thresholds that are acceptable. Once you do that, you can compare the AUC between those thresholds to determine which method is best.
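
A sketch of how such a restricted comparison could look with pROC's auc(), which accepts a partial.auc range. The simulated data and the choice of the 90% to 100% specificity range are assumptions for illustration.

```r
library(pROC)

# Simulated data (seed, mean, and sd are assumptions for illustration).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)
glm.fit <- glm(obese ~ weight, family = binomial)

roc.obj <- roc(obese, glm.fit$fitted.values)

full.auc <- auc(roc.obj)
# Area under the curve only where specificity is between 0.9 and 1,
# i.e. where the false positive rate is at most 10%.
partial.auc <- auc(roc.obj, partial.auc = c(1, 0.9),
                   partial.auc.focus = "specificity")

print(c(full = as.numeric(full.auc), partial = as.numeric(partial.auc)))
```

Computing the same partial AUC for each competing model gives a comparison limited to the acceptable range; note that the raw partial area can be at most 0.1 for this range.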

  • @jovanpetrovic168
    @jovanpetrovic168 3 years ago

    Hi Josh, your videos are great! I have one question about choosing the best method based on overlapping ROC graphs. If we compare Logistic Regression and Random Forest we see that Logistic Regression is better because of its bigger AUC. But does it make more sense here to choose Random Forest, because one specific instance of Random Forest (with one specific threshold) gave us the best confusion matrix? I assumed here that accurately classifying the positive and negative classes is equally important.

    • @statquest
      @statquest  3 years ago +1

      It really depends on your goals. In general, Logistic Regression performs better. However, depending on what threshold works best for you, you may still choose Random Forests if it performs better at that threshold.

  • @Xenoni
    @Xenoni 3 years ago +1

    BAM !!!!!!! Indeed

  • @viranchivedpathak4231
    @viranchivedpathak4231 1 year ago +2

    Bouble Dam!!

  • @kashifjavedlone1780
    @kashifjavedlone1780 5 years ago +1

    Love You

  • @elmonovagales2929
    @elmonovagales2929 5 years ago +1

    I got an error, Error in roc.data.frame(trainData, fitModelTrai$votes[, 1], plot = TRUE, :
    'response' argument should be the name of the column, optionally quoted. the only difference between your code and mine is that I have many parameters/columns/features (approx 35) not only one (weight)
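
The error message mentions roc.data.frame, which suggests that a whole data frame was passed as the first argument to roc(), so pROC looked for column names instead of vectors. A hedged sketch of the vector form with several features follows; the data frame train.data, its columns, and the logistic model are hypothetical stand-ins, not the commenter's actual objects.

```r
library(pROC)

# Hypothetical training data with more than one feature.
set.seed(420)
n <- 100
train.data <- data.frame(feature.1 = rnorm(n), feature.2 = rnorm(n))
train.data$outcome <- rbinom(n, size = 1, prob = plogis(train.data$feature.1))

fit <- glm(outcome ~ feature.1 + feature.2, data = train.data, family = binomial)

# Pass the response as a vector (one column), not the whole data frame,
# followed by one numeric score per row.
roc.obj <- roc(response = train.data$outcome, predictor = fit$fitted.values)
print(auc(roc.obj))
```

The number of features does not matter to roc(); it only needs the known classes and a single predicted score per sample.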

  • @amulyagupta9161
    @amulyagupta9161 1 year ago

    Hey! Wonderful video. I had just one doubt- I used a similar code that you used in my Rstudio. And as the runif function is generating random numbers, I could have very well expected that the values in the obese variable is different from the ones generated in your machine. However, eerily enough, it came out to be exactly the same. What sort of sorcery is this? 😮

    • @statquest
      @statquest  1 year ago

      Did you set the seed of the random number generator? If so, we'll get the same random numbers every time.
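
A tiny sketch of that behaviour in base R; the seed value 420 is an arbitrary choice for illustration.

```r
# Setting the same seed before each draw reproduces the same "random" numbers.
set.seed(420)
first.draw <- runif(n = 5)

set.seed(420)
second.draw <- runif(n = 5)

print(identical(first.draw, second.draw))

# Drawing again WITHOUT resetting the seed continues the sequence instead.
third.draw <- runif(n = 5)
print(identical(first.draw, third.draw))
```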

  • @tynna333
    @tynna333 5 years ago

    Is there any way to suppress plotting the top and right axes? I tried bty='n' and axes=FALSE to add them later using axis(1) and axis(2) but neither of those worked.

  • @dariatriffon6335
    @dariatriffon6335 4 years ago

    Hi and thanks for your great videos! Could you please elaborate on the obese variable, and specifically on the "test" part in that line of code? What if I already know who is obese and who is not (based on some external medical profile, let's say "real") and I want to evaluate the predictions of a model that is based on some score (let's say "score") that each individual has? Would I just do glm(real ~ score)? What if I wanted to find the best score, the score above which I classify someone as "obese" and below which "not obese"? What is the relationship between the probability threshold in the ROC curve and a threshold on the score itself? Thanks!

    • @statquest
      @statquest  4 years ago +1

      In order to draw this ROC graph, we have to know who is obese and who is not to begin with. So the situation in this video is no different from yours. If you want to find the "best" score, you have to then decide what percentage of false positives and false negatives you are willing to live with - the ROC graph will help you decide that. You can then find the corresponding value by looking at the thresholds and the probabilities predicted by your model for different scores.

  • @jarednesvet2826
    @jarednesvet2826 5 years ago +1

    So do these thresholds correlate to the probabilities that are used to separate the obese vs. not obese? Is there a way to figure out how to convert the thresholds back to the actual weights themselves that are used as the cutoff?

    • @statquest
      @statquest  5 years ago

      The thresholds, with the exception of -infinity and +infinity, are the exact same as the probabilities. -infinity corresponds to a probability of 0 and +infinity corresponds to a probability of 1. Thus, you can compare thresholds to the original glm.fit$fitted.values and match those to the original array of "weight" values.

    • @jarednesvet2826
      @jarednesvet2826 5 years ago +1

      @statquest Great thanks for the help!

    • @redgreenskittles
      @redgreenskittles 5 years ago

      @statquest Many thanks for a great video. Could you kindly explain how exactly we can do this? I am looking to convert these thresholds to actual cut-off values

    • @statquest
      @statquest  5 years ago +2

      @redgreenskittles First, I would look at the ROC curve to find my threshold. For example, we might pick a False Positive Percentage of 20 to be the threshold.
      Then I would look in roc.info to find the threshold associated with that false positive percentage. We can do that by just printing roc.info to the screen and looking at it, or with the command...
      roc.df[min(which(roc.df$fpp

    • @redgreenskittles
      @redgreenskittles 5 years ago +1

      @statquest Wow that was a super quick response. Works like a treat! thank you

  • @hannahhillman3593
    @hannahhillman3593 1 year ago

    Is it expected that the number of sensitivity/specificity values determined by the roc function (that we stored in the data frame) may not match the number of predictor/response values that I input? For example, my input predictor/response vectors contained 46 objects, but the roc function returned only 12 sensitivity/specificity values.

    • @statquest
      @statquest  1 year ago

      I believe this is possible if there are fewer thresholds that make a difference. In other words, some thresholds might result in the same number of false positives, true positives etc. and in that case, those "duplicate" thresholds will be omitted.

    • @hannahhillman3593
      @hannahhillman3593 1 year ago +1

      @statquest Okay great this is exactly what I thought was happening--just wasn't sure if that was a possible outcome. Thanks so much for your reply and for all the great videos!!!
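
One way this can happen, sketched in base R: when predictor values are tied, many observations share a value, so there are far fewer distinct cut points than observations. The data below are hypothetical (46 observations with at most 5 distinct scores), and whether pROC prunes thresholds in exactly this way is an assumption; the sketch only shows why fewer rows than observations is plausible.

```r
# Hypothetical data: 46 observations, but scores take at most 5 distinct values.
set.seed(42)
n <- 46
scores <- sample(1:5, size = n, replace = TRUE)
labels <- rbinom(n, size = 1, prob = scores / 6)

# Candidate thresholds: midpoints between distinct scores, plus the two extremes.
distinct <- sort(unique(scores))
cuts <- c(-Inf, (head(distinct, -1) + tail(distinct, -1)) / 2, Inf)

# One (threshold, sensitivity, specificity) row per candidate threshold.
roc.points <- t(sapply(cuts, function(cut) {
  predicted <- scores > cut
  c(threshold = cut,
    sensitivity = sum(predicted & labels == 1) / sum(labels == 1),
    specificity = sum(!predicted & labels == 0) / sum(labels == 0))
}))

print(nrow(roc.points))  # number of rows, compared with n observations
```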

  • @kirisakow
    @kirisakow 4 years ago

    13:47 - sorry I don't understand why, in `rf.model$votes`, choose column 1 (which is the column of zeros) and not column 2 (which is the column of ones)

    • @statquest
      @statquest  4 years ago

      Believe it or not, it doesn't matter which column you choose, both will give you the same ROC curve.
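
A hedged sketch of why that works with pROC: by default roc() picks the comparison direction automatically, so a score that is high for one class and its complement (high for the other class) describe the same separation. The two-column matrix below is a stand-in for rf.model$votes built from logistic regression probabilities, which is an assumption made to avoid depending on the randomForest package.

```r
library(pROC)

# Simulated data (seed, mean, and sd are assumptions for illustration).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < rank(weight) / num.samples, 1, 0)
glm.fit <- glm(obese ~ weight, family = binomial)

# Stand-in for a votes matrix: column "0" and column "1" sum to 1 in every row.
votes <- cbind("0" = 1 - glm.fit$fitted.values, "1" = glm.fit$fitted.values)

roc.col1 <- suppressMessages(roc(obese, votes[, 1]))  # fraction of votes for class 0
roc.col2 <- suppressMessages(roc(obese, votes[, 2]))  # fraction of votes for class 1

print(c(col1 = as.numeric(auc(roc.col1)), col2 = as.numeric(auc(roc.col2))))
```

If the direction were fixed by hand instead of detected automatically, the first column would give a mirrored curve, so the automatic default is what makes the choice of column harmless here.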

  • @nabilmahmoud608
    @nabilmahmoud608 4 years ago

    Hey Josh, is there a way to make inferences on more than two ROC and to perform multiple comparisons? (a generalization of DeLong's test? and maybe a method to adjust alpha for multiple comparisons too?)

    • @statquest
      @statquest  4 years ago +1

      Good question! Off the top of my head I don't know if there is or not.

  • @jeanmarysymon3596
    @jeanmarysymon3596 5 years ago +4

    could you do the same in python too?

  • @emkahuda776
    @emkahuda776 2 years ago

    Thank you for another great video. I have a question, what if we have multiple problems for classifications? Not only two classifications (obese and not obese). For example, we want to classify 10 cell types (let's say cell type 1, cell type 2, ..., cell type 10) whether these cell types are present or not in the tissue sample? How can we use this roc() function to plot the ROC curve?

    • @statquest
      @statquest  2 years ago

      To be honest, I don't know the answer to that off the top of my head.

    • @emkahuda776
      @emkahuda776 2 years ago

      @statquest I have made my own function to plot the ROC curve for a condition similar to the one I mentioned. However, I need to make another function to calculate the AUC and was hoping I could use the roc() function, which seems to provide more information, such as AUC and partial AUC as well. 😰
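
One possible approach for the "present or not" setting, sketched in base R under stated assumptions: treat each cell type as its own binary problem and compute one AUC per cell type. The simulated presence matrix, the scores, and the helper auc.rank() are all hypothetical; this is a one-ROC-per-class sketch, not a single multi-class ROC curve.

```r
# Hypothetical helper: AUC from ranks (Mann-Whitney U divided by n.pos * n.neg).
auc.rank <- function(labels, scores) {
  r <- rank(scores)
  n.pos <- sum(labels == 1)
  n.neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# Hypothetical data: 60 tissue samples, 3 cell types, presence coded as 0/1.
set.seed(1)
n <- 60
cell.types <- paste0("cell.type.", 1:3)
present <- matrix(rbinom(n * 3, size = 1, prob = 0.5), nrow = n,
                  dimnames = list(NULL, cell.types))

# Hypothetical classifier scores: noisy, but higher when the cell type is present.
scores <- present + matrix(rnorm(n * 3), nrow = n)

# One binary AUC per cell type.
auc.by.type <- sapply(cell.types, function(ct) {
  auc.rank(present[, ct], scores[, ct])
})
print(auc.by.type)
```

The same per-column vectors could be passed to roc() one cell type at a time to get the extra information, such as partial AUC, for each class.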