How to Select the BEST Threshold for Your Model Using the ROC Curve

  • Published: Jul 10, 2024
  • In this video I explain how we can select the best threshold by looking at the receiver operating characteristic (ROC) curve, and how we can vary the true positive rate (TPR) and the false positive rate (FPR) to adapt our model's prediction to the business problem we are trying to solve.
    Notes
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    At 4:27 I say twice "the model model begins". This is due to an editing error from my side. Sorry about that.
    Related Videos
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    Why We Divide by N-1 in the Sample Variance: • Why We Divide by N-1 i...
    The Bias Variance Tradeoff: • Why Models Overfit and...
    Multivariate Normal Distribution: • Multivariate Normal (G...
    Estimated Calibrated Error (ECE): • Estimated Calibration ...
    Contents
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    00:00 - Intro
    00:46 - Business problem influences the threshold
    02:08 - ROC curve intro
    02:46 - TPR and FPR definitions
    03:23 - The three phases of the ROC Curve
    05:35 - AUC and model selection using the ROC Curve
    06:54 - Outro
    Follow Me
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    🐦 Twitter: @datamlistic / datamlistic
    📸 Instagram: @datamlistic / datamlistic
    📱 TikTok: @datamlistic / datamlistic
    Channel Support
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    The best way to support the channel is to share the content. ;)
    If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
    ► Patreon: / datamlistic
    ► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
    ► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
    ► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
    ► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
    #roc #threshold
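
The description mentions varying the TPR and FPR by moving the threshold. A minimal sketch of that trade-off, assuming binary labels (1 = positive) and a model that outputs a score per example; the function name `tpr_fpr` and the toy data are made up for illustration and are not taken from the video:

```python
def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when every score >= threshold is predicted positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    # Guard against a class being absent from the data.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy data, invented for this example.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

# Lowering the threshold raises both TPR and FPR.
for threshold in (0.8, 0.5, 0.2):
    print(threshold, tpr_fpr(y_true, scores, threshold))
```

On this toy data a threshold of 0.8 catches one of the three positives with no false alarms, while 0.2 catches all three at the price of two false positives out of three negatives.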

Comments • 11

  • @datamlistic
    @datamlistic  11 months ago +3

    *AWKWARD VIDEO MISTAKE* : The two models used for disease detection and hiring at 05:06 should be switched in the ROC plot. Sorry for any confusion this mistake may have caused you when watching this and thanks @benedict6695 for pointing it out!

  • @mhaya1
    @mhaya1 7 months ago

    Highly appreciated🙏

    • @datamlistic
      @datamlistic  7 months ago

      Glad you enjoyed it! :)

  • @speed-stick
    @speed-stick 8 months ago

    Thanks for the video. Did you end up making a video for ROC in multiclass too?

    • @datamlistic
      @datamlistic  8 months ago +1

      Thanks for the feedback! I haven't started making that video yet, but it's on my list. Stay tuned. :)

  • @postnutclarity00
    @postnutclarity00 3 months ago

    Can I use a similar approach for multiclass, but looking at the balanced accuracy metric?

    • @datamlistic
      @datamlistic  3 months ago

      You can use a one vs all or one vs one approach when computing the ROC for multiclass. :)
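
A rough sketch of the one-vs-all idea from the reply above, assuming the model outputs one probability per class. Each class is treated in turn as the positive class against all others, and the area under its ROC curve is computed from pairwise rankings. The helper names and the toy probabilities are hypothetical:

```python
def binary_auc(y_binary, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    pos = [s for y, s in zip(y_binary, scores) if y == 1]
    neg = [s for y, s in zip(y_binary, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, probs, classes):
    """Return {class: AUC}, treating each class as positive against the rest."""
    result = {}
    for k, cls in enumerate(classes):
        y_binary = [1 if y == cls else 0 for y in y_true]
        class_scores = [row[k] for row in probs]  # predicted probability of cls
        result[cls] = binary_auc(y_binary, class_scores)
    return result


# Toy three-class data, invented for this example.
classes = ["a", "b", "c"]
y_true = ["a", "b", "c", "a", "b", "c"]
probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]

per_class = one_vs_rest_auc(y_true, probs, classes)
macro_auc = sum(per_class.values()) / len(per_class)
print(per_class, macro_auc)
```

Each per-class curve can then be inspected for its own threshold, in the same way as in the binary case.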

  • @benedict6695
    @benedict6695 11 months ago

    Hi, hope you are doing well!
    I'm still a bit confused. Don't you think that the models for disease detection and hiring are switched? (05:06)
    We always take the alternative hypothesis (H1) to be the condition that we want to predict. In other words, innocent (H0) until proven guilty (H1). Hence:
    *Disease detection model*
    Assumed:
    H0 = No disease
    H1 = Disease
    It's okay to have more False Positives (rejecting a true H0) than False Negatives (accepting a false H0). In other words, it's okay for the model to flag more people as having the disease (H1) than as having no disease (H0), since missing a real case is more costly. Thus, a higher False Positive Rate (FPR) is acceptable in this case. The model should be placed in the "Upper Right" instead of the "Lower Left".
    *Hiring model*
    Assumed:
    H0 = Do not Hire
    H1 = Hire
    It's okay to have more False Negatives (accepting a false H0) than False Positives (rejecting a true H0). In other words, it's okay for the model to classify more candidates as "Do not Hire" (H0) than as "Hire" (H1), since hiring underperforming people is more costly. Thus, a higher False Negative Rate is acceptable in this case. The model should be placed in the "Lower Left" instead of the "Upper Right" (Lower Left = low FPR = higher False Negative Rate).
    Thanks in advance! Please correct me if I'm wrong :)
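
The cost argument in this comment can be sketched numerically. The snippet below is only an illustration: the cost weights (10 vs. 1) and the toy data are hypothetical, and minimizing a weighted count of false positives and false negatives is one simple way to formalize "which error is more costly", not something shown in the video.

```python
def pick_threshold(y_true, scores, cost_fp, cost_fn):
    """Return (threshold, cost) minimizing cost_fp * FP + cost_fn * FN.

    Scores >= threshold are predicted positive. Candidate thresholds are
    the observed scores; ties go to the lowest threshold.
    """
    best_threshold, best_cost = None, float("inf")
    for threshold in sorted(set(scores)):
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Toy data, invented for this example (1 = disease / hire).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

# Disease detection: a missed case (FN) is assumed 10x as costly as a false alarm.
disease_threshold, _ = pick_threshold(y_true, scores, cost_fp=1, cost_fn=10)

# Hiring: a bad hire (FP) is assumed 10x as costly as a missed candidate.
hiring_threshold, _ = pick_threshold(y_true, scores, cost_fp=10, cost_fn=1)

print(disease_threshold, hiring_threshold)
```

With these assumed costs the disease model ends up with the lower threshold (toward the upper right of the ROC curve) and the hiring model with the higher one (toward the lower left), which matches the placement argued for above.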

    • @datamlistic
      @datamlistic  11 months ago

      Thank you so much for this detailed feedback and so sorry that it took me so long to respond. I've been really busy this week at my job and only today I've found a little bit of spare time to go again through this example and check if it's correct or not.
      Well, after doing that, I have to awkwardly admit that you are indeed correct, the two models at 05:06 should be switched. If you look at 1:45, I say that you need to set a low threshold for the disease detection model and a high one for the hiring model. Then, starting with 03:25, I say that we start with the maximum threshold in the ROC curve and then decrease it. Finally, for some reason, at 05:06 I just switch the two models (don't ask me why, most likely I didn't pay enough attention when creating that part of the video).
      All in all, I will pin a comment to this video where I explain this mistake, so others don't get confused. Thanks again for pointing this out!
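
The sweep described in this reply (start at the maximum threshold, then decrease it) can be sketched as follows. This is a minimal illustration on made-up data, assuming scores >= threshold are predicted positive and that both classes are present; `roc_curve_points` is a hypothetical helper name.

```python
def roc_curve_points(y_true, scores):
    """Return (threshold, FPR, TPR) tuples, from the highest threshold down."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # A threshold above every score flags nothing: the lower-left corner.
    points = [(float("inf"), 0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


# Toy data, invented for this example.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

points = roc_curve_points(y_true, scores)
for threshold, fpr, tpr in points:
    print(f"threshold={threshold:<4} FPR={fpr:.2f} TPR={tpr:.2f}")
```

High thresholds sit near the lower left of the curve (low FPR, low TPR) and low thresholds near the upper right, so a low-threshold use case such as disease detection belongs toward the upper right and a high-threshold one such as hiring toward the lower left.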

    • @benedict6695
      @benedict6695 11 months ago

      @datamlistic Thank you so much for all of your videos, mate. I've learned a lot. I really appreciate that you share this knowledge with all of us. Also, thank you for taking the time to respond to me and for making these videos.
      Keep up the good work man! Cheers!

    • @datamlistic
      @datamlistic  11 months ago

      Thank you so much for your kind words! I really appreciate it! Also, I am super happy that you find the content I make on this channel helpful! :)