How to Select the BEST Threshold for Your Model Using the ROC Curve
- Published: Jul 10, 2024
- In this video I explain how we can select the best threshold by looking at the receiver operating characteristic (ROC) curve, and how we can vary the true positive rate (TPR) and the false positive rate (FPR) to adapt our model's prediction to the business problem we are trying to solve.
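As a rough illustration of the TPR and FPR definitions mentioned above, here is a minimal sketch in plain Python. The labels and scores are made-up toy data, and the helper name `tpr_fpr` is hypothetical (it is not taken from the video).

```python
def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


# Toy data, invented for illustration: 1 = positive class, 0 = negative class.
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.9]

# Lowering the threshold can only raise (or keep) both TPR and FPR.
for threshold in (0.7, 0.5, 0.3):
    tpr, fpr = tpr_fpr(y_true, scores, threshold)
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Each threshold gives one (FPR, TPR) point; plotting those points for all thresholds traces the ROC curve.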
Notes
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
At 4:27 I say "model" twice ("the model model begins"). This is due to an editing error on my side. Sorry about that.
Related Videos
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Why We Divide by N-1 in the Sample Variance: • Why We Divide by N-1 i...
The Bias Variance Tradeoff: • Why Models Overfit and...
Multivariate Normal Distribution: • Multivariate Normal (G...
Estimated Calibrated Error (ECE): • Estimated Calibration ...
Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Intro
00:46 - Business problem influences the threshold
02:08 - ROC curve intro
02:46 - TPR and FPR definitions
03:23 - The three phases of the ROC Curve
05:35 - AUC and model selection using the ROC Curve
06:54 - Outro
Follow Me
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: @datamlistic / datamlistic
📸 Instagram: @datamlistic / datamlistic
📱 TikTok: @datamlistic / datamlistic
Channel Support
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)
If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: / datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
#roc #threshold
*AWKWARD VIDEO MISTAKE* : The two models used for disease detection and hiring at 05:06 should be switched in the ROC plot. Sorry for any confusion this mistake may have caused you when watching this and thanks @benedict6695 for pointing it out!
Highly appreciated🙏
Glad you enjoyed it! :)
Thanks for the video. Did you end up making a video for ROC in multiclass too?
Thanks for the feedback! I haven't started making that video yet, but it's on my list. Stay tuned. :)
Can I use a similar approach for multiclass, but looking at the balanced accuracy metric?
You can use a one-vs-all (one-vs-rest) or a one-vs-one approach when computing the ROC for multiclass. :)
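A minimal sketch of the one-vs-all idea, assuming the model outputs one probability per class: each class in turn is treated as "positive" and all other classes as "negative", which gives one binary ROC (and one AUC) per class. The data and the helper names `auc` and `one_vs_rest_auc` are hypothetical. The AUC is computed here through its pairwise-ranking interpretation (the fraction of positive/negative pairs ranked correctly, with ties counted as half).

```python
def auc(y_binary, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_binary, scores) if y == 1]
    neg = [s for y, s in zip(y_binary, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, proba, classes):
    """Compute one binary AUC per class: that class vs. all the others."""
    result = {}
    for k, cls in enumerate(classes):
        y_binary = [1 if y == cls else 0 for y in y_true]
        class_scores = [row[k] for row in proba]
        result[cls] = auc(y_binary, class_scores)
    return result


# Toy 3-class example, invented for illustration.
classes = ["a", "b", "c"]
y_true = ["a", "b", "c", "a", "b", "c"]
proba = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]

aucs = one_vs_rest_auc(y_true, proba, classes)
print(aucs)
```

In practice a library routine is usually preferable; for example, scikit-learn's `roc_auc_score` accepts `multi_class="ovr"` or `multi_class="ovo"`.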
Hi, hope you are doing well!
I'm still a bit confused. Don't you think that the models for disease detection and hiring are switched? (05:06)
We always take the alternative hypothesis (H1) to be the condition that we want to predict. In other words, innocent (H0) until proven guilty (H1). Hence:
*Disease detection model*
Assumed:
H0 = No disease
H1 = Disease
It's okay to have more false positives (rejecting a true H0) than false negatives (accepting a false H0). In other words, it's okay for the model to flag more people as having the disease (H1) than as healthy (H0), since a missed disease is more costly. Thus, a higher false positive rate (FPR) is acceptable in this case, and the model should be placed in the "Upper Right" instead of the "Lower Left".
*Hiring model*
Assumed:
H0 = Do not Hire
H1 = Hire
It's okay to have more false negatives (accepting a false H0) than false positives (rejecting a true H0). In other words, it's okay for the model to classify more candidates as "Do not Hire" (H0) than as "Hire" (H1), since hiring underperforming people is more costly. Thus, a higher false negative rate is acceptable in this case, and the model should be placed in the "Lower Left" instead of the "Upper Right" (Lower Left = low FPR = higher false negative rate).
Thanks in advance! Please correct me if I'm wrong :)
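The cost argument in the comment above can be sketched numerically: give false positives and false negatives different costs and pick the threshold with the lowest total cost. Everything here is an assumption for illustration (the toy data, the 10:1 cost ratios, and the helper name `best_threshold`); it is not the method used in the video.

```python
def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Return the candidate threshold that minimizes cost_fp * FP + cost_fn * FN."""
    candidates = sorted(set(scores), reverse=True)

    def total_cost(t):
        # Predict positive when score >= t.
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
        return cost_fp * fp + cost_fn * fn

    return min(candidates, key=total_cost)


# Toy data, invented for illustration: 1 = H1 (disease / hire), 0 = H0.
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.9]

# Disease detection: a missed case (false negative) is assumed 10x as costly.
disease_threshold = best_threshold(y_true, scores, cost_fp=1, cost_fn=10)

# Hiring: a bad hire (false positive) is assumed 10x as costly.
hiring_threshold = best_threshold(y_true, scores, cost_fp=10, cost_fn=1)

print(disease_threshold, hiring_threshold)
```

With these assumed costs the disease model ends up with the lower threshold (toward the upper right of the ROC plot) and the hiring model with the higher one (toward the lower left).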
Thank you so much for this detailed feedback, and sorry that it took me so long to respond. I've been really busy at my job this week, and only today did I find a little spare time to go through this example again and check whether it's correct.
Well, after doing that, I have to awkwardly admit that you are indeed correct: the two models at 05:06 should be switched. If you look at 1:45, I say that you need to set a low threshold for the disease detection model and a high one for the hiring model. Then, starting at 03:25, I say that we start with the maximum threshold in the ROC curve and then decrease it. Finally, for some reason, at 05:06 I just switch the two models (don't ask me why, most likely I didn't pay enough attention when creating that part of the video).
All in all, I will pin a comment to this video where I explain this mistake, so others don't get confused. Thanks again for pointing this out!
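To make the corrected placement concrete, here is a small sketch of the sweep described above: start above the maximum score (nothing predicted positive, so the ROC point is at the lower left) and decrease the threshold until everything is predicted positive (upper right). The toy data, the target values (TPR of at least 0.95, FPR of at most 0.25), and the helper name `roc_sweep` are assumptions for illustration.

```python
def roc_sweep(y_true, scores):
    """Walk thresholds from above the max score down to the min score.

    Returns a list of (threshold, TPR, FPR) tuples, from (0, 0) to (1, 1).
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(float("inf"), 0.0, 0.0)]  # threshold above every score
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((t, tp / positives, fp / negatives))
    return points


# Toy data, invented for illustration.
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.9]
points = roc_sweep(y_true, scores)

# Disease detection: highest threshold that still reaches the target TPR.
disease = next(p for p in points if p[1] >= 0.95)

# Hiring: lowest threshold whose FPR still stays within the cap.
# FPR never decreases along the sweep, so the last match is the lowest one.
hiring = [p for p in points if p[2] <= 0.25][-1]

print("disease (threshold, TPR, FPR):", disease)
print("hiring  (threshold, TPR, FPR):", hiring)
```

On this toy data the disease operating point sits at a lower threshold, further toward the upper right, than the hiring operating point, which matches the correction in the pinned comment.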
@datamlistic Thank you so much for all of your videos, mate. I've learned a lot. I really appreciate that you share this knowledge with all of us. In addition, thank you for taking the time to respond to me and for making these videos.
Keep up the good work man! Cheers!
Thank you so much for your kind words! I really appreciate it! Also, I am super happy that you find the content I make on this channel helpful! :)