Tune multiple models simultaneously with GridSearchCV
- Published: Sep 11, 2024
- You can tune 2+ models using the same grid search! Here's how:
1. Create multiple parameter dictionaries
2. Specify the model within each dictionary
3. Put the dictionaries in a list
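The three steps above can be sketched as follows. This is a minimal, non-authoritative example, assuming a pipeline whose final step is named `clf` and using the iris dataset plus a logistic regression and a random forest purely for illustration. The model is passed as a "parameter" of the `clf` step, so each dictionary swaps in its own estimator and tunes that estimator's hyperparameters.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The 'clf' step holds a placeholder model; each grid below replaces it
pipe = Pipeline([('scale', StandardScaler()), ('clf', LogisticRegression(max_iter=1000))])

# 1. One parameter dictionary per model
params_lr = {
    'clf': [LogisticRegression(max_iter=1000)],  # 2. the model itself is a parameter value
    'clf__C': [0.1, 1, 10],
}
params_rf = {
    'clf': [RandomForestClassifier(random_state=0)],
    'clf__n_estimators': [50, 100],
}

# 3. Put the dictionaries in a list and pass the list as param_grid
grid = GridSearchCV(pipe, param_grid=[params_lr, params_rf], cv=5)
grid.fit(X, y)

print(grid.best_params_)
print(grid.best_score_)
```

Because both models are evaluated by one search on the same cross-validation splits, `best_params_` and `best_score_` directly report the winning model and its settings.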
👉 New tips every TUESDAY and THURSDAY! 👈
🎥 Watch all tips: • scikit-learn tips
🗒️ Code for all tips: github.com/jus...
💌 Get tips via email: scikit-learn.tips
=== WANT TO GET BETTER AT MACHINE LEARNING? ===
1) LEARN THE FUNDAMENTALS in my intro course (free!): courses.datasc...
2) BUILD YOUR ML CONFIDENCE in my intermediate course: courses.datasc...
3) LET'S CONNECT!
- Newsletter: www.dataschool...
- Twitter: / justmarkham
- Facebook: / datascienceschool
- LinkedIn: / justmarkham
Thanks for watching! 🙌 If you're brand new to GridSearchCV, I recommend starting with this tutorial instead: ruclips.net/video/Gol_qOgRqfA/видео.html
I was literally on the hunt for a way to train 4 different models with different parameters all at once, and then you uploaded this. Perfect timing! Thanks, man! Love your content!
That's awesome to hear! So glad I could be helpful, and thanks for your kind words 🙏
OMG, a classic "YouTuber explains how to add 1+1" BUT NO ONE ELSE SAYS HOW TO ADD 1+1!!! Thanks a bunch; you most likely saved me two hours of frustrating trial and error.
Happy to help! 🙌
You're a good boy; this has streamlined my PhD research.
Thank you!
Thanks for saving us time; I used to do this with loops.
You're very welcome!
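For comparison, the loop-based approach mentioned here might look like the sketch below. It is an illustrative assumption (iris data, a logistic regression and a random forest), not code from the video: each model gets its own `GridSearchCV`, and the winner has to be picked by hand afterwards.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# One (model, grid) pair per model
candidates = [
    (LogisticRegression(max_iter=1000), {'C': [0.1, 1, 10]}),
    (RandomForestClassifier(random_state=0), {'n_estimators': [50, 100]}),
]

# A separate search per model; results are collected manually
results = {}
for model, param_grid in candidates:
    search = GridSearchCV(model, param_grid, cv=5)
    search.fit(X, y)
    results[type(model).__name__] = (search.best_score_, search.best_params_)

# Manual comparison step that a single search over a list of dicts avoids
best_name = max(results, key=lambda name: results[name][0])
print(best_name, results[best_name])
```

Passing a list of dictionaries to one `GridSearchCV` does the same comparison internally and exposes a single `best_estimator_`.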
Amazing video; there are very few videos on such niche topics on YouTube.
I had one question: I didn't understand the placeholder part at 2:15.
Thanks, helpful
You're welcome!
Nice tip!!
I'm solving a multi-output classification problem, so I'm using MultiOutputClassifier(), and I think that's what breaks my code when I try this solution. It looks something like this:
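from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# tokenize, lr_clf, params_svc, params_rf, X_train and y_train are defined elsewhere
pipeline = Pipeline([
    ('vect', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()),
    ('classifier', MultiOutputClassifier(lr_clf))
])

# parameter dict for logistic regression; parameters of the model wrapped
# by MultiOutputClassifier use the 'classifier__estimator__' prefix
params_lr = {
    'vect__decode_error': ['strict', 'ignore', 'replace'],
    'tfidf__norm': ['l1', 'l2'],
    # 'l1' needs a solver that supports it, e.g. 'liblinear' or 'saga'
    'classifier__estimator__penalty': ['l1', 'l2'],
    'classifier__estimator__C': [0.1, 1, 10],
    'classifier__estimator': [lr_clf]
}

# other dicts for different models

# list of parameter dicts
parameters = [params_lr, params_svc, params_rf]

cv = GridSearchCV(pipeline, param_grid=parameters, n_jobs=-1)
cv.fit(X_train, y_train)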
Any tips on this? I think GridSearchCV doesn't understand the MultiOutputClassifier.
Thanks in advance!!
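One way to combine the tip with `MultiOutputClassifier` is sketched below, under stated assumptions: synthetic data from `make_multilabel_classification` stands in for the commenter's text data, `StandardScaler` stands in for the text-processing steps, and the two models are illustrative. The idea is that every dictionary keeps the `MultiOutputClassifier` wrapper and swaps only its inner model through the `clf__estimator` key, rather than replacing the whole `clf` step with a bare model that may not handle multiple outputs.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): Y has one column per output
X, Y = make_multilabel_classification(n_samples=200, n_features=10, n_classes=3, random_state=0)

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', MultiOutputClassifier(LogisticRegression(max_iter=1000))),
])

# Each dict keeps the wrapper and replaces only its inner model via
# 'clf__estimator'; that model's own parameters use 'clf__estimator__<param>'
param_grid = [
    {
        'clf__estimator': [LogisticRegression(max_iter=1000)],
        'clf__estimator__C': [0.1, 1, 10],
    },
    {
        'clf__estimator': [RandomForestClassifier(random_state=0)],
        'clf__estimator__n_estimators': [50, 100],
    },
]

search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
search.fit(X, Y)
print(search.best_params_)
```

If the other dictionaries (such as `params_svc` and `params_rf` above) set `'classifier'` to a bare model instead of setting `'classifier__estimator'`, the wrapper is dropped for those candidates, which is one plausible source of the error.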
I don't doubt this works, but it seems a bit odd to specify an instantiated classifier in the pipeline only to override it with another instantiated classifier. Is there a way to make the classifier in the pipeline a generic placeholder?
You could wrap it in a loop.
@DanielWeikert No need; I realized my question is kinda silly, considering that it's no different than simply overriding default parameters. Instead of specifying attributes, each classifier is an object.
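The point about overriding parameters can be illustrated with `Pipeline.set_params`: replacing a whole step uses the same mechanism as changing one of its hyperparameters. This small sketch assumes a step named `clf` and two arbitrary models; it does no fitting.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([('scale', StandardScaler()), ('clf', LogisticRegression())])

# Overriding a hyperparameter of the current 'clf' step...
pipe.set_params(clf__C=0.5)

# ...and overriding the step itself: the step name is just another
# parameter whose value happens to be an estimator object
pipe.set_params(clf=RandomForestClassifier(n_estimators=10))

print(type(pipe.named_steps['clf']).__name__)  # RandomForestClassifier
```

GridSearchCV applies each candidate's parameter dictionary through this same `set_params` call, which is why the model in the pipeline only acts as a placeholder.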