sklearn Logistic Regression hyperparameter optimization

  • Published: 24 Mar 2020
  • sklearn's LogisticRegression has many hyperparameters we can tune to obtain better performance. Some of the most important ones are penalty, C, solver, max_iter and l1_ratio. In this video, we tune these hyperparameters for logistic regression and find the best combination using GridSearchCV.
    This ensures we get the best logistic regression model for the data and, most likely, the best accuracy.
    Kaggle Kernel: www.kaggle.com/funxexcel/p2-l...
    GitHub: github.com/KunaalNaik/RUclips...
    #logisticregression #machinelearning #sklearn
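The workflow the video describes can be sketched as follows. This is a minimal, illustrative example: the breast-cancer dataset, the grid values, and the pipeline step names are my own choices, not the ones from the video or the Kaggle kernel.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scale inside a pipeline so scaling is refit per CV fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Keep solver/penalty combinations compatible: lbfgs and liblinear
# both support the l2 penalty.
param_grid = {
    "clf__C": [0.01, 0.1, 1, 10],
    "clf__penalty": ["l2"],
    "clf__solver": ["lbfgs", "liblinear"],
}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Each combination is evaluated by 5-fold cross-validation, and `best_params_` / `best_score_` report the winner.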

Comments • 53

  • @KunaalNaik
    @KunaalNaik  2 years ago +2

    Want to learn Data Science effectively and show confidence during interviews?
    Download the 6-Step Strategy to master Data Science through non-linear methods of learning.
    Download Link - kunaalnaik.com/learn-data-science-youtube/

  • @18meiy13
    @18meiy13 3 years ago +1

    Thank you so much for the video, very well explained with simple words.

  • @Webenefit_Youtube
    @Webenefit_Youtube 1 month ago +1

    Thank you so much for the video, very well explained

  • @rishabhpanesar3683
    @rishabhpanesar3683 3 years ago +1

    Very Helpful Video!
    Thanks for posting this video! Helped a lot!!

  • @rohitkhan1399
    @rohitkhan1399 11 months ago

    Cool! Very simple and effective, sir.

  • @antonis7x
    @antonis7x 2 years ago +1

    Thank you so much!

  • @apoorva3635
    @apoorva3635 2 years ago +1

    Thanks. Crisp & clear explanation.

  • @sudaniscience249
    @sudaniscience249 2 years ago +1

    you saved my life tonight

  • @AaSinSin137
    @AaSinSin137 3 years ago +2

    Thank you so much... very much appreciated.

  • @dok3820
    @dok3820 2 years ago

    I appreciate your video, sir... but may I ask how you chose the values you put in the parameter grid?

  • @user-cz5nh1kq9x
    @user-cz5nh1kq9x 1 year ago

    Thanks

  • @user-vo1kp6mh3m
    @user-vo1kp6mh3m 7 months ago

    Question: I reached a penalty of 'l1'... so how do I fix that?

  • @sachink110
    @sachink110 2 years ago

    Hello Kunaal, nice informative video. Can I use RandomizedSearchCV for decision tree/SVM/logistic regression/NLP to check hyperparameters? If yes, does the same parameter code need to be used for all of them, or should different parameters go in that hyperparameter selection bracket?

  • @varuntandon4465
    @varuntandon4465 4 years ago

    Could hyperparameter tuning cause overfitting on the training data? Shouldn't you use a subset of the data to tune your hyperparameters (a validation set)?

    • @KunaalNaik
      @KunaalNaik  4 years ago

      Agreed, you should use a validation set. That works for some models like XGBoost, LightGBM, etc.; for a basic model we just tune the parameters directly.
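The validation-set point can be sketched like this: hold out a test set first, and let GridSearchCV's internal cross-validation play the role of the validation set, so the held-out data gives an unbiased final score. Dataset and grid values here are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set before any tuning happens.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# GridSearchCV's internal folds act as the validation set.
grid = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1, 10]}, cv=3)
grid.fit(X_tr, y_tr)

# The untouched test set gives the final, unbiased estimate.
test_acc = grid.score(X_te, y_te)
```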

  • @ganeshprabhakaran9316
    @ganeshprabhakaran9316 4 years ago +2

    Simple and neat explanation, brother. Sounds great. Kindly suggest parameters for a Voting Classifier.

    • @KunaalNaik
      @KunaalNaik  3 years ago

      Hi Ganesh, I am opening a tight group of learners on whatsapp to actively engage and mentor. You can join if you are interested - bit.ly/DataScience_Kunaal_Whatsapp

    • @KunaalNaik
      @KunaalNaik  3 years ago

      Will try another video :)

  • @sshahidmalik97
    @sshahidmalik97 2 years ago +1

    Hello sir, I used almost the same param_grid that you showed. I was working in Google Colab and GridSearchCV was fitting different combinations for about 3h 40m; it crashed at the end. What can or should I do to get the results? (My training dataset contained 17,145 examples.) Thanks.

    • @KunaalNaik
      @KunaalNaik  2 years ago

      Hi Shahid, try reducing the number of combinations. Take only two variations of each parameter and make it run.
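The suggestion above (two variations per parameter) might look like this; the specific values are illustrative.

```python
# A reduced grid along the lines suggested: only two variations
# per parameter, with solver/penalty pairs kept compatible.
param_grid = {
    "C": [0.1, 1],
    "penalty": ["l2"],
    "solver": ["lbfgs", "liblinear"],
}

# 2 * 1 * 2 = 4 combinations; with cv=5 that is only 20 model fits.
n_combinations = 1
for values in param_grid.values():
    n_combinations *= len(values)
n_fits = n_combinations * 5
print(n_fits)
```

Passing `n_jobs=-1` to GridSearchCV additionally parallelizes the fits across CPU cores.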

  • @vishnurajkr3769
    @vishnurajkr3769 3 years ago +2

    I got an error like this: "Input contains NaN, infinity or a value too large for dtype('float64')".

    • @KunaalNaik
      @KunaalNaik  3 years ago

      You might want to do missing value treatment before you run this code. Check this code out - www.kaggle.com/funxexcel/titanic-solution-random-forest

  • @vishnurajkr3769
    @vishnurajkr3769 3 years ago +1

    I got an error: "Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty".

    • @KunaalNaik
      @KunaalNaik  3 years ago +1

      lbfgs cannot be used with the l1 penalty; that is why you see the error. When you use lbfgs, remove l1 from the grid. For other solvers you can use l1.
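One common way to express this, keeping each solver paired only with penalties it supports, is to pass GridSearchCV a list of grids instead of a single dict. A sketch with an illustrative dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Each dict in the list is searched separately, so lbfgs is never
# combined with l1 and the "Solver lbfgs supports only 'l2' or
# 'none' penalties" error cannot occur.
param_grid = [
    {"solver": ["lbfgs"], "penalty": ["l2"], "C": [0.1, 1, 10]},
    {"solver": ["liblinear"], "penalty": ["l1", "l2"], "C": [0.1, 1, 10]},
]

grid = GridSearchCV(LogisticRegression(max_iter=500), param_grid, cv=3)
grid.fit(X, y)
```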

  • @rawindersingh4007
    @rawindersingh4007 3 years ago +2

    Hi there, I don't seem to get better performance using this. Do you have any idea why?

    • @KunaalNaik
      @KunaalNaik  3 years ago +2

      You need to apply these strategies:
      1/ Outlier Detection and imputation
      2/ Missing Value Imputation Strategy
      3/ Transformation Strategy
      4/ Cross-Validation Framework
      Iterate by changing the above first and check your performance.

    • @rawindersingh4007
      @rawindersingh4007 3 years ago +1

      @@KunaalNaik thank you 😇

  • @harshverma574
    @harshverma574 2 years ago +1

    What am I doing wrong if, after GridSearchCV, my accuracy for the model is decreasing?

    • @KunaalNaik
      @KunaalNaik  2 years ago

      Adjust the range of the parameters. It takes a while to hit the sweet spot :)

  • @amirhosseinrahimi3964
    @amirhosseinrahimi3964 1 year ago +2

    Thanks, one question: after tuning the model and finding the best hyperparameters, it is not necessary to run the model again with the best parameters for training, right? I mean, after using GridSearchCV, is the model already configured with the best parameters? Can you please elaborate?

    • @KunaalNaik
      @KunaalNaik  1 year ago +1

      Yes, you should rerun the model with the best parameters. In this case, it just uses the best version of the model for doing calculations. (Me being lazy :p)
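For reference, with GridSearchCV's default `refit=True` the best parameter combination is automatically retrained on the whole training set, so `grid.best_estimator_` and `grid.predict` can be used directly without a manual rerun. A sketch (dataset and grid are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(LogisticRegression(max_iter=500), {"C": [0.1, 1, 10]}, cv=3)
grid.fit(X, y)

# With refit=True (the default), the winning combination has already
# been retrained on all of X, y.
best_model = grid.best_estimator_
preds = grid.predict(X)  # delegates to best_estimator_
```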

    • @amirhosseinrahimi3964
      @amirhosseinrahimi3964 1 year ago +1

      @@KunaalNaik Thanks!
      So I choose a model, then find the best parameters, then train the model with the best parameters, and finally model.predict... right?

    • @KunaalNaik
      @KunaalNaik  1 year ago +1

      @@amirhosseinrahimi3964 You are right :) what model are you building?

    • @amirhosseinrahimi3964
      @amirhosseinrahimi3964 1 year ago +1

      @@KunaalNaik Thanks! In a classification problem, I am using Random Forest, Logistic Regression, and KNN to compare the F1 score...

    • @KunaalNaik
      @KunaalNaik  1 year ago

      @@amirhosseinrahimi3964 Which one is best?

  • @chetanmazumder310
    @chetanmazumder310 3 years ago

    Why can't I see the hyperparameters after .fit()?

    • @KunaalNaik
      @KunaalNaik  3 years ago

      I didn't quite get you, Chetan. Could you elaborate?

  • @ibrahimalhassan5996
    @ibrahimalhassan5996 2 years ago +1

    What happens if one of the X variables is nominal, for example sex (male/female)?

    • @KunaalNaik
      @KunaalNaik  2 years ago

      Convert it to a binary feature and then build the model.
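A sketch of that encoding, with made-up data; for features with more than two levels, one-hot encoding via pandas `get_dummies` is the usual alternative.

```python
import pandas as pd

# Made-up data: 'sex' is a nominal feature with two levels.
df = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# Two levels -> a single 0/1 column is enough.
df["sex_male"] = (df["sex"] == "male").astype(int)

# For nominal features with more than two levels, one-hot encode instead:
dummies = pd.get_dummies(df[["sex"]], columns=["sex"])
```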

  • @varunchakilam1294
    @varunchakilam1294 4 years ago +1

    It failed to perform for categorical data (a movie review classifier).

    • @KunaalNaik
      @KunaalNaik  4 years ago

      Does it have more than one category? If yes, then try some other algorithm.

  • @zimelkhan8928
    @zimelkhan8928 2 years ago +1

    Why didn't you pass scoring in GridSearchCV?

    • @KunaalNaik
      @KunaalNaik  2 years ago

      Most learners get confused about it when using it for the first time :)
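For readers with the same question: when scoring is omitted, GridSearchCV falls back to the estimator's default `.score` (accuracy for classifiers); a different metric can be requested explicitly. A sketch with illustrative data and grid:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# When scoring is omitted, the classifier's default accuracy score is
# used; here we ask GridSearchCV to rank candidates by F1 instead.
grid = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1]}, cv=3, scoring="f1")
grid.fit(X, y)
```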

  • @Mustistics
    @Mustistics 2 years ago +1

    I thought you'd explain the meaning of the hyperparams.

    • @7polletes
      @7polletes 1 year ago

      This is much better than that. Amazing trick.