Titanic Survival - Kaggle Logistic Regression Classification

  • Published: 2 Feb 2025

Comments • 11

  • @GregHogg
    @GregHogg  1 year ago

    Take my courses at mlnow.ai/!

  • @arsheyajain7055
    @arsheyajain7055 3 years ago

    Super helpful!

    • @GregHogg
      @GregHogg  3 years ago

      Thanks; glad to hear it!

  • @jutasiroland89
    @jutasiroland89 3 years ago

    Hello,
    I've done this project (thank you for it!), but I have a question:
    You said that when you do standardization on the numerical features of the training set, the calculated means and st. deviations should be used for the test set.
    What if I make the data transformation on the full data, then split it randomly 80-20, do the train on 80%, then test on the 20%?
    I mean this should not matter given the split is random and therefore sample means and st. deviations should be close? Or am I making a mistake here?
    Many thanks in advance!
    Roland J

    • @GregHogg
      @GregHogg  3 years ago +1

      I've always wondered about this myself. I personally agree with you that this should be fine. Thanks!

    • @jutasiroland89
      @jutasiroland89 3 years ago

      @@GregHogg Thank you for the confirmation! I got back a hit rate slightly above 80%, so I think it should be fine. The bottom line is to never use different means and stds if we split the data before normalization :)

    • @GregHogg
      @GregHogg  3 years ago +1

      Yes, I believe this is correct!
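
  The standardization discussion above can be sketched as follows. This is a minimal illustration on synthetic data (the column, seed, and split sizes are assumptions, not from the video): fit the mean and std on the training split only and reuse them on the test split; with a large random split, statistics fitted on the full data land very close anyway.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(loc=30.0, scale=12.0, size=(1000, 1))  # an Age-like column

  # Random 80-20 split, as in the question above
  idx = rng.permutation(len(X))
  train, test = X[idx[:800]], X[idx[800:]]

  # Recommended: fit the mean/std on the training split only, reuse on test
  mu, sigma = train.mean(axis=0), train.std(axis=0)
  train_scaled = (train - mu) / sigma
  test_scaled = (test - mu) / sigma

  # Fitting on the full data instead gives nearly identical statistics
  # when the split is random and the sample is reasonably large
  mu_full, sigma_full = X.mean(axis=0), X.std(axis=0)
  print(abs(mu[0] - mu_full[0]))  # small difference
  ```

  Strictly speaking, fitting the scaler on the full data before splitting leaks test information into the transform; with a large random split the effect is negligible, which matches the consensus reached in this thread.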

  • @denisvoronov6571
    @denisvoronov6571 3 years ago +1

    I tried to submit gender_submission.csv and accuracy on the test data was 0.765, which is even higher than with sub.csv :) But it is a good starting point to increase results.

    • @GregHogg
      @GregHogg  3 years ago +1

      That's funny, the base solution is better than the logistic regression? Lmao

  • @Ashutoshprusty
    @Ashutoshprusty 3 years ago

    Why not use pandas' get_dummies function for encoding?
    pd.get_dummies(train_pd.Pclass, drop_first=True)

    • @GregHogg
      @GregHogg  3 years ago +1

      Sure, there's many different ways to do these things - go for it if you prefer :)
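
  The get_dummies suggestion above can be sketched like this (the toy `train_pd` frame is a stand-in for the Kaggle training data, not the actual dataset):

  ```python
  import pandas as pd

  # Toy stand-in for the Kaggle training DataFrame
  train_pd = pd.DataFrame({"Pclass": [1, 2, 3, 3, 1]})

  # One-hot encode Pclass; drop_first=True drops the redundant
  # first level (class 1), leaving indicator columns for 2 and 3
  dummies = pd.get_dummies(train_pd.Pclass, drop_first=True)
  print(dummies.columns.tolist())  # [2, 3]
  ```

  One caveat when encoding train and test separately: a class missing from one split produces mismatched columns, so it is safer to call get_dummies on the combined data or reindex the test columns to match the train columns.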