fit vs transform vs fit_transform | fit vs fit_transform | fit and fit_transform in sklearn

  • Published: 10 Sep 2024
  • fit vs transform vs fit_transform | fit vs fit_transform | fit and fit_transform in sklearn
    #machinelearning #datascience #unfolddatascience
    Hello,
    My name is Aman and I am a Data Scientist.
    All my data science courses at the most affordable price here: www.unfolddata...
    Topics for the video:
    fit transform fit transform
    fit vs fit_transform
    fit and fit_transform in sklearn
    fit vs fit transform sklearn
    fit vs transform vs fit_transform
    fit vs transform vs fit transform
    About Unfold Data Science: This channel helps people understand the basics of data science through simple examples. Anyone without prior knowledge of computer programming, statistics, machine learning, or artificial intelligence can get a high-level understanding of data science through this channel. The videos are not very technical, so viewers from different backgrounds can grasp them easily.
    Book recommendation for Data Science:
    Category 1 - Must Read For Every Data Scientist:
    The Elements of Statistical Learning by Trevor Hastie - amzn.to/37wMo9H
    Python Data Science Handbook - amzn.to/31UCScm
    Business Statistics By Ken Black - amzn.to/2LObAA5
    Hands-On Machine Learning with Scikit Learn, Keras, and TensorFlow by Aurelien Geron - amzn.to/3gV8sO9
    Category 2 - Overall Data Science:
    The Art of Data Science By Roger D. Peng - amzn.to/2KD75aD
    Predictive Analytics By Eric Siegel - amzn.to/3nsQftV
    Data Science for Business By Foster Provost - amzn.to/3ajN8QZ
    Category 3 - Statistics and Mathematics:
    Naked Statistics By Charles Wheelan - amzn.to/3gXLdmp
    Practical Statistics for Data Scientists By Peter Bruce - amzn.to/37wL9Y5
    Category 4 - Machine Learning:
    Introduction to machine learning by Andreas C Muller - amzn.to/3oZ3X7T
    The Hundred Page Machine Learning Book by Andriy Burkov - amzn.to/3pdqCxJ
    Category 5 - Programming:
    The Pragmatic Programmer by David Thomas - amzn.to/2WqWXVj
    Clean Code by Robert C. Martin - amzn.to/3oYOdlt
    My Studio Setup:
    My Camera : amzn.to/3mwXI9I
    My Mic : amzn.to/34phfD0
    My Tripod : amzn.to/3r4HeJA
    My Ring Light : amzn.to/3gZz00F
    Join Facebook group :
    www.facebook.c...
    Follow on medium : / amanrai77
    Follow on quora: www.quora.com/...
    Follow on twitter : @unfoldds
    Get connected on LinkedIn : / aman-kumar-b4881440
    Follow on Instagram : unfolddatascience
    Watch Introduction to Data Science full playlist here : • Data Science In 15 Min...
    Watch python for data science playlist here:
    • Python Basics For Data...
    Watch statistics and mathematics playlist here :
    • Measures of Central Te...
    Watch End to End Implementation of a simple machine learning model in Python here:
    • How Does Machine Learn...
    Learn Ensemble Model, Bagging and Boosting here:
    • Introduction to Ensemb...
    Build Career in Data Science Playlist:
    • Channel updates - Unfo...
    Artificial Neural Network and Deep Learning Playlist:
    • Intuition behind neura...
    Natural Language Processing playlist:
    • Natural Language Proce...
    Understanding and building recommendation system:
    • Recommendation System ...
    Access all my codes here:
    drive.google.c...
    Have a different question for me? Ask me here : docs.google.co...
    My Music: www.bensound.c...

Comments • 27

  • @HimanshuKumar-oi8qh
    @HimanshuKumar-oi8qh 1 year ago +10

    Use fit_transform on x_train and transform on x_test. Reason: fit_transform learns the parameters from x_train and transforms it. If we call fit_transform again on x_test, it will learn the parameters again, so we call only transform on x_test. The whole issue is overfitting. Hope this makes sense.

    • @UnfoldDataScience
      @UnfoldDataScience 1 year ago +1

      Yes, you understand the concepts well. The only thing to keep in mind is where we can use the "learned parameters" on new data and where we cannot.
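
The pattern described in this thread can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `StandardScaler`; the arrays are made-up toy data, not taken from the video.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): one feature, three training rows, two test rows.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0], [4.0]])

scaler = StandardScaler()

# fit_transform learns the mean and standard deviation from X_train,
# then scales X_train with them.
X_train_scaled = scaler.fit_transform(X_train)

# transform reuses the training mean/std; nothing is re-learned from X_test.
X_test_scaled = scaler.transform(X_test)

print("learned mean:", scaler.mean_)
print("learned std:", scaler.scale_)
print("scaled test rows:", X_test_scaled.ravel())
```

The test row equal to the training mean (2.0) maps to 0, which shows that the test set is measured against the training distribution rather than its own.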

  • @PreenitaBhattacharya
    @PreenitaBhattacharya 1 hour ago

    You are doing social work with such explanations, sir. Thank you very much.

  • @mushinart
    @mushinart 11 months ago +3

    After 2 long years... now I know the answer 😭... I'm grateful.

  • @shubhamagrawal7068
    @shubhamagrawal7068 1 year ago +5

    We can apply fit on the training data so that we have the parameter values. We can also use fit_transform on the training data: it calculates the parameter values from the training data and applies the transformation as well. But on the testing data, we always use transform with the parameter values from the training data. This will lead to a data leakage problem. To avoid the leakage problem we might use fit_transform on the testing data. Correct me if I am wrong. And please clear up this confusion by making a video, Aman bhaiya!
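
A small sketch related to the question above, assuming scikit-learn's `MinMaxScaler` and made-up numbers. It checks that `fit` followed by `transform` matches `fit_transform` on the same data, and shows how refitting on the test set changes the result. In common scikit-learn practice, leakage is usually associated with letting test data influence the fitted parameters, so the second half may be worth comparing against the claim in the comment.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data (hypothetical): the test values lie outside the training range.
X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[20.0], [30.0]])

# Two calls (fit, then transform) versus one call (fit_transform).
two_step = MinMaxScaler().fit(X_train).transform(X_train)
one_step = MinMaxScaler().fit_transform(X_train)
same_result = np.allclose(two_step, one_step)

# Reusing the training scaler keeps the test rows on the training scale:
# (20 - 0) / 10 = 2.0 and (30 - 0) / 10 = 3.0.
train_scaler = MinMaxScaler().fit(X_train)
test_with_train_params = train_scaler.transform(X_test)

# Refitting on the test set rescales it to [0, 1] using the test min/max,
# so the shift relative to the training data is no longer visible.
test_refit = MinMaxScaler().fit_transform(X_test)

print(same_result)
print(test_with_train_params.ravel())
print(test_refit.ravel())
```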

  • @Krishna-pn5je
    @Krishna-pn5je 11 months ago

    Hi Aman,
    Thanks for the video. My answer is below.
    In the prediction stage we don't require the scaler object, because the model still understands the numeric data; we require scaling only if the dataset has multiple numeric features and we want to compute distances between data points.
    In the prediction stage with the TF-IDF vectorizer, we should pass the vectorizer object, because it transforms the text into vectors at the evaluation stage before the text is passed to the model for prediction, which is necessary.
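
One way to keep the fitted vectorizer together with the model at prediction time is a pipeline. The sketch below is only an illustration: it assumes scikit-learn's `make_pipeline`, `TfidfVectorizer`, and `MultinomialNB`, and the four review snippets and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training corpus: 1 = positive review, 0 = negative review.
train_texts = ["good movie", "great film", "bad movie", "terrible film"]
train_labels = [1, 1, 0, 0]

# The pipeline fits the vectorizer and the classifier together on training text.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# At prediction time the pipeline calls transform (not fit) on the vectorizer,
# so raw text can be passed in directly.
prediction = model.predict(["great movie"])

# Size of the vocabulary learned from the training corpus.
vocab_size = len(model.named_steps["tfidfvectorizer"].vocabulary_)
print(prediction, vocab_size)
```

With a pipeline, saving or loading the model also carries the fitted vectorizer, so the two cannot drift apart.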

  • @dakshbhatnagar
    @dakshbhatnagar 1 year ago +2

    For prediction we should ideally use transform, because the object is fitted on the training data and the test data is transformed using that fitted object. This applies to both the TF-IDF vectorizer and the scaler object.
    I could be wrong, but this makes sense to me.

    • @UnfoldDataScience
      @UnfoldDataScience 1 year ago +1

      Hi Daksh, hope you are doing great. Seeing your comment after a long time. For scaling, do you see any data leakage problems?

    • @kausikkar2587
      @kausikkar2587 1 year ago +1

      Well, that's what we usually follow, but there are cases where you have a completely different type of data with a different maximum number of features. In that case you have to fit on your test data again too. I applied it today on my Vikram IMDb film review NLP project using CountVectorizer and MultinomialNB, and it worked as expected. Hope this helps.
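
One thing worth checking when refitting a vectorizer on test text is whether the columns still line up with what the model was trained on. The sketch below assumes scikit-learn's `CountVectorizer` and uses invented sentences; it only compares matrix shapes and does not reproduce the commenter's project.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpora.
train_texts = ["good movie", "bad movie", "great acting"]
test_texts = ["boring plot"]

# Vocabulary learned from the training text: acting, bad, good, great, movie.
vec = CountVectorizer()
X_train = vec.fit_transform(train_texts)

# transform keeps the same five columns, in the same order.
X_test_same_columns = vec.transform(test_texts)

# A second vectorizer fitted on the test text learns a different vocabulary
# (boring, plot), so its columns no longer match the training matrix.
refit = CountVectorizer()
X_test_refit = refit.fit_transform(test_texts)

print(X_train.shape, X_test_same_columns.shape, X_test_refit.shape)
```

Even when the column counts happen to match, a refitted vocabulary can assign different words to the same column index, so a model trained on `X_train` would read those columns with the wrong meaning.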

  • @shrirajpathak
    @shrirajpathak 1 year ago +6

    Why create all this confusion, just make the video with the answers in it...

  • @chandrabhanbahetwar9638
    @chandrabhanbahetwar9638 1 year ago +3

    Brother, if you are going to explain something, explain the whole thing clearly. What is this? You have confused us about whether or not to use fit_transform on the test dataset. If you want reach for the video, just say so and we will all comment, but don't leave us stuck in confusion like this. Either explain it fully and clearly, or don't bother.

  • @ibrahimmosty1860
    @ibrahimmosty1860 4 months ago

    I will use separate scalers, because each scaler saves the data for its specific column.
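
For comparison, a single scaler fitted on a multi-column array also stores its parameters per column. A minimal sketch, assuming scikit-learn's `StandardScaler` and a made-up two-column array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): two features on very different scales.
X_train = np.array([[1.0, 100.0],
                    [2.0, 200.0],
                    [3.0, 300.0]])

scaler = StandardScaler().fit(X_train)

# mean_ and scale_ each hold one value per column.
print(scaler.mean_)
print(scaler.scale_)

# Each column is scaled with its own mean and standard deviation.
X_scaled = scaler.transform(X_train)
print(X_scaled)
```

Separate scaler objects remain an option when columns need different scaling methods; that choice depends on the use case.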

  • @subhashdixit5167
    @subhashdixit5167 1 year ago

    Thanks for taking my comments seriously

  • @himalayaashish947
    @himalayaashish947 1 year ago +1

    Hi,
    For prediction, we have to use only transform, because we have trained the model and want to use the same parameters.
    For TF-IDF we will use fit_transform. Since the corpus is changing, we need to calculate the parameters again and then apply them, so we have to use fit_transform.

    • @squadgang1678
      @squadgang1678 1 year ago

      I go with his answer

    • @UnfoldDataScience
      @UnfoldDataScience 1 year ago

      Thanks for the answer. Do you see data leakage problems with your approach?

    • @UnfoldDataScience
      @UnfoldDataScience 1 year ago

      Also, for TF-IDF, if your new corpus has a new word that was never present in training, what happens to the model?
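
The question about unseen words can be explored with a short experiment. This sketch assumes scikit-learn's `TfidfVectorizer` with default settings; the sentences are invented. With a vectorizer fitted only on training text, `transform` keeps the training vocabulary and drops words it has never seen.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training corpus.
vec = TfidfVectorizer()
vec.fit(["the cat sat", "the dog ran"])

# "zebra" never appeared in training.
row = vec.transform(["the zebra ran"])

# Map column indices back to terms using the learned vocabulary.
index_to_term = {index: term for term, index in vec.vocabulary_.items()}
nonzero_terms = sorted(index_to_term[i] for i in row.nonzero()[1])

print("zebra" in vec.vocabulary_)   # the unseen word has no column
print(row.shape)                    # column count is fixed by the training vocabulary
print(nonzero_terms)                # only known words contribute
```

The model therefore receives a vector of the expected width, and the unseen word simply contributes nothing to it.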

  • @niranjan.tanpure
    @niranjan.tanpure 1 year ago +1

    Product manager vs. data scientist: which one pays better, sir?

    • @UnfoldDataScience
      @UnfoldDataScience 1 year ago +2

      Managing a data science product is not at all an easy task; it needs all the qualities of a seasoned data scientist and more.
      I believe a product manager should be paid more than a typical data scientist.

  • @rosemarydara1025
    @rosemarydara1025 1 year ago

    This guy's teaching is really really amazing

  • @learning_with_irving4266
    @learning_with_irving4266 9 months ago

    So is standardizing just finding the z score?
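
A quick numerical check of this question, assuming scikit-learn's `StandardScaler` and NumPy with made-up values. `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default for `std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): a single feature.
X = np.array([[10.0], [20.0], [30.0], [40.0]])

# Standardization with scikit-learn.
scaled = StandardScaler().fit_transform(X)

# Manual z-score: subtract the column mean, divide by the column std (ddof=0).
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)

matches = np.allclose(scaled, z_manual)
print(matches)
```

So for a column fitted and transformed on the same data, standardizing is the z-score of each value with respect to that column.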

  • @weirdyounes7618
    @weirdyounes7618 1 year ago

    Thank you 🎉

  • @iyyappanmuthusamy1678
    @iyyappanmuthusamy1678 1 year ago

    I don't think we will use both the fit and transform functions, because when testing our ML model we will not fit on the testing dataset. We will use only the x_train and y_train datasets to train our model in scaling.

    • @UnfoldDataScience
      @UnfoldDataScience 1 year ago

      Which use case? Is it scaling or TF-IDF you are referring to?

  • @arpittrivedi6636
    @arpittrivedi6636 1 year ago

    In prediction we use only fit

    • @UnfoldDataScience
      @UnfoldDataScience 1 year ago

      Only "fit" or only "transform"?
      Also, in which scenario: scaling or TF-IDF?

  • @faheemkhan-dm8zy
    @faheemkhan-dm8zy 1 year ago

    This is nonsense.