Feature Selection techniques in Python | feature selection machine learning | machine learning tips

  • Published: Mar 26, 2022
    Hello,
    My name is Aman and I am a Data Scientist.
    About this video:
    In this video, I explain feature selection techniques in Python in detail, covering feature selection in machine learning along with its types and categories. I also show a Python demo of a feature selection technique. The following points are covered in this video:
    1. Feature Selection techniques in Python
    2. Feature selection in machine learning
    3. Machine learning tips
    4. Feature selection, Unfold Data Science
    5. Python feature selection techniques
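As an illustration of one of these categories, filter methods (which score each feature without training a model), here is a minimal sketch. It is not the code from the video: it assumes scikit-learn and pandas are installed, uses the iris dataset (which the comments below indicate the demo uses), and picks threshold=0.2 and k=2 purely as example values.

```python
# Minimal filter-method sketch on the iris data (assumed dataset; requires scikit-learn + pandas).
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target  # X is a DataFrame with 4 numeric columns

# Step 1: drop low-variance columns (threshold=0.2 is an arbitrary example value).
vt = VarianceThreshold(threshold=0.2)
X_vt = vt.fit_transform(X)
kept_by_variance = list(X.columns[vt.get_support()])

# Step 2: keep the k columns with the highest chi-squared score against the target.
skb = SelectKBest(score_func=chi2, k=2)
skb.fit(X[kept_by_variance], y)
selected = [c for c, keep in zip(kept_by_variance, skb.get_support()) if keep]

print("kept by variance:", kept_by_variance)
print("top-2 by chi2:", selected)
```

Filters are cheap because each score is computed independently of any model; the trade-off is that they ignore interactions between features.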
    About Unfold Data Science: This channel helps people understand the basics of data science through simple, easy examples. Anyone without prior knowledge of computer programming, statistics, machine learning, or artificial intelligence can get a high-level understanding of data science through this channel. The videos are not very technical in nature, so viewers from different backgrounds can grasp them easily.
    If you need Data Science training from scratch, please fill in this form (please note: training is chargeable):
    docs.google.com/forms/d/1Acua...
    Book recommendations for Data Science:
    Category 1 - Must Read For Every Data Scientist:
    The Elements of Statistical Learning by Trevor Hastie - amzn.to/37wMo9H
    Python Data Science Handbook - amzn.to/31UCScm
    Business Statistics By Ken Black - amzn.to/2LObAA5
    Hands-On Machine Learning with Scikit Learn, Keras, and TensorFlow by Aurelien Geron - amzn.to/3gV8sO9
    Category 2 - Overall Data Science:
    The Art of Data Science By Roger D. Peng - amzn.to/2KD75aD
    Predictive Analytics By Eric Siegel - amzn.to/3nsQftV
    Data Science for Business By Foster Provost - amzn.to/3ajN8QZ
    Category 3 - Statistics and Mathematics:
    Naked Statistics By Charles Wheelan - amzn.to/3gXLdmp
    Practical Statistics for Data Scientists By Peter Bruce - amzn.to/37wL9Y5
    Category 4 - Machine Learning:
    Introduction to machine learning by Andreas C Muller - amzn.to/3oZ3X7T
    The Hundred Page Machine Learning Book by Andriy Burkov - amzn.to/3pdqCxJ
    Category 5 - Programming:
    The Pragmatic Programmer by David Thomas - amzn.to/2WqWXVj
    Clean Code by Robert C. Martin - amzn.to/3oYOdlt
    My Studio Setup:
    My Camera : amzn.to/3mwXI9I
    My Mic : amzn.to/34phfD0
    My Tripod : amzn.to/3r4HeJA
    My Ring Light : amzn.to/3gZz00F
    Join the Facebook group:
    groups/41022...
    Follow on Medium: / amanrai77
    Follow on Quora: www.quora.com/profile/Aman-Ku...
    Follow on Twitter: @unfoldds
    Get connected on LinkedIn: / aman-kumar-b4881440
    Follow on Instagram: unfolddatascience
    Watch the full Introduction to Data Science playlist here: • Data Science In 15 Min...
    Watch the Python for Data Science playlist here:
    • Python Basics For Data...
    Watch the Statistics and Mathematics playlist here:
    • Measures of Central Te...
    Watch an end-to-end implementation of a simple machine learning model in Python here:
    • How Does Machine Learn...
    Learn ensemble models, bagging, and boosting here:
    • Introduction to Ensemb...
    Build Career in Data Science playlist:
    • Channel updates - Unfo...
    Artificial Neural Network and Deep Learning playlist:
    • Intuition behind neura...
    Natural Language Processing playlist:
    • Natural Language Proce...
    Understanding and building recommendation systems:
    • Recommendation System ...
    Access all my codes here:
    drive.google.com/drive/folder...
    Have a different question for me? Ask me here : docs.google.com/forms/d/1ccgl...
    My Music: www.bensound.com/royalty-free...

Comments • 51

  • @boejiden7093 · 2 years ago · +6

    Thank you so much for this, Aman. Please cover the wrapper and embedded categories too. I am currently a data scientist, and these techniques are very helpful for my job. Thanks!

  • @subhajitroy4869 · 2 years ago · +2

    Thank you so much for this, and yes sir, please cover the remaining two methods in detail.

  • @surajjadhav2382 · 2 years ago · +2

    Thanks for the video, please cover the other two methods as well.

  • @dollysiharath4205 · 1 year ago

    Thank you so much, I always learn new things from all your courses.

  • @umasharma6119 · 2 years ago

    great teaching sir g

  • @andresilva9140 · 2 years ago

    Thank you for your hard work!!!!

  • @supreethalahari.sreeram5147 · 2 years ago · +2

    Today's topic was very good, with a clean and clear explanation. 👏👏 Please do the next videos too, Aman sir 👍 Waiting for the next series of feature selection videos 😎😎😎😎😎

  • @leamon9024 · 2 years ago · +1

    Nice tutorial. Thanks for your hard work and the effort you put in.
    I have one question though. Which category does PCA belong to?

  • @PrincyAnnThomas2022 · 2 years ago

    Thanks. This was very useful. Please cover wrapper and embedded too, if possible.

  • @sudhanshusoni1524 · 2 years ago · +3

    Thanks a lot, Aman, for this detailed and simple explanation and for putting so much effort in for us.
    Can you cover the following topics sometime in the future? They would be very helpful for people learning data science in fields like chemical engineering, pollution control in manufacturing, the power sector, etc.:
    1. How to deal with test and validation sets in time-series regression data that have different distributions.
    2. An approach for batch-wise time-series processes, e.g. batch-wise manufacturing of alcohol or another chemical product, where we have to build a model either for preventive maintenance or for increasing the run length and efficiency of upcoming batches.

  • @rahulsingh-qs7lm · 4 months ago

    Thanks Aman. Please cover the wrapper and embedded categories too.

  • @hemanthvokkaliga · 2 years ago · +3

    Thanks Aman, can you please create a separate playlist on this topic? It would be a great thing 🙂

    • @UnfoldDataScience · 2 years ago · +2

      Good Suggestion, let me create one. There are some old videos also which can be put.

  • @vijayragavansk · 2 years ago · +1

    Thanks aman. Please cover the other two categories as well!

  • @dasgupts10 · 1 year ago · +1

    Hi Aman, we use the chi2 test on categorical variables. But here petal length and width are numerical variables. Can you please explain this?
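A sketch of why this runs (not an answer from the video, and assuming scikit-learn): scikit-learn's `chi2` treats each feature value as a frequency, so it accepts any non-negative numeric column, including the positive iris measurements, and raises a `ValueError` on negative input. For continuous features, the ANOVA F-test in `f_classif` is a common alternative; on iris both rank the two petal features highest.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2, f_classif

iris = load_iris()
X, y = iris.data, iris.target

# chi2 treats each non-negative feature value as a frequency and compares per-class sums.
chi2_scores, chi2_p = chi2(X, y)
# f_classif runs a one-way ANOVA F-test, which is designed for continuous features.
f_scores, f_p = f_classif(X, y)

for name, c, f in zip(iris.feature_names, chi2_scores, f_scores):
    print(f"{name:20s} chi2={c:8.2f}  F={f:8.2f}")

# Centering the data introduces negative values, which chi2 rejects.
try:
    chi2(X - X.mean(axis=0), y)
    chi2_rejects_negative = False
except ValueError:
    chi2_rejects_negative = True
print("chi2 raises on negative input:", chi2_rejects_negative)
```

So chi2 "works" on iris only because the measurements happen to be non-negative; the scores are not a test of independence between two categorical variables in that case.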

  • @miteshkumarsingh · 8 months ago

    Yes, please create an RFE and wrapper video.
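Recursive feature elimination (RFE) is a wrapper method: it repeatedly fits a model and removes the weakest feature. A minimal sketch, not from the video, assuming scikit-learn, standardized iris features, a logistic regression as the wrapped model, and n_features_to_select=2 as an arbitrary target:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)  # same scale so coefficients are comparable
y = iris.target

# RFE fits the model, drops the feature with the smallest coefficient magnitude
# (squared and summed over classes for this multiclass model), and repeats.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2, step=1)
rfe.fit(X, y)

selected = [name for name, keep in zip(iris.feature_names, rfe.support_) if keep]
print("selected:", selected)
print("ranking (1 = kept):", dict(zip(iris.feature_names, rfe.ranking_)))
```

Because the model is refit at every step, wrapper methods cost much more than filters on wide datasets.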

  • @aiswaryasprasad7319 · 1 year ago

    Sir, the tutorial is a very good one, but I want to know which method is better for feature selection with the random forest algorithm.
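One option, offered here as an assumption rather than the video's recommendation, is an embedded approach: use the forest's own impurity-based `feature_importances_` through `SelectFromModel`. The sketch assumes scikit-learn, the iris data, 200 trees, and `threshold="mean"` (keep features whose importance is at least the average). Impurity-based importances can favor high-cardinality features, so they are a starting point rather than a final answer.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

forest = RandomForestClassifier(n_estimators=200, random_state=0)
# threshold="mean" keeps features whose importance is >= the mean importance.
selector = SelectFromModel(forest, threshold="mean")
selector.fit(X, y)

importances = selector.estimator_.feature_importances_  # normalized to sum to 1
selected = [n for n, keep in zip(iris.feature_names, selector.get_support()) if keep]

print(dict(zip(iris.feature_names, importances.round(3))))
print("selected:", selected)
```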

  • @shubhamrawat5299 · 2 years ago · +1

    Thank you sir for your hard work and for this video.🙂

  • @mayankbhatt1308 · 2 years ago · +1

    Thanks a lot Aman bhai

  • @trashantrathore4995 · 2 years ago

    Aman, I have a question: can we directly fit_transform X with VarianceThreshold to get the desired number of columns? Why do we have to first call .fit() and then get_support() when we can get the result directly from fit_transform()?
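Both routes select the same columns. A minimal sketch, assuming scikit-learn, pandas, the iris data, and an example threshold of 0.2: `fit_transform` returns the reduced NumPy array directly, while `fit` followed by `get_support()` returns a boolean mask, which is useful for recovering the names of the surviving columns.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

X = load_iris(as_frame=True).data

# Option 1: fit_transform returns the reduced matrix directly (a NumPy array by default).
X_reduced = VarianceThreshold(threshold=0.2).fit_transform(X)

# Option 2: fit first, then ask which columns survived; handy for keeping column names.
vt = VarianceThreshold(threshold=0.2).fit(X)
mask = vt.get_support()       # boolean array, one entry per input column
kept_columns = X.columns[mask]
X_named = X.loc[:, mask]      # same values, but still a DataFrame with names

print(list(kept_columns))
print(np.array_equal(X_reduced, X_named.to_numpy()))
```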

  • @beautyisinmind2163 · 2 years ago

    Can embedded Lasso and Ridge regularization be used in multiclass classification, like on the iris data? Please reply.
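An L1 (lasso-style) penalty can drive coefficients exactly to zero, so it can act as an embedded selector, and scikit-learn's `LogisticRegression` supports it for multiclass targets. An L2 (ridge) penalty shrinks coefficients but generally does not zero them, so on its own it ranks features rather than removing them. A sketch, not from the video, under these assumptions: scikit-learn, standardized iris features, the `saga` solver, and C=0.1 as an arbitrary example strength.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)  # L1 penalties are scale-sensitive
y = iris.target  # three classes, so this is a multiclass problem

# L1 (lasso-style) penalty; smaller C means stronger regularization and more zeros.
# C=0.1 is an arbitrary example value.
l1_model = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)
selector = SelectFromModel(l1_model).fit(X, y)

coef = selector.estimator_.coef_  # one row of coefficients per class: shape (3, 4)
selected = [n for n, keep in zip(iris.feature_names, selector.get_support()) if keep]

print(np.round(coef, 3))
print("selected:", selected)
```

A feature is kept if its coefficients are non-zero for at least one class, so lowering C trades accuracy for a smaller feature set.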

  • @daneshk6395 · 2 years ago

    Do we need separate feature selection for categorical variables when the target variable is categorical?
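For categorical features with a categorical target, one common filter is a chi-square test of independence on each feature's contingency table. A minimal sketch, not from the video, assuming pandas and SciPy, on a hypothetical toy dataset where `color` is constructed to match the target exactly and `size` is constructed to be unrelated to it:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: two categorical features and a categorical target (80 rows).
df = pd.DataFrame({
    "color": ["red", "blue"] * 40,                           # identical pattern to target
    "size": ["S", "S", "M", "M", "L", "L", "S", "S"] * 10,   # balanced yes/no per size
    "target": ["yes", "no"] * 40,
})

p_values = {}
for feature in ["color", "size"]:
    table = pd.crosstab(df[feature], df["target"])  # contingency table of counts
    stat, p, dof, expected = chi2_contingency(table)
    p_values[feature] = p

# A small p-value suggests the feature and target are not independent.
print(p_values)
```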

  • @Shubham14365 · 1 year ago

    Hello Aman, I have a doubt regarding the threshold used in VarianceThreshold: what does this threshold signify? And how do we decide which threshold value to choose, and when?

  • @mayankmehta8480 · 2 years ago

    Sir your videos are very very helpful

  • @sadhnarai8757 · 2 years ago

    Very good Aman

  • @tharunnl7810 · 4 months ago

    Hello sir, if there are so many techniques for feature selection, how do we know which technique to use when? The chi-square and ANOVA tests look similar; which should we prefer, and when? I have used Pearson's correlation to deal with multicollinearity at times. How do we perform feature selection when there are around 150 features?

  • @ajaykushwaha-je6mw · 2 years ago · +1

    Hi Aman, thanks again for this video.
    I have a question.
    Who suggests the threshold value for correlation? Is it a business expert?
    For n categorical features, who decides the number of top features?
    Overall, who decides the threshold value for each feature selection method? Can we choose it ourselves, or is this decision taken by a business expert?

    • @UnfoldDataScience · 2 years ago

      Hi Ajay, domain knowledge plus experience come into the picture, whoever makes the suggestion. Suppose there is medical data with 200 distinct features: if I want to create buckets, domain knowledge comes in handy. Suppose I have 500 columns and 1 million rows and run a correlation analysis: experience will tell me what a suitable threshold is, though research suggests 0.85/0.90.
      Sometimes we can take a lower value like 0.75, depending on what my features are like. Let's say most of my features carry different information, for example
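The correlation step described in this reply can be sketched as follows, assuming pandas, NumPy, scikit-learn's iris data, and the 0.90 threshold mentioned above. The helper `correlated_columns` is hypothetical: it drops the later column of every pair whose absolute Pearson correlation exceeds the threshold, a simple greedy rule that may drop more columns than strictly necessary.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris


def correlated_columns(X: pd.DataFrame, threshold: float = 0.9) -> list:
    """Hypothetical helper: for each pair with |r| > threshold, return the later column."""
    corr = X.corr().abs()  # Pearson correlation by default
    # Keep only the upper triangle so each pair is checked once (k=1 skips the diagonal).
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]


X = load_iris(as_frame=True).data
to_drop = correlated_columns(X, threshold=0.9)
X_pruned = X.drop(columns=to_drop)
print("dropped:", to_drop)
```

On iris, petal length and petal width are correlated at about 0.96, so petal width is dropped at 0.90. Lowering the threshold to 0.85 also drops petal length, because its correlation with sepal length (about 0.87) then exceeds the cut-off.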

  • @dineshjoshi4100 · 1 year ago

    Hello, thanks for the explanation. I have one question: does using the best features help reduce the amount of training data needed? Say I do not have a large dataset, but I can create an independent variable that is highly correlated with the dependent variable; will it help me reduce my training data? Your response will be highly valuable.

    • @UnfoldDataScience · 1 year ago

      Yes, but the model may not be suitable for practical purposes.

  • @fahadnasir1605 · 1 year ago

    Hi Aman,
    Great tutorial as usual but I have one question.
    If chi-square is used for categorical variables, then why are you using the chi-square test for continuous variables?

    • @UnfoldDataScience · 1 year ago · +1

      That variable may be categorical only. Which part of the video?

    • @fahadnasir1605 · 1 year ago

      @UnfoldDataScience In the Iris dataset part, where you are using SelectKBest with chi-square (@13:50). The iris dataset has continuous variables, if I am not wrong.

  • @Ankitsharma-vo6sh · 2 years ago

    Yes, please cover the wrapper and embedded methods.

  • @priyankathakur1691 · 1 year ago

    How do we find the target variable and do feature selection when there are multiple numerical and categorical variables?

  • @keshavsingh6208 · 2 years ago

    How do we select the optimal threshold value?

    • @UnfoldDataScience · 2 years ago

      There can't be a one-size-fits-all value. It depends on how many features you lose at a given threshold, how many you think you need based on domain understanding, and more; multiple things come into the picture. So start from zero, move up a little at a time, and see how it turns out.
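The "start from zero and move up" advice can be sketched for `VarianceThreshold`, assuming scikit-learn and the iris data; the list of thresholds is arbitrary. The threshold is compared with each column's variance, so it depends on each feature's units, and rescaling the data changes which columns survive.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

X = load_iris(as_frame=True).data

# The threshold is compared with each column's variance (population variance, ddof=0).
variances = VarianceThreshold().fit(X).variances_
print(dict(zip(X.columns, variances.round(3))))

# Start from zero and move up, watching how many features survive each threshold.
kept_counts = {}
for t in [0.0, 0.1, 0.2, 0.5, 1.0]:
    n_kept = int(VarianceThreshold(threshold=t).fit(X).get_support().sum())
    kept_counts[t] = n_kept
    print(f"threshold={t:.1f}: {n_kept} features kept")
```

Note that `VarianceThreshold` raises a `ValueError` if no column passes, so the sweep should stop before the threshold exceeds the largest variance.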

  • @nagamaninagamani7057 · 1 year ago

    Sir, we want feature selection methods for machine learning using Python. We want them fast.

  • @saminaqadir3382 · 1 year ago

    Kindly share the Google Drive link for this code.

  • @onlinearchitecturalservice7993 · 2 years ago

    filter category

  • @souravbiswas6892 · 2 years ago

    Hi Aman, we use the chi2 test on categorical variables. But here petal length and width are numerical variables. Can you please explain this?