How to use Feature Engineering for Machine Learning, Equations

  • Published: Jul 27, 2024
  • Feature engineering is the process of modifying/preprocessing the input to a model, such as a neural network, to make it easier for that model to produce an accurate result. In this video, I discuss the technique that I use to build my own features.
    Link to my paper that I referenced:
    arxiv.org/pdf/1701.07852.pdf
    ** Follow Me on Social Media!
    GitHub: github.com/jeffheaton
    Twitter: / jeffheaton
    Instagram: / jeffheatondotcom
    Discord: / discord
    Patreon: / jeffheaton
  • Science

Comments • 72

  • @leonardsmith9870 3 years ago +8

    Hi Jeff. I've recently subscribed and I honestly have to say you have the most comprehensive and easy to understand guides out there. Not to mention the fact that whenever there is an update to something, you make a new video explaining how to work with it. I tried getting into machine learning just over a year ago and nobody at the time was able to actually explain anything apart from "download this, download that, if it doesn't work oh well" and would just go through the official tutorials without actually explaining how to do anything on your own. Your channel alone has given me the motivation to get started again and thank you so much for doing what you're doing!

    • @HeatonResearch 3 years ago

      Hello Leonard, thank you for the kind words. Glad the content is helpful, and yes, it is a lot of work keeping everything up to date.

  • @HarrysKavan 2 years ago +4

    Just wanted to leave a thank you Mr Heaton. I'm currently working on my bachelor thesis and your videos are a great help. Much appreciation.

  • @amineleking9898 3 years ago +2

    Such a practical and helpful video, many thanks professor.

  • @germplus 2 years ago

    Fabulous explanation. In the early stages of my course ( MSc AI & Data Science ) and I find your channel very helpful. Thank you.

  • @ShashankData 3 years ago +1

    I've been following you for months, thank you for the free, well explained content!

  • @yongkangchia1993 3 years ago

    Really valuable content that is clearly explained! keep up the great work sir!

  • @user-qy4jn1cg5p 6 months ago +1

    This is incredibly intuitive! Thanks

  • @khaledsrrr 11 months ago

    Feature Engineering Explained! 😍
    This is likely the best explanation on YT. Thx 🙏

  • @daymaker_bybit 10 months ago

    This video and presentation is amazing. Thank you SO MUCH!! All the best!

  • @lakeguy65616 1 year ago

    excellent video of real practical use!

  • @akramsystems 3 years ago

    This looks really fun to do!

  • @korhashamo 1 year ago

    Awesome. Great explanation. Thank you 🙏

  • @StevenSolomon-jb3zi 1 year ago

    Very insightful. Thank you.

  • @nicolaslpf 1 year ago +1

    Amazing video, Jeff! The only thing you didn't tell us is whether you then drop the source features to avoid collinearity or just leave them alongside the new features you created. Or do you perform PCA, VIF, or Lasso afterwards to choose what to do? I loved the video: concise and super useful!
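
On the collinearity question above, one rough way to see how strongly an engineered feature overlaps with its source is the variance inflation factor (VIF). The sketch below is not from the video: it assumes only two predictors, in which case VIF reduces to 1 / (1 - r^2), with r the Pearson correlation between them. The data and the helper names `pearson` and `vif_two_features` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_features(x, y):
    """VIF for the two-predictor case only: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical source feature and an engineered feature derived from it.
source = [1.0, 2.0, 3.0, 4.0, 5.0]
engineered = [v ** 2 for v in source]

r = pearson(source, engineered)
vif = vif_two_features(source, engineered)
print(f"correlation = {r:.3f}, VIF = {vif:.1f}")
```

A common rule of thumb treats a VIF above roughly 5 to 10 as strong collinearity, which matters most for linear models. Whether to drop the source feature is still a judgment call; in a reply further down, Jeff says he generally leaves the existing features in.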

  • @sheikhakbar2067 3 years ago

    I like Jeff's approach of giving us the big picture of what he is talking about!

  • @felixlucien7375 1 year ago

    Awesome video, thank you!

  • @jameswilliamson1726 11 months ago

    I read over your thesis comparing types of feature engineering vs machine learning models. Great stuff! Thx.

    • @HeatonResearch 11 months ago

      Thanks!

    • @jameswilliamson1726 11 months ago

      @@HeatonResearch Would standardizing or normalizing the input features give you better results? That one ratio had such a wide range.

    • @HeatonResearch 11 months ago

      @@jameswilliamson1726 I will often standardize/normalize after applying these techniques. The techniques I use here are really to capture the interaction between underlying features. Then standardization/normalization on top solves range concerns.
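
The order described in this reply (build the interaction feature first, then standardize it) can be sketched as follows. This is only an illustration under assumptions: the income and debt figures are made up, the ratio is a stand-in for whatever feature the video used, and `standardize` is a hypothetical helper doing plain z-scoring.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list: subtract the mean, divide by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw features with very different scales.
income = [40_000.0, 85_000.0, 120_000.0, 30_000.0]
debt = [10_000.0, 5_000.0, 240_000.0, 300.0]

# Step 1: the engineered feature captures the interaction (here, a ratio).
ratio = [d / i for d, i in zip(debt, income)]

# Step 2: standardizing afterwards tames the wide range of the ratio.
ratio_z = standardize(ratio)

print([round(v, 3) for v in ratio])
print([round(v, 3) for v in ratio_z])
```

Standardizing the raw inputs before taking the ratio would give a different feature, because a ratio of z-scores is not the z-score of the ratio.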

  • @SAAARC 3 years ago

    I found this video useful. Thanks!

  • @jonnywright8155 3 years ago +1

    Love the energy!!!

    • @HeatonResearch 3 years ago +2

      Thanks! I also went a little crazy on video editing too. lol

  • @sandeepmandrawadkar9133 7 months ago

    Thanks for this great information

  • @MLOps 3 years ago +1

    Super helpful! much appreciated!

  • @heysoymarvin 11 months ago

    this is amazing!

  • @liquidinnovation 3 years ago

    Thanks, great video! Any examples on using the shap package to additively decompose regression r^2 using shapley values?

  • @jifanz8282 3 years ago

    Informative video as always. +1 like for my professor 👏

  • @gauravmalik3911 1 year ago

    very informative

  • @hannes7218 1 year ago

    great job!

  • @ali_adeeb 3 years ago

    thank you so much!!

  • @sumitchandak6131 3 years ago

    This is really great and something out of the box.
    Can you please provide similar techniques for NLP as well?

  • @jamalnuman 4 months ago

    Very useful

  • @jhonnyespinozabryson8241 3 years ago

    Many thanks for sharing

  • @mohammed333suliman 1 year ago +1

    Great, thank you.

  • @SuperHddf 1 year ago

    thank you! :)

  • @Jeffben24 3 years ago

    Thank you :)

  • @bingzexu7259 3 years ago +2

    When we do feature engineering, are we expecting that the new feature has a high correlation with the predicted values?

    • @HeatonResearch 3 years ago +1

      Yes for sure, so you must keep that in mind when evaluating feature importance. Generally, I leave the existing features in and let the model account for that (though some model types perform better with correlating fields removed).

  • @Shkvarka 3 years ago

    Awesome explanation! Thank you very much! Best regards from Ukraine!:)

  • @juggergabro 2 years ago +1

    At last, not another Data Science hijacker trying to prove themself on YT... Thank you.

  • @DeebzFromThe90s 1 year ago

    Hi Jeff, what concepts should I look into to understand "Weighting" better? For instance at 9:41, you mention that if one values food more they might square it. Someone might cube it, someone might multiply it or add a coefficient of 2 or 5. These are all subjective.
    For weighting when it comes to features in the stock market or econometrics (my specific application), one might have a feature that is GDP or inflation. I know for a fact that change in GDP (slope) and change in the change in GDP (slope of slope i.e., acceleration) are pretty important. My first problem, is that I found these two (change in GDP and GDP acceleration) simply through guess and check, and research papers. Is there a better method to this? Or should I focus on automating 'guess and check'? Secondly, sometimes the GDP features or inflation related features vary in importance to participants in the stock market. Perhaps right now (as of Oct 2022) investors might place more emphasis on inflation related features and so I might multiply inflation features by coefficient of 2 or square it. How would one deal with dynamic weighting? Or a simpler problem might be, how do you objectively select for weighting?
    EDIT: I have come up with an idea, to add a coefficient to GDP or inflation based on social media mentions (sentiment), for instance. Thoughts on this and weighting in general?
    Thanks so much! Love the video by the way!
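
The "change in GDP" and "change in the change" features mentioned in this comment are first and second differences of the series. The sketch below shows only that derivation; the GDP numbers are made up, `diff` is a hypothetical helper, and it does not attempt the dynamic weighting the commenter asks about.

```python
def diff(series):
    """First difference: each value minus the one before it."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical GDP readings for five consecutive periods.
gdp = [100.0, 102.0, 105.0, 109.0, 110.0]

gdp_change = diff(gdp)        # slope: period-over-period change
gdp_accel = diff(gdp_change)  # slope of the slope: acceleration

print(gdp_change)  # [2.0, 3.0, 4.0, 1.0]
print(gdp_accel)   # [1.0, 1.0, -3.0]

# Each difference shortens the series by one, so align rows on the later
# periods: acceleration at index t belongs with gdp[t + 2].
rows = [
    {"gdp": gdp[t + 2], "change": gdp_change[t + 1], "accel": gdp_accel[t]}
    for t in range(len(gdp_accel))
]
print(rows[0])
```

The alignment step matters in practice: pairing a difference with an earlier row would leak future information into the features.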

  • @ramiismael7502 3 years ago

    Can you try all the different possible methods to do this?

  • @programming_hut 3 years ago +3

    💛✌️ Thanks

  • @Oliver-cn5xx 3 years ago +1

    Hi Jeff, would you have a link to your paper and the kaggle notebook that you showed?

    • @HeatonResearch 3 years ago +2

      Oh yeah, I should have linked that. I added it to the description, here it is too: arxiv.org/pdf/1701.07852.pdf

    • @Oliver-cn5xx 3 years ago

      @@HeatonResearch Thanks a lot!

  • @lehaipython9242 1 year ago

    How should I perform feature engineering on anonymous variables? I can't apply my domain knowledge to them.

  • @youngjoopark4221 1 year ago

    I am a novice. If the model would figure out that relationship on its own, is creating a new feature by dividing or multiplying something still worth doing?

  • @johncaling6150 3 years ago

    I don't remember if I asked this already (sorry if I did), but it would be great if you could do a tutorial about mxnet/gluon. It is an advanced library that is good for advanced things.

    • @HeatonResearch 3 years ago

      Currently researching Gluon for such a video.

    • @johncaling6150 3 years ago

      @@HeatonResearch Nice.

    • @johncaling6150 3 years ago

      @@HeatonResearch I always have a hard time getting it installed. Your install guides are the best!!!!

  • @avithaker 3 years ago

    Would love to see a link to your paper?

    • @HeatonResearch 3 years ago +1

      Sure! Should have linked in the description. arxiv.org/abs/1701.07852

    • @avithaker 3 years ago

      Thank you!

  • @taktouk17 3 years ago

    Please show us how to customize StyleGan2 to, for example, generate a baby face or change the gender of someone in the image.

    • @HeatonResearch 3 years ago +1

      Yes thinking about how to do something with that.

  • @brandonheaton6197 3 years ago

    Can you address Sutton's Bitter Lesson as it applies here?

    • @HeatonResearch 3 years ago

      Kind of the limit of the Bitter Lesson, as time approaches infinity, is that any program can be written by a random number generator if we have enough compute time and a way to verify correctness. I think the clever algorithms are always filling in the gap before massive compute is able to perform this operation on its own. However, I still see Kaggles won on feature engineering, so I tend to assume that it is still a needed skill. At least for now.

  • @Knud451 2 years ago

    Thanks! Why would you e.g. square variables to make them more dominant in the model? Wouldn't the model just put more weight on them by itself? Unless it's because you want to make a nonlinear scaling of that variable.
    On a side note, isn't BMI a good example of poor feature design... 😀
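
The distinction raised in this comment can be checked on a toy case. The sketch below assumes a plain one-variable linear model fitted by least squares, which is not necessarily the kind of model used in the video, and the data is made up. It compares the fit error for a raw feature, the same feature multiplied by 2, and the feature squared, when the target really depends on the square.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_error(x, y):
    """Fit a line to (x, y) and return the sum of squared residuals."""
    slope, intercept = fit_line(x, y)
    return sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))


# Hypothetical data where the target depends on the square of the input.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 2 for v in x]

raw_error = sum_squared_error(x, y)
scaled_error = sum_squared_error([2 * v for v in x], y)       # weight absorbs this
squared_error = sum_squared_error([v ** 2 for v in x], y)     # changes the shape

print(raw_error, scaled_error, squared_error)
```

Multiplying by a constant leaves the error unchanged because the fitted weight simply halves, which matches the commenter's intuition. Squaring helps only because it is a nonlinear transform that a linear model cannot reproduce by reweighting. More flexible models such as neural networks can approximate such shapes themselves, so the benefit there is less clear-cut.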

  • @Yifzmagarki 3 years ago

    Cunning man; he does not fully say what really works and what is used by professionals.