Videos: 6
Views: 76,736

Feature Engineering Secret From A Kaggle Grandmaster

22:23

How To Fill Missing Data With Pandas Fillna - Data Science For Beginners

2:54

How To Drop Columns In a Pandas DataFrame - Data Science For Beginners

2:11

Multiple Time Series Forecasting With Scikit-Learn

47:24

Fix Imbalanced Data In Machine Learning

A simple trick to deal with imbalanced classes when training machine learning models with code examples in Scikit-learn, XGBoost, and Tensorflow/Keras.
Remember to like and subscribe. Thanks!
*Video style heavily inspired by @Fireship
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// SUPPORT THE CHANNEL 👇❤️
Sign up for a Coursera course:
imp.i384100.net/EaDmQe
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// SOCIAL MEDIA
LinkedIn: www.linkedin.com/in/mariofilho/
Kaggle: kaggle.com/mariofilho
Twitter: mariofilhoml
Blog: forecastegy.com
Some links above can be from partnerships where I get a commission if you buy a product, without any additional cost to you. Thanks for the support!

Videos

Feature Engineering Secret From A Kaggle Grandmaster

22:23 · 38K views · 3 years ago

Learn how to do feature engineering for tabular data like a Kaggle Grandmaster and get high-performance machine learning models. Like the video? Subscribe and turn on the notifications to get more tips :)
0:00 Intro
1:38 The One Question To Ask Yourself
2:40 Credit Card Fraud Examples
6:34 Brief Info On Categorical Features
7:23 Time Series Feature Engineering
11:53 An Extremely Valuable Exerci...

How To Fill Missing Data With Pandas Fillna - Data Science For Beginners

2:54 · 787 views · 3 years ago

Check my blog for more machine learning content: forecastegy.com Learn how to replace missing values in your pandas DataFrame with the fillna function. Like the video? Subscribe and turn on the notifications to get more tips :) Docs: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
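
A minimal sketch of what `fillna` does, assuming a small made-up DataFrame (the `price` and `city` columns are hypothetical and not taken from the video). It shows a per-column constant fill and a fill with the column mean:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column.
df = pd.DataFrame({
    "price": [10.0, np.nan, 30.0],
    "city": ["Lisbon", None, "Porto"],
})

# Fill each column with its own constant by passing a dict.
filled_constant = df.fillna({"price": 0.0, "city": "unknown"})

# Fill a numeric column with its mean (NaN is ignored when computing the mean).
filled_mean = df.assign(price=df["price"].fillna(df["price"].mean()))
```

Both calls return new objects and leave `df` unchanged, so the result has to be assigned to a name to be kept.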

How To Drop Columns In a Pandas DataFrame - Data Science For Beginners

2:11 · 373 views · 3 years ago

Check my blog for more machine learning content: forecastegy.com Learn how to drop one or more columns in a DataFrame using pandas. Like the video? Subscribe and turn on the notifications to get more tips :) Docs: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
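
A short sketch of dropping one or several columns with `DataFrame.drop`, assuming a made-up DataFrame (all column names here are hypothetical):

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "tmp": [0.1, 0.2],
    "debug": [True, False],
})

# Drop a single column by name.
without_tmp = df.drop(columns="tmp")

# Drop several columns at once by passing a list.
without_both = df.drop(columns=["tmp", "debug"])
```

`drop` returns a new DataFrame, so `df` still contains every column afterwards.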

Multiple Time Series Forecasting With Scikit-Learn

47:24 · 36K views · 3 years ago

You got a lot of time series data points and want to predict the next step (or steps). What should you do now? Train a model for each series? Is there a way to fit a model for all the series together? Which is better? I have seen many data scientists think about approaching this problem by creating a single model for each product. Although this is one of the possible solutions, it's not likely ...
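
One way to read "fit a model for all the series together" is sketched below. This is an illustration under stated assumptions and not the code from the video: the data is synthetic, and the column names (`product_code`, `week`, `sales`), the lag features, and the split point are all hypothetical. Every product's rows go into one training table, the features are that product's own past sales, and the target is its next week's sales.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic long-format data: one row per (product, week). Assumed layout.
rng = np.random.default_rng(0)
n_products, n_weeks = 5, 30
data = pd.DataFrame({
    "product_code": np.repeat(np.arange(n_products), n_weeks),
    "week": np.tile(np.arange(n_weeks), n_products),
})
data["sales"] = rng.poisson(lam=(10 + 2 * data["product_code"]).to_numpy())

# Lags and the target are computed within each product, so values never
# leak from one series into another.
data = data.sort_values(["product_code", "week"])
grouped = data.groupby("product_code")["sales"]
data["lag_1"] = grouped.shift(1)
data["lag_2"] = grouped.shift(2)
data["target"] = grouped.shift(-1)  # next week's sales

# Rows without a full history or without a known target are dropped.
model_data = data.dropna()

# Time-based split: train on earlier weeks, validate on later ones.
train = model_data[model_data["week"] < 24]
valid = model_data[model_data["week"] >= 24]

features = ["sales", "lag_1", "lag_2"]
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train[features], train["target"])
preds = model.predict(valid[features])
```

Because the target is shifted one step ahead, the current week's `sales` is a legitimate feature: it is already known when next week is being predicted.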

@svitlanatuchyna7154 1 month ago
You are creating amazing videos! Thank you! So well explained, easy to understand, helps to solve real ML problems and at the same time entertaining! Please keep creating more:))
@svitlanatuchyna7154 1 month ago
Thank you for such an amazing video!!! It is incredibly useful!!!
@sarasatti1070 2 months ago
Hi Mario, and thank you for a very clear and concise explanation. One question I have is, how would you handle it if several of the products are only selling intermittently such that there are many zeros in the series?
@user-gz2po7dx3k 2 months ago
where are new videos!!!
@necuspam 2 months ago
A more intriguing question is: how do you train a model on thousands of time series, each determined by multiple parameters, and then simulate/forecast a single time series from a new set of those parameters?
@parvneema 2 months ago
Very nicely explained. Your videos are good. Why did you start making them?
@dianavi3961 2 months ago
Thank you a lot!
@userhandle-u7b 3 months ago
It would be better if you used slides with key points. The 'handwriting' on the screen was distracting and hard to read. Anyway, thanks
@anwarsaidan3959 3 months ago
Thank you very much for this amazing video! Can we use cross-validation for hyperparameter tuning in the case of RandomForest with time series data?
@abdullahalmahfuz6700 4 months ago
Should I learn feature engineering in 2024?
@mamyrak1114 5 months ago
Can I follow the same process if, instead of a week, I have a date like yyyy-mm-dd? And how should I handle the year?
@bennyadrianmartinez 5 months ago
Thank you. You covered so much in so little time compared with what TWO different bootcamp instructors could in far more time...
@yuvrajchauhan9874 8 months ago
00:01 Learn feature engineering for high performance models
02:00 Aggregation is essential for extracting useful information from tables and can be compared to the group-by function in various programming languages.
03:56 Feature engineering involves creating customer-specific features to predict fraud in transactions.
06:01 Feature Engineering is all about aggregation and encoding for capturing patterns and anomalies.
08:00 Feature engineering techniques like lag, difference, rolling, and date components are significant for analyzing time series data.
09:55 Seasonal patterns and time differences for feature engineering
11:55 Reverse engineer feature computation from Kaggle solutions
13:57 Feature engineering can be applied universally in tabular data for extracting features from multiple tables.
15:47 Feature engineering techniques used in data processing
17:41 Utilizing feature engineering to create indicators for bot usage from IP data.
19:22 Geolocation and network features are key for advanced feature engineering.
21:03 Graph features are important for model prediction.
@jackcarter97 8 months ago
How do I find the season effect features?
@chungrandy780 9 months ago
🎯 Key Takeaways for quick navigation:
00:00 📊 Understanding Feature Engineering for Tabular Data
- Feature engineering is essential for high-performance machine learning models.
- The key to feature engineering is aggregation, which involves grouping and summarizing data.
- Aggregations can be applied to various types of data, including categorical and numerical variables.
06:22 🔄 Common Feature Engineering Techniques
- Feature engineering techniques include lag, difference, rolling, date components, and time differences.
- Lag captures the previous value of a variable in a sequence.
- Difference calculates the difference between consecutive values in a sequence.
- Rolling involves computing aggregations over a rolling window of data.
- Date components extract information like month or day from dates for seasonality patterns.
- Time differences measure the time elapsed between events.
15:21 🧩 Reverse Engineering Features from a Kaggle Solution
- Analyzing features from a Kaggle competition example.
- Median time between bids can be computed by grouping by user and calculating time differences between bids.
- Mean number of bids per auction is determined by grouping by user and auction, then counting bid occurrences.
- Detecting IP addresses used by both users and bots involves complex filtering and merging based on IP data.
21:05 🌐 Advanced Feature Engineering
- Geolocation features can be important, calculating distances between locations, and spatial data aggregations.
- Network or graph features involve representing data as graphs and computing graph-related metrics.
- Suggests exploring the Instacart competition for advanced feature engineering with multiple tables.
22:16 📺 Conclusion and Next Steps
- Encourages viewers to like, subscribe, and leave comments.
- Offers a link to a time series forecasting workshop for further learning.
Made with HARPA AI
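
The techniques listed in this summary (lag, difference, rolling, date components, time differences, and the median time between bids per user) can be sketched in pandas roughly as follows. The tiny bid table and its column names (`user`, `time`, `amount`) are made up for illustration and are not the competition data:

```python
import pandas as pd

# Hypothetical bid log: one row per bid.
bids = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "time": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:05", "2024-01-01 10:20",
        "2024-01-02 09:00", "2024-01-02 09:30",
    ]),
    "amount": [10.0, 12.0, 15.0, 7.0, 9.0],
}).sort_values(["user", "time"])

amount_by_user = bids.groupby("user")["amount"]

# Lag: the previous value within the same user's sequence.
bids["amount_lag_1"] = amount_by_user.shift(1)

# Difference: change versus the previous value.
bids["amount_diff_1"] = amount_by_user.diff(1)

# Rolling: aggregation over a window of the last 2 rows per user.
bids["amount_roll_mean_2"] = amount_by_user.transform(
    lambda s: s.rolling(2).mean()
)

# Date component: hour of day, useful for seasonality patterns.
bids["hour"] = bids["time"].dt.hour

# Time difference: seconds elapsed since the same user's previous bid.
bids["secs_since_prev"] = (
    bids.groupby("user")["time"].diff().dt.total_seconds()
)

# Aggregation: median time between bids, one value per user.
median_gap = bids.groupby("user")["secs_since_prev"].median()
```

Every feature is computed inside a `groupby("user")`, which is the aggregation idea the summary describes: the first row of each user gets NaN because it has no earlier row to compare with.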
@LifeKiT-i 11 months ago
I just checked this amazing video after your feature selection engineering video! I have no idea why this video isn't popular!!! Respect the effort you spent on this!
@LifeKiT-i 11 months ago
I am in a Kaggle competition. Learnt a lot from this video!! Thank you so much for uploading this video for us!!
@pcdowling 1 year ago
Thank you.
@dy8576 1 year ago
Love the videos and blogs- absolute mad content, thank you very much
@paulkim244 1 year ago
Fantastic video, so many useful references, I'm glad I watched the entire thing!
@VG-yw2mp 1 year ago
Why don't we use product_code as one of the features while training?
@Gabriel-iw3hc 1 year ago
How do I forecast the future with this method? E.g., forecast week 52? I think I would need to forecast the other series too, for the other features.
@ElChe-Ko 1 year ago
Nice! It would be interesting to see what to do if the time series have different lengths.
@Septumsempra8818 1 year ago
Are we going to get a video on cross-validation and selecting the right model? Your time series videos have been a wealth of knowledge.
@RodrigoLima-o5b 1 year ago
Mario, good afternoon. Do you have any tips for using an LSTM for multi-step-ahead predictions in a MISO system?
@zulhas9 1 year ago
Hi Mario, thanks for the wonderful presentation. One question: how can you use the "Sales" feature to predict sales? If you use that feature, you have to pass it as an argument when you call the .predict function. In reality, you would not have that information available.
@chengeeri 1 year ago
Good One!!!!! Expecting more from You!!!!!!
@ThePaintingpeter 1 year ago
I just found your video and it's great. The reference to FeatureTools was frustrating to say the least. The documentation on the site is not working and the github repo also has examples that just don't work. It's too bad
@dimka11ggg 1 year ago
Try different versions, the examples are probably for some old versions
@stonesupermaster 1 year ago
Hello Mario, I have a question... how does the model know that we're trying to predict multiple products at once? I've been trying to train a model to predict the sales of 2000 SKUs, and my main concern now is how to do it efficiently. I watched everything that you did but I still have the same problem. Do you know where I can find an example of it? Thank you very much for your video
@AskApt05 4 months ago
Hi @stonesupermaster, Facing same problem. Have you found a solution? It would be really helpful if you can share. Thanks!
@therussiankid7296 1 year ago
#getthistrending
@Mohammad-vr9dj 1 year ago
Thanks for the useful video. Sorry, is it possible to implement independent spatial sequences simultaneously? I have a dataset which consists of 1000 independent spatial sequences with dimension 2*7 (2 for x and y, and length 7 for the positions at each time step). I implemented it with Simple RNN, LSTM and GRU. Can I do it with transformers (attention mechanism)? Could you point me to a practical example?
@MrGhustavo22 2 years ago
give more, please!
@SuperHddf 2 years ago
Thank you! :) ♥
@gregoryoliveira8358 2 years ago
I used this on my last project. It is very important to read the library documentation and find these imbalance parameters.
@garcialn 2 years ago
Hi, Mario. Big fan of yours from DataHackers here! Do you know if the same applies to imbalanced data sets for anomaly detection, such as default prediction or fraud detection problems? There the imbalance usually doesn't come from sampling, it comes from the nature of the problem itself... I don't know if it would end up creating bias or data leakage because of it...? Do you know better techniques for these kinds of problems?
@Forecastegy 2 years ago
Hi Lucas, you can use it for anomaly detection. This is just a way of telling the model to pay more attention to the less frequent examples. Just remember to calibrate your predictions if you need probabilities instead of just a ranking score.
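
A rough sketch of what this reply describes, assuming scikit-learn and synthetic data: class weighting makes the model pay more attention to the rare class, and a calibration step is added when probabilities are needed. The dataset, the choice of `LogisticRegression`, and the sigmoid calibration method are all assumptions made for illustration, not the setup from the video:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem: roughly 5% positives (assumed for the demo).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# class_weight="balanced" up-weights the rare class during training.
weighted = LogisticRegression(class_weight="balanced", max_iter=1000)
weighted.fit(X_train, y_train)

# Weighting distorts the predicted probabilities, so wrap the same model
# in a calibrator fitted with internal cross-validation.
calibrated = CalibratedClassifierCV(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    method="sigmoid",
    cv=5,
)
calibrated.fit(X_train, y_train)

# Compare the average predicted positive rate of both models.
raw_mean = weighted.predict_proba(X_test)[:, 1].mean()
cal_mean = calibrated.predict_proba(X_test)[:, 1].mean()
```

The weighted model's scores are still usable for ranking; the calibrated version is the one to use when the number itself is read as a probability. XGBoost offers a similar knob through `scale_pos_weight`, and Keras accepts a `class_weight` dict in `fit`.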
@anwarhermuche 2 years ago
Very clear explanation! Thank you for the video Mario
@Kevin-fp6gk 2 years ago
Loved the way you presented.
@RicardoZibordi 2 years ago
Clear, objective and very practical - congratulations!
@sekiro_19 2 years ago
Thank you so much man crazy good explanation
@VamosCoringar 2 years ago
I didn't see that one coming, haha
@gauravmalik3911 2 years ago
It would be great if you could show a demo too. Thank you for the information
@snk2288 2 years ago
Differences between time features could lead to negative values. Should we apply a min-max scaler after that?
@ozan4702 1 year ago
You would want to compute the difference so that the earlier time is subtracted from the later one, so it's never negative.
@darkchoco7407 1 year ago
No problem having negative values as features, at all
@Mohammad-vr9dj 2 years ago
Thanks for your useful video. Sorry, if our dataset has two target columns, how should we write the code?
@Learner_123 2 years ago
Thank you for making the topic simple. Since you have combined all the product sales to train and validate your model, how can one use this model to predict sales for 'any single' product only?
@zabmaz10 2 years ago
I have the same question, but I guess one way is to convert the product code into dummy variables and use those as features in the random forest.
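
The suggestion in this reply could look roughly like the sketch below. It is only an illustration of the commenter's idea, not something confirmed by the video: the toy table, the `lag_1` feature, and the product codes are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training table with two products on very different scales.
data = pd.DataFrame({
    "product_code": ["P1", "P1", "P1", "P2", "P2", "P2"],
    "lag_1": [5.0, 6.0, 7.0, 50.0, 52.0, 51.0],
    "target": [6.0, 7.0, 8.0, 52.0, 51.0, 53.0],
})

# One-hot encode the product code so the forest can tell products apart.
X = pd.get_dummies(
    data[["product_code", "lag_1"]], columns=["product_code"], dtype=float
)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, data["target"])

# Predict for a single product: build its row, encode it the same way,
# and align the columns with the ones seen during training.
new_rows = pd.DataFrame({"product_code": ["P2"], "lag_1": [53.0]})
X_new = pd.get_dummies(
    new_rows, columns=["product_code"], dtype=float
).reindex(columns=X.columns, fill_value=0.0)
pred = model.predict(X_new)
```

The `reindex` call matters because a single-product frame only produces that product's dummy column; the missing ones are filled with zeros so the feature layout matches training.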
@winniethepooh4891 2 years ago
This channel is a hidden gem !!!
@kaianchan7768 2 years ago
Thanks for this tutorial. Will you provide some videos about many features? Thanks!
@faraza5161 2 years ago
The Simple Imputer will fill the missing values with the mean of the entire column. Shouldn't that be done product-wise as well? Thanks for a wonderful lecture btw :-)
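
To make the difference raised in this question concrete, here is a small sketch comparing a global mean fill with a per-product mean fill in pandas. The toy table and column names are hypothetical, and this is one possible approach, not the one used in the video:

```python
import numpy as np
import pandas as pd

# Hypothetical sales table with one missing value per product.
sales = pd.DataFrame({
    "product_code": ["P1", "P1", "P1", "P2", "P2", "P2"],
    "sales": [10.0, np.nan, 14.0, 100.0, 120.0, np.nan],
})

# Global fill: every gap gets the mean of the whole column.
global_fill = sales["sales"].fillna(sales["sales"].mean())

# Product-wise fill: each gap gets the mean of its own product.
per_product_mean = sales.groupby("product_code")["sales"].transform("mean")
group_fill = sales["sales"].fillna(per_product_mean)
```

With products on different scales, the global mean (61 here) fits neither product, while the group mean stays on each product's own scale. In a real pipeline the means would be computed on the training period only, to avoid leaking future values.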
@favourifunanya6108 2 years ago
Incredible sir

Forecastegy

Comments