Professional Preprocessing with Pipelines in Python

NeuralNine

60K views

Published: Mar 19, 2022
In this video, we learn about preprocessing pipelines and how to professionally prepare data for machine learning.
◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾
📚 Programming Books & Merch 📚
🐍 The Python Bible Book: www.neuralnine.com/books/
💻 The Algorithm Bible Book: www.neuralnine.com/books/
👕 Programming Merch: www.neuralnine.com/shop
🌐 Social Media & Contact 🌐
📱 Website: www.neuralnine.com/
📷 Instagram: / neuralnine
🐦 Twitter: / neuralnine
🤵 LinkedIn: / neuralnine
📁 GitHub: github.com/NeuralNine
🎙 Discord: / discord
🎵 Outro Music From: www.bensound.com/
Science

Comments • 40

@vzinko 7 months ago ⁺¹³
Rather than creating a class for each step, a much easier approach is to use sklearn's FunctionTransformer. It lets you write a custom function and turn it into a transformer object, which can then be used in a pipeline as normal.
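A minimal sketch of the FunctionTransformer approach described above. The column names ("Name", "Gender", "Age") and both helper functions are hypothetical and are not taken from the video. Only stateless steps are wrapped here, since a plain function cannot remember anything learned during fit.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def drop_name(df):
    # Hypothetical step: remove an identifier column that carries no signal.
    return df.drop(columns=["Name"])


def encode_gender(df):
    # Hypothetical step: map a two-valued text column to integers.
    df = df.copy()
    df["Gender"] = df["Gender"].map({"m": 0, "f": 1})
    return df


df = pd.DataFrame({"Name": ["Ann", "Bob"], "Gender": ["f", "m"], "Age": [31, 45]})

# Each plain function becomes a pipeline step without writing a class.
pipe = Pipeline([
    ("drop_name", FunctionTransformer(drop_name)),
    ("encode_gender", FunctionTransformer(encode_gender)),
])

result = pipe.fit_transform(df)
print(result)
```

Steps that must learn something from the training data (an imputer's mean, an encoder's categories) still fit better in a class with separate fit and transform methods.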
@isaacandrewdixon 1 year ago ⁺⁵
This was awesome and very informative. Many thanks from a machine learning novice!
@dmitriidavs4181 2 years ago ⁺¹
Fantastic video, always wondered about the reasoning behind using classes in ML, thank you!!!
@randomfinn404 2 years ago ⁺¹⁷
For those who noticed that the encoder seems to sort the values alphabetically and messes up the job column names: instead of manually typing the column names, you can do:
matrix = encoder.fit_transform(X[['Job']]).toarray()
column_names = sorted([i for i in df['Job'].unique()])
This also works if new jobs and values are added: it makes a column for each unique value while keeping the order.
Good tutorial in any case!
@jacksummers3918 1 year ago ⁺⁷
Use
pd.get_dummies(X.Job, prefix="Job")
Much neater
@ShortsSmith 17 days ago
G.... Thank you... I was hoping that someone had noticed it...
I'm glad that I got the Better version ❤
@onecarry1532 2 years ago ⁺⁷
Hey man, great channel! Love the topic-based tutorials ❤️
Video suggestion: could you make a video on using Python and the Tree Algorithm to build an autocomplete Python CLI program?
I haven't seen this anywhere, and I guess it's a great way to understand why the Tree algorithm might be the best solution for an autocomplete program.
Thanks! I'm sure we all appreciate what you do for the community ♥️ 🌻
@nathanhaynes2856 5 months ago ⁺²
Nice. For this example I might use the ColumnTransformer class; it's perfect for dropping columns and integrating imputers and scalers on select features.
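A rough sketch of what the ColumnTransformer suggestion could look like. The data and column names are invented for illustration, and the choice of mean imputation plus standard scaling is an assumption, not something the video prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented example data.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Age": [31.0, np.nan, 45.0, 28.0],
    "Job": ["writer", "programmer", "teacher", "programmer"],
})

# Numeric columns: fill missing values, then scale.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer(
    transformers=[
        ("num", numeric, ["Age"]),
        ("job", OneHotEncoder(handle_unknown="ignore"), ["Job"]),
    ],
    remainder="drop",    # columns not listed above (here "Name") are dropped
    sparse_threshold=0,  # always return a dense array
)

features = preprocess.fit_transform(df)
print(features.shape)
```

Because each transformer is bound to named columns, no custom class is needed for dropping, imputing, or encoding.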
@Deacc 2 years ago ⁺¹
This video is pure gold. Thank you so much!
@niv_syt6315 2 years ago ⁺¹
I remember taking ML courses on Udemy that took more time than this video. Please keep creating more videos on the same subject.
@manyes7577 1 year ago
Wow, this technique is amazing. Thanks for sharing this brilliant knowledge with us.
@nikulnayi3271 1 year ago
Thank you so much, nicely explained.
Following what you showed, I created a pipeline and dumped it as a pickle file, but when I try to load that model and use it, I get an error: AttributeError: Can't get attribute 'NullEncoder' on
@vlplbl85 2 years ago ⁺²
I find using FunctionTransformer much easier. It turns each of your custom functions into a transformer and you don't need to write a class, but just a function.
@tharakawickramasinghe3762 1 year ago
Thank you. This is very helpful.
@apheironnn 1 year ago
That was really helpful, thanks!
@MrTactics26 5 months ago
Sick video bro! 😎
@Juzz_RSA 1 year ago
Thank you, this was informative 😁
@jelcroospockt 8 months ago
I would really like to find a tutorial on how to pass arguments to a pipeline step you created yourself, like the name dropper, so I can use grid search to try out dropping different features.
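One possible way to do what this comment asks, sketched under assumptions: the ColumnDropper class, the columns a/b/c, and the random data are all hypothetical. The key idea is that a transformer deriving from BaseEstimator exposes its __init__ arguments as parameters, which GridSearchCV can set through the step-name prefix ("dropper__columns").

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that drops the columns named in `columns`."""

    def __init__(self, columns=()):
        # Store the argument unchanged so get_params/set_params work.
        self.columns = columns

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return X.drop(columns=list(self.columns))


# Synthetic data: only column "a" is related to the target.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "a": rng.normal(size=40),
    "b": rng.normal(size=40),
    "c": rng.normal(size=40),
})
y = (X["a"] > 0).astype(int)

pipe = Pipeline([
    ("dropper", ColumnDropper()),
    ("model", LogisticRegression()),
])

# Each candidate is a different set of columns to drop.
param_grid = {"dropper__columns": [(), ("b",), ("b", "c")]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

The same pattern applies to any argument of a custom step, as long as __init__ stores it under an attribute with the same name.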
@juandiegoorozco5531 9 months ago
really useful, thank you very much
@rohscx 2 years ago
What is the name of this video's opening song?
@gasfeesofficial3557 1 month ago
bro great video!!
@allanmachado2011 3 months ago
Thank you!
@sviteribuben7245 2 years ago
Very useful! Thx!
@pakaponwiwat2405 6 months ago
Thank you, sir!
@__wouks__ 2 years ago
I think your feature encoder has some faulty logic for the "Job" column. The df2 for example shows 1 x writer, 3 x programmer and 1 x teacher, but afterwards there isn't even a "teacher" column. And if you were to recreate the single columns using 1 or 0 from the features you created you wouldn't get the same dataframe.
@thomasgoodwin2648 2 years ago
With an eye towards the love that programming has gotten from the ml community lately, it occurs to me that perhaps ml could also be used more in the data preprocessing role.
For example: Choosing encoding types, handling missing values, flattening, etc could all be automated.
Just a thought.
2nd random thought. I know random noise has been added to features in an attempt to get the models to generalize better but did not fare well.
However I have not seen that anyone has tried simply using noise generators (normal, gaussian, etc) as individual features and allowing the model itself to choose when and where noise might be effective.
@736939 1 year ago ⁺¹
16:42 I think it's wrong to use fit_transform inside the transform method, because it causes data leakage: after you split the data into train and test parts, calling transform on the test dataset will refit the imputer on the test data.
@falkstankat6511 1 year ago
Yeah, thought the same.
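To illustrate the point made in this thread, here is a small sketch of a custom step that learns the imputation value in fit and only applies it in transform. The AgeImputer class and the "Age" column are hypothetical stand-ins, not the code shown in the video.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer


class AgeImputer(BaseEstimator, TransformerMixin):
    """Hypothetical step: fill missing ages with the mean seen during fit."""

    def fit(self, X, y=None):
        # Learn the mean from the data passed to fit (the training set).
        self.imputer_ = SimpleImputer(strategy="mean")
        self.imputer_.fit(X[["Age"]])
        return self

    def transform(self, X):
        # Only apply what was learned; never refit here.
        X = X.copy()
        X["Age"] = self.imputer_.transform(X[["Age"]]).ravel()
        return X


train = pd.DataFrame({"Age": [20.0, 30.0, np.nan]})
test = pd.DataFrame({"Age": [np.nan, 100.0]})

step = AgeImputer().fit(train)
filled = step.transform(test)

# The missing test value is filled with the training mean (25.0),
# not with a mean recomputed from the test data.
print(filled)
```

With fit_transform inside transform, the missing test value would instead be filled from the test set itself, which is exactly the leakage described above.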
@juanbetancourt5106 2 years ago
Great!
@aayushpatel2904 1 year ago
Thanks Sir
@nachoeigu 2 years ago ⁺⁴
I have one big question: what is the difference between building a machine learning application with a Pipeline and building one with an OOP technique? They look the same to me.
@adriandiaz5688 1 year ago
Yeah, this is a great video but that's something I'm curious about as well.
@MalcombBrown 2 years ago ⁺²
Could you use the get_dummies pandas method for the One Hot Encoding?
@lexcheshir6416 2 years ago
yep
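A short sketch of the get_dummies route mentioned in this thread, using made-up data with a "Job" column. One caveat worth stating: get_dummies only sees the values in the frame it is given, so a later dataset with different job values yields different columns, whereas a fitted OneHotEncoder remembers its categories.

```python
import pandas as pd

# Made-up data for illustration.
X = pd.DataFrame({
    "Job": ["writer", "programmer", "teacher"],
    "Age": [30, 41, 25],
})

# One indicator column per distinct job, named with the given prefix.
dummies = pd.get_dummies(X["Job"], prefix="Job")

# Replace the original text column with the indicator columns.
X_encoded = pd.concat([X.drop(columns=["Job"]), dummies], axis=1)
print(X_encoded.columns.tolist())
```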
@o1techacademy 10 months ago
Awesome
@dilshodfayzullayev924 6 months ago
where do you work #admin
@slothner943 9 months ago
Are you Swedish? 😮
@bellabella-tv8zg 2 years ago
1st
