The A to Z Complete Guide to Data Preprocessing | Data Pre-processing in Python | Data Science

  • Published: 7 Sep 2024
  • In data science, the journey from raw data to meaningful insights is possible only with careful preparation. In this video, we'll explore the landscape of data preparation, comparing the common approach with the practical approach. 🚀
    Complete EDA and Data Preparation Playlist: tinyurl.com/4j3...
    🔍 Common Approach: Laying the Foundation
    The common approach to data preparation is like building a house using traditional methods. It involves familiar steps such as missing value treatment, outlier detection and treatment, feature scaling, handling multicollinearity, and feature encoding. Each step plays a critical role in ensuring that the data is clean and ready for analysis. Not only are the steps themselves important; their correct sequence matters just as much.
    👉 Missing Value Treatment: Filling in the Blanks
    Missing values are like gaps in a puzzle. In the common approach, we use simple techniques like mean, median, or mode imputation to fill these gaps. While these methods are quick and easy, they may not capture the true essence of the missing data.
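    A minimal sketch (not from the video) of how mean/median/mode imputation might look in pandas; the DataFrame and its columns are made up for illustration:

    ```python
    import pandas as pd

    # Hypothetical data: a numeric column and a categorical column, each with a gap.
    df = pd.DataFrame({
        "age": [25, None, 40, 31],
        "city": ["Pune", "Delhi", None, "Pune"],
    })

    # Numeric: fill with the median (robust to outliers); the mean is an alternative.
    df["age"] = df["age"].fillna(df["age"].median())

    # Categorical: fill with the mode (the most frequent value).
    df["city"] = df["city"].fillna(df["city"].mode()[0])

    print(df)
    ```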
    📊 Outlier Treatment: Identifying the Odd Ones Out
    Outliers can skew our analysis, much like a noisy signal disrupting a radio broadcast. The common approach involves removing or transforming these outliers to bring the data back in line with the rest of the dataset, but we also need to guard against losing information if we modify too much genuine data.
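    For illustration, a minimal sketch of one such transformation, IQR-based capping (winsorizing), which pulls extreme values back to the Tukey fences instead of deleting rows; the numbers are made up:

    ```python
    import pandas as pd

    # Hypothetical measurements; 90 is a likely outlier.
    s = pd.Series([12, 14, 15, 13, 14, 90])

    # Tukey fences: 1.5 * IQR beyond the first and third quartiles.
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Capping keeps every row (no deletion), so less information is lost,
    # but the extreme value is shrunk to the nearest fence.
    capped = s.clip(lower=lower, upper=upper)
    print(capped)
    ```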
    📈 Feature Scaling: Bringing Balance
    Features in a dataset can have varying scales, much like comparing apples to oranges. Scaling techniques like standardization or normalization are used in the common approach to bring all features to a similar scale, ensuring that no single feature dominates the analysis.
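    A minimal sketch with scikit-learn, assuming a small made-up feature matrix whose two columns sit on very different scales:

    ```python
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    # Hypothetical features on very different scales.
    X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

    # Standardization: each feature gets zero mean and unit variance.
    X_std = StandardScaler().fit_transform(X)

    # Normalization (min-max): each feature is rescaled to [0, 1].
    X_minmax = MinMaxScaler().fit_transform(X)

    print(X_std)
    print(X_minmax)
    ```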
    🔗 Handling Multicollinearity: Untangling the Web
    Multicollinearity occurs when two or more features in a dataset are highly correlated. This can cause issues in some models. The common approach involves using techniques like variance inflation factor (VIF) to identify and mitigate multicollinearity.
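    A sketch of a VIF check with statsmodels on a hypothetical feature set; x2 is deliberately constructed to be nearly collinear with x1:

    ```python
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools.tools import add_constant

    # Hypothetical features: x2 is roughly 2 * x1, so its VIF should be high.
    df = pd.DataFrame({
        "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "x2": [2.1, 4.0, 6.2, 7.9, 10.1],
        "x3": [5.0, 3.0, 6.0, 2.0, 7.0],
    })

    X = add_constant(df)  # VIF is computed against a model with an intercept
    vif = pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
        index=X.columns[1:],
    )
    print(vif)  # a common rule of thumb: VIF above 5-10 flags multicollinearity
    ```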
    🏷️ Feature Encoding: Decoding the Variables
    Categorical variables need to be encoded into a numerical format for many machine learning algorithms to process them. The common approach includes methods like one-hot encoding or label encoding to achieve this.
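    A short sketch of both methods in pandas; the columns and the ordinal mapping for size are hypothetical:

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "color": ["red", "green", "blue", "green"],  # nominal: no natural order
        "size": ["S", "M", "L", "M"],                # ordinal: S < M < L
    })

    # One-hot encoding: one binary column per category, implying no order.
    one_hot = pd.get_dummies(df, columns=["color"])

    # Label encoding via an explicit mapping, suitable for ordinal features.
    size_order = {"S": 0, "M": 1, "L": 2}
    df["size_encoded"] = df["size"].map(size_order)

    print(one_hot)
    print(df)
    ```
    One-hot encoding suits nominal categories because it implies no ordering; integer labels are best reserved for genuinely ordinal features so models don't read spurious order into them.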
    Our follow-up video covers the practical approach to data pre-processing.

Comments • 6

  • @gumshuda24 · 8 months ago

    This is pure gold! Thanks for sharing these profound insights drawn from applied experience in the AI/ML industry.

  • @janaosama6010 · 6 months ago · +1

    Is removing duplicates in the data done before or after handling missing values?

    • @prosmartanalytics · 6 months ago

      Removing duplicates can be a bit tricky. Ideally, we should remove duplicates only when the dataset has a unique identifier and that identifier itself is duplicated, e.g. we know two employees can't have the same employee ID, so on that basis we can remove duplicates or suggest corrections. However, two employees can have the same age, education, location, and salary; as long as they are genuinely different employees, we don't want to remove those rows. Once these points are checked and the duplicate records turn out to be data-entry errors, we can remove them before treating missing values. Basically, this is data hygiene, not even data preprocessing. Hope it helps!
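      For illustration (not part of the original reply), a pandas sketch of identifier-based deduplication; emp_id and the sample rows are hypothetical:

      ```python
      import pandas as pd

      # Hypothetical employee data: emp_id should be unique per employee.
      df = pd.DataFrame({
          "emp_id": [101, 102, 102, 103],
          "age":    [30, 28, 28, 30],
          "city":   ["Pune", "Delhi", "Delhi", "Pune"],
      })

      # Drop rows whose unique identifier repeats (likely data-entry errors).
      deduped = df.drop_duplicates(subset="emp_id", keep="first")

      # Note: drop_duplicates() with no subset would also drop rows that merely
      # share age and city, which may belong to legitimately different employees.
      print(deduped)
      ```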

  • @younesgasmi8518 · 8 months ago · +1

    When I have positive or negative infinity values, can I replace them with NaN and then convert them to normal values using a median or mean strategy?

    • @prosmartanalytics · 8 months ago · +1

      Good question. First, we should find out why a value became infinite, e.g. we might have derived a ratio variable that became infinite because of division by zero. Second, what do the other feature values look like in the rows where some features become infinite, and how many such values and rows are present in the data?
      You may refer to our tutorial on outlier treatment for the choice of imputation techniques.
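      As a rough sketch of the replace-then-impute idea described in the question (the ratio column is hypothetical):

      ```python
      import numpy as np
      import pandas as pd

      # Hypothetical ratio feature where division by zero produced infinities.
      df = pd.DataFrame({"ratio": [0.5, np.inf, 1.2, -np.inf, 0.9]})

      # Step 1: inspect how many values are infinite before changing anything.
      print(np.isinf(df["ratio"]).sum(), "infinite values")

      # Step 2: convert infinities to NaN, then impute with the median.
      df["ratio"] = df["ratio"].replace([np.inf, -np.inf], np.nan)
      df["ratio"] = df["ratio"].fillna(df["ratio"].median())
      print(df)
      ```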