Big news! I just launched a free, 3-hour course that contains all 50 scikit-learn tips! Join here: courses.dataschool.io/scikit-learn-tips
I think the justification for the binary case is that with a binary feature, it's either yes or no, so 1 column can record that.
If you just remove a random column for one of your 3 shapes -- say 'square' -- then haven't you just lost that information from your dataset?
I guess you could infer it, since there are only 3 discrete categories: a '0' in both the circle and oval columns implies that the row must be a square. But then how would the presence of 'square' be returned as a predictive value in a later model, if square isn't an explicitly listed option?
In the case with 2 separate "Pink" and "Yellow" values, the two columns would be perfectly (negatively) correlated with one another, since the dichotomy is either/or. They are exact opposites, and the absence of one of the 2 options lets you infer the other 100% of the time.
In the case of 3 categories, the columns don't share that same symmetric/binary relationship: the absence of "square" doesn't let you directly infer whether the row is a circle or an oval, the way the absence of Pink does for Yellow. Having 2 alternatives instead of 1 introduces ambiguity that isn't present in a binary relationship.
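To make this concrete, here's a toy sketch with made-up 'shape' data (OneHotEncoder sorts categories alphabetically, so drop='first' drops 'circle'):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# made-up data: 3 discrete shape categories
X = pd.DataFrame({'shape': ['circle', 'oval', 'square', 'square']})

ohe = OneHotEncoder(drop='first')  # drops 'circle' (alphabetically first)
print(ohe.fit_transform(X).toarray())
# [[0. 0.]   <- all zeros: 'circle' can only be inferred, not read directly
#  [1. 0.]   <- 'oval'
#  [0. 1.]   <- 'square'
#  [0. 1.]]
print(ohe.get_feature_names_out())  # ['shape_oval' 'shape_square']
```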
Anyways just my thought. Thank you for the great content!
Great question! The information is not lost when you drop the first column, because the original categories are stored in the categories_ attribute of the OneHotEncoder (ohe.categories_). Hope that helps!
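As a minimal sketch (using a made-up 'shape' column):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X = pd.DataFrame({'shape': ['circle', 'oval', 'square']})
ohe = OneHotEncoder(drop='first')
encoded = ohe.fit_transform(X).toarray()

# all original categories are preserved, including the dropped one
print(ohe.categories_)  # [array(['circle', 'oval', 'square'], dtype=object)]

# and the all-zeros row decodes back to the dropped category
print(ohe.inverse_transform(encoded))  # [['circle'] ['oval'] ['square']]
```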
Explained very well! May I ask what the multicollinearity problem is?
Well, my understanding is that a binary feature, when one-hot encoded, always gives a 2x2 matrix of unique rows, whereas a feature with n categories gives an nxn matrix.
This could be the supporting pillar for using "if_binary", as it removes the redundancy from what is essentially an identity matrix.
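Something like this sketch with made-up data:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

binary = np.array([['Pink'], ['Yellow']])
three = np.array([['circle'], ['oval'], ['square']])

print(OneHotEncoder().fit_transform(binary).toarray())
# [[1. 0.]
#  [0. 1.]]   <- 2x2 identity: each column is the exact complement of the other

print(OneHotEncoder().fit_transform(three).toarray())
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]  <- 3x3 identity
```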
Thanks for your comment! I still don't quite understand, because regardless of whether the feature has 2 categories or 10 categories, there is still always 1 column (after one-hot encoding) that is redundant.
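To illustrate with a made-up example: whether there are 2 categories or 10, any one column equals 1 minus the sum of the rest, so it is always redundant:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([['circle'], ['oval'], ['square'], ['oval']])
enc = OneHotEncoder().fit_transform(X).toarray()

# the last column is fully determined by the others -> redundant
print(np.allclose(enc[:, -1], 1 - enc[:, :-1].sum(axis=1)))  # True
```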
@dataschool Because in the case of a binary feature, it will always be negative collinearity?!
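e.g. a quick sketch with made-up colors:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

color = np.array([['Pink'], ['Yellow'], ['Pink'], ['Pink']])
encoded = OneHotEncoder().fit_transform(color).toarray()

# the two columns of a one-hot encoded binary feature are perfectly
# negatively correlated
print(np.corrcoef(encoded[:, 0], encoded[:, 1])[0, 1])  # -1.0
```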
YOU ARE AWESOME!
Thank you! 🙏