Data Preprocessing 06: One Hot Encoding python | Scikit Learn | Machine Learning

Stats Wire

58K views

Published: 22 Jan 2025

Comments • 102

@ngneerin 2 years ago ⁺²⁸
This is so straightforward. No other source puts it so simply.
@StatsWire 2 years ago
Thank you
@_danfiz 2 years ago ⁺⁶
These are good, direct steps for using OHE. This helps me a lot. Thank you!
@StatsWire 2 years ago
You're welcome!
@kyleiong7311 1 year ago ⁺¹
REALLY REALLY HELPFUL, YOU SAVED MY DAY!!!!!
@StatsWire 1 year ago
You're welcome!
@aliyildirim5343 1 year ago
Amazing explanation! Left no questions in my mind...
@StatsWire 1 year ago ⁺¹
Thank you!
@nirajkhatri2017 1 year ago ⁺¹
Great video on OHE using sklearn. It describes everything we need to understand. Thank you
@StatsWire 1 year ago
Glad it was helpful!
@waleedahmad2012 1 year ago
I'm getting many rows at the bottom where the entire row is NaN except for the encoded columns. What could be the issue?
@waleedahmad2012 1 year ago
I did remove all null values before encoding.
@StatsWire 1 year ago
Can you please check your code again or post it here?
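One plausible cause of the NaN rows described above, offered as a guess because the commenter's code is not shown: after dropping nulls the DataFrame index has gaps, while the encoded columns get a fresh 0..n-1 index, so pd.concat lines rows up by label and pads the mismatches with NaN. The frame and the hand-written encoded columns below are hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing category value.
df = pd.DataFrame({"color": ["red", None, "blue", "green"],
                   "price": [10, 20, 30, 40]})
clean = df.dropna()  # remaining index labels are 0, 2, 3

# Hand-written one-hot columns for red, blue, green, with a fresh index 0, 1, 2.
encoded = pd.DataFrame({"color_blue": [0, 1, 0],
                        "color_green": [0, 0, 1],
                        "color_red": [1, 0, 0]})

# concat aligns on index labels, so labels 1 and 3 produce partly empty rows.
misaligned = pd.concat([clean, encoded], axis=1)

# Resetting the index first makes both frames share labels 0, 1, 2.
aligned = pd.concat([clean.reset_index(drop=True), encoded], axis=1)

print(misaligned)
print(aligned)
```

Passing index=clean.index when building the encoded DataFrame is another way to keep the two frames aligned.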
@ashutoshdongare5370 2 years ago ⁺³
Great tutorial... The only thing is that ravel() does not work for uneven arrays; one needs to use concat or hstack.
@StatsWire 2 years ago
Thank you for sharing. I will try.
@ZAZ069 1 year ago
thanks my man
@SMoon453 9 months ago
Thank you dude! I was wondering why ravel() wasn't working for me
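A minimal sketch of the ravel() issue raised in this thread, using NumPy only. The list below is a hypothetical stand-in for ohe.categories_ when the encoded columns have different numbers of unique values.

```python
import numpy as np

# Hypothetical stand-in for ohe.categories_: one array per encoded column.
categories = [
    np.array(["blue", "green", "red"]),
    np.array(["india", "usa"]),
]

# Arrays of unequal length can only be stored as a 1-D array of objects,
# so ravel() has nothing to flatten and returns the same two inner arrays.
ragged = np.array(categories, dtype=object)
print(ragged.shape)          # (2,)
print(ragged.ravel().shape)  # (2,)

# hstack concatenates the inner arrays end to end into one flat array.
labels = np.hstack(categories)
print(labels)
```

np.concatenate(categories) gives the same flat result, and flatten() behaves like ravel() on the ragged array.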
@fefefefezzz 1 year ago
Very good video! Helped me a lot! Thanks ❤❤
@StatsWire 1 year ago
You're welcome!
@poojakumarirollno9880 1 year ago
Great explanation, sir. Your video helped us. Make more videos regarding data science.
@StatsWire 1 year ago
Thank you for your kind words!
@jeanfabraruiz7994 2 years ago ⁺¹
After doing this, should I remove the columns color and country?
@StatsWire 2 years ago ⁺¹
Yes, use the dummy columns instead.
@RR-hq4cv 2 years ago ⁺²
Thank you for the tutorial! In cells [14] & [15] I couldn't make a flat array to later pass as column names, so I used this line of code (from the sklearn documentation):
feature_labels = ohe.get_feature_names_out()
print(feature_labels)
@StatsWire 2 years ago
Great!
@ajeyamandikal2010 2 years ago ⁺¹
Thanks bro, was searching for this!
@StatsWire 2 years ago
@@ajeyamandikal2010 You're welcome
@arenashawn772 1 year ago
I think if you specify "sparse_output=False" when initializing the OneHotEncoder, the transformed result will be a dense NumPy array rather than a SciPy csr_matrix, so you won't need the toarray() method to see it. But obviously it uses more memory this way…
@StatsWire 1 year ago
Yes
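A sketch comparing the two routes to a dense result. The data is hypothetical. The sparse_output parameter exists in scikit-learn 1.2 and newer; older releases call it sparse, so the example falls back to that name if the first one is rejected.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single-column frame.
df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# Default behaviour: the transform returns a SciPy sparse matrix.
sparse_result = OneHotEncoder().fit_transform(df)
print(sparse.issparse(sparse_result))
dense_from_sparse = sparse_result.toarray()

# Asking for dense output up front.
try:
    dense_encoder = OneHotEncoder(sparse_output=False)  # scikit-learn 1.2+
except TypeError:
    dense_encoder = OneHotEncoder(sparse=False)  # older releases
dense_result = dense_encoder.fit_transform(df)

print(dense_result)
```

Both routes give the same values; the sparse form only stores the non-zero entries, which matters when a column has many categories.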
@fullnesmindcristiano8638 2 years ago ⁺¹
Hello friend, is this method used to predict data, or what method is used to predict data?
@StatsWire 2 years ago ⁺¹
This method converts categorical columns into numerical columns for a machine learning model.
@ngneerin 2 years ago ⁺¹
.ravel() or .flatten() is just not working; it returns the array of arrays as it is.
@StatsWire 2 years ago ⁺¹
Can you please check all the steps to see that you are not missing anything?
@aakashrai2749 1 year ago
OneHotEncoder will convert the word into binary or number format, right?
@StatsWire 1 year ago
OneHotEncoder is a preprocessing technique used in machine learning to convert categorical data (e.g., words, categories, labels) into a numerical format: each category becomes its own binary (0/1) column.
@aakashrai2749 1 year ago
@@StatsWire ok thanks 👍
@umeshk1255 2 years ago
Can you please help me? I got the error "For a sparse output, all columns should be a numeric or convertible to a numeric" for pipe.fit(X_train, y_train). I double-checked everything and I don't know where the encoder error comes from.
PS: car is already defined.
X = car.drop(columns='Emissions_CO_[mg/km]')
y = car['Emissions_CO_[mg/km]']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline

ohe = OneHotEncoder()
ohe.fit(X[['Manufacturer', 'Model', 'Fuel_Type']])
ohe.categories_

column_trans = make_column_transformer(
    (OneHotEncoder(categories=ohe.categories_), ['Manufacturer', 'Model', 'Fuel_Type']),
    remainder='passthrough')

lr = LinearRegression()
pipe = make_pipeline(column_trans, lr)

pipe.fit(X_train, y_train)
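A possible explanation for the error above, stated as an assumption because the car data is not shown: with remainder='passthrough', any column that is not listed for encoding is passed through untouched, and if one of those columns still holds strings it cannot be combined with the sparse one-hot output. The sketch below uses an invented stand-in for car (the Transmission and Engine_cc columns and all values are hypothetical) and encodes every non-numeric column instead of naming columns by hand.

```python
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the commenter's `car` frame.
car = pd.DataFrame({
    "Manufacturer": ["Audi", "BMW", "Audi", "Ford", "BMW", "Ford"],
    "Fuel_Type": ["Petrol", "Diesel", "Diesel", "Petrol", "Petrol", "Diesel"],
    "Transmission": ["M6", "A8", "A7", "M5", "A8", "M6"],  # string column that is easy to overlook
    "Engine_cc": [1400, 2000, 1800, 1000, 3000, 1500],
    "Emissions_CO_[mg/km]": [300.0, 150.0, 180.0, 350.0, 400.0, 160.0],
})
X = car.drop(columns="Emissions_CO_[mg/km]")
y = car["Emissions_CO_[mg/km]"]

# Encode every non-numeric column, so only numeric columns are passed through.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_exclude="number")),
    remainder="passthrough",
)

pipe = make_pipeline(column_trans, LinearRegression())
pipe.fit(X, y)
preds = pipe.predict(X)
print(preds.shape)
```

Checking X.dtypes for leftover object or string columns is a quick way to test whether this explanation applies to a given dataset.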
@Intellectual_House 1 year ago
What's the role of the toarray() method?
@StatsWire 1 year ago
Its primary role is to convert a SciPy sparse matrix into a dense NumPy array.
@mousabmohammadshtayat4788 2 years ago ⁺³
Hi, great video. I faced one problem. I have three categorical columns, and the numbers of unique values in these columns are different. I tried adding dtype=object to the np.array call (a solution I found on many sites), but the result was three separate arrays, not one array. Please help if you can. Thank you
@franfernandez795 2 years ago ⁺¹
I have the exact same problem as you. Have you found the solution?
@franfernandez795 2 years ago ⁺²¹
I've found the solution! The method np.hstack(x) worked for me.
@taylorgood1007 2 years ago ⁺¹
@@franfernandez795 life saver! thanks :)
@fatihmercan6023 2 years ago
@@franfernandez795 #danke
@bvbyballena472 2 years ago
@@franfernandez795 BLESS YOUR SOUL
@alonzoslim 1 year ago
Hello. Thanks for this video. It's quite informative.
How can I deal with a situation where the categories are of varying lengths?
I got this error message: "ValueError: all arrays must be same length"
@StatsWire 1 year ago
It occurs when the categorical columns you're trying to encode have different numbers of unique categories. The encoder itself handles that; the error typically comes from combining the resulting per-column category arrays, which have different lengths, into one regular array or DataFrame.
@lucykelly499 1 year ago
@@StatsWire I got the same error, how can it be resolved?
@mdhasanuzzaman4039 2 years ago
Thank you so much
Great Tutorial
@StatsWire 2 years ago
You're welcome!
@dhm9818 1 year ago
I got an error in line 17 that says: ValueError: Shape of passed values is (988, 35), indices imply (988, 7). How can I fix it?
@StatsWire 1 year ago
You made a mistake. Please follow the steps again and you won't get the error.
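The error quoted above is what pandas raises when the number of column names does not match the number of columns in the data; in the commenter's case 35 encoded columns met 7 names. A small sketch with made-up shapes and names shows the mismatch and a check that avoids it.

```python
import numpy as np
import pandas as pd

# Hypothetical encoded output: 4 rows, 5 one-hot columns.
encoded = np.zeros((4, 5))

# Passing the original column names (2) instead of the encoded names (5) fails.
original_columns = ["color", "country"]
try:
    pd.DataFrame(encoded, columns=original_columns)
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)

# One label per encoded column works. Comparing the counts first makes the
# cause obvious if they ever differ.
labels = ["color_blue", "color_green", "color_red", "country_india", "country_usa"]
assert encoded.shape[1] == len(labels)
encoded_df = pd.DataFrame(encoded, columns=labels)
print(encoded_df.shape)  # (4, 5)
```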
@Hellios92 2 years ago
Thanks a lot, really helpful video! :)
@StatsWire 2 years ago
You're welcome!
@violetasaguier1370 2 years ago ⁺³
Very good video, greetings from Argentina, land of LEO MESSI
@StatsWire 2 years ago ⁺²
Thank you. I like Leo Messi a lot.
@emanuelea9967 2 years ago ⁺¹
Hi! Great video, but I have a question for you. Given one categorical column, e.g. "STATE", with 30 different states, can I one-hot encode it so that, for example, 5 states map to a new category Europe, 6 to America, and so on, without creating 30 new binary columns, one for every state?
@StatsWire 2 years ago
Yes you can!
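One way to do the grouping asked about above, sketched with pandas: map each state to a region first, then one-hot encode the region column. The state names, the mapping and the "Other" fallback are all hypothetical.

```python
import pandas as pd

# Hypothetical data; STATE is the column name used in the question.
df = pd.DataFrame({"STATE": ["France", "Brazil", "Italy", "Canada", "Japan"]})

# Hypothetical state-to-region mapping; anything unmapped falls into "Other".
state_to_region = {
    "France": "Europe",
    "Italy": "Europe",
    "Brazil": "America",
    "Canada": "America",
}
df["REGION"] = df["STATE"].map(state_to_region).fillna("Other")

# Encode the coarse region column instead of the 30 individual states.
region_dummies = pd.get_dummies(df["REGION"], prefix="REGION", dtype=int)
print(region_dummies)
```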
@Mob-IN-8606 1 year ago
Good video bro, but it could be better if you dropped the unnecessary columns like color and country.
@StatsWire 1 year ago
Thank you. This video was only for encoding purposes, not for feature selection :)
@tymothylim6550 3 years ago
Thanks a lot! Great tutorial!
@StatsWire 3 years ago
You're welcome!
@bthekhoa2704 2 years ago
Hi, I don't know where I can download the data set
@StatsWire 2 years ago
Hi, you can download the data and Jupyter notebook from my GitHub account: github.com/siddiquiamir/Python-Data-Preprocessing
@vamsikrishna-ft8rn 2 years ago
Well explained bro
@StatsWire 2 years ago
Thank you
@ibragim_on 2 years ago
Great tutorial!
@StatsWire 2 years ago
Thank you
@codingzone4690 2 years ago
Very Quick !! Or So Simple. Unexpected Bruh!!
@StatsWire 2 years ago
Thank you
@victorarayal 1 year ago
It seems the code "ravel()" does not work if the columns have different numbers of unique values =(
@StatsWire 1 year ago
I did not try that. Can you check the official documentation?
@rubennadevi 1 year ago
Thank you!
@StatsWire 1 year ago
You're welcome!
@roshini_begum 2 years ago
Hi, what's the difference between one-hot encoding and label encoder?
@StatsWire 2 years ago ⁺²
Hi Roshini, label encoder is used to label your target variable (y) and one-hot encoder is used to encode independent variables (X). One-hot encoding creates new columns, whereas label encoding just replaces the strings with numbers and does not create new columns.
@roshini_begum 2 years ago
@@StatsWire thanks a lot
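The difference described in this reply can be seen on a tiny example. The color values are hypothetical; LabelEncoder and OneHotEncoder are the scikit-learn classes discussed in the thread.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical category values.
colors = np.array(["red", "green", "blue", "green"])

# LabelEncoder: one integer per value, same length, no new columns.
# Classes are sorted alphabetically: blue=0, green=1, red=2.
label_codes = LabelEncoder().fit_transform(colors)
print(label_codes)  # [2 1 0 1]

# OneHotEncoder: one 0/1 column per category. It expects 2-D input,
# hence the reshape to a single column.
one_hot = OneHotEncoder().fit_transform(colors.reshape(-1, 1)).toarray()
print(one_hot.shape)  # (4, 3)
print(one_hot)
```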
@roshini_begum 2 years ago
Also, when do we use min-max scaler and standard scaler, and what's the difference between them?
@StatsWire 2 years ago ⁺¹
@@roshini_begum When we have outliers in the dataset we use standard scaler; otherwise min-max scaler is good to use.
@roshini_begum 2 years ago
@@StatsWire thanks
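To make the comparison concrete, here is a NumPy-only sketch of the two formulas on hypothetical values with one outlier. Min-max scaling maps the data to the range 0 to 1 using the minimum and maximum; standard scaling subtracts the mean and divides by the standard deviation. The formulas are written out by hand here and are not calls to the scikit-learn scalers.

```python
import numpy as np

# Hypothetical feature with one outlier (100.0).
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max scaling: (x - min) / (max - min), result lies in [0, 1].
min_max = (values - values.min()) / (values.max() - values.min())

# Standard scaling: (x - mean) / std, result has mean 0 and std 1.
# np.std defaults to the population standard deviation.
standard = (values - values.mean()) / values.std()

print(min_max)
print(standard)
```

In this example the outlier squeezes the four ordinary values close together under both methods, so the rule of thumb in the reply is only approximate. scikit-learn also provides RobustScaler, which scales with the median and interquartile range and is the option aimed at data with outliers.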
@mazharalamsiddiqui6904 3 years ago
Very nice tutorial
@StatsWire 3 years ago
Thank you
@ayeshabibi-b3l 1 year ago
Amazing
@StatsWire 1 year ago
Thank you!
@uchennaonyema989 2 years ago
.ravel() isn't working bro. It returns the same two arrays as before
@StatsWire 2 years ago
Please re-run the code and check.
@ammarayounas170 1 year ago
thank you so much
@StatsWire 1 year ago
You're welcome!
@AbdullahAlMamun-jm4qm 3 years ago
Could you please share the CSV file of this data?
@StatsWire 3 years ago
Sure. Here is the dataset link
Github: github.com/siddiquiamir/Data/blob/master/data-one-hot-encoder.csv
@anshulsharma7080 1 year ago
Please include the dummy variable trap problem in one-hot encoding.
@StatsWire 1 year ago
Thank you for your feedback. I have added it to my list.
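For readers unfamiliar with the dummy variable trap mentioned here: a full set of one-hot columns always sums to 1 in every row, so any one column is determined by the others, which can cause trouble for linear models with an intercept. A common remedy is to drop one category per feature. The sketch below uses pandas with hypothetical data; scikit-learn's OneHotEncoder has a drop parameter (for example drop="first") for the same purpose.

```python
import pandas as pd

# Hypothetical single-column frame.
df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# Full encoding: three columns that sum to 1 in every row.
full = pd.get_dummies(df, columns=["color"], dtype=int)
print(full.sum(axis=1).tolist())  # [1, 1, 1, 1]

# drop_first=True removes the first category (blue); a row of all zeros
# now means "blue", so no information is lost.
reduced = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)
print(reduced)
```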
@ngneerin 2 years ago
It's also sad that such a common use case requires so many steps. It should be available in one step, like pandas dummies.
@StatsWire 2 years ago
Yes, pandas dummies is easier
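The one-step pandas route mentioned here is pd.get_dummies. A minimal sketch with a hypothetical DataFrame (color and country are the column names used elsewhere in the comments; price is invented):

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue"],
    "country": ["usa", "india", "usa", "india"],
    "price": [10, 20, 30, 40],
})

# Encodes the listed columns, names the new columns <column>_<category>,
# and drops the originals, all in one call.
encoded_df = pd.get_dummies(df, columns=["color", "country"], dtype=int)
print(encoded_df)
```

One trade-off: get_dummies does not remember the categories it saw, so new data must be encoded with care to get the same columns, whereas a fitted OneHotEncoder can be reused on new data.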
@VinitKhandelwal 2 years ago
.ravel() did not work. I used .flatten()
@StatsWire 2 years ago
That's great. I hope it's working for you
@aravindng5157 1 year ago
Bro paithiyama neee avlo variables podra
@StatsWire 1 year ago
I did not understand the language but thank you :D
