Data Preprocessing 01: StandardScaler Machine Learning | Scikit Learn | Sklearn | Python |

Stats Wire

59K views

Published: Sep 17, 2024
GitHub Jupyter Notebook: github.com/sid...
GitHub Data: github.com/sid...
About this video: In this video, you will learn about StandardScaler in Python

Comments • 84

@RhoChalmers 1 year ago ⁺²
Thank you! This explains things much more clearly than my textbook.
@StatsWire 1 year ago
Thank you for your kind words.
@dagma3437 3 years ago ⁺⁸
Thanks. It would be helpful for beginners to explain why we standardize the features and what purpose it serves.
@StatsWire 3 years ago
Yes, it will be helpful for beginners. Thank you for the feedback.
@ettavictor4804 2 years ago ⁺¹
Thanks for your effort. I really appreciate it.
@StatsWire 2 years ago ⁺¹
I'm glad you liked it. You're welcome.
@TylerMeester 3 years ago ⁺¹
Thank you, this helped a lot!
@StatsWire 3 years ago
You're welcome!
@aiz_i564 5 months ago
THANKS A TON SIR!
@StatsWire 5 months ago
You're welcome!
@AnhQuan04 7 months ago
00:02 Standardization makes features look like standard normally distributed data with mean 0 and unit variance.
01:40 Applying standardization to specific integer and float variables.
03:22 Standardize variables using StandardScaler from the preprocessing module (sklearn.preprocessing).
05:17 Using StandardScaler for data preprocessing
06:50 StandardScaler transforms data to standardized values
08:35 StandardScaler transforms data to have mean 0 and variance 1
10:12 StandardScaler transformation on test data and analysis of mean and variance.
11:56 Using StandardScaler for data standardization in Python
Crafted by Merlin AI.
@StatsWire 7 months ago
Interesting
@mazharalamsiddiqui6904 3 years ago
Very nice tutorial
@StatsWire 3 years ago
Thank you
@youyangpeng9710 2 years ago ⁺¹
Thank you for your teaching. I just don't understand what "axis=0" means.
@StatsWire 2 years ago ⁺⁴
I'm glad you liked it. axis=0 means the operation runs down the rows, so you get one result per column. axis=1 means it runs across the columns, so you get one result per row.
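A minimal sketch of the axis behavior described in this reply, using NumPy. The array values are made up for illustration and are not taken from the video's dataset.

```python
import numpy as np

# Hypothetical 3x2 array: 3 rows (samples), 2 columns (features).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# axis=0 collapses the rows, leaving one mean per column.
col_means = data.mean(axis=0)

# axis=1 collapses the columns, leaving one mean per row.
row_means = data.mean(axis=1)

print(col_means.shape)  # (2,)
print(row_means.shape)  # (3,)
```

StandardScaler computes its statistics per column, which corresponds to axis=0 here.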
@zain3063 1 year ago
Thanks for sharing.
Is there a way to manually calculate the numbers that StandardScaler produces?
@StatsWire 1 year ago
Yes, you can calculate them manually using the z-score formula, z = (x - mean) / std. You can also search for the StandardScaler formula in the official scikit-learn documentation.
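As a rough illustration of the manual calculation mentioned in this reply, the sketch below compares a hand-computed z-score with the output of StandardScaler. The numbers are hypothetical, not the video's data. One detail worth noting: StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default, while pandas' `.std()` defaults to ddof=1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values for two features (not the video's data).
X = np.array([[130.0, 3504.0],
              [165.0, 3693.0],
              [150.0, 3436.0],
              [140.0, 3449.0]])

# Manual z-score, column by column. np.std defaults to ddof=0
# (population standard deviation), matching StandardScaler.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same computation through scikit-learn.
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))  # True
```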
@daniela.lapena7992 1 year ago
Great video! When we use StandardScaler inside a Pipeline, is the scaling done this way, or is it doing a fit_transform? How would it be done that way? Thanks
@StatsWire 1 year ago
Yes, we can apply it through a pipeline. There is a video on pipelines on my channel that you can watch.
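A small sketch of the Pipeline approach asked about here, under assumed synthetic data and an arbitrarily chosen LinearRegression model (neither comes from the video). When `fit` is called, the pipeline fits and applies the scaler on the training data. When `predict` is called, it only transforms the new data with the statistics already learned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=25.0, size=(60, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])

# fit: the scaler is fitted on X_train only, then the model is trained
# on the scaled training data.
pipe.fit(X_train, y_train)

# predict: X_test is only transformed (never fitted) before prediction.
preds = pipe.predict(X_test)

# The scaler inside the pipeline holds the training-set statistics.
fitted_scaler = pipe.named_steps["scaler"]
print(np.allclose(fitted_scaler.mean_, X_train.mean(axis=0)))  # True
```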
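@otekanonso7059 1 year ago
Is that to say that the mean of the standardized displacement and weight columns is approximately zero?
@StatsWire 1 year ago
Yes. StandardScaler rescales each column to mean 0 and standard deviation 1, the same mean and variance as the standard normal distribution. Any tiny nonzero value in the mean is floating-point rounding.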
@lolikpof 1 year ago
Do we ever need to standardize the dependent variable "y"?
@StatsWire 1 year ago
We don't need to.
@_seeker423 2 years ago ⁺²
For the test data, we should be re-using the scaler object resulting from fitting only the train data, right?
Something like...
ss = StandardScaler()
ss.fit(X_train)
X_train_scaled = ss.transform(X_train)
X_test_scaled = ss.transform(X_test)
@StatsWire 2 years ago ⁺¹
Yes, from the train data only.
@abhishekkarna9215 2 years ago ⁺²
But it would be better to scale the test dataset with the training parameters:
scaler = StandardScaler().fit(train)
test_scaled = scaler.transform(test)
@StatsWire 2 years ago
Yes
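The two comments above describe the same pattern: fit on the training split, then reuse that scaler for the test split. Here is a self-contained sketch with synthetic data (the numbers and the seed are assumptions for illustration). The scaled training columns have mean 0 and standard deviation 1, while the scaled test columns are only close to those values, which is expected.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data on very different scales.
rng = np.random.default_rng(42)
X = rng.normal(loc=[200.0, 3000.0], scale=[100.0, 800.0], size=(200, 2))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuse the train statistics

print(X_train_scaled.mean(axis=0))  # approximately 0 for both columns
print(X_test_scaled.mean(axis=0))   # near 0, but generally not exactly 0
```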
@mayankbaber9384 1 year ago
What is the meaning of the random_state parameter when splitting the data?
@StatsWire 1 year ago
The data is split randomly. Setting random_state fixes the seed, so every run produces the same split with the same samples instead of a different random split each time.
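A quick sketch of what this reply describes, using a made-up array. The seed value 7 is arbitrary. Any fixed integer gives a reproducible split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)

# Two independent calls with the same random_state give the same split.
a_train, a_test = train_test_split(X, test_size=3, random_state=7)
b_train, b_test = train_test_split(X, test_size=3, random_state=7)

print(np.array_equal(a_test, b_test))  # True
```

Leaving random_state unset lets the split change from run to run, which makes results harder to reproduce.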
@sherin7444 3 years ago
In this tutorial, should we also transform the mpg and acceleration columns?
@StatsWire 3 years ago ⁺¹
Yes, we can transform these two columns as well because they are numeric and are on different scales. I used only a few columns to demonstrate how to use StandardScaler. You can transform the other numerical columns as well.
@mp2093 2 years ago
Very helpful tutorial, but I have a small problem. What should I do if df.shape() returns the error "'tuple' object is not callable"? Should I modify the data type?
@StatsWire 2 years ago ⁺¹
Thanks. Check the syntax and the parentheses.
@susamay 2 years ago
Use only df.shape, without the brackets.
@StatsWire 2 years ago
@@susamay Ok
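To illustrate the fix suggested in this thread: `shape` is an attribute of a pandas DataFrame that holds a tuple, so adding parentheses tries to call that tuple. The small DataFrame below is hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame with 3 rows and 2 columns.
df = pd.DataFrame({
    "displacement": [307.0, 350.0, 318.0],
    "weight": [3504, 3693, 3436],
})

# Correct: shape is an attribute, so no parentheses.
print(df.shape)  # (3, 2)

# Incorrect: df.shape() tries to call the tuple and raises TypeError.
try:
    df.shape()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```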
@BibleSamurai 2 years ago
Now that it's scaled, do you just train the model on this transformed data?
@StatsWire 2 years ago ⁺¹
Yes
@maxmacken8859 2 years ago
Great Video! Do you know where I can get the data set?
@StatsWire 2 years ago
Thank you. Here are the links to the Jupyter notebook and the dataset:
Notebook: github.com/siddiquiamir/Python-Data-Preprocessing/blob/main/StandardScaler.ipynb
Dataset: github.com/siddiquiamir/Python-Data-Preprocessing/blob/main/autompg.csv
@element6101 2 years ago
@@StatsWire Please provide it in the description if possible
@StatsWire 2 years ago
@@element6101 Sure
@thaivuo2949 1 year ago
Hi sir, how can I calculate the standardized value from the initial value using the mean and scale? I want to apply it in a program on my MCU. Hoping for your answer. Thanks
@StatsWire 1 year ago
The standardized value is (value - mean) / scale. You can write a user-defined function to perform the same operation.
@thaivuo2949 1 year ago
@@StatsWire Is scale_ the standard deviation?
@StatsWire 1 year ago ⁺¹
@@thaivuo2949 Yes, mean_ is the mean and scale_ is the standard deviation.
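One possible way to approach the question in this thread: fit the scaler once, read out its `mean_` and `scale_` attributes, and apply the formula with plain arithmetic that could later be ported to a microcontroller. The training values and the `standardize` helper are hypothetical, for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data with two features.
X_train = np.array([[307.0, 3504.0],
                    [350.0, 3693.0],
                    [318.0, 3436.0],
                    [304.0, 3433.0]])

scaler = StandardScaler().fit(X_train)

# Per-feature constants that could be hard-coded in firmware.
means = scaler.mean_.tolist()    # per-feature mean
scales = scaler.scale_.tolist()  # per-feature standard deviation


def standardize(value, mean, scale):
    """Hypothetical helper: plain arithmetic, no library needed."""
    return (value - mean) / scale


new_sample = [320.0, 3500.0]
by_hand = [standardize(v, m, s) for v, m, s in zip(new_sample, means, scales)]
by_sklearn = scaler.transform([new_sample])[0]

print(np.allclose(by_hand, by_sklearn))  # True
```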
@Hard_Online 2 years ago
Just wanted to know how to get the mean easily.... Thanks
@StatsWire 2 years ago
You can use NumPy to get the mean easily
> import numpy as np
> np.mean([1, 2, 3])  # pass a list or array of numbers
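@mrsilver8151 2 years ago
Hi sir,
how do I normalize a single row of data?
Thanks in advance.
@StatsWire 2 years ago
StandardScaler works column by column, using the mean and standard deviation of each feature. To scale a single row, pass it as a 2D array with one row to the transform method of an already fitted scaler.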
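A minimal sketch of scaling one row, assuming a scaler that was fitted beforehand on hypothetical training data. `transform` expects a 2D array, so the single row is reshaped to one row by n features.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: [cylinders, displacement].
X_train = np.array([[8.0, 307.0],
                    [4.0, 97.0],
                    [6.0, 232.0],
                    [4.0, 140.0]])

scaler = StandardScaler().fit(X_train)

# A single new sample as a 1D array.
single_row = np.array([6.0, 200.0])

# reshape(1, -1) turns it into a 2D array with exactly one row.
scaled_row = scaler.transform(single_row.reshape(1, -1))

print(scaled_row.shape)  # (1, 2)
```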
@Sinsanevlog 2 years ago
Is this the same as z-score normalization?
@StatsWire 2 years ago ⁺¹
Yes
@Sinsanevlog 2 years ago
@@StatsWire thanks sir ❤️
@ImtithalSaeed 1 year ago
03:53 How do you get suggestions while typing in Jupyter?
@StatsWire 1 year ago ⁺¹
Press the Tab key after typing a few characters.
@ImtithalSaeed 1 year ago
@@StatsWire thanks
@GridoWit 1 month ago
You didn't explain what exactly StandardScaler does behind the scenes. You just explained how to use it.
@StatsWire 1 month ago ⁺¹
Okay, I will make a separate video if you want more detailed information on what happens behind the scenes. The formula of StandardScaler is (Xi - Xmean) / Xstd, so it adjusts the mean to 0 and the standard deviation to 1.
@GridoWit 1 month ago
@@StatsWire Thanks for the clarification and quick response 👍
@StatsWire 1 month ago
@@GridoWit You're welcome
@sheetalkumari9746 2 years ago
It was a very helpful video, but why do we need to standardize the data?
@StatsWire 2 years ago ⁺²
We standardize the data because different variables are on different scales. For example, age can be in the range 0-120, while salary can range from 1000 to 10000000. Left as they are, the salary variable will carry much more weight in the model than age. Standardization brings all variables onto the same scale so that each one carries comparable weight.
@sheetalkumari8581 2 years ago
@@StatsWire Thank you for the explanation. Keep doing the great work 👍
@StatsWire 2 years ago
@@sheetalkumari8581 Thank you for your kind words Sheetal.
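The age and salary example from this thread can be sketched numerically. The four people below are invented for illustration. Before scaling, a distance between two people is almost entirely the salary difference. After scaling, both columns have the same spread.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical people: [age in years, salary].
X = np.array([[25.0, 50_000.0],
              [60.0, 52_000.0],
              [27.0, 90_000.0],
              [45.0, 30_000.0]])

# Raw spread per column: salary is orders of magnitude larger than age.
raw_std = X.std(axis=0)

# Raw Euclidean distance between person 0 and person 1. They differ by
# 35 years of age, yet the result is driven by the 2,000 salary gap.
raw_distance = float(np.linalg.norm(X[0] - X[1]))

# After standardization, every column has standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)
scaled_std = X_scaled.std(axis=0)

print(raw_std)
print(raw_distance)
print(scaled_std)
```

This matters most for methods that depend on distances or on the magnitude of coefficients. Which models are sensitive to scale is not covered in the thread above.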
@shilpakamath5264 2 years ago
But we don't fit on X_test, right?
@StatsWire 2 years ago
Right, because fitting on the test set can lead to "data leakage".
@bea59kaiwalyakhairnar37 2 years ago
Bro, can you provide the data that you used?
@StatsWire 2 years ago
Yes bro, and you can also find the Jupyter notebook on my GitHub page. Here are the links:
Dataset: github.com/siddiquiamir/Python-Data-Preprocessing/blob/main/autompg.csv
Notebook: github.com/siddiquiamir/Python-Data-Preprocessing/blob/main/StandardScaler.ipynb
@svitirur1665 3 years ago
Sir, do you know where I can find a free tutorial that teaches this?
@StatsWire 3 years ago ⁺¹
May I know what you want to learn?
@svitirur1665 3 years ago
@@StatsWire sklearn in practice
@StatsWire 3 years ago ⁺¹
@@svitirur1665 You can learn from the official documentation. Here is the link:
scikit-learn.org/stable/
@jacksparrowbp 1 year ago
StandardScaler is showing an error
@StatsWire 1 year ago
What is the error?
@ishwarikulkarni3058 2 years ago
You have not shown how to transform it back.
@StatsWire 2 years ago ⁺¹
We can also get back to the original scale with a few more lines of code. Maybe I can show it in the next video. Thank you for the suggestion.
@ishwarikulkarni3058 2 years ago
@@StatsWire thank you
@wyldcard00 2 years ago
You didn't show how to inverse scale!!
@StatsWire 2 years ago ⁺¹
I forgot to add that in the video.
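Since both threads above ask about going back to the original scale, here is a short sketch using the `inverse_transform` method of a fitted StandardScaler. The input values are hypothetical and this is not the code from the video.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: [displacement, weight].
X = np.array([[307.0, 3504.0],
              [350.0, 3693.0],
              [318.0, 3436.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# inverse_transform undoes the scaling: x = z * scale_ + mean_.
X_restored = scaler.inverse_transform(X_scaled)

print(np.allclose(X, X_restored))  # True
```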
@jameswood7207 3 years ago
Why not transform X instead of Xtest and Xtrain separately?
@StatsWire 3 years ago ⁺⁸
Good question. Transforming them separately prevents information about the distribution of the test set from leaking into your model. If you fit the scaler on the full dataset (X) before splitting, the test set's statistics are used to transform the training set, and that information is passed downstream.
@jameswood7207 3 years ago
@@StatsWire Thanks for the quick response!
@StatsWire 3 years ago
@@jameswood7207 You're welcome
