Data Preprocessing 01: StandardScaler Machine Learning | Scikit Learn | Sklearn | Python |

  • Published: 17 Sep 2024
  • Data Preprocessing 01: StandardScaler Machine Learning | Scikit Learn | Sklearn | Python |
    GitHub Jupyter Notebook: github.com/sid...
    GitHub Data: github.com/sid...
    About this video: In this video, you will learn about StandardScaler in Python
    Large Language Model (LLM) - LangChain
    LangChain: • LangChain Tutorial for...
    Large Language Model (LLM) - LlamaIndex
    LlamaIndex: • LlamaIndex Tutorial fo...
    Machine Learning Model Deployment
    ML Model Deployment: • ML Model Deployment us...
    Spark with Python (PySpark)
    PySpark: • PySpark with Python
    Data Preprocessing (scikit-learn)
    Data Preprocessing Python: • Data Preprocessing Python
    Social Media Links
    RUclips: / statswire
    Twitter (X) : / statswire
    #datascience #machinelearning #python #ai #ml #deeplearning #opencv #imageprocessing #ai #tensorflow #neuralnetworks #deeplearning #pandas

Comments • 84

  • @RhoChalmers
    @RhoChalmers 1 year ago +2

    Thank you! This explains things much more clearly than my textbook.

    • @StatsWire
      @StatsWire  1 year ago

      Thank you for your kind words.

  • @dagma3437
    @dagma3437 3 years ago +8

    Thanks. It would help beginners to explain why we standardize the features.

    • @StatsWire
      @StatsWire  3 years ago

      Yes, it will be helpful for beginners. Thank you for the feedback.

  • @ettavictor4804
    @ettavictor4804 2 years ago +1

    Thanks for your effort. I really appreciate it.

    • @StatsWire
      @StatsWire  2 years ago +1

      I'm glad you liked it. You're welcome

  • @TylerMeester
    @TylerMeester 3 years ago +1

    Thank you, this helped a lot!

  • @aiz_i564
    @aiz_i564 5 months ago

    THANKS A TON SIR!

    • @StatsWire
      @StatsWire  5 months ago

      You're welcome!

  • @AnhQuan04
    @AnhQuan04 7 months ago

    00:02 Standardization makes features look like standard normally distributed data with mean 0 and unit variance.
    01:40 Applying standardization on specific integer and float number variables.
    03:22 Standardize variables using StandardScaler from the pre-processing library.
    05:17 Using StandardScaler for data preprocessing
    06:50 StandardScaler transforms data to standardized values
    08:35 StandardScaler transforms data to have mean 0 and variance 1
    10:12 StandardScaler transformation on test data and analysis of mean and variance.
    11:56 Using StandardScaler for data standardization in Python
    Crafted by Merlin AI.

  • @mazharalamsiddiqui6904
    @mazharalamsiddiqui6904 3 years ago

    Very nice tutorial

  • @youyangpeng9710
    @youyangpeng9710 2 years ago +1

    Thank you for your teaching. I just don't understand what 'axis = 0' means.

    • @StatsWire
      @StatsWire  2 years ago +4

      I'm glad you liked it. axis=0 means the operation runs down the rows, giving one result per column (for example, the mean of each column). axis=1 means it runs across the columns, giving one result per row.
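
To make the axis argument concrete, here is a minimal NumPy sketch. The array values are made up for illustration and are not from the video's dataset.

```python
import numpy as np

# Two rows (samples) and two columns (features); values are illustrative only.
data = np.array([[1.0, 10.0],
                 [3.0, 30.0]])

col_means = data.mean(axis=0)  # collapses the rows: one mean per column
row_means = data.mean(axis=1)  # collapses the columns: one mean per row

print(col_means.tolist())  # [2.0, 20.0]
print(row_means.tolist())  # [5.5, 16.5]
```

Per-column statistics (axis=0) are what a feature scaler needs, since each column is one feature.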

  • @zain3063
    @zain3063 1 year ago

    Thanks for sharing.
    I want to ask whether the numbers produced by StandardScaler can be calculated manually?

    • @StatsWire
      @StatsWire  1 year ago

      Yes, you can calculate it manually using the z-score formula, or you can search for the StandardScaler formula in the official sklearn documentation.
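
As a rough sketch of that manual calculation, the z-score can be computed per column with NumPy and compared against StandardScaler. The array `X` is made-up example data, not the video's dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: three samples, two features on different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 600.0]])

# Manual z-score per column. NumPy's default std uses ddof=0 (population
# standard deviation), which matches what StandardScaler computes.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))  # True
```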

  • @daniela.lapena7992
    @daniela.lapena7992 1 year ago

    Great video! When we use StandardScaler inside a Pipeline, is it doing the same thing as here, or is it calling fit_transform? How would it be done that way? Thanks

    • @StatsWire
      @StatsWire  1 year ago

      Yes, we can apply it through a Pipeline. There is a video on pipelines on my channel that you can watch.
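
A minimal sketch of the Pipeline variant, assuming a LinearRegression model and randomly generated data (neither is from the video). When the pipeline is fitted, the scaler is fitted and applied to the training data; at prediction time the scaler only transforms, reusing the stored training statistics.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 2))
X_test = rng.normal(size=(5, 2))
y_train = X_train @ np.array([1.5, -2.0])

pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)   # scaler is fitted on X_train only, then transforms it
pred = pipe.predict(X_test)  # scaler only transforms X_test with the stored statistics

# make_pipeline names each step after its lowercased class name.
scaler = pipe.named_steps["standardscaler"]
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))  # True
```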

  • @otekanonso7059
    @otekanonso7059 1 year ago

    Is that to say that the means of the standard-scaled displacement and weight are approximately zero?

    • @StatsWire
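      @StatsWire  1 year ago

      Yes. After scaling, each column has a mean of approximately 0 and a standard deviation of 1. Tiny nonzero values are floating-point rounding. The scaler only shifts and rescales the data; it does not change the shape of the distribution.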

  • @lolikpof
    @lolikpof 1 year ago

    Do we ever need to standardize the dependent variable "y"?

  • @_seeker423
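    @_seeker423 2 years ago +2

    For the test data, we should be re-using the scaler object resulting from fitting only the train data, right?
    Something like...
    from sklearn.preprocessing import StandardScaler
    ss = StandardScaler()
    ss.fit(X_train)                          # learn mean and std from the training data only
    X_train_scaled = ss.transform(X_train)
    X_test_scaled = ss.transform(X_test)     # reuse the training statistics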

    • @StatsWire
      @StatsWire  2 years ago +1

      Yes, fit on the train data only.

  • @abhishekkarna9215
    @abhishekkarna9215 2 years ago +2

    But it would be better to scale the test dataset with the training parameters:
    scaler = StandardScaler().fit(train)
    test_scaled = scaler.transform(test)

  • @mayankbaber9384
    @mayankbaber9384 1 year ago

    What is the meaning of the random_state parameter when splitting the data?

    • @StatsWire
      @StatsWire  1 year ago

      The split is random, so by default you get different samples each time. random_state fixes the seed of that shuffle, so every run gives you the same train and test samples.
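
A small sketch of that behaviour, using a made-up array and an arbitrary seed of 42: two calls with the same random_state return identical splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten illustrative samples with two features each.
X = np.arange(20).reshape(10, 2)

a_train, a_test = train_test_split(X, test_size=0.3, random_state=42)
b_train, b_test = train_test_split(X, test_size=0.3, random_state=42)

print(np.array_equal(a_test, b_test))  # True: same seed, same split
```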

  • @sherin7444
    @sherin7444 3 years ago

    In this tutorial should we also transform mpg and acceleration columns?

    • @StatsWire
      @StatsWire  3 years ago +1

      Yes, we can transform these two columns as well because they are numeric and on different scales. I used only a few columns to demonstrate how to use StandardScaler; you can transform the other numerical columns as well.

  • @mp2093
    @mp2093 2 years ago

    Very helpful tutorial, but I have a small problem. What should I do if df.shape() returns the error "'tuple' object is not callable"? Should I modify the data type?

    • @StatsWire
      @StatsWire  2 years ago +1

      Thanks. Check the parentheses: shape is an attribute, not a method, so write df.shape without ().

    • @susamay
      @susamay 2 years ago

      Only df.shape, no brackets.

    • @StatsWire
      @StatsWire  2 years ago

      @@susamay Ok
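
A short sketch of why that error appears. The DataFrame is a made-up stand-in for the video's data; its column names are borrowed from the discussion and the values are illustrative.

```python
import pandas as pd

# Illustrative frame, not the actual dataset.
df = pd.DataFrame({"weight": [3504, 3693, 3436],
                   "displacement": [307.0, 350.0, 318.0]})

print(df.shape)  # (3, 2): shape is an attribute holding a tuple

try:
    df.shape()   # calling the tuple raises TypeError
except TypeError as err:
    print(err)   # 'tuple' object is not callable
```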

  • @BibleSamurai
    @BibleSamurai 2 years ago

    Now that it's scaled, do you just train the model on this transformed data?

  • @maxmacken8859
    @maxmacken8859 2 years ago

    Great Video! Do you know where I can get the data set?

    • @StatsWire
      @StatsWire  2 years ago

      Thank you. Here are the Jupyter notebook and dataset links
      Notebook : github.com/siddiquiamir/Python-Data-Preprocessing/blob/main/StandardScaler.ipynb
      dataset: github.com/siddiquiamir/Python-Data-Preprocessing/blob/main/autompg.csv

    • @element6101
      @element6101 2 years ago

      @@StatsWire Please provide it in description if possible

    • @StatsWire
      @StatsWire  2 years ago

      @@element6101 Sure

  • @thaivuo2949
    @thaivuo2949 1 year ago

    Hi sir, how can I calculate the standardized value from the initial value using the mean and scale? I want to apply it in my program on my MCU. Hoping for your answer. Thanks

    • @StatsWire
      @StatsWire  1 year ago

      You can write a small user-defined function that performs the same operation: subtract the fitted mean and divide by the fitted scale.

    • @thaivuo2949
      @thaivuo2949 1 year ago

      @@StatsWire Is scale_ the standard deviation?

    • @StatsWire
      @StatsWire  1 year ago +1

      @@thaivuo2949 Yes, mean_ is the mean and scale_ is the standard deviation.
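
One possible way to port the transform to a device without scikit-learn is to export the fitted `mean_` and `scale_` values and apply the formula by hand. The sketch below uses made-up training data, and `standardize` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative training data: three samples, two features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
scaler = StandardScaler().fit(X_train)

# Constants that could be hard-coded on the target device.
means = scaler.mean_.tolist()    # per-feature mean
scales = scaler.scale_.tolist()  # per-feature standard deviation (ddof=0)


def standardize(row, means, scales):
    """Hypothetical helper: plain-Python (x - mean) / scale per feature."""
    return [(x - m) / s for x, m, s in zip(row, means, scales)]


row = [2.5, 30.0]
by_hand = standardize(row, means, scales)
print(np.allclose(by_hand, scaler.transform([row])[0]))  # True
```

The same three-operand arithmetic translates directly to C or any other language used on the microcontroller.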

  • @Hard_Online
    @Hard_Online 2 years ago

    Just wanted to know how to get the mean easily.... Thanks

    • @StatsWire
      @StatsWire  2 years ago

      You can use NumPy to get the mean easily
      > import numpy as np
      > np.mean([1, 2, 3])

  • @mrsilver8151
    @mrsilver8151 2 years ago

    hi sir
    how to normalize single row data
    thanks in advance.

    • @StatsWire
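      @StatsWire  2 years ago

      StandardScaler works column by column (per feature), not row by row. To scale a single row, pass it to the fitted scaler's transform as a 2D array with one row.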
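
A minimal sketch of scaling one row with an already fitted scaler. StandardScaler computes its statistics per column, so a single sample has to be reshaped to 2D, with shape (1, n_features), before calling transform. The data is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative training data; the column means are 2.0 and 20.0.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X_train)

single_row = np.array([2.0, 20.0])                         # 1D: one sample
scaled_row = scaler.transform(single_row.reshape(1, -1))   # 2D: shape (1, 2)

print(scaled_row.shape)  # (1, 2)
```

Because this row equals the column means, its scaled values are zero.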

  • @Sinsanevlog
    @Sinsanevlog 2 years ago

    Is this the same as z-score normalization?

  • @ImtithalSaeed
    @ImtithalSaeed 1 year ago

    03:53 How do you get suggestions while typing in Jupyter?

  • @GridoWit
    @GridoWit 1 month ago

    You didn't explain what exactly StandardScaler does behind the scenes. You just explained how to use it.

    • @StatsWire
      @StatsWire  1 month ago +1

      Okay, I will make a separate video if you want more detailed information on what happens behind the scenes. The formula of StandardScaler is (Xi - Xmean) / Xstd, so it shifts the mean to 0 and scales the standard deviation to 1.

    • @GridoWit
      @GridoWit 1 month ago

      @@StatsWire thanks for clarification and quick response 👍

    • @StatsWire
      @StatsWire  1 month ago

      @@GridoWit You're welcome

  • @sheetalkumari9746
    @sheetalkumari9746 2 years ago

    It was a very helpful video, but why do we need to standardize the data?

    • @StatsWire
      @StatsWire  2 years ago +2

      The reason we standardize the data is that different variables are on different scales. For example, age can range from 0 to 120, while salary can range from 1000 to 10000000. Without scaling, the salary variable can carry much more weight in a scale-sensitive model, and age much less. Standardization brings all variables onto the same scale so that no variable dominates just because of its units.

    • @sheetalkumari8581
      @sheetalkumari8581 2 years ago

      @@StatsWire thank you for the explanation.keep doing the great work 👍

    • @StatsWire
      @StatsWire  2 years ago

      @@sheetalkumari8581 Thank you for your kind words Sheetal.
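
To illustrate the age and salary example above, here is a small sketch using Euclidean distance, which distance-based models such as k-nearest neighbours rely on. The three people and their values are invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: age, salary. Rows 0 and 1 differ by 35 years but only 1000 in salary.
people = np.array([[25.0, 50_000.0],
                   [60.0, 51_000.0],
                   [26.0, 90_000.0]])

# On the raw data the distance is almost exactly the salary gap;
# the 35-year age gap barely registers.
raw_dist = np.linalg.norm(people[0] - people[1])
print(round(raw_dist, 1))  # 1000.6

# After standardization both columns are in units of standard deviations,
# so the large age gap now shows up as the larger difference.
scaled = StandardScaler().fit_transform(people)
age_gap = abs(scaled[0, 0] - scaled[1, 0])
salary_gap = abs(scaled[0, 1] - scaled[1, 1])
print(age_gap > salary_gap)  # True
```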

  • @shilpakamath5264
    @shilpakamath5264 2 years ago

    But we don't fit on X_test, right?

    • @StatsWire
      @StatsWire  2 years ago

      Right because this can lead to "data leakage"

  • @bea59kaiwalyakhairnar37
    @bea59kaiwalyakhairnar37 2 years ago

    bro can you provide the data that you used.

    • @StatsWire
      @StatsWire  2 years ago

      Yes bro, and also you can find the jupyter notebook on my github page. Below is the link
      dataset: github.com/siddiquiamir/Python-Data-Preprocessing/blob/main/autompg.csv
      Notebook: github.com/siddiquiamir/Python-Data-Preprocessing/blob/main/StandardScaler.ipynb

  • @svitirur1665
    @svitirur1665 3 years ago

    Sir, do you know where I can find a free tutorial that teaches that?

    • @StatsWire
      @StatsWire  3 years ago +1

      May I know what you want to learn?

    • @svitirur1665
      @svitirur1665 3 years ago

      @@StatsWire sklearn in practice

    • @StatsWire
      @StatsWire  3 years ago +1

      @@svitirur1665 You can learn from the official documentation. Here is the link
      scikit-learn.org/stable/

  • @jacksparrowbp
    @jacksparrowbp 1 year ago

    StandarScaler showing error

  • @ishwarikulkarni3058
    @ishwarikulkarni3058 2 years ago

    You have not shown how to transform it back.

    • @StatsWire
      @StatsWire  2 years ago +1

      We can also get back to the original scale with a few more lines of code. Maybe in the next video, I can show it. Thank you for the suggestion

    • @ishwarikulkarni3058
      @ishwarikulkarni3058 2 years ago

      @@StatsWire thank you
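
As a sketch of getting back to the original scale, StandardScaler provides an inverse_transform method that multiplies by the stored scale and adds the stored mean. The array below is made-up example data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: three samples, two features.
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 600.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Undo the scaling: X_scaled * scale_ + mean_
X_restored = scaler.inverse_transform(X_scaled)

print(np.allclose(X, X_restored))  # True
```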

  • @wyldcard00
    @wyldcard00 2 years ago

    You didn't show how to inverse scale!!

    • @StatsWire
      @StatsWire  2 years ago +1

      I forgot to add that in the video

  • @jameswood7207
    @jameswood7207 3 years ago

    Why not transform X instead of Xtest and Xtrain separately ??

    • @StatsWire
      @StatsWire  3 years ago +8

      Good question. Scaling after the split prevents information about the distribution of the test set from leaking into your model. If you fit the scaler on the full dataset (X) before splitting, information about the test set is used to transform the training set, which is then passed downstream to the model.

    • @jameswood7207
      @jameswood7207 3 years ago

      @@StatsWire thanks for the quick response!

    • @StatsWire
      @StatsWire  3 years ago

      @@jameswood7207 You're welcome
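
The leakage point in this thread can be sketched by comparing a scaler fitted on the full dataset with one fitted on the training rows only. The data is randomly generated for illustration, and the names `leaky` and `clean` are made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: 100 samples, 2 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 2))
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

leaky = StandardScaler().fit(X)        # statistics include the test rows
clean = StandardScaler().fit(X_train)  # statistics from training rows only

X_train_scaled = clean.transform(X_train)
X_test_scaled = clean.transform(X_test)  # reuses the training mean and std

# The two scalers learn different means, so the "leaky" one has
# already seen something about the test set.
print(leaky.mean_ - clean.mean_)

# Training data is centred exactly; test data is only approximately centred.
print(X_train_scaled.mean(axis=0))
print(X_test_scaled.mean(axis=0))
```

It is expected that the scaled test set does not have a mean of exactly 0, because it was scaled with the training statistics.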