4.4. Data Standardization | Data Preprocessing | Machine Learning Course

  • Published: 16 Nov 2024

Comments • 34

  • @thiruvannnamalaik4777 3 years ago +1

    I came here after seeing your post on Facebook. Thanks, bro, very helpful.

  • @Praveenmanikanta32 3 years ago +1

    Thank you 🙏

  • @saikrishna-ht1kd 2 years ago +2

    When I print the dataset I don't get the output; instead I am getting . What do I have to do, sir? Do I have to download anything?

    • @YashJhota 3 months ago

      Use a print statement instead of calling the variable directly.

  • @vinaynaik953 3 years ago +1

    Great work, keep it up.

  • @sonamraj5323 4 months ago

    There is no link for the data science project.

  • @kamitp4972 1 year ago +1

    Sir, why don't we standardize the test data?

  • @sahithyasirsi9300 3 years ago +1

    What is the need to take the Y_train and Y_test data, as we have not standardized them?
    Also, what does transform mean after you fit the data? Does it make the process of finding the standard deviation easier?
    Please let me know! Thank you!

    • @tusharmawa2542 2 years ago +6

      I know I am late, but writing
      X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 3)
      is just how the result of train_test_split has to be unpacked.
      If you try running
      X_train, X_test = train_test_split(X, Y, test_size = 0.2, random_state = 3)
      you get an error like this:
      ValueError: too many values to unpack (expected 2)
      This happens because train_test_split() returns everything it is supposed to return, i.e. X_train, X_test, Y_train, Y_test. Since you did not provide the Y_train and Y_test variables, there is nowhere to store the last two returned values, hence "too many values to unpack".
      _
      fit in scaler.fit computes the statistics the scaler needs: for a StandardScaler that is the mean and standard deviation of each column (the min and max would apply to a MinMaxScaler).
      After that you apply scaler.transform, which uses the values computed by fit to rescale, or standardize, X_train.
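The two points in the reply above can be tried out in a few lines. This is only a minimal sketch: the toy arrays X and Y are made up for illustration, and it assumes the scaler in the video is scikit-learn's StandardScaler.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data, invented for this sketch: 10 samples, 2 features, binary labels.
X = np.arange(20, dtype=float).reshape(10, 2)
Y = np.array([0, 1] * 5)

# Two input arrays give four outputs, so four names are needed on the left.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=3
)

# Unpacking the same four outputs into two names raises a ValueError.
unpack_failed = False
try:
    a, b = train_test_split(X, Y, test_size=0.2, random_state=3)
except ValueError:
    unpack_failed = True

scaler = StandardScaler()
scaler.fit(X_train)                      # learns per-column mean and std from X_train
X_train_std = scaler.transform(X_train)  # applies (x - mean) / std column by column

print(X_train.shape, X_test.shape)       # (8, 2) (2, 2)
print(scaler.mean_, scaler.scale_)       # the statistics stored by fit
```

Y_train and Y_test are returned but never passed to the scaler; they are only needed later, when a model is trained and evaluated.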

  • @vibhutijoshi07 5 months ago +3

    date: 26/3/2021 -- I was 17😭 26/3/2024--- i turned 20

    • @nipunpatil9157 5 months ago

      I too turned 20😅
      From 17 to 20, time's a blur! I'm right there with you. What's been the most unexpected twist in your journey?

  • @hashyamodhia3511 3 years ago +2

    Bro, while fitting the scaler, can't we use scaler.fit(X) instead of scaler.fit(X_train)?
    And is it compulsory to standardize the data?

  • @sachinvithubone4278 3 years ago +1

    While fitting the data, do we need to give two parameters or one? You have given one, which is X_train.

    • @Siddhardhan 3 years ago +2

      It differs from case to case. While training the model, we need to give two sets of values (x_train, y_train), whereas for standardization only x_train is required for fitting, because in that case we don't need y_train. We are standardizing x_train alone.
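The difference between the two fit calls can be seen side by side in a small sketch. The numbers are invented, and LogisticRegression is only an assumed stand-in for whatever model is being trained; the point is the number of arguments each fit receives.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Made-up training data: 4 samples, 2 features on very different scales.
X_train = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 240.0], [4.0, 260.0]])
Y_train = np.array([0, 0, 1, 1])

scaler = StandardScaler()
scaler.fit(X_train)                      # one argument: features only
X_train_std = scaler.transform(X_train)

model = LogisticRegression()             # assumed model, for illustration
model.fit(X_train_std, Y_train)          # two arguments: features and labels
```

The scaler only needs the feature columns to compute their mean and standard deviation, while a supervised model also needs the labels it is supposed to predict.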

    • @sachinvithubone4278 3 years ago

      I still have some doubts. 🤔

  • @anittamariamathew4136 2 years ago

    What if we have negative values in the dataset?

  • @subdas134 2 years ago

    Sir, if I have a categorical column,
    do I need to scale it after one-hot encoding or not?

  • @saurabhsingh5472 3 years ago

    Why are you standardizing xtrain and xtest separately? Can we instead standardize X as a whole (without the target variable in it)?

    • @Siddhardhan 3 years ago +2

      Hi! It is standard practice to standardize xtrain and xtest separately. If we standardize the whole data before splitting it, there may be some problems created by outliers. Also, we don't have to standardize the target variables, as they are just categories.
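One common way to do this split-then-standardize workflow is sketched below. It assumes the scaler is fitted on X_train only, as the scaler.fit(X_train) call quoted in another comment suggests; reusing that same fitted scaler on X_test is an assumption of this sketch, and the random data is made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data: 100 samples, 3 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
Y = rng.integers(0, 2, size=100)

# Split first, so the test rows play no part in computing the statistics.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=3
)

scaler = StandardScaler()
scaler.fit(X_train)                      # statistics come from the training rows only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)    # same mean and std applied to the test rows

print(X_train_std.mean(axis=0))          # very close to 0 for every column
print(X_test_std.mean(axis=0))           # usually near 0, but not exactly 0
```

Y_train and Y_test are left untouched, matching the reply: the class labels are not scaled.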

  • @umashankarv5361 3 years ago

    Can you explain why we should use random_state = 3?

    • @Siddhardhan 3 years ago

      We can use any random_state value. If you use random_state = 3, your data will be split in the same way as mine. If you use some other value, the split will be different.
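A quick way to check this is to split the same toy data several times. The arrays below are invented for the sketch, and 42 is just an arbitrary second seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: row i of X is [2*i, 2*i + 1], so rows are easy to recognize.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

# Same random_state twice, then a different (arbitrary) one.
a_train, a_test, _, _ = train_test_split(X, Y, test_size=0.2, random_state=3)
b_train, b_test, _, _ = train_test_split(X, Y, test_size=0.2, random_state=3)
c_train, c_test, _, _ = train_test_split(X, Y, test_size=0.2, random_state=42)

print(np.array_equal(a_test, b_test))  # True: same seed, same split
print(a_test)                          # test rows chosen with random_state=3
print(c_test)                          # test rows chosen with random_state=42
```

The value 3 has no special meaning; it only makes the shuffle repeatable, so anyone following along gets the same rows in the train and test sets.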

  • @namanjain8970 3 years ago

    Waiting for your sklearn tutorial module.

    • @Siddhardhan 3 years ago +1

      Hi! An sklearn video doesn't work well as a standalone video, so I have planned to explain the functions in the videos of other modules, like Data Preprocessing and Model Training. Once you watch the videos in the upcoming modules, you will get to know the important functions in sklearn.

  • @lisachakraborty528 9 months ago

    Where can I get the dataset?

    • @YashJhota 3 months ago

      Kaggle, Google Dataset Search

  • @digilearncommunity5598 3 years ago

    How is it that if the std is 200, the data varies a lot?

    • @Siddhardhan 3 years ago

      A high standard deviation means that the data are highly spread out.
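A small worked example may make this concrete. The two lists below are invented for illustration; they share the same mean, but one is tightly clustered while the other has a standard deviation of exactly 200.

```python
import statistics

# Two made-up samples with the same mean (100) but different spread.
tight = [98, 99, 100, 101, 102]
wide = [-200, 0, 100, 200, 400]

print(statistics.mean(tight), statistics.mean(wide))  # 100 100
print(statistics.pstdev(tight))  # about 1.41: values stay close to the mean
print(statistics.pstdev(wide))   # 200.0: values lie far from the mean
```

Whether 200 counts as "a lot" depends on the scale of the column, which is why standardization divides each column by its own standard deviation.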

  • @saurabhsingh5472 3 years ago

    What is "data" in the dataset.data.std statement at 15:06 min?

    • @Siddhardhan 3 years ago +1

      It means the data present in the dataset.

    • @anistor 2 years ago +3

      Hi Saurabh! 'dataset', as loaded from sklearn, is a dictionary. 'data' is one of this dictionary's keys. By running dataset.data you reach the values stored under the 'data' key of the 'dataset' dictionary.
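The reply above can be checked directly. This sketch assumes the dataset is one of scikit-learn's bundled loaders; load_breast_cancer is used here only as an example, since the thread does not say which dataset the video loads. Such loaders return a Bunch, a dictionary-like object whose keys can also be read as attributes.

```python
from sklearn.datasets import load_breast_cancer

# Assumed example dataset; any sklearn.datasets loader returns a similar object.
dataset = load_breast_cancer()

print(dataset.keys())                    # includes 'data', 'target', 'feature_names'
print(dataset.data is dataset["data"])   # True: attribute and key give the same array
print(dataset.data.shape)                # (569, 30)
print(dataset.data.std())                # std over all values of the feature array
```

So dataset.data.std() reads as: take the object named dataset, get its feature array stored under 'data', and call NumPy's std method on that array.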