I came here after seeing your post on Facebook. Thanks bro, very helpful!
my pleasure 😇
Thank you 🙏
When I print the dataset I am not getting the output; instead I am getting . What do I have to do, sir? Should I have to download anything?
Use a print statement instead of calling the variable directly.
Great work keep it up
Thanks, will do!
There is no link for the data science project.
Sir, why don't we standardize the test data?
What is the need to take Y_train and Y_test data, as we have not standardized it?
Also, what does transform mean after you fit the data? Does it make the process of finding the standard deviation easier?
Please let me know! Thank you!
I know I am late, but writing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 3)
when we do train_test_split is just the required syntax.
But if you try running
X_train, X_test = train_test_split(X, Y, test_size = 0.2, random_state = 3)
you get an error like this:
ValueError: too many values to unpack (expected 2)
It happens because when you call train_test_split() it returns all the values it is supposed to return, i.e. X_train, X_test, Y_train, Y_test. Since you did not give Y_train and Y_test variables, there is nowhere to store the extra returned values, hence "too many values to unpack".
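For anyone who wants to see both cases run, here is a minimal sketch with made-up toy arrays (the shapes and random_state value are just examples, not the video's data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the real dataset: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

# Correct: four variables for the four arrays returned.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=3
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)

# Incorrect: only two variables for four returned arrays.
try:
    X_train, X_test = train_test_split(X, Y, test_size=0.2, random_state=3)
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```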
_
scaler.fit learns the statistics of the training data (for StandardScaler, the mean and standard deviation; for MinMaxScaler, the min and max of the range).
After that you apply scaler.transform, which uses the values found by fit to rescale or standardize X_train.
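As a concrete sketch with a made-up one-feature array (StandardScaler assumed here, since the video standardizes the data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # toy training feature

scaler = StandardScaler()
scaler.fit(X_train)                   # learns mean_ and scale_ (std) from X_train
X_scaled = scaler.transform(X_train)  # applies (x - mean_) / scale_

print(scaler.mean_)                     # [2.5]
print(X_scaled.mean(), X_scaled.std())  # ~0.0 and ~1.0 after standardizing
```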
Date: 26/3/2021 -- I was 17 😭 26/3/2024 -- I turned 20
I too turned 20😅
From 17 to 20, time's a blur! I'm right there with you. What's been the most unexpected twist in your journey?
Bro, while fitting the scaler, can't we use scaler.fit(X) instead of scaler.fit(X_train)?
And is it compulsory to standardize the data?
same query
Same query
While fitting data into the model, do we need to give two parameters or one? You have given one, that is X_train.
It is different in different cases. While training the model, we need to give two sets of values (x_train, y_train), whereas for standardization only x_train is required for fitting, because in that case we don't need y_train; we are just standardizing x_train alone.
Still I have some doubt..🤔
What if we have negative values in the dataset?
Sir, if I have a categorical column,
do I need to scale it after one-hot encoding or not?
Why are you doing standardization separately on X_train and X_test? Instead, can we do standardization on X as a whole (not putting the target variable in it)?
Hi! It is the general practice to standardize X_train and X_test separately. If we fit the scaler on the whole data before splitting, the test set's statistics leak into training, which can make the evaluation look better than it should. And we don't have to standardize the target variable, as it is just categories.
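A sketch of that practice with made-up arrays: fit the scaler on X_train only, then apply the same learned statistics to both splits.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy stand-ins for the real train/test splits.
X_train = np.array([[10.0], [20.0], [30.0], [40.0]])
X_test = np.array([[25.0], [35.0]])

scaler = StandardScaler().fit(X_train)  # statistics come from X_train only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)   # test data reuses the training mean/std

print(scaler.mean_)  # [25.] -- the mean of X_train, untouched by X_test
```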
Can you explain why we should use random_state = 3?
We can use any random_state value. If you use random_state = 3, your data will be split in the same way as mine. If you mention some other value, then the split will be different.
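A quick sketch with toy arrays: the same random_state reproduces the same split every run, while a different seed will usually pick different rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(10, 1)  # toy feature matrix
Y = np.arange(10)

# Same seed -> identical split on every call.
X_tr1, X_te1, _, _ = train_test_split(X, Y, test_size=0.2, random_state=3)
X_tr2, X_te2, _, _ = train_test_split(X, Y, test_size=0.2, random_state=3)
print((X_te1 == X_te2).all())  # True

# A different seed (42 here is arbitrary) usually selects different rows.
X_tr3, X_te3, _, _ = train_test_split(X, Y, test_size=0.2, random_state=42)
```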
waiting for your sklearn tutorial module
Hi! An sklearn video doesn't fit well as a standalone video, so I have planned to explain the functions in the videos of other modules like Data Pre-processing & Model Training. Once you watch the videos in the upcoming modules, you will get to know the important functions in sklearn.
Where can I get the dataset?
Kaggle, Google Dataset Search
How is it that if the std is 200, then it varies a lot?
A high standard deviation means that the data are highly spread out.
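A small made-up illustration: two arrays with the same mean but very different spread.

```python
import numpy as np

tight = np.array([99.0, 100.0, 101.0])  # values huddled near the mean
spread = np.array([0.0, 100.0, 200.0])  # values far from the mean

print(tight.mean(), spread.mean())  # 100.0 100.0 -- identical means
print(tight.std())                  # ~0.82  -> low spread
print(spread.std())                 # ~81.65 -> high spread
```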
What is "data" in the dataset.data.std() statement at 15:06?
it means the data present in the dataset.
Hi Saurabh! 'dataset', as loaded from sklearn, is a dictionary-like object. 'data' is one of its keys. By running dataset.data you access the values stored under the 'data' key of the 'dataset' dictionary.
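A sketch using load_breast_cancer as a stand-in (which dataset the video actually loads is an assumption here; any sklearn loader behaves the same way):

```python
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()  # a Bunch: dict-like, supports attribute access
print(list(dataset.keys()))     # includes 'data', 'target', ...
print(dataset.data.shape)       # (569, 30): the feature matrix under 'data'
print(dataset.data.std())       # std over all the feature values
```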