Please take care everyone.
you too sir.....
Yes take care you and your team
for people in home isolation 👇🏻
I've almost recovered from COVID in home isolation. I'm sharing what helped me recover in case it helps someone.
• Steam inhalation at least 3 times a day
• Plenty of fluids: Water (preferably warm), lemonade, coconut water
• Salt water gargles
• Vitamin C supplement
• Plenty of rest
• Meditation for peace of mind
• Balanced diet
• Regain smell: smell ajwain, kapoor (camphor) and cloves
• Lie on your stomach periodically
Monitor oxygen every 2 hours. Seek medical assistance if it drops to 92% or below.
Please add anything I missed.
Add ajwain and kapoor to the water while taking steam, and drink Malvani kadha (tulsi, adrak (ginger), jaggery, lavang (cloves), black pepper, ajwain, gavti chaha (lemongrass), dalchini (cinnamon)).
@@shivu.sonwane4429 How can we monitor oxygen levels at home?
Using a pulse oximeter. It measures the oxygen level (oxygen saturation); it's not very accurate, but it's good enough for home use.
A very important video to review all the important feature techniques in one go... thanks for uploading!
Pray your team members recover quickly. India needs good teachers.
What a useful and informative video.
Most ML courses are based on algorithms; they forget the importance of data preparation.
Get well soon, we need people like you 👍👍👍👍👍
Greetings from Poland
Thanks mate!
As usual, neatly explained.. 👍👍 thank you for uploading 🙏
I was looking for these, master Krish! Take care too.
I have completed my 1-year post-graduation program in data science from a leading institute, but the various techniques I learned for free from your videos were not even mentioned in the curriculum.
Thank you for your easy and detailed explanation.
Krish bhai... please upload a PDF of summary notes along with each video...
Sir, Sudhanshu sir tested positive, my god. I hope he gets well soon.
Get well soon, Sudh!!
Tbh I don't prefer any lecture series except NPTEL. But after seeing 20-25 of your videos, I personally feel this channel is a better resource for the practical implementation of ML...
Initially I didn't subscribe because I felt your profile looked young and you might not know the material you taught 😁😁😁... Subscribed.
Thanks to you and to NPTEL.
Krish Naik is best
great explanation
Praying for the employees of iNeuron; inshallah, everyone will get well soon.
I guess you should first do fit_transform and then train_test_split, because if you split first, the mean is calculated from the train data alone. The same mean is then applied to the test data, so the test data won't have a mean of zero. Please clear this doubt.
Hi Adil, did you find the answer to your question? If yes, please share.
From a Stack Overflow answer:
"Normalization across instances should be done after splitting the data between training and test set, using only the data from the training set.
This is because the test set plays the role of fresh unseen data, so it's not supposed to be accessible at the training stage. Using any information coming from the test set before or during training is a potential bias in the evaluation of the performance."
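A minimal sketch of the split-then-scale order described in the answer above, assuming scikit-learn's usual `train_test_split`/`StandardScaler` API (toy data, not the tutorial's dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(20, dtype=float).reshape(-1, 1)  # toy single-feature data

# Split first, so the test set stays "unseen"
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit statistics on train only
X_test_scaled = scaler.transform(X_test)        # reuse the train mean/std

# The train mean is (numerically) zero; the test mean generally is not,
# and that is expected -- it plays the role of fresh data.
print(X_train_scaled.mean())
print(X_test_scaled.mean())
```

This also answers the doubt in the parent comment: the test set not having exactly zero mean is not a bug, it is the point of treating it as unseen data.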
Hope the team recovers soon. Take care!!
Excellent Sir!
thank you sir, it is just an amazing video!!
I pray for your team's speedy recovery, Krish. We are also getting worse news day by day here in Nepal...
Hi Krish, while doing the transformation, why are we not dividing our data into train and test?
Sir, please make a video on the Box-Cox transformation.
Very Informative
thank you very much sir
Sir, is scaling required after performing a log transformation?
finished watching
Hey Krish, can you explain Generative Adversarial Networks (GANs), especially the coding part, for a dataset other than an image dataset? It would be of great help.
With respect to StandardScaler(): if you split the dataset prior to scaling the features, don't you risk having skewed features? Put differently, if you train your model to learn that values of 1 get a certain weight, and the data in your test set isn't standardized around the same mean as the train set, then the model will invariably have worse accuracy unless the train-set and test-set features have the same mean, right? Shouldn't the test-set samples be removed from the full dataset only to serve as an "out-of-sample" test, rather than treated as two separate datasets?
If I have applied some encoding technique , do I have to scale them ?
When we use a Gaussian transformation, it converts our distribution into a Gaussian distribution (where mean = median) or into a standard Gaussian distribution (where mean = 0 and variance = 1).
Sir, please take care of yourself. 🥺😢
Are we supposed to scale categorical features along with continuous features?
No, you shouldn't scale categorical data.
If the feature is categorical, each value has a separate meaning, so normalizing will turn these features into something different.
There are several ways to deal with categorical data:
a) Integer Encoding: Where each unique label is mapped to an integer.
b) One Hot Encoding: Where each label is mapped to a binary vector.
c) Learned Embedding: Where a distributed representation of the categories is learned.
--Sunil Sharanappa
Do we need to check these transformation techniques in all binary classification problems?!
good
In transformation, we transform the distribution into a normal distribution. Then, after the transformation, we also need to perform standardisation (scaling down). Please tell me if I am wrong.
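A minimal sketch of that two-step idea (log-transform toward normality, then standardize), assuming NumPy and scikit-learn on synthetic right-skewed data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))  # right-skewed data

x_norm = np.log(x)                              # step 1: make roughly normal
x_std = StandardScaler().fit_transform(x_norm)  # step 2: mean 0, variance 1

print(x_std.mean(), x_std.std())  # close to 0 and 1
```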
should standardization be applied to whole dataset or after we split into train test data?
It is generally best to apply standardization to the training set only, and then apply the same scaling to the test set. This is because the test set should represent unseen data, and you want to evaluate the model's performance on the test set as closely as possible to how it would perform on new, unseen data. Applying standardization to the entire dataset before splitting it into training and test sets could result in information leakage, as the model could learn about the test set during training.
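One hedged way to make this leak-free by construction is scikit-learn's `Pipeline`: inside cross-validation, the scaler is re-fit on each fold's training portion only (a sketch on synthetic data, not the tutorial's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# The scaler is fit only on each fold's training portion, never on the
# held-out portion, so no information leaks from the evaluation data.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())])

scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```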
In the Join button, I can see a (6 months: ₹283.20) plan. You have not mentioned this plan in that Join video. Can you please explain here, sir?
Sir, what about that 'df_scaled' term?
I am getting an error at that point that df_scaled is not defined... Can you please explain?
I know Python programming, and I'm learning data science by self-study. My problem is that I have a 4-year gap in employment. Will I get a job in the data science field? Need your suggestions... I'm 26 years old.
Same story, bro. Yes, you will get a job as a data scientist, just focus on prep and projects. I took a gap to prepare for UPSC and RBI. In 2016 I got campus placement at Amazon as an SDE. But after a 4-year break and the COVID scene, I started preparing for DS and was fortunate enough to start with Sky as a data engineer for 10 LPA. So you will surely get placed too.
@@anandbihari3135 What skills are required for a data engineer?
@@nishanthviswajith1496 Did you get the job, bro?
@@nishanthviswajith1496 I'm doing an MCA, is there any scope, bro?
While doing the transformation, do we need to transform both numerical and categorical (encoded) features, or only the numerical ones? If the target is continuous, do we need to transform it as well?
No, you shouldn't scale categorical data.
If the feature is categorical, each value has a separate meaning, so normalizing will turn these features into something different.
There are several ways to deal with categorical data:
a) Integer Encoding: Where each unique label is mapped to an integer.
b) One Hot Encoding: Where each label is mapped to a binary vector.
c) Learned Embedding: Where a distributed representation of the categories is learned.
If the target is continuous: yes, you do need to scale the target variable if it has a large spread of values.
--Sunil Sharanappa
@@sunilsharanappa7721 thank you
How can I perform scaling on k-fold data?
Pray for your team!
Hello. There is an important mistake in this tutorial, so I have to stop watching it.
Problem: you use, e.g., MinMaxScaler on the whole X_train with differently scaled variables inside. Let's assume "age" ranges from 18-65 while "fare" goes from 5-2000. Scaling age with the global min/max of the dataset distorts your features. In this case, for age 20 you would get z = (X - Xmin)/(Xmax - Xmin) = (20 - 5)/(2000 - 5) = 15/1995 = 0.0075. Instead, with per-feature scaling using just age, you would get z = (20 - 18)/(65 - 18) = 0.0426, a roughly 5-fold numerical difference. The maximal age of 65 would get z = (65 - 5)/(2000 - 5) = 0.03!!!! Meaning age would have a maximal value of 0.03 instead of 1!
Hi, for example, I have features for age, height, and weight, and I want to apply a Gaussian transformation. In my case:
==> a logarithm transform makes a good fit for age
==> a reciprocal transform makes a good fit for height
The question is: may I use both features (age with the log transform and height with the reciprocal transform) in my train data? Kindly reply, sir.
@Krish Naik sir, kindly reply to me.
Yeah, I have the same question. Do you have any solution?
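Applying a different transform to each feature is common practice; a hedged sketch of one way to wire it up, assuming scikit-learn's `ColumnTransformer` and `FunctionTransformer` (toy numbers, hypothetical columns):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

# Toy data: column 0 = age, column 1 = height (metres)
X = np.array([[25.0, 1.7],
              [40.0, 1.8],
              [60.0, 1.6]])

# A different transform per column: log for age, reciprocal for height
ct = ColumnTransformer([
    ("log_age",      FunctionTransformer(np.log),        [0]),
    ("recip_height", FunctionTransformer(np.reciprocal), [1]),
])

X_t = ct.fit_transform(X)
print(X_t)  # column 0 is log(age), column 1 is 1/height
```

Fit the ColumnTransformer on the training set and reuse it on the test set, the same way a single scaler would be.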
1st view 💞💞❤️