Guys, just a tip here... you can simplify the process of obtaining the X2 statistic ->
X2_statistic = (observed_values - estimated_values)**2 / estimated_values
X2_statistic = X2_statistic.sum()
and make sure observed_values and estimated_values are numpy arrays
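A minimal sketch of the tip above, in case it helps. The 2x2 counts are made-up illustrative numbers, not the data from the video, and I have used the name expected_values for what the tip calls estimated_values.

```python
import numpy as np

# Hypothetical 2x2 table of observed counts (illustrative numbers only).
observed_values = np.array([[10, 20], [30, 40]], dtype=float)

# Expected counts under independence: row_total * col_total / grand_total.
row_totals = observed_values.sum(axis=1, keepdims=True)
col_totals = observed_values.sum(axis=0, keepdims=True)
expected_values = row_totals * col_totals / observed_values.sum()

# Element-wise (O - E)^2 / E, then a single sum over every cell.
x2_statistic = ((observed_values - expected_values) ** 2 / expected_values).sum()
print(x2_statistic)
```

Because the arithmetic is element-wise on numpy arrays, the same two lines work for a table of any shape.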
Thanks, Explanation is Clear and Concise. Able to understand properly.
Great video. Please upload videos implementing various other tests
your explanations are really awesome! Thank you😊
Krish, thank you for the explanation. I have a question. Why didn't you use the p-value and chi-square values that the contingency function provides, instead of calculating them separately? The numbers you got are not even the same.
Krish, really nice video. What steps should we take after we perform these tests? I have the following questions:
1. What should we do if two features are related to each other? Do we need to exclude one in feature selection, or something else?
2. If they are independent, are we good to take both features into our model for prediction?
I have the same questions to ask
If one feature can be derived from, or is highly dependent on, another variable, it would be wise to remove it, for example age and birth date.
Thank you so much for this explanation!
Very nice explanation sir
Great explanation sir, thank you 😊
Sir, I have a doubt..
At 11:42 you said that chi2_statistic should be greater than the critical value for us to retain the null hypothesis,
but in the code our chi2_statistic value is smaller than the critical value.
In the if condition you gave:
if chi2_statistic >= critical_value:
    print("Reject H0 and accept H1: there is a relation")
else:
    print("Retain H0: there is no relation")
I think we have to reject the null hypothesis H0 if chi2_statistic is greater than the critical value.
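For what it's worth, here is a small sketch of the decision rule as the comment above describes it: reject H0 when the statistic reaches the critical value, otherwise fail to reject. The function name chi_square_decision and the default alpha of 0.05 are my own assumptions for illustration, not something taken from the video.

```python
from scipy.stats import chi2


def chi_square_decision(chi2_statistic, dof, alpha=0.05):
    """Compare a chi-square statistic against the critical value (hypothetical helper)."""
    # Critical value: the point with `alpha` of the distribution's area to its right.
    critical_value = chi2.ppf(1 - alpha, dof)
    if chi2_statistic >= critical_value:
        return "Reject H0: the two variables appear to be related"
    return "Fail to reject H0: no evidence of a relation"


print(chi_square_decision(5.0, dof=1))
print(chi_square_decision(0.5, dof=1))
```

So a statistic smaller than the critical value lands in the else branch, which matches the code quoted above rather than the spoken remark.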
Thank you so much... 🙂
Highly Obliged..... 🙏
Am I the only one who watched StatQuest by Josh Starmer? I followed StatQuest before Krish Naik sir's lecture... believe me, StatQuest is very good for in-depth knowledge.
Thanks Krish
Thank you sir for your help.
simply perfect!
thanks krish .great explanation
Hey Krish, love your videos. Kindly upload more videos in the machine learning pipeline section. The last one is feature selection. Interpretation and deployment videos would be greatly appreciated.
Sure
Thank you so much, this makes my day:)
Amazing. Thanks
Instead of writing chi_square_statistic=chi_square[0]+chi_square[1].... once per row, just replace this line with chi_square_statistic=chi_square.sum(), so you need not write out every row when the table has more rows.
doesn't `scipy.stats.chi2_contingency` already return `p value` directly?
This is what I'm confused about: the chi2_contingency result already returns the p-value at index [1] (0.925417020494423), so why should we recalculate it? Hopefully someone can explain.
@@hilmanrevisionery130 Did you get it?
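A possible explanation, sketched with made-up counts rather than the video's data: scipy.stats.chi2_contingency does return the p-value directly, but for a table with one degree of freedom (such as 2x2) it applies Yates' continuity correction by default, so its numbers will not match a hand calculation unless you pass correction=False. I can't confirm this is what happened in the video, only that it would produce this kind of mismatch.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical 2x2 observed counts (illustrative only).
observed = np.array([[10, 20], [30, 40]])

# Default call: Yates' continuity correction is applied because dof == 1.
stat_corrected, p_corrected, dof, expected = chi2_contingency(observed)

# Same test with the correction switched off.
stat_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

# Hand calculation: sum of (O - E)^2 / E, then the upper-tail probability.
stat_manual = ((observed - expected) ** 2 / expected).sum()
p_manual = chi2.sf(stat_manual, dof)

print("corrected:", stat_corrected, p_corrected)
print("plain:    ", stat_plain, p_plain)
print("manual:   ", stat_manual, p_manual)
```

The plain and manual rows agree; the corrected row has a smaller statistic and a larger p-value.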
Hey Krish, I had a doubt.
Suppose I have 4 columns, each holding two-valued character data.
E.g. let the 4 column names be A, B, C, D; all 4 columns are categorical, that is, each holds Yes/No data.
My aim is to find whether all 4 columns have a Yes.
Which test should I go for in that case?
Sir, not able to see the big data playlist
Sir, can you make videos on the different pairwise metrics in sklearn, like cosine similarity, sigmoid kernel, rbf kernel, etc.?
hello Krish, awesome video series as always.
if p-value is high then both samples are related to each other right? in your code, there is a condition where if p_value
Thanks
Hi Krish, your videos are really helping me understand these concepts in an easy way, thank you. Is there any possibility of a video on ANOVA?
Krish, Thank You! Any video on ipynb file explaining ANOVA test ?
Nice video
Question: why did we calculate the p-value again? Can't we just use the p-value returned from chi2_contingency()?
+1
@@amansinghrathore8308 Did you get it?
Hi Krish Naik,
I am following your channel and it is very clear and easy to understand. After your z-test and t-test video, I tried doing some hypothesis tests.
Here is my example; I would need your help if I am doing it wrong.
I have a file with 5000 rows.
I am treating it as the population and I have assumed a hypothesis.
Null Hypothesis as age 30
This is a one-tailed test.
So here is my question: do I need to create a sample from the population, or do I need to filter age >=30 and treat that as the sample?
And if the z-score table gives 1.694 and the z-test gave 3.54, do I need to reject the null hypothesis?
Please kindly help me.
Suppose there are around 50 features in my dataset. Should I do the chi-square test for every pair of features? That will be very time consuming. Or should we directly look at the correlation in a pair plot and select one out of each group of similar features?
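On the 50-features question, one way to avoid doing it by hand is to loop over every pair of categorical columns. The sketch below is only an assumption about how one might do it: pairwise_chi2 is a hypothetical helper name, and the tiny DataFrame is made up purely so the code runs.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency


def pairwise_chi2(df, columns, alpha=0.05):
    """Run a chi-square test of independence on every pair of columns (hypothetical helper)."""
    rows = []
    for col_a, col_b in combinations(columns, 2):
        # Contingency table of observed counts for this pair.
        table = pd.crosstab(df[col_a], df[col_b])
        stat, p_value, dof, _ = chi2_contingency(table)
        rows.append({
            "feature_a": col_a,
            "feature_b": col_b,
            "chi2": stat,
            "p_value": p_value,
            "dof": dof,
            "related": p_value < alpha,
        })
    return pd.DataFrame(rows)


# Made-up toy data, only to show the call.
df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "M", "F", "M", "F"],
    "smoker": ["Y", "N", "Y", "N", "Y", "N", "Y", "N"],
    "day": ["Sat", "Sat", "Sun", "Sun", "Sat", "Sat", "Sun", "Sun"],
})
result = pairwise_chi2(df, ["sex", "smoker", "day"])
print(result)
```

With 50 columns this is 1225 tests, so some pairs will fall below 0.05 by chance alone; treat the output as a screening step rather than a final answer.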
too gud sir ,
@@mohammedahtesham2021 Dear Mohd, the p_value is the probability of seeing a result at least as extreme as ours if the null hypothesis were true; the main result is the correlation. Let us take an example: suppose we got a correlation of +1 with a p_value of 0.05. That means that if there were really no relationship, a correlation this strong would show up only about 5% of the time. For why 5% is used as the cutoff, you must see Krish Sir's hypothesis testing video. For further doubts, mail me at gsbhatia111@gmail.com if you like. I hope my reply helps you.
Thanks
Krish, you said there is a relationship and that one variable will have some effect on the other because they are related. So should we consider one variable or both? If we keep only one, it will still be affected, since they are related to each other. Won't keeping just one distort the effect on the target?
Please tell me what the first 2 values in the result of chi2_contingency are. I was thinking that the 1st one is the chi-square statistic and the 2nd one is the p-value?
Hi Krish,
If I have to figure out the association/relation between more than 2 categorical variables, will that be done using Chi2?
If I have to test for multicollinearity between more than 2 categorical variables, can we convert them to numeric and apply VIF?
Krish, please upload a similar implementation video for the ANOVA test also.
What do we do when we have multiple categorical columns ??
Hi Krish. Nice video. Where are the big data videos?
Thank you for the video. It seems Python is not efficient at running statistical tests. You have to take too many steps to get a single test result. Other software packages can do a better job in this department.
Hi Krish, the video is good. Can you tell me the solution for coding the chi-square test for the marital status and education level problem?
Thanks for the video, but I have a doubt. I've never really grasped how you choose the hypothesis: in some cases you choose the null hypothesis for the motion and in some cases you choose the hypothesis against the motion. How does that work?
Yes, it's very important... we always assume the H0 hypothesis to be true while testing/finding the p-value.
H0: there's no significant difference (just do this for all cases... it can be between 2 variables, a sample mean and a given population mean, etc.)
Sir, while finding the p-value, can we use pdf instead of cdf?
And how do we know if there is a type 1 or type 2 error?
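On the pdf versus cdf question, a short sketch may help; the statistic of 3.0 with 1 degree of freedom is an arbitrary example value. The p-value is an area under the curve (the probability of a statistic at least this large), so it comes from the cdf or its complement sf. The pdf only gives the height of the curve at one point, which is not a probability.

```python
from scipy.stats import chi2

# Arbitrary example values, chosen only for illustration.
chi2_statistic, dof = 3.0, 1

# Upper-tail area: two equivalent ways to get the p-value.
p_from_cdf = 1 - chi2.cdf(chi2_statistic, dof)
p_from_sf = chi2.sf(chi2_statistic, dof)

# Height of the density at the statistic: NOT a p-value.
density = chi2.pdf(chi2_statistic, dof)

print(p_from_cdf, p_from_sf, density)
```

chi2.sf is usually the safer choice for very small p-values, since 1 - cdf loses precision when cdf is close to 1.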
Can you explain goodness of fit test in python
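Not from the video, but a minimal sketch of a goodness-of-fit test using scipy.stats.chisquare, assuming a made-up example of 120 die rolls tested against a fair die. The counts are invented for illustration.

```python
from scipy.stats import chisquare

# Made-up counts for faces 1..6 over 120 rolls (illustrative only).
observed_counts = [18, 22, 20, 25, 15, 20]

# A fair die would give 120 / 6 = 20 per face; the totals must match the observed total.
expected_counts = [20] * 6

# H0: the observed counts follow the expected distribution.
statistic, p_value = chisquare(f_obs=observed_counts, f_exp=expected_counts)
print(statistic, p_value)
```

Here there is a single categorical variable compared against a stated distribution, whereas chi2_contingency compares two categorical variables against each other.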
What about negative values?
Can anyone please help me with Expected_Values=val[3]? What does val[3] mean here?
Hello guys, can anyone please explain: the degree of freedom = 1 in the output, what does this "1" signify?
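A short sketch for the two questions above, using made-up counts and assuming val is the result of scipy.stats.chi2_contingency as in the comment. The result unpacks into four items, and the degrees of freedom come from the table shape: (rows - 1) * (columns - 1), so any 2x2 table gives 1.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 observed counts (illustrative only).
observed = np.array([[10, 20], [30, 40]])
val = chi2_contingency(observed)

chi2_statistic = val[0]   # the chi-square test statistic
p_value = val[1]          # p-value of the test
dof = val[2]              # degrees of freedom
expected_values = val[3]  # expected counts if the two variables were independent

# Degrees of freedom follow from the table shape.
n_rows, n_cols = observed.shape
print(dof, (n_rows - 1) * (n_cols - 1))
print(expected_values)
```

So val[3] is the table of expected frequencies, the same shape as the observed table.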
Please would you help me with this dataset
This went over my head.. no worries.. when I need it, I'll research it again.. anyway, thank you sir..
can anyone send this playlist?
What about anova ?
Sir, can you please recommend me a very good book for statistics.
getting error for sns.load_dataset('tips') even though i have imported seaborn