Tutorial 33- Chi Square Test Implementation with Python- Hypothesis Testing- Part 2

  • Published: 24 Dec 2024

Comments • 70

  • @amalsunil4722
    @amalsunil4722 4 years ago +3

    Guys, just a tip here: you can simplify the process of obtaining the X2 statistic:
    X2_statistic = (observed_values - estimated_values)**2 / estimated_values
    X2_statistic = X2_statistic.sum()
    and make sure observed_values and estimated_values are numpy arrays.
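
The tip above can be sketched as a runnable snippet. The two tables are hypothetical numbers chosen only for illustration (they are not the data from the video), and the expected counts were worked out by hand from the row and column totals.

```python
import numpy as np

# Hypothetical 2x2 observed counts (not the data from the video).
observed_values = np.array([[60.0, 40.0],
                            [30.0, 70.0]])

# Expected counts under independence for that table:
# row totals 100/100, column totals 90/110, grand total 200.
estimated_values = np.array([[45.0, 55.0],
                             [45.0, 55.0]])

# Element-wise (O - E)^2 / E; NumPy handles every cell at once.
x2_terms = (observed_values - estimated_values) ** 2 / estimated_values

# Summing over all cells gives the chi-square statistic,
# however many rows and columns the table has.
x2_statistic = x2_terms.sum()
print(x2_statistic)
```

Because `.sum()` runs over the whole array, the same two lines work unchanged for larger tables.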

  • @pranabmishra2609
    @pranabmishra2609 4 years ago

    Thanks, Explanation is Clear and Concise. Able to understand properly.

  • @GauravSharma-ui4yd
    @GauravSharma-ui4yd 4 years ago +5

    Great video. Please upload videos implementing various other tests

  • @lenaara4569
    @lenaara4569 1 year ago

    Your explanations are really awesome! Thank you😊

  • @roksanarezaei9608
    @roksanarezaei9608 4 years ago +13

    Krish, thank you for the explanation. I have a question: why didn't you use the p-value and chi-square values that the contingency function provides, instead of calculating them separately? Even the numbers you got are not the same.
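
One likely reason for the mismatch, assuming the table in the video is 2x2 (one degree of freedom): `scipy.stats.chi2_contingency` applies Yates' continuity correction by default in that case, while the hand calculation does not. The sketch below uses a hypothetical table, not the video's data, to show that turning the correction off reproduces the manual statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table (not the data from the video).
observed = np.array([[60, 40],
                     [30, 70]])

# Default call: correction=True, so Yates' correction is applied when dof == 1.
stat_yates, p_yates, dof, expected = chi2_contingency(observed)

# Same call with the correction switched off.
stat_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

# Manual statistic, computed the way the video does it.
manual = ((observed - expected) ** 2 / expected).sum()

print(stat_yates, stat_plain, manual)
```

With `correction=False` the library value and the manual value agree, so the function's own statistic and p-value can be used directly.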

  • @pramodkumargupta1824
    @pramodkumargupta1824 4 years ago +12

    Krish, really nice video. What steps should we take after we perform these tests? I have the following questions:
    1. What should we do if two features are related to each other? Do we need to exclude one in feature selection, or what should we do?
    2. If they are independent, are we good to take both features in our model for prediction?

    • @6shipra
      @6shipra 4 years ago

      I have the same questions to ask

    • @Rahul-gn7px
      @Rahul-gn7px 4 years ago

      If one feature can be derived from, or is highly dependent on, another variable, it would be wise to remove it; for example, age and birth date.

  • @vanessaleiko
    @vanessaleiko 4 years ago +1

    Thank you so much for this explanation!

  • @ankitayadav2690
    @ankitayadav2690 3 years ago

    Very nice explanation sir

  • @kaifahmed316
    @kaifahmed316 3 years ago

    Great explanation sir, thank you 😊

  • @badiyabhargav8597
    @badiyabhargav8597 3 years ago +1

    Sir, I have a doubt.
    At 11:42 you said that chi2_statistic should always be greater than the critical value, and only then do we retain the null hypothesis,
    but in the code our chi2_statistic value is smaller than the critical value.
    In the if condition you gave:
    if chi2_statistic >= critical_value:
        print("Reject H0 and accept H1: there is a relation")
    else:
        print("Retain H0: there is no relation")
    I think we have to reject the null hypothesis H0 if chi2_statistic is greater than the critical value.
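
The decision rule discussed above can be sketched as follows. The significance level, degrees of freedom, and test statistic are hypothetical values chosen for illustration; the critical-value rule and the p-value rule always lead to the same decision.

```python
from scipy.stats import chi2

alpha = 0.05            # assumed significance level
dof = 1                 # assumed degrees of freedom (2x2 table)
chi2_statistic = 0.5    # hypothetical test statistic

# Critical value: the point with probability alpha to its right.
critical_value = chi2.ppf(1 - alpha, dof)

# p-value: right-tail probability beyond the observed statistic.
p_value = chi2.sf(chi2_statistic, dof)

# Reject H0 when the statistic reaches the critical value ...
reject_by_critical = bool(chi2_statistic >= critical_value)
# ... which is equivalent to the p-value being at most alpha.
reject_by_p_value = bool(p_value <= alpha)

if reject_by_critical:
    print("Reject H0: there is a relationship between the two variables")
else:
    print("Retain H0: no evidence of a relationship")
```

So a statistic below the critical value means H0 is retained, which matches the `if`/`else` branch quoted in the comment.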

  • @bhagyashreemohanta7826
    @bhagyashreemohanta7826 4 years ago +1

    Thank you so much... 🙂
    Highly Obliged..... 🙏

  • @akashsoni5870
    @akashsoni5870 4 years ago

    Am I the only one who watched StatQuest by Josh Starmer? I was following StatQuest before Krish Naik sir's lectures. Believe me, StatQuest is very good for in-depth knowledge.

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    Thanks Krish

  • @biranchinath8428
    @biranchinath8428 3 years ago

    Thank you sir for your help.

  • @priyaduttbhatt5691
    @priyaduttbhatt5691 3 years ago

    Simply perfect!

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Thanks Krish. Great explanation.

    • @mohammedahtesham2021
      @mohammedahtesham2021 4 years ago

      if p-value is high then both samples are related to each other right? in your code, there is a condition where if p_value

  • @ajithshenoy5566
    @ajithshenoy5566 4 years ago +1

    Hey Krish, love your videos. Kindly upload more videos in the machine learning pipeline section. The last one is feature selection. Interpretation and deployment videos would be greatly appreciated.

  • @SHIVAMBAJPEYIMIM
    @SHIVAMBAJPEYIMIM 4 years ago

    Thank you so much, this makes my day:)

  • @solar_girl_here
    @solar_girl_here 3 years ago

    Amazing. Thanks

  • @AsifMarazi
    @AsifMarazi 1 year ago

    Instead of writing chi_square_statistic=chi_square[0]+chi_square[1]... for each row, just replace this line with chi_square_statistic=chi_square.sum(), so you need not worry about writing out all the rows when there are more of them.

  • @srinathganesh6985
    @srinathganesh6985 4 years ago +3

    Doesn't `scipy.stats.chi2_contingency` already return the `p value` directly?

    • @hilmanrevisionery130
      @hilmanrevisionery130 3 years ago

      This is what I'm confused about: it already returns the p-value at index [1] (0.925417020494423) of the chi2_contingency result, so why should we recalculate the p-value? Hopefully someone can explain.

    • @varunupadhyay1576
      @varunupadhyay1576 2 years ago

      @@hilmanrevisionery130 Did you get it?

  • @AK-ws2yw
    @AK-ws2yw 3 years ago +1

    Hey Krish, I had a doubt.
    Suppose I have 4 columns that each hold two-category character data.
    E.g. let the 4 column names be A, B, C, D; all 4 columns are categorical, that is, they contain Yes/No data.
    My aim is to find whether all 4 columns have a Yes.
    Which test should I go for in that case?

  • @lokanathshroff3301
    @lokanathshroff3301 4 years ago +2

    Sir, I am not able to see the big data playlist.

  • @shobitjain9619
    @shobitjain9619 4 years ago +2

    Sir, can you make videos on the different pairwise metrics in sklearn, like cosine similarity, sigmoid kernel, RBF kernel, etc.?

  • @omkarpatil2854
    @omkarpatil2854 4 years ago +1

    hello Krish, awesome video series as always.
    if p-value is high then both samples are related to each other right? in your code, there is a condition where if p_value

  • @amitjajoo9510
    @amitjajoo9510 4 years ago +1

    Thanks

    • @mohammedahtesham2021
      @mohammedahtesham2021 4 years ago

      if p-value is high then both samples are related to each other right? in your code, there is a condition where if p_value

  • @PravinKumar-zc2eq
    @PravinKumar-zc2eq 2 years ago +1

    Hi Krish, your videos are really helping me understand these concepts in an easy way, thank you. Is there any possibility of a video on ANOVA?

  • @abhinavsharma7291
    @abhinavsharma7291 3 years ago

    Krish, thank you! Is there any video with an ipynb file explaining the ANOVA test?

  • @rajulshakya4899
    @rajulshakya4899 4 years ago

    Nice video

    • @mohammedahtesham2021
      @mohammedahtesham2021 4 years ago

      if p-value is high then both samples are related to each other right? in your code, there is a condition where if p_value

  • @akshayvishnukishore2282
    @akshayvishnukishore2282 3 years ago +1

    Question: why did we calculate the p-value again? Can't we just use the p-value returned from chi2_contingency()?

  • @Balubindass
    @Balubindass 4 years ago

    Hi Krish Naik,
    I am following your channel and it is very clear and easily understandable. After your z-test and t-test videos, I tried doing some hypothesis tests.
    Here is my example; I would need your help if I am doing it wrong.
    I have a file with 5000 rows.
    I am considering it as the population and I am assuming a hypothesis.
    Null Hypothesis as age 30
    This is a one-tailed test.
    So here is my question: do I need to create a sample from the population, or do I need to filter age >= 30 and consider that as the sample?
    And if the z-table value is 1.694 and the z-test gave 3.54, do I need to reject the null hypothesis?
    Please kindly help me.

  • @abhinaygupta8243
    @abhinaygupta8243 3 years ago

    Suppose there are around 50 features in my data set. Should I do the chi-square test for each pair of features? That will be more time consuming. Or should we directly find the correlation from a pair plot and select one out of the similar ones?
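
Running the test for every pair does not have to be done by hand. The sketch below loops over all pairs of categorical columns; the DataFrame, its column names, and the helper `pairwise_chi2` are hypothetical and only illustrate the idea.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical categorical data; replace with the real data set.
df = pd.DataFrame({
    "sex":    ["M", "F", "M", "F", "M", "F", "M", "F"],
    "smoker": ["Y", "N", "Y", "N", "Y", "N", "N", "Y"],
    "day":    ["Sat", "Sun", "Sat", "Sun", "Sun", "Sat", "Sat", "Sun"],
})


def pairwise_chi2(data, columns):
    """Run a chi-square test of independence on every pair of columns."""
    rows = []
    for col_a, col_b in combinations(columns, 2):
        # Build the contingency table for this pair of features.
        table = pd.crosstab(data[col_a], data[col_b])
        stat, p_value, dof, _ = chi2_contingency(table)
        rows.append({"feature_1": col_a, "feature_2": col_b,
                     "chi2": stat, "p_value": p_value, "dof": dof})
    return pd.DataFrame(rows)


results = pairwise_chi2(df, ["sex", "smoker", "day"])
print(results.sort_values("p_value"))
```

With 50 features this produces 1225 pairs, so the resulting p-values are best treated as a screening aid rather than as 1225 independent conclusions.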

  • @gurdeepsinghbhatia2875
    @gurdeepsinghbhatia2875 4 years ago

    Too good, sir.

    • @mohammedahtesham2021
      @mohammedahtesham2021 4 years ago

      if p-value is high then both samples are related to each other right? in your code, there is a condition where if p_value

    • @gurdeepsinghbhatia2875
      @gurdeepsinghbhatia2875 4 years ago

      @@mohammedahtesham2021 Dear Mohd, p_value is just a probability value that assures our result; the main result is in the correlation. Let us take an example: suppose we got a correlation of +1 with a p_value of 0.05. Then it means that the 2 variables have a correlation of positive 1 with 0.05 probability, i.e. with 5% accuracy. As for why only 5 percent, for this you must see Krish sir's hypothesis testing video. For further doubts, mail me at
      gsbhatia111@gmail.com if you like. I hope my reply helps you.
      Thanks

  • @nanditasharma6766
    @nanditasharma6766 4 years ago

    Krish, you said there is a relationship and that one variable will have some effect on the other because they are related. So do we have to consider one variable or both? If we consider only one, it will definitely be affected, as they are related to each other. Won't considering only one give a wrong effect on the target?

  • @saumyamishra5203
    @saumyamishra5203 3 years ago

    Please tell me what the first two values in the result of the chi2_contingency function are. I was thinking that the first one is the chi-square statistic and the second one is the p-value?
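
For reference, `scipy.stats.chi2_contingency` returns four values in this order: the chi-square statistic, the p-value, the degrees of freedom, and the array of expected frequencies. A minimal sketch with a hypothetical table (the name `val` mirrors the variable mentioned elsewhere in the comments):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table (not the data from the video).
observed = np.array([[60, 40],
                     [30, 70]])

val = chi2_contingency(observed)

chi2_statistic = val[0]   # test statistic
p_value = val[1]          # p-value of the test
dof = val[2]              # degrees of freedom
expected_values = val[3]  # expected frequencies under independence

print(chi2_statistic, p_value, dof)
print(expected_values)
```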

  • @anupamasonnad220
    @anupamasonnad220 4 years ago

    Hi Krish,
    If I have to figure out the association/relation between more than 2 categorical variables, will that be done using Chi2?
    If I have to test the multicollinearity between more than 2 categorical variables, can we convert them into numeric and apply VIF?

    • @mohammedahtesham2021
      @mohammedahtesham2021 4 years ago +1

      if p-value is high then both samples are related to each other right? in your code, there is a condition where if p_value

  • @sushantshekhar8082
    @sushantshekhar8082 4 years ago

    Krish, please upload a similar implementation video for the ANOVA test also.

  • @AkshayDudvadkar
    @AkshayDudvadkar 3 years ago

    What do we do when we have multiple categorical columns?

  • @pratikchatterjee5992
    @pratikchatterjee5992 4 years ago

    Hi Krish. Nice video. Where are the big data videos?

    • @mohammedahtesham2021
      @mohammedahtesham2021 4 years ago

      if p-value is high then both samples are related to each other right? in your code, there is a condition where if p_value

  • @alextjflorida
    @alextjflorida 3 years ago

    Thank you for the video. It seems Python is not efficient at running statistical tests: you have to take too many steps to get the results of a single test. Other software packages can do a better job in this department.

  • @MJAYRECORDS
    @MJAYRECORDS 4 years ago

    Hi Krish, the video is good. Can you tell me the solution for the chi-square test coding for the marital status and different education level problem?

  • @sushantrauthan5704
    @sushantrauthan5704 4 years ago

    Thanks for the video, but I have a doubt. I've never really grasped how you choose the hypothesis: in some cases you choose the null hypothesis for the motion, and in some cases you choose the hypothesis against the motion. How does that work?

    • @amalsunil4722
      @amalsunil4722 4 years ago

      Yes, it's very important, as we always assume the H0 hypothesis to be true while testing/finding the p-value.
      H0: there's no significant difference (just do this for all cases; it can be between 2 variables, a sample mean and a given population mean, etc.)

  • @dheerajkumark2268
    @dheerajkumark2268 4 years ago

    Sir, while finding the p-value, can we use pdf instead of cdf?
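
A short sketch of why the cdf (or its complement) is the right tool here: the p-value is the area in the right tail, whereas the pdf only gives the height of the density curve at one point. The statistic and degrees of freedom below are hypothetical.

```python
from scipy.stats import chi2

chi2_statistic = 3.0  # hypothetical test statistic
dof = 1               # assumed degrees of freedom

# p-value as the right-tail area: 1 - cdf ...
p_from_cdf = 1 - chi2.cdf(chi2_statistic, dof)
# ... or, equivalently, the survival function sf.
p_from_sf = chi2.sf(chi2_statistic, dof)

# The pdf is a density value at a single point, not a probability,
# so it cannot be used as a p-value.
density_at_statistic = chi2.pdf(chi2_statistic, dof)

print(p_from_cdf, p_from_sf, density_at_statistic)
```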

  • @Tejashri_Kate
    @Tejashri_Kate 1 year ago

    And how do we know if there is type1 or type2 error?

  • @vaibhavmohite468
    @vaibhavmohite468 4 years ago

    Can you explain the goodness-of-fit test in Python?

  • @vincetechclass3390
    @vincetechclass3390 2 years ago

    What about negative values?

  • @minakshi_119
    @minakshi_119 3 years ago

    Can anyone please help me with Expected_Values=val[3]? What does val[3] mean here?
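
Index 3 of the `chi2_contingency` result holds the expected frequencies, i.e. the counts each cell would have if the two variables were independent. A minimal sketch with a hypothetical table, checking the library's values against the formula row total × column total / grand total:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table (not the data from the video).
observed = np.array([[10, 20, 30],
                     [20, 20, 20]])

row_totals = observed.sum(axis=1)   # one total per row
col_totals = observed.sum(axis=0)   # one total per column
grand_total = observed.sum()

# Expected count per cell under independence.
expected_manual = np.outer(row_totals, col_totals) / grand_total

# The same array as returned by SciPy at index 3.
expected_scipy = chi2_contingency(observed)[3]

print(expected_manual)
print(expected_scipy)
```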

  • @parikshitgurjar5545
    @parikshitgurjar5545 3 years ago

    Hello guys, please can anyone explain: the degree of freedom = 1 in the output; what does this "1" signify?
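
For a test of independence, the degrees of freedom are (rows - 1) × (columns - 1) of the contingency table, so a 2x2 table gives 1. The helper `degrees_of_freedom` and both tables below are hypothetical, used only to check the formula against SciPy.

```python
import numpy as np
from scipy.stats import chi2_contingency


def degrees_of_freedom(table):
    """Degrees of freedom of a contingency table: (rows - 1) * (cols - 1)."""
    n_rows, n_cols = np.asarray(table).shape
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical tables (not the data from the video).
table_2x2 = [[60, 40], [30, 70]]
table_2x3 = [[10, 20, 30], [20, 20, 20]]

dof_2x2 = degrees_of_freedom(table_2x2)
dof_2x3 = degrees_of_freedom(table_2x3)

# SciPy reports the same values at index 2 of its result.
scipy_dof_2x2 = chi2_contingency(table_2x2)[2]
scipy_dof_2x3 = chi2_contingency(table_2x3)[2]

print(dof_2x2, dof_2x3)
```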

  • @rayhankabir645
    @rayhankabir645 7 months ago

    Please, would you help me with this dataset?

  • @PrinceKumar-eb8hd
    @PrinceKumar-eb8hd 4 years ago +1

    This went over my head.. no worries.. when I need it, I will research it again... anyway, thank you sir..

  • @rushin3090
    @rushin3090 2 years ago

    Can anyone send this playlist?

  • @chetanmazumder310
    @chetanmazumder310 4 years ago

    What about ANOVA?

  • @amadoum.jallow620
    @amadoum.jallow620 4 years ago +1

    Sir, can you please recommend me a very good book on statistics?

  • @manishbolbanda9872
    @manishbolbanda9872 4 years ago

    I am getting an error for sns.load_dataset('tips') even though I have imported seaborn.

  • @minakshi_119
    @minakshi_119 3 years ago

    Can anyone please help me with Expected_Values=val[3]? What does val[3] mean here?