Finding Multivariate Outliers with the Mahalanobis Distance Test in SPSS

  • Published: Oct 8, 2024
  • When you are cleaning your raw data, you will want to check for outliers, particularly multivariate outliers, because they can distort your analysis. The Mahalanobis Distance test identifies outliers across multiple variables, called multivariate outliers. The biggest problem is that the Mahalanobis test is hidden in the Regression commands and is not intuitive to find. I will show you how to compute Mahalanobis scores, sort them, check their probability using a Chi-square distribution table, and then use a shortcut that lets SPSS compute the Mahalanobis probabilities. We begin with a review of outliers for those who want to brush up on the concepts and an explanation of how to choose cutoff scores for exploratory data analysis, and we wrap up with how to report the Mahalanobis findings in APA style.
    This is the full video about the Mahalanobis Distance test. I also have an abbreviated video that only covers how to do the test, create probabilities, and write up results.
    Short video: • SPSS Essentials: Mahal...
    This video teaches the following commands and techniques:
    Multivariate outliers
    Linear Regression
    Mahalanobis test
    Compute Variable
    Chi square distribution table
    Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston, MA: Allyn and Bacon.
    SYNTAX:
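    * Upper-tail chi-square probability of each Mahalanobis score; 8 is the df (number of variables in this example).
    COMPUTE pMAH_1 = 1 - CDF.CHISQ(mah_1, 8).
    EXECUTE.
    * Sort ascending so the smallest probabilities (the likeliest outliers) come first.
    SORT CASES BY pMAH_1 (A).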
    This series uses the data set Mahalanobis_PH.sav
    Link to a Google Drive folder with all of the files that I use in the videos including the dataset and the Excel Chi-square distribution table
    drive.google.c...
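
A minimal Python sketch of what the syntax above computes, for readers who want to check the numbers outside SPSS. It assumes 8 variables (df = 8, as in the syntax) and uses the closed-form upper tail that exists for even df. The function name, case IDs, and Mahalanobis values are hypothetical and are not taken from Mahalanobis_PH.sav.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even df (closed form)."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this sketch only handles even df >= 2")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum (x/2)**k / k! for k = 0 .. df/2 - 1, then scale by exp(-x/2).
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Hypothetical Mahalanobis values keyed by case id (assumed data).
mah_1 = {101: 4.2, 102: 31.7, 103: 9.8, 104: 18.0}

# Mirrors: COMPUTE pMAH_1 = 1 - CDF.CHISQ(mah_1, 8).
p_mah_1 = {case: chi2_upper_tail_even_df(d, 8) for case, d in mah_1.items()}

# Mirrors: SORT CASES BY pMAH_1 (A).
ranked = sorted(p_mah_1.items(), key=lambda item: item[1])

# Cases below the p = .001 criterion are candidate multivariate outliers.
flagged = [case for case, p in ranked if p < 0.001]

for case, p in ranked:
    print(case, round(p, 5))
```

Sorting ascending puts the smallest probabilities first, so any flagged cases appear at the top of the list, just as they do in the SPSS data view after the sort.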

Comments • 70

  • @katemcleod3707 4 years ago +14

    This is the BEST video I have watched on statistics and particularly on MD. Thank you so so much. I feel so much more confident in my understanding of MD and my write up :)

    • @ResearchByDesign 4 years ago +1

      Glad it was helpful! Appreciate the comment so much!!

  • @olynn2415 8 months ago +1

    Thank you very much. It is the best and clearest explanation ever.

  • @MUSICAHOLIC999 2 years ago +1

    Thank you so much for the citation info at the end of the video. I've been looking for it everywhere... 😭

    • @ResearchByDesign 2 years ago

      Outstanding... glad to hear it.

    • @MUSICAHOLIC999 2 years ago +1

      @ResearchByDesign This is the best video on YouTube about Mahalanobis 👍

  • @Belinda-R74 3 years ago +1

    Thank you for this video. It is a clear and easy to follow explanation of how to identify and treat multivariate outliers.

    • @ResearchByDesign 3 years ago

      Glad it was helpful! Thanks for taking the time to comment!

  • @nnadinemounajed6702 1 year ago +1

    This really helped me, bless ❤

  • @jojo-april4 1 year ago +1

    Thank you soo much for this video

  • @farukhzunaira 1 year ago +1

    Your videos are really helpful, and the way you teach is really effective. Many thanks 👍

  • @alqambayan2889 3 years ago +1

    This is the BEST video I have watched

  • @srikaewyoosuk201 2 years ago

    Thank you very much, it is the best video that I can understand.👍

  • @СергейГоворун-ы5к 2 years ago +1

    Yes, this really is an interesting topic!
    Thank you very much for your work and for the video!

  • @farahfadzilaha.41 4 years ago +1

    Thank you for your great explanation. That's cleared it up for me.

  • @dougfoote7662 3 years ago +1

    Wonderful explanation! Thank you!

  • @stepht7805 1 year ago

    Great tutorial, thanks!

  • @katerinagk2681 1 year ago +1

    thank you so much

  • @chathuhasi 3 years ago

    Thank you for this. It's clear and well explained!
    I just noticed a minor difference between the Excel file (from your Drive) and the one in the video. It seems to be "Determine a χ2 critical value for a given alpha level" vs. "Determine a χ2 critical value for any degree of freedom". Kindly clarify. Thanks.

    • @ResearchByDesign 3 years ago

      It might be an error; however, I have been working on that spreadsheet and have made a lot of improvements. You can look at the formulas in the spreadsheet to check whether they are the same in both places. Thanks for letting me know.

    • @chathuhasi 3 years ago

      @ResearchByDesign Thanks

  • @terriehilbun 10 months ago +1

    Does anyone know of a good video that provides the steps to conduct Mardia’s test for multivariate normality?

  • @senensoberano847 4 years ago

    Very clear explanation thank you

  • @mikorees5853 3 years ago

    It's a super amazing channel by a super cool professor!

  • @Oz4rmEg 3 years ago

    Thank you
    Great video

  • @syk-. 4 years ago

    Thank you so much for this! Question: I am validating a 29-item questionnaire but also have 4 other questionnaires for validation purposes.
    1- Should I include all of the scales in the Mahalanobis Distance test? Or should I only include the main scale I am interested in?
    2- Should I include all 29 items? Or the latent variables (4 factors) instead?
    Thank you!

    • @ResearchByDesign 4 years ago +3

      I would suggest creating a composite score (i.e., a total score or the mean of the 29 items) for your questionnaire and for each of your four other questionnaires, then running a Mahalanobis test on the five scales (df = 5). This assumes that you are far enough into your validation that you have established that the 29 items work as a single scale. If you are still trying to work out which items work well with other items (exploratory factor analysis), then Mahalanobis would identify outlier variables, but you could create a random number as your DV and toss in all of the items as your IVs (df = 29). If I misunderstood and your 29 items potentially contain 4 factors, then create the total scores for each of those factors and then use Mahalanobis (df = 4).

  • @omidmahdieh7882 2 years ago

    Hello Dr. Daniel. Thanks for your helpful demonstration. Can items be used to calculate Mahalanobis distances? Or should I use variables, meaning composite variables?

    • @ResearchByDesign 2 years ago +1

      You can and should use the composite variables, or subscales. You do not (typically) have outliers in individual items, but you will have outliers when you combine multiple items and take the mean. Therefore, use the composite scores, not the individual items. Good luck.

    • @omidmahdieh7882 2 years ago

      @ResearchByDesign Thanks. I wish you the best.

  • @deannabeech9587 7 months ago

    LOVE the shirt

  • @saritamariaabarzacancino2189 4 years ago

    Thank you very much for the video!
    I have a question: what tests should I do to measure the influence of a set of variables on the occurrence of a phenomenon in the absence and/or combination of them? I'm working on my thesis and I'm trying to find patterns in the interaction of these variables.
    I would really appreciate your help.

    • @ResearchByDesign 4 years ago +1

      You have several questions rolled into one... Whether it occurred or not could be a chi-square, but if you are interested in interactions, then you need some sort of regression model. Perhaps a logistic regression, if the outcome is 1 or 0 (occurred or not), although you could be looking at a variation of a survival analysis. I would point you in one of those directions.

  • @zohalh14 3 years ago

    Can you use Mahalanobis distance if your IVs are categorical in a mixed ANOVA?

  • @arturojuarezg 4 years ago

    In your opinion, is Mahalanobis better than the Cook's distance test? How can I obtain a Cook's distance significance test?

    • @ResearchByDesign 4 years ago +2

      Good idea to add a video about the Cook's distance test... I was trained with Mahalanobis, so that is what I use. I recall Cook's being similar, but I don't know that one is clearly superior.

  • @zuhaibqureshi1281 3 years ago

    Very informative video, TQVM. However, I'm unable to download the chi-square critical value calculator. Can you please email it to me?

    • @ResearchByDesign 3 years ago

      No worries. Email me through the channel email link, or you can find Todd Daniel through the MissouriState.edu website. I will send you the current versions of each of the spreadsheets.

  • @nikmohdfarisnikmin9368 3 years ago

    Greetings sir..
    If I have five independent variables, what is the critical value of chi-square? My friend has 2 independent variables and his critical value is 13.82.
    So in my case, with 5 independent variables, what is my critical value?
    Thank you..

    • @ResearchByDesign 3 years ago

      Without knowing anything else about your research design, if you have five variables in a Mahalanobis test, you interpret the chi-square as if df = 5 (not n - 1).

    • @nikmohdfarisnikmin9368 3 years ago

      @ResearchByDesign I see. OK sir, thanks for the clarification.
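
For anyone who wants the critical value itself, here is a minimal Python sketch that finds it by bisection on the chi-square upper tail, assuming the p = .001 criterion discussed in this thread. The function names are made up for illustration; only the standard library is used, and the tail probability is built up from the exact df = 1 and df = 2 cases. With df = 2 it reproduces the 13.82 quoted above, and with df = 5 it gives about 20.52.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, total = 1.0, math.exp(-half)             # exact tail for df = 2
    else:
        a, total = 0.5, math.erfc(math.sqrt(half))  # exact tail for df = 1
    # Each step raises df by 2: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < df:
        total += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return total


def chi2_critical(alpha, df, tol=1e-9):
    """Smallest x whose upper-tail probability is at most alpha (bisection)."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > alpha:  # widen the bracket until it contains x
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# df equals the number of variables entered in the Mahalanobis test.
for df in (2, 5, 8):
    print(df, round(chi2_critical(0.001, df), 3))
```

A Mahalanobis score above the critical value for your df corresponds to a probability below .001.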

  • @mikorees5853 3 years ago

    In the write-up, how should I mention the cutoff value used?

    • @ResearchByDesign 3 years ago

      I usually write something like: "the Mahalanobis test was interpreted using a p = .001 significance criterion for identifying multivariate outliers." It is very common to use the .01 or .001 level for assumption tests.

    • @mikorees5853 3 years ago

      @ResearchByDesign Thank you very much. I would like to stick to the writing style that you mentioned. I noticed that consumer/marketing journals mention only the Mahalanobis distance, but I guess the reviewers would be interested in more details, and the write-up style you mentioned would really help there. Thanks again, Doctor.

  • @muhammadnaeem7869 4 years ago

    Respected Sir,
    Can you please help me get the Excel Chi-square distribution table from the Google Drive? I could not see it there.

  • @mikorees5853 3 years ago

    Can I use the dependent variables or any variables for the Mahalanobis test?

    • @ResearchByDesign 3 years ago +1

      Yes, in fact you should use all of the variables that you plan to use in your model when you check for multivariate outliers.

    • @mikorees5853 3 years ago +2

      @ResearchByDesign Thank you very much. I can finally proceed with my analysis. I searched for this a lot but was not getting clarity. Thanks again for your response and the amazing video.

    • @cecyliaadamczak4301 2 years ago +1

      @ResearchByDesign Hi Professor Dan, is there a reference for this practice?
      I have been looking for some time now, and every source seems to talk only about the IVs.

  • @shafeekafadlikhzamri7068 4 years ago

    Dr., what if I have more than ONE dependent variable?

    • @ResearchByDesign 4 years ago

      Hello...for the Mahalanobis test, all that matters is that you include all of the scale variables in your model. So if you have 2 DVs, include them along with each of your IVs (or predictors) and use the total number of variables as the "df" for the chi-square probability interpretation. Good luck!

    • @muhammadnaeem7869 4 years ago

      @ResearchByDesign Sir, how do we use 2 DVs in the regression window, since it accepts only one dependent variable?

    • @kamarulliza9122 3 years ago +2

      @ResearchByDesign So if I have 1 DV, 3 IVs, and 1 mediator variable, that makes 5 in all. Then I put all 5 into the IV box to run the Mahalanobis test. Is that correct?

  • @jamessiklos-whillans9528 2 years ago

    I thought a Mahalanobis D over 1 was an outlier?

    • @ResearchByDesign 2 years ago

      It may be that someone taught you to standardize the Mahalanobis values so that a value of 1 indicates "outlier", but that is not required. I have always used the chi-square interpretation. The results would be the same if you did the correction/standardization.

  • @naftalibendavid 3 years ago

    In addition, you might have an odd combination that seemed unlikely. Age 18 is fine. Earned $300k last year might be fine. But both?

    • @ResearchByDesign 3 years ago +2

      You, sir, are the very first person to notice that detail...you will find little Easter eggs like that in several videos. I love it when people spot them. I think that one was also intended to be a bivariate outlier for a class illustration.
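
The "fine on its own, odd in combination" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (age, income) pairs are invented, the helper names are hypothetical, and the distance computed is the squared Mahalanobis distance (the quantity that is compared against chi-square). In this made-up sample the values 18 and 300 each occur in another case too, yet the case that pairs them ends up farthest from the centroid.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each row from the column means."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (divides by n - 1).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return [sum(ci * yi for ci, yi in zip(c, solve(cov, c))) for c in centered]


# Hypothetical (age, income in $1000s) pairs; the last case pairs age 18 with $300k.
cases = [(18, 20), (22, 35), (30, 60), (41, 120),
         (50, 200), (58, 280), (63, 300), (18, 300)]
distances = mahalanobis_sq(cases)
most_unusual = max(range(len(cases)), key=lambda i: distances[i])
print(cases[most_unusual], round(distances[most_unusual], 2))
```

Because the distance accounts for the correlation between the variables, a case that goes against the overall pattern stands out even when each of its values is unremarkable.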

  • @sendmeyourdog 4 years ago

    Is there a maximum number of DVs you can use for a Mahalanobis distance calculation? I have 36+ DVs I need to put in; however, once I exceed around 27 variables, SPSS spits out 28.0333 as the MAH for every single participant.

    • @ResearchByDesign 4 years ago +1

      There is not a maximum in terms of how many variables you can test. Having 36 distinct DVs would make your interpretation of the analysis pretty complex, so I assume that you have 36 items that will be combined into a smaller number of scales or subscales. If so, I'd recommend creating those scales first, then running the Mahalanobis test.

    • @sendmeyourdog 4 years ago

      @ResearchByDesign Thanks for your response! Your videos are very helpful. So the degrees of freedom one has in their sample don't influence the number of DVs that can go into an MAH test? I'm not sure why SPSS was returning the exact same MAH distance (28.0333) for every single participant. However, when I brought the number of DVs down to around 27, it returned normal distances.
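
One possible explanation for the identical 28.0333 values, offered as an assumption because the sample size is not stated in this thread: the Mahalanobis distance saved by a regression with an intercept can be written in terms of each case's leverage. The symbols below are introduced only for illustration: n is the number of complete cases and h_ii is the leverage of case i. Once the number of variables reaches n - 1, the model fits every case perfectly, every leverage equals 1, and every case receives the same distance.

```latex
D_i^2 = (n - 1)\left(h_{ii} - \frac{1}{n}\right),
\qquad
h_{ii} = 1 \;\Rightarrow\; D_i^2 = \frac{(n - 1)^2}{n},
\qquad
n = 30:\quad \frac{29^2}{30} = \frac{841}{30} \approx 28.0333
```

If the data set has about 30 complete cases, this matches the reported value exactly, which would mean that sample size does limit how many variables the test can handle. Combining the items into a smaller number of scales first, as recommended above, avoids the problem.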