Identifying Multivariate Outliers with Mahalanobis Distance in SPSS

  • Published: 6 Jul 2024
  • This video demonstrates how to identify multivariate outliers with Mahalanobis distance in SPSS. The probability of the Mahalanobis distance for each case is calculated using the “Compute Variable” function in SPSS.
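The probability step described above can be sketched outside SPSS. This is a minimal illustration, not the video's own procedure: it assumes the distances are the squared Mahalanobis values that SPSS saves as `MAH_1`, that the degrees of freedom equal the number of predictors, and that .001 is the cutoff. The function names and the sample distances are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df starts from Q(1, y) = exp(-y).
        a, q = 1.0, math.exp(-y)
    else:
        # Odd df starts from Q(1/2, y) = erfc(sqrt(y)).
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def flag_outliers(distances, df, alpha=0.001):
    """Return (distance, p_value, is_outlier) for each squared Mahalanobis distance."""
    rows = []
    for d2 in distances:
        p = chi2_sf(d2, df)
        rows.append((d2, p, p < alpha))
    return rows

# Hypothetical MAH_1 values for five cases with three predictors (df = 3).
mah_1 = [1.20, 3.85, 7.50, 16.90, 22.40]
for d2, p, is_outlier in flag_outliers(mah_1, df=3):
    print(f"{d2:6.2f}  p = {p:.5f}  {'outlier' if is_outlier else ''}")
```

Only integer degrees of freedom are handled, which is enough here because df is a count of variables.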

Comments • 133

  • @scarlettthorn9060
    @scarlettthorn9060 2 years ago +24

    Honestly at this point I want to acknowledge you in my thesis thank you notes. Thank you Dr Grande, you are a gem.

  • @mimimcgee5512
    @mimimcgee5512 2 years ago +4

    Thank you for another helpful video. I am just a month or so away from receiving my doctorate and your videos have greatly assisted me in that! I'm brushing up in prep for my final defense and appreciate all of your videos. Thank you!

  • @yacinehajji1784
    @yacinehajji1784 8 years ago +1

    I would like to thank you for speaking loudly and slowly, very useful for a non-native English speaker like me.

  • @Swityie
    @Swityie 2 years ago

    Dr Todd, you've saved my life! I was dying with the Mahalanobis!!! Was crying at midnight while getting stuck at this.
    Thank you again!

  •  2 years ago

    This video was very helpful! Thanks for sharing your knowledge for free on YouTube!

  • @thomasbarnes5703
    @thomasbarnes5703 2 years ago

    Thank you Dr. Grande, I have no background in statistics... yet had to take a course as a portion of my degree requirements. Your videos have really helped me understand this very difficult subject!!

  • @arnelferaer6486
    @arnelferaer6486 4 years ago +6

    Dude you're a legend. Thank you for this.

  • @efrestein
    @efrestein 4 years ago +1

    Your videos add a ton of value!

  • @naftalibendavid
    @naftalibendavid 2 years ago

    This has proven so helpful again and again! Thanks.

  • @oscarespinozaparra6840
    @oscarespinozaparra6840 7 years ago

    Thank you Todd Grande for this extraordinary how-to video. This was a prayer answered and I feel so much better listening to and following your instructions. I want to express how sincerely grateful I am for the detailed analysis and steps you indicated in this video.

  • @ThatFellowOnline
    @ThatFellowOnline 7 years ago

    Fabulous video, explained clearly, concisely. I like how you have also shown the importance of labelling data properly and presentation (decimals) etc as this is really important when keeping data organised i.e. not just focusing on having a tidy output.

    • @DrGrande
      @DrGrande  7 years ago

      I am glad you found this video useful - thanks for watching.

  • @krunal699
    @krunal699 1 year ago

    Dr Grande you are a saviour! Thank You!

  • @shapsgh
    @shapsgh 3 years ago

    Just Realized that the values of MD and Chi-Square test exactly match the output of the AMOS' outlier table. Thanks Dr. Grande

  • @St0rytell3r
    @St0rytell3r 6 years ago

    Thanks for the video, very thorough.

  • @thewaterhub
    @thewaterhub 8 years ago

    Thank you, very useful video and clear explanation.

  • @voltisathartori6451
    @voltisathartori6451 6 years ago

    Thank you Dr Todd, for such an awesome explanation. It was very beneficial for my study to move on.

  • @frajtervivien
    @frajtervivien 8 years ago +1

    Thank you so much it was a lifesaver!

  • @fasamad6730
    @fasamad6730 7 years ago

    Wonderful explanation. Enjoyed the session. Thank u Todd Grande it was a great help

    • @DrGrande
      @DrGrande  7 years ago

      You're welcome, thanks for watching -

  • @xunzhou962
    @xunzhou962 8 years ago

    Exactly what i need! Thank you!

  • @payonrayaneh
    @payonrayaneh 8 years ago

    Very useful......Thanks a lot professor Grande.

  • @hafizahusairi
    @hafizahusairi 5 years ago +2

    Thank you!! I understand more after watching your video =)

  • @zarifbaihaqi8538
    @zarifbaihaqi8538 4 years ago +1

    Thank you very much Dr Todd... you helped me a lot.

  • @TrueFocusCoaching
    @TrueFocusCoaching 3 years ago +1

    Amazing!! You should do a separate video for the Chi-square distribution. Nowhere on YouTube is the second part to the explanation and because it is not overtly flagged in the title it does not show up.
    Either way thank you so much!!

  • @GeeWhit
    @GeeWhit 7 years ago +1

    Thanks for the great video!
    Does this method expose two-tailed outliers? If not, how can this be achieved?

  • @jongsuksong7493
    @jongsuksong7493 7 years ago

    Thank you so much for your great explanation! It really helped me a lot!

    • @DrGrande
      @DrGrande  7 years ago

      I'm glad you found the video useful. Thanks for watching.

  • @RichardMcCrory_Neph
    @RichardMcCrory_Neph 7 years ago +1

    +Todd Grande - could I check whether the degrees of freedom for the Chi-Square distribution are n or n-1? E.g. for 20 variables, is the d.f. 20 or 19?

  • @godnkr236
    @godnkr236 5 years ago

    thanks for this amazing video!

  • @rahimbehrad63
    @rahimbehrad63 8 years ago +1

    Thanks Dear Todd. great !

  • @chriskeran4480
    @chriskeran4480 8 years ago

    Dr. Grande, thank you kindly. Awesome demonstration. The question I have relates to the number of independent variables (IVs) chosen when calculating a Mahalanobis Distance (MD). Should the particular IVs chosen be related in some way, or can you throw all of your numeric variables into the one regression when attempting to find multivariate outliers using MD?

  • @Thejubeabides24
    @Thejubeabides24 4 years ago +1

    Excellent video!

  • @lyrahazel2079
    @lyrahazel2079 4 years ago

    Omg thank you, I was so frustrated. My data wouldn't meet the multivariate normality assumption until I stumbled onto this!

  • @mohammedimam3651
    @mohammedimam3651 2 years ago

    Wooooow! This is extremely useful! Thank you! 👌

  • @felipemcse
    @felipemcse 7 years ago +1

    Thanks for the video, Todd. Do you have some references that explain why the number of degrees of freedom should be the same as the number of variables?

  • @barbaratoson6455
    @barbaratoson6455 7 years ago

    Great video. Could you recommend a method to identify outliers in an RM ANOVA set up? I am looking for something similar to INFLUENCE option in SAS MIXED procedure but for SPSS

  • @jaynastics2
    @jaynastics2 4 years ago +1

    Very helpful video!

  • @alibezzaa809
    @alibezzaa809 4 years ago

    I really appreciate the effort you are putting into making concepts easy to understand. Do you have a video on transforming a multivariate outlier to a dummy variable?

  • @94bfm
    @94bfm 6 years ago

    Great explanation! Thank you so much!

  • @thoshsamanthar4815
    @thoshsamanthar4815 5 years ago +1

    Dr Todd, the video helped me a lot. I have 2 questions:
    1) I have an integrated framework, where analysis is done in 2 stages. Should I check MD for each stage? One of my variables will look like a mediator but it is not. It will be a DV in the first stage and subsequently an IV in the 2nd stage of the analysis. Stage 1 and stage 2 do not have any connection. I have done each test and got different Prob_MD values / outliers to be deleted.
    2) Should I include demographic questions as part of df, as the outlier probabilities are different when I omit or include them?

  • @thankyou6555
    @thankyou6555 1 year ago

    Thank you! Very helpful.

  • @denniscraggs8393
    @denniscraggs8393 5 years ago

    I liked your presentation. SPSS has evolved from the old text script product. I am a current user of both Minitab and Matlab.
    I am studying the Mahalanobis Distance and see that it has many applications. The SAE and ZVEI published a standard where electronics were judged to be fit for use in a temperature x voltage environment defined by a potato shape. However, they never provided a method of dealing with the different unit scale distances. I am thinking the Mahalanobis Distance would be a more technically correct means of classifying a component's fitness for use in a temperature x voltage environment.

  • @HarerimanaAlexis
    @HarerimanaAlexis 5 years ago

    Dear Dr Todd, Thank you very much for this wonderful video. I have the same question about how you decide on the degrees of freedom, and whether .001 is the absolute rule. Thank you

  • @evannadhim6631
    @evannadhim6631 7 years ago

    Todd, thank you so much for this clear explanation, but you've done the identification of multivariate outliers with Mahalanobis distance for the cases.
    My question: is there any difference if we do it for variables?
    The variables have their own distributions, while they are affected by the outliers.

  • @patfennell
    @patfennell 7 years ago

    Great video - thanks for posting!

    • @DrGrande
      @DrGrande  7 years ago

      You're welcome - thanks for watching.

  • @marinacuk1400
    @marinacuk1400 8 years ago

    Thank you for this very helpful video. Can this method be applied to lognormal datasets? Do the data need to follow a normal distribution?

  • @maheshvykuntam2809
    @maheshvykuntam2809 7 years ago

    +Todd Grande - Thanks a lot for the great explanation. Could you please help me understand: 1. Will this process work even if we have missing values? 2. Why do we use df as 'n' and not n-1? Thanks a lot for the help.

  • @ibrahimmkheimer5311
    @ibrahimmkheimer5311 3 years ago +1

    awesome video dr

  • @ravindarmadishetty736
    @ravindarmadishetty736 7 years ago

    Dear Todd, good explanation. Are the outliers we got similar to the residual (actual minus predicted) outliers that are removed from the data?

  • @karimatouati5256
    @karimatouati5256 3 years ago

    Thank you for this useful video. I have a question please: what should I do with ordinal variables when checking for these outliers? Which method is the adequate one, Mahalanobis distance or Cook's distance?
    Does it make sense to apply this method when my data is composed only of ordinal variables and not continuous ones?

  • @ammaarkidwai2732
    @ammaarkidwai2732 3 years ago +1

    Hi Todd! Great video as usual. Why was the cut off for the probability_MD column .001? Is that the norm cut off or based on your data?

  • @KristinColletteScott
    @KristinColletteScott 6 years ago

    Hi Dr. Grande,
    I've got 7 constructs (3 IVs, 3 intermediary, and 1 DV) each with multiple items. How do you recommend handling these when searching for D2? I also need to test for multivariate normality using the Wald statistic on the same data set. Do you have a video on that?

  • @sskshats6453
    @sskshats6453 7 years ago

    Thanks a lot. May Allah bless you

  • @HughMupfunya
    @HughMupfunya 4 years ago +1

    Awesome... Thank you very much

  • @prof.thakshilakumari7847
    @prof.thakshilakumari7847 5 years ago

    Thank you so much. I followed your video and did the test with my sample. But I have a question about the degrees of freedom: why do you consider it to be 3?

  • @guitaqui
    @guitaqui 2 years ago

    Perfect !!! Thank you!!!

  • @zohalh14
    @zohalh14 2 years ago

    Thanks for the video! Can you use Mahalanobis distance if your IVs are categorical in a mixed ANOVA?

  • @annabelleatkin1884
    @annabelleatkin1884 6 years ago

    Would you include control variables as predictors in the regression? And if you're testing a latent interaction in MPlus, do you simply input the observed variables into the regression in SPSS to do this test?

  • @ljubomirpupovac2009
    @ljubomirpupovac2009 7 years ago

    Hi Todd. Thanks for the video. Just one question: your main independent variable is program? Shouldn't we compare the MAH_1 value for samples that received treatment and ones that didn't? The thing is, the main independent variable is not used in the analysis, so whatever value I put there, the results (the removed cases) will be the same. Regards

  • @cecyliaadamczak4301
    @cecyliaadamczak4301 2 years ago

    Hi Dr. Grande, can we include the outcome variable (DV) with the IV in the mahalanobis distance analysis?

  • @polomarco1256
    @polomarco1256 4 years ago

    Hi Dr. Todd. Thanks for sharing knowledge. May I ask you something? Can I use Mahalanobis distance to identify multivariate outliers with ordinal data?

  • @jahanzaibalvi2010
    @jahanzaibalvi2010 2 months ago

    thats great. thank you so much sir

  • @Elianaco
    @Elianaco 6 months ago

    Hello, thank you for your helpful videos. Quick one, I'm running a moderation with multiple mediators. Are mediator variables independent variables? I'm trying to run the Mahalanobis distance but unsure if I should add my mediators to the IV box. Thank you

  • @kathrinho9136
    @kathrinho9136 8 years ago

    Hi, I have one question on the method. Hope you can help me :). In your data set, you have your manipulations, described as "program", and then you said that you have your independents named "functioning, severity, motivation". Why do additional metric independents exist in your file? In my data set I have 2 independents but they are on a nominal scale. So, what do I put in the text box of the linear regression where it says "independents"? Thanks in advance!!

  • @herix7342
    @herix7342 3 years ago

    Great contribution! Is there any reference for the described procedure?

  • @omidmahdieh7882
    @omidmahdieh7882 2 years ago

    Hello Dr. Grande. Thanks for your helpful demonstration. Can items be used to calculate Mahalanobis distances? Or should I use variables. I mean composite variables.

  • @chinhankim
    @chinhankim 4 years ago

    Dr.Grande, I have two independent variables and three mediation variables of one dependent variable. Question is should I put five variables(independent plus mediation variables) to figure out outliers or should I put only two independent variables? Thanks.

  • @khaledlahlouh6944
    @khaledlahlouh6944 4 years ago

    Dear Dr. Todd, what should we do when we have a model with many IVs, two mediators and two DVs? Should we consider the mediators as IVs?

  • @moeshams4504
    @moeshams4504 4 years ago +1

    Excellent!

  • @harithfarhan5535
    @harithfarhan5535 3 years ago +1

    thanks for this

  • @chinchinhoh7893
    @chinchinhoh7893 6 years ago

    Dr Grande, 1 question. Frequently, the examples of identifying & handling outliers are about independent variables. Does it mean that we don't have to identify & handle the outliers of dependent variables? TQ!

  • @madiharazzam1098
    @madiharazzam1098 6 years ago

    I have a sample of 300 and 2 predictors. What would be the Mahalanobis Distance for it???

  • @farhanselfatan
    @farhanselfatan 6 months ago

    Thank you dr

  • @jameslebron9412
    @jameslebron9412 6 years ago

    Dear Todd, nice video clip. I have a question: in your video I think you are using 3 independent variables and 1 dependent variable, so you are actually using 4 variables in total.
    I guess the degrees of freedom in this case are 4-1 = 3, since you are measuring distance on a 4-dimensional scale.

  • @devildman3128
    @devildman3128 8 years ago

    hi, are there any changes to be made if I find negative values for the probability_MD?

  • @desterward
    @desterward 6 years ago

    Hi. Is it possible to use it in non-linear multivariate as well? Thanks

  • @Oz4rmEg
    @Oz4rmEg 3 years ago

    Best vid ever

  • @muhammadfaisal9918
    @muhammadfaisal9918 4 years ago +4

    Thank you Dr. Todd for your awesome work. This is a very useful video. I am wondering if you could mention the reference for this process (or a reference for the significance value - is it by Tabachnick & Fidell 2007?). Many thanks

    • @sebastiankruse4981
      @sebastiankruse4981 2 years ago

      Hair et al 2010 also recommend this process. They suggest dividing MD by the number of predictors and then designating outliers in small samples if these values surpass 2.5 and in large samples if they surpass 4. I think the 2.5 cutoff point corresponds very closely to the .001 p-value used by Dr. Grande.

    • @Lello991
      @Lello991 2 years ago

      @@sebastiankruse4981 Hi! Could you please provide the full reference for Hair et al 2010? Is it this one?
      Hair, J.F., Black, W.C., Babin, B.J., & Anderson, R.E. (2010). Multivariate Data Analysis. Seventh Edition. Prentice Hall, Upper Saddle River, New Jersey

    • @sebastiankruse4981
      @sebastiankruse4981 2 years ago

      @@Lello991 yes, that's the one
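The ratio rule described in this thread can be sketched as follows. The rule is taken as stated in the comment (attributed there to Hair et al. 2010) and is not checked against the book. The function name and the sample distances are hypothetical, and the choice between the 2.5 and 4 cutoffs is left to the caller because the thread does not define "small" or "large".

```python
def hair_ratio_flags(distances, n_predictors, cutoff):
    """Flag cases whose D^2 / df ratio exceeds the chosen cutoff."""
    flags = []
    for d2 in distances:
        ratio = d2 / n_predictors
        flags.append((d2, ratio, ratio > cutoff))
    return flags

# Hypothetical squared Mahalanobis distances for five cases, three predictors.
mah_1 = [1.20, 3.85, 8.10, 16.90, 22.40]

for label, cutoff in (("small-sample cutoff 2.5", 2.5), ("large-sample cutoff 4", 4.0)):
    print(label)
    for d2, ratio, is_outlier in hair_ratio_flags(mah_1, n_predictors=3, cutoff=cutoff):
        print(f"  D2 = {d2:6.2f}  D2/df = {ratio:5.2f}  {'outlier' if is_outlier else ''}")
```

With three predictors, a ratio of 2.5 corresponds to a distance of 7.5, which is well below the chi-square critical value of about 16.27 at p = .001, so the two rules need not flag the same cases.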

  • @ninab6136
    @ninab6136 7 years ago

    So I guess Mahalanobis can't be calculated when you have missing values somewhere in the items. Any other way I can include those cases?

  • @kamrannawaz
    @kamrannawaz 7 years ago +1

    Thanks very helpful.....I understand that why you used 3 as DF, however please explain what is Chi Square?

  • @shafeekafadlikhzamri7068
    @shafeekafadlikhzamri7068 4 years ago

    Hello Dr. Todd. Your video helped a lot and the steps are easily understood, but I seem to have too many outliers. I would like to have your contact details to ask you about this matter.

  • @next_trip_loading
    @next_trip_loading 5 years ago

    Can we apply ANOVA for a factor with 2 levels? I have seen a lot of studies using 2 levels and testing them with ANOVA. Secondly, I don't know how they check normality when they use a single-item Likert scale. Could you please explain this concept to me?

  • @ainannur5836
    @ainannur5836 6 years ago

    Mr Todd, I have 4 variables: AsliG, AsliB, GreenBP, and BlueBP. I want to know the value of the Mahalanobis distance between (AsliG AsliB) and (GreenBP BlueBP). Can I calculate it using Mahalanobis distance in SPSS? Why can't I enter two variables as dependent and the other two as independent in SPSS?

  • @MrFoganholo
    @MrFoganholo 8 years ago +4

    Todd, great explanation! Thanks. One question: Why did you use 3 as the degrees of freedom? Why did you use .001 as the reference? Can I use it for any sample? Thanks again.

    • @DrGrande
      @DrGrande  8 years ago +9

      +André Foganholo Three degrees of freedom were used because there were three variables in the analysis. Using the probability of .001 is a common practice when identifying multivariate outliers.

    • @n.einstein6088
      @n.einstein6088 8 years ago +18

      +André Foganholo as a reference for the .001 threshold I used Tabachnick, B.G., & Fidell, L.S. (2007). Using Multivariate Statistics (5th Ed.). Boston: Pearson. (p. 74). according to www-01.ibm.com/support/docview.wss?uid=swg21480128. just in case anyone needs that.

    • @wenyuanliu4602
      @wenyuanliu4602 6 years ago

      Thanks everyone!

    • @USDG
      @USDG 5 years ago

      He used 3 because of the number of independent variables. Thank you
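The answers in this thread can be turned into a distance cutoff. The sketch below assumes three predictors (df = 3) and the .001 probability mentioned above, and finds the squared Mahalanobis distance beyond which a case would be flagged. It uses the closed-form upper tail of the chi-square distribution for 3 degrees of freedom; the function names and search bounds are hypothetical.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def critical_distance(alpha=0.001, lo=0.0, hi=100.0, tol=1e-9):
    """Bisect for the distance whose upper-tail probability equals alpha."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_df3(mid) > alpha:
            lo = mid  # tail probability still above alpha: the cutoff lies further out
        else:
            hi = mid
    return (lo + hi) / 2.0

cutoff = critical_distance()
print(f"With df = 3, cases with MAH_1 above about {cutoff:.2f} have p < .001")
```

The result agrees with the tabled chi-square critical value for df = 3 at .001, roughly 16.27. A different number of predictors would need a different tail function.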

  • @selamawitweldegebriel3421
    @selamawitweldegebriel3421 4 years ago

    This was very helpful. How do we contact you? I have an urgent problem.

  • @henkpiet1908
    @henkpiet1908 9 months ago

    What do I do if there's a missing value in one of the scales when I use pairwise deletion for my regression? In that case the Mahalanobis distance returns a missing value as well.

  • @wongjanice7753
    @wongjanice7753 8 years ago +1

    Thanks for sharing! I would like to ask a question: if I detected 8 outliers with Mahalanobis distance, is it necessary for me to delete all of them? Or are 8 outliers out of 200 respondents still in an acceptable range? Is there any reference that mentions this?

    • @j.a.o.5535
      @j.a.o.5535 8 years ago

      +Wong Janice According to Mead and Craig (2012, Identifying Careless Responses in Survey Data), you may have up to 20 careless responders, especially if you used web-based questionnaires, so I would eliminate those 8 outliers to improve the quality of the data, although it is not always a straightforward rule.

  • @theresiabusagara7909
    @theresiabusagara7909 6 years ago

    Great it was very useful

  • @wpadilla72
    @wpadilla72 4 years ago

    Dear Dr. Grande, my variables are measured on a Likert scale... how should the Mahalanobis test be applied in this case? Thanks

  • @moroomario4007
    @moroomario4007 1 year ago

    Sir, if I used a Likert scale, should the DV be the mean score of all the items and the IVs be the scores of each item?

  • @nehakeshri933
    @nehakeshri933 1 year ago

    loved it

  • @priyas8052
    @priyas8052 8 years ago

    What if you get zero as a result for one of the rows?

  • @loversloss101
    @loversloss101 5 years ago

    So what happens when you follow these instructions and every number you get for the MAH_1 is the same?

  • @micahgardner7836
    @micahgardner7836 2 years ago

    What if one of your variables was excluded by SPSS when calculating Mahalanobis Distance? Are the degrees of freedom the same, or would you subtract one? For example, 5 variables entered but one was excluded. Would the degrees of freedom be 5 or 4?

  • @alexandrafiedler3113
    @alexandrafiedler3113 4 years ago

    For a CLP analysis (two-wave longitudinal design), do I use the dependent variable at time 1 or time 2? Sorry, but I am confused about whether I compute Mahalanobis d for the regression term in my CLP model with: dependent variable (t2) regressed ON --> dependent variable (t1), independent variable (t1), moderator (t1). Or won't it matter if I do the Mahalanobis for a simple regression at time 1: Y1 regressed ON --> X1, M1 (and what about my second independent variable? Should I put it into the regression for timepoint 1, too?)
    I would be very glad if anybody could help me with this confusion!! :D

  • @rashidsaid-ti3jz
    @rashidsaid-ti3jz 4 years ago

    Thank you Dr. Todd for these useful lessons. Could you please give the reference for the formula which you wrote in Compute Variable:
    1-..chi(mahalanobis, df).
    Thanks a lot

    • @nahk-lx2tn
      @nahk-lx2tn 4 years ago

      rashid said he is not replying to actual questions. That’s sad

    • @rashidsaid-ti3jz
      @rashidsaid-ti3jz 4 years ago

      @@nahk-lx2tn hi wasim, I found the reference (Hair, 2014)
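The formula asked about in this thread appears to be the upper-tail chi-square probability of each case's distance. A sketch in standard notation, assuming k predictors with centroid x-bar and sample covariance matrix S (these symbols are not used in the thread):

```latex
D_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^{\top} \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}),
\qquad
p_i = 1 - F_{\chi^2_k}\!\left(D_i^2\right)
```

Here F is the chi-square cumulative distribution function with k degrees of freedom. In SPSS Compute Variable, this would typically be entered as `1 - CDF.CHISQ(MAH_1, 3)` when there are three predictors and the saved distance variable is named `MAH_1`.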

  • @nerdofilo
    @nerdofilo 8 years ago

    is this related to MCD?

  • @dr.mayankpant1571
    @dr.mayankpant1571 5 years ago

    If we have 5 IVs, then what are the degrees of freedom?

  • @oliviasimms3897
    @oliviasimms3897 3 years ago

    Hi, does anyone know why it won't give me output when I add two variables to the 'independents' box? I can get output for them both separately but cannot get 1 output for them both

  • @abdulmoeed4661
    @abdulmoeed4661 2 years ago

    If we have more than one independent latent variables, mediators and final dependent variable, how we would place them in the 'Independent & Dependent ' variables list box while doing this test? Thanks Waiting for response.

    • @sinemkaraoglu1717
      @sinemkaraoglu1717 2 years ago

      Hello, did you find the answer to this question?

  • @adrianfajar323
    @adrianfajar323 4 years ago

    Prof, I have 3 dependent variables and 6 independent variables. How do I see the Mahalanobis distance?

  • @drarsen33
    @drarsen33 2 years ago

    "Bloody hell. What is this Mahalanobis distance. I have never done it before...I am in bit of a pickle. Let me check youtube."
    clicks on first link. Video starts
    "Wait...I know this voice...is it...." scroll down.
    Well, thank you once again Dr Grande :D