Nice video, SPSS always breaks my brain
Thank you for this video. Years later and it is still helpful.
This is one of my favorite channels on youtube! Thorough yet clear. Keep up the good work man!
Thanks for providing citations! Really helpful.
Thank you Dr. How2stats!
Thank you for the explanation! It's very good and simple! Thanks a lot!
Thank you for helping me with my homework in Advanced Statistics
My lecturer told me not to use box plots to check for outliers, as they rely on the median and interquartile range rather than the mean. He then advised me to create z-scores to find outliers, as these are based on the mean. However, he only showed us how to do that manually and not with SPSS.
Really helpful, informative and to the point. Thanks!
Thanks, great insight! 💯
Thank you for the instructions and references.
How do you remove the outliers?
Thank you so much for an amazing explanation. :)
Thanks so much for this valuable information
Sir, you are GREAT
Thanks, this was very helpful
In multivariate analysis, would a z-score of 3.2 be an outlier if the data set contains 1,000 cases?
Can we use the method of labelling outliers for non-normal data? If not, how do we identify outliers in non-normal data?
Depends on how non-normal the distribution is. I'd say skew less than .50 should be fine. There are outlier detection methods for non-normal distributions, but I haven't learned them yet!
I have two questions. First, is it mandatory to check normality for individual continuous variables, one by one? Second, can we check the normality of our data after coding?
It would be great to know about a technique in SPSS to identify an outlier based on standard deviation. Could you please guide on this?
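For readers asking about the standard deviation approach: SPSS's Descriptives dialog has a "Save standardized values as variables" option that writes z-scores to the data set. Outside SPSS, the same idea can be sketched in a few lines of Python. This is only an illustration, not the video's method; the function name `zscore_outliers` and the cutoff of 3.0 are assumptions chosen for the example.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return (value, z) pairs whose absolute z-score exceeds `threshold`.

    Assumption: the sample standard deviation (n - 1 denominator) is used,
    and 3.0 is only a commonly quoted cutoff, not a fixed rule.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # All values are identical, so nothing can be flagged.
        return []
    flagged = []
    for x in values:
        z = (x - mean) / sd
        if abs(z) > threshold:
            flagged.append((x, z))
    return flagged


# Made-up data: twenty ordinary scores and one very large one.
scores = [10] * 10 + [12] * 10 + [100]
for value, z in zscore_outliers(scores):
    print(f"{value} has z = {z:.2f}")
```

Note that the mean and standard deviation are themselves pulled towards extreme values, so in very small samples a z-score can never get large enough to cross the cutoff.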
Once you detect an outlier, what do you do next? Do we remove it from the data set?
Good question. I usually winsorize it: ruclips.net/video/WJuB0vZp6w4/видео.html
So how do you choose the 3 multiplier? You did the same thing.
You don't have to "choose" anything. SPSS automatically reports results with the 1.5 and 3.0 multipliers (circles and stars, respectively).
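A rough sketch of what the two multipliers do, written in Python for illustration. The function name `label_outliers` is hypothetical, and the quartiles here come from Python's `statistics.quantiles` with the "inclusive" method, which is an assumption: SPSS may compute the box edges differently, so borderline values can be labelled differently.

```python
import statistics


def label_outliers(values, multiplier=1.5):
    """Return the values lying more than `multiplier` IQRs outside the box."""
    # Assumed quartile definition; other definitions give slightly different fences.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - multiplier * iqr
    high_fence = q3 + multiplier * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Made-up data for the example.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 100]
print(label_outliers(data, 1.5))  # beyond the 1.5 fences (circles)
print(label_outliers(data, 3.0))  # beyond the 3.0 fences (stars)
```

Every value beyond the 3.0 fences is also beyond the 1.5 fences, so the second list is always a subset of the first.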
Thank you for this!
Thanks for the video; it has helped me in my research work. I have a query, though: in the case of time series, if we want to assess normality, should this be done only on the component called "noise"? Thanks.
Good work, thank you for the video! But I've got a problem: my variable is metric with a huge range, from 3 to 12 000 000, so I can't detect the extreme outliers (multiplier 3.0) visually in the box plot. The scale is too wide to identify the values that are too low. How can I solve that problem?
Extreme outliers can distort the visual appeal of a box plot. You might consider simply reporting that the value of 12 000 000 was an outlier and was dealt with (either removed or Winsorised). Then, redo the box plot.
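To make "Winsorised" concrete, here is a minimal Python sketch of one common variant, in which each outlying value is replaced by the nearest value that is not an outlier. This is an assumption about the variant; the procedure in the linked video may differ. The function name `winsorize_to_bounds` and the bounds used below are hypothetical.

```python
def winsorize_to_bounds(values, low, high):
    """Replace values outside [low, high] with the nearest value inside it."""
    inside = [x for x in values if low <= x <= high]
    if not inside:
        raise ValueError("no values fall inside the given bounds")
    smallest_kept = min(inside)
    largest_kept = max(inside)
    # Clamp each value to the range spanned by the retained observations.
    return [min(max(x, smallest_kept), largest_kept) for x in values]


# Made-up data; the bounds 0 and 1000 stand in for whatever outlier
# fences were actually computed.
raw = [3, 50, 70, 12_000_000]
print(winsorize_to_bounds(raw, low=0, high=1000))
```

After this step the box plot can be redrawn on a scale where the remaining values are visible.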
Hi there. First of all, I have to thank you for such amazing videos. Secondly, I have a problem and I have tried hard to find a solution, but all in vain. I had some missing data, and on top of that I also removed a few outliers. I have multiple variables for each subject. I tried to do a repeated measures ANOVA, but because of one missing variable for a subject, all of that subject's other variables are also ignored and I am losing subjects. I had 23 subjects but the ANOVA analyzes just 14. If I put ZERO in the missing variable's place, it gives me a lower MEAN value. Please tell me how to fix the missing data so I can analyse all the subjects without affecting my MEANS for all the variables.
P.S.: I cannot use any computation method (I have seen your MCAR videos) to predict the values. It will mess up my data very badly.
Does the 2.2 multiplier break down at all when applied to larger data sets? Say, n = 600?
Yes. I'd use a 2.2 multiplier for samples between 20 and 300. Thereafter, I'd use a multiplier of 3.0.
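The rule of thumb in this reply can be written down as a small Python sketch. The function names are hypothetical, the reply does not say what to do below n = 20 (so the sketch refuses rather than guessing), and the quartiles are assumed to have been computed already.

```python
def multiplier_for_sample_size(n):
    """Pick the IQR multiplier from the sample size, per the reply above."""
    if n < 20:
        # Not covered by the quoted rule of thumb.
        raise ValueError("rule of thumb only stated for n >= 20")
    return 2.2 if n <= 300 else 3.0


def outlier_fences(q1, q3, n):
    """Return (low, high) fences using the multiplier chosen for sample size n."""
    k = multiplier_for_sample_size(n)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Example with made-up quartiles for a sample of 600 cases.
print(multiplier_for_sample_size(600))
print(outlier_fences(q1=10, q3=20, n=600))
```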
Is there research supporting this though?
Yes, check out Hoaglin's research; he might say it in this paper: Hoaglin, D. C., Iglewicz, B., & Tukey, J. W. (1986). Performance of some resistant rules for outlier labeling. Journal of the American Statistical Association, 81(396), 991-999.
Or another paper in that time period.
Thank you!
Very good
Thank you
Great videos. Where can I get the Excel you are using to calculate outliers based on the 2.2 multiplier?
Hi, probably a dumb question, but when you go from the Var1 data set to the Var2 data set, what would you call the "error bars" in the Var2 graph? Technically the top error bar isn't the "maximum", as the "maximum" is the outlier. Thanks.
It's a fine question. They correspond to the 25th (low bar or lower quartile) and 75th (high bar or upper quartile) percentiles.
@how2stats I thought the 25th and 75th percentiles were the top and bottom lines of the box?
I'm asking what you would call the error bars above and below the box, given the outlier is the 'maximum'.
Many thanks
So how do you delete this damn 12?