The Right Way to Detect Outliers - The Outlier Labeling Rule (part 3)

how2stats

61K views


Published: 21 Aug 2024
I demonstrate arguably the most valid way to detect outliers in data that roughly correspond to a normal distribution: the outlier labeling rule. I also point out that using 2.2 rather than the more common 1.5 is more appropriate as a multiplier.
The formulae I use in the video are:
Upper = Q3 + (2.2 * (Q3 - Q1))
Lower = Q1 - (2.2 * (Q3 - Q1))
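As a rough illustration of the two formulae above, here is a minimal Python sketch (not taken from the video). The function names `outlier_fences` and `label_outliers` are hypothetical. The quartiles come from `statistics.quantiles` with `method="inclusive"`, which is an assumption: other packages may compute Q1 and Q3 slightly differently, and that shifts the bounds a little.

```python
import statistics

def outlier_fences(values, g=2.2):
    """Return (lower, upper) bounds: Q1 - g*(Q3 - Q1) and Q3 + g*(Q3 - Q1)."""
    # quantiles(..., n=4) returns the three quartile cut points (Q1, median, Q3).
    # The "inclusive" interpolation method is an assumption of this sketch.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - g * iqr, q3 + g * iqr

def label_outliers(values, g=2.2):
    """Return the values that fall outside the lower/upper bounds."""
    lower, upper = outlier_fences(values, g)
    return [v for v in values if v < lower or v > upper]

# Illustrative data only (made up for this sketch).
sample = [10, 12, 13, 14, 15, 16, 17, 18, 20, 60]
print(outlier_fences(sample))   # bounds with the 2.2 multiplier
print(label_outliers(sample))   # values labeled as outliers
```

Passing `g=1.5` gives the more common, narrower bounds for comparison.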
The references in the video are:
Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.
Hoaglin, D. C., Iglewicz, B., and Tukey, J. W. (1986). Performance of some resistant rules for outlier labeling. Journal of the American Statistical Association, 81, 991-999.
Hoaglin, D. C., and Iglewicz, B. (1987). Fine tuning some resistant rules for outlier labeling. Journal of the American Statistical Association, 82, 1147-1149.

Comments • 42

@craigpaulk2285 6 years ago
Thank you so much for these videos concerning standard deviation and outliers. I was going to use the 2 Standard Deviations rule to find outliers in a series of datasets, but I believe this will be a better solution. Much appreciated!
@DougGirard 11 years ago ⁺¹
Here's the cited reference: Hoaglin, D. C., and Iglewicz, B. (1987). Fine tuning some resistant rules for outlier labeling. Journal of the American Statistical Association, 82, 1147-1149.
@how2stats 13 years ago ⁺²
Hi, It is a fair bit more complicated in the non-normal distribution case. I do plan on getting to it, but I've got other videos in the works at the moment. I'll get to it though. How non-normal is your distribution? You might want to check out my videos on detecting a non-normal distribution, as well as the videos I have on skew in the meantime. That is, a lot of distributions that are non-normal statistically are really not that bad.
@TokenFun105 11 years ago ⁺²
Hello, great channel... Would it be possible to get some literature references on the detection method you would use for non-normal distributions? I fear that I will have to adopt those, and I understand that you might not have time to upload a video in time. Many thanks.
@how2stats 11 years ago
Yes, I'd say any z-score rule is invalid, or at least less valid than the IQR rule. Technically, yes, the IQR rule is only applicable to normal distributions, but you don't have much in the way of practical options for non-normal distributions, so I would still use the IQR demonstrated here.
@kshipramoghe3361 9 years ago
Excellent video
@asmaabeltagy7614 4 years ago
Very useful. Thank you
@anonym13252 9 years ago ⁺¹
Thank you for the useful video. I was just wondering if you could have uploaded the Excel sheet to calculate the outliers.
Thanks
@dwise75 11 years ago
Thank you for the helpful tutorials
@dielynhijosa7561 9 years ago
Thank you for this video... this will really help my report. Thank you, thank you
@sibelakyuz8973 6 years ago
I read the Hoaglin, D. C., and Iglewicz, B. (1987) paper in detail but have a question. The table in section 3 shows which k value to use for the 90th or 95th percentiles, depending on the number of subjects. In the video you said Q1 and Q3 correspond to the 25th and 75th percentiles. Aren't we supposed to look at the 95th and 5th percentiles for calculating the upper and lower ends?
Thanks for the great videos!!
@how2stats 11 years ago
I can't think of any conditions under which I would not use the 2.2 value as the multiplier.
@nahlabetelmal4147 11 years ago
PLEASE post how to deal with outliers in the case of a non-normal distribution. Many thanks
@mariaszymczak2062 10 years ago ⁺⁵
@how2stats: How would you identify outliers in non-normally distributed data? Could you please make a video on this topic? Greetings :)
@harryfranz3758 10 years ago
Any answers to that yet? I was wondering too.
@how2stats 10 years ago
Harry Franz: From memory, it is much more complicated, in that you couldn't implement it easily in either Excel or SPSS. I wouldn't worry about it too much if your absolute skew is less than 1.0. I just made that up, but I would go with it.
@harryfranz3758 10 years ago
thanks!
@rahul122112 8 years ago
+how2stats Hi. Great job with the explanations. However, I don't think a part 4 for non-normal distributions is coming anytime soon, since this was released in 2011?
@Smirna1000 6 years ago
Great video, thanks!
Do you have any reference recommendations for non-normally distributed data and outliers? I was reading that the presence of outliers may be a consequence of a non-normal distribution. My data are not normally distributed. However, I don't need to transform them, because of the specificity of the sample I am dealing with.
I would be really grateful for any recommendation. Thanks in advance!
@praveenelangovan253 4 years ago
Thank you
@daniluna9885 6 days ago
Thank you so much, it's very useful. What is the minimum N recommended for using this method?
@liam2 11 years ago
Brilliant, just like all your videos. Do you support the use of 2.2 for all normally distributed data or only under certain conditions (you quickly mentioned sample size being a factor 25s into the video)? Thanks :)
@folliculostella 12 years ago
Great video prof, really helpful... thanks ^^
@musicarnab 9 years ago
Hi! This is a very good video which helps us to understand outliers. However, can we use the same technique to determine UCL and LCL?
@mbalenhleluthuli7420 9 years ago
Helpful video, thank you
@aalhau5525 8 years ago
Thanks a lot.
@0903Jerome 10 years ago
@how2stats: I find your video useful. I have some questions: How do you (empirically) know when to apply 1.5 and 2.2 as a multiplier? Can you apply 1.5 on the right side only and 2.2 on the other side simultaneously? In your example 1.5 is only applicable to the upper portion of the distribution and not to the lower portion in which case you applied 2.2. Did you apply 1.5 only to the upper side and 2.2 on the lower side? Please enlighten me and more power to you.
@fah79 7 years ago
He mentioned that you should use 2.2 if the sample is not "huge" and is also normally distributed. But it may be worth checking the papers he mentioned.
@adamdavis5654 8 years ago ⁺¹
Great video! I did have one question. Using the value of 2.20 for g, what if your upper or lower bound extends beyond the response scale range for a variable? For instance, if my response scale range runs from 1(Disagree) to 7(agree), I have a Q1 = 4.00 and a Q3 = 5.00, the difference is 1.00, my upper bound extends to 7.20 beyond the valid response scale range. In this case, should I stick with the more conservative 1.50 for g? Any assistance would be greatly appreciated. Thank you.
@how2stats 8 years ago
If Q1 = 4.0 and Q3 = 5.0, what is your median?
@adamdavis5654 8 years ago
The median is 4.44. Thank you.
@sanneholvoet7663 7 years ago
Dear Adam, did you figure out what to do when your upper or lower bound extends beyond the response scale? I have the same problem with my 4 point (upper bound = 4.58 with g = 2.2) scale so I am not sure if I can use this Outlier Labeling Rule. Thank you.
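The arithmetic in the first question of this thread can be checked with a short Python sketch (not from the video; the helper names `upper_fence` and `can_flag_high` are hypothetical). It only shows that when the upper bound lies above the scale maximum, no valid response can be labeled a high outlier. It does not settle which multiplier is appropriate for rating-scale data.

```python
def upper_fence(q1, q3, g=2.2):
    """Upper bound from the labeling rule: Q3 + g * (Q3 - Q1)."""
    return q3 + g * (q3 - q1)

def can_flag_high(q1, q3, scale_max, g=2.2):
    """True only if the largest valid response would exceed the upper bound."""
    return scale_max > upper_fence(q1, q3, g)

# Values quoted in the thread: Q1 = 4.00, Q3 = 5.00 on a 1-to-7 scale.
print(upper_fence(4.0, 5.0))                 # bound with g = 2.2
print(can_flag_high(4.0, 5.0, 7))            # can a response of 7 be flagged?
print(can_flag_high(4.0, 5.0, 7, g=1.5))     # same check with g = 1.5
```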
@marketmail49 9 years ago
If possible, I'd like to have this program-generated normal distribution data sample. Please let me know. And by the way, nice videos. I have watched a couple of them.
@wurpdaborbuko 8 years ago
Some researchers suggest that you have to repeat the procedure of identifying and removing outliers several times, until no outliers appear. Is that in any way valid?
@TokenFun105 11 years ago
What about Tabachnick & Fidell's recommendation of z-score > 3.29? Is this invalid too, like the 2 SD method (it seems rather arbitrary)? And is it only applicable to normal distributions?
@ArioR 10 years ago ⁺¹
What happens when I get negative values for lower?
@how2stats 10 years ago ⁺²
Sounds like your distribution may be too non-normally distributed for this approach.
@chossy4 10 years ago
how2stats: I have extremely non-normal data (a financial ratio, so this is expected). Can this or a similar procedure be used to at least identify the largest values causing skew?
@icy-spoon85 7 years ago
I have percentage data that can't exceed 1, but with g=2.2 I exceed it. Is there something wrong with that or is it acceptable to have that result and just see it as upper = 1 rather than >1? Thanks
@junjiethu 2 years ago
Hi, can anyone show me which is the video explaining outlier detection for non-normally-distributed data? I can't find it anywhere.
@DougGirard 11 years ago
Under what circumstances do you use 1.5? When do you use 2.2? My school doesn't have access to the article :(
@tommyvz84 6 years ago
interesting
