That reference written out for anyone interested: Hoaglin, D. C., Iglewicz, B., and Tukey, J. W. (1986). Performance of some resistant rules for outlier labeling. Journal of the American Statistical Association, 81, 991-999.
Thanks! Great job!
What if the data has a non-normal distribution?
thank you!!
A part 3 should have been apparent at the end of the video. Google 'the right way to detect outliers part 3' and it should come up at the top of the search results.
Thank you
Using the formula exactly as posted here did not work for me for the upper quartile for some reason (I checked against the "outliers" in SPSS, which uses 1.5). I had to enter =A2-C6 for "lower" and =C2+C6 for "upper" directly; then it worked like a charm.
Does g = 1.5 always stay the same for every calculation, irrespective of the quartiles, mean, or median, or does it change with ....
Do you use the absolute value if you end up with a negative number?
Have you heard of box plots?
Thanks a lot. Just to be sure: are you saying that if I want to give a reference for this method, I should cite the third article in the video? Or is this your method, and only the "g" value is from that article? I've found the article, but I need to read it some more to understand it thoroughly... thanks again :)
What happens next? It was getting very interesting but unfortunately the video ended suddenly :(
Great demonstration of the Outlier Labeling Rule, BUT the video ended before the explanation of why g = 1.50 is not a good value. What is the method for finding the correct value?
Using g = 1.5 gets outliers wrong 50% of the time.
Using g = 2.2 is much better, as the video says in part 3.
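For anyone who wants to try the rule outside Excel or SPSS, here is a minimal Python sketch of the labeling rule as it is used in this thread: lower fence = Q1 - g*(Q3 - Q1), upper fence = Q3 + g*(Q3 - Q1). The function names and the sample data are made up for illustration. Taking the quartiles from statistics.quantiles with the "inclusive" method is an assumption; Excel and SPSS may compute quartiles slightly differently.

```python
import statistics


def outlier_fences(values, g=2.2):
    """Return (lower, upper) fences: Q1 - g*(Q3 - Q1) and Q3 + g*(Q3 - Q1)."""
    # Assumption: "inclusive" quartiles; other tools may use another convention.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = g * (q3 - q1)
    return q1 - spread, q3 + spread


def label_outliers(values, g=2.2):
    """Return the values that fall outside the fences for the given g."""
    lower, upper = outlier_fences(values, g)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: Q1 = 3, Q3 = 7, so Q3 - Q1 = 4.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 14]
flagged_with_1_5 = label_outliers(sample, g=1.5)  # upper fence is 7 + 6 = 13
flagged_with_2_2 = label_outliers(sample, g=2.2)  # upper fence is about 15.8
```

With this sample, 14 is labeled when g = 1.5 but not when g = 2.2, which shows how the wider multiplier labels fewer points.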
You don't mention this in your video, but how many times can/should this rule be applied to the same data set? I.e., if I use it once, remove those outliers, and then find new ones after applying the rule a second time, what then?
I wondered the exact same thing
It's not my method. Cite the third article.
I followed the method, but it gave me a negative Q1???
Hi there,
Great videos. I'm wondering what to do in the case of getting a negative value for the lower bound. In my data Q1 = 1639.995 and Q3 = 2913.56, so with a g of 2.2 I get a lower bound of -1161.848. Is there something I can do? Or is this rule only valid for distributions that are already close to normal before checking for outliers?
Hi, I have the same problem, have you found a solution?
+Grace C The IQR does tend to work only when the data are fairly normally distributed. If your data are not normally distributed, you should consider using bootstrapping; you won't have to worry about outliers. I have a video on bootstrapping to get you started: ruclips.net/video/9VjzPnoUBJQ/видео.html
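To make the numbers in the question above concrete, here is a small sketch that reproduces the arithmetic with the Q1, Q3, and g quoted there. The variable names are only for illustration.

```python
# Values quoted in the question above.
q1 = 1639.995
q3 = 2913.56
g = 2.2

spread = g * (q3 - q1)   # 2.2 * 1273.565
lower_fence = q1 - spread
upper_fence = q3 + spread

print(f"lower fence: {lower_fence:.3f}")
print(f"upper fence: {upper_fence:.3f}")
```

The lower fence comes out at about -1161.848, matching the figure in the question. The fence is only a cutoff: if every observation is positive, nothing falls below a negative fence, so no low-side values get labeled.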
How can I calculate the lower quartile (without excel)?
One option in SPSS: Analyze -> Descriptive Statistics -> Descriptives -> Explore; click the 'Statistics' button; select the Percentiles option; click 'Continue'; click 'OK'
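Another option, if Python is available, is the standard library. This is only a sketch with made-up data; it also shows that the two quartile conventions offered by statistics.quantiles can disagree, which may be one reason different programs report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical data, only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# statistics.quantiles returns the three cut points Q1, Q2 (median), Q3.
q1_inclusive = statistics.quantiles(data, n=4, method="inclusive")[0]
q1_exclusive = statistics.quantiles(data, n=4, method="exclusive")[0]

print("Q1 (inclusive):", q1_inclusive)
print("Q1 (exclusive):", q1_exclusive)
```

For these nine values the inclusive method gives Q1 = 3.0 and the exclusive method gives Q1 = 2.5, so it is worth checking which convention a given tool uses before comparing results.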
58.6 + 23 = 81.6, right? Not 81.1.
Suzana Ulian Benitez It has to do with the rounding. 1.5*15 is 22.5, not 23. If you add 58.6 to 22.5, you will get 81.1.
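The two results can be checked side by side. This sketch assumes, as in the reply above, that the third quartile is 58.6 and the interquartile range is 15; the variable names are illustrative.

```python
q3 = 58.6
iqr = 15
g = 1.5

unrounded_spread = g * iqr   # 22.5
rounded_spread = 23          # 22.5 rounded up before adding

upper_unrounded = q3 + unrounded_spread
upper_rounded = q3 + rounded_spread

print(f"without rounding: {upper_unrounded:.1f}")
print(f"with rounding:    {upper_rounded:.1f}")
```

Without rounding the upper bound is 81.1; rounding the spread to 23 first is what produces 81.6.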