Thanks for this great explanation! Curious what the 1.5 factor is as it relates to the upper and lower bounds? Is this some type of outlier exclusion factor? Or just a suitable one based on the data set?
@@absentdata I found that I had to change that 1.5 to 1 for the dataset I was working with for it to pick up other values I considered outliers as well. Will do some more research on that factor and what it is exactly but I had to change it to get a better result.
It depends on the data set. For example even if you had a outlier boundaries that's negative, it doesn't always apply. For examples a child birth dataset might not use any negative outliers then you would just use 0 as your range.
Outliers live outside the inner quartile range. By statistical definition, they live 1.5 times below or above your inner quartile range.( copied from the link he ut in the descriotin box)
I found this very helpful, thank you! I'm trying to compare different business with their respective values to find the most "different" one. Is this method ok to use in that case?
In a sense yes, you may be doing a univariate analysis of the three businesses. For example you may be looking at Revenue across all the three businesses in comparing their distributions. You can compare the shape of the distribution whether it's a normal distribution or a very skewed distribution. Skewed distributions are affected by outliers.
Very helpful video, a quick question: Can I use this method to emit outliers in a degradation trendline? or this method is for data that fit normal distribution or some other distribution only?
This is part of the accepted original formula. Essentially quartiles can be scaled to standard deviations. The math equates 1.5×IQR to under 3 standard deviations from the mean. So the 1.5 was designed in the formula to capture the majority of outliers assuming the data is normally distributed.
How this works on positive only data. I have list of numbers coming daily say. I want to know if the trends are inline. What is the best way to use that. If there is any dip or spike in the trend, how to t find out.
Welll not really because that invalidates the need for calculating the quartiles. However, I think the approach you are trying to take is more maual where you are setting the boundaries for the outliers yourself.
Well a column should be seen as a unique dimension in your data. so if you have 20,000 columns. By Excel only has 16,384. I would question the structure of your data. I would see big you can melt the columns down to rows
Yes, its a best practice to use 1.5. However, you can alter it but then release that you will lose data. It conforms to a standard normal distribution which means that it's a reliable way of saying that most of your data falls under this about 97.5%
@@absentdata No. I mean just three data. I conducted an analysis with 3 trials so I generated 3 data points. And I need to know whether I can average all those three and report it. Edited** What I meant with "analysis" is "experiment"
@@absentdata Is that the best option for my case to use to identify the outlier? Can you advise me some other stat tools I can easily apply on a calculator.
@@ellereal8216 Unfortunately I can't really prescribe what to do cause I would need to know your data and context. However I think you can just explain your logic when presenting your findings. You can also can also use percentiles.
BRO THERE IS SOMETHING WRONG IN THIS DATA....BECAUSE WHENEVER I JUST APPLIED A DIFFERENT FORMULA TO FIND FIRST AND 3RD QUARTILE I JUST GET DIFFERENT VALUES WHICH IS NOT POSSIBLE
Hello Sir. If the Result Show True. Meaning That we have to delete the data (true only) or the significance true data? (Im Using Panel Data). For eg: 2013 TRUE apple.co 2014 False apple.co 2015 TRUE apple.co 2016 False apple.co 2017 False apple.co SHOULD I dELETE ALL.. or just the true ?
I had to go through a set of over 300 data points, this helped me so much, I'm so glad I found this video just in time. Thank you!!
Writing an Exam right now using excel and this saved my life!! Thank you!!
I regret not opening this video first. Great video!!
This is going to help me so much on my math test. Thank you.
This was awesome. I had a whole page 9 columns wide, saved me so much work. Thank you
Glad it helped!
This helped me so much I can’t thank you enough!!
Thank you so much this is the best video to explain this you just earned yourself a subscriber
Happy that you subscribed😁
Thank for making this video, you are using my favorite PP template!
Thank you for such a nice and patient explanation!
Sir also thank you so much for this video, gosh this made it so much simpler to understand how to calculate!!!
You are most welcome
Thanks for this great explanation! Curious what the 1.5 factor is as it relates to the upper and lower bounds? Is this some type of outlier exclusion factor? Or just a suitable one based on the data set?
1.5 is a constant that is used in this statistical formula. Think of it like PI
@@absentdata I found that I had to change that 1.5 to 1 for the dataset I was working with for it to pick up other values I considered outliers as well. Will do some more research on that factor and what it is exactly but I had to change it to get a better result.
I really love your explanations.Thanks a bunch!
Thanks dude u saved a depressed student about to end it all
I am glad this brightened up your day!!!
@@absentdata no. it brightened up my life.
Keep going dude! Stay strong!
@@tentacleprincess5772 i am strong now because of absent data
If I have no negative values, I asume that I have to use the absolute value for the lower bound, right?
When can outliers be positive and when can they be negative for the data set?
It depends on the data set. For example even if you had a outlier boundaries that's negative, it doesn't always apply. For examples a child birth dataset might not use any negative outliers then you would just use 0 as your range.
You are the best! really really helpful!
Happy to help!
Please make one complete tutorial video on Excel from basic to advanced
Thank you for the video. Just a quick question why are we multiplying by 1.5?
Thanks!
Outliers live outside the inner quartile range. By statistical definition, they live 1.5 times below or above your inner quartile range.( copied from the link he ut in the descriotin box)
Thank you so much! this is so well made! it explains so well! you're the best!!! :D
This was super helpful brother thank you so much
Glad to hear it
I found this very helpful, thank you! I'm trying to compare different business with their respective values to find the most "different" one. Is this method ok to use in that case?
In a sense yes, you may be doing a univariate analysis of the three businesses. For example you may be looking at Revenue across all the three businesses in comparing their distributions. You can compare the shape of the distribution whether it's a normal distribution or a very skewed distribution. Skewed distributions are affected by outliers.
@@absentdata Thank you so much!!
Is there a reason we'd use quartiles instead of standard deviations and the mean?
IQR works for on most data including skewed data or outlier resistance; use standard deviation for symmetric, normal distribution
This video helped so much❤
That's so great
Very helpful video, a quick question: Can I use this method to emit outliers in a degradation trendline? or this method is for data that fit normal distribution or some other distribution only?
Yes, absolutely. However then you are going to be dealing with standard deviation for normal distribution. This would be a different approach
@@absentdata Thanks!
It is a very helpful video, but what is the name of this method?
Iqr method
This helped me alot, thanks!
Thank you - This was really helpful!
What is the basis for the 1.5 multiplied to the IQR? What is the rationale for that? Thank you!
This is part of the accepted original formula. Essentially quartiles can be scaled to standard deviations. The math equates 1.5×IQR to under 3 standard deviations from the mean. So the 1.5 was designed in the formula to capture the majority of outliers assuming the data is normally distributed.
What if the lower quartile is negative in value, then our IQR turns in a really large value of Q3+Q1, will this affect the values of outliers?
No it should be fine.. However it depends on the data. Should you have negatives or not might be a better question
How this works on positive only data. I have list of numbers coming daily say. I want to know if the trends are inline.
What is the best way to use that. If there is any dip or spike in the trend, how to t find out.
Works on negative outliers also
Can you tell me in general what the reason is why a cell won't calculate?
Its hard to say without having the actual file. There are a host of reasons..
data is not a number
incorrect formula
@@absentdata Ok thank you for that!
can we delete the outliers and then proceed by leaving the spaces blank or do we have to fill them?
You could do that by using an if function
I like this video, very helpful thankyou
Can U please prepare a proper statistics series.i went through many videos but I like the way you teach
I am glad you enjoy the way I teach. Yes, I want to do a purely statistic playlist. Hopefully there is enough interest :)
Can i just plug in whatever the Q1 and Q3 I want? Thank you!
Welll not really because that invalidates the need for calculating the quartiles. However, I think the approach you are trying to take is more maual where you are setting the boundaries for the outliers yourself.
@@absentdata yep, I work with big listings at my work and we look for values that out of range based on unique base values. Thank you!
I tried to do this method but I would always get negative values for the lower bound so I can't tell if there's an outlier or not
Probably indicates that your data is very normally distributed with no outliers
@@absentdata would you mind if I showed you my data and say if you think I have correctly identified the outliers?
@@absentdata also my data are decimals so like 0.0347 etc. Would that affect the method at all?
Extremely helpful!
Thank you, thank you, thank you!!
What does it mean when u say click F4?
Is it on the keyboard? What if I’m using excel on an iPad
⌘⌥R or Command T in Mac I believe it will help you lock the position of the cells. BTW I've never had a Mac so I am hoping Google is correct :)
Thanks for this great video!
Hii! can i use this for a likert scale data?
this is a lifesaver
Super helpful, thank you!
why do we multiply by 1.5?
This is standard multiplier for outlier detection. However you can increase it
BTW dont forget to sort the data first this will be crucial at bigger datasets as it will be problematic otherwise
2
Hello, I tried to use the function "OR", I couldn't get the right answer. Instead I used the Function "AND". Thanks for the Video
Thanks I'm glad you solved the issue in the end
Fantastic! Awesome Thank you!!
Thanks for the video!
I have over 20,000 columns of data, how do i do this without doing each one seperately?
Well a column should be seen as a unique dimension in your data. so if you have 20,000 columns. By Excel only has 16,384. I would question the structure of your data. I would see big you can melt the columns down to rows
how do we know we want 1.5?
it is a constant value
how to count the number of true and false after this for the outliers
Thank you so much, sir.
Is 1.5 a constant ?
Yes, its a best practice to use 1.5. However, you can alter it but then release that you will lose data. It conforms to a standard normal distribution which means that it's a reliable way of saying that most of your data falls under this about 97.5%
How to apply this to 3 data points only? Can you help?
Do you just 3 rows of data? That would be the best use of this.
@@absentdata No. I mean just three data. I conducted an analysis with 3 trials so I generated 3 data points. And I need to know whether I can average all those three and report it.
Edited** What I meant with "analysis" is "experiment"
@@ellereal8216 Yes that could work for your specific application.
@@absentdata Is that the best option for my case to use to identify the outlier? Can you advise me some other stat tools I can easily apply on a calculator.
@@ellereal8216 Unfortunately I can't really prescribe what to do cause I would need to know your data and context. However I think you can just explain your logic when presenting your findings. You can also can also use percentiles.
Very helpful, thank you
Thank you for your help
liked and subscribed. a ton thanks
No fluff, thanks
amazing. thank you!
BRO THERE IS SOMETHING WRONG IN THIS DATA....BECAUSE WHENEVER I JUST APPLIED A DIFFERENT FORMULA TO FIND FIRST AND 3RD QUARTILE I JUST GET DIFFERENT VALUES WHICH IS NOT POSSIBLE
I have the same problem
thank you!
What is the name of the test?
I am not sure the test has a name. It's more of a technique to classify the borders of your data.
THANK YOU BRUHHHHH
WHY HAVE YOU USE 1.5* IN UPPER BOUND
1.5 is a constant value that I used in the equation
Thank you so much
شكراً، Tank you
F4 (absolute value) won't work in windows :(
Ryujinne Juano press( fn ) (f4) it would work
why the 1.5?
Its a statistical formula. Its similar to using a constant like 3.14 for PI
Thankyou
You’re welcome 😊
Why 1.5?
I have a mac and f4 also does not work
in Mac is 'Command + t' or '⌘+t'.
why do you multiply 1.5 to upper bound
it is just a statistical rule
How to calculate outliers by Kermit the Frog
Hello Sir. If the Result Show True. Meaning That we have to delete the data (true only) or the significance true data? (Im Using Panel Data).
For eg:
2013 TRUE apple.co
2014 False apple.co
2015 TRUE apple.co
2016 False apple.co
2017 False apple.co
SHOULD I dELETE ALL.. or just the true ?
thank you!