The best explanation of boxplots!! Keep making videos, please :)
Thank you -- glad it was helpful! Will do!
@KimberlyFessel Same feeling, best explanation! You are a lifesaver!
Definitely the best explanation of box plot so far.
Thanks a million
Great explanation. Just a typo at 1:54, where it should be 'Interquartile Range' instead of 'Inner quartile Range'. Very useful video!
I really loved the way she explained every point. It's amazing. I will share this channel with my friends who need help with seaborn.
Not only is your video great, but so are the files on GitHub... thank you very much!!!!
I have been trying to figure out this problem for my capstone for the past 4 hours and you made everything so simple! I cannot thank you enough
I've been hunting YouTube, the internet, and books, and all of them were fairly crappy at explaining why/how to do this (without proper background before diving into it). This was a very 'teaching you to fish' type of video. Thanks so much.
Thank you very much for the compliment! Glad to hear this helped 😀
That was a really great explanation. Loved the content organization and planning. Thank you so much!
So glad you enjoyed the explanation! I create an outline for the structure of each video, so it's good to know that it has been effective so far!
The numbers (views and likes) don't do justice to the quality of the information provided in the video.
Keep up the good work!
Thanks for the support! Will do!
The best explanation of this seemingly tricky stuff I've ever seen, thank you so much! Clarity, every step is slowly explained, and the illustrations are great. Awesome, thank you very much!
So glad it helped! Cheers 😄
Concise and solid explanations. Extremely useful. Thanks
You blew my mind with the order feature. You don't know how many data frames I have rearranged!
Right? That feature is so useful!
Thank you, Kimberly Fessel, for your wonderful video. Your presentation style and content are excellent.
The best video on box plots I have ever seen on YouTube ❤️❤️ Keep it up!
So glad you enjoyed the video -- will do!
This was the best video explanation of box plot. Thanks.
Thank you - very glad my explanation was helpful!
This is the best explanation of both the concept and the code. You deserve more followers! Keep it up!
Your video is very informative. Please make videos regularly. Thanks!
I would also like to know about customizing the color options...
Outstanding explanation! Thank you!
Wow, very nice explanation... you are the best!
Why doesn't sns.boxplot(x=cars.origin, y=cars.mpg); give the following error?
TypeError: Neither the `x` nor `y` variable appears to be numeric.
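The original data isn't shown, so this is only one possible cause, stated as an assumption: seaborn raises this error when neither column has a numeric dtype, which commonly happens when a numeric column such as `mpg` was loaded as text (pandas `object` dtype). The sketch below uses a small hypothetical `cars` frame to show the symptom and one fix with `pd.to_numeric`.

```python
import pandas as pd

# Hypothetical data where mpg was read in as strings (e.g. from a messy CSV)
cars = pd.DataFrame({
    "origin": ["usa", "japan", "europe", "japan"],
    "mpg": ["18.0", "31.0", "26.0", "?"],
})
print(cars.dtypes)  # both columns are object dtype, so neither looks numeric

# Convert mpg to a real numeric dtype; unparseable entries such as "?" become NaN
cars["mpg"] = pd.to_numeric(cars["mpg"], errors="coerce")
print(cars.dtypes)  # mpg is now float64
```

After the conversion, `sns.boxplot(x=cars.origin, y=cars.mpg)` has a numeric variable to draw. Checking `cars.dtypes` first is a quick way to confirm whether this is the issue.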
Thanks for the nice video.
I got some additional information, but not the answer to my search.
I am a new learner, and I wanted to change the outline colors of my box plot and the median line color.
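There's no reply to this question in the thread, so here is a minimal sketch, not the video's method. It relies on `sns.boxplot` forwarding matplotlib's `boxprops`, `whiskerprops`, `capprops` and `medianprops` dictionaries to the underlying box plot artists. The DataFrame and its column names are hypothetical. Seaborn 0.13 and later also accept a `linecolor` argument that recolors all the lines at once.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical example data: two groups of measurements
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "value": [3, 5, 6, 7, 9, 4, 6, 8, 9, 12],
})

fig, ax = plt.subplots()
sns.boxplot(
    x="group",
    y="value",
    data=df,
    boxprops=dict(edgecolor="navy"),             # outline of each box
    whiskerprops=dict(color="navy"),             # whisker lines
    capprops=dict(color="navy"),                 # caps at the whisker ends
    medianprops=dict(color="red", linewidth=2),  # median line
    ax=ax,
)
```

Any other matplotlib line property (for example `linestyle`) can go in the same dictionaries.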
Your videos and teaching are as perfect as you.
Thank you very much :)
Thank you so much - glad to hear you are enjoying the videos!
Awesome presentation: short, crisp, and clear. Thanks a lot, and much appreciated from my heart. Why not try making videos on the scikit-learn and pandas libraries?
Hi, Kimberly! In the video you say that the lower whisker limit equals Q1 - 1.5*IQR and the upper whisker limit equals Q3 + 1.5*IQR.
At 08:18 we see:
Q1 = 17
Q2 = 23
Q3 = 29
IQR = 29 - 17 = 12
lower whisker limit = Q1 - 1.5*IQR = 17 - 1.5*12 = -1
upper whisker limit = Q3 + 1.5*IQR = 29 + 1.5*12 = 47
But at 05:12 the whisker limits are not -1 and 47.
Why is that?
This is because the minimum value in the dataset is 9 and the maximum value is 46. So there's no need to stretch the whiskers all the way from -1 to 47: they stop at the most extreme data points that fall inside those limits.
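A minimal sketch of the 1.5 * IQR rule makes the answer concrete. The fences are Q1 - 1.5*IQR and Q3 + 1.5*IQR, and each whisker stops at the most extreme data point still inside its fence. That is the default behavior matplotlib documents for the `whis` parameter of its boxplot, which seaborn's boxplot builds on. The quartiles here use linear interpolation, which is also NumPy's default percentile method, and other quartile conventions can give slightly different values. The nine data values are hypothetical. They were picked only so that Q1 = 17, Q3 = 29, min = 9 and max = 46 match the numbers quoted above, and they are not the video's dataset.

```python
def percentile_linear(sorted_vals, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)                          # floor, since pos is non-negative
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def whisker_ends(data, whis=1.5):
    vals = sorted(data)
    q1 = percentile_linear(vals, 25)
    q3 = percentile_linear(vals, 75)
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers end at the most extreme observations that lie within the fences
    low_whisker = min(v for v in vals if v >= low_fence)
    high_whisker = max(v for v in vals if v <= high_fence)
    return q1, q3, low_fence, high_fence, low_whisker, high_whisker


q1, q3, lo_f, hi_f, lo_w, hi_w = whisker_ends([9, 15, 17, 20, 23, 26, 29, 35, 46])
print(lo_f, hi_f)  # -1.0 47.0
print(lo_w, hi_w)  # 9 46
```

The fences are -1 and 47, but no data lies beyond 9 or 46, so the whiskers stop there. If 46 were replaced by a hypothetical 60, the quartiles would not change. The upper whisker would then stop at 35, and 60 would be drawn as an outlier point.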
I have learned a lot from your videos. Thank you very much.
So happy to hear that!
Thanks for this instructive video 👏🏻
I was wondering if you could give some feedback on the following situation: sometimes the mean value is greater than the 50th percentile shown in my box plot. For example, mean = 4.142857 and 50% = 3.000000. Is this right? Don't they have to be the same, or at least approximately the same?
Good question, and this is definitely possible. The 50th percentile represents the median of your data, which is calculated differently than the mean. You likely have some very large values that are making the mean higher. For example, the median (50th percentile) of [1, 3, 8] is the middle value 3, but the mean is (1 + 3 + 8)/3 = 4. Usually the mean and median will be close together, but this isn't guaranteed, since outliers influence the mean more than the median.
@KimberlyFessel Thanks for your time and explanation.
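The [1, 3, 8] example from the reply can be checked with Python's standard `statistics` module. The extra value 100 is hypothetical. It is added only to show how a single large value drags the mean much further than the median.

```python
import statistics

values = [1, 3, 8]
print(statistics.median(values))  # 3
print(statistics.mean(values))    # 4

# One hypothetical large value pulls the mean far more than the median
with_outlier = values + [100]
print(statistics.median(with_outlier))  # 5.5
print(statistics.mean(with_outlier))    # 28
```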
Very well explained, thank you very much. I highly appreciate it.
Perfect tips!
Perfect explanation, nicely done!
Thanks very much. Your video is excellent.
Nice explanation.
Excellent work!
Do you know of a good way to add a marker that shows how a certain value sits on the box plot? For example the most recent value in a time series vs where it fits against the distribution of historical values?
Oh cool - that seems super useful! I guess you can always plot seaborn figures on top of each other. So I might make a box plot and then put something like a scatter plot on top of that. Then for the scatter plot you could just plot a subset of the data if you want. For example, if df is the famous seaborn tips dataset:
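import seaborn as sns
df = sns.load_dataset('tips')  # seaborn's tips example dataset (downloaded on first use)
sns.boxplot(x='time', y='tip', data=df)
sns.scatterplot(x='time', y='tip', data=df.iloc[-1:], color='black');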
@KimberlyFessel Thanks, that scatterplot code works quite well. I find this is a great alternative way to show how a data point compares to historical values in a time series, as opposed to the usual line chart. In my view a box plot is much easier on the eyes, especially when there are hundreds of data points to compare against.
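The overlay shows where the latest point sits visually. If you also want a number for it, a percentile rank is one option. This is my addition, not something from the video. The sketch below counts the share of historical values below the new value, with ties counted as half. If SciPy is available, `scipy.stats.percentileofscore(history, value, kind="mean")` computes the same quantity. The history values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(history, value):
    """Percent of historical values below `value`, counting ties as half."""
    ordered = sorted(history)
    below = bisect_left(ordered, value)             # strictly smaller values
    ties = bisect_right(ordered, value) - below     # values equal to `value`
    return 100.0 * (below + 0.5 * ties) / len(ordered)


history = [2, 4, 4, 6, 8, 10, 12, 14]  # hypothetical past observations
latest = 10
print(percentile_rank(history, latest))  # 68.75
```

The result could, for example, go into the plot title next to the black marker.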
Thank you for the good explanation.
Great tutorial!! I have a question: is there a way to visualize the data points themselves on the box plot?
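This question has no reply in the thread. One common approach, sketched here rather than taken from the video, is to draw a `sns.stripplot` on the same axes as the `sns.boxplot`, much like the scatter overlay discussed above. The small DataFrame is hypothetical. `showfliers=False` hides the box plot's own outlier markers, so outliers aren't drawn twice. `sns.swarmplot` is an alternative that avoids overlapping points.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: tips on two days
df = pd.DataFrame({
    "day": ["Thu"] * 4 + ["Fri"] * 4,
    "tip": [2.0, 3.5, 1.5, 4.0, 3.0, 2.5, 5.0, 1.0],
})

fig, ax = plt.subplots()
# Box plot without outlier markers; the strip plot below shows every point anyway
sns.boxplot(x="day", y="tip", data=df, showfliers=False, ax=ax)
# Each observation drawn on top, with a little horizontal jitter
sns.stripplot(x="day", y="tip", data=df, color="black", alpha=0.6, jitter=0.15, ax=ax)
```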
I just loved it!
Hooray -- thank you! Glad you enjoyed the video. 😀
Extremely helpful! Thanks a lot! ❤❤
Most welcome - glad to hear it was helpful! 😄
Hi Kimberly, I have to admit that your tutorials are probably some of the most top-notch teaching I have ever seen. No matter how hard I try, I could never thank you enough. I have become a true seaborn fan, and I absolutely love it for its efficiency, versatility, and ease of use.
I have a question, though. In most of what you showed with box plots and other seaborn plotting methods (FacetGrid, lineplot, hist, kde), you hue/split the data by category, or assume there is a variable that is a category. What I am dealing with are datalog files where the only category is the tested part serial numbers, which are in a one-to-one relationship with the corresponding values in the parameter columns. I have different test names in different columns. For example, there is a column with test results for those serial numbers at one voltage condition, then another column for the same test at a different voltage condition, etc. It's all columns; the voltage conditions are not in rows. Could you give an example illustrating how I could use a box plot to plot data from different columns on a single figure?
Thank you so much,
Keep up the good work.
Best regards,
Youcef
I think I was able to answer my own question, but if there is a more efficient way of doing this, it would be great to read your reply. I extracted my columns of interest along with the data-ID columns I needed to keep constant, used pd.melt to convert the renamed columns from wide to long format, merged the DataFrames corresponding to the different column variable names, and then used your seaborn boxplot method. Thanks.
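Here is a minimal sketch of the wide-to-long step described above, assuming a hypothetical datalog with one `serial` column and one column per voltage condition. These column names are invented for illustration. A single `melt` over all the test columns produces one long table directly, which may remove the need for a separate merge.

```python
import pandas as pd

# Hypothetical datalog: one row per serial number, one column per test condition
wide = pd.DataFrame({
    "serial": ["SN001", "SN002", "SN003"],
    "leakage_1V": [0.12, 0.15, 0.11],
    "leakage_3V": [0.31, 0.29, 0.35],
})

# Reshape wide -> long: each (serial, condition) pair becomes its own row
long_df = wide.melt(
    id_vars="serial",       # identifier column(s) kept as-is
    var_name="condition",   # holds the former column names
    value_name="result",    # holds the measurements
)
print(long_df)
```

From here, `sns.boxplot(x="condition", y="result", data=long_df)` gives one box per condition. For this simple case, passing the wide columns directly, as in `sns.boxplot(data=wide[["leakage_1V", "leakage_3V"]])`, should also work, since seaborn treats a wide-form DataFrame as one box per numeric column.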
Really great & helpful!!!
Thank you - very glad to hear it helped 😄
well done big thanks
👍 Cheers!
Loved it, ma'am 😊
Thank you -- glad you enjoyed it!
Very grateful for your work!
Thanks -- glad you enjoyed the video!
Thank you! It helps me a lot.
Excellent -- glad to hear that it helped!
Thanks a lot @Kimberly Fessel.
Is there a way to change the plot window size in code?
Yes you can! I often use matplotlib's pyplot module to change the size of my seaborn figures. Adding a line before your seaborn plot like plt.figure(figsize=(6, 3)) will update the figure size to 6 inches wide and 3 inches tall. My video about the matplotlib figure size might also be helpful: ruclips.net/video/UUy6_ElQXBY/видео.html
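Both ways of sizing a seaborn figure are shown below as a sketch. The first is the `plt.figure(figsize=(6, 3))` approach from the reply. The second creates the figure and axes with `plt.subplots` and passes the axes to seaborn through its `ax` parameter, which keeps the size attached to a specific figure. The one-column DataFrame is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"value": [4, 7, 8, 9, 12, 15]})  # hypothetical data

# Option 1: size the current figure first, then let seaborn draw on it
plt.figure(figsize=(6, 3))  # width, height in inches
sns.boxplot(x="value", data=df)

# Option 2: create the figure and axes explicitly and hand the axes to seaborn
fig, ax = plt.subplots(figsize=(6, 3))
sns.boxplot(x="value", data=df, ax=ax)
```

Option 2 is handy when several plots share one figure, since each `sns.boxplot` call can target its own axes.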
Excellent job!
Thank you!
Amazing!
(It is "interquartile," not "inner quartile"; an American accent might make the two easy to confuse.)
Thank you
Most welcome 😁
02:00 Nice explanation!
Thanks! Glad you liked it!
amazing
Thanks much!
Please start making videos again
great
Thank you!
I find your videos so interesting that I wish you could translate them into Spanish.
Oh how I wish I knew Spanish! Glad to hear you like the videos though!
@KimberlyFessel Thank you. You could use an AI translator to translate your videos from English to Spanish.
The data proves it: Japanese cars are the best in the world.
Thanks, excellent explanation, great video!