Master Box-Violin Plots in {ggplot2} and Discover 10 Reasons Why They Are Useful

  • Published: Aug 21, 2024
  • Boxplots display a wealth of useful information about the dataset. In this video, we'll start with the most basic boxplot, build every part of this notched box-violin plot in {ggplot2} step by step, and understand why every detail matters 😉
    If you only want the code (or want to support me), consider joining the channel (join button below any of the videos), because I provide the code upon members' request.
    Enjoy! 🥳
    Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)
    This channel is dedicated to data analytics, data science, statistics, machine learning and computational science! Join me as I dive into the world of data analysis, programming & coding. Whether you're interested in business analytics, data mining, data visualization, or pursuing an online degree in data analytics, I've got you covered. If you are curious about Google Data Studio, data centers & certified data analyst & data scientist programs, you'll find the necessary knowledge right here. You'll greatly increase your odds of getting an online master's degree in data science & data analytics. Boost your knowledge & skills in data science and analytics with my engaging content. Subscribe to stay up-to-date with the latest & most useful data science programming tools. Let's embark on this data-driven journey together!

Comments • 48

  • @WilOspinoC 9 months ago +4

    As usual, the content does not disappoint. You always keep expectations high and deliver. Dopamine and serotonin run through my body every time you upload a new video. Once again, Mr. Yury, thank you so much for your educational work.

    • @yuzaR-Data-Science 9 months ago

      Wow, thank you, Wil! That's by far the best feedback I have ever received! I'll try to make sure your dopamine and serotonin levels continue to rise 😉 Thanks for your support!

  • @shadyamigo 9 months ago +3

    Would you mind checking? In the first part you say the whiskers extend to the maximum and minimum, but I think geom_boxplot doesn't go all the way to the maximum and minimum, which is why there are outliers. From the documentation: “The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Data beyond the end of the whiskers are called "outlying" points and are plotted individually.”

    • @yuzaR-Data-Science 9 months ago

      Thanks for pointing it out, you are correct: the upper whisker end (what I called the maximum) should have been defined as the largest value no further than 1.5 * IQR from the hinge. I guess I just wanted to describe the box first and the outliers later, and this slow, step-by-step explanation comes at the cost of not being precise all the time. Being precise immediately would throw several concepts at the learner at once, like box, outliers, IQR, hinge ... I just hope that I compensated for it later in the video. Thanks again for being attentive!
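The whisker rule quoted from the documentation can be checked by hand. The following is a minimal sketch in base R, assuming made-up toy data and quartile-based hinges computed with `quantile()`; it is meant to illustrate the rule, not to reproduce the exact code from the video.

```r
# Toy data with one extreme value (made-up numbers, not from the video)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Hinges as first and third quartiles, plus the inter-quartile range
q <- unname(quantile(x, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Whiskers end at the most extreme observations within 1.5 * IQR of the hinges
upper_whisker <- max(x[x <= q[2] + 1.5 * iqr])
lower_whisker <- min(x[x >= q[1] - 1.5 * iqr])

# Everything beyond the whiskers is drawn as an individual "outlying" point
outliers <- x[x < lower_whisker | x > upper_whisker]

print(c(min = min(x), lower_whisker = lower_whisker,
        upper_whisker = upper_whisker, max = max(x)))
print(outliers)
```

With these numbers the upper whisker stops at 9 although the maximum is 30, so the value 30 is plotted as a separate point. When no observation lies beyond 1.5 * IQR, the whiskers do coincide with the minimum and maximum.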

    • @shadyamigo 9 months ago

      @yuzaR-Data-Science it was all very clear. Thanks for providing this material

    • @yuzaR-Data-Science 9 months ago

      @shadyamigo glad you liked it! cheers, mate

  • @hikeaway1596 9 months ago

    I love your tutorials! They are so informative that I need to rewatch them in order not to miss any important detail :) Thanks for doing this, keep up the great work!

  • @eliapp 9 months ago

    I love the way you explain these concepts. It's almost as if you live inside the data ❤

    • @yuzaR-Data-Science 9 months ago

      Glad you enjoy my explanations 😊 I probably sometimes live inside of the data 🙈😂 Thank you for such nice feedback! Much love!

  • @zane.walker 9 months ago

    A very informative (and well produced) 17 minute video. I picked up on your trick of wrapping a plot inside of a ggplotly command a video or two ago and find it very useful (wish I had discovered that earlier)! Also, some nice tips on adding mean, CI, etc. to the standard boxplots. I like using the ggbetweenstats command, which I started using after one of your earlier videos, on small sets of groups, but it doesn't always work that well with larger numbers of groups. Adding more information to standard boxplots seems like a good compromise. Very much appreciate your videos and thank you for sharing your insights!

    • @yuzaR-Data-Science 9 months ago

      Thanks indeed for such nice feedback! I very much enjoy creating content, and the fact that it's useful for more people than just me means a lot to me! Appreciate your support!

  • @moviezone8130 3 months ago

    You absolutely set the bar, dear. I can't wait to watch it again and again. Can you share the code as a PDF or in some other way so that I can practice on my own? Thanks.

    • @yuzaR-Data-Science 3 months ago

      Thanks again for such great feedback! I am very happy it's useful! As I said in reply to your other comment, please feel free to rewatch and pause the video to write down the code yourself, since it is a good learning strategy. Better than copy-pasting. But if you wish to have the whole code, consider joining the channel (it's the join button below every video) and I'll send you the code. Kind regards!

  • @akanequeen 4 months ago

    This is sooo great!!

  • @suelook9562 7 months ago

    Very educative and simple to understand

  • @tarasst6887 9 months ago

    super high quality material presentation

    • @yuzaR-Data-Science 9 months ago

      Thanks a lot, Tarass! I also enjoy creating content!

  • @Marcosls2015 3 months ago

    Hi Yuri, really thanks for sharing this knowledge! This was fantastic to open the mind to the possibilities of this plot. Please, I wonder if you could share the code? Thanks

    • @yuzaR-Data-Science 3 months ago +1

      Hi Marcos, thanks a ton for joining the channel! Your support is much appreciated! Of course you can have the code. I just posted it on the community tab for members only. Please let me know whether you can see/find it. Kind regards! Yury

  • @statlab_stat.solution 9 months ago

    Great. Keep going

  • @MoritzSchorn 8 months ago

    Hi Youry,
    I really like your videos and they make me want to learn more of R and Data Science :)
    Do you have any recommendations for students who want to master both?
    I am looking forward to the next video!

    • @yuzaR-Data-Science 8 months ago +1

      Yes, definitely, Moritz. The best start in my opinion is the R4DS book. The best finish is the tidymodels book. Both are online and free. In between you'd need to go through a few classic statistics books and learn to compute statistical tests and models. Some of the topics you'll find on my channel. This will prepare you for machine learning. Thanks for such nice feedback! I am glad my content is useful!

    • @MoritzSchorn 8 months ago

      @yuzaR-Data-Science Thank you for the tips :)

    • @yuzaR-Data-Science 8 months ago

      you are very welcome @MoritzSchorn

  • @juliusirungu1363 9 months ago

    Great and very informative

  • @sebbikankondi5546 8 months ago

    Excellent video as always, thank you so much for sharing this. One question: you mentioned replacing or removing the incorrect sample sizes on the x-axis that result from further splitting the plots into smaller sub-plots. What approach would you use to still display sample sizes on your plot after splitting it into sub-plots, i.e., replacing them and not simply removing them?

    • @yuzaR-Data-Science 8 months ago +1

      Thanks for the excellent question! I knew I'd get this question, because I asked myself the same one :) I don't have a quick solution for it, to be honest, because there is already a function which calculates the sample size and puts the values on the x-axis. So, I never needed to figure it out. It only works with one additional variable, though. Here is this function:
      library(ggstatsplot)
      library(ISLR)  # assumption: the Wage dataset comes from the ISLR package
      grouped_ggbetweenstats(data = Wage, x = education, y = wage, grouping.var = health_ins)

    • @sebbikankondi5546 8 months ago

      Thank you, grouped_ggbetweenstats() works really well and adds useful additional info. To simply add sample sizes to the already existing plot, stat_n_text() from the EnvStats package works really well too:
      library(EnvStats)
      p6 +
        facet_grid(jobclass ~ health_ins) +
        stat_n_text(y.pos = 5)
      But that displays sample sizes on the plot and not on the x-axis.

    • @yuzaR-Data-Science 8 months ago

      You can also produce separate plots and put them together at any time if this would reduce the complexity of programming. patchwork is an amazing package for that; I will release a review of it very soon.
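Another way to keep per-facet sample sizes visible is to count the observations yourself and draw them as text at the bottom of each panel. This is only a sketch under stated assumptions: it uses the built-in `mtcars` data as a stand-in for the Wage data, and the names `d`, `counts` and `p` are hypothetical. Like `stat_n_text()`, it draws the counts inside the panel rather than in the axis labels.

```r
library(ggplot2)

# Built-in mtcars stands in for the Wage data used in the thread (an assumption)
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Count observations per x-category within each facet
counts <- aggregate(mpg ~ cyl + am, data = d, FUN = length)
names(counts)[names(counts) == "mpg"] <- "n"

p <- ggplot(d, aes(x = cyl, y = mpg)) +
  geom_boxplot() +
  facet_wrap(~ am) +
  # y = -Inf pins each label to the bottom of its panel, just above the axis
  geom_text(data = counts,
            aes(x = cyl, y = -Inf, label = paste0("n = ", n)),
            vjust = -0.5, inherit.aes = FALSE)

print(counts)
```

Because the counts are computed per facet, each panel shows its own sample sizes instead of the totals from the unsplit plot.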

  • @Walker-nb9de 9 months ago

    Great. Thanks for the up.

    • @Walker-nb9de 9 months ago

      Please upload a tutorial on raster data manipulation in R. That would be really helpful.

    • @yuzaR-Data-Science 9 months ago

      thanks a lot!

    • @yuzaR-Data-Science 9 months ago

      thanks for the idea, I did not know about the raster data manipulation yet, but I'll have a look at it and put it on my list of tutorials I plan to do. thank you for watching!

    • @Walker-nb9de 9 months ago

      @yuzaR-Data-Science Thanks.

    • @yuzaR-Data-Science 9 months ago

      you are very welcome!

  • @kennethgottfredsen767 9 months ago

    Hi Youzar,
    Great video, and I really like the random jokes thrown in here and there. Keep it up!
    / Kenneth

    • @yuzaR-Data-Science 9 months ago

      Thanks for the feedback, Kenneth! :) It's good to see that people get my jokes, because I am never sure whether they are funny to more people than just me 😁

    • @kennethgottfredsen767 9 months ago

      @yuzaR-Data-Science Do you have any videos on how to connect to a cloud or local SQL-server in R?

    • @yuzaR-Data-Science 8 months ago

      Not yet; it might come in the distant future. Until then I plan to cover some modelling and machine learning topics.

  • @Learner_2000 1 month ago

    Excellent video and easy to follow. Thank you so much. I have one question:
    after plotting this graph and saving it with the proper dimensions, the text appears very small. I have tried many times with a manually fixed font size, but without success. Could you suggest how to fix this?
    Note: this problem only appears with graphs from ggstatsplot functions, such as pairwise comparisons and violin plots.
    Thank you in advance

    • @yuzaR-Data-Science 1 month ago +1

      Yes, you can use the "device" argument, as I said near the end of the video, and use the same extension as in your picture's file name. Here is the code for a jpeg example:
      library(ggplot2)
      ggsave(
        "magic_boxplot.jpeg",
        # "pdf", "jpeg", "tiff", "png", "bmp",
        # "svg", "eps", "ps", "tex" or "wmf"
        device = "jpeg",
        plot   = p6,     # the plot object built in the video
        width  = 10,     # inches by default
        height = 7,
        dpi    = 1000)
      Thanks for the positive feedback! Glad you liked the video! :)

    • @Learner_2000 1 month ago

      @yuzaR-Data-Science Greetings from Nepal!
      Now I can make a beautiful, good-looking graph by adjusting point.args, centrality.point.args, centrality.label.args, and theme. I had not even heard of the ggstatsplot package before; after watching your videos my interest grows day by day, and now I can easily make a publication-ready graph. Thank you so much.

    • @yuzaR-Data-Science 1 month ago +1

      You are very welcome 🙏 thank you for watching!