How to Plot Counts in R: A Step-by-Step Guide

  • Published: 21 Jan 2025

Comments • 28

  • @tomhenry-datasciencewithr6047  4 years ago +5

    Subscribe if you found this video helpful, because more videos on tidyverse tips are coming in the next couple of weeks!

  • @glennhighcoveexploresstuff  4 years ago +2

    Thank you for this, it was a great series of tutorials on not just ggplot2 but other useful ways of quickly reviewing and organizing data. I definitely would love to see more tutorials like this, including ones covering basic R functions you find vital to know, more tidyverse exploration, and tidytext as well.
    Edit -- just saw some of your other videos on your list, I will check those out too. Subscribed!

  • @nattyshakti  1 year ago

    This was SO HELPFUL for a project I'm doing for one of my classes. THANK YOU!

  • @pradeepviv3672  4 years ago +1

    Thanks very much Tom, as usual, your R tutorials are spot on.

  • @Inquire98  4 years ago +1

    'Thank GOD'🙏🏾 and thank you very much for sharing your support and time 😉 I believe feel KNOW think and trust your video presentation was REALLY Really really helpful 👍🏿 Do you have a COMPLETE tutorial on this subject 🙄 Let me know 😁 I look forward to hearing from you 🤔

  • @P_DOGG_ALI  2 years ago

    What if the "Other" category is the biggest? Do we use a different factor?

  • @davidkindermann8502  2 years ago +1

    What’s the difference between coord_flip and just switching x and y within aes?

    • @tomhenry-datasciencewithr6047  2 years ago

      Good question. Since ggplot2 3.3.0 (released in 2020), you can swap x and y between the axes without using coord_flip(); geoms such as geom_boxplot() and geom_col() detect the orientation from the aesthetics.
      One reason I like coord_flip() is that I find it easiest to build every plot 'vertically' and then flip it 90 degrees if needed to make it look better (particularly boxplots with many labels). That way I don't have to update parameters in other parts of the plot (e.g. yintercept).
      For example, take
      library(tidyverse)
      iris %>%
        ggplot(aes(Species, Petal.Width)) +
        geom_boxplot() +
        geom_hline(yintercept = 1.0, linetype = "dotted") +
        geom_hline(yintercept = 2.0, linetype = "solid")
      Now if I want to flip it, adding
        + coord_flip()
      is easy; otherwise I need to update both the aes() mapping in ggplot() and the reference lines (switching from geom_hline() with yintercept to geom_vline() with xintercept).
      But in reality either way would be fine.
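A minimal sketch of the two approaches compared above, assuming ggplot2 3.3.0 or later (where geom_boxplot() infers a horizontal orientation when the categorical variable is mapped to y). The object names p_flip and p_swap are illustrative only.

```r
library(ggplot2)

# Approach 1: build the plot vertically, then rotate everything with coord_flip().
# The reference lines keep their original yintercept values.
p_flip <- ggplot(iris, aes(Species, Petal.Width)) +
  geom_boxplot() +
  geom_hline(yintercept = 1.0, linetype = "dotted") +
  geom_hline(yintercept = 2.0, linetype = "solid") +
  coord_flip()

# Approach 2: swap x and y in aes() (assumes ggplot2 >= 3.3.0, which detects
# horizontal boxplots). The reference lines must now be vertical lines.
p_swap <- ggplot(iris, aes(Petal.Width, Species)) +
  geom_boxplot() +
  geom_vline(xintercept = 1.0, linetype = "dotted") +
  geom_vline(xintercept = 2.0, linetype = "solid")
```

Both produce the same horizontal layout; the difference is that p_swap needs every axis-specific layer rewritten, which is the bookkeeping coord_flip() avoids.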

    • @davidkindermann8502  2 years ago

      @tomhenry-datasciencewithr6047 Ah I see. Thanks for the detailed answer :)

  • @glennhighcoveexploresstuff  4 years ago +2

    One other question -- were you using any keyboard shortcuts to have the autocomplete come up so quickly? I'm familiar with the tab function and auto-suggest but I haven't seen anyone's prompts come up quite so quickly and it was truly entertaining watching you code, almost like watching ballet. In another video you might mention some of the things you do that make things go so quickly. I'm beginner-intermediate and have hacked some decent projects together, but in the process probably bypassed some better practices a trained person would have learned from the start vs. self-taught.

    • @tomhenry-datasciencewithr6047  4 years ago +2

      That's a great idea, Glenn! I'll cover some of the things I do to 'speed up' the process of working in RStudio (I still have much to learn!).

  • @N1loon  3 years ago +1

    Thank you, this helped me a lot!
    One question: Is there an option to include empty counts in the bar plot at 14:54?

    • @tomhenry-datasciencewithr6047  3 years ago +1

      Absolutely!
      Example:
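      library(tidyverse)
      library(tidytext)  # for reorder_within() and scale_x_reordered()
      # assumes volcano_region_and_type_counts (columns region, primary_volcano_type, n) from the video
      volcano_region_and_type_counts %>%
        # every region x volcano-type combination, including ones with no volcanoes
        expand(region, primary_volcano_type) %>%
        left_join(volcano_region_and_type_counts, by = c("region", "primary_volcano_type")) %>%
        # combinations missing from the counts get n = 0 instead of NA
        mutate(n = replace_na(n, 0)) %>%
        mutate(primary_volcano_type = reorder_within(
          x = primary_volcano_type,
          by = n,
          within = region
        )) %>%
        ggplot(aes(x = primary_volcano_type, y = n, fill = region)) +
        geom_col(show.legend = FALSE) +
        scale_x_reordered() +
        coord_flip() +
        facet_wrap(~region, scales = "free") +
        labs(x = "Volcano type", y = "Count of volcanoes")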
      Explanation:
      1) volcano_region_and_type_counts %>% expand(region, primary_volcano_type) generates every combination of region and primary_volcano_type.
      2) Then we left-join the actual counts back on by region and primary_volcano_type; combinations with no volcanoes get n = NA.
      3) Replace those NAs with 0, so the empty combinations appear as zero-count bars and the bars sort correctly.
      4) Then the rest is the same as before.
      :) :) :)
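The same expand() + left_join() + replace_na() pattern on a small hypothetical table: counts stands in for volcano_region_and_type_counts and its values are made up. tidyr::complete() bundles these steps, so it is shown as an alternative, not as the method used in the video.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts: region "B" has no "Caldera" row, so that bar would be missing.
counts <- tibble(
  region = c("A", "A", "B"),
  primary_volcano_type = c("Caldera", "Stratovolcano", "Stratovolcano"),
  n = c(2L, 5L, 3L)
)

# Pattern from the reply: all combinations, join counts back, turn NA into 0.
filled <- counts %>%
  expand(region, primary_volcano_type) %>%
  left_join(counts, by = c("region", "primary_volcano_type")) %>%
  mutate(n = replace_na(n, 0L))

# complete() expands the combinations and fills the gaps in one call.
filled_complete <- counts %>%
  complete(region, primary_volcano_type, fill = list(n = 0L))
```

Either way, filled has four rows, and the B/Caldera combination now carries n = 0, so it gets an (empty) bar in its facet.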

    • @tomhenry-datasciencewithr6047  3 years ago +1

      Follow-up: If you want all the labels (volcano types) to have the SAME order in every facet,
      replace THIS
      mutate(primary_volcano_type = reorder_within(
          x = primary_volcano_type,
          by = n,
          within = region
        )) %>%
      with THIS:
      mutate(primary_volcano_type = fct_reorder(primary_volcano_type, n)) %>%
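A sketch of what the fct_reorder() swap does, on a hypothetical table with made-up counts (filled and shared are illustrative names). Unlike reorder_within(), which creates a separate level per region/type pair, fct_reorder() produces one set of levels shared by every facet, ordered by default by the median of n for each type.

```r
library(dplyr)
library(forcats)

# Hypothetical counts after the expand()/left_join() step.
filled <- tibble(
  region = c("A", "A", "B", "B"),
  primary_volcano_type = c("Caldera", "Stratovolcano", "Caldera", "Stratovolcano"),
  n = c(6, 1, 4, 0)
)

# One shared ordering: Caldera median = 5, Stratovolcano median = 0.5,
# so levels run from the smallest to the largest median.
shared <- filled %>%
  mutate(primary_volcano_type = fct_reorder(primary_volcano_type, n))

levels(shared$primary_volcano_type)
# returns c("Stratovolcano", "Caldera")
```

With coord_flip(), the last level is drawn at the top, so Caldera sits above Stratovolcano in every facet. Passing .fun = sum to fct_reorder() would order the types by their total count across regions instead of the median.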

    • @N1loon  3 years ago

      @tomhenry-datasciencewithr6047 Wow! Thank you so much for this detailed explanation!
      I just tried it out and it works perfectly. And you even included an example of a self-join in your code, which is something I've always wondered when to use.
      This helped me a lot!

    • @tomhenry-datasciencewithr6047  3 years ago +1

      Glad it helped!
      The expand() function plus left join is very useful in these cases!
      tidyr.tidyverse.org/reference/expand.html

  • @TheJodonnell1111  2 years ago +1

    Awesome video, very clear and well explained. You are a great teacher. However, I am running into an issue. I have a 700 MB dataset with 6 million rows and I want to calculate the top 20% of objects in a text string. Everything works and I can get the count results; the variable is in my Environment, but it shows as null in the pane, and when I try to plot it I get an error message that the object can't be found. This is after successfully assigning the variable and seeing the count results. Any suggestions? I've been searching for over an hour. Thanks for any help; subscribed and looking forward to your other videos.

    • @TheJodonnell1111  2 years ago +1

      I also went through your tutorial in this video and had the same error message. When I get to line 30 I receive an error that "object 'volcano_type_counts' not found". For some reason my assigned variables disappear. Any advice?

    • @tomhenry-datasciencewithr6047  2 years ago

      Hi James! Can you post the code snippets you are using here?
      (for both getting the counts and plotting the count results)

  • @janidelemmanuelcastaneda8318  4 years ago +1

    Hi! Can you suggest simple learning projects for beginners? Maybe using a Kaggle dataset, with the goal of practicing basic dplyr verbs and ggplot2. Thanks. I think it would be great content.

  • @j7andrew  3 years ago

    You're awesome man. Thanks for making this.

  • @afonsoosorio2099  1 year ago

    Awesome, extremely handy and just in time.

  • @camilosimancasmorelo6722  2 years ago

    You should do more videos! Interesting!

  • @anelkachelsea2001  3 years ago

    Why do I only stumble onto this after my coding interview 😭😭😭