Subscribe if you found this video helpful, because more videos on tidyverse tips are coming in the next couple of weeks!
Thank you for this, it was a great series of tutorials on not just ggplot2 but other useful ways of quickly reviewing and organizing data. I definitely would love to see more tutorials like this, including ones covering basic R functions you find vital to know, more tidyverse exploration, and tidytext as well.
Edit -- just saw some of your other videos on your list, I will check those out too. Subscribed!
This was SO HELPFUL for a project I'm doing for one of my classes. THANK YOU!
Glad it was helpful!
Thanks very much Tom, as usual, your R tutorials are spot on.
Many thanks!
'Thank GOD'🙏🏾 and thank you very much for sharing your support and time 😉 I believe feel KNOW think and trust your video presentation was REALLY Really really helpful 👍🏿 Do you have a COMPLETE tutorial on this subject 🙄 Let me know 😁 I look forward to hearing from you 🤔
What if the "Other" category is the biggest? Do we use a different factor?
What’s the difference between coord_flip and just switching x and y within aes?
Good question. As of ggplot2 3.3.0 (released in 2020, so far as I know), ggplot allows you to switch x and y within aes() and get a horizontal plot without using coord_flip.
One reason I like to use coord_flip is that sometimes I find it easiest to build the plot 'vertically' every time, and then flip plots 90 degrees if needed to make them look better (particularly for boxplots if there are many labels). That way I don't have to update other parameters in other parts of the plot (e.g. yintercept).
For example, take:
library(tidyverse)
iris %>%
  ggplot(aes(Species, Petal.Width)) +
  geom_boxplot() +
  geom_hline(yintercept = 1.0, linetype = "dotted") +
  geom_hline(yintercept = 2.0, linetype = "solid")
Now if I want to flip it, adding
  + coord_flip()
is easy. Otherwise I need to update both the ggplot() section and the geom_hlines as well (switching them to geom_vline with xintercept).
But in reality either way would be fine.
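For comparison, here is a rough sketch of the "switch x and y" version of the same plot. It assumes ggplot2 3.3.0 or later, where geom_boxplot() works out the orientation from the aesthetics; the name p_swapped is only for illustration.

```r
library(ggplot2)

# Same boxplot as above, but with x and y swapped in aes() instead of coord_flip().
# The reference lines have to change too: geom_hline(yintercept = ...) becomes
# geom_vline(xintercept = ...), because Petal.Width now sits on the x axis.
p_swapped <- ggplot(iris, aes(x = Petal.Width, y = Species)) +
  geom_boxplot() +
  geom_vline(xintercept = 1.0, linetype = "dotted") +
  geom_vline(xintercept = 2.0, linetype = "solid")

p_swapped
```

Three lines change here instead of one added line, which is the trade-off described above.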
@tomhenry-datasciencewithr6047 Ah I see. Thanks for the detailed answer :)
One other question -- were you using any keyboard shortcuts to have the autocomplete come up so quickly? I'm familiar with the tab function and auto-suggest but I haven't seen anyone's prompts come up quite so quickly and it was truly entertaining watching you code, almost like watching ballet. In another video you might mention some of the things you do that make things go so quickly. I'm beginner-intermediate and have hacked some decent projects together, but in the process probably bypassed some better practices a trained person would have learned from the start vs. self-taught.
That's a great idea, Glenn! I'll cover some of the things I do to 'speed up' the process of working in RStudio (I still have much to learn!).
Thank you, this helped me a lot!
One question: Is there an option to include empty counts in the bar plot at 14:54?
Absolutely!
Example (needs library(tidyverse) and library(tidytext); reorder_within() and scale_x_reordered() come from tidytext):
volcano_region_and_type_counts %>%
  expand(region, primary_volcano_type) %>%
  left_join(volcano_region_and_type_counts, by = c("region", "primary_volcano_type")) %>%
  mutate(n = replace_na(n, 0)) %>%
  mutate(primary_volcano_type = reorder_within(
    x = primary_volcano_type,
    by = n,
    within = region
  )) %>%
  ggplot(aes(x = primary_volcano_type, y = n, fill = region)) +
  geom_col(show.legend = FALSE) +
  scale_x_reordered() +
  coord_flip() +
  facet_wrap(~region, scales = "free") +
  labs(x = "Volcano type", y = "Count of volcanoes")
Explanation:
1) We use volcano_region_and_type_counts %>% expand(region, primary_volcano_type) to generate all combinations of region and primary_volcano_type.
2) Then we join on the actual data based on the region and primary_volcano_type.
3) Replace NAs with 0 to fix how the bars are sorted.
4) Then the rest is the same as before.
:) :) :)
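To see steps 1 to 3 in isolation, here is a minimal sketch on a made-up table. The tibble counts, the column type, and the result name filled are assumptions for illustration, not the volcano data from the video.

```r
library(dplyr)
library(tidyr)

# Toy counts (made-up data): region "B" has no "cone" row at all.
counts <- tibble(
  region = c("A", "A", "B"),
  type   = c("cone", "shield", "shield"),
  n      = c(3L, 1L, 2L)
)

filled <- counts %>%
  expand(region, type) %>%                          # step 1: all region x type combinations
  left_join(counts, by = c("region", "type")) %>%   # step 2: bring back the real counts
  mutate(n = replace_na(n, 0L))                     # step 3: missing combination becomes 0

filled
```

The result should have four rows, with n = 0 for region "B" and type "cone", so that combination gets an (empty) bar instead of being dropped from the plot.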
Follow-up: If you want all the labels (volcano types) to have the SAME order, replace THIS:
  mutate(primary_volcano_type = reorder_within(
    x = primary_volcano_type,
    by = n,
    within = region
  )) %>%
with THIS:
  mutate(primary_volcano_type = fct_reorder(primary_volcano_type, n)) %>%
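A small sketch of what fct_reorder() does here, on made-up values (the vectors type and n below are illustrative, not the volcano data). As far as I know it orders the levels by the median of n per level unless you pass a different .fun, so every facet ends up sharing one ordering.

```r
library(forcats)

# Made-up counts: each type appears once in each of two regions.
type <- c("cone", "shield", "cone", "shield")
n    <- c(5, 1, 4, 2)

# Levels are sorted by a per-type summary of n (the median by default),
# so there is a single ordering for the whole data set.
global_order <- levels(fct_reorder(type, n))
global_order
```

If you would rather order by the total count per type, fct_reorder(type, n, .fun = sum) is one option.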
@tomhenry-datasciencewithr6047 Wow! Thank you so much for this detailed explanation!
Just tried it out and it works perfectly. And you even offered an example of a self-join in your code, which is something I always wondered when to apply.
This helped me a lot!
Glad it helped!
The expand() function plus left join is very useful in these cases!
tidyr.tidyverse.org/reference/expand.html
Awesome video, very clear and well explained. You are a great teacher. However, I am bumping into an issue. I have a 700 MB dataset with 6 million rows and I want to calculate the top 20% of objects in a text string. Everything works well and I can get the count results, the variable is in my Environment, but it is null in the pane and when I try to plot it I get an error message that the object can't be found. This is after successfully assigning the variable and seeing the count results. Any suggestions? Been searching for over an hour. Thanks for any help, subscribed and looking forward to your other videos.
I also went through your tutorial in this video and had the same error message. When I get to line 30 I receive an error that "object 'volcano_type_counts' not found". For some reason my assigned variables disappear. Any advice?
Hi James! Can you post the code snippets you are using here?
(for both getting the counts and plotting the count results)
Hi! Can you suggest simple learning projects for beginners? Maybe using a Kaggle dataset, with the goal of using basic dplyr verbs and ggplot2. Thanks. I think it would be great content.
Great idea. Thanks! Coming in the next couple of weeks.
You're awesome man. Thanks for making this.
Awesome, extremely handy and just in time.
You should do more videos! Interesting!
Why do I only stumble onto this after my coding interview 😭😭😭