Dear Professor, I have looked at many R tutorial videos for self-learning. Your lectures are by far the best, with absolute clarity and a great deal of explanation. The slides are very useful. Thank you so much.
i wouldn't watch any other R tutorial after watching this awesome tutorial. thank you!
amazing video! congrats
one of the most complete, professional, well explained and slide supported videos on R I found on YT, thank you
Thanks, much appreciated!
Just a note at 8:00. Object assignment does not _need_ to be at the start of a pipe. It can be at the end through the use of a right-facing arrow (->). Sure, it may be standard to go at the top, but with pipes reading left to right, top to bottom, I find it more intuitive to assign the output of the pipe at the end of said pipe. For example: input_df %>% filter(some_var == "value") %>% group_by(some_var) -> output_df. From a consistency standpoint, I still tend to assign the output at the start of the pipe, but it absolutely doesn't need to always be at the start.
In most classes I note the right assignment operator (as well as bidirectional operator), but not in these quick workshops as I don't spend much time on what is considered good R style (e.g., no equals assignment).
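For anyone following this exchange, here is a minimal sketch of the two assignment styles side by side. The data frame and its column names (input_df, some_var, other_var) are made-up placeholders, not data from the workshop. Note that = only assigns leftward, so it cannot be used at the end of a pipe.

```r
library(dplyr)

# Hypothetical example data; all names here are placeholders.
input_df <- tibble(
  some_var  = c("value", "value", "other"),
  other_var = c(1, 2, 3)
)

# Assignment at the start of the pipe (the conventional style).
output_start <- input_df %>%
  filter(some_var == "value") %>%
  group_by(some_var)

# Assignment at the end of the pipe with the right assignment operator.
input_df %>%
  filter(some_var == "value") %>%
  group_by(some_var) -> output_end
```

Both objects should hold the same grouped data frame; which style to use is a matter of convention rather than correctness.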
I'm mad at everyone in the world for not pointing me to this video sooner!
This is the best tidyverse video on YouTube. You have a new subscriber. Thanks for sharing!
Super great video! Thank you for the upload.
Thank you for posting this on YouTube. I have the habit of going through lessons at 1.5x (or even faster), but I loved this class and enjoyed every minute of the full length. I loved your comment about "spiraling" on a problem; that is very applicable to life (and graduate research in general) and not only to programming.
Thank you, much appreciated!
I'm new to R and stumbled upon your video. This is the best thing that has happened to me. Thank you so much Charles
I have been working with R for a few years now and this is some of the best stuff I have come across. Thanks!
Thank you for sharing this class, I have learned a lot! Greetings from Costa Rica
Sir, you are great!! Really love your classes.
Thank you Charles for the crystal clear explanations... 🙏
Dude loves pipes.
A small correction at 40:30: using funs() inside summarize_at() is no longer allowed. The suggested replacement is list(). For example: list(mean = mean, sd = sd)
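A rough sketch of what that replacement might look like in practice, using a made-up data frame (df, g, x, and y are hypothetical names). The second version uses across(), which supersedes the scoped verbs such as summarize_at() and assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, 3, 5, 9),
  y = c(2, 4, 6, 8)
)

# Scoped verb with a named list() in place of funs().
old_style <- df %>%
  group_by(g) %>%
  summarize_at(vars(x, y), list(mean = mean, sd = sd))

# Same summary written with across() (assumes dplyr >= 1.0.0).
new_style <- df %>%
  group_by(g) %>%
  summarize(across(c(x, y), list(mean = mean, sd = sd)))
```

Both versions produce columns named like x_mean and y_sd, although the column order may differ between them.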
What a useful presentation, very clear, easy to follow, well explained, thank you for preparing this video!
So smooth. Thank you
Best dplyr course ever! Thank you so much man!!
Excellent presentation! Thanks for sharing.
thank you so much for such a great lecture !!
Charles,
You might not be interested in this, but I think dplyr's "lag" function also works in the Usable Dates part. That is, mutate(date = date.entered + lag(week, default=0) * 7). By setting default=0 you avoid an NA in the first row.
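A small sketch of this suggestion on made-up data (the songs data frame below is hypothetical). It compares the lag() version with plain arithmetic on week; the arithmetic version is an assumption for comparison, not necessarily the code from the slides. The lag() approach depends on rows being sorted by week with no gaps, and on grouping by song so values do not carry over between tracks.

```r
library(dplyr)

# Hypothetical long-format data for a single song.
songs <- tibble(
  track        = "Example Track",
  date.entered = as.Date("2000-01-01"),
  week         = c(1, 2, 3, 4)
)

songs_dated <- songs %>%
  group_by(track) %>%
  mutate(
    # Previous row's week, with 0 standing in for the first row.
    date_lag   = date.entered + lag(week, default = 0) * 7,
    # Arithmetic alternative that does not depend on row order.
    date_arith = date.entered + (week - 1) * 7
  ) %>%
  ungroup()
```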
amazing job! Thanks a lot
A really really good explanation ..
Very good! Thank you!
Very helpful Charles! Thank you!
awesome presentation! thanks a lot
Great video! Well explained!
Wow, this is one of the most informational videos on this I've seen. I've been having an issue with a dataframe: I can't calculate the mean/max of a column that has null values in it. My line currently looks like
trackman %>% group_by(tagged_pitch_type) %>% summarise(mean(as.numeric(spin_rate, na.rm = TRUE)))
It works for a few of the pitch types, but the ones that have null values don't work.
You mean NA values, not NULL values, I assume. If so, you just have an argument out of place in summarize: summarise(spin_rate = mean(as.numeric(spin_rate), na.rm = TRUE)).
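To illustrate the difference, here is a minimal sketch with a made-up stand-in for the trackman data (the values are invented, and storing spin_rate as text is an assumption based on the as.numeric() call). When na.rm sits inside as.numeric() it has no effect, so mean() still sees the NA.

```r
library(dplyr)

# Hypothetical stand-in data; spin_rate is text, with one missing value.
trackman <- tibble(
  tagged_pitch_type = c("Fastball", "Fastball", "Curveball", "Curveball"),
  spin_rate         = c("2200", "2300", "2500", NA)
)

# na.rm is passed to as.numeric(), which ignores it.
misplaced <- trackman %>%
  group_by(tagged_pitch_type) %>%
  summarise(spin_rate = mean(as.numeric(spin_rate, na.rm = TRUE)))

# na.rm is passed to mean(), which drops the NA before averaging.
fixed <- trackman %>%
  group_by(tagged_pitch_type) %>%
  summarise(spin_rate = mean(as.numeric(spin_rate), na.rm = TRUE))
```

In the first result the group containing the missing value comes back as NA, while the second returns a mean computed from the non-missing values.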
Very nice video, thank you!
Awesome thanks for sharing!
46:31
Thanks Charles for your straightforward explanation! However, I have been trying to apply the group_by function, but R gives me this, Error: unexpected ')' in " n = n())"
You likely have an extra ) somewhere or are missing something higher up in the code. Check RStudio for a red mark on the left side pointing out an error in the code.
Thanks! To make my question clear, here is my code
library(magrittr)
aa %>%
group_by(Year) %>%
summarise(Life.expectancy mean = mean(Life.expectancy),
Life.expectancy median = median(Life.expectancy),
n = n()) %>%
And they gave me this in console:
Error: unexpected symbol in " Life.expectancy median"
> n = n()) %>%
Error: unexpected ')' in " n = n())"
@patrickmuvunyi55 Two issues: (1) the code ends with a pipe, and (2) there are spaces in the variable names, which is not permitted unless the names are surrounded by backticks (`). Fixed:
aa %>%
  group_by(Year) %>%
  summarise(Life.expectancy.mean = mean(Life.expectancy),
            Life.expectancy.median = median(Life.expectancy),
            n = n())
@cclanfear I appreciate it, Sir! It has finally worked! I am not going to fail this anymore!
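For completeness, a small sketch of the backtick option mentioned in this thread, on made-up data (the aa values below are invented). It also loads dplyr rather than only magrittr, since group_by(), summarise(), and n() come from dplyr; magrittr on its own only supplies the pipe.

```r
library(dplyr)  # provides group_by(), summarise(), n(), and re-exports %>%

# Hypothetical stand-in for the aa data frame.
aa <- tibble(
  Year            = c(2000, 2000, 2001),
  Life.expectancy = c(70, 72, 75)
)

# Names containing spaces are allowed when wrapped in backticks.
spaced <- aa %>%
  group_by(Year) %>%
  summarise(`Life.expectancy mean`   = mean(Life.expectancy),
            `Life.expectancy median` = median(Life.expectancy),
            n = n())
```

Names without spaces, as in the fixed code above, are usually easier to work with later because they never need backticks.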
For me and the current gen, I find that starting with Pen first works much better, and ending with Pen, inputting more in between, making it flow so much easier. Replace pen with Pipe. My my, too many pens before pipes, airplanes eject, whoa.
I guess gather and spread are now pivot_longer and pivot_wider... It is hard to keep up with these changes (gather and spread actually made sense to me).
Yep, I was fine with gather and spread, but in changing these they also added some useful features. They're a bit more powerful.
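If it helps anyone making the switch, here is a rough sketch of the newer functions next to gather() on made-up data (the wide data frame and its wk1/wk2 columns are hypothetical). gather() still exists in tidyr, so older code should keep running.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per track, one column per week.
wide <- tibble(
  track = c("A", "B"),
  wk1   = c(10, 20),
  wk2   = c(8, NA)
)

# Older style: gather() the week columns into key/value pairs.
long_old <- gather(wide, key = "week", value = "rank", wk1, wk2)

# Newer style: pivot_longer() does the same reshaping.
long <- wide %>%
  pivot_longer(cols = starts_with("wk"), names_to = "week", values_to = "rank")

# pivot_wider() reverses it, taking the place of spread().
wide_again <- long %>%
  pivot_wider(names_from = week, values_from = rank)
```

The two long versions contain the same rows, but their row order differs: gather() stacks column by column, while pivot_longer() goes row by row.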
I keep getting errors in 1:11:13 Error: unexpected symbol in:
" select(-minutes, -seconds)
summary"
A few pieces of code from your slides keep giving me an error message; for example:
billboard_1 %>%
select(artist, track, weeks_at_1) %>%
distinct((artist, track, weeks_at_1) %>%
arrange(desc(weeks_at_1)) %>%
head(7)
I did use a different variable name, billboard_1, because billboard_2000 gave me problems.
Please check!
I'd advise looking carefully at the code and perhaps checking the website: clanfear.github.io/Intermediate_R_Workshop/ In your code example, for instance, you have an extra ( in your distinct() call. In your select() error you likely also have an added or missing character like ( or ,
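As a sketch of the fix described here, this is the same pipeline with the extra ( removed from distinct(). The billboard_1 data below is invented so the example runs on its own; it is not the workshop data.

```r
library(dplyr)

# Hypothetical stand-in for billboard_1, with one duplicated row.
billboard_1 <- tibble(
  artist     = c("Artist A", "Artist A", "Artist B"),
  track      = c("Song 1", "Song 1", "Song 2"),
  weeks_at_1 = c(3, 3, 5)
)

top_tracks <- billboard_1 %>%
  select(artist, track, weeks_at_1) %>%
  distinct(artist, track, weeks_at_1) %>%  # single opening parenthesis
  arrange(desc(weeks_at_1)) %>%
  head(7)
```

An "unexpected symbol" error usually points to the first place R's parser got confused, so the actual typo is often on the line before the one quoted in the message.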
nice video.
In regards to the preponderance of NAs in the Billboard rank data would it not be rendered useless in terms of data analysis as it stands (without some sort of fancy imputation etc)?
Billboard has two types of NAs: (1) False NAs that appear when a song is no longer on the billboard, which are the result only of the data being in wide format. (2) What appear to be true NAs where some songs are no longer tracked after like 20 weeks (truncated observations). If you were modeling these data, you could use something like a survival model for the truncated observations.
@cclanfear I was wondering about the same thing. Survival analysis might work here. NAs would be censored data. This might be an interesting problem to tackle in class (if you have not done so already). Thanks.
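On the first type of NA discussed in this thread (the ones that exist only because the data are in wide format), one possible way to drop them is the values_drop_na argument of pivot_longer(). This is a sketch on invented data, not the workshop's code, and it does nothing for the truncated observations, which would still need something like the survival approach mentioned above.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: Song 2 leaves the chart after week 1.
billboard_wide <- tibble(
  track = c("Song 1", "Song 2"),
  wk1   = c(5, 40),
  wk2   = c(3, NA),
  wk3   = c(1, NA)
)

# Pivot to long format and drop rows whose rank is NA.
billboard_long <- billboard_wide %>%
  pivot_longer(
    cols           = starts_with("wk"),
    names_to       = "week",
    values_to      = "rank",
    values_drop_na = TRUE
  )
```

After pivoting, each remaining row is a week in which the song was actually on the chart.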
No one caught onto the incontinent joke :(
HII