Using dplyr's group_by function with and without summarize (CC233)

Riffomonas Project

5K views

Published: Jan 10, 2025

Comments • 33

@haraldurkarlsson1147 10 months ago
Pat, I think I have pointed this out before, but the fpp3 package will do most of the things that you are doing with much simpler code. fpp3 was, after all, designed for time series. I still love your code gymnastics and have watched some of your videos multiple times - each time I learn something new.
Thanks!
@jean-claudegolovine5725 10 months ago
Hello from Scotland. Many thanks for this excellent video!
@szco9814 2 years ago ⁺¹
Your video is a gem!
@Riffomonas 2 years ago
Thanks!🤓
@timmytesla9655 2 years ago ⁺¹
Wonderful video as usual. Thumbs up.
@Riffomonas 2 years ago
Thanks Timmy!
@caseyj9 2 years ago ⁺¹
I like to save versions of my data after/as I clean it as .RDS files so I can see what I did/reproduce easily later.
People asking about organization: Usually I group my projects with /background_info /in_data /out_data /code as separate directories. I don't think there's anything special about those directories except that the layout is organized enough and general enough to be consistent, so I can easily program and reuse paths across projects. I organized this way originally when reading about reproducible research and data sharing in neuroscience and psychology, so you might want to see if there's something that a group has suggested for your field that you can work within (if you want to share data). If it's a big project I also have a README with dependencies/version info and an RProj with a source.R that auto-opens and runs everything.
Thanks Dr. Schloss! Learning a lot here :)
@Riffomonas 2 years ago ⁺¹
Awesome! My only caution against Rds files is that they limit you to R and they aren’t text files. I prefer to work with csv/tsv files as much as possible.
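The trade-off in this exchange can be sketched in base R. This is only an illustration: the data frame and the temporary file paths are made up, not taken from the video.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(year = c(2020L, 2021L), temp = c(1.5, 2.5))

# Temporary files so the sketch does not touch a real project directory
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

saveRDS(df, rds_path)                       # binary, R-specific format
write.csv(df, csv_path, row.names = FALSE)  # plain text, readable by other tools

from_rds <- readRDS(rds_path)   # restores the R object as it was saved
from_csv <- read.csv(csv_path)  # column types are re-inferred from the text
```

The .rds file preserves R-specific details such as column types, while the .csv file can be opened in a text editor or read from another language.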
@djangoworldwide7925 2 years ago ⁺¹
Beautiful analysis work.
@Riffomonas 2 years ago
Thanks! I’m glad people are enjoying it
@PeperazziTube 2 years ago ⁺¹
Great stuff. You can sometimes make your life easier by using the %in% operator, e.g., normalized_range = year %in% 1951:1980 also gives you a TRUE/FALSE indicator with more concise code. The nice thing about the %in% operator is that it works on many data types (bools, integers, reals, chars) in both lists and vectors.
@Riffomonas 2 years ago
Thanks! It’s all a matter of what I remember when I’m under the spotlight of recording 😂
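The %in% suggestion above can be sketched in base R as follows. The `year` vector is hypothetical, and the explicit comparison is only an assumed stand-in for whatever the video used.

```r
# Hypothetical vector of years, for illustration only
year <- c(1949, 1951, 1965, 1980, 1981)

# Explicit range comparison (assumed form, not necessarily the video's code)
flag_compare <- year >= 1951 & year <= 1980

# The more concise form suggested in the comment
flag_in <- year %in% 1951:1980

identical(flag_compare, flag_in)
```

The two forms agree only for whole-number years: a value such as 1965.5 passes the range comparison but is not an element of 1951:1980.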
@dasrotrad 2 years ago ⁺²
Pat, you have tutorials for all levels, which is fabulous. You are so prolific that, unfortunately, I can't keep up with all you produce. You are amazing. Somehow I missed "riffomonas." Where does that word, "Riffomonas", come from?
@Riffomonas 2 years ago ⁺²
Hah! It comes from the idea of riffing in music, but riffing on other people's code. My hope is that people can see how I riff on my own code and do the same for their own purposes. The “omonas” is a common ending for bacteria.
@PhilippusCesena 2 years ago ⁺¹
Excellent job!
@Riffomonas 2 years ago
Thanks!
@haraldurkarlsson1147 2 years ago ⁺¹
I did not see the same trend you see in your data - namely a gradual increase. The curve for my local station (near the southern tip of Lake Michigan) is essentially flat. What we may be looking at is the moderating effect of the lake. I do see the cold October of 1925 (the relative deviation is -7.3), but I am missing measurements for 1917. Interesting stuff - also a good lesson in how to deal with NAs.
Please bring more stuff like this with broad appeal and data that is easily and freely obtained.
Thanks.
@Riffomonas 2 years ago
Cool results and insights! 🤓
@haraldurkarlsson1147 10 months ago
Pat,
So when you replaced the empty spaces with zero, was that a form of imputation? Basically replacing missing values?
@shadyamigo 2 years ago ⁺¹
Thank you for another great video. Quick question: what does the ‘group’ argument do in the ggplot aesthetics, given that you also have color set to year? Thank you
@Riffomonas 2 years ago
The group aesthetic here links all the data from the same year together. You could use color=year but then every year would be a different color. Instead I used color=is_this_year to get the two-colored figure.
@shadyamigo 2 years ago ⁺¹
@@Riffomonas thank you
@shadyamigo 2 years ago ⁺¹
I must have missed the last few minutes when I posted the earlier question. I meant that before you added the is_this_year column you already had group = year and color = year. So, to rephrase my question: at that point in the tutorial, is the group parameter doing anything in addition to the color parameter, since both are set to year?
@Riffomonas 2 years ago ⁺¹
Right - in this case they do the same thing. I tend to use group for line plots even if it’s redundant with color just to be safe
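The role of the group aesthetic alongside color=is_this_year can be sketched with ggplot2. The data frame below is made up for illustration; only the `is_this_year` column name comes from the thread.

```r
library(ggplot2)

# Hypothetical data: three years, three months each
df <- data.frame(
  year  = rep(c(2020, 2021, 2022), each = 3),
  month = rep(1:3, times = 3),
  temp  = c(1, 2, 3, 2, 3, 4, 3, 4, 5)
)
df$is_this_year <- df$year == 2022

# group = year keeps one line per year; color only distinguishes 2022 from the rest
p_grouped <- ggplot(df, aes(x = month, y = temp,
                            group = year, color = is_this_year)) +
  geom_line()

# Without group, the two color levels define the lines, so 2020 and 2021
# are joined into a single line
p_ungrouped <- ggplot(df, aes(x = month, y = temp, color = is_this_year)) +
  geom_line()

# Number of separate lines ggplot2 will draw in each plot
n_grouped   <- length(unique(layer_data(p_grouped)$group))
n_ungrouped <- length(unique(layer_data(p_ungrouped)$group))
```

Setting group explicitly for line plots, as the reply suggests, makes the number of lines independent of how color is mapped.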
@bassamabdelnabi3117 2 years ago ⁺¹
Totally awesome… man … you really explain things very well… you hit the spot … thanks so much, please keep doing great work and helping people
@Riffomonas 2 years ago ⁺¹
Thanks for the encouragement 🤓. I'm glad people are finding this thread of videos helpful
@r.hainez2131 2 years ago ⁺²
Could you please explain how you manage your .R files (workflow-wise)? And why is setwd() not your favorite?
@Riffomonas 2 years ago ⁺¹
Using paths in R and why you shouldn't be using setwd (CC179)
ruclips.net/video/StqDYjM6ULo/видео.html
@sven9r 2 years ago ⁺¹
@R.Hainez look for the here package, it is very useful
@szco9814 2 years ago ⁺¹
Hello Boss! Could you please elaborate on why you drop the groups after you group_by and summarise? It was confusing when you said that group_by and summarize remove the rightmost grouping. I did not see any change after you dropped the groups: the tibble size is 1558*3, exactly the same as the tibble without dropping the groups. Thank you sir!
@Riffomonas 2 years ago ⁺¹
Thanks for watching and for your question! It doesn’t change the size of the tibble only the grouping or structure of the tibble. I remove the groupings because they can mess with downstream processes. If I did another mutate with the data still grouped there could be unintended results
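The point in this reply can be sketched with dplyr. The `temps` tibble and its column names are hypothetical, and `.groups = "drop_last"` is written out only to make summarize()'s default behavior explicit; the video may have dropped groups a different way.

```r
library(dplyr)

# Hypothetical data: two grouping columns and a value
temps <- tibble(
  year  = c(2000, 2000, 2000, 2001, 2001, 2001),
  month = c(1, 1, 2, 1, 2, 2),
  tmax  = c(1, 3, 5, 12, 14, 16)
)

# summarize() peels off only the last grouping variable (month) by default,
# so the result is still grouped by year
monthly <- temps %>%
  group_by(year, month) %>%
  summarize(tmax = mean(tmax), .groups = "drop_last")

remaining_groups <- group_vars(monthly)

# Same number of rows with or without groups; only the structure differs
rows_grouped   <- nrow(monthly)
rows_ungrouped <- nrow(ungroup(monthly))

# While still grouped, mutate() computes the mean within each year
grouped_result <- monthly %>% mutate(overall = mean(tmax))

# After ungroup(), the same mutate() computes the mean over all rows
ungrouped_result <- monthly %>% ungroup() %>% mutate(overall = mean(tmax))
```

Here `grouped_result$overall` differs by year while `ungrouped_result$overall` is a single value repeated on every row, which is the kind of unintended downstream result the reply warns about.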
@dmalarekable 2 years ago ⁺¹
I still can't wrap my head around why you normalize the temps between the 50's and 80's. Shouldn't you normalize across all the years?
@Riffomonas 2 years ago ⁺³
Here’s a FAQ describing the idea of the temperature anomaly and why nasa does it this way… data.giss.nasa.gov/gistemp/faq/
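As a rough sketch of the idea, a temperature anomaly is a value minus the mean over a fixed baseline period. The numbers below are made up, and the 1951:1980 window is taken from the %in% comment earlier in the thread; the video's actual calculation may differ (for example, it may be done per month or per day).

```r
# Hypothetical annual temperatures, for illustration only
temps <- data.frame(
  year = c(1950, 1951, 1965, 1980, 2020),
  temp = c(9.0, 10.0, 11.0, 12.0, 13.5)
)

# Mean over the baseline window only, not over all years
baseline <- mean(temps$temp[temps$year %in% 1951:1980])

# Anomaly: how far each year sits from the baseline mean
temps$anomaly <- temps$temp - baseline
```

Because the baseline is a fixed window, adding new years of data does not shift the anomalies already computed for earlier years.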
