Great video! I'm used to dplyr, so it's very interesting to see other approaches.
Thanks! Glad you enjoyed it 🤓
That was super helpful, thank you! I'll admit that joining was something I hadn't really gotten the hang of, and even though I've gone through the tutorials, I didn't really appreciate what was going on. Only rolling joins left to figure out and then I can say I've mastered data.table!
I would add, though, that animal_legs_dt[animal_sounds_dt[uniq_animals]] for a full join is... pretty ugly! Instead, data.table provides its own version of merge() that looks exactly like base R.
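For example, a minimal sketch of the two equivalent full joins, assuming the animal_legs_dt and animal_sounds_dt tables from the video are keyed on an animal column:

library(data.table)

animal_sounds_dt <- data.table(animal = c("cat", "duck"), sound = c("meow", "quack"), key = "animal")
animal_legs_dt   <- data.table(animal = c("cat", "cow"),  legs  = c(4, 4), key = "animal")
uniq_animals     <- unique(c(animal_sounds_dt$animal, animal_legs_dt$animal))

# full join with the bracketed syntax from the video
animal_legs_dt[animal_sounds_dt[uniq_animals]]

# the same rows with data.table's merge() method, which reads like base R
merge(animal_legs_dt, animal_sounds_dt, by = "animal", all = TRUE)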
Thanks - I hadn't seen data.table::merge. That would simplify things considerably
Have you tried the join from the collapse package? It is very fast in my tests. collapse::join(x, y, how="inner", on=c("a"="b"))
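For anyone who wants to try it, a self-contained version of that call with toy tables (the column names here are made up):

library(collapse)

legs   <- data.frame(a = c("cat", "duck", "cow"), legs = c(4, 2, 4))
sounds <- data.frame(b = c("cat", "duck"), sound = c("meow", "quack"))

# inner join matching legs$a to sounds$b
collapse::join(legs, sounds, how = "inner", on = c("a" = "b"))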
Thanks - I'll have to check that out
May I recommend using `bench::mark()` whenever you benchmark expressions?
Thanks - I've used it in other episodes, but I find that {microbenchmark} is easier to use for some applications
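For reference, the two interfaces look roughly like this for the kind of join comparison in the video (toy data, hypothetical object names):

library(bench)
library(microbenchmark)

x <- data.frame(animal = c("cat", "duck", "cow"), legs = c(4, 2, 4))
y <- data.frame(animal = c("cat", "duck"), sound = c("meow", "quack"))

# bench::mark() verifies the expressions return equal results unless check = FALSE
bench::mark(
  dplyr = dplyr::inner_join(x, y, by = "animal"),
  base  = merge(x, y, by = "animal"),
  check = FALSE
)

# microbenchmark() just times the expressions
microbenchmark::microbenchmark(
  dplyr = dplyr::inner_join(x, y, by = "animal"),
  base  = merge(x, y, by = "animal"),
  times = 100
)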
Just to note, using as.data.table() or setDT() will be considerably faster than data.table(). data.table also comes with its own version of merge() so you don't have to use the funky syntax for a full merge.
Thanks for the feedback - I'm finding that if I use as.data.table() or setDT(), I get similar results to plain data.table() and inner_join()
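For anyone following along, the three conversion routes being compared look roughly like this (toy data):

library(data.table)

df <- data.frame(animal = c("cat", "duck", "cow"), legs = c(4, 2, 4))

dt1 <- data.table(df)     # builds a new data.table from the data.frame (copies)
dt2 <- as.data.table(df)  # converts with a copy, usually cheaper than data.table()
setDT(df)                 # converts df in place, by reference, with no copy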
@@Riffomonas I made synthetic datasets since I don't have your fasta data. In my test, dtA was the fastest, followed by using setDT and as.data.table. data.table() was similar to dplyr with inner_join. Here is my code:
each_num
@@Riffomonas Apologies if this is showing up for a second time, but I replied earlier and now it seems to be gone.
I made some synthetic data because I don't have your fasta data. I consistently find that dtA is fastest, followed by setDT() and as.data.table(). dplyr and data.table() are comparable. Here is my code:
each_num
@@Riffomonas I've tried replying a few times, but YouTube seems to be auto-removing the comment. Maybe something to do with the code snippet... Anyway, I made large synthetic datasets because I don't have the fasta data, and ran everything again. dtA is consistently fastest, followed by setDT and as.data.table(). Here is my code below.
each_num
@@Riffomonas I've tried replying numerous times, but my comment gets removed each time. I think it doesn't like the code snippet I'm trying to share... Anyway, I made synthetic datasets ~50,000 rows long, where each row is a unique group so that it is comparable to your fasta data. dtA is consistently fastest, followed by setDT and as.data.table. One thing I had to control for was using a copy of the dataframe for setDT (e.g., df_copy
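Since the actual snippet keeps getting removed, here is a generic illustration of the copy()-before-setDT() pattern being described (not the commenter's original code; the object names are made up):

library(data.table)
library(microbenchmark)

df <- data.frame(id = 1:50000, value = rnorm(50000))

# setDT() converts in place, so benchmarking it on the same object would only
# measure a real conversion the first time; converting a fresh copy on each
# iteration keeps the timings honest
microbenchmark::microbenchmark(
  setDT = { df_copy <- data.table::copy(df); data.table::setDT(df_copy) },
  as_dt = as.data.table(df),
  times = 100
)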
A few thoughts come to mind.
1) dplyr always outputs tibbles. If you're going to use dplyr, it might be worth using tibbles throughout your package. The consistent formatting is worth the loss in performance, and tibbles are just better.
2) dplyr allows for multiple backends (dtplyr, dbplyr, duckplyr, arrow, etc). Would those affect your code? If I call duckplyr::methods_overwrite(), and a package has a custom function that calls dplyr::inner_join() under the hood, would it now call duckplyr::inner_join() under the hood instead?
3) Similarly, if I pipe a dataframe into dtplyr::lazy_dt() and then into a custom join function that calls dplyr::inner_join() under the hood, would it work and use the data.table method? Or would dtplyr just not know how to translate the code? (A sketch of this scenario is below.)
I know that you're not planning to write a custom join function, but your video still sparked these curiosities in me. Lately I've been looking at these dplyr backends as a way to scale up our work for big data projects without making my team have to learn new syntax, so they've been on my mind a lot. Great video as always!
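One way to probe question 3 empirically might be a small experiment along these lines (the helper function and toy data are hypothetical, and the answer depends on how dtplyr's methods dispatch):

library(dplyr)
library(dtplyr)

# hypothetical package-style helper that calls dplyr::inner_join() internally
join_animals <- function(x, y) {
  dplyr::inner_join(x, y, by = "animal")
}

legs   <- data.frame(animal = c("cat", "duck", "cow"), legs = c(4, 2, 4))
sounds <- data.frame(animal = c("cat", "duck"), sound = c("meow", "quack"))

# pipe a lazy_dt into the helper; inner_join() is generic, so if dtplyr supplies
# a method for lazy_dt objects the data.table translation should kick in, and
# the result stays lazy until as_tibble() collects it
legs |>
  lazy_dt() |>
  join_animals(lazy_dt(sounds)) |>
  as_tibble()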
Great - thanks for the feedback. For now, the input to the phylotypr functions will be data.frames, but they should work fine if people provide tibbles or data.tables. The output will be base R structures like lists and character strings. For example, inner_join() returns a data.frame when the first argument is a data.frame, even if the other table is a tibble:
> class(iris)
[1] "data.frame"
> x <- tibble::as_tibble(iris)
> class(x)
[1] "tbl_df" "tbl" "data.frame"
> iris |> dplyr::inner_join(x) |> class()
[1] "data.frame"
> iris |> dplyr::inner_join(x) |> tibble::as_tibble() |> class()
[1] "tbl_df" "tbl" "data.frame"