Using the purrr and broom R packages to easily perform thousands of statistical tests (CC112)

  • Published: Oct 10, 2024

Comments • 35

  • @Riffomonas
    @Riffomonas  3 years ago +4

    Thanks to @Alex Cheong for calling my attention to a small bug in the code that I show around 9:58. I realized that I still had the data grouped when running p.adjust, so the values in p.value and p.experiment are the same. p.adjust worked, but it only adjusted the p-values within each group separately. Since there was only one test in each group, nothing changed. I could have done either of two things differently to get p.adjust to work...
    * I could have inserted ungroup() between the unnest and mutate lines.
    * Instead of using group_by(taxon) %>% nest() further up in the code, I could have done nest(data=-taxon)
    For this example there's no practical difference between the code I wrote and what the code should have been because the significant p-values were so small. Regardless, the data shouldn't be grouped when running p.adjust. I'll be sure to quickly comment on this in the next episode.
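The grouping issue described above can be reproduced on a small scale. The following is a minimal sketch, assuming a made-up data frame `toy` with hypothetical columns `taxon`, `group`, and `rel_abund`, and a Kruskal-Wallis test standing in for whatever test the video uses. It is not the code from the episode.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)

# Hypothetical toy data: 3 taxa, 2 groups, 10 samples per group
toy <- expand_grid(taxon = c("A", "B", "C"),
                   group = c("healthy", "diseased"),
                   rep = 1:10) %>%
  mutate(rel_abund = runif(n()))

# Grouped version: the data frame is still grouped by taxon when p.adjust
# runs, so each call sees a single p-value and returns it unchanged
grouped <- toy %>%
  group_by(taxon) %>%
  nest() %>%
  mutate(test = map(data, ~ tidy(kruskal.test(rel_abund ~ group, data = .x)))) %>%
  unnest(test) %>%
  mutate(p.adjusted = p.adjust(p.value, method = "BH"))

# Ungrouped version: nest(data = -taxon) never creates groups, so p.adjust
# sees all of the p-values at once (adding ungroup() before the last
# mutate() in the grouped version would have the same effect)
fixed <- toy %>%
  nest(data = -taxon) %>%
  mutate(test = map(data, ~ tidy(kruskal.test(rel_abund ~ group, data = .x)))) %>%
  unnest(test) %>%
  mutate(p.adjusted = p.adjust(p.value, method = "BH"))
```

In `grouped`, `p.adjusted` is identical to `p.value`; in `fixed`, the correction is applied across all three tests.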

    • @KN-tx7sd
      @KN-tx7sd 3 years ago

      Thanks, Pat. I just noticed that. If we were to do post-hoc multiple comparisons after an ANOVA, which test would you suggest? And can it be done as a continuation, I mean by piping after the initial ANOVA tests are completed?

    • @Riffomonas
      @Riffomonas  3 years ago +1

      @KN-tx7sd You might try using the TukeyHSD function
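As one possible way to chain a post-hoc test onto the same nest/map workflow, here is a rough sketch that fits `aov()` per taxon and passes each fit to `TukeyHSD()`, then tidies the result with broom. The data frame `toy` and its columns `taxon`, `group`, and `rel_abund` are invented for illustration and are not from the video.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)

# Hypothetical toy data: 2 taxa, 3 groups, 8 samples per group
toy <- expand_grid(taxon = c("A", "B"),
                   group = c("g1", "g2", "g3"),
                   rep = 1:8) %>%
  mutate(group = factor(group),   # TukeyHSD needs a factor predictor
         rel_abund = runif(n()))

posthoc <- toy %>%
  nest(data = -taxon) %>%
  mutate(fit = map(data, ~ aov(rel_abund ~ group, data = .x)),  # one ANOVA per taxon
         tukey = map(fit, ~ tidy(TukeyHSD(.x)))) %>%            # pairwise comparisons
  select(taxon, tukey) %>%
  unnest(tukey)
```

Each row of `posthoc` is one pairwise contrast within one taxon, with the Tukey-adjusted p-value in `adj.p.value`.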

    • @KN-tx7sd
      @KN-tx7sd 3 years ago +1

      @Riffomonas Thanks, Pat

  • @EnCalleBelgrano
    @EnCalleBelgrano 2 years ago +7

    This is really just an amazing set of tutorials. I found this video particularly helpful - after struggling all day trying to get tidy summary tables by group, this video was general enough and yet specific enough to let me do it in just a couple of minutes - but I think all of the videos of yours that I've watched are very clear, high-content, and worth the time!

    • @Riffomonas
      @Riffomonas  2 years ago

      Thank you 🙏 Please spread the word so more people can benefit 😊

  • @gruntshuffle
    @gruntshuffle 3 years ago +5

    This is a great tutorial and introduced a completely new workflow to me. Thank you for that! I'm excited to try it out, and excited to see more of your videos.

    • @Riffomonas
      @Riffomonas  3 years ago

      Hey Chris - thanks so much! I have to admit it’s taken me a few reps to get used to the workflow. It’s pretty great now that I have the hang of it

  • @djangoworldwide7925
    @djangoworldwide7925 2 years ago +1

    This is a champ method of statistical analysis

  • @robertcline5314
    @robertcline5314 3 years ago +3

    Really nice presentation of broom... I sure like the opportunity to watch you work through this. Wondering if I'll ever get fluent with it. Thank you for these presentations.

    • @Riffomonas
      @Riffomonas  3 years ago +3

      Can I let you in on a secret? 😂 Before I started making videos using the map functions and broom, I had no confidence in how to use them. Repeatedly practicing them and looking for opportunities to use them has really made a huge difference. You can do it!

  • @cyrillejarrin103
    @cyrillejarrin103 3 years ago +1

    Thank you!!! I did not know these packages. I used to write for loops in order to repeat several tests...

    • @Riffomonas
      @Riffomonas  3 years ago

      Awesome! They’re a great set of tools

  • @vikrantnag86
    @vikrantnag86 3 years ago +1

    Thank you, Pat. We have learnt a lot from your videos. It would be great if you could make a video series on how to do time series forecasting in R, especially for supply chain SKU forecasts. It would help a lot of us who want to learn it. Thanks again

    • @Riffomonas
      @Riffomonas  3 years ago

      Hi Vikrant - thanks for the suggestion. Supply chain stuff is a bit outside my area of expertise, but we'll see if I can't figure out something for a future episode

  • @Riffomonas
    @Riffomonas  3 years ago +1

    What are some of your favorite methods for testing the significance of differences between taxa? Why do you prefer it?

  • @joaosaraiva8774
    @joaosaraiva8774 3 years ago +1

    Thanks for the nice explanation... Will definitely try it out

    • @Riffomonas
      @Riffomonas  3 years ago

      Thanks for watching João!

    • @joaosaraiva8774
      @joaosaraiva8774 3 years ago

      Have you tried to use this package on WGS data instead of amplicon data? I realize that in metagenomics data, for example, we end up with different levels of taxonomy. In some samples we will have classifications down to the species level, whilst in others only to the family level. Any suggestions on how to apply the broom package to this?

  • @sven9r
    @sven9r 3 years ago +1

    very helpful again - I love this channel

    • @Riffomonas
      @Riffomonas  3 years ago +1

      Glad to hear it! Thanks for tuning in :)

  • @KN-tx7sd
    @KN-tx7sd 3 years ago +2

    Thanks, Pat. This is one of the most informative tutorials; there is no better way to explain the broom package. Can the same format be used for lm() or glm() on big data frames, with an apply function, to do regression analysis?

    • @Riffomonas
      @Riffomonas  3 years ago +1

      This workflow should replace the apply commands and be far more powerful. tidy() should be able to convert the output of those models into a data frame
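To illustrate the reply above, here is a small sketch that fits one `lm()` per group and collects the coefficients with `tidy()`. It uses the built-in `mtcars` data set purely as a stand-in; the choice of `cyl` as the grouping column and `mpg ~ wt` as the model is an arbitrary assumption for the example.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Fit mpg ~ wt separately for each value of cyl, then tidy each model
models <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(fit = map(data, ~ lm(mpg ~ wt, data = .x)),
         tidied = map(fit, tidy)) %>%
  select(cyl, tidied) %>%
  unnest(tidied)

# One row per model term (intercept and slope) per cyl value
models
```

Swapping `lm()` for `glm()` with a `family` argument follows the same pattern, since broom also provides a `tidy()` method for `glm` objects.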

  • @alexcheong5790
    @alexcheong5790 3 years ago +2

    I don't think p.adjust() actually changed the p.value within the pipeline. Had to add n=length(.$p.value) within p.adjust().

    • @Riffomonas
      @Riffomonas  3 years ago +1

      Hey Alex - thanks for your comment and calling my attention to this! I realized that I still had the data grouped when running p.adjust. Basically, it worked, but it only adjusted the p-values within each group separately. I could have done two things differently to get p.adjust to work...
      * I could have inserted ungroup() between the unnest and mutate lines.
      * Instead of using group_by(taxon) %>% nest(), I could have done nest(data=-taxon)
      For this example there's no practical difference between the code I wrote and what the code should have been because the significant p-values were so small. Regardless, the data shouldn't be grouped when running p.adjust. Thanks again! I'll be sure to quickly comment on this in the next episode.

  • @larissacury7714
    @larissacury7714 2 years ago +1

    Very interesting, indeed! But I couldn't download the .xlsx file you used in the video

    • @Riffomonas
      @Riffomonas  2 years ago

      The data can be downloaded at github.com/riffomonas/minimalR-raw_data/releases/tag/0.3

  • @samadhigunathunga2597
    @samadhigunathunga2597 1 year ago

    Hi Pat, thanks for this video. I am learning a lot from your channel. I have a question. I am using this code for the amplicon data I got from my soil samples. However, at the pairwise Wilcoxon test step, R gives a warning. This is the warning:
    Warning message:
    There were 236 warnings in `mutate()`.
    The first warning was:
    ℹ In argument: `pairwise_tests = map(...)`.
    ℹ In group 2: `taxon = *Methanosarcina*(OTU 59)`.
    Caused by warning in `wilcox.test.default()`:
    ! cannot compute exact p-value with ties
    Can you please tell me what is the reason for this and how can I go forward with this analysis?
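For context on the warning quoted above: `wilcox.test()` can only compute an exact p-value when the data contain no ties. When ties are present it warns and falls back to a normal approximation. The sketch below, using two made-up vectors `x` and `y` with tied values, shows that passing `exact = FALSE` requests that approximation directly, which gives the same p-value without the warning. Whether the approximation is appropriate for a given data set is a separate judgment call.

```r
# Hypothetical vectors that share tied values (e.g. repeated zeros)
x <- c(0, 0, 1, 2, 2, 5)
y <- c(0, 3, 3, 4, 6, 6)

# Default call: warns "cannot compute exact p-value with ties" and
# falls back to the normal approximation
p_default <- suppressWarnings(wilcox.test(x, y)$p.value)

# Asking for the approximation explicitly avoids the warning
p_approx <- wilcox.test(x, y, exact = FALSE)$p.value

# Both calls return the same p-value
all.equal(p_default, p_approx)
```

Because `pairwise.wilcox.test()` forwards extra arguments to `wilcox.test()`, `exact = FALSE` can be supplied there as well.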

  • @nsaini1029
    @nsaini1029 2 years ago +1

    Can you please make a video on how to calculate diversity indexes from the relative abundance of taxa? Or have you already made one and I haven't seen it? Thank you.

    • @Riffomonas
      @Riffomonas  2 years ago +1

      I’m on it. I’ll have one in a few weeks

  • @sven9r
    @sven9r 3 years ago +1

    I'm asking myself, could we use the nest() and tidy() functions for PERMANOVAs as well? I have to do a bunch of those, but I cannot figure out how to get a single OTU column and a single metadata column into nest(). Do you have any expertise on that?

    • @Riffomonas
      @Riffomonas  3 years ago +1

      I'm not sure how it would apply to permanova, but you can do nest(my_column=c(otu, meta)) to nest columns otu and meta into a column called my_column
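The `nest(my_column = c(otu, meta))` call from the reply above can be tried on a tiny example. This sketch assumes a made-up tibble with hypothetical columns `sample`, `otu`, and `meta`; only the `nest()` call itself comes from the comment.

```r
library(tibble)
library(tidyr)

# Hypothetical data: two rows per sample
toy <- tibble(sample = c("s1", "s1", "s2", "s2"),
              otu    = c(10, 3, 7, 0),
              meta   = c("a", "a", "b", "b"))

# Pack the otu and meta columns into a list-column called my_column;
# the remaining column (sample) defines the rows of the result
nested <- nest(toy, my_column = c(otu, meta))

nested
nested$my_column[[1]]
```

The result has one row per sample, and each element of `my_column` is a tibble holding that sample's `otu` and `meta` values.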

  • @nettlesome7125
    @nettlesome7125 2 years ago +1

    When demonstrating, things get confusing when you don't use unique identifiers for columns; in other words, "data" is not a good name for a column, since it's a standard parameter for functions. "my_data" would be fine.

    • @Riffomonas
      @Riffomonas  2 years ago

      Thanks for watching! Data is actually the default name that the nest function creates.