How to clean and join data from mothur with the dplyr R package (CC101)

  • Published: Jan 31, 2025

Comments • 37

  • @Riffomonas 3 years ago +4

    Are there any dplyr functions that you would like to learn more about?

    • @KN-tx7sd 3 years ago +1

      relocate (when working with order of column names)

    • @Riffomonas 3 years ago

      @KN-tx7sd thanks! To be honest, I didn't know about relocate. I've always used select to do these types of things.
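
For anyone comparing the two approaches mentioned in this thread, here is a minimal sketch of reordering columns with relocate() and with select(). The table and its column names are invented for illustration, and relocate() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy table; the column names are invented for illustration
samples <- tibble(
  sample_id = c("s1", "s2"),
  n_seqs = c(1200, 950),
  group = c("control", "treated")
)

# relocate() moves the named column and leaves all other columns in place
with_relocate <- samples %>%
  relocate(group, .before = n_seqs)

# The select() idiom: list the leading columns, then everything() for the rest
with_select <- samples %>%
  select(sample_id, group, everything())

names(with_relocate)
names(with_select)
```

Both calls should give the same column order here; relocate() tends to read more clearly when only one or two columns move in a wide table.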

  • @fioredelsud 1 year ago

    OMG just came across this video!! Thank you so much for this!! I have been avoiding doing this myself because after so many sleepless nights with my kids, I have been finding it really hard to concentrate and learn the tidyverse tools with this kind of data. With this super clear video you saved me probably days of struggling to do this. You saved an #academicmom with very little time and sleep. THANK YOU!

  • @romulocenci6176 3 years ago +1

    I created a project days ago but didn't even think about how to connect it to RStudio. Such valuable information, thanks a lot!

    • @Riffomonas 3 years ago

      Hey Romulo - glad this was inspiring!

  • @JOHNSMITH-ve3rq 3 years ago +2

    Absolutely love this channel - perfect example why. More videos cleaning messy data - the best part of the process!!

    • @Riffomonas 3 years ago +1

      Thank you - great comment! I appreciate the feedback and will be sure to include more steps cleaning messy data in future episodes

  • @ericagardner8249 1 year ago

    Thank you Pat! You are the best!

  • @keynesmeetsschumpeterinanarrow 3 years ago +1

    I really like your videos on visualisation but this pivot to data cleaning is very much appreciated. Please consider making videos on missing data visualisations (like those in the naniar package). Thanks!

    • @Riffomonas 3 years ago

      Great suggestion - thanks!

  • @nsaini1029 3 years ago +1

    Pat - these videos are awesome. I'm learning R from scratch, and thanks to you for making this possible!
    I wish you could organize the microbiome analysis videos so that I can go through them one by one. It seems most of the videos on YouTube are not properly organized currently, and it is hard to find all the microbiome analysis videos in one list!!

    • @Riffomonas 3 years ago

      Thanks! Have you seen this playlist? ruclips.net/p/PLmNrK_nkqBpIIRdQTS2aOs5OD7vVMKWAi

  • @williamvilchezcruz 1 year ago

    Excellent tutorial Sir!

  • @Rydaholic 2 years ago +1

    Another amazing tutorial! Thank you!

    • @Riffomonas 2 years ago

      Glad you enjoyed it! 🤓

  • @dasrotrad 2 years ago +1

    Dang Pat…. Awesome!

    • @Riffomonas 2 years ago

      Thanks! I appreciate you for being on the journey with me🤓

  • @chengchenli1677 3 years ago +1

    Love this channel and enjoyed every R demo video so far! Thank you! Can you do a video on cleaning and matching sequencing SampleIDs (generated from Illumina, for example) with SampleIDs recorded in metadata? In an ideal situation they should match completely, but often they only partially match.

    • @Riffomonas 3 years ago

      Thanks! I'm not sure I know what you mean. Can you post a small snippet of what the data look like?

  • @N1loon 3 years ago +3

    Even though it's a task most people despise, I actually really enjoy pre-processing steps before doing visualizations or creating models. It can be really satisfying reading in a messy dataset and cleaning it so that it's in a tidy format :D
    Although I needed some time to fully wrap my head around the gather/spread functions (now pivot_longer and pivot_wider). And I still struggle from time to time conceptualizing how to get from dataframe X to dataframe Y putting it in either a long or wide format...

    • @Riffomonas 3 years ago +1

      Thanks for watching! I’ll be sure to include more of these types of transformations in future episodes
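
Since the long versus wide conversion comes up in this thread, here is a small sketch of a round trip with pivot_longer() and pivot_wider(). The sample and OTU names are made up; this is only meant to show the shape change, not the workflow from the video.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per sample, one column per OTU
wide <- tibble(
  sample = c("s1", "s2"),
  otu1 = c(10, 0),
  otu2 = c(5, 7)
)

# Wide to long: every sample/OTU pair becomes its own row
long <- wide %>%
  pivot_longer(-sample, names_to = "otu", values_to = "count")

# Long back to wide: one column per OTU again
wide_again <- long %>%
  pivot_wider(names_from = otu, values_from = count)

long
wide_again
```

One way to think about it: names_to and values_to name the two new columns that the old column headers and cell values are moved into, and names_from and values_from reverse that.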

  • @sunkumargurung1172 2 years ago +1

    Thanks a lot, it helped me a lot

    • @Riffomonas 2 years ago

      Wonderful - thanks for watching!

  • @afonsoosorio2099 1 year ago

    Hi Pat, this is great on joining tables using a common id.
    I am an aspiring data analyst and a beginner with R. Do you have any idea how to read multiple *.csv files from a given path into R and append (bind) them in a few explicit steps?
    All files have a common structure (same headers) and are CSV formatted. There are 12 monthly datasets.
    I appreciate your assistance.
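
One possible way to do this in a few explicit steps is sketched below. It is not from the video: the folder, file names, and columns are invented, and the two small CSV files are written to a temporary folder only so the example runs as is. In practice `data_dir` would point at the folder holding the 12 monthly files.

```r
library(readr)
library(dplyr)

# Create a throwaway folder with two small CSV files (stand-ins for real data)
data_dir <- file.path(tempdir(), "monthly_csv")
dir.create(data_dir, showWarnings = FALSE)
write_csv(tibble(id = 1:2, value = c(10, 20)), file.path(data_dir, "2021-01.csv"))
write_csv(tibble(id = 3:4, value = c(30, 40)), file.path(data_dir, "2021-02.csv"))

# Step 1: list every file in the folder whose name ends in .csv
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Step 2: read each file into its own data frame and label it by file name
tables <- lapply(csv_files, read_csv, col_types = cols())
names(tables) <- basename(csv_files)

# Step 3: stack the data frames; .id records which file each row came from
all_months <- bind_rows(tables, .id = "source_file")

all_months
```

Because the files share the same headers, bind_rows() lines the columns up by name, and the source_file column keeps track of the month each row belongs to.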

  • @patriciamiller8286 2 years ago

    What do I do if a few of my taxonomic classes are missing and are replaced by NA? For example, if I have Kingdom to Order but Family and Genus are missing ("Kingdom: Bacteria, Phylum: Firmicutes, Class: Bacilli, Order: Bacillales" but nothing else afterwards), then following this video, the family and genus become NA. If I omit them, the entire row disappears (at least that's what I think happens).
    But I still want those rows because they add to the diversity calculations... I may remove them later when I want to discuss taxa specifically, but for alpha and beta diversity I want to keep them in. (Hope this is making sense)
    Also, I want to remove the Eukaryota rows without having to go into Excel and do it manually.

    • @patriciamiller8286 2 years ago

      FYI: I solved the last question; I removed Eukaryota and Unassigned by using filter(str_detect(taxonomy, "Bacteria")) ... for anyone interested :)

    • @patriciamiller8286 2 years ago

      forgot that str_detect is part of the stringr package

    • @patriciamiller8286 2 years ago

      realizing the video actually answers this but I didn't "get" it the first time. 😝

    • @patriciamiller8286 2 years ago

      Actually, it doesn't, so I added 'mutate(., replace_na(., ""))' in the pipeline after the separate pipe and now I have the blank spaces I needed - for anyone else needing this info hope that helped! Love this channel!!

    • @Riffomonas 2 years ago

      Thanks so much for watching and working with the code using your own data. That’s the best way to learn!
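
The two steps described in this thread (keeping only Bacteria rows, then blanking out missing ranks) might look roughly like the sketch below. The taxonomy strings, the ";" separator, and the column names are assumptions for illustration, and the NA replacement uses across() with replace_na() rather than the exact call quoted above.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical taxonomy table; strings and separator are made up
tax_table <- tibble(
  otu = c("otu1", "otu2", "otu3"),
  taxonomy = c(
    "Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus",
    "Bacteria;Firmicutes;Bacilli;Bacillales",
    "Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo"
  )
)

ranks <- c("kingdom", "phylum", "class", "order", "family", "genus")

cleaned <- tax_table %>%
  # Keep only rows whose taxonomy string mentions Bacteria
  filter(str_detect(taxonomy, "Bacteria")) %>%
  # Split into one column per rank; ranks missing on the right become NA
  separate(taxonomy, into = ranks, sep = ";", fill = "right") %>%
  # Replace NA with an empty string, in the rank columns only
  mutate(across(all_of(ranks), ~ replace_na(.x, "")))

cleaned
```

The partially classified row stays in the table, so it still counts toward diversity calculations, and its family and genus columns are empty strings instead of NA.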

  • @alexw5126 4 months ago

    King Phillip Came over for Good Spaghetti, Awesome!

    • @Riffomonas 4 months ago +1

      lol - glad you love it 🤓

  • @CristinaCampbell 3 years ago

    How would you join similar data? I'm pulling temperature data from several dataloggers in the field. The datasets all have the same column names (except for logger ID). I need the data for all the loggers to be aligned by time and grouped by datalogger, but I'm not sure how to get there. When I inner join by time I end up with several columns of temp (R gives them all unique names), and I'm not sure how to align them in time and then group by datalogger. Thanks for any insight. Love your channel!
    combo
    # A tibble: 1,776 x 5
      time               f.x  dl34   f.y  dl35
    1 10/22/2021 15:00    87     1    87     1
    2 10/22/2021 16:00    87     2    87     2

    • @Riffomonas 3 years ago +1

      Try doing the join without the by argument. Alternatively, you could also do by=c("time", "f", etc.)
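
To make the question above concrete, here is a hedged sketch with two invented logger tables in a simplified layout (only `time` and `f`, which is not exactly the layout in the question). It first reproduces the f.x / f.y result of joining on time alone, then shows a different technique from the reply above: stacking the tables with bind_rows() so that each logger is identified by a column and can be grouped.

```r
library(dplyr)

# Hypothetical logger tables with identical columns (simplified layout)
dl34 <- tibble(
  time = c("10/22/2021 15:00", "10/22/2021 16:00"),
  f = c(87, 87)
)
dl35 <- tibble(
  time = c("10/22/2021 15:00", "10/22/2021 16:00"),
  f = c(87, 86)
)

# Joining on time alone keeps both temperature columns, renamed f.x and f.y
wide <- inner_join(dl34, dl35, by = "time")

# Stacking keeps a single f column and records the logger in its own column
stacked <- bind_rows(dl34 = dl34, dl35 = dl35, .id = "logger")

# One row per logger per time point, so group_by(logger) works directly
logger_means <- stacked %>%
  group_by(logger) %>%
  summarize(mean_f = mean(f))

wide
stacked
logger_means
```

Whether joining or stacking fits better depends on what the downstream analysis needs; the stacked form is the one that supports grouping by datalogger without renamed columns.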