Tidy Data and tidyr -- Pt 2 Intro to Data Wrangling with R and the Tidyverse

  • Published: Dec 5, 2024
  • Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
    tidyr.tidyverse...
    ----------------
    Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup • What is data wrangling...
    01:44 Intro and what’s covered
    Ground Rules
    02:40 What’s a tibble?
    04:50 Use `View()`
    05:25 The pipe operator
    07:20 What do I mean by data wrangling?
    Pt. 2: Tidy Data and tidyr • Tidy Data and tidyr --...
    00:48 Goal 1: Making your data suitable for R
    01:40 `tidyr` “Tidy” Data introduced and motivated
    08:10 `tidyr::gather`
    12:30 `tidyr::spread`
    15:23 `tidyr::unite`
    15:23 `tidyr::separate`
    Pt. 3: Data manipulation tools: `dplyr` • Data Manipulation Tool...
    00:40 Setup
    02:00 `dplyr::select`
    03:40 `dplyr::filter`
    05:05 `dplyr::mutate`
    07:05 `dplyr::summarise`
    08:30 `dplyr::arrange`
    09:55 Combining these tools with the pipe (setup for the grammar of data manipulation)
    11:45 `dplyr::group_by`
    15:00 `dplyr::group_by`
    Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins • Working with Two Datas...
    Combining two datasets together
    00:42 `dplyr::bind_cols`
    01:27 `dplyr::bind_rows`
    01:42 Set operations
    `dplyr::union`, `dplyr::intersect`, `dplyr::setdiff`
    02:15 Joining data
    `dplyr::left_join`, `dplyr::inner_join`, `dplyr::right_join`, `dplyr::full_join`
    ______________________________________________________________
    Cheatsheets: www.rstudio.co...
    Documentation:
    `tidyr` docs: tidyr.tidyverse.org/reference/
    `tidyr` vignette: cran.r-project...
    `dplyr` docs: dplyr.tidyverse...
    `dplyr` one-table vignette: cran.r-project...
    `dplyr` two-table (join operations) vignette: cran.r-project...
    ______________________________________________________________
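The Pt. 2 functions can be tried without the EDAWR package. Below is a minimal sketch, assuming tidyr is installed, that applies gather(), spread(), unite(), and separate() to small hand-built tables. The `cases` and `dates` tables and all their values are made up for illustration and are not the video's data. Current tidyr documentation lists gather() and spread() as superseded by pivot_longer() and pivot_wider(), but they still run.

```r
library(tidyr)

# Hypothetical toy data (made-up counts) in a wide layout:
# one row per country, one column per year.
cases <- data.frame(
  country = c("FR", "DE", "US"),
  `2011`  = c(10, 20, 30),
  `2012`  = c(11, 21, 31),
  `2013`  = c(12, 22, 32),
  check.names = FALSE  # keep the numeric-looking column names as-is
)

# gather(): collapse the year columns into key/value pairs (wide -> long).
cases_long <- gather(cases, key = "year", value = "n", `2011`:`2013`)

# spread(): the inverse, turning the key/value pairs back into columns.
cases_wide <- spread(cases_long, key = year, value = n)

# Hypothetical date parts stored in three columns.
dates <- data.frame(y = c(2011, 2012), m = c(1, 6), d = c(15, 30))

# unite(): paste several columns into one (the inputs are dropped by default).
dates_united <- unite(dates, col = "date", y, m, d, sep = "-")

# separate(): split that column back apart; convert = TRUE restores numbers.
dates_split <- separate(dates_united, col = date, into = c("y", "m", "d"),
                        sep = "-", convert = TRUE)
```

The round trip through gather() and spread() keeps every value, although the rows of `cases_wide` may come back in a different order from `cases`.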

Comments • 29

  • @MikeDolanFliss 4 years ago +8

    Looking forward to an update on this for the new pivot_longer() and pivot_wider() grammar!

  • @comditek4264 5 years ago +2

    Great video! Nicely explained and well delivered with graphics!

  • @eliebordron5599 4 years ago

    I learned so much by watching this. I regret that I wasn't able to download the datasets; I don't know whether it's me or the venerable age of the video.

  • @chamodhperera2485 4 years ago +3

    Can't download EDAWR from GitHub.
    Error: Failed to install 'EDAWR' from GitHub:
    (converted from warning) cannot remove prior installation of package ‘backports’

  • @AkshayRasal10 5 years ago

    Great video: well explained and easy to understand.

  • @shibukalidhasan5815 6 years ago +1

    Nicely presented - short and succinct

  • @tamal_sen 6 years ago +2

    @Garret: Please advise how to import the datasets. I have installed the "devtools" package but am unable to install the "EDAWR" package. I'm looking for your help. Thank you.

    • @williambiggs2308 6 years ago +1

      package ‘EDAWR’ is not available (for R version 3.4.1)

    • @VercingetoR3x 5 years ago

      @williambiggs2308 Using Anaconda, how does one create an environment with an older version of base R (3.5.1)? Is base R 3.4.1 needed to access the EDAWR package?

    • @FancyTreer032 5 years ago +1

      Hi, you can create them yourself.
      country

    • @amendez721 4 years ago

      @FancyTreer032 Thank you very much!

  • @johnsonmshiu4837 4 years ago +1

    Hi, is it possible to use the function separate() to split more than one column, using the pipe operator or any other method? Thanks.
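One way to answer this, as a minimal sketch: separate() splits one column per call, so with the pipe you can chain one call per column. The data frame `df` and its `date` and `range` columns are hypothetical examples, not data from the video.

```r
library(tidyr)
library(magrittr)  # provides the %>% pipe

# Hypothetical table with two compound columns to split.
df <- data.frame(
  id    = 1:2,
  date  = c("2011-01-15", "2012-06-30"),
  range = c("10/20", "30/40")
)

# One separate() call per column, chained with the pipe.
result <- df %>%
  separate(date, into = c("year", "month", "day"), sep = "-") %>%
  separate(range, into = c("low", "high"), sep = "/", convert = TRUE)
```

The new columns take the place of the column they came from. In tidyr 1.3.0 and later, separate() is marked superseded in favour of separate_wider_delim(), but the chained pattern is the same.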

  • @eyadha1 2 years ago

    Thank you very much for this helpful video.

  • @tekoeko 3 years ago

    So are gather and spread replaced by pivot_longer and pivot_wider?
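In short, yes: tidyr 1.0.0 introduced pivot_longer() and pivot_wider(), and the tidyr documentation now marks gather() and spread() as superseded (they still work, but new code is encouraged to use the pivot functions). A minimal sketch of the equivalents, using a made-up wide table rather than the video's data:

```r
library(tidyr)

# Hypothetical wide table (made-up values): one column per year.
cases <- data.frame(
  country = c("FR", "DE", "US"),
  `2011`  = c(10, 20, 30),
  `2012`  = c(11, 21, 31),
  check.names = FALSE
)

# pivot_longer(): the successor to gather(); every column except country
# becomes a (year, n) pair.
long <- pivot_longer(cases, cols = -country, names_to = "year", values_to = "n")

# pivot_wider(): the successor to spread(); year values become column names again.
wide <- pivot_wider(long, names_from = year, values_from = n)
```

The argument names (`names_to`/`values_to`, `names_from`/`values_from`) say which direction the data moves, which is the main readability gain over the `key`/`value` pair used by gather() and spread().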

  • @Sadia_AustralianJourneys 4 years ago

    Excellent presentation

  • @kvs123100 3 years ago

    12:23 you gave life to me!

  • @ゴリラ-w3h 5 years ago +2

    Really easy to understand!

  • @My-NaMeS_jEfF 3 years ago

    Do we really want to pivot_wider() pollution?
    library(ggplot2)
    library(dplyr)  # for %>%
    pollution %>%
      ggplot(aes(city, amount, group = size)) +
      geom_bar(aes(fill = size), stat = "identity", position = "dodge")
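The point of this comment can be checked with a self-contained sketch: ggplot2 works from the long layout, so a table with one row per city and size plots directly, with no pivot_wider() step. The `pollution` table here is a hypothetical stand-in with made-up cities and values, since EDAWR may not install; geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical long-format table with made-up values:
# one row per city-size combination.
pollution <- data.frame(
  city   = rep(c("A", "B", "C"), each = 2),
  size   = rep(c("large", "small"), times = 3),
  amount = c(20, 10, 25, 15, 30, 12)
)

# Bar heights come straight from `amount`; position = "dodge" places the
# large and small bars for each city side by side.
p <- ggplot(pollution, aes(x = city, y = amount, fill = size)) +
  geom_col(position = "dodge")
```

Printing `p` draws the chart. Mapping `fill = size` already groups the bars, so a separate `group = size` aesthetic is not needed here.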

  • @SensiStarToaster 4 years ago +1

    Out of date! Please post an update with the pivot_* functions, scoped variables, and something on non-standard evaluation, pleeeeze...

  • @YouTube_ZMS 5 years ago

    I would prefer more coding examples. It's 8 minutes in before the tidyr package is even introduced. Let's goooooooo

  • @musicspinner 3 years ago

    very helpful

  • @vincenzo4259 2 years ago

    Thanks

  • @InsiderMiner 3 years ago +1

    Nice presentation, but the audio is pretty poor. The concept of observation is key.

  • @jamespaz4333 3 years ago

    He looks like Marty McFly Senior!!!

  • @InsiderMiner 3 years ago

    Looking at your first use of gather(), it seems that you have not properly assessed what an observation is, have you? I would think an observation here is best defined as a country. Then the columns should be country name, count for 2011, count for 2012, and count for 2013, shouldn't they? The way you have it (Country, Year, N), what are the observational units? A year-country? Why not make it a country, as I have suggested?
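Which layout counts as tidy depends on what is taken as the observation. A minimal sketch with made-up counts, assuming dplyr and tidyr are installed: when the observation is a country-year, year is a column and per-year summaries are a grouped operation; the wide, one-row-per-country layout suits comparisons across years within a country. Converting between the two loses no information.

```r
library(dplyr)
library(tidyr)

# Hypothetical made-up counts in the long layout: one row per country-year.
long <- data.frame(
  country = rep(c("FR", "DE"), each = 3),
  year    = rep(2011:2013, times = 2),
  n       = c(10, 11, 12, 20, 22, 24)
)

# With year as a column, a per-year total is a grouped summary.
per_year <- long %>%
  group_by(year) %>%
  summarise(total = sum(n))

# The wide layout (one row per country) makes within-country comparisons,
# such as the change from 2011 to 2013, a simple column difference.
wide <- long %>%
  pivot_wider(names_from = year, values_from = n) %>%
  mutate(change = `2013` - `2011`)
```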