Using readxl and dplyr to format messy data to see change in poverty with R (CC335)

  • Published: Feb 3, 2025

Comments • 16

  • @muhammedhadedy4570 9 days ago

    Oh Allah. This video alone is worth tons of paid courses.
    I really don't know how to thank you. I appreciate your work, my dear professor.
    Greetings from Egypt.
    ❤❤❤❤

    • @Riffomonas 8 days ago +1

      Fantastic! Glad it was useful 🤓

  • @tedhermann3424 11 days ago +4

    Great video! I think what you wanted for converting all your columns to numeric was the across function, e.g., mutate(across(total:percent, as.numeric)). You can use it with summarize as well. Also, FYI, the code in your linked blog post looks to be from your gapminder episode!
    Any thoughts on your next series? The targets package or tidymodels could be interesting.

    • @Riffomonas 11 days ago +1

      Thanks for the across tip! I'll keep tidymodels in mind for the future

  • @borinsroy8992 8 days ago +1

    At 16:44 use mutate(across(-name, as.numeric))
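
The two across() suggestions above can be compared side by side. This is only a sketch: the `poverty` tibble and its values are made up for illustration and merely borrow the column names mentioned in the comments (name, total, percent); `number` is a hypothetical extra column.

```r
library(dplyr)

# Hypothetical stand-in for the imported table: every column arrives as text
poverty <- tibble(
  name    = c("Alabama", "Alaska"),
  total   = c("4900", "720"),
  number  = c("760", "75"),
  percent = c("15.5", "10.4")
)

# Convert a contiguous range of columns, selected by position
by_range <- poverty %>% mutate(across(total:percent, as.numeric))

# Or convert everything except the state name
by_exclusion <- poverty %>% mutate(across(-name, as.numeric))
```

Both forms give the same result here. The exclusion form may be more robust if columns are later added or reordered, as long as every remaining column really is numeric.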

  • @ahmed007Jaber 7 days ago +1

    Nicely executed, Pat
    If I were to do it, I would approach it differently
    I would use regex to extract the \\d{4} as year
    Then fill down NAs
    Then skip the first couple of rows
    Then mutate(across(colname:colname,as.numeric))
    Then rename
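
A rough sketch of the year-extraction, fill-down, and numeric-conversion steps listed above. It assumes the year appears as its own row at the top of each block of states; the `raw` tibble, its column names, and its values are invented for illustration and do not reproduce the real spreadsheet layout.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical layout: a year row, followed by that year's state rows
raw <- tibble(
  name    = c("2023", "Alabama", "Alaska", "2022", "Alabama", "Alaska"),
  total   = c(NA, "5000", "720", NA, "4950", "715"),
  percent = c(NA, "15.0", "10.0", NA, "15.5", "10.5")
)

tidy <- raw %>%
  # Pull a leading four-digit year; state rows yield NA
  mutate(year = as.numeric(str_extract(name, "^\\d{4}"))) %>%
  # Carry each year down over the state rows beneath it
  fill(year, .direction = "down") %>%
  # Drop the year-only rows, which carry no data
  filter(!is.na(total)) %>%
  mutate(across(total:percent, as.numeric))
```

Skipping the leading rows and renaming columns would happen at import time and are left out of this sketch.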

    • @ahmed007Jaber 7 days ago

      The last step would be to promote the first row to headers after skipping the top rows.
      The interesting question is how you would approach annotating peaks and troughs in the line: dynamic annotation, so that the labels update whenever the data change.
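
One possible way to get annotations that follow the data is to compute the extreme points from the plotted data frame and pass them to a text layer, rather than hard-coding coordinates. This is a sketch under that assumption, not the approach from the video; the `national` series and its values are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical yearly series standing in for the plotted line
national <- tibble(
  year = 2015:2020,
  pct  = c(13.5, 12.7, 12.3, 11.8, 10.5, 11.4)
)

# Recomputed from the data on every run, so the labels move if the data change
extremes <- bind_rows(
  slice_max(national, pct, n = 1, with_ties = FALSE),
  slice_min(national, pct, n = 1, with_ties = FALSE)
)

p <- ggplot(national, aes(x = year, y = pct)) +
  geom_line() +
  geom_point(data = extremes) +
  geom_text(data = extremes, aes(label = pct), vjust = -1)
```

This labels only the global maximum and minimum. Labeling every local peak and trough would need a different rule, for example comparing each value with its neighbors via lag() and lead().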

  • @PeperazziTube 11 days ago +3

    One small point of pedantic nitpicking: taking the average of the state poverty rates does not give the national poverty rate, because state populations vary by two orders of magnitude. The original data has the population by state and year, so a national rate could be calculated with data %>% summarize(pct_national = sum(in_poverty) / sum(population), .by = year)

    • @Riffomonas 11 days ago +1

      You're of course correct - thanks for catching this! When I used code like yours it doesn't appear that the line moves meaningfully from what I had in the video. Well done 🤓
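
To make the distinction in this thread concrete, here is a small sketch with made-up numbers. The `states` tibble is hypothetical; it only borrows the column names in_poverty and population from the comment above. The `.by` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical data: one large and one small state over two years
states <- tibble(
  year       = c(2020, 2020, 2021, 2021),
  name       = c("Big", "Small", "Big", "Small"),
  population = c(1000, 100, 1000, 100),
  in_poverty = c(100, 30, 120, 20)
)

national <- states %>%
  summarize(
    # Simple mean of state rates: every state counts equally
    pct_unweighted = mean(100 * in_poverty / population),
    # Pooled rate: total people in poverty over total population
    pct_national   = 100 * sum(in_poverty) / sum(population),
    .by = year
  )
```

With these invented numbers the two measures differ noticeably because the small state has an unusual rate. With the real data the gap was apparently small, as the reply above notes.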

  • @PhilippusCesena 11 days ago

    Thanks for the very useful video. Unfortunately, we often find ourselves having to deal with datasets that have been collected in a rather unorganized manner.

    • @Riffomonas 10 days ago +1

      There used to be a hashtag... #otherpeoplesdata that cataloged some of the more humorous challenges 🤓

  • @fabianhellmold9331 11 days ago

    Another great video. Your plots have helped me a lot with the visualizations for a master's thesis. When using lineend = "round", I noticed that the keys in the legend change strangely. Any tips on how to fix this?

    • @Riffomonas 11 days ago

      Thanks! Hmmm, I'm not seeing that. If I do the following it looks ok...
      library(tidyverse)
      library(gapminder)
      gapminder %>% filter(country %in% c("India", "Afghanistan")) %>% ggplot(aes(x = year, y = lifeExp, color = country)) + geom_line(lineend = "round", linewidth = 2)

    • @fabianhellmold9331 10 days ago

      @Riffomonas In my example, I use geom_line and geom_segment together, each with a different color grouping. lineend = "round" draws lines in the legend keys that extend to the left and right. To stay with your code:
      library(tidyverse)
      library(gapminder)
      gapminder %>%
        filter(country %in% c("India", "Pakistan")) %>%
        ggplot(aes(x = year, y = lifeExp, color = country)) +
        geom_segment(aes(y = gdpPercap/10, xend = year, yend = 0,
                         color = factor(gdpPercap > mean(gdpPercap))),
                     linewidth = 4.8, alpha = 1) +
        geom_line(lineend = "round", linewidth = 2)

    • @Riffomonas 10 days ago

      @fabianhellmold9331 Hey - I'm not seeing a difference whether lineend = "round" is set or not. It looks like the four values of country have two different line widths. If you want to simplify the legend to only have one linewidth, you could add this to the end of your code...
      +
      scale_color_discrete(guide = guide_legend(override.aes = list(linewidth = 1)))

    • @fabianhellmold9331 10 days ago

      @Riffomonas Thanks a lot! That actually improved my legend :)