How to Get Data from SQL Databases With R | Step-by-Step Tutorial

Поделиться
HTML-код
  • Опубликовано: 17 дек 2024

Комментарии • 13

  • @rappa753
    @rappa753  2 месяца назад +1

    Thank you for watching my latest video. ♥
    If you liked this video, I'm sure you're also going to love my Data Cleaning Master Class. Clean your data and find insights much faster at data-cleaning.albert-rapp.de/

  • @mignacio79
    @mignacio79 2 месяца назад +1

    Very nice tips and details on the use of glue and SQL query!!! Thanks for explaining them. They are a bit tricky.
    It could have been quite a while until I figured out where the error is. What a nice optimized approach you used for this part of the filtering.
    Great video Albert!

    • @rappa753
      @rappa753  2 месяца назад

      Nice! I'm glad that you find this so useful 🤗🥳

  • @rayflyers
    @rayflyers 2 месяца назад +1

    What a timely video. I'm working on a project using Duck DB from R. I know that Duck DB supports enums, nested data, and pivots, but I have no idea how to use them with DBI or dplyr. I'll put examples in the replies.

    • @rayflyers
      @rayflyers 2 месяца назад

      Example #1: Nested data. Count how many people have data across multiple years for each state (I'm American). My dplyr code looks like this:
      my_data |>
      distinct(person_id, state, year) |>
      summarize(
      ids = list(person_id),
      ids_in_this_year = n(),
      .by = c(state, year)
      ) |>
      mutate(
      ids_in_next_year = map2_int(ids, lead(ids), \(x, y) sum(x %in% y)),
      ids_in_last_year = map2_int(ids, lag(ids), \(x, y) sum(x %in% y)),
      .by = state
      )

    • @rappa753
      @rappa753  2 месяца назад

      Have you seen the {duckplyer} package yet? 🤔

    • @rayflyers
      @rayflyers 2 месяца назад

      Example 2: Convert numerical csv data into factors/enums. The data come in as 1, 2, or 99 and need to have the labels "Yes," "No," and "Unknown." Again, I have a dplyr example, but I'm not sure how to do this conversion with database enums.
      my_data |>
      mutate(
      x1 = factor(x1, levels = c(1, 2, 99), labels = c("Yes", "No", "Unknown")
      )

    • @rayflyers
      @rayflyers 2 месяца назад

      Example 3: Pivots. Trying to use dbplyr just tells me there's no method for pivoting even though I know Duck DB has pivots
      con |>
      tbl("my_data") |>
      select(state, year, ids_in_this_year) |>
      pivot_wider(
      names_from = state,
      values_from = ids_in_this_year
      )

    • @rayflyers
      @rayflyers 2 месяца назад

      ​@@rappa753 I have heard of duckplyr. My understanding is that it works with data in memory. My data are larger than memory, so they have to sit in a duckdb file.

  • @user-sq9wp2ge4h
    @user-sq9wp2ge4h 2 месяца назад +1

    In my first year of data science i went through pain of writing a parser, build with paste XD this would have been easier. Although some time later i found out about the dbplyr package, which literally converts tidyverse syntax into sql syntax. quite nice if the queries aren't too complex

    • @rappa753
      @rappa753  2 месяца назад

      Yeah dbplyr is nice but it can still be useful to use a bit of SQL manually with glue_sql(). But kudos for building a parser 🥳