Scraping weather data from the internet with R and the tidyverse (CC231)

  • Published: Dec 1, 2024

Comments • 43

  • @MrMandarpriya 2 months ago

    Thanks a ton, Sir. I am in Germany and I was able to get the latitude and longitude for my place. This is so incredible.

    • @Riffomonas 2 months ago

      Wonderful - that's great! 🤓

  • @davidmantilla1899 2 years ago +2

    Your tutorials are great. I have a purely wet bio background and your videos helped me kickstart my computational biology literacy. Thank you for openly sharing your knowledge.

    • @Riffomonas 2 years ago +1

      My pleasure! Thanks for watching David 🤓

  • @sven9r 2 years ago +6

    For everybody having a hard time with parentheses like Pat has at 13:00:
    Tools -> "Global Options" -> "Code" -> the "Display" tab at the top, then tick "Rainbow parentheses".

    • @Riffomonas 2 years ago

      You don’t like my “see if we get an error message”? 😂

    • @sven9r 2 years ago +1

      Not at all! I'm loving it! But beginners often struggle with this stuff!
      Cheers

    • @Riffomonas 2 years ago

      @sven9r 🤣

    • @yaqinguo8971 2 years ago

      It's a good hint. But, interestingly, I did not have this option.

  • @eric13hill 2 years ago +1

    This is my favorite video of yours. It is so useful for what I want to do. Thanks!

    • @Riffomonas 2 years ago

      That’s awesome to hear! What part do you find most useful?

  • @NdengoMarcel 2 years ago +1

    This tutorial is very interesting in practice. I managed to run the entire code using my local latitude and longitude, as you suggested. It did work. The variables I was interested in were TMAX and PRCP. In Rwanda we do not have SNOW. Thanks a lot.

    • @Riffomonas 2 years ago

      Wonderful! I'm glad to hear you got it working. Sorry that you all miss out on snow 😂

  •  2 years ago +1

    Excellent! There's one station in my city!

  • @sven9r 2 years ago +1

    Great episode as always! I just finished a course about German raster data with some students :)

    • @Riffomonas 2 years ago

      Awesome! As always thanks for watching 🤓

  • @zjardynliera-hood5609 2 years ago +2

    I love this, use the rainbow parentheses btw!!

    • @Riffomonas 2 years ago

      Hah! I try to stick close to the defaults so beginners don't get too freaked out when they see something that looks different from their computer

  • @djangoworldwide7925 2 years ago +1

    + looks like a fun assignment to create a shiny dashboard containing time series plots of this data

    • @Riffomonas 2 years ago +1

      Yeah I’ve thought about this but I’d probably build all the plots in the backend using a cron job or something. Then serve them up with minimal JavaScript. I don’t think the overhead of shiny would really be necessary 🤷‍♂️

  • @haraldurkarlsson1147 2 years ago +1

    This is great! I like how you build it up and have a specific goal in mind. This is also a problem any of us can tackle since the data is readily available.
    I typically write my own code for these sorts of exercises (since at least I can understand my own code); that is how I learn best. I came up with a slightly different way of finding my closest weather station. I wrote a couple of functions to do this, tested the distance on Houston-Chicago, and got pretty close. Here is how I tackled the problem.
    I set up two functions to run inside tidyverse - so used rlang (hence the enquo() and the bang bang !!).
    The first function converts to radians:
    radians_func [...] %>%
    distinct(station) %>%
    pull(station)
    My closest station was about 500 m from my current location but has only operated for a couple of years. The filter gave me another station about 4 km away with a more extensive record. I decided to filter for stations with a record of over 100 years (although it is not clear what kind of record that is).
    It seems like the search should be more focused, though. What are we after? Temperature it seems. And it seems like that is the one variable most often measured.
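The code in the comment above did not survive intact, so what follows is only a minimal sketch of the approach it describes: convert degrees to radians, then compute a great-circle distance. It is written in base R rather than with rlang. The function names `to_radians` and `haversine_km`, the 6371 km mean Earth radius, and the approximate city coordinates are all assumptions for illustration, not the commenter's code.

```r
# Hypothetical helper: convert degrees to radians
to_radians <- function(degrees) {
  degrees * pi / 180
}

# Hypothetical helper: great-circle distance in km (haversine formula)
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  phi1 <- to_radians(lat1)
  phi2 <- to_radians(lat2)
  d_phi <- phi2 - phi1
  d_lambda <- to_radians(lon2 - lon1)
  a <- sin(d_phi / 2)^2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Sanity check on approximate (assumed) Houston and Chicago coordinates
houston_chicago_km <- haversine_km(29.76, -95.37, 41.88, -87.63)
```

Because the arithmetic is vectorized, a function like this could be called inside `mutate()` on latitude and longitude columns and the result sorted to find the nearest station.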

  • @haraldurkarlsson1147 2 years ago +1

    I had no trouble pulling up data for my best neighborhood station. However, my question is about the temperature: what is the unit? Kelvin?

    • @Riffomonas 2 years ago +1

      I think that was a question that flashed up in the last 5 min or so of the episode. I’ll definitely cover it in tomorrow’s episode

  • @lancesnodgrass8016 10 months ago

    I'm having issues finding the same website as shown at 1:45 and beyond. Any info on how the path has changed from a year ago?

    • @Riffomonas 8 months ago

      I just checked it and everything was working. Perhaps the site was down when you tried.

  • @haraldurkarlsson1147 2 years ago +1

    Pat,
    I used vroom to read in the file and it read it fast and detected the columns. The only thing I had to do was to clean the column names.

    • @Riffomonas 2 years ago

      Great - I haven’t tried vroom yet

  • @haraldurkarlsson1147 2 years ago

    I must add that vroom read it in fast (lazy loading, I suspect), but I am not so sure about the column allocations. It seems to have created new ones with mixed-type data. So be aware.

    • @Riffomonas 2 years ago

      Sometimes the simpler packages are good enough

  • @ahmedmostafaahmedkamel8532 1 year ago

    Where is this script, please?

  • @djangoworldwide7925 2 years ago +1

    I might be wrong but mehh, I'm just gonna make this assumption.
    Science in a nutshell 😅
    Great tutorial, sir. I always enjoy your videos since I learn so much more than what I came for (could you elaborate on top_n? I couldn't quite grasp that one)

    • @Riffomonas 2 years ago

      Thanks for the question! top_n returns the n rows (plus ties) for a particular variable that have the highest value. If you give it a negative number you’ll get the smallest values. There’s also slice_min and slice_max which are a bit similar
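To make the reply above concrete, here is a small sketch using made-up readings with a tie. The `readings` table and its column names are invented for illustration; `top_n()`, `slice_max()`, and `slice_min()` are dplyr functions, and the `with_ties` argument is shown because tie handling is the part that tends to surprise people.

```r
library(dplyr)

# Made-up readings with a tie at 3 (illustrative data only)
readings <- tibble(
  station = c("A", "B", "C", "D"),
  value = c(5, 3, 3, 1)
)

# Two highest values; the tie at 3 means more than two rows come back
top_two <- top_n(readings, 2, value)

# A negative n asks for the smallest values instead
bottom_one <- top_n(readings, -1, value)

# slice_max()/slice_min() do a similar job; ties are kept unless turned off
max_two <- slice_max(readings, value, n = 2)
max_two_strict <- slice_max(readings, value, n = 2, with_ties = FALSE)
min_one <- slice_min(readings, value, n = 1)
```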

  • @kmbrahm 2 years ago

    TMAX looks very high, is that combining rows?

    • @kmbrahm 2 years ago +1

      Answered my own question: TMAX = Maximum temperature (tenths of degrees C)

    • @Riffomonas 2 years ago +1

      Good sleuthing! I’ll fix this and the precipitation in the next video 🤓
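As a rough sketch of the fix being discussed, the stored values can be divided by 10 to get degrees C. The data frame and column names below are made up, and treating precipitation as tenths of a millimetre is an assumption to check against the dataset's documentation; only the TMAX unit is confirmed in this thread.

```r
# Made-up raw values (illustrative only), stored in tenths of a unit
raw <- data.frame(
  date = as.Date(c("2022-01-01", "2022-01-02")),
  tmax_tenths_c = c(-12, 235),
  prcp_tenths_mm = c(0, 58)  # assumption: precipitation is in tenths of mm
)

# Divide by 10 to get degrees C and (assumed) mm
weather <- transform(
  raw,
  tmax_c = tmax_tenths_c / 10,
  prcp_mm = prcp_tenths_mm / 10
)
```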

  • @haraldurkarlsson1147 2 years ago

    Pat,
    Webscraping has, at least in my mind, a different meaning than what you are doing here. It uses rvest etc. It might be misleading for those looking for actual webscraping.

    • @Riffomonas 2 years ago

      🤷‍♂️I’m getting data from a website. It’s a form of webscraping

  • @haraldurkarlsson1147 2 years ago

    Must be F with errant readings...