Scrapy Splash: How to scrape JS rendered websites (2022)

  • Published: 22 Dec 2024

Comments • 19

  • @makedatauseful1015
    @makedatauseful1015 2 years ago +1

    I am very glad that I found your channel.

  • @umair5807
    @umair5807 1 year ago +1

    You do the magic, you are great!

  • @Dalaimaris
    @Dalaimaris 2 years ago +2

    Well done Joe

  • @tomgreg2008
    @tomgreg2008 1 year ago +2

    If you get an error like this: AttributeError: 'SelectReactor' object has no attribute '_handleSignals',
    try installing an earlier version of Twisted:
    pip install Twisted==22.10.0
    That did the trick for me.

  • @MingiCho-mm4cm
    @MingiCho-mm4cm 2 years ago +2

    Thank you for your guide video👍👍👍👍 I will refer to it and proceed with the project

  • @tactiguay7154
    @tactiguay7154 1 year ago +1

    Apparently Splash is no longer running JavaScript, does anyone know what is going on?

  • @Rodourmex
    @Rodourmex 8 months ago

    Thank you for your tutorial man, it was very helpful for me.
    Is there a way to retrieve information using the lua_script and store that information to be used later? For example, a website that displays info in pages: I want to get the info of some elements on page one, but also on page two, and so on. I'm guessing that maybe I can use a loop in the lua_script and then return that information, but I don't know anything about the Lua language.
    Thanks again for your tutorial, it was straightforward and cleared up a lot of doubts.
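
    A minimal sketch of that idea, assuming the quotes.toscrape.com/js site used in the video and the standard scrapy-splash project setup: the Lua script below loops over a few pages, grabs each page's HTML, and returns everything to the spider in a single JSON response. The base_url, max_pages and CSS selectors are illustrative assumptions, not values taken from the tutorial.

      import scrapy
      from scrapy_splash import SplashRequest

      # Lua script that visits several pages and returns each page's HTML,
      # so the spider can parse all of them from one Splash response.
      lua_paginate = """
      function main(splash, args)
          local pages = {}
          for i = 1, args.max_pages do
              local url = args.base_url .. "page/" .. i .. "/"
              assert(splash:go(url))
              splash:wait(1)
              pages[i] = splash:html()
          end
          return {pages = pages}
      end
      """

      class PaginatedQuotesSpider(scrapy.Spider):
          name = "paginated_quotes"

          def start_requests(self):
              yield SplashRequest(
                  url="https://quotes.toscrape.com/js/",
                  callback=self.parse,
                  endpoint="execute",
                  args={
                      "lua_source": lua_paginate,
                      "base_url": "https://quotes.toscrape.com/js/",
                      "max_pages": 3,
                  },
              )

          def parse(self, response):
              # The Lua script returned JSON; each entry is one page's HTML.
              for page_html in response.data["pages"]:
                  sel = scrapy.Selector(text=page_html)
                  for quote in sel.css("div.quote"):
                      yield {
                          "text": quote.css("span.text::text").get(),
                          "author": quote.css("small.author::text").get(),
                      }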

  • @makedatauseful1015
    @makedatauseful1015 2 years ago

    How do you check what the best way to collect data is?

  • @pkavenger9990
    @pkavenger9990 1 year ago

    Hi, I wanted to ask: splash:send_keys("") does not work on websites like OLX or YouTube. I think they are using something like Cloudflare to stop bots from typing into the search bar, because the same Splash script works for Google by just changing the CSS selector for the Google search bar, but it won't work for OLX or YouTube. Is there anything you can do, or do you just have to make a request to a search link for a product?
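
    A small sketch of the fallback mentioned above: instead of typing into the search bar with splash:send_keys, build the search-results URL yourself and request it directly. The URL pattern and selector below are assumptions for illustration; check the actual query parameters the target site uses (your browser's network tab shows them when you search manually).

      from urllib.parse import quote_plus

      import scrapy
      from scrapy_splash import SplashRequest

      class SearchByUrlSpider(scrapy.Spider):
          name = "search_by_url"
          query = "laptop"

          def start_requests(self):
              # Hypothetical search-URL pattern; many sites expose the query
              # as a ?q= parameter or a path segment.
              url = f"https://www.example.com/search?q={quote_plus(self.query)}"
              yield SplashRequest(url, callback=self.parse, args={"wait": 2})

          def parse(self, response):
              # Parse the rendered results page as usual.
              for result in response.css("div.result"):
                  yield {"title": result.css("::text").get()}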

  • @NoName-lq7kt
    @NoName-lq7kt 2 years ago

    Where did you cover the contents of your items.py file?

    • @scrapeops
      @scrapeops  2 years ago +1

      We have it all in the github project which is linked in the description! Here's a link directly to the items.py file: github.com/python-scrapy-playbook/quotes-js-project/blob/main/quotes_js_scraper/items.py

    • @NoName-lq7kt
      @NoName-lq7kt 2 years ago

      @@scrapeops Thank you!
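
    For reference, a rough sketch of what an items.py for this quotes project might contain; the exact field names are a guess based on the quotes site, so check the linked file for the real definitions.

      import scrapy

      class QuoteItem(scrapy.Item):
          text = scrapy.Field()    # the quote text
          author = scrapy.Field()  # who said it
          tags = scrapy.Field()    # list of tag strings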

  • @makedatauseful1015
    @makedatauseful1015 2 years ago

    1. Does page rendering take longer than regular requests?

    • @scrapeops
      @scrapeops  2 years ago +1

      Yes, rendered requests take longer because they use a headless browser that makes 1-100 extra requests behind the scenes to load a page (fetching CSS and JS files and making network requests), depending on the page you are trying to scrape. Rendered requests typically consume more bandwidth as well, so they can be more expensive if you are using proxies where you pay per GB.

    • @makedatauseful1015
      @makedatauseful1015 2 years ago

      @@scrapeops Thanks for the detailed answer.
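
    A small sketch illustrating the trade-off described in the reply above: a plain scrapy.Request fetches only the raw HTML, while a SplashRequest drives a headless browser that also downloads CSS/JS and executes scripts, so it is slower and uses more bandwidth. The URL and the wait value are illustrative, and the usual scrapy-splash settings are assumed.

      import scrapy
      from scrapy_splash import SplashRequest

      class CompareRequestsSpider(scrapy.Spider):
          name = "compare_requests"
          start_url = "https://quotes.toscrape.com/js/"

          def start_requests(self):
              # Fast: a single HTTP request, but JS-generated content is missing.
              yield scrapy.Request(self.start_url, callback=self.parse_raw,
                                   dont_filter=True)
              # Slower: Splash loads the page like a browser and waits for JS.
              yield SplashRequest(self.start_url, callback=self.parse_rendered,
                                  args={"wait": 2})

          def parse_raw(self, response):
              self.logger.info("raw quotes found: %d",
                               len(response.css("div.quote")))

          def parse_rendered(self, response):
              self.logger.info("rendered quotes found: %d",
                               len(response.css("div.quote")))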

  • @konnen4518
    @konnen4518 1 year ago +4

    I hate how you always use this basic website. Can you actually use a real website?

    • @scrapeops
      @scrapeops  10 months ago +2

      The issue with using a "real website" is that most of the time they get updated frequently, and then the code/example article would be broken and even more people would be having issues!