I am very glad that I found your channel.
You do magic. You are great!
Well done, Joe!
If you get an error like this: AttributeError: 'SelectReactor' object has no attribute '_handleSignals'
Try installing an earlier version of Twisted:
pip install Twisted==22.10.0
That did the trick for me.
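If you want to confirm which Twisted version you have before downgrading, a small check like the one below compares the installed version against the 22.10.0 pin from the comment above. It's only a sketch: it knows about that one pin, not which Twisted release actually introduced the `_handleSignals` error, and its simple parser stops at pre-release suffixes such as `rc1`.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported to work in the comment above. This is an assumption taken
# from that comment, not an official Scrapy/Twisted compatibility bound.
PINNED_TWISTED = (22, 10, 0)


def parse_version(text: str) -> tuple:
    """Turn '22.10.0' into (22, 10, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def newer_than_pin(installed: str, pin: tuple = PINNED_TWISTED) -> bool:
    """True if the installed Twisted version sorts after the pinned one."""
    return parse_version(installed) > pin


if __name__ == "__main__":
    try:
        installed = version("Twisted")
    except PackageNotFoundError:
        print("Twisted is not installed")
    else:
        if newer_than_pin(installed):
            print(f"Twisted {installed} is newer than 22.10.0; "
                  "consider: pip install Twisted==22.10.0")
        else:
            print(f"Twisted {installed} is at or below the pinned version")
```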
Thank you for your guide video 👍👍👍👍 I will refer to it as I proceed with the project.
That's great!
Apparently Splash is no longer running JavaScript. Does anyone know what is going on?
Thank you for your tutorial, man; it was very helpful.
Is there a way to retrieve information using the lua_script and store it to be used later? For example, on a website that displays info across pages, I want to get the info from some elements on page one, but also on page two, and so on. I'm guessing I could use a loop in the lua_script and then return that information, but I don't know anything about the Lua language.
Thanks again for your tutorial. It was straightforward and cleared up a lot of my doubts.
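One way to handle this without writing more Lua is to keep the Lua script responsible for rendering a single page and let the Python side handle pagination: extract the elements from the current page, then request the next page until there isn't one. In Scrapy the usual pattern is to yield a new request for the next page from your parse callback and yield each page's items as you go, letting item pipelines store them. The sketch below shows only that accumulate-and-follow control flow in plain Python. `fetch_page` and the fake in-memory pages are hypothetical stand-ins for Splash-rendered responses, not part of Scrapy or scrapy-splash.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: given a URL, return the items scraped from that page
# and the URL of the next page (None on the last page). In a real spider this
# would be a Splash-rendered request plus CSS selectors.
FetchPage = Callable[[str], Tuple[List[dict], Optional[str]]]


def collect_all_pages(fetch_page: FetchPage, start_url: str,
                      max_pages: int = 50) -> List[dict]:
    """Follow next-page links, accumulating the items from every page."""
    collected: List[dict] = []
    seen = set()
    url: Optional[str] = start_url
    # Stop on the last page, on a repeated URL, or after max_pages pages.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        items, url = fetch_page(url)
        collected.extend(items)
    return collected


# Fake pages standing in for rendered responses (hypothetical data).
FAKE_PAGES = {
    "/page/1": ([{"quote": "a"}], "/page/2"),
    "/page/2": ([{"quote": "b"}, {"quote": "c"}], None),
}

if __name__ == "__main__":
    print(collect_all_pages(FAKE_PAGES.__getitem__, "/page/1"))
    # → [{'quote': 'a'}, {'quote': 'b'}, {'quote': 'c'}]
```

The `seen` set and `max_pages` cap guard against sites whose "next" link loops back or never ends.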
How can I check what the best way to collect data is?
Hi, I wanted to ask about splash:send_keys(""), which doesn't work on websites like OLX or YouTube. I think they are using something like Cloudflare to stop bots from searching via the search bar, because the same Splash script works for Google (just changing the CSS selector to Google's search bar) but it won't work for OLX or YouTube. Is there anything you can do, or do you just have to make a request link to search for a product?
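The "request link" approach usually means skipping the search box and putting the search term straight into the results-page URL. A minimal sketch using the standard library is below. YouTube's results page takes the term in a `search_query` parameter; for OLX you would need to check the URL the site produces when you search by hand, so the base URL and parameter name are assumptions you supply. This only avoids typing into the search bar; it does not get around any bot protection the site uses.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query_param: str, query: str) -> str:
    """Encode the search term into the query string of a results-page URL."""
    # urlencode escapes spaces as '+' and special characters such as '&'.
    return f"{base_url}?{urlencode({query_param: query})}"


if __name__ == "__main__":
    print(build_search_url("https://www.youtube.com/results",
                           "search_query", "scrapy splash"))
    # → https://www.youtube.com/results?search_query=scrapy+splash
```

The resulting URL can then be passed to Splash (or a plain request) like any other start URL.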
Where did you cover the contents of your items.py file?
We have it all in the github project which is linked in the description! Here's a link directly to the items.py file: github.com/python-scrapy-playbook/quotes-js-project/blob/main/quotes_js_scraper/items.py
@scrapeops Thank you!
1. Does page rendering take longer than regular requests?
Yes, rendered requests take longer because they use a headless browser, which makes anywhere from 1 to 100 extra requests behind the scenes to load a page (fetching CSS and JS files and making network requests), depending on the page you are trying to scrape. Rendered requests also typically consume more bandwidth, so they can be more expensive if you use proxies that charge per GB.
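To see why the extra requests matter for per-GB proxy pricing, here is a rough back-of-the-envelope estimator. Every number in it (page size, resource size, request count, price per GB) is a hypothetical placeholder rather than a measurement, and it treats 1 GB as 1024 MB; plug in figures from your own target site and proxy plan.

```python
def estimate_proxy_cost_usd(pages: int, html_mb: float, extra_requests: int,
                            avg_resource_mb: float, price_per_gb: float) -> float:
    """Rough proxy bill for a crawl billed per GB transferred (1 GB = 1024 MB)."""
    # A rendered page downloads the HTML plus every extra resource it loads.
    per_page_mb = html_mb + extra_requests * avg_resource_mb
    total_gb = pages * per_page_mb / 1024
    return total_gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical crawl: 10,000 pages, 0.25 MB of HTML each, $5 per GB.
    plain = estimate_proxy_cost_usd(10_000, 0.25, 0, 0.05, 5.0)
    # Same crawl rendered, assuming 40 extra requests of ~0.05 MB per page.
    rendered = estimate_proxy_cost_usd(10_000, 0.25, 40, 0.05, 5.0)
    print(f"plain: ${plain:.2f}, rendered: ${rendered:.2f}")
```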
@scrapeops Thanks for the detailed answer!
I hate how you always use this basic website. Can you actually use a real website?
The issue with using a "real website" is that most real websites get updated frequently, so the code/example article would break and even more people would be having issues!