I wish this video had more exposure. I greatly appreciate that you took the time to put this series together. Being able to see these examples of the various mechanics behind Scrapy has been hugely helpful. Thank you again.
Glad you enjoyed it! Thank you
Your videos helped me understand Scrapy more than any other resource, ty!
Thanks! The documentation did not go into enough depth, and I'm glad someone made a comprehensive video on it.
Thanks for the tutorial; after watching it I have a much better understanding of the Scrapy ItemLoader docs.
Great tutorial! Very easy to follow, had no problems. About the typos: I'm the worst typist ever, but Tabnine always saves my life.
Thanks a lot. The videos are very clear. Would you mind explaining, in one of your next videos, the correct folder structure of a Scrapy project: what file goes where and why?
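For reference until such a video exists, this is the layout that scrapy startproject generates (the project name is a placeholder):

myproject/
    scrapy.cfg            # deploy/config file, lives at the project root
    myproject/            # the Python package for your project
        __init__.py
        items.py          # Item definitions: the fields you scrape
        middlewares.py    # spider and downloader middleware hooks
        pipelines.py      # post-processing and storage of scraped items
        settings.py       # project-wide settings
        spiders/          # one module per spider goes in here
            __init__.py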
Thank you! I watched the previous video and then this one, and it felt like I already knew so much about Scrapy. Really, really good videos. Keep it up!
Great tutorial!! It really helps with understanding.
Another great video ! Very well done John 👏🏼
Awesome vid! It answered my questions with Item Loaders. Docs were confusing me haha
I know! The docs are good but also, not so good haha
As the others have said, thanks for your time and effort, a great help. The links connecting to Amazon (e.g. the lighting link) are dead, and you might want to update them. On another front, have you added a video on caching? All in all, really well done, and, again, thanks.
Thanks. One of the issues with a lot of the scrapers I wrote is that they don't always age well! I haven't actually done anything on caching yet, no. I'll add it to my list.
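For what it's worth, Scrapy does ship with a built-in HTTP cache that can be switched on from settings.py; a minimal sketch (the expiry value is just an example):

# settings.py
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 3600   # treat cached pages older than an hour as stale
HTTPCACHE_DIR = "httpcache"        # stored under the project's .scrapy directory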
Thank you for sharing.
Great tutorial, explains exactly what I was looking for, thank you.
Great explanation as always, really helpful tutorial
Hello John, could you do a video on how to host Scrapy scripts?
Hi! Yes, I've been wanting to cover this for a while. Unfortunately ScrapyD doesn't work with the latest version of Scrapy, so the best alternative I could come up with was hosting the spider on a Linux server and using a cron job to run it every X hours. Would that be of interest?
@@JohnWatsonRooney Sounds great. Looking forward to that. I've been having challenges with how best to host my scraping scripts, and I know some among us face the same challenge. Thanks, your efforts are much appreciated.
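In the meantime, for anyone who wants to try the cron approach John describes, a minimal crontab sketch (the paths, spider name, and schedule are all assumptions to adapt):

# run the spider every 6 hours from the project root, logging output
0 */6 * * * cd /home/user/whiskyspider && /usr/local/bin/scrapy crawl whisky >> /home/user/scrape.log 2>&1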
Great video. Thanks!!!
Amazing Tutorial 😍
I do have a question: I am trying to get basic information from a shoe website, and the spider is only returning half the items on the site because of the DUPEFILTER setting. Maybe there are identical links for the same shoe in different colours, or multiple items sharing the same link, but if I try to change the filter setting it goes into an infinite loop. Is there a way around that?
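One approach, rather than disabling the dupe filter globally (which is what tends to cause the infinite loop): keep the filter on, pass dont_filter=True only on the product requests, and deduplicate yourself on a key that distinguishes variants. A rough sketch, where the selectors and the (url, colour) key are assumptions about the site:

import scrapy

class ShoeSpider(scrapy.Spider):
    name = "shoes"
    start_urls = ["https://example.com/shoes"]  # placeholder

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = set()  # our own variant-aware duplicate check

    def parse(self, response):
        for product in response.css("div.product"):
            url = product.css("a::attr(href)").get()
            colour = product.css(".colour::text").get()
            if (url, colour) not in self.seen:
                self.seen.add((url, colour))
                # dont_filter=True bypasses the dupefilter for this request
                # only, so a link shared by two colours still gets crawled
                yield response.follow(url, self.parse_product, dont_filter=True)

    def parse_product(self, response):
        yield {"name": response.css("h1::text").get()}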
Thanks a lot, what you do is amazing.
Thank you, I learned a lot from this video:)
I scraped a product, and some items are missing some data, so the result is a NoneType, meaning None.
In items.py I created a function to return something else when the value is None:

def check_gift(value):
    if value is None:
        return "No gift"
    else:
        return value

but it doesn't work. Where is the problem?
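The likely catch: with an item loader, a selector that matches nothing never calls your processor at all (and MapCompose drops None values), so check_gift never actually receives None. A common workaround is to add a fallback value after the selector call, so a TakeFirst output processor keeps the scraped value when it exists and the fallback otherwise. A sketch, where ProductItem and the selector are assumptions:

from scrapy.loader import ItemLoader

# inside the spider callback; "gift", the css, and ProductItem are illustrative
l = ItemLoader(item=ProductItem(), selector=product)
l.add_css("gift", ".gift::text")
l.add_value("gift", "No gift")  # only wins if the css matched nothing
yield l.load_item()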
I am having an issue where it seems like fetch(req) is going a bit too fast, so it's only catching part of the page. Is there a way to slow it down? I can find one for when the crawler is working, but not for when you're scraping in the shell. Thoughts?
How would I change a stock item in the item loader? It only returns "In Stock", or " " when things are out of stock. Would I create a function with a value and an if/else statement?
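Pretty much, yes: that if/else function can be wired straight into the field as an input processor. A sketch with hypothetical field names:

import scrapy
from itemloaders.processors import MapCompose, TakeFirst

def normalise_stock(value):
    # map the blank "out of stock" string to something explicit
    value = value.strip()
    return value if value else "Out of Stock"

class WhiskyItem(scrapy.Item):
    stock = scrapy.Field(
        input_processor=MapCompose(normalise_stock),
        output_processor=TakeFirst(),
    )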
Thanks man!
Another Excellent Video!!!
I have one question: this works with the parent URL data, but is there a way to also use ItemLoader() with the data scraped from an associated child URL, so you end up with one combined yield l.load_item()?
It could be an interesting video.
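One common pattern for this: load what you can from the listing page, pass the half-built item to the detail page via cb_kwargs, then wrap it in a new loader there and finish it. A sketch (ProductItem and the selectors are assumptions; both methods live on the spider):

import scrapy
from scrapy.loader import ItemLoader

def parse(self, response):
    for product in response.css("div.product"):
        l = ItemLoader(item=ProductItem(), selector=product)
        l.add_css("name", "h3::text")
        url = product.css("a::attr(href)").get()
        # hand the partially loaded item to the child page's callback
        yield response.follow(url, self.parse_detail,
                              cb_kwargs={"item": l.load_item()})

def parse_detail(self, response, item):
    l = ItemLoader(item=item, response=response)  # continue the same item
    l.add_css("description", ".description::text")
    yield l.load_item()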
Thanks for the tutorial!! One question: what part of the new code prevents the error from appearing if there is no price info?? Thanks in advance!!!
Thanks a lot, that was very helpful.
If I wanted to include when a whisky bottle was sold out, how would I do it with the item loader?
Thanks for the best tutorial!
Thank you so much. Is there a way I can scrape audio data, i.e. sound data?
Nice to know what the competition is. I got a wisdom tooth. Is it possible with Scrapy to tick a checkbox and then click a button to get to the next page?
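Not literally: Scrapy doesn't drive a browser, so there's nothing to click. The usual substitute is to submit the form the button would post, using FormRequest.from_response; for genuinely JavaScript-driven buttons you'd reach for something like scrapy-playwright instead. A sketch, with the form field names as assumptions:

import scrapy

class NextPageSpider(scrapy.Spider):
    name = "nextpage"
    start_urls = ["https://example.com/list"]  # placeholder

    def parse(self, response):
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"agree": "on"},    # the ticked checkbox
            clickdata={"name": "next"},  # which submit button to "press"
            callback=self.parse,
        )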
Excellent!!!
Great video. I'd love a video from you on using Scrapy with proxies.
Super nice! However, I am struggling to understand how that would work with a dynamic website, where I am following a GET request that returns data in JSON format. I do a bit of working around and convert it to a dictionary, but I can't seem to get it to return an item… any ideas that could help me?
I think you'd still need to parse the JSON and then load it into the item loader and item. It's been a while since I've done that though, so not 100% sure, sorry.
@@JohnWatsonRooney thank you… I managed to do it now. Had to yield them all individually. But it’s working 👍🏼
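For anyone else doing the same, a rough sketch of the parse-the-JSON-then-load-it approach described above, yielding the items individually (the JSON keys and ProductItem are assumptions):

import json
import scrapy
from scrapy.loader import ItemLoader

class ApiSpider(scrapy.Spider):
    name = "api"
    start_urls = ["https://example.com/api/products"]  # placeholder

    def parse(self, response):
        data = json.loads(response.text)
        for record in data["products"]:
            # ProductItem is assumed to be defined in items.py
            l = ItemLoader(item=ProductItem())
            l.add_value("name", record.get("name"))
            l.add_value("price", record.get("price"))
            yield l.load_item()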
This is a great video thanks!
Question: can Scrapy save item objects to pickle binary files? If so, how? I just find it really convenient to save my scraped data into pickled objects that can be used quickly in other files, but I can't find any docs on that for Scrapy...
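Scrapy's feed exports do support pickle out of the box (via PickleItemExporter), so no custom code is needed; a minimal sketch, with the path as an assumption:

# settings.py
FEEDS = {
    "output/items.pickle": {"format": "pickle"},
}

# or per run: scrapy crawl whisky -O output/items.pickle

One caveat: the exporter pickles each item separately, so read the file back in a loop:

import pickle

def read_items(path):
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break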
Best vid for Scrapy and best explanation, @john Watson Rooney and others.
I have one question about the item loader: how do we extract data when an element holds more than one piece of information? (E.g. if an element has two phone numbers, the item loader picks only the first number, not the second.) From your previous vid I learned we'd use getall().
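The add_css/add_xpath calls already collect every match (the getall behaviour); what usually throws the extras away is a TakeFirst output processor on the field. Swapping it for Join, or leaving the default list output, keeps them all. A sketch with hypothetical fields:

import scrapy
from itemloaders.processors import Join, MapCompose

class ContactItem(scrapy.Item):
    phones = scrapy.Field(
        input_processor=MapCompose(str.strip),
        output_processor=Join(", "),  # "111, 222" rather than just "111"
    )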
Can you make a project with Scrapy to extract stock information along with historical data?
When I import items I get an error like this: "attempted relative import with no known parent package". How can I solve this error?
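That error usually means the spider module was run directly (python myspider.py) rather than through scrapy crawl, or the import doesn't match the project package. A sketch, assuming the project package is called whiskyspider:

# inside spiders/whisky.py; run with "scrapy crawl whisky" from the
# project root, not with "python whisky.py"
from whiskyspider.items import WhiskyItem

# the relative form also works when Scrapy imports the module as part
# of the package:
# from ..items import WhiskyItem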
Hi, I just don't quite get why you use the item loader and all of that stuff when you can do it within the parse function. It seems to me that it gets more complicated for the same result. Surely there is something I am missing?
Awesome playlist! But I have one question: products which are sold out don't give us any data in the price field. I tried to supply an alternative value, as you did in a previous video using a try/except block, but I failed. Please guide me.
Sold-out products only give the name and link.
More Scrapy blogs, please.
Thanks for the informative video. Can't we just write if next_page: instead of if next_page is not None?
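In this case yes: the selector returns either a URL string or None, and a non-empty string is truthy, so the two checks behave identically; they would only differ if an empty string were possible. A minimal sketch (the selector is illustrative):

next_page = response.css("a.next::attr(href)").get()  # URL string or None
if next_page:  # falsy for None (and for an empty string)
    yield response.follow(next_page, callback=self.parse)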
John,
Another great video.
In the title the first word should read Scrapy or the video won't come up in a search.
Let me wish you a well-deserved, fantastic Christmas!
Oh wow I didn’t notice! Thank you for pointing that out, I’ve changed it. Happy Christmas to you too!
Nice tutorial!! It would be nice if you could show how to store the data from THIS code (item loader) into MongoDB. :)
Thanks! Sure, I’m going to extend this project to cover more of Scrapy’s features, including pipelines and databases
@@JohnWatsonRooney Awesome. Thanks a lot :)
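Until that video exists, the item pipeline pattern from the Scrapy docs is a reasonable starting point for MongoDB; a trimmed sketch (the settings names and collection are assumptions):

import pymongo
from itemadapter import ItemAdapter

class MongoPipeline:
    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # pull connection details out of settings.py
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "whisky"),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.db["items"].insert_one(ItemAdapter(item).asdict())
        return item

# enabled in settings.py with something like:
# ITEM_PIPELINES = {"whiskyspider.pipelines.MongoPipeline": 300}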
What would you recommend: Splash or Playwright?
Playwright is my go to now
@@JohnWatsonRooney Thank you very much🥰
Super
Amazing tutorial. Thank you very much. Can you share the code as usual?
Yes, sure, I've updated my repo here: github.com/jhnwr/whiskyspider
How do I save the URL of the extracted page when using the item loader?
I got it. It's: l.add_value("url", response.url)
How do I use this with XPath? I tried, but it didn't work. Exactly like this: l.add_xpath(' title ', './/h1[@class="product__title"]')
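One likely culprit in that line is the stray spaces inside the quotes: the field name must match the Item exactly ("title", not " title "). Adding /text() also matters if you want the heading text rather than the whole element. A sketch of the usual form, with the XPath as an assumption about the page:

l.add_xpath("title", '//h1[@class="product__title"]/text()')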