I followed you at every step and it was a success.
Thank you for your time and dedication. Regards.
+1 follower
Great, thanks for the feedback!
Followed you step by step and the code works perfectly, thanks~
Hey, I found a bug. If you open the jobs-guest page and try to scrape all the links to job postings by changing the start parameter, you can end up with some data duplicated and some data missed. Any ideas how to fix that?
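One rough way to work around this is to de-duplicate on the job URL while you paginate, since the result pages can shift between requests. Here's a minimal sketch, assuming the jobs-guest seeMoreJobPostings endpoint from the video; the spider name, search parameters, and CSS selectors are just illustrative:

```python
# Minimal sketch: de-duplicate job postings while paginating the jobs-guest
# endpoint with the `start` parameter. Assumes each job card exposes a link.
import scrapy


class LinkedJobsDedupeSpider(scrapy.Spider):
    name = "linked_jobs_dedupe_sketch"  # hypothetical spider name
    api_url = (
        "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/"
        "search?keywords=python&location=united%20states&start="
    )

    def start_requests(self):
        self.seen_urls = set()  # job URLs already yielded in this run
        yield scrapy.Request(self.api_url + "0", meta={"start": 0})

    def parse(self, response):
        jobs = response.css("li")
        for job in jobs:
            url = job.css("a::attr(href)").get(default="").split("?")[0]
            if not url or url in self.seen_urls:
                continue  # skip duplicates caused by results shifting between pages
            self.seen_urls.add(url)
            yield {"job_url": url}

        # Keep paginating only while the endpoint still returns job cards.
        if jobs:
            next_start = response.meta["start"] + 25
            yield scrapy.Request(
                self.api_url + str(next_start), meta={"start": next_start}
            )
```

It won't recover postings that never appear in any page of results, but it does stop the same job being scraped twice.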
How do I get the full job description?
Seems the solution is outdated, as the jobs-guest filter does not work right now; it's voyager now, but that's more complicated and I cannot get that URL.
I followed the same code from the blog post and the video, but it always returns an empty list (not-found). How do I solve this?
Same here. If you find a solution, please let me know!
Has anyone had issues running the final scrapy list? I am only getting 'quotes' instead of linked_jobs! Please help, I've just spent 4 hours on this!
Hello. I've really enjoyed your videos. I have a question about LinkedIn scraping: when I go to LinkedIn, it seems to require that I log in or sign up, and I do not see any way around this. When I inspect their top-level URL page, I do not see the elements you refer to. Can you advise? Has their site changed since your recording, so as to make it inaccessible to guests? Thanks in advance.
If you get a 403 error, that means your request didn't have an API key, the API key was invalid, or you haven't validated your email address.
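For context, a quick way to check the key is to run a tiny spider through the proxy and look at the status code. A minimal sketch, assuming the proxy endpoint from the ScrapeOps docs; the api_key value and spider name are placeholders:

```python
# Minimal sketch: send one request through the ScrapeOps proxy and log the
# status. A 403 here usually means the key is missing, invalid, or the
# account email hasn't been validated yet.
from urllib.parse import urlencode

import scrapy

SCRAPEOPS_API_KEY = "YOUR_API_KEY"  # placeholder - use your own (validated) key


def get_proxy_url(url):
    # Wrap the target URL in a call to the ScrapeOps proxy endpoint.
    payload = {"api_key": SCRAPEOPS_API_KEY, "url": url}
    return "https://proxy.scrapeops.io/v1/?" + urlencode(payload)


class ProxyCheckSpider(scrapy.Spider):
    name = "proxy_check_sketch"  # hypothetical spider, only for testing the key
    handle_httpstatus_list = [403]  # let 403s reach parse() instead of being dropped

    def start_requests(self):
        yield scrapy.Request(get_proxy_url("https://quotes.toscrape.com/"))

    def parse(self, response):
        # A valid, email-verified key should give 200 here; 403 means key trouble.
        self.logger.info("Proxy responded with status %s", response.status)
```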
Saved me from frustration leading to an existential crisis :D
Always the best
Thank you!
@@scrapeops I realised that after you scroll for a while, the jobs data stops being loaded automatically and you hit a "See More Jobs" button.
Is there a way my bot can detect that the data is no longer being loaded automatically and start following the button instead?
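One rough way to do that with plain Selenium (not the Scrapy approach from the video): keep scrolling, and when the job count stops growing, look for the button. The selectors and the exact button label here are assumptions, and you'd want proper waits in real code:

```python
# Minimal sketch: scroll the guest jobs search page until nothing new loads,
# then fall back to clicking the "See more jobs" button.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.linkedin.com/jobs/search?keywords=python")  # example search URL

stalled_rounds = 0  # stop once nothing new shows up for a few rounds
while stalled_rounds < 3:
    count_before = len(driver.find_elements(By.CSS_SELECTOR, "ul.jobs-search__results-list li"))
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # crude wait; WebDriverWait would be cleaner

    count_after = len(driver.find_elements(By.CSS_SELECTOR, "ul.jobs-search__results-list li"))
    if count_after > count_before:
        stalled_rounds = 0
        continue  # infinite scroll is still loading new jobs

    # Scrolling loaded nothing new - try the "See more jobs" button instead.
    buttons = driver.find_elements(By.XPATH, "//button[contains(., 'See more jobs')]")
    if buttons and buttons[0].is_displayed():
        buttons[0].click()
        time.sleep(2)
    stalled_rounds += 1

driver.quit()
```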
Hi! I am getting the errors "Unknown command: list" and "Unknown command: crawl". Does anyone know how I can fix this? Thanks!
Maybe it's because Scrapy cannot find an active project in the directory where you're trying to run the 'crawl' command.
If you've already activated the virtual environment, then make sure you are in the root directory of your Scrapy project when you run the 'crawl' command. That directory is the one where the 'scrapy.cfg' file is located.
If you're following the video: first cd into basic-scrapy-project, then run 'scrapy list'.
amazing
Recommended.
How do I automate the extraction so that it doesn't repeat data?
I need to run the search every hour so I can apply quickly.
Many thanks for everything.
I export it to JSON and then turn it into an xlsx with pandas. It would be nice to be able to do everything automatically. I am looking for information and await your comments.
I would like to automate the process.
Thanks.
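For what it's worth, the JSON-to-Excel step can be a two-liner with pandas. A minimal sketch, assuming a flat jobs.json export and that openpyxl is installed for the Excel writer:

```python
# Minimal sketch: convert the Scrapy JSON export to an .xlsx file.
# Assumes a flat jobs.json produced by something like `scrapy crawl <spider> -O jobs.json`.
import pandas as pd

df = pd.read_json("jobs.json")
df.to_excel("jobs.xlsx", index=False)
```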
To schedule it to run every hour, you could deploy it to a server using the ScrapeOps scheduler (free). Here is a video on how to do it: scrapeops.io/docs/servers-scheduling/digital-ocean-integration/
To make sure it only extracts new data, you need to create a pipeline that checks what you have already scraped against a database before adding new data. Check out this guide: scrapeops.io/python-scrapy-playbook/scrapy-save-data-postgres/#only-saving-new-data
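Roughly what such a pipeline looks like, as a minimal sketch using SQLite instead of Postgres and assuming each item has a unique job_url field (the linked guide shows the full Postgres version):

```python
# Minimal sketch: an item pipeline that skips jobs already saved in a local
# SQLite database, so hourly runs only add new postings.
import sqlite3

from scrapy.exceptions import DropItem


class OnlyNewJobsPipeline:
    def open_spider(self, spider):
        self.conn = sqlite3.connect("jobs.db")
        self.cur = self.conn.cursor()
        self.cur.execute(
            "CREATE TABLE IF NOT EXISTS jobs (job_url TEXT PRIMARY KEY, title TEXT)"
        )

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        self.cur.execute("SELECT 1 FROM jobs WHERE job_url = ?", (item["job_url"],))
        if self.cur.fetchone():
            raise DropItem(f"Already scraped: {item['job_url']}")
        self.cur.execute(
            "INSERT INTO jobs (job_url, title) VALUES (?, ?)",
            (item["job_url"], item.get("title")),
        )
        self.conn.commit()
        return item
```

You'd enable it via ITEM_PIPELINES in settings.py so it runs on every hourly crawl.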
@@scrapeops You are very generous, thanks for the work
It is giving me an error saying that the HTTP status is not handled or allowed.
What is the status code?
@@scrapeops 403
@@hakeembashiru5615 That means your request didn't have an API key, the API key was invalid or you haven't validated your email address.
@@scrapeops Oooh, I didn't actually validate my email address; I'll have to do that, thanks. But I got a workaround using scrapy_selenium with selenium.webdriver and the Chrome driver.
Thanks a lot, this was the solution to the error for me; I only needed to validate my email. @@scrapeops
Hey, that's great content and the easiest setup I've ever seen, but man, the pricing is hella pricey: it spends 1000 credits for 250 jobs. Does it really cost that much? I'd love to support you, but paying 100 bucks for that kind of makes me look for alternatives to the proxy. Just wanted to share some feedback with you.
Unfortunately, LinkedIn is one of the hardest websites to scrape out there! It costs 70 times more than scraping an Amazon page (or another normal website). The average cost is approx. $7 to scrape 1,000 profile/job pages.
Idk if it's still working or not, but I enjoyed the video. wp
I did my best to keep up, but it was quite challenging. Why make it so complicated? If there's a video tutorial, why not include explanations of the code? It's nearly impossible for someone without a background in web scraping to follow along.
It is challenging. When I find something like this, I drop down a level and then come back: try a web scraping fundamentals tutorial, then a basic Scrapy tutorial, then learn APIs, and you'll be ready for this one.