Hello, thanks for watching my video! If you want to play a bit more with scraping using n8n and Puppeteer, here is my previous tutorial: ruclips.net/video/YonNJqAAxdg/видео.html
Thanks for teaching the lesson starting from the package install. It's so important!!! Keep doing it.
Thank you very much - I’m glad you find my video helpful!
I didn't like this video....I loved this ❤
There are so many new things I have learnt from this video today.
Btw, the video editing is awesome....😍
Thank you very much! I'm happy that you like the video and I'm very grateful for your support. All the best to you!
Great video. Question: I am trying to reduce cloud costs for all these tools. How do you host Baserow, and do you host it on the same server as n8n? I currently use Hetzner but I'm not sure it can handle both.
Hey, thank you very much for your comment and kind words about my work - I really appreciate it!
Yes, it is possible. Although I haven’t self-hosted Baserow yet (I use the cloud option for now), I’m pretty sure it should not be a problem to run both n8n and other apps on the same VPS. The key here is to set each app on a different port. I suppose a basic machine with ~2GB of memory should handle both.
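For reference, the different-ports setup could be sketched with docker-compose roughly like this (a minimal sketch — the image names, port mappings, and BASEROW_PUBLIC_URL value are assumptions based on the official images, not taken from the video):

```yaml
# Sketch: n8n and Baserow side by side on one VPS, each on its own port.
services:
  n8n:
    image: n8nio/n8n
    ports:
      - "5678:5678"   # n8n UI/API on port 5678
  baserow:
    image: baserow/baserow
    environment:
      BASEROW_PUBLIC_URL: "http://localhost:8080"
    ports:
      - "8080:80"     # Baserow on port 8080 externally
```

With something like this, n8n answers on port 5678 and Baserow on 8080, so the two apps never collide.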
@@workfloows Thank you....Subscribed
Congrats bro! 😃😃😃😃😃😃
Thanks a lot!
Why do you upload the Node.js code to Google Cloud instead of running it directly in a Code node in the n8n workflow?
Is it because you can't import modules? I get a "module not found" error.
Is there any other way?
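For context on the "module not found" error: self-hosted n8n can be configured to allow imports inside Code nodes via an environment variable. A hedged sketch, assuming a Docker setup (the module itself still has to be installed wherever n8n runs):

```shell
# Sketch: allow require('puppeteer') inside n8n Code nodes.
# NODE_FUNCTION_ALLOW_EXTERNAL is an n8n setting for self-hosted
# instances; this does not apply to n8n Cloud.
docker run -it --rm \
  -p 5678:5678 \
  -e NODE_FUNCTION_ALLOW_EXTERNAL=puppeteer \
  n8nio/n8n
```

Even with the import allowed, Puppeteer is memory-hungry, which is one reason to run it on a separate service instead.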
Great video
Thank you very much!
What is the purpose of this scraped data? Obviously, there seem to be some use-cases. Would you please share some?
Hello, thanks for your comment.
Absolutely - mostly analytical ones, for example: monitoring price changes, checking product fit and competitiveness, exploring customer preferences (based on the number of products sold), SEO analysis (e.g. what keywords competitors use) and many more along these lines.
Basically, the key here is not only the type of data scraped, but also automation around it. In this example I scraped search results only for one keyword, but imagine you’d like to perform analysis for dozens or hundreds of product types - it’s also possible with this workflow.
Scraped, structured data is also much easier to read and transform, which simplifies analytical tasks.
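The price-monitoring use case mentioned above can be sketched as a small diffing step between two scraping rounds. A minimal sketch — the product shape (`asin`, `price`) is an illustrative assumption, not the exact structure produced in the video:

```javascript
// Sketch: flag products whose price moved more than thresholdPct
// between two scraping rounds.
function priceChanges(previousRound, currentRound, thresholdPct = 5) {
  // Index the previous round by product ID for O(1) lookups.
  const prevByAsin = new Map(previousRound.map((p) => [p.asin, p]));
  const changes = [];
  for (const product of currentRound) {
    const prev = prevByAsin.get(product.asin);
    if (!prev || prev.price === 0) continue; // new product or bad data
    const deltaPct = ((product.price - prev.price) / prev.price) * 100;
    if (Math.abs(deltaPct) >= thresholdPct) {
      changes.push({
        asin: product.asin,
        from: prev.price,
        to: product.price,
        deltaPct,
      });
    }
  }
  return changes;
}
```

A step like this fits naturally between the scraping node and the Baserow node, so only meaningful changes get stored or trigger notifications.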
@@workfloows Hey, Thank you very much for your great answer! Makes a lot of sense to me.
Lovely tutorial. I tried this but I'm getting a CORS error.. any idea how to resolve this? Thanks
Hi, thanks for your comment and apologies for the late feedback.
Could you please let me know at which point you get this error (while running the script locally, or on GCF)? Do you get any other information from the console?
Thanks in advance for your kind reply.
@@workfloows I later on figured out the error . Thanks 🙏
what about the puppeteer community node?
Hello, thanks a lot for your comment and sorry for my late feedback.
I had a chance to use the Puppeteer community node, and unfortunately I found it a bit buggy. Since Puppeteer is also somewhat demanding in terms of memory, it’s much more convenient for me to host it on GCF and make calls only when needed.
But it’s just my experience - if Puppeteer community node works well for you, I don’t see a reason not to use it 😃
What do you mean by undetected? Shouldn't you get blocked after a number of requests?
Hello, thank you for your comment.
As long as you use IP rotation, the chances that the script will be permanently blocked are rather low (when one IP address gets blocked, another one is used in the next round).
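That round-robin idea can be sketched like this (a minimal sketch — the proxy pool contents and the `markBlocked()` helper are illustrative assumptions; a real provider like Bright Data handles rotation for you):

```javascript
// Sketch: cycle through a pool of proxy IPs, skipping blocked ones.
function createProxyRotator(proxies) {
  const blocked = new Set();
  let index = 0;
  return {
    next() {
      // Try at most one full pass over the pool.
      for (let i = 0; i < proxies.length; i++) {
        const candidate = proxies[index % proxies.length];
        index++;
        if (!blocked.has(candidate)) return candidate;
      }
      return null; // every proxy in the pool is blocked
    },
    markBlocked(proxy) {
      blocked.add(proxy);
    },
  };
}
```

Each scraping round asks the rotator for the next usable IP, and a request that comes back blocked just marks its proxy and moves on.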
How can I apply IP rotation with this method from the video? @@workfloows
Is it possible to take this code and modify it to scrape Reddit or any other website, or does each site need a different approach?
Hi, thank you for your comment!
This code was created exclusively for Amazon, so scraping and retrieving data from Reddit is not really possible with it. Of course, you can create your own scraping script using Puppeteer, and I strongly encourage you to do so - it's a lot of fun!
Long live mate!
Cheers to that!
What’s the purpose of alllll this if you’re just going to use Bright Data?
Hello, thank you for your comment.
In the video description, you can find links to the Puppeteer code without Bright Data implemented. If you prefer not to use BD, please feel free to use these resources and adapt them to the requirements of any other proxy provider of your choice.
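As a starting point for another provider, adapting the launch configuration might look roughly like this (a sketch under assumptions: the proxy object shape is invented for illustration; `--proxy-server` is a standard Chromium flag, and authenticated proxies additionally need Puppeteer's `page.authenticate()`):

```javascript
// Sketch: build Puppeteer launch options for a generic proxy provider.
function buildLaunchOptions(proxy) {
  return {
    headless: true,
    args: [`--proxy-server=${proxy.host}:${proxy.port}`],
  };
}
```

Usage would be along the lines of `const browser = await puppeteer.launch(buildLaunchOptions(myProxy));`, followed by `await page.authenticate({ username, password });` if the provider requires credentials.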