I'm an expert at web scraping in Python but JS was confusing for me until I found this tutorial. I've scoured the internet for a good tutorial on JS web scraping and you knocked it out of the park! Thanks!
Straight forward, to the point, clean and crisp code... love it!
Thanks, it's always good to hear such good feedback!
Really loved your format! So clear, straight forward and easy to follow up. Such a great job. Greetings from Colombia!
Hey Daniel! We're so happy you enjoyed it!
Great Tutorial, Learned the basics in one video !!
Glad it was helpful!
Very Simple and helpful...
Highly recommended
Happy to hear you found it useful!
It is a MYSTERY that your code @ 10:10 works.
The variable 'url' in 'const response = await axios.get(url);' is not defined.
But for some reason you get some output. I would expect an error.
This is hilarious!
Hey! Thanks for the sharp eye! 😄 We think it's this: the 'url' variable is defined as a function parameter: `async function getBooks(url) {...}`, so it's in scope inside the function body.
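In case it helps anyone following along, here's a minimal sketch of the scoping at play (the URL below is just an example):

```javascript
// 'url' is not a global variable -- it's the function's parameter, so it is
// defined everywhere inside the function body, including the axios-style call.
async function getBooks(url) {
  // const response = await axios.get(url); // 'url' is bound to the argument here
  return url;
}

getBooks("https://books.toscrape.com/").then((u) => console.log(u));
```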
Thank you for the wonderful tutorial.
Awesome to hear that!
Most understandable and informative video this is... really appreciated your work...
Glad it was helpful! Thank you!
sooooo amazingggggg mannn !!!
This was really helpful. Thanks.
We're happy you found it useful! :)
very helpful, many thanks
super short super simple
Glad you liked it!
I use fetch instead of axios, it works too!
const response = await fetch(url);
const html = await response.text();
const $ = cheerio.load(html);
Hey, thanks for sharing!
DUDEEE You are a genius!!! Thanks for that. With this hint now we can save one dependency ourselves.
Spot on🎯
Very helpful tutorial, Thanks so much❤️
Great tutorial Thanks a lot
Glad it was helpful!
How would you return the value and not just console.log it? Great video! It really was much simpler than expected!
Hey! Thanks for asking :) Use the return result; statement instead of console.log(result);
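For anyone else wondering, a quick sketch of the difference (the data here is a placeholder, not real scraped output):

```javascript
// Returning instead of logging lets the caller use the value.
async function getBooks() {
  const result = [{ title: "placeholder book" }]; // stand-in for scraped data
  return result; // instead of console.log(result)
}

(async () => {
  const books = await getBooks();
  console.log(books.length); // the caller decides what to do with the value
})();
```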
Thank you so much, the best video on scraping!
Hi, I created a spider in Node.js. It's crawling page by page, but it's very slow: 0.3 seconds for each page. Why is that?
Hi, the slowness is likely due to network latency, page load times, or throttling from the website. You can improve speed by running parallel requests (with proper limits to avoid getting blocked) :)
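As a sketch of what "parallel requests with proper limits" can look like without any extra library (the helper name and the limit of 2 below are made up for illustration):

```javascript
// Run fn over items with at most `limit` calls in flight at once. Each worker
// grabs the next index synchronously, so results keep their original order.
async function mapWithLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Used like `await mapWithLimit(pageUrls, 5, scrapePage)` instead of awaiting pages one by one (where `scrapePage` is whatever your per-page scraping function is).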
amazing video
For whatever reason, with the lines shown at 10:14 I get a 404 error with axios: it's not returning the second page.
Hello. If the first page works and the second 404s, it means there is a logic error somewhere here:

if ($(".next a").length > 0) {
  next_page = baseUrl + $(".next a").attr("href");
  getBooks(next_page);
}

You could have accidentally missed something in the next_page variable, causing the code to build a non-existent URL. Try adding console.log(next_page) right after defining next_page to see what the output is:

if ($(".next a").length > 0) {
  next_page = baseUrl + $(".next a").attr("href");
  console.log(next_page);
  getBooks(next_page);
}

Hope this helps!
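One common cause worth mentioning: joining baseUrl and a relative href by plain string concatenation. A safer alternative is the built-in URL constructor, which resolves the href against the page you just scraped (the URLs below are only an example):

```javascript
// Resolve a relative link against the current page instead of concatenating.
const currentUrl = "https://books.toscrape.com/catalogue/page-1.html";
const href = "page-2.html"; // typical $(".next a").attr("href") value
const next_page = new URL(href, currentUrl).href;
console.log(next_page); // https://books.toscrape.com/catalogue/page-2.html
```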
subscribing nowwwwww
It was very good, thank you very much❤
We're glad you liked it!
Great tutorial, thank you!
Thank you, glad you enjoyed it!
For whatever reason, it can't save the data and create the books.csv file, and I don't know why. Great video, precise and straightforward.
Thank you for the positive feedback!
Nice! How can I make it automated to web scrape the same data daily or a different schedule?
Hello. We've got a video for automating web scraping, too, hope it answers your question well:
ruclips.net/video/_AxotVxsPBw/видео.html
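If you'd rather stay in plain Node.js, here's a minimal scheduling sketch with no extra dependency (for real daily jobs, system cron or a library like node-cron is more robust, and the function names are just for illustration):

```javascript
// Run `job` once right away, then repeat it every 24 hours.
const DAY_MS = 24 * 60 * 60 * 1000;
function scheduleDaily(job) {
  job(); // first run immediately
  return setInterval(job, DAY_MS); // repeats; pass the handle to clearInterval to stop
}
```

Used like `const timer = scheduleDaily(scrapeBooks);`, where `scrapeBooks` is your scraping function.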
good tutorial
Thank you, there are many more to come!
I am attempting to scrape a site that tells my scraper it's not a supported browser, doesn't support JS, etc. The User-Agent the scraper sends is a valid, normal Chrome agent string. I can load the page in a legit browser and inspect everything, but I cannot save the page as HTML or right-click and select View Source. Scraping, viewing source, or saving as HTML all produce the same error page saying I'm not a supported browser. Can anyone help me get this page scraped? Thanks
Hey, thanks for asking! You are most likely looking at a web page that is both rendered in JS and employs some kind of browser checks to verify your legitimacy.
The first thing to try would be to scrape the website using a headless browser (see Playwright). The second: look into unblocking strategies. You can check our blog for useful tutorials - oxylabs.io/blog.
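To make the headless-browser suggestion concrete, here's a hedged sketch (it assumes the `playwright` package is installed via `npm i playwright`, and the function name is made up):

```javascript
// A headless browser runs the page's JavaScript before you read the HTML,
// which is what plain axios/fetch requests cannot do.
async function getRenderedHtml(url) {
  const { chromium } = require("playwright"); // loaded here so the sketch stays self-contained
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle" });
  const html = await page.content(); // HTML after client-side rendering
  await browser.close();
  return html;
}
```

The returned HTML can then be fed to cheerio exactly as in the tutorial.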
@@oxylabs Thanks for the quick reply! I will check out the blog right away... = )
Nice! How about book-URLs?
Hello! Do you mean URL addresses or the book URL pages themselves, one by one?
@@oxylabs Hi! Yes, I would love to see a book-link pushed too to the book_data, just like this: book_data.push({ title, price, stock, link }), if you know what I mean, thanks
@@juliciousz Hey again! To answer your question - yes it is possible. All you need to do is to find the a tag, get its href attribute and push it to the array exactly how you specified.
Here's the code:
link = $(this).find("a").attr("href").replace("../../../", "https://books.toscrape.com/catalogue/")
book_data.push({title, price, stock, link})
P.S. books.toscrape.com returns relative URLs in href attributes, so some string replacement needs to be done.
Hope this helps!
@@oxylabs Yes, it's working, loving it... nice replace trick too, that's a new thing for me... Great tutorial indeed, hope you're having a nice day!
@@juliciousz Thank you, have a wonderful day too!
How to scrape a React.js website?
Thank you ❤
Is web scraping legal ?
It is! But not everywhere, only on public websites. We have an in-depth explanatory blog post on this exact topic. You can read it here:
oxylabs.io/blog/is-web-scraping-legal
Subscribed
Wow