Node.js Web Scraping (Step-By-Step Tutorial)

  • Published: 12 Jan 2025

Comments • 61

  • @Slimjwill
    @Slimjwill 10 months ago

    I'm an expert at web scraping in Python but JS was confusing for me until I found this tutorial. I've scoured the internet for a good tutorial on JS web scraping and you knocked it out of the park! Thanks!
    Straightforward, to the point, clean and crisp code... love it!

    • @oxylabs
      @oxylabs  9 months ago

      Thanks, it's always good to hear such good feedback!

  • @danielvega646
    @danielvega646 2 years ago +7

    Really loved your format! So clear, straightforward, and easy to follow. Such a great job. Greetings from Colombia!

    • @oxylabs
      @oxylabs  2 years ago +1

      Hey Daniel! We're so happy you enjoyed it!

  • @aakashchaurasiya9595
    @aakashchaurasiya9595 7 months ago

    Great tutorial, learned the basics in one video!!

    • @oxylabs
      @oxylabs  6 months ago

      Glad it was helpful!

  • @Toqeershah2685
    @Toqeershah2685 11 months ago

    Very Simple and helpful...
    Highly recommended

    • @oxylabs
      @oxylabs  11 months ago

      Happy to hear you found it useful!

  • @tacitus5001
    @tacitus5001 2 years ago +1

    It is a MYSTERY that your code @ 10:10 works.
    The variable 'url' in 'const response = await axios.get(url);' is not defined.
    But for some reason you get some output. I would expect an error.
    This is hilarious!

    • @oxylabs
      @oxylabs  2 years ago

      Hey! Thanks for the sharp eye!😄 We think it's this: the 'url' parameter is defined as a function argument: `async function getBooks(url){...`
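
For readers puzzled by the same thing: a parameter declared in a function's signature is in scope throughout the function body, so no separate `const url = ...` is needed. A minimal sketch (the function name mirrors the tutorial's; the return value is illustrative, not the tutorial's exact code):

```javascript
// A parameter named in the function signature is in scope for the whole body,
// so `url` here is defined even though no `const url = ...` appears inside.
async function getBooks(url) {
  // In the tutorial this line would be: const response = await axios.get(url);
  return `would fetch: ${url}`;
}
```

Calling `getBooks("https://books.toscrape.com")` binds that string to `url` for the duration of the call.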

  • @mohitsaud2071
    @mohitsaud2071 1 year ago

    Thank you for the wonderful tutorial.

    • @oxylabs
      @oxylabs  1 year ago

      Awesome to hear that!

  • @anuradhabarnwal3467
    @anuradhabarnwal3467 2 years ago +2

    Most understandable and informative video this is... really appreciated your work...

    • @oxylabs
      @oxylabs  2 years ago +1

      Glad it was helpful! Thank you!

  • @codecollegetv-ey3mo
    @codecollegetv-ey3mo 4 months ago

    sooooo amazingggggg mannn !!!

  • @selahadinjemal7138
    @selahadinjemal7138 1 year ago

    This was really helpful. Thanks.

    • @oxylabs
      @oxylabs  1 year ago

      We're happy you found it useful! :)

  • @Coriander_11
    @Coriander_11 1 year ago

    very helpful, many thanks

  • @allmighty2000
    @allmighty2000 2 years ago +1

    super short super simple

    • @oxylabs
      @oxylabs  2 years ago

      Glad you liked it!

  • @whatislove4587
    @whatislove4587 2 years ago +1

    I use fetch instead of axios, it works too!
    const response = await fetch(url);
    const html = await response.text();
    const $ = cheerio.load(html);

    • @oxylabs
      @oxylabs  2 years ago +2

      Hey, thanks for sharing!

    • @danielvega646
      @danielvega646 2 years ago +2

      DUDEEE You are a genius!!! Thanks for that. With this hint we can now save ourselves one dependency.

  • @Odidi_Bee6ix
    @Odidi_Bee6ix 5 months ago

    Spot on🎯

  • @pysavantcodes
    @pysavantcodes 2 years ago

    Very helpful tutorial, Thanks so much❤️

  • @viraj_madhushan
    @viraj_madhushan 8 months ago

    Great tutorial Thanks a lot

    • @oxylabs
      @oxylabs  8 months ago

      Glad it was helpful!

  • @MultiversX
    @MultiversX 2 years ago +1

    How would you return the value and not just console.log it? Great video! It really was much simpler than expected!

    • @oxylabs
      @oxylabs  2 years ago

      Hey! Thanks for asking :) Use the return result; statement instead of console.log(result);
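
In practice that means the caller awaits the function's result. A minimal sketch, with hypothetical names (not the tutorial's exact code) and placeholder data:

```javascript
// Returning the scraped data instead of logging it lets the caller decide
// what to do with the result (save it, transform it, test it, ...).
async function scrape() {
  // Placeholder data standing in for actual scraped results.
  const result = [{ title: "A Light in the Attic", price: "£51.77" }];
  return result; // instead of console.log(result);
}

async function main() {
  const books = await scrape(); // the caller receives the value
  console.log(books.length); // and chooses how to use it
}
```
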

  • @javaboy6581
    @javaboy6581 2 years ago

    Thank you so much, the best video on scraping!

  • @exe.m1dn1ght
    @exe.m1dn1ght 1 year ago

    Hi, I created a spider in Node.js. It's crawling page by page but it's very slow, 0.3 seconds for each page... why is that?

    • @oxylabs
      @oxylabs  2 months ago

      Hi, the slowness is likely due to network latency, page load times, or throttling from the website. You can improve speed by running parallel requests (with proper limits to avoid getting blocked) :)
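
The "parallel requests with proper limits" idea can be sketched with a small batching helper (names are illustrative; `fetchPage` is a stand-in for whatever request function you use, e.g. axios.get or fetch):

```javascript
// Hedged sketch of limited parallelism: process items in batches of `limit`
// instead of strictly one at a time.
async function mapWithLimit(items, limit, fn) {
  const results = [];
  for (let i = 0; i < items.length; i += limit) {
    const batch = items.slice(i, i + limit);
    // Each batch runs concurrently; batches run one after another,
    // which keeps at most `limit` requests in flight.
    results.push(...(await Promise.all(batch.map(fn))));
  }
  return results;
}
```

Usage would look like `await mapWithLimit(urls, 5, fetchPage)`; keeping the limit modest reduces the chance of getting blocked.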

  • @syedhassanraza6605
    @syedhassanraza6605 10 months ago

    amazing video

  • @bryansuarez2396
    @bryansuarez2396 2 years ago

    For whatever reason, with the lines used at 10:14, I get a 404 error with axios where it is not returning the second page.

    • @oxylabs
      @oxylabs  2 years ago +1

      Hello. If you get the first page right and the second 404s, it means there is a semantic error somewhere here:
      if ($(".next a").length > 0) {
        next_page = baseUrl + $(".next a").attr("href");
        getBooks(next_page);
      }
      You could have accidentally missed something in the next_page variable, causing the code to create a non-existing URL. Try adding console.log(next_page) after defining next_page to see what the output is:
      if ($(".next a").length > 0) {
        next_page = baseUrl + $(".next a").attr("href");
        console.log(next_page);
        getBooks(next_page);
      }
      Hope this helps!
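
As an aside, Node's built-in URL class can make this kind of next-page concatenation less error-prone. A minimal sketch (the function name is hypothetical, not the tutorial's code):

```javascript
// new URL(href, base) resolves a relative href (e.g. "page-2.html") against
// the current page URL, avoiding hand-built concatenation bugs that can
// produce non-existent URLs and 404s.
function buildNextPageUrl(currentUrl, href) {
  if (!href) return null; // no ".next a" element: last page reached
  return new URL(href, currentUrl).toString();
}
```
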

  • @codecollegetv-ey3mo
    @codecollegetv-ey3mo 4 months ago

    subscribing nowwwwww

  • @im_mohammad1
    @im_mohammad1 2 years ago

    It was very good, thank you very much❤

    • @oxylabs
      @oxylabs  2 years ago

      We're glad you liked it!

  • @floraflora9613
    @floraflora9613 2 years ago

    Great tutorial, thank you!

    • @oxylabs
      @oxylabs  2 years ago

      Thank you, glad you enjoyed it!

  • @SaintBrisa
    @SaintBrisa 2 years ago

    for whatever reason, I don't know why it can't save the data and create the books.csv file. Great video, precise and straightforward

    • @oxylabs
      @oxylabs  2 years ago

      Thank you for the positive feedback!

  • @maestrorobi4467
    @maestrorobi4467 2 years ago

    Nice! How can I make it automated to web scrape the same data daily or a different schedule?

    • @oxylabs
      @oxylabs  2 years ago +2

      Hello. We've got a video for automating web scraping, too, hope it answers your question well:
      ruclips.net/video/_AxotVxsPBw/видео.html
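
For a dependency-free starting point, a daily schedule can be sketched with nothing but setTimeout (function and variable names are illustrative; cron or a library like node-cron is more robust for production):

```javascript
// Compute how long to wait until the next occurrence of `hour`:00 local time.
function msUntilHour(hour, now = new Date()) {
  const next = new Date(now);
  next.setHours(hour, 0, 0, 0);
  if (next <= now) next.setDate(next.getDate() + 1); // already past today: use tomorrow
  return next - now;
}

// Run `task` daily at `hour`:00, rescheduling itself after each run.
function scheduleDaily(hour, task) {
  setTimeout(() => {
    task();
    scheduleDaily(hour, task);
  }, msUntilHour(hour));
}
```

Here `task` would be the scraping function; e.g. `scheduleDaily(6, getBooks)` would run it every morning at 6:00.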

  • @phtephen8456
    @phtephen8456 2 years ago

    good tutorial

    • @oxylabs
      @oxylabs  2 years ago

      Thank you, there are many more to come!

  • @m2svirtual384
    @m2svirtual384 2 years ago

    I am attempting to scrape a site that tells my scraper that it's not a supported browser, doesn't support JS etc. The User Agent the scraper is sending is a valid/normal Chrome agent string. I can load the page in a legit browser and inspect everything, but I cannot save the page as HTML nor right click and select View Source. Scraping, viewing source or saving as HTML all result in the same error page saying I'm not a supported browser. Anyone can help me get this page scraped? Thanks

    • @oxylabs
      @oxylabs  2 years ago +1

      Hey, thanks for asking! You are most likely looking at a web page that is both rendered in JS and runs browser checks to verify that the client is a legitimate browser.
      The first thing to try would be to scrape the website using a headless browser (see Playwright). The second: look into unblocking strategies. You can check our blog for useful tutorials - oxylabs.io/blog.
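
A minimal sketch of the headless-browser route, assuming `playwright` is installed (`npm install playwright`); the function name is illustrative:

```javascript
// Render a JS-heavy page in headless Chromium and return the final HTML,
// which can then be handed to cheerio just like an axios response.
async function renderPage(url) {
  const { chromium } = await import("playwright"); // loaded lazily
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle" });
    return await page.content(); // HTML after client-side rendering
  } finally {
    await browser.close();
  }
}
```
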

    • @m2svirtual384
      @m2svirtual384 2 years ago

      @@oxylabs Thanks for the quick reply! I will check out the blog right away... = )

  • @juliciousz
    @juliciousz 2 years ago

    Nice! How about book-URLs?

    • @oxylabs
      @oxylabs  2 years ago

      Hello! Do you mean URL addresses or the book URL pages themselves, one by one?

    • @juliciousz
      @juliciousz 2 years ago

      @@oxylabs Hi! Yes, I would love to see a book-link pushed too to the book_data, just like this: book_data.push({ title, price, stock, link }), if you know what I mean, thanks

    • @oxylabs
      @oxylabs  2 years ago

      @@juliciousz Hey again! To answer your question: yes, it's possible. All you need to do is find the a tag, get its href attribute, and push it to the array exactly as you specified.
      Here's the code:
      link = $(this).find("a").attr("href").replace("../../../", "books.toscrape.com/catalogue/");
      book_data.push({ title, price, stock, link });
      P.S. books.toscrape.com returns relative URLs when extracting href, so some string replacement is needed.
      Hope this helps!

    • @juliciousz
      @juliciousz 2 years ago

      @@oxylabs Yes, it's working, loving it... nice replace command too, that's a new thing for me... Great tutorial indeed, hope you're having a nice day!

    • @oxylabs
      @oxylabs  2 years ago

      @@juliciousz Thank you, have a wonderful day too!

  • @vietvie
    @vietvie 1 year ago

    How to scrape a React.js website?

  • @saharaprotocol
    @saharaprotocol 2 years ago

    Thank you ❤

  • @mortabitmosab415
    @mortabitmosab415 2 years ago

    Is web scraping legal ?

    • @oxylabs
      @oxylabs  2 years ago +1

      It is! But not everywhere, only on public websites. We have an in-depth explanatory blog post on this exact topic. You can read it here:
      oxylabs.io/blog/is-web-scraping-legal

  • @mohitsaud2071
    @mohitsaud2071 1 year ago

    Subscribed

  • @iqoverflow
    @iqoverflow 1 year ago

    Wow