Node.js Web Scraping (Step-By-Step Tutorial)

  • Published: 9 Jul 2024
  • If your scraping tasks seem a little too time-consuming or inefficient, and you wish for these worries to become problems of the past, then Oxylabs Scraper APIs might be just for you! Check out the FREE trial here👉 oxy.yt/ZtLV
    Interested in starting to scrape on a large scale? Register for our webinar on “Large-Scale Web Scraping: Never Get Blocked Again” and learn from the industry experts about issues you can face and solutions to overcome them! Webinar registration link: www.bigmarker.com/oxylabs/Lar...
    Web scraping is rarely seen as an easy activity, so you might assume the same about Node.js web scraping, yet that is not entirely true.
    To back up that claim, we've made a quick tutorial covering all the basics you need for a successful start with Node.js web scraping. Our step-by-step guide takes you through the entire process, starting with the required software and the libraries you should download.
    See how packages like Cheerio provide additional benefits when used with Node.js, and learn how Cheerio converts the raw HTML fetched by Axios into something that can be queried using a jQuery-like syntax. Given the popularity of jQuery, many developers will find some of the steps familiar.
    Follow the practical tutorial and experiment without fear on books.toscrape.com, a website dedicated to testing out your scraping projects. The step-by-step format gives you the opportunity to analyze each web scraping step and see which ones are most relevant for you. Have you chosen a valid selector, or did you run into difficulties while scraping the genre? All is answered, and even if your scraping project is smooth sailing, the video still offers valuable tips and package suggestions, such as Axios, Cheerio, and json2csv.
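    The Axios + Cheerio flow described above can be sketched in a few lines. This is a minimal illustration, not the video's exact code; it assumes both packages are installed (npm install axios cheerio), and scrapeTitle is a hypothetical name:

```javascript
// Minimal sketch: fetch raw HTML with Axios, then query it with Cheerio.
async function scrapeTitle(url) {
  const axios = require("axios");     // HTTP client that fetches the raw HTML
  const cheerio = require("cheerio"); // jQuery-like HTML parser
  const response = await axios.get(url);
  const $ = cheerio.load(response.data); // load the HTML into a queryable tree
  return $("h1").first().text().trim();  // query it with a CSS selector
}

// scrapeTitle("https://books.toscrape.com/").then(console.log);
```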
    Watch these related videos:
    Scraping tutorials with Python:
    🎥 • Start Web Scraping Wit...
    Key information on web scraping:
    🎥 • Beginners Guide to Web...
    Scraping tips and tricks:
    🎥 • Actionable Web Scrapin...
    ✅ Grow Your Business with Top-Tier Web Data Collection Infrastructure: oxylabs.io/
    Join over a thousand businesses that use Oxylabs proxies:
    Residential Proxies:
    👉 oxy.yt/ytZX
    Shared Datacenter Proxies:
    👉 oxy.yt/btX1
    Dedicated Datacenter Proxies
    👉 oxy.yt/XtCj
    SOCKS5 Proxies:
    👉 oxy.yt/6tVD
    The following parts are examined:
    0:00 Introduction
    0:20 What is Node.js?
    1:27 Required software
    1:58 Setting up Node.js
    2:53 Basic steps of web scraping with JavaScript
    3:58 Parsing the response
    4:20 Web scraping, a practical example
    5:57 Scraping the genre
    7:56 Scraping book listings
    9:24 Pagination
    10:24 Saving scraped data to CSV
    12:00 Summary
    Subscribe for more: ruclips.net/user/Oxylabs?sub…
    © 2022 Oxylabs. All rights reserved.
    #Oxylabs #Webscraping #Nodejs

Comments • 58

  • @danielvega646
    @danielvega646 1 year ago +7

    Really loved your format! So clear, straight forward and easy to follow up. Such a great job. Greetings from Colombia!

    • @oxylabs
      @oxylabs  1 year ago +1

      Hey Daniel! We're so happy you enjoyed it!

  • @pysavantcodes
    @pysavantcodes 1 year ago

    Very helpful tutorial, Thanks so much❤️

  • @Slimjwill
    @Slimjwill 3 months ago

    I'm an expert at web scraping in Python but JS was confusing for me until I found this tutorial. I've scoured the internet for a good tutorial on JS web scraping and you knocked it out of the park! Thanks!
    Straight forward, to the point, clean and crisp code... love it!

    • @oxylabs
      @oxylabs  3 months ago

      Thanks, it's always good to hear such good feedback!

  • @javaboy6581
    @javaboy6581 1 year ago

    Thank you so much, the best video on scraping!

  • @allmighty2000
    @allmighty2000 2 years ago +1

    super short super simple

    • @oxylabs
      @oxylabs  2 years ago

      Glad you liked it!

  • @aakashchaurasiya9595
    @aakashchaurasiya9595 22 days ago

    Great Tutorial, Learned the basics in one video !!

    • @oxylabs
      @oxylabs  18 days ago

      Glad it was helpful!

  • @Coriander_11
    @Coriander_11 1 year ago

    very helpful, many thanks

  • @anuradhabarnwal3467
    @anuradhabarnwal3467 1 year ago +2

    Most understandable and informative video this is... really appreciated your work...

    • @oxylabs
      @oxylabs  1 year ago +1

      Glad it was helpful! Thank you!

  • @mohitsaud2071
    @mohitsaud2071 11 months ago

    Thank you for the wonderful tutorial.

    • @oxylabs
      @oxylabs  11 months ago

      Awesome to hear that!

  • @floraflora9613
    @floraflora9613 2 years ago

    Great tutorial, thank you!

    • @oxylabs
      @oxylabs  2 years ago

      Thank you, glad you enjoyed it!

  • @syedhassanraza6605
    @syedhassanraza6605 3 months ago

    amazing video

  • @selahadinjemal7138
    @selahadinjemal7138 1 year ago

    This was really helpful. Thanks.

    • @oxylabs
      @oxylabs  1 year ago

      We're happy you found it useful! :)

  • @Toqeershah2685
    @Toqeershah2685 5 months ago

    Very Simple and helpful...
    Highly recommended

    • @oxylabs
      @oxylabs  5 months ago

      Happy to hear you found it useful!

  • @im_mohammad1
    @im_mohammad1 1 year ago

    It was very good, thank you very much❤

    • @oxylabs
      @oxylabs  1 year ago

      We're glad you liked it!

  • @exe.m1dn1ght
    @exe.m1dn1ght 6 months ago

    Hi, I created a spider in Node.js; it's crawling page by page, but it's very slow, 0.3 seconds for each page... why is that?

  • @viraj_madhushan
    @viraj_madhushan 1 month ago

    Great tutorial Thanks a lot

    • @oxylabs
      @oxylabs  1 month ago

      Glad it was helpful!

  • @saharaprotocol
    @saharaprotocol 1 year ago

    Thank you ❤

  • @phtephen8456
    @phtephen8456 2 years ago

    good tutorial

    • @oxylabs
      @oxylabs  2 years ago

      Thank you, there are many more to come!

  • @MultiversX
    @MultiversX 1 year ago +1

    How would you return the value and not just console.log it? Great video! It really was much simpler than expected!

    • @oxylabs
      @oxylabs  1 year ago

      Hey! Thanks for asking :) Use the return result; statement instead of console.log(result);
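      For illustration, here is a stubbed sketch of that change. The scraping body is replaced with a placeholder array; in the real script it would be the axios/cheerio code from the video:

```javascript
// Sketch: return the data instead of logging it inside the function.
// The book_data contents here are a stand-in for the real scraped results.
async function getBooks(url) {
  const book_data = [{ title: "Example Book", price: "£10.00" }]; // stub
  return book_data; // instead of console.log(book_data)
}

// The caller receives the value and decides what to do with it:
getBooks("https://books.toscrape.com/").then((books) => {
  console.log(books.length); // prints 1
});
```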

  • @bryansuarez2396
    @bryansuarez2396 1 year ago

    For whatever reason, using the exact lines shown at 10:14, I get a 404 error with axios: it is not returning the second page.

    • @oxylabs
      @oxylabs  1 year ago +1

      Hello. If you get the first page right and the second 404s, it means there is a semantic error somewhere here:
      if ($(".next a").length > 0) {
        next_page = baseUrl + $(".next a").attr("href");
        getBooks(next_page);
      }
      You could have accidentally missed something in the next_page variable, causing the code to build a non-existent URL. Try adding console.log(next_page) after defining next_page to see what the output is:
      if ($(".next a").length > 0) {
        next_page = baseUrl + $(".next a").attr("href");
        console.log(next_page);
        getBooks(next_page);
      }
      Hope this helps!
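      To make the URL logic easy to check in isolation, the concatenation can be pulled into a small helper. buildNextPage is a hypothetical name, not from the video:

```javascript
// Hypothetical helper: build the absolute next-page URL from the
// base URL and the relative href found in the ".next a" element.
function buildNextPage(baseUrl, href) {
  return baseUrl + href;
}

// Inside getBooks, after loading the page into cheerio's $:
// if ($(".next a").length > 0) {
//   const next_page = buildNextPage(baseUrl, $(".next a").attr("href"));
//   console.log(next_page); // verify the URL before requesting it
//   getBooks(next_page);
// }

console.log(buildNextPage("https://books.toscrape.com/catalogue/", "page-2.html"));
// prints https://books.toscrape.com/catalogue/page-2.html
```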

  • @mohitsaud2071
    @mohitsaud2071 11 months ago

    Subscribed

  • @maestrorobi4467
    @maestrorobi4467 1 year ago

    Nice! How can I make it automated to web scrape the same data daily or a different schedule?

    • @oxylabs
      @oxylabs  1 year ago +2

      Hello. We've got a video for automating web scraping, too, hope it answers your question well:
      ruclips.net/video/_AxotVxsPBw/видео.html
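      For a quick self-hosted alternative, a daily run can also be sketched with nothing but Node's built-in timers. runDaily and scrape are hypothetical names:

```javascript
// Sketch: repeat a scraping task every 24 hours using only built-in timers.
const DAY_MS = 24 * 60 * 60 * 1000; // one day in milliseconds

function runDaily(task) {
  task();                           // run once right away
  return setInterval(task, DAY_MS); // then repeat daily
}

function scrape() {
  // call the getBooks(...) function from the video here
  console.log("scraping at", new Date().toISOString());
}

// const timer = runDaily(scrape); // uncomment to start the schedule
// clearInterval(timer);           // stop it when done
```

      For anything long-running in production, a proper scheduler such as cron (or a package like node-cron) is usually a better fit, since a plain setInterval dies with the process.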

  • @m2svirtual384
    @m2svirtual384 1 year ago

    I am attempting to scrape a site that tells my scraper that it's not a supported browser, doesn't support JS, etc. The User Agent the scraper is sending is a valid/normal Chrome agent string. I can load the page in a legit browser and inspect everything, but I cannot save the page as HTML nor right click and select View Source. Scraping, viewing source, or saving as HTML all result in the same error page saying I'm not a supported browser. Can anyone help me get this page scraped? Thanks

    • @oxylabs
      @oxylabs  1 year ago +1

      Hey, thanks for asking! You are most likely looking at a web page that is both rendered in JS and employs some kind of browser checks to confirm your legitimateness.
      The first thing to try would be to scrape the website using a headless browser (see Playwright). The second: look into unblocking strategies. You can check our blog for useful tutorials - oxylabs.io/blog.
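      A rough sketch of that first suggestion, assuming Playwright is installed (npm install playwright); it renders the page in headless Chromium before reading the HTML, so JS-rendered content is included:

```javascript
// Sketch: fetch fully rendered HTML with a headless browser.
async function renderPage(url) {
  const { chromium } = require("playwright"); // loaded lazily inside the function
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle" }); // wait for JS/XHR to settle
  const html = await page.content(); // the HTML after JavaScript has run
  await browser.close();
  return html; // feed this into cheerio.load(html) as usual
}

// renderPage("https://example.com/").then((html) => console.log(html.length));
```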

    • @m2svirtual384
      @m2svirtual384 1 year ago

      @@oxylabs Thanks for the quick reply! I will check out the blog right away... = )

  • @AdityaSingh-ui4tr
    @AdityaSingh-ui4tr 4 months ago

    1115,00 Compra
    It's not working for this situation; I want to access the 1115,00 value.

  • @brisamukunde4771
    @brisamukunde4771 1 year ago

    for whatever reason, I don't know why it can't save the data and create the books.csv file. Great video, precise and straightforward

    • @oxylabs
      @oxylabs  1 year ago

      Thank you for the positive feedback!

  • @iqoverflow
    @iqoverflow 1 year ago

    Wow

  • @vietvie
    @vietvie 11 months ago

    How to scrape Reactjs website

  • @jlmrz1
    @jlmrz1 2 years ago

    Nice! How about book-URLs?

    • @oxylabs
      @oxylabs  2 years ago

      Hello! Do you mean URL addresses or the book URL pages themselves, one by one?

    • @jlmrz1
      @jlmrz1 2 years ago

      @@oxylabs Hi! Yes, I would love to see a book-link pushed too to the book_data, just like this: book_data.push({ title, price, stock, link }), if you know what I mean, thanks

    • @oxylabs
      @oxylabs  2 years ago

      @@jlmrz1 Hey again! To answer your question - yes, it is possible. All you need to do is find the a tag, get its href attribute, and push it to the array exactly as you specified.
      Here's the code:
      link = $(this).find("a").attr("href").replace("../../../", "books.toscrape.com/catalogue/")
      book_data.push({title, price, stock, link})
      P.S. books.toscrape.com returns relative URLs when extracting href, therefore some string replacement needs to be done.
      Hope this helps!

    • @jlmrz1
      @jlmrz1 2 years ago

      @@oxylabs Yes, it's working, loving it... nice replace command too, that's a new thing for me... Great tutorial indeed, hope you're having a nice day!

    • @oxylabs
      @oxylabs  2 years ago

      @@jlmrz1 Thank you, have a wonderful day too!

  • @mortabitmosab415
    @mortabitmosab415 2 years ago

    Is web scraping legal ?

    • @oxylabs
      @oxylabs  2 years ago +1

      It is! But not everywhere, only on public websites. We have an in-depth explanatory blog post on this exact topic. You can read it here:
      oxylabs.io/blog/is-web-scraping-legal

  • @tacitus5001
    @tacitus5001 1 year ago +1

    It is a MYSTERY that your code @ 10:10 works.
    The variable 'url' in 'const response = await axios.get(url);' is not defined.
    But for some reason you get some output. I would expect an error.
    This is hilarious!

    • @oxylabs
      @oxylabs  1 year ago

      Hey! Thanks for the sharp eye!😄 We think it's this: the 'url' parameter is defined in the function argument: async function getBooks(url) {...

  • @whatislove4587
    @whatislove4587 1 year ago +1

    I use fetch instead of axios, it works too!
    const response = await fetch(url);
    const html = await response.text();
    const $ = cheerio.load(html);

    • @oxylabs
      @oxylabs  1 year ago +2

      Hey, thanks for sharing!

    • @danielvega646
      @danielvega646 1 year ago +2

      DUDEEE You are a genius!!! Thanks for that. With this hint now we can save one dependency ourselves.