A Guide to Web Scraping with Node.js

  • Published: Sep 6, 2024
  • Learn how to build a web scraper ⛏️ with NodeJS using two distinct strategies, including (1) a metatag link preview generator and (2) a fully-interactive bot for Instagram. fireship.io/le...
    1. Build a simple link preview scraper with Cheerio.
    github.com/che...
    2. Build an interactive Instagram scraper with puppeteer.
    github.com/Goo...
    #javascript #nodejs #webdev
    Take the quizzes 🤓
    iOS itunes.apple.c...
    Android play.google.co...
    Upgrade to Fireship PRO at fireship.io/pro
    Use code lORhwXd2 for 25% off your first payment.

Comments • 421

  • @oleksandrisaryk4728 5 years ago +629

    Me yesterday: Hmm... there is no API available, well, I need to learn how to scrape with Node.
    Fireship today: A Guide to Web Scraping with NodeJS
    Well done.

    • @chicken_teriyaki 4 years ago +8

      No one like this comment, it has achieved greatness

    • @BearologyLabs 3 years ago +1

      @Killian Malik Tried plenty of methods, this one seems to be working!

    • @lizzagarcia6405 3 years ago

      L

  • @Tizmo-tt9ry 5 years ago +272

    When you say "as hackers" I feel so cool

    • @Fireship 5 years ago +53

      Haha, web scraping always feels very hacky.

    • @anonwithamnesia 5 years ago +6

      @Fireship Which font do you use?

    • @Fireship 5 years ago +26

      @anonwithamnesia Fira Code

    • @StingSting844 4 years ago +2

      Haha yeah, it's hacky and cool. Once you start to make money off scraped pages you will get sued, though

    • @vaishnav4750 4 years ago

      😊yes

  • @f3lixadam 5 years ago +51

    Hi Jeff, thank you so much for your videos! We are building a startup, and without your tutorials and explanations I don't know if we could ever have gotten such great insight into how the mechanisms work, especially for Ionic/Firebase.
    Much Love from Germany

    • @Fireship 5 years ago +12

      Wow, that is awesome! Keep me posted on your progress :)

    • @pepperpeterpiperpickled9805 1 year ago

      @Fireship If I didn't think you were an AI, I'd want to buy you a beer

  • @AaronJack 5 years ago +31

    So many things could be built around puppeteer, and the await syntax makes it so easy to reason about.
    Also seems like this is a perfect use-case for cloud functions vs having a dedicated API server. Great video.

  • @jimmymac601 5 years ago +21

    As a long time subscriber, you need to make tutorials every day. Fantastic work.

    • @Fireship 5 years ago +9

      I wish I could, my idea list is massive.

  • @almostanengineer 5 years ago +13

    I love the fact you're using Insomnia for your REST requests. When I started working with APIs, I was told to try Postman and found it overcomplicated everything. Then I found Insomnia, so much cleaner and easier to use.

  • @PawelPolskiPL 5 years ago +13

    I love it! It's the most useful video on YouTube for me, ever! All day today I was making an app using Puppeteer, I didn't know it's so easy. Thank you as always for a great video. Now I'll be saving over 40 min every day at work 😂👍

  • @ShivamGupta-nz7dy 5 years ago +3

    When it comes to Web Scraping, most will talk about Python.
    Felt very happy knowing Web Scraping can also be done easily using JavaScript.
    Thanks a lot.

    • @softwarelivre2389 4 years ago +1

      JavaScript is the native environment for web scraping. Not only is it asynchronous, but JavaScript was also created to be used on the web. To see people using interpreted, slower Python for that is just sad. Not that there is something wrong with Python, but it has other purposes. JS is much better for scraping pages generated with React, for example. In Python, you can, at best, use a bot to simulate user control. With JS, you can evaluate the DOM and then retrieve your data without needing to emulate a real user, and that is very powerful.

  • @boriskrstic 5 years ago +28

    It's literally like you heard exactly what I was wondering about today. Thanks for the Puppeteer part! :)

    • @Fireship 5 years ago +3

      Awesome! The topic came from a suggestion in the Slack channel last week :)

  • @ambarmutha8504 5 years ago +7

    🔥ship is AWESOME!
    Every time I have a project on my mind I find something useful to steal from this channel.

    • @justafreak15able 4 years ago +3

      In an open source world there is no such thing as stealing.

  • @PaleSaturn 5 years ago +8

    This is one of the best programming channels on YouTube. Love the production on your videos!

  • @yejielwahnich6136 5 years ago +6

    Link preview, I wonder if I will ever need this.
    Said jokingly, knowing they talked about it just 1 week ago.
    Awesome content!!

    • @Fireship 5 years ago +2

      I get my best ideas from Slack :)

    • @yejielwahnich6136 5 years ago

      @Fireship not for nothing we love/hang around Slack ;)

  • @alexanderf7008 8 months ago

    I only start understanding things better after watching your videos. It's such a great explanation and never boring.

  • @actualmortgage7122 3 years ago +2

    Thanks for this upload. I got into a little bit of web scraping a couple years ago but ran into a lot more complexity than I was expecting for a tiny little side project. Now that I know a lot more, and JS, hoping to add this to the toolbelt.

  • @TheGejr 5 years ago +27

    Just as I'm building a news-scraper you make this amazing video! You're using the exact same packages I am :^) Great video as always!

  • @domaincontroller 3 years ago +3

    00:43 use-case 1 02:19 Cheerio 06:53 Puppeteer

  • @firaskudsy 5 years ago +12

    Thanks for those amazing videos... u r covering the full full stack 👍

  • @remoteworkboard 5 years ago +1

    You must be a mind reader. I was struggling with my link preview generator all week! Thank you soooo much.

  • @MrDots99 5 years ago +2

    I've been doing this for the past 2 days at work and you upload a video today haha. Fortunately I got it working, but it's always nice to see you upload!

  • @arisweedler4703 5 years ago +3

    I love it! :D Such well-made videos. I appreciate how you explain the problems that packages solve instead of just peddling hype

  • @theblackharted 5 years ago +3

    Your content is top notch and so well done!! Easily my favorite programming channel

  • @mateiadriel353 5 years ago +1

    The Puppeteer support for Firefox is awesome!

  • @Autoscraping 7 months ago

    An outstanding video that has been a valuable reference for our newcomers. We sincerely thank you for sharing!

  • @zarefgamz2515 4 years ago

    You saved me a lot of time finding the name of a tool.
    I had been searching for that tool for more than 3 days.

  • @justadev____7232 4 years ago +1

    For anybody wondering how to get the Firebase CLI:
    npm install -g firebase-tools

  • @armaandhanji2112 5 years ago

    HUGE fan of your Node videos. Thank you so much! Best channel on YouTube.

  • @steaklover575 5 years ago

    Been here for a long time, your content just gets better and better!

  • @Brlitzkreig 9 months ago

    Love how much more squeaky Jeff's voice was!

  • @ninanordbo 2 years ago

    Love your content! Well done, my favorite YouTube resource. Love that you get straight to the point.

  • @bartub5369 5 years ago +1

    Great content! I have previously made a scraper for a Discord bot with Cheerio, but never really fiddled around with Puppeteer or anything similar. Thanks for the ideas :)

  • @akarshbarar1492 4 years ago +1

    Hey, this is one of the things I have been searching for over the last 4 months..

  • @lifasibiya4810 5 years ago +3

    Thanks again for the video Jeff. Last year, I was using a library called HtmlAgilityPack to achieve some of this with C#. You've made it much simpler. YOU DESERVE THE FIRST FLYING CAR MAN👍🏾

  • @diabolo1 5 years ago

    OMG I've been looking all over the web for resources on web scraping and suddenly your video appears! Well done!

  • @SmartWizzard 4 years ago

    Man, you are awesome. All your videos are up to date, useful and very informative. I am very happy that I have subscribed to your channel.

  • @raph6709 2 years ago

    Dude...this is exactly what I needed... why are you so helpful

  • @htky 5 years ago +4

    Perfect timing, I just started a project that will involve scraping 👌🏻

  • @curiousnrd 5 years ago +1

    Fantastic video! I’m excited to test this out. Thank you for sharing this information.🙏🏾

  • @albertchung7641 4 years ago

    This channel's videos are all so cool. You rock!

  • @coolaydalena 5 years ago +3

    Ooohhhhwwww.. More Ideas about cloud functions please... 🔥❤️

  • @trusterzero6399 4 years ago

    As a programmer there is a lot of stuff that I kinda want to look into but I never really get the chance to. These videos are helping me a lot with that

  • @iykazorji8171 5 years ago +2

    This is some quality content yo! Thanks for this!

  • @calebkopp4174 5 years ago

    Very insightful. Appreciate all the work that's put into the videos

  • @viveksoundrapandi 5 years ago

    Never knew about Puppeteer. Excellent one

  • @stevewitman 5 years ago

    Interesting topic and it was also helpful seeing a firebase cloud function being set up.

  • @alinandrei9006 5 years ago

    Thanks, understood what I came for from the first two minutes.

  • @greatsuccess4734 5 years ago

    I have been waiting for this video for so long thank you so much

  • @tedfitzpatrickyt 4 years ago

    Scraping's also useful for a CMS: let users maintain simple HTML, then transform the page into a better UI

  • @MedyGames 4 years ago +4

    Can't you just do const urls = images.map(v => v.src)? 09:00
    Or is Array.from added for readability?

    • @tamuahmed5303 2 years ago

      That callback function returns a node list, so we are converting it into an array.
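
      A minimal sketch of the point in this reply, assuming `images` is an array-like collection such as the NodeList that `document.querySelectorAll` gives you in the browser. A plain object stands in for it here so the snippet runs in Node without a browser; the URLs are made up for illustration.

```javascript
// Hypothetical stand-in for a NodeList: numeric indices plus a length,
// but none of the Array.prototype methods such as map().
const images = {
  0: { src: 'https://example.com/a.jpg' },
  1: { src: 'https://example.com/b.jpg' },
  length: 2,
};

// Calling images.map(...) directly would throw, because map is not defined.
const hasMap = typeof images.map === 'function';

// Array.from copies the items into a real array, so map() becomes available.
const urls = Array.from(images).map((img) => img.src);

// Array.from also accepts a mapping function as its second argument,
// which does the conversion and the mapping in one step.
const sameUrls = Array.from(images, (img) => img.src);

console.log(hasMap, urls, sameUrls);
```

      So Array.from is there for correctness rather than readability: without it there is no `map` to call on the collection.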

  • @tanujvyas6124 5 years ago

    Thank you for this video. I was looking for guidance on web scraping. This video proved to be a great source.

  • @drowmik 5 years ago

    That's a cool, detailed video about scraping in a very short time...

  • @randith87 5 years ago

    Just getting into Node and JSON. Having a lot of fun! Definitely a challenge! 😎💻

  • @will_abule 5 years ago +3

    Thanks Jeff, this can be extended for SSR and SEO 😁

  • @camerenisonfire 5 years ago +1

    Always such interesting and well-produced videos. Thanks, Fireship.

  • @sebastiancuk7004 5 years ago

    I'm using Puppeteer at work to make short videos testing the app automatically (feels awesome).
    Also really cool video, keep up the good work!

  • @lyricalstudio7845 5 years ago

    All your videos are amazing!! Keep up the good work and we will do our best to support your channel. Cheers!! :-)

  • @NikitaKaramov 5 years ago +11

    This video feels illegal...
    I love it.

  • @jeandrepentz5011 5 years ago

    That puppeteer library looks neat, never knew about it

  • @gavinmurphytheperson5446 5 years ago

    That's pretty cool. Don't have a use case for this yet but I can imagine how useful it could be. Great video as usual!

    • @tworizki 5 years ago

      It is. Imagine you could scrape the entire net and let people browse your scraped pages using keywords. Oh wait...

  • @xit 5 years ago

    That Tshirt Looks SICK!

  • @Oxcorp 5 years ago

    Love your videos man, best quality!

  • @OniasdaRocha 4 years ago

    you just got yourself a subscriber, great content my man

  • @igotapochahontas 5 years ago

    This is a really good video. Very clear explanation and simple

  • @bjerz23 5 years ago

    Your videos are just so good

  • @zdravko6t5 5 years ago

    Very interesting video. Learned so much from it. Didn't know you could do something like this with Instagram.

  • @milesmeow 5 years ago

    A great video for adding a tool to our web Swiss Army knife.

  • @jlai383 5 years ago

    I like this kinda demo so much!

  • @G3Number 5 years ago

    Your tutorials are top quality !!

  • @alokrawatt 5 years ago

    Thanks for your video, I have all of your videos. Loving it.

  • @aravind.a 4 years ago

    Good explanation with example, Jeff.

  • @GavinLon 5 years ago

    Very interesting video and very well presented. Thank you.

  • @AngrejKumar 2 years ago

    This is awesome, man. Thanks a lot!

  • @jackbird5839 3 years ago +3

    "Awesome tutorial, thank you. But for a non-tech user it is quite hard to build a workable scraper for my WooCommerce store. As a side solution I am using eCommerce "ESCRAPER", maybe it helps somebody too.
    But I am not giving up))) Thank you for your input!!!"

  • @T3P 4 years ago

    I really love watching your videos ❤️

  • @sanuj-bansal 5 years ago

    Wish YouTube also had a heart react button for such videos.

  • @thiagomattos450 5 years ago

    You could bypass CORS on the front end using cors-anywhere

  • @dannythrasher 5 years ago +1

    Great Video! Would love a tshirt!

  • @adante407 5 years ago

    This video is great! Just subscribed. Fantastic channel, amazing content. 👍

  • @LukePighetti 5 years ago

    I've always used Cheerio for scraping. Seems to work pretty well.

    • @jaimegonzalez3956 5 years ago

      Luke Pighetti are you able to scrape sites that use JavaScript to populate the DOM? I’ve always had issues scraping with it.

    • @LukePighetti 5 years ago

      @jaimegonzalez3956 Not with Cheerio, you need something like Puppeteer for that

  • @Coksnuss 5 years ago +1

    I don't get what was said about CORS. IMO the main reason cross-origin reads are blocked in the browser is that an attacker could otherwise easily access a user's private data or read the CSRF tokens of remote sites (which rely on the SOP and CORS). Spectre has nothing to do with that and only became public in 2018, whereas CORS has been around much longer, dating back to 2009.
    As I understand it, CORB is different and specifically targets remote resources fetched via HTML elements (e.g. an img element), as opposed to the XHR or fetch requests that would probably be used to implement a website preview for embedded links on the front end.

  • @MagnusVestergaard 5 years ago

    Web scraping is a great thing to know, thanks!

  • @tekforge 1 year ago

    Thanks for the insights!

  • @gauravdasgupta3663 5 years ago

    Every time I share a URL in my LinkedIn posts, it loads the preview so beautifully, and I was always wondering how they do that. You made it so easy, now I can do it too. Thanks a ton

  • @vinukurian389 5 years ago

    Exactly what i was looking for 👨‍💻

  • @RoshanKumar-yr1jf 5 years ago

    Thanks for the great video... I always wanted to do something in this domain... I was just shifting to Python's Scrapy tool... Thanks again for an awesome video.. 😃😃

  • @alexgogan1617 5 years ago

    I'll be having such a scrape time with this!

  • @iova666 5 years ago

    wish i had this vid a few years ago.
    you just gained a new sub :)

  • @yallayeho1238 5 years ago

    Very interesting topic! Thank you!

  • @bharathravi7820 5 years ago

    Was looking forward to this, thanks a ton!

  • @NRagAa 5 years ago +1

    In some cases, poorly built web services leave their internal API exposed, and it can be found in the client-side JavaScript. So it's nice to hunt for that API.

  • @iddrissraaj6737 5 years ago

    Great Video, informative as always

  • @paulboamah 5 years ago +2

    Thanks so much for this video

    • @Fireship 5 years ago +2

      Thanks for watching!

  • @williamragstad 5 years ago

    Amazing video! Love your content!

  • @DarkNenyk 5 years ago

    Thanks to you I got into Angular

  • @goneandwiped3521 4 years ago

    Nice, loved it

  • @luigis.3909 4 years ago

    For web scraping the best approach for me is PHP + the simplehtmldom class. Simplehtmldom has a nice syntax, it's like jQuery for scrapers :-D

  • @mksoftwaresolutions9303 5 years ago

    Another great video. Thank you.

  • @anonwithamnesia 5 years ago

    Great Video! I love your videos. Keep it up bro!!

  • @ryansamarakoon8268 5 years ago

    This is great! I want to try to make an app that uses face recognition to find pictures of you, whether you were tagged or not, but I was too late to start using the API. This now gives me hope that it's possible

  • @BrianClincy 5 years ago

    Kevin Darrent.. I have used puppeteer and did a meetup with it. Thanks!

  • @DisasterSPA 5 years ago

    Amazing video as usual!