Node.js Web Crawler from Scratch | Full Tutorial

  • Published: Oct 6, 2024
  • I'll walk you through building a web crawler in JavaScript using Node.js and a few minimal dependencies. We'll be parsing raw HTML and following hyperlinks. It will be a simplified version of what Google does when they ingest the internet.
    Full project instructions: boot.dev/build...
    Learn back-end development: boot.dev

Comments • 39

  • @CodewithAbhi03 1 year ago +6

    Just awesome 😁 You not only helped create a crawler but also taught how to use test cases and code. Thank you soo much 🥳🥳

  • @leonss2356 10 months ago +3

    That was pretty great. It helped me get more familiar with the URL class and with string manipulation and parsing in general, and it finally got me learning testing, which I had been ignoring for quite a while now.

  • @arnoldasiimwe1819 1 year ago +4

    This was great, I learnt more than just crawling the internet... I'm experimenting with TDD with Jest. Thanks a bunch.

    • @bootdotdev 1 year ago

      You're welcome! Glad it helped

  • @matiassomoza8207 1 year ago +3

    Ok. It works. It's actually amazing seeing it work (since, in my experience, most code tutorials on YouTube, at some point, don't work).
    I learnt some Node.js (mostly Express) to make REST apps (CRUD). But that was it. A server, some routes, some controllers; Sequelize to post stuff into a Postgres database, and that's it.
    This is another level. I was just able to follow the tutorial, but I would be lying if I said I understood everything you did. Yes, you import some modules, you install some packages from npm, you tested some functions... yep. And it works, and I don't know how.
    How can I learn what you do? I know you are a backend developer, but (at least with Node.js) how did you learn all that? It's awesome, it really is.

    • @bootdotdev 1 year ago +1

      If you want to learn, you should check out boot.dev

  • @MOAhmed-l5t 1 year ago +3

    Nice video, clear sound, more information and very helpful
    Thank you so much for this hard work
    We need more of these Node.js projects
    DC from Sudan

    • @bootdotdev 1 year ago +1

      So glad it helped :) I'll be doing more, be sure to sub

    • @MOAhmed-l5t 1 year ago

      dev subscribed and shared it very earlier 😍

  • @marcorosenbaum9900 3 months ago +1

    well explained! very easy to follow

    • @bootdotdev 1 month ago

      also easy to sub ya know

  • @parthghatge2423 1 year ago +1

    Cool Project Man !! Learnt a Lot.

  • @michaelpumo83 1 year ago +2

    Brilliant video and your teaching style is very clear! Is this code available in a GitHub repo or Gist somewhere that I can use for reference at all? Thank you

  • @hsider 1 year ago +1

    Subscribed! Your other videos seem interesting, I'm checking 'em out soon. Nice content 👍

  • @somsutube 1 year ago +4

    47:36 guys let's not DDOS xD

  • @thomasbabinsky45 1 year ago +1

    In sortPages you are creating aHits and bHits but don't actually use them :P ... Great tutorial, thank you.

  • @miyamotomusashi5170 6 months ago

    Thank you very much. 😊
    Watched the complete video.
    Please post more videos like this 🤗

  • @exe.m1dn1ght 9 months ago

    OK, so I created a spider and I'm crawling this website. My spider goes page by page, but it's very slow, half a second for each page. Why is that?

  • @josephuzuegbu7431 1 month ago

    Please, sir, can you drop the link? Thanks

  • @vinhngotrung859 10 months ago

    Hello, do you know any way to create a web crawler for multiple websites with different structures?

  • @bryanarycode3417 1 year ago

    I ran into an issue with Jest. With just 2 pages, the sorting test for the report passes with no issue. With any more than 2 pages, I get an error stating that the output didn't match the expected value. I feel this has something to do with the a, b hits function, but cannot for the life of me figure out what. The project works flawlessly in production; it only fails when trying to test with more than 2 pages with Jest. Any ideas on this?
    (Edit!)
    I just figured it out: for some reason it required me to put the pages in the expected variable in the exact opposite order from the pages in the input variable, and then the test passed. A bug from a recent update perhaps? Either way, thank you for the knowledge!
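
The behavior described in this comment is consistent with a comparator that sorts in descending order of hit count. The sketch below is a hypothetical sortPages, not the tutorial's code: it assumes pages is an object mapping each URL to a hit count and that the report lists the most-linked pages first. It uses Node's built-in assert module in place of Jest so it runs on its own, and it uses aHits and bHits in the comparator rather than leaving them unused. The example.com URLs are placeholders.

```javascript
const { deepStrictEqual } = require("node:assert");

// Hypothetical sortPages: assumes `pages` maps a URL to its hit count.
function sortPages(pages) {
  // Turn { url: hits } into [[url, hits], ...] so it can be sorted.
  const pagesArr = Object.entries(pages);
  pagesArr.sort((a, b) => {
    const aHits = a[1];
    const bHits = b[1];
    // Positive when b has more hits, so b is placed before a (descending).
    return bHits - aHits;
  });
  return pagesArr;
}

// Input written in ascending order of hits.
const input = {
  "https://example.com/path": 1,
  "https://example.com/other": 2,
  "https://example.com": 3,
};

const actual = sortPages(input);

// With a descending sort, the expected order is the reverse of the input.
const expected = [
  ["https://example.com", 3],
  ["https://example.com/other", 2],
  ["https://example.com/path", 1],
];

deepStrictEqual(actual, expected);
```

Under these assumptions the expected array always runs from most hits to fewest, whatever order the keys appear in input, so an input written in ascending order needs an expected value in the opposite order.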

  • @amt.7rambo670 10 months ago

    Bro, can this crawl all websites, including complex ones like Amazon or other e-commerce websites? Please reply.

  • @karthiksharma6752 1 year ago

    @bootdotdev, for the getURLfromHTMl test you are checking whether both arrays are equal using toEqual. I'm unable to do that and don't know why. The other thing is that dom.window.querySelectorAll("a") isn't giving the output array. I debugged my case and found it to be dom.window.querySelectorAll("a").forEach(linkElement=>{
    } this thing, but I still tried resolving the test error multiple times using toMatch or new Set(actual), and nothing worked... Kindly provide me with a solution. I hope there will be a reply soon.

    • @bootdotdev 1 year ago

      Hard to help here. Join the Boot.dev discord for help!

  • @ExplorerSpace 5 months ago

    Where is the GitHub link?

  • @exe.m1dn1ght 9 months ago

    this can be turned into a weapon hahahaha

  • @rahulnegi4027 7 months ago

    Can anyone provide the source code?

  • @DeveloperMan_ 6 months ago

    bro hates working with the or (||) operator 😆

  • @randomdamian 1 year ago +1

    Did he really change the totally clear names "input", "output", "expected" into "AcTuAl"? I'm pretty sure I have never named anything "Actual" in my life; I did use "currentString" etc.

    • @bootdotdev 1 year ago +4

      Akshually...
      "expected" and "actual" are very common unit test names