Advanced Web Scraping with Puppeteer: Avoid Looking Like a Bot and Pass Authentication!

  • Published: Feb 7, 2025
  • In this video, we're going to take a look at two Puppeteer improvements. First, how can you appear as if you were not a robot? That can be very helpful for avoiding bot protection or captchas. Second, how do we get through a website's authentication? Let's dive right in!
    Thanks for watching, I wish you lots of fun implementing these Puppeteer tips in your own projects! Remember, some companies do not allow scraping their websites, so I advise just scraping your own. :^)

Comments • 69

  • @righttiming 11 months ago +6

    “well… we look like a bot. maybe because we are a bot” 🤣
    legend. great video

  • @PremchandVerma-dz9dq 24 days ago

    Loving this in-depth web scraping tutorial! On a related note, if anyone is hunting for cold emailing gold standards, Mystrika is seriously impressive. Handling multiple languages without a hiccup has broadened my outreach. And you cannot beat their warmup pool and analytics for that price. Microsoft 365 email deliveries are seamless now. Considering the pay-once-use-forever model, it is a gem worth diving into.

  • @fitter2boss72 2 years ago +9

    Each video adds something "advanced". Let's continue. Thank you.

  • @2ru2pacFan 1 year ago +75

    Thanks Kevin De Bruyne

  • @codewithguillaume 2 years ago +4

    That’s so interesting. I didn’t even know we could get this report as an image. Well, I think I’ll spend my weekend working on my bot, but how do you host them? Do you have a Raspberry Pi at home or do you use a regular online host?

    • @moussaibrahem9 2 years ago

      I think because it's built on top of Node.js, you can host it anywhere you want

    • @ashutoshpatel5030 1 year ago

      @moussaibrahem9 Yeah, I also think you can host that bot just like any other Node application!!

    • @miguelvelascodev 5 months ago

      You can use Docker and deploy on a normal server. I use docker-compose to deploy apps like this, installing all the dependencies; sometimes you also need to install a graphical interface if you are using headless: false. I hope this helps :)

  • @shivram3848 24 days ago

    Whoa, great insight into web scraping mechanics! While scraping is great, do not underestimate top-tier cold emailing. Mystrika has been my go-to for upping the game. Their tag management and subsequences have saved me countless hours. Bonus: their comprehensive analytics bounce clarity to new heights. Forget juggling numerous logins, just one system powerhouse does the trick. Nothing quite like seeing those stats roll in ever so smoothly.

  • @dragos.temelie 3 months ago

    Very interesting concepts. Thanks!

  • @mohitpunia3874 5 months ago +1

    I am passing HTML as a string to it and making PDFs, but the images are not loading, while the same thing works in Node.js
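A minimal sketch, assuming the HTML string is loaded with `setContent` and then printed with `pdf()` (in Puppeteer or pyppeteer): one common cause is that the page then sits at `about:blank`, so relative image paths such as `logo.png` have nothing to resolve against. Inlining the images as base64 data URIs sidesteps that. The helpers `to_data_uri` and `inline_images` are hypothetical names, use only the Python standard library, and do a plain string replacement on `src="..."` attributes rather than real HTML parsing.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Dict


def to_data_uri(image_path: str) -> str:
    """Return the file at image_path as a data URI, e.g. data:image/png;base64,..."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None:
        mime = "application/octet-stream"  # fallback when the extension is unknown
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_images(html: str, images: Dict[str, str]) -> str:
    """Replace each src="<name>" in html with the data URI of the matching local file.

    `images` maps the src value used in the HTML to a path on disk.
    """
    for src, path in images.items():
        html = html.replace(f'src="{src}"', f'src="{to_data_uri(path)}"')
    return html
```

With pyppeteer, for example, the result would be passed as `await page.setContent(inline_images(html, {"logo.png": "assets/logo.png"}))` before `await page.pdf({'path': 'out.pdf'})`; the file paths there are placeholders.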

  • @thomasdinh2k 1 year ago +1

    I have a question: instead of passing the authentication in the script, why can't I just log in manually and then pass the cookie into the script? Is that harmful or something?
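Reusing a session cookie from a manual login is a common alternative to scripting the login form. A minimal sketch in Python, assuming pyppeteer, whose `page.cookies()` returns a list of cookie dicts and `page.setCookie(*cookies)` restores them: the two file helpers (`save_cookies`, `load_cookies`) and the `cookies.json` location are hypothetical and only handle persistence, while the browser calls are shown in comments. The saved file grants access to the account, so treat it like a password, and expect to log in by hand again when the session expires.

```python
import json
from pathlib import Path
from typing import Dict, List

COOKIE_FILE = Path("cookies.json")  # hypothetical location for the saved session


def save_cookies(cookies: List[Dict], path: Path = COOKIE_FILE) -> None:
    """Persist cookies, e.g. the list returned by pyppeteer's `await page.cookies()`."""
    path.write_text(json.dumps(cookies, indent=2))


def load_cookies(path: Path = COOKIE_FILE) -> List[Dict]:
    """Read cookies back; returns an empty list if no session has been saved yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text())


# Assumed flow with pyppeteer:
#   first run, after logging in by hand in a non-headless browser:
#       save_cookies(await page.cookies())
#   later runs, before navigating to the protected page:
#       await page.setCookie(*load_cookies())
```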

  • @sebastianruiz8213 1 year ago +1

    Thank you so much! This helped me out on a very important project.

  • @bvodola 1 year ago

    Got myself unstuck because of this video. Thanks, man!

  • @makhmudjonjamoldinov3554 1 year ago

    bro, I actually found out that u can set headless to false in the launch options and it works

  • @8kelvin 4 months ago +1

    If you do npm install now, you no longer need to add executablePath to your code.

  • @1000timka 6 months ago

    Thank you this video helped me do some not so savory things you r the goat!!!!

  • @kudah263 3 months ago

    Why didn't you use nodemon for this project?

  • @ahmadfraz5846 4 months ago

    How do you bypass different types of captchas? Please make a video on it.

  • @rodrigodanielss 10 months ago

    After 2 or 3 requests, Amazon fails.
    I tested the plugin modifications and stealth from the video, and it still fails after the same number of requests.
    Gonna have to learn and test with Crawlee.

  • @moussaibrahem9 2 years ago

    Your video ideas are mind-blowing, keep going mate

  • @jameskayihura1675 5 months ago

    Thx Kevin. Just wondering if one can use the same code with puppeteer-core?

  • @mihaelacostea5783 1 year ago

    Would this still work in 2024? Or have big companies come up with a 'defence' already?

  • @MarieAmeliaFreyaAster 6 months ago

    That's really helpful, thanks a lot

  • @henriquematias1986 1 year ago

    Have you tried doing the same on eBay and logging in? They still detect it even if you use stealth!?

  • @Leofmoura87 10 months ago

    What's the ultimate solution for resolving captcha?

  • @tushswe 5 months ago

    How do we solve captcha with puppeteer KDB?

  • @thabosiphiwemngoma1859 6 months ago

    Can you show the case where you log in with Google?

  • @eternl_sunshine22 9 months ago

    Hi Josh, just wondering how you used CJS modules along with ES6 modules, 'cos I can't seem to make it work

  • @samfisher8426 1 year ago

    Perfect content, that's what I need to learn, in case I use it some day in some CTF ;)

  • @AnoSkinz 5 months ago

    How do I combine multiple Node.js & Puppeteer scripts into one file?

  • @splenwilz 1 year ago

    Looks like waitForTimeout will soon be deprecated. Is there a way to enforce headless: true? 🤔

    • @b3T4RIK 1 year ago

      browser = await launch({'headless': True})
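On the first part of the question: newer Puppeteer releases dropped `page.waitForTimeout`, and the usual replacement is an ordinary sleep. A minimal Python sketch with `asyncio` (the helper name `pause` is hypothetical) that waits either a fixed time or a random time within a range:

```python
import asyncio
import random
from typing import Optional


async def pause(min_ms: float, max_ms: Optional[float] = None) -> float:
    """Sleep for min_ms milliseconds, or a random duration in [min_ms, max_ms].

    A plain stand-in for the old page.waitForTimeout(ms).
    Returns the delay actually used, in milliseconds.
    """
    delay_ms = min_ms if max_ms is None else random.uniform(min_ms, max_ms)
    await asyncio.sleep(delay_ms / 1000)  # asyncio.sleep takes seconds
    return delay_ms
```

In a pyppeteer script this would sit between page actions, e.g. `await pause(500, 1500)` after a click; the range is an arbitrary choice.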

  • @JustinK0 1 year ago

    So I guess if the site required a Gmail login, it wouldn't work, because the browser that is opened doesn't seem to allow the Gmail login API

  • @dglalperen 1 year ago

    First of all, nice video! What can we do about two-factor authentication?

  • @kingkckc 1 year ago

    Can you explain how the secret.ts file is structured, if we wanted to replicate feeding in the login credentials from a different file instead of hardcoding them?
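The video's `secret.ts` isn't shown on this page, so the following is not its structure. One common alternative to a hardcoded secrets file, sketched in Python under that assumption: read the credentials from environment variables (the names `SCRAPER_EMAIL` and `SCRAPER_PASSWORD` are hypothetical) and fail early with a clear message if one is missing.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Credentials:
    email: str
    password: str


def load_credentials(env: Mapping[str, str] = os.environ) -> Credentials:
    """Read login credentials from environment variables instead of hardcoding them.

    SCRAPER_EMAIL and SCRAPER_PASSWORD are hypothetical variable names.
    """
    try:
        return Credentials(env["SCRAPER_EMAIL"], env["SCRAPER_PASSWORD"])
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None
```

The values would then feed the login step, e.g. `await page.type('#email', creds.email)` with pyppeteer, where `#email` is a placeholder selector.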

  • @hamzadastgir1 11 months ago

    I'm a Laravel dev and was really struggling with a scraping task... but Allah (God) sent you to help me :) Thanks a lot
    Love u

  • @AbuTaher-vx2oe 1 year ago

    It is not working on a production server. What can I do?

  • @Reaaa 1 year ago

    Do you know any similar plugins for Python?

  • @hemdenminiar2139 10 months ago

    Where can I find the code, please?

  • @fitter2boss72 2 years ago

    How do I submit a form and then catch, rename, and save a file?

  • @razshahar7029 2 years ago

    Thx, I was searching for how to fix the error at minute 5, very helpful

  • @mecode4646 1 year ago

    thank you so much!

  • @OptimBro 2 years ago

    Is there any way to type like a real human does, with random key taps?

    • @lzxp7943 1 year ago

      Is it really necessary? As long as you pause between the email, the password, and the button click, a timeout should be OK.
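For the random key taps asked about in this thread, a minimal sketch in Python: generate one randomized delay per character, then type character by character. `keystroke_delays` is a hypothetical helper, the 120 ms base and 60 ms jitter are arbitrary assumptions, and the pyppeteer call in the comment assumes `page.type(selector, text, {'delay': ms})`. Whether irregular timing makes any difference to a particular site's bot detection is not something this sketch establishes.

```python
import random
from typing import List, Optional


def keystroke_delays(text: str, base_ms: float = 120.0, jitter_ms: float = 60.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Return one delay in milliseconds per character of `text`.

    Each delay is base_ms plus a uniform random offset in [-jitter_ms, jitter_ms],
    clamped at 0, so consecutive key presses are not evenly spaced.
    """
    if rng is None:
        rng = random.Random()
    return [max(0.0, base_ms + rng.uniform(-jitter_ms, jitter_ms)) for _ in text]


# Assumed pyppeteer usage ('#password' is a placeholder selector):
#   for ch, ms in zip(password, keystroke_delays(password)):
#       await page.type('#password', ch, {'delay': ms})
```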

  • @ApoloXII-sm4tx 8 months ago

    great video

    • @JAODc-fo9gf 8 months ago

      this comment was made by my bot :)

  • @xaviermahafaly1807 4 months ago

    Thanks so much, it's helpful

  • @sleepycat3466 6 months ago

    I'm gonna change from Manucian to the Citizens

  • @boopfer387 1 year ago

    you're awesome!

  • @BtcBroccoli 10 months ago

    life saver

  • @redradar3366 1 year ago

    svvveet, works great with Python pyppeteer too. Thanks for the vid

  • @brandon400 1 year ago

    source code😭😭

    • @b3T4RIK 1 year ago +2

      Just take a screenshot and use an online image-to-text converter

  • @saulotarsobc 11 months ago

    +1