Advanced Web Scraping with Puppeteer: Avoid Looking Like a Bot and Pass Authentication!
- Published: Feb 7, 2025
- In this video, we're going to take a look at two puppeteer improvements. First, how can you appear as if you were not a robot? That can be very helpful for avoiding bot protection or captchas. Secondly, how do we get through the authentication of a website? Let's dive right in!
Thanks for watching, I wish you lots of fun implementing these puppeteer tips into your own projects! Remember, some companies do not allow scraping their website, so I advise just scraping your own.. :^)
“well… we look like a bot. maybe because we are a bot” 🤣
legend. great video
Each video adds something "advanced". Let's continue. Thank you.
Thanks Kevin De Bruyne
Thanks, I like to help a homie out when I’m not scoring goals
😂😂😂
😭😭😭
😂😂😂😭
😂
That’s so interesting. I didn’t even know we could have this report as an image. Well, I think I’ll spend my weekend working on my bot. How do I host it, though? Do you have a Raspberry Pi at home or do you use a regular host online?
I think because it's built on top of Node.js you can host it anywhere you want
@moussaibrahem9 Yeah, I too think that you can host that bot just like any other Node application we host!!
You can use Docker and deploy on a normal server. I use docker-compose to deploy apps like this, installing all the dependencies. Sometimes you need to install a graphical interface if you are using headless: false. I hope this helps :)
Very interesting concepts. Thanks!
I am passing HTML as a string to it and making PDFs, but the images are not loading, even though the same thing works in nodejs
I have a question: instead of passing authentication in the script, why can't I just log in manually and then pass the cookie into the script? Is that harmful or something?
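On the cookie idea: reusing a session from a manual login is a common approach. Below is a minimal sketch, assuming you copied the `Cookie` request header from your browser's DevTools after logging in. `parseCookieHeader` and `openWithCookies` are hypothetical helper names, not something from the video. `page.setCookie` is Puppeteer's page-level cookie call; check the docs for your version, since the cookie API has been changing. Treat the copied cookie like a password, because anyone holding it can act as your session until it expires.

```javascript
// Convert a "Cookie" header string (e.g. "session=abc; theme=dark") into
// cookie objects. Hypothetical helper, not part of Puppeteer itself.
function parseCookieHeader(header, domain) {
  return header
    .split(";")
    .map((pair) => pair.trim())
    .filter((pair) => pair.indexOf("=") > 0) // skip empty or nameless entries
    .map((pair) => {
      const index = pair.indexOf("="); // values may themselves contain "="
      return {
        name: pair.slice(0, index).trim(),
        value: pair.slice(index + 1).trim(),
        domain,
      };
    });
}

// Set the cookies before navigating so the first request is already logged in.
// `page` is assumed to be a Puppeteer Page.
async function openWithCookies(page, url, cookieHeader) {
  const { hostname } = new URL(url);
  await page.setCookie(...parseCookieHeader(cookieHeader, hostname));
  await page.goto(url);
}
```

Sessions expire, so a script that relies on a pasted cookie will need a fresh one from time to time.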
Thank you so much! This helped me out on a very important project.
Got myself unstuck because of this video. Thanks man,
bro, I actually found out that u can set headless to false in the launch options and it works
If you do npm install now, you no longer need to add executablePath to your code.
Thank you this video helped me do some not so savory things you r the goat!!!!
Why didn't you use nodemon for this project?
how to bypass different types of captchas, please make a video on it.
After 2 or 3 requests amazon fails.
Tested the plugin modification and stealth from the video, and it is still failing just as often.
Gonna have to learn and test with Crawlee.
Your video ideas are mind-blowing, keep going mate
Thx Kevin. Just wondering if one can use the same code with puppeteer-core
Would this still work in 2024? Or have big companies come up with a 'defence' already?
That's really helpful, thanks a lot
Have you tried doing the same on eBay and tried to log in? They still detect you even if you use stealth!?
What's the ultimate solution for resolving captcha?
How do we solve captcha with puppeteer KDB?
Can you show the case where you log in with Google
Hi Josh just wondering how you used cjs modules along with es6 modules, cos i can't seem to make it work
perfect content, that's what I need to learn, in case I use it some day in some CTF ;)
How do I combine multiple Node.js & Puppeteer scripts into one file?
Looks like waitForTimeout will soon be deprecated. Is there a way to enforce headless: true?🤔
browser = await launch({'headless': True})
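On the `waitForTimeout` deprecation mentioned above: a fixed pause does not need Puppeteer at all. Here is a minimal sketch in plain JavaScript; `sleep` and `randomDelay` are hypothetical helper names, not Puppeteer APIs.

```javascript
// Resolve after `ms` milliseconds; a plain-JavaScript stand-in for
// page.waitForTimeout(ms).
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Whole number of milliseconds in [min, max], so pauses are not
// perfectly regular.
function randomDelay(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Usage inside an async function:
//   await sleep(randomDelay(500, 1500));
```

Where possible, waiting for a concrete condition (for example `page.waitForSelector`) tends to be more reliable than a fixed pause.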
So I guess if the login requires Gmail, it wouldn't work, because the browser that is opened doesn't seem to allow the Gmail login API
First of all, nice video! What can we do about two-factor authentication?
cry
can you explain how the secret.ts file is structured if we wanted to replicate feeding in the login credentials from a different file vs hardcoding?
I'm a Laravel dev and was really struggling with a scraping task .. but Allah (God) sent you for my help :) Thanks a lot
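The actual secret.ts from the video is not shown in this thread, so what follows is only a guess at one workable layout, written in plain JavaScript. It reads the credentials from environment variables rather than from a file. `readCredentials`, `SCRAPER_EMAIL`, and `SCRAPER_PASSWORD` are assumed names for illustration.

```javascript
// Hypothetical helper: read login credentials from an environment-like
// object instead of hardcoding them in the scraper. The variable names
// SCRAPER_EMAIL and SCRAPER_PASSWORD are assumptions.
function readCredentials(env) {
  const email = env.SCRAPER_EMAIL;
  const password = env.SCRAPER_PASSWORD;
  if (!email || !password) {
    throw new Error("Set SCRAPER_EMAIL and SCRAPER_PASSWORD before running the script");
  }
  return { email, password };
}

// In the scraper:
//   const { email, password } = readCredentials(process.env);
```

A separate file that exports the same `{ email, password }` object would work equally well, as long as that file is kept out of version control.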
Love u
It is not working on production server, What can i do?
do you know any similar plugins for python
where can i find the code please ?
How to send a form and catch, rename, save a file?
thx, I was searching for how to fix the error at minute 5, very helpful
thank you so much!
is there any way to type like a real human does? with random key taps?
is it really necessary? as long as you pause between typing the email, typing the password, and clicking the button, a timeout should be ok.
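For the random key taps question: `page.type(selector, text, { delay })` in Puppeteer pauses a fixed number of milliseconds between keys. One way to vary the rhythm is to type one character at a time with a random pause before each. This is a minimal sketch, not a guarantee against detection; `typeLikeHuman` and `randomBetween` are hypothetical helper names, and the default delays are arbitrary.

```javascript
// Whole number of milliseconds in [min, max].
function randomBetween(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Type `text` into `selector` one character at a time, waiting a random
// interval before each key. `page` is assumed to be a Puppeteer Page (any
// object with a compatible type() method works).
async function typeLikeHuman(page, selector, text, minDelay = 60, maxDelay = 220) {
  for (const char of text) {
    await new Promise((resolve) => setTimeout(resolve, randomBetween(minDelay, maxDelay)));
    await page.type(selector, char);
  }
}

// Usage inside an async function (the selector is an example):
//   await typeLikeHuman(page, "#email", "me@example.com");
```

Whether this matters depends on the site: some only look at request patterns, in which case a pause between the fields and the button click may be enough.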
great video
this comment was made by my bot :)
thanks so much, it's helpful
Im gonna change from Manucian to the citizéns
you're awesome!
life saver
svvveet, works great with python pyppeteer also. thanks for the vid
source code😭😭
just screenshot and use an online image to text converter
+1