The best solution IMO is Node.js + Puppeteer + the puppeteer-extra-plugin-stealth plugin.
It's free, doesn't rely on any third-party APIs, and in my experience it gets past Cloudflare blocking and other CAPTCHAs 100% of the time. You can even log into websites that use OAuth via Google, Facebook, Amazon, Microsoft, Twitter, Apple, etc.
Is there a Python option? I've read about the stealth plugin, and it seems great.
See you in 3 days, Mother of Dragons.
See you there 🐉👑
Although there are many software solutions for automating and extracting data from a website, Node.js and its library ecosystem remain the most flexible option, offering endless possibilities.
Ania, FYI in case it affects the algorithm: "unscrappable" should have only one 'p', i.e. "unscrapable" :)
Also, not sure it's a common word.
You make a very good point! Thanks for having my back 🙌🙌🙌. What do you think would be a better title?
@@aniakubow You're welcome :)
Ideally, put the most relevant words first, as the algorithms will likely treat those as more important than later words. So instead of, say, "Make your videos better on YouTube", you should have "YouTube videos: improve yours". Also, it's usually better to use positive words rather than negative ones, e.g. "You will win" > "You can't lose". So:
Scrape ALL Data
Scrape EVERYthing
Scrape and Catch ALL Data
Emphasize ALL and EVERY, because that's the unique point of this video; if everything is in CAPS, then nothing is emphasized.
A little note at 7:08: use the -D flag when installing nodemon (`npm install -D nodemon`), since nodemon is only needed during development in this example.
Another sticky note: as the documentation says, Cheerio is not a web browser instance. It just takes the HTML, parses it, and does the job, so if we use it on websites that don't do server-side rendering at all, we will miss some information, since it may be loaded from external sources such as additional scripts, external API calls, etc.
This sticky note is more than just a note. It's the difference between pulling your hair out and understanding right away why some values are populated and some are not.
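To check up front whether a page falls into this category, one rough approach is to inspect the raw HTML before handing it to Cheerio: client-rendered apps often ship an empty mount element (e.g. `<div id="root"></div>`) plus script bundles, with little visible text. The sketch below is a hypothetical heuristic, not part of Cheerio; the `root`/`app` ids and the 200-character threshold are assumptions you would tune per site.

```typescript
// Hypothetical heuristic (not a Cheerio API): guesses whether raw HTML is
// mostly rendered client-side, in which case a parser like Cheerio will see
// little of the data a browser would eventually display.
function looksClientRendered(html: string, minVisibleChars = 200): boolean {
  // Assumption: common SPA mount points use id="root" or id="app".
  const emptyMount = /<div[^>]*\bid=["'](?:root|app)["'][^>]*>\s*<\/div>/i.test(html);
  const scriptCount = (html.match(/<script\b/gi) ?? []).length;

  // Drop scripts, styles and tags, then measure the text that is left.
  const visibleText = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    .replace(/<style\b[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, "")
    .trim();

  return scriptCount > 0 && (emptyMount || visibleText.length < minVisibleChars);
}

const spaShell =
  '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>';
console.log(looksClientRendered(spaShell)); // true
```

If it returns `true`, fetching with axios + Cheerio will probably miss data, and rendering the page in a headless browser such as Puppeteer (then passing `await page.content()` to Cheerio) is the usual fallback.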
So... if you want to scrape a dynamic website, just go to the sponsor of this video... really?
Yep, she's just getting subscribers off her looks and using these stupid sponsors as her "content". I disliked this video, and another one. While watching the previous one, I couldn't figure out whether she just can't type or doesn't really know what the heck she's talking about.
In the last video there was axios + the express module, but when I tried it in React, the result was CORS errors. Maybe this video will cover that kind of error, and maybe proxy setups too.
I hope it solves your issues too :)
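For context on the CORS errors: the browser blocks a React app's request to another origin unless that origin's response includes an `Access-Control-Allow-Origin` header, and the site being scraped won't send one for you. The usual workaround is to run the axios + Cheerio scraping in your own Node/Express backend and have React call that backend, which adds the header itself. A minimal sketch of that header logic, assuming a hypothetical allowlist containing a React dev server at `http://localhost:3000`:

```typescript
// Builds CORS response headers for a backend that the React app calls.
// The allowlist approach is a sketch; the allowed origins are assumptions.
function corsHeaders(
  requestOrigin: string | undefined,
  allowedOrigins: readonly string[],
): Record<string, string> {
  // No Origin header, or an origin we don't trust: send no CORS headers,
  // so the browser keeps blocking cross-origin reads.
  if (requestOrigin === undefined || !allowedOrigins.includes(requestOrigin)) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    // The response differs per Origin, so caches must key on it.
    Vary: "Origin",
  };
}

const allowed = ["http://localhost:3000"]; // hypothetical React dev server
console.log(corsHeaders("http://localhost:3000", allowed));
console.log(corsHeaders("https://evil.example", allowed)); // {}
```

In an Express route you could copy these onto the response with `res.set(corsHeaders(req.get("Origin"), allowed))` before sending the scraped JSON; the `cors` middleware package does the same job if you'd rather not hand-roll it.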
The problem with the managed option is the cost. With a custom one, you can pay as little as $19/month for 100,000 pages. It's also not hard to scale.
Yes, I have also used Cheerio with React Native as an experiment, and it worked well.
Would love to see your computer setup: your desk, keyboard, chair, etc. :)
I need to do this for 5,000+ products, and I also need the description, price, etc. How can I do it?
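For thousands of product pages, one common pattern is to process the URLs in fixed-size batches so only a few requests are in flight at once, with a pause between batches to stay polite. A minimal sketch; `scrapeOne` stands in for a hypothetical function that fetches one product page and extracts its description and price, and the default batch size and pause are assumptions:

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Runs `scrapeOne` on every URL, `batchSize` at a time, pausing between batches.
// `scrapeOne` is a placeholder for your own fetch + Cheerio/Puppeteer logic.
async function scrapeInBatches<R>(
  urls: readonly string[],
  scrapeOne: (url: string) => Promise<R>,
  batchSize = 10,
  pauseMs = 1000,
): Promise<R[]> {
  const batches = chunk(urls, batchSize);
  const results: R[] = [];
  let done = 0;
  for (const batch of batches) {
    // Requests within a batch run concurrently; batches run one after another.
    results.push(...(await Promise.all(batch.map((url) => scrapeOne(url)))));
    done += 1;
    if (done < batches.length) {
      await new Promise((resolve) => setTimeout(resolve, pauseMs));
    }
  }
  return results; // same order as `urls`
}
```

With 5,000 URLs and a batch size of 10 that is 500 batches; saving each batch's results as you go (instead of keeping everything in memory) makes it easier to resume if a run fails partway.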
I am stuck on npm init and not sure how to follow the instructions. Please help.
@Code with Ania Kubów Hi, your Battleship video is unavailable. Can you please look into it? It's part one, and parts 2 & 3 are working. I am trying to study the game logic, and it would be very helpful if you could re-upload the video. Thank you.
Hey Ania, do you know how to scrape websites blocked by Cloudflare? X
I'm trying to do the same with Twitter to get the tweets from any user, and it seems impossible. Could you help me?
Do you have your Series 7?
Greetings from Poland, and I wish you continued success in growing the channel!
Thank you 😍😍
I think scheduling the premiere 24 hours out would be better. This long wait is annoying!
It doesn't have to feel annoying. Just tap/click the notify button, then put it out of your mind and move on to thinking about literally anything else in the world.
I'll second that. I'd even go as far as calling all those announcements years in advance spam; it literally made me unsubscribe from this channel.
Now, that's not to say that the content itself isn't of high quality. Ania is a real gem - I keep checking back occasionally. 👍
Good work, Ania!
Didn't work for me.
Can you make a Myntra scraper video?
Nice one Ania - this is really great
Thanks for the video, can this also scrape out Instagram HTML content?
That was great, Ania... take care :) Bye
Great job! I followed your steps and it was really fantastic. I am a data scientist, and you impressed me. God bless you, and if you need anything like machine learning, I am working on algorithms.
WOW amazing tutorial! I love your style and your approach. I am starting web development. I want to learn Vanilla JS your way. What is the best practice to learn and retain the methodology of JS? Please help :)
After viewing this video it would be interesting to see what we can do to prevent others from scraping our own website projects. 😅
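On protecting your own site: you can't fully stop scraping of public pages, but a common first layer is per-client rate limiting, so one client can't request hundreds of pages a minute. A minimal fixed-window sketch; keying by IP address and the specific limits are assumptions, and a determined scraper rotating IPs can still get through:

```typescript
// Fixed-window rate limiter: at most `limit` requests per client per window.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  // Note: entries are never evicted in this sketch.
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request should be served, false if it should get a 429.
  allow(clientKey: string, now: number = Date.now()): boolean {
    const current = this.windows.get(clientKey);
    if (current === undefined || now - current.start >= this.windowMs) {
      // First request from this client, or its previous window has expired.
      this.windows.set(clientKey, { start: now, count: 1 });
      return true;
    }
    if (current.count < this.limit) {
      current.count += 1;
      return true;
    }
    return false;
  }
}
```

In an Express app you might call `limiter.allow(...)` with the client IP (`req.ip`) in a middleware and respond with status 429 when it returns `false`; a real deployment would also need eviction of old entries or a shared store when running several server instances.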
Great content, as usual. Thank you so much for sharing it with us; I know how hard it is to build a project, then edit it, post it...
Thanks🙏
How did you get that accent?
Amazing content! I'd be curious how to scrape/store data in a database and use that for my own frontend.
You're the best Ania! Thank you so much!
Great video! I've had better experience scraping with XPaths instead of class names on sites that dynamically generate the class names. But it seems to come down to the content being scraped. Scraping with CSS selectors also seems to be faster.
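On sites with generated class names (e.g. something like `price__a1B2c`), plain CSS selectors can still work if part of the name is stable: attribute selectors can match on the prefix. This hypothetical helper builds such a selector; the `price__` and `title_` prefixes used below are assumptions. The rough XPath equivalent would be `//*[starts-with(@class, "price__")]`, which only matches when that class comes first in the attribute.

```typescript
// Builds a CSS selector matching elements that have a class starting with
// `prefix`, whether it is the first class in the attribute or a later one.
// Hypothetical helper; assumes `prefix` contains no double quotes.
function classPrefixSelector(prefix: string, tag = ""): string {
  // [class^="p"]  -> class attribute starts with p (p is the first class)
  // [class*=" p"] -> a later, space-separated class starts with p
  return `${tag}[class^="${prefix}"], ${tag}[class*=" ${prefix}"]`;
}

console.log(classPrefixSelector("price__", "span"));
// span[class^="price__"], span[class*=" price__"]
```

The result is an ordinary selector list, so it can be passed to Cheerio's `$(...)` or Puppeteer's `page.$$(...)` unchanged.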
Ty for these tutorials!
Hi
Can you please explain how to scrape emails from LinkedIn?
I think the video should help with that :)
Ma'am, I am waiting. Why didn't you list this video at the top?
Oh I am not sure! Weird 👀
Killer look, light pink is definitely you. Scrape 'UNSCRAPPABLE' data? Yeah, I'm in. "I'll be back," spoken in an Arnold hillbilly German accent. Love your stuff, GO Ania.
Thanks - very useful as usual :)
great video, keep up the good work
Thank you so much Ania 🥰
Thank you for the great content. I have a request, because I've been searching all over for a good explanation of how to scrape pages that have a "load more" button (NOT DIFFERENT PAGES) using Cheerio and Puppeteer. I can scrape a page that auto-loads when scrolling down, but I still can't get it to work by clicking the "load more" button 😭.
Thank you.
Just click it with Puppeteer, then load the HTML with Cheerio.
@@qualitytransportation I know it should click, but whenever I try, it doesn't work. I mean Puppeteer won't click "load more". I did select the button, but I don't know why it's not working.
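A sketch of the usual "click until the button disappears" loop, under a few assumptions: the button can be found with a single selector (`button.load-more` below is hypothetical), and it is removed from the page once everything has loaded. To keep it testable, it is typed against a minimal interface covering the two Puppeteer calls it uses, `page.$()` and `ElementHandle.click()`; a Puppeteer `Page` should fit it, though a thin wrapper works if your typings disagree. Clicks that silently do nothing are often caused by the button not having rendered yet or being covered by another element, so waiting for it first with `page.waitForSelector(selector, { visible: true })` is worth trying.

```typescript
// The subset of Puppeteer's Page / ElementHandle API this loop relies on.
interface ClickableHandle {
  click(): Promise<void>;
}
interface PageLike {
  $(selector: string): Promise<ClickableHandle | null>;
}

// Clicks the "load more" button until it no longer exists or `maxClicks` is
// reached, waiting `waitMs` after each click for new items to render.
async function clickLoadMoreUntilGone(
  page: PageLike,
  selector: string,
  maxClicks = 50,
  waitMs = 1500,
): Promise<number> {
  let clicks = 0;
  while (clicks < maxClicks) {
    const button = await page.$(selector);
    if (button === null) break; // button gone: assume everything has loaded
    await button.click();
    clicks += 1;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  return clicks;
}
```

After the loop, `const html = await page.content()` followed by `cheerio.load(html)` gives Cheerio the fully expanded page, which is the approach suggested earlier in this thread. The fixed `waitMs` is a simplification; waiting until the item count increases would be more robust.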
Using PHP or Perl?
Hey Ania, can you also include the part where you store the fetched data in a database (like MongoDB) and then show it to the user? It would be a great help OwO OwO
Supabase is the best choice.
I wish there was an npm Ania command, because she is the total package. 😉
I came here to learn; instead, I fell in love :D
Looking forward to this.
Thank you so much !
How do I scrape Formula 1 data?
This video should help I think :)
I need you as my technical partner
Hi Ania! 🙂🌸🏵🌹🌺🌼🌻🌷
Thank you for your kiss! You have made my day! 🙂🌺
I don't want it, I need it.
Hi Ania
hiya!
I love you so much! You are the best!)
You are!
You're really going all out ;) Scraping is a tough business... I've been playing around with Diffbot myself lately.
See you soon Teacher
At such times I would say... AI must understand what to scrape.
Amazing you are ❤
I am almost embarrassed to admit how much easier it is to learn such stuff when your teacher is just smokin' hot :D
Besides being an amazing teacher already, don't get me wrong :)
I think someone is trolling off your comments.
>How to scrape data
>Use paid service that sponsors this video
ayyyyyyyyyy lmao
I show two ways to do it so you can choose :)
how to scrape your ❤
I'm here to learn. 🙄
I personally use jsdom, don't know why lol
Queen 👸
Hello 👋
Nothing but a sponsor video.
Titanic was lost in your bright eyes.... lovely, lovely you...
😜
I think there is a whole generation of programmers in love with her :))
Thanks, ma'am
What accent is that?
🥳
Lovely
As usual, everything is "very simple"! ) How am I supposed to watch her? My boner gets in the way )
You're awesome!
I wish so much I had a girlfriend just like you...smart, beautiful and a coder!!
👍
😱😇
My cyber girlfriend the smartest woman I know. You have my undying love, respect and devotion 🥰 I can't wait seriously on the edge of my seat 🤓
Um,
update the old video so that it actually works,
then do this.
Christ, I'd like to do your projects, but I don't know this Node.js technology for new versions!
You can change the version of node.js to the one I am using in the video. Just check the package.json for the version :)
SCRAPE ME! Do you have an OF?