How to Scrape Any Website in Make.com

Nick Saraev

Просмотров 161 тыс.

Добавить в
- Мой плейлист
- Посмотреть позже
Поделиться

HTML-код

Размер видео:

Показать панель управления

Автовоспроизведение

Автоповтор

Опубликовано: 24 ноя 2024

Комментарии • 214

@nicksaraev 6 месяцев назад ⁺⁴
My new step-by-step program to get you your first automation customer is now live! Results guaranteed.
Apply fast: skool.com/makerschool/about. Price increases every 10 members 🙏😤
@Txjxsxa 5 месяцев назад
After building OpenAI module am facing rate limit error. Even after upgrading to GPT 4o am facing same issue.
Any idea how do I fix this?
@atbapp 6 месяцев назад ⁺⁹
Awesome tutorial Nick... I cant emphasise enough, not only how helpful this tutorial was but also the number of ideas this tutorial has given me - top 5 Channel for me!
@thibaultmouillefarine795 6 месяцев назад ⁺¹
Really ? Who else being in a Top 4?
@michellelandon8780 9 месяцев назад ⁺¹¹
Hi I want to say Thank you for being a great teacher. I appreciate you taking your time in explaining things.You are very easy to follow. I always look forward to your next video.
@nicksaraev 9 месяцев назад ⁺¹
You're very welcome Michelle!
@johnringo6155 7 месяцев назад ⁺¹
@@nicksaraev how can one get to your mentorship/course please ?
@senpow 7 месяцев назад
I got the same question. It seems that it is still in construction. Because we get curriculum in the video description. @@johnringo6155
@sketchingbyyash6358 7 месяцев назад
What did he write in the User Role - "Tell me about this website in JSON formate." What did he write after that?
@agirlnamedsew 8 месяцев назад ⁺⁵
3 minutes in and I know how to scrape a webpage and parse it to text. THANK YOU!!!!
@nicksaraev 8 месяцев назад
Glad I could help!
@elibessudo 7 месяцев назад ⁺¹
Super helpful, thank you. Any chance you could do a tutorial on how to scrape sites that require logging in?
@xvivaan7422 5 месяцев назад ⁺¹
Hey Nick, love the videos!! just had a few questions. Would love if you could help us out. What is your business model like? Do you offer clients a subscription model? or a one shot payment? and what do you think we should apply to our business model as well, considering we r looking to rope in new clients and remain profitable over a period of time. I ask this since the websites we will be using have a monthly subscription fee and a limit on the API / operation requests and if the requests exceed the limit of the plan purchased, how do you tackle that? It would be of great help if you could make a short 10 min video on this or maybe a reply to the comment. Love the series!! Keep up the good work!!
@saeedsm57 7 месяцев назад
One the best videos I have met this year so far..Thanks!
@sketchingbyyash6358 7 месяцев назад
What did he write in the User Role - "Tell me about this website in JSON formate." What did he write after that?
@amirsohail855 4 месяца назад
First of all, thank you @Nick Saraev for this such useful knowledge. You only scraped one record but there are a lot of records on how we scrap those all. Please give me the answer I am working on such a project just for learning purposes.
@yuryhorulko3834 8 месяцев назад ⁺²
Thank you so much Nick! Every time there is a brilliant video!
@sketchingbyyash6358 7 месяцев назад
What did he write in the User Role - "Tell me about this website in JSON formate." What did he write after that?
@robertjett_ 9 месяцев назад ⁺⁴
Dude, I cannot overstate how mindblowing this series is. There were so many things in this video that I had absolutely no idea were possible.
Also 1 bed 4 bath is crazy.
@nicksaraev 9 месяцев назад ⁺³
Hell ya man! Glad I could help. And SF real estate smh
@LeximoAI 7 месяцев назад ⁺¹
Hey Nick great video!! I just have a doubt. If you this module once for one url and then put it on sleep, how do you scrape the other urls. I didnt quite get the hang of it how it happens so it would be nice of you to explain in briefly. Thanks in advance!!
@alexf7414 5 месяцев назад
That's amazing, how did I miss this company. You got a new customer. Great job
@DaveyRanda 2 месяца назад ⁺²
This might be too basic, but how do i understand Tokens, and how do I know what limit to set when I am logging into Chat GPT here. I dont know how to estimate what a job costs etc.
@DaveyRanda 2 месяца назад
BUMP!!! me too... no clue how to do this and I just put a random number in lol
@MyFukinBass 8 месяцев назад ⁺¹
Another brilliant video, Nick!
Would be awesome to get a more in depth tutorial about Regex or what to ask chatgpt about (What are we looking for specifically) in order to scrape. Were you a developer before? You seem to know a lot about webdev.
Thanks again!
@sketchingbyyash6358 7 месяцев назад
What did he write in the User Role - "Tell me about this website in JSON formate." What did he write after that?
@jeffwc 8 месяцев назад ⁺¹
Very appreciative of what you’re doing with this series 🙏🏽
It’s becoming clear that having a solid understanding of JSON and regix is a must if you intend on building anything decently complex for clients. Any resources, courses, or forums you can point us towards?
Thanks again!
@nico.m527 8 месяцев назад
You can always ask ChatGPT for help with this kind of stuff. Explains to you in plain english
@nicksaraev 8 месяцев назад ⁺³
Thank you Jeffrey 🙏 agree that it's important. Luckily AI is making it less so-if regex is currently the bottleneck in your flows, you can usually "cheat" by passing the input into GPT-4 with a prompt like "extract X".
To answer your q, though, my education was basically: I watched a few RUclips videos, same as you, and now just use regex101 for fine tuning.
Most of my parsers are extremely simple and real regex pros would laugh at them (but they work for me and my wallet!) .* is your friend
Hope this helps man.
@sirishkumar-m5z 3 месяца назад
An engaging overview of the SmolLM model! Some alternative AI venues offer goods equivalent to those of other systems.
@swoollard 7 месяцев назад ⁺¹
Unfortunately i couldn't get this to work. The parsed HTML seemed to have different data to your example and i couldn't figure out Regex. You mentioned it could be done with Chat GPT - it would be helpful to know that approach also.
@great_live_music 8 месяцев назад
Really great content, thank you for this video!
If I wanted to optimize your flow, I would check if the URL is in the Google Sheets document before calling the parsed URL and extract the data on the page.
@nicksaraev 8 месяцев назад
Good thinking!
@nickromanek9444 Месяц назад
Really impressive stuff in this video, could it be used for a site that requires signing in too?
@stephena8965 7 месяцев назад
Hey Nick, amazing tutorial as always, you've massively helped me on so many flows - thank you! I actually managed to build a similar flow but instead of RegEx I used an anchor tag text parser with a filter that checked for the presence of a class of "page__link" from element type since all page links had that. Would you say there's anything wrong with this if it works for the use case?
@EliColussi 9 месяцев назад ⁺¹
I am curious how you would tackle getting around a "click to reveal" phone number. It requires 2 clicks to find the phone number.
@automate_all_the_things 9 месяцев назад
Super insightful videos, much appreciated! Just fyi, in timestamp 28:20 you're trying to expand the window size. You can do this by clicking the little symbol with the 4 arrows.
@axellang2132 7 месяцев назад
Thank you very much Nick for your amazing videos! I'm a beginner and this question may sound dumb but I'm running a scenario with 2 text parsers following each other. The first one runs 1 operation but the one following that's using the same input data runs way more operations. Do you know where that could be coming from? No hard feeling if you don't have time to answer ;)
@Storworx 9 месяцев назад
Again, your instructional video’s are so informative. Very much appreciated! Could you post how i can visit multiple websites from sheet? Would i add a sheet at the front and another at the the end to assess the next row?
@nicksaraev 9 месяцев назад ⁺⁵
Appreciate the support! Absolutely-here are steps for plugging multiple sites in:
1. Create a Google Sheet (sheets.new) with a column labelled "URL".
2. In the Make scenario builder, search for Google Sheets connectors. You're looking specifically for "Search Rows", which has a built in iterator. Make this the trigger of your flow.
3. Authorize your account, select the sheet from earlier in the modal, etc. Set "maximum number of returned rows" to however many you need.
4. Use the output from the "URL" column as the input to the rest of the flow you see in the video. Remember that since "Search Rows" is now a trigger, if you turn this scenario on it'll run every X minutes. So if you don't have a designated flow you might want to make it "on demand" and just run whenever you need to process sites/etc. You can then make another Google Sheet to collect the output and use the "Add Rows" module to fill it up.
Hope this helps!
@Landendoss 2 месяца назад
awesome video i just have one question, how do you get it to cycle through all pf the links? and not just use the same exact one everytime?
@karamjittech 8 месяцев назад
Great stuff. the shared Hidden_API_Masterclass.json seems incomplete, would be great if the complete json could be shared.
@yuryhorulko3834 6 месяцев назад ⁺¹
Hi Nick! Thank you for your education! But.... How to solve the issue with Status code 403?
@aymscores 5 месяцев назад
i think adding " " to the value section of the header's fixed this for me!
@chrisder1814 Месяц назад
Hello, I had some ideas to get products from multiple platforms and then compare prices but I'm not sure the ideas I thought of are good., could you tell me what you think about it?
@craigsandeman3865 7 месяцев назад
Managed to get a 200 reponse, on the first step. But it appears that some of the html is hidden. Seems like there is a delay in all the data being populated. I added all the header info. Thanks for the tutorial.
@jga13775 6 месяцев назад
Great video! What if the page you're trying to scrape requires authentification? Like the "my profile" section of uber or ny other company.
@sajjadalikanji9154 Месяц назад
I'm trying to do this for a car dealership but its not showing all of the other links of individual cars for sale
@ricardofernandes161 4 месяца назад
Masterclass Nick ! Thanks a lot for this video
@tobigbemisola 8 месяцев назад
This is great and well explained. After watching the full length of the tutorial, I'd rather opt for using a web scrapper tool until I'm good with using regex. Btw any resources on learning regex?
@nicksaraev 8 месяцев назад
Thx Tobi! Frankly I just use Regex101 for everything (regex101.com), the highlighting as you set your search up is extremely helpful. If you were to quiz me on tokens/selectors without a tool like this I'd probably know fewer than 50% of them 😂
@explorewithluke 6 месяцев назад
Don’t understand why you moved the last sleep before the sheets but otherwise great explanation
@esprit4432 7 месяцев назад ⁺¹
sometimes the regex say match on regex101 and then in integromat it doesn't..
@sunshinemodels1 8 месяцев назад ⁺²
came for the web scraping insights, stayed for the pearly white teeth
@nicksaraev 8 месяцев назад ⁺¹
Brb getting a Colgate sponsorship
@Bassdag1 9 месяцев назад
That was fascinating to watch and very clear explanation. Thank you for sharing. I am definitely subscribing!
@nicksaraev 8 месяцев назад ⁺¹
Welcome aboard!
@sketchingbyyash6358 7 месяцев назад
What did he write in the User Role - "Tell me about this website in JSON formate." What did he write after that?
@TesteAutomacao 6 месяцев назад
Hi Nick, how you doing?
First of all I wanna thank you for everything you are doing for us.
I tried to use this automation on different websites but there are a lot of websites where the code has 2 or 3 times almost in a row the same link for the same house/product, so when you use regex you get repeated results. How can I filter this or put some type of condition that makes me not have duplicate results and save me operations?
Thank you
@stevearodgers 7 месяцев назад
I can't get past the HTML to Text module. It keeps giving me an error message: BundleValidationError. Maybe poor HTML on the website I'm scraping? Anyway, thanks for the information. So much to learn!
@ArtemSFG 6 месяцев назад
Thanks so much for the tutorial! Just a question: how do you deal with pagination when scraping data?
@nicksaraev 6 месяцев назад
Thanks Artem 🙏 you'd create a separate route for the scraper so it can iterate over each page, then add each page's data to an array (using the add() function or similar). On your main route you'd then add a Get Variable module and pull the array contents. Hope this helps.
@ArtemSFG 6 месяцев назад
@@nicksaraev Thank so much for sharing, Nick! Hopefully, I'll be able to help you somehow one day :)
@FYWinBangkok 6 месяцев назад
Hey amazing work just you should cut in post what did not work I got so lost and trying to do at my home can't make it happen :(
@BBBQ.2 Месяц назад
I did exactly what was shown in the video, but I keep getting the error: Invalid number in parameter 'max_tokens'. How can I fix this?
@mrmagne Месяц назад
So what about javascript loaded content? How to fetch that information?
@woundedhealer8575 7 месяцев назад
Is there a way to use proxies for this? I just feel like it’d be pointless to get so deep into this without one
@kerimallami 8 месяцев назад ⁺³
BRO YOU ARE ROCKING IT!!!!!
@sketchingbyyash6358 7 месяцев назад
What did he write in the User Role - "Tell me about this website in JSON formate." What did he write after that?
@highnitin 8 месяцев назад ⁺²
this is pure gold :)
@snappyinsight 8 месяцев назад
Thanks for the Tutorial.
Does this also work on Amazon listings?
@nicksaraev 8 месяцев назад ⁺¹
Glad I could help. Yes it works on Amazon, though be wary that their bot detection is much more sophisticated (see another comment where I discuss how to scrape reviews).
@ThaBrowniePablo Месяц назад
Does it also crawl the entire website or just scrape the given URL?
@SaadBelcaid 6 месяцев назад
Hey Nick, What would be a prompt of GTP4 to extract those URLS and build the regex?
@hammadyounas2688 3 месяца назад
Great Work. Can you make a tutorial how to scrape data from linkedin?
@m4RIK 5 месяцев назад
to get it right ... u feed the whole html-content to gpt. so u pay the input-tokens for all of this. isnt it possible just feed the body or a single container ID or Class ?
@littlehorn941 9 месяцев назад
Thanks for making this video; very helpful with a few automation projects that I have. I've never heard of make before. I've been spending the last two years making a local webhook application as a side project that basically does the same thing as make, but this site is so much better.
@nicksaraev 9 месяцев назад ⁺¹
You're very welcome! I'm a dev as well and find Make better for 99% of business use cases. The only time I build something out in code these days is when a flow is extremely operationally heavy. Keep me posted 🦾
@EasGuardians 8 месяцев назад
Thanks Nick, super helpful. Will set this up right away :D
@nicksaraev 8 месяцев назад
Hell ya man! Let me know how it goes.
@sketchingbyyash6358 7 месяцев назад
What did he write in the User Role - "Tell me about this website in JSON formate." What did he write after that?
@DidierWiot 5 месяцев назад
Fantastic stuff, thank you so much Nick!
@conglife 7 месяцев назад
Thank you for your sharing, it has truly benefited me a lot.
@RandyRakhman 6 месяцев назад
thanks for teaching us sir. Appreciate it!
@DIPU1036 6 месяцев назад
How would you address the legality of scrapping?
@Deborah-iz1wi 7 месяцев назад
Hi Nick thanks for the video, I'm having a problem with the parser...it's not parsing down the text for me like it show in the video. Any suggestions on this?
@fontwellestate3544 3 месяца назад
HELP! Hey guys, I couldn't get paste the 403 error you came across when you got to getting the individual listings!
What do I do?
@Txjxsxa 5 месяцев назад
After building OpenAI module am facing rate limit error. Even after upgrading to GPT 4o am facing same issue.
Any idea how do I fix this?
@GarthB-uf6dr 5 месяцев назад
Hi Nick, is it possible to scrape a page that does not have an API and that you have to be logged into, please?
@hitmusicworldwide 9 месяцев назад
How do you get past authentication to scrape for resources that require a sign in?
@bsandmg 8 месяцев назад
Gonna check out, wonder if could be used for comments on a post, or twitter, example someone saying they want something, than boom you can respond
@nicksaraev 8 месяцев назад ⁺¹
Thanks Raiheen! You could, although there are probably better solutions to this. Facebook/Twitter/etc often hide comments behind a "reveal" mechanism like scrolling or a "Read More" button which makes scraping them difficult (in addition to their security and tight rate limits).
That said, anything is possible in 2024! You could run a search every X minutes using a search bar and scrape the top X comments. You'd use an intermediary DB of some kind to store the comment text, and then for every comment in your scrape, if that comment doesn't already exist, you could fire up a browser automation tool like Apify and log on to the platform in question. You'd then have GPT-4 or similar dream up a response and post it using JavaScript.
Hope this helps man 🙏
@KenshiDigital 8 месяцев назад
In the OpenAI does it need payment to generate an output (credits) or you just need to get the api key and that’s it?
@PazLeBon 8 месяцев назад
so it scrapes but you have to sign up.. hardly feels private does it
@DanielAuriemmaOfficial 8 месяцев назад
How would I use this if I have to login to a site in order to scrape it? Is there a login prompt to add before the site prompt? Thanks for all the info!!!
@nicksaraev 8 месяцев назад
Happy you found this valuable Daniel!
It depends on the site-sometimes you can just pass a username/password in the HTTP request module to get the cookie, other times you need to use browser automation tools like Apify. I recorded an in-depth video on authentication here if you're interested: ruclips.net/video/R8o7V39NSSY/видео.html
Hope this helps 🙏
@LEGOJediAcademy 7 месяцев назад
Have you caught wind of VideoGPT making waves? It's your ticket to seamless, professional video content.
@BassTi2k 7 месяцев назад
How can I code the headers for scraping data from TikTok? Is a specific type of header required to imitate a legitimate user or device?
@untetheredproperty 6 месяцев назад
Thank you for the information. BTW, your copyright has not been updated. :)
@LuxGolfAlgarve 3 месяца назад
how do you scrape in the same process the images?
@obvp 5 месяцев назад
Is it possible to scrape Wikipedia?
Not working following your steps
@ivansmiljkovic9097 9 месяцев назад
What camera are you using, is it Lumia by any chance? Thanks!
@nicksaraev 9 месяцев назад ⁺⁵
Because of this comment & a few others, I just published a full gear list in the description! Including camera, lens, lighting, etc :-) all the best
@alderdj.froolik 9 месяцев назад
Nicely brought!
@nicksaraev 9 месяцев назад ⁺¹
So happy you found value in this man.
@lc285 8 месяцев назад ⁺¹
First, you should explain what scraping a website is. 🤔
@AjarnSpencer 8 месяцев назад
There are other videos for that this is for those taking the next step
@elie2222 8 месяцев назад
Curious why you decided to watch the video if you didn’t know what it was
@channel83932 9 месяцев назад ⁺¹
Can we add proxies to these flows?
@nicksaraev 9 месяцев назад
Yes, definitely. You'd just replace the URL in the HTTP Request module with whatever your proxy is and then add the proxy-specific data (most proxies will require you to send credentials, the URL you want to passthrough, etc).
@channel83932 9 месяцев назад
@@nicksaraev can you show us an example of this?
@dfreshness2006 8 месяцев назад
You only logged the first listing on your Redfin search. How does it loop to the second and so on?
@nicksaraev 8 месяцев назад
Great q. The flow automatically loops because the "Match Pattern" module outputs multiple bundles. When multiple bundles are output by a module, every module after that module runs anew for each respective bundle.
Hope this helps 🙏
@MohamedRihan-d8i 9 месяцев назад
Thank you, can I use this to web scrape all reviews of a product on amazon
@nicksaraev 9 месяцев назад
Absolutely, just checked for you. You have to do it in two parts:
1. Feed in the Amazon product URL to a Request module like I show in the video. Then scrape HTML and parse as text.
2. Somewhere in the resulting scrape will be a URL with a string like /product-reviews/. You need to match this (can use regex). Then make another request to that URL for product reviews.
Amazon's bot detection is very good so be careful you don't get rate limited 🙏
@brianaragon1641 6 месяцев назад
but this only works if the content data you want to grab is only in text form on the web page, but if it is dynamically created let's say for a js script or something it wouldn't be able to grab the desire data... i.e... if i want to grab a price data from a web page... the content grab by Make a Request module would get something like this PRICE: $ 0.00, but in the web page it is like this PRICE: $ 3.70... but this last one is dynamically created and doesn't show this way in the make module...
@nicksaraev 3 месяца назад ⁺¹
Thanks for bringing this up. Will cover this in an updated video 🙏
@MrRichBravo 7 месяцев назад
Great info!
@BashkimUkshini 3 месяца назад
The HTML objects I get, have >50,000 characters, and when trying to paste this back on a G-Spreadsheet cell, I get an error.
Any tips how to reduce/clean-up the HTML object that I get back?
For example, ScrapeNinja module offers a JavaScript field you can use, to filter this out on the go, but they have paid APIs :/
@SaidThaher 3 месяца назад
Try split function to divide the data in pieces and then send them to splited cells in GS
@tachfineamnay398 6 месяцев назад
Great job ! thank you
@dandyddz 7 месяцев назад
Doesn`t make support css selectors?
@overtheedge23 9 месяцев назад
How about content behind a pay wall?
@nicksaraev 8 месяцев назад
Just recorded a video to answer this (hidden APIs)! Hope it helps you.
@JMasalle 6 месяцев назад ⁺¹
Skip to 2:47
@PrabhakarPrabhakar-vw7nk 8 месяцев назад
Agar WhatsApp pe contact block hota hain to call nahi lagta Hain . Message nahi jata hai, aur mere pass koi number nahi hain.to hum apna matha phone pe patakage to tumko telepathy se phone lagega?
@champagnebulge1 7 месяцев назад
It appears the free version of Chat GPT doesn't work with this. Still, interesting
@DrDonBoo815 8 месяцев назад
At the 13:31 mark, could you have your ChatGPT with custom instructions entered versus writing JSON to get a better email intro?
@nicksaraev 8 месяцев назад ⁺¹
Yes, definitely! PS the quality usually goes up if you let it output plaintext. This isn't as relevant for my purposes but something to keep in mind if you're generating content (say blogs etc)
@purvenproducts2463 9 месяцев назад
my friend thank you so much for your videos, I really appreciate it, again any Go High Level Platform Review?
@nicksaraev 9 месяцев назад
I will absolutely do one on GHL, I used to sell their platform as an affiliate actually. Tbh I don't like their "automations" one bit but it's important enough to go through. Probably next month as I finish the course and the rest of my videos-thank you for the idea!
@purvenproducts2463 9 месяцев назад
@@nicksaraev thanks buddy, I tried it but it was a bit overwhelming for a beginner.
@terrycarson 8 месяцев назад
Great Job!
@nicksaraev 8 месяцев назад
Thank you Terry!
@marvinschulz2480 8 месяцев назад
Golden content
@nicksaraev 8 месяцев назад
So glad you find it valuable man
@sm0k3ahontas 6 месяцев назад
I don't understand how to find a regex
@hypnoticblaze4323 9 месяцев назад ⁺¹
How to bypass the robo.txt file blocking the scraper?
@aiforbuiness 9 месяцев назад
Same would like to know the answer to this
@ryanangel3355 8 месяцев назад
You can't with this I am pretty sure
@my.johnnylavene 8 месяцев назад
need to scrape website for an ai web app to allow me to put Q&A, company info etc to fields on web app is that possible?
@nicksaraev 8 месяцев назад
Absolutely. I did something similar for a data viz SaaS a while back. You'd have to find a way to parse each of those strings (Q&A, Company Name, Company Description, etc) and then pass them to your app db. You can use AI for this if there's no consistent pattern-something like "Categorize the following text into XYZ using the following JSON format".
Hope this helps man 🙏
@sunmoonstarrays 8 месяцев назад
Hello,
How can I convert a super large XML FILE (literally a huge stack of file archives) into a simple site Jyson?
I'd sing at anyone's wedding if anyone can share.... lol jk but, truly would be very thankful for any suggestions
And nice channel 100 you got a new sub here ⚜️
@IwonaRepetowska-ij7so 2 месяца назад
I dont get it... you said ... "by the end of it you'll know everything that you need to know about how to scrape like you'll be better than 99% of the rest of the world at scraping
sites and you don't even really need to know like HTML or anything like that because we're going to use AI to help"... but if you dont know how to create key to connect openAI... or you have no idea what JSON is... C'mon! HTML is the easiest one of that... :( I was hopefull and eager to follow you.. now I'm in a rabbit whole
@DuniyaaWithShreyasRaj Месяц назад ⁺¹
how to tackle error 429?
@nicksaraev 26 дней назад ⁺¹
429 is basically always a rate limit error. You can either slow down, add "Break" modules to intelligently retry, use platforms like HookDeck to queue and then space out your requests, or send from different IP addresses. Hope this helps, Shreyas 🙏
@danielstay5270 8 месяцев назад
Hi, are you available to build something like this fo me or can you refer someone? Thanks Dan
@nicksaraev 8 месяцев назад
Hey Dan, happy to chat. Shoot me an email at nick@leftclick.ai and if I can't help I'll refer you.
@jtisaks8569 7 месяцев назад
This is very good explained!!!!!!!!
@byokey 6 месяцев назад
can you scrape banking account ?
@nicksaraev 6 месяцев назад
Only my own 😫

Следующие

Автовоспроизведение

How to Acquire Your First Make.com Automation Customer