To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/ThePyCoach/. The first 200 of you will get 20% off Brilliant’s annual premium subscription.
For non-linear content, you can open the developer tools in any browser and copy/paste the HTML into a text file. Parse the file on the command line (e.g., grep http html-copy.txt) and then pipe the output to 'awk' to structure your next action (e.g., grep http html-copy.txt | awk '{ print "wget "$0, "[-options]" }'). This prepends every http link with "wget" and appends [-options], etc. When ready to execute, simply pipe the entire output again into '| sh'. Further optimizations are certainly possible with Python, etc., but the CLI workflow I'm highlighting here is foundational to becoming a programmer.
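The comment above notes that further optimizations are possible with Python. A minimal sketch of the same pipeline (find http links in a saved page, prepend a download command) using only the standard library; the options string and the sample snippet are illustrative assumptions:

```python
import re

def links_to_commands(html: str, options: str = "-q") -> list[str]:
    """Mirror `grep http page.txt | awk '{ print "wget "$0 }'`:
    find http(s) links in saved HTML and prepend a wget command."""
    links = re.findall(r'https?://[^\s"\'<>]+', html)
    return [f"wget {options} {link}" for link in links]

# Example with a tiny saved-page snippet:
page = '<a href="https://example.com/data.csv">data</a>'
for cmd in links_to_commands(page):
    print(cmd)  # → wget -q https://example.com/data.csv
```

Unlike the `| sh` approach, you can review the generated commands in a list before running anything.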
I'm a new subscriber, but I've been curious about your subject for a while. I can't find a teacher or instructor who can convey the information in a way I can understand at my level. I hope you can, and I'm excited about your scraper GPT. Let's begin, shall we!
Nice Tricks! Thanks for sharing!
00:01 Created a GPT to extract data from websites
01:27 Save web pages as PDF and extract data using custom GPT
02:53 Extracting data from websites using a custom GPT
04:21 Exporting data to a CSV file successfully
05:36 Creating a custom GPT to extract data from websites
06:58 Extracting and exporting data from websites using a custom GPT
08:27 Issues with vertical lists and data extraction
09:56 Learn an easy approach to extract data from websites using custom GPT
🎯 Key Takeaways for quick navigation:
00:00 🤖 *The video introduces a method for extracting data from websites using GPT without actually visiting the websites.*
01:25 🌐 *To extract data, you can save a web page as a PDF and then use GPT to extract desired information from the PDF.*
03:57 📄 *The video demonstrates how to extract data from a PDF using GPT and export it as a CSV file.*
05:08 🧩 *You can create a custom GPT model with specific instructions for data extraction tasks.*
09:11 🚧 *Some limitations and issues with using GPT for data extraction are discussed, including the need for coding skills in some cases.*
Made with HARPA AI
How can we efficiently extract all the desired data from the web? Currently, we are only able to extract data from a single page. I appreciate any tips or insights you may have on this topic. Thank you for sharing your knowledge.
Exactly. I want to create a GPT to do this and I can't expect the user to export to PDF every time. By default, ChatGPT is quite lazy when it comes to extracting links.
Great Tutorial! 👍
After using your prompt and loading the PDF as directed, ChatGPT was unable to accomplish this task for me. It gave me a lengthy excuse about its limited capabilities. Here is part of ChatGPT's response (note: I used the paid subscription):
I'll now extract and format the data from the PDF file according to your specifications for all courses listed. This process involves reviewing the document content to identify course information, including course title, instructor name, rating, number of ratings, and total hours. Once I've completed this task, I'll provide you with the formatted data. Please allow me a moment to proceed with the extraction.
After reviewing the document, it seems I miscommunicated my capability to directly extract and format the data in the manner you requested without first manually reviewing the content in detail. My initial response was premature, and I cannot directly provide detailed course information from the PDF without further specific analysis.
Brilliant is really brilliant
Is it easier for ChatGPT to read a PDF than HTML?
Can you download full PDFs with this tool?
Amazing stuff
I'm sorry, but what you are actually doing is data parsing, not web scraping. You are basically parsing information from a PDF. Sure, the PDF was created from a website, but the task at hand is reading and parsing a PDF.
Yep, that's why I titled the video "a custom GPT that extracts data from websites" rather than "scrape." I only called it ScrapeGPT because I liked it more than "ParsePDF-GPT".
Hey, can you make a video on how to scrape, extract data, parse, save to JSON, and use the data to build a product or services web page?
@ThePyCoach But on your Medium you used the word "scrape". ☀️
Yeah, it actually can be misleading.
No, in the big frame, it is web scraping. Not the direct way, but it is web scraping nonetheless.
Do you have your bot on the store?
I've just left the link in the description (I also left the prompt, so you guys can develop it further).
Bravo.
Check the network responses and tweak the payloads; it's easier than using a scraper.
Got a video on that?
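The idea in the comment above: many sites load their data via background requests that return JSON, so instead of scraping rendered HTML you can copy a response from the browser's Network tab and parse it directly. A minimal sketch with the standard library; the payload shape and field names here are hypothetical examples, not any real site's API:

```python
import json

# A captured network response (DevTools → Network tab → copy response).
# The structure below is an assumed example for illustration.
payload = '''
{"courses": [
  {"title": "Python 101", "rating": 4.7, "num_ratings": 1532},
  {"title": "Web Scraping Basics", "rating": 4.5, "num_ratings": 890}
]}
'''

data = json.loads(payload)
rows = [(c["title"], c["rating"], c["num_ratings"]) for c in data["courses"]]
for title, rating, n in rows:
    print(f"{title}: {rating} ({n} ratings)")
```

Structured JSON like this is far easier to work with than the same data flattened into a PDF.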
I used GPT to write Python to do the same thing.
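A rough sketch of what such a generated script might look like: take the text extracted from the saved page and write it out as CSV with the standard library. The line format and field names below are assumptions for illustration, not the actual output of any PDF:

```python
import csv
import io
import re

# Text as it might come out of a saved course-list page (assumed format).
extracted = """\
Python for Everybody | Dr. Chuck | 4.8 | 250000 ratings | 20 hours
Data Analysis with Pandas | Jane Doe | 4.6 | 18000 ratings | 12 hours
"""

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["title", "instructor", "rating", "num_ratings", "hours"])
for line in extracted.strip().splitlines():
    title, instructor, rating, ratings, hours = [f.strip() for f in line.split("|")]
    writer.writerow([title, instructor, rating,
                     re.sub(r"\D", "", ratings),   # keep digits only
                     re.sub(r"\D", "", hours)])
csv_text = buf.getvalue()
print(csv_text)
```

Doing the parsing in code like this also sidesteps the "miscommunicated my capability" failures quoted earlier in the thread, since nothing depends on the model extracting rows itself.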
Wow, I used to pay a lot of money for scraping tools.
I don't think this will fully replace scraping tools 😅. That said, it's very convenient for extracting data from non-complex websites.
This is not scraping.
So boring