Try it yourself: www.builder.io/blog/custom-gpt
Non-stop top shelf content, thanks Steve.
Next up, how to build a top shelf
This is sick!!! Been wanting this since GPT came out
I just tried it. It works like a charm. Amazing work!
Amazing - thank you for making this available to everyone! Have you experimented with a different approach like creating separate text files for each URL? I've seen a few developers say it helps the GPT or Assistant return more accurate/relevant results compared to one large file.
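For anyone who wants to try the one-file-per-URL approach: a minimal sketch below splits the crawler's output into separate text files. It assumes gpt-crawler's `output.json` is an array of `{ title, url, html }` entries (which is roughly how it looked at the time of the video — check your own output file first); the function and file names here are my own, not part of the project.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed shape of one entry in gpt-crawler's output.json.
interface Page {
  title: string;
  url: string;
  html: string;
}

// Turn a page URL into a safe file name, e.g.
// "https://example.com/docs/intro" -> "docs-intro.txt".
export function fileNameFor(url: string): string {
  const slug = url
    .replace(/^https?:\/\/[^/]+\/?/, "") // drop scheme and host
    .replace(/[^a-zA-Z0-9]+/g, "-")      // collapse unsafe characters
    .replace(/^-+|-+$/g, "");            // trim leading/trailing dashes
  return `${slug || "index"}.txt`;
}

// Write one .txt file per crawled page and return the paths written.
export function splitOutput(jsonPath: string, outDir: string): string[] {
  const pages: Page[] = JSON.parse(fs.readFileSync(jsonPath, "utf8"));
  fs.mkdirSync(outDir, { recursive: true });
  return pages.map((page) => {
    const file = path.join(outDir, fileNameFor(page.url));
    fs.writeFileSync(file, `${page.title}\n${page.url}\n\n${page.html}`);
    return file;
  });
}
```

Each resulting file can then be uploaded to the GPT/Assistant knowledge base individually instead of one large file.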
Great stuff Steve, really enjoying your content. What route would you advise to go from an unstructured individual PDF to structured JSON output with the help of an API?
Nice video! Thanks for making these awesome videos
I tried to crawl the Laravel documentation, but I don't know what to fill in for the selector.
Fast and clear!
❤ thanks for the tip
That's great
Yep, this bangs.
Hey Steve, question for you: does the crawler also create a log file so that I can go look and see which URLs had a crawling error? Or is there a way to view my console log files in general?
Awesome stuff!
Man you are so generous
You are amazing, thank you so much
would love to include images so the GPT can look at screengrabs in the docs
Legend! Thanks so much
Thanks it worked! This is awesome🎉
Does this work on sites that have pdf knowledge base? Or only scrapes the body html?
do you think you can help me with setting up something similar for a unique use case?
Would like to integrate this open ai created assistant to my website. Any specific video or article to explore ???
This is great, thanks! So how do I find the correct selector for my page? I am inspecting the page and trying out names for the selector I think are correct, but they don't seem to be, because I am getting errors. I am inspecting your site to see what you use as the selector. When I try that "div" under "main" as you do, it does seem to work for me on other pages. Any ideas? Thanks
I'm having the same problem.
@vegangaymerplays Same here.
If your documentation is not too huge you can ignore that.
Mine was around 100 pages and I did not add a selector, and it works perfectly fine.
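For the selector questions in this thread: the selector is just a CSS selector for the element that wraps the page's main content, and you can sanity-check a candidate in the browser DevTools console with `document.querySelector("main")` before putting it in the config. Here is a sketch of a config; the field names follow the gpt-crawler README at the time of the video (verify against your installed version), the `Config` type is a minimal inline stand-in, and the URLs are placeholders.

```typescript
// Stand-in for gpt-crawler's Config type; field names follow the project's
// README at the time of the video -- verify against your installed version.
interface Config {
  url: string;             // page the crawl starts from
  match: string;           // glob pattern of links to follow
  selector?: string;       // CSS selector whose text is extracted per page
  maxPagesToCrawl: number;
  outputFileName: string;
}

export const config: Config = {
  url: "https://example.com/docs",
  match: "https://example.com/docs/**",
  // Check first in DevTools: document.querySelector("main") should return
  // the element containing the page's main content, not null.
  selector: "main",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
};
```

As noted just above, if the docs are small you can often omit `selector` entirely and let the crawler take the whole page body.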
Wondering if there is some kind of tool similar to this that could crawl thousands of client emails and answers, to help build a NoSQL database that can be queried by an AI.
This is the sauce right here.
What is the difference between doing this, and just creating a custom gpt that can scrape a site directly? GPT builder can search the web!
GPT Builder can’t crawl a specific website.
@mrloofer72 But for what use cases can I use this?
How do you know what to put for the selector?
What if the website needs authorization? Is it possible? Thank you
How does this work with links locked behind auth?
Is it possible to prevent GPT from using this information that we supply? My company wouldn't want to make the information public, for example.
To my knowledge yes. While it will be shared with OpenAI, they say they don’t use any of the data for training purposes
@Steve8708 - thank you for this great video. I am trying to use this for a website that is not public. How can I customize the crawler to log in to a site and crawl non-public information? The crawler shut down with this message: "Reclaiming failed request back to the list or queue. page.title: Execution context was destroyed, most likely because of a navigation." The output-1.json file had just one entry with the following HTML: "Sign in Can't access your account? Terms of use Privacy & cookies ..."
Not a very good scraper if it has to rely on finding the selector for the website first.
Is there any scraper that doesn't require manually selecting the DOM?
Does it handle docs behind auth?
It uses a headless browser, so it can be customized to do that.
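For the auth questions in this thread: besides scripting a full login flow with the headless browser, the gpt-crawler README documents a `cookie` option that is attached to requests, which covers simple cookie-based sessions. The sketch below assumes that shape (exact field names may differ across versions); the `Config` type is an inline stand-in and all values are placeholders.

```typescript
// Stand-in types for a cookie-carrying gpt-crawler config (assumed shape;
// verify field names against the version of gpt-crawler you installed).
interface Cookie {
  name: string;
  value: string;
}

interface Config {
  url: string;
  match: string;
  maxPagesToCrawl: number;
  outputFileName: string;
  cookie?: Cookie; // attached to requests so the crawl runs "logged in"
}

export const config: Config = {
  url: "https://example.com/private-docs",
  match: "https://example.com/private-docs/**",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
  // Log in manually in a normal browser, then copy the session cookie from
  // DevTools (Application > Cookies). Placeholder values shown here.
  cookie: { name: "session_id", value: "PASTE-SESSION-COOKIE-VALUE" },
};
```

If the site uses a multi-step login rather than a single session cookie, you would instead customize the headless browser step itself, as the reply above suggests.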
Adding Next.js 14 + shadcn... haha