Great video and nice approach. Any specific reason to choose a chunk_size of 5000 (more or less 1250 tokens) while generally the recommended chunk size is up to 400 tokens?
I just got this working on my home laptop. Amazing! I can see how this might chew a lot of API tokens... Here's a hard question that the usual LLMs (GPT-4o, DeepSeek, etc.) get wrong: "Can you explain how Pydantic AI implements its custom validation mechanisms for complex nested models, and what are the performance implications of using these validations in large-scale applications?" Now... how do I rerun this to my docs?
Glad you got it up and running! The title and summary creation for every chunk will certainly take a lot of tokens, luckily though it's a simple task so you can use very cheap LLMs to get the job done. That's a good question! Could you clarify what you mean by rerun to your docs?
I have a collection of PDFs. Presumably, I need to reliably convert them to markdown, and then let the LLM do the chunking and bookkeeping, similar to what you did with the website crawl.
Hi Cole, I’ve been following your videos on Agentic AI and RAG, and they’ve been incredibly insightful! I’ve successfully built an AI assistant with Agentic RAG based on your guidance, and it’s working great. However, I want to ensure that the assistant only replies to queries related to my website, and any other queries are considered outside the scope. Could you share any tips or best practices to achieve this? Your expertise would be a huge help. Thanks for the amazing content you share!
Thanks for the kind words and that's super cool you built an agent for yourself based on this! Nice work! Great question too. The main way to limit your agent to focus on just what you made it for is to tell it that in the system prompt. Something like "You are an expert at the Pydantic AI documentation and only answer questions and talk about that. If the user asks about or talks about something out of scope, direct them back to talking about Pydantic AI and say you can't discuss other topics."
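In Pydantic AI that guardrail can live right in the agent definition. Here's a minimal sketch - the model name and prompt wording are just placeholders:

```python
from pydantic_ai import Agent

# Hypothetical scoped agent: the system prompt is the guardrail.
agent = Agent(
    'openai:gpt-4o',
    system_prompt=(
        'You are an expert on the Pydantic AI documentation and ONLY answer '
        'questions about it. If the user asks about anything else, say that '
        'topic is out of scope and steer them back to Pydantic AI.'
    ),
)
```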
I have a request: would appreciate it if you could show how we can use some of the free alternatives to the OpenAI API (like local Ollama or Hugging Face models), and also if you could make tutorials that involve typing the code step by step in real time, since seeing entire chunks of code at once can be pretty heavy on the eyes.
YES LangGraph and Pydantic AI is an incredible combo! This would fit very well into agentic RAG - we can ingest the LangGraph documentation just like we did with Pydantic AI using Crawl4AI. They have a sitemap.xml as well: langchain-ai.github.io/langgraph/sitemap.xml What we can do is set the metadata field "source" to be "pydantic_ai" for the Pydantic AI docs and "langgraph" for the LangGraph docs. Then we can create separate RAG tools for our agent that will search specifically through each of the docs in the knowledgebase using the metadata to filter. That way the agent won't get confused between the frameworks but can still search through both to combine them together to create agents on our behalf leveraging both.
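As a rough sketch of what those per-source tools could look like (the `match_site_pages` RPC and field names here are illustrative, not the exact code from the repo):

```python
from dataclasses import dataclass

from openai import AsyncOpenAI
from pydantic_ai import Agent, RunContext
from supabase import Client

@dataclass
class Deps:
    supabase: Client
    openai: AsyncOpenAI

agent = Agent('openai:gpt-4o', deps_type=Deps)

async def _search_docs(ctx: RunContext[Deps], query: str, source: str) -> str:
    # Embed the query, then only match chunks whose metadata "source"
    # field equals the requested documentation set.
    emb = await ctx.deps.openai.embeddings.create(
        model='text-embedding-3-small', input=query)
    result = ctx.deps.supabase.rpc('match_site_pages', {  # illustrative RPC
        'query_embedding': emb.data[0].embedding,
        'match_count': 5,
        'filter': {'source': source},
    }).execute()
    return '\n\n---\n\n'.join(row['content'] for row in result.data)

@agent.tool
async def search_pydantic_ai_docs(ctx: RunContext[Deps], query: str) -> str:
    """Retrieve chunks from the Pydantic AI documentation only."""
    return await _search_docs(ctx, query, 'pydantic_ai')

@agent.tool
async def search_langgraph_docs(ctx: RunContext[Deps], query: str) -> str:
    """Retrieve chunks from the LangGraph documentation only."""
    return await _search_docs(ctx, query, 'langgraph')
```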
Great question! Having more options to access the data like this almost always gets better results than just naive RAG. It allows the LLM to reason more about the knowledge it wants to retrieve. Basic RAG is pretty limiting because the agent can't make many decisions about the information it is getting.
Thank you! I don't offer consulting at this time but I am working on a platform to connect developers to business owners. Also feel free to post in our community of developers if you are looking for someone! thinktank.ottomator.ai
Is there a way to take that data (crawled thanks to Crawl4AI) and easily feed it to a ChatGPT agent I created? (I am a no-coder, that's why this use case is interesting for me)
Did you create a GPT Assistant? Is that what you mean by Chat GPT agent? If you follow this video to create a knowledgebase in Supabase using what you scrape with Crawl4AI, you could create a custom tool for your OpenAI assistant to query that knowledgebase!
Didn't work out for me. I just thought to update the code to use a Gemini API key. All went well, but at the end when it came to building a UI, it crashed during the second question. It works for one question per check related to the doc.
Hmmm... sounds like something is off with the way the conversation history is stored/retrieved if it crashes on the second message. What is the error you get?
Ollama is OpenAI API compatible so it's pretty easy to switch to that instead of GPT! Main thing is just changing the base URL in the OpenAI client to point to Ollama. They have docs covering this: ollama.com/blog/openai-compatibility
You can certainly tweak this solution to use both! For example you could host Supabase locally (for Postgres) and run an LLM through Ollama. For the LLM you'd just have to change the "base URL" for the OpenAI client to point to Ollama: ollama.com/blog/openai-compatibility And for the Pydantic AI agent, Pydantic AI supports Ollama: ai.pydantic.dev/api/models/ollama/
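Concretely, the switch is mostly just the client construction. A minimal sketch, assuming a local Ollama server with a model already pulled:

```python
from openai import AsyncOpenAI

# Ollama exposes an OpenAI-compatible API at /v1; the api_key is required
# by the client library but ignored by Ollama.
client = AsyncOpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

async def ask(question: str) -> str:
    response = await client.chat.completions.create(
        model='llama3.1',  # any model you've pulled with `ollama pull`
        messages=[{'role': 'user', 'content': question}],
    )
    return response.choices[0].message.content
```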
Hi Cole, I'd like to suggest that you reconsider enabling automatic translations on your videos. For many of us, like me, they make your content much more accessible and easier to follow, especially when we need to pay attention both to what you explain and to what you show on screen. When I checked with other English-speaking content creators, they confirmed that disabling them is a personal decision, but doing so can give the impression that the Spanish-speaking audience isn't valued as much. I really appreciate your work and hope you take this as constructive criticism and a positive push toward change.
YouTube has changed some things under the hood so I wasn't aware I lost this. I have automatic dubbing in some languages through YouTube but not others. I will have to look into it!
Very interesting video, thanks! I have a little question for you: why do you give the URL of the page to the LLM instead of saving the markdown text obtained from the initial crawling? Is there an advantage to doing so? I guess it's to avoid the need to constantly update your markdown information, since the URL will always have the latest information and can be "re-crawled" if need be, but I was curious to understand if there were other elements in your thinking. Thanks! :)
Thank you and great question! So when I give the URLs to the agent it doesn't actually use the URL to visit the site in realtime. It does just use the markdown I have stored in the database. It simply uses the URL to determine if the content is relevant to the user's question - sort of like a title, but I was thinking URLs give extra context with the path: it speaks to how the page relates to the rest of the documentation, if that makes sense. But your thinking is also spot on that we could have the agent pull the latest information in realtime with the URL if we wanted!
@@ColeMedin Thank you for your quick and complete reply! I understand better now and really appreciate! It does indeed make sense to have the full URL to have the extra context from the path (I didn't even think of it that way)! Have a nice weekend!
Awesome tutorial! 👏 Quick question: is Qdrant's faster speed for semantic search a big enough benefit to maybe introduce a hybrid model where you use both, where Supabase might have a reference column to a Qdrant vector store that handles the vector search?
Thank you and good question! Though I might need a bit of clarification. In my mind I can't really see how a specific column in Supabase would point to a Qdrant vector store. If you have multiple Qdrant vector stores to perform RAG with, I would just set those up as separate tools for the agent right in the code instead of making the agent go to Supabase to first find the Qdrant vector store to use. I suppose though that if you really do have dozens of Qdrant vector stores for some reason, it would be more scalable to maintain that list in Supabase instead of having it hardcoded in your script!
@@ColeMedin I was envisioning something like storing the primary key of the Supabase record in a Qdrant database as part of its metadata. The Qdrant record would then store the vectors. It would function similar to a lookup table, except the vector search portion would run against the Qdrant database instead of the Supabase one, and the final search would be combined into one result. Is this feasible, I am wondering?
Glad you found it helpful! Yes I'm planning on doing that soon and it'll be very similar to this. In fact this already is a RAG agent with SQL queries essentially!
What I was suggesting is an agent having a tool that searches user’s question in a relational database and gives answer. This would mean the agent/tool will need to convert the question into a sql query to fetch the relevant data and feed it to LLM. This is required for most B2B use cases where data is stored in tables.
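For reference, that pattern can be sketched as a tool the agent calls with a model-written query. This is a toy version with SQLite and an invented schema, just to show the shape:

```python
import sqlite3

from pydantic_ai import Agent, RunContext

agent = Agent(
    'openai:gpt-4o',
    deps_type=sqlite3.Connection,
    system_prompt=(
        'You answer questions about the orders database. Schema: '
        'orders(id, customer, total, created_at). Use the run_sql tool '
        'and write SELECT statements only.'
    ),
)

@agent.tool
def run_sql(ctx: RunContext[sqlite3.Connection], query: str) -> str:
    """Execute a model-written, read-only SQL query and return the rows."""
    if not query.lstrip().lower().startswith('select'):
        return 'Only SELECT statements are allowed.'
    return str(ctx.deps.execute(query).fetchall())
```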
Hello @ColeMedin, I used this code of yours but with a Gemini model instead of OpenAI, and made the necessary changes. When I am crawling and adding to the database, everything works correctly except I am getting an error: Error getting title and summary: 429 Resource has been exhausted (e.g. check quota). It might be hitting some rate limit. Could you please tell me how to resolve this? Thanks a lot in advance.
Yeah you're probably hitting the LLM too frequently - that's what a 429 error typically means. I would add a delay between each call to the LLM using the time library in Python. And probably reduce the batch size for the web scraping as well.
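A simple backoff wrapper is usually enough. Shown here with the OpenAI client, but the same pattern applies to Gemini's 429s; the model and prompt are placeholders:

```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI()

def get_title_and_summary(chunk: str, retries: int = 5) -> str:
    """Call the LLM, sleeping progressively longer each time we hit a 429."""
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model='gpt-4o-mini',
                messages=[{'role': 'user',
                           'content': f'Give a title and summary for:\n{chunk}'}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, 16s
    raise RuntimeError('Still rate limited after all retries')
```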
Great question! Unfortunately not out of the gate since Crawl4AI is not one of the Python packages in n8n and there isn't a Crawl4AI node. But what you can do is turn a Crawl4AI implementation into an API endpoint that you call with the HTTP node in n8n! I might actually be making a video on this soon ;)
Think you have what it takes to build an amazing AI agent? I'm currently hosting an AI Agent Hackathon competition for you to prove it and win some cash prizes! Register now for the oTTomator AI Agent Hackathon with a $6,000 prize pool! It's absolutely free to participate and it's your chance to showcase your AI mastery to the world:
studio.ottomator.ai/hackathon/register
Your channel is a goldmine for someone with a technical background who wants to get into the nitty-gritty of using AI in practical life!
Thank you - that means a lot!
I 'scraped' all your video transcriptions with the YouTube API and gave them as knowledge to my agent.
Thanks 👍
I’ve thought about trying this. How’d it work out?
@ It's not the only AI channel I got the transcripts from. But it does help find things that 'I know I heard about' but otherwise could never find again. With the timestamps it can even tell you more or less where in the video to go. I've put the script I used on GitHub; I can't post links here, but my screen name here and the first digit might help you further. It's under 'EasyScripts'.
Sorry to be so descriptive, but YouTube is trigger-happy when you're too direct with instructions.
Wow that's super cool! You bet!
@@rakly347- I've been wanting to do something similar myself, but wanted to feed a RAG with my YouTube viewing history and my Chrome / Google web history, so I could always find something that I'd seen previously. So this will be a good start for me!
Thanks a TON for making this! I was dreading having to make this myself, as I'm still pretty new to python. This will help a bunch for the Unreal Engine agent I'm creating.
Strong work, Cole. You really do a great job with your videos. Just the right pace, and interesting projects.
Thank you very much! This is the longest video I've ever put out on my channel so I worked extra hard to keep a good pace, so I appreciate you calling it out!
This is my favourite channel for AI topics. I am hands-on with your content. Thank you for all your efforts. Greatly appreciated.
Thank you very much! You are so welcome!
Thanks!
Of course! Thank you for your support - that means a lot to me!
@@ColeMedin I bring good luck bro. All the channels I contributed to in the early stages became very successful! =)
Haha I love it! Glad to have you here man!
Dude, I literally watched one of your old videos on this topic today so happy you’re covering this. Thank you appreciate you.
Glad it's perfect timing haha! You bet Mitchell!
Your teaching style is the best, thank you for sharing!
That means the world, thank you! My pleasure :)
Thanks
Thank you so much for your support! :D
Funnily enough, this is more or less the architecture for all AI agent systems: a processing agent of some kind looks at the query and directs the actions to the relevant systems that then return a response that will be given to the LLM to give to the user.
You can run your Azure infra's log analytics through an agent and have it monitor & repair your systems, for example: all you need is an agent that looks at the system to determine which part is down and which agent to instruct to attempt a repair, while a timer runs down on a verification agent that checks whether the repair succeeded or escalates to a human, and so on. The structure is identical to this one.
Yes very true! I think a specific example of how to architect agents well with "agentic RAG" resonates with people more and makes the concept clear, but you are certainly right that this kind of solution and agentic reasoning is really the foundation for any agent.
This is what I've been waiting for all day
I made one for the Verse language by Epic Games and it used 11 million gpt-4o-mini tokens and 23 million embedding tokens lol. Thank you for this!!! Been trying to solve this for months.
That's awesome!! You bet man!
Brilliant video. After watching the video it seemed to me that RAG may be one of the killer use cases for Agentic implementations. Thank you for sharing your insights generously.
Hey, thanks for making this video Cole. I was having trouble bridging the gap between the phidata repo and the actual implementations of these things. This helped me make my own from scratch.
That's awesome! You bet!
@@ColeMedin Have you noticed that nobody is talking about the Granite and Snowflake embedding models that have come out over the past few months? They're way better than Nomic and Ada, yet people don't even know they exist, and keep using the OpenAI API!
As usual,
Fruitful and informative video.
Thank you Cole, keep going bro.
Thank you very much! I certainly am not going to stop :)
Bro, this is incredibly valuable! Big shout out to you for all this free content! Create a Skool and I'll be your first disciple 🙏
Thank you very much! I don't have a Skool but I do have a community over at thinktank.ottomator.ai :)
Thank you for what you do man. These videos are more than useful!
You are so welcome! I appreciate it!
Funny … just started building exactly that (a RAG-based app that handles different documentation sets in an agentic way). Very good content you provided here.
I love it! Thank you!
Rare, no bullshit channel, thank u Cole!
Haha I appreciate it, you bet!
Great job! Watching with enthusiasm. Agentic RAG is an absolute trend now; I'm just skeptical of leaving important tasks to the LLM (AI agent).
Thanks man! Yeah I totally get the skepticism - I share it too.
The best AI agents are ones that assist you by saving you time without you having to trust them completely. For example, an agent that will draft replies to your emails without actually sending them, so you can review them first.
@@ColeMedin 100% with you on this. We want to *combine* human and machine intelligence. Big thank you for all your work in this space, Cole. Your passion and dedication shines through with every video, and you've signed yourself up to the really hard stuff like community building too. See you on competition day ;)
Thank you very much, that means the world to me! Can't wait to see your submission man!
So good man! As a Java dev I love how you break down the Python code - helps me learn! 💯💯
Thanks Jeremy! Glad to help!
keep up the freaking great work Cole!
Thank you, I appreciate it!
Unfortunately had to give up due to OpenAI rate limit issues. Tried using the time library, reducing chunk size, a new API key - still getting rate limit errors on Tier 1 usage. I might come back to it after going Tier 2 to see if it helps. First time forking a repo so it could totally be something I'm doing wrong, but I don't see what! Awesome video though man, I do truly believe this to be the way of the future.
Thank you for the kind words! Sorry you are hitting rate limit issues though. My suggestion would be to try and use OpenRouter instead of OpenAI for the LLM. Their rate limits are much more generous right out the gate. You'd still have to use OpenAI for the embedding model though, OpenRouter doesn't have those.
Awesome, I'll give that a go - hadn't thought of trying OpenRouter
Sounds great! Yeah OpenRouter is fantastic
Great video. That is exactly what I am looking for - binding metadata with vectors for multi-round and precise retrievals. Thanks.
Thank you, you are welcome!
thank you Cole, you're the best
You bet, I appreciate it!
Great tutorial.
About the point of using Supabase as a single place for both embeddings and structured data, compared to two different DBs: what do you think about payloads in Qdrant, or metadata in other vector databases? Would you get the same speed and ability to filter based on structured data that way?
Thank you and great question! I really like working with SQL databases for structured data over everything else, it's the most robust + easy to work with, and also LLMs know SQL very well so you can even make them write the queries for more dynamic data access.
Yet another great video! thanks man
Glad you enjoyed it! You bet!
Have you looked into using MCP servers/clients? It seems to be where all of this is going, and it has a ton of industry support. Essentially (from what I've read so far) it's a modular approach to everything shown here, but with way more flexibility and ease of setup (just a few lines of JSON config text for each type of worker).
Yes I have started looking into MCP! And I am thinking about how I can take a lot of what I've been working on and leverage it with the protocol instead of what I've been doing a lot with HTTP endpoints.
Thank you! Awesome stuff. Just went through the entire video. Now, I'll take my time to implement it and understand each piece in detail. You are awesome!!!!
You are so welcome, thank you very much!
Very great explanation mate! Big thanks!
Thank you! You bet!
Yes! I've had successes, and I've had a lot of pitfalls. I think I've had more unsuccessful runs than successful ones, but I'm getting there.
Cole, what's the recommended way to deploy it in a real live environment? Especially AWS Lambdas?
Great question! There are really two parts to this workflow:
1. The script that creates/updates the knowledgebase you'll want to run on a scheduled basis
2. The AI agent that leverages agentic RAG
I'd publish the knowledgebase script to something like a serverless function and set it on a schedule to run every hour/every day, depending on how often the documentation/site updates that you are scraping.
For the AI agent, you could turn this into a serverless function or have it sitting as an API endpoint on a cloud machine through something like DigitalOcean.
Do you recommend any RAG frameworks for those of us who don't want to build and maintain our own RAG? Like RAGFlow etc.?
Great question! RAGFlow is good from what I've heard, but I haven't used it myself. LightRAG is another great one, still requires coding but it takes care of a lot under the hood for you!
github.com/HKUDS/LightRAG
I watched both lessons step by step, and I got stuck at the moment where you start communicating with Streamlit. In other words, I don't understand how you set up the frontend to make everything work. We're messing with scripts and so on, and boom, we have a web interface where we ask questions. How do you do it?
Sorry that part is confusing, I get it! I cover building the Streamlit app quickly at the end of the video and mention that all the code is in the repo to check out yourself if you are curious. I use the Streamlit app prematurely (before I show the code for it), just so it's easier to demo the agent as we are building it.
@ColeMedin Great stuff. I've been thinking for a long time about how to give ChatGPT access to certain knowledge. It turns out it's called RAG. And all the other tools are just great. As a result, I got a lot of files that look like a complete mess. I will be looking for a video on how to structure all this. I made an agent that works great with yandex_market_api; comparing scripts written simply by ChatGPT-4o and ChatGPT-4o mini with Supabase, Supabase wins.
Awesome content, it’s so helpful in AI learning curve! Thanks a lot!
Thank you very much! You bet!
Great video and explanations. I have subscribed to the channel. Have a question though: is there a way to use Ollama instead of OpenAI for this, as I don't have an OpenAI API account? If there is, can you suggest how, or point me to one of your videos which has done it?
Does anybody know how to use this with Ollama?
Thank you very much! For using Ollama, they have OpenAI API compatibility so it's really easy to adjust this code to work with that!
ollama.com/blog/openai-compatibility
superb content, fantastic presentation... if only all of YouTube was like this...
Thank you very much!
Dude, you speak so well. Thank you for all your videos. Hey, I think you missed the most important part here though, when you mentioned some mathematics to do the matching. That's the heart of it all, and it's typically done by clustering or graphing, and even the graph is intuitively clusters. A bit of discussion on where that cluster topology is stored (i.e. is it stored as additional fields in the database?) would be great; if you do cover that later in the video, my apologies. Intuitively this serves the effect of an index but is much different, FAISS being a graph approach and many others being a cluster / k-means centroid approach.
Thanks Ken, I appreciate it a lot! You're totally right that focusing on what actually goes on under the hood with RAG is super important. And I do want to create more content diving into that later on! You're right I don't dive into it in this video though, mostly because there is already so much content here I don't want to overwhelm someone in one video.
great video and channel! Thanks! Another very entertaining way to see classic RAG fail spectacularly is legal RAG. My first RAG use case was feeding a large legal doc to a vector DB and the result was abysmal... I started to question the whole concept of RAG, but of course, the issue was in front of the computer. I am now experimenting with different approaches; obviously web scraping won't help here, but I think I will need to chunk up documents in different ways and have different DBs to pull information from... Even for one doc, just one vector DB won't do. But thanks, your video just reassured me, and Pydantic AI bundled with your tool on studio.ottomator makes me optimistic to get this right one day :-)
Thank you very much! Sounds like a tough use case and I like where your head is at with it!
I’m also working on a project with legal docs. But I’m a newbie so almost everything is over my head.
Awesome. Does n8n support the agentic RAG approach?
Thank you and yes it does! Essentially for agentic RAG in n8n you can include the usual "vector store retrieval" tool for your agent and also define other tools like I did in this video to explore the knowledge in other ways.
@@ColeMedin
Jumping in here - if I'm not very experienced with coding (though not completely foreign to it either) - would it make more sense to learn pydantic or n8n, if I want to start building agents (including complex ones)? (also considering Relevance AI and Crew AI, if you can comment on those)
BTW - I think a video covering which tool to learn could be a huge success, if you feel you have the right knowledge for this.
Great question! If you're looking to build more complex agents I would certainly recommend starting to learn Pydantic AI. n8n is still great, especially for prototyping, but coding your own agents is more robust in the end. I haven't used Relevance AI or Crew AI much, but for a reason: generally they do way too much for you, which can be nice but also takes away a lot of your power to customize.
@@ColeMedin Thanks for the reply. If I may ask a small follow-up: is Pydantic AI beginner-friendly? How much longer would it take to create a fairly basic agent with it compared to something like n8n?
Question: for the second question example, how is it determined that an 'agentic AI' secondary lookup is necessary / that the initial response is not adequate? Is a secondary query always being made (or is this a 'black box')? This important tidbit seems to have been glossed over, and I'm trying to understand how this 'decision' is made. Also, as we know, follow-up questions / context can be difficult with RAG: knowing whether a new question is being asked versus a request for information related to the previous query(ies). Does Streamlit facilitate this process (you mentioned it's historical / session based)? I'm not sure if Streamlit is just for the UI. Thanks for the great videos.
Fantastic questions! And totally fair that I did gloss over this and it would have been beneficial to dive into it more.
Firstly, the secondary query is not always made. Tested this myself. The agent does know when RAG is "enough", and the system prompt is where I tell it how to make that determination. Basically I just say "if the knowledge returned from the RAG lookup is relevant to the user's question and you feel confident using it to answer, then stick with that and don't perform the secondary lookup. Otherwise, continue to the secondary lookup with the other tools available."
This could probably be improved a lot and I could certainly be more specific to help it make that decision better! But it was a good starting point and I wanted to keep things simple.
For your last question - Streamlit has a concept of "state" for the app, which I do use to store the conversation history. So if a document chunk is returned from RAG, that is included in the conversation history so the agent can leverage it a second time to answer another related question without having to perform RAG again!
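For anyone curious, the session-state mechanics look roughly like this - `run_agent` stands in for the actual agent call:

```python
import streamlit as st

# st.session_state survives Streamlit's script reruns for one browser tab,
# so the history (including retrieved chunks) carries across questions.
if 'messages' not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message['role']):
        st.markdown(message['content'])

if prompt := st.chat_input('Ask about the docs'):
    st.session_state.messages.append({'role': 'user', 'content': prompt})
    answer = run_agent(prompt, st.session_state.messages)  # placeholder
    st.session_state.messages.append({'role': 'assistant', 'content': answer})
```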
@@ColeMedin Thanks for the reply and clarifications. Appreciate your content.
Thank you! You bet!
This is very beautiful, but the question I would like to ask is: what if the pages are dynamic, i.e. the content of the pages can change as time goes on? Then the database will contain old information which is no longer accurate. How can one solve this problem?
Thanks and good question! I would take my example and turn it into something you can run on a regular basis (like once a day), clear out the old knowledge, and rescrape and insert up to date knowledge for the site.
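A sketch of that refresh job, assuming the table and metadata layout from the video (`site_pages` with a `source` field) and with `crawl_and_insert` standing in for the existing crawl + embed pipeline:

```python
import os

from supabase import create_client

supabase = create_client(os.environ['SUPABASE_URL'],
                         os.environ['SUPABASE_SERVICE_KEY'])

def refresh_site(source: str) -> None:
    """Drop every chunk for one site, then rescrape and reinsert it."""
    supabase.table('site_pages').delete().eq(
        'metadata->>source', source).execute()
    crawl_and_insert(source)  # your crawl + chunk + embed + insert pipeline

# Kick this off from cron or any scheduler, e.g. daily at 3am:
#   0 3 * * *  python refresh.py
refresh_site('pydantic_ai_docs')
```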
Great content. I have a question about the base of that whole architecture's query pipeline: the quality of parsing complex PDFs. How good is Pydantic's parsing? Because no matter what agents and tools you have, if your document is parsed badly, the whole architecture falls down. What do you think?
Thank you! For parsing PDFs, I would create a custom solution that you would bake into this process. You wouldn't use Pydantic to parse the PDFs, you would use some PDF library like:
pypdf.readthedocs.io/en/stable/
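A minimal pypdf sketch of that first conversion step:

```python
from pypdf import PdfReader

def pdf_to_text(path: str) -> str:
    """Extract raw text from every page so it can be chunked and embedded."""
    reader = PdfReader(path)
    return '\n\n'.join(page.extract_text() or '' for page in reader.pages)

text = pdf_to_text('manual.pdf')
# From here the flow matches the website crawl: chunk the text, generate
# titles/summaries, embed, and insert into the database.
```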
Great... is it possible to parse with LlamaParse and create agents with Pydantic AI?
Yes for sure!
Great video. Do you use Supabase for auth and edge functions too?
Thank you! I have used Supabase for auth a lot in the past, mostly using Auth0 right now though just to have something more universal. Haven't used edge functions much but I know they are great!
Just wanted to let you know that this is a solid project. Thanks for sharing and inspiring us to learn agentic AI. At last YouTube is doing me good.
And I was trying to implement this to create a chatbot for our college website, and my agent replies like ....
Here are the departments in MSEC:
{"tool_calls":[{"id":"list_departments_msec","type":"function","function":{"name":"list_departments_msec"},"parameters":{"}}]}
Note: I used the tool "list_departments_msec" which is not provided by you. Please provide the real department list.
I guess it's because of me changing the system prompt..
Thank you very much! Which model are you using for this? This kind of response looks like what I typically see with smaller LLMs that don't handle tool calling correctly all the time. I'd try with a different/bigger model!
Another banger! Would you say agentic RAG or KAG is better?
Thank you! I need to look into KAG more, something on my list to research. But I'm not super sold on the idea at this point so right now I'm sticking with agentic RAG.
Great content, I have been binge watching your channel over the last few days. Very informative. As a developer this has become a go to channel for AI learning.
Glad to hear it - thank you very much!
You're a god. Thank you for this.
You bet!!
Thanks for the amazing work. I have a challenging use case: using RAG in another language, and I need it to be as accurate as possible. Any suggestions?
Thanks
You are so welcome! Seems like you need some larger scale advice, probably would be worth posting in our community! thinktank.ottomator.ai
Why didn't basic RAG retrieve the whole weather example? When you split chunks, the function should've accounted for keeping whole code blocks, and I think the whole example should have been in one chunk - easy to retrieve and spit out using just basic RAG. I know agentic is a lot better, but optimizing basic RAG would help agentic too, I guess, since it uses the same splitting as a building block.
Fantastic question! Honestly I was wondering that myself. I checked the database and confirmed that the weather agent code block is maintained in a single chunk, and that the retrieval isn't grabbing that chunk so it isn't like the LLM is ignoring it.
Hard to say exactly what the problem is, and if I optimized my setup (different chunking, better RAG with query expansion, etc.) I'm sure I could eventually get to the point where it could pull the full example. Agentic RAG is just one solution to make it more robust but certainly the easiest in my mind!
Thank you for replying to my previous question. I have another one if you don't mind. Why are you not preprocessing the scraped data before creating embeddings - normalizing it, removing irrelevant data and noise, and structuring the data in a more suitable format? From what I have seen, although good, the markdown provided by Crawl4AI could use some further sanitation. This is a legitimate question and in no way meant to challenge how you are building; I am genuinely curious to know if preprocessing could be beneficial. Thank you again.
Have you used any other vector databases? Any pref? e.g. Supabase over Pinecone, LanceDB and many others?
Good question! The primary three I've used more than just a bit are Supabase, Qdrant, and Pinecone. Supabase is my general recommendation. It isn't as fast as Qdrant or Pinecone so not the best if you really need speed, but it's as powerful and I love having my SQL DB and RAG on the same platform. Qdrant is open source so I'd generally recommend it over Pinecone since you can host it yourself for free!
Wonder if agentic approach would still be valid with these new reasoning models that surfaced. Thoughts?
Actually I think reasoning models will make agentic RAG even better! Using something like R1 to reason about what to search in the vector DB is something I am looking into.
Way excellent tutorial, thanks a bunch.
Great video, thanks. Did you ever test solutions like fusing (RRF) BM25 results with cosine similarity for precision?
Thank you, you bet! I haven't tested this yet but I would be very curious to do so!
Can I now create a Vue 3 and Nuxt 3 agent with this and combine it somehow with Cursor, so that Cursor uses Claude Sonnet and the agent to code my requirements?
GREAT question! You certainly can by putting this agent behind an OpenAI compatible API endpoint and setting that up in Cursor. Something I am going to explore more soon and probably create a video on!
@@ColeMedin I see, but I am not able to create something like this. But think of the possibilities if this works - amazing!
Really looking forward to a video like this. Let me know if I can help..
You are going as a 🚀
Haha I appreciate it! :D
Forgive my ignorance. Your channel is great and this video is fire. It's got me wondering, is this how Perplexity functions?
Thanks man! I think something like this is a part of the Perplexity platform, but mostly it's a web search engine powered by AI not a RAG solution. Would be difficult to ingest the entire internet into a knowledgebase! haha
Amazing video. Thank you so much.
Thank you, of course!
Please 😢, how do I retrieve from the vector database in Supabase? It always fails: the Supabase node succeeds, but the AI Agent doesn't know how to answer.
Hmmm... have you checked to see what is retrieved from RAG? Maybe the wrong context is being fetched which is why it seems the AI Agent doesn't know how to answer?
Great video! Just a small (maybe inconsequential?) query - any specific reason you used Supabase and not other modules for the vector DB? (Like FAISS)
Edit: got my answer when you started the agentic RAG part lmao - we can't directly store metadata etc. on FAISS, so it makes sense why you selected something like Supabase
Your desk setup looks so good. Would be great if you made a video about it - or if not, did you buy the desk somewhere or build/customise it yourself? And if you built it, how? :D
I hate to break it to you but the background is not real! I generated it with AI and then I use a tool called Nvidia Broadcast to put it as my background without even having to have a green screen.
You are a star ⭐
This is fantastic!
Thank you! :D
Thank you, excellent presentation!
You bet - thank you very much!
Hey, if I have around 20-30 docs of websites like these, how should I store them? Should I store them in a single table, or break it down into multiple tables?
Great question! I'd recommend sticking to one table for simplicity, and then setting a metadata field for the website the record is from. Very similar to what I do for the "source" metadata field in the video.
Then when you query the knowledgebase and you only want to query from one website, you just include that metadata filter in your query!
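A rough sketch of what that filtered query could look like - the match_site_pages RPC name and parameters here are assumptions modeled on this kind of setup, not copied from the video:

```python
# Query one website's docs from a single shared table by filtering
# on a "source" metadata field via a pgvector similarity RPC.
from openai import OpenAI
from supabase import create_client

supabase = create_client("https://YOUR_PROJECT.supabase.co", "YOUR_SERVICE_KEY")
openai_client = OpenAI()

def search_one_site(query: str, site: str, match_count: int = 5):
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    return supabase.rpc("match_site_pages", {
        "query_embedding": embedding,
        "match_count": match_count,
        "filter": {"source": site},  # e.g. {"source": "site_a"}
    }).execute().data
```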
Great video!! Just one question: can you go over how the matching of embeddings works? I guess I didn't understand what embeddings are, and thus didn't understand how the searching for "relevant" docs works. Any video I missed where you discussed this in detail?
Thank you! There are a lot of great resources explaining embeddings on YouTube! I don't have a video dedicated to the topic, but here is one I vetted myself that explains it very well:
ruclips.net/video/dN0lsF2cvm4/видео.html
This is amazing, thank you so much for your work and for teaching us! much appreciated it!
Can you also set this up with DeepSeek R1, or do you need OpenAI embedding capabilities for the DB?
You are so welcome! You can certainly use R1 through DeepSeek or OpenRouter for the LLM! You'd just have to keep OpenAI for the embedding model or use a different one (like a local embedding model through Ollama).
You are the man.
Can we use other LLMs instead of the OpenAI ones??
Like Llama, etc.
Yeah you certainly can! In fact Ollama is OpenAI API compatible so you really wouldn't have to change much here! And Pydantic AI supports Ollama.
ollama.com/blog/openai-compatibility
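For reference, a minimal sketch of pointing the OpenAI client at a local Ollama server per that blog post - the model name is just whatever you've pulled locally:

```python
# Use the OpenAI Python client against Ollama's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's local endpoint
    api_key="ollama",                      # required by the client but unused
)

response = client.chat.completions.create(
    model="llama3.1",  # any model you've pulled with `ollama pull`
    messages=[{"role": "user", "content": "Hello from Ollama!"}],
)
print(response.choices[0].message.content)
```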
@ColeMedin ohh thx for such quick response!!
You are welcome! :)
Wouldn't it be easier to use another agent to create the summaries/titles? I saw you went directly to OpenAI... by using an agent we could save some tokens running Ollama.
Yes you certainly could! I was just looking to keep it simple but this would be an even better approach!
How would you setup crawl4ai hosted so you can interact with it through webhooks or http requests in n8n workflows?
Great question! And your head is certainly in the right place for how to leverage this in n8n!
I would use FastAPI to create a Python endpoint around whatever Crawl4AI logic you want. The "payload" for the API could be a specific page you want to crawl or a list of pages. Then the API can either return the contents of the crawled page(s) or just put the knowledge in a database like I do and then return a success code.
For hosting this endpoint, I'd recommend using DigitalOcean. I'll probably be making a video on this soon! Lot of people want to use this with n8n.
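Roughly what that could look like - the endpoint path and payload shape here are assumptions, a sketch rather than a finished implementation:

```python
# Wrap Crawl4AI in a FastAPI endpoint that n8n can hit with an HTTP node.
from crawl4ai import AsyncWebCrawler
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CrawlRequest(BaseModel):
    urls: list[str]  # one or more pages to crawl

@app.post("/crawl")
async def crawl(req: CrawlRequest):
    pages = {}
    async with AsyncWebCrawler() as crawler:
        for url in req.urls:
            result = await crawler.arun(url=url)
            pages[url] = result.markdown  # return the scraped markdown
    return {"status": "success", "pages": pages}

# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
```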
+1
subbed! great videos!
Q: How can vector search find similarity to questions? Meaning: if my website is built like a Q&A, the question may sit right beside the answer (same chunk?). And while we know LLMs are trained to answer questions, in RAG it is just a vector similarity match. I did not see you handle that, and yet it still looks like it works pretty well in your demo. I heard a podcast that taught a nice trick: first have the LLM try to answer the question (even if the answer isn't good or up to date enough). Then take that answer, and because it is a declarative sentence rather than a question, there is a higher probability of finding a match to it in the vector DB. What do you think?
Good question! Vector search, in simple terms, is all about keyword matching. That's why a question can still be matched to chunks even though it isn't a declarative sentence - keywords like "tool calling" or "weather agent example" are still in the question. But the idea of forming it into a declarative sentence before retrieval is a good one too! For a lot of use cases I bet that'll help with accuracy a lot.
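That trick is often called HyDE (hypothetical document embeddings). A rough sketch, with the model names as assumptions:

```python
# Have the LLM draft a hypothetical answer and embed THAT for retrieval,
# since a declarative answer tends to sit closer in vector space to the
# stored chunks than the raw question does.
from openai import OpenAI

client = OpenAI()

def hyde_embedding(question: str) -> list[float]:
    draft = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a short, plausible answer to: {question}",
        }],
    ).choices[0].message.content
    return client.embeddings.create(
        model="text-embedding-3-small", input=draft
    ).data[0].embedding
```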
Great video and nice approach.
Any specific reason to choose a chunk_size of 5000 (roughly 1250 tokens) when the generally recommended chunk size is up to 400 tokens?
I just got this working on my home laptop. Amazing! I can see how this might chew a lot of API tokens...
Here's a hard question that the usual LLMs (GPT-4o, DeepSeek, etc.) get wrong:
"Can you explain how Pydantic AI implements its custom validation mechanisms for complex nested models, and what are the performance implications of using these validations in large-scale applications?"
now... how do I rerun this to my docs?
Glad you got it up and running! The title and summary creation for every chunk will certainly take a lot of tokens, luckily though it's a simple task so you can use very cheap LLMs to get the job done.
That's a good question! Could you clarify what you mean by rerun to your docs?
I have a collection of PDFs. Presumably, I need to reliably convert them to markdown, and then let the LLM do the chunking and bookkeeping, similar to what you did with the website crawl.
Hi Cole, I’ve been following your videos on Agentic AI and RAG, and they’ve been incredibly insightful! I’ve successfully built an AI assistant with Agentic RAG based on your guidance, and it’s working great. However, I want to ensure that the assistant only replies to queries related to my website, and any other queries are considered outside the scope. Could you share any tips or best practices to achieve this? Your expertise would be a huge help. Thanks for the amazing content you share!
Thanks for the kind words and that's super cool you built an agent for yourself based on this! Nice work!
Great question too. The main way to limit your agent to focus on just what you made it for is to tell it that in the system prompt. Something like "You are an expert at the Pydantic AI documentation and only answer questions and talk about that. If the user asks about or talks about something out of scope, direct them back to talking about Pydantic AI and say you can't discuss other topics."
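In Pydantic AI that system prompt goes right on the agent. A minimal sketch - the model string is just an example:

```python
# Scope the agent to one topic via its system prompt.
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt=(
        "You are an expert on the Pydantic AI documentation and ONLY answer "
        "questions about it. If the user asks about anything else, say you "
        "can't discuss other topics and steer them back to Pydantic AI."
    ),
)

result = agent.run_sync("What's the weather like today?")
print(result.data)  # should decline and redirect to Pydantic AI topics
```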
I have a request: I would appreciate it if you could show how we can use some of the free alternatives to the OpenAI API (like local Ollama or Hugging Face models), and also if you could make tutorials that involve typing the code step by step in real time, since seeing entire chunks of code at once can be pretty heavy on the eyes.
Great suggestions, I appreciate it! I'll certainly be doing more local AI in the near future for this stuff!
I would enrich it with a twin knowledge base for the LangGraph docs and we would be set up to have the best AI agent assistant! How would you do it?
YES, LangGraph and Pydantic AI are an incredible combo!
This would fit very well into agentic RAG - we can ingest the LangGraph documentation just like we did with Pydantic AI using Crawl4AI. They have a sitemap.xml as well:
langchain-ai.github.io/langgraph/sitemap.xml
What we can do is set the metadata field "source" to be "pydantic_ai" for the Pydantic AI docs and "langgraph" for the LangGraph docs. Then we can create separate RAG tools for our agent that will search specifically through each of the docs in the knowledgebase using the metadata to filter.
That way the agent won't get confused between the frameworks but can still search through both to combine them together to create agents on our behalf leveraging both.
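A sketch of what those per-source tools could look like - the RPC name, the get_embedding helper, and the deps wiring are hypothetical, modeled on this kind of setup:

```python
# One retrieval tool per documentation source, each filtering the shared
# table on the "source" metadata field so the agent can't mix them up.
from pydantic_ai import Agent, RunContext

agent = Agent("openai:gpt-4o-mini")

async def _search(ctx: RunContext, query: str, source: str):
    embedding = await get_embedding(query)  # hypothetical embedding helper
    return ctx.deps.supabase.rpc("match_site_pages", {
        "query_embedding": embedding,
        "match_count": 5,
        "filter": {"source": source},
    }).execute().data

@agent.tool
async def search_pydantic_ai_docs(ctx: RunContext, query: str):
    """Search only the Pydantic AI documentation."""
    return await _search(ctx, query, "pydantic_ai")

@agent.tool
async def search_langgraph_docs(ctx: RunContext, query: str):
    """Search only the LangGraph documentation."""
    return await _search(ctx, query, "langgraph")
```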
@ was not expecting a tutorial already! Thanks for the quick advice! I’ll work on it
Why would you not make embeddings for all your data and just use one vector database? Is there value in having some data structured and some as embeddings?
Great question! Having more options to access the data like this almost always gets better results than just naive RAG. It allows the LLM to reason more about the knowledge it wants to retrieve. Basic RAG is pretty limiting because the agent can't make many decisions about the information it is getting.
Hey Cole, great vid! Do you help businesses setup agentic RAG systems?
Thank you! I don't offer consulting at this time but I am working on a platform to connect developers to business owners. Also feel free to post in our community of developers if you are looking for someone! thinktank.ottomator.ai
Is there a way to take that data (crawled thanks to Crawl4AI) and easily feed it to a ChatGPT agent I created? (I am a no-coder, that's why this use case is interesting for me)
Did you create a GPT Assistant? Is that what you mean by Chat GPT agent? If you follow this video to create a knowledgebase in Supabase using what you scrape with Crawl4AI, you could create a custom tool for your OpenAI assistant to query that knowledgebase!
It didn't work out for me. I just updated the code to use a Gemini API key. All went well, but at the end when it came to building a UI, it crashed during the second question. It works for one question at a time related to the doc.
Hmmm... sounds like something is off with the way the conversation history is stored/retrieved if it crashes on the second message. What is the error you get?
Can you update the code to use Ollama models?
Ollama is OpenAI API compatible so it's pretty easy to switch to that instead of GPT! Main thing is just changing the base URL in the OpenAI client to point to Ollama. They have docs covering this:
ollama.com/blog/openai-compatibility
Man, this is really good. Congrats, and thank you for the knowledge!
Agentic is like a cheat code
What about a local PostgreSQL database and a local LLM?
You can certainly tweak this solution to use both! For example you could host Supabase locally (for Postgres) and run an LLM through Ollama. For the LLM you'd just have to change the "base URL" for the OpenAI client to point to Ollama:
ollama.com/blog/openai-compatibility
And for the Pydantic AI agent, Pydantic AI supports Ollama:
ai.pydantic.dev/api/models/ollama/
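Putting those together, a minimal sketch of the agent side based on the Ollama model docs linked above - the model name and local defaults are assumptions:

```python
# Run a Pydantic AI agent against a local Ollama model.
from pydantic_ai import Agent
from pydantic_ai.models.ollama import OllamaModel

model = OllamaModel(model_name="llama3.1")  # assumes Ollama on localhost:11434
agent = Agent(model, system_prompt="You are a helpful documentation expert.")

result = agent.run_sync("What is agentic RAG?")
print(result.data)
```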
Is there any way to do a tutorial on this for someone with zero coding experience?
I'm putting out a guide on doing something similar with n8n soon!
Hi Cole, I'd like to suggest that you reconsider enabling automatic translations on your videos. For many of us, like me, they make your content much more accessible and easier to follow, especially when we need to pay attention both to what you explain and to what you show on screen. When I checked with other English-speaking content creators, they confirmed that disabling them is a personal decision, but doing so can give the impression that the SPANISH-speaking audience isn't valued as much.
I really appreciate your work and I hope you take this as constructive criticism and a positive push toward change.
YouTube has changed some things under the hood so I wasn't aware I lost this. I have automatic dubbing in some languages through YouTube but not others. I will have to look into it!
@@ColeMedin Thanks Cole, I love your material
Very interesting video, thanks! I have a little question for you: why do you give the URL of the page to the LLM instead of just using the markdown text obtained from the initial crawl? Is there an advantage to doing so? I guess it's to avoid the need to constantly update your markdown information, since the URLs will always have the latest information and can be "re-crawled" if need be, but I was curious to understand if there were other elements in your thinking. Thanks! :)
Thank you and great question! So when I give the URLs to the agent, it doesn't actually use the URL to visit the site in realtime. It just uses the markdown I have stored in the database. It simply uses the URL to determine if the content is relevant to the user's question - sort of like a title, but I was thinking URLs give extra context with the path. It speaks to how the page relates to the rest of the documentation, if that makes sense.
But also your thinking is spot on that we could have the agent pull the latest information in realtime with the URL if we wanted!
@@ColeMedin Thank you for your quick and complete reply! I understand better now and really appreciate! It does indeed make sense to have the full URL to have the extra context from the path (I didn't even think of it that way)! Have a nice weekend!
You bet! You have a great weekend too!
Cool.
Is there a JavaScript equivalent to Pydantic AI ?
Thanks! There is not exactly, for JS I'd recommend using LangChain JS.
@@ColeMedin 👍
Awesome tutorial! 👏 Quick question: is Qdrant's faster speed for semantic search a big enough benefit to maybe introduce a hybrid model where you use both - with Supabase holding a reference column to a Qdrant vector store that handles the vector search?
Thank you and good question! Though I might need a bit of clarification. In my mind I can't really see how a specific column in Supabase would point to a Qdrant vector store. If you have multiple Qdrant vector stores to perform RAG with, I would just set those up as separate tools for the agent right in the code instead of making the agent go to Supabase to first find the Qdrant vector store to use.
I suppose though that if you really do have dozens of Qdrant vector stores for some reason, it would be more scalable to maintain that list in Supabase instead of having it hardcoded in your script!
@@ColeMedin I was envisioning storing the primary key of the Supabase record in a Qdrant database as part of its metadata. The Qdrant record would then store the vectors. It would function similar to a lookup table, except the vector search portion would run against the Qdrant database instead of the Supabase one, and the final search would be combined into one result. Is this feasible, I am wondering?
Ah yes I gotcha - yes this kind of thing is certainly feasible!
Couldn't this also be solved using graph RAG? Great videos! Thanks!
Yeah graph RAG is another good solution though I find it more complex than agentic RAG! I do want to cover it in future videos though!
Question: I wish to use Llama instead of OpenAI. I have Llama 3.1 on my local system through Ollama. Can you guide me?
Good man !
Please make the audio channel selection (automatic YouTube dubbing) available for this and other videos 🙏🙏🙏
I do have it on, but YouTube by default only does some languages. I'll look into it!
This is so helpful, Cole. Waiting for you to cover the RAG agent with SQL queries.
Glad you found it helpful! Yes, I'm planning on doing that soon and it'll be very similar to this. In fact, this already is essentially a RAG agent with SQL queries!
What I was suggesting is an agent having a tool that searches user’s question in a relational database and gives answer. This would mean the agent/tool will need to convert the question into a sql query to fetch the relevant data and feed it to LLM. This is required for most B2B use cases where data is stored in tables.
Gotcha! Yeah I did this exactly for a client once, definitely going to share a version of that implementation on my channel!
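For anyone curious, a rough sketch of that text-to-SQL tool idea - the table schema and the deps.db handle are hypothetical, and you'd want real guardrails (a read-only database role, allow-listed tables) before running generated SQL:

```python
# The agent turns the user's question into a SQL query via this tool,
# runs it, and the rows are fed back to the LLM to compose the answer.
from pydantic_ai import Agent, RunContext

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Answer questions using the orders database when needed.",
)

@agent.tool
async def query_orders(ctx: RunContext, sql: str) -> list[dict]:
    """Run a read-only SQL query against the orders table.
    Schema: orders(id, customer, total, created_at)."""
    assert sql.lstrip().lower().startswith("select"), "SELECT-only queries"
    return ctx.deps.db.execute(sql)  # hypothetical read-only DB handle
```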
Waiting for the same from Argentina, thanks!
Hello @ColeMedin, I used this code of yours but with a Gemini model instead of OpenAI, and made the necessary changes. When I am crawling and adding to the database, everything works correctly except I get an error: Error getting title and summary: 429 Resource has been exhausted (e.g. check quota). It might be hitting some rate limit. Could you please tell me how to resolve this? Thanks a lot in advance.
Yeah you're probably hitting the LLM too frequently - that's what a 429 error typically means. I would add a delay between each call to the LLM using the time library in Python. And probably reduce the batch size for the web scraping as well.
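Something like this minimal sketch - the delay values are arbitrary, so tune them to your provider's actual quota:

```python
# Wait between LLM calls and back off exponentially when a 429 hits.
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 2.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if "429" not in str(e):
                raise  # only retry on rate-limit errors
            wait = base_delay * (2 ** attempt)  # exponential backoff
            print(f"Rate limited, retrying in {wait:.0f}s...")
            time.sleep(wait)
    raise RuntimeError("Still rate limited after retries")

# Usage: wrap each title/summary LLM call with call_with_backoff, and add
# a small time.sleep(1) between calls in the crawl loop to stay under quota.
```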
Hi Cole, I'm halfway through this video and I'm curious - could this be done in n8n, and if so, how?
Great question! Unfortunately not out of the gate since Crawl4AI is not one of the Python packages in n8n and there isn't a Crawl4AI node. But what you can do is turn a Crawl4AI implementation into an API endpoint that you call with the HTTP node in n8n! I might actually be making a video on this soon ;)
@@ColeMedin Epic mate thanks for getting back to me and I look forward to the future video.
You bet - thank you!
Thank you so much, you gave me answers to questions I wasn't even able to formulate or ask.