How to Make RAG Chatbots FAST

  • Published: 8 Sep 2024

Comments • 72

  • @WinsonDabbles · a year ago · +33

    I really appreciate that most, if not all, of your collabs don't use LangChain at all. I really like to see what goes on under the hood, to learn from a first-principles perspective.

  • @hughesadam87 · 10 months ago · +1

    These videos are such a high-quality collection of content for app developers in the AI space who are building apps but aren't AI experts (and don't really care about the AI itself, they just want to use it).

  • @xflory26x · a year ago · +2

    Been anticipating this video since seeing the notebook on your GitHub! Thank you so much for your detailed explanations! Would be keen to see your implementations of NeMo Guardrails' moderation pipelines :)

  • @chrismcdannel3908 · 11 months ago · +1

    Great dissection on the "wrapper" visualization to simplify the relationship between the Agent and the Model. I'm going to borrow it, with backlinks of course.
    Oh, and thanks for the composure in your thumbnails my guy. It's nice to see some professionalism getting the merit it deserves instead of some assclown with his jaw on the floor and pulling his hair up like a chimp on drugs. Classy AF bro. Keep up the good work.

  • @cmars7845 · a year ago

    Thanks for the intro to NeMo Guardrails! I kept expecting you to say tools like ... 'google' ... but you seemed to pause and then not say it 😂

  • @joshualee6559 · 11 months ago · +1

    I want to build this for my research lab so that we can query information about our protocols, standards, etc. This seems really useful.
    I presume it wouldn't be that hard to then embed it into a slack chatbot?

  • @drwho8576 · 11 months ago · +2

    Excellent video as always. Thanks for sharing. Is there a way to set up Colang for an "anything but" scenario? So far, I only seem to be able to program what to detect for a workflow. But can I set up a 'default deny' type thing? Anything other than the topic my bot is designed to handle returns an "I'm sorry, Dave. I'm afraid I can't do that"...

  • @pavellegkodymov4295 · a year ago

    Thanks, James, very, very useful. Will try to include guardrails in our corporate RAG chatbot.

  • @realCleanK · 6 months ago

    Really appreciate you putting this together 🙏

  • @yarikbratashchuk3386 · 11 months ago · +1

    Would this approach work if the vectorized data was shop inventory? And the question was something like "how many items do you have?", or about the specifics of a group of items?

  • @AlgoTradingX · a year ago · +1

    You glowed up like crazy in your video content! It's so cool!!!!

    • @jamesbriggs · a year ago

      thanks Sajid - means a lot coming from you :)

  • @uctran1169 · 11 months ago · +1

    Can you make a video tutorial on creating data from wikipedia?

  • @mobime6682 · 7 months ago

    Great show, thank you. Question: it seems awfully similar to your (more recent?) videos about semantic router, or have I got the wrong end of the stick? I know, I should do a similarity search on the text of each, I guess 😉! Thanks again.

  • @sandorkonya · 11 months ago · +1

    Thank you for the super video. I wonder how we can do chain-of-thought (CoT) or tree-of-thought prompting with Guardrails without LangChain?

  • @unperfectbryce · 10 months ago · +1

    Can't you just do kNN with your embeddings to make sure the query isn't out of distribution? Isn't this a pretty quick Euclidean distance operation? Why bother with guardrails? Thanks for the great video! Keep it up.

    • @user-hh9do9fn1o · 2 months ago

      1. Not all queries are straightforward. Complex queries might need more nuanced understanding and contextual analysis, which kNN might not handle well.
      2. Guardrails can adapt to new rules and policies quickly, while kNN models might need retraining with new data.
      3. Guardrails can provide more interpretable reasons for why a query is out-of-distribution or inappropriate, aiding understanding and transparency.
      However, using both of these together might be more robust.
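The kNN idea discussed above can be sketched as follows; the 2-d vectors, `k`, and `threshold` here are toy placeholders (real embeddings would be high-dimensional, and the threshold would need tuning on your data), not values from the video:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two embedding vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_out_of_distribution(query_vec, reference_vecs, k=3, threshold=1.0):
    """Flag a query as out-of-distribution when the mean distance to its
    k nearest reference embeddings exceeds a threshold."""
    dists = sorted(euclidean(query_vec, r) for r in reference_vecs)
    return sum(dists[:k]) / k > threshold

# Toy 2-d "embeddings": in-domain vectors cluster near (1, 1)
refs = [(1.0, 1.0), (0.9, 1.1), (1.1, 0.9), (1.0, 0.8)]
print(is_out_of_distribution((1.0, 1.0), refs))  # False, inside the cluster
print(is_out_of_distribution((5.0, 5.0), refs))  # True, far from the cluster
```

As the reply notes, this gives a fast distance check but no interpretable reason for the rejection, which is where guardrails add value.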

  • @kaustubhnegi1838 · 3 months ago

    🎯 Key points for quick navigation:
    00:00 *🔍 Introduction to retrieval augmented generation with guardrails for building chatbots.*
    00:27 *📂 Utilizing vector database (Pinecone), embedding model (RoBERTa), and documents for retrieval.*
    00:54 *🕸️ Two traditional approaches to RAG: naive approach and agent approach.*
    02:25 *⌛ Agent approach is slower but potentially more powerful with multiple thoughts and external tools.*
    05:23 *🛡️ Guardrails approach: Directly embedding query, checking similarity with defined guardrails, and triggering retrieval tool if needed.*
    07:42 *🧩 Guardrails approach combines query and retrieved context, then passes to language model for answer generation.*
    08:23 *⚡ Guardrails approach is significantly faster than agent approach while still allowing tool usage.*
    09:03 *📋 Step-by-step implementation details, including data indexing, embedding, and Pinecone setup.*
    13:12 *🔄 Defining retrieve and RAG functions as guard actions for guardrails.*
    14:46 *🚫 Guardrails config to avoid talking about politics.*
    15:15 *🤖 Defining guardrail for user asking about LLMs to trigger RAG pipeline.*
    17:10 *🔥 Demonstrating RAG pipeline via guardrails, showing its effectiveness in answering LLM-related queries.*
    18:04 *🆚 Comparing guardrails without RAG, which lacks information for LLM-related queries.*
    19:55 *💡 Guardrails approach allows agent-like tool usage without slow initial LM call, making it faster for triggered tools.*
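The 07:42 step above (combining the query with the retrieved context before the LLM call) can be sketched roughly like this; the function name and prompt template are illustrative assumptions, not the video's exact wording:

```python
def build_augmented_prompt(query, contexts):
    """Merge retrieved context chunks with the user query into a
    single prompt for the answer-generating LLM."""
    context_block = "\n---\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_augmented_prompt(
    "What are guardrails?",
    ["Guardrails constrain chatbot behavior.",
     "Colang flows define when tools are triggered."],
)
print(prompt)
```

The resulting string is what gets passed to the language model in place of the raw user query.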

  • @shortthrow434 · a year ago · +1

    Excellent thank you James.

  • @RichardHamnett · 11 months ago

    Brilliant mate, also don't forget this could be a massive cost optimizer along with speed :)

  • @aravindudupa957 · 11 months ago · +1

    What is the difference in accuracy between reasoning (whether to retrieve) using embedding similarity vs. handing it to an LLM?

  • @ylazerson · a year ago · +1

    Great video as always!

  • @RichardBurgmann · 11 months ago · +1

    Hi James, enjoying your series greatly. A question or suggestion for a future video: I've been seeing a lot of articles on the use of graph data structures to build knowledge graphs to address issues such as hallucinations and weaknesses in logical reasoning in LLMs. I've only found one person who has actually done this, and they had mixed results as far as addressing those issues. Wondering what your experience has been in this area? Do you have an opinion? From what I can see there is not much evidence (yet) that it gives a better result than well-crafted semantic search.

    • @jamesbriggs · 11 months ago · +2

      I never tried myself, but everyone I know who tried said it was hard to do and the results were either the same as or worse than using vector search - so haven't had much reason to look into it
      Maybe at some point if I see it useful for a particular use-case, and it makes sense to use it given trade-offs, I'll try it out

  • @shaheerzaman620 · a year ago

    awesome video James!

  • @fabianaltendorfer11 · 11 months ago

    Great video! Any idea how to deal with screenshots in the documents?

  • @elrecreoadan878 · 10 months ago

    When should one opt for RAG, fine-tuning, or just a Botpress knowledge base linked to ChatGPT? Thank you!

  • @andriusem · a year ago · +1

    This is what I was searching for! Thanks James, your videos are very informative and easy to follow with Google Colab! My question would be: can we use information extracted from the vector DB for analysis by an LLM, to provide insights or compare different documents, using guardrails or an agent? Thanks, keep up the great work!

    • @jamesbriggs · a year ago · +1

      It depends on what you’re comparing, but I see no reason as to why it couldn’t work! You can select an existing doc at random, perform a semantic search for similar docs and feed them into your LLM with instructions on what you’re comparing - there may be other ways of doing it too - I hope that helps!

  • @guanjwcn · a year ago · +1

    Thanks very much for sharing, James. May I seek your advice on how to estimate infrastructure requirements, e.g. the number of GPUs, assuming I need to host a 70B open-source model on premises with at most 1000 concurrent users? Thank you very much.

    • @jamesbriggs · a year ago · +2

      You can calculate the number of parameters * the bytes required by each parameter's data type - people keep asking about this, so I think I can go into more detail in a future video.
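As a rough worked example of that calculation (a lower bound only - real serving also needs memory for the KV cache, activations, and concurrency headroom, especially at 1000 users):

```python
def gpu_memory_gb(n_params, bytes_per_param):
    """Rough lower bound on model weight memory:
    parameter count x bytes per parameter."""
    return n_params * bytes_per_param / 1e9

# A 70B-parameter model in common precisions
print(gpu_memory_gb(70e9, 4))  # fp32      -> 280.0 GB
print(gpu_memory_gb(70e9, 2))  # fp16/bf16 -> 140.0 GB
print(gpu_memory_gb(70e9, 1))  # int8      -> 70.0 GB
```

So even in fp16, a 70B model needs weights spread across multiple 80 GB GPUs before any serving overhead is counted.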

    • @reknine · 11 months ago

      Would really appreciate that, @jamesbriggs!

  • @georgekokkinakis7288 · a year ago · +1

    Very informative video. Thanks. Is there any chance that you know any open sourced LLM that supports the Greek language for retrieval augmented generation?

    • @jamesbriggs · a year ago · +1

      cohere have a multilingual embedding model - it probably covers Greek, there will also be multilingual sentence transformers you can use too :)

    • @georgekokkinakis7288 · a year ago

      Thanks for your response @jamesbriggs. For the embedding part I have found a multilingual model which does an excellent job of retrieving the document most relevant to the question. What I cannot find is an open-source LLM for the generation part, which will generate the answer to the user's query based on the retrieved document (I am talking about the Greek language). OpenAI's tokenizer is very expensive since, from what I have noticed, it tokenizes Greek words at the character level, so using their model does not fit my task at hand. Anyway, if you ever notice a generative model which supports Greek, please mention it in your upcoming videos, which, by the way, I have to say have helped me a lot.

    • @ashraymallesh2856 · 11 months ago · +1

      @georgekokkinakis7288 what about doing the RAG pipeline in English and then translating to Greek for your users? :P

    • @georgekokkinakis7288 · 11 months ago

      @ashraymallesh2856 If I am not mistaken (please correct me if I am wrong), applying the RAG pipeline in English would first require translating the documents from Greek to English. As I mentioned in a previous post, the documents contain mathematical definitions and terminology. Using a translation model or the Google Translate API wouldn't work because, for example, Google Translate renders the words παραπληρωματικές and συμπληρωματικές both as "supplementary", which is not correct. On the other hand, translating all the documents by hand would be a tedious task. That's why I am looking for an open-source LLM which supports the Greek language. Any ideas are welcome 😁.

  • @user-jj2mo5sl7p · a year ago

    useful work!

  • @ThangTran-rj8gt · 11 months ago

    Hey! I am researching open-domain question answering - how can I get data for that domain? Thank you.

  • @rabomeister · a year ago

    What do you think about accuracy and other related metrics when using guardrails? It sounds really nice, but if you use LLMs in fields with high risks (finance), does it also promise accuracy, at least similar to standard approaches? Great videos by the way - I think I've implemented almost all of them. And it's always nice to learn from a professional.

    • @rabomeister · a year ago · +1

      Also (if you're ok with that, since you also work for a company), if you could make a video about the hardware side of LLMs and DBs, that would be great. At this point there is enough information about coding and software (of course not enough yet, but one can implement something somehow), but the hardware side really requires theoretical knowledge. I don't want to just check the tables and go buy some NVIDIA GPU; I want to know why. Thanks in advance.

    • @jamesbriggs · a year ago · +2

      It's hard to guarantee accuracy; LLMs and the broader field of NLP are generally non-deterministic, so there's always that level of randomness. I'm still figuring out the best way of dealing with it myself - we try to add metrics, or extra LLM analysis steps (like asking "is this answer using information from these sources…") - but it's a difficult problem.
      I like the GPU hardware idea, would love to jump into it.

    • @aravindudupa957 · 11 months ago

      @James Are there any good "deterministic" ways to check the accuracy of the information in a reply (e.g. by going through the reply and checking it) against the context? I've heard of SelfCheckGPT, which takes multiple iterations, but it's not deterministic. It would be great to have such a technique!

    • @chrismcdannel3908 · 11 months ago

      @rabomeister Outside of highly specialized & sensitive use cases requiring procurement of a commercial-grade GPU or TPU, and the talent & skill to use it effectively in a business process, there is no real advantage in spending $15-$20K or more on the hardware - unless you just have the insatiable desire to do it for the hell of it, because you want your own, and that's ok too my friend. Unfortunately the cloud giants have structured the market in a way that makes getting compute from them more economically prudent than buying even one of the ASICs they have hundreds of thousands, or millions, of.

  • @OlivierEble · a year ago · +1

    I want to start using RAG but I want something fully local. What could be an alternative to pinecone?

    • @rabomeister · a year ago

      Except for Pinecone, almost all of the vector stores are open source. Also, I don't know about Pinecone since it's not free, but the others are mostly similar. I use ChromaDB for my personal projects, since I started working on LLMs recently and it is very user friendly. You will handle it; the problematic part is the data.

    • @satyamwarghat1305 · a year ago · +1

      Use DeepLake - I have been using it for my projects and it is pretty good.

    • @jamesbriggs · a year ago · +2

      Yeah if you want fully local there are open source alternatives like qdrant or weaviate - for the comment above, Pinecone is free, they have the free/standard tier :)

    • @drwho8576 · 11 months ago

      Using pgvector here, directly on top of good-ol Postgres. Works like a charm.

  • @humayounkhan7946 · a year ago

    This is awesome, thanks James, out of curiosity, do you know if this can be integrated with langchain?

    • @jamesbriggs · a year ago · +2

      absolutely, Langchain is code, and we can execute code via actions like we did with our RAG pipeline here
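A hedged sketch of that pattern: a guardrails action is just an async Python function, so a LangChain chain call could sit inside it. `search_docs` is a stand-in retriever (not code from the video), and the registration lines are commented out because they need a configured NeMo Guardrails app:

```python
import asyncio

def search_docs(query):
    # Stand-in retriever: swap in Pinecone, a LangChain chain, etc.
    corpus = {"llm": "LLMs are large language models trained on text."}
    return [v for k, v in corpus.items() if k in query.lower()]

async def rag(query):
    """Action body: retrieve context, then build the answer prompt.
    Any Python (including LangChain calls) can live in here."""
    contexts = search_docs(query)
    return f"Context: {' '.join(contexts)}\nQuestion: {query}"

# With a configured NeMo Guardrails app you would register it like:
#   app.register_action(rag, name="rag")
# and call it from Colang with:
#   $answer = execute rag(query=$last_user_message)

print(asyncio.run(rag("what is an LLM?")))
```

Because the action is plain Python, guardrails acts only as the decision layer; the heavy lifting stays in whatever code you register.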

  •  a year ago

    Have you tried to setup this with gpt-4? I'm getting some errors switching from davinci to gpt-4

    • @jamesbriggs · 11 months ago

      Hey Andre! I usually avoid generating output with the built-in LLM function, I usually just use guardrails as a mid decision layer and then use actions to call LLMs like GPT4

  • @user-ib1st1tm9w · a year ago

    How can I use guardrails and RAG with other LLMs, like Falcon or Llama?

    • @jamesbriggs · a year ago

      You can modify the model provider and name in the config.yaml file - they have docs on it in the guardrails GitHub repo :)
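For reference, the relevant section of the config file looks roughly like this; the engine and model values are examples only - check the Guardrails docs for the exact provider names your version supports:

```yaml
# config.yml - the `models` section selects the LLM provider
models:
  - type: main
    engine: openai           # swap for another supported provider
    model: text-davinci-003  # swap for e.g. a Llama or Falcon model name
```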

  • @AbhayKumar-yh9zs · 9 months ago

    For implementing LangChain agents with NeMo Guardrails, do we need to do the following?
    In the Colang file, first define the action that calls the function running the agent, like this:
    $answer = execute custom_function(query=$last_user_message)
    and then register the action like:
    rag_rails.register_action(action=custom_function, name="custom_function")
    Am I on the right track?

  • @eightrice · a year ago

    does this have message history? Does the context carry over from one input to the next?

    • @jamesbriggs · a year ago

      In this example no, but you can bring in a few previous interactions for embedding

    • @eightrice · a year ago

      @@jamesbriggs why would you use embeddings on the previous interactions? Can you just use the ChatCompletion endpoint and pass the array of previous messages as `chat_history` ?

    • @jamesbriggs · a year ago

      @eightrice The ChatCompletion endpoint is more effective, and is what you do for the "agent approach to RAG" - it's just slower.
      In real-world use-cases I have always used the pure agent approach, but I recently began experimenting with a mix of both, so I try to capture obvious queries ("user asks about LLMs") with guardrails and send the single query direct to the RAG pipeline, but for more general-purpose queries I direct them to the typical agent endpoint (and include conversation history).
      I'm still experimenting with the best approach, but so far this system seems to be working well for speeding up a reasonable portion of queries.
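The hybrid routing described above might be sketched like this; the toy 2-d vectors and the 0.85 threshold are illustrative assumptions, not values from the video:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def route(query_vec, guard_examples, threshold=0.85):
    """Return 'rag' when the query embedding matches a canonical
    guardrail example, else 'agent' (slower, with chat history)."""
    best = max(cosine(query_vec, g) for g in guard_examples)
    return "rag" if best >= threshold else "agent"

# Toy embeddings standing in for "user asks about LLMs" examples
guards = [(0.9, 0.1), (0.8, 0.2)]
print(route((0.85, 0.15), guards))  # -> 'rag'   (fast direct pipeline)
print(route((0.1, 0.9), guards))    # -> 'agent' (general-purpose fallback)
```

Queries that miss every guard example fall through to the agent, which is where the conversation history would be included.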

    • @eightrice · a year ago

      @@jamesbriggs yup, that hybrid architecture seems optimal if you need both normal chatbot functionality and subject matter knowledge with low latency. Thank you so much for this, I feel like I should be paying a lot for your code and tutorials :)

    • @jamesbriggs · 11 months ago

      @@eightrice yeah so far I've liked this approach - haha no worries, I'm happy it's useful :)

  • @deter3 · a year ago · +1

    This method is very simple to show with a toy example, but you need a lot of hard work in a real business environment to build it and test whether it's really working or not. Using simple sentences + embedding distance for decision making is not a really reliable solution.

    • @jamesbriggs · a year ago · +5

      I use it in production, it can be more reliable at times than LLMs if you define the semantic vector space that should trigger an action well - typically I view prompt engineering as the broad stroke, and guardrails as the fine-tuning of your chatbot behavior, so when you specific RAG workflows like "refer to HR docs", "refer to eng docs", "refer to company Y DB", guardrails can be very helpful
      But you're very right, it needs a lot of work, testing, and iterating over the guardrails to get something reliable

  • @prashanthsai3441 · 10 months ago · +1

    Why should I use guardrails, @jamesbriggs?
    I have Dialogflow, which has all the intents and flows (like in a Colang file) - I check the intent confidence; if it is high I trigger the corresponding intent flow, and if it is low I retrieve the data from the source using a naive retrieval method.

  • @EarningsNest · a year ago · +2

    Did you smoke something before recording this?