LangChain + OpenAI tutorial: Building a Q&A system w/ own text data
- Published: 11 Dec 2024
- LangChain is a fantastic tool for developers looking to build AI systems on top of the variety of LLMs out there (large language models, like GPT-4, Alpaca, Llama, etc.), as it helps unify and standardize the developer experience across text embeddings, vector stores / databases (like Chroma), and chains them together for downstream applications through agents. In this tutorial we're using our own custom text / data and training a question-and-answer agent on it.
Want to learn more about LLMs (large language models)? Here's my learning path:
Watch PART 2 of the LangChain / LLM series:
• LangChain + OpenAI to ...
Watch PART 3 of the LangChain / LLM series
LangChain + HuggingFace's Inference API (no OpenAI credits required!)
• LangChain + HuggingFac...
Watch PART 4 of the LangChain / LLM series:
How Embeddings in LLMs work (a practical tutorial + code demo)
All the code for the LLM (large language models) series featuring GPT-3, ChatGPT, LangChain, LlamaIndex and more is on my GitHub repository, so go and ⭐ star or 🍴 fork it. Happy Coding!
github.com/onl...
Other links mentioned in the video:
LangChain documentation: python.langcha...
Visualizing embeddings: github.com/onl...
Learn about DuckDB: • DuckDB: Hi-performance...
Just wanted to stop at this first video in the playlist to thank you very much for sharing your knowledge.
And I want to say thank you for taking the time out to do this. This means a lot!
Everything on topic, clear and concise, with no filler. Good job
Thank you! I really appreciate it
I just found your channel and I've been watching your videos, which honestly, in my opinion, explain everything about LLMs best. I'd just like to ask if you could add a pop-up in the video whenever you mention that you made a video about something, so that we can click it and it takes us there. Thanks!
That’s very good feedback, I’ll incorporate that. Thank you for your kind words!
Thanks for the video! When you do `vector_store = Chroma.from_documents(texts, embeddings)`, you only use `embeddings = OpenAIEmbeddings()`. But in previous code when introducing embeddings, you also use `doc_embeddings = embeddings.embed_documents([text])`. We don't need to do that here?
Hey man, I'm getting an error on your:
vecstore = Chroma.from_documents(texts, embeddings)
qa = RetrievalQA.from_chain_type(
llm=OpenAI(),
chain_type = "stuff",
retriver = vecstore.as_retriever()
)
ValidationError: 2 validation errors for RetrievalQA
retriever
field required (type=value_error.missing)
retriver
extra fields not permitted (type=value_error.extra)
Can u help me out?
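The ValidationError actually points at the fix: `retriver` is a typo for `retriever`, which is why one field is reported missing and the other as extra. A corrected sketch (assuming `texts` and `embeddings` are defined as in the video):
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma

vecstore = Chroma.from_documents(texts, embeddings)
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vecstore.as_retriever()  # note: retriever, not retriver
)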
Really like how you explained and demoed the code. Great video; I will be checking out your channel quite a bit. Thank you.
Wow thank you Marc!
What a perfect no B.S. intro. Thanks.
Thank you!
This video is great! I can't wait to check out all your other videos as well. Please keep posting great contents!
Thank you! Means a lot!
This video should have more views
Wouldn’t mind having more views!
Hi, could this type of chatbot be integrated with Messenger and your own website for customer-support-style interactions, based on your own Q&A files and text?
Great video! With the advent of plugins and agents coming up, is there anything LangChain can do which they cannot?
Yeah a few main scenarios where you’d use LangChain:
1. Don’t want to use OpenAI. Plug in other LLMs directly into LangChain as the API is quite unified
2. Chaining different providers. You might use Pinecone for the vector store but something else for the Q&A chain
3. The “preprocessing”: chunking up a large corpus into “chunks” that fit the context window size, or using a directory loader on a directory of PDFs and handling text extraction upstream, etc. (see the sketch after this list)
4. Want to work on your local environment / “offline mode”
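A rough sketch of that preprocessing step, using LangChain's loader and splitter (the directory name, glob pattern and chunk size here are just placeholders):
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter

# load every text file in the folder, then split into chunks that fit the context window
docs = DirectoryLoader("news/", glob="**/*.txt").load()
texts = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(docs)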
If possible, can you please explain how you brought in that chromadb folder?
Great Sam, clean explanation and easy to understand. I want to know more about the data preparation for the embeddings that you put in the news directory
Thank you Gede! There is almost no preparation at the moment; our team builds a simple news monitor targeting news relating to the mining and commodities industry, so I take a sample of that and build the Q&A downstream. Vector stores like Chroma and indexing tools like LlamaIndex handle these unstructured text files very robustly.
I can write up an article on how we / our clients use it later on! Thanks for the feedback!
Would be great to see a LangChain tutorial that reads access-controlled Google Docs and uses an open source LLM
Hey Moresh, thanks for the suggestion!
I do have a video on my channel on using open LLMs through HuggingFace and I’m also in the process of recording another video on using locally-hosted LLM (runs on your machine). I’ll see when I have time to finish it and do the editing! :)
LangChain + HuggingFace's Inference API (no OpenAI credits required!)
ruclips.net/video/dD_xNmePdd0/видео.html
@@SamuelChan are you planning to use GPT4All for the locally hosted version? Great tutorials btw, keep up the good work!
@@moreshk GPT4All is something I'm keeping a close eye on!
Thank you for the kind words Moresh!
Thanks for the video, but I have a question: does your code have limits on the use of tokens? I used my code but got this error:
This model's maximum context length is 4097 tokens, however you requested 4225 tokens (3975 in your prompt; 250 for the completion). Please reduce your prompt or completion length.
hey Helios, are you also using the sample data I provided in the github repo? Are you using your own data?
ChromaDB is not compatible with Python 3.11. Is there going to be a workaround?
great video thanks Samuel, easy to follow.
Thank you, glad it was helpful!
The AI did not answer the "Why?" at the end of your third question (17:20). Perhaps because the AI does not know why the ban was implemented?
Interesting. In practice I would maybe use Guidance for the prompt design. This adds more structure to the output and makes the assistant more deterministic. The "Why" should still be answered, even if it has to formulate a speculative response (also getting the assistant to separate the "knowns" from the "unknowns" in its response)
ruclips.net/video/k4Ejc3bLQiU/видео.html
What is the difference with using the first method of creating the Q and A chain, and using the SQLChain method?
RetrievalQA pseudocode:
from langchain.llms import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# embed the documents and index them in Chroma
docsearch = Chroma.from_documents(texts, OpenAIEmbeddings())
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",  # "stuff" packs all retrieved chunks into one prompt
    retriever=docsearch.as_retriever()
)
SQLDatabaseChain pseudocode:
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

# point the chain at any SQLAlchemy-compatible database
db = SQLDatabase.from_uri("sqlite:///something.db")
db_chain = SQLDatabaseChain(llm=OpenAI(), database=db, verbose=True)
The latter kicks off a database agent process and operates on the database. I have a longer video on using OpenAI with your custom CSV or database here:
ruclips.net/video/Fz0WJWzfNPI/видео.html
only a few minutes in but super helpful, thank you!
Thank you Ben! Means a lot!
Hi, thank you for your tutorials. I have been following them for quite some time now and have watched your whole playlist on this. However, I am unable to figure out the most economical approach for my use case.
I want to create a Q&A chatbot on Streamlit which answers questions from only my custom single document of about 500 pages. The document is final and won't change. From my understanding so far, I should choose either LangChain or LlamaIndex. But I will have to use the OpenAI API to get the best answers, and that API is quite costly for me. So far I have thought of using Chroma for embedding and somehow storing the vectors as pkl or json on Streamlit itself for re-use, so I don't have to spend again on vectors/indexing. I don't have enough credits to test different methods myself.
Kindly guide me. Thank you.
Hey Atul, if the document is final and won't change, the first cost-saving opportunity is on embeddings: you can use Sentence Transformer, which is free and available through the transformers API / HuggingFace inference. You will then save the embeddings into Chroma, which again is free and open source (unlike pinecone, which has a pricing plan). At this point all of this is still free.
Then for Q&A, you call OpenAI's API but if this is in production setting, I may be tempted to implement some sort of caching mechanisms for very common queries. For example if you provide a few "canned questions" to choose from, those can be served through a cache. Eg. "Break down the R&D spending for me from this financial report" is a question that could be fully deterministic in its answer and can be cached. There's plenty of other opportunities for optimizations but these would significantly reduce your bills.
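A minimal sketch of that free-embeddings setup, assuming LangChain's HuggingFaceEmbeddings wrapper and Chroma's persist directory (the model and directory names are illustrative; `texts` is your list of chunked documents):
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# free, local embeddings via sentence-transformers; no OpenAI calls here
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# embed the 500-page document once and persist to disk, so you never pay to re-index
vector_store = Chroma.from_documents(texts, embeddings, persist_directory="db")
vector_store.persist()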
Great video. Learned a lot.
Using embedded DuckDB without persistence: data will be transient
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
I get this error whenever I try to convert to vectors, on this command:
vecstore = Chroma.from_documents(texts, embeddings)
I even tried with FAISS and got a RateLimitError.
Please help!
Looking at the exact error message "Output exceeds the size limit " it looks like it's something specific to VSCode and has nothing to do with langchain or chroma. If you run it on a terminal, does the same message appear? Are you printing a bunch of things (using print statements anywhere in your code)?
@@SamuelChan the last error I got was: Using embedded DuckDB without persistence: data will be transient
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
@@SamuelChan Yes, even if I run it on the terminal it shows the exact same thing
Great tutorial! Would love to learn more on this topic
Thanks! Yeah definitely; I have a few ideas in mind, just need to plan them out
Hello, Samuel! Great video, my friend!
Is it possible to specify the OpenAI LLM version? I conducted some tests, and it always uses daVinci. Is it possible to use a more cost-effective model?
Thank you.
Thank you! Yeah, of course; specifying the LLM model is one of the most basic features of LangChain. Use the model_name parameter:
from langchain.llms import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings

# the embeddings model
model_name = 'text-embedding-ada-002'
embed = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

# the completion model; swap in a cheaper one (e.g. text-curie-001) to cut costs
llm = OpenAI(model_name="text-davinci-002")
Change model_name to whichever you need it to be! :)
Great video! Is there a way to create a chatbot that smartly uses our data + gpt-3.5 data and give us a COMBINED answer from both the data set instead of just our data set or gpt data set?
So let's say your document had details about China and Indonesia around energy and export details, and there's no information for a country like India. If my question remains the same, except the country names now change to "India and Indonesia"... it should still be able to answer by looking up our data set for details related to Indonesia, and then looking up the gpt-3.5 data for details related to India.
But the gpt-3.5 data would be stale data from when it was last trained, right? So it would give you information from 2021, and hallucinate with high confidence as if it were actual data.
In my use case I would prefer it to use a proper source of information (the "vector store") and not conflate it with anything; LLMs in my workflows are just text generation mechanisms, not sources of information or factual data
Hey, first of all, nice video!
I was having an error: I am unable to import a proper dotenv version.
Can anyone help?
What does that mean? Check that you have installed dotenv (pip install python-dotenv) and the import statement should work :)
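For reference, the package name differs from the import name, which trips people up; a minimal check:
# pip install python-dotenv   (the package is python-dotenv, the import is dotenv)
from dotenv import load_dotenv

load_dotenv()  # reads variables such as OPENAI_API_KEY from a local .env file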
This is awesome, yes 100% interested to dig deeper into this topic!
A question: the rate limiter still applies, correct? How do we deal with that? Can we set up multiple API keys and make a script to rotate the keys?
Yes, it does apply. One caveat about a rotation pool is that OpenAI enforces rate limiting at the org level, not the account level. If you set up a ton of API keys under the same org, they all count toward the same balance 😊
Each model has a different limit; some models let you send 200x more tokens per minute than a DaVinci model. You can also fill out a rate increase form to get even higher rate limit ceilings!
platform.openai.com/docs/guides/rate-limits/overview
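Beyond key management, a generic way to cope with rate limits is retrying with exponential backoff. A sketch (not from the video; plain Python, no extra libraries):
import time

def with_backoff(fn, max_retries=5):
    # retry the call, doubling the wait each time the API signals a rate limit
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # e.g. openai.error.RateLimitError
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)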
@@SamuelChan wow i really appreciate the insights! you rock Sir!
Nice video! Your style is expert!
I would REALLY like to learn about the SQLChain methods of translating natural language to SQL to query a database then receive natural language back. If you have the bandwidth, that would be a hit!
Great suggestion! Say no more. Here's using natural language to query and chat with your database! (freshly released 2 mins ago)
ruclips.net/video/Fz0WJWzfNPI/видео.html
@@SamuelChan I noticed you used chain type "stuff" in your example @15:52... what if you have hundreds/thousands of documents? Should I use "map reduce" instead?
But if I use "map reduce", wouldn't it cost a lot of credits (money), since OpenAI charges per API call?
How do I reduce the cost of my query if I have a lot of documents?
thanks!
@@RunForPeace-hk1cu hey, I replied to you on Discord. But I would encourage you to check out the other videos in this playlist, as they systematically cover these concepts:
ruclips.net/p/PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
Where map reduce is mentioned along with caching:
ruclips.net/video/Uk_SJSnQRU8/видео.html
The video where Pinecone is mentioned (for indexing hundreds of pages in the cloud) is also helpful:
ruclips.net/video/k8G1EDZgF1E/видео.html
Good luck!
I am getting the error ImportError: cannot import name 'Document' from 'langchain.schema' (/usr/local/lib/python3.10/dist-packages/langchain/schema.py)
even though I installed from the requirements.txt file. Can you help?
Hey, can you check again? I pushed an update this morning bumping LangChain and LlamaIndex to the latest versions, so make sure you upgrade both libs; check requirements.txt if unsure! :)
Hi Samuel, great content!
One question I have is: is it possible to make OpenAI take the data from a document, let's say a 20-page PDF, and ask it to write an article based on that data in a special style? Something to note: there is no data about the special style in my document, but it should be in the model's general knowledge base. I want the AI to use custom knowledge on top of its existing knowledge, not only custom knowledge.
Yeah sure, that’s quite a common use case too.
You probably can't fit all 20 pages of the PDF in one go, so something like LangChain or LlamaIndex helps you "chunk" them up for indexing, and then you use an LLM to generate an article or a summary based off that index (what you call custom knowledge).
The LangChain series on this channel goes through these use cases as well! You can use existing PDFs, existing CSVs / spreadsheets, existing websites, existing documents or books to build your index and have your LLM synthesize based on these custom documents. Very useful for internal knowledge bases, for example.
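A rough sketch of that flow for a 20-page PDF (the file name, prompt and parameters are placeholders; PyPDFLoader is just one of several loaders that would work):
from langchain.llms import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter

# load and chunk the PDF so each piece fits in the context window
pages = PyPDFLoader("report.pdf").load()
texts = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(pages)

# index the chunks; the LLM supplies the "special style" from its general knowledge
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0.7),
    chain_type="stuff",
    retriever=Chroma.from_documents(texts, OpenAIEmbeddings()).as_retriever(),
)
print(qa.run("Write a short article in a lighthearted style based on this document."))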
@@SamuelChan Thank you so much for the response! Is it ideal to set the temperature of the OpenAI model to more than 0.5, so the special style comes from OpenAI's general knowledge base but the main content comes from the document? Also, do you know of a way to get OpenAI to send articles of more than 1000 words?
Yeah, on temperature: closer to 1.0 leads to "wilder", more adventurous behavior; closer to 0 leads to safer, more conservative behavior / writing styles. A temperature of 0 would make it deterministic.
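As a concrete illustration of those temperature settings (values are arbitrary):
from langchain.llms import OpenAI

conservative_llm = OpenAI(temperature=0)   # deterministic: same prompt, same output
adventurous_llm = OpenAI(temperature=0.9)  # wilder, more varied phrasing and style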
I’ve used langchain on some 15 page PDFs, and intend to record a tutorial on that as well. I didn’t count it to know if it has >1000 words but my guess is it might come close. Currently have 5 videos in the LLM series already though so this may have to wait a bit before going live on the channel! 😄
@@SamuelChan Sorry, I couldn't ask my question properly. I meant: since there's a token limit for conversations, do you know of a way to get the ChatGPT API to give you an article of over 1000 words, using any technique?
@@waheedahmed6602 hey no worries at all!
At minute 19:20 of this video on my channel you'll see me use the max_length argument to change the length of text that GPT generates for a given prompt. This might be what you're looking for! The docs describe it as "The maximum length of the sequence to be generated".
ruclips.net/video/dD_xNmePdd0/видео.html
There should be an upper limit on the max length you can use with GPT, but that probably depends on the model you choose, among other things (~500 words, or 4,000 characters)
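For reference, LangChain's OpenAI wrapper exposes this knob as max_tokens (a sketch; the prompt plus completion must still fit within the model's context window):
from langchain.llms import OpenAI

# cap the completion at roughly 1,000+ words' worth of tokens
llm = OpenAI(max_tokens=1500)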
Has anyone tried to deploy this on a cloud service? I tried to deploy my app on GAE, but having chromadb as one of the dependencies in requirements.txt fails the deploy. Can anyone help me with this? (The error code displayed is 13 after I deploy.)
Hey Akash! Chroma is dependent on sentence-transformers, which is dependent on pytorch. If you're using python 3.11 in your prod, this might be the problem (github.com/chroma-core/chroma/issues/249). The easy workaround is to go one version lower, to python 3.10, or use the Dockerfile (github.com/chroma-core/chroma/blob/main/docker-compose.yml).
If that isn't the issue, then I'm afraid we need more details on the error you're getting to be able to help :)
@@SamuelChan Thanks for the reply! I am using Python 3.9. I'll try adding the torch dependency
@@SamuelChan I added all the dependencies. I have also given the required permissions to my service account. Still, the deployment fails; but if I remove chromadb and its dependencies, it gets deployed.
This is amazing!! Thank you for the tutorial~ I am wondering: after embedding our own data, does it change the original OpenAI LLM in some way? (Like, if you ask it a question that isn't related to our data set, is the answer going to be the same as if we didn't do the embedding? And what if the data we provided doesn't match the answer it originally would have given?) I'm sorry if my question is stupid; I just started with AI. Anyway, thank you for the tutorial~~
Hey that is not a stupid question! :)
It doesn't change the LLM in any way. But a query is executed against the index / vector store, not directly on your text itself. This might be a simple Vector Store, a tree structure, a KeywordTable structure, etc. (the full LLM playlist on my channel has 8+ videos, which go into these). So you do the embeddings to get a vector representation of your data for the Q&A to work against.
If the data doesn’t include the answer - the Q&A chain would say something like “I couldn’t answer the question with the provided information”.
In practice, if your data is static (let’s say you’re training a LLM to answer questions on Investor Relations report from 2010-2022) and doesn’t grow or evolve, you can train the embeddings ONCE. Then store it locally (watch the other videos on this channel to see the options: local json, Chroma and Pinecone). Then your Question Answer Retriever will query against this vector store instead of re-building the embeddings and re-indexing everything. This makes your query faster, cheaper and also more deterministic in theory since it’s the same vector store.
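A sketch of that embed-once pattern with Chroma's persist directory (the directory name is illustrative):
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# later runs: load the persisted vector store instead of re-embedding everything
vector_store = Chroma(persist_directory="db", embedding_function=OpenAIEmbeddings())
retriever = vector_store.as_retriever()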
Train it with our data as new embeddings?
But where and how do I train Llama?
Is there an easy way to save the embeddings/chroma so that instead of going to OpenAI, we can just refer to the saved file?
Hey Liam, yes! You can save the embeddings as JSON, as a Python dict, or just on disk (as a JSON file, passing a file name)
Here’s my sample implementation: github.com/onlyphantom/llm-python/blob/main/2b_llama_chroma.py
Find the lines with the .save_to_dict() and .save_to_disk() calls!
I also have videos covering these usage patterns in the LLM playlist on this channel! Hope it helps!
Another example implementation:
github.com/onlyphantom/llm-python/blob/main/2_llama.py
This demonstrates saving embeddings to disk and loading from it!
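The shape of that classic LlamaIndex save/load API, as of the versions used in this series (method names have changed in newer releases):
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("news").load_data()
index = GPTSimpleVectorIndex.from_documents(documents)

index.save_to_disk("index.json")  # embed and index once, store on disk
index = GPTSimpleVectorIndex.load_from_disk("index.json")  # later: load, no re-embedding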
@@SamuelChan Oh that's awesome! So we have to use LlamaIndex instead of LangChain??? Thank you so much for the responses!!!
I'd say LangChain and LlamaIndex have a pretty large overlap in how both libraries evolved. LangChain has a wider feature set with how it implements agents and sequential "chains", while LlamaIndex originally was a project to help transform unstructured data into structured data that can be thrown into GPT.
Both LangChain and LlamaIndex support lots of embedding DB / index implementations, and that includes Chroma! :)
@@SamuelChan Oh I see! So I can save the indexes created by LlamaIndex to avoid using tokens again and again, load the index from the file, and then use LangChain to query that index in a smarter way with its wider feature set?
This video has a clear explanation. Thanks for that, Sam. I have a question: instead of using OpenAI, could you please do a demo with an open source LLM? Like question answering from a doc using an open source LLM. It would be really helpful.
Hey! Thank you! I do have a video that demonstrates using langchain with an open source LLM:
LangChain + HuggingFace's Inference API (no OpenAI credits required!)
ruclips.net/video/dD_xNmePdd0/видео.html
You won't even need OpenAI credits to go through all the examples there. I also have another video that I'm publishing on Wednesday featuring LangChain with a locally-hosted LLM (running on your own machine)!
Make a video on connecting GPT with a database and getting a response
Right here! Just published a week ago! :)
LangChain + OpenAI to chat w/ (query) own Database / CSV!
ruclips.net/video/Fz0WJWzfNPI/видео.html
I watch this video at 0.75x speed
I gotta learn to speak slower 😬