Chat with Multiple PDFs using Llama 2 and LangChain (Use Private LLM & Free Embeddings for QA)
- Published: 15 Jul 2024
- Full text tutorial (requires MLExpert Pro): www.mlexpert.io/prompt-engine...
Can you build a chatbot that can answer questions from multiple PDFs? Can you do it with a private LLM? In this video, we'll use the latest Llama 2 13B GPTQ model to chat with multiple PDFs. We'll use the LangChain library to create a chain that can retrieve relevant documents and answer questions from them.
You'll learn how to load a GPTQ model using AutoGPTQ, convert a directory with PDFs to a vector store and create a chain using LangChain that works with text chunks from the vector store.
Discord: / discord
Prepare for the Machine Learning interview: mlexpert.io
Subscribe: bit.ly/venelin-subscribe
GitHub repository: github.com/curiousily/Get-Thi...
PDF files:
Tesla Quarterly Report (Jul 21, 2023) - ir.tesla.com/_flysystem/s3/se...
Meta Q2 2023 Earnings (Jul 26, 2023) - s21.q4cdn.com/399680738/files...
Nvidia Fiscal Q1 2024 - s201.q4cdn.com/141608511/file...
Join this channel to get access to the perks and support my work:
/ @venelin_valkov
00:00 - Introduction
00:38 - Text Tutorial on MLExpert
01:11 - Earning Reports (PDF Files)
02:08 - Llama 2 GPTQ
03:59 - Google Colab Setup
06:00 - Prepare the Vector Database with Instructor Embeddings
08:45 - Create a Chain with Llama 2 13B GPTQ
14:36 - Chat with PDF Files
20:55 - Conclusion
#llm #langchain #chatbot #artificialintelligence #chatgpt #llama2 #gpt4 #promptengineering
Love your videos - so easy to follow along, and you give the facts without sugar-coating! Thank you!
Great video, I loved it! This is the most underrated video; it's been almost a month since I found such a perfect video.
Full text tutorial (requires MLExpert Pro): www.mlexpert.io/prompt-engineering/chat-with-multiple-pdfs-using-llama-2-and-langchain
Important: The filename of the model has been changed in the repository. Use:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "model"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-128g-actorder_True",
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=DEVICE,
    quantize_config=None,
)
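Once the model loads, the other half of the pipeline is the vector store lookup. Here is a minimal, self-contained sketch of what retrieval over PDF chunks conceptually does; it uses a crude bag-of-words "embedding" and cosine similarity as stand-ins for the real Instructor embeddings and Chroma store used in the video, and the sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, query, k=2):
    # Return the k chunks most similar to the query.
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), qv), reverse=True)[:k]

chunks = [
    "Tesla reported quarterly revenue growth in its earnings report.",
    "Meta announced Q2 2023 results with strong ad revenue.",
    "Nvidia's data center segment drove fiscal Q1 2024 results.",
]
print(retrieve(chunks, "What were Nvidia's fiscal results?", k=1))
```

In the real chain the retrieved chunks are then stuffed into the LLM prompt; only the embedding function and the store change, not the retrieval shape shown here.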
I'm certainly not paying to see your code, sorry.
I'm deliberately writing in Bulgarian: great work, your videos are very informative and useful, you build on top of the articles available online, and you have an excellent approach to explanations.
Thank you!
Thanks. I ran your example. SEC filings are in HTML or text format, with the text being a kind of XML. The moment I switched from the PDF loader to the HTML loader from langchain-community, I landed in "dependency hell". Try removing the version numbers from the packages (which you have so kindly pinned) and numerous dependency conflicts arise again. That makes LangChain very difficult and cumbersome to use in practice. The sheer number of conflicting dependencies will be a challenge for production use if it continues. Thanks again.
Some of the chunks from text_splitter could be longer than the sequence length of the selected embedding model (512 tokens)... and the excess is silently truncated by the embedding model. This results in no embeddings for the dropped text, hence the missing information. You could add a token counter function to check for this issue.
🎉🎉
What if I want to ask Llama 2 follow-up questions? That is, have an interactive conversation rather than a one-time question.
Is there an estimate for the max number of documents you could put in the vector store?
What is the best way to learn deep learning fundamentals via implementation (say, a trivial problem like building a movie recommendation system) using PyTorch as of Aug 26, 2023? Thanks in advance.
Did they change the filename of the model in the repository again? If so, what shall be the correct code now?
Is it possible to use this code (and ideally also the Google Colab GPU) to create a web page with a chatbot that you can send questions to, printing the answers on screen?
Yes, create an API endpoint on Google Colab using Flask and ngrok to expose it on a public IP. Research more.
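The endpoint the reply above describes is usually built with Flask plus ngrok; to keep this sketch self-contained, here is the same idea with only the Python standard library. `answer_question` is a hypothetical stand-in for calling the QA chain from the video, and the port/host values are arbitrary.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def answer_question(question):
    # Hypothetical stand-in: in the real app this would invoke the
    # LangChain QA chain built in the video with the user's question.
    return f"You asked: {question}"

class QAHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, e.g. {"question": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        answer = answer_question(payload.get("question", ""))
        body = json.dumps({"answer": answer}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve (blocks forever; ngrok would then tunnel this port):
# HTTPServer(("0.0.0.0", 8000), QAHandler).serve_forever()
```

A web page would then POST the user's question to this endpoint and render the `answer` field.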
What is the "-qqq" in the pip install command for? Can anyone explain? I couldn't find an answer.
Can you give a rough estimate of the system's accuracy?
There is a conflict between torch 2.0.1 and 2.1.0 when installing the other dependencies. Can someone help?
How can we reduce hallucinations?
@valkov, thank you for the great tutorial.
When I tried the "Llama 2 13B" step, I got this error. Please help me solve it:
WARNING:auto_gptq.modeling._base:Exllama kernel is not installed, reset disable_exllama to True. This may because you installed auto_gptq using a pre-build wheel on Windows, in which exllama_kernels are not compiled. To use exllama_kernels to further speedup inference, you can re-install auto_gptq from source.
WARNING:auto_gptq.modeling._base:CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
2. You are using pytorch without CUDA support.
3. CUDA and nvcc are not installed in your device.
Because of this, the model performs very slowly; it runs on the CPU.
My OS is Windows 10 and my CPU is an 8th-generation Core i5.
I don't have a GPU. Can I use the Llama 2 7B chat model in my Colab and will it work?
looking for the same answer
How to add memory to it?
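In LangChain itself, this is typically done by passing a ConversationBufferMemory into a ConversationalRetrievalChain; the pure-Python sketch below only illustrates the underlying pattern (accumulate question/answer turns and prepend them to the next prompt), with made-up example questions, not the library's actual implementation.

```python
class ChatMemory:
    """Minimal conversation buffer: the pattern a chat memory
    implements so follow-up questions see earlier turns."""

    def __init__(self):
        self.turns = []  # list of (question, answer) pairs

    def build_prompt(self, question):
        # Prepend the recorded history to the new question.
        history = "\n".join(
            f"Human: {q}\nAssistant: {a}" for q, a in self.turns
        )
        return f"{history}\nHuman: {question}\nAssistant:".lstrip()

    def record(self, question, answer):
        self.turns.append((question, answer))

memory = ChatMemory()
memory.record("What was Tesla's revenue?", "See the Q2 report.")
print(memory.build_prompt("And how about Meta?"))
```

Because the LLM has a fixed context window, production memories also truncate or summarize old turns rather than growing the buffer forever.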
Is it possible to deploy Llama 2 with a custom knowledge base to production?
Yes, but check all models/libraries (including Llama 2) license agreements carefully. Definitely try other embeddings too, since they might work better for your use case.
@@venelin_valkov Thank you! It would be nice if you covered this in a future tutorial, because there is no information about the deployment step on the web.
What is the point of importing poppler? I'm not sure your bad answer came from the embeddings; it may have come from the PDF parser.
Do you mean the poppler library? I use that to preview some pages of the earning reports (PDFs) as images within the notebook.
Yes, it is possible that the PDF parser didn't do a good job here. Let me know if you find something better!
Thanks for watching!
Sir, I am getting an error. Can you help?
Hi. How do I create a custom model? Thanks!
Hi. Please help me: how do I create a custom model from many PDFs in Persian? Thank you.
How can I use Llama models to translate an entire document?
Hello my friend. Could you help me with a specialized model for the Persian language? I really need your help!
There is an issue with the dependencies and requirements. Could you please fix it?
I get this error when running this cell: FileNotFoundError: Could not find model in TheBloke/Llama-2-13B-chat-GPTQ
The cell:

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model = "gptq_model-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-128g-actorder_True",
    model_basename=model,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=Device,
    quantize_config=None,
)
Can you please help? I changed model_basename=model but still get the error.
Why do you think the choice of embeddings makes such a difference? I get that embeddings capture meaning and linguistic structure, so the quality of embeddings should make some difference. But assuming all LLMs are trained on a high volume of low-quality data from the internet, the difference should not be substantial. Would appreciate your insight! Cheers.
does this work on non-searchable documents?
Not with the current PyPDF loader. You might try with something like unstructured (supported by LangChain): python.langchain.com/docs/integrations/providers/unstructured or other OCR engine.
Hi dear. Can you customize this code for me? I will definitely pay you for the trouble. Please let me know. Thank you.
GOOGLE COLAB LINK
bad audio
Stop using langchain. I said this months ago.
Hi, I'm new to this; I'm a software engineer just getting into ML & LLMs.
I started using LangChain recently and I'm curious what the alternatives would be.
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-128g-actorder_True",
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=DEVICE,
    quantize_config=None,
)

gives an error: "Could not find model in TheBloke/Llama-2-13B-chat-GPT". I put a lot of effort into finding its path or reinstalling it. What should I do? I would appreciate your help a lot! Thank you. And I should not forget to say your content is awesome! :)
I am trying to train the model with my company's PDF files... It would be so useful for my company's staff members.
@@sungrokkim3417 Write the code this way instead:

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=DEVICE,
    quantize_config=None,
)
The model filename has been changed in the repository. This should fix it:
model_basename = "model"
@@venelin_valkov Thank you!
@@venelin_valkov THANK YOU SO MUCH!! Now it works. If I want paid technical support from you, what should I do?