Chat with Multiple PDFs using Llama 2 and LangChain (Use Private LLM & Free Embeddings for QA)
- Published: 15 Jul 2024
- Full text tutorial (requires MLExpert Pro): www.mlexpert.io/prompt-engine...
Can you build a chatbot that can answer questions from multiple PDFs? Can you do it with a private LLM? In this video, we'll use the latest Llama 2 13B GPTQ model to chat with multiple PDFs. We'll use the LangChain library to create a chain that can retrieve relevant documents and answer questions from them.
You'll learn how to load a GPTQ model using AutoGPTQ, convert a directory with PDFs to a vector store and create a chain using LangChain that works with text chunks from the vector store.
Discord: / discord
Prepare for the Machine Learning interview: mlexpert.io
Subscribe: bit.ly/venelin-subscribe
GitHub repository: github.com/curiousily/Get-Thi...
PDF files:
Tesla Quarterly Report (Jul 21, 2023) - ir.tesla.com/_flysystem/s3/se...
Meta Q2 2023 Earnings (Jul 26, 2023) - s21.q4cdn.com/399680738/files...
Nvidia Fiscal Q1 2024 - s201.q4cdn.com/141608511/file...
Join this channel to get access to the perks and support my work:
/ @venelin_valkov
00:00 - Introduction
00:38 - Text Tutorial on MLExpert
01:11 - Earning Reports (PDF Files)
02:08 - Llama 2 GPTQ
03:59 - Google Colab Setup
06:00 - Prepare the Vector Database with Instructor Embeddings
08:45 - Create a Chain with Llama 2 13B GPTQ
14:36 - Chat with PDF Files
20:55 - Conclusion
#llm #langchain #chatbot #artificialintelligence #chatgpt #llama2 #gpt4 #promptengineering
Love your videos - so easy to follow along, and you give the facts without sugar-coating! Thank you!
Great video, I loved it! This is the most underrated video; it's been almost a month since I found such a perfect video.
Full text tutorial (requires MLExpert Pro): www.mlexpert.io/prompt-engineering/chat-with-multiple-pdfs-using-llama-2-and-langchain
Important: The filename of the model has been changed in the repository. Use:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "model"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-128g-actorder_True",
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=DEVICE,
    quantize_config=None,
)
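Once the model loads, the other half of the pipeline is the vector store lookup. Here is a minimal, self-contained sketch of what retrieval over PDF chunks conceptually does; it uses a crude bag-of-words "embedding" and cosine similarity as stand-ins for the real Instructor embeddings and Chroma store used in the video, and the sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, query, k=2):
    # Return the k chunks most similar to the query.
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), qv), reverse=True)[:k]

chunks = [
    "Tesla reported quarterly revenue growth in its earnings report.",
    "Meta announced Q2 2023 results with strong ad revenue.",
    "Nvidia's data center segment drove fiscal Q1 2024 results.",
]
print(retrieve(chunks, "What were Nvidia's fiscal results?", k=1))
```

In the real chain the retrieved chunks are then stuffed into the LLM prompt; only the embedding function and the store change, not the retrieval shape shown here.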
I'm certainly not paying to see your code, sorry.
I'm deliberately writing in Bulgarian: great work, your videos are very informative and useful, you build on top of the articles available online, and you have an excellent approach to explanations.
Thank you!
Thanks. I ran your example. SEC filings are in HTML or text format, with the text being a kind of XML. The moment I switched from the PDF loader to the HTML loader from langchain-community, I landed in "dependency hell". Try removing the version numbers from the packages (which you have so kindly pinned) and numerous dependency conflicts arise again. That makes LangChain very difficult and cumbersome to use in practice. The sheer number of conflicting dependencies will be a challenge for production use if it continues. Thanks again.
Some of the chunks from text_splitter could be longer than the sequence length of the selected embedding model (512 tokens)... and the excess is silently truncated by the embedding model. This results in no embeddings for the dropped text, hence the missing information. You could add a token counter function to check for this issue.
🎉🎉
What if I want to ask Llama 2 follow-up questions? That is, have an interactive conversation rather than a one-time question.
Is there an estimate for the max number of documents you could put in the vector store?
What is the best way to learn deep learning fundamentals via implementation (say, a trivial problem like building a movie recommendation system) using PyTorch as of Aug 26, 2023? Thanks in advance.
Did they change the filename of the model in the repository again? If so, what shall be the correct code now?
Is it possible to use this code (and ideally also the Google Colab GPU) to create a web page with a chatbot that you can send questions to, printing the answers on screen?
Yes, create an API endpoint on Google Colab using Flask and ngrok to expose it on a public IP. Research more.
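The endpoint the reply above describes is usually built with Flask plus ngrok; to keep this sketch self-contained, here is the same idea with only the Python standard library. `answer_question` is a hypothetical stand-in for calling the QA chain from the video, and the port/host values are arbitrary.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def answer_question(question):
    # Hypothetical stand-in: in the real app this would invoke the
    # LangChain QA chain built in the video with the user's question.
    return f"You asked: {question}"

class QAHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, e.g. {"question": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        answer = answer_question(payload.get("question", ""))
        body = json.dumps({"answer": answer}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve (blocks forever; ngrok would then tunnel this port):
# HTTPServer(("0.0.0.0", 8000), QAHandler).serve_forever()
```

A web page would then POST the user's question to this endpoint and render the `answer` field.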
What is the "-qqq" in the pip install command for? Can anyone explain? I couldn't find an answer.
Can you give a rough estimate of the system's accuracy?
There is a conflict between torch 2.0.1 and 2.1.0 when installing the other dependencies. Can someone help?
How can we reduce hallucinations?
@valkov, thank you for the great tutorial.
When I tried the "Llama 2 13B" step, I got this error. Please help me solve it:
WARNING:auto_gptq.modeling._base:Exllama kernel is not installed, reset disable_exllama to True. This may because you installed auto_gptq using a pre-build wheel on Windows, in which exllama_kernels are not compiled. To use exllama_kernels to further speedup inference, you can re-install auto_gptq from source.
WARNING:auto_gptq.modeling._base:CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
2. You are using pytorch without CUDA support.
3. CUDA and nvcc are not installed in your device.
Because of this, the model performs very slowly; it runs on the CPU.
My OS is Windows 10 and my CPU is an 8th-generation Core i5.
I don't have a GPU. Can I use the Llama 2 7B chat model in my Colab and will it work?
looking for the same answer
How to add memory to it?
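In LangChain itself, this is typically done by passing a ConversationBufferMemory into a ConversationalRetrievalChain; the pure-Python sketch below only illustrates the underlying pattern (accumulate question/answer turns and prepend them to the next prompt), with made-up example questions, not the library's actual implementation.

```python
class ChatMemory:
    """Minimal conversation buffer: the pattern a chat memory
    implements so follow-up questions see earlier turns."""

    def __init__(self):
        self.turns = []  # list of (question, answer) pairs

    def build_prompt(self, question):
        # Prepend the recorded history to the new question.
        history = "\n".join(
            f"Human: {q}\nAssistant: {a}" for q, a in self.turns
        )
        return f"{history}\nHuman: {question}\nAssistant:".lstrip()

    def record(self, question, answer):
        self.turns.append((question, answer))

memory = ChatMemory()
memory.record("What was Tesla's revenue?", "See the Q2 report.")
print(memory.build_prompt("And how about Meta?"))
```

Because the LLM has a fixed context window, production memories also truncate or summarize old turns rather than growing the buffer forever.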
Is it possible to deploy Llama 2 with a custom knowledge base to production?
Yes, but check all models/libraries (including Llama 2) license agreements carefully. Definitely try other embeddings too, since they might work better for your use case.
@@venelin_valkov Thank you! It would be nice if you covered this in a future tutorial, because there is no information about the deployment step on the web.
What is the point of importing poppler? I'm not sure your bad answer came from the embeddings; it may have come from the PDF parser.
Do you mean the poppler library? I use that to preview some pages of the earning reports (PDFs) as images within the notebook.
Yes, it is possible that the PDF parser didn't do a good job here. Let me know if you find something better!
Thanks for watching!
Sir, I am getting an error. Can you help?
Hi. How do I create a custom model? Thanks!
Hi. Please help me: how do I create a custom model from many PDFs in Persian? Thank you.
How can I use Llama models to translate an entire document?
Hello my friend. Could you help me with a specialized model for the Persian language? I really need your help!
There is an issue with the dependencies and requirements. Could you please fix it?
I get this error when running this cell: FileNotFoundError: Could not find model in TheBloke/Llama-2-13B-chat-GPTQ
The cell:

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model = "gptq_model-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-128g-actorder_True",
    model_basename=model,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=Device,
    quantize_config=None,
)
Can you please help? I changed model_basename=model but still get the error.
Why do you think the choice of embeddings makes such a difference? I get that embeddings capture meaning and linguistic structure, so the quality of embeddings should make some difference. But assuming all LLMs are trained on a high volume of low-quality data from the internet, the difference should not be substantial. Would appreciate your insight! Cheers.
does this work on non-searchable documents?
Not with the current PyPDF loader. You might try with something like unstructured (supported by LangChain): python.langchain.com/docs/integrations/providers/unstructured or other OCR engine.
Hi dear. Can you customize this code for me? I will definitely pay you for the trouble. Please let me know. Thank you.
GOOGLE COLAB LINK
bad audio
Stop using langchain. I said this months ago.
Hi, I'm new to this; I'm a software engineer just getting into ML & LLMs.
I started using LangChain recently and I'm curious what the alternatives would be.
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-128g-actorder_True",
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=DEVICE,
    quantize_config=None,
)

gives an error: "Could not find model in TheBloke/Llama-2-13B-chat-GPT". I put a lot of effort into finding its path or reinstalling it. What should I do? I would appreciate your help a lot! Thank you. And I should not forget to say your content is awesome! :)
I am trying to train the model with my company's PDF files... It would be so useful for my company's staff members.
@@sungrokkim3417 Write the code this way instead:

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=DEVICE,
    quantize_config=None,
)
The model filename has been changed in the repository. This should fix it:
model_basename = "model"
@@venelin_valkov Thank you!
@@venelin_valkov THANK YOU SO MUCH!! Now it works. If I want paid technical support from you, what should I do?