Langchain PDF App (GUI) | Create a ChatGPT For Your PDF in Python
- Published: Jul 15, 2024
- In this project-based tutorial, we will use Langchain to create a ChatGPT for your PDF using Streamlit. We will build an application that allows you to ask questions about a PDF document and get answers directly from an LLM (Large Language Model), like OpenAI's ChatGPT.
----------------
Links
👉 Github repo: github.com/alejandro-ao/langc...
👉 Langchain docs: python.langchain.com/en/lates...
💬 Join the Discord Help Server - link.alejandro-ao.com/HrFKZn
❤️ Buy me a coffee... or a beer (thanks): link.alejandro-ao.com/l83gNq
✉️ Join the mail list: link.alejandro-ao.com/AIIguB
----------------
Langchain is a powerful open-source Python (and JavaScript) framework that allows you to build applications with LLMs and text embeddings. With Langchain and a PDF, you can give GPT access to your own data and build an assistant that answers questions and generates text about it.
In this tutorial, we will walk you through the process of building an application with Langchain using Streamlit. We will dive into Langchain and its features, including its ability to ground GPT's answers in your documents through text embeddings.
We will show you how to create embeddings for your text using OpenAI's API and Langchain, and how to integrate them into your project. You will also learn how to retrieve the relevant parts of a PDF document and pass them to GPT, so the answers fit your specific use case.
Throughout the tutorial, we will be building a fully functional application that allows you to upload a PDF file, ask questions about it directly, and have an LLM answer for you. We will use the powerful capabilities of Langchain with PDF to create a seamless user experience and showcase the framework's many possibilities.
Langchain is a versatile framework that can be used in many applications, from chatbots to document analysis. We will show you how to leverage Langchain to create powerful applications with Streamlit that can help automate tasks and improve efficiency.
If you're looking to take your natural language processing skills to the next level, this tutorial is a must-watch. Learn how to build powerful applications with Langchain and chat with GPT about your PDF files. So, what are you waiting for? Let's get started with Langchain!
--------------------------------
⏰ Timestamps
0:00 Introduction
1:48 Setup
6:30 Create GUI
9:33 Read the PDF
12:46 Process Diagram
16:45 Split PDF into chunks
21:22 Create embeddings
24:19 Finish Knowledge Base
26:11 Similarity Search
29:23 Q&A Chain from Langchain
34:37 Monitor Your Costs
38:55 Conclusion
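The "Split PDF into chunks" step listed in the timestamps can be illustrated without any library. This is a minimal sketch, assuming fixed-size character windows that overlap so a sentence cut at a boundary still appears whole in one chunk. The function name `split_into_chunks` and the default sizes are illustrative assumptions, not the splitter used in the video.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `split_into_chunks("abcdefghij", chunk_size=4, overlap=1)` yields three windows, each sharing one character with its neighbor.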
#streamlit #langchain #python #chatgpt #openai
💬 Join the Discord Help Server: link.alejandro-ao.com/981ypA
❤ Buy me a coffee (thanks): link.alejandro-ao.com/YR8Fkw
✉ Join the mail list: link.alejandro-ao.com/o6TJUl
Do I need to buy chatgpt API to build this project?
I built something similar a while back. You should teach people how to store vectors locally. That would drastically reduce the cost.
Hi, I am also trying to make a Q&A chatbot based on my single custom document of about 500 pages. But the vectors are quite costly for me. Can you please help with a workaround. I have explained my use case in the comments too, but so far no answer. Thank you.
I also want to know about it 😅
Thank you, thank you, thank you! Your video solidified everything for me. I knew the different pieces, but not how they all meshed together. Now I can finally continue with the project I was working on. Great content. Subscribed.
I'm learning Python and I decided to follow along with you here. Amazing! I have less than a few months of experience and I was able to follow along with you. Well done lad!
Incredible job. Thank you very much for this project. I have no more than 2 weeks learning python and with your help, I was able to complete this project in my free time. Keep the amazing work Sir. Much respect from the Caribbean 🌴
After updating python to the latest version and updating my PyCharm installation, this tutorial ran perfectly. Alejandro is an excellent presenter. Thanks for this terrific tutorial!
Thank you man,
I had this project in mind for a while, never had time for it.
Very good content. Taking the time to explain what every component of the code is amazing. I never felt I understand code like this. I am subscribing.
Never heard about you before Alejandro, but I'm glad you popped up on my YouTube feed. Got to give it to you: you're a natural born teacher. Excellent job and very informative. Liked and subscribed.
Thank you, it means a lot! be sure to use this knowledge to build awesome applications 😎
Holy smokes - the conciseness of your code given all that it's doing behind the scenes is breathtaking! Impressive!
thanks mate!
Deeply appreciate for your sharing !!
Perhaps the best tutorial out there. Maximum value 0 crap. Keep it like this. :)
Good job.
thanks! i appreciate it!
Fabulous job! Great use of the diagram to provide the high level view. I like that you kept the code simple and focused on the main objective, and covered each part step-by-step. I also like that you used FAISS, since it is free.
I'm glad you appreciated it! The new release of LangChain is out now btw!
This is an incredible tutorial. I'm not very technical and I was able to follow the entire thing! Thanks for creating it.
i'm very glad to hear that!
Thank you Alejandro for all the effort and the amazing videos you are creating. A big thanks from the bottom of my heart ♥
You are a really good teacher. Love how you explain things step by step without overwhelming. Keep it up!
Thanks, it means a lot!
@@alejandro_ao
Hi can you help me resolve this error --> No module named 'altair.vegalite.v4', I am not able to start streamlit
also can you make a video on environment setup. Please i really need to submit the project within one week
One of the best tutorials on the subject! Great job!
Thanks! I really appreciate it, there are many more coming soon 🔥
I've watched numerous GPT for PDF YouTube tutorials, but yours stands out for its clarity, conciseness, and thorough explanations. Thanks, Alejandro! I've subscribed and eagerly await more content from you.
You’re far too kind! Thanks :) I’m glad you found it useful! I’ll keep making videos like this 💪🏼
I agree! I’ve been watching videos for months and this one is so clear for someone who was not a coder previously. I’ve subscribed too and will go back and watch prior videos. Thanks Alejandro!
tip: your API key is a firewall hole
100% agree with @chiefwiki ! Great tutorial and appreciate the process diagram ✈
Me too!
me too!
Thanks for the Intro showing the final result. Priceless
i’m glad you liked it!! i’ll be posting more when i come back from vacation.
Love it! Thanks!!
Thanks! This is what I was looking for!
I really appreciate the tiny details and explanations like "since we're using langchain you have to set the variable like this" etc. Very helpful in understanding "why" in addition to "how". Cheers
Thanks! I try to keep it as approachable as possible so everyone can learn it 😊
Great Tutorial!!! Thanks a lot for sharing with us!
Hi from France 👋
It's the best tutorial on how to create our own PDF GPT.
Thanks a lot.
bonjour ! hi from toulouse :)
nice video! appreciate the step-by-step details.
I learned new frameworks and technology today. Thanks
Looks super duper good Alejandro !!😍
thank you, you look super good!!
Excellent. Very well done!!
Very well explained. Thank you!
thank you!
Great, this is what I needed to get me going. The best guide I have seen so far. I've been working on the code and can now load multiple files of pdf, docx, txt and html at the same time. I also increased the text box size to allow for follow-on questions. Next I want to integrate something like Pinecone with Chat 'memory'.
Hi @AbsenteeAtom, Do you mind sharing how you went about loading multiple files of PDF, Docx, txt?
Excellent tutorial, thank you for sharing. Let's learn more about FAISS and modularizing code 😃
Hey there! Let me know what you want to see next 👇
Thanks for the video. I was being told I had to convert them to CSV, which made no sense to me.
Great job tks from mexico keep going man
I appreciate your hard work and great video. Thank you.
Brilliant. Useful. Thank you.
great tutorial. Thanks soo much. Really easy to follow for a non-programmer.
let's turn you into a programmer 💪
Excellent content, and it works smoothly n correct for me. I also tested your other Repo about ask-multiple-pdf's. GREAT, thank you!
Awesome explanation of every single code step
Langchain is a framework that allows you to build apps with LLMs like ChatGPT or GPT4All🔥
Edit: Be careful with adding super long documents as the embedding also has a cost. Pricing is minimal but it can scale up pretty quickly if you embed long texts. Here are OpenAI's embeddings prices: openai.com/pricing#embedding-models
Let me know if you want a detailed course on Langchain 💪
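To get a feel for how embedding cost grows with document length, here is a rough estimate. It assumes about 4 characters per token (a common rule of thumb for English text, not an exact figure) and takes the price per 1,000 tokens as a parameter. The function name `estimate_embedding_cost` is hypothetical, and any price you pass in should come from the pricing page linked above.

```python
import math


def estimate_embedding_cost(
    num_chars: int,
    price_per_1k_tokens: float,
    chars_per_token: float = 4.0,  # assumption: rough average for English text
) -> float:
    """Return an approximate cost for embedding `num_chars` characters of text."""
    if num_chars < 0 or price_per_1k_tokens < 0 or chars_per_token <= 0:
        raise ValueError("inputs must be non-negative and chars_per_token positive")
    tokens = math.ceil(num_chars / chars_per_token)
    return tokens / 1000 * price_per_1k_tokens
```

As a usage example, a 500-page document at an assumed 3,000 characters per page is 1,500,000 characters, or roughly 375,000 tokens under this rule of thumb. Multiplying that by the current per-1K price gives the one-time embedding cost for that file.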
Regarding your ChatGPT video about maths: how can we copy and paste into Word?
yes !! i find the documentation on their website hard to use for creating own apps, a nice series on how to use Langchain for more applications would be nice (details on how to use agents, models, tools, ...) so others can do experimental apps. There are other videos on youtube explaining how to in general but not in detail like you did with this one. Kudos
@@sean9901 there are definitely more videos about langchain and its features coming very soon 😎
@@yaseenkhan-oq4ih i’m actually adding that feature to the extension, so be sure to install it from the chrome web store to get the updates ;)
Is this why you used FAISS instead of OpenAI embeddings? Would OpenAI embeddings give better results though?
Pretty wild! So cool - thanks
Amazing tutorial! Thank you! 😃
thank you!
I was just thinking about creating a dashboard to visualize $ spent on requests. You are ahead of the game
Streamlit seems lit 🔥Thanks bro ;)
Awesome explanation, video, and idea
Awesome, and thank you for not using Colab or Jupyter. Very few people do actual LLM coding; most just skim through a Colab notebook.
Thanks buddy for this awesome content waiting for more videos on langchain and keep explaining everything like you did so that it helps all to understand in detail.
thanks! i will :)
Amazing job. thanks for this.
The best step by step explanation. Please keep it up! I subscribed expecting to learn much more. Thank you
thanks!
Absolutely brilliant! Subscribed
thank you!
I really appreciate your videos, great walkthrough. Thanks.
thanks!
Amazing explanation, thank you so much
Having watched several GPT for PDF YouTube tutorials, I must say that yours stands out due to its clear, concise, and comprehensive explanations. Thank you, Alejandro! I have subscribed to your channel and eagerly look forward to your future content.
By the way, I'm in the process of developing my own Langchain app by replicating the code and incorporating additional features. Could you please guide me on replacing the OpenAI API key in your .env.example file?
Very nice tutorial, thanks! Wish your channel quick growth🤞
thank you! it means a lot :)
This video is amazing, I get the most from it.
thanks, you're awesome!
Very good tutorial, thanks.
very useful thank you bro
Well done excellent tutorial best I’ve seen so far
thanks! you're the best
Thank you very much!!!!!!!
Hey!, keep these videos man!
Thank you so much for the very nice tutorial.
OpenAI has "gpt-3.5-turbo" available, which is much cheaper and should perform better than the default model here. We just need to specify the model like this:
llm = OpenAI(model_name="gpt-3.5-turbo")
what is the default model used ?
@@SoftYoda It's text-davinci-003
Isn't there any free model?
Is it possible for me to use GPT-4 if I am willing to pay for it? Like if I sign up for the GPT-4 thing (not the chat-plus which I already have) will I get a GPT-4 API for use in a custom bot built with langchain??
as of 2023-06-27, LangChain has set the default model to gpt-3.5-turbo. No need to do the above setting.
There are other open-source models available right now, such as flan-alpaca-large.
llm = HuggingFaceHub(repo_id="declare-lab/flan-alpaca-large", model_kwargs={"temperature":0, "max_length":512})
This is really great content thank you 🎉
This is perfect thanks!
If you are getting an error that says "ModuleNotFoundError: No module named 'altair.vegalite.v4'" - you have to downgrade your Altair to version 4.1.0 as version 5 is giving the error. You can do this by running the command "pip install altair==4.1.0"
That's very interesting thanks for share
I love this content.. thanks for sharing
you're welcome!
Thank you for sharing and explaining.
thank you!
Nice tutorial video, and thanks for the shout out to my work!
hello there! thank you for your amazing work! 💪
amazing video, thank you for sharing.
Thank you! More coming soon
thank you very much Alejandro
no problem!
thank you, man,
Superb!
you are
Good tutorial. Very helpful. Thanks
thank you!
Great video
Great!
Good job
I stumbled upon this channel by accident - to be honest, I was looking for information. I'm 30 years old and decided to change my career. I started learning the Python language from scratch. Often, I hear that it doesn't make sense at this age, etc., but I have a goal and dreams.
Thank you very much for this video and the knowledge you convey in a simple and genuinely transparent way. Your way of explaining is really understandable. You're doing a great job.
dude, that is great! i'm glad you got into software development, it's a very rewarding and dynamic industry. 30yo is absolutely alright to get into it! keep it up and enjoy the learning process!
=) I am 38, studying webdev. Python seems way easier after all that pain with JavaScript... I like them both.
This is great man, thank you so much. Sub'd.
thanks!
Thanks!
you're awesome man, thank you!!
I've watched several videos on knowledge bases, but this one stands out! The way they explain OpenAI is so clear and concise. Highly recommended!
@Alejandro how can I make a chatbot that understands my previous chat and gives better answers? It would be my pleasure if you answered me.
HA! My Bass instructor just assigned your intro song for me to practice ; )
rock on dude lml
Your content is great 👍, also I like your teaching style with the visual flow of the code. Please make more videos on Streamlit, Langchain and OpenAI. Thanks in advance
i will!
I have seen another video, also very clear, whose program is based on GPT-3.0. I tried to make adjustments to that program but failed. I hope I can get some inspiration from you here. Thanks!
You have done an excellent job of explaining complex material in a simple manner. I was wondering if you could provide information on loading multiple PDFs or direct me to additional tutorials on the topic?
thank you! i am publishing a detailed explanation on how to do that next week!
Some may experience issues doing this, as there are a few things I believe Alejandro assumed were already done. This is especially true for getting past the first phase, which IMO is launching the initial app in your local environment.
1. You may need to add a .toml file with your OpenAI keys. You have to add a folder in your project folder titled .streamlit with a file inside it titled secrets.toml
2. You may have to add import openai in your app's file
3. If you have issues with the firewall (I did), you'll need to allow the port
Open the Windows Defender Firewall settings:
Go to the Control Panel.
Search for "Windows Defender Firewall" and open it.
It's a bit involved, so it's best to ask GPT how to allow access to the specified port by configuring Windows Defender Firewall
After these issues it was a breeze for me
Any help needed, just ask me
Cheers from Ireland
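A frequent cause of startup problems like the ones described above is a key that never reaches the app. Here is a small sketch of a fail-fast check, assuming the key is exposed as the `OPENAI_API_KEY` environment variable, which is the name the OpenAI Python client reads by default. `require_api_key` is a hypothetical helper, not part of the tutorial's code.

```python
import os


def require_api_key(env=None, name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from `env`, or raise a readable error if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or your environment"
        )
    return key
```

Calling `require_api_key()` at the top of the app, after the .env file has been loaded, turns a confusing authentication failure later on into an immediate message about the missing variable.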
Awesome 😮 thanks
thanks!!
@@alejandro_ao Can you please share the same process with a CSV file?
@@sadyaz64 actually that video is coming next week! be sure to click on the bell so that you get a notification once it's live :)
I have subscribed to you buddy. This is what my employer assigned me to do. You make my job easier. 🤩
i’m happy to hear that! welcome to the club 😎
Thanks for the great tutorial. I'm glad I found you !
$0.02 per request might not be high for an individual, but that's something to consider for any commercial application. At scale, it might go up fast.
I guess prices can only go down and OpenAI is currently getting the first mover bonus.
indeed this approach might still be quite expensive for a freemium business model. but you can always find ways to monetize this!
@@alejandro_ao do two identical requests get charged twice or just once?
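On the question of repeated requests: each API call is billed on its own, so sending the same question twice costs twice unless the application caches the result. Here is a minimal sketch of an in-memory cache keyed on the normalized question. `ask_llm` is a hypothetical stand-in for whatever function actually calls the model, and `AnswerCache` is an illustrative name.

```python
from typing import Callable


class AnswerCache:
    """Remember answers so an identical question is only sent to the model once."""

    def __init__(self, ask_llm: Callable[[str], str]):
        self._ask_llm = ask_llm          # hypothetical function that calls the LLM
        self._answers: dict[str, str] = {}
        self.paid_calls = 0              # how many times the model was really called

    def ask(self, question: str) -> str:
        # Normalize case and whitespace so trivial variations share one entry.
        key = " ".join(question.lower().split())
        if key not in self._answers:
            self._answers[key] = self._ask_llm(question)
            self.paid_calls += 1
        return self._answers[key]
```

A cache like this only makes sense per document: the same question asked about a different PDF needs a fresh answer, so the key would also have to include something that identifies the file.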
Great work. Like and Subscribed. Thank you again for sharing this project. I am looking forward to seeing more of your videos.
Perhaps, others may have already asked these, but I would like to know if:
- For PDF files that are non-English languages, would FAISS embedding and the model work as effectively?
- In the case of multiple pdf files or a large pdf file, is there a way to reduce the embedding cost? (like using non-open AI models, or other embedding methods)
Thank you for your time.
Thanks for this video, it was the most well explained video I have found on YouTube till now.
One question: how does this knowledge base created using FAISS perform against vector databases like Pinecone for semantic searches?
Thanks!
thank you 😊
Very good and comprehensive tutorial Alejandro - Do you have an idea how I can use multiple JSON files instead of PDFs?
Thanks
thank you
Using FAISS part was very good. I would like to see the alternatives and different approaches also. You might consider making different advanced level videos also. Thank you.
hey there, thanks for your comment. sure thing. as soon as i get back from vacation i’ll be covering more advanced topics! i’ll be posting surveys soon to see which topics you guys prefer. stay tuned :)
🎯 Key Takeaways for quick navigation:
00:16 📋 The tutorial demonstrates how to build a Python application with a graphical user interface for PDF text analysis.
01:24 🛠️ The environment setup includes using a .env file for secret keys, and it's essential to name the OpenAI API key environment variable correctly as "OPENAI_API_KEY".
03:16 📦 The necessary dependencies for the project include Langchain, PyPDF2, python-dotenv, and Streamlit.
06:46 📄 The tutorial utilizes Streamlit to create a user interface for the PDF application, allowing users to upload PDFs.
12:54 🧩 The text from the PDF is divided into manageable chunks to work with language models efficiently.
20:24 📃 The tutorial explains how to convert the text chunks into embeddings for semantic search.
26:19 💬 An input field is added for users to ask questions about the uploaded PDF.
27:27 📚 The video discusses the process of using a knowledge base and similarity search to find chunks of information relevant to a user's question.
31:44 💬 The Langchain framework provides a convenient "load QA chain" for question answering using language models like OpenAI's. It allows you to use various language models.
34:51 💰 To monitor spending on questions answered by the language model, Langchain offers a callback function that tracks the cost per operation, especially useful when working with OpenAI.
38:09 🧾 You can track the cost of generating answers to questions using Langchain's monitoring function, which shows the cost in tokens and currency for each question answered.
Made with HARPA AI
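The similarity search mentioned in the takeaways above boils down to ranking chunk vectors by how close they are to the question vector. Here is a minimal sketch using cosine similarity over plain Python lists. The tiny 3-dimensional vectors in the usage note are made up for illustration, since real embeddings come from an embedding model, and the metric a given vector store uses may differ from cosine similarity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]],
          chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]
```

With chunks `["cats", "dogs", "cars"]`, vectors `[[1, 0, 0], [0.9, 0.1, 0], [0, 0, 1]]` and query `[1, 0, 0]`, the two closest chunks are "cats" and "dogs". Only those selected chunks would then be passed to the language model along with the question.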
very good, thanks man! I'd love to see it integrated with pinecone so that we could upload a bunch of pdfs and have it stateful.
it's coming soon!
I saw several of your video trainings and I like the details and the easy and clear explanations.
I wonder if, instead of sending the request for the embeddings each time, it is possible to save them in a local db, so that a request is only needed when a new pdf file is uploaded.
Hope this makes sense.
Definitely I will subscribe to your channel 👏👏👏
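The idea raised above (embed once, reuse until a new PDF arrives) can be sketched with a cache keyed on a hash of the file's bytes. `embed_chunks` is a hypothetical stand-in for the call that produces the vectors, `load_or_embed` is an illustrative name, and storing the vectors as JSON is only one option. LangChain's FAISS wrapper also has its own `save_local` and `load_local` methods for persisting an index.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def load_or_embed(
    pdf_bytes: bytes,
    chunks: list[str],
    embed_chunks: Callable[[list[str]], list[list[float]]],  # hypothetical embedder
    cache_dir: Path,
) -> list[list[float]]:
    """Return cached vectors for this exact file, embedding only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(pdf_bytes).hexdigest()  # identical files share a digest
    cache_file = cache_dir / f"{digest}.json"

    if cache_file.exists():
        return json.loads(cache_file.read_text())

    vectors = embed_chunks(chunks)  # the only step that costs money
    cache_file.write_text(json.dumps(vectors))
    return vectors
```

Note that the cache key depends only on the file contents. If the chunk size or the embedding model changes, the cached vectors are stale and the cache directory should be cleared.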
Alejandro thanks for this. I’m getting a little more familiar with Python from every video. I believe there is a big opportunity for someone like you to help ‘no coders’ like me get the most from LLM’s.
It would be helpful to know how multiple documents could be uploaded and what the limits of the LLM are before the responses begin to fail completely or degrade in quality.
I always see people who don’t know sht telling somebody how this is a great opportunity for someone to teach them for free. How about to learn it on your own? “Non-coders” you mean people who jump on bandwagons. If Tylenol became cool tomorrow you should also be a doctor. Maybe you should be more grateful and *gasp* humble. Clearly you would never take the time to learn how to make something secure, so you’re basically just wanting to set up some lame thing where you have put literally no time into. People like you literally ruined crypto, altering its commonality from being useful as utility to being some worthless not-even-penny-stocks.
You cant really call yourself a 'no coder' anymore if you follow enough videos like this
i'm working on a video about that indeed!
@@coolmcdude You’re right - interesting, the labels we give ourselves.
I made it. I feel like data scientist now
bravo! you are awesome 🔥
Hello, excellent work. If I need it to return a json separating some specific values from the pdf, how would you do it?
Awesome. Could you please make one where you upload a Python script and ask ChatGPT questions about the script?