Working with MULTIPLE PDF Files in LangChain: ChatGPT for your Data
- Published: 12 Apr 2023
- Welcome to this tutorial video where we'll discuss the process of loading multiple PDF files in LangChain for information retrieval using OpenAI models like ChatGPT. Our step-by-step guide will explain how to convert PDF files into embeddings based on the chosen large language model. Let's get started!
Welcome to this tutorial where you'll learn how to extract valuable information from your PDFs using LangChain and OpenAI Text Embeddings. We'll guide you step-by-step through the process of setting up LangChain to communicate with your PDF files, allowing you to retrieve information efficiently and effectively. By the end of this tutorial, you'll have the skills necessary to use advanced language processing technology and improve your data analysis.
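The retrieval step described above boils down to embedding both the documents and the question, then picking the closest match. A minimal pure-Python sketch of that idea (the tiny hand-made vectors and file names here only stand in for real OpenAI embeddings):

```python
import math

# Toy embeddings: in practice these come from an embedding model via
# LangChain; these small hand-made vectors only stand in for real ones.
doc_vectors = {
    "invoice.pdf":  [0.9, 0.1, 0.0],
    "research.pdf": [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in for the embedded question, e.g. "summarize the research paper".
query_vector = [0.2, 0.8, 0.1]

# Retrieval returns the document whose embedding is most similar.
best = max(doc_vectors, key=lambda name: cosine(doc_vectors[name], query_vector))
```

A real vectorstore (Chroma, FAISS, Pinecone) performs this same comparison over chunk embeddings at scale.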
▬▬▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Support my work on Patreon: Patreon.com/PromptEngineering
🦾 Discord: / discord
▶️️ Subscribe: www.youtube.com/@engineerprom...
📧 Business Contact: engineerprompt@gmail.com
💼Consulting: calendly.com/engineerprompt/c...
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
LINKS:
Google Colab: colab.research.google.com/dri...
LangChain: docs.langchain.com/docs/
VectorstoreIndexCreator vectorstore: tinyurl.com/3yz455m3
#LangChain #InformationRetrieval #PDF #OpenAITextEmbeddings #DataAnalysis #LanguageProcessingTechnology #AI #MachineLearning #NaturalLanguageProcessing #NLP #Tutorial
Want to connect?
💼Consulting: calendly.com/engineerprompt/consulting-call
🦾 Discord: discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Join Patreon: Patreon.com/PromptEngineering
▶ Subscribe: www.youtube.com/@engineerprompt?sub_confirmation=1
Suppose my question is about one document, but it's pulling the answer from another document and giving an irrelevant answer. How can we handle this?
Just wanna say thanks a lot for your tutorial!
superb explanation. Thanks
Thank you for sharing! Excellent video.
Thank you very much. Is there any way we can specify which document to scan to find the answers?
Very cool video. Thank you!
Can you choose which model to use? I don't see a completion request with the model statement. Thank you for this video - I'm still learning by doing.
Great video. I'm learning a lot from you. Thank you.
Glad it's helpful :)
Thanks for the excellent video. How do I get the page number of the content and sources? Any suggestions?
Hi, very good work. Thanks! Sorry, but the Google Colab link is invalid.
You are amazing! This is exactly what I was looking for. I might also need to connect with you in future for consultancy on something that I am trying to build.
My man! First, you're a monster. Obviously, I bought you a coffee. Anyway, there were 3 errors/bugs (excuse my language, this is the first time I've coded anything in my life); in case somebody was struggling, I think these fixes are useful. 1) In the 'Connect Google Drive' section, second segment of the code, I had to insert the line import os between pdf_folder_path = f'{root_dir}/data/' and os.listdir(pdf_folder_path). In other words, the full code is: pdf_folder_path = f'{root_dir}/data/' (first line), import os (second line), os.listdir(pdf_folder_path) (third line). 2) In the 'Load Multiple PDF Files' section I included these two lines of code: from langchain.document_loaders import UnstructuredPDFLoader
and from langchain.indexes import VectorstoreIndexCreator. 3) In the Vector Store section I included !pip install unstructured[local-inference] as the first line of code. And that's basically it! Cheers mate!
Thank you!
Thanks stranger! You just fixed my traceback error with suggestion number 3.
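The first fix above is just a missing import. A self-contained sketch of that section of the notebook (a temporary folder and placeholder file names stand in for the real mounted Drive path, which isn't available here):

```python
import os
import tempfile

# Stand-in for the mounted Google Drive folder from the notebook.
root_dir = tempfile.mkdtemp()
pdf_folder_path = f'{root_dir}/data/'
os.makedirs(pdf_folder_path, exist_ok=True)
for name in ('paper1.pdf', 'paper2.pdf', 'notes.txt'):
    open(os.path.join(pdf_folder_path, name), 'w').close()

# Without `import os` somewhere above this call, os.listdir raises
# NameError -- that was the first bug the commenter hit.
pdf_files = sorted(f for f in os.listdir(pdf_folder_path) if f.endswith('.pdf'))
```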
Hey, thanks for the tutorial. I am thinking of creating a voice assistant using OpenAI embeddings; is there any tutorial for this?
Please consider doing a similar video on how to be able to chat more freely with Google Drive PDFS with memory. For example, having the script generate a glossary, an outline, or a lesson plan based on the database of pdfs.
Sure, that's in the plans. There is a video on the channel, 'Crash Course on LangChain'; it has a section on memory, so check that out for the time being :)
This is a great idea, Sam.
Great stuff
Thank you so much, it worked! (But it also required me to install pdfminer and several other things.)
Glad you found it useful. Google colab is sometimes really funny :)
This is excellent. Would love for you to delve deeper into this experimentation. How much did it cost you on OpenAI's end? For embeddings etc.
Thanks, I will be doing a lot more on this. For this video and experimentation, the cost was around $1.
Hello! Nice solution, I wanted to try it; is the Colab link working?
Does this method work with full books, ~300 pages?
This is all good for demos, but is LangChain reliable for production-level apps? Are there any alternatives? @Prompt Engineering
Can you also include how to interact with tables and pictures in a PDF document
Can it answer questions that need information from multiple PDFs?
Hi Prompt Engineering!
Quick question: I like the way you created an index from multiple PDF files and queried from the index. Have you attempted to persist the vectorstore for later use (e.g., query or update with additional documents)?
Chromadb will be a good option for doing that.
@@engineerprompt Can you explain the difference between ChromaDB and Meta's FAISS?
How would you adjust the temperature?
Hi, when installing chromadb I get an error installing hnswlib; how did you fix it?
"Failed building wheel for hnswlib"
Loooooove it ❤
Good stuff. Two questions/suggestions. First, is the data stored locally or in a database like Pinecone? Second, can the intake be modified so that I can use DirectoryLoader?
Good stuff!
Thank you.
1) For this example, it's stored locally, but you can use any database you want.
2) Yes, you can do that. In that case, you will have to define the file type you want to read.
Hope this helps.
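Selecting the intake by file type, as DirectoryLoader does with its glob pattern, can be sketched in plain Python (the folder contents here are made up for illustration):

```python
import glob
import os
import tempfile

# Fake document folder with mixed file types.
folder = tempfile.mkdtemp()
for name in ("a.pdf", "b.pdf", "notes.txt"):
    open(os.path.join(folder, name), "w").close()

# A glob pattern defines which file type gets loaded -- the same idea
# DirectoryLoader expresses with its glob argument.
pdf_paths = sorted(glob.glob(os.path.join(folder, "*.pdf")))
pdf_names = [os.path.basename(p) for p in pdf_paths]
```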
Is accuracy being calculated for the model?
Can we compare the differences between 2 PDFs?
Hello, great tutorial! Any idea how to change the max_tokens (output tokens) in this approach? So far I'm getting 256 tokens in the response, while I need much more.
Which part of the script creates embeddings and uses Chroma, please?
Thank you for educating us. I wonder how would you integrate AutoGPT for multiple agents
AutoGPT will decide how many agents to use based on the problem it's trying to solve. I don't think you need to specify the number of agents.
I am very interested in how you made the model of the animated face that is at the beginning of the video, is there a tutorial on your channel on how to make such an avatar?
Yes, check this out: ruclips.net/video/V2efVSXSlqc/видео.html
I have a couple of other videos as well that use open source tools. You can look for those as well.
It works great for understanding my files. But even with a T4 GPU with 16 GB of GPU memory, it takes 2-3 minutes to get an answer for a file with 4 or 5 pages. Is that normal for this GPU?
One more question - do the documents need to be reloaded into a vector every single time? Or can we simply import the query and answer to another Python file?
That's a great question, should have addressed this in the video. You can simply write the embedding into a file and store that instead. Then reuse it whenever you want
I found that I had to add this in order for it to work:
!pip install unstructured[local-inference]
Otherwise I got this error:
ImportError: Following dependencies are missing: pdfminer. Please install them using `pip install unstructured[local-inference]`.
Why is this?
Thanks a lot! This indeed saved my time!
Hi, I have this error. How did you solve it? What is the local-inference?
Thanks for the clear example👍 I have 2 additional questions:
1. If you have a PDF with a mathematical formula, for example to calculate some measure (i.e. BMI), can you also ask for the BMI if you supply your height and weight?
2. If I have a document with questions and answers, how do I feed it in?
Thanks in advance.
1. Might be possible with GPT-4.
2. Should be possible, similar to any other PDF. It will treat it like a normal sequence of tokens.
Please update the colab link, thanks very much!😀
Hi Prompt Engineering
I tried a similar example, but I am getting an error:
Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass `openai_api_key` as a named parameter. (type=value_error)
Awesome video. Is it possible to run this as a regular Python file without a Jupyter notebook? Anything I should be aware of?
Yes, absolutely you can do that. Just create another virtual environment for this and install the packages and you are good to go.
Nice tutorial. I am actually facing a problem when trying to use the Chroma vector store with a persisted index. I have already loaded a document, created embeddings for it, and saved those embeddings in Chroma. The script ran perfectly with LLM and also created the necessary files in the persistence directory (.chroma\index).
However, when I try to initialize the Chroma instance using the persist_directory to utilize the previously saved embeddings, I encounter a NoIndexException error, stating "Index not found, please create an instance before querying". Is there a way to fix it? Could you find a solution and make a video of it?
Additionally, I am curious if these pre-existing embeddings could be reused without incurring the same cost for generating Ada embeddings again, as the documents I am working with have lots of pages. Thanks in advance!
I will try to look at the first problem. As for the second question: yes, you can do that.
Hello, is there any update regarding this problem? By the way, nice vids!!
Just curious, do you have a tutorial for multiple PDFs using Llama 2 and other open-source embeddings?
Yes, just check out my localGPT videos.
May I ask does it work with PDFs having over 4000 tokens (the limit of OpenAI API)? Thanks a lot for providing both guidelines and Colab notebook for immediate use!
That's what chunks are for. The text is split up into those chunks so that it makes them manageable for further processing.
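The chunking idea in the reply above can be sketched with a rough character-based splitter (LangChain's real splitters are token- and separator-aware; this only shows the gist, and the sizes are illustrative):

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Naive splitter: fixed-size character windows with overlap, so an
    answer-bearing passage is less likely to be cut cleanly in half."""
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

long_text = "x" * 4500  # stand-in for a long PDF's extracted text
chunks = split_into_chunks(long_text)
```

Each chunk is embedded separately, so a document far over the model's context limit still fits piece by piece.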
Thank you so much, that's quite helpful! Although it would be great if you could help us give it memory; for example, if I correct a wrong output, the bot should remember it. Have a nice day and keep up the good work.
Glad you found it helpful. As far as memory is concerned, watch this video; there is a section on how to do it. I will be making more detailed videos on it later: ruclips.net/video/5-fc4Tlgmro/видео.html&ab_channel=PromptEngineering
Is there a GitHub repository for all your excellent training videos?
Can we integrate this with Django?
Thanks for the video, it's very useful. Is it possible to integrate a voice assistant that receives a question as input and answers via voice, using the information present in the PDFs? It would be very useful. It could be done with Whisper or Bark. What do you think?
I downloaded many PDF's just waiting for the day this becomes a reality.
Yes, if I find time, I will put together something for this.
I want this too!
Hi, instead of UnstructuredPDFLoader can we use PyPDFLoader? I was using PyPDFLoader with glob and loader_cls. I added 3 PDF files to a folder called pdf, but when I load it and print the length of the documents, it shows a wrong answer like 5 or 6, whereas I only loaded 3 PDFs. Can you please let me know if you have a solution for this?
Hey, what if we want image responses? How do we get those?
Excellent video, exactly what i was looking for. My pdf files are a mess (anyone can relate?) Hundreds of pages, images, scanned documents sometimes.
Can you clarify what is Pinecone and how it could help in this particular workflow?
So this approach will only work on the text part of your documents; I am not aware of any approach that will understand images (yet). Pinecone is a vectorstore (think of it as a database). You can store your embeddings there if you have a very large set of documents. Hope this helps.
Do you have a video like this with a local LLM? Like using LM Studio as a server?
Look at the localGPT project.
Suppose I added PDFs containing details for each employee, and then I ask how many employees have Python experience, or how many employees are there in the company?
Can it respond correctly?
If not, what should be done in order to get a correct response for the above queries?
Thanks!
My question: I want the answers from both sources, indicating that Answer1 is coming from Source1 and Answer2 is coming from Source2.
How can I achieve this?
Can you try this for estimating PDF blueprint files for a commercial window treatment business? And construction firms?
You could. If you have files, I can try.
Wonderful tutorial. Would it be possible to run this through VSCode instead of the browser? I attempted it, and it only shows up in the console without opening the browser.
You should be able to. VSCode has support for Jupyter Notebooks. Check that out.
@@engineerprompt I'm facing a problem with VectorstoreIndexCreator.
Please share the solution.
That's another informative video. Appreciated. I know someone already asked in this thread how to persist the index for later use, and you recommended chromadb as a good choice. However, on my company computer I failed to install chromadb. So how can I use FAISS instead to persist the index?
In that case, use pickle to dump the index to a pickle file.
@@engineerprompt Thanks for the response. Yes, I did that after watching your other videos. All good now.
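The pickle route suggested above can be sketched like this (a plain dict stands in for the index object; whether the real FAISS/LangChain wrapper pickles cleanly depends on the library version, so treat this as the pattern, not a guarantee):

```python
import os
import pickle
import tempfile

# Stand-in for the vector index; the real object would hold embeddings.
index = {"doc1.pdf": [0.1, 0.2, 0.3], "doc2.pdf": [0.4, 0.5, 0.6]}

# First session: dump the index to disk after building it once.
path = os.path.join(tempfile.mkdtemp(), "index.pkl")
with open(path, "wb") as f:
    pickle.dump(index, f)

# Later session: reload instead of re-embedding, avoiding repeat API cost.
with open(path, "rb") as f:
    restored = pickle.load(f)
```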
I want to use an Alpaca or Vicuna model instead of ChatGPT, because ChatGPT has limitations on the requests we send. I just want to use an open-source model instead of ChatGPT; is this possible?
Yes, you can look into Hugging Face embeddings. When I get them working, I will make a tutorial on it.
@@engineerprompt Yes, looking forward to that tutorial where we can read multiple PDFs and query them without using the OpenAI API.
Looking forward to this!
Waiting for this 😊
@@engineerprompt waiting for this
Great stuff! I was looking all over the web for how to do this, and this was the only useful video I could find. I just have one quick question for further work I need to do.
Just wondering, is it possible to make queries which only pertain to a specific document? For example, I only want to know something about the first paper (e.g. authors, title, etc.) but not the second. Let me know how you would go about doing this.
Thank you and glad you liked it. Yes, you can do that by adding metadata and use that as context to the LLM.
@@engineerprompt Thanks! Do you have any videos/resources on how to do this?
Do a demo of it working, I always wanted this.
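The metadata idea from the reply above, in miniature: every chunk records which file it came from, and retrieval is filtered on that field before anything reaches the LLM. The chunk texts and file names below are invented for illustration:

```python
# Each chunk carries metadata identifying its source document.
chunks = [
    {"text": "Authors: Smith et al.", "metadata": {"source": "paper1.pdf"}},
    {"text": "Results section ...",   "metadata": {"source": "paper2.pdf"}},
    {"text": "Title: A Study of X",   "metadata": {"source": "paper1.pdf"}},
]

def restrict_to(source, chunks):
    """Drop every chunk that did not come from the requested file."""
    return [c for c in chunks if c["metadata"]["source"] == source]

# Only paper1's chunks are now candidates for the query about its authors.
paper1_only = restrict_to("paper1.pdf", chunks)
```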
Great content. Would it be easy to modify this process to handle different file formats such as .doc or .txt? Thanks again. I have subscribed.
Thank you, yes, you just need to add different loaders for each file type.
Is it possible to retrieve which section of the PDF it is referring to? (Can it even detect the portion of the chunk in the PDF?)
I am not sure, will look into it.
That would be incredible in order to make scientific reviews and references
@@Myplaylist892 I agree, I will look into it in more details.
@@engineerprompt hi..you got anything on this?
yes that does sound pretty useful
Immediately hit my quota before being able to query. Would Hugging Face be the next route for a free AI version?
Yes, you can try huggingface or if you have the hardware, you can try to run something like localgpt locally
Can I store a vector in a database like Azure and then run just a similarity search or retriever without having to recreate it? Can someone help me?
Apart from OpenAI, who else provides embeddings?
Hugging Face has its own embeddings, and you can also integrate models like BERT.
I am getting this error on VectorstoreIndex creation:
ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (C:\Program Files\Python310\lib\site-packages\pdfminer\utils.py)
Besides PDF and Word files, can localGPT handle Excel and CSV files?
Yes, but you will have to experiment with the embedding model and LLM.
Sorry, I think the Google Colab has a problem. Is anyone else having trouble opening and running it?
Can I use this on my company website for PDF search? Please reply.
Yes, it's simple to integrate.
Thank you for this valuable information. How can I get the page number as a reference along with the source PDF?
I think there is a way, need to check it.
It's listed on the LangChain site.
Thank you!
Thank you for your support!
Now that the text-davinci-003 model has been deprecated, I'm no longer able to use the openai library.
I got the error: openai.error.InvalidRequestError: the model 'text-davinci-003' has been deprecated. Is there a way I can replace it with gpt-3.5-turbo-instruct (the one recommended by OpenAI)?
I have a question:
What should my system requirements be if I want to build a project application using LangChain & OpenAI?
You can run it on any machine that can run Python if you are using OpenAI models. You don't need a GPU in that case.
This error is popping up: "Failed to load the Detectron2 model. Ensure that the Detectron2 module is correctly installed."
after running "index = VectorstoreIndexCreator().from_loaders(loaders)", although I have already installed the detectron2 model.
Any solutions?
If it's a warning, just ignore it.
I got this error:
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4331 tokens (4075 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
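That error is plain arithmetic: the retrieved chunks plus the question plus the reserved completion must fit inside the model's window. A quick budget check (the chunk and question sizes below are illustrative numbers, not values from the video):

```python
MAX_CONTEXT = 4097   # model's total window, from the error message
COMPLETION = 256     # tokens reserved for the answer, also from the error
prompt_budget = MAX_CONTEXT - COMPLETION

chunk_tokens = 1000     # rough size of each retrieved chunk
question_tokens = 75    # rough size of the question itself

# Largest number of retrieved chunks that still fits the window;
# lowering k (or the chunk size) is the usual fix for this error.
k = (prompt_budget - question_tokens) // chunk_tokens
```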
Thanks a lot, but I have an error message. When I run the VectorstoreIndexCreator() cell I get the following error: "ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (/usr/local/lib/python3.9/dist-packages/pdfminer/utils.py)". Could you help me?
@Sameul - @Yn Box has answered this question, please see the comments below. Basically you need to do:
!pip install unstructured[local-inference]
I was also facing the same issue as you and his resolution solved it!
@@Prakash-oq5ke This worked! Thank you.
do you know how to incorporate a new LLM like Dolly into LangChain?
tutorial coming soon.........
When I run the VectorstoreIndexCreator() cell i get the following error
ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (/usr/local/lib/python3.9/dist-packages/pdfminer/utils.py)
I tried installing and importing the packages, but that didn't work either. Any solution to this?
Same 🥲
You can try installing this library first. !pip install unstructured[local-inference]
@@samdaniel1368 Thank you for the solution , running it in the first cell and restarting the runtime solved the issue for me
@@samdaniel1368 Perfect thanks
Thanks a lot for the video. I am facing a problem with the access to the Colab file, please, can you help?
What is the issue?
@@engineerprompt Something like "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential"
😢
@@samser1150 Make sure you are not running the Colab in other tabs.
@@engineerprompt solved, thanks a lot
@@samser1150 Perfect, glad was helpful!
Hey, thanks for the video, but the Colab file is private; we can't access it.
It seems to be public, can you not see it at all?
Can you make a video where you make a webapp using langchain & streamlit where you can upload multiple PDF files and ask questions about the files?
It's on my to-do list :)
@@engineerprompt Hope I see it soon🙏🏼🤩
I am using Azure OpenAI. The code is failing at the index creation step, i.e.
index = VectorstoreIndexCreator(embedding=embeddings).from_loaders(loaders)
with the following message
raise error.InvalidRequestError(
openai.error.InvalidRequestError: Must provide an 'engine' or 'deployment_id' parameter to create a
Can you help with how to do this with an Azure OpenAI setup?
Sorry, I haven't used Azure, so I'm not sure what's going on here. It seems like you are having issues accessing the OpenAI API.
What to do about 'detectron2 not installed'?
Are you getting a warning or an error?
How to do the same without OpenAI? I mean, using GPT4All or some other LLM. The point is doing everything "for free" without spending on API calls. And another one: how to do the same on a large codebase? Python, Java, Clojure, etc. Thank you.
Stay tuned, tutorial coming soon :-)
Can you help me? I have an error message.
question = "핵가족 그리고 직계가족이 뭐지?"
response = model1({"question":question}, return_only_outputs=True)
print("Answer : ",response['answer'])
print("Sources : ",response['sources'])
This model's maximum context length is 4097 tokens, however you requested 4903 tokens (4647 in your prompt; 256 for the completion). Please reduce your prompt; or completion length
The cell "index = VectorstoreIndexCreator().from_loaders(loaders)" gives me an error even though I pip installed pdfminer and !pip install unstructured[local-inference]... don't know what to do :(
What's your python version?
@@engineerprompt 3.9
Same error here
Same error, got any solution?
Running Python 3.10 with unstructured[local-inference] installed, I am running into the error at the index = ... line.
The error is:
AttributeError Traceback (most recent call last)
in ()
1 get_ipython().system('pip install unstructured[local-inference]')
----> 2 index = VectorstoreIndexCreator().from_loaders([loaders])
3 index
/usr/local/lib/python3.10/dist-packages/langchain/indexes/vectorstore.py in from_loaders(self, loaders)
67 docs = []
68 for loader in loaders:
---> 69 docs.extend(loader.load())
70 sub_docs = self.text_splitter.split_documents(docs)
71 vectorstore = self.vectorstore_cls.from_documents(
AttributeError: 'list' object has no attribute 'load'
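The traceback above actually shows the bug: from_loaders iterates over its argument and calls .load() on each element, so it must receive the list of loaders itself, not that list wrapped in another list. A pure-Python mock (FakeLoader is invented here to mimic the loader interface) makes the difference concrete:

```python
class FakeLoader:
    """Minimal stand-in for a LangChain document loader."""
    def __init__(self, name):
        self.name = name

    def load(self):
        return [f"contents of {self.name}"]

loaders = [FakeLoader("a.pdf"), FakeLoader("b.pdf")]

# What from_loaders(loaders) effectively does -- this works:
docs = []
for loader in loaders:
    docs.extend(loader.load())

# from_loaders([loaders]) would instead iterate over [[...]] and call
# .load() on the inner *list*, raising exactly the AttributeError above.
```

So the fix in the notebook is VectorstoreIndexCreator().from_loaders(loaders), without the extra brackets.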
How to use gpt-3.5 turbo instead of davinci
In the OpenAI function, set the model variable to gpt-3.5-turbo.
@@engineerprompt Trying to do this here too, but with VectorstoreIndexCreator it's a bit tricky if one doesn't know where to put it. Not choosing gpt-3.5-turbo becomes costly with davinci over time.
How do I extract basic KYC data into Excel from insurance policies and invoices with different structures in PDF?
Can this be done using ChatGPT or any similar AI tool automatically, without any training or annotation?
Yes!
@@engineerprompt how?
@@caankitrmehta2281 You need to design a prompt which will extract this info from the files.
@@engineerprompt What will the approximate cost be?
Error in your Google Colab file: "ModuleNotFoundError: No module named 'pdfminer'"
Check it now!
Getting the following error message when I try to run the 'Load Required Packages' cell:
ModuleNotFoundError: No module named 'langchain'
Any advice?
Seems like you didn't install langchain. In the start there is a cell with the following command.
!pip install langchain
Make sure you run that.
@@engineerprompt Thank you. It worked after refreshing the page, think the error was on the Colab side. Great video!
Has anyone been able to fix the "detectron2 is not installed" issue?
Is it warning or error?
@@engineerprompt it is a warning : "detectron2 is not installed. Cannot use the hi_res partitioning strategy. Falling back to partitioning with the fast strategy." and it is repeated for each file in the directory that has the PDFs.
Which AI avatar generator are you using? :)
I have a local, open-source workflow for that :-)
@@engineerprompt I feel a need to request a video on it 🙂
How much should one pay for 10 pages, based on your experience?
It depends on how many times you will be prompting, but it's going to be cents or a few dollars at most.
@@engineerprompt I appreciate your comments.
Due to changes in the VectorstoreIndexCreator API, some errors appeared.
To solve it, I did:
embedding_ai = OpenAIEmbeddings() #Use any embedding you want to
index = VectorstoreIndexCreator(embedding = embedding_ai).from_loaders(loaders)
turbo_llm = ChatOpenAI(
temperature=0,
model_name='gpt-3.5-turbo-0125' # default gpt-3.5-turbo
)
#Need to define LLM now
index.query('Tell me something about Interpersonal communication', llm = turbo_llm)
Please provide an updated Google Colab link.
Can you please try now!
How to run this locally without Collab?
You will need to install Python on your machine. Download Visual Studio Code and install it. Then download the notebook shown in the video and run it in Visual Studio Code. Hope this helps. If you are not familiar with the process, I can make a tutorial at some point.
@@engineerprompt I have Python and Visual Studio Code installed, as I run many LLM models locally and do AI training. I just have never used Colab/notebook things.
@@digidope Perfect, then it's just a normal Jupyter notebook once you download it. Just download it and you can run it as a Jupyter notebook.
Thanks!
Thank you, really appreciate your support!
@@engineerprompt I was looking for an alternative to manually creating each loader. Thanks!
Getting an error when opening the link.
what's the error?
@@engineerprompt Colab signed out on a different tab. But I'm signed in.
Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication
@@engineerprompt Yeah I can't access it either. Something about the OAuth token
@@MatthewGunnin Make sure you close all instances of google colab you have running and then open this link. Hope this helps.
Can it work with 100s of PDFs?
You could. I haven't tried it, so I can't say how hard it's going to be. Will look into it.
This would be a very interesting experiment as well.
At which point does it stop being context setting and start being fine-tuning?
Bro, can you make a video on how you link this on your website, plus a UI to make it look better?
@@hamaltarther2515 You can use streamlit, will look into it.
Cannot access the Colab.
Can you please try now!