Chat With Multiple PDF Documents With Langchain And Google Gemini Pro
- Published: 7 Jan 2024
- github: github.com/krishnaik06/Comple...
In this video we will develop an LLM application using Google Gemini Pro and LangChain, where we can chat with multiple PDF documents with the help of FAISS vector embeddings.
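The app described above boils down to: split the PDFs into chunks, embed the chunks, index them in a vector store, and retrieve the most similar chunks for each question. A library-free sketch of that idea (the chunk sizes, the toy bag-of-words "embedding", and the brute-force search are illustrative stand-ins for GoogleGenerativeAIEmbeddings and FAISS, not the code from the video):

```python
import math
from collections import Counter

def chunk(text, size=100, overlap=20):
    """Split text into overlapping chunks, as a text splitter would."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

def embed(text):
    """Toy bag-of-words 'embedding'; the real app calls an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (what FAISS does, fast)."""
    q = embed(question)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:k]
```

The retrieved chunks are then stuffed into the prompt as context for Gemini Pro to answer from.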
-----------------------------------------------------------------------------------------------------------
Support me by joining membership so that I can upload more videos like this
/ @krishnaik06
---------------------------------------------------------------------------------------------------------------------------
►Data Science Projects:
• Now you Can Crack Any ...
►Learn In One Tutorials
Statistics in 6 hours: • Complete Statistics Fo...
Machine Learning In 6 Hours: • Complete Machine Learn...
Deep Learning 5 hours : • Deep Learning Indepth ...
►Learn In a Week Playlist
Statistics: • Live Day 1- Introducti...
Machine Learning : • Announcing 7 Days Live...
Deep Learning: • 5 Days Live Deep Learn...
NLP : • Announcing NLP Live co...
---------------------------------------------------------------------------------------------------
My Recording Gear
Laptop: amzn.to/4886inY
Office Desk : amzn.to/48nAWcO
Camera: amzn.to/3vcEIHS
Writing Pad: amzn.to/3vcEIHS
Monitor: amzn.to/3vcEIHS
Audio Accessories: amzn.to/48nbgxD
Audio Mic: amzn.to/48nbgxD
Let's target 1000 likes :). Share your work on LinkedIn and tag me :)
For sure sir, I am always there to support the valuable content provided by you ❤
Valuable information
hi sir, if possible, could you do a video for a scanned multipage pdf which includes images and tables within it.
what are the prerequisites for this project Sir ?
@krishnaik06 pdf contains images or some figure. Is it possible to get those images as output for user response
Thanks 🙏. Was planning to do something similar with Gemini pro today. Before I even started writing the code your video popped up 😂.
Love your video.Please make a tutorial series showcasing Langchain, Gemini, and VectorDB in action.
Solid video! You've covered a lot of ground and explained things well. There's always room to improve, but you're definitely on the right track. Looking forward to seeing more from you!
was just looking for this, loved the video and the explanation .
just a small request sir, can you bring a hands-on practical course for LLMs and LangChain, from the basics (level 0) up to building a production-level project using MLOps or LLMOps.
thank you sir.
This was really helpful! I didn't know I could talk to PDFs now.
Will get to it tomorrow morning🙌 do remind me at 11 am on 10/01. Thank you 🤝
Amazing Krish, looking for more end to end projects on gen ai.
quite the quintessential Indian IT YouTuber, you are good sir.
I had been giving up on this kind of project, but I will give this tutorial a try tomorrow.
really great video, thanks a lot for such great contents!!.
This is truly awesome
Crazy tutorial!! Loved it
Your videos are really helpful for learners bro, from Tamilnadu
Can you please tell what are the prerequisites for this project
Thank You Sir For Giving Us This Valuable Knowledge I Created My First GEN AI App Today Thank You Sir😃
Have you figured out how one can fine tune Gen AI app with their own data?
hey krish, awesome knowledge sharing. It would be great if you could create a video on how we can deploy an LLM model with RAG on the cloud, I mean using Docker or something and then hosting on AWS, GCP or another cloud platform, so we can get an idea of how the complete project is done.
My dear, thank you, I tested it and it worked well. You are blessed by God. Carlos from Brazil.
I am getting some deserialization error:
ValueError: The de-serialization relies loading a pickle file. Pickle files can be modified to deliver a malicious payload that results in execution of arbitrary code on your machine.You will need to set `allow_dangerous_deserialization` to `True` to enable deserialization. If you do this, make sure that you trust the source of the data. For example, if you are loading a file that you created, and no that no one else has modified the file, then this is safe to do. Do not set this to `True` if you are loading a file from an untrusted source (e.g., some random site on the internet.).
new_db = FAISS.load_local("faiss_index", embeddings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/venv1/lib/python3.11/site-packages/langchain_community/vectorstores/faiss.py", line 1078, in load_local
raise ValueError(
how to solve it?
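The error above is a safety check added in newer langchain-community releases: loading a local FAISS index unpickles a file, so you must opt in explicitly. A minimal sketch of the fix (assumes langchain-community is installed; only enable the flag for index files you created yourself, never for downloaded ones):

```python
def load_index(embeddings, path="faiss_index"):
    # Import deferred so this sketch can be pasted into the project as-is.
    from langchain_community.vectorstores import FAISS
    # Opt in to unpickling; safe only because we wrote this index ourselves.
    return FAISS.load_local(
        path, embeddings, allow_dangerous_deserialization=True
    )
```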
Hi Krish, thanks for the detailed video. I have a query on this application: does a one-time upload of the PDFs keep the vector index forever in the faiss_index folder, or do we need to keep uploading the files every time to get the expected summary? I mean the files uploaded in one session.
Thank you for video. It was amazing. Can we use the similar approach for multiple photos as well?
Thank you for sharing this useful advice on an important topic. How would you modify it to work with OpenAI's GPT-4 API instead of Google Gemini Pro?
Short and precise. Well Done. I have a question. Can i use this with flutter. I mean i need to develop front end using flutter
Hi Krish, Can you also teach us to deploy this solution on a cloud platform as well. Deployment are being asked a lot in interviews
Great Video. Can you please let me know how you identified the limit of not uploading a single pdf exceeding 200MB in size?
Nice explanation bro. Could you please share one more video with langchain htmltotxt extract from website with google api
Awesome... first class!
Thanks Krish for the video. I have a doubt that if I upload only 1 PDF will it work or I have to upload more than 1 PDF?
Great sir, but I have a question: does coding not remain important? Is the concept what matters?
I mean, we can use your code to build this, so what creative work remains in the coding?
Thanks Krish 🙏, can we use Gemma 2b from huggingface instead of Gemini Pro?
As for poc it will be cost-effective.
Very Good information bro.
Can you help.. if it is header-based information, can we take it from PDF to Excel?
thank you very much
Wow Great...
And also will it be able to give a reference where my answer is referring to and if a question is asked from multiple pdfs ... Will be able to give reference page number and also pdf name properly?? Let me know if u have tested that
Thank you Sir for sharing this with us.
Can we use PandasAI for reading .CSV files, pls confirm
Hi Krish. Just want to know about a Q&A chatbot. I have data about insurance in PDF files. The data I have has the same columns and rows, but the insurance cost is different in each PDF file. I want to build a Q&A chatbot which will extract the PDF file name along with the insurance types available in that PDF, and then display a drop-down box or something to select which type of insurance you need, along with the cost. Can you help me out with this?
Hi Krish, Thanks for the wonderful video. i could implement this by following your video. However, I am not getting the correct results when I try to analyze pdf file with form fields like checkboxes and radio buttons. I tried using OCR reading etc but still, it is not giving me the correct result. could you please recommend how to deal with pdf with from fields etc?
Hi Krish, can you make a video on rag application using llm models locally by using cpu and gpu as well.
Thanks!
Krish, great job. One question. How can this app return images from your attention is all you need paper. I would like to return the encoder decoder architecture diagram along with the steps mentioned in the doc. Can you please make a video for that?
Please did you find any solution for that? I'm working on something similar and I want to extract images and interact with them
any idea, that chatbot can display graphical or any kind of comparison graphs?
Hello sir. I have a doubt at 20:33: what does the variable "context" in the prompt template contain? We later use "input_docs" in the conversational chain, but "context" is not used. Can you please clarify?
hello krish,
can we use different types file format for this project
New subscriber here!
can we use amazon titan multimodal embeddings to get text alongside images as output from the PDF and docx files.....
Can we also get the difference between two documents using any utility in LLM?
Good day greetings
Hey Krish can you create a video on RAG
What happens with index.faiss after running several times with different files? Let's suppose I update a file and delete a paragraph, this information will be kept available yet for new questions? Do I have to delete the index.faiss?
Sir, if you don't mind, can you write the same program using gemini pro, but reading the PDF file from data base. For example, firebase.
Please make the same for video: import a video and it gives a description.
How to use gemini-pro-vision model on textual data
Hello sir, when I run this code I am getting a TypeError: GoogleGenerativeAIEmbeddings is not callable
How to implement chat history/ conversation history when using FastApi for every request to get response?
Thanks very much. How can I run it in Colab?
I am getting "Answer not available in context"; it's not properly reading and storing the vector embeddings, it seems😕
Yes, even me. I think this comment was almost 2 months ago; I am trying to implement it now and I am getting the same error. Did you figure out how to solve this issue or the error?
I know you might be busy but a 2 word reply would mean a lot.
You are putting crores of content on YouTube; my only request is that if you bring it all in Hindi, I will be very grateful.
If you can't understand this in English, this video has no use for you😂
Does this code still work? I'm having some issues with the chunk function expecting a string, but even after converting the input to a string it doesn't work.
Hello sir, my business problem is a bit tricky: how will I also get the page number along with the answer to a question from the given pdf? Can you please help me?
What if we ask a completely different question which is not present in the pdf? I want to check the similarity.
I am getting error: {'output_text': 'This context does not mention anything about Hierarchical Vision Transformers, so I cannot answer this question from the provided context.'}
Hello sir, actually i am getting attribute error: bytes object has no attribute seek
Can anyone pls say what are the prerequisites for this project?
Sir, how can I extend this to a chatbot which takes an input pdf of 15/50 pages containing images, tables and text content, using langchain?
Can we extract images from pdf files as well?
Hello Krish need your help for one of my poc project:
i have been asked to build poc on document search for different types of files like pdf, docx and excel:
i have used streamlit and built a simple ui application using the google ai api and gemini pro..
now i have to implement below functionality:
"Should have design & approach of the solution
Two types of user to use this document search bot.
Admin & Normal User.
Admin should be able to upload and delete documents.
Both user should be able to ask query
The response should match the query asked
Have test cases created for validation"
please help me out with the source code how to start and build it
Hi Krish, I have to build a chatbot for a client on domain-specific data. Can I use all of this for that? But when I look at Google Gemini or GPT, they charge money for it. So exactly how should I build the bot on large textual data for a client?
I am guessing your best bet to keep this data private is to host a local LLM (maybe set up an endpoint with LMStudio) and see how it works. Mistral 7 does a good job for me.
I don't know for what purpose, but every time I look there is a library called tiktoken imported into the project that is never used. Why is that?
Can we fetch image as well from pdf using this project
Sir please also make with js
Finally I can get documents to just shut up sometimes, you know?
Hey hi @krishnaik, can it also give a proper response with a clear output format with around 40 pdfs?
Yes it can be any number
How to fetch image and text from the pdf? Can u make one more video on this?
Sir, I am getting an error: ValidationError: 1 validation error for LLMChain llm Can't instantiate abstract class BaseLanguageModel with abstract methods agenerate_prompt, apredict, apredict_messages, generate_prompt, predict, predict_messages (type=type_error). I tried different methods but I am unable to solve this error; can you please kindly help with this?
Since I have taken this up for my 6th sem project.
me too plzz help me
Hi Friend. I think there's something wrong with the code. I always get "answer is not available in the context" whenever I try to chat with a pdf
If I write "good morning" to the chat, the answer should be displayed like: "Good morning. How can I help you..."
Hey Krish, I am getting a bit of a problem. I've tried from 2 accounts and it's giving me a 400 error "API key not valid", but I even made another account for it.
Please help me out
Instead of Streamlit, can I use Flask?
yes u can
I have this use-case where there are different types of documents. I can parse documents using document loaders using langchain. But, there are images also in these documents. I want to store them as metadata and if answer generated from a context chunk it show the image also. Please help.
Hey, If you find the answer please let me know.
Help please. How to add conversation memory here?
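One possible approach (not from the video): attach a ConversationBufferMemory to the QA chain so earlier turns are replayed into the prompt. The names below are illustrative; for this to work, your prompt template must include a {chat_history} variable, and input_key keeps the retrieved documents out of the stored history:

```python
def build_chain_with_memory(model, prompt):
    # Imports deferred; assumes the classic langchain API used in the video.
    from langchain.chains.question_answering import load_qa_chain
    from langchain.memory import ConversationBufferMemory

    memory = ConversationBufferMemory(
        memory_key="chat_history",  # your prompt must contain {chat_history}
        input_key="question",       # keep "input_documents" out of memory
    )
    return load_qa_chain(model, chain_type="stuff", prompt=prompt, memory=memory)
```

This is a sketch under the assumption that the stuff chain accepts a memory object; a more robust route in newer LangChain versions is a dedicated conversational retrieval chain.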
how do I write all these commands in pip
ValueError: The de-serialization relies loading a pickle file. Pickle files can be modified to deliver a malicious payload that results in execution of arbitrary code on your machine.You will need to set `allow_dangerous_deserialization` to `True` to enable deserialization. If you do this, make sure that you trust the source of the data. For example, if you are loading a file that you created, and no that no one else has modified the file, then this is safe to do. Do not set this to `True` if you are loading a file from an untrusted source (e.g., some random site on the internet.).
How to resolve this error
just set the parameter allow_dangerous_deserialization = True i.e. FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
Bro same here you got it solved? Can you help?
@@srihbalaji7272 Change this line new_db = FAISS.load_local("faiss_index", embeddings)
to new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
@@srihbalaji7272
In the function user_input(), replace the line and add allow_dangerous_deserialization=True
#Code
new_db = FAISS.load_local("faiss_index_react", embeddings, allow_dangerous_deserialization=True)
Sir, I'm getting an error: TypeError: 'GoogleGenerativeAIEmbeddings' object is not callable. Can you do something?
I'm also getting the same error; has anyone fixed this issue?
update langchain
mine is fixed
After updating langchain, I still get the same error. It looks like the google_api_key became "None"?
: `embedding_function` is expected to be an Embeddings object, support for passing in a function will soon be removed.
vector_store:
Google Generative AI embeddings: model='models/embedding-001' task_type=None google_api_key=None client_options=None transport=None
`embedding_function` is expected to be an Embeddings object, support for passing in a function will soon be removed.
perform Similarity_Search for user_question: provide a detailed summary of Multi-Head Attention
2024-03-11 22:46:57.585 Uncaught app exception
Traceback (most recent call last):
File "/home/peter/anaconda3/envs/tf/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "/home/peter/AIU/AIU_CS515_LLM/020_Gemini/002_Code/pdf_chatbot.py", line 108, in <module>
main()
File "/home/peter/AIU/AIU_CS515_LLM/020_Gemini/002_Code/pdf_chatbot.py", line 93, in main
user_input(user_question)
File "/home/peter/AIU/AIU_CS515_LLM/020_Gemini/002_Code/pdf_chatbot.py", line 71, in user_input
docs = new_db.similarity_search(user_question)
File "/home/peter/anaconda3/envs/tf/lib/python3.10/site-packages/langchain/vectorstores/faiss.py", line 509, in similarity_search
docs_and_scores = self.similarity_search_with_score(
File "/home/peter/anaconda3/envs/tf/lib/python3.10/site-packages/langchain/vectorstores/faiss.py", line 390, in similarity_search_with_score
embedding = self._embed_query(query)
File "/home/peter/anaconda3/envs/tf/lib/python3.10/site-packages/langchain/vectorstores/faiss.py", line 155, in _embed_query
return self.embedding_function(text)
TypeError: 'GoogleGenerativeAIEmbeddings' object is not callable
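The traceback above shows langchain's own faiss.py calling self.embedding_function(text), i.e. an old langchain version that expects a plain function rather than an Embeddings object. As other commenters note, upgrading usually fixes it (pip install -U langchain langchain-community langchain-google-genai). The log also shows google_api_key=None, so passing the key explicitly may help. A hedged sketch, assuming those packages are installed:

```python
def load_index_with_embeddings(api_key, path="faiss_index"):
    # Imports deferred so the sketch runs as-is once the packages exist.
    from langchain_community.vectorstores import FAISS
    from langchain_google_genai import GoogleGenerativeAIEmbeddings

    # Pass the key explicitly; google_api_key=None in the log above
    # suggests the GOOGLE_API_KEY environment variable was not picked up.
    embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001", google_api_key=api_key
    )
    # Newer langchain-community accepts the Embeddings object directly.
    return FAISS.load_local(path, embeddings, allow_dangerous_deserialization=True)
```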
How could I print an answer? or a set of answers?
How to check efficiency of this model..
hello sir, after doing this I am not getting the answer properly. I get this when I upload a pdf file:
{'output_text': 'answer is not available in the context'}
First, try this: increase the chunk size to 10000 in the character splitter.
If that doesn't help, try changing
response = chain(
{"input_documents": docs, "question": query}
, return_only_outputs=True)
to
response = chain.invoke(
{"input_documents": docs, "question": query}
, return_only_outputs=True).
I just wanted to let you know that this worked for me.
thanks for sharing and for the good tutorial.... but why do I always get the reply "the answer is not available in the context", even though I'm sure I uploaded PDFs?
initially i got the answer but later I am getting similar answers
Check out my project DocCuddle as it was purely inspired by this tutorial, and I believe there is room to enhance it with additional features, such as a video summarizer or other functionalities so let me know what else can I add. Developing this project was truly enjoyable, especially while watching Krish Naik's tutorials. Thanks sir!
is this example RAG?
I was trying to build an MCQ generator, but one problem I am getting is that it can only generate 3 MCQ questions. Is there any solution to generate more than 3 MCQs at once, or is there a better way to do it?
Even I'm trying to do the same thing; could you please help me out with the code?
I have selected this idea as my final year project
After the Gemini update, it constantly gives the error "The answer you are looking for in the text could not be found"
did you find any solution for this?
Can you let us know where to set safety settings to use BLOCK_NONE as threshold?
it is still not implemented by langchain
Hi Krish, while following the video I have faced an error: 1 validation error for LLMChain llm Can't instantiate abstract class BaseLanguageModel with abstract methods agenerate_prompt, apredict, apredict_messages, generate_prompt, predict, predict_messages (type=type_error). How can we fix it?
same here, did you find a solution?
I am also facing the same issue.
@krishnaik06
Yes, I am also facing the same problem here, please suggest a solution.
same here, did you find a solution?
I got the solution: we need to update the langchain library. I am using langchain 0.1.0.
Hello - I am getting the following error while running the program shared in git - plz advise: "ValueError: The de-serialization relies loading a pickle file. Pickle files can be modified to deliver a malicious payload that results in execution of arbitrary code on your machine.You will need to set `allow_dangerous_deserialization` to `True` to enable deserialization. If you do this, make sure that you trust the source of the data. For example, if you are loading a file that you created, and no that no one else has modified the file, then this is safe to do. Do not set this to `True` if you are loading a file from an untrusted source (e.g., some random site on the internet.)."
Hey, this was all working for me for the last 2 days, but then I upgraded streamlit and now the 'allow_dangerous_deserialization' flag is not required, but I am facing an error - "TypeError: 'GoogleGenerativeAIEmbeddings' object is not callable". Can you help to resolve the error?
@@harishsharma9466 Did you resolve the issue?
hey getting same error, were you able to find a resolution?
just set the parameter allow_dangerous_deserialization = True i.e. FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
You got solution?
Hi Krish, I am facing a challenge with FAISS on my Mac. Even though I installed it, I still get the error message: ImportError: Could not import faiss python package. Please install it with `pip install faiss-gpu` (for CUDA supported GPU) or `pip install faiss-cpu` (depending on Python version). I tried various ways but nothing is working. Do you know a solution for it?
I had similar issue and I did in a virtual env and it worked. Use python 3.11
api key is free??
I used your code, but the response is {'output_text': 'Answer is not available in the context'} even with the proper context. Can you help with the proper response?
The same error.... no solution yet. Did you find a solution for the problem?
First, try this: increase the chunk size to 10000 in the character splitter.
If that doesn't help, try changing
response = chain(
{"input_documents": docs, "question": query}
, return_only_outputs=True)
to
response = chain.invoke(
{"input_documents": docs, "question": query}
, return_only_outputs=True).
I just wanted to let you know that this worked for me.
@@shubhamrathod9969 I tried the code shown in the video and got
IndexError: list index out of range
Can you help me with this.
But where have we used the word query @@shubhamrathod9969
How can I deploy this on AWS
does this tutorial still work?
ValidationError: 1 validation error for LLMChain llm Can't instantiate abstract class BaseLanguageModel with abstract methods agenerate_prompt, apredict, apredict_messages, generate_prompt, invoke, predict, predict_messages (type=type_error) what is this error can anyone help
You need to update langchain in order to fix it.
Hello sir
I am getting error
ValueError: The de-serialization relies loading a pickle file. Pickle files can be modified to deliver a malicious payload that results in execution of arbitrary code on your machine.You will need to set allow_dangerous_deserialization
What to do sir
Change code into
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
It worked for me, thanks!@@kanth2223
ValidationError: 1 validation error for LLMChain llm Can't instantiate abstract class BaseLanguageModel with abstract methods agenerate_prompt, apredict, apredict_messages, generate_prompt, predict, predict_messages (type=type_error) .I am getting this error while running the code.
same, you got solution ?
@@ravikantjain Please check the environment you are working in. That might be the issue.