Nice explanation. Keep going🎉
Nice video. Would be nice to go into a few more examples/use cases to more strongly illustrate why multimodal RAG is useful
I see that the images are extracted to an output location, but what happens if a question relates to a specific image and its title or the surrounding chunk of data? Is it able to provide that info? Is there a missing link between the description of an image and its related text context in the document? Where in the code is the relation between an image description and its chunk of text, title, or table summary mapped?
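In the usual LangChain MultiVectorRetriever setup for this kind of multimodal RAG, that link is made through an id stored in the summary's metadata. A minimal sketch, assuming vectorstore, image_summaries and original_images already exist from the extraction step; the key name "doc_id" is just a convention:

import uuid
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document

id_key = "doc_id"
retriever = MultiVectorRetriever(vectorstore=vectorstore, docstore=InMemoryStore(), id_key=id_key)

doc_ids = [str(uuid.uuid4()) for _ in image_summaries]
# The summaries go into the vectorstore, and each carries the id of its original content.
summary_docs = [Document(page_content=s, metadata={id_key: doc_ids[i]}) for i, s in enumerate(image_summaries)]
retriever.vectorstore.add_documents(summary_docs)
# The original images (or text chunks / tables) live in the docstore under the same ids,
# so a hit on a summary returns the original content at query time.
retriever.docstore.mset(list(zip(doc_ids, original_images)))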
Can't we also get the image back if we use a vision model in the chain?
Currently not, at least not the last time I checked.
How can I use Pinecone instead of Chroma here?
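A minimal sketch of the swap, assuming an existing Pinecone index and the langchain-pinecone package; the index name is a placeholder, and the rest of the setup (summaries, docstore, MultiVectorRetriever) stays the same, only the vectorstore changes:

from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

vectorstore = PineconeVectorStore(
    index_name="multimodal-rag",   # placeholder; PINECONE_API_KEY is read from the environment
    embedding=OpenAIEmbeddings(),
)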
Nicely explained, subscribed 🎉
Nice video! Is there some way to retrieve the metadata as well with the multivector retriever? Such as page number or file name?
Yes, sure, you have access to the metadata attribute of the documents and can use it however you want. If you struggle with that, maybe watch my LCEL Crashcourse on this channel :)
@@codingcrashcourses8533 Sorry, I was being imprecise. I mean retrieving metadata from the docstore! Is that also possible?
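If you put full Document objects (rather than raw strings) into the docstore, their metadata comes back with the retrieval. A small sketch, assuming retriever, doc_id and chunk_text from the existing setup; the metadata values are made up:

from langchain_core.documents import Document

retriever.docstore.mset([
    (doc_id, Document(page_content=chunk_text, metadata={"source": "report.pdf", "page": 3}))
])
docs = retriever.invoke("What does the chart on page 3 show?")
for doc in docs:
    print(doc.metadata)  # {'source': 'report.pdf', 'page': 3}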
How about .doc and .docx files with images and tables? Is converting them to PDF the only way?
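Converting to PDF is not the only option: unstructured also ships a partitioner for Word files. A minimal sketch (the file name is a placeholder, and how richly tables and embedded images come out depends on the document and the unstructured version):

from unstructured.partition.docx import partition_docx

elements = partition_docx(filename="report.docx")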
Please also use a free model.
With an LLM like Llama 3.
@@김한승-n1k Sorry, I will not use open-source models. The small models are weak and the OpenAI models are really cheap. The new mini model costs me about 1 dollar per month.
Hey, I had an issue while running your code on another PDF document. It gives me this error: TesseractError: (1, 'Image too large: (3698, 34653) Error during processing.') It seems that Tesseract has an upper bound on the image size. I think the solution is to resize the image, but I don't know how to do that while it is being extracted from the PDF inside the partition_pdf function.
Do you know how to resolve it?
I resized images with Pillow in the past:
from PIL import Image
img = Image.open(image_path)  # path to the extracted image
new_size = (width, height)  # set the desired width and height
resized_img = img.resize(new_size, Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the replacement
ChatGPT will easily write that code for you ^^
Can we show the images in the response along with the relevant text, based on the prompt passed?
Yes, but I would probably do that differently, maybe with a different embedding model. To be honest, I don't have a good idea out of the box.
@AdarshMamidpelliwar I want to do the same thing. Did you find out how this is possible?
@@vivekpatel2736 I was able to do it.
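One hedged way to approach this, assuming the docstore holds the raw base64 image strings as in the usual multi-vector setup: return the retrieved docs alongside the generated answer and display whatever decodes as an image.

import base64, io
from PIL import Image

docs = retriever.invoke(question)
for d in docs:
    content = d if isinstance(d, str) else d.page_content
    try:
        Image.open(io.BytesIO(base64.b64decode(content))).show()  # content was a base64 image
    except Exception:
        print(content)  # plain text chunk or table summary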
Will this output images along with text as well?
No, GPT-4 only outputs text. You can pass the output to dall-e-3.
Why did you use chain.invoke and not .run, .apply, or .batch? Sometimes in your videos you use run and sometimes invoke. How do you know when to use which, and what's the difference?
I thought about using batch and think it's probably better, but I tried to keep it simple and just use a loop for every call.
The difference between run and invoke is the chain type. In my newer videos I try to use only the LangChain Expression Language, and invoke is the implementation of the Runnable interface, while run is the implementation of the (deprecated) Chain interface.
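A small illustration, assuming chain is an LCEL chain (e.g. prompt | model | parser) and the input dict matches its prompt variables:

answer = chain.invoke({"question": "What does the table on page 3 show?"})  # one input, one output
answers = chain.batch([{"question": q} for q in questions])                 # many inputs, processed concurrently
# chain.run(...) belongs to the old, deprecated Chain interface.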
Nicely explained and nice information as always, but I have a question: my files are stored in Azure Blob Storage and I am getting them through a blob loader. Does the multimodal approach work with them?
I don't know to be honest, but I think it should be possible. If not, maybe try to get the files directly with the Azure SDK.
@@codingcrashcourses8533 As always, thanks for replying to my comments, my mentor.
How do I store the created vector store locally so I can use it again later?
FAISS and Chroma offer methods to do that. You will find them in the LangChain docs.
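Minimal sketches of both options; the directory names are placeholders and embeddings / texts are assumed to already exist:

from langchain_community.vectorstores import Chroma, FAISS

# Chroma: pass a persist_directory and the collection is written to disk automatically.
vectorstore = Chroma(collection_name="summaries", embedding_function=embeddings, persist_directory="./chroma_db")

# FAISS: save and load explicitly.
db = FAISS.from_texts(texts, embeddings)
db.save_local("./faiss_index")
db = FAISS.load_local("./faiss_index", embeddings, allow_dangerous_deserialization=True)  # flag required by newer LangChain versions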
Hello. I want to run this code on Linux. Do I have to install Tesseract? What does it do? I deleted the relevant statements and found that the code then reports an error.
Tesseract is an OCR library. Read here: github.com/tesseract-ocr/tesseract . On Linux it's very easy to install.
@@codingcrashcourses8533 Thanks for the reply. Is there anything to change in the code on Linux? For example, should I delete this line: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe', or change it to another statement?
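On Linux that Windows path is not needed: after installing Tesseract through the package manager the binary is normally on the PATH, so the line can simply be removed, or pointed at the Linux binary. A small sketch, assuming a standard apt install:

# sudo apt-get install tesseract-ocr
import shutil
import pytesseract

pytesseract.pytesseract.tesseract_cmd = shutil.which("tesseract") or "/usr/bin/tesseract"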
Thanks for the vid! Subscribed :)
Is it not easier now to just convert the PDF to a set of high-fidelity images and then get an LLM with vision to review those images? It can return all the text, e.g. stored in SQLite, while keeping the context of the text / tables / embedded images etc., and then you mine the returned text...? I find that Tesseract is just unreliable enough to be dangerous (!)
@@hBenDg I also thought about that, but I guess you might lose some information by doing it this way.
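A rough sketch of that approach, assuming pdf2image (which needs poppler installed) and an OpenAI vision-capable model; the file name, model name and prompt are placeholders:

import base64, io
from pdf2image import convert_from_path
from openai import OpenAI

client = OpenAI()
pages = convert_from_path("document.pdf", dpi=200)  # one PIL image per page
for page in pages:
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder vision-capable model
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Transcribe all text, tables and figure captions on this page."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
    )
    print(response.choices[0].message.content)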
Will this partitioning part work on Azure? How do you read a PDF from a storage container?
I have not tried this yet. I would use the Azure SDK, but I'm not sure if that works the same as reading the file from the local filesystem.
@@codingcrashcourses8533 PDFs stored as blobs on Azure are different from reading locally. I tried using LangChain but was not able to read them. I then used pypdf to read the PDF as a streaming object.
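A sketch of that route, assuming the azure-storage-blob package; the account URL, container, blob name and credential are placeholders:

import io
from azure.storage.blob import BlobClient
from pypdf import PdfReader

blob = BlobClient(
    account_url="https://myaccount.blob.core.windows.net",  # placeholder
    container_name="docs", blob_name="report.pdf", credential=credential,
)
pdf_bytes = io.BytesIO(blob.download_blob().readall())  # load the blob into memory
reader = PdfReader(pdf_bytes)  # pypdf reads the in-memory stream
# unstructured's partition_pdf also accepts a file-like object via its file= argument:
# elements = partition_pdf(file=pdf_bytes, strategy="hi_res", extract_images_in_pdf=True)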
Can you please share the notebook?
Code is in the description
Hi Markus, I am having a problem downloading Tesseract, the download is really slow. Do you have another link for Tesseract?
digi.bib.uni-mannheim.de/tesseract/ Hello Zaid, this is another link I used before. Hope that helps! Best regards
@@codingcrashcourses8533 Thanks Markus!!
😊
thank you
thanks❤