Multimodal RAG with GPT-4-Vision and LangChain | Retrieval with Images, Tables and Text

  • Published: 12 Dec 2024

Comments • 46

  • @naveenkumar-ik8bx • 1 month ago

    Nice explanation. Keep going🎉

  • @nmstoker • 6 months ago +1

    Nice video. Would be nice to go into a few more examples/use cases to more strongly illustrate why multimodal RAG is useful

  • @KushwanthK • 12 days ago

    I see that the images are extracted to an output location, but what happens if a question relates to a specific image and its title or surrounding chunk of text? Can it still provide that info? Is there a missing link between an image's description and its textual context within the document? Where in the code is the relation mapped between an image description and its chunk of text, title, or table summaries?
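
    For reference, in the multi-vector retriever setup this video uses (also discussed in other comments), the mapping the question asks about is the id_key: every summary stored in the vector store carries a doc_id pointing back to the original element (text chunk, table, or base64 image) in the docstore. A minimal, self-contained sketch; the collection name and content strings are made up:

    import uuid

    from langchain.retrievers.multi_vector import MultiVectorRetriever
    from langchain.storage import InMemoryStore
    from langchain_community.vectorstores import Chroma
    from langchain_core.documents import Document
    from langchain_openai import OpenAIEmbeddings

    id_key = "doc_id"
    retriever = MultiVectorRetriever(
        vectorstore=Chroma(collection_name="summaries",
                           embedding_function=OpenAIEmbeddings()),
        docstore=InMemoryStore(),
        id_key=id_key,
    )

    # Placeholder data: one image summary plus its raw base64 image.
    image_summaries = ["Bar chart of quarterly revenue by region."]
    images_b64 = ["<base64-encoded image bytes>"]
    img_ids = [str(uuid.uuid4()) for _ in image_summaries]

    # Only the summaries are embedded and searched ...
    retriever.vectorstore.add_documents([
        Document(page_content=summary, metadata={id_key: img_ids[i]})
        for i, summary in enumerate(image_summaries)
    ])
    # ... while the raw element sits in the docstore under the same id,
    # so a hit on a summary returns the original image/text/table.
    retriever.docstore.mset(list(zip(img_ids, images_b64)))

    docs = retriever.invoke("show me the revenue chart")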

  • @akmr0079 • 9 months ago +2

    Can't we also get the image back if we use a vision model in the chain?

  • @arishasaeed • 6 months ago +1

    How can I use Pinecone instead of Chroma here?
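
    One possible route, as a sketch (the index name is a placeholder; requires the langchain-pinecone package and a PINECONE_API_KEY):

    from langchain_openai import OpenAIEmbeddings
    from langchain_pinecone import PineconeVectorStore

    vectorstore = PineconeVectorStore(
        index_name="multimodal-rag",  # hypothetical index name
        embedding=OpenAIEmbeddings(),
    )
    # The MultiVectorRetriever only needs *some* vector store for the
    # summaries, so the rest of the pipeline stays unchanged.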

  • @Slimshady68356 • 1 year ago

    Nicely explained, subscribed 🎉

  • @lucasrichter4827 • 5 months ago +1

    Nice video! Is there some way to retrieve the metadata as well with the multivector retriever? Such as page number or file name?

    • @codingcrashcourses8533 • 5 months ago +1

      Yes, sure, you have access to the metadata attribute of the documents and can use it however you want. If you struggle with that, maybe watch my LCEL crash course on this channel :)

    • @lucasrichter4827 • 5 months ago

      @@codingcrashcourses8533 Sorry, I was being imprecise. I mean retrieving metadata from the docstore! Is that also possible?
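
      To the follow-up: yes, provided the docstore holds Document objects rather than plain strings, their metadata comes back with them. A short sketch with made-up values:

      from langchain.storage import InMemoryStore
      from langchain_core.documents import Document

      store = InMemoryStore()
      # Store full Documents so metadata travels along with the content.
      store.mset([("id-1", Document(
          page_content="Full chunk text ...",
          metadata={"source": "report.pdf", "page_number": 7},  # made up
      ))])

      # The MultiVectorRetriever hands back exactly these stored objects,
      # so doc.metadata is available after retrieval; the docstore can
      # also be queried directly by id:
      doc = store.mget(["id-1"])[0]
      print(doc.metadata["page_number"])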

  • @robertbai2237 • 6 months ago +1

    How about .doc and .docx files with images and tables? Is converting to PDF the only way?
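
    For what it's worth, unstructured ships a dedicated docx partitioner, so converting to PDF is not the only way. A sketch (the file name is a placeholder; untested against the video's exact pipeline):

    from unstructured.partition.docx import partition_docx

    elements = partition_docx(filename="report.docx")
    # Table elements expose an HTML rendering of the table:
    tables = [el for el in elements if el.category == "Table"]
    print(tables[0].metadata.text_as_html)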

  • @김한승-n1k • 4 months ago +3

    Please also use a free model, e.g. an LLM like Llama 3.

    • @codingcrashcourses8533 • 4 months ago +2

      @@김한승-n1k Sorry, I will not use open-source models. The small models are weak, and the OpenAI models are really cheap. The new mini model costs me about 1 dollar per month.

  • @hb87594 • 2 months ago

    Hey, I had an issue while running your code on another PDF document. It gives me this error: TesseractError: (1, 'Image too large: (3698, 34653) Error during processing.') It seems that Tesseract has an upper bound on the image size. I think the solution is to resize the image, but I don't know how to do that while it is being extracted from the PDF inside the partition_pdf function.
    Do you know how to resolve it?

    • @codingcrashcourses8533 • 2 months ago

      I resized images with Pillow in the past:
      from PIL import Image
      img = Image.open("extracted_image.png")  # path is a placeholder
      new_size = (width, height)  # set the desired width and height
      resized_img = img.resize(new_size, Image.LANCZOS)  # ANTIALIAS was removed in Pillow 10
      ChatGPT will easily write that code for you ^^
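
      Building on that reply, a sketch that walks a folder of extracted images and downscales anything over Tesseract's per-side limit (the folder name is an assumption; this only helps once the images are on disk, i.e. after extraction rather than inside partition_pdf itself):

      import os

      from PIL import Image

      MAX_SIDE = 32766  # Tesseract rejects sides > 32767 px; 34653 above fails

      for name in os.listdir("extracted_images"):
          path = os.path.join("extracted_images", name)
          img = Image.open(path)
          if max(img.size) > MAX_SIDE:
              scale = MAX_SIDE / max(img.size)
              new_size = (int(img.width * scale), int(img.height * scale))
              resized = img.resize(new_size, Image.LANCZOS)
              img.close()
              resized.save(path)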

  • @AdarshMamidpelliwar • 9 months ago

    Can we show the images in the response, along with the relevant text, based on the prompt passed?

    • @codingcrashcourses8533 • 9 months ago

      Yes, but I would probably do that differently, maybe with a different embedding model. To be honest, I don't have a good idea out of the box.

    • @vivekpatel2736 • 6 months ago

      @AdarshMamidpelliwar I want to do the same thing. Did you find out how this is possible?

    • @AdarshMamidpelliwar • 6 months ago

      @@vivekpatel2736 I was able to do it.
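
      One common pattern, as a sketch (not what the video does): because the docstore can hold raw base64 images, you can split what the retriever returns into images and text and render both in the response:

      import base64

      def split_images_and_texts(docs):
          """Heuristically separate base64 images from text documents."""
          images, texts = [], []
          for doc in docs:
              content = doc if isinstance(doc, str) else doc.page_content
              try:
                  base64.b64decode(content, validate=True)
                  images.append(content)  # decodes cleanly: treat as image
              except Exception:
                  texts.append(content)
          return images, texts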

  • @muzammilnizamani8585 • 1 year ago +1

    Will this output images along with the text as well?

  • @yazanrisheh5127 • 1 year ago +1

    Why did you use chain.invoke and not .run, .apply, or .batch? Sometimes in your videos you use run and sometimes invoke. How do you know which to use when, and what's the difference?

    • @codingcrashcourses8533 • 1 year ago +1

      I thought about using batch and think it's probably better, but I tried to keep it simple and just used a loop for every call.
      The difference between run and invoke is the kind of chain. I try to use the LangChain Expression Language (LCEL) only in my newer videos: invoke is the implementation of the Runnable interface, while run is the implementation of the (deprecated) Chain interface.
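
      For illustration, every LCEL chain gets invoke and batch from the Runnable interface (a sketch; the model name is just an example):

      from langchain_core.output_parsers import StrOutputParser
      from langchain_core.prompts import ChatPromptTemplate
      from langchain_openai import ChatOpenAI

      chain = (
          ChatPromptTemplate.from_template("Summarize: {text}")
          | ChatOpenAI(model="gpt-4o-mini")
          | StrOutputParser()
      )

      texts = ["first chunk ...", "second chunk ..."]
      one = chain.invoke({"text": texts[0]})            # single input
      many = chain.batch([{"text": t} for t in texts])  # many, run concurrently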

  • @chakerayachi8468 • 7 months ago

    Nicely explained and nice information as always, but I have a question: my files are stored in Azure Blob Storage and I am getting them through a blob loader. Does the multimodal approach work with them?

    • @codingcrashcourses8533 • 7 months ago +1

      I don't know, to be honest, but I think it should be possible. If not, maybe try to get the files directly with the Azure SDK.

    • @chakerayachi8468 • 7 months ago

      @@codingcrashcourses8533 As always, thanks for replying to my comments, my mentor.

  • @egitaufiqnoor3612 • 7 months ago

    How do I store the created vectors locally, so I can use them again later?

    • @codingcrashcourses8533 • 7 months ago +1

      FAISS and Chroma offer methods to do that. You will find them in the LangChain docs.
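
      A Chroma sketch (the directory name and texts are placeholders): passing persist_directory writes the index to disk so it can be reloaded later.

      from langchain_community.vectorstores import Chroma
      from langchain_openai import OpenAIEmbeddings

      db = Chroma.from_texts(
          ["some text"], OpenAIEmbeddings(), persist_directory="./chroma_db"
      )

      # Later, load the same index from disk:
      db = Chroma(
          persist_directory="./chroma_db",
          embedding_function=OpenAIEmbeddings(),
      )
      # FAISS offers db.save_local(...) / FAISS.load_local(...) instead.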

  • @TaugenichtsRichtiger • 3 months ago

    Hello. I want to run this code on Linux. Do I have to download Tesseract? What does it do? I deleted the relevant statements and found that the code then reports an error.

    • @codingcrashcourses8533 • 3 months ago +2

      Tesseract is an OCR library. Read here: github.com/tesseract-ocr/tesseract. On Linux it's very easy to install.

    • @TaugenichtsRichtiger • 3 months ago

      @@codingcrashcourses8533 Thanks for the reply. Is there anything to change in the code on Linux? For example, should I delete this "pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'" or change it to another statement?
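
      One tidy option, as a sketch: keep the Windows path but only apply it on Windows. On Linux, `sudo apt install tesseract-ocr` puts the binary on PATH, so pytesseract finds it without any tesseract_cmd line.

      import platform

      import pytesseract

      if platform.system() == "Windows":
          pytesseract.pytesseract.tesseract_cmd = (
              r"C:\Program Files\Tesseract-OCR\tesseract.exe"
          )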

  • @hBenDg • 2 months ago

    Thanks for the vid! Subscribed :)
    Isn't it easier now to just convert the PDF to a set of high-fidelity images, then get an LLM with vision to review those images? It could return all the text (e.g. stored in SQLite) while keeping the context of the text / tables / embedded images etc., and then you mine the returned text…? I find that Tesseract is just unreliable enough to be dangerous (!)

    • @codingcrashcourses8533 • 2 months ago

      @@hBenDg I also thought about it, but I guess you might lose some information by doing it this way.
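
      The commenter's alternative, as a sketch (pdf2image requires the poppler utilities; file names are placeholders):

      from pdf2image import convert_from_path

      pages = convert_from_path("document.pdf", dpi=300)
      for i, page in enumerate(pages):
          page.save(f"page_{i}.png")
      # Each PNG could then be sent to a vision model as a base64 image.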

  • @saurabhjain507 • 1 year ago

    Will this partitioning part work on Azure? How do you read a PDF from a storage container?

    • @codingcrashcourses8533 • 1 year ago

      I have not tried this yet. I would use the Azure SDK, but I'm not sure it works the same as reading the file from the local filesystem.

    • @saurabhjain507 • 1 year ago

      @@codingcrashcourses8533 PDFs stored as blobs on Azure are different from files read locally. I tried using LangChain but was not able to read them. I then used pypdf to read the PDF as a streaming object.
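
      Roughly what the commenter describes, as a sketch (the connection string, container, and blob names are placeholders):

      from io import BytesIO

      from azure.storage.blob import BlobClient
      from pypdf import PdfReader

      blob = BlobClient.from_connection_string(
          conn_str="<connection string>",
          container_name="docs",
          blob_name="report.pdf",
      )
      reader = PdfReader(BytesIO(blob.download_blob().readall()))
      print(reader.pages[0].extract_text())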

  • @ajeetojha3745 • 8 months ago

    Can you please share the notebook?

  • @Slimshady68356 • 1 year ago +1

    Hi Markus, I am having a problem downloading Tesseract; the download is really slow. Do you have another link for Tesseract?

    • @codingcrashcourses8533 • 1 year ago +2

      Hello Zaid, digi.bib.uni-mannheim.de/tesseract/ is another link I used before. Hope that helps! Best regards

    • @Slimshady68356 • 1 year ago +1

      @@codingcrashcourses8533 Thanks Markus!!
      😊

  • @micbab-vg2mu • 1 year ago

    thank you

  • @amortalbeing • 1 year ago

    thanks❤