Multimodal RAG: Chat with PDFs (Images & Tables) [latest version]

  • Published: Nov 22, 2024

Comments • 36

  • @ZevUhuru
    @ZevUhuru 1 day ago

    Bro I literally came back to get your old video on PDFs and you already have an update. Thank You!

  • @algatra6942
    @algatra6942 10 days ago +4

    Idk, I just finally found the most understandable AI explanation content. Thank you Alejandro

    • @alejandro_ao
      @alejandro_ao  9 days ago

      glad to hear this :)

    • @argl1995
      @argl1995 9 days ago

      @@alejandro_ao I want to create a multi-LLM chatbot for telecommunications. Is there a way to connect with you apart from YouTube, so that I can share the problem statement with you?

  • @whouchekin
    @whouchekin 10 days ago +2

    the best touch is when you add a front-end
    good job

    • @alejandro_ao
      @alejandro_ao  9 days ago +3

      hey! i'll add a ui for this in a coming tutorial 🤓

  • @GlebSamokhvalov
    @GlebSamokhvalov 1 day ago

    Great video! But what about big documents? What if one document exceeds the token limit? Or will the LLM just get lost in the long context?

  • @SidewaysCat
    @SidewaysCat 9 days ago +3

    Hey dude, what are you using to screen record? The mouse sizing and movement look super smooth. I'd like to create a similar style when giving tutorials.

    • @alejandro_ao
      @alejandro_ao  9 days ago

      hey there, that's the Screen Studio app for Mac, developed by the awesome Adam Pietrasiak @pie6k. check it out :)

  • @ronnie333333
    @ronnie333333 8 days ago

    Thank you for the video. Just curious, how do you go about persisting the multivector database? What data sources are available that cater to such requirements? Also, how do we go about getting an image as input from the user, so the language model can relate it to the documents and predict an answer?

  • @jaimeperezpazo
    @jaimeperezpazo 4 days ago

    Excellent!!!! Thank you Alejandro

  • @GowthamRaghavanR
    @GowthamRaghavanR 4 days ago

    Good one!! Did you see any open-source alternatives, like Markers?

  • @muhammadadilnaeem
    @muhammadadilnaeem 10 days ago +1

    Amazing Tutorial

  • @onkie.ponkie
    @onkie.ponkie 10 days ago +1

    i was about to learn from the previous video, but you, brother, just bring more gold.

  • @Diego_UG
    @Diego_UG 8 days ago

    What do you recommend, or how do you suggest automating the conversion of a PDF of images (text images) to text? The problem is that traditional OCR does not always do the job well, but ChatGPT can handle difficult images.

  • @duanxn
    @duanxn 10 days ago

    Great tutorial, very detailed. Just one question: is there any option to link the text chunk that describes the image as context for the image, to create a more accurate summary of the image?

    • @alejandro_ao
      @alejandro_ao  9 days ago

      beautiful question. totally. as you can see, the image is actually one of the `orig_elements` inside a `CompositeElement`. and the `CompositeElement` object has a property called `text`, which contains the raw text of the entire chunk. this means that instead of just extracting the image alone like i did here, you can extract the image alongside the text in its parent `CompositeElement` and send that along with the image when generating the summary. great idea 💪
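
A minimal sketch of what this reply describes, assuming `chunks` is the list of `CompositeElement` objects returned by unstructured's `partition_pdf` with chunking enabled; the helper name `images_with_context` is hypothetical:

```python
def images_with_context(chunks):
    """Pair each image's base64 payload with the raw text of its parent chunk.

    Assumes each chunk is a CompositeElement holding its original
    sub-elements in `metadata.orig_elements` and the chunk's full raw
    text in `.text`, as described in the reply above.
    """
    pairs = []
    for chunk in chunks:
        for el in getattr(chunk.metadata, "orig_elements", None) or []:
            # Image elements carry their payload in metadata.image_base64
            if type(el).__name__ == "Image":
                pairs.append((el.metadata.image_base64, chunk.text))
    return pairs
```

Each pair can then go into the multimodal summary prompt: the chunk text as written context, the image as the attached content.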

  • @Pman-i3c
    @Pman-i3c 10 days ago +1

    Very nice. Is it possible to do this with a local LLM, like an Ollama model?

    • @alejandro_ao
      @alejandro_ao  10 days ago +2

      Yes, absolutely. just use the langchain ollama integration and change the line of code where i use ChatOpenAI or ChatGroq. Be sure to select multimodal models when dealing with images though
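
A rough sketch of that swap. The model name `"llava"` is an assumption, any multimodal model pulled into Ollama works; only the model line changes, while the multimodal message shape stays the same:

```python
# Swap the hosted model for a local one (sketch, not the video's exact code):
#
#   from langchain_ollama import ChatOllama
#   model = ChatOllama(model="llava")  # assumed multimodal model name
#
# The message format with a base64-encoded image is unchanged:
def build_image_message(question, image_b64):
    """LangChain-style multimodal user message: text part + base64 image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
        ],
    }
```

The same message dict can be passed to `model.invoke([...])` regardless of whether the model behind it is ChatOpenAI, ChatGroq, or a local Ollama model.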

  • @olexiypukhov-KT
    @olexiypukhov-KT 10 days ago +2

    You should look into llamaparse rather than unstructured. The amount of content I've indexed into the vector db would have taken 15 days with unstructured; with llamaparse it only takes a few hours. Plus, you can make the API calls async as well.

    • @alejandro_ao
      @alejandro_ao  9 days ago +1

      i LOVE llamaparse. i'll make a video about it this month

    • @daniellopez8078
      @daniellopez8078 1 day ago

      Do you know if unstructured is open-source (meaning free)? Do you know any other free alternative to unstructured?

    • @olexiypukhov-KT
      @olexiypukhov-KT 1 day ago

      @@daniellopez8078 Unstructured is free but it's slow. It's the default with LangChain. LlamaParse offers a free plan, which gives you 1,000 free pages to parse daily.

    • @olexiypukhov-KT
      @olexiypukhov-KT 3 hours ago

      @@daniellopez8078 Unstructured is free. They have an open-source version and a proprietary version. The proprietary version is paid, and apparently offers better quality. The free Unstructured is slow. LlamaParse is fast, and it gives you 1,000 pages free per day.

  • @julianomoraisbarbosa
    @julianomoraisbarbosa 9 days ago

    # til
    thanks for your video.
    is it possible to use crewAI in the same example?

  • @alexramos587
    @alexramos587 10 days ago +1

    Nice

  • @blakchos
    @blakchos 2 days ago

    any idea how to install poppler, tesseract, and libmagic on a Windows machine?

  • @AkashL-y9q
    @AkashL-y9q 10 days ago +1

    Hi Bro, can you create a video for Multimodal RAG: Chat with video visuals and dialogues?

    • @alejandro_ao
      @alejandro_ao  10 days ago +3

      this sounds cool! i’ll make a video about it!

    • @AkashL-y9q
      @AkashL-y9q 10 days ago

      Thanks @@alejandro_ao

  • @karansingh-ce8yy
    @karansingh-ce8yy 10 days ago

    what about mathematical equations?

    • @alejandro_ao
      @alejandro_ao  9 days ago +3

      in this example, i embedded them with the rest of the text. if you want to process them separately, you can always extract them from the `CompositeElement` like i did here with the images. then you can maybe have an LLM explain the equation and vectorize that explanation (like we did with the description of the images). in my case, i just put them with the rest of the text, i feel like that gives the LLM enough context about it.
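
A hedged sketch of the separate-processing route this reply mentions, mirroring the image extraction: pull `Formula` elements out of each chunk and pair each with a prompt asking an LLM to explain it (the helper name and prompt wording are hypothetical, and `chunks` is assumed to be unstructured's `CompositeElement` list as elsewhere in the video):

```python
def formulas_with_prompts(chunks):
    """Extract Formula elements from CompositeElement chunks and pair each
    with a prompt asking an LLM to explain it, so the explanation can be
    vectorized like the image descriptions were."""
    out = []
    for chunk in chunks:
        for el in getattr(chunk.metadata, "orig_elements", None) or []:
            if type(el).__name__ == "Formula":
                prompt = (
                    "Explain this equation in plain language so the "
                    f"explanation can be embedded for retrieval:\n{el.text}"
                )
                out.append((el.text, prompt))
    return out
```

Each `(formula, prompt)` pair would then go through the same summarize-then-embed step used for the images, with the original formula stored in the docstore for retrieval.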

    • @karansingh-ce8yy
      @karansingh-ce8yy 9 days ago +1

      @@alejandro_ao thanks for the context, i was stuck on this for a week