Multimodal RAG: Text, Images, Tables & Audio Pipeline

  • Published: Nov 8, 2024

Comments • 15

  • @eventsjamaicamobileapp1426 3 months ago +4

    This video deserves way more views. It was BRILLIANT!

    • @techwithzoum 3 months ago +1

      Thank you!
      Please share with anyone who might benefit from the lessons!

  • @navaneeth44 9 days ago +1

    This video is lit. Kudos, man.

  • @jbernece 17 days ago +1

    Excellent and detailed walkthrough.

  • @PapoIAVetorial-oe5nj 2 months ago +1

    Wow, that is a really nice video!!!

    • @techwithzoum 2 months ago

      Thank you, PapoIAVetorial-oe5nj!

  • @robertboroughs7824 21 days ago

    So you can input multimodal sources.
    On the retrieval side (let's say a table and an image of a vacuum cleaner), the LLM could be informed by the information in the table.
    Could I retrieve the image of the complete table and/or the vacuum cleaner (the objects)?

  • @eventsjamaicamobileapp1426 3 months ago +1

    Great video. What if there were multiple PDF documents in a single folder? What code would have to be changed?

    • @techwithzoum 3 months ago +1

      I am glad the video helped.
      The main change would occur in the data processing part, where we would keep track of the source document for each document being processed. This way, whenever the answer is given, the corresponding metadata would be included as well.
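
The idea in this reply (keep track of the source document for every processed piece) could be sketched roughly as follows. This is not the code from the video: `process_folder` and the `partition_fn` parameter are hypothetical names, and `partition_fn` stands in for whatever function turns one PDF path into a list of chunks.

```python
from pathlib import Path


def process_folder(folder, partition_fn):
    """Run partition_fn on every PDF in `folder` and tag each chunk with its source.

    `partition_fn` is a hypothetical callable: it takes a file path (str)
    and returns an iterable of chunks for that one document.
    """
    records = []
    # Sort so the processing order is stable between runs.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        for index, chunk in enumerate(partition_fn(str(pdf_path))):
            records.append(
                {
                    "text": str(chunk),
                    # The metadata travels with the chunk, so an answer built
                    # from it can report which document it came from.
                    "metadata": {"source": pdf_path.name, "chunk_index": index},
                }
            )
    return records
```

With the unstructured library, `partition_fn` might be something like `lambda path: partition_pdf(filename=path)`, but that wiring is an assumption about the setup rather than something shown in this thread.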

  • @annapetmikel4356 3 months ago +1

    Hi, thanks for a good video. I tried to replicate your code and got an error: "Error during transcription: [WinError 2] The system cannot find the file specified", even though the MP3 file is created and exists in the directory. Where can I check possible solutions for my problem?

    • @techwithzoum 3 months ago

      You are welcome, Anna!
      This is happening because the transcription model could not find the file you want to transcribe.
      Are you making sure to provide the exact file location for transcription?

    • @annapetmikel4356 3 months ago

      @techwithzoum I found out that the problem was that the code could not find the ffmpeg.exe file (the file to transcribe was saved correctly). I was not the only person with a similar issue, and I found the solution on Stack Overflow :) Thank you anyway. Again, great tutorial. I am happy to build the same project on my own based on your videos and the codebase on GitHub!
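
A quick way to catch this kind of failure early is to check that ffmpeg is reachable before transcribing. The sketch below assumes the transcription step invokes ffmpeg as an external program found via PATH; `find_executable` and `require_ffmpeg` are hypothetical helper names, not part of the tutorial's codebase.

```python
import shutil


def find_executable(name="ffmpeg"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def require_ffmpeg():
    """Raise a readable error up front instead of a bare [WinError 2] later."""
    path = find_executable("ffmpeg")
    if path is None:
        raise RuntimeError(
            "ffmpeg was not found on PATH; install it or add its folder "
            "to PATH before running the transcription step."
        )
    return path
```

Calling `require_ffmpeg()` once at startup makes it clear that the missing file is the ffmpeg executable rather than the audio file being transcribed.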

    • @annapetmikel4356 3 months ago

      @techwithzoum Is it possible to communicate with you in a private message? I have solved the issue described above (with help from Stack Overflow), but I now have a new issue and can't find a solution. Maybe it is something you have experience with. The issue appears when I run partition_pdf, and this time the error is: "PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Konsumer\\AppData\\Local\\Temp\\tmpt4iifmwt'". But this has nothing to do with permissions. From ChatGPT: "You're right that the error message can be misleading. In your case, the problem isn't actually related to file permissions, but rather to how the unstructured library is trying to handle temporary files and NLTK data downloads."