Extract Table Info From PDF & Summarise It Using Llama3 via Ollama | LangChain

  • Published: Oct 27, 2024
  • Science

Comments • 30

  • @StnImg 5 months ago +7

    Sir, can you please make a follow-up video on the complete flow of data ingestion into the Qdrant vector DB without using an ipynb notebook? I have tried many times without success because of SSL certificate errors and being unable to download NLTK data.

  • @kursatkilic6975 4 months ago

    It was a fruitful video. I wonder about PDFs with a complex layout, for example one made of rectangles of different dimensions with information inside them. In that case, YOLO or cv2 is used to detect edges, and then OCR is applied to extract the tables and the information in them.
    My question: is it possible to extract the layouts and information and then visualize them in Jupyter?

  • @THE-AI_INSIDER 5 months ago

    Great video! Just one thing: if any columns in the PDF contain only URLs, the URLs are shown as NaN and are not read during inference on the PDF (after the data structuring). Have you also encountered or tried this? Can you try this out in one of the upcoming videos?

  • @ameralfatish2392 2 months ago

    Very nice video. I have a question: is it private? If I have sensitive documents, do they stay private?

    • @datasciencebasics 2 months ago

      Yes, it is, since you use the LLM via Ollama, which is downloaded to your machine. With Unstructured, if you used the pip install, that's private, but with the API make sure to check how they process the data.

  • @dq-music 2 months ago

    Extracting Japanese tables has problems with garbled characters. unstructured can get the characters, so why does OCR have to re-read them incorrectly?

  • @anuragbhandari3776 5 months ago +1

    It would be really interesting if you made a video on multimodal RAG using unstructured, Groq, Qdrant, LangChain and Chainlit (even better to make a Streamlit app out of it).

    • @datasciencebasics 5 months ago

      Will put it on my to-do list ✅

    • @datasciencebasics 5 months ago

      Here is one example you can try:
      RAG With LlamaParse from LlamaIndex & LangChain 🚀
      ruclips.net/video/f9hvrqVvZl0/видео.html

  • @Srb0002 5 months ago +1

    Sir, could you please make a video on extracting images from PDFs using open-source models?

  • @zuowang5185 2 months ago

    How difficult is it to bypass the paywall and build your own instance that serves the PDF extraction instead of using their API?

    • @datasciencebasics 2 months ago

      It's not that difficult, but installing the packages might be challenging, as it needs several different packages to do the task.

  • @ajaymahich3180 4 months ago

    How much accuracy does it provide when extracting tables and text from scanned and handwritten PDFs?

  • @The_Equalizer-nl4rg 4 months ago

    Which app do you use for Python coding?

  • @IdPreferNot1 5 months ago

    Have you tried LlamaParse?

    • @datasciencebasics 5 months ago +1

      Here it is :)
      Super Easy Way To Parse PDF | LlamaParse From LlamaIndex | LlamaCloud
      ruclips.net/video/wRMnHbiz5ck/видео.html

  • @alishaikh782 4 months ago

    I have implemented the code in Colab on my own custom data. I am facing an issue where it omits the zeros: for example, the Amount value is 43220.00, but only 4322 is shown. Please suggest a way to fix this issue.
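
One way to narrow down a dropped-digits problem like the one described above is to read the table cells as plain strings and convert them explicitly afterwards. The following is only a minimal sketch under assumptions the thread does not confirm: that the extracted table is available as an HTML string (unstructured can expose table HTML on a table element's metadata, but whether the commenter's pipeline does so is unknown), and that the digits go missing during numeric conversion rather than during OCR. `TableCellParser`, `parse_table` and `sample_html` are hypothetical names and data made up for illustration.

```python
from decimal import Decimal
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect table cells as raw strings, one list per <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Keep the cell text exactly as written; no numeric coercion here.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table as a list of rows, each a list of strings."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample standing in for the HTML of an extracted table.
sample_html = (
    "<table><tr><th>Item</th><th>Amount</th></tr>"
    "<tr><td>Invoice A</td><td>43220.00</td></tr></table>"
)

rows = parse_table(sample_html)
header, body = rows[0], rows[1:]
amount_text = body[0][1]       # still the original string
amount = Decimal(amount_text)  # exact decimal, trailing zeros preserved
print(header, amount_text, amount)
```

If the string read this way is already wrong, the digits were lost earlier, during extraction or OCR, and changing the conversion step will not help.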

  • @Rifadm1 4 months ago

    Does it cover scanned PDFs?

  • @anuragbhandari3776 5 months ago

    Which browser do you use?

  • @TooyAshy-100 5 months ago

    Thank you!

  • @notSOanonymousBD 5 months ago

    Is anyone getting an error while importing unstructured?