Information Extraction with LangChain & Kor

  • Published: Aug 21, 2024

Comments • 52

  • @VikramSoni2
    @VikramSoni2 1 year ago +8

    Great work you are doing here, mate. Love the structure of your videos and Colab examples.
    I stopped looking in this area due to the explosion of repetitive tools, and use only your channel to find out what's new nowadays. Keep up the awesome videos!

  • @kevon217
    @kevon217 1 year ago +1

    You’re a godsend. You’ve really helped me understand and utilize the power of these approaches and the packages. Appreciate it!

  • @tubingphd
    @tubingphd 1 year ago +3

    Thank you Sam

  • @Canna_Science_and_Technology
    @Canna_Science_and_Technology 1 year ago

    Awesome! Thank you for taking the time to pass on some of your wisdom and knowledge.

  • @RedCloudServices
    @RedCloudServices 1 year ago

    Sam, I subscribe to a ton of AI LLM channels; you are a top-notch resource, thank you. I just need to try this out on the weekends 😆

  • @SloanMosley
    @SloanMosley 1 year ago +3

    Keep ‘em coming 🎉

  • @andy111007
    @andy111007 11 months ago

    Hey Sam,
    I changed the example:

        examples=[
            (
                "My first fav meal was at a restaurant called Burnt Ends in Singapore where there top dish italian cuisine was lasagne.",
                {"name": "Burnt Ends", "location": "Singapore", "style": "cuisine", "top_dish": "lasagne"},
            )
        ],

    and the results were much better.
    Cheers,
    Andy
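    For reference, a minimal sketch of how an example pair like this slots into a full Kor Object schema; the attribute descriptions and the llm variable below are assumptions, not taken from the video:

        from kor import create_extraction_chain, Object, Text

        restaurant_schema = Object(
            id="restaurant",
            description="Information about a restaurant mentioned in the text",
            attributes=[
                Text(id="name", description="Name of the restaurant"),
                Text(id="location", description="City or country where the restaurant is"),
                Text(id="style", description="Style of cuisine served"),
                Text(id="top_dish", description="The restaurant's best-known dish"),
            ],
            examples=[
                (
                    "My first fav meal was at a restaurant called Burnt Ends in Singapore where there top dish italian cuisine was lasagne.",
                    {"name": "Burnt Ends", "location": "Singapore", "style": "cuisine", "top_dish": "lasagne"},
                )
            ],
        )

        # llm is assumed to be a LangChain chat model, e.g. ChatOpenAI
        chain = create_extraction_chain(llm, restaurant_schema)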

  • @lacknerish
    @lacknerish 1 year ago +2

    Great video as usual. Thanks for your hard work. You should do one about Microsoft Guidance! I find the template-driven format pretty natural and ideal.

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      yeah I want to do one about that and guardrails etc.

  • @hichamzmarrou3762
    @hichamzmarrou3762 1 year ago

    Great video as usual, thanks Sam for the great work!

  • @ChenXibo
    @ChenXibo 1 year ago

    It's just great! One idea: let's assume the doc to be extracted is a human-made one, so one validator could be an agent that asks the doc maker (the human) for help, based on the extracted dataframe.

  • @MadhavanSureshRobos
    @MadhavanSureshRobos 1 year ago +2

    Amazing. Why can't we use small open-source models to test and improve the responses for these tasks? That would be real value instead of plugging everything into GPT-3 or 4.

  • @adityaroy4261
    @adityaroy4261 1 year ago +2

    Are there any other open-source models that can extract information, apart from OpenAI?

  • @alx8439
    @alx8439 1 year ago

    Have a look at Guanaco, which was trained following the new QLoRA approach. Might be interesting for you and your audience.

  • @gigabytechanz9646
    @gigabytechanz9646 1 year ago

    Great Work! Thank You!

  • @sammathew535
    @sammathew535 1 year ago +3

    Thanks Sam.
    I actually used it for extracting items that even require some math reasoning (e.g., total cost requires some math operations to be performed, based on the numbers in the text), which is then left to the LLM's accuracy. It got the text objects right most of the time but didn't do all that well on numbers. Any suggestions on how this could be implemented?

    • @vivekmathur3068
      @vivekmathur3068 1 year ago

      I am trying to do the same. I can't seem to understand what I need to search for.

  • @efneogearbox
    @efneogearbox 1 year ago +1

    Hi Sam, thanks for this video! Do you maybe know how to use vector stores with Kor? Kor generates long prompts, and when I add text it usually exceeds the token limit on OpenAI.
    When using pure LangChain, I can easily use a text splitter and vectorstore to grab the relevant chunks of text, but I find it difficult to replicate that with Kor. Any idea how to get around it? Thank you! Franek
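    One rough way around the token limit is to run the extraction chain once per chunk of the split text, rather than going through a vectorstore at all; the splitter settings, long_text, and chain below are assumptions, not something shown in the video:

        from langchain.text_splitter import RecursiveCharacterTextSplitter

        # Split the long document into overlapping chunks that fit the model's context window
        splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
        chunks = splitter.split_text(long_text)  # long_text is assumed to hold the full document

        # Run the Kor extraction chain on each chunk and collect the parsed results
        results = [chain.run(text=chunk)["data"] for chunk in chunks]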

  • @mukkeshmckenzie7386
    @mukkeshmckenzie7386 1 year ago

    Thank you!

  • @Truizify
    @Truizify 1 year ago +1

    Thanks for the awesome videos! 👏
    What's interesting about this one is that it seems to work well in my limited testing, but the author himself claims the implementation is "half-baked" and prone to error. They recommend people try alternative libraries like Promptify and MiniChain to achieve the same output - could you do a video on either/both of those?

    • @samwitteveenai
      @samwitteveenai  1 year ago +3

      Yeah, I should make a benchmark comparing it to the alternatives. I think the author of Kor is very honest, and I think many of the issues are to do with the quality of the LLM rather than the package.

    • @Truizify
      @Truizify 1 year ago

      @@samwitteveenai That would be awesome!

  • @eduardomoscatelli
    @eduardomoscatelli 1 year ago

    Incredible. The million-dollar question 😊: how to "teach" ChatGPT just once what the schema is and be able to validate infinite texts, without having to spend tokens putting the schema in the prompt and without having to train the model via fine-tuning?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      It is all put in via ICL (in-context learning).
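      In other words, the schema and examples are re-sent as part of every prompt rather than stored in the model; one way to see this (assuming the chain built in the video) is to print the prompt Kor generates:

          # Kor rebuilds the full prompt, schema and examples included, on every call
          print(chain.prompt.format_prompt(text="[user input]").to_string())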

    • @eduardomoscatelli
      @eduardomoscatelli 1 year ago

      @@samwitteveenai Thanks for the tip. Do you have any pointers to material so I can do this in a no-code way?

  • @shivanidwivedi1625
    @shivanidwivedi1625 4 months ago

    Does the output we get using Kor depend on the operating system we are using?

  • @andy111007
    @andy111007 11 months ago

    Hey Sam, chain.predict_and_parse has been deprecated; please change to:

        import json

        def printOutput(output):
            print(json.dumps(output, sort_keys=True, indent=3))

        output2 = chain.run(text="Alice Doe moved from New York to Boston, MA while Bob Smith did the opposite")["data"]
        printOutput(output2)
    Regards,
    Andy

  • @user-gp6ix8iz9r
    @user-gp6ix8iz9r 1 year ago

    Hi Sam, good video 👍 Can you make a video on how to run PrivateGPT on your local machine and Colab? 🙂👍

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    Wow, that’s really cool. Is Kor the only game in town for doing this currently?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      No, there are a couple of other ways as well, so I might make a vid about them at some point.

  • @nosiphondlovu6751
    @nosiphondlovu6751 9 months ago

    Thanks a lot.
    How is your real-life example not nested? How is that possible?

  • @user-dg7ud7wv5n
    @user-dg7ud7wv5n 8 months ago

    I cannot get past the StringIO error in the Kor library. Can anyone help me with this?

  • @alexdantart
    @alexdantart 1 year ago

    And what about not using OpenAI and using a nice pretrained model in your language?

  • @user-ss9bh6jp9s
    @user-ss9bh6jp9s 1 year ago

    Hi, thank you so much for the wonderful video. Is it fine to use a company's confidential information for information extraction using LangChain? I mean, does LangChain itself have privacy concerns for that usage?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      No, it is just software that runs on your setup; it's where the tools and LLMs are hosted that causes the privacy issues etc.

  • @programwithpradhan
    @programwithpradhan 1 year ago

    Can you please tell me: instead of giving text data, is there any other way I can give embedding vectors as input to the LLM with the approach you discussed in this video?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      I am not sure why you would want to do that; can you explain?

    • @programwithpradhan
      @programwithpradhan 1 year ago

      @@samwitteveenai I can't give all of the raw text to OpenAI because the text is so long (more than 200k characters), so I need to convert the text into chunks and do the embedding.

  • @krishradha5709
    @krishradha5709 1 year ago

    Can we use the extracted text as input to the LLM?

  • @pypypy4228
    @pypypy4228 1 year ago

    14:23 probably pd.json_normalize(json_data) would work out of the box here
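    For reference, a small sketch of what that looks like; the record shape below is hypothetical, assuming the chain returns a list of flat dicts:

        import pandas as pd

        # Hypothetical extraction output: a list of flat records from the chain
        json_data = [
            {"name": "Alice Doe", "from_city": "New York", "to_city": "Boston"},
            {"name": "Bob Smith", "from_city": "Boston", "to_city": "New York"},
        ]

        # json_normalize flattens the records into a DataFrame, one row per dict
        df = pd.json_normalize(json_data)
        print(df)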

  • @onirdutta666
    @onirdutta666 1 year ago +1

    Guys, is anybody facing the await issue? How do you solve this one?

    • @programwithpradhan
      @programwithpradhan 1 year ago

      Hi, have you solved that issue? I am also facing the same problem. Please reply.

  • @TheKingfysher
    @TheKingfysher 1 year ago

    Do you know if it's possible to feed multiple text chunks into the pipeline, like you can do with the LangChain QA chain?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Yeah, that should be doable. It will really operate on any input.

  • @fernandosanchezvillanueva4762
    @fernandosanchezvillanueva4762 1 year ago

    Great work. How long can the sentence be? The same number of tokens that ChatGPT allows?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      It's not limited to a sentence really; it can be anything you can 'stuff' into one pass of the LLM. I generally do a few paragraphs at a time.

  • @muhamadabdallah7960
    @muhamadabdallah7960 1 year ago

    Can you please tell me about, or make a new video on, making a tool or a Transformers agent that can take audio and dub it into another language with Whisper or NLLB-200, and have a talking avatar say it with SadTalker, for free? Thank you very much.

  • @Quitcool
    @Quitcool 1 year ago

    I think after the release of the OpenAI Functions agent, Kor is useless.

  • @maninzn
    @maninzn 1 year ago

    I have a PDF with table data. What is the best way to extract that and store it as vectors for proper retrieval? The standard text splitter is not accurate since it stores it as one continuous text. Cheers!

  • @Teathebest0
    @Teathebest0 1 year ago

    Hi, is it working with LinkedIn?
