Information Extraction with LangChain & Kor

  • Published: 12 Dec 2024

Comments • 56

  • @VikramSoni2
    @VikramSoni2 1 year ago +8

    Great work you are doing here, mate. Love the structure of your videos and Colab examples.
    I stopped looking in this area due to the explosion of repetitive tools, and I only use your channel to find out what's new nowadays. Keep up the awesome videos!

  • @kevon217
    @kevon217 1 year ago +1

    You’re a godsend. You’ve really helped me understand and utilize the power of these approaches and the packages. Appreciate it!

  • @Pure_Science_and_Technology
    @Pure_Science_and_Technology 1 year ago

    Awesome! Thank you for taking the time to pass on some of your wisdom and knowledge.

  • @andy111007
    @andy111007 1 year ago

    Hey Sam,
    I changed the example to:

        examples=[
            (
                "My first fav meal was at a restaurant called Burnt Ends in Singapore where there top dish italian cuisine was lasagne.",
                {"name": "Burnt Ends", "location": "Singapore", "style": "cuisine", "top_dish": "lasagne"},
            )
        ],

    and the results were much better.
    Cheers,
    Andy
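    A minimal sketch (not from the video) of where a richer example like Andy's plugs into a Kor schema; the attribute ids and descriptions below are assumptions based on his keys:

        from kor import Object, Text

        restaurant_schema = Object(
            id="restaurant",
            description="Information about a restaurant the writer mentions",
            attributes=[
                Text(id="name", description="Name of the restaurant"),
                Text(id="location", description="City or country of the restaurant"),
                Text(id="style", description="Cuisine style"),
                Text(id="top_dish", description="The dish the restaurant is best known for"),
            ],
            examples=[
                (
                    "My first fav meal was at a restaurant called Burnt Ends in Singapore where there top dish italian cuisine was lasagne.",
                    {"name": "Burnt Ends", "location": "Singapore", "style": "cuisine", "top_dish": "lasagne"},
                )
            ],
        )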

  • @RedCloudServices
    @RedCloudServices 1 year ago

    Sam, I subscribe to a ton of AI LLM channels and you are a top-notch resource, thank you. I just need to try this out on the weekends 😆

  • @hichamzmarrou3762
    @hichamzmarrou3762 1 year ago

    Great video as usual, thanks Sam for the great work!

  • @SloanMosley
    @SloanMosley 1 year ago +3

    Keep ‘em coming 🎉

  • @tubingphd
    @tubingphd 1 year ago +3

    Thank you Sam

  • @lacknerish
    @lacknerish 1 year ago +2

    Great video as usual. Thanks for your hard work. You should do one about Microsoft Guidance! I find the template-driven format pretty natural and ideal.

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      yeah I want to do one about that and guardrails etc.

  • @ChenXibo
    @ChenXibo 1 year ago

    It's just great! One idea: assuming the doc to be extracted is a human-made one, one validator could be an agent that asks the doc maker (the human) for help, based on the extracted dataframe.

  • @adityaroy4261
    @adityaroy4261 1 year ago +2

    Are there any other open-source models that can extract information, apart from OpenAI?

  • @MadhavanSureshRobos
    @MadhavanSureshRobos 1 year ago +2

    Amazing. Why can't we use small open-source models to test and improve the responses for these tasks? That would be real value instead of plugging everything into GPT-3 or 4.

  • @sammathew535
    @sammathew535 1 year ago +3

    Thanks Sam.
    I actually used it for extracting items that even require some math reasoning (e.g., total cost requires some math operations on the numbers in the text), which is then left to the LLM's accuracy. It got the text objects right most of the time but didn't do all that well on the numbers. Any suggestions on how this could be implemented?

    • @vivekmathur3068
      @vivekmathur3068 1 year ago

      I am trying to do the same. I can't seem to understand what I need to search for.

  • @alx8439
    @alx8439 1 year ago

    Have a look at Guanaco, which was trained with the new QLoRA approach. Might be interesting for you and your audience.

  • @efneogearbox
    @efneogearbox 1 year ago +1

    Hi Sam, thanks for this video! Do you know how to use vector stores with Kor? Kor generates long prompts, and when I add a text it usually exceeds the token limit on OpenAI.
    When using pure LangChain, I can easily use a text splitter and vector store to grab the relevant chunks of text, but I find it difficult to replicate that with Kor. Any idea how to get around it? Thank you! Franek
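    One way around the token limit, sketched below under the assumption that the Kor chain is built with create_extraction_chain(llm, schema) as in the video (long_text is a placeholder for your document): split the text into chunks small enough to fit alongside Kor's schema prompt, run the chain per chunk, and merge the results afterwards.

        from langchain.text_splitter import RecursiveCharacterTextSplitter

        # Keep each chunk small enough that Kor's schema prompt plus the chunk fits the context window.
        splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
        chunks = splitter.split_text(long_text)  # long_text: your full document (placeholder)

        results = []
        for chunk in chunks:
            # chain = create_extraction_chain(llm, schema); older versions use .run, newer ones .invoke
            out = chain.run(text=chunk)["data"]
            results.append(out)
        # merge / de-duplicate `results` as needed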

  • @Truizify
    @Truizify 1 year ago +1

    Thanks for the awesome videos! 👏
    What's interesting about this one is that it seems to work well in my limited testing, but the author himself claims the implementation is "half-baked" and prone to error. They recommend people try alternative libraries like Promptify and MiniChain to achieve the same output - could you do a video on either/both of those?

    • @samwitteveenai
      @samwitteveenai  1 year ago +3

      Yeah, I should make a benchmark comparing it to the alternatives. I think the author of Kor is very honest, and I think many of the issues are to do with the quality of the LLM rather than the package.

    • @Truizify
      @Truizify 1 year ago

      @@samwitteveenai That would be awesome!

  • @nosiphondlovu6751
    @nosiphondlovu6751 1 year ago

    Thanks a lot.
    How is your real-life example not nested, and how is that possible?

  • @shivanidwivedi1625
    @shivanidwivedi1625 7 months ago

    Does the output we get using Kor depend on the operating system we are using?

  • @gigabytechanz9646
    @gigabytechanz9646 1 year ago

    Great Work! Thank You!

  • @alexdantart
    @alexdantart 1 year ago

    And what about not using OpenAI and using a nice pretrained model in your language?

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    Wow, that’s really cool. Is Kor the only game in town for doing this currently?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      No, there are a couple of other ways as well, so I might make a vid about them at some point too.

  • @pypypy4228
    @pypypy4228 1 year ago

    14:23 probably pd.json_normalize(json_data) would work out of the box here
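    For reference, a tiny sketch of that idea; the record shapes below are assumed, not taken from the video:

        import pandas as pd

        # json_data would be the nested records Kor returns, e.g. something like output["data"]["person"]
        json_data = [{"first_name": "Alice"}, {"first_name": "Bob"}]
        df = pd.json_normalize(json_data)  # flattens nested JSON records into DataFrame columns
        print(df)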

  • @joe_hoeller_chicago
    @joe_hoeller_chicago 2 months ago

    @samwitteveenai Because it uses LangChain, can Kor work with local models like Llama/Ollama? (This might be a stupid question, but it is not clear to me from the video.)

    • @samwitteveenai
      @samwitteveenai  1 month ago

      There are better ways to do this now, and good open models. Do you have a specific use case?

  • @훌라훌라-k3b
    @훌라훌라-k3b 1 year ago

    Hi, thank you so much for the wonderful video. Is it fine to use a company's confidential information for information extraction using LangChain? I mean, does LangChain itself have privacy concerns for that usage?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      No, it is just software that runs on your setup; it's where the tools and LLMs are hosted that causes the privacy issues etc.

  • @eduardomoscatelli
    @eduardomoscatelli 1 year ago

    Incredible. The million-dollar question 😊: how do you "teach" ChatGPT the schema just once and then validate unlimited texts, without spending tokens putting the schema in the prompt and without training the model via fine-tuning?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      It is all put in via ICL (in-context learning).

    • @eduardomoscatelli
      @eduardomoscatelli 1 year ago

      @@samwitteveenai Thanks for the tip. Can you point me to any material so I can do this in a no-code way?

  • @andy111007
    @andy111007 1 year ago

    Hey Sam, chain.predict_and_parse has been deprecated; please change to:

        import json

        def printOutput(output):
            print(json.dumps(output, sort_keys=True, indent=3))

        output2 = chain.run(text=("Alice Doe moved from New York to Boston, MA while Bob Smith did the opposite"))["data"]
        printOutput(output2)
    Regards,
    Andy

  • @krishradha5709
    @krishradha5709 1 year ago

    Can we use the extracted text as an input to the LLM?

  • @candidateuser1
    @candidateuser1 1 year ago

    I cannot get past the StringIO error in the Kor library. Can anyone help me with this?

  • @fernandosanchezvillanueva4762
    @fernandosanchezvillanueva4762 1 year ago

    Great work! How long can the sentence be? The same number of tokens that ChatGPT allows?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      It's not really limited to a sentence; it can be anything you can 'stuff' into one pass of the LLM. I generally do a few paragraphs at a time.

  • @onirdutta666
    @onirdutta666 1 year ago +1

    Guys, is anybody facing the await issue? How do you solve this one?

    • @programwithpradhan
      @programwithpradhan 1 year ago

      Hi, have you solved that issue? I am also facing the same problem. Please reply.

  • @programwithpradhan
    @programwithpradhan 1 year ago

    Can you please tell me, instead of giving text data, is there any other way I can give embedding vectors as input to the LLM with the approach you discussed in this video?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      I am not sure why you would want to do that; can you explain?

    • @programwithpradhan
      @programwithpradhan 1 year ago

      @@samwitteveenai I can't give all of the raw text to OpenAI because the text is so long (more than 200k characters), so I need to split the text into chunks and do the embedding.

  • @TheKingfysher
    @TheKingfysher 1 year ago

    Do you know if it's possible to feed multiple text chunks into the pipeline, like you can do with the LangChain QA chain?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Yeah, that should be doable. It will really operate on any input.

  • @mukkeshmckenzie7386
    @mukkeshmckenzie7386 1 year ago

    Thank you!

  • @8888-u6n
    @8888-u6n 1 year ago

    Hi Sam, good video 👍 Can you make a video on how to run PrivateGPT on your local machine and Colab? 🙂👍

  • @Teathebest0
    @Teathebest0 1 year ago

    Hi, is it working with LinkedIn?

  • @Quitcool
    @Quitcool 1 year ago

    I think that after the release of the OpenAI functions agent, Kor is useless.

  • @constandinosk.3251
    @constandinosk.3251 3 months ago

    Hey Sam! Great video!
    I'm looking to do NER with an LLM using the transformers library. Do you know how to create my llm in the code below, without it being OpenAI's LLM?
        from kor import create_extraction_chain, Object, Text

        schema = Object(
            id="person",
            description="Personal info about a person",
            attributes=[
                Text(
                    id="first_name",
                    description="The first name of a person",
                    examples=[],
                    # many=True,
                ),
            ],
            examples=[
                ("Alice and Bob are friends", [{"first_name": "Alice"}, {"first_name": "Bob"}])
            ],
        )

        llm = "?"
        chain = create_extraction_chain(llm, schema)

        text = "My name is Bobby and my Sister is Rachel. My brother is Joe."
        output = chain.invoke(("My name is Bobby. My brother's name Joe."))
        print(output)

    • @samwitteveenai
      @samwitteveenai  3 months ago +1

      For a basic NER model you can just use something like a fine-tuned DistilRoBERTa etc.
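      A minimal sketch of that non-OpenAI route using the transformers pipeline; the checkpoint named below is just one illustrative fine-tuned NER model, not something specified in the video:

          from transformers import pipeline

          # Any fine-tuned token-classification checkpoint works here; this one is an illustrative choice.
          ner = pipeline(
              "token-classification",
              model="dslim/bert-base-NER",
              aggregation_strategy="simple",  # merge sub-word tokens into whole entity spans
          )

          text = "My name is Bobby and my sister is Rachel. My brother is Joe."
          for entity in ner(text):
              print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))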

  • @muhamadabdallah7960
    @muhamadabdallah7960 1 year ago

    Can you please tell me, or make a new video about, how to build a tool or a transformers agent that can take audio and dub it into another language with Whisper or NLLB-200, and then make a talking avatar say it with SadTalker, for free? Thank you very much.

  • @maninzn
    @maninzn 1 year ago

    I have a PDF with table data. What is the best way to extract that and store it as vectors for proper retrieval? The standard text splitter is not accurate since it stores it as one continuous text. Cheers!
