Great work you are doing here mate. Love the structure of your videos and colab examples.
I stopped looking in this area due to the explosion of repetitive tools, and now use only your channel to find out what's new. Keep up the awesome videos!
You’re a godsend. You’ve really helped me understand and utilize the power of these approaches and the packages. Appreciate it!
Awesome! Thank you for taking the time to pass on some of your wisdom and knowledge.
Hey Sam,
I changed the example:

examples=[
    (
        "My first fav meal was at a restaurant called Burnt Ends in Singapore where there top dish italian cuisine was lasagne.",
        {"name": "Burnt Ends", "location": "Singapore", "style": "cuisine", "top_dish": "lasagne"},
    )
],

and the results were much better.
Cheers,
Andy
Sam, I subscribe to a ton of AI/LLM channels and you are a top-notch resource, thank you. I just need to try this out on the weekends 😆
Great video as usual, thanks Sam for the great work!
Keep ‘em coming 🎉
Thank you Sam
Great video as usual. Thanks for your hard work. You should do one about Microsoft Guidance! I find the template-driven format pretty natural and ideal.
Yeah, I want to do one about that and Guardrails etc.
It's just great! One idea: let's assume the doc to be extracted is a human-made one, so one validator could be an agent that asks the doc's maker (the human) for help, based on the extracted dataframe.
Are there any other open-source models that can extract information, apart from OpenAI?
Amazing. Why can't we use small open-source models to test and improve the responses for these tasks? That would be real value, instead of plugging everything into GPT-3 or 4.
Thanks Sam.
I actually used it for extracting items that even require some math reasoning (e.g., a total cost that requires math operations on the numbers in the text), which is then left to the LLM's accuracy. It got the text objects right most of the time but didn't do all that well on numbers. Any suggestions on how this could be implemented?
I am trying to do the same. I can't seem to understand what I need to search for.
Have a look at Guanaco, which was trained with the new QLoRA approach. Might be interesting for you and your audience.
Hi Sam, thanks for this video! Do you know how to use vectorstores with Kor? Kor generates long prompts, and when I add a text it usually exceeds the token limit on OpenAI.
When using pure LangChain, I can easily use a text splitter and vectorstore to grab the relevant chunks of text, but I find it difficult to replicate this with Kor. Any idea how to get around it? Thank you! Franek
Thanks for the awesome videos! 👏
What's interesting about this one is that it seems to work well in my limited testing, but the author himself claims the implementation is "half-baked" and prone to error. They recommend people try alternative libraries like Promptify and MiniChain to achieve the same output - could you do a video on either/both of those?
Yeah, I should make a benchmark comparing it to the alternatives. I think the author of Kor is very honest, and many of the issues are down to the quality of the LLM rather than the package.
@@samwitteveenai That would be awesome!
Thanks a lot
How is your real-life example not nested? How is that possible?
Does the output we get using Kor depend on the operating system we are using?
Great Work! Thank You!
And what about not using OpenAI and instead using a good pretrained model in your own language?
Wow, that’s really cool. Is Kor the only game in town for doing this currently?
No, there are a couple of other ways as well, so I might make a vid about them at some point too.
14:23 probably pd.json_normalize(json_data) would work out of the box here
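For anyone curious what that would look like, here is a minimal sketch of `pd.json_normalize` flattening nested extraction output into a dataframe (the records below are made up for illustration, not actual Kor output):

```python
import pandas as pd

# Hypothetical nested records, similar in shape to parsed extraction output
json_data = [
    {"name": "Burnt Ends", "location": {"city": "Singapore", "country": "SG"}},
    {"name": "Quay", "location": {"city": "Sydney", "country": "AU"}},
]

# json_normalize flattens nested dicts into dotted column names
df = pd.json_normalize(json_data)
print(df.columns.tolist())  # ['name', 'location.city', 'location.country']
```

If the nesting is deeper, the `sep` and `record_path` parameters let you control how the flattening happens.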
@samwitteveenai Because it uses LangChain, can Kor work with local models like Llama/Ollama? (This might be a stupid question, but it's not clear to me from the video.)
There are better ways to do this now, and good open models. Do you have a specific use case?
Hi, thank you so much for the wonderful video. Is it fine to use a company's confidential information for information extraction with LangChain? I mean, does LangChain itself have privacy concerns for that usage?
No, it is just software that runs on your setup; it's where the tools and LLMs are hosted that causes the privacy issues etc.
Incredible. The million-dollar question 😊: how do you "teach" ChatGPT the schema just once and then validate unlimited texts, without spending tokens putting the schema into the prompt each time and without fine-tuning the model?
It is all put in via ICL (in-context learning).
@@samwitteveenai Thanks for the tip. Do you have any pointers to material so I can do this in a no-code way?
Hey Sam, chain.predict_and_parse has been deprecated; please change it to:

def printOutput(output):
    print(json.dumps(output, sort_keys=True, indent=3))

output2 = chain.run(text="Alice Doe moved from New York to Boston, MA while Bob Smith did the opposite")["data"]
printOutput(output2)

Regards,
Andy
Can we use the extracted text as an input to the LLM?
I cannot get past the StringIO error in Kor library. can anyone help me with this?
Great work! How long can the sentence be? The same number of tokens that ChatGPT allows?
It's not really limited to a sentence; it can be anything you can 'stuff' into one pass of the LLM. I generally do a few paragraphs at a time.
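For anyone hitting the limit, here's a rough sketch of checking whether a chunk of text is likely to fit in one pass, using the common ~4 characters per token heuristic (the overhead and limit values here are assumptions to tune for your own schema and model, not exact figures):

```python
def rough_token_count(text: str) -> int:
    """Very rough estimate: English text averages about 4 characters per token."""
    return len(text) // 4

def fits_in_context(text: str, prompt_overhead: int = 1500, limit: int = 4096) -> bool:
    """Check whether text plus the schema prompt likely fits in one LLM pass.

    prompt_overhead approximates the tokens the generated schema prompt uses;
    both numbers are assumptions, not measured values.
    """
    return rough_token_count(text) + prompt_overhead <= limit

print(fits_in_context("short paragraph"))  # True
```

For anything serious you'd want a real tokenizer (e.g. tiktoken for OpenAI models) rather than this character heuristic.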
Guys, is anybody else facing the await issue? How do you solve this one?
Hi, have you solved that issue? I am also facing the same problem. Please reply.
Can you please tell me whether, instead of giving text data, there is any way I can give embedding vectors as input to the LLM with the approach you discussed in this video?
I am not sure why you would want to do that; can you explain?
@@samwitteveenai I can't give all of the raw text to OpenAI because the text is so long (more than 200k characters), so I need to split the text into chunks and do the embedding.
Do you know if its possible to feed in multiple text chunks into the pipeline like you can do with the langchain QA Chain?
Yeah, that should be doable. It will really operate on any input.
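For anyone wanting to try this, a minimal sketch of splitting a long document into overlapping chunks and merging the per-chunk extraction results (`extract` stands in for whatever chain call you use; it is not a real Kor API):

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping character chunks so an entity that
    straddles a boundary still appears whole in at least one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def extract_over_chunks(text: str, extract) -> list[dict]:
    """Run an extraction callable over each chunk and merge the records,
    dropping exact duplicates picked up in the overlap regions."""
    seen, merged = set(), []
    for chunk in split_into_chunks(text):
        for record in extract(chunk):
            key = tuple(sorted(record.items()))
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged
```

With Kor you would pass something like `lambda c: chain.run(text=c)["data"]["person"]` as the extract callable, though the exact shape of the result depends on your schema.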
Thank you!
Hi Sam, good video 👍 Can you make a video on how to run PrivateGPT on your local machine and Colab? 🙂👍
Hi, is it working with LinkedIn?
What do you want to do with Linkedin?
I think after the release of the OpenAI Functions Agent, Kor is useless.
Hey Sam! Great video!
I'm looking to do NER with an LLM using the transformers library. Do you know how to create my llm in this code, without it being OpenAI's LLM?
from kor import create_extraction_chain, Object, Text

schema = Object(
    id="person",
    description="Personal info about a person",
    attributes=[
        Text(
            id="first_name",
            description="The first name of a person",
            examples=[],
            # many=True,
        ),
    ],
    examples=[
        ("Alice and Bob are friends", [{"first_name": "Alice"}, {"first_name": "Bob"}])
    ],
)

llm = "?"  # what goes here if not an OpenAI LLM?
chain = create_extraction_chain(llm, schema)

output = chain.invoke("My name is Bobby. My brother's name is Joe.")
print(output)
For a basic NER model you can just use something like a fine-tuned DistilRoBERTa etc.
Can you please make a new video about building a tool or Transformers agent that can take audio and dub it into another language with Whisper or NLLB-200, and have a talking avatar say it with SadTalker, for free? Thank you very much.
I have a PDF with table data. What is the best way to extract it and store it as vectors for proper retrieval? The standard text splitter is not accurate since it stores the table as one continuous text. Cheers!