Use LLMs To Extract Data From Text (Expert Mode)

  • Published: Dec 25, 2024

Comments • 83

  • @abdoualgerian5396
    @abdoualgerian5396 1 year ago +4

    Finally, a video that I can enjoy without background noise. Thanks a lot, and please continue without background music.

  • @ac_cobra8540
    @ac_cobra8540 1 year ago +6

    Interesting, I'm going to give this a go. I've experimented with pydantic for parsing LLM output into JSON, so this is super relevant right now. Thanks Greg, great explainer as always.

  • @pradeepthiyyagura8677
    @pradeepthiyyagura8677 1 year ago +1

    Greg, great video as always! I achieved the same results by including the desired output in JSON format along with the initial prompt itself, without using the Kor library.

    • @lucasamadsen
      @lucasamadsen 1 year ago

      But you have to put all of your JSON file's text in the prompt, right?
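The approach described in this thread, putting the desired JSON format into the prompt itself and parsing the reply without Kor, could look roughly like the sketch below. The schema, field names, and the sample reply are hypothetical, and no model is actually called; the reply is a hard-coded stand-in.

```python
import json

# Hypothetical target format; the field names are illustrative only.
SCHEMA_EXAMPLE = {"first_name": "string", "age": "integer"}


def build_prompt(text: str) -> str:
    """Embed the desired JSON output format directly in the prompt."""
    return (
        "Extract the fields below from the text and reply with JSON only.\n"
        f"Format: {json.dumps(SCHEMA_EXAMPLE)}\n"
        f"Text: {text}"
    )


def parse_reply(reply: str) -> dict:
    """Parse the model's reply, tolerating extra prose around the JSON object."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start : end + 1])
    # Check that every field named in the schema is present.
    missing = set(SCHEMA_EXAMPLE) - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data


# Stand-in for a real model reply (assumption: no API call is made here).
fake_reply = 'Sure! {"first_name": "Alice", "age": 30}'
print(build_prompt("Alice is 30 years old."))
print(parse_reply(fake_reply))
```

As the reply above points out, the format description is sent with every prompt, so it counts toward the token budget on each call.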

  • @nattapongthanngam7216
    @nattapongthanngam7216 8 months ago

    Thank you, Greg, for this informative video on using LLMs to extract data from text! I found it particularly valuable for its potential application in skill/information extraction from resumes/CVs submitted to large companies. I also noticed a minor error in the original code:
    """
    output = chain.predict_and_parse(text="...")['data']
    printOutput(output)
    """
    updated code:
    """
    output = chain.run(text="...")['data']
    print(output)
    """
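For notebooks that have to run against more than one Kor version, a small wrapper can hide the method rename reported above. This is only a sketch: it assumes the chain object exposes either `run` or `predict_and_parse`, both accepting `text=` and returning a dict with a `"data"` key, as in the snippets in the comments. The two chain classes below are hypothetical stand-ins, not Kor classes.

```python
def extract_data(chain, text):
    """Call whichever entry point the chain exposes and return its 'data' payload."""
    # Prefer `run`, which the comments report as the working call;
    # fall back to the older `predict_and_parse` if `run` is absent.
    for name in ("run", "predict_and_parse"):
        method = getattr(chain, name, None)
        if callable(method):
            return method(text=text)["data"]
    raise AttributeError("chain has neither run() nor predict_and_parse()")


# Hypothetical stand-ins for a chain object, used only to exercise the helper.
class NewStyleChain:
    def run(self, text):
        return {"data": {"echo": text}}


class OldStyleChain:
    def predict_and_parse(self, text):
        return {"data": {"echo": text}}


print(extract_data(NewStyleChain(), "hello"))
print(extract_data(OldStyleChain(), "hello"))
```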

  • @DelaLange
    @DelaLange 1 year ago

    Your channel is gold! Thanks a lot for all those tutorials.

  • @jakobkristensen2390
    @jakobkristensen2390 1 year ago

    Thanks, this was super useful! I would love to get some insight into the feedback you got from those 80 companies.

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Most people either wanted the data for investment or sales use cases

    • @jakobkristensen2390
      @jakobkristensen2390 1 year ago

      @DataIndependent I am developing a few small tools for a recruitment bureau. I'm interested because what you mentioned seemed relevant.

  • @caiyu538
    @caiyu538 1 year ago +1

    Great lectures. Thank you for sharing them with us for free. Thumbs up.

    • @DataIndependent
      @DataIndependent 1 year ago

      Thank you! I'm also exploring function calling further as a way to extract information.

  • @tomwalczak4992
    @tomwalczak4992 1 year ago

    Thanks Greg, this is very relevant, will give Kor a try!

  • @AB51002
    @AB51002 1 year ago

    I really liked your video "The Data Learning Journey (Part 1)", and am hoping you will post Part 3 soon.

  • @steveadams617
    @steveadams617 10 months ago

    Great introduction. Perfect pacing. I'm going to do some further research to see if I can figure out a way to use Kor with a local language model, since I deal with confidential patient data in a healthcare setting.

    • @furkankasap806
      @furkankasap806 8 months ago

      I'm wondering the same thing; some letters of the Turkish alphabet are problematic.

    • @JustDoIt-pl2sl
      @JustDoIt-pl2sl 3 months ago

      I'm trying to do it, but it's not working; the model (using Kor) is behaving very poorly.

  • @rolexalexander7513
    @rolexalexander7513 1 year ago

    Thanks Greg, this was really helpful!

  • @Ideariver
    @Ideariver 8 months ago

    This was awesome content.

  • @mahroushkagaurav3601
    @mahroushkagaurav3601 1 year ago

    very insightful - thank you

    • @DataIndependent
      @DataIndependent 1 year ago

      Awesome! I need to add another level to this, which is OpenAI function calling.

  • @SteveSolun
    @SteveSolun 1 year ago

    Hey Greg, at 7:54, what is the "many = True" attribute in the Text class? Can you please explain it in a bit more detail?

  • @ChatGPT-ef6sr
    @ChatGPT-ef6sr 1 year ago +2

    Come on, why did you steal my idea 😅. I was literally thinking about how to scrape a YouTube channel's data using LLMs. I was looking for the info. You came right on time!

    • @adumont
      @adumont 1 year ago +2

      There's a video from James Briggs, IIRC, that does Q&A against a knowledge base of YouTube channels' video transcripts. I'm not sure whether it was an existing dataset or whether he extracted them from YouTube. Hope that helps.

    • @ChatGPT-ef6sr
      @ChatGPT-ef6sr 1 year ago +1

      @adumont Oh, thanks. I will look it up.

    • @asiddiqi123
      @asiddiqi123 1 year ago +1

      Why is everyone making this? 😂

  • @oru65
    @oru65 1 year ago +1

    In the 3rd cell of the Kor Hello World example, the call 'output = chain.predict_and_parse(text=(text))["data"]' must be replaced with 'output = chain.run(text=(text))["data"]' because 'predict_and_parse' has been deprecated.

    • @DataIndependent
      @DataIndependent 1 year ago

      Yikes - thanks for the catch. I would also recommend looking at function calling from OpenAI in case you want to see a different approach.

  • @manujkumarjoshi9342
    @manujkumarjoshi9342 1 year ago

    Wow!! it's magic

  • @Ryan-yj4sd
    @Ryan-yj4sd 1 year ago

    awesome

  • @adumont
    @adumont 1 year ago +3

    That's really interesting. Would it be easy (maybe using LangChain) to define required attributes or elements in the schema, so that if the LLM can't extract them, it starts a Q&A with the user to ask for the missing elements and attributes until the required fields are complete? That would be awesome for launching follow-up actions, for example.

  • @SudhakarVyas
    @SudhakarVyas 6 months ago

    Hey Greg, thanks for this video!
    Since there is a limit on using the OpenAI API without paying, how can the above implementation be carried out with other open-source LLMs?

  • @densonsmith2
    @densonsmith2 1 year ago +1

    Where is the "sign up" you mentioned? This seems very interesting for many applications.

    • @DataIndependent
      @DataIndependent 1 year ago

      Whoops! I'll put it in the description, this was it
      www.openingattributes.com/

    • @densonsmith2
      @densonsmith2 1 year ago

      @DataIndependent I am very impressed, as were all my work friends.

  • @ahmadzaimhilmi
    @ahmadzaimhilmi 1 year ago +1

    This is precisely what I need for my project, but like you said, the cost can spiral out of control. Have you tried it with GPT-3.5? If so, how unreliable was it?

  • @yellowboat8773
    @yellowboat8773 1 year ago +3

    Newbie here. I don't understand why you would need a library for this task. Couldn't you just specify the exact output and formatting you need in your LLM prompt? Cheers! 😊

    • @aflous
      @aflous 1 year ago

      Basically, this abstracts away all the extra work needed for formatting and text extraction and lets you focus on your business logic.

  • @fabsync
    @fabsync 6 months ago

    Fantastic tutorial! It would be great to see another tutorial using "transformers" instead of OpenAI, with Chroma or any local database. Also, how would you save the extracted information? Does Kor tokenize that information, etc.?

  • @pocker91
    @pocker91 1 year ago

    Hi Greg, thank you for the great video! How would you go about extracting "tags" or predefined values rather than free-form strings? Especially if the values number in the thousands and are too many to just feed into the prompt (token optimization, etc.). Any ideas? Thank you!

    • @DataIndependent
      @DataIndependent 1 year ago

      hmm good question, check out this tutorial and code
      In cell 15 I have a schema for tags that may be helpful: github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Topic%20Modeling%20With%20Language%20Models.ipynb
      ruclips.net/video/pEkxRQFNAs4/видео.html

  • @JustDoIt-pl2sl
    @JustDoIt-pl2sl 3 months ago

    I'm having some problems running this with local Ollama models (I tried Llama 3.1 and NuExtract), and it's not working. The output has a lot of repetitive info.

    • @JustDoIt-pl2sl
      @JustDoIt-pl2sl 3 months ago

      After close inspection, it seems the local LLMs don't understand the (somewhat complex) prompt generated by Kor.

  • @rajpdus
    @rajpdus 1 year ago

    I think we'll have more such prompt-based tooling available sooner or later. Are there any other specific tools you are experimenting with?

  • @eduardomoscatelli
    @eduardomoscatelli 1 year ago

    Incredible. The million-dollar question 😊: how do you "teach" ChatGPT the schema just once and then validate an unlimited number of texts, without spending tokens on the schema in every prompt and without having to fine-tune the model?

  • @dprggrmr
    @dprggrmr 1 year ago

    Damn, that's cool.

  • @davidmichaelcomfort
    @davidmichaelcomfort 1 year ago

    This looks like a really interesting approach. @DataIndependent, any ideas on the best approach for using tabular data (whether from a pandas DataFrame, a PySpark DataFrame, or a SQL table) in conjunction with LLMs? What about combining tabular data with text documents?

  • @mysticaltech
    @mysticaltech 1 year ago

    Hey Greg, you sure this doesn't work well with GPT-3.5?

  • @pooja1124
    @pooja1124 1 year ago

    Can we extract important content from a research paper, such as some text from the abstract and some from the results or an ablation table? Could you make a video about how to customize that text extraction and send it to Google Sheets?

  • @constandinosk.3251
    @constandinosk.3251 3 months ago

    Does anyone know how to do this with an LLM model loaded from transformers?

  • @vanamonde_8809
    @vanamonde_8809 1 year ago

    Hello, how can I connect LangChain not to ChatGPT but to local chatbots by their localhost names?

  • @catyung1094
    @catyung1094 1 year ago +1

    Is that few-shot NER? 🤔

    • @dchip95
      @dchip95 1 year ago

      Yeah, LLMs are pretty good at it now.

  • @muhammadowaissiddiqui2443
    @muhammadowaissiddiqui2443 1 year ago

    Can I use it to extract events from text using Hugging Face or any other open-source LLM?

    • @DataIndependent
      @DataIndependent 1 year ago

      Yes, just swap out your model of choice when you make your LLM

  • @AditiTambi-y8g
    @AditiTambi-y8g 1 year ago

    How can I extract the data from an API output as JSON?

  • @programwithpradhan
    @programwithpradhan 1 year ago

    Can you please tell me how to do this if I want to provide word embeddings or a vector DB instead of text?

    • @DataIndependent
      @DataIndependent 1 year ago

      What do you mean? Could you explain more?

    • @programwithpradhan
      @programwithpradhan 1 year ago

      @DataIndependent Thank you for your reply :)
      I am working on a problem where I extract text from websites such as Amazon and McDonald's using web scraping and pass that raw text to OpenAI so that it can extract products or food items along with their price, ratings, discounts, etc.
      The problem is that I can't give all the text to OpenAI at once because of the token limit.
      So is there a way to give it the text in chunks?
      The second thing is improving model performance: instead of giving raw text to OpenAI, I want to give it embedding vectors of that text, created with OpenAI embeddings.
      In my previous approach I used RetrievalQA and the character text splitter in LangChain to solve this, but how can I do the same with the approach you showed in this video?
      Please give me a solution. Thank you for your time ☺️

    • @programwithpradhan
      @programwithpradhan 1 year ago

      I saw your videos on the token limit and on embeddings, but I want to combine these two ideas and run a query with the Kor library so that I get the output in a structured format.
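One way to stay under a token limit, as asked about in this thread, is to split the scraped text into overlapping chunks, run the extraction on each chunk, and merge the results. The sketch below is a minimal illustration under stated assumptions: it splits by characters rather than tokens, and `toy_extract` is a hypothetical regex stand-in for a real LLM or Kor extraction call.

```python
import re


def split_into_chunks(text, chunk_size=1000, overlap=100):
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


def extract_from_chunks(text, extract_fn, chunk_size=1000, overlap=100):
    """Run extract_fn on every chunk and merge the results, dropping duplicates."""
    seen, merged = set(), []
    for chunk in split_into_chunks(text, chunk_size, overlap):
        for item in extract_fn(chunk):
            key = tuple(sorted(item.items()))  # hashable fingerprint of the dict
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def toy_extract(chunk):
    """Hypothetical stand-in for an LLM call: find 'Name $price' pairs."""
    return [
        {"item": name, "price": int(price)}
        for name, price in re.findall(r"(\w+) \$(\d+)", chunk)
    ]


menu = "Burger $5. Fries $3. Shake $4."
print(extract_from_chunks(menu, toy_extract, chunk_size=20, overlap=10))
```

The overlap exists so that a record cut at a chunk boundary still appears whole in a neighbouring chunk. A record cut mid-value can still yield a truncated match, so the overlap should be longer than the longest record you expect.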

  • @TonyHoangPodcast
    @TonyHoangPodcast 1 year ago

    Is there a way to read an entire PDF with Langchain and Kor?

    • @DataIndependent
      @DataIndependent 1 year ago

      Oh ya, big time, use a PDF loader and you’re good to go.
      In my “question a book” video I read a pdf this way

    • @TonyHoangPodcast
      @TonyHoangPodcast 1 year ago

      @DataIndependent Thanks, watching that video right now.

    • @TonyHoangPodcast
      @TonyHoangPodcast 1 year ago

      @DataIndependent After watching that video: do I need to use a vector database, or can I just use the PDF loader and pipe that directly into Kor?

  • @mvasanth5200
    @mvasanth5200 1 year ago

    Can anyone help me with this error, [initial_value must be str or None, not dict], which I get while executing chain.predict_and_parse?

    • @SundarBalamurugan
      @SundarBalamurugan 1 year ago

      Same

    • @vamsiraghu3258
      @vamsiraghu3258 9 months ago

      I tried `chain.run()` and it worked.
      output = chain.run(text=(text))["data"]
      printOutput(output)

  • @wiktorm9858
    @wiktorm9858 1 year ago

    Is there an existing tool that cuts low-signal text?

    • @DataIndependent
      @DataIndependent 1 year ago

      What kind of low-signal text?

    • @wiktorm9858
      @wiktorm9858 1 year ago

      @DataIndependent This is the term you used for (probably) "filler words": words that do not carry much meaning.

  • @Teathebest0
    @Teathebest0 1 year ago

    Hi, may I know if it works with LinkedIn?

    • @DataIndependent
      @DataIndependent 1 year ago

      Totally - you just need to access their data somehow

  • @thorthumb0031
    @thorthumb0031 1 year ago

    pip install kor? his document doesn't specify...

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Yes! I don't run through the dependencies because it's different for everyone. Especially with sub packages.

  • @Grahfx
    @Grahfx 1 year ago +1

    This is the wrong approach, IMHO. You have to use the output as text and not as an object. Otherwise you lose the ability to stream the output, which is a main feature of these LLMs. If you want to structure your text, you'll have to go with Markdown (MD). Not to mention that the translation into an object is never deterministic, due to the nature of LLMs, and you could get something unusable for your front end.

    • @ko-Daegu
      @ko-Daegu 1 year ago

      Wait, which point exactly are you talking about? You've got me a bit confused here.

  • @rolenle8794
    @rolenle8794 1 year ago

    you painted!

  • @EranMoshe-y9h
    @EranMoshe-y9h 1 month ago

    m'ke?

  • @greendsnow
    @greendsnow 1 year ago

    It's just too expensive to offer a viable product with OpenAI.
    Ada-002 is $0.0004 per 1K tokens...
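To put a per-1K-token rate like the one quoted above into concrete terms, the arithmetic can be sketched as follows. The rate is simply the figure from the comment and may not match current pricing; the token counts are made-up inputs.

```python
# Rate taken from the comment above; treat it as an assumption, not current pricing.
PRICE_PER_1K_TOKENS = 0.0004  # USD


def estimate_cost(num_tokens, price_per_1k=PRICE_PER_1K_TOKENS):
    """Return the estimated cost in USD for processing num_tokens tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1000 * price_per_1k


# Hypothetical workloads, chosen only to show how the cost scales.
for tokens in (1_000, 1_000_000, 1_000_000_000):
    print(f"{tokens:>13,} tokens -> ${estimate_cost(tokens):,.4f}")
```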