Use LLMs To Extract Data From Text (Expert Mode)

  • Published: 22 Aug 2024

Comments • 78

  • @abdoualgerian5396
    @abdoualgerian5396 1 year ago +3

    Finally a video that I can enjoy without that background noise. Thanks a lot, and please continue without background music.

  • @nattapongthanngam7216
    @nattapongthanngam7216 4 months ago

    Thank you, Greg, for this informative video on using LLMs to extract data from text! I found it particularly valuable for its potential application in skill/information extraction from resumes/CVs submitted to large companies. I also noticed a minor error in the
    original code:
    """
    output = chain.predict_and_parse(text="...")['data']
    printOutput(output)
    """
    updated code:
    """
    output = chain.run(text="...")['data']
    print(output)
    """

  • @ac_cobra8540
    @ac_cobra8540 1 year ago +6

    Interesting, I'm going to give this a go. I've experimented with Pydantic for parsing LLM output into JSON, so this is super relevant right now. Thanks Greg, great explainer as always.
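    A rough sketch of that Pydantic route, using LangChain's PydanticOutputParser (the Person model and its fields are made up for illustration):
    """
    from langchain.llms import OpenAI
    from langchain.output_parsers import PydanticOutputParser
    from langchain.prompts import PromptTemplate
    from pydantic import BaseModel, Field

    # Hypothetical target structure for the parsed JSON.
    class Person(BaseModel):
        name: str = Field(description="The person's full name")
        title: str = Field(description="Their job title, if mentioned")

    parser = PydanticOutputParser(pydantic_object=Person)

    prompt = PromptTemplate(
        template="Extract the fields from the text.\n{format_instructions}\n{text}",
        input_variables=["text"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )

    llm = OpenAI(temperature=0)
    raw = llm(prompt.format(text="Sandra Quinn was promoted to VP of Sales."))
    person = parser.parse(raw)  # a validated Person object, or a parsing error to handle
    """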

  • @pradeepthiyyagura8677
    @pradeepthiyyagura8677 1 year ago +1

    Greg, great video as always! I achieved the same results by including the desired output in JSON format along with the initial prompt itself, without using the Kor library.

    • @lucasamadsen
      @lucasamadsen 1 year ago

      But you have to put all of that JSON format text into the prompt, right?
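    A sketch of that prompt-only approach (no Kor), assuming the pre-1.0 openai Python client; the JSON shape and field names are illustrative. And yes, as the reply above asks, the format specification does ride along in every prompt, which is a token cost Kor's schema and examples also pay:
    """
    import json
    import openai

    # Spell out the desired JSON shape directly in the prompt.
    prompt = (
        "Extract every person mentioned in the text below.\n"
        "Respond with JSON only, matching exactly this format:\n"
        '{"people": [{"first_name": "...", "last_name": "..."}]}\n\n'
        "Text: My name is Bobby Stein. My sister is Rachel Stein."
    )

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    people = json.loads(resp["choices"][0]["message"]["content"])["people"]
    """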

  • @DelaLange
    @DelaLange 1 year ago

    Your channel is gold! Thanks a lot for all those tutorials.

  • @steveadams617
    @steveadams617 6 months ago

    Great introduction, perfect pacing. I'm going to do some further research to see if I can figure out a way to use Kor with a local language model, since I deal with confidential patient data in a healthcare setting.

    • @PratyushaMalla
      @PratyushaMalla 5 months ago

      Hey, were you able to do this? I am looking to do something similar with a locally set up LLM.

    • @furkankasap806
      @furkankasap806 4 months ago

      I wonder the same thing; some letters in the Turkish language are problematic.
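    One hedged sketch of the local-model route this thread is asking about: Kor accepts any LangChain LLM, so a llama-cpp (or Hugging Face) model can be swapped in and no text leaves the machine. The model path and schema are illustrative, and smaller local models may struggle to follow Kor's format instructions reliably:
    """
    from kor import create_extraction_chain, Object, Text
    from langchain.llms import LlamaCpp  # requires llama-cpp-python and a local model file

    # Hypothetical local model; any LangChain-compatible LLM can be dropped in here.
    llm = LlamaCpp(model_path="models/llama-2-13b-chat.Q4_K_M.gguf", temperature=0, n_ctx=4096)

    # Illustrative schema for the healthcare use case mentioned above.
    schema = Object(
        id="patient",
        description="Information about a patient",
        attributes=[Text(id="name", description="The patient's name")],
        examples=[("Patient Jane Doe was admitted on Monday.", [{"name": "Jane Doe"}])],
    )

    chain = create_extraction_chain(llm, schema)
    print(chain.run(text="Patient John Smith reported mild symptoms.")["data"])
    """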

  • @caiyu538
    @caiyu538 10 months ago +1

    Great lectures. Thank you for sharing them with us for free. Thumbs up.

    • @DataIndependent
      @DataIndependent  10 months ago

      Thank you! I've also been exploring function calling more as a way to extract information.

  • @jakobkristensen2390
    @jakobkristensen2390 10 months ago

    Thanks, this was super useful! I would love to get some insight into the feedback you got from those 80 companies.

    • @DataIndependent
      @DataIndependent  10 months ago +1

      Most people either wanted the data for investment or sales use cases

    • @jakobkristensen2390
      @jakobkristensen2390 10 months ago

      @@DataIndependent I am developing a few small tools for a recruitment bureau; I'm interested since what you mentioned seemed relevant.

  • @tomwalczak4992
    @tomwalczak4992 1 year ago

    Thanks Greg, this is very relevant, will give Kor a try!

  • @AB51002
    @AB51002 1 year ago

    I really liked your video "The Data Learning Journey (Part 1)", and am hoping you will post Part 3 soon.

  • @rolexalexander7513
    @rolexalexander7513 11 months ago

    Thanks Greg, this was really helpful!

  • @ChatGPT-ef6sr
    @ChatGPT-ef6sr 1 year ago +2

    Come on, why did you steal my idea? 😅 I was literally thinking about how to scrape a YouTube channel's data using LLMs and was looking for the info. You came right on time!

    • @adumont
      @adumont 1 year ago +2

      There's a video from James Briggs, iirc, that does Q&A against a knowledge base of a YouTube channel's video transcripts. Not sure if it was an existing dataset or he extracted the transcripts from YouTube himself. Hope that helps.

    • @ChatGPT-ef6sr
      @ChatGPT-ef6sr 1 year ago +1

      @@adumont Oh thanks. I will look it up

    • @asiddiqi123
      @asiddiqi123 1 year ago +1

      Why is everyone making this? 😂

  • @oru65
    @oru65 1 year ago +1

    In the 3rd cell of the Kor Hello World example, the call 'output = chain.predict_and_parse(text=(text))["data"]' must be replaced with 'output = chain.run(text=(text))["data"]' because 'predict_and_parse' has been deprecated.

    • @DataIndependent
      @DataIndependent  1 year ago

      Yikes - thanks for the catch. I would also recommend looking at function calling from OpenAI in case you want to see a different approach.

  • @mahroushkagaurav3601
    @mahroushkagaurav3601 1 year ago

    very insightful - thank you

    • @DataIndependent
      @DataIndependent  1 year ago

      Awesome! I need to add another level to this, which is OpenAI function calling.
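    Since OpenAI function calling keeps coming up in these replies, here is a hedged sketch of that alternative, assuming the pre-1.0 openai client and a function-calling-capable model; the function name and schema are illustrative:
    """
    import json
    import openai

    # Describe the structure you want as a JSON Schema "function".
    functions = [{
        "name": "record_people",
        "description": "Record the people mentioned in the text",
        "parameters": {
            "type": "object",
            "properties": {
                "people": {
                    "type": "array",
                    "items": {"type": "object", "properties": {"first_name": {"type": "string"}}},
                }
            },
            "required": ["people"],
        },
    }]

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[{"role": "user", "content": "My name is Bobby. My sister's name is Rachel."}],
        functions=functions,
        function_call={"name": "record_people"},  # force the structured response
    )
    args = json.loads(resp["choices"][0]["message"]["function_call"]["arguments"])
    print(args["people"])
    """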

  • @yellowboat8773
    @yellowboat8773 1 year ago +3

    Newbie here - I don't understand why you would need to use the library for this task. Couldn't you just specify in your LLM prompt the exact output and formatting you need? Cheers! 😊

    • @aflous
      @aflous 1 year ago

      Basically this abstracts away all the extra work needed for formatting and text extraction and lets you focus on your business logic.

  • @adumont
    @adumont 1 year ago +3

    That's really interesting. Would it be easy (maybe using LangChain) to define required attributes or elements in the schema, and if the LLM can't extract them, have it start a Q&A with the user asking for the missing elements and attributes until the required fields are complete? That would be awesome for triggering follow-up actions, for example.
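    That idea doesn't need anything LangChain-specific - a plain loop around whatever extraction chain you use would do it. A rough sketch, where REQUIRED_FIELDS, extract, and ask_user are all hypothetical placeholders:
    """
    REQUIRED_FIELDS = ["name", "date", "amount"]  # hypothetical required schema fields

    def complete_record(text, extract, ask_user):
        # extract() wraps the LLM chain, e.g. returning chain.run(text=text)["data"] as a flat dict.
        record = extract(text)
        missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
        # Fall back to asking the user for anything the LLM could not find.
        for field in missing:
            record[field] = ask_user(f"I couldn't find '{field}' in the text. What is it?")
        return record
    """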

  • @Ideariver
    @Ideariver 4 months ago

    This was awesome content.

  • @fabsync
    @fabsync 2 months ago

    Fantastic tutorial! It would be great to see another tutorial using "transformers" instead of OpenAI, with Chroma or any local database... and how would you save the extracted information? Does Kor tokenize that information, etc.?

  • @eduardomoscatelli
    @eduardomoscatelli 1 year ago

    Incredible. The million-dollar question 😊: how do you "teach" ChatGPT the schema just once and then validate unlimited texts, without spending tokens on including the schema in every prompt and without having to train the model via fine-tuning?

  • @ahmadzaimhilmi
    @ahmadzaimhilmi 1 year ago +1

    This is precisely what I need for my project, but like you said, the cost can spiral out of control. Have you tried it with GPT-3.5? If so, how unreliable was it?

  • @SudhakarVyas
    @SudhakarVyas 2 months ago

    Hey Greg, thanks for this video!
    Since there is a limit to using the OpenAI API without paying, how can the above implementation be carried out with other open-source LLMs?

  • @Ryan-yj4sd
    @Ryan-yj4sd 1 year ago

    awesome

  • @densonsmith2
    @densonsmith2 1 year ago +1

    Where is the "sign up" you mentioned? This seems very interesting for many applications.

    • @DataIndependent
      @DataIndependent  1 year ago

      Whoops! I'll put it in the description, this was it
      www.openingattributes.com/

    • @densonsmith2
      @densonsmith2 1 year ago

      @@DataIndependent I am very impressed, as were all my work friends.

  • @manujkumarjoshi9342
    @manujkumarjoshi9342 1 year ago

    Wow!! It's magic.

  • @SteveSolun
    @SteveSolun 1 year ago

    Hey Greg, at 7:54 - what is the "many = True" attribute in the Text class? Can you please explain it in a bit more detail?
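    For anyone else wondering: many=True marks a field that can occur multiple times in one input, so Kor returns a list for it instead of a single value. A rough illustrative sketch (the schema here is made up, not the one at 7:54):
    """
    from kor import Object, Text

    schema = Object(
        id="show",
        description="Information about a concert",
        attributes=[
            Text(id="band", description="The band playing"),            # single value
            Text(id="song", description="Songs mentioned", many=True),  # comes back as a list
        ],
        examples=[
            ("The Rolling Stones played Angie and Start Me Up",
             [{"band": "The Rolling Stones", "song": ["Angie", "Start Me Up"]}]),
        ],
    )
    """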

  • @rajpdus
    @rajpdus 1 year ago

    I think we'll have more such prompt-based tooling available sooner or later. Any other specific tools you are experimenting with?

  • @dprggrmr
    @dprggrmr 1 year ago

    Damn, that's cool.

  • @davidmichaelcomfort
    @davidmichaelcomfort 1 year ago

    This looks like a really interesting approach. @DataIndependent, any ideas on the best approach for using tabular data (whether from a pandas DataFrame, a PySpark DataFrame, or a SQL table) in conjunction with LLMs? What about combining tabular data with text documents?

  • @pocker91
    @pocker91 1 year ago

    Hi Greg, thank you for the great video! How would you go about extracting "tags" or predefined values rather than free-form text strings? Especially if the number of possible values is in the thousands and too many to just feed into the prompt (token optimization, etc.). Any ideas? Thank you!

    • @DataIndependent
      @DataIndependent  1 year ago

      hmm good question, check out this tutorial and code
      In cell 15 I have a schema for tags that may be helpful: github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Topic%20Modeling%20With%20Language%20Models.ipynb
      ruclips.net/video/pEkxRQFNAs4/видео.html
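    For the predefined-values part of the question, Kor also has Selection/Option nodes for picking from a closed set. A hedged sketch - and with thousands of possible tags, one common trick is to shortlist candidates first (e.g. with an embedding search) and only put that shortlist into the schema so the prompt stays small:
    """
    from kor.nodes import Object, Option, Selection

    # Illustrative closed set of tags; a real list would be filtered down before prompting.
    tag_schema = Object(
        id="article",
        description="Tags describing an article",
        attributes=[
            Selection(
                id="topic",
                description="The topic(s) of the article",
                options=[
                    Option(id="pricing", description="Pricing changes"),
                    Option(id="hiring", description="Hiring or headcount news"),
                    Option(id="product", description="Product launches"),
                ],
                many=True,
            )
        ],
    )
    # chain = create_extraction_chain(llm, tag_schema)
    """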

  • @pooja1124
    @pooja1124 1 year ago

    Can we extract important content from a research paper? Like some text from the abstract and some from the results or an ablation table, if present. Can you make a video about how to customize that text extraction and send it to Google Sheets?

  • @vanamonde_8809
    @vanamonde_8809 1 year ago

    Hello, how do you connect LangChain not to ChatGPT but to local chatbots via their localhost names?

  • @Grahfx
    @Grahfx 1 year ago +1

    This is the wrong approach, imho. You should treat the output as text and not as an object. If you parse it into an object, you lose the ability to stream the output, which is a main feature of these LLMs. If you want to structure your text, go with Markdown. Not to mention that the conversion into an object is never deterministic, due to the nature of LLMs, and you could get something unusable for your front end.

    • @ko-Daegu
      @ko-Daegu 1 year ago

      Wait, what point exactly are you talking about? You got me a bit confused here.

  • @user-zb3xf7iq4n
    @user-zb3xf7iq4n 1 year ago

    How can I extract the data from an API output as JSON?

  • @mvasanth5200
    @mvasanth5200 9 months ago

    Can anyone help me with this error [initial_value must be str or None, not dict] while executing chain.predict_and_parse?

    • @SundarBalamurugan
      @SundarBalamurugan 9 months ago

      Same

    • @vamsiraghu3258
      @vamsiraghu3258 4 months ago

      I tried `chain.run()` and it worked:
      output = chain.run(text=(text))["data"]
      printOutput(output)

  • @muhammadowaissiddiqui2443
    @muhammadowaissiddiqui2443 1 year ago

    Can I use it to extract events from text using Hugging Face or any other open-source LLM?

    • @DataIndependent
      @DataIndependent  1 year ago

      Yes, just swap out your model of choice when you make your LLM

  • @mysticaltech
    @mysticaltech 1 year ago

    Hey Greg, are you sure this doesn't work well with GPT-3.5?

  • @catyung1094
    @catyung1094 1 year ago +1

    Is that few-shot NER? 🤔

    • @dchip95
      @dchip95 1 year ago

      Yeah, LLMs are pretty good at it now.

  • @programwithpradhan
    @programwithpradhan 1 year ago

    Can you please tell me, if I want to provide word embeddings or a vector DB instead of text, how can I do that?

    • @DataIndependent
      @DataIndependent  1 year ago

      What do you mean? Could you explain more?

    • @programwithpradhan
      @programwithpradhan 1 year ago

      @@DataIndependent Thank you for your reply :)
      I am working on a problem where I extract text from websites like Amazon and McDonald's using web scraping and give that raw text to OpenAI so it can extract products or food items and their price, ratings, discount, etc.
      The problem is that I can't give all the text to OpenAI at once because of the token limit, so is there a way to give the text in chunks?
      The second thing is that, to improve performance, instead of giving raw text to OpenAI I want to give embedding vectors of that text, using OpenAI embeddings.
      I am using RetrievalQA and a character text splitter in LangChain to solve this in my previous approach, but how can I do that with the approach you showed in this video?
      Please give me a solution. Thank you for your time ☺️

    • @programwithpradhan
      @programwithpradhan 1 year ago

      I saw your videos on the token limit and on embeddings, but I want to combine these two ideas and run a query with the help of the Kor library so that I get the output in a structured format.
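    One hedged way to handle the token-limit part of the question above: split the scraped text, run the same extraction chain on each chunk, and merge the results. Embeddings and a vector DB help you choose which chunks to send, but the extraction call still needs the raw chunk text in the prompt. The chunk size, key name, and helper are illustrative:
    """
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=200)

    def extract_in_chunks(chain, raw_text, key="product"):
        # Run the Kor chain chunk by chunk and concatenate whatever it finds.
        results = []
        for chunk in splitter.split_text(raw_text):
            data = chain.run(text=chunk)["data"]
            results.extend(data.get(key, []))
        return results
    """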

  • @TonyHoangPodcast
    @TonyHoangPodcast 1 year ago

    Is there a way to read an entire PDF with LangChain and Kor?

    • @DataIndependent
      @DataIndependent  1 year ago

      Oh ya, big time - use a PDF loader and you're good to go.
      In my "question a book" video I read a PDF this way.

    • @TonyHoangPodcast
      @TonyHoangPodcast 1 year ago

      @@DataIndependent Thanks, watching that video right now.

    • @TonyHoangPodcast
      @TonyHoangPodcast 1 year ago

      @@DataIndependent after watching that video, do I need to use a vector database or can I just use the PDF loader and pipe that directly into Kor?
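    A small sketch of the PDF route discussed above: for straight extraction you don't need a vector database, just a loader. The file path is illustrative, and a long PDF would still need the chunking trick from earlier in the thread:
    """
    from langchain.document_loaders import PyPDFLoader

    pages = PyPDFLoader("report.pdf").load()               # one Document per page
    full_text = "\n".join(p.page_content for p in pages)
    # output = chain.run(text=full_text)["data"]           # same Kor chain as in the video
    """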

  • @wiktorm9858
    @wiktorm9858 1 year ago

    Is there an existing tool that cuts out low-signal text?

    • @DataIndependent
      @DataIndependent  1 year ago

      What kind of low signal text?

    • @wiktorm9858
      @wiktorm9858 1 year ago

      @@DataIndependent It's the term you used for (probably) "filler words" - words that do not carry much meaning.

  • @thorthumb0031
    @thorthumb0031 1 year ago

    pip install kor? His document doesn't specify...

    • @DataIndependent
      @DataIndependent  1 year ago +1

      Yes! I don't run through the dependencies because it's different for everyone. Especially with sub packages.

  • @Teathebest0
    @Teathebest0 1 year ago

    Hi, may I know if it works with LinkedIn?

    • @DataIndependent
      @DataIndependent  1 year ago

      Totally - you just need to access their data somehow

  • @rolenle8794
      @rolenle8794 1 year ago

    you painted!

  • @greendsnow
    @greendsnow 1 year ago

    It's just too expensive to offer a viable product with OpenAI.
    Ada-002 is $0.0004 per 1K tokens...
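    For scale, a back-of-the-envelope check on that embedding price (the completion models used for the actual extraction were priced separately, and much higher, at the time):
    """
    # ada-002 embeddings at $0.0004 per 1K tokens, as quoted above.
    tokens = 10_000_000                      # e.g. ten million tokens of text
    print(f"${tokens / 1000 * 0.0004:.2f}")  # -> $4.00
    """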