"How to give GPT my business knowledge?" - Knowledge embedding 101

  • Published: Dec 17, 2024

Comments • 334

  • @AIJasonZ
    @AIJasonZ  1 year ago +62

    A few people asked “why only vectorise one column instead of the whole csv?”
    Adding a bit more explanation here:
    Vectorising is mainly for search, and the column you vectorise can be thought of as the “index” or “id” of the dataset, while the data it returns will still be the question/answer pair.
    The reasons I want to vectorise only one column are:
    1. It saves cost - vectorising uses an embedding model, which means every token we vectorise costs money
    2. It increases accuracy - in this case I only want to search past customer emails, not sales responses; searching both columns might return the wrong answer, e.g. a search for “interested in learning more” could return the pair: “client: stop sending me emails; sales: understood, let us know if you are interested in learning more in future!”
    Hope this helps!
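
A minimal sketch of the one-column idea, assuming a toy bag-of-words `embed` function in place of a real embedding model and a hypothetical two-row dataset. Only the customer email column is indexed, but the search still hands back the whole email/response pair:

```python
import math
import re
from collections import Counter

# Hypothetical two-column dataset: (customer email, sales response).
PAIRS = [
    ("Stop sending me emails",
     "Understood, let us know if you are interested in learning more in future!"),
    ("I am interested in learning more about pricing",
     "Happy to help, here is our pricing sheet."),
]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Only the first column is vectorised; it acts as the search index.
INDEX = [embed(email) for email, _ in PAIRS]

def search(query):
    """Return the full (email, response) pair whose email best matches the query."""
    q = embed(query)
    best = max(range(len(PAIRS)), key=lambda i: cosine(q, INDEX[i]))
    return PAIRS[best]
```

With this data, `search("interested in learning more")` matches the pricing email. Had the response column been indexed too, the "stop sending me emails" row would also have scored on those words.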

    • @ozfish17
      @ozfish17 1 year ago +1

      It seems embedding enriches your search query. How about answers? In your example, do you 'train' the LLM with Q&A pairs?

    • @AIJasonZ
      @AIJasonZ  1 year ago +1

      @ozfish17 yep, it returns both Q&A as a pair!

    • @Taskade
      @Taskade 1 year ago +1

      Jason, brilliant step-by-step guide on knowledge embedding! Your breakdown of the process was super insightful. I'm curious about how AI Agents in Langchain perform, especially in long-running scenarios. Hope you'll consider diving into that topic in the future. Keep up the stellar content!

    • @sandeepbansal1195
      @sandeepbansal1195 1 year ago

      So if you want the output response email to be generated by the LLM based on a specific tone, why wouldn't the 2nd column be a part of vectorizing the dataset?

    • @csss142
      @csss142 1 year ago

      Hey Jason! What would be the best way to do this with financial PDFs? I want to ask questions and get accurate insights from the large documents. Would using embeddings be best or the fine tuning from your other video? Thanks! @AIJasonZ

  • @psychxx7146
    @psychxx7146 1 year ago +44

    Small channels like this are the ones that hold the most value.

  • @Helpsmallbusinesses
    @Helpsmallbusinesses 1 year ago +105

    In 2 minutes and 54 seconds you explained what vectorising is better than any other video online. You made it easy. Thank you!

  • @funkyboodah
    @funkyboodah 10 months ago +6

    man you have a really rare ability to explain super complicated things in a very simple way and organize the information so it's even more clear. Bravo and thank you

  • @humadi2001
    @humadi2001 9 months ago +1

    I've watched many videos on this topic and I can say that your simple examples have covered most of what I need to know. Thanks Jason.

  • @sidavidsin
    @sidavidsin 1 year ago +28

    Thanks for sharing your knowledge with us, your channel is literally a gold mine of information. Keep doing what you're doing, Jason!

  • @verasalem5071
    @verasalem5071 1 year ago +35

    Love your content, very easy to digest and understand. The only recommendation I would give is to use other embeddings and LLM models besides OpenAI. Mid/large-sized companies cannot use OpenAI in their environment because of legal issues around OpenAI's data retention policy. A lot of companies want to develop their own implementations, so including other models like Llama 2, Vicuna, etc. would allow you to reach a bigger audience.

    • @AIJasonZ
      @AIJasonZ  1 year ago +5

      yea great points, thanks for the recommendation! totally get that companies don't want to send any data to OpenAI LOL

    • @Ascended23
      @Ascended23 1 year ago +2

      +1 for using more open models. I love your content and the approach you take to your videos. But even though I'm not a big company I just value using systems that are open instead of closed.

  • @averagegamer9513
    @averagegamer9513 1 year ago +26

    Great video as always, Jason. Thank you for making one of the few channels with genuine AI tool videos that actually demonstrate implementation and applications rather than hyping up the content through sweet talk then simply dropping an affiliate link.

    • @devklepacki
      @devklepacki 1 year ago +4

      This! I feel so grateful that the YouTube algorithm blessed me with Jason's channel. Beautiful explanations and clear steps.

    • @koen.mortier_fitchen
      @koen.mortier_fitchen 1 year ago +1

      Yeah, he's one of the real ones. I've asked him if he could add a github for the code. It's the only thing this channel lacks imo.

    • @frankchangshow
      @frankchangshow 1 year ago

      @devklepacki same feelings here

  • @_arman_
    @_arman_ 1 year ago +6

    Man... you have a serious gift for teaching! This is super helpful. Thanks.

  • @muhammadanasazambhatti2772
    @muhammadanasazambhatti2772 1 year ago +6

    Thank you very much! Nobody explained Embedding and Vectorization like this! Thank you again!

  • @photon2724
    @photon2724 1 year ago +23

    Anyone looking to make a great startup in AI, you have to jump on this!

  • @shivamroy1775
    @shivamroy1775 1 year ago +9

    Absolutely great video, I loved that you took the time to explain everything in theory and then went on to give a detailed walkthrough of the code. Please keep posting such videos !

  • @fuxxs5994
    @fuxxs5994 1 year ago +21

    I really love your style, first explaining the theory and then demonstrating it by an example

  • @normanluismadrid422
    @normanluismadrid422 1 year ago +3

    this is virtual gold, mad props to jason for clearly describing complex topics and even showing practical application, saved me hours of research lol, it'd be great if you can touch up on the various services out there that offer AI services that embed, and how they compare in performance, pros / cons etc.

  • @rbdon5607
    @rbdon5607 1 year ago +1

    Thanks! What do you see as the major pros and cons doing it through coding your own versus a platform such as Relevance AI?

  • @_yasser
    @_yasser 5 months ago

    This is my new favorite channel. The topics are pretty dense and dry - but you make them super easy and fun to learn. Thank you!

  • @nguyenvanduc2000
    @nguyenvanduc2000 7 months ago

    I have the same idea in mind. I have tons of product documents that I wish I could just ask an agent something about it instead of scrolling hundreds of word pages. I really appreciate your video man.

  • @davidkwon1233
    @davidkwon1233 1 year ago +3

    one of the best channels out there, really appreciate your content!

  • @devinoutfleet1998
    @devinoutfleet1998 1 year ago +1

    Bro... you are incredibly smart and are a great teacher. This is going to provide 10x value to my users

  • @VaibhavShewale
    @VaibhavShewale 1 year ago +1

    this is just awesome, people who had no idea now not only have an idea but also a reference

  • @PlectrumShorts
    @PlectrumShorts 1 year ago +2

    Great tutorial! You covered a LOT of ground quickly, but thoroughly. Haha. Nice work.

  • @jasonfinance
    @jasonfinance 1 year ago +3

    the best video about embedding ive seen; thank you!

  • @Optable
    @Optable 1 year ago +2

    You have helped the community so much with this valuable content. Keep it up my friend, i'll be watching!

  • @stepkurniawan
    @stepkurniawan 1 year ago +2

    yo bro.. i really like when you explain all the step-by-step and all relevant tools out there! thank you!

  • @farid3101
    @farid3101 9 months ago

    I am really surprised that these tools can help so many businesses doing the low-cost and autonomous response specifically for customer service! Great video!

  • @SaminYasar_
    @SaminYasar_ 1 year ago +4

    Keep it up man probably one of the only channels with incredible value

  • @growthub8541
    @growthub8541 1 year ago +3

    So helpful! I started using relevance ai because of your videos & just as a no-code developer been able to build some sick ass LLM chains with Zapier Custom HTTP Requests.
    I have my development team even using it & it’s definitely speeding up our velocity to iterate🙌🔥

    • @AIJasonZ
      @AIJasonZ  1 year ago

      thats great to hear! 🤘

  • @JJ-vq8mu
    @JJ-vq8mu 1 year ago +2

    Great job, and I really appreciate you sharing your knowledge. Looking forward to open LLM content.

  • @christhornham
    @christhornham 8 months ago

    Outstanding. Your ability to explain complicated topics is incredible. Thank you.

  • @camach28
    @camach28 1 year ago +8

    It would be amazing if you could make a video creating a knowledge base using long PDFs as the source, and use GPT as well to make an expert assistant in a topic.

    • @frankchangshow
      @frankchangshow 1 year ago

      Yes like if the data source is like a book and we want to search the contents in it giving relative data like “I remember this part of the book saying something like this… where was it?” … or “the book had this story … where was it and the main ideas”

  • @Gingeey23
    @Gingeey23 1 year ago +8

    Great video Jason, however the biggest challenge for companies will be ensuring that commercially sensitive information isn't fed into hosted LLM models due to security concerns. Would be really interested to see how you would approach this challenge, and potentially try to deploy this tool locally? keep up the good work!

    • @AIJasonZ
      @AIJasonZ  1 year ago +7

      Thanks mate! Yea I agree, I hear businesses talk about sensitive information a lot, especially ones with client data;
      There are 2 ways I see it can be solved now:
      1. Self-host the LLM, using the Azure self-hosted version or even open source models, so you don’t send info to OpenAI
      2. Anonymise your input/output data, so OpenAI doesn’t have a clear idea that data A is from company A

    • @devklepacki
      @devklepacki 1 year ago +2

      If using a hosted LLM like OpenAI's, this would probably require either 1. a lot of manual work clearing all the data or 2. first pushing the data through a lighter local LLM tasked with clearing any sensitive information (like they used one LLM to create training prompts for another LLM). Just a thought, tho
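
One way to picture the anonymising step is a small redaction pass that runs before any text leaves your machine. This is only a sketch: the two regular expressions and the `redact` helper are illustrative assumptions, and they catch simple email addresses and phone numbers, not names or other identifiers.

```python
import re

# Illustrative patterns only; real data will need broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Replace email addresses and phone numbers with neutral placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

The redacted text is what would be embedded or sent to the hosted model, while the original stays in your own storage.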

  • @koen.mortier_fitchen
    @koen.mortier_fitchen 1 year ago +2

    Thanks for your work Jason. You're one of the best, and I follow tons.

  • @rahuliyer6007
    @rahuliyer6007 1 year ago

    Came here after the fine tune model video - looking for exactly this. Thanks!

  • @pietdebeer7972
    @pietdebeer7972 1 year ago +1

    I'm blown away. Thank you!!

  • @michalf16
    @michalf16 1 year ago +1

    Love your content good sir, tuned for all next videos you are the leader

  • @stevi32800
    @stevi32800 1 year ago +2

    I really like your video. You know how to get people's attention. Please make more videos like this 😊

  • @coldestlin
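    @coldestlin 10 months ago

    When the intermediate vector query result came out, I immediately understood the whole flow. Really nice. So you take the vector query result, hand it to the LLM as the prompt instruction, and then let the LLM give the answer.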
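
The flow described in this comment (retrieve first, then pass the retrieved text to the LLM inside the prompt) can be sketched as below. The prompt wording and the `build_prompt` helper are hypothetical, not the prompt used in the video; the point is only that the retrieved pairs become part of the instruction.

```python
def build_prompt(customer_message, retrieved_pairs):
    """Assemble an LLM prompt from the new message and retrieved (email, response) pairs."""
    examples = "\n\n".join(
        f"Customer: {email}\nResponse: {response}"
        for email, response in retrieved_pairs
    )
    return (
        "You are a sales representative. Use the past exchanges below as a guide.\n\n"
        f"Past exchanges:\n{examples}\n\n"
        f"New customer message:\n{customer_message}\n\n"
        "Write the best response:"
    )
```

The returned string is what would be sent to the chat model; nothing is trained, the model just reads the retrieved examples each time.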

  • @Ozla102
    @Ozla102 1 year ago +2

    The video is very inspiring and straightforward, a valuable lesson

  • @TheDestint
    @TheDestint 1 year ago +2

    This is super duper helpful man ! Great work and thanks !

  • @maciejbalasinski2419
    @maciejbalasinski2419 1 year ago +1

    Thanks for the no-code alternatives

  • @shethromesh
    @shethromesh 1 year ago

    Would love to see a similar demo of knowledge search with open source models instead of OpenAI models

  • @kurtcampher4716
    @kurtcampher4716 1 year ago

    thank you for this
    As a dev with no AI experience, you really make it easy to understand

  • @kiraakamaru
    @kiraakamaru 1 year ago +4

    This is exactly what I was looking for, I have a question Jason: How can we secure our company personal data?

  • @half_way_expert
    @half_way_expert 1 year ago +3

    Another great video! Thanks Jason, keep up the excellent work

  • @naimneman
    @naimneman 1 year ago +2

    Amazing video Jason! Pretty useful information. I would love to see a video about GPT4All as a personal assistance for everyday life.

  • @AndrejsKarpovs
    @AndrejsKarpovs 1 year ago +2

    I have a couple of questions:
    1) I have 0 knowledge about vector databases, but don't you need to define some kind of access-related information (connection string, username/password, etc.) to use one? Did you define it in the .env file?
    2) How does this method compare to PEFT/LoRA? Does it basically achieve the same thing? It looks like embeddings can be a faster solution

    • @AIJasonZ
      @AIJasonZ  1 year ago +1

      Hey mate!
      1/ if you want the vector database stored on a managed cloud solution like Pinecone, then yes, you can create an account and use them; in this example I used FAISS, which is not a managed database solution, so it is just stored on your local machine
      2/ LoRA or other fine-tuning solutions, as I mentioned at the beginning, are more for getting the LLM to behave in a certain way (e.g. digitise someone), while embedding is useful for knowledge retrieval (e.g. Q&A on your own data)

    • @Fiop22
      @Fiop22 1 year ago +2

      I haven’t watched the video, but to answer your first question if you’re using a cloud service like pinecone then yes. Alternatively, you can store the embeddings locally as a .csv for example and perform the lookup via cosine similarity with numpy for example.

  • @ludwigvanbeethoven61
    @ludwigvanbeethoven61 1 year ago

    I wonder why those AI channels, like yours, are not exploding. This is so important for the future what you all are doing. Only a few people get this!

  • @Artificial_Noob
    @Artificial_Noob 1 year ago +7

    Great video man! I hope you can cover more "No Code Methods" for beginners like me that are not very technical! The last part of this video was GOLD for me. cheers!

  • @gautamdawar5067
    @gautamdawar5067 1 year ago +1

    This is pure gold. Thank you so much!

  • @wojpaw5362
    @wojpaw5362 1 year ago

    Absolutely outstanding. I liked, subscribed and shared. Best explanation of knowledge embedding I have come across!!!!

  • @manojnaidu619
    @manojnaidu619 7 months ago

    Cannot be more valuable than this. Loved it 🎉

  • @aliyousefi9735
    @aliyousefi9735 1 year ago +1

    you're the man Jason, great content!

  • @shrvn110
    @shrvn110 1 year ago +2

    this dude is on FIRE 🔥

  • @ristopaasivirta9770
    @ristopaasivirta9770 1 year ago +2

    My friend. You have an uncanny ability to teach AI science and concepts to us pepegs.
    This and your other videos are really good at explaining on how the neural networks work, not just how to do the thing.

  • @RichardGetzPhotography
    @RichardGetzPhotography 1 year ago

    Thanks! First, you have lorem ipsum text on your contact page
    (1) I have a lot of general documentation, not Q&A. Do you have a video of best practices for that workflow?
    (1b) I have a lot of docs (PDFs) that have text on how to do something and images of where to do it. Not so much Q&A. How effective is embedding at deriving answers from documents or should that be fine-tuned?
    (2) I had GPT write me an embedding script straight from python, can you do a video on that vs openAI embeddings or others? Best practices for embeddings?
    (2b) I presume I should take large docs and break that up into subjects (chapters) to embed?
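
On breaking up large documents before embedding: besides splitting by chapter, a common fallback is fixed-size chunks with a small overlap so that a sentence cut at a boundary still appears whole in one chunk. A minimal sketch, where `chunk_size` and `overlap` are character counts chosen arbitrarily for illustration:

```python
def split_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size characters, each sharing
    `overlap` characters with the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would then be embedded as its own entry, ideally with a note of which document and section it came from.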

    • @elijahbock4357
      @elijahbock4357 4 days ago

      Hi Richard, did you ever figure this out? I've spent weeks trying to do something similar to fine-tune a model to assess academic writing for APA format

  • @KarlJuhl
    @KarlJuhl 1 year ago +3

    Great resources Jason, I will add to the flood of comments - you are a great communicator and you move at a good speed. Thanks for sharing!
    It is interesting how many langchain UI apps are being built. Relevance AI looks to be the most integrated from end to end, with such an easy deploy process.
    I am curious to know your thoughts on using a UI tool like flowise or relevance AI versus custom programming.

  • @T0mstyle
    @T0mstyle 1 year ago

    Really like the video, but the prerequisite knowledge makes it hard to follow. I have no idea what an .env file is (6:50), so it would be great if you could hint at where I could start.

  • @ridg2806
    @ridg2806 1 year ago

    Really high quality content, thank you Jason!

  • @chrisvienneau3366
    @chrisvienneau3366 1 year ago +1

    Great content and love the intros

  • @MrDe0
    @MrDe0 1 year ago +1

    This is GOLD !!
    Thank You !

  • @robertcormia7970
    @robertcormia7970 11 months ago

    Very well done! Straightforward to follow!

  • @aliq6709
    @aliq6709 8 months ago

    This was super helpful. Thank you, Jason!

  • @manideepatalukdar9201
    @manideepatalukdar9201 1 year ago +1

    Great video! Very simple to understand.

  • @AI_Ron
    @AI_Ron 1 year ago +2

    These are gems

  • @rimilien
    @rimilien 4 months ago

    Thank you my friend! Awsome work!

  • @AssassinUK
    @AssassinUK 1 year ago +2

    This was 🔥🔥🔥. If I hadn't already subscribed, I would have. Excellent use case! Looking to implement this using Flowise.

  • @nealshah5874
    @nealshah5874 6 months ago

    This is the greatest video ever created

  • @IanTrolinger
    @IanTrolinger 1 year ago

    this is the best video on your channel.

  • @xulipaTV
    @xulipaTV 1 year ago +1

    You are the man Jason!

  • @DeLeizard
    @DeLeizard 1 year ago +3

    Thank you for the super video. I'm learning LLM and am quite confused between knowledge base embedding, that was mentioned, vs prompt tuning. Could you tell me the difference?

  • @davide.2349
    @davide.2349 1 year ago +1

    Jason you are awesome!

  • @kylelau1329
    @kylelau1329 1 year ago

    have been waiting for this video, Thank you!

  • @karankanchetty8320
    @karankanchetty8320 9 months ago

    Great job. You deserve more subscribers.

  • @markieuanroberts
    @markieuanroberts 1 year ago +1

    Awesome explanation, thanks.

  • @AlessaOxygen-ot4rl
    @AlessaOxygen-ot4rl 11 months ago

    This is hilariously good. Thanks for this wonderful resource!

  • @patriciodiaz2377
    @patriciodiaz2377 1 year ago +1

    Thanks a lot for the info!! Greetings from Mexico 🤙

  • @rverm1000
    @rverm1000 1 year ago +1

    great video! is that enough info to go out and start building a customer response ai for other people or businesses?

  • @CyberSQUID9000
    @CyberSQUID9000 1 year ago +1

    More excellent content, thanks mate

  • @Pedro-ps1rk
    @Pedro-ps1rk 1 year ago +1

    Great video mate! But I was wondering one thing here. You said something about using personal data/ private data, but as you are using openai service, aren't you "sharing" your data with openai?

  • @aibeginnertutorials
    @aibeginnertutorials 1 year ago +4

    Hey Jason thanks for the always excellent presentations and information. The Streamlit and RelevanceAI information were interesting and useful. Relevance reminds me of another great product, Flowise.

    • @frankchangshow
      @frankchangshow 1 year ago

      I don’t know if I should use Stack AI, Relevance AI, or Flowise. Going into decision fatigue now

  • @davidwylie8491
    @davidwylie8491 1 year ago +1

    Amazing! Thanks for sharing

  • @satyamgupta2182
    @satyamgupta2182 1 year ago +2

    Thank you so much for your video. It's very helpful.
    At the same time, is there a way to run this with Llama-2 or other open source LLMs?
    Edit: If security is my main concern, how do I go about embedding?

  • @arunkabilan
    @arunkabilan 1 year ago +1

    Great explanation

  • @tahunal
    @tahunal 1 year ago +1

    Bro you are awesome.

  • @TheRcfrias
    @TheRcfrias 7 months ago

    hey @AIJasonZ, great video! I wonder if you could create another video where you could combine 1) Fine Tuning, 2) Knowledge base AND 3) API Data. The scenario is as follows: "I want to respond to a customer with my style of writing (Fine Tuning) about the services I provide (Knowledge Base) and my available schedule for a demo (Schedule API)". Is this something we can mix together? or how would we tell the model it should suggest a demo meeting?

  • @AfeezAzizTV
    @AfeezAzizTV 1 year ago +2

    Hi Jason, is there an alternative to OpenAI Embeddings? If possible, I'd use open-source projects rather than OpenAI.

  • @tauraik
    @tauraik 1 year ago +1

    Amazing content my guy Amazing

  • @ozfish17
    @ozfish17 1 year ago +1

    Great video Jason! In the sample you shared, does the llm get trained every time you have a new message? Or you train it once, then you can ask multiple questions?

    • @bobwilkinson8053
      @bobwilkinson8053 1 year ago

      I have the same question. Did you find the answer?

  • @Scooterboy_and_others109
    @Scooterboy_and_others109 1 year ago

    At 7:35 in the video you said you need not do text splitting (do it only if the input file content is huge).
    Is there a way to know in advance, with some code, whether the threshold has been reached and I need to do text splitting?
    In your case the CSV file had only 219 rows, but how can I know that 219 rows has NOT crossed that threshold?
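
One hedged way to check this in code, assuming each CSV row is embedded as its own document: the limit then applies to the length of each row, not to the number of rows. The sketch uses the rough rule of thumb of about 4 characters per token for English text, and `max_tokens=8000` is a placeholder to replace with the documented input limit of whichever embedding model is used.

```python
def estimate_tokens(text):
    """Rough token estimate: about 4 characters per token for English text."""
    return (len(text) + 3) // 4  # integer ceiling of len(text) / 4

def rows_needing_split(rows, max_tokens=8000):
    """Return the indices of rows whose estimated token count exceeds max_tokens."""
    return [i for i, text in enumerate(rows) if estimate_tokens(text) > max_tokens]
```

If the returned list is empty, every row fits and no splitting is needed; for an exact count, the model's own tokenizer would replace `estimate_tokens`.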

  • @YangYang-rh8uy
    @YangYang-rh8uy 9 months ago

    Exactly what I want, thanks Jason.

  • @Grumptr0nix
    @Grumptr0nix 1 year ago

    This is exactly what I was looking for... I have a tremendous amount of assets (Requirements docs, project plans, etc) that we've created over and over for all our engagements, and I'm trying to find a way for us to stop reinventing the wheel. All of which are in our Google Drive, but I'm having trouble conceptualizing how I'd be able to turn that into vectored data (you talk about text splitter, but I'm still a bit confused about its application). Anyways, I'll do more research but this is amazing content. Thank you.

    • @Grumptr0nix
      @Grumptr0nix 1 year ago +1

      And for sure, the legal issues with our business data and OpenAI that is discussed in other comments have been a blocker for us as well, but at least there's options.

  • @BillVoisine
    @BillVoisine 5 months ago

    Thank you Jason, this is awesome and very helpful!

  • @facundozupel4166
    @facundozupel4166 1 month ago

    Jason, first of all, this video is very clear and well explained, so thank you and congratulations on such great content! Already subscribed. I have a question about how to evaluate whether the context is being used correctly. I know LangChain has some retrieval functionality; is it worth checking out?

  • @groccy
    @groccy 1 year ago +4

    Thank you for making this great content, Jason! You literally created a gold mine for LLM practitioners. Really appreciate it! Any chance we can find the code from this video online?

    • @AIJasonZ
      @AIJasonZ  1 year ago +3

      Hah I had a hard time to define my audience, and LLM practitioner is kinda perfect! Sure thing, I will open up the GitHub link soon

    • @groccy
      @groccy 1 year ago

      @AIJasonZ Thanks. Can’t wait!

  • @takeshikriang
    @takeshikriang 1 year ago +1

    Great video, subscribed.

  • @adi2hot
    @adi2hot 1 year ago +1

    Fantastic content, thank you.

  • @ZYLON22
    @ZYLON22 1 year ago +1

    Hey man! Always great content you have🤙
    Planning an app that can compare similar documents.
    The idea is that every time I upload a new document, the LLM can tell what has changed compared to the last one.
    Can I do this by using the same tools you used in this video?

    • @AIJasonZ
      @AIJasonZ  1 year ago

      Hey man, is it like comparing let’s say 2 invoices? And see if the number matches?
      In that case I think you can do it in relevance ai, they allow you to upload docs and extract data!

    • @ZYLON22
      @ZYLON22 1 year ago

      @AIJasonZ Sorry for the late reply! I will check it out!
      But actually it's more complicated, it's about law documents with nearly the same content that get updated from time to time.
      So as far as I understand it, I have to train the AI with embeddings, and when an updated document comes out the AI will see the difference and point it out?

  • @mike8677
    @mike8677 1 year ago +1

    Thanks ! when will this be on Github ?

  • @ivant_true
    @ivant_true 8 months ago

    you make really useful videos man

  • @ZorinsFactFrenzy
    @ZorinsFactFrenzy 8 months ago

    Hey @AIJasonZ, great video! I'm curious, is there a method to retrieve the confidence level from the embeddings? Since it's possible that not all the information will be present in the embeddings, it would be helpful to have a way to handle such scenarios. For instance, if certain information is missing, perhaps the system could respond with "response not found" or trigger another action like calling an API.
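
A sketch of the fallback idea raised here, assuming the vector search returns a similarity score with each match where higher means closer. The `min_score=0.75` cut-off and the `answer_or_fallback` helper are hypothetical and would need tuning on real queries. LangChain vector stores expose a `similarity_search_with_score` method, but whether its score is a similarity or a distance depends on the store, so the direction should be checked before thresholding.

```python
FALLBACK = "response not found"

def answer_or_fallback(scored_matches, min_score=0.75):
    """scored_matches: list of (similarity, answer) tuples, higher similarity = closer.
    Return the best answer, or a fallback when nothing is similar enough."""
    if not scored_matches:
        return FALLBACK
    score, answer = max(scored_matches, key=lambda match: match[0])
    return answer if score >= min_score else FALLBACK
```

The fallback branch is also the natural place to trigger another action, such as calling an API or handing the message to a person.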