LangChain 101: Ask Questions On Your Custom (or Private) Files + Chat GPT

  • Published: 15 Jul 2024
  • Twitter: / gregkamradt
    Newsletter: mail.gregkamradt.com/signup
    See how to upload your own files to Chat GPT using LangChain.
    In this example we are using text files. What other file types do you want to see? Google? Notion? Email?
    Follow me on Twitter
    Personal: / gregkamradt
    Github code: github.com/gkamradt/langchain...

Comments • 373

  • @liamboyd4676
    @liamboyd4676 Год назад +5

    Great video! I would love to see some in-depth walk through of the various chain types and agents, along with examples to help clarify usage. Thank you.

  • @liameiternes5744
    @liameiternes5744 7 месяцев назад

    Thanks Greg, found your video from reddit. Extremely well explained, great work!🤗

  • @adityalakkad8795
    @adityalakkad8795 Год назад +35

    I would love to see a video where I can build the same thing with open-source alternatives; as a student, that would be very helpful for building capstone projects.

    • @DataIndependent
      @DataIndependent  Год назад +6

      Sounds good! I'll add that to the list.

    • @chrisriley6264
      @chrisriley6264 Год назад +4

      @@DataIndependent If you switch to open source I'm fairly certain you will increase access and grow faster as a content creator. I only use open-source tools and will wait for someone to create an alternative. I tend to seek out creators who use these tools by default. It's good to know about the tech, of course, but if I can't own it I don't use it daily and don't usually recommend it unless it's business-related.

    • @konstantinlozev2272
      @konstantinlozev2272 Год назад

      ​@@chrisriley6264 There is PrivateGPT with ChromaDB and Vicuna LLM. Fully local and free. But it's not too impressive ATM on its own.
      I think with some tweaking and optimisation, it will become useful.

  • @mlg4035
    @mlg4035 Год назад +1

    Freaking awesome video, all stuff, no fluff!! This will help me get started quickly, thank you so much!

  • @user-or7tq1ie9f
    @user-or7tq1ie9f 9 месяцев назад

    This is such a good combination of accessible and technical. Thanks so much for what you're doing! As a technical person very new and exploring this space, your content is so so helpful

  • @tsukinome8153
    @tsukinome8153 Год назад +5

    An amazing video; I learned a lot from it. LangChain's documentation is very difficult for me to understand, but thanks to your video I feel much more confident about starting to experiment with this library.

    • @DataIndependent
      @DataIndependent  Год назад +1

      Awesome, I'm glad it worked out. Let me know what other videos or questions you have

  • @michaelzumpano7318
    @michaelzumpano7318 Год назад +2

    Excellent demo! Very straightforward programming.

  • @hasani511
    @hasani511 Год назад +1

    Your videos are extremely helpful. Looking forward to more

    • @DataIndependent
      @DataIndependent  Год назад +1

      Nice! Thank you - what else do you want to see?

  • @hugomejia7826
    @hugomejia7826 Год назад +1

    Great series about LangChain a must see!

  • @dawidanio7053
    @dawidanio7053 Год назад +1

    That was great, thanks a lot for your knowledge and for your work. Great job!

  • @daethyra
    @daethyra 9 месяцев назад

    Your videos are so amazingly well done. Thank you for your hard work!

  • @skillerd7429
    @skillerd7429 Год назад

    So cool! Can't wait for the next video

    • @DataIndependent
      @DataIndependent  Год назад

      Nice! Love this - What would you like to see?

    • @skillerd7429
      @skillerd7429 Год назад

      @@DataIndependent It would be excellent to learn about how LangChain modules work behind the scenes 🤯

  • @markvosloo8833
    @markvosloo8833 Год назад +1

    Amazing video, thank you!

  • @bdtrauma01
    @bdtrauma01 Год назад +10

    I am new to AI, but have an interest in AI application for healthcare, specifically in the emergency department. Your explanations are very easy to follow ... great job!

    • @DataIndependent
      @DataIndependent  Год назад +1

      Nice! Glad to hear it

    • @joeqi4677
      @joeqi4677 Год назад

      @@DataIndependent Really, this is a useful video. Do you have any suggestions on getting the embedding values out of the "embeddings" or "docsearch" variables? I tried to find answers in the OpenAI library code and the LangChain docs but didn't get a clear answer.

  • @abhishekjayant9733
    @abhishekjayant9733 8 месяцев назад

    The underrated ninja of the AI space. Amazing content and explanation. VERY USEFUL.

  • @boratsagdiev6486
    @boratsagdiev6486 Год назад

    thanks so much!

  • @elsenorguerric
    @elsenorguerric Год назад

    Great video thanks a lot :)

  • @ubaidghante8604
    @ubaidghante8604 Год назад

    Great video man 💙💙🔥

  • @cosmotxt680
    @cosmotxt680 Год назад

    Thanks, your video was very helpful

  • @davidl3383
    @davidl3383 Год назад

    Very helpful. Thanks a lot

  • @user-hv4ku2pz4z
    @user-hv4ku2pz4z Год назад

    very informative, tnx!

  • @abhijitbarman
    @abhijitbarman Год назад

    I am really enjoying your content on LangChain. It's awesome. I was hoping you could create videos around the Vicuna LLM: what it is, and how to fine-tune or train it on a custom dataset.

  • @shanx1243
    @shanx1243 Год назад

    Thanks for making these videos! I'd like to see FAISS in action with this kind of stuff.

    • @DataIndependent
      @DataIndependent  Год назад

      Awesome thanks for sharing. I'll see if I can slot in a video for it

  • @Crowward92
    @Crowward92 Год назад

    man this guy is a GOAT

  • @akiffpremjee
    @akiffpremjee Год назад +1

    This was super helpful! Would love a video on using pandas dataframe as a loader. It loads fine with one column but having multiple columns causes issues with chroma for some reason

  • @remicomte-offenbach73
    @remicomte-offenbach73 Год назад

    Thanks for the great video !

    • @DataIndependent
      @DataIndependent  Год назад

      Glad you liked it! What else do you want to see?

    • @remicomte-offenbach73
      @remicomte-offenbach73 Год назад

      ​@@DataIndependent The use case I am currently trying to figure out is creating a Q&A chatbot that uses documentation that lives in a large (700+ pages) Notion Database.
      The current approach I have in mind is: load the Notion content into a new GitHub repo (refreshed on a weekly basis), use LangChain to load the text, split it into chunks, embed it, and store the vectors in Pinecone to then query with the user's question. We'd also need to dynamically add info about our customer into the prompt. It's pretty advanced, but a "how to" would be absolutely AMAZING! And a game changer for my company!

  • @JT-Works
    @JT-Works 10 месяцев назад

    Great video series, keep up the amazing work!
    Also, you kind of look like Bradley Cooper. I figured that was a decent compliment, so I thought I would toss it your way. Have a great one.

  • @tubehelpr
    @tubehelpr Год назад +2

    Exactly what I was looking for - thank you!

    • @DataIndependent
      @DataIndependent  Год назад

      Nice! Glad to hear it

    • @tubehelpr
      @tubehelpr Год назад

      @@DataIndependent Having a hell of a time getting the imports and dependencies working correctly...

    • @DataIndependent
      @DataIndependent  Год назад

      @@tubehelpr same. It was a pain and didn’t record that part for y’all

    • @tubehelpr
      @tubehelpr Год назад +1

      @@DataIndependent Can you share some tips? Python version? I can't seem to get past the import nltk line since it throws an error but I have everything installed in my virtual env 🤔

    • @DataIndependent
      @DataIndependent  Год назад

      @@tubehelpr what’s the error say?

  • @jgill0
    @jgill0 Год назад +3

    Excellent walk through, thanks! If you have any interest in the elasticSearch integration I'd love to see a video on that :)

    • @DataIndependent
      @DataIndependent  Год назад +2

      Nice! Thanks for the comment. I need to explore other document loaders and custom agents before I try out more vectorstores or dbs. I jotted this down on my ideas list.

  • @proudindian3697
    @proudindian3697 Год назад

    Thank you so much..!

  • @danielsommer6655
    @danielsommer6655 Год назад +1

    Super helpful! Thanks so much. It looks like OpenAI made a change and deprecated VectorDBQA and suggested RetrievalQA.

    • @DataIndependent
      @DataIndependent  Год назад +1

      Langchain made that change and yep that is what they recommend now!

    • @danielsommer6655
      @danielsommer6655 Год назад

      @@DataIndependent Any suggestion for how to update the jupyter notebook? Thanks!
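      For reference, a minimal sketch of the replacement call, assuming the docsearch Chroma store built earlier in the notebook (exact import paths vary between LangChain versions):

      from langchain.chains import RetrievalQA
      from langchain.llms import OpenAI

      # RetrievalQA replaces the deprecated VectorDBQA; the vector store is wrapped as a retriever
      qa = RetrievalQA.from_chain_type(
          llm=OpenAI(temperature=0),
          chain_type="stuff",
          retriever=docsearch.as_retriever(),
      )
      print(qa.run("What did McCarthy call his new language?"))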

  • @arigato1901
    @arigato1901 Год назад +21

    Excellent video, thank you so much!
    I have concerns about data privacy tho. It would be great to do the same with another model that could run locally, like alpaca... would it be possible?

  • @nattapongthanngam7216
    @nattapongthanngam7216 3 месяца назад

    Thanks Greg! Great video on using custom files.
    Could you share a video about RAGs? I heard there are many types and I'd love to learn which is best for different tasks.

  • @FRANKWHITE1996
    @FRANKWHITE1996 Год назад

    Subscribed ❤

  • @kuntalpcelebi2251
    @kuntalpcelebi2251 Год назад +1

    I have got a better answer :D "' In 1960, John McCarthy published a remarkable paper in which he showed how, given a handful of simple operators and a notation for functions, you can build a whole programming language. He called this language Lisp, for "List Processing," because one of his key ideas was to use a simple data structure called a list for both code and data.'"

  • @hetthummar9582
    @hetthummar9582 Год назад

    Awesome video!! Really enjoyed it. 😃
    It would be cool if you can make a video on langchain vs gpt-index.

  • @blocksystems202
    @blocksystems202 Год назад +2

    Dude, this is amazing, thanks so much. Can you also do a tutorial building an application that interfaces with this? Say, uploading a doc into an app, etc.

    • @DataIndependent
      @DataIndependent  Год назад +1

      Nice that sounds fun. Check out my video on building apps which may help
      youtu.be/U_eV8wfMkXU
      I believe Streamlit is easy to work with on files.

  • @coachfrank2808
    @coachfrank2808 Год назад +1

    I am highly interested in the implementation of a T5 model and different ways of doing text splitting without losing context. A tour would be greatly appreciated.

    • @DataIndependent
      @DataIndependent  Год назад +1

      Nice! Sounds good and I'll add this to the list.

  • @JasonMelanconEsq
    @JasonMelanconEsq Год назад +4

    I understand how you are using CharacterTextSplitter to chunk the data into 1000 characters. What code could be used to chunk by each page, and also if you wanted to chunk every time a particular word arises, such as "Question:"? I'm asking because I would like to create embedding for each question and answer in a deposition. Many thanks for the great videos! This helps even a non-coder understand the overall process. Excellent job!

    • @DataIndependent
      @DataIndependent  Год назад +10

      Nice! That’s fun. The splitter takes an argument “separator” which you can customize and it’ll split your documents by that thing.
      In your case you can have it look for “question:” and split there.
      It’s not ideal and you may need to do some string clean up but it’ll work
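      Roughly, a minimal sketch, assuming the deposition is already loaded into a documents list; the separator string and chunk size here are just illustrative:

      from langchain.text_splitter import CharacterTextSplitter

      # Split wherever "Question:" appears instead of at a fixed character count
      splitter = CharacterTextSplitter(separator="Question:", chunk_size=1000, chunk_overlap=0)
      qa_chunks = splitter.split_documents(documents)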

    • @esltdo1
      @esltdo1 9 месяцев назад

      @@DataIndependent That's what I do. Sometimes you gotta pull out the ol' regex, especially when chunking code; thankfully GPT means that's no longer an afternoon of regex googling, lol.

  • @caiyu538
    @caiyu538 11 месяцев назад

    great

  • @PabloBossledaLuz
    @PabloBossledaLuz Год назад +1

    Thanks for such a great and informative series! Please keep bringing us more content about Langchain.
    Do you think it's possible to have a chat about the content you input (instead of simply asking a question about it)?

    • @DataIndependent
      @DataIndependent  Год назад +1

      Could you describe more about what you mean?
      To create a chat bot for the data you input?

    • @PabloBossledaLuz
      @PabloBossledaLuz Год назад +1

      @@DataIndependent Yes, like a chatbot. For example, suppose you input a piece of content about your company or even a book, something GPT doesn't know about.
      What I'd like to do is not only ask questions about it but also have GPT ask me questions and assess my answers, in a chat-like way.
      The conversation could be pre-structured, like:
      "Let's have a 10-minute chat about the content above. I'd like you to ask me a question about it, then assess my answer, providing the right answer in case mine is not correct. Right after that, please repeat the process, asking me another question and correcting me, till the end of the 10-minute chat timeframe".

    • @avidrucker
      @avidrucker Год назад

      This sounds amazing. Can ChatGPT already do this?

    • @avidrucker
      @avidrucker Год назад

      *for content/topics it is already familiar with

  • @ivangouvea4195
    @ivangouvea4195 Год назад +2

    Amazing video! Would be great to see a version with an open-source model such as Alpaca/LLaMa. Does anyone know if it is available/possible?

  • @nanti_dulu
    @nanti_dulu Год назад +1

    Hi, thank you for the great content 😆! This is something I can't ask ChatGPT for help with, so it's really helpful!
    By the way, does this code work with a longer document? I used a 150-page PDF and it exceeded the token limit. It worked fine with a shorter PDF. Thank you!

  • @robertovillegas2220
    @robertovillegas2220 20 дней назад

    I have a use case I haven't seen anywhere: I created a private GPT that has documents as context. These documents contain criteria for a specific subject. I give it system instructions so its function is to evaluate a user document, attached as part of the prompt, to see whether it complies with the criteria in the context documents, and to give a detailed response on the result of the evaluation and the justification, referencing the content of the user document and the criteria in the context documents. I want to do that in LangChain, but I don't know how to add a user document as part of the prompt for the RAG. It would be great if you could explain how to approach this implementation. Thank you for the content! Keep up the good work.

  • @caiyu538
    @caiyu538 9 месяцев назад

    Great

  • @rkthebrowneyedboy1
    @rkthebrowneyedboy1 Год назад +1

    Great video, and thanks so much for simplifying the complex parts. Btw, is there a way to create multiple indexes or collections in ChromaDB and use an index to limit the search to a set of documents? I haven't seen where that is definable in your code. Would be great if you could clarify. My best,

    • @DataIndependent
      @DataIndependent  Год назад

      You could create multiple indexes, but I'm a fan of adding metadata to your embeddings that you can filter on. That way you can keep your data tidy.
      If the data is on completely different projects or topics then it may make sense for separate indexes.
      Check out the documentation on how to do this
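      A minimal sketch of the metadata approach, assuming the embeddings object from the video; the field names are made up for illustration:

      from langchain.schema import Document
      from langchain.vectorstores import Chroma

      # Tag each chunk with metadata as you build the store...
      docs = [
          Document(page_content="...", metadata={"project": "alpha", "source": "notes_alpha.txt"}),
          Document(page_content="...", metadata={"project": "beta", "source": "notes_beta.txt"}),
      ]
      docsearch = Chroma.from_documents(docs, embeddings)

      # ...then filter on that metadata at query time instead of keeping separate indexes
      hits = docsearch.similarity_search("what was decided?", k=4, filter={"project": "alpha"})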

  • @user-vc2sc9rq7t
    @user-vc2sc9rq7t Год назад +1

    Thanks for the great tutorial! For multiple documents, can you please advise how I can retrieve the file name that the contextual information was retrieved from?

    • @DataIndependent
      @DataIndependent  Год назад +1

      LangChain has the functionality to give you an answer w/ sources which should help. Check out their documentation.
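      One hedged sketch of that, using return_source_documents on a RetrievalQA chain (the successor to VectorDBQA) together with the docsearch store from the video:

      from langchain.chains import RetrievalQA
      from langchain.llms import OpenAI

      qa = RetrievalQA.from_chain_type(
          llm=OpenAI(temperature=0),
          chain_type="stuff",
          retriever=docsearch.as_retriever(),
          return_source_documents=True,  # also hand back the chunks that were used
      )
      result = qa({"query": "What did the author work on?"})
      print(result["result"])
      for doc in result["source_documents"]:
          print(doc.metadata.get("source"))  # file name recorded by the loader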

  • @yili2419
    @yili2419 Год назад +4

    Bad idea. OpenAI will keep your query and response (which include your personal data) in their data center and use it to train their next model.

    • @CelestialEnlight
      @CelestialEnlight 6 месяцев назад +1

      That is the only reason stopping me from going all in on ChatGPT. Is there an alternative that can help me? I don't want to share all my information with ChatGPT, but at the same time I don't have the money to build my own locally hosted GPT.

  • @3niusYT
    @3niusYT Год назад

    Hi, thank you for this comprehensive playlist as well as your good mood. Do you have an opinion on why one would use LlamaIndex rather than doing this with LangChain directly?

    • @DataIndependent
      @DataIndependent  Год назад

      I don't have an opinion! Both are great tools for their own use cases. I encourage you to experiment with both and try them out

  • @Red-fu3gb
    @Red-fu3gb Год назад +4

    Really helpful for me, thank you! I also heard about GPT Index, but I don't know the difference between LangChain and GPT Index. Is it possible to see more details about the comparison?

    • @DataIndependent
      @DataIndependent  Год назад +5

      Totally - Let's do a video on this. Thanks for the tip

  • @TRSTNGLRD
    @TRSTNGLRD Год назад +2

    This is awesome - how am I able to use this with a Davinci Model for more in-depth responses? Can you do a second video on Fine-Tuning this system to reduce Hallucinations, create more complex responses, and ask it more in-depth questions?

    • @DataIndependent
      @DataIndependent  Год назад +3

      Ya sounds great - What is your use case that you want to run through? It's always better with a real world example

    • @TRSTNGLRD
      @TRSTNGLRD Год назад +1

      @@DataIndependent YouTube transcripts; I have a course that's divided into 12 months, adding up to a total of 118 .txt transcript files. I'd like to be able to create a "tutor", if you will, one where I can ask questions about the contents of the course if something confuses me.
      I've made one that does this, but the absolute main issue I've found is structuring the transcript data...
      The bot cannot interpret raw transcripts all too well, so I realize I may need to reformat them into something like a knowledge graph for each lecture. What would be the best way to structure/format a transcript for this use case? The catch is that minimal data should be lost when reformatting so the bot isn't lacking any information that's already been discussed. This has been my biggest issue.

  • @Harshanalluru_3
    @Harshanalluru_3 Год назад

    Gold

  • @ynboxlive
    @ynboxlive Год назад

    This is great. How do you force it to use a specific Open AI model such as gpt-3.5-turbo?

  • @gregoryosianturi6192
    @gregoryosianturi6192 3 месяца назад

    Hi Greg, thanks for your tutorial video. I am new to AI and I am wondering: what if at first we want to explore a private folder/database consisting of multiple PDFs with the chatbot (as a search engine), and then ask questions about a specific document the chatbot retrieved earlier, so that the bot focuses its search on the selected document rather than the entire folder/database? Is it possible to apply metadata to tag each chunk's source, or maybe use the "agent" framework for this case? Thank you

  • @user-gn4ln5rn2p
    @user-gn4ln5rn2p Год назад +1

    Sir, can we implement the same module in a Node.js framework?
    I tried, but some of the required modules are not available in Node.js.

  • @angelfeliciano8794
    @angelfeliciano8794 Год назад

    Amazing video. Any idea if this method works with information stored in my local MySQL database?

  • @coachfrank2808
    @coachfrank2808 Год назад +1

    Hi! We are building a research tool for German subsidy guidelines, which are very elaborate and hard to understand for the SME target group. Your video is really helpful. We have an Excel sheet with 1,800 state subsidies waiting to be simplified for more than 1.3 million SMEs making up over 60% of Germany's GNP. If you could tap into the processing of Excel sheets and information extraction with source documentation, my team and I would profit immensely. Keep up the step-by-step explanations. You are doing a phenomenal job teaching.

    • @DataIndependent
      @DataIndependent  Год назад +1

      I'll do a tutorial on that. Shouldn't be too difficult.
      At a minimum, if you wanted to shoehorn this video's method, you could create a for loop that runs through your Excel doc, turn each cell you care about into a document, then load them up as documents.
      But my guess is LangChain has support for this already.
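      A rough sketch of that loop, assuming a hypothetical subsidies.xlsx with a guideline_text column; adjust the names to your sheet:

      import pandas as pd
      from langchain.schema import Document
      from langchain.text_splitter import RecursiveCharacterTextSplitter

      # One subsidy per row; keep a pointer back to the source row in the metadata
      df = pd.read_excel("subsidies.xlsx")
      docs = [
          Document(page_content=str(row["guideline_text"]), metadata={"row": i})
          for i, row in df.iterrows()
      ]
      chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)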

    • @coachfrank2808
      @coachfrank2808 Год назад

      @@DataIndependent brilliant! Can‘t wait for the vid. Do you have LinkedIn?

    • @DataIndependent
      @DataIndependent  Год назад +1

      @@coachfrank2808 Sure do, but I communicate more on Twitter
      twitter.com/GregKamradt
      twitter.com/dataindependent
      www.linkedin.com/in/gregkamradt/

    • @coachfrank2808
      @coachfrank2808 Год назад

      @@DataIndependent Thank you for sharing! Using OpenAI involves a lot of cost. Recent developments show that autoregressive LLMs (PaLM 540B) provide promising accuracy. Isn't this the perfect case for LangChain applications? (youtu.be/XV1RXLPIVlw)

    • @DataIndependent
      @DataIndependent  Год назад +1

      @@coachfrank2808 totally! They make it easy to swap out your models with whatever you want.
      Agreed that OpenAI is expensive. I’m glad there will be multiple options to drive prices down.

  • @thefamousdjx
    @thefamousdjx Год назад

    Good stuff. Does this mean that if I implemented this at my company, OpenAI would basically have access to all our data? Like, are all those text files you loaded now sitting somewhere on an OpenAI server?

    • @DataIndependent
      @DataIndependent  Год назад +2

      Good question, and I'm thinking I should make a no-code overview of this with a diagram.
      Short answer is: sort of?
      In this example we loaded 10 essays with LangChain. However, 10 essays' worth of tokens (basically words) is too much to pass to OpenAI in a single API call. They have a prompt limit of roughly 4K tokens. Side note: this will increase, but there will likely always be limits.
      This is where the chunking comes into play. LangChain chunked our text, then only sent the most relevant chunks to OpenAI in the API call.
      No need to send information about start-ups if you're asking about McCarthy and Lisp.
      So some of your information would have been sent to OpenAI.
      This is a great question and I haven't seen their data and privacy policies. However, I would assume zero trust: anything you send to them might get used in ways you don't want.
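      A minimal sketch of that flow, assuming the essays are already loaded into documents and an OpenAI API key is configured:

      from langchain.text_splitter import CharacterTextSplitter
      from langchain.embeddings import OpenAIEmbeddings
      from langchain.vectorstores import Chroma

      # Chunk the essays so no single piece comes close to the model's prompt limit
      chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)

      docsearch = Chroma.from_documents(chunks, OpenAIEmbeddings())

      # Only the few chunks most similar to the question are ever sent to OpenAI
      relevant = docsearch.similarity_search("What did McCarthy call his language?", k=4)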

  • @ronakshah9561
    @ronakshah9561 8 месяцев назад

    What a great video! Can this be done on web URLs? I have a use case wherein we have several internal Confluence pages and want to ask questions about them. Can they somehow be loaded into a vector DB? Any guidance highly appreciated.

  • @ujjwalgupta1318
    @ujjwalgupta1318 Год назад

    Thanks a lot. What exactly is the difference between VectorDBQA and a retriever, though? They seem to be doing the same thing.

  • @shuntera
    @shuntera Год назад +1

    In this example, Chroma always generates an in-memory DuckDB database; each time you open and run this notebook to query your text documents it has to build the database from scratch. Is there any way to save it to disk and use that in subsequent queries? And if it could be loaded into memory each subsequent time, that would be great.

    • @DataIndependent
      @DataIndependent  Год назад

      Check out the documentation here about persistence
      python.langchain.com/en/latest/modules/indexes/vectorstore_examples/chroma.html
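      The gist of persistence from that page, as a minimal sketch (the persist_directory path is arbitrary, and the API has shifted across Chroma/LangChain versions):

      from langchain.vectorstores import Chroma

      # First run: build the index and write it to disk
      docsearch = Chroma.from_documents(texts, embeddings, persist_directory="./chroma_db")
      docsearch.persist()

      # Later runs: reload the saved index instead of re-embedding everything
      docsearch = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)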

  • @drewwellington2496
    @drewwellington2496 Год назад +3

    A very useful bit of information - and I'm not sure if this is possible with LangChain - would be to display how many tokens each request is using etc. This video is awesome but, behind the scenes, we have no idea how many tokens/embeddings/queries are being performed so I can't see any way to keep track of the cost involved in doing this over and over

    • @DataIndependent
      @DataIndependent  Год назад +2

      Here you go! langchain.readthedocs.io/en/latest/modules/llms/examples/token_usage_tracking.html?highlight=token
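      A minimal sketch of that callback, assuming the qa chain from earlier in the thread:

      from langchain.callbacks import get_openai_callback

      # Wrap any calls you want metered; the callback tallies tokens and estimated cost
      with get_openai_callback() as cb:
          qa.run("What did McCarthy discover?")

      print(cb.prompt_tokens, cb.completion_tokens, cb.total_tokens, cb.total_cost)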

    • @fullcrum2089
      @fullcrum2089 Год назад

      This is why I want to make my own module in JS.

    • @fullcrum2089
      @fullcrum2089 Год назад

      @@DataIndependent Thanks, this was helpful

  • @ranu9376
    @ranu9376 Год назад +2

    Cool! Is there any chance we could use other LLMs instead of OpenAI, or was this designed specifically around OpenAI models?

    • @DataIndependent
      @DataIndependent  Год назад +2

      You could use other LLMs no problem. That's one of the cool parts about LangChain, swapping LLMs is easy

  • @aischool0912
    @aischool0912 Год назад

    @Greg Kamradt can we pass prompts to Chatmodel while using conversational retrieval chain?

  • @vidhandhagai672
    @vidhandhagai672 11 месяцев назад

    Great video! Is there a way to create a chatbot that smartly uses our data + gpt-3.5 data and gives us a COMBINED answer from both data sets instead of just our data set or the GPT data set?
    So let's say your document had details about 'what did McCarthy discover?' but there's no information for 'when did McCarthy discover the language Lisp?'. In that case, it should still be able to answer by looking up our data set for details related to McCarthy and Lisp... and then look up the gpt-3.5 data for details related to when it was discovered, as that's not in our data set.

  • @user-wt7tj6ok7p
    @user-wt7tj6ok7p Год назад

    Very helpful for me, I could follow it with no problem. Thanks!!! I am wondering if there are document loaders available for Excel files?

    • @DataIndependent
      @DataIndependent  Год назад

      Check out the langchain documentation for their currently supported loaders!

  • @hayekianman
    @hayekianman Год назад +1

    Great video. Is there any need to set temperature to anything other than zero for such search-like applications? I can see enterprise search being a simple use case, but people would want one authoritative answer or a ranking. So in such a case, can it, for example, fall back to Amazon Kendra, which ranks the results instead?

    • @DataIndependent
      @DataIndependent  Год назад +1

      I don't fully understand the question but I would experiment with other temperature values to see what works best for you.

  • @youwang9156
    @youwang9156 Год назад +1

    Really appreciate your work. Just one question about chunking: how can I split the text into chunks by sentence, comma, or space instead of chunk size?

    • @DataIndependent
      @DataIndependent  Год назад +2

      Check out this page
      langchain.readthedocs.io/en/latest/_modules/langchain/text_splitter.html#RecursiveCharacterTextSplitter
      Where it says 'self._separators = separators or ["\n\n", "\n", " ", ""]', those are the default separators, but you can specify your own. Pass separators=[","] to split on a comma, for example.

    • @youwang9156
      @youwang9156 Год назад

      @@DataIndependent thank you so much ! have a good day

  • @cjp3288
    @cjp3288 Год назад

    hey, great video! i would love you to do a demo of this using open source models, maybe via huggingface transformers?

  • @kennethleung4487
    @kennethleung4487 Год назад +2

    Great video! Any concerns about privacy here? You mentioned using local files, but it seems like there is a chance for OpenAI to have access to the doc text?

    • @DataIndependent
      @DataIndependent  Год назад +3

      OpenAI (or whichever LLM you use) will only have access to the pieces you send over to them. In this example we load up 5-10 essays, but to answer a question we only send 4 smaller chunks over to OpenAI.
      So if you're worried about OpenAI seeing *any* of your data, then yes, there are privacy concerns. They are no doubt using your data to train more models; beyond that I'm not sure what they are doing with it.

    • @trunghieumai3895
      @trunghieumai3895 Год назад +2

      @@DataIndependent Thanks for the answer. I also have the same concern :) I am curious to know whether it is possible to run a local GPT model to perform the same task using LangChain. It would be great if you could share some of your thoughts :) Thank you very much!

  • @Murcie4S
    @Murcie4S Год назад +1

    Thank you for implementing the feature with the text file. While using the line of code 'qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=docsearch)', I received feedback indicating that the VectorDBQA module is deprecated. The deprecation warning suggests importing 'from langchain.chains import RetrievalQA', which includes a retriever parameter. However, I encountered an issue when attempting to replace 'docsearch' with this parameter. Can you please advise me on how to properly use the 'RetrievalQA' module?

    • @DataIndependent
      @DataIndependent  Год назад

      Check out my latest video on the 7 core concepts of Langchain and specifically the indexes section. I have an example about retriever in there.
      If that doesn't help then I recommend looking at the documentation

    • @verdurakh
      @verdurakh Год назад

      I had the same error. I swapped the VectorDBQA code for the following:
      RetrievalQA.from_chain_type(llm=OpenAI(temperature=0.5), chain_type="stuff", retriever=docsearch.as_retriever())
      The thing that took me time to find was the docsearch.as_retriever function.

  • @sun19851
    @sun19851 Год назад +1

    Great video Greg, thank you!! The VectorDBQA method is already deprecated; use the below instead:
    chat = RetrievalQA.from_llm(llm=OpenAI(), retriever=docs.as_retriever())

    • @fernandofrias8322
      @fernandofrias8322 11 месяцев назад

      I asked this to Langchain helper chat AI, this is the answer: RetrievalQA.from_llm and VectorDBQA are both question answering models in the Langchain library.
      RetrievalQA.from_llm is a chain that uses a large language model (LLM) as the question answering component. It retrieves relevant documents from a vector database (VectorDB) based on the query and then uses the LLM to generate an answer.
      VectorDBQA, on the other hand, directly retrieves documents from a vector database and uses a default question answering prompt to generate an answer.
      Both models can be used to answer questions based on retrieved documents, but RetrievalQA.from_llm provides more flexibility in terms of customizing the prompt and using the LLM for generating answers.

  • @maof77
    @maof77 Год назад +1

    Great feedback! How do you add extra documents to the Chroma store when using persist_directory? Neither 'add_documents()' nor 'from_documents()' seems to work for me :-(

    • @DataIndependent
      @DataIndependent  Год назад

      What's the error you're getting?
      Here is the documentation (at the bottom) that could help: langchain.readthedocs.io/en/latest/modules/indexes/vectorstore_examples/chroma.html

  • @elahehsalehirizi618
    @elahehsalehirizi618 Год назад +1

    Hi, I have a question. I am using GPT-3 for few-shot text classification. I have tried GPT-3 directly and also via LangChain (using the same few-shot examples for both, the same prompt format (the one from the LangChain formatter), and the same GPT-3 settings). I got better results when I used LangChain. Is there any explanation for why LangChain performed better than using GPT-3 directly? Thanks

  • @wilfredomartel7781
    @wilfredomartel7781 Год назад +1

    What about using semantic search to retrieve relevant docs and Flan-T5 to reason over them?

  • @Kalease54
    @Kalease54 Год назад +1

    I just binged all of your videos on LangChain; this is exactly the library I was looking for.
    One question I have: if you need to utilize an OpenAI embeddings model for vector search of custom data, how would you also utilize a model like, say, Davinci if the solution also calls for providing results beyond the vectorized content? For instance, if the solution calls for having knowledge of personal data but also needs to utilize LangChain search tools for question-answer search?
    I don't believe the OpenAI embeddings model can also do what you presented in your previous videos, but I could be wrong. Any help would be greatly appreciated.
    Please keep up the videos!!

    • @DataIndependent
      @DataIndependent  Год назад +2

      Nice! Thank you very much.
      For your question
      * Quick clarification: you don't *need* to use OpenAI for embeddings; lots of models can give you these.
      * The embeddings are just a way to get relevant documents. Once you've got those docs you can run all sorts of chains over them (like the question-answer search).
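      A small sketch of that split, assuming the docsearch store from the video; the query and model name are just placeholders:

      from langchain.llms import OpenAI
      from langchain.chains.question_answering import load_qa_chain

      query = "What does the file say about the customer's account history?"

      # Step 1: embeddings are only used here, to pull back the relevant chunks
      docs = docsearch.similarity_search(query, k=4)

      # Step 2: any completion model (Davinci, gpt-3.5-turbo, ...) answers over those chunks
      chain = load_qa_chain(OpenAI(model_name="text-davinci-003"), chain_type="stuff")
      print(chain.run(input_documents=docs, question=query))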

    • @Kalease54
      @Kalease54 Год назад

      @@DataIndependent Thank you for the info. Do you have anything on your list for querying a SQL db for answers?

    • @DataIndependent
      @DataIndependent  Год назад +2

      @@Kalease54 I haven't done that yet but good idea. I'll add that to the list.

    • @cgtinc4868
      @cgtinc4868 Год назад +1

      @@DataIndependent I have one more question further to Kalease's (which is a great Q btw). After the documents are vectorized and uploaded to, say, Pinecone, and the place where the original text (PDF, Word, text, etc.) resides is then disconnected, will the LLM still be able to retrieve the information? (Sorry if people have already asked this.)

  • @kennethleung4487
    @kennethleung4487 Год назад

    Thanks! Is there a way to tell ChatGPT to look at one folder first, before asking further questions? Since sometimes the question may only be specific to some files, and we don't want to generalize to all files

    • @DataIndependent
      @DataIndependent  Год назад +5

      There isn't a way to do that with ChatGPT on its own. With LangChain you can use an agent which decides whether the answer is complete or not: an "if not, then go to the next folder" type of action.

  • @alx4571
    @alx4571 Год назад +2

    Is it possible to use GPT-4 with private source data? Something proprietary, for example, that won't get shared out to the model?

    • @DataIndependent
      @DataIndependent  Год назад

      For that you'll want to use a private local model on your own computer. It would work there.

  • @chronicfantastic
    @chronicfantastic Год назад

    I wish there was a better way to visualise the [source_documents] results - if you ask it an unrelated question it gives the right answer but hallucinates the reference points. Still a bit unsure what's going on. Thanks so much for these videos!

    • @DataIndependent
      @DataIndependent  Год назад +1

      Check out langchain documentation, they have QA with sources and that should help

  • @DanielGomez-zk2st
    @DanielGomez-zk2st Год назад +1

    Hi, great video! I'm wondering if anybody else got an error when creating the docsearch: "IndexError: list index out of range". Can't seem to find why this happens since I followed step by step. Any help is greatly appreciated. Thanks!

  • @mysticartsmelodies
    @mysticartsmelodies Год назад

    That's awesome. Is there a way for us to combine the vectorized/custom data we have with GPT's own base training, so we can get an almost double-referenced output? Could we state in our prompt to have GPT add its own understanding of the topic onto the pulled vector DB data, or would it do that already? Cheers!

    • @DataIndependent
      @DataIndependent  Год назад

      You could alter your prompt to have it add any extra details that aren't in the context.
      People often need to say, "only use what's in the context, don't make anything up"
      So I bet the reverse would be true too
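      A hedged sketch of nudging the prompt that way, assuming the docsearch store from the video; the template wording is only an example:

      from langchain.prompts import PromptTemplate
      from langchain.chains import RetrievalQA
      from langchain.llms import OpenAI

      template = """Use the context below to answer the question, and feel free to add
      relevant background knowledge of your own where the context is incomplete.

      Context: {context}
      Question: {question}
      Answer:"""

      prompt = PromptTemplate(template=template, input_variables=["context", "question"])

      qa = RetrievalQA.from_chain_type(
          llm=OpenAI(temperature=0),
          chain_type="stuff",
          retriever=docsearch.as_retriever(),
          chain_type_kwargs={"prompt": prompt},  # override the default "stuff" chain prompt
      )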

    • @mysticartsmelodies
      @mysticartsmelodies Год назад

      @@DataIndependent Yeah, makes sense. I'm fine-tuning a model with highly niched data for specific software use cases. Base GPT has good details but not enough; however, I don't want to eliminate the existing GPT knowledge with my custom data, so yeah, I'm gonna play around with the prompting and ask it to use its own core knowledge along with the Pinecone vector DB.

  • @user-ru1qz1bo2q
    @user-ru1qz1bo2q Год назад

    Thanks a ton. I'd love to see a video on using metadata filters in Chroma. Also, VectorDBQA seems to be deprecated now - any chance for an update on its replacement?

    • @DataIndependent
      @DataIndependent  Год назад +4

      You're right, VectorDBQA isn't recommended anymore. It's RetrievalQA now.
      I'm debating putting together a video on how to create a chat bot in python, but not just surface level, one with all the features people have been asking for.
      * How to upload multiple documents
      * How to persist the data and come back to it
      * How to persist the user history so they have their chat when they come back
      * How to upload different types of documents
      * How to do filters, etc.

  • @KiritiSai93
    @KiritiSai93 Год назад +1

    Amazing video! Thank you so much for putting this out.
    How does the text splitting affect the accuracy of returned results? I have a collection of questions and answers for an educational course. I want to customize the prompt given to ChatGPT to say that these are questions and answers and to find the correct answer. Is this something that can be done with LangChain?

    • @DataIndependent
      @DataIndependent  Год назад

      Yep, big time. If you have a small set of questions/answers then you can do the method in this video. If you have a ton of questions/answers, check out my video on asking a book a question.

    • @KiritiSai93
      @KiritiSai93 Год назад

      @@DataIndependent Thanks for the reply. Agree that the recursive splitter is more useful for a single large document. My question was more like: is it possible to tell ChatGPT via a prompt that it is looking at a question-and-answer document instead of it assuming they are just pieces of text?

  • @hdgdz1259
    @hdgdz1259 Год назад

    Thank you for the video. Would it be possible to save the Chroma output and load it later, so that we do not need to rerun the analysis each time we want to use the code?

    • @DataIndependent
      @DataIndependent  Год назад

      Yep - check out how to 'persist' data in the documentation

  • @ramishashahid8853
    @ramishashahid8853 8 месяцев назад

    It's a really great video. I want to ask how to increase speed, because it takes a lot of time to reply to a user query.

  • @megajagatube
    @megajagatube Год назад +1

    Great video! What if the data in these files cannot leave the premises? Does calling the embedding take the data off premise?

    • @DataIndependent
      @DataIndependent  Год назад

      Yes, because you need to send your raw text to OpenAI to get the embeddings back.
      If you wanted, you could use a locally hosted embedding engine.
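      A minimal sketch of a locally hosted option, assuming the sentence-transformers package is installed and texts is the chunked document list; the model name is just one common choice:

      from langchain.embeddings import HuggingFaceEmbeddings
      from langchain.vectorstores import Chroma

      # Embeddings are computed on your own machine, so the raw text never
      # leaves the premises at this step (the LLM call is a separate question)
      embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
      docsearch = Chroma.from_documents(texts, embeddings)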

    • @megajagatube
      @megajagatube Год назад

      @@DataIndependent thanks! Can you point me to some resources on locally hosted embedding engine?

  • @domyu6418
    @domyu6418 Год назад +1

    Great video, thank you! After implementing your steps, I always get the error that I reached the token limit of the model. I have to use 300+ .txt files. Does changing the chain type help here? I watched your video on chain types as well :)

    • @angelferhati
      @angelferhati 11 месяцев назад

      Bro, I get the same error and I can't find a way to fix it. Have you solved it?

  • @HantuMedias
    @HantuMedias Год назад +2

    I got an IndexError: list index out of range when executing this line: docsearch = Chroma.from_documents(texts, embeddings). I tried loading a large PDF file, which could be the culprit.
    Can you suggest a workaround?

    • @DataIndependent
      @DataIndependent  Год назад

      Hm, I haven't run into that one yet. Have you done it on a small pdf?
      Is your texts a list or a single text file? I believe it should be a list

  • @ducrider75
    @ducrider75 Год назад

    Hi Greg: Amazing series. I'm learning a ton. In this video I'm stuck with dependencies. A LangChain component, argilla, wants older versions of numpy and pandas. This results in a situation where numpy, pandas, nltk, and argilla just won't play well together. Any pointers on how you solved this?

    • @DataIndependent
      @DataIndependent  Год назад

      hmm I didn't run into that, but I've heard a bunch of people have a hard time with the unstructured library.
      Have you tried using a different loader? Either the from file or offline one? You can do this same process with another data source as well.
      A bunch listed under the integrations here
      python.langchain.com/docs/modules/data_connection/document_loaders/

  • @andrelvcoelho
    @andrelvcoelho 10 месяцев назад

    Hi there. I haven't checked whether someone else asked this before, but what are the reasons for replacing Chroma with FAISS in the notebook? Is the latter a better option as a vector store? Thanks

    • @DataIndependent
      @DataIndependent  10 месяцев назад

      Honestly - I was having errors with both Chroma and FAISS for a while so I had to switch between them. Those errors seem to be fixed so either is fine imo

  • @ghalwash
    @ghalwash Год назад

    Amazing tutoring 😊. If I may ask, how can I train on my e-commerce data and get a response in the form of a list of product IDs?

    • @DataIndependent
      @DataIndependent  Год назад +1

      Check out my video on getting structured data back from your LLM, that may help.

  • @pshah222
    @pshah222 10 месяцев назад

    I was able to edit the code with the directory loader to read and query most types of documents in a folder. Is there a way I can integrate Google Search within the same code and question chain, so that the model's logic is to search the local DB first and, if no answer is found, also look on the internet?

  • @anintrovertabroad2065
    @anintrovertabroad2065 Год назад +1

    There seems to be some sort of limit on the length, somewhere around 8 to 10k words. Some of my documents will work at that length but some of them don't; it says the token limit has been reached on some of them. Did I do something wrong or is that expected?

    • @DataIndependent
      @DataIndependent  Год назад +1

      That's totally correct. There is a token limit that you'll hit.
      Check out my video on "workarounds to openai token limit" for information on what to do about it.

  • @captainjackrana
    @captainjackrana Год назад

    This still seems to be uploading "private" data over to openAI during the embedding creation phase. Is there a way to create the embeddings without having to pass the document data to their APIs?

  • @henryl7421
    @henryl7421 Год назад

    When you load the embedding using openAI's API key via langchain, what models are you using by default? text-embedding-ada-002?

  • @crazymohan2008
    @crazymohan2008 Год назад

    Great tutorial, very informative. If my data source is a databricks delta table, how do I do this? What if my delta table is updated on a daily basis with new data. Thanks in advance.

    • @DataIndependent
      @DataIndependent  Год назад

      The easy options will be listed among the LangChain document loaders, or else something on LlamaHub might work for you: llamahub.ai
      If not, then getting an export of that table to a spot you can work with will do the job.

  • @kirtg1
    @kirtg1 Год назад +1

    Thanks for the video. Can this be done for a 3000 text document?

  • @henkhbit5748
    @henkhbit5748 Год назад

    Great video. So the document embeddings are stored in a LangChain vector database; is it persistent? I mean, when I have created a chatbot which is specialised in some subject (and was fed with n documents), multiple users can then ask questions. Will each user instantiate the whole process from the beginning?
    Also, getting info from a database instead of a document, Postgres as an example, would be great... Thanks so far, very informative. 👍

    • @DataIndependent
      @DataIndependent  Год назад +1

      You can use a vector database to persist the data. There are a bunch to choose from:
      langchain.com/integrations.html

    • @henkhbit5748
      @henkhbit5748 Год назад

      @@DataIndependent Thanks for the list

  • @nsitkarana
    @nsitkarana Год назад

    For me, I had to go with RetrievalQA instead of VectorDBQA (it was marked as deprecated) and accordingly changed the query to 'qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=db.as_retriever(search_kwargs={"k": 1}))'. Other than this, it worked well!

  • @Piroco11
    @Piroco11 Год назад

    Great video, thanks for all the insights. I had a question:
    Does every question you ask the QA chain send the entire set of documents in the prompt to ChatGPT? I.e., does each question cost as many tokens as the entire set of documents plus the question?