ChatGPT For Your DATA | Chat with Multiple Documents Using LangChain

  • Published: 4 Oct 2024
    In this video, I will show you how you can chat with any document. Let's say you have a folder containing different file formats: a PDF file, a text file, a README file, and others. I will show you how to take all of your data, split it into chunks, create embeddings using OpenAI embeddings, and store them in the Pinecone vector store. Finally, you can chat with your own documents and get insights out of them, similar to ChatGPT but with your own data. Happy Learning.
    👉🏼 Links:
    GitHub code: github.com/sud...
    ☕ Buy me a Coffee: ko-fi.com/data...
    💬 Build your custom chatbot: www.chatbase.c...
    🔗 Other videos you might find helpful:
    LangChain: • What is LangChain ? | ...
    Open Assistant: • Open Assistant 😱 | Ope...
    Chat with any pdf: • ChatPDF | MindBlowing ...
    Sketch with pandas: • AI Code-Writing Assist...
    Analyzing data with ChatGPT: • Analyzing data with CH...
    🔴 RUclips: www.youtube.co...
    👔 LinkedIn: / sudarshan-koirala
    🐦 Twitter: / mesudarshan
    💰🔗 Some links are affiliate links, meaning, when you use those, I might get some benefit.
    #openai #llm #datasciencebasics #chatwithdata #documents #chatgpt #nlp
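
The splitting step described above can be sketched in plain Python. This is a simplified stand-in for LangChain's text splitter, not the library's actual implementation; the chunk size and overlap values are illustrative:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap, so that a sentence
    cut at a chunk boundary still appears whole in the next chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        # step forward by less than the chunk size to create the overlap
        start += chunk_size - overlap
    return chunks
```

Each chunk is then embedded separately, which is why the overlap matters: without it, relevant context straddling a boundary could be lost from both neighbouring chunks.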

Comments • 94

  • @gilbertomendes165 · 1 year ago · +4

    Many thanks for your great work!
    It's very well explained and applies to real uses of AI.

  • @peralser · 1 year ago · +2

    Great video. Thanks for your time and explanation.

  • @fabsync · 9 months ago

    As always, great tutorials! I would love to see this same topic but without using OpenAI.

    • @datasciencebasics · 9 months ago

      You can try this one:
      How TO SetUp and USE PrivateGPT | FULLY LOCAL | 100% PRIVATE
      ruclips.net/video/VEQ8mxv2MHY/видео.html

  • @IamalwaysOK · 1 year ago · +4

    I've identified the problem in your code. The issue lies in the creation of chat history. Your code expects a list of tuples, but in your Gradio app, you're creating a list of lists (nested lists), which is causing the code to malfunction.
    Please try using the following code instead and replace it in your Gradio block. This updated code should resolve the issue and make it work correctly.
    import gradio as gr

    with gr.Blocks() as demo:
        chatbot = gr.Chatbot()
        msg = gr.Textbox()
        clear = gr.Button("Clear")

        def respond(user_message, chat_history):
            print(user_message)
            print(chat_history)
            # The QA chain expects a list of tuples; Gradio passes nested lists
            if chat_history:
                chat_history = [tuple(sublist) for sublist in chat_history]
                print(chat_history)
            # Get response from QA chain
            response = qa({"question": user_message, "chat_history": chat_history})
            # Append user message and response to chat history
            chat_history.append((user_message, response["answer"]))
            print(chat_history)
            return "", chat_history

        msg.submit(respond, [msg, chatbot], [msg, chatbot], queue=False)
        clear.click(lambda: None, None, chatbot, queue=False)

    demo.launch(debug=True, share=True)

  • @motopaediatheview9284 · 1 year ago · +2

    Thank you. How could I print which document (title) a response came from and which page(s)? That is useful when there are multiple files with multiple pages in the source directory. Thank you for your time.

    • @jannik3475 · 1 year ago

      Yes, that would be nice. Any tips?

  • @chineduezeofor2481 · 1 year ago

    Awesome tutorial.
    Thank you for sharing!

  • @HeroReact · 11 months ago · +1

    Please redo this video with Streamlit.

  • @Alimenteocerebro · 1 year ago

    So every time I need to chat with my own data, I will have to embed the query? Doesn't that make it much more expensive?

    • @datasciencebasics · 1 year ago

      The query/question is in natural language and needs to be converted to numbers/vectors, so yes, each query needs to be embedded. Embedding is not that expensive. It is how it works :)
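
The reply above is the key point: similarity search compares vectors, not text, so the question itself must be embedded before the store can be queried. A toy illustration with made-up 3-dimensional vectors (real OpenAI embeddings have 1536 dimensions, and the document names here are invented):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" standing in for stored document chunks.
doc_vectors = {
    "doc_about_cats": [0.9, 0.1, 0.0],
    "doc_about_finance": [0.0, 0.2, 0.9],
}

# In practice this vector comes from embedding the user's question,
# e.g. OpenAIEmbeddings().embed_query(question).
query_vector = [0.8, 0.2, 0.1]

best = max(doc_vectors, key=lambda d: cosine(doc_vectors[d], query_vector))
print(best)  # doc_about_cats
```

The per-query embedding call is tiny compared with the one-off cost of embedding all document chunks.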

  • @ruidinis75 · 25 days ago

    Can we do it with Groq? How would you do the embeddings?

  • @muratalarcin8515 · 1 year ago · +1

    How can I utilize this chatbot for my SQL documents?

  • @tusharbhatnagar8143 · 1 year ago · +1

    My question would be: how would you accommodate new random data that has to be introduced to this? Would we do the vectorization process all over again, or is there a better way to handle it, even for one document?

    • @AlgorithmicEchoes · 1 year ago

      You can 'upsert' a document over existing ones. No need to vectorize again.
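
The "upsert" mentioned above means insert-or-update keyed by ID: writing a vector under an existing ID replaces it, and a new ID adds one, so only new or changed documents need embedding. A toy dictionary-based illustration of the semantics (not the Pinecone API itself):

```python
# A plain dict stands in for a vector index keyed by document ID.
index = {}

def upsert(doc_id, vector):
    """Insert the vector if the ID is new, otherwise overwrite it."""
    index[doc_id] = vector

upsert("doc1", [0.1, 0.2])
upsert("doc2", [0.3, 0.4])
upsert("doc1", [0.9, 0.9])  # re-embedding doc1 only touches doc1

print(len(index))  # 2
```

Vector stores such as Pinecone expose the same behaviour through their own upsert call, which is why adding one document does not force re-vectorizing the rest.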

    • @tusharbhatnagar8143 · 1 year ago · +1

      @@AlgorithmicEchoes So when is vectorization needed and when not? How does that work? In the case of Pinecone, fine, we can simply push the docs and the rest is taken care of, right? Any idea about other cases?
      Also, could you share any demo code for this, or link to it here, so that others can benefit from it too? That would be great.

  • @ramp2011 · 1 year ago · +1

    Thank you for the good video. I am curious why you stored the vectors in Chroma first and then again in Pinecone? Thank you

    • @datasciencebasics · 1 year ago · +1

      It was just for demonstration purposes. You can choose either Chroma or Pinecone.

  • @imranmunshi5894 · 6 months ago

    Hello sir,
    what evaluation metrics should we use for our use case? Kindly let me know.

  • @tattooGuri · 1 year ago

    It's "chunks", not "choonks".
    Just for fun, don't take it seriously... The video is informative and perfect.

  • @arielbonfil4079 · 1 year ago

    Thanks, very good content!

  • @francoist1672 · 7 months ago

    I tried your tutorial but got stuck on the Pinecone steps with this error: AttributeError: init is no longer a top-level attribute of the pinecone package. Do you have an updated notebook?
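
That error comes from newer versions of the pinecone client (v3+), which removed the top-level `pinecone.init(...)` call used in older notebooks. A sketch of the newer client-object style; the API key and index name below are placeholders:

```python
# pinecone client >= 3 replaced pinecone.init(...) with a client object.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder key
index = pc.Index("langchain-demo")     # placeholder index name
```

Notebooks written against the old API need this call-site change (and usually a matching update to the LangChain Pinecone integration) before the rest of the code runs.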

  • @mayank1334 · 1 year ago · +2

    Amazing tutorial! Is there a way to add in the sources as well with the responses?

    • @datasciencebasics · 1 year ago

      In many cases, yes. Refer to the documentation; you might find the answer.

    • @tusharkhatri5795 · 1 year ago

      @@datasciencebasics Hi, can you please tell me which model you used for embedding? The same GPT-3.5 Turbo?

  • @903siddhu · 1 year ago

    May I know which website you are using to execute this step by step? I learnt a lot from this tutorial.

    • @datasciencebasics · 1 year ago

      It's Google Colab. Refer to this video for more info -> ruclips.net/video/Xi9-W26cDBs/видео.html

  • @nitroeh · 1 year ago

    Were you able to figure out the error when entering the second query? I’m running into the same issue.

  • @hoduchoa1727 · 10 months ago

    Many thanks for the great tutorial, but it seems slow. Is there any way to make it run faster? Thanks in advance.

    • @datasciencebasics · 10 months ago

      You are welcome. Some options might be trying different models; you could also use a cache and see how it performs.

  • @DrakeAI-J · 1 year ago · +1

    Can it be used with code? For example, a .NET project with multiple classes.

    • @datasciencebasics · 1 year ago · +1

      I am not sure if it's possible with .NET. Right now, LangChain has Python and JavaScript documentation.

  • @sreekumargurunathan4805 · 1 year ago

    Great Video 💯💥. So, can we add multiple CSV files instead of multiple types of files?

  • @pranavgupta9015 · 1 year ago

    Very informative video, Sudarshan. On the Pinecone method: the free version does not support a 1536 embedding size. Suggestions?

    • @datasciencebasics · 1 year ago

      Thank you, Pranav. I am able to create embeddings with 1536 dimensions. There must be something wrong somewhere, or Pinecone must have disabled it for newer users.

  • @mrmortezajafari · 11 months ago

    Why do we split data into chunks of 1000 or 1500 characters and then get the 4 most relevant chunks? Why not more than 1000 or 1500 characters per chunk, or more than 4 relevant chunks? Is there a limit on how many characters we can feed ChatGPT? How large is that limit? After using the code, I checked my API usage in OpenAI and saw that I had used InstructGPT. What is InstructGPT?

    • @datasciencebasics · 11 months ago · +1

      Please refer to these materials. Also, in the future, you can do a simple Google search to find your answers 😄
      www.pinecone.io/learn/chunking-strategies/
      Perplexity AI: what is instructGPT www.perplexity.ai/search/what-is-instructGPT-bbhHw0xHRLudFO.l..LIpQ?s=mn

    • @mrmortezajafari · 11 months ago · +1

      @@datasciencebasics You are right

  • @mrmortezajafari · 11 months ago

    Hi, thank you for your contribution. How much data can I use? I mean, can a lot of documents be stored in the vector store?

    • @datasciencebasics · 11 months ago · +1

      As far as I know, you can store as much as you want. Give it a try, as I haven't tried storing that much data myself.

  • @amartya4008 · 11 months ago

    Nice video. Can we use open-source models for the same?

    • @datasciencebasics · 11 months ago

      Here is one with Llama 2: ruclips.net/video/VPk-at5oqAY/видео.htmlsi=gkfVmnF0xP7pgJ8C

  • @malleswararaomaguluri6344 · 1 year ago

    Is it possible to retrieve the embedding part from Chroma like Pinecone? My second doubt: first I embedded 2 files, and now I want to add 2 more files. Do all 4 files need to be embedded, or can the embeddings of the latest 2 files be combined with the embedded data of the first 2? If it is possible, how do I do it?

    • @datasciencebasics · 1 year ago

      Yes, it is possible. You can just save the embeddings somewhere and then dump the new ones in the same location. This way you don't need to embed everything again. Give it a try yourself and see how it works :)
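
The "persist, then add more later" pattern in the reply above can be illustrated with a toy store on disk. Real vector stores (Chroma with a persist directory, or a hosted index like Pinecone) handle this for you; the JSON file and helper names here are purely illustrative:

```python
import json
import os
import tempfile

# A JSON file stands in for a persisted vector store.
path = os.path.join(tempfile.mkdtemp(), "store.json")

def save(store):
    with open(path, "w") as f:
        json.dump(store, f)

def load():
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

save({"fileA": [0.1], "fileB": [0.2]})          # first 2 files embedded once
store = load()                                   # reopen the existing store
store.update({"fileC": [0.3], "fileD": [0.4]})   # embed only the new files
save(store)

print(len(load()))  # 4
```

The point is that the old embeddings are loaded, not recomputed; only the two new files pay an embedding cost.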

  • @benitoloaiza566 · 1 year ago

    Hey, I ran it and it worked great, but why are the responses so cold? Is it the temperature?

    • @datasciencebasics · 1 year ago

      Hey, it might be. You can try playing around with the temperature.

  • @snehitvaddi · 6 months ago

    Getting a numpy error: "AttributeError: module 'numpy.linalg._umath_linalg' has no attribute '_ilp64'" in all your LangChain-related Colab notebooks.

    • @datasciencebasics · 6 months ago

      By just seeing this, I have no clue how to help you. Try using newer versions of the packages, or update the code, as many libraries change over time.

    • @snehitvaddi · 6 months ago

      @@datasciencebasics Could you please try rerunning your notebook up to the DirectoryLoader step and check? That seems to be where the issue originates.

  • @Alkotas25 · 1 year ago · +1

    Hi, thanks for the video! Is it possible to chunk long HTML code to chat with it and have GPT help modify the long code?

    • @datasciencebasics · 1 year ago · +1

      It is possible to load a GitHub repo, make different chunks, and ask questions related to the code. I will demonstrate this in the next video. Based on that, you might modify your code to achieve what you are looking for.

    • @Alkotas25 · 1 year ago

      @@datasciencebasics Thanks for the reply, and in advance for the future video!

  • @JayP-127 · 1 year ago

    When executing the loaders section, it enters a long process that I stopped after 13 minutes. I checked and everything seemed normal.

    • @datasciencebasics · 1 year ago

      You might need to uninstall and reinstall Pillow. Not sure if it helps, but you can try:
      !pip uninstall -y Pillow
      !pip install --upgrade Pillow
      import PIL

  • @mike-ss4wk · 1 year ago

    I copied your Colab notebook. When I execute "take all the loaders", I get an error:
    ImportError: cannot import name 'is_directory' from 'PIL._util' (/usr/local/lib/python3.10/dist-packages/PIL/_util.py)
    How do I solve it?

    • @datasciencebasics · 1 year ago

      You might need to uninstall and reinstall Pillow:
      !pip uninstall -y Pillow
      !pip install --upgrade Pillow
      import PIL

  • @smile_tadikonda6736 · 1 year ago

    Great video, really helpful.
    May I know about the different ways to build chatbot UIs, like Gradio, for free, with code for some chatbots?
    One more thing: can we get a link to our document from our chatbot?
    Thank you.
    Please reply to me.

    • @datasciencebasics · 1 year ago

      Thanks. You can use Gradio, Streamlit, and others to create a UI for chatbots. Yes, you can also have sources shown when returning the answers; please refer to the LangChain documentation about QA with sources.
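
For readers asking how to surface sources alongside answers: the classic LangChain chain used in notebooks of this era can return the matched documents, whose metadata carries the file path (and, for PDF loaders, the page number). A sketch only, assuming `llm` (e.g. a ChatOpenAI instance) and `vectorstore` are already set up as in the video:

```python
# Sketch: assumes `llm` and `vectorstore` already exist (see the notebook).
from langchain.chains import ConversationalRetrievalChain

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)

result = qa({"question": "What is this document about?", "chat_history": []})
for doc in result["source_documents"]:
    # Loaders such as PyPDFLoader populate "source" (path) and "page".
    print(doc.metadata.get("source"), doc.metadata.get("page"))
```

This also addresses the earlier question in the thread about printing which document and page an answer came from.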

  • @karthikeyaramaswamy5003 · 1 year ago

    The Gradio part at the end of the code tried to install malware on my system.

    • @datasciencebasics · 1 year ago

      Strange, others are not facing this issue. Were you able to run it?

  • @Curious-nomad · 1 year ago

    At the loaders step, where you create a list out of all the loaders in documents[], when I run this piece of code it takes more than 10 minutes and still hasn't finished. Can anyone help?

    • @datasciencebasics · 1 year ago

      Hi, it's hard to say what went wrong without seeing what and how you are loading. Hope you have already fixed it.

  • @LynchTee · 1 year ago

    Can I know whether using this method consumes OpenAI tokens for reading and answering queries about the documents?

    • @datasciencebasics · 1 year ago · +1

      Yes, it does. Please watch my PrivateGPT video to learn how to chat with documents locally, where OpenAI is not used.

    • @LynchTee · 1 year ago

      @@datasciencebasics Yes, I have followed your PrivateGPT video. It works, much appreciated!

  • @زينبسالمعزيز-د2ح

    OK, if I have a CSV file, how can I load it, bro?

    • @datasciencebasics · 1 year ago

      You can use CSVLoader for that. I have other videos too where I have explained how to deal with csv files.
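
What LangChain's CSVLoader does conceptually is turn each row into one document whose text is the row's "column: value" pairs. A stdlib-only illustration of that idea (the sample data is made up):

```python
import csv
import io

# Made-up sample data standing in for a real CSV file.
data = "name,role\nAda,engineer\nGrace,admiral\n"

docs = []
for i, row in enumerate(csv.DictReader(io.StringIO(data))):
    # One "document" per row: its text lists each column and value.
    content = "\n".join(f"{k}: {v}" for k, v in row.items())
    docs.append({"page_content": content, "metadata": {"row": i}})

print(docs[0]["page_content"])  # prints "name: Ada" and "role: engineer" on two lines
```

Each row-document then goes through the same split/embed/store pipeline as any other file type.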

  • @mohitaggarwal3625 · 1 year ago

    Can this be used to chat with JSON files too?

  • @GALTechEnterprises-m7c · 1 year ago

    Can you share the Colab link?

  • @malleswararaomaguluri6344 · 1 year ago

    How secure is Pinecone?

    • @datasciencebasics · 1 year ago

      It depends on what kind of use case you have, but I would say it is generally secure!

  • @bennymartinez8197 · 1 year ago

    It doesn't let me install the libraries; they take forever, and at the end it crashes and says there's no disk space. I'm on Replit.

  • @srinathv9227 · 1 year ago

    Authentication error in the Pinecone steps. How do I solve it?

    • @mohammedelismaili3803 · 1 year ago

      Hi. I got the same error. Did you fix it?

    • @srinathv9227 · 1 year ago

      @@mohammedelismaili3803 No

    • @datasciencebasics · 1 year ago · +1

      Hey, it's related to Pinecone authentication, as the message says. Either it's on Pinecone's side or there is something wrong on your side. Hopefully it will be fixed.

  • @besmart2350 · 10 months ago

    Is there an easier way for non-programmers to chat with 300+ of their own PDFs? Does anyone sell a ready-to-run solution that I can just download, load 300+ books on the same topic (legal theory, for example) into, and chat with?

    • @datasciencebasics · 10 months ago

      You can use the latest model from OpenAI with the retrieval tool, which they claim can handle a 300-page PDF, or you can even create GPTs from OpenAI.

    • @besmart2350 · 10 months ago

      @@datasciencebasics Not 300 pages; I mean 300+ books (PDF, EPUB, and so on). Is it possible? Not online, but offline on my Macintosh: having my own GPT trained on my own 300 books that cover the same particular topic. Train it and then chat with it, asking questions on the topic, which GPT will answer with info from all those 300 books I've loaded it with.

    • @datasciencebasics · 10 months ago

      Nothing is impossible, but for these kinds of scenarios, good research into open-source models is needed. I can't just say "use this or that model".

    • @besmart2350 · 10 months ago

      @@datasciencebasics Create such an app please (for Mac); I will buy it.

  • @mrmortezajafari · 11 months ago

    Is it possible to embed Persian language?

    • @datasciencebasics · 11 months ago · +1

      I haven't tried it myself, so I am not sure about it!

    • @mrmortezajafari · 11 months ago

      @@datasciencebasics It supports it.

  • @AIEntusiast_ · 7 months ago

    It fails on "import pinecone".

    • @datasciencebasics · 7 months ago

      Hey, you might need to check the latest code on the LangChain website. Also, did you install pinecone?