ChatGPT For Your DATA | Chat with Multiple Documents Using LangChain
- Published: 4 Oct 2024
In this video, I will show you how you can chat with any document. Say you have a folder containing different file formats: a PDF file, a text file, a README file, and others. I will show you how to take all of your data, split it into chunks, create embeddings with OpenAI embeddings, and store them in the Pinecone vector store. Finally, you can chat with your own documents and get insights out of them, similar to using ChatGPT with your own data. Happy learning.
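The flow described above (load files → split into chunks → embed → store vectors → answer questions against the nearest chunks) can be sketched end to end. The block below is a deliberately simplified, self-contained illustration: the hash-based `embed` function and the in-memory list stand in for OpenAI embeddings and Pinecone (both of which need API keys), and every name in it is made up for this sketch.

```python
import hashlib
import math

def embed(text, dim=256):
    """Toy stand-in for OpenAI embeddings: hash character trigrams into a vector."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].lower().encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def split_into_chunks(text, chunk_size=200, overlap=20):
    """Fixed-size character splitter with overlap, like a basic text splitter."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def build_store(docs):
    """'Vector store' stand-in: a list of (vector, chunk, source) triples."""
    store = []
    for source, text in docs.items():
        for chunk in split_into_chunks(text):
            store.append((embed(chunk), chunk, source))
    return store

def query(store, question, k=4):
    """Embed the question and return the k most similar chunks by dot product."""
    qv = embed(question)
    scored = sorted(store, key=lambda rec: -sum(a * b for a, b in zip(qv, rec[0])))
    return [(chunk, source) for _, chunk, source in scored[:k]]

docs = {
    "notes.txt": "Pinecone is a managed vector database used to store embeddings.",
    "readme.md": "This project demonstrates chatting with documents via LangChain.",
}
store = build_store(docs)
top = query(store, "Where are the embeddings stored?", k=1)
print(top[0][1])  # source file of the best-matching chunk
```

In the real pipeline, `embed` would be OpenAI's embedding model, the list would be a Pinecone index, and the top chunks would be passed to the chat model as context rather than returned directly.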
👉🏼 Links:
GitHub code: github.com/sud...
☕ Buy me a Coffee: ko-fi.com/data...
💬 Build your custom chatbot: www.chatbase.c...
🔗 Other videos you might find helpful:
LangChain: • What is LangChain ? | ...
Open Assistant: • Open Assistant 😱 | Ope...
Chat with any pdf: • ChatPDF | MindBlowing ...
Sketch with pandas: • AI Code-Writing Assist...
Analyzing data with ChatGPT: • Analyzing data with CH...
🔴 RUclips: www.youtube.co...
👔 LinkedIn: / sudarshan-koirala
🐦 Twitter: / mesudarshan
💰🔗 Some links are affiliate links, meaning when you use them, I might get some benefit.
#openai #llm #datasciencebasics #chatwithdata #documents #chatgpt #nlp
Many Thanks for your great work!
It's very well explained and applies to real uses of AI.
Great video. Thanks for your time and explanation.
You’re welcome. Glad that it was helpful.
As always, great tutorials! I would love to see this same topic but without using OpenAI.
You can try this one:
How TO SetUp and USE PrivateGPT | FULLY LOCAL | 100% PRIVATE
ruclips.net/video/VEQ8mxv2MHY/видео.html
I've identified the problem in your code. The issue lies in the creation of chat history. Your code expects a list of tuples, but in your Gradio app, you're creating a list of lists (nested lists), which is causing the code to malfunction.
Please try using the following code instead and replace it in your Gradio block. This updated code should resolve the issue and make it work correctly.
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("Clear")

    def respond(user_message, chat_history):
        # Gradio's Chatbot passes history as a list of lists;
        # the chain expects a list of tuples, so convert first.
        if chat_history:
            chat_history = [tuple(sublist) for sublist in chat_history]
        # Get response from the QA chain (`qa` is defined earlier in the notebook)
        response = qa({"question": user_message, "chat_history": chat_history})
        # Append user message and response to chat history
        chat_history.append((user_message, response["answer"]))
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot], queue=False)
    clear.click(lambda: None, None, chatbot, queue=False)

demo.launch(debug=True, share=True)
Great, thanks for identifying the problem!
Thank you. How could I print which document (title) it came from and which page(s)? That is useful when there are multiple files with multiple pages in the source directory. Thank you for your time.
Yes, that would be nice, any tips?
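For the question above: in LangChain, each loaded `Document` carries a `metadata` dict (the PDF loaders typically fill in `source` and `page`), and a chain built with `return_source_documents=True` returns those documents alongside the answer, so the title and pages can be printed from there. Here is a minimal, LangChain-free sketch of the idea (the sample chunks and function name are made up):

```python
# Toy chunks, mimicking LangChain's Document(page_content, metadata) shape.
chunks = [
    {"page_content": "Revenue grew 12% in Q3.",
     "metadata": {"source": "report.pdf", "page": 4}},
    {"page_content": "Install with pip install langchain.",
     "metadata": {"source": "README.md", "page": 1}},
]

def answer_with_sources(question, retrieved):
    """Return the answer text plus a readable list of where it came from."""
    sources = [
        f'{doc["metadata"]["source"]} (page {doc["metadata"]["page"]})'
        for doc in retrieved
    ]
    return {"answer": f"Based on {len(retrieved)} chunk(s)...", "sources": sources}

result = answer_with_sources("How did revenue do?", [chunks[0]])
print(result["sources"])  # ['report.pdf (page 4)']
```

In a real chain you would read the same fields off `response["source_documents"]` instead of a hand-built list.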
Awesome tutorial.
Thank you for sharing!
You are welcome 😎
Please do this video again with Streamlit
Yes ❤
So every time I need to chat with my own data I will have to embed the query? That makes it much more expensive, doesn't it?
The query/question is in natural language and needs to be converted to numbers/vectors. So yes, each query needs to be embedded. Embedding is not that expensive. It is how it works :)
Can we do it with Groq ? How would you do the embeddings ?
How can I utilize this ChatBot for my SQL documents?
My question would be: how would you accommodate new data that has to be introduced to this? Would we do the vectorization process all over again, or is there a better way to handle it, even for one document?
You can 'upsert' a document over existing ones. No need to vectorize again.
@@AlgorithmicEchoes So when is vectorization needed and when not? How does that work? In the case of Pinecone, fine, we can simply push the docs and the rest is taken care of, right? Any idea about other cases?
Also, could you share any demo code for this, or refer to it here, so that others can benefit as well? That would be great.
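As a sketch of the 'upsert' idea discussed in this thread (not Pinecone's actual client API, just the concept): if each chunk gets a stable ID, re-upserting overwrites instead of duplicating, and adding one new document only touches that document's entries, so nothing else needs re-embedding.

```python
import hashlib

def chunk_id(source, chunk_text):
    """Stable ID so re-upserting the same chunk overwrites rather than duplicates."""
    return hashlib.sha1(f"{source}:{chunk_text}".encode()).hexdigest()[:12]

def upsert(store, source, chunk_text, vector):
    # Insert if the ID is new, overwrite if it already exists.
    store[chunk_id(source, chunk_text)] = {"source": source, "values": vector}

store = {}
upsert(store, "a.pdf", "first chunk", [0.1, 0.2])
upsert(store, "b.txt", "second chunk", [0.3, 0.4])
# Adding one new document later touches only its own entries:
upsert(store, "c.md", "new chunk", [0.5, 0.6])
# Re-upserting an existing chunk replaces it in place:
upsert(store, "a.pdf", "first chunk", [0.9, 0.9])
print(len(store))  # 3 entries, not 4
```

Pinecone's real `index.upsert(...)` behaves the same way with respect to IDs; with Chroma you can similarly add new documents to a persisted collection without rebuilding it.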
Thank you for the good video. I am curious why you stored the vectors in Chroma first and then again in Pinecone? Thank you
It was just for demonstration purposes. You can choose either Chroma or Pinecone.
hello sir,
What evaluation metrics should we use for our use case? Kindly let me know.
It's "Chunks", not "Choonks".
Just for fun, don't take it seriously... The video is informative and perfect.
Thanks, very good content!
You're welcome!
I tried your tutorial, but got stuck on the Pinecone steps with the error: AttributeError: init is no longer a top-level attribute of the pinecone package. Do you have an updated notebook?
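That `AttributeError` appears with pinecone-client v3 and later, which removed the top-level `pinecone.init(...)` call in favor of a `Pinecone` class. A minimal migration sketch (the API key and index name below are placeholders, not values from the video):

```python
# pinecone-client v3+ replaced pinecone.init(...) with a Pinecone class.
# Old (v2, as used in the video):
#   import pinecone
#   pinecone.init(api_key="...", environment="...")
#   index = pinecone.Index("langchain-demo")
# New (v3+):
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # environment is no longer passed here
index = pc.Index("langchain-demo")     # connect to an existing index
```

On the LangChain side, recent versions pair this with the separate `langchain-pinecone` package (`PineconeVectorStore`) rather than the older wrapper in `langchain.vectorstores`; check the current LangChain docs for the exact import.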
Amazing tutorial! Is there a way to add in the sources as well with the responses?
In many cases, yes. Refer to the documentation; you might find the answer.
@@datasciencebasics Hi, can you please tell which model you used for embeddings? The same gpt-3.5-turbo?
May I know which website you are using to execute step by step? I learnt a lot from this tutorial.
It's Google Colab. Refer to this video for more info -> ruclips.net/video/Xi9-W26cDBs/видео.html
Were you able to figure out the error when entering the second query? I’m running into the same issue.
Many thanks for the great tutorial, but it seems slow. Is there any way to make it run faster? Thanks in advance.
You are welcome. Some options: try different models, and you can also use a cache and see how it performs.
Can it be used with code? For example a .NET project with multiple classes
I am not sure if it's possible with .NET right now; LangChain has Python and JavaScript documentation.
Great Video 💯💥. So, can we add multiple CSV files instead of multiple types of files?
Thanks. It should be possible. You can give it a try :)
Very informative video, Sudarshan. In the Pinecone method, the free version does not support a 1536 embedding size. Suggestions?
Thank you Pranav. I am able to create embeddings with 1536 dimensions. There must be something wrong somewhere, OR Pinecone must have disabled it for newer users.
Why do we split data into chunks of 1000 or 1500 characters and then get the 4 most relevant chunks? Why not more than 1000 or 1500 characters per chunk, or more than 4 relevant chunks? Is there a limit on how many characters we can feed ChatGPT with? How much is the limit? Also, after using the code I checked my API usage in OpenAI and saw that I had used InstructGPT. What is InstructGPT?
Please refer to these materials. Also, in the future you can do a simple Google search to find your answers 😄
www.pinecone.io/learn/chunking-strategies/
Perplexity AI: what is instructGPT www.perplexity.ai/search/what-is-instructGPT-bbhHw0xHRLudFO.l..LIpQ?s=mn
@@datasciencebasics You are right
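A large part of the answer to the chunk-size question above is the model's context window: the retrieved chunks, the prompt template, the question, and the generated answer must all fit inside it. A rough back-of-the-envelope check, using the common (approximate) 4-characters-per-token heuristic and gpt-3.5-turbo's original 4,096-token window:

```python
chunk_size_chars = 1000   # characters per chunk
k = 4                     # chunks retrieved per question
chars_per_token = 4       # rough heuristic for English text

context_tokens = k * chunk_size_chars / chars_per_token
print(context_tokens)     # 1000.0 tokens just for the retrieved context

window = 4096             # gpt-3.5-turbo's original context window
budget_left = window - context_tokens
print(budget_left)        # 3096.0 tokens left for prompt, question, and answer
```

Doubling both the chunk size and the number of retrieved chunks would already consume roughly 4,000 of those tokens, leaving almost no room for the question and answer, which is why moderate values like 1,000-1,500 characters and k = 4 are common defaults.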
Hi, thank you for your contribution. How much data can I use? I mean, can a lot of documents be stored in the vector store?
As far as I know, you can store as much as you want. Give it a try, as I haven't tried storing that much data myself.
Nice video. Can we use open-source models for the same?
Here is with Llama2 ruclips.net/video/VPk-at5oqAY/видео.htmlsi=gkfVmnF0xP7pgJ8C
Is it possible to retrieve the embeddings from Chroma like with Pinecone? My second doubt: first I embedded 2 files, and now I want to add 2 more files. Do I need to embed all 4 files, or can the embeddings of the latest 2 files be combined with those of the first 2? If it is possible, how do I do it?
Yes, it is possible. You can just save the embeddings somewhere and dump the new ones into the same location. This way you don't need to embed everything again. Give it a try yourself and see how it works :)
Hey, I ran it and it worked great, but why are the responses so cold? Is it the temperature?
Hey, might be. You can try playing around with temperature.
Getting some numpy error: "AttributeError: module 'numpy.linalg._umath_linalg' has no attribute '_ilp64' " in all your LangChain related colab notebooks
By just seeing this, I have no clue how to help you. Try using newer versions of the packages, or update the code, as many libraries change over time.
@@datasciencebasics Could you please try rerunning your notebook up to the Directory Loader and check? That seems to be where the issue originates from
Hi, thanks for the video! Is it possible to chunk long HTML code, chat with it, and have GPT help modify the long code?
It is possible to load a GitHub repo, make different chunks, and ask questions related to the code. I will demonstrate this in the next video. Based on this, you might modify your code to achieve what you are looking for.
@@datasciencebasics Thanks for the reply, and in advance for the future video!
When executing the loaders section, it enters a long process that I stopped after 13 minutes. I checked and everything seemed normal.
You might need to uninstall and reinstall Pillow. Not sure if it helps, but you can try:
!pip uninstall -y Pillow
!pip install --upgrade Pillow
import PIL
I copied your Colab document. When I execute "# take all the loaders", I get an error:
ImportError: cannot import name 'is_directory' from 'PIL._util' (/usr/local/lib/python3.10/dist-packages/PIL/_util.py)
How do I solve it?
You might need to uninstall and reinstall Pillow:
!pip uninstall -y Pillow
!pip install --upgrade Pillow
import PIL
Great video, really helpful.
May I know about the different free chatbot UI options, like Gradio, and where to find code for some chatbots?
And one more thing: can we get our document's link from our chatbot?
Thank you.
Please reply to me.
Thanks. You can use Gradio, Streamlit, and others to create a UI for chatbots. Yes, you can also have the sources shown when returning the answers. Please refer to the LangChain documentation about QA with sources.
The last part of the code, Gradio, tried to install malware on my system.
Strange, others are not facing this issue. Were you able to run it?
At the loaders step, where you are creating a list out of all the loaders in documents = [], when I run this piece of code it takes more than 10 minutes and still hasn't executed. Can anyone help?
Hi, it's hard to say what went wrong without seeing what and how you are loading. Hope you already fixed it.
Can I know whether using this method will consume OpenAI tokens for reading and answering queries about the documents?
Yes, it does. Please watch my PrivateGPT video to learn how to chat with documents locally, where OpenAI is not used.
@@datasciencebasics Yes. I have followed your PrivateGPT video. It works, much appreciated!
OK, if I have a CSV file, how can I load it, bro?
You can use CSVLoader for that. I have other videos where I have explained how to deal with CSV files.
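For reference, LangChain's `CSVLoader` turns each CSV row into one document whose content is the row's "column: value" pairs, with the file path and row number in the metadata. Here is a small stdlib-only sketch of that behavior (the sample data and function name are made up for illustration):

```python
import csv
import io

# A stand-in for a file on disk; CSVLoader would take a file path instead.
raw = "name,role\nAda,engineer\nGrace,admiral\n"

def load_csv_rows(fileobj, source="data.csv"):
    """One document per CSV row, with 'column: value' lines as the content,
    similar to what LangChain's CSVLoader produces."""
    docs = []
    for i, row in enumerate(csv.DictReader(fileobj)):
        content = "\n".join(f"{col}: {val}" for col, val in row.items())
        docs.append({"page_content": content,
                     "metadata": {"source": source, "row": i}})
    return docs

docs = load_csv_rows(io.StringIO(raw))
print(len(docs))                # 2
print(docs[0]["page_content"])  # name: Ada / role: engineer
```

Once rows are documents, the same split/embed/store steps from the video apply unchanged.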
Can this be used to chat with JSON files too ?
Yes, we can. Give it a try.
Can you share the Colab link?
Hey, it's in the description of the video 😎
How secure is Pinecone?
It depends on what kind of use case you have, but I would say generally it is secure!
It doesn't let me install the libraries; they take forever, and at the end it crashes and says there's no disk space. I'm on Replit.
Authentication error in the Pinecone steps. How to solve it?
HI. I got the same error. Did you fix it?
@@mohammedelismaili3803 No
Hey, it's related to Pinecone authentication, as the error says. Either it's from the Pinecone side or there is something wrong on your side. Hopefully it will be fixed.
Is there an easier way for non-programmers to chat with 300+ of their own PDFs? Does anyone sell ready-to-run solutions that I can just download, upload 300+ books on the same topic (legal theory, for example) into, and chat with?
You can use the latest model from OpenAI with the retrieval tool, which they claim can handle a 300-page PDF, or you can even create GPTs from OpenAI.
@@datasciencebasics Not 300 pages, I mean 300+ books (PDF, EPUB, and so on). Is it possible? Not online, but offline on my Macintosh: having my own GPT that is trained on my 300 books covering the same particular topic. Train it and then chat with it, asking questions on the topic, which GPT will answer with info from all those 300 books I've loaded it with.
Nothing is impossible, but for these kinds of scenarios, good research into open-source models is needed. I can't just say "use this or that model" right now.
@@datasciencebasics create such an app please (for mac), I will buy it
Is it possible to embed Persian language?
I haven't tried it myself, so I am not sure about it!
@@datasciencebasics it supports
It fails on "import pinecone".
Hey, you might need to check the latest code from the LangChain website. Also, did you install pinecone?