LangChain101: Connect Google Drive Files To OpenAI

  • Published: 22 Aug 2024

Comments • 100

  • @temozarela
    @temozarela 1 year ago +7

    I'm so obsessed going through all of these videos one by one. No better way to spend my Saturday, especially when things work!
    Thanks for your amazing contribution!

  • @adamsardo
    @adamsardo 1 year ago +2

    Appreciate what you've been doing and the time you've spent helping the community :)

  • @moreshk
    @moreshk 1 year ago +6

    Might be a bit silly to ask, but it would be useful if you could provide some guidance on how to set up the credentials JSON. I've been fumbling with it.

    • @merkemong1496
      @merkemong1496 11 months ago

      same

    • @merkemong1496
      @merkemong1496 11 months ago

      Have you found a way to set up the credentials? I put my credentials.json at the correct path but it still says it's not found.

  • @davidwu3247
    @davidwu3247 1 year ago +1

    Awesome vid. Can't wait till GPT-4 is out and we can use Google Drive photos/text as multimodal input.

  • @fliu5282
    @fliu5282 1 year ago +3

    Python + LangChain + Html basic coding = Big Future = Prompt Engineering

  • @VictorCardonan
    @VictorCardonan 1 year ago +5

    Hello, thank you for the videos. They are really interesting. I have two questions:
    1) Why are you not using embeddings in this case?
    2) Would it make sense, and is it possible, to save the state of the summarizer so you don't have to redo the whole process from scratch if you have 1000+ documents?
    Thank you

    • @MK-jn9uu
      @MK-jn9uu 1 year ago +1

      I was thinking the same thing..

    • @EstherL-wd9yx
      @EstherL-wd9yx 1 year ago +1

      @DataIndependent - My main question is #2: How can we build a database of documents so that the knowledge DB grows and we don't do all of the processing from scratch?

  • @badrinarayanans355
    @badrinarayanans355 2 months ago

    Great Insights

  • @rossgalvanofficial
    @rossgalvanofficial 1 year ago

    Thank you for sharing this, very interested.

  • @ahsanahmad3193
    @ahsanahmad3193 11 months ago +1

    You should have shown the structure of the credentials file. Maybe add it in a comment.

  • @bladeplays6425
    @bladeplays6425 1 year ago +1

    One use case that I would love to see is how this performs on Excel/Google Sheets Data. Given event/log data from a website or a mobile app and documentation on what activity each event type in the log represents, does the model know how to answer questions about frequent (or user-specific) app activity?

  • @blocksystems202
    @blocksystems202 1 year ago

    You're amazing - thanks for sharing.

  • @briandao975
    @briandao975 1 year ago +1

    Awesome video, thank you. Do you have a video on how to utilize embeddings in this sample scenario? I'd like to create something similar but I have a lot of docs. Also, is there a way to refresh the embeddings automatically or on a schedule? For example, if a doc gets updated, how does that get handled?

    • @eracton
      @eracton 1 year ago

      Did you figure that out?

  • @weipingwu7852
    @weipingwu7852 1 year ago +1

    Thanks very much! I have a question: I want to control the usage of my documents, for my company's internal use only. If I use LangChain, can any other party, including OpenAI, see my documents? Thanks

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Yes, if you use OpenAI as your LLM then they can see your data. Check out their data retention policies for more information.
      You could use a self-hosted LLM for privacy reasons, but that is more setup.

    • @Iammikelovin
      @Iammikelovin 1 year ago

      Hi, can you recommend info on self hosted LLM? Can I use OpenAI and basically not have them retain my data? Or do I have to use another LLM?

  • @user-pk6ym7og4w
    @user-pk6ym7og4w 1 year ago +1

    Would it make sense to store embeddings in a database like Pinecone to avoid re-generating them with each call?

    • @DataIndependent
      @DataIndependent 1 year ago

      If you want them remote, then yep that would work. I should have put that example in the video
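
      A minimal sketch of that idea, using Chroma as a local stand-in for a remote store like Pinecone (the Pinecone wrapper follows the same pattern). This assumes the pre-0.1 langchain imports; `docs` stands for the documents returned by the Drive loader, and the directory name is a placeholder:

      from langchain.embeddings.openai import OpenAIEmbeddings
      from langchain.vectorstores import Chroma

      embeddings = OpenAIEmbeddings()

      # Embed once and write the vectors to disk so later runs skip the embedding calls
      db = Chroma.from_documents(docs, embeddings, persist_directory="drive_index")
      db.persist()

      # On the next run, load the saved index instead of re-embedding everything
      db = Chroma(persist_directory="drive_index", embedding_function=embeddings)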

  • @RussellDeming
    @RussellDeming 1 year ago

    Definitely interested in implementing in my business

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice! What domain are you in? How are you thinking about using it?

  • @frankrobert9199
    @frankrobert9199 11 months ago

    great

  • @carlosterrazas5091
    @carlosterrazas5091 1 year ago

    Great content. Just a question about the security of the information: do you know if, this way, ChatGPT will see the information as if you entered it on their platform? My concern is that if you use it for private documents, the info will end up in ChatGPT's database for everyone to see. Thanks

  • @TreiGamer
    @TreiGamer 1 year ago +1

    Hey Data Independent, I'm new to Python and coding in general but AI has been the push I need to really dig into this. I got Jupyter running locally, is there a recommended resource you'd point me towards for bringing your code into it?

    • @TreiGamer
      @TreiGamer 1 year ago +2

      Haha never mind, I figured it out. I just asked GPT 🤣
      Love your content.

    • @DataIndependent
      @DataIndependent 1 year ago +3

      Nice! That's great. What I was going to say is:
      Easiest - Copy and paste the code from the github link in the description into your jupyter notebook
      More Robust - Git clone the repo so you can stay up to date with future changes as well

    • @TreiGamer
      @TreiGamer 1 year ago +2

      I did the git clone method. Thank you.

  • @Iammikelovin
    @Iammikelovin 1 year ago

    Hello, I have just started watching a few of your vids; they're super interesting and really well explained, thanks! Q: The source files, in my case several PDF docs, are confidential, and my idea is to create an internal Q&A. What's the privacy situation? Do LangChain or OpenAI potentially have access to them? Does it add them to its "brain"? Or is it completely private? Thanks again

    • @bagamanocnon
      @bagamanocnon 1 year ago +1

      Data used through the OpenAI APIs, like the questions fed to the LLM and the answers it outputs (what OpenAI calls prompts and completions, respectively), will be stored on their servers for 30 days before being purged. Per their policy, only a limited number of employees within OpenAI itself - only those monitoring for abuse - will have access to the data. Enterprise customers might even have the option to opt out of having their data stored at all. Look up the OpenAI API usage policies; I can't paste a link here.
      Using their embeddings service also exposes your data to OpenAI.
      The demo in this video doesn't use embeddings (it reads the text directly), but you almost always want to create a vector index with embeddings for your knowledge base (KB), especially if it consists of hundreds or thousands of documents. LLMs have an easier time "reading" vector values rather than raw text. Cheers.
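
      A rough sketch of that vector-index approach, assuming the pre-0.1 langchain API; `docs` is the list returned by the Drive loader and the question is made up:

      from langchain.embeddings.openai import OpenAIEmbeddings
      from langchain.vectorstores import FAISS
      from langchain.chains import RetrievalQA
      from langchain.chat_models import ChatOpenAI

      # Embed the documents once and keep them in an in-memory FAISS index
      index = FAISS.from_documents(docs, OpenAIEmbeddings())

      # Only the chunks most similar to the question get sent to the LLM
      qa = RetrievalQA.from_chain_type(
          llm=ChatOpenAI(temperature=0),
          retriever=index.as_retriever(),
      )
      print(qa.run("What does the onboarding doc say about laptops?"))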

    • @DataIndependent
      @DataIndependent 1 year ago

      Agree! and if you don't want OpenAI to have your data then you should be using a local model

  • @rahuliitm
    @rahuliitm 1 year ago +1

    Great tutorial. Absolutely loving it. I'm trying to read a gitbook and summarise it but apparently there's a prompt context length limit.
    "This model's maximum context length is 4097 tokens, however you requested 7592 tokens"
    Not sure where I can set the token limit

    • @jmanhype1
      @jmanhype1 1 year ago

      Yeah, that's why he's selling his service to fill in the gaps

    • @DataIndependent
      @DataIndependent 1 year ago +2

      Nice! Yes, there is a context limit for prompts. Check out either my video on asking a question of a 300-page book or my "workarounds for prompt limit" video.

    • @DataIndependent
      @DataIndependent 1 year ago +4

      Nothing to sell here - happy to help with any questions you have though
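
      A minimal sketch of the workaround mentioned a couple of replies up: split the text and use a map_reduce summarize chain so no single call exceeds the context window. Assumes the pre-0.1 langchain API; `docs` is whatever the GitBook/Drive loader returned, and the chunk sizes are arbitrary:

      from langchain.chat_models import ChatOpenAI
      from langchain.chains.summarize import load_summarize_chain
      from langchain.text_splitter import RecursiveCharacterTextSplitter

      # Break the documents into chunks that fit comfortably under the 4,097-token limit
      splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=200)
      chunks = splitter.split_documents(docs)

      # map_reduce summarizes each chunk, then summarizes the summaries
      chain = load_summarize_chain(ChatOpenAI(temperature=0), chain_type="map_reduce")
      print(chain.run(chunks))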

  • @manyavarshney4399
    @manyavarshney4399 1 year ago

    Hello, can you help me resolve my error? I provided the credentials path and it executed. But when I loaded the document, it displayed "Access blocked to the Google Drive API".

    • @DataIndependent
      @DataIndependent 1 year ago

      Have you googled it? That sounds like a Google credential issue.

  • @wardaraees4887
    @wardaraees4887 1 year ago

    I want to ask questions of my Excel files or of a dataset in CSV format (not a text file), or maybe get a table from SQL Server as the result of a SQL query. Is it possible to upload that file to Google Drive the same way, or is this method just for text files?
    Or is there any direct way to ask questions of my SQL table with OpenAI?

    • @DataIndependent
      @DataIndependent 1 year ago

      Check out the LangChain documentation for how to query SQL databases; it's very doable.
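
      A rough sketch of what that looks like, assuming a langchain version that still ships SQLDatabaseChain (it later moved to langchain_experimental); the SQLite URI and the question are placeholders:

      from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

      # Point the chain at any SQLAlchemy-compatible database
      db = SQLDatabase.from_uri("sqlite:///events.db")
      chain = SQLDatabaseChain.from_llm(OpenAI(temperature=0), db, verbose=True)

      # The LLM writes the SQL, runs it, and phrases the answer
      chain.run("How many events were logged last week?")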

  • @DheerSinghDel
    @DheerSinghDel 1 year ago

    Can you explain exactly what the credentials path should be, assuming I'm working in Google Colab and the Drive path where the ipynb file resides is /ColabNotebooks/LangChain/drivetest.ipynb?

    • @DataIndependent
      @DataIndependent 1 year ago

      I would put this question into chatgpt and have it work with you on the details.
      It requires knowledge about your setup which I don't have

  • @coachfrank2808
    @coachfrank2808 1 year ago +1

    Nice!

  • @leticiaromanbernal4151
    @leticiaromanbernal4151 1 year ago

    Hi, I would like to know if there's any way to connect Google Sheets from my Google Drive account the way it does with Google Docs. Please help me. Thanks a lot :)

    • @DataIndependent
      @DataIndependent 1 year ago

      Big time - you can use LangChain's Drive loader: python.langchain.com/docs/modules/data_connection/document_loaders/integrations/google_drive
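
      For Sheets, that would look roughly like the snippet below, assuming your langchain version's GoogleDriveLoader supports the file_types option (older releases only pull Google Docs); the folder id is a placeholder:

      from langchain.document_loaders import GoogleDriveLoader

      loader = GoogleDriveLoader(
          folder_id="YOUR_FOLDER_ID",          # the id from the Drive folder's URL
          file_types=["sheet", "document"],    # include Google Sheets alongside Docs
          credentials_path="credentials.json",
      )
      docs = loader.load()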

  • @federicogiacomarra
    @federicogiacomarra 1 year ago

    Not sure if this is explained elsewhere, can you retrieve the source document somehow together with the answer?

  • @nsitkarana
    @nsitkarana 1 year ago

    Nice video. I have one follow-up: when I do any kind of interaction with OpenAI (for instance the doc from Google Drive), or in the other video where I chunk/embed local documents, how safe are the personal documents? In other words, how safe is it to use OpenAI for personal documents? Does anyone have any idea about that?

  • @user-fe9bh1cv4m
    @user-fe9bh1cv4m 1 year ago

    Hi Greg, I am getting an error while trying to connect Google Drive files to OpenAI, and the error is below:
    ValueError: Client secrets must be for a web or installed app. Can you please help me resolve this error? I am using Azure credentials.

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Because Azure and Google Drive are run by different companies, the credentials won't work.
      Try getting Google credentials.

    • @user-fe9bh1cv4m
      @user-fe9bh1cv4m 1 year ago

      @@DataIndependent Thanks Greg 😇

  • @AizzatAffero
    @AizzatAffero 1 year ago

    Once LangChain has read all of it, does it store the data when we reopen it again?

  • @adamtemple8677
    @adamtemple8677 1 year ago

    Is it still limited by the prompt token limits, or can you use an entire G-Drive and chat with all your documents?

  • @ujjwalgupta1318
    @ujjwalgupta1318 1 year ago

    Aren't this and the directory loader doing a similar sort of thing?

  • @joelmartinez7628
    @joelmartinez7628 1 year ago

    Still skeptical about opening our internal information to GPT-3. The information will definitely be used for training, and internal information becomes public once fed to GPT-3. Am I wrong to ask whether they have a plan where they can use the data for training but not expose it as public information?

    • @DataIndependent
      @DataIndependent 1 year ago

      I totally agree - It's a problem that will need to get solved. I actually tweeted about this same question here: twitter.com/GregKamradt/status/1627338667936337921
      AFAIK this isn't on the roadmap for them yet but I hope I'm wrong

    • @VictorCardonan
      @VictorCardonan 1 year ago

      Why don't you use GPT4All, which can be installed locally and doesn't send any data outside? It won't be as good or as straightforward, but it can give you a good result.
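
      A minimal sketch of swapping in a local model with langchain's GPT4All wrapper, so nothing leaves your machine; assumes the pre-0.1 imports and a model file you've already downloaded (the path is a placeholder):

      from langchain.llms import GPT4All
      from langchain.chains.summarize import load_summarize_chain

      # Runs entirely locally; no documents or prompts are sent to an API
      llm = GPT4All(model="./models/gpt4all-model.bin")

      chain = load_summarize_chain(llm, chain_type="map_reduce")
      summary = chain.run(docs)   # docs = whatever your local/Drive loader returned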

  • @ahmadzaimhilmi
    @ahmadzaimhilmi 1 year ago

    Still studying this langchain module. I'm looking to chain a series of questions, i.e. use result from a question to generate the next question.

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice, that would likely be an agent. What's the example you want to do?

    • @ahmadzaimhilmi
      @ahmadzaimhilmi 1 year ago

      @@DataIndependent A business plan aims to develop a research plan for a thesis. The research plan needs to find a research gap, meaning an unexplored area in the existing literature; otherwise, the research would be repetitive and unoriginal. This is a difficult part that involves a lot of writing and concentration. It might take around nine months to finish this part if one is very committed. To do this, one has to go through hundreds of papers and learn about the methods, materials, standards and challenges of similar research. There is a technique for doing this, but an LLM simplifies it a lot. My approach is to use BERT or another tool to get relevant keywords from the papers and build on them for the research plan. This way, the researcher spends less time on the writing part and can focus on doing the experiment.

  • @johnallen9992
    @johnallen9992 10 months ago

    Typo on screen in the credentials file name, at minute 2:11.

  • @cgtinc4868
    @cgtinc4868 1 year ago

    Sorry for the noob question: where do I place the "../../desktop_credetnaisl.json"? I admit I'm a non-coder, just following your video along the way.

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice! You can place your credentials file wherever you want.
      By default your program will usually look in a root folder, but you can tell it to look wherever you need.
      If your credentials were in the same folder as your script, you could just use "credentials.json" without going up or down from any folder.
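
      As a concrete illustration, a sketch with the pre-0.1 GoogleDriveLoader, which lets you point at the credentials file explicitly (the folder id is a placeholder; by default the loader looks in ~/.credentials/credentials.json):

      from pathlib import Path
      from langchain.document_loaders import GoogleDriveLoader

      loader = GoogleDriveLoader(
          folder_id="YOUR_FOLDER_ID",
          # Explicit path; this could also just be "credentials.json" if it sits next to the script
          credentials_path=Path.home() / ".credentials" / "credentials.json",
      )
      docs = loader.load()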

    • @cgtinc4868
      @cgtinc4868 1 year ago +1

      ​@@DataIndependent Thanks! wrote to you in Twitter as well

  • @photon2724
    @photon2724 1 year ago

    Another fantastic tutorial! Although, what is the credentials.json file? And how can I get my own?

    • @DataIndependent
      @DataIndependent 1 year ago

      Thanks! That is on the google side of the house.
      developers.google.com/workspace/guides/create-credentials

    • @anishmanandhar1203
      @anishmanandhar1203 1 year ago

      And what do we do with it? How do we get the .json file? @@DataIndependent

  • @cgtinc4868
    @cgtinc4868 1 year ago

    Great video, and as the founder of a startup I need this tool! Is there a way to access not Google Drive but something like a Synology NAS (which we use)? That would be really, really helpful.

    • @DataIndependent
      @DataIndependent 1 year ago

      Thank you! I've never heard of Synology. For it to integrate it would either take a custom data loader from LangChain/Unstructured or you'd need to export the files you'd want to another spot.

    • @cgtinc4868
      @cgtinc4868 1 year ago

      @@DataIndependent Thanks! It's just a brand of external NAS. Maybe you could do a video on a local hard drive, so we can just change the path to wherever the source documents are :)

  • @ivantan222
    @ivantan222 1 year ago

    4:00 That's a pretty short summary of the long text, is there any parameter to make it longer?

    • @DataIndependent
      @DataIndependent 1 year ago

      You can see here the prompt that is being used to generate this summary
      github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/stuff_prompt.py
      Under the hood it's just a prompt with your text in it. You could adjust the prompt manually (not by using the chain, but doing your own prompt) to get a longer one.
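
      For example, a sketch of swapping in your own prompt to ask for a longer summary, assuming the pre-0.1 langchain API (the prompt wording is made up):

      from langchain import PromptTemplate
      from langchain.chat_models import ChatOpenAI
      from langchain.chains.summarize import load_summarize_chain

      prompt = PromptTemplate(
          input_variables=["text"],
          template="Write a detailed, multi-paragraph summary of the following:\n\n{text}\n\nDETAILED SUMMARY:",
      )

      # Same stuff chain as the video, but with the custom prompt instead of the default one
      chain = load_summarize_chain(ChatOpenAI(temperature=0), chain_type="stuff", prompt=prompt)
      print(chain.run(docs))   # docs = the documents loaded from Drive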

    • @ivantan222
      @ivantan222 1 year ago

      @@DataIndependent ah okay, thanks a lot for your info.

  • @johnallen9992
    @johnallen9992 10 months ago

    LangChain just removed the Google Drive connect tool from their API... gotta build a custom tool now with the Google Doc loader for Drive

    • @DataIndependent
      @DataIndependent 10 months ago

      Weird I didn't know that - thanks for letting me know

  • @ezequielmelillan1708
    @ezequielmelillan1708 1 year ago

    Hi man, thanks for sharing, this is amazing. Can you make a video using alpaca/llama integration with LangChain? Is it possible to use embeddings with those open-source AI?

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Yep, it's very possible - you just need to swap out your embeddings model.
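
      For instance, a sketch of that swap using a local sentence-transformers model via HuggingFaceEmbeddings instead of OpenAI's embeddings (the model name shown is just one common choice):

      from langchain.embeddings import HuggingFaceEmbeddings
      from langchain.vectorstores import FAISS

      # Embeddings are computed locally, so no document text is sent to OpenAI
      embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
      index = FAISS.from_documents(docs, embeddings)   # docs = your loaded documents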

  • @haisai4159
    @haisai4159 1 year ago

    Amazing tutorial! Beginner here: can you do this for Google Sheets, and in a Google Colab notebook instead of a Jupyter notebook? Thank you!

    • @DataIndependent
      @DataIndependent 1 year ago

      What's the use case you'd want to run through

    • @AmineBELALIA
      @AmineBELALIA 1 year ago

      @@DataIndependent I have the same problem. I have a list of product specifications (2000 specs) and I want to build a chatbot that can answer customer questions about these products and explain the technical details of each spec by searching the internet (the Google Sheet doesn't have this level of detail).

  • @learnapplybuild
    @learnapplybuild 1 year ago

    Please make a video on OneDrive.

  • @vinosamari
    @vinosamari 1 year ago

    Please do a map-reduce video

    • @DataIndependent
      @DataIndependent 1 year ago

      Here's a video explaining the different chain_types
      ruclips.net/video/f9_BWhCI4Zo/видео.html

  • @user-ig3ww3dz1x
    @user-ig3ww3dz1x 1 year ago

    How do I get my credentials path from google?

    • @DataIndependent
      @DataIndependent 1 year ago

      *You* give your credentials path to google.
      This guide may help googleapis.dev/python/google-auth/latest/user-guide.html

  • @neon_Nomad
    @neon_Nomad 1 year ago

    What about Nextcloud or Syncthing?

    • @DataIndependent
      @DataIndependent 1 year ago

      Could you link me to the examples you'd want to see?

  • @zes7215
    @zes7215 1 year ago

    wrg

  • @abdoualgerian5396
    @abdoualgerian5396 1 year ago

    The only bad thing about your content is the disturbing background music; not everyone can concentrate on a mixture of more than one voice.

  • @ryanonvr2267
    @ryanonvr2267 1 year ago

    ---> 76 with open(self.token_path, "w") as token:
    77 token.write(creds.to_json())
    79 return creds
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\info\\.credentials\\token.json' (even though the cred file is correct somewhere else.)
    :( newb

    • @DataIndependent
      @DataIndependent 1 year ago

      You can do two things
      1) Make sure your cred file is in the location your script is looking for (I'm guessing it's the directory you mentioned above)
      2) Tell your script to look elsewhere. This would be the location of your creds file wherever you would like it. I usually do it in my same folder or a parent folder above.
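
      A sketch of option 2, assuming the pre-0.1 GoogleDriveLoader: create the folder the loader expects and/or pass explicit paths for both files (the folder id is a placeholder). The FileNotFoundError above likely just means the .credentials directory doesn't exist yet, since the loader tries to write token.json into it:

      from pathlib import Path
      from langchain.document_loaders import GoogleDriveLoader

      creds_dir = Path.home() / ".credentials"
      creds_dir.mkdir(parents=True, exist_ok=True)   # creating this folder avoids the error above

      loader = GoogleDriveLoader(
          folder_id="YOUR_FOLDER_ID",
          credentials_path=creds_dir / "credentials.json",
          token_path=creds_dir / "token.json",       # where the generated token gets written
      )
      docs = loader.load()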