LangChain101: Connect Google Drive Files To OpenAI

  • Published: 22 Aug 2024

Comments • 100

  • @temozarela
    @temozarela 1 year ago +7

    I'm so obsessed going through all of these videos one by one. No better way to spend my Saturday, especially when things work!
    Thanks for your amazing contribution!

  • @adamsardo
    @adamsardo 1 year ago +2

    Appreciate what you've been doing and the time you've spent helping the community :)

  • @moreshk
    @moreshk 1 year ago +6

    Might be a bit silly to ask, but it would be useful if you could provide some guidance on how to set up the credentials JSON. I've been fumbling with it.

    • @merkemong1496
      @merkemong1496 11 months ago

      same

    • @merkemong1496
      @merkemong1496 11 months ago

      Have you found a way to set up the credentials? I put my credentials.json at the correct path but it still says it's not found.

  • @davidwu3247
    @davidwu3247 1 year ago +1

    Awesome vid. Can't wait till GPT-4 is out and we can use Google Drive photos/text as multimodal input.

  • @fliu5282
    @fliu5282 1 year ago +3

    Python + LangChain + Html basic coding = Big Future = Prompt Engineering

  • @VictorCardonan
    @VictorCardonan 1 year ago +5

    Hello, thank you for the videos. They are really interesting. I have two questions:
    1) Why are you not using embeddings in this case?
    2) Would it make sense, and is it possible, to save the state of the summarizer so you don't have to redo the whole process from scratch if you have 1000+ documents?
    Thank you

    • @MK-jn9uu
      @MK-jn9uu 1 year ago +1

      I was thinking the same thing..

    • @EstherL-wd9yx
      @EstherL-wd9yx 1 year ago +1

      @DataIndependent - My main question is #2: How can we build a database of documents so that the knowledge DB grows and we don't do all of the processing from scratch?

  • @badrinarayanans355
    @badrinarayanans355 2 months ago

    Great Insights

  • @rossgalvanofficial
    @rossgalvanofficial 1 year ago

    Thank you for sharing this, very interested.

  • @ahsanahmad3193
    @ahsanahmad3193 11 months ago +1

    You should have shown the structure of the credentials file. Maybe add it in a comment.

  • @bladeplays6425
    @bladeplays6425 1 year ago +1

    One use case that I would love to see is how this performs on Excel/Google Sheets Data. Given event/log data from a website or a mobile app and documentation on what activity each event type in the log represents, does the model know how to answer questions about frequent (or user-specific) app activity?

  • @blocksystems202
    @blocksystems202 1 year ago

    You're amazing - thanks for sharing.

  • @briandao975
    @briandao975 1 year ago +1

    Awesome video, thank you. Do you have a video on how to utilize embeddings in this sample scenario? I'd like to create something similar but I have a lot of docs. Also, is there a way to refresh the embeddings automatically or on a schedule? For example, if a doc gets updated, how does that get handled?

    • @eracton
      @eracton 1 year ago

      Did you figure that out?

  • @weipingwu7852
    @weipingwu7852 1 year ago +1

    Thanks very much! I have a question: I want to control the usage of my documents, for my company's internal use only. If I use LangChain, can any other party, including OpenAI, see my documents? Thanks

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Yes, if you use OpenAI as your LLM then they can see your data. Check out their data retention policies for more information.
      You could use a self-hosted LLM for privacy reasons, but that is more setup.

    • @Iammikelovin
      @Iammikelovin 1 year ago

      Hi, can you recommend info on self hosted LLM? Can I use OpenAI and basically not have them retain my data? Or do I have to use another LLM?

  • @user-pk6ym7og4w
    @user-pk6ym7og4w 1 year ago +1

    Would it make sense to store embeddings in a database like Pinecone to avoid re-generating them with each call?

    • @DataIndependent
      @DataIndependent 1 year ago

      If you want them remote, then yep that would work. I should have put that example in the video
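
      A minimal sketch of that idea, using Chroma as a local stand-in for a remote store like Pinecone (the Pinecone wrapper follows the same pattern). This assumes the pre-0.1 langchain imports; `docs` stands for the documents returned by the Drive loader, and the directory name is a placeholder:

      from langchain.embeddings.openai import OpenAIEmbeddings
      from langchain.vectorstores import Chroma

      embeddings = OpenAIEmbeddings()

      # Embed once and write the vectors to disk so later runs skip the embedding calls
      db = Chroma.from_documents(docs, embeddings, persist_directory="drive_index")
      db.persist()

      # On the next run, load the saved index instead of re-embedding everything
      db = Chroma(persist_directory="drive_index", embedding_function=embeddings)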

  • @RussellDeming
    @RussellDeming 1 year ago

    Definitely interested in implementing in my business

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice! What domain are you in? How are you thinking about using it?

  • @frankrobert9199
    @frankrobert9199 11 months ago

    great

  • @carlosterrazas5091
    @carlosterrazas5091 1 year ago

    Great content. Just a question about the security of the information: do you know if, this way, ChatGPT will see the information as if you entered it on their platform? My concern is that if you use it for private documents, the info will end up in ChatGPT's database for everyone to see. Thanks

  • @TreiGamer
    @TreiGamer 1 year ago +1

    Hey Data Independent, I'm new to Python and coding in general but AI has been the push I need to really dig into this. I got Jupyter running locally, is there a recommended resource you'd point me towards for bringing your code into it?

    • @TreiGamer
      @TreiGamer 1 year ago +2

      Haha never mind, I figured it out. I just asked GPT 🤣
      Love your content.

    • @DataIndependent
      @DataIndependent 1 year ago +3

      Nice! That's great. What I was going to say is:
      Easiest - Copy and paste the code from the github link in the description into your jupyter notebook
      More Robust - Git clone the repo so you can stay up to date with future changes as well

    • @TreiGamer
      @TreiGamer 1 year ago +2

      I did the git clone method. Thank you.

  • @Iammikelovin
    @Iammikelovin 1 year ago

    Hello, I have just started watching a few of your vids; they're super interesting and really well explained, thanks! Q: The source files, in my case several PDF docs, are confidential, and my idea is to create an internal Q&A. What's the privacy situation? Do LangChain or OpenAI potentially have access to them? Does it add them to its "brain"? Or is it completely private? Thanks again

    • @bagamanocnon
      @bagamanocnon 1 year ago +1

      Data used through the OpenAI APIs, like the questions fed to the LLM and the answers it outputs (what OpenAI calls prompts and completions, respectively), will be stored on their servers for 30 days before being purged. Per their policy, only a limited number of employees within OpenAI itself - only those monitoring for abuse - will have access to the data. Enterprise customers might even have the option to opt out of having their data stored at all. Look up the OpenAI API usage policies; I can't paste a link here.
      Using their embeddings service also exposes your data to OpenAI.
      The demo in this video doesn't use embeddings (it reads the text directly), but you almost always want to create a vector index with embeddings for your knowledge base (KB), especially if it consists of hundreds or thousands of documents. LLMs have an easier time "reading" vector values rather than raw text. Cheers.
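
      A rough sketch of that vector-index approach, assuming the pre-0.1 langchain API; `docs` is the list returned by the Drive loader and the question is made up:

      from langchain.embeddings.openai import OpenAIEmbeddings
      from langchain.vectorstores import FAISS
      from langchain.chains import RetrievalQA
      from langchain.chat_models import ChatOpenAI

      # Embed the documents once and keep them in an in-memory FAISS index
      index = FAISS.from_documents(docs, OpenAIEmbeddings())

      # Only the chunks most similar to the question get sent to the LLM
      qa = RetrievalQA.from_chain_type(
          llm=ChatOpenAI(temperature=0),
          retriever=index.as_retriever(),
      )
      print(qa.run("What does the onboarding doc say about laptops?"))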

    • @DataIndependent
      @DataIndependent 1 year ago

      Agree! and if you don't want OpenAI to have your data then you should be using a local model

  • @rahuliitm
    @rahuliitm 1 year ago +1

    Great tutorial. Absolutely loving it. I'm trying to read a gitbook and summarise it but apparently there's a prompt context length limit.
    "This model's maximum context length is 4097 tokens, however you requested 7592 tokens"
    Not sure where I can set the token limit

    • @jmanhype1
      @jmanhype1 1 year ago

      Yeah, that's why he's selling his service to fill in the gaps

    • @DataIndependent
      @DataIndependent 1 year ago +2

      Nice! Yes, there is a context limit for prompts. Check out either my video on asking a question of a 300-page book or my "workarounds for prompt limit" video.

    • @DataIndependent
      @DataIndependent 1 year ago +4

      Nothing to sell here - happy to help with any questions you have though
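
      A minimal sketch of the workaround mentioned a couple of replies up: split the text and use a map_reduce summarize chain so no single call exceeds the context window. Assumes the pre-0.1 langchain API; `docs` is whatever the GitBook/Drive loader returned, and the chunk sizes are arbitrary:

      from langchain.chat_models import ChatOpenAI
      from langchain.chains.summarize import load_summarize_chain
      from langchain.text_splitter import RecursiveCharacterTextSplitter

      # Break the documents into chunks that fit comfortably under the 4,097-token limit
      splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=200)
      chunks = splitter.split_documents(docs)

      # map_reduce summarizes each chunk, then summarizes the summaries
      chain = load_summarize_chain(ChatOpenAI(temperature=0), chain_type="map_reduce")
      print(chain.run(chunks))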

  • @manyavarshney4399
    @manyavarshney4399 1 year ago

    Hello, can you help me resolve my error? I provided the credentials path and it executed. But when I loaded the document, it displayed "Access blocked to the Google Drive API".

    • @DataIndependent
      @DataIndependent 1 year ago

      Have you googled it? That sounds like a Google credential issue.

  • @wardaraees4887
    @wardaraees4887 1 year ago

    I want to ask questions of my Excel files or of a dataset in CSV format (not a text file), or maybe get a table from SQL Server as the result of a SQL query. Is it possible to upload that file to Google Drive the same way, or is this method just for text files?
    Or is there any direct way to ask questions of my SQL table with OpenAI?

    • @DataIndependent
      @DataIndependent 1 year ago

      Check out the LangChain documentation for how to query SQL databases; it's very doable.
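
      A rough sketch of what that looks like, assuming a langchain version that still ships SQLDatabaseChain (it later moved to langchain_experimental); the SQLite URI and the question are placeholders:

      from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

      # Point the chain at any SQLAlchemy-compatible database
      db = SQLDatabase.from_uri("sqlite:///events.db")
      chain = SQLDatabaseChain.from_llm(OpenAI(temperature=0), db, verbose=True)

      # The LLM writes the SQL, runs it, and phrases the answer
      chain.run("How many events were logged last week?")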

  • @DheerSinghDel
    @DheerSinghDel 1 year ago

    Can you explain exactly what the credentials path should be, assuming I'm working in Google Colab and the Drive path where the ipynb file resides is /ColabNotebooks/LangChain/drivetest.ipynb?

    • @DataIndependent
      @DataIndependent 1 year ago

      I would put this question into chatgpt and have it work with you on the details.
      It requires knowledge about your setup which I don't have

  • @coachfrank2808
    @coachfrank2808 1 year ago +1

    Nice!

  • @leticiaromanbernal4151
    @leticiaromanbernal4151 1 year ago

    Hi, I would like to know if there's any way to connect Google Sheets from my Google Drive account the way it does with Google Docs. Please help me. Thanks a lot :)

    • @DataIndependent
      @DataIndependent 1 year ago

      Big time - you can use LangChain's Drive loader: python.langchain.com/docs/modules/data_connection/document_loaders/integrations/google_drive
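
      For Sheets, that would look roughly like the snippet below, assuming your langchain version's GoogleDriveLoader supports the file_types option (older releases only pull Google Docs); the folder id is a placeholder:

      from langchain.document_loaders import GoogleDriveLoader

      loader = GoogleDriveLoader(
          folder_id="YOUR_FOLDER_ID",          # the id from the Drive folder's URL
          file_types=["sheet", "document"],    # include Google Sheets alongside Docs
          credentials_path="credentials.json",
      )
      docs = loader.load()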

  • @federicogiacomarra
    @federicogiacomarra 1 year ago

    Not sure if this is explained elsewhere, can you retrieve the source document somehow together with the answer?

  • @nsitkarana
    @nsitkarana 1 year ago

    Nice video. I have one follow-up: when I do any kind of interaction with OpenAI (for instance the doc from Google Drive), or in the other video where I chunk/embed local documents, how safe are the personal documents? In other words, how safe is it to use OpenAI for personal documents? Does anyone have any idea about that?

  • @user-fe9bh1cv4m
    @user-fe9bh1cv4m 1 year ago

    Hi Greg, I am getting an error while trying to connect Google Drive files to OpenAI, and the error is below:
    ValueError: Client secrets must be for a web or installed app. Can you please help me resolve this error? I am using Azure credentials.

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Because Azure and Google Drive are run by different companies, the credentials won't work.
      Try getting Google credentials.

    • @user-fe9bh1cv4m
      @user-fe9bh1cv4m 1 year ago

      @@DataIndependent Thanks Greg 😇

  • @AizzatAffero
    @AizzatAffero 1 year ago

    Once LangChain has read all of it, does it store the data when we reopen it again?

  • @adamtemple8677
    @adamtemple8677 1 year ago

    Is it still limited by the prompt token limits, or can you use an entire G-Drive and chat with all your documents?

  • @ujjwalgupta1318
    @ujjwalgupta1318 1 year ago

    Aren't this and the directory loader doing a similar sort of thing?

  • @joelmartinez7628
    @joelmartinez7628 1 year ago

    Still skeptical about opening our internal information to GPT-3. The information will definitely be used for training, and internal information becomes public once fed to GPT-3. Am I wrong to ask whether they have a plan where they can use the data for training but not expose it as public information?

    • @DataIndependent
      @DataIndependent 1 year ago

      I totally agree - It's a problem that will need to get solved. I actually tweeted about this same question here: twitter.com/GregKamradt/status/1627338667936337921
      AFAIK this isn't on the roadmap for them yet but I hope I'm wrong

    • @VictorCardonan
      @VictorCardonan 1 year ago

      Why don't you use GPT4All, which can be installed locally and doesn't send any data outside? It won't be as good or as straightforward, but it can give you a good result.
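
      A minimal sketch of swapping in a local model with langchain's GPT4All wrapper, so nothing leaves your machine; assumes the pre-0.1 imports and a model file you've already downloaded (the path is a placeholder):

      from langchain.llms import GPT4All
      from langchain.chains.summarize import load_summarize_chain

      # Runs entirely locally; no documents or prompts are sent to an API
      llm = GPT4All(model="./models/gpt4all-model.bin")

      chain = load_summarize_chain(llm, chain_type="map_reduce")
      summary = chain.run(docs)   # docs = whatever your local/Drive loader returned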

  • @ahmadzaimhilmi
    @ahmadzaimhilmi 1 year ago

    Still studying this langchain module. I'm looking to chain a series of questions, i.e. use result from a question to generate the next question.

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice, that would likely be an agent. What's the example you want to do?

    • @ahmadzaimhilmi
      @ahmadzaimhilmi 1 year ago

      @@DataIndependent A business plan aims to develop a research plan for a thesis. The research plan needs to find a research gap, meaning an unexplored area in the existing literature; otherwise, the research would be repetitive and unoriginal. This is a difficult part that involves a lot of writing and concentration. It might take around nine months to finish this part if one is very committed. To do this, one has to go through hundreds of papers and learn about the methods, materials, standards and challenges of similar research. There is a technique for doing this, but an LLM simplifies it a lot. My approach is to use BERT or another tool to get relevant keywords from the papers and build on them for the research plan. This way, the researcher spends less time on the writing part and can focus on doing the experiment.

  • @johnallen9992
    @johnallen9992 10 months ago

    Typo on screen in the credentials file name, at minute 2:11.

  • @cgtinc4868
    @cgtinc4868 1 year ago

    Sorry for the noob question: where do I place the "../../desktop_credetnaisl.json"? I admit I'm a non-coder, just following your video along the way.

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice! You can place your credentials file wherever you want.
      By default your program will usually look in a root folder, but you can tell it to look wherever you need.
      If your credentials were in the same folder as your script, you could just use "credentials.json" without going up or down from any folder.
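
      As a concrete illustration, a sketch with the pre-0.1 GoogleDriveLoader, which lets you point at the credentials file explicitly (the folder id is a placeholder; by default the loader looks in ~/.credentials/credentials.json):

      from pathlib import Path
      from langchain.document_loaders import GoogleDriveLoader

      loader = GoogleDriveLoader(
          folder_id="YOUR_FOLDER_ID",
          # Explicit path; this could also just be "credentials.json" if it sits next to the script
          credentials_path=Path.home() / ".credentials" / "credentials.json",
      )
      docs = loader.load()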

    • @cgtinc4868
      @cgtinc4868 1 year ago +1

      ​@@DataIndependent Thanks! wrote to you in Twitter as well

  • @photon2724
    @photon2724 1 year ago

    Another fantastic tutorial! Although, what is the credentials.json file? And how can I get my own?

    • @DataIndependent
      @DataIndependent 1 year ago

      Thanks! That is on the google side of the house.
      developers.google.com/workspace/guides/create-credentials

    • @anishmanandhar1203
      @anishmanandhar1203 1 year ago

      And what do we do with it? How do we get the .json file? @@DataIndependent

  • @cgtinc4868
    @cgtinc4868 1 year ago

    Great video, and as the founder of a startup I need this tool! Is there a way to access not Google Drive but something like a Synology NAS (which we use)? That would be really, really helpful.

    • @DataIndependent
      @DataIndependent 1 year ago

      Thank you! I've never heard of Synology. For it to integrate it would either take a custom data loader from LangChain/Unstructured or you'd need to export the files you'd want to another spot.

    • @cgtinc4868
      @cgtinc4868 1 year ago

      @@DataIndependent Thanks! It's just a brand of external NAS. Maybe you could do a video on a local hard drive, so we can just change the path to wherever the source documents are :)

  • @ivantan222
    @ivantan222 1 year ago

    4:00 That's a pretty short summary of the long text, is there any parameter to make it longer?

    • @DataIndependent
      @DataIndependent 1 year ago

      You can see here the prompt that is being used to generate this summary
      github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/stuff_prompt.py
      Under the hood it's just a prompt with your text in it. You could adjust the prompt manually (not by using the chain, but doing your own prompt) to get a longer one.
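
      For example, a sketch of swapping in your own prompt to ask for a longer summary, assuming the pre-0.1 langchain API (the prompt wording is made up):

      from langchain import PromptTemplate
      from langchain.chat_models import ChatOpenAI
      from langchain.chains.summarize import load_summarize_chain

      prompt = PromptTemplate(
          input_variables=["text"],
          template="Write a detailed, multi-paragraph summary of the following:\n\n{text}\n\nDETAILED SUMMARY:",
      )

      # Same stuff chain as the video, but with the custom prompt instead of the default one
      chain = load_summarize_chain(ChatOpenAI(temperature=0), chain_type="stuff", prompt=prompt)
      print(chain.run(docs))   # docs = the documents loaded from Drive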

    • @ivantan222
      @ivantan222 1 year ago

      @@DataIndependent ah okay, thanks a lot for your info.

  • @johnallen9992
    @johnallen9992 10 months ago

    LangChain just removed the Google Drive connect tool from their API... gotta build a custom tool now with the Google Doc loader for Drive

    • @DataIndependent
      @DataIndependent 10 months ago

      Weird I didn't know that - thanks for letting me know

  • @ezequielmelillan1708
    @ezequielmelillan1708 1 year ago

    Hi man, thanks for sharing, this is amazing. Can you make a video using alpaca/llama integration with LangChain? Is it possible to use embeddings with those open-source AI?

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Yep, it's very possible - you just need to swap out your embeddings model.
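
      For instance, a sketch of that swap using a local sentence-transformers model via HuggingFaceEmbeddings instead of OpenAI's embeddings (the model name shown is just one common choice):

      from langchain.embeddings import HuggingFaceEmbeddings
      from langchain.vectorstores import FAISS

      # Embeddings are computed locally, so no document text is sent to OpenAI
      embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
      index = FAISS.from_documents(docs, embeddings)   # docs = your loaded documents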

  • @haisai4159
    @haisai4159 1 year ago

    Amazing tutorial! Beginner here: can you do this for Google Sheets, and in a Google Colab notebook instead of a Jupyter notebook? Thank you!

    • @DataIndependent
      @DataIndependent 1 year ago

      What's the use case you'd want to run through

    • @AmineBELALIA
      @AmineBELALIA 1 year ago

      @@DataIndependent I have the same problem. I have a list of product specifications (2000 specs) and I want to build a chatbot that can answer customer questions about these products and explain the technical details of each spec by searching the internet (the Google Sheet doesn't have this level of detail).

  • @learnapplybuild
    @learnapplybuild 1 year ago

    Please make a video on OneDrive.

  • @vinosamari
    @vinosamari 1 year ago

    Please do a map-reduce video

    • @DataIndependent
      @DataIndependent 1 year ago

      Here's a video explaining the different chain_types
      ruclips.net/video/f9_BWhCI4Zo/видео.html

  • @user-ig3ww3dz1x
    @user-ig3ww3dz1x 1 year ago

    How do I get my credentials path from google?

    • @DataIndependent
      @DataIndependent 1 year ago

      *You* give your credentials path to google.
      This guide may help googleapis.dev/python/google-auth/latest/user-guide.html

  • @neon_Nomad
    @neon_Nomad 1 year ago

    What about Nextcloud or Syncthing?

    • @DataIndependent
      @DataIndependent 1 year ago

      Could you link me to the examples you'd want to see?

  • @zes7215
    @zes7215 1 year ago

    wrg

  • @abdoualgerian5396
    @abdoualgerian5396 1 year ago

    The only bad thing about your content is the disturbing background music; not everyone can concentrate on a mixture of more than one voice.

  • @ryanonvr2267
    @ryanonvr2267 1 year ago

    ---> 76 with open(self.token_path, "w") as token:
    77 token.write(creds.to_json())
    79 return creds
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\info\\.credentials\\token.json' (even though the cred file is correct somewhere else.)
    :( newb

    • @DataIndependent
      @DataIndependent 1 year ago

      You can do two things
      1) Make sure your cred file is in the location your script is looking for (I'm guessing it's the directory you mentioned above)
      2) Tell your script to look elsewhere. This would be the location of your creds file wherever you would like it. I usually do it in my same folder or a parent folder above.
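
      A sketch of option 2, assuming the pre-0.1 GoogleDriveLoader: create the folder the loader expects and/or pass explicit paths for both files (the folder id is a placeholder). The FileNotFoundError above likely just means the .credentials directory doesn't exist yet, since the loader tries to write token.json into it:

      from pathlib import Path
      from langchain.document_loaders import GoogleDriveLoader

      creds_dir = Path.home() / ".credentials"
      creds_dir.mkdir(parents=True, exist_ok=True)   # creating this folder avoids the error above

      loader = GoogleDriveLoader(
          folder_id="YOUR_FOLDER_ID",
          credentials_path=creds_dir / "credentials.json",
          token_path=creds_dir / "token.json",       # where the generated token gets written
      )
      docs = loader.load()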