Working with MULTIPLE PDF Files in LangChain: ChatGPT for your Data

  • Published: 12 Apr 2023
  • Welcome to this tutorial video where we'll discuss the process of loading multiple PDF files in LangChain for information retrieval using OpenAI models like ChatGPT. Our step-by-step guide will explain how to convert PDF files into embeddings based on the chosen large language model. Let's get started!
    Welcome to this tutorial where you'll learn how to extract valuable information from your PDFs using LangChain and OpenAI Text Embeddings. We'll guide you step-by-step through the process of setting up LangChain to communicate with your PDF files, allowing you to retrieve information efficiently and effectively. By the end of this tutorial, you'll have the skills necessary to use advanced language processing technology and improve your data analysis.
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    🔴 Support my work on Patreon: Patreon.com/PromptEngineering
    🦾 Discord: / discord
    ▶️️ Subscribe: www.youtube.com/@engineerprom...
    📧 Business Contact: engineerprompt@gmail.com
    💼Consulting: calendly.com/engineerprompt/c...
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    LINKS:
    Google Colab: colab.research.google.com/dri...
    LangChain: docs.langchain.com/docs/
    VectorstoreIndexCreator vectorstore: tinyurl.com/3yz455m3
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    Join the Patreon: patreon.com/PromptEngineering
    #LangChain #InformationRetrieval #PDF #OpenAITextEmbeddings #DataAnalysis #LanguageProcessingTechnology #AI #MachineLearning #NaturalLanguageProcessing #NLP #Tutorial
  • Science
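
The workflow the video describes — gather the PDFs, embed them, then query the index — can be sketched as follows. This is a minimal sketch assuming the LangChain 0.0.x-era API shown in the video; the folder path and query are placeholders, and the LangChain-specific steps are left as comments since they require the package and an OpenAI API key:

```python
import os

def list_pdfs(folder):
    """Return the paths of all .pdf files in `folder`, sorted for determinism."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".pdf")
    )

# With LangChain installed and OPENAI_API_KEY set, the paths feed the index:
# from langchain.document_loaders import UnstructuredPDFLoader
# from langchain.indexes import VectorstoreIndexCreator
# loaders = [UnstructuredPDFLoader(path) for path in list_pdfs("data/")]
# index = VectorstoreIndexCreator().from_loaders(loaders)
# print(index.query_with_sources("What are these documents about?"))
```

With real PDFs in a data/ folder, uncommenting the LangChain lines reproduces the notebook's indexing step.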

Comments • 258

  • @engineerprompt · 1 year ago

    Want to connect?
    💼Consulting: calendly.com/engineerprompt/consulting-call
    🦾 Discord: discord.com/invite/t4eYQRUcXB
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    🔴 Join Patreon: Patreon.com/PromptEngineering
    ▶ Subscribe: www.youtube.com/@engineerprompt?sub_confirmation=1

    • @chandrasekhargogula7991 · 10 months ago

      Suppose my question is about one document, but the answer is being pulled from another document, giving an irrelevant answer. How can we handle this?

  • @asepmulyana9085 · 1 year ago

    Just wanna say thanks a lot for your tutorial!

  • @neerajjulka4756 · 5 months ago +1

    superb explanation. Thanks

  • @mauriceleetp · 1 year ago

    Thank you for sharing! Excellent video.

  • @samdaniel1368 · 1 year ago

    Thank you very much. Is there any way we can specify which document to scan to find the answers?

  • @dealersagent · 1 year ago

    Very cool video. Thank you!

  • @markanthonymarez · 1 year ago +1

    Can you choose which model to use? I don’t see a completion request specifying the model. Thank you for this video - I’m still learning by doing.

  • @TheAzerue · 1 year ago

    Great video. I'm learning a lot from you. Thank you.

  • @VenkatesanVenkat-fd4hg · 1 year ago +1

    Thanks for the excellent video. How can I get the page number of the content & sources? Any suggestions?

  • @giovannigrassobbio6448 · 1 year ago +3

    Hi, very good work, thanks! Sorry, but the Google Colab link is invalid

  • @PallaviChauhan91 · 4 months ago

    You are amazing! This is exactly what I was looking for. I might also need to connect with you in future for consultancy on something that I am trying to build.

  • @asprinama · 1 year ago +2

    My man! First, you're a monster. Obviously, I bought you a coffee. Anyway, there were 3 errors/bugs (excuse my language, this is the first time I've coded anything in my life), which may be useful in case somebody else is struggling:
    1) In the "Connect Google Drive" section, second segment of the code, I had to add the line import os between pdf_folder_path = f'{root_dir}/data/' and os.listdir(pdf_folder_path). In other words, the full code is: pdf_folder_path = f'{root_dir}/data/' (first line), import os (second line), os.listdir(pdf_folder_path) (third line).
    2) In the "Load Multiple PDF Files" section, I included these two lines of code: from langchain.document_loaders import UnstructuredPDFLoader and from langchain.indexes import VectorstoreIndexCreator.
    3) In the "Vector Store" section, as the first line of code, I included: !pip install unstructured[local-inference].
    And that's basically it! Cheers mate!

    • @engineerprompt · 1 year ago

      Thank you!

    • @sylap · 4 months ago

      Thanks stranger! You just fixed my traceback error with suggestion number 3.

  • @NerdNetArcade · 6 months ago

    Hey, thanks for the tutorial. I am thinking of creating a voice assistant using OpenAI embeddings, is there any tutorial for this?

  • @sammiller9855 · 1 year ago +5

    Please consider doing a similar video on how to be able to chat more freely with Google Drive PDFS with memory. For example, having the script generate a glossary, an outline, or a lesson plan based on the database of pdfs.

    • @engineerprompt · 1 year ago +6

      Sure, that's in the plans. There is a video on the channel, 'crash course on langchain'; it has a section on memory, so check that out for the time being :)

    • @haveaniceday7950 · 1 year ago

      This is a great idea, Sam

  • @scott701230 · 1 year ago +1

    Great stuff

  • @lilianxx8073 · 1 year ago +1

    Thank you so much, it worked! (But it also required me to install pdfminer and several other things)

    • @engineerprompt · 1 year ago

      Glad you found it useful. Google colab is sometimes really funny :)

  • @LoneRanger.801 · 1 year ago +2

    This is excellent. Would love for you to delve deeper into this experimentation. How much did it cost you on OpenAI’s end? For embeddings etc.

    • @engineerprompt · 1 year ago +1

      Thanks, I will be doing a lot more on this. For this video and the experimentation, the cost was around $1.

  • @DiegoPaulDP · 1 year ago

    Hello! nice solution, wanted to try it, is the colab link working?

  • @elgodric · 1 year ago +2

    Does this method work with full books, ~300 pages?

  • @JJaitley · 11 months ago

    This is all good for demos, but is langchain reliable for production-level apps? Are there any alternatives? @Prompt Engineering

  • @arsalanriaz3382 · 1 year ago +4

    Can you also include how to interact with tables and pictures in a PDF document?

  • @gsdeng · 1 year ago +1

    Can it answer questions that need information from multiple PDFs?

  • @RonBarrett1954 · 1 year ago +2

    Hi Prompt Engineering!
    Quick question: I like the way you created an index from multiple PDF files and queried from the index. Have you attempted to persist the vectorstore for later use (e.g., query or update with additional documents)?

  • @AC-fn7jl · 1 year ago

    How would you adjust the temperature?

  • @matheus89555 · 1 year ago

    Hi, when installing chromadb I get an error when installing hnswlib, how did you fix it?
    "Failed building wheel for hnswlib"

  • @lynnqi6451 · 1 year ago +1

    Loooooove it ❤

  • @morris5648 · 1 year ago

    Good stuff. Two questions/suggestions. First, is the data stored locally or in a database like Pinecone? Second, can the intake be modified so that I can use DirectoryLoader?

    • @engineerprompt · 1 year ago +2

      Thank you.
      1) For this example, it's stored locally, but you can use any database you want.
      2) Yes, you can do that. In that case, you will have to define the file type you want to read.
      Hope this helps.

  • @user-vb3uy5nh9k · 6 months ago

    is accuracy being calculated for the model?

  • @md.shahriaralam5930 · 11 months ago

    Can we compare the differences between 2 PDFs?

  • @michakoodziej8760 · 1 year ago

    Hello, great tutorial! Any idea how to change the max_tokens (output tokens) in this approach? So far I'm getting 256 tokens in the response, while I need much more.

  • @krisszostak4849 · 1 year ago

    Which part of the script creates embeddings and uses Chroma, please?

  • @scott701230 · 1 year ago

    Thank you for educating us. I wonder how you would integrate AutoGPT for multiple agents

    • @engineerprompt · 1 year ago

      AutoGPT will decide how many agents to use based on the problem it's trying to solve. I don't think you need to specify the number of agents.

  • @szymoonkowalczyk · 1 year ago

    I am very interested in how you made the animated face model at the beginning of the video. Is there a tutorial on your channel on how to make such an avatar?

    • @engineerprompt · 1 year ago

      Yes, check this out: ruclips.net/video/V2efVSXSlqc/видео.html
      I have a couple of other videos as well that use open source tools. You can look for those as well.

  • @caiyu538 · 9 months ago

    It works great for understanding my files. But even with a T4 GPU with 16 GB of memory, it takes 2-3 minutes to get an answer for a file with 4 or 5 pages. Is that normal for this GPU?

  • @Alex-Ibby · 1 year ago +3

    One more question - do the documents need to be reloaded into a vector store every single time? Or can we simply import the query and answer into another Python file?

    • @engineerprompt · 1 year ago +4

      That's a great question, I should have addressed it in the video. You can simply write the embeddings into a file and store that instead, then reuse them whenever you want
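
The write-once-then-reuse idea in the reply above can be sketched in plain Python. This is a toy sketch with made-up vectors and a hypothetical file name; with LangChain you would more commonly persist the vector store itself (for example Chroma's persist_directory or FAISS's save_local):

```python
import pickle

# Hypothetical embeddings: one vector per chunk of text, keyed by chunk id.
embeddings = {
    "chunk-0": [0.12, -0.40, 0.33],
    "chunk-1": [0.05, 0.22, -0.17],
}

# Write once, right after the (paid) embedding call...
with open("embeddings.pkl", "wb") as f:
    pickle.dump(embeddings, f)

# ...then reload in any later session without re-embedding.
with open("embeddings.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored == embeddings  # identical vectors, no new API cost
```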

  • @ynboxlive · 1 year ago +6

    I found that I had to add this in order for it to work:
    !pip install unstructured[local-inference]
    Otherwise I got this error:
    ImportError: Following dependencies are missing: pdfminer. Please install them using `pip install unstructured[local-inference]`.
    Why is this?

    • @Prakash-oq5ke · 1 year ago +1

      Thanks a lot! This indeed saved my time!

    • @youshikyou · 1 year ago

      Hi, I have this error. How did you solve it? What is local-inference?

  • @henkhbit5748 · 1 year ago

    Thanks for the clear example👍 I have 2 additional questions:
    1. If you have a PDF with a mathematical formula, for example to calculate some measure (i.e. BMI), can you also ask for the BMI if you supply your height and weight?
    2. If I have a document with questions and answers, how do I feed it in?
    Thanks in advance.

    • @sauravmukherjeecom · 1 year ago +2

      1. Might be possible with GPT-4.
      2. Should be possible, similar to any other PDF. It will treat it normally, like a sequence of tokens.

  • @Fordtruck4sale · 1 year ago

    Please update the colab link, thanks very much!😀

  • @shubhammural4760 · 1 year ago

    Hi Prompt Engineering
    I tried with similar example, but I am getting error
    Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass `openai_api_key` as a named parameter. (type=value_error)

  • @Alex-Ibby · 1 year ago

    Awesome video. Is it possible to run this as a regular Python file without a Jupyter notebook? Anything I should be aware of?

    • @engineerprompt · 1 year ago

      Yes, absolutely, you can do that. Just create another virtual environment, install the packages, and you are good to go.

  • @moz658 · 1 year ago +2

    Nice tutorial. I am actually facing a problem when trying to use the Chroma vector store with a persisted index. I have already loaded a document, created embeddings for it, and saved those embeddings in Chroma. The script ran perfectly with LLM and also created the necessary files in the persistence directory (.chroma\index).
    However, when I try to initialize the Chroma instance using the persist_directory to utilize the previously saved embeddings, I encounter a NoIndexException error, stating "Index not found, please create an instance before querying". Is there a way to fix it? Could you find a solution and make a video of it?
    Additionally, I am curious if these pre-existing embeddings could be reused without incurring the same cost for generating Ada embeddings again, as the documents I am working with have lots of pages. Thanks in advance!

    • @engineerprompt · 1 year ago +3

      1) I will try to look at the first problem. 2) Yes, you can do that.

    • @agyson · 1 year ago

      Hello, is there any update regarding this problem? By the way, nice vids!!

  • @PradyMixes · 8 months ago

    Just curious, do you have a tutorial for multiple PDFs using Llama 2 and other open-source embeddings?

    • @engineerprompt · 8 months ago

      Yes, just check out my localGPT videos

  • @cheunghenrik7041 · 1 year ago +1

    May I ask, does it work with PDFs having over 4000 tokens (the limit of the OpenAI API)? Thanks a lot for providing both the guidelines and a Colab notebook for immediate use!

    • @ilianos · 10 months ago +1

      That's what chunks are for. The text is split up into chunks to make it manageable for further processing.
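
The chunking step can be illustrated with a hand-rolled splitter. This is a toy sketch; LangChain's own splitters (e.g. CharacterTextSplitter) are smarter about breaking on separators, but the size/overlap idea is the same:

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split `text` into fixed-size pieces with some overlap, so that
    sentences cut at a boundary still appear whole in at least one chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, minus the overlap
    return chunks

pieces = chunk_text("x" * 2500, chunk_size=1000, overlap=200)
print(len(pieces))  # → 4; each piece fits within the model's context budget
```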

  • @ahmedsalimlachkar5460 · 1 year ago

    Thank you so much, that's quite helpful! Although it would be great if you could help us give it memory; for example, if I correct a wrong output, the bot should remember it. Have a nice day and keep up the good work.

    • @engineerprompt · 1 year ago +2

      Glad you found it helpful. As far as the memory is concerned, watch this video; there is a section on how to do it. I will be making more detailed videos on it later: ruclips.net/video/5-fc4Tlgmro/видео.html&ab_channel=PromptEngineering

  • @FranciscoMonteiro25 · 1 year ago

    Is there a GitHub repository for all your excellent training videos?

  • @kaleshashaik5959 · 4 months ago

    Can we integrate this with Django?

  • @user-vl3mr4yg6v · 1 year ago +4

    Thanks for the video, it's very useful. Is it possible to integrate a voice assistant that receives a question as input and answers via voice, using the information present in the PDFs? It would be very useful. It could be done with Whisper or Bark. What do you think about it?

    • @kirklearned · 1 year ago +1

      I downloaded many PDF's just waiting for the day this becomes a reality.

    • @engineerprompt · 1 year ago +5

      Yes, if I find time, I will put together something for this.

    • @haveaniceday7950 · 1 year ago

      I want this too!

  • @gnanashreechethan1209 · 10 months ago

    Hi, instead of UnstructuredPDFLoader can we use PyPDFLoader? I was using PyPDFLoader with glob and loader_cls. I added 3 PDF files to a folder called pdf, so when I load it and print the len of the documents, it shows a wrong answer like 5 or 6, whereas I loaded only 3 PDFs. Can you please let me know if you have a solution for this?

  • @hanumanparida8131 · 10 months ago

    Hey, what if we want image responses, how do we get them?

  • @PaoloPizzorni · 1 year ago

    Excellent video, exactly what I was looking for. My PDF files are a mess (anyone relate?): hundreds of pages, images, sometimes scanned documents.
    Can you clarify what Pinecone is and how it could help in this particular workflow?

    • @engineerprompt · 1 year ago

      This approach will only work on the text part of your documents; I am not aware of any approach that will understand images (yet). Pinecone is a vectorstore (think of it as a database). You can basically store your embeddings there if you have a very large set of documents. Hope this helps.
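
What a vector store does can be shown in a few lines of brute-force search. This is a toy sketch with made-up 3-dimensional vectors and chunk ids; Pinecone, Chroma, and FAISS add persistence and fast approximate search on top of the same similarity idea:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(query_vec, store):
    """Return the stored id whose vector is most similar to `query_vec`."""
    return max(store, key=lambda k: cosine(query_vec, store[k]))

# Hypothetical embeddings for two chunks, keyed by document and page.
store = {"doc1-p3": [0.9, 0.1, 0.0], "doc2-p7": [0.0, 1.0, 0.2]}
print(nearest([1.0, 0.0, 0.1], store))  # → doc1-p3
```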

  • @rodrigofarias900 · 2 months ago

    Do you have a video like this with a local LLM, like using LM Studio as a server?

  • @ankit9401 · 7 months ago

    Suppose I added PDFs containing details for each employee, and then I ask: how many employees have Python experience? Or how many employees are there in the company?
    Can it respond correctly?
    If not, what should be done to get correct responses for the above queries?
    Thanks!

  • @Sachin-kk3np · 11 months ago

    My question: I want the answers from both sources, mentioning that Answer 1 is coming from Source 1 and Answer 2 is coming from Source 2.
    How can I achieve this?

  • @lmrecords4564 · 1 year ago

    Can you try this for estimating PDF blueprint files for commercial window treatment businesses and construction firms?

  • @ticelsoful · 11 months ago

    Wonderful tutorial. Would it be possible to run this through VS Code so it appears within the browser? When I attempted it, it only showed up in the console without opening the browser.

    • @engineerprompt · 11 months ago

      You should be able to. VS Code has support for Jupyter Notebooks. Check that out.

    • @fahad123434 · 2 months ago

      @@engineerprompt I'm facing a problem with VectorstoreIndexCreator.
      Please share the solution

  • @fishandcat4281 · 1 year ago

    That's another informative video, appreciated. I know someone already asked in this thread how to persist the index for later use, and you recommended chromadb as a good choice. However, on my company computer I failed to install chromadb. So how can I use FAISS instead to persist the index?

    • @engineerprompt · 1 year ago

      In that case, use pickle to dump the index to a pickle file.

    • @fishandcat4281 · 1 year ago

      @@engineerprompt Thanks for the response. Yes, I did that after watching your other videos. All good now

  • @nitingoswami1959 · 1 year ago +11

    I want to use an Alpaca or Vicuna model instead of ChatGPT because ChatGPT has limitations on the requests we send. I just want to use an open-source model instead of ChatGPT, is this possible?

    • @engineerprompt · 1 year ago +14

      Yes, you can look into Hugging Face embeddings. When I get them working, I will make a tutorial on it.

    • @SuproMVP · 1 year ago +2

      @@engineerprompt Yes, looking forward to that tutorial where we can read multiple PDFs and query them without using the OpenAI API.

    • @sauravmukherjeecom · 1 year ago

      Looking forward to this!

    • @nitingoswami1959 · 1 year ago

      Waiting for this 😊

    • @AmBasLam · 1 year ago

      @@engineerprompt waiting for this

  • @maxbodley6452 · 1 year ago

    Great stuff! I was looking all over the web for how to do this, and this was the only useful video I could find. I just have one quick question for further work I need to do.
    Just wondering if it is possible to make queries which only pertain to a specific document? For example, I only want to know something about the first paper (e.g. authors, title, etc.) but not the second. Let me know how you would go about doing this.

    • @engineerprompt · 1 year ago +1

      Thank you, and glad you liked it. Yes, you can do that by adding metadata and using it as context for the LLM.

    • @maxbodley6452 · 1 year ago

      @@engineerprompt Thanks! Do you have any videos/resources on how to do this?
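
The metadata idea from the reply can be sketched without any LangChain machinery. The dictionaries below mimic the shape of LangChain's Document (page_content plus a metadata dict with a source field); the texts and file names are made up:

```python
# Toy documents shaped like LangChain's Document(page_content, metadata).
docs = [
    {"text": "Attention is all you need...", "metadata": {"source": "paper1.pdf"}},
    {"text": "BERT: pre-training of deep...", "metadata": {"source": "paper2.pdf"}},
]

def from_source(docs, source):
    """Keep only chunks that came from one specific PDF."""
    return [d for d in docs if d["metadata"]["source"] == source]

# Restrict retrieval (and hence answers) to the first paper only.
subset = from_source(docs, "paper1.pdf")
print(len(subset))  # → 1
```

In LangChain itself, the loaders populate a similar source field automatically, and retrievers can filter on it in the same spirit.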

  • @nightmisterio · 1 year ago

    Do a demo of it working, I always wanted this.

  • @Ianhilts667 · 1 year ago

    Great content. Would it be easy to modify this process to handle different file formats such as .doc or .txt? Thanks again. I have subscribed.

    • @engineerprompt · 1 year ago

      Thank you, yes, you just need to add a different loader for each file type.

  • @tapos999 · 1 year ago +7

    Is it possible to retrieve which section of the PDF it is referring to? (Can it even detect the portion of the chunk in the PDF?)

    • @engineerprompt · 1 year ago +3

      I am not sure, will look into it.

    • @Myplaylist892 · 1 year ago +3

      That would be incredible for producing scientific reviews and references

    • @engineerprompt · 1 year ago +3

      @@Myplaylist892 I agree, I will look into it in more details.

    • @girijeshsingh6947 · 1 year ago +1

      @@engineerprompt Hi, did you get anything on this?

    • @martinsherry · 1 year ago +1

      yes that does sound pretty useful

  • @Brainjoy01 · 1 year ago

    I immediately hit the quota before being able to query. Would Hugging Face be the next route for a free AI version?

    • @engineerprompt · 1 year ago

      Yes, you can try Hugging Face, or if you have the hardware, you can try to run something like localGPT locally

  • @amardeepraj4321 · 11 months ago

    Can I store a vector in a database like Azure and then just run a similarity search or retriever without having to recreate it? Can someone help me?

  • @LoneRanger.801 · 1 year ago +3

    Apart from OpenAI, who else provides embeddings?

    • @engineerprompt · 1 year ago +7

      Hugging Face has its own embeddings, and you can also integrate models like BERT.
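
Swapping embedding providers can be sketched as below. The class names (OpenAIEmbeddings, HuggingFaceEmbeddings) and the model_name parameter are from the LangChain API current at the time of the video; the imports are deferred so the helper itself runs without either provider installed:

```python
def get_embeddings(provider="openai"):
    """Return an embeddings object for the chosen provider (a sketch)."""
    if provider == "openai":
        # Paid API; needs OPENAI_API_KEY in the environment.
        from langchain.embeddings import OpenAIEmbeddings
        return OpenAIEmbeddings()
    if provider == "huggingface":
        # Runs locally and free; downloads the model on first use.
        from langchain.embeddings import HuggingFaceEmbeddings
        return HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2"
        )
    raise ValueError(f"unknown provider: {provider}")

# The embeddings object then plugs into the same indexing step:
# index = VectorstoreIndexCreator(
#     embedding=get_embeddings("huggingface")
# ).from_loaders(loaders)
```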

  • @1984amitsince · 1 year ago

    I am getting this error on VectorStoreIndex creation :
    ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (C:\Program Files\Python310\lib\site-packages\pdfminer\utils.py)

  • @caiyu538 · 9 months ago

    Besides PDF and Word files, can localGPT handle Excel and CSV files?

    • @engineerprompt · 9 months ago

      Yes, but you will have to experiment with the embedding model and LLM

  • @satemyu · 1 year ago

    Sorry, I think the Google Colab has a problem. Does anyone else have trouble opening and running it?

  • @SanjeevKumar-dr6qj · 1 year ago

    Can I use this on my company website for creating PDF search? Please reply

  • @abdelouahabmotrani3831 · 1 year ago +1

    Thank you for this valuable information. How can I get the page number as a reference along with the source PDF?

  • @kevinyuan2735 · 1 year ago

    Thank you!

  • @diegogutierrez2874 · 5 months ago

    Now that the text-davinci-003 model has been deprecated, I'm no longer able to use the openai library.
    I got the error: openai.error.InvalidRequestError: the model 'text-davinci-003' has been deprecated. Is there a way I can replace it with gpt-3.5-turbo-instruct (the one recommended by OpenAI)?

  • @arslanabid2245 · 10 months ago

    I have a question:
    What should my system requirements be if I want to build a project application using LangChain & OpenAI?

    • @engineerprompt · 10 months ago +1

      You can run it on any machine that can run Python if you are using OpenAI models. You don’t need a GPU in that case

  • @syedsaalim3604 · 1 year ago

    This error is popping up: "Failed to load the Detectron2 model. Ensure that the Detectron2 module is correctly installed."
    after running "index = VectorstoreIndexCreator().from_loaders(loaders)", although I have already installed the Detectron2 model.
    Any solutions?

  • @ahmedkotb3089 · 1 year ago

    I got this error :
    InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4331 tokens (4075 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
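
That error means the retrieved chunks plus the prompt exceed the model's 4097-token window; the usual remedies are smaller chunks or retrieving fewer of them. The budget arithmetic can be sketched as follows (a toy version with a crude four-characters-per-token estimate, not a real tokenizer):

```python
def fit_context(chunks, max_tokens=4097, completion_tokens=256):
    """Greedily keep retrieved chunks until the rough token budget is spent."""
    budget = max_tokens - completion_tokens  # room left for the prompt
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk) // 4 + 1  # crude chars->tokens estimate
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

chunks = ["a" * 4000, "b" * 4000, "c" * 4000, "d" * 4000, "e" * 4000]
print(len(fit_context(chunks)))  # → 3; the rest would overflow the window
```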

  • @samser1150 · 1 year ago +4

    Thanks a lot, but I get an error message when I run the VectorstoreIndexCreator() cell: "ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (/usr/local/lib/python3.9/dist-packages/pdfminer/utils.py)". Could you help me?

    • @Prakash-oq5ke · 1 year ago +1

      @samser1150 - @ynboxlive has answered this question, please see the comments below. Basically you need to do:
      !pip install unstructured[local-inference]
      I was also facing the same issue as you, and this resolution solved it!

    • @thomasneuhaus4838 · 1 year ago

      @@Prakash-oq5ke This worked! Thank you.

  • @user-wr4yl7tx3w · 1 year ago

    do you know how to incorporate a new LLM like Dolly into LangChain?

  • @cascaderz · 1 year ago +4

    When I run the VectorstoreIndexCreator() cell I get the following error:
    ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (/usr/local/lib/python3.9/dist-packages/pdfminer/utils.py)
    I tried installing and importing the packages, but that didn't work either. Any solution to this?

    • @MatiPage · 1 year ago

      Same 🥲

    • @samdaniel1368 · 1 year ago +1

      You can try installing this library first. !pip install unstructured[local-inference]

    • @cascaderz · 1 year ago

      @@samdaniel1368 Thank you for the solution. Running it in the first cell and restarting the runtime solved the issue for me

    • @electrikkingdom · 1 year ago

      @@samdaniel1368 Perfect thanks

  • @samser1150 · 1 year ago

    Thanks a lot for the video. I am facing a problem with access to the Colab file. Please, can you help?

    • @engineerprompt · 1 year ago

      What is the issue?

    • @samser1150 · 1 year ago

      ​@@engineerprompt Something like "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential"
      😢

    • @engineerprompt · 1 year ago +1

      @@samser1150 Make sure you are not running the Colab in another tab.

    • @samser1150 · 1 year ago

      @@engineerprompt solved, thanks a lot

    • @engineerprompt · 1 year ago

      @@samser1150 Perfect, glad it was helpful!

  • @yassinerabaoui4219 · 1 year ago

    Hey, thanks for the video, but the Colab file is private; we can't access it

    • @engineerprompt · 1 year ago

      It seems to be public, can you not see it at all?

  • @maartirosian · 1 year ago +1

    Can you make a video where you make a webapp using langchain & streamlit where you can upload multiple PDF files and ask questions about the files?

  • @kaustubhsarnaik6672 · 1 year ago

    I am using Azure OpenAI. The code is failing in the index creation step, i.e.
    index = VectorstoreIndexCreator(embedding=embeddings).from_loaders(loaders)
    with the following message:
    raise error.InvalidRequestError(
    openai.error.InvalidRequestError: Must provide an 'engine' or 'deployment_id' parameter to create a
    Can you help with how to do this with an Azure OpenAI setup?

    • @engineerprompt · 1 year ago

      Sorry, I haven't used Azure, so I'm not sure what's going on here. It seems like you are having issues accessing the OpenAI API.

  • @JustinBieberFan957 · 1 year ago +1

    What to do about 'detectron2 not installed'?

  • @blindender9979 · 1 year ago +1

    How to do the same without OpenAI? I mean, using GPT4All or some other LLM. The point is doing everything "for free" without spending on API calls. And another one: how to do the same on a large codebase? Python, Java, Clojure, etc. Thank you

  • @roykim1425 · 9 months ago

    Can you help me? I have an error message.
    question = "핵가족 그리고 직계가족이 뭐지?"
    response = model1({"question": question}, return_only_outputs=True)
    print("Answer : ", response['answer'])
    print("Sources : ", response['sources'])
    This model's maximum context length is 4097 tokens, however you requested 4903 tokens (4647 in your prompt; 256 for the completion). Please reduce your prompt or completion length

  • @MatiPage · 1 year ago +2

    the cell " index = VectorstoreIndexCreator().from_loaders(loadees) " gives me error even though i pip installed pdfminer and !pip install unstructured[local-inference]... don't know what to do :(

    • @engineerprompt · 1 year ago +2

      What's your python version?

    • @MatiPage · 1 year ago

      @@engineerprompt 3.9

    • @StefanoTrinchero · 1 year ago

      Same error here

    • @earningman5836 · 1 year ago

      Same error, got any solution?

    • @RonBarrett1954 · 1 year ago

      Running Python 3.10 with unstructured[local-inference] installed, I am running into an error at the index = ... line.
      The error is:
      AttributeError                         Traceback (most recent call last)
      in ()
            1 get_ipython().system('pip install unstructured[local-inference]')
      ----> 2 index = VectorstoreIndexCreator().from_loaders([loaders])
            3 index
      /usr/local/lib/python3.10/dist-packages/langchain/indexes/vectorstore.py in from_loaders(self, loaders)
           67     docs = []
           68     for loader in loaders:
      ---> 69         docs.extend(loader.load())
           70     sub_docs = self.text_splitter.split_documents(docs)
           71     vectorstore = self.vectorstore_cls.from_documents(
      AttributeError: 'list' object has no attribute 'load'

  • @SaiKiranAdusumilli · 1 year ago +1

    How to use gpt-3.5-turbo instead of davinci?

    • @engineerprompt · 1 year ago

      in the OpenAI function, set the model variable to gpt-3.5

    • @edspa8576 · 1 year ago

      @@engineerprompt Same here, trying to do this, but with the VectorstoreIndexCreator it's a bit tricky if one doesn't know where to put it. Not choosing gpt-3.5-turbo becomes costly with davinci over time

  • @caankitrmehta2281 · 1 year ago

    How can I extract certain basic KYC data into Excel from insurance policies and invoices with different structures in PDF?
    Can this be done using ChatGPT or any similar AI tool automatically, without any training and annotations?

  • @VR-fh4im · 1 year ago

    Error in your Google Colab file: "ModuleNotFoundError: No module named 'pdfminer'"

  • @osb22 · 1 year ago

    Getting the following error message when I try to run the 'Load Required Packages' cell:
    ModuleNotFoundError: No module named 'langchain'
    Any advice?

    • @engineerprompt · 1 year ago

      Seems like you didn't install langchain. At the start there is a cell with the following command:
      !pip install langchain
      Make sure you run that.

    • @osb22 · 1 year ago

      @@engineerprompt Thank you. It worked after refreshing the page; I think the error was on the Colab side. Great video!

  • @ynboxlive (a year ago)

    Has anyone been able to fix the "detectron2 is not installed" issue?

    • @engineerprompt (a year ago)

      Is it a warning or an error?

    • @ynboxlive (a year ago)

      @@engineerprompt it is a warning : "detectron2 is not installed. Cannot use the hi_res partitioning strategy. Falling back to partitioning with the fast strategy." and it is repeated for each file in the directory that has the PDFs.
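Since it is only a warning (unstructured just falls back to its "fast" partitioning strategy), the repeated notice can be silenced instead of installing detectron2. A hedged sketch using the standard logging module; the logger name "unstructured" is an assumption about where the message originates, so adjust it to the actual emitter if the notice still appears:

```python
import logging

class DropDetectron2Notice(logging.Filter):
    """Drop the repeated fallback notice while keeping other warnings."""
    def filter(self, record):
        return "detectron2 is not installed" not in record.getMessage()

# Logger name is an assumption; change it if the message comes
# from a different logger in your version of the library.
logging.getLogger("unstructured").addFilter(DropDetectron2Notice())
```

If the "fast" strategy output is not good enough, the alternative is to actually install detectron2 so the hi_res strategy can run.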

  • @NigelPowell (a year ago)

    Which AI avatar generator are you using? :)

    • @engineerprompt (a year ago, +1)

      I have a local, open-source workflow for that :-)

    • @NigelPowell (a year ago)

      @@engineerprompt I feel a need to request a video on it 🙂

  • @fumedia-Language (a year ago)

    How much should I expect to pay for 10 pages, based on your experience?

    • @engineerprompt (a year ago, +1)

      It depends on how many times you'll be prompting, but it's going to be cents, or a few dollars at most.

    • @fumedia-Language (a year ago)

      @@engineerprompt I appreciate your comments.
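The "cents at most" estimate above can be sanity-checked with back-of-envelope arithmetic. The figures below are hypothetical assumptions, not published prices: roughly 500 tokens per PDF page and an embedding price on the order of $0.0001 per 1K tokens; query-time completion calls are billed separately and usually dominate the total:

```python
# All constants are assumptions for illustration; check current pricing.
PAGES = 10
TOKENS_PER_PAGE = 500          # varies a lot with layout and density
PRICE_PER_1K_TOKENS = 0.0001   # hypothetical embedding price

tokens = PAGES * TOKENS_PER_PAGE
cost = tokens / 1000 * PRICE_PER_1K_TOKENS
print(f"~{tokens} tokens, ~${cost:.4f} to embed")  # → ~5000 tokens, ~$0.0005 to embed
```

Under these assumptions, embedding a 10-page document costs a small fraction of a cent; repeated querying is what adds up.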

  • @arturgoraus7947 (2 months ago)

    Due to changes in the VectorstoreIndexCreator API, some errors appeared.
    To solve them, I did:

    embedding_ai = OpenAIEmbeddings()  # use any embedding you want
    index = VectorstoreIndexCreator(embedding=embedding_ai).from_loaders(loaders)

    turbo_llm = ChatOpenAI(
        temperature=0,
        model_name='gpt-3.5-turbo-0125'  # default: gpt-3.5-turbo
    )

    # Need to pass the LLM explicitly now
    index.query('Tell me something about Interpersonal communication', llm=turbo_llm)

  • @maheshsanjaychivateres982 (a year ago)

    Please provide an updated Google Colab link.

  • @digidope (a year ago, +1)

    How do I run this locally without Colab?

    • @engineerprompt (a year ago, +4)

      You will need Python installed on your machine. Download and install Visual Studio Code, then download the notebook shown in the video and run it there. Hope this helps. If you are not familiar with the process, I can make a tutorial at some point.

    • @digidope (a year ago)

      @@engineerprompt I have Python and Visual Studio Code installed, since I run many LLM models locally and do AI training. I've just never used Colab/notebook things.

    • @engineerprompt (a year ago, +2)

      @@digidope Perfect, then it's just a normal Jupyter notebook once you download it. Just download it and you can run it as a Jupyter notebook.

  • @Bragheto (a year ago)

    Thanks!

    • @engineerprompt (a year ago, +1)

      Thank you, really appreciate your support!

    • @Bragheto (a year ago)

      @@engineerprompt I was looking for an alternative to manually creating each loader. Thanks!

  • @pkay3399 (a year ago, +4)

    Getting an error when opening the link.

    • @engineerprompt (a year ago)

      what's the error?

    • @pkay3399 (a year ago)

      @@engineerprompt Colab says it was signed out in a different tab, but I'm signed in.

    • @Fordtruck4sale (a year ago, +1)

      Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication

    • @MatthewGunnin (a year ago)

      @@engineerprompt Yeah, I can't access it either. Something about the OAuth token.

    • @engineerprompt (a year ago)

      @@MatthewGunnin Make sure you close all instances of Google Colab you have running, and then open this link. Hope this helps.

  • @mvkrishna760 (a year ago)

    Can it work with hundreds of PDFs?

    • @engineerprompt (a year ago, +1)

      You could; I haven't tried it, so I can't say how hard it's going to be. I'll look into it.

    • @sauravmukherjeecom (a year ago)

      This would be a very interesting experiment as well.
      At what point does it stop being context setting and start being fine-tuning?
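Scaling to hundreds of PDFs is mostly an engineering question: it stays retrieval over embeddings (context setting), not fine-tuning, but it pays to build the index in batches so one bad file or a timeout doesn't lose all progress. A pure-stdlib sketch; the directory name and batch size are assumptions, and the loop body is where the real loader and embedding calls would go:

```python
from pathlib import Path

def batched(items, size):
    """Yield fixed-size slices so each batch can be embedded and persisted."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

pdfs = sorted(Path("pdfs").glob("*.pdf"))  # hypothetical directory of PDFs
for batch in batched(pdfs, 25):
    pass  # load + embed this batch, then persist the vector store
```

Persisting after each batch also makes it possible to resume after a crash instead of re-embedding everything.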

  • @hamaltarther2515 (a year ago, +1)

    Bro, can you make a video on how you link this on your website, and improve the UI?

  • @aadarshunniwilson8517 (a year ago)

    Cannot access the Colab.