Create Your Own ChatGPT with PDF Data in 5 Minutes (LangChain Tutorial)

  • Published: 15 Jul 2024
  • 📚 My Free Resource Hub & Skool Community: bit.ly/3uRIRB3 (Check the “YouTube Resources” tab for any mentioned resources!)
    🤝 Need AI Solutions Built? Work with me: bit.ly/3K3L4gN
    📈 Find out how we help industry experts sign their first 5 AI Agency clients, guaranteed: bit.ly/skoolmain
    In this video I show you how to train ChatGPT on your own data in 5 minutes using LangChain so you can chat with your PDFs! This is a super beginner-friendly guide that explains how these custom knowledge chatbots can be created in a few minutes using LangChain. This is similar to tools like ChatPDF, which allow you to chat with your docs (chatpdf.com/).
    If you've ever wanted to know how to chat with your PDFs or train ChatGPT on your own data, this is the video for you! Code available below.
    Create a copy of my notebook (code):
    colab.research.google.com/dri...
    Timestamps:
    0:00 - What we're building
    1:10 - System Explained
    2:48 - Creating the chatbot
    8:18 - Steal my code!
  • Hobby

Comments • 309

  • @LiamOttley
    @LiamOttley  1 year ago +4

    Leave your questions below! 😎
    📚 My Free Skool Community: bit.ly/3uRIRB3
    🤝 Work With Me: www.morningside.ai/
    📈 My AI Agency Accelerator: bit.ly/3wxLubP

  • @moses5407
    @moses5407 1 year ago +2

    Golden! Clear, concise info and a notebook! If it's too fast for some viewers, I'll remind them that they can always slow down the playback speed.

  • @borisbadinoff1291
    @borisbadinoff1291 1 year ago +12

    👏👏 Hey Liam, your five-minute tutorial is fantastic! Kudos and thanks for putting in the effort to produce it. Your app is exactly what any knowledge worker is craving: we all have gigabytes of PDF files in some folder named "READ", "TO READ" or "__TO READ" (so it stays on top of the root :), but never get to it (probably distracted by all these tutorials on becoming more productive that we love to watch). A bot that can read that stuff for us, so we can continue to wing it, is a true godsend. :D

  • @guilhermeveiga9345
    @guilhermeveiga9345 1 year ago +2

    Thought it would be just another video on the subject, but you summarized it in an awesome way! Great vid! Congrats

  • @naturallydope247
    @naturallydope247 1 year ago +4

    This was definitely one of your better videos. You explained Langchain well and I’m glad you used the colab notebook instead of Jupyter or repl.

  • @ryanjames3907
    @ryanjames3907 1 year ago +2

    Thank you for your time, effort and generosity.
    I wish very good things for you.

  • @chandrachoodR
    @chandrachoodR 1 year ago

    That's a fantastic video, to the point, and thanks for the code as well.

  • @user-tm1jp7fk7n
    @user-tm1jp7fk7n 1 year ago

    You're awesome, Liam !!

  • @bendaniels8677
    @bendaniels8677 1 year ago

    Appreciate your hustle bro

  • @gabijazza1220
    @gabijazza1220 1 year ago

    Cheers, this is a brilliant video. Looking forward to making a bespoke AI.

  • @CK-ho7gj
    @CK-ho7gj 1 year ago

    Awesome tutorial. Cheers Liam

  • @luigiseven
    @luigiseven 1 year ago

    Awesome work

  • @justingu9541
    @justingu9541 1 year ago

    Thank you for your excellent sharing. This is great guidance, and I hope you can continue to share more! If there's anything I can do, please let me know~

  • @omountassir
    @omountassir 1 year ago

    Freaking Great Content! Keep Rocking 💯

  • @sganesh07
    @sganesh07 1 year ago

    Thanks Liam ... neat and fast as always. Could you post another similar video doing the same thing with LlamaIndex, please? I thought that was easier.

  • @stefano94103
    @stefano94103 1 year ago +6

    Excellent! Thank you for your hard work to put these together.

    • @LiamOttley
      @LiamOttley  1 year ago

      My pleasure! Thanks for watching

    • @AlbyTheMovieCreator
      @AlbyTheMovieCreator 1 year ago +2

      This video was copied from beginning to end from the channel Prompt Engineering.

    • @stefano94103
      @stefano94103 1 year ago

      @@AlbyTheMovieCreator Oh wow I totally didn't know that. Thanks for the heads up! SMH😒

  • @coinhawk
    @coinhawk 1 year ago

    Great job... will run this on my writings/book collection and my code snippets, and build an awesome MeKnowledgeBase 😎

  • @user-rc6ik9gz6g
    @user-rc6ik9gz6g 11 months ago

    Thank you, keep going.

  • @konstantinrebrov675
    @konstantinrebrov675 1 year ago +2

    Wonderful tutorial. Thank you!

  • @antonpictures
    @antonpictures 1 year ago +1

    Hi there! As a fellow filmmaker, I find the concept of regenerative agents fascinating. I'm curious, what specific types of agents are you interested in exploring in your video? Additionally, have you thought about incorporating some real-world examples of SimCity-like models, such as the ones developed by Stanford, to help illustrate the concept to your audience? Looking forward to hearing more about your project! George Anton

  • @AndrewSheves
    @AndrewSheves 1 year ago

    Liam, this is a great tutorial, thank you. What I really liked was the explanation of what is happening behind the scenes: anyone (even a non-developer like me) can cut and paste the code, but knowing what the commands are doing is super helpful.
    The explanations in the Colab are great and I took your advice and stole your code. The chatbot was up and running in a few hours (remember: non-developer) but that included building a separate UI. Great work, thank you

    • @aradinac
      @aradinac 1 year ago +3

      Can I ask whether you paid for the OpenAI key or did it with the free trial? Because I'm encountering this error: RateLimitError: You exceeded your current quota, please check your plan and billing details.

    • @AndrewSheves
      @AndrewSheves 1 year ago

      @@aradinac I used the paid OpenAI key.

    • @csss142
      @csss142 7 months ago

      @@AndrewSheves which one did you buy?

    • @miguelmunoz4135
      @miguelmunoz4135 6 months ago

      @@aradinac I have the same error because my account is not paid. If you found another solution, please let us know.

  • @chatbotsvideochatbotsforwe1207

    Very Good👍

  • @ganashayoutube
    @ganashayoutube 1 year ago

    awesome bro

  • @zoumanakeita8016
    @zoumanakeita8016 1 year ago

    Straightforward and concise! Great explanation.
    How do you extract the exact page number where the answer was found?
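A sketch of one common approach, not necessarily what the notebook does: keep the page number as metadata on every chunk when splitting, so whichever chunk the search returns can report where it came from. (LangChain's `PyPDFLoader`, for example, returns one document per page with a `page` entry in its metadata.) The code below is plain Python; `Chunk`, `chunk_pages` and `best_chunk` are hypothetical names, and the word-overlap scoring is only a stand-in for a real embedding search.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int  # 1-based page number the chunk came from


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def chunk_pages(pages: list[str], chunk_size: int = 200) -> list[Chunk]:
    """Split each page on its own so that no chunk spans two pages."""
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        for start in range(0, len(text), chunk_size):
            chunks.append(Chunk(text[start:start + chunk_size], page_no))
    return chunks


def best_chunk(chunks: list[Chunk], question: str) -> Chunk:
    """Toy retrieval (stand-in for embeddings): most shared words wins."""
    q = words(question)
    return max(chunks, key=lambda c: len(q & words(c.text)))


pages = ["Introduction to the report.", "Transformers were introduced by Vaswani et al."]
hit = best_chunk(chunk_pages(pages), "Who introduced transformers?")
print(f"Found on page {hit.page}: {hit.text}")
```

Because the page travels with the chunk, the answer step can cite it without re-reading the PDF.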

  • @SimonStJohn
    @SimonStJohn 1 year ago +4

    Hey Liam! Awesome... could you do one that scrapes data from a blog/website for an embedded blog chatbot?

  • @mic9657
    @mic9657 2 months ago

    Those biceps too! 💪

  • @JohnAlexanderEcheverryOcampo
    @JohnAlexanderEcheverryOcampo 1 year ago

    Thanks

  • @vicentesoto1628
    @vicentesoto1628 11 months ago

    Liam your content is unreal
    Some of the best I've seen so far
    This is hard knowledge
    You are brilliant
    What do you mean by 512 tokens on every chunk? Characters?
    I'll be waiting for a detailed masterclass
    Vicente
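On the 512 question: in LangChain's text splitters, `chunk_size` is measured by the splitter's `length_function`, which defaults to `len`, i.e. characters; it only means tokens if a token-counting function is passed in, so check the splitter call in the notebook. The sketch below is a hypothetical stand-alone splitter (not LangChain's algorithm) showing how the same limit of 512 yields different chunks depending on the length function; `count_words` is a crude stand-in for a real tokenizer, which counts sub-word pieces.

```python
from typing import Callable


def split_text(text: str, chunk_size: int, length_fn: Callable[[str], int]) -> list[str]:
    """Greedily pack whole words into chunks whose length_fn(...) stays <= chunk_size."""
    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and length_fn(candidate) > chunk_size:
            chunks.append(" ".join(current))  # chunk is full: start a new one
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def count_chars(s: str) -> int:
    return len(s)


def count_words(s: str) -> int:
    return len(s.split())  # crude stand-in for a real tokenizer


text = "the quick brown fox jumps over the lazy dog " * 20
by_chars = split_text(text, 512, count_chars)
by_tokens = split_text(text, 512, count_words)
print(len(by_chars), "chunks when 512 means characters")
print(len(by_tokens), "chunk(s) when 512 means (pseudo-)tokens")
```

Since a token is usually several characters long, a 512-token chunk holds considerably more text than a 512-character one.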

  • @andym9565
    @andym9565 1 year ago

    Great video! It would be cool to create a similar video with Apify and LangChain.

  • @SedhuujGorem
    @SedhuujGorem 5 months ago +28

    The Best tool for this is ruclips.net/video/bcK7LldB3dk/видео.html
    I like some of the transitions, but sometimes they're a bit too much and are seemingly random. Since we use these persistent elements that transition across pages to indicate some kind of relationship between the previous and the next states, some of your transitions confuse me because I can't immediately see what the relationship is.
    For example, at 1:23 the selectable tiles (which weren't selected) transition into two switches... does that mean anything? Are they related in some way? I see this as random and a bad use of the design language. However, at 3:14 I like the transition from switches to the ticks on a paper; that makes sense to me. Epic presentation tho

  • @vukradovic172
    @vukradovic172 3 months ago

    Excellent

  • @1Esteband
    @1Esteband 1 year ago +7

    Thank you it worked perfectly despite generating an error on the pip install.
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.

  • @tspang1977
    @tspang1977 1 year ago +5

    Hi Liam, great video. I do have a question: in the following code, I notice that we don't have to explicitly turn the "query" into embeddings before it performs a search against the vector DB. Is it because the function "similarity_search" internally calls the OpenAI embeddings to embed the query?
    query = "Who created transformers?"
    docs = db.similarity_search(query)
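Broadly yes: in LangChain the vector store is built with the embeddings object, and `similarity_search` embeds the query with it before comparing vectors, returning `k` documents (4 by default). A toy, stdlib-only sketch of that pattern follows; `ToyVectorStore` and `toy_embed` are hypothetical, and the word-count "embedding" merely stands in for a call to an embeddings API.

```python
import math
import re
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    """Stand-in for an embeddings API call: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, texts: list[str], embed: Callable[[str], Counter]):
        self.embed = embed  # the store keeps the embedding function...
        self.rows = [(embed(t), t) for t in texts]  # ...and embeds each chunk once, up front

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q_vec = self.embed(query)  # the query is embedded here, inside the search
        ranked = sorted(self.rows, key=lambda row: cosine(q_vec, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]  # raw chunk text comes back, not vectors


db = ToyVectorStore(
    ["Transformers were created by Google researchers.", "Bananas are yellow.", "The weather is mild."],
    toy_embed,
)
print(db.similarity_search("Who created transformers?", k=1))
```

The caller only ever passes plain strings; embedding happens once per chunk at build time and once per query at search time.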

  • @bene88597
    @bene88597 1 year ago

    You got my mail buddy GJ

  • @MichielVermandel
    @MichielVermandel 1 year ago +2

    Thanks for the great video! One question: which OpenAI model is used to retrieve the answer? Is it gpt-35-turbo or ada or...? Where is it defined?

  • @user-by3xv9kv4s
    @user-by3xv9kv4s 1 year ago

    This is great Liam, thank you for sharing. What's the simple automated way to deploy this code to a basic online application/chat page?

  • @vverboX
    @vverboX 1 year ago

    I would love to see a video which helps me to deploy such a chatbot (created on colab) on a webpage.

  • @navigatingsideways
    @navigatingsideways 8 months ago

    It’s convenient because I just completed a Data Analysis course via IBM and Vanderbilt's Prompt Engineering course. I created my first Smart Bot for my Dad’s website on Sunday.
    I’d like to load RFP contractor documents so I can easily question parts of a bid across the 88 pages.

  • @Iatalksbrasil
    @Iatalksbrasil 1 year ago

    Great video! It helped me complete my knowledge about best practices in prompting!

  • @JerryTrade28
    @JerryTrade28 10 months ago

    Will you be sharing your Marcus Aurelius database u created previously? I was really looking forward to that

  • @Ramp_cat_7
    @Ramp_cat_7 1 year ago

    This is amazing! Can you teach us mindai?

  • @joepbaks
    @joepbaks 1 year ago +2

    Thanks a lot man, been trying to get this to work via other ways for days. This was so easy, great tutorial. How would you transfer something like this to a user friendly ux/ui?

  • @johnjoesafatso
    @johnjoesafatso 1 year ago

    Amazing content. Thank you!
    Is there a way to do this with PDFs that have graphics and images?

  • @yiyuanzhang6335
    @yiyuanzhang6335 1 year ago

    will need a video on how to do this for multiple pdfs

  • @tuwayne3624
    @tuwayne3624 1 year ago +1

    Thank you, I've learned a lot from your channel. I'm curious about the differences between LlamaIndex and LangChain. Maybe I'm still a beginner in AI and don't quite understand.

  • @TheSacredGrove
    @TheSacredGrove 1 year ago

    Cool AF!

  • @noteniceu
    @noteniceu 1 year ago +5

    Can you feed it multiple PDFs at the same time, like a group of 300, or would you have to run each line individually?

  • @armandocapogrossi6689
    @armandocapogrossi6689 1 year ago

    Thanks, very good content. Just a question to understand the market better: did I misinterpret your hourly rate at $997/45 mins?

  • @featherly4267
    @featherly4267 1 year ago +2

    Brother, can you make a video on how to use AutoGPT for beginners? 😊

  • @minhe9008
    @minhe9008 11 months ago +2

    Great tutorial! I have hundreds of research papers in PDF format. Can I use this approach to build a vector DB and then chat with ChatGPT? Is there a limit to the size of the DB? Any pitfalls to avoid? Thanks!

  • @quangdinhdota2388
    @quangdinhdota2388 1 year ago +2

    Amazing video. I have a question: can your notebook (code) run with multiple PDF files?

  • @tibz11c
    @tibz11c 2 months ago

    Great work! Can we do this with a local or smaller language model?

  • @user-we3qo9kj4q
    @user-we3qo9kj4q 1 year ago

    Is there a way to also store the users' questions and the answers to them for monitoring, data analysis and other ideas?
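The tutorial doesn't cover this, but a minimal sketch is to wrap whatever function produces the answer and write each exchange to a database. The example uses Python's built-in `sqlite3`; the table layout and the `ask_and_log` wrapper are hypothetical, and the lambda stands in for the real chain call.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the log database; use a file path to keep it between runs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS qa_log (asked_at TEXT, question TEXT, answer TEXT)")
    return conn


def log_exchange(conn: sqlite3.Connection, question: str, answer: str) -> None:
    conn.execute(
        "INSERT INTO qa_log VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), question, answer),
    )
    conn.commit()


def ask_and_log(conn: sqlite3.Connection, question: str, answer_fn) -> str:
    answer = answer_fn(question)  # hypothetical: your chain / QA call goes here
    log_exchange(conn, question, answer)
    return answer


conn = open_log()
ask_and_log(conn, "Who created transformers?", lambda q: "Google researchers.")
for row in conn.execute("SELECT question, answer FROM qa_log"):
    print(row)
```

Keeping the logging outside the chain means it works the same whichever LLM or vector store sits behind `answer_fn`.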

  • @qwerto-ye5pe
    @qwerto-ye5pe 1 year ago

    Hi! I just wanted to ask what are the licenses used in this project? Are they commercial-friendly?

  • @JJBoi8708
    @JJBoi8708 1 year ago +1

    What is a good way to split text in a textbook PDF when each page has two columns, with text on the left and right sides?

  • @juliamarsh2077
    @juliamarsh2077 10 months ago

    Can you also use it to write content, e.g. web articles, based on the PDF or PDFs you have uploaded?

  • @vrynstudios
    @vrynstudios 9 months ago

    I'm a noob here. Is it possible to embed it on a site? If I embed it, is it standalone, or does it still use GPT API calls and incur costs?

  • @georgekokkinakis7288
    @georgekokkinakis7288 1 year ago

    Can you explain how we could use LLMs other than OpenAI's? For example, can we use Mosaic MPT-7B?

  • @Pppljssbs
    @Pppljssbs 1 year ago

    As a beginner coding their first ever plug-in, how long would it take to develop a high quality plug-in?

  • @rishabpoddar3866
    @rishabpoddar3866 1 year ago +4

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.

  • @frosti7
    @frosti7 1 year ago

    What solution can dynamically add data to, or extract it from, a database for an LLM?
    Like your company information that can be accessed by employees

  • @user-rf9dl1bl6s
    @user-rf9dl1bl6s 1 year ago

    Hi Liam! Which version of GPT does the chatbot use? Can I use it with GPT-4?

  • @InnocenceVVX
    @InnocenceVVX 10 months ago

    So essentially you calculate semantic similarity of the stored vectors and the asked question, then provide the 4 most similar vectors as context in the prompt?
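Close: assuming the notebook uses the "stuff" chain type, it is the *text* of the most similar chunks, not the vectors themselves, that goes into the prompt; the vectors are only used to rank the chunks. A hypothetical sketch of that prompt-assembly step, with k = 4 to match the comment and an illustrative template rather than LangChain's exact wording:

```python
def build_stuffed_prompt(question: str, ranked_chunks: list[str], k: int = 4) -> str:
    """Put the raw text of the k best chunks into a single prompt.

    ranked_chunks is assumed to be sorted by similarity already (best first);
    the embedding vectors were only needed for that ranking step.
    """
    context = "\n\n".join(ranked_chunks[:k])
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [f"chunk-{i} text" for i in range(1, 6)]  # pretend these came back from the search
print(build_stuffed_prompt("Who created transformers?", chunks))
```

This is also why chunk size matters: k chunks plus the question must fit in the model's context window.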

  • @TheUselessgeneration
    @TheUselessgeneration 1 year ago +1

    I can't wait until we can expand this to all documents. I assume that is what Microsoft 365 Copilot will do.

  • @marcosemeria97
    @marcosemeria97 1 year ago +1

    Can you suggest alternatives to OpenAI for embeddings and the LLM? Their APIs are too expensive.

  • @flyinonminds6415
    @flyinonminds6415 1 year ago +1

    Great video! Is it possible to add more than one PDF with that code? Would it be possible to provide code for multiple PDFs? Thank you

  • @michaeldblake
    @michaeldblake 9 months ago

    I'd love to figure out how to do this.

  • @Finalform77
    @Finalform77 1 year ago +2

    Hi Liam, I am getting an 'authentication Error' when running section 2 of the code, "Embed text and store embeddings". I haven't changed anything yet; I'm just running it as is. Any suggestions?

  • @kiranhipparagi543
    @kiranhipparagi543 11 months ago

    Hello. Thanks for a great video. But I have a financial statement PDF file that contains tables. How can I get the best results out of it?
    Any suggestions or help would be appreciated. Thanks😊

  • @timtensor6994
    @timtensor6994 8 months ago +1

    Great tutorial, can it be modified to support multiple PDFs?

  • @paulp6752
    @paulp6752 9 months ago

    Great stuff. Is there any good model to perform the embeddings calculation (and then semantic search) on my own server, as opposed to using the OpenAI API?

  • @TheSimoncio
    @TheSimoncio 11 months ago

    What about using any other open source LLM instead of GPT? thank you!

  • @tommycondon1918
    @tommycondon1918 1 year ago

    Could you do it using Gradio interface and importing openai module?

  • @siddhantmohanty1578
    @siddhantmohanty1578 4 months ago

    Hi,
    THANK YOU for sharing your knowledge. Could you please let me know how many PDFs we can train using this technique? Does this LLM remember which PDFs it has been trained on, or do we have to train it before running each query?

  • @derrickwong3114
    @derrickwong3114 1 year ago

    Can the chatbot incorporate website links or app deeplink as the chat results?

  • @maxdranitsa
    @maxdranitsa 3 months ago +1

    Liam, is there an option to make the assistant always use the data that has been uploaded to the knowledge base? It doesn't read the KB files every time and uses links that don't even exist.

  • @angel1st007
    @angel1st007 1 year ago

    Hey @LiamOttley - I copied your Colab project; however, on the very first step I bumped into "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    yfinance 0.2.21 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.". Any thoughts?

  • @stefan-ls7yd
    @stefan-ls7yd 1 year ago

    So when I store text in a Vector DB, this method retrieves the raw text to input to the LLM again?
    Is this the Ada encoder?

  • @willyjauregui6541
    @willyjauregui6541 24 days ago

    Out of complete ignorance: is LangChain the best method currently available to improve the performance of our LLM chatbots?
    If not, what is, or what other methods are out there that I may be missing?
    Thanks for answering.

  • @ronan815
    @ronan815 4 months ago

    Hi, sorry, there is an issue in colab, first script: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    pydrive2 1.6.3 requires six>=1.13.0, but you have six 1.12.0 which is incompatible.
    yfinance 0.2.36 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.. By the way, do you plan to make an adaptation for Mistral AI?

  • @mr.pantherpanther1013
    @mr.pantherpanther1013 6 months ago +1

    Hey Liam, @ 03.22 you said we can upload PDF data by entering the PDF name. But what if we have more PDFs, like, for example, 5 PDFs?

  • @markbrown1609
    @markbrown1609 11 months ago

    Good show, chap. Can I use ChatGPT 3.5?

  • @sayamkhan4209
    @sayamkhan4209 1 year ago

    Where does he specify the model used for the output? Is he using Da Vinci 003?

  • @yosta3826
    @yosta3826 1 year ago

    Do I pay for OpenAI API tokens when using the code, or does it use a local GPT-2 model?

  • @Miya-ub5qn
    @Miya-ub5qn 1 year ago +4

    Thank you very much for this great video!!! One question. In the section "Create chat bot with chat memory (OPTIONAL)", I received the following message: "DeprecationWarning: on_submit is deprecated. Instead, set the .continuous_update attribute to False and observe the value changing with: mywidget.observe(callback, 'value').
    input_box.on_submit(on_submit)" Why? Would you be able to fix it?

    • @ranjitherusa7139
      @ranjitherusa7139 1 year ago +1

      I am having the same issue.
      Should the optional segment be in the same .py program?

  • @ameynaik2743
    @ameynaik2743 1 year ago

    Is there an alternative to the OpenAI embedding engine that is competitive and free?

  • @suriyakrishnan5177
    @suriyakrishnan5177 1 year ago

    Can you explain this same example using ExpressJS? Because no other tutorial has used ExpressJS to illustrate this example.

  • @ayanbahukhandi1869
    @ayanbahukhandi1869 1 year ago +6

    Can I do it with multiple PDFs? Like, for each PDF I'll just chunk every page?

  • @denizkapteina2151
    @denizkapteina2151 1 year ago +1

    Thanks for the super video. I have a question: in the overview you show that ChatGPT3.5 is used, or that the query is last processed by 3.5. But in the code I can't find any reference to it. Where is my mistake?

    • @LiamOttley
      @LiamOttley  1 year ago +1

      The default LLM for LangChain's "OpenAI()" is text-davinci-003, and for "ChatOpenAI()" it is gpt-3.5-turbo, I believe.

  • @moses5407
    @moses5407 1 year ago

    Can this be expanded to read from multiple PDFs ... or can this be done by combining the PDFs into a single file?

  • @themotivationhub1355
    @themotivationhub1355 1 year ago +1

    Yo, I’ve made plugins but don’t know how to test them, so can you give some ideas? (I don’t have access to plugins yet. I’m on the waitlist.)

  • @GiovaDuarte
    @GiovaDuarte 1 year ago +8

    This is great. Is it possible to retrieve images from the PDF? I have a PDF with many graphics that help understand the content. Do you have any ideas as to how I can provide images as part of the conversation?

    • @MCroppered
      @MCroppered 1 year ago

      What type of graphics are you talking about?

    • @quinnherden
      @quinnherden 1 year ago +1

      You could leverage LangChain's agent feature set to use computer vision to analyze your images.

    • @GiovaDuarte
      @GiovaDuarte 1 year ago

      @@MCroppered The PDF I have has embedded images, and I was wondering how I could recall these during a conversation.

    • @GiovaDuarte
      @GiovaDuarte 1 year ago

      @@quinnherden I will research this. Thanks!

    • @gaben7
      @gaben7 1 year ago +2

      @@GiovaDuarte If you figure out how to bring images into the conversation, please let us know how.

  • @aipy5147
    @aipy5147 9 months ago +2

    Great video! I was wondering why it is a private chatbot when you're using an OpenAI key and sending the information to the GPT-3.5 LLM? How can you secure sensitive data with your method? Thank you for sharing your knowledge.

    • @lubeckable
      @lubeckable 6 months ago

      By using and self-hosting a custom open-source LLM like Llama or Mistral.

  • @sahansathsara7106
    @sahansathsara7106 1 year ago

    This is great! But how much does it cost?

  • @Essential-Self
    @Essential-Self 9 months ago

    Hi, and thanks for your work. I am totally new at this, but I would like to be able to chat with my whole archive, like a second brain. Is this possible with this method?

  • @igortrindade-dev
    @igortrindade-dev 5 months ago

    Sorry for the silly question: if I use this script in a separate Python module and call it with other documents, will it mix the sources of the documents, or will this instance of the vector DB live only at runtime?
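Assuming the notebook builds its store in memory (as LangChain's `FAISS.from_documents` does), each call creates a separate object that exists only while the Python process runs, so two stores built from different documents share nothing unless you put the documents into the same one. Keeping an index across runs requires saving it; LangChain's FAISS wrapper provides `save_local` and `load_local` for this. The stdlib sketch below only illustrates the idea, with a hypothetical `ToyIndex` saved as JSON.

```python
import json
import os
import tempfile


class ToyIndex:
    """In-memory index: it lives only as long as this Python object does."""

    def __init__(self, rows=None):
        self.rows = [tuple(r) for r in (rows or [])]  # (source, text) pairs

    def add(self, source: str, text: str) -> None:
        self.rows.append((source, text))

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.rows, f)  # tuples are written as JSON arrays

    @classmethod
    def load(cls, path: str) -> "ToyIndex":
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))  # arrays are turned back into tuples in __init__


report = ToyIndex()
report.add("report.pdf", "a chunk from the report")
manual = ToyIndex()
manual.add("manual.pdf", "a chunk from the manual")
print(report.rows, manual.rows)  # two separate objects: their sources never mix

path = os.path.join(tempfile.mkdtemp(), "index.json")
report.save(path)
print(ToyIndex.load(path).rows)  # survives as long as the file does
```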

  • @ian5629
    @ian5629 1 year ago

    How can I use this in my business or website? How can I embed it, for example, in a better UI?

  • @moritzwilhelm9658
    @moritzwilhelm9658 6 months ago

    Hey Liam, cool video! Is it possible to do the same with more than one PDF file?

  • @jitenbhalavat5738
    @jitenbhalavat5738 6 months ago

    What if we have multiple PDFs and we want to fetch the answer from those PDFs?
    For example: I have 20 PDFs, and if I ask one question, it should fetch the (obviously correct) answer from any one of the PDFs and show it to me as output.
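A sketch under stated assumptions: the PDFs' text has already been extracted (for example by a PDF loader), and every chunk is tagged with its source file, so one index covers all documents and the answer can name the PDF it came from. With LangChain this typically means loading each file and concatenating the resulting document lists before building a single vector store. `build_index` and `search` are hypothetical helpers, and word overlap stands in for embedding similarity.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def build_index(docs: dict[str, str], chunk_words: int = 50) -> list[dict]:
    """docs maps a file name to its extracted text; every chunk remembers its source."""
    index = []
    for source, text in docs.items():
        tokens = text.split()
        for i in range(0, len(tokens), chunk_words):
            index.append({"source": source, "text": " ".join(tokens[i:i + chunk_words])})
    return index


def search(index: list[dict], question: str, k: int = 4) -> list[dict]:
    """Rank chunks from *all* files together by word overlap with the question."""
    q = word_counts(question)

    def score(row: dict) -> int:
        return sum(min(q[w], c) for w, c in word_counts(row["text"]).items())

    return sorted(index, key=score, reverse=True)[:k]


docs = {
    "hr_policy.pdf": "Employees get 25 days of annual leave.",
    "it_guide.pdf": "Reset your password from the login page.",
}
top = search(build_index(docs), "How many days of annual leave do employees get?", k=1)[0]
print(top["source"], "->", top["text"])
```

Twenty PDFs or three hundred, the pattern is the same: one combined index, with the source kept on each chunk.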

  • @harshavardhan7097
    @harshavardhan7097 9 months ago +1

    Can I do it in a Jupyter notebook rather than using Colab?