LangChain101: Question A 300 Page Book (w/ OpenAI + Pinecone)

  • Published: 9 Sep 2024

Comments • 623

  • @edzehoo
    @edzehoo A year ago +157

    So even Ryan Gosling's getting into this now.

    • @DataIndependent
      @DataIndependent A year ago +11

      It's a fun topic!

    • @blockanese3225
      @blockanese3225 A year ago +10

      @@DataIndependent he was referring to the fact you look like Ryan Gosling.

    • @Author_SoftwareDesigner
      @Author_SoftwareDesigner A year ago +15

      @@blockanese3225 I think he understands that.

    • @blockanese3225
      @blockanese3225 A year ago +3

      @@Author_SoftwareDesigner lol I couldn’t tell if he understood that when he said it’s a fun topic.

    • @nigelcrasto
      @nigelcrasto A year ago

      yesss

  • @sarahroark3356
    @sarahroark3356 A year ago +78

    OMG, this is exactly the functionality I need as a long-form fiction writer, not just to be able to look up continuity stuff in previous works in a series so that I don't contradict myself or reinvent wheels ^^ -- but then to also do productive brainstorming/editing/feedback with the chatbot. I need to figure out how to make exactly this happen! Thank you for the video!

    • @DataIndependent
      @DataIndependent A year ago +3

      Nice! Glad it was helpful

    • @areacode3816
      @areacode3816 A year ago +3

      Agreed. Do you have any simplified tutorials, like ones explaining LangChain? I fed my novel into ChatGPT page by page and it worked... okay, but I kept running into roadblocks: memory cache limits and more.

    • @thebicycleman8062
      @thebicycleman8062 A year ago

      @@areacode3816 Maybe your Pinecone index is reaching its limit? Or your 4,000-token GPT-3 limit? I would check those first. If it's Pinecone, the fix is easy: just buy more space. If it's GPT, try GPT-4, which doubles the token limit to 8K. If that doesn't work, I would add an intermediary summarization step before passing the text to GPT-3.

    • @gjsxnobody7534
      @gjsxnobody7534 A year ago +2

      How would I use this to make a smart chatbot for our company's chat support, specific to our company's items?

    • @shubhamgupta7730
      @shubhamgupta7730 A year ago

      @@gjsxnobody7534 I have the same query!

  • @nigelcrasto
    @nigelcrasto A year ago +3

    you know it's something big when The GRAY MAN himself is teaching you AI!!

  • @NaveenVinta
    @NaveenVinta A year ago +18

    Great job on the video. I understood a lot more in 12 mins than from a day of reading documentation. Would be extremely helpful if you can bookend this video with 1. dependencies and set up and 2. turning this into a web app. If you can make this into a playlist of 3 videos, even better.

  • @davypeterbraun
    @davypeterbraun A year ago +9

    Your series is just so so good. What a passionate, talented teacher you are!

  • @vinosamari
    @vinosamari A year ago +22

    Can you do a more in-depth Pinecone video? It seems like an interesting concept alongside embeddings, and I think it'll help stitch together the understanding of embeddings for more 'web devs' like me. I like how you used relatable terms while introducing it in this video, and I think it deserves its own space. Please consider an Embeddings + Pinecone fundamentals video. Thank you.

    • @DataIndependent
      @DataIndependent A year ago +5

      Nice! Thank you. What's the question you have about the process?

    • @ziga1998
      @ziga1998 A year ago

      @@DataIndependent I think a general Pinecone video would be great, and connecting it with LangChain and building apps similar to this would be awesome

    • @ko-Daegu
      @ko-Daegu A year ago

      Weaviate is even better

  • @blocksystems202
    @blocksystems202 A year ago +4

    No idea how long i've been searching the web for this exact tutorial. Thank you.

    • @DataIndependent
      @DataIndependent A year ago +1

      Wonderful - glad it worked out.

    • @koraegis
      @koraegis A month ago

      @@DataIndependent Do you offer consulting? I'd like to do something like this for my learners / learning business. 🙂

    • @DataIndependent
      @DataIndependent A month ago

      @@koraegis Happy to chat! Can you send me an email at contact@dataindependent.com with more details?

    • @koraegis
      @koraegis A month ago

      @@DataIndependent Thanks! Will do it now :D

  • @krisszostak4849
    @krisszostak4849 A year ago +6

    This is absolutely brilliant! I love the way you explain everything and just give away all the notes in such a detailed and easy-to-follow way.. 🤩

  • @nsitkarana
    @nsitkarana A year ago +1

    Nice video. I tweaked the code and split the index part from the query part so that I can index once and keep querying, like we would in the real world. Nicely put together!!

    • @babakbandpey
      @babakbandpey A year ago +1

      Hello, do you have an example of how you did that? Reusing the same indexes is the part I've become confused about. Thanks

    • @karimhadni9858
      @karimhadni9858 A year ago

      Can you please provide an example?
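A minimal sketch of the split this thread asks about: index once, then re-attach to the stored vectors with `from_existing_index` instead of calling `from_texts` again. This assumes the classic `langchain` / `pinecone-client` APIs from the video's era; the key and environment strings are placeholders.

```python
def query_existing_index(index_name: str, question: str, k: int = 5):
    """Query a Pinecone index that was populated earlier, without re-embedding
    the source documents. Illustrative sketch against the classic langchain
    and pinecone-client APIs; keys below are placeholders."""
    # Imports kept inside the function so the sketch reads standalone.
    import pinecone
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores import Pinecone

    pinecone.init(api_key="YOUR_PINECONE_KEY", environment="YOUR_ENV")

    # Must be the SAME embedding model used at indexing time, otherwise
    # the query vector lives in a different space than the stored vectors.
    embeddings = OpenAIEmbeddings(openai_api_key="YOUR_OPENAI_KEY")

    # Attach to the already-populated index instead of re-uploading texts.
    docsearch = Pinecone.from_existing_index(index_name, embeddings)
    return docsearch.similarity_search(question, k=k)
```

Running the indexing notebook cell once and then only ever calling something like this is what separates the one-time ingest from day-to-day querying.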

  • @64smarketing
    @64smarketing A year ago +1

    This is exactly what I was looking to do, but I couldn't sort it out. This video is legit the best resource on this subject matter. You're a gentleman and a scholar. I tip my hat to you, good sir.

  • @CarloNyte
    @CarloNyte A year ago +2

    Duudee!!! This video is exactly what I was looking for! Still a complete noob at all this LLM integration stuff and so visual tutorials are so incredibly helpful!
    Thank you for putting this together 🙌🏿🎉🙌🏿

    • @DataIndependent
      @DataIndependent A year ago

      Great to hear! Check out the video on the '7 core concepts' which may help round out the learnings

  • @nickpetolick4358
    @nickpetolick4358 A year ago

    This is the best video I've watched explaining the use of Pinecone.

  • @DanielChen90
    @DanielChen90 A year ago +1

    Great tutorial, bro. You're really doing good out here for us the ignorant. Took me a while to figure out that I needed to run pip install pinecone-client to install Pinecone. So this is for anyone else who is stuck there

  • @lostnotfoundyet
    @lostnotfoundyet A year ago +1

    thanks for making these videos! I've been going through the playlist and learning a lot. One thing I wanted to mention that I find really helpful in addition to the concepts explained is the background music! Would love to get that playlist :)

    • @DataIndependent
      @DataIndependent A year ago +1

      Thank you! A lot of people gave constructive feedback that they didn't like it, especially when they sped up the track and listened to it at 1.2x or 1.5x.
      Here is where I got the music!
      lofigenerator.com/

  • @mosheklein3373
    @mosheklein3373 A year ago +1

    This is really cool, but I haven't yet seen a query for a specific information store (in your case, a book) that ChatGPT can't natively answer. For example, I queried ChatGPT with the questions you asked and got detailed answers that echoed the answers you received, and then some.

  • @davidzhang4825
    @davidzhang4825 A year ago +2

    This is gold! Please do another one with data in Excel or a Google Sheet :)

  • @virendersingh9377
    @virendersingh9377 A year ago

    I like the video because it was to the point and the presentation with the initial overview diagram is great.

  • @sabashioyaki6227
    @sabashioyaki6227 A year ago +2

    This is definitely cool, thank you. There seem to be several dependencies left out. It would be great if all dependencies were shown or listed...

    • @DataIndependent
      @DataIndependent A year ago +1

      ok, thank you and will do. Are you having a hard time installing them all?

    • @benfield1866
      @benfield1866 A year ago

      @@DataIndependent hey I'm stuck on the dependency part as well

  • @Mr_Chiro_
      @Mr_Chiro_ A year ago

    Thank you soooo much I am using this knowledge soo much for my school projects.

  • @ShadowScales
      @ShadowScales 10 months ago

    bro thank you so much honestly this video means so much to me, I really appreciate this all the best in all your future endeavors

  • @PatrickCallaghan-yf2sf
    @PatrickCallaghan-yf2sf A year ago

    Fantastic video thanks. I obtained excellent results (accuracy) following your guide compared to other tutorials I tried previously.

    • @DataIndependent
      @DataIndependent A year ago

      Ah that's great - thanks for the comment

    • @aaanas
      @aaanas 11 months ago

      Was the starter tier of pinecone enough for you?

    • @PatrickCallaghan-yf2sf
      @PatrickCallaghan-yf2sf 11 months ago

      It's one project only on the starter tier; that one project can contain multiple documents under one vector db. For me it was certainly enough to get an understanding of the potential.
      From my limited experience, to create multiple vector dbs for different project types you will need a premium/paid plan, and the cost is quite high.
      There may be other competitors offering cheaper entry-level plans if you wish to develop apps, but for a hobbyist/learner the starter tier on Pinecone is fine IMO.

  • @bartvandeenen
    @bartvandeenen A year ago

    I actually scanned the whole Mars trilogy to have something substantial, and it works fine. The queries generally return decent answers, although some of them are way off.
    Thanks for your excellent work!

    • @DataIndependent
      @DataIndependent A year ago

      Nice! Glad to hear it. How many pages/words is the Mars trilogy?

    • @bartvandeenen
      @bartvandeenen A year ago

      @@DataIndependent About 1500 pages in total.

    • @keithprice3369
      @keithprice3369 A year ago

      Did you look at the results returned from Pinecone so you could determine if the answers that were off were due to Pinecone not providing the right context or OpenAi not interpreting the data correctly?

    • @bartvandeenen
      @bartvandeenen A year ago +1

      @@keithprice3369 No I haven't. Good idea to do this. I now have GPT-4 access so I can use much larger prompts

    • @keithprice3369
      @keithprice3369 A year ago

      @@bartvandeenen I've been watching a few videos about LangChain, and they did bring up that the chunk size (and overlap) can have a huge impact on the quality of the results. They not only said there hasn't been much research on an ideal size, but also that it should likely vary depending on the structure of the document. One presenter suggested 3 sentences with overlap might be a good starting point. But I don't know enough about LangChain yet to know how you specify a split on the number of sentences vs just a chunk size.
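The "3 sentences with overlap" idea from this thread can be sketched in plain Python. This is an illustration, not LangChain's own splitter (LangChain's `CharacterTextSplitter` splits on characters, not sentences), and the sentence-boundary regex is deliberately naive:

```python
import re

def split_sentences(text: str, sentences_per_chunk: int = 3, overlap: int = 1):
    """Toy sentence-based splitter illustrating chunk size vs overlap."""
    # Naive boundary: split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    step = sentences_per_chunk - overlap  # how far each chunk advances
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + sentences_per_chunk]))
        if start + sentences_per_chunk >= len(sentences):
            break
    return chunks

# Consecutive chunks share `overlap` sentences of context.
chunks = split_sentences("One. Two. Three. Four. Five.",
                         sentences_per_chunk=3, overlap=1)
# chunks == ["One. Two. Three.", "Three. Four. Five."]
```

The overlap is what keeps a fact that straddles a boundary retrievable from at least one chunk.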

  • @____2080_____
    @____2080_____ A year ago

    This is such a game changer. Can’t wait to hook all of this up to GPT-4 as well as countless other things

    • @DataIndependent
      @DataIndependent A year ago

      Nice! What other ideas do you think it should be hooked up to?

    • @____2080_____
      @____2080_____ A year ago

      Thumbs up and subscribed.

  • @MrWrklez
    @MrWrklez A year ago +2

    Awesome example, thanks for putting this together!

    • @DataIndependent
      @DataIndependent A year ago

      Nice! Glad it worked out. Let me know if you have any questions

  • @borisrusev9474
    @borisrusev9474 6 months ago

    I would love to see a video on the limitations of RAG. For instance say you have a document containing a summary of each country in Europe. Naturally one of the facts listed for each country would be the year they joined the EU. Unless explicitly stated, RAG wouldn't be able to tell you how many countries there are in the EU. I would love to see a tutorial on working around that limitation.

    • @DataIndependent
      @DataIndependent 6 months ago +1

      Nice! That's fun, thanks for the input on that.
      You're right, that isn't a standard question and you'll need a different type of system set up for that

  • @thespiritualmindset3580
    @thespiritualmindset3580 A year ago

    This helped me a lot, thanks for the updated code in the description as well!

  • @ninonazgaidze1360
    @ninonazgaidze1360 A year ago

    This is super awesome!!! And so easily explained! You made my year. Please keep up the greatest work

  • @Crowward92
    @Crowward92 A year ago

    Great video man. Loved it. I had been looking for this solution for some time. Keep up the good work.

  • @ramachinta3140
    @ramachinta3140 2 months ago

    Very helpful Video, Thank you!

  • @HelenJackson-pq4nm
    @HelenJackson-pq4nm A year ago

    Really clear, useful demo - thanks for sharing

  • @jonathancrichlow5123
    @jonathancrichlow5123 A year ago +1

    This is awesome! My question is: what happens when the model is asked a question outside of the knowledge base that was just uploaded? For example, what would happen if you asked who the best soccer player is?

  • @401kandBeyond
    @401kandBeyond A year ago

    This is a great video and Greg is awesome. Let's hope he puts together a course!

  • @tunle3980
    @tunle3980 A year ago +1

    Thank you very much for doing this. It's absolutely awesome!!! Also can you do a video on how to improve the quality of answers?

  • @ThomasODuffy
    @ThomasODuffy A year ago

    Thanks for this very helpful practical tutorial!

  • @HerroEverynyan
    @HerroEverynyan A year ago +3

    Hi! Awesome tutorial. This is exactly what I was looking for. I really love this series you've started and hope you'll keep it up. I also wanted to ask:
    1. What's the difference between using Pinecone or another vector store like Chroma, FAISS, Weaviate, etc.? And what made you choose Pinecone for this particular tutorial?
    2. What was the cost for creating embeddings for this book? (time & money)
    3. Is there a way to estimate the cost of embeddings with LangChain beforehand?
    Thank you very much and looking forward to more vids like this! 🤟

    • @DataIndependent
      @DataIndependent A year ago +1

      For your questions:
      1. The difference with Pinecone/Chroma, etc.: not much. They store your embeddings and they run a similarity calc for you. However, the space is super new; as things progress one may be a no-brainer over another. Ex: you could also do this in GCP, but you'd have to deal with their overhead as well.
      2. Hm, unsure about the book, but here is the pricing for Ada embeddings: $0.0004 / 1K tokens. So if you had a 120K-word book, which is ~147K tokens, it would be ~$0.06. Not very steep...
      3. Yes, you can calc the number of tokens you're going to use for the task, then look up their pricing table and see how much it'll be.

    • @DataIndependent
      @DataIndependent A year ago

      ​@@myplaylista1594 This one should help out
      help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them

    • @klaudioz_
      @klaudioz_ A year ago

      @@DataIndependent It can't be that expensive. text-embedding-ada-002 is about ~3,000 pages per US dollar (assuming ~800 tokens per page).

    • @DataIndependent
      @DataIndependent A year ago +1

      @@klaudioz_ Ya, you're right, my mistake. I didn't divide by the extra thousand in the previous calc. Fixing now

    • @klaudioz_
      @klaudioz_ A year ago

      @@DataIndependent No problem. Thanks for your great videos !!
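The corrected arithmetic from this thread fits in a few lines. The ~0.75 words-per-token ratio is a common rule of thumb for English text, and $0.0004 / 1K tokens was Ada-002's price at the time; check current pricing before relying on it:

```python
# Rough embedding-cost estimate for OpenAI's text-embedding-ada-002.
ADA_PRICE_PER_1K_TOKENS = 0.0004  # historical price; verify before use
WORDS_PER_TOKEN = 0.75            # rule-of-thumb ratio for English text

def estimate_embedding_cost(word_count: int) -> float:
    tokens = word_count / WORDS_PER_TOKEN
    return tokens / 1000 * ADA_PRICE_PER_1K_TOKENS

# A 120K-word book is ~160K tokens: about six cents, consistent with
# the "~3,000 pages per dollar" figure mentioned above.
cost = estimate_embedding_cost(120_000)
```

For an exact count rather than a heuristic, tokenize with `tiktoken` first, as the OpenAI token-counting article linked above describes.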

  • @sumitbakhli2049
    @sumitbakhli2049 A year ago +1

    I am getting "Index 'None' not found in your Pinecone project. Did you mean one of the following indexes: langchain1" for the line below:
    docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name)
    Any idea what the issue could be? I checked that the index_name variable is set correctly as langchain1

  • @deanshalem
    @deanshalem A year ago +1

    Greg, you are INCREDIBLE! Your channel and GitHub are a goldmine. Thank you 🙏. At 9:09, what install on Mac is necessary to access methods like that?

    • @deanshalem
      @deanshalem A year ago

      Also, I've been trying to make some type of "theorems, definitions, and corollaries" assistant which extracts from my textbook all the math theorems, definitions, and corollaries. The goal there was to create textbook summaries to reference when I work through tough problems which require me to flip back and forth through my book all day long.
      But more interesting, I am struggling to create a "math_proofs" assistant. Your approach in this video is awesome, but I can't find any of your resources in which you use markdown, or LaTeX, or any mathematical textbook to be queried. I use MathPix to convert my textbooks to LaTeX, Word doc, or markdown. But when I use my newly converted markdown text, despite working hand-in-hand with the LangChain documentation, I still fail to get a working agent that proves statements.
      I feed the model:
      "Prove the zero vector is unique" and it replies nonsense, even though this proof is explicitly written in the text. It is not even something it had to "think" to produce (simple for the sake of example; these proofs are matrix theory so they get crazy). Could you please chime in?

    • @DataIndependent
      @DataIndependent A year ago +1

      Pulling all of that information out could be tough. I have a video on the playlist around "topic modeling" which is really just pulling structured information out of a piece of text. That one may be what you're looking for

  • @gvarph7212
    @gvarph7212 A year ago +1

    Just out of curiosity, how much does something like this cost in openAI credits?

  • @vigneshyaadav6322
    @vigneshyaadav6322 A year ago +1

    Creating an API to store the embeddings in Pinecone was fairly simple, but I did not understand how to pass in a query (plain text) and get the response back from the embeddings stored in the Pinecone db. I see that's what's happening in the doc search and the chain lines, but how do I do it separately?

    • @DataIndependent
      @DataIndependent A year ago

      Sorry I don't fully understand your question - could you rephrase it?

  • @user-xp2ym1ng2h
    @user-xp2ym1ng2h 11 months ago

    Thanks as always Greg!

  • @johnsmith21170
    @johnsmith21170 A year ago

    awesome video, very helpful! thank you

  • @guilianamustiga2962
    @guilianamustiga2962 11 months ago

    thank you Greg! very helpful tutorial!!

  • @ritik1857
    @ritik1857 A year ago

    Thanks Ryan!

  • @TheHumanistX
    @TheHumanistX A year ago +1

    Ok, so maybe I misunderstand this one. I used the full text of War and Peace, just to test. My query was "How many times does the word 'fire' appear in War and Peace?" and when it finishes running there is no output... is this not the right setup for that kind of question?
    Then I set the query to 'What are the main philosophical ideas in War and Peace?' and it also returned nothing. It didn't error out. I double-checked and all my code is good.

    • @DataIndependent
      @DataIndependent A year ago

      Ah yes, this is a fun question.
      LLMs won't be good at counting words like you're describing. That's a task they aren't well suited for yet. I would use regular regex or a .find() for that.
      The 2nd question is also hard; you need to review multiple pieces of text in the book to form a good opinion of the philosophical ideas.
      Just doing a similar-embedding approach won't get you there.
      If you wanted to answer the philosophical question, I would do a map reduce or refine with a large context window. However, War and Peace is huge, so that would cost a lot.
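The map-reduce approach mentioned in that reply can be sketched with LangChain's summarize chain: summarize each chunk separately (map), then combine the partial summaries (reduce). This assumes the classic `langchain` API from the video's era; the key is a placeholder:

```python
def summarize_themes(docs, openai_api_key: str):
    """Sketch of a map-reduce pass over a book's chunks, per the reply above.
    Illustrative only: every chunk is sent through the LLM, so a very long
    book means many calls and a correspondingly large bill."""
    from langchain.llms import OpenAI
    from langchain.chains.summarize import load_summarize_chain

    llm = OpenAI(openai_api_key=openai_api_key, temperature=0)
    # "map_reduce" summarizes each doc, then summarizes the summaries.
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    return chain.run(docs)
```

The word-counting question, by contrast, needs no LLM at all: `text.lower().count("fire")` on the raw string is both exact and free.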

  • @geethaachar8495
    @geethaachar8495 A year ago

    That was fabulous thank you

  • @RodolphoPortoSantista
    @RodolphoPortoSantista A year ago

    This video is very good!

  • @waeldimassi3355
    @waeldimassi3355 A year ago

    Amazing work ! thank you so much !!

  • @dogchaser520
    @dogchaser520 A year ago

    Succinct and easy to follow. Very cool.

  • @luisarango-jm8eq
    @luisarango-jm8eq A year ago

    Love this brother!

  • @sunbisoft9556
    @sunbisoft9556 A year ago

    Got to say, you are awesome! Keep up the good work, you got a subscriber here!

    • @DataIndependent
      @DataIndependent A year ago +1

      Nice! Thank you. I just ordered upgrades for my recording set up so quality will increase soon.

  • @walter7812
    @walter7812 11 months ago

    Great tutorial, thanks so much!

  • @rs-cm1kz
    @rs-cm1kz A year ago +1

    If I already have some embedding vectors stored in Pinecone, I don't need to embed again. How can I modify the following code: ''docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)'' and use docsearch.similarity_search() in the next step?

    • @RealEstate3D
      @RealEstate3D A year ago

      Well, this indeed is the unanswered question. Unfortunately that is the problem with Jupyter Notebook cells.

  • @philipsnowden
    @philipsnowden A year ago +2

    Your videos are amazing. Keep it up and thanks!

    • @DataIndependent
      @DataIndependent A year ago

      Thanks Philip. Anything else you want to see?

    • @philipsnowden
      @philipsnowden A year ago

      @@DataIndependent I'm curious what's a better option for this use case and would love to hear your thoughts. Why LangChain over Haystack? I want to pass thousands of text documents into a question answering system and am still learning the best way to structure it. Also, an integration into something like Paperless would be cool!
      I'm a total noob, so excuse my ignorance. Thanks!

    • @DataIndependent
      @DataIndependent A year ago +1

      @@philipsnowden I haven't used Haystack yet so I can't comment on it.
      If you have 1K text documents you'll definitely want to get embeddings and store them, retrieve them, then pass them into your prompt for the answer.
      Haven't used paperless yet either :)

    • @philipsnowden
      @philipsnowden A year ago

      @@DataIndependent Good info, thank you.

    • @philipsnowden
      @philipsnowden A year ago

      @@DataIndependent Could you do a more in-depth explainer on this? I'm struggling to take a directory of text files and get it going. I've been reading and trying the docs for LangChain but am having a hard time. And can you use the new turbo 3.5 model to answer the questions? Thanks for your time; do you have a tip jar?

  • @nathanburley
    @nathanburley A year ago

    This is a great video - succinct and easy to follow.
    Two questions:
    1) How easy is it to add more than one document to the same vector db
    2) Is it possible to append an additional ... field(?) to that database table, so that the provenance of the reference can be reported back with the synthesised result?

    • @DataIndependent
      @DataIndependent A year ago +1

      1) Super easy. Just upload another
      2) Yep you can, it's the metadata field as you can add a whole bunch. People will often do this for document id's

    • @nathanburley
      @nathanburley A year ago

      @@DataIndependent Amazing (and thanks for the reply). One final follow-up then: is it easy / possible to delete vectors from the db too (I assume yes, but wanted to ask)? I assume this is done by using a query, e.g. if metadata contains "Document ID X" then delete?

  • @caiyu538
    @caiyu538 A year ago

    Great series.

  • @Juniorventura29
    @Juniorventura29 A year ago +3

    Awesome tutorial, brief and easy to understand. Do you think this could be an approach to do semantic search on private data from clients? My concern is data privacy, so I guess by using Pinecone and OpenAI, OpenAI only processes what we send (to respond in NL) but doesn't store any of our documents?

  • @PizzaLord
    @PizzaLord A year ago +4

    Nice!
    I was working with pinecone / gpt code recently that gave your chat history basically infinite memory of past chats by storing them in pinecone which was pretty sweet as you can use it to give your chatbot more context for the conversation as it then remembers everything you ever talked about.
    Will be combining this with a custom dataset pinecone storage this week (like a book) to create a super powered custom gpt with infinite recall of past convos.
    Would be curious on your take, particularly how to keep the book data universally available to all users but at the same time keeping the past chat data of a particular user totally private but still being able to store both types of data on the free tier pinecone which I can see you are using (and I will be using too).

    • @DataIndependent
      @DataIndependent A year ago

      Nice! That's great. Soon if you have too much information (like in the book example above), you'll need to get good at picking which pieces of previous history you want to parse out. I imagine that won't be too hard in the beginning but it will later on.

    • @PizzaLord
      @PizzaLord A year ago

      @@DataIndependent Doesn't the k variable take care of this? It only returns the top k number in order of relevance that you end up querying.
      Or are you talking about the chat history and not the corpus?
      I see no reason why you would not just specify a k variable of 5 or 10 for the chat history too. For example, if a user was seeking relationship advice and the system knew their entire relationship history and the user said something like "this reminds me of the first relationship that I told you about", it would be easy for the system to do an exact recall of the relationship, the name of the partner, and from there recall everything very quickly using the k variable on the chat history.
      I use relationships as an example because I just trained my system on a book that I wrote called Sex 3.0 (something that GPT knows nothing about) and I am going to be giving it infinite memory and recall this week.

    • @DataIndependent
      @DataIndependent A year ago

      @@PizzaLord Yes, the K variable will help w/ this. My comment was around the chance for more noise to get introduced the more data you have. Ex: More documents creep in that share a close semantic meaning, but aren't actually what you're looking for. For small projects this shouldn't be an issue.
      Nice! That's cool about the project. Let me know how it goes.
      The langchain discord #tools would love to see it too

    • @PizzaLord
      @PizzaLord A year ago

      @@DataIndependent Another thing I will look at, and I think it would be cool if you looked at it too, is certain chat questions triggering an event, like a graphic or a video link being shown, whereby the video can be played without leaving the chat. This can be done either by embedding the video in the chat response area or by having a separate area of the same HTML page which is the multimedia area or pane that gets updated.
      After all, the whole point of LangChain is to be able to chain things together, no? Once you chain things together you can get workflow.
      This gets around one of ChatGPT's main limitations right now, which is that it's text-only in terms of what you can teach it, and the internet loves its visuals and videos.
      Once this event flow stuff is in place, you can easily use it to flow through all kinds of workflows with GPT at the centre, like collecting data in forms and doing quick surveys so you can store users' preferences and opinions about what they might want to get out of an online course that you are teaching, and then storing that in a vector DB. It can become its own platform at that point.

    • @DataIndependent
      @DataIndependent A year ago

      @@PizzaLord You could likely do that by defining a custom tool, grabbing an image based off a URL (or generating one) and then displaying in your chat box. Doing custom tools is interesting and I'm going to look into a video for that.

  • @shinycaroline3722
    @shinycaroline3722 5 months ago

    How do I retrieve the data from the existing index instead of recreating it over and over again? I find the upgraded LangChain Pinecone version has dependency issues. Suppose I have some 10 docs where I initially want to store each doc separately with its id, metadata and embeddings. Then I just need to retrieve its index and query. How should I do that?

  • @rajivraghu9857
    @rajivraghu9857 A year ago

    Excellent 👍

  • @pedrorios6566
    @pedrorios6566 8 months ago

    Every time I run the cell with the embedding class, do I get charged by OpenAI?
    What option can I use to do the embedding load only once (for example, to make queries available through a web application)?

  • @RomuloMagalhaesAutoTOPO
    @RomuloMagalhaesAutoTOPO A year ago

    Great explanation. Thank you.

  • @roshawnbrooks4091
    @roshawnbrooks4091 A year ago

    I tried building this, but I'm having a bunch of dependency problems. I even tried downloading the repo and still have a bunch of dependency problems. Is there something I'm missing?

  • @nattapongthanngam7216
    @nattapongthanngam7216 4 months ago

    Appreciate it!

  • @3278andy
    @3278andy A year ago

    Amazing tutorial Greg! I'm able to reproduce your result in my env. I think in order to ask follow-up questions, chat_history should be handy

  • @lukaszwiktor
    @lukaszwiktor A year ago

    This is gold! Thank you so much!

  • @tianyijia4585
    @tianyijia4585 A year ago

    I have issues running "import pinecone" after installing the Pinecone module with "pip3 install pinecone-client". How can I fix this? Thank you!

  • @amaanqureshi1286
    @amaanqureshi1286 2 months ago

    My question is: Pinecone only stores vectors and not text files, so how do I get the texts back in my program?

  • @Ahmed-ec7zb
    @Ahmed-ec7zb A year ago +1

    What about a video on hosting this on AWS and adding a Front end to make it accessible to clients?

    • @DataIndependent
      @DataIndependent A year ago +1

      I have a video about building a simple web app in 23 minutes using Streamlit which may help! If not, then Vercel seems like another good option. Soon Pynecone will be too, once they add hosting.

  • @user-bv7qv2vj5w
    @user-bv7qv2vj5w 7 months ago

    Thank you for this series. I'm confused about one thing: when querying the db, you passed the text, not its embedding. How does Pinecone know how to embed the text?
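What the question above points at: Pinecone itself only ever sees vectors. The LangChain wrapper keeps a reference to the same embeddings object it was built with and embeds your query string client-side before sending it. A toy illustration with a stand-in embedding function (real code would use `OpenAIEmbeddings`; everything here is hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class ToyVectorStore:
    """Mimics the shape of LangChain's Pinecone wrapper: it stores the
    embedding function at construction time and applies it to every query."""
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.items = []  # (vector, text) pairs

    def add_text(self, text):
        self.items.append((self.embed_fn(text), text))

    def similarity_search(self, query, k=1):
        qv = self.embed_fn(query)  # the query is embedded here, client-side
        ranked = sorted(self.items, key=lambda it: cosine(qv, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]

# Stand-in "embedding": counts a few keywords. Purely illustrative.
def embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in ("pinecone", "vector", "book")]

store = ToyVectorStore(embed)
store.add_text("pinecone is a vector database")
store.add_text("this book is about whales")
# store.similarity_search("tell me about the book") -> the whale sentence
```

So the answer is: Pinecone doesn't embed anything; the client library does, using the embeddings object you passed in.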

  • @JoanSubiratsLlaveria
    @JoanSubiratsLlaveria A year ago

    Excellent video!

  • @retardedpenguin1
    @retardedpenguin1 A year ago +1

    How do you get around rate limits for really large documents? The OpenAI Ada embeddings model can only take up to a certain number of requests / chunk sizes per minute.
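A common answer to the rate-limit question above is batching plus exponential backoff. This is a generic plain-Python sketch; `embed_batch` stands in for a real embeddings call, and production code should catch the client library's specific rate-limit exception rather than bare `Exception`:

```python
import time

def embed_in_batches(texts, embed_batch, batch_size=100, max_retries=5,
                     sleep=time.sleep):
    """Embed `texts` in batches, retrying each batch with exponential backoff.
    `embed_batch` is a hypothetical callable taking a list of strings and
    returning one vector per string."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_batch(batch))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # out of retries: surface the error
                sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    return vectors
```

The `sleep` parameter is injectable so the loop can be tested without actually waiting.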

  • @vishnuvardhanvaka
    @vishnuvardhanvaka A year ago +1

    Is it a fine-tuned model? Because if not, we will be charged a lot for using the OpenAI API.
    Please make a video on a fine-tuned LangChain OpenAI model like text-ada-001

  • @brutalbutler
    @brutalbutler A year ago

    I have a question. In my case I have various books, but when I query one I want information about that one and that one only, not another book that may be similar to it. How should I go about this? Should I have an index in Pinecone for each, or is there a better way to accomplish this?

  • @nuckybad
    @nuckybad A year ago

    Is the output reliable? I have tried my own LangChain app so far with a smaller PDF, and the result was not so good due to wrong answers.

  • @haouasy
    @haouasy A year ago

    Amazing content man, love the diagrams and how you deliver. Absolutely professional.
    Quick question: is the text returned by the chain exactly the same as in the book, or does the OpenAI engine make some touches and improve it?

  • @klammer75
    @klammer75 A year ago +1

    Yes!!! Tku💪🏼

  • @vigneshnagaraj7137
    @vigneshnagaraj7137 A year ago

    len(data[0].page_content) gives 0 characters. How are you getting 176584 characters in your document? Please explain

  • @ralffig3297
    @ralffig3297 Год назад

    Hello
If I want to use a HuggingFace sentence transformer instead of OpenAI embeddings, what would the code look like? Thanks!

  • @tom-greg
    @tom-greg Год назад +2

    Great! What are the limits? How many pages can it handle, and what are the costs?

    • @DataIndependent
      @DataIndependent  Год назад +1

However many pages you want; it's just storage space. Check out Pinecone's pricing for more.

  • @florentinhonorius613
    @florentinhonorius613 Год назад

Which OpenAI API does it use: the GPT-3.5 Turbo or the InstructGPT Davinci model? And which one should I use, given that InstructGPT Davinci costs more than GPT-3.5 Turbo?

  • @quengelbeard
    @quengelbeard 5 месяцев назад

    Hey Greg, great video!
    Do you know if it's possible to automatically create a pinecone db index from code?
    So that you don't have to create them manually
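Yes, indexes can be created programmatically; the Pinecone client exposes a create_index call. A hedged sketch below: the commented client usage is an assumption to check against the current pinecone-client docs (the API changed between versions), while ada-002's 1536-dimension output is the one fixed fact:

```python
# Dimension must match the embedding model's output size
EMBED_DIMS = {"text-embedding-ada-002": 1536}

def index_spec(name, model="text-embedding-ada-002", metric="cosine"):
    """Arguments you'd pass to the Pinecone client's create_index call."""
    return {"name": name, "dimension": EMBED_DIMS[model], "metric": metric}

# Hypothetical usage (verify against your pinecone-client version's docs):
# import pinecone
# pinecone.init(api_key=..., environment=...)
# if "book-index" not in pinecone.list_indexes():
#     pinecone.create_index(**index_spec("book-index"))

print(index_spec("book-index"))
```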

  • @saburspeaks
    @saburspeaks Год назад

    Amazing stuff with these videos

  • @chunhualiao8191
    @chunhualiao8191 Год назад

    I cannot easily get pinecone test API key now due to high demands. What alternatives are available for the vector store?

  • @marmellatadipesca
    @marmellatadipesca Год назад

    which version of python are you using? could not reproduce since the unstructured pdf loader library requires numpy 1.21.2 which for some reason is not listed on the versions supported by python 3.10 (required >=3.7,

    • @DataIndependent
      @DataIndependent  Год назад

Apologies, I don't have the version on there. I'll do that in my videos going forward.

  • @shubhamgupta7730
    @shubhamgupta7730 Год назад +1

I have a doubt; please help me with this.
I am trying to create a chatbot that I provide company information to, so that it will refer to that information and provide answers.
Currently I was trying to achieve this by fine-tuning the OpenAI GPT model, but I am not getting the desired results.
From what I understand, this technique should work for the above use case.
Am I right?

    • @DataIndependent
      @DataIndependent  Год назад +1

      Yes, it would help with that. You just need to pass your company's documents into the loader

    • @shubhamgupta7730
      @shubhamgupta7730 Год назад

      @@DataIndependentThank you for the reply!

  • @christopheryan3999
    @christopheryan3999 Год назад

    Is there any way we can have the embeddings only embed the text alone and not the metadata?

  • @chrisl4211
    @chrisl4211 Год назад

So after you upload the book, how can you get the book back from Pinecone? I wrote a Python app following your code and it uploaded 3 times.

  • @jatinaqua007
    @jatinaqua007 2 месяца назад

Great video! So what if my question is outside the context of the PDF document? Will OpenAI answer it from its general knowledge, or will it simply say that it doesn't know the answer? Either way, can we configure it to respond the way we want?
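LangChain's default QA prompt already instructs the model to say it doesn't know when the answer isn't in the retrieved context, and you can pass your own prompt to the chain to tighten that behaviour. A sketch of such a template (the wording is illustrative, not LangChain's exact default text):

```python
# A template in the spirit of LangChain's QA "stuff" prompt; the exact
# wording here is an assumption you can tune for your own use case.
TEMPLATE = (
    "Use the following pieces of context to answer the question.\n"
    "If the answer is not in the context, just say you don't know; "
    "do not make up an answer.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def build_prompt(context, question):
    """Fill the template with retrieved context and the user's question."""
    return TEMPLATE.format(context=context, question=question)

print(build_prompt("The ship sailed in 1492.", "When did the ship sail?"))
```

Passing a template like this in via the chain's prompt parameter is how you steer whether the model stays strictly inside the document.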

  • @ben4571
    @ben4571 Год назад

I'm finding that, breaking it down into chunks following your method and code above, it's not picking the right documents, or not cross-referencing accurately when the answer to a question can vary. My "book" is actually a complex research paper. Would you have any suggestions on what to tweak in Pinecone to get a more accurate answer?

  • @BlueGiant69202
    @BlueGiant69202 Год назад

    In 1994 Richard E. Osgood created a conversational reading system called "ASK Michael" for Michael Porter's book "The Competitive Advantage of Nations". Please let me know when you can automate the conceptual indexing and question-based indexing of a book including the creation and categorization of relevant questions that a novice that doesn't know any keywords or relevant vocabulary can ask.

  • @lnyxiux9654
    @lnyxiux9654 Год назад +1

    Thanks for sharing !

    • @DataIndependent
      @DataIndependent  Год назад

      Nice! Glad it worked out

    • @lnyxiux9654
      @lnyxiux9654 Год назад

@@DataIndependent Yep! It was a bit of a pain to get unstructured properly set up, but after that it's all good. Impressive results very quickly!

    • @DataIndependent
      @DataIndependent  Год назад +1

      @@lnyxiux9654 I shared the same pain...that part didn't make it to the video

  • @ininodez
    @ininodez Год назад +2

Great video!! Loved your explanation. Could you create another video on how to estimate the costs? Does the process of turning the documents into embeddings with OpenAI run every time you ask a new question, or just the first time? Thanks!

    • @silent.-killer
      @silent.-killer Год назад

Pinecone is basically a search engine for AI. The model doesn't need the entire book, just relevant segments of it. This saves a lot of tokens because only those segments end up in the prompt.
It's like adding some information to GPT's short-term memory.
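On the cost question in this thread: the documents are embedded once, when you load them into Pinecone; after that, only each new question gets embedded. A rough back-of-the-envelope estimator for the one-time embedding cost (the ~4 characters per token rule of thumb and the price constant are assumptions; check OpenAI's pricing page for current numbers):

```python
def estimate_embedding_cost(num_chars, price_per_1k_tokens=0.0001):
    """Rough one-time embedding cost in dollars. Assumes ~4 characters per
    token; the default price is a placeholder, not an official figure."""
    tokens = num_chars / 4
    return tokens / 1000 * price_per_1k_tokens

# e.g. a ~400,000-character book
print(estimate_embedding_cost(400_000))
```

Per-question cost is then dominated by the completion call, whose prompt contains only the few retrieved chunks rather than the whole book.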

  • @ahmadmoner
    @ahmadmoner Год назад

What will happen if you ask it something outside of the book's knowledge? Also, how do you restrict it from going outside of the book?

  • @TonyHoangPodcast
    @TonyHoangPodcast Год назад

Using your guide and trying to use Streamlit for the front end.
When I try to swap the query variable for a Streamlit text area (below):
query = st.text_area('Input')
I get this error: "Failed to connect; did you specify the correct index name?"
Any idea how to fix it? When I switch the text_area back to a plain variable, it works.

  • @rayxiao460
    @rayxiao460 Год назад

Very impressive. Great job.

  • @valdinia-office2910
    @valdinia-office2910 Год назад

In LangChain, is "similarity search" used as a synonym for "semantic search", or do they refer to different types of search?
    To my knowledge similarity search focuses on finding items that are similar based on their features or characteristics, while semantic search aims to understand the meaning and intent behind the query to provide contextually relevant results

  • @kevinmulia4605
    @kevinmulia4605 Год назад

    Can we scrape from a directory of PDF files rather than individual files?
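Yes: you can either glob the directory yourself and run a loader per file, or use LangChain's DirectoryLoader. A sketch of the manual route (the commented loader usage assumes the langchain package and the UnstructuredPDFLoader from the video; only the path-collecting helper actually runs here):

```python
from pathlib import Path

def find_pdfs(folder):
    """Collect every PDF under a folder, recursively."""
    return sorted(str(p) for p in Path(folder).rglob("*.pdf"))

# Hypothetical usage: feed each path to a loader and merge the documents.
# from langchain.document_loaders import UnstructuredPDFLoader
# docs = []
# for path in find_pdfs("books/"):
#     docs.extend(UnstructuredPDFLoader(path).load())
```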

  • @rishi.b
    @rishi.b Год назад +1

Is there any limit on the number/size of the documents that can be uploaded so that the model still performs efficiently? I'm guessing that with larger sizes, the cosine similarity search might take more computation time.

    • @DataIndependent
      @DataIndependent  Год назад

Ya, it likely would take longer. I haven't seen a limit yet. At that point it's an engineering problem rather than an LLM/LangChain situation.

  • @roberthahn9040
    @roberthahn9040 Год назад

    Really awesome video!

    • @DataIndependent
      @DataIndependent  Год назад

      Nice!! Thank you - what else do you want to see?

  • @shinycaroline3722
    @shinycaroline3722 5 месяцев назад

    Greg can you tell me which vector db would work well for prod?