Query Your Data with GPT-4 | Embeddings, Vector Databases | Langchain JS Knowledgebase

  • Published: 24 Oct 2024
  • Introduction to LangChain JavaScript embeddings, vector storage, and similarity search. How to create GPT-3/GPT-4 chatbots that can contextually reference your data (TXT, JSON, web pages, PDF) with embeddings. Discussion of embeddings and vector storage options such as Pinecone, Chroma, LangChain, Supabase, and Weaviate.
    Intro Call cal.com/starmo...
    1 hr consulting cal.com/starmo...
    🌐 Our Official Website: starmorph.com
    Langchain Resources
    Langchain JS Docs: js.langchain.c...
    OpenAI Embeddings Docs: platform.opena...
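The description above covers embeddings, vector storage, and similarity search. Here is a minimal sketch of what a vector store does at query time, assuming embeddings are plain `number[]` vectors: store each document with its vector, then rank documents by cosine similarity to the query vector. This is not LangChain's API. `InMemoryVectorStore` and the keyword-count `toyEmbed` function are hypothetical stand-ins, and a real app would get its vectors from an embeddings model such as OpenAI's.

```typescript
// Hypothetical document shape: text plus its embedding vector.
type Doc = { id: string; text: string; vector: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory store: a linear scan over all stored vectors.
class InMemoryVectorStore {
  private docs: Doc[] = [];

  add(doc: Doc): void {
    this.docs.push(doc);
  }

  // Return the k documents most similar to the query vector, best first.
  search(query: number[], k: number): { doc: Doc; score: number }[] {
    return this.docs
      .map((doc) => ({ doc, score: cosineSimilarity(query, doc.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}

// Hypothetical toy "embedding": counts of a few keywords. Illustration only.
const VOCAB = ["pinecone", "chroma", "supabase", "weaviate", "pdf", "json"];
function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

const store = new InMemoryVectorStore();
const corpus: [string, string][] = [
  ["a", "Pinecone and Weaviate are hosted vector databases"],
  ["b", "Load a PDF or JSON file as documents"],
  ["c", "Supabase uses Postgres for vector storage"],
];
for (const [id, text] of corpus) {
  store.add({ id, text, vector: toyEmbed(text) });
}

const hits = store.search(toyEmbed("which vector database is Pinecone"), 1);
console.log(hits[0].doc.id); // "a"
```

Hosted options like Pinecone or Weaviate do the same ranking at scale, typically with approximate nearest-neighbor indexes instead of this linear scan.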

Comments • 52

  • @zalmykarimi5255 · 1 year ago +4

    This kind of content is why I'm so glad I subscribed. Thanks especially for going through the various vector storage options.

    • @starmorph · 1 year ago +2

      Thank you, I'm really glad it was helpful. I know everyone has been asking for more about creating bots with their own data.

  • @Jaybearno · 11 months ago

    This is a great overview, but I've been scouring the web for something more advanced. I want to get a deeper understanding of chunking and embedding strategies for specific use cases, as well as learn how to encode a document's semantic structure in embeddings.

    • @starmorph · 10 months ago

      Thank you, and I hear you. I would recommend checking out Jerry Liu from LlamaIndex.
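One common chunking strategy, sketched here assuming chunk size is measured in characters: split text into fixed-size chunks that overlap, so text cut at a chunk boundary also appears, with some surrounding context, at the start of the next chunk. The hypothetical `chunkBySection` helper shows one simple way to keep some document structure: it splits on markdown `# ` headings and prefixes each chunk with its section title before embedding. Neither function is LangChain's text splitter.

```typescript
// Fixed-size character chunks; consecutive chunks share `overlap` characters.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each new chunk advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// Hypothetical helper: split on "# " headings, chunk each section, and
// prefix every chunk with its section title so the embedding carries context.
function chunkBySection(markdown: string, chunkSize: number, overlap: number): string[] {
  const out: string[] = [];
  let title = "";
  let body: string[] = [];
  const flush = () => {
    const text = body.join("\n").trim();
    if (text) {
      for (const c of chunkText(text, chunkSize, overlap)) {
        out.push(title ? `${title}\n${c}` : c);
      }
    }
    body = [];
  };
  for (const line of markdown.split("\n")) {
    if (line.startsWith("# ")) {
      flush(); // finish the previous section under its own title
      title = line.slice(2).trim();
    } else {
      body.push(line);
    }
  }
  flush();
  return out;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```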

  • @obaidmuneer2273 · 1 year ago +4

    @starmorph please make a video on how to smartly control token usage.
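Here is a minimal sketch of one way to keep token usage bounded in a retrieval bot: cap the retrieved context at a token budget, keeping the highest-ranked chunks first. It assumes roughly 4 characters per token, a common rule of thumb for English text; exact counts need the model's tokenizer (for example a tiktoken port). `estimateTokens` and `fitToBudget` are hypothetical names.

```typescript
// Rough token estimate. Assumption: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the highest-ranked chunks (input is assumed sorted best first) until
// the next one would exceed the budget, so prompt size stays bounded.
function fitToBudget(rankedChunks: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const chunk of rankedChunks) {
    const cost = estimateTokens(chunk);
    if (used + cost > maxTokens) break;
    kept.push(chunk);
    used += cost;
  }
  return kept;
}

const chunks = ["a".repeat(40), "b".repeat(40), "c".repeat(40)]; // ~10 tokens each
console.log(fitToBudget(chunks, 25).length); // 2
```

Stopping at the first chunk that does not fit preserves the ranking order; skipping it and trying smaller, lower-ranked chunks is an alternative if you would rather fill the budget.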

  • @ThingEngineer · 1 year ago

    Thank you! Excited to see this continue to progress.

    • @starmorph · 1 year ago +1

      Thank you, appreciate it 🙂

  • @henzo1735 · 1 year ago +1

    My strategy is using Postgres with the pgvector extension (I think this is what Supabase uses).
    Also, it's nice to know about the document loader; I'm going to try that.

    • @starmorph · 1 year ago

      How has your experience been with pgvector? Very excited to try it out.
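With pgvector, nearest-neighbor search is plain SQL. pgvector's `<=>` operator is cosine distance (`<->` is L2 distance), and vectors can be passed as text literals like `[0.1,0.2,0.3]`. The sketch below assumes a hypothetical `documents(id, content, embedding vector(n))` table and only builds the query. The `{ text, values }` object has the shape that node-postgres accepts in `client.query`.

```typescript
// Format a vector as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
function toPgVectorLiteral(v: number[]): string {
  return `[${v.join(",")}]`;
}

// Hypothetical table: documents(id, content, embedding vector(n)).
// Orders rows by cosine distance (smaller means more similar).
function buildNearestNeighborQuery(
  embedding: number[],
  k: number,
): { text: string; values: (string | number)[] } {
  return {
    text:
      "SELECT id, content, embedding <=> $1::vector AS distance " +
      "FROM documents ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toPgVectorLiteral(embedding), k],
  };
}

const q = buildNearestNeighborQuery([0.1, 0.2, 0.3], 5);
console.log(q.values[0]); // "[0.1,0.2,0.3]"
```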

  • @DanielFankhauser · 1 year ago

    Another amazing video! Thanks a lot.

    • @starmorph · 1 year ago

      You're welcome, glad it was helpful!!

  • @Fluffynix · 1 year ago

    How does LangChain compare to Haystack which has been around for years?

  • @braidata · 1 year ago

    Great! I'm trying to create custom tools, and it's been expensive to use the API without embeddings. I hope I can move forward. Thank you!

    • @starmorph · 1 year ago +1

      You're welcome, good luck with your projects!

  • @infamousrockstar7 · 1 year ago

    Great video!
    I have a question, if you don't mind.
    I have a script that queries the LangChain documentation via an HTML loader. I do text splitting, then create the embeddings, then store them in a vector database, and at the end I query the documents using a "ConversationalRetrievalChain" with the GPT-3.5 Turbo model, GPT-4, or text-davinci-003.
    Is this the correct approach? I see people doing just text splitting and storing in a vector database, and then querying the data using text-davinci-003.
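As far as I understand it, the main thing `ConversationalRetrievalChain` adds over plain splitting, storing, and querying is chat history: it first rewrites the follow-up question into a standalone question, retrieves chunks for that, and then answers from the retrieved context. The sketch below shows the two prompts involved as plain string builders. The prompt wording and the `condensePrompt`/`answerPrompt` names are hypothetical, not LangChain's internals.

```typescript
type Turn = { question: string; answer: string };

// Step 1 (hypothetical wording): ask the model to turn a follow-up question
// into a standalone question, using the conversation so far.
function condensePrompt(history: Turn[], followUp: string): string {
  const chat = history
    .map((t) => `Human: ${t.question}\nAssistant: ${t.answer}`)
    .join("\n");
  return [
    "Given the conversation below, rephrase the follow-up question as a standalone question.",
    chat,
    `Follow-up question: ${followUp}`,
    "Standalone question:",
  ].join("\n\n");
}

// Step 2: after retrieving chunks for the standalone question, build the
// answer prompt with the numbered chunks as context.
function answerPrompt(contextChunks: string[], standaloneQuestion: string): string {
  return [
    "Answer using only the context below. If the answer is not in the context, say you don't know.",
    contextChunks.map((c, i) => `[${i + 1}] ${c}`).join("\n"),
    `Question: ${standaloneQuestion}`,
  ].join("\n\n");
}

console.log(
  condensePrompt(
    [{ question: "What is pgvector?", answer: "A Postgres extension for vectors." }],
    "Does Supabase use it?",
  ),
);
```

For one-off questions with no history, the first step can be skipped, which is effectively the simpler pipeline described in the question.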

  • @tomhurford8177 · 1 year ago

    Really interesting work

  • @brezl8 · 1 year ago

    Awesome, keep it up!

  • @efficiencydna897 · 1 year ago

    Mark, thanks! Great video.

  • @sourav_-_7038 · 1 year ago +1

    Brother, do some tutorials on this topic: take one company policy PDF and ask the app questions about it.

    • @starmorph · 1 year ago +1

      Good idea, will do.

    • @Zcy7th · 1 year ago

      I would also love to see it. Waiting for it, cheers!

  • @aiContent420 · 1 year ago +1

    Great work, do you have a public github with such project?

    • @starmorph · 1 year ago

      Thank you! We have many public repos on our GitHub: github.com/starmorph

  • @Yanikikudon · 1 year ago

    Great, thank you.

    • @starmorph · 1 year ago

      You're welcome, glad it was helpful!

  • @SRWeaverPoetry · 1 year ago

    What kind of developer is one who primarily runs localhost servers and uses them to run their bot through an RSS feed?
    Also, long-term I may share files over IPFS.

    • @starmorph · 1 year ago

      It depends on what you use to create the localhost server; it could be done with Node.js, Python, or other frameworks.
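As one concrete example of the Node.js option, here is a minimal sketch of a localhost server that serves a bot's output as an RSS 2.0 feed, using only Node's built-in `node:http` module. The `Item` shape, `buildRss`, and `startServer` are hypothetical names, and the feed contents would come from whatever your bot produces.

```typescript
import { createServer } from "node:http";

// Hypothetical feed item produced by the bot.
type Item = { title: string; link: string; description: string };

// Escape characters that are special in XML text and attributes.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build an RSS 2.0 document; a channel needs title, link, and description.
function buildRss(channelTitle: string, channelLink: string, items: Item[]): string {
  const entries = items
    .map(
      (it) =>
        `<item><title>${escapeXml(it.title)}</title><link>${escapeXml(it.link)}</link>` +
        `<description>${escapeXml(it.description)}</description></item>`,
    )
    .join("");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<rss version="2.0"><channel><title>${escapeXml(channelTitle)}</title>` +
    `<link>${escapeXml(channelLink)}</link>` +
    `<description>${escapeXml(channelTitle)}</description>${entries}</channel></rss>`
  );
}

// Serve the feed on localhost only; getItems is called on every request.
function startServer(port: number, getItems: () => Item[]) {
  const server = createServer((_req, res) => {
    res.writeHead(200, { "Content-Type": "application/rss+xml; charset=utf-8" });
    res.end(buildRss("My bot", `http://localhost:${port}/`, getItems()));
  });
  server.listen(port, "127.0.0.1");
  return server;
}
```

Calling `startServer(3000, () => items)` would serve the feed at `http://localhost:3000/`, which any RSS reader on the same machine can subscribe to.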

  • @rikhendrix261 · 1 year ago

    Is there a way to add the embeddings to the ChatGPT dataset, so that when asking it questions about your document, it can also validate the document? For example, given a math test document with only the questions and possible answers, could ChatGPT answer those questions?

    • @rikhendrix261 · 1 year ago

      When I try this the Chat With Data way, the API says it doesn't know, since the document doesn't provide the answers to the questions. Of course, when you ask ChatGPT what 5+2 is, it knows the answer, but not when you ask about the embedded math test document.

    • @starmorph · 1 year ago

      If the answers aren't defined in the dataset, some other options would be fine-tuning the model (instead of using embeddings), or using another tool like the LangChain calculator or Wolfram Alpha to help with the math part.
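Here is a minimal sketch of the tool idea from the reply: route questions that contain simple arithmetic to a calculator, and send everything else to retrieval over the embedded documents. This is a hypothetical router, not the LangChain calculator tool, and it only handles a single binary operation such as `5+2`.

```typescript
type Route =
  | { kind: "calculator"; result: number }
  | { kind: "retrieval"; question: string };

// Hypothetical router: "<number> <op> <number>" anywhere in the question goes
// to the calculator; everything else falls through to document retrieval.
function route(question: string): Route {
  const m = question.match(/(-?\d+(?:\.\d+)?)\s*([+\-*/])\s*(-?\d+(?:\.\d+)?)/);
  if (!m) return { kind: "retrieval", question };
  const ops: Record<string, (x: number, y: number) => number> = {
    "+": (x, y) => x + y,
    "-": (x, y) => x - y,
    "*": (x, y) => x * y,
    "/": (x, y) => x / y,
  };
  return { kind: "calculator", result: ops[m[2]](Number(m[1]), Number(m[3])) };
}

console.log(route("What is 5+2?")); // { kind: 'calculator', result: 7 }
```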

  • @aurobindobhuyan2107 · 1 year ago

    Can we use real-world JSON data for embedding, which includes id, name, username, email, addresses, etc.?

    • @starmorph · 1 year ago

      Yes, there is a new JSON loader in LangChain: js.langchain.com/docs/api/document_loaders_fs_json/classes/JSONLoader
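Before JSON records are embedded, they are usually turned into text. The following is a minimal sketch, not the JSONLoader's actual behavior: it flattens each record into `path: value` lines and returns an object shaped like a LangChain document (`pageContent`, `metadata`). The `omit` parameter is a hypothetical option for leaving fields such as emails out of the text sent to the embeddings provider.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Flatten nested JSON into "path: value" lines, e.g. "address.city: Oslo".
function flatten(value: Json, prefix = ""): string[] {
  if (value === null || typeof value !== "object") {
    return [`${prefix}: ${String(value)}`];
  }
  if (Array.isArray(value)) {
    return value.flatMap((v, i) => flatten(v, `${prefix}[${i}]`));
  }
  return Object.entries(value).flatMap(([k, v]) =>
    flatten(v, prefix ? `${prefix}.${k}` : k),
  );
}

// Hypothetical converter: drop top-level fields listed in `omit`, then build
// a document-shaped object whose text is the flattened record.
function recordToDocument(record: { [key: string]: Json }, omit: string[] = []) {
  const kept = Object.fromEntries(
    Object.entries(record).filter(([k]) => !omit.includes(k)),
  );
  return { pageContent: flatten(kept).join("\n"), metadata: { id: record["id"] ?? null } };
}

const doc = recordToDocument(
  { id: 1, name: "Ann", email: "ann@example.com", address: { city: "Oslo" } },
  ["email"],
);
console.log(doc.pageContent); // "id: 1\nname: Ann\naddress.city: Oslo"
```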

  • @christopherchilton-smith6482 · 1 year ago +4

    Who the hell has access to GPT-4 in their API? They sure as hell don't seem to be giving me access anytime soon.

    • @starmorph · 1 year ago

      Fair enough, this works with GPT-3 or GPT-3.5 as well.

    • @drancerd · 1 year ago

      I'm waiting too 😢 (the Supabase team are the best)

  • @elghali6687 · 1 year ago

    Does LangChain support other open-source models such as LLaMA?

    • @starmorph · 1 year ago +1

      Yes, in particular the Python version of LangChain supports Llama and lots of other models:
      python.langchain.com/en/latest/modules/models/llms/integrations/llamacpp.html

  • @luis96xd · 1 year ago

    Great video!
    Do we need an API key for all of this, or for using all of LangChain's models?

    • @starmorph · 1 year ago +2

      Thank you! Yes, you need an API key to use OpenAI (you can create one on their website). You don't need one if you use a locally hosted model like Alpaca instead of GPT.

    • @luis96xd · 1 year ago +1

      @starmorph Because OpenAI doesn't offer its services in my country.
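Here is a minimal sketch of the choice described in the reply: use OpenAI when an `OPENAI_API_KEY` is present, otherwise fall back to a locally hosted model. `LOCAL_MODEL_URL` and the `http://localhost:8080` default are hypothetical; use whatever address your local model server exposes.

```typescript
type ModelConfig =
  | { provider: "openai"; apiKey: string }
  | { provider: "local"; baseUrl: string };

// Pick a provider from environment variables. LOCAL_MODEL_URL and the default
// URL are hypothetical placeholders for a locally hosted model server.
function pickModel(env: Record<string, string | undefined>): ModelConfig {
  const key = env.OPENAI_API_KEY;
  if (key && key.trim() !== "") return { provider: "openai", apiKey: key };
  return { provider: "local", baseUrl: env.LOCAL_MODEL_URL ?? "http://localhost:8080" };
}

console.log(pickModel({}).provider); // "local"
```

In Node.js you would call it as `pickModel(process.env)`.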

  • @Edu4Dev · 1 year ago

    Start at 10:39.

  • @benji_dev · 1 year ago

    Awesome videos! I've been learning a lot from you lately.
    Recently, I was trying to create a vector store with HNSWLib, as it looked like the simplest option for getting started with a pet project. However, it gives me an error saying that I should install the package... It's definitely installed; I'm using Next.js v13+ with TypeScript. I know that most of these stores only work in a Node.js environment, but Next is built on top of Node. I'm also using other LangChain tools that only work on Node, and they work fine in my Next app. Has anyone come across this issue before?

    • @starmorph · 1 year ago +1

      I have run into similar issues with strange package dependency errors in LangChain JS. My two best ideas are:
      1. Make sure you are using HNSWLib imported from LangChain (rather than the standalone package). I'm not sure whether that also requires the external HNSWLib package.
      2. Try to find an open-source repo with examples, as the environment config (e.g. next.config) is sometimes sensitive and needs very specific settings to support the experimental tech used in LangChain.

    • @starmorph · 1 year ago +1

      Also, thank you for watching, and I'm glad it's been helpful!! 🙏🏼

  • @kendrickcaranicas4562 · 1 year ago

    Any chance you will open-source a starter?

  • @MogulSuccess · 1 year ago

    So are we all going to pretend that OpenAI isn't going to go through our proprietary data and use it to benefit their startups?

    • @starmorph · 1 year ago

      I think it is an important point about how centralized the data is becoming, and I hope to support alternative options like LLaMA and Alpaca in the future.