Query Your Data with GPT-4 | Embeddings, Vector Databases | Langchain JS Knowledgebase
- Published: 24 Oct 2024
- Introduction to LangChain JavaScript embeddings, vector storage, and similarity search. How to create GPT-3/GPT-4 chatbots that can contextually reference your data (txt, JSON, webpages, PDF) with embeddings. Discussion of embeddings and vector storage options such as Pinecone, Chroma, LangChain, Supabase, and Weaviate.
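As background for the similarity-search part of the video: a minimal sketch of how a vector store ranks stored embeddings against a query, assuming cosine similarity as the distance measure. The `cosineSimilarity` and `topK` helpers are illustrative, not part of any LangChain API:

```typescript
// Sketch of what a similarity search does under the hood: embeddings are
// just vectors of numbers, and "closest" usually means highest cosine
// similarity to the query embedding.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored vectors by similarity to a query and return the top-k indices.
function topK(query: number[], vectors: number[][], k: number): number[] {
  return vectors
    .map((v, i) => ({ i, score: cosineSimilarity(query, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((r) => r.i);
}
```

Real vector stores (Pinecone, Chroma, etc.) use approximate nearest-neighbor indexes instead of this brute-force scan, but the ranking idea is the same.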
Intro Call cal.com/starmo...
1 hr consulting cal.com/starmo...
🌐 Our Official Website: starmorph.com
Langchain Resources
Langchain JS Docs: js.langchain.c...
OpenAI Embeddings Docs: platform.opena...
This kind of content is why I'm so glad to have subscribed. Thanks especially for going through the various vector storage options.
Thank you! I'm really glad it was helpful. I know everyone has been asking for more about creating bots with specific data.
This is a great overview but I’ve been scouring the web for something more advanced. I want to get a deeper understanding of chunking and embedding strategies for specific use cases, as well as learn how to encode document semantic structure in embeddings.
Thank you, and I hear you. I would recommend checking out Jerry Liu from LlamaIndex.
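On chunking strategies: a minimal fixed-size chunker with overlap, the simplest of the strategies being asked about above. The `chunkText` helper and the sizes in it are illustrative; real splitters such as LangChain's RecursiveCharacterTextSplitter also try to break on separators like paragraphs and sentences:

```typescript
// Fixed-size chunking with overlap: consecutive chunks share `overlap`
// characters so that a sentence cut at a chunk boundary still appears
// whole in at least one chunk.

function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Choosing chunk size is a trade-off: bigger chunks keep more context together, smaller chunks make retrieval more precise.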
@starmorph please make a video on how to smartly control token usage.
Thank you! Excited to see this continue to progress.
Thank you, appreciate it 🙂
My strategy is using Postgres with the pgvector extension (I think this is what Supabase is using).
Also, it's nice to know about the document loader; I'm going to try that.
How has your experience been with pgvector? Very excited to try it out.
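For reference, the pgvector setup mentioned above looks roughly like this. This is a sketch only: the table and column names are made up, and a real OpenAI embedding column would be `vector(1536)` rather than the toy `vector(3)` used here to keep the example short:

```sql
-- Illustrative pgvector schema and query (names are hypothetical).
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(3)  -- use vector(1536) for text-embedding-ada-002
);

-- Nearest neighbors by cosine distance to a query embedding:
SELECT content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector
LIMIT 5;
```

Supabase's vector support is built on pgvector, so the same operators apply there.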
another amazing video! Thanks a lot
You're welcome, glad it was helpful!!
How does LangChain compare to Haystack which has been around for years?
Great! I'm trying to create custom tools and it's been expensive to use the API without embeddings. I hope I can move forward. Thank you!
You’re welcome good luck with your projects!
Great Video!
I have a question if you don't mind.
I have a script that queries the LangChain documentation via an HTML loader. I do text splitting, then create the embeddings, then store them in a vector database, and at the end I query the documents using a "ConversationalRetrievalChain" with the GPT-3.5 Turbo, GPT-4, or text-davinci-003 model.
Is this the correct approach? I ask because I see people doing just text splitting, storing on a vector database, and querying the data using text-davinci-003.
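The pipeline in the question (load, split, embed, store, retrieve, then answer) looks like the standard retrieval-augmented approach; ConversationalRetrievalChain mainly adds condensing the question with chat history before retrieval. A dependency-free sketch of the retrieval step, where a toy word-count "embedding" stands in for OpenAIEmbeddings (all names here are illustrative, not LangChain APIs):

```typescript
// Toy "embedding": a word-frequency map standing in for a real embedding
// model, so the retrieval flow can be shown without any API calls.
function embed(text: string): Map<string, number> {
  const v = new Map<string, number>();
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    v.set(word, (v.get(word) ?? 0) + 1);
  }
  return v;
}

// Unnormalized dot product is enough for a toy ranking.
function similarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) ?? 0);
  return dot;
}

// "Vector store": embed each chunk once, then rank chunks for a question.
// The top chunks would be stuffed into the LLM prompt, which is roughly
// what a retrieval chain does after condensing the question.
function retrieve(question: string, chunks: string[], k: number): string[] {
  const store = chunks.map((text) => ({ text, vector: embed(text) }));
  const q = embed(question);
  return store
    .sort((x, y) => similarity(q, y.vector) - similarity(q, x.vector))
    .slice(0, k)
    .map((d) => d.text);
}
```

Swapping the toy embedding for a real model and the array for a vector database gives the pipeline described in the comment.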
Really interesting work
thank you 🙏
awesome, keep it up!
thank you!! Will do
Mark, thanks! Great video.
Thank you
Brother, do some tutorials on this topic: take one company policy PDF and question the app for answers.
Good idea, will do.
I would also love to see it; waiting for it, cheers!
Great work, do you have a public github with such project?
Thank you! We have many public repos on our GitHub: github.com/starmorph
Great, Thank you
you're welcome, glad it was helpful!
What kind of developer is one that primarily runs localhost servers and uses them to serve their bot through RSS?
Also, long term I may share files over IPFS.
It depends on what you use to create the localhost server; it could be done with Node.js, Python, or other frameworks.
Is there a way to add the embeddings to the ChatGPT dataset? And when asking it questions about your document, can it also work out answers from the document? For example: a math test document with only the questions and possible answers, where ChatGPT can answer those questions.
When I try this the chat-with-data way, the API says it doesn't know, since the document doesn't provide the answers to the questions. Of course, when asking ChatGPT what 5+2 is, it knows the answer, but not when asking about the embedded math test document.
If the answers aren't defined in the dataset, some other options would be fine-tuning the model (instead of embeddings), or using another tool like the LangChain calculator or Wolfram Alpha to help with the math part.
Can we use real-world JSON data for embedding? It includes id, name, username, email, addresses, etc.
Yes, there is a new JSON loader in LangChain: js.langchain.com/docs/api/document_loaders_fs_json/classes/JSONLoader
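Before passing such records to an embedding step, each JSON object typically gets flattened into a text string, with the id kept as metadata so search hits can be traced back to the source record. A minimal sketch with made-up field names; LangChain's JSONLoader does something similar via JSON pointers:

```typescript
// Flatten a structured record into embeddable text plus metadata.
// The UserRecord shape and field names are hypothetical examples.

type UserRecord = { id: number; name: string; username: string; email: string };

function recordToDocument(r: UserRecord): { pageContent: string; metadata: { id: number } } {
  return {
    pageContent: `name: ${r.name}; username: ${r.username}; email: ${r.email}`,
    metadata: { id: r.id },
  };
}
```

Keeping identifiers in metadata rather than in the embedded text avoids polluting the embedding with values that carry no semantic meaning.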
Who the hell has access to GPT-4 in the API? They sure as hell don't seem to be giving me access anytime soon.
Fair enough. This works with GPT-3 or 3.5 as well.
I'm waiting too 😢 (the Supabase team are the best)
Does LangChain support other open-source models such as LLaMA?
Yes, in particular the Python version of LangChain has LLaMA and lots of other models:
python.langchain.com/en/latest/modules/models/llms/integrations/llamacpp.html
Great video!
Do we need an API key for all of this, or for using all models in LangChain?
Thank you! Yes, you need an API key for using OpenAI (you can create one on their website). You don't need one if you use a locally hosted model like Alpaca instead of GPT.
@starmorph Because in my country OpenAI doesn't offer their services.
10:39 start here
Awesome videos! I've been learning a lot from you lately.
Recently I was trying to create a vector store with HNSWLib, as it looked like the simplest of all to get started with for a pet project, but it's giving me an error saying that I should install the package... Obviously it's installed; I'm using Next.js v13+ with TypeScript. I know that most of these stores only work in a Node.js environment, but Next is built on top of Node. I'm also using other tools from LangChain that only work in Node, and they work fine in my Next app as well. Has anyone come across this issue before?
I have run into similar issues with strange package dependency errors in JavaScript LangChain. My two best ideas are:
1. Make sure you are using HNSWLib imported from langchain (rather than the standalone package). I'm not sure if that requires the external hnswlib package as well.
2. Try to find an open-source repo with examples, as sometimes the environment config (e.g. next.config.js) is sensitive and needs very specific settings to support the experimental tech used in LangChain.
Also thank you for watching and glad it’s been helpful!! 🙏🏼
any chance you will open source a starter?
Yes working on this
So we are all going to pretend that OpenAI isn't going to go through our proprietary data and use it to benefit their startups?
I think it's an important point concerning how centralized the data is becoming, and I hope to support alternative options like LLaMA and Alpaca in the future.