Semantic Search with Open-Source Vector DB: Chroma DB | Pinecone Alternative | Code

Pradip Nichite

23K views

Published: Aug 22, 2024

Comments • 47

@Andromeda26_ 9 months ago ⁺³
Thank you so much for sharing the details, Pradip! Your informative YouTube videos have been incredibly helpful. Great job putting together such valuable content! Keep up the outstanding work and continue enlightening us. We truly appreciate your contributions!
@FutureSmartAI 9 months ago ⁺¹
My pleasure!
@avishsharma8852 11 months ago ⁺¹
Where is the coffee button? :) Great work! Let me know if you offer paid one-on-one tutorials.
@FutureSmartAI 11 months ago
Hi, thanks! No, I don't offer one-on-one coaching. Nowadays I'm very busy with freelancing work.
@zaheerbeg4810 1 year ago ⁺¹
Adorable, thanks #PradipNichite for your time and efforts 👍👍👍
@moralstorieskids3884 1 year ago ⁺¹
Thanks for your efforts. Waiting for ChromaDB with LangChain, similar to your previous video (Pinecone with LangChain).
@FutureSmartAI 1 year ago
Just published: ruclips.net/video/5NG8mefEsCU/видео.html
@satheeshthangaraj5614 1 year ago ⁺²
Great 👍
@FutureSmartAI 1 year ago
Thanks for the visit
@notmeno4881 10 months ago
You know you know you know you know you know
@prasanosara1944 1 year ago ⁺²
Thanks Pradip for the great tutorial! I have a couple of questions: 1. Is ChromaDB the best for semantic search? Could you please do a comparison video of vector DBs? 2. These semantic searches return too many unwanted and unrelated results. How can I filter them out? Kindly help.
@FutureSmartAI 1 year ago
What use case do you have? You can try a different embedding model and apply a threshold on the semantic score.
@prasanosara1944 1 year ago
@@FutureSmartAI Thanks for the response! For example, I need to get the set of survey questions relevant to a group of people who can be described as "Single men at the age of 45". The vector search gives higher priority to questions such as "What is the age of your children?" instead of "What is your age?" from the list of questions I send as part of the prompt.
Kindly let me know how I can control this.
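A minimal sketch of the threshold idea from the reply above, assuming a Chroma-style `collection.query()` result: lists of lists, one inner list per query text, with documents and distances included. Chroma reports distances, so smaller means closer. The helper name `filter_by_distance` is illustrative (not part of Chroma), and the cutoff is a hypothetical value you would tune per embedding model and distance space.

```python
from typing import Any, Dict, List, Tuple


def filter_by_distance(
    results: Dict[str, Any], max_distance: float, query_index: int = 0
) -> List[Tuple[str, str, float]]:
    """Return (id, document, distance) hits for one query whose distance is <= max_distance.

    Assumes `results` has the shape returned by Chroma's collection.query():
    results["ids"][i], results["documents"][i] and results["distances"][i]
    hold the hits for the i-th query text, already sorted nearest-first.
    """
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    dists = results["distances"][query_index]
    # Smaller distance means more similar, so keep only hits under the cutoff.
    return [(i, d, dist) for i, d, dist in zip(ids, docs, dists) if dist <= max_distance]
```

A threshold only drops weak matches; it will not push "What is the age of your children?" below "What is your age?". Fixing that ranking is where trying a different embedding model, the other suggestion in the reply, comes in.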
@vinven7 1 year ago ⁺¹
Thanks so much Pradip for this great video and explanation of ChromaDB! The code file you shared seems to be missing the section where you store the DB in a local directory. Could you also explain how to upsert into a collection when the document corpus is very large?
@FutureSmartAI 1 year ago
import chromadb
from chromadb.config import Settings

# Legacy configuration for chromadb < 0.4; newer releases use chromadb.PersistentClient instead
client = chromadb.Client(Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory="pet_db"  # Optional, defaults to .chromadb/ in the current directory
))
Here persist_directory="pet_db" is the local directory.
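For chromadb 0.4 and later, the `Settings(chroma_db_impl=...)` form above was replaced by `chromadb.PersistentClient(path=...)`, which also answers where the files go: into the directory passed as `path`. The sketch below assumes that newer API. The helper names `open_persistent_collection` and `upsert_in_batches` and the batch size of 100 are illustrative assumptions, not Chroma defaults. Upserting in fixed-size slices keeps each call small for a large corpus; `upsert` adds new ids and overwrites existing ones.

```python
from typing import Any, Optional, Sequence


def open_persistent_collection(path: str, name: str) -> Any:
    """Open (or create) a collection stored on disk under `path`.

    Assumes chromadb >= 0.4, where PersistentClient replaced the
    Settings(chroma_db_impl=..., persist_directory=...) configuration.
    """
    import chromadb  # imported here so the batching helper works without chromadb

    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(name=name)


def upsert_in_batches(
    collection: Any,
    ids: Sequence[str],
    documents: Sequence[str],
    metadatas: Optional[Sequence[dict]] = None,
    batch_size: int = 100,
) -> int:
    """Upsert documents in slices of `batch_size`; returns the number of calls made."""
    if len(documents) != len(ids) or (metadatas is not None and len(metadatas) != len(ids)):
        raise ValueError("ids, documents and metadatas must have the same length")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    calls = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        # upsert inserts new ids and overwrites existing ones
        collection.upsert(
            ids=list(ids[start:end]),
            documents=list(documents[start:end]),
            metadatas=None if metadatas is None else list(metadatas[start:end]),
        )
        calls += 1
    return calls
```

Usage would look like `upsert_in_batches(open_persistent_collection("pet_db", "pets"), ids, documents, metadatas)`, where the collection name "pets" is hypothetical. Since no embeddings are passed, Chroma computes them with the collection's embedding function.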
@SatyendraJaiswal-hz1cb 2 months ago
Can we query our CSV dataset as well? I tried creating vector embeddings, but when querying with a prompt I can't get the exact answer. For example, I have an airline dataset, and when I give a prompt like "how many customers are frequent flyers", I expect an overall count in the response, but it gives me the name and ID of one particular customer.
Any thoughts?
@SoundTamilan 3 months ago
Where is the DB storage created, and how can a user specify the path?
@shinycaroline3722 5 months ago
@Pradip Nichite: Please help me with the queries below:
1. Is ChromaDB good for production, since it uses an in-memory DB?
2. What is the limit on the number of docs we can store in a ChromaDB collection?
3. I have tried using Pinecone through LangChain. I can see my stored index in the hosted application, but I'm not able to retrieve the embeddings from it and use them. E.g. Pinecone.from_existing_index() doesn't seem to work.
4. Is there any other vector DB that has a hosted application?
@quantadotonium3654 1 year ago ⁺¹
Thank you Pradip! Hopefully we'll soon see, besides normal PDF/TXT extraction, how to extract data from tables and store it in a DB?
@FutureSmartAI 1 year ago
We can do that. You can use libraries like pdfplumber to extract the tables and then GPT to answer questions from them.
Do you have any particular use case in mind that I can cover?
@zaheerbeg4810 1 year ago
@@FutureSmartAI You could consider a PDF query tool for QA on tabular data, especially PDFs like 10-K and 10-Q documents. Thanks in advance #Pradip
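A minimal sketch of the pdfplumber route mentioned in the reply, assuming pdfplumber's `page.extract_tables()`, which returns each table as a list of rows whose cells are strings or `None`. Turning each row into "header: value" text gives short chunks that can be embedded and stored like any other document. The helper names, the assumption that the first row is the header, and the one-chunk-per-row choice are illustrative, not the author's method.

```python
from typing import Iterator, List, Optional

Table = List[List[Optional[str]]]


def extract_pdf_tables(pdf_path: str) -> Iterator[Table]:
    """Yield every table pdfplumber finds in the PDF, page by page."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                yield table


def table_rows_to_text(table: Table) -> List[str]:
    """Turn each data row into 'header: value; ...' text, treating row 0 as the header."""
    if not table:
        return []
    header = [(h or "").strip() for h in table[0]]
    chunks = []
    for row in table[1:]:
        cells = [(c or "").strip() for c in row]
        # Skip empty cells so blank rows produce no chunk at all.
        pairs = [f"{h}: {c}" for h, c in zip(header, cells) if c]
        if pairs:
            chunks.append("; ".join(pairs))
    return chunks
```

Row-level chunks keep each figure next to its column labels, which helps retrieval on statements like 10-K tables; an LLM can then answer from the retrieved rows.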
@idveernegi1021 4 months ago
How can I change the similarity search algorithm used in a query,
like cosine, Euclidean, etc.?
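In Chroma the distance function is chosen per collection when the collection is created, through the `hnsw:space` metadata key: `"l2"` (the default), `"ip"` or `"cosine"`, e.g. `client.create_collection("docs", metadata={"hnsw:space": "cosine"})` in the 0.4/0.5 releases. Check the docs for your version. To show what each option measures, here is a pure-Python sketch of the three formulas as Chroma's documentation describes them; it is for understanding only, since Chroma computes these inside its index.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """'l2' space: sum of squared differences (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """'ip' space: 1 - dot product (most meaningful for normalized vectors)."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """'cosine' space: 1 - cosine similarity, so 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Switching spaces changes the scale of the reported distances (cosine stays within [0, 2], squared L2 is unbounded), so any score threshold has to be re-tuned afterwards.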
@karamjittech 11 months ago
Nice video. Which is the best open-source vector DB? The Pinecone free tier is really limited.
@FutureSmartAI 11 months ago ⁺¹
Hi, for open source I have mostly used only ChromaDB, and it's working well. I'm slowly exploring others.
@user-ll1ht7dr9h 1 year ago
I am facing issues with ChromaDB installation on Ubuntu.
@ravitejavemula333 9 months ago
Hi sir. I have a CouchDB database. Can I create a vector DB for it?
@test12345265 11 months ago
Thank you Pradip for a great video. What are the limitations of ChromaDB (i.e. number of MB, number of documents, etc.)? I tried to index 2000+ PDF files for semantic search, but it always stopped at PDF #273. No error message was given.
@OlaPraveenMishra 10 months ago
I'm getting this error, buddy: "Expected each value in the embedding to be a int or float". Can you help with this?
idea_collection_emb = client.create_collection("idea_collection_emb")
idea_collection_emb.add(
    documents=documents,
    embeddings=embeddings,
    metadatas=metadatas,
    ids=ids
)
@kamalinichauhan4407 6 months ago
Were you able to solve it?
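A likely cause, assuming the embeddings come from NumPy or an embedding-model wrapper: the older Chroma validation behind this message checks each value with `isinstance(value, (int, float))`, and types such as `numpy.float32` fail that check; an embedding nested one level too deep fails it too. A minimal sketch of a fix is to convert every value to a built-in `float` before calling `add()`. The helper `to_plain_embeddings` is hypothetical.

```python
from typing import Iterable, List


def to_plain_embeddings(embeddings: Iterable[Iterable[object]]) -> List[List[float]]:
    """Convert each embedding to a list of built-in floats before collection.add().

    float(v) accepts Python numbers and NumPy scalars such as numpy.float32;
    a TypeError is raised if an embedding is not a flat sequence of numbers
    (e.g. nested one level too deep, or a single vector passed instead of a list).
    """
    plain = []
    for i, emb in enumerate(embeddings):
        try:
            plain.append([float(v) for v in emb])
        except TypeError as exc:
            raise TypeError(f"embedding {i} is not a flat sequence of numbers") from exc
    return plain
```

Then pass `embeddings=to_plain_embeddings(embeddings)` to `idea_collection_emb.add(...)`. If `embeddings` is already a 2-D NumPy array, `embeddings.tolist()` does the same conversion.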
@chet3118 1 year ago
Hi Pradip, it's a basic question, but I need to instruct my team: can you please let me know the tool you used to run the code?
@kien3848 11 months ago ⁺¹
Google Colab, bro
@shrutinathavani 1 year ago
Sir, how do I set it up as a Pinecone alternative and use it as in your previous video with GPT-3.5 Turbo?
@FutureSmartAI 1 year ago
Check this: ruclips.net/video/5NG8mefEsCU/видео.html
@TheRealWakanda 11 months ago
@Pradip How do I store and retrieve data in vector format?
@FutureSmartAI 10 months ago
Hi, can you elaborate on exactly what you want to do?
@henkhbit5748 1 year ago
Thanks for the video. Question: how do I get the vectors from ChromaDB when my documents are already stored in the persisted DB?
@FutureSmartAI 1 year ago
You mean you already have a DB?
@henkhbit5748 1 year ago
@@FutureSmartAI Yes, I already have the DB.
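For an existing persisted DB, a sketch assuming chromadb >= 0.4: reopen it with `chromadb.PersistentClient(path=...)`, fetch the collection with `get_collection(name=...)`, and call `collection.get(include=["embeddings"])`. `get()` returns metadatas and documents by default but not embeddings, so they must be requested explicitly. The helpers `load_vectors` and `open_existing_collection` are hypothetical, and the path and collection name are placeholders.

```python
from typing import Any, Dict, List, Optional


def load_vectors(collection: Any, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Map each stored id to its embedding, read from an existing Chroma collection."""
    # get() omits embeddings by default, so request them explicitly.
    data = collection.get(include=["embeddings"], limit=limit)
    # ids are always returned; embeddings may come back as NumPy arrays, so convert to floats.
    return {doc_id: [float(v) for v in emb] for doc_id, emb in zip(data["ids"], data["embeddings"])}


def open_existing_collection(path: str, name: str) -> Any:
    """Open a collection persisted earlier under `path` (chromadb >= 0.4 assumed)."""
    import chromadb

    return chromadb.PersistentClient(path=path).get_collection(name=name)
```

For example, `load_vectors(open_existing_collection("pet_db", "pets"))` would return every stored vector keyed by id, with "pets" standing in for your actual collection name.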
@mwanthidaniel1254 10 months ago
Where do we get the pets data?
@FutureSmartAI 10 months ago
It's a sample file generated using ChatGPT.
@avishsharma8852 11 months ago
But I was hoping for semantic search.
@FutureSmartAI 11 months ago
Hi, what are you expecting in semantic search? Maybe you can check my other tutorials where I explain embeddings, semantic search, QnA, etc.
@wasimsalafi 11 months ago ⁺¹
@Pradip Nichite Can you remove about 99% of the "you know"s from your recordings and re-upload, please? Perhaps 1% of the time it's OK, or better, remove them 100%.
@FutureSmartAI 11 months ago
Thanks for the suggestions
@biniyam106 7 months ago
hater
@rageshantony2182 1 year ago
Hi,
if I set n_results=2, I get all the docs:
{'ids': [['id2', 'id1']],
 'distances': [[0.8069301247596741, 1.648103952407837]],
 'metadatas': [[{'category': 'vehicle'}, {'category': 'animal'}]],
 'embeddings': None,
 'documents': [['This is a document about car',
   'This is a document about cat']]}
