Semantic Search with Open-Source Vector DB: Chroma DB | Pinecone Alternative | Code

  • Published: Aug 22, 2024

Comments • 47

  • @Andromeda26_ 9 months ago +3

    Thank you so much for sharing the details, Pradip! Your informative YouTube videos have been incredibly helpful. Great job on putting together such valuable content! Keep up the outstanding work and continue enlightening us. We truly appreciate your contributions!

  • @avishsharma8852 11 months ago +1

    Where is the coffee button? :) Great work! Let me know if you offer paid one-on-one tutorials.

    • @FutureSmartAI 11 months ago

      Hi, thanks. No, I don't offer one-on-one coaching. These days I'm very busy with freelancing work.

  • @zaheerbeg4810 1 year ago +1

    Adorable, thanks #PradipNichite for your time and efforts 👍👍👍

  • @moralstorieskids3884 1 year ago +1

    Thanks for your efforts. Waiting for a video on ChromaDB with LangChain, similar to your previous video (Pinecone with LangChain).

    • @FutureSmartAI 1 year ago

      Just Published: ruclips.net/video/5NG8mefEsCU/видео.html

  • @satheeshthangaraj5614 1 year ago +2

    Great 👍

  • @notmeno4881 10 months ago

    You know you know you know you know you know

  • @prasanosara1944 1 year ago +2

    Thanks, Pradip, for the great tutorial! I have a couple of questions: 1. Is ChromaDB the best option for semantic search? Can you please do a comparison video for vector DBs? 2. These semantic searches return too many unwanted and unrelated results. How do I filter them out? Kindly help.

    • @FutureSmartAI 1 year ago

      What use case do you have? You can try a different embedding model and apply a threshold to the semantic similarity score.

    • @prasanosara1944 1 year ago

      @FutureSmartAI thanks for the response! For example, I have a requirement to get a set of survey questions relevant to a group of people who can be described as "Single men at the age of 45". The vector search gives higher priority to questions such as "What is the age of your children?" than to "What is your age?" from the list of questions I send as part of the prompt.
      Kindly let me know how I can control this.
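The threshold idea suggested in the reply above could be sketched roughly as follows. This is a minimal illustration, not the author's code: it assumes the nested-list result shape that Chroma's `collection.query()` returns (one inner list per query text) and that a smaller distance means a closer match. The helper name `filter_by_distance` and the cutoff value are hypothetical.

```python
def filter_by_distance(results, max_distance):
    """Keep only hits whose distance is at or below max_distance.

    `results` is assumed to follow the nested-list shape of a Chroma
    query result: one inner list per query text.
    """
    filtered = []
    for ids, docs, dists in zip(
        results["ids"], results["documents"], results["distances"]
    ):
        hits = [
            {"id": i, "document": d, "distance": dist}
            for i, d, dist in zip(ids, docs, dists)
            if dist <= max_distance
        ]
        filtered.append(hits)
    return filtered


# Example data in the shape of a Chroma query result (values are illustrative).
results = {
    "ids": [["id2", "id1"]],
    "distances": [[0.8069301247596741, 1.648103952407837]],
    "documents": [["This is a document about car", "This is a document about cat"]],
}

# max_distance=1.0 is an arbitrary cutoff; tune it on your own data.
kept = filter_by_distance(results, max_distance=1.0)
print(kept[0])
```

A fixed cutoff only removes weak matches. It will not by itself rank "What is your age?" above a near-duplicate question, which is where trying a different embedding model comes in.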

  • @vinven7 1 year ago +1

    Thanks so much, Pradip, for this great video and explanation of ChromaDB! The code file that you shared seems to be missing the section where you store the DB in a local directory. Can you also explain how to do an upsert through the client if the document corpus is very large?

    • @FutureSmartAI 1 year ago

      import chromadb
      from chromadb.config import Settings

      # Create a client that persists the database to a local directory.
      client = chromadb.Client(Settings(
          chroma_db_impl="duckdb+parquet",
          persist_directory="pet_db",  # Optional, defaults to .chromadb/ in the current directory
      ))
      Here persist_directory="pet_db" is the local directory where the database is stored.
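For the large-corpus part of the question, one common approach is to send records in fixed-size batches rather than in a single call. The sketch below is an assumption-laden illustration, not something shown in the video: `upsert_in_batches`, `FakeCollection`, and the batch size of 500 are hypothetical, and the code assumes the collection object exposes an `upsert(ids=..., documents=..., metadatas=...)` method, as Chroma collections do in recent releases.

```python
def upsert_in_batches(collection, ids, documents, metadatas=None, batch_size=500):
    """Send records to `collection.upsert` in fixed-size chunks.

    Returns the number of batches sent.
    """
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    batches = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.upsert(
            ids=ids[start:end],
            documents=documents[start:end],
            metadatas=metadatas[start:end] if metadatas is not None else None,
        )
        batches += 1
    return batches


class FakeCollection:
    """Stand-in used only for this demo; replace with a real Chroma collection."""

    def __init__(self):
        self.records = {}

    def upsert(self, ids, documents, metadatas=None):
        for i, doc in zip(ids, documents):
            self.records[i] = doc  # insert or overwrite by id


demo = FakeCollection()
ids = [f"id{n}" for n in range(1200)]
docs = [f"document {n}" for n in range(1200)]
n_batches = upsert_in_batches(demo, ids, docs, batch_size=500)
print(n_batches, len(demo.records))
```

Batching keeps each request small, so a failure partway through affects one chunk only, and rerunning the same ids overwrites existing records instead of duplicating them.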

  • @SatyendraJaiswal-hz1cb 2 months ago

    Can we query our own CSV dataset as well? I tried creating vector embeddings, but when I query with a prompt I am not able to get an exact answer. For example, I have an airlines dataset and I give a prompt like "how many customers are frequent flyer". I expect an overall count in response, but it gives me the name and ID of one particular customer.
    Any thoughts?

  • @SoundTamilan 3 months ago

    Where will the DB be created, and how can I set it to a user-specified path?

  • @shinycaroline3722 5 months ago

    @Pradip Nichite: Please help me with the queries below.
    1. Is ChromaDB good for production, since it uses an in-memory DB?
    2. What is the limit on the number of docs we can store in a ChromaDB collection?
    3. I have tried using Pinecone through LangChain. I can see my stored index in the hosted application, but I am not able to retrieve the embeddings from it and use them. E.g., Pinecone.from_existing_index() doesn't seem to work.
    4. Is there any other vector DB that has a hosted application?

  • @quantadotonium3654 1 year ago +1

    Thank you, Pradip! Hopefully we will soon see, besides normal PDF/TXT extraction, how to extract data from tables and store it in the DB?

    • @FutureSmartAI 1 year ago

      We can do that. You can even use libraries like pdfplumber to extract the tables and then GPT to answer questions from them.
      Do you have any particular use case in mind that I can cover?

    • @zaheerbeg4810 1 year ago

      @FutureSmartAI, you can consider a PDF query tool for QA on tabular data, especially PDFs like 10K and 10Q documents. Thanks in advance #Pradip

  • @idveernegi1021 4 months ago

    How can I change the similarity search algorithm used in a query,
    like cosine, Euclidean, etc.?

  • @karamjittech 11 months ago

    Nice video. Which is the best open-source vector DB? Pinecone's free tier is really limited.

    • @FutureSmartAI 11 months ago +1

      Hi, among open-source options I have mostly used only Chroma DB, and it's working well. I am slowly exploring others.

  • @user-ll1ht7dr9h 1 year ago

    I am facing issues with chromadb installation on Ubuntu.

  • @ravitejavemula333 9 months ago

    Hi sir. I have a CouchDB database. Can I create a vector DB for it?

  • @test12345265 11 months ago

    Thank you, Pradip, for a great video. What are the limitations of Chroma DB (i.e., number of megabytes, number of documents, etc.)? I tried to index 2000+ PDF files for semantic search; however, it always stopped at PDF #273. No error message was given.

  • @OlaPraveenMishra 10 months ago

    I'm getting this error, buddy: "Expected each value in the embedding to be a int or float, got". Can you help with this?
    idea_collection_emb = client.create_collection("idea_collection_emb")
    idea_collection_emb.add(
        documents=documents,
        embeddings=embeddings,
        metadatas=metadatas,
        ids=ids
    )
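The thread does not say what caused this error. One plausible cause (an assumption, not confirmed here) is that the embedding values are not built-in Python numbers, for example strings or array scalars. A minimal sketch of normalizing them before calling `add` might look like this; the helper name `to_plain_floats` is hypothetical.

```python
def to_plain_floats(embeddings):
    """Convert each embedding to a list of built-in Python floats.

    Works for lists, tuples, and array-like objects that expose .tolist()
    (for example NumPy arrays or tensors).
    """
    converted = []
    for vector in embeddings:
        if hasattr(vector, "tolist"):
            vector = vector.tolist()
        converted.append([float(value) for value in vector])
    return converted


# Illustrative input: values stored as strings and ints instead of floats.
raw = [["0.1", "0.2"], (1, 2)]
clean = to_plain_floats(raw)
print(clean)
```

The cleaned list could then be passed as `embeddings=clean` in the call above.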

  • @chet3118 1 year ago

    Hi Pradip, it's a basic question, but I need to instruct my team. Can you please let me know which tool you used to run the code?

    • @kien3848 11 months ago +1

      Google Colab, bro

  • @shrutinathavani 1 year ago

    Sir, how do I set it up as a Pinecone alternative and use it with GPT-3.5 Turbo, as in your previous video?

    • @FutureSmartAI 1 year ago

      Check this : ruclips.net/video/5NG8mefEsCU/видео.html

  • @TheRealWakanda 11 months ago

    @Pradip How do I store and retrieve data in vector format?

    • @FutureSmartAI 10 months ago

      Hi, can you elaborate on exactly what you want to do?

  • @henkhbit5748 1 year ago

    Thanks for the video. Question: how do I get the vectors from Chroma DB when my documents are already stored in the persisted DB?

  • @mwanthidaniel1254 10 months ago

    Where do we get the Pets data?

    • @FutureSmartAI 10 months ago

      It's a sample file generated using ChatGPT.

  • @avishsharma8852 11 months ago

    But I was hoping for semantic search

    • @FutureSmartAI 11 months ago

      Hi, what are you expecting from semantic search? Maybe you can check my other tutorials, where I explained
      embeddings, semantic search, Q&A, etc.

  • @wasimsalafi 11 months ago +1

    @Pradip Nichite can you remove like 99% of "you know"s from your recordings and re-upload please? perhaps 1% of the time it is ok or better remove them 100%

  • @rageshantony2182 1 year ago

    Hi,
    If I give n_results=2, I get all the docs:
    {'ids': [['id2', 'id1']],
     'distances': [[0.8069301247596741, 1.648103952407837]],
     'metadatas': [[{'category': 'vehicle'}, {'category': 'animal'}]],
     'embeddings': None,
     'documents': [['This is a document about car',
                    'This is a document about cat']]}