How to Use CUDA & Multiprocessing to Add Records/Embeddings Faster in ChromaDB

  • Published: 10 Sep 2024
  • I'll show you how I was able to generate 33,000 embeddings in about 3 minutes using Python's multiprocessing capability and my GPU (CUDA). The key is to split the work into two processes: a producer that reads data and puts it into a queue, and a consumer that pulls data from the queue and vectorizes it using a local model. I tested this on the entire Game of Thrones script, and the results show that using a GPU significantly speeds up the process compared to using the CPU. Give it a try and let me know how it goes for you!
    Steal my code: github.com/joh...
    Support me here: www.buymeacoff...
    ChromaDB Playlist: • ChromaDB Vector Databa...
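The producer/consumer split described above can be sketched with Python's multiprocessing module. This is a minimal sketch, not the video's actual code: the sample data is made up, and the consumer's counting stands in for the CUDA-backed collection.add call.

```python
import multiprocessing as mp

SENTINEL = None  # tells the consumer the producer is done


def producer(queue, lines, batch_size=100):
    """Read the data and enqueue it in batches."""
    for i in range(0, len(lines), batch_size):
        queue.put(lines[i:i + batch_size])
    queue.put(SENTINEL)


def consumer(queue, counter):
    """Pull batches off the queue until the sentinel arrives.

    In the video, this is where each batch is embedded and stored,
    e.g. collection.add(...) with a CUDA-backed embedding function;
    here we just count records so the sketch runs anywhere.
    """
    while True:
        batch = queue.get()
        if batch is SENTINEL:
            break
        with counter.get_lock():
            counter.value += len(batch)


if __name__ == "__main__":
    lines = [f"line {i}" for i in range(1000)]
    queue = mp.Queue()
    counter = mp.Value("i", 0)
    p = mp.Process(target=producer, args=(queue, lines))
    c = mp.Process(target=consumer, args=(queue, counter))
    p.start(); c.start()
    p.join(); c.join()
    print(counter.value)  # 1000
```

Because the queue decouples the two processes, disk reads and GPU embedding overlap instead of running one after the other.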

Comments • 33

  • @johnnycode
    @johnnycode  4 days ago

    Check out my playlist of ChromaDB videos: ruclips.net/p/PL58zEckBH8fA-R1ifTjTIjrdc3QKSk6hI

  • @arirajuh
    @arirajuh 3 days ago

    WOW! Elapsed seconds: 171, Record count: 33198. It worked well. Thank you for sharing the step-by-step multiprocessing procedure; embedding tasks are much faster now. I appreciate it. Your video saved me a lot of time.

  • @kenchang3456
    @kenchang3456 6 months ago +4

    YAY!!! That was it! For me, using a Python venv rather than Conda (although I see in the video you're using Conda with no issues) made the difference, and this video really helped me get CUDA working. For 11K documents, I went from 48 minutes using the CPU to 10 minutes using the GPU. Thanks again, I really appreciate your sharing this. Best regards.

    • @johnnycode
      @johnnycode  6 months ago +1

      Thanks for the multiple coffees, Ken :)

  • @kenchang3456
    @kenchang3456 6 months ago +2

    Thanks @johnnycode, I'll give this a try. After wrestling with trying to get my NVIDIA card involved in embedding with ChromaDB using Conda, the Conda virtual environment seemed to get corrupted after about three kernel restarts. I just switched to a Python venv and went back to a previous working version based on your prior example. It is stable now, but when I added device="cuda" when creating the embedding function it threw an error, probably because I need to install PyTorch. I can process about 1K rows of my data in 3 minutes, and once I get through my complete set of embeddings, which is 11K, I'll try the PyTorch way. Thanks for putting this out, as it is very timely.
    PS: I have no idea why my Conda virtual environment kept corrupting, but at least with a Python venv I have a way forward.

  • @Tommy31416
    @Tommy31416 6 months ago +1

    This was brilliant, thank you!
    Could you do a video on taking the text, images and tables from a PDF using Unstructured, please? It would be great to see if it is possible to vectorise and embed all three types in ChromaDB for LangChain multimodal retrieval afterwards. Being able to query a document and return the relevant text plus any charts and tables would be the holy grail of RAG deployment. Love your channel; finally someone is showing us how to use ChromaDB properly.

    • @johnnycode
      @johnnycode  6 months ago +2

      Great suggestion! Text with chart images and tables is particularly difficult; I'll see if Unstructured provides a good way to deal with them.

    • @Tommy31416
      @Tommy31416 6 months ago +1

      @@johnnycode thank you so much!!

  • @davidtindell950
    @davidtindell950 6 months ago +1

    Hooray! Your PyTorch Chroma code worked well on the very first run: "Elapsed seconds: 213, Record count: 33198"! Please try to find additional ways to maximize speed and improve local persistent storage! P.S. This test was on an OLD Dell G7 with a SLOW NVIDIA GTX 1060!!!!

    • @johnnycode
      @johnnycode  6 months ago +1

      Your 1060 is still pretty solid!

  • @user-oz4oh2wm7p
    @user-oz4oh2wm7p 4 months ago

    I watched your two previous videos; they were very interesting. Thank you very much.

    • @user-oz4oh2wm7p
      @user-oz4oh2wm7p 4 months ago

      I'm watching this video and trying it on my NVIDIA Jetson Nano.

    • @johnnycode
      @johnnycode  4 months ago

      What are you planning to do with your Jetson Nano?

    • @user-oz4oh2wm7p
      @user-oz4oh2wm7p 4 months ago

      @@johnnycode I plan to use it because it has an NVIDIA CUDA GPU; my HP 845 G10 laptop doesn't have an NVIDIA card =)))

    • @user-oz4oh2wm7p
      @user-oz4oh2wm7p 4 months ago

      Oh, it seems the Jetson Nano doesn't meet the requirements to install ChromaDB.

    • @johnnycode
      @johnnycode  4 months ago

      @user-oz4oh2wm7p Really? I thought 2 GB of memory was the only system requirement.

  • @truthwillout2371
    @truthwillout2371 6 months ago +1

    Isn't ChromaDB for image vectors? I know you can use it, but is it optimal?

    • @johnnycode
      @johnnycode  6 months ago +1

      Actually, ChromaDB started out with text vectors and added image vectors in the last few months.

  • @josef58149
    @josef58149 6 months ago

    Man, thanks for this video, you are amazing!

  • @AbhishekKumar-jb4ky
    @AbhishekKumar-jb4ky 4 months ago

    When I'm using CUDA, my GPU usage doesn't go above 25%, but it gets the work done pretty quickly. Any idea how to utilize its full power? I'm not using the batching technique the way you are.

    • @johnnycode
      @johnnycode  4 months ago

      The collection.add(your_batch) function is the one that utilizes your GPU. You don't necessarily need to implement multiprocessing like I did; just try the collection.add function with different batch sizes.
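A quick way to experiment with batch sizes, as suggested above, is a small timing harness. This is a hedged sketch: time_batch_sizes and add_fn are made-up names, and add_fn stands in for a call like collection.add.

```python
import time


def time_batch_sizes(add_fn, records, batch_sizes):
    """Time add_fn over the same records at each batch size.

    add_fn is any callable that accepts a list slice; in practice it
    would wrap something like collection.add(documents=batch, ids=...).
    """
    timings = {}
    for size in batch_sizes:
        start = time.perf_counter()
        for i in range(0, len(records), size):
            add_fn(records[i:i + size])
        timings[size] = time.perf_counter() - start
    return timings
```

Plotting or printing the returned dict makes it easy to see where GPU utilization stops improving for a given model and dataset.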

  • @ddoq1345j
    @ddoq1345j 3 months ago

    I want to know how to use the GPU when creating a collection and querying a collection. Is it possible?

    • @johnnycode
      @johnnycode  3 months ago

      If you enable device=cuda for the sentence transformer like I did in the video, then run a query, it might use the GPU. You'll probably need a very large database before you see much activity on the GPU, though.
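If PyTorch is missing or no GPU is present, passing device="cuda" raises an error (as another commenter ran into). A small helper can fall back gracefully; this is a sketch, and pick_device is a made-up name.

```python
def pick_device():
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency; only needed for GPU support
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"


# Example (not executed here): pass the result to the embedding function,
# e.g. SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2",
#                                           device=pick_device())
```

The same embedding function then serves both collection.add and collection.query, so whichever device it picked is used for both inserts and queries.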

  • @KevKevKev74yes
    @KevKevKev74yes 2 months ago

    Hi, good job. Do you know if it's possible with Ollama embeddings? It seems to use only one process in my code.

    • @johnnycode
      @johnnycode  2 months ago

      Sorry, I have not worked with that so I don’t know.

    • @KevKevKev74yes
      @KevKevKev74yes 2 months ago

      @@johnnycode It works with Ollama, thank you. I just can't exceed a batch size of 166.

    • @johnnycode
      @johnnycode  2 months ago +1

      That is great! Thank you for the coffee!

  • @AbhishekKumar-jb4ky
    @AbhishekKumar-jb4ky 4 months ago

    I don't know why, but without batching I get a much lower time.

    • @johnnycode
      @johnnycode  4 months ago

      Maybe you can use Python's time module to measure the execution time of the various lines of code you're running.
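A minimal helper along those lines, using the standard library's time.perf_counter (timed is a made-up name):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print the elapsed wall-clock seconds for the wrapped block."""
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f} seconds")
```

Usage: wrap any suspect section, e.g. `with timed("collection.add"): collection.add(batch)`, and compare the printed timings between batched and unbatched runs.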