Building End-to-End RAG Pipelines Locally on Kubernetes

  • Published: Apr 7, 2024
  • Watch the replay and a deep dive into my KubeCon Paris demo. Tune in if you are interested in learning how to run an end-to-end RAG pipeline on cloud-native infrastructure.

Comments • 2

  • @AIWithShrey • 3 months ago

    Any reason why you chose BAAI and not another embedding model? What are the impacts of mixing and matching the embedding model and the LLM? My current app works just fine with GPT4All Embeddings and Gemma 1.1 7B.
    Another note: deploying a quantized LLM will significantly reduce VRAM usage; a Q8_0-quantized Gemma 7B takes up 12 GB of VRAM for me. Implementing KEDA and quantization in tandem would be a game-changer.
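
On the mixing question above: in a RAG pipeline the embedding model and the LLM are decoupled. The embedder is only used to index documents and to embed queries; the LLM only ever sees the retrieved plain text. The one hard constraint is that indexing and querying must use the same embedder, and swapping it means re-embedding the corpus. A minimal sketch, assuming sentence-transformers and chromadb; the model name, documents, and collection are illustrative, not taken from the video:

```python
# Sketch: the embedder and the LLM are decoupled in RAG. Only the vector
# store ever sees embeddings; the LLM consumes retrieved plain text.
# The model name and in-memory collection are illustrative assumptions.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # swap freely

client = chromadb.Client()  # in-memory vector store
collection = client.create_collection("docs")

docs = [
    "KEDA scales Kubernetes workloads on custom metrics.",
    "Quantization shrinks an LLM's memory footprint.",
]
collection.add(
    ids=[str(i) for i in range(len(docs))],
    documents=docs,
    # The one hard constraint: index and query with the SAME embedder.
    embeddings=embedder.encode(docs).tolist(),
)

query = "How do I cut VRAM usage?"
hits = collection.query(
    query_embeddings=embedder.encode([query]).tolist(),
    n_results=1,
)
context = hits["documents"][0][0]
# `context` is plain text, so any LLM (Gemma, a GPT4All model, etc.) can
# answer from it; swapping the embedding model changes retrieval quality,
# not compatibility with the LLM.
prompt = f"Context:\n{context}\n\nQuestion: {query}"
```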
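
On the quantization note: below is a hedged sketch of loading a Q8_0 GGUF build of Gemma 7B with llama-cpp-python and offloading all layers to the GPU. The model path is a placeholder, and this is one way to serve a quantized model locally, not necessarily how the demo does it:

```python
# Sketch, not from the demo: running a Q8_0-quantized Gemma 7B GGUF via
# llama-cpp-python. The model path is a hypothetical local file.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-7b.Q8_0.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,       # context window
)

out = llm("Explain KEDA in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

Pairing a deployment like this with a KEDA ScaledObject that scales replicas on queue depth or GPU utilization is the "in tandem" idea: a smaller per-replica VRAM footprint lets more replicas fit on each node.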

  • @venkatathota633 • 3 months ago

    Could you please provide a git repo for the above code?