Maximize AI Efficiency With Upstash Semantic Cache: Save On Large Language Model Costs! | Gui Bibeau

  • Published: Nov 8, 2024

Comments • 21

  • @abrahamolaobaju1781
    @abrahamolaobaju1781 2 months ago +1

    useful thanks

    • @guibibeau
      @guibibeau  2 months ago

      Glad you liked it! Working on a follow-up that I will post soon

  • @codingbyte847
    @codingbyte847 2 months ago

    Please make a step-by-step video tutorial on this, as it is a very good approach to saving costs on LLMs. Thanks a lot😊

    • @guibibeau
      @guibibeau  1 month ago

      Hi there! I think this blog post does a good job of showing the details!
      www.guibibeau.com/blog/llm-semantic-cache
      I don't really do step-by-step videos, as I prefer to focus on overviews and new technologies, but I hope this helps!
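
      If you just want the core of it, it's roughly this (a minimal sketch based on the @upstash/semantic-cache README; the minProximity value is illustrative):

      ```ts
      import { SemanticCache } from "@upstash/semantic-cache";
      import { Index } from "@upstash/vector";

      // Assumes an Upstash Vector index created with a built-in embedding model;
      // Index() reads UPSTASH_VECTOR_REST_URL / _TOKEN from the environment
      const index = new Index();

      // minProximity (0-1): how semantically close a query must be to count as a hit
      const cache = new SemanticCache({ index, minProximity: 0.95 });

      await cache.set("capital of France", "Paris");

      // A differently worded but semantically similar question still hits the cache
      console.log(await cache.get("What is the capital of France?")); // "Paris"
      ```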

    • @codingbyte847
      @codingbyte847 1 month ago

      @@guibibeau thanks

  • @opeyemiomodara5888
    @opeyemiomodara5888 2 months ago

    I like how not just the code but its real-life use in reducing load time and saving on the cost of calls was brought to the fore. Bit of a tongue twister with Upstash and semantic cache at 3:53 as well, lol. Great video!

    • @guibibeau
      @guibibeau  2 months ago +1

      Thanks for watching. That was one of my most advanced videos yet, so I’m unsure if it will find its audience

    • @opeyemiomodara5888
      @opeyemiomodara5888 2 months ago +1

      @@guibibeau I am sure it will. Most importantly, it has been created and the knowledge shared already. It will always be a resource that can help builders globally, even if only in the coming years.

  • @toby_solutions
    @toby_solutions 3 months ago +2

    OMG!! I submitted a CFP on this topic to a conference.

    • @guibibeau
      @guibibeau  3 months ago

      Hope the repo and video can help! Love that topic! Let me know if your talk is accepted!

    • @toby_solutions
      @toby_solutions 3 months ago

      @@guibibeau Sure thing!! I'd love to prep with you. It's gonna be awesome!

  • @kabirkumar5815
    @kabirkumar5815 3 months ago +2

    oh shit, might use this for a product question chatbot.
    hmm, this might also be what an img gen I was using a while ago was doing? I noticed that if I sent the same prompt, it gave back the same image, and if the prompt was just slightly different, the image was just slightly different with everything else pretty much exactly the same, with much less variation than other img gens

    • @guibibeau
      @guibibeau  3 months ago

      It could also be that the model has its temperature set really low, which would reduce variance in the output. Or the model could be overfit to some specific parameters.
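
      For example, with the OpenAI SDK (model name purely for illustration), temperature: 0 makes sampling near-deterministic:

      ```ts
      import OpenAI from "openai";

      const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

      // temperature: 0 collapses the sampling distribution, so the same
      // prompt tends to produce (almost) exactly the same output every time
      const completion = await openai.chat.completions.create({
        model: "gpt-4o-mini",
        temperature: 0,
        messages: [{ role: "user", content: "Describe a red bicycle." }],
      });

      console.log(completion.choices[0].message.content);
      ```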

  • @kabirkumar5815
    @kabirkumar5815 3 months ago +1

    this is useful, thank you

    • @guibibeau
      @guibibeau  3 months ago

      thanks for the comment!

  • @sreerag4368
    @sreerag4368 2 months ago +1

    This is great, but what if the data we are storing is user-specific, e.g. their PDF/doc data which is unique to each user? Does this still work?

    • @guibibeau
      @guibibeau  2 months ago +1

      There is support for namespaces, which would allow you to separate your data per user, using their ID as the namespace key.
      I’ve not played a lot with this yet, but I’m planning a video on it.
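
      Roughly, with @upstash/vector namespaces it could look like this (untested sketch; the ids, metadata shape, and 0.9 threshold are mine, and embedding-on-upsert assumes an index created with a built-in embedding model):

      ```ts
      import { Index } from "@upstash/vector";

      const index = new Index();

      // One namespace per user, so lookups never cross user boundaries
      const ns = index.namespace("user-42");

      // Store the question (embedded server-side) with the answer as metadata
      await ns.upsert({
        id: "q-1",
        data: "What are the payment terms in my contract?",
        metadata: { answer: "...the LLM's answer for this user's doc..." },
      });

      // Later: a semantically similar question from the SAME user is a cache hit
      const [hit] = await ns.query({
        data: "When do I have to pay under my contract?",
        topK: 1,
        includeMetadata: true,
      });
      if (hit && hit.score > 0.9) {
        console.log(hit.metadata?.answer);
      }
      ```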

    • @sreerag4368
      @sreerag4368 2 months ago +1

      @@guibibeau Oh that'll be really great

  • @raimondszakis8337
    @raimondszakis8337 2 months ago +1

    lol this just sounds like copying the LLM's DB into a cache; wouldn't it be easier to run your own LLM?

    • @guibibeau
      @guibibeau  2 months ago

      You can run this with your own LLM too! The idea is that inference is either monetarily expensive if you use a third party, or computationally expensive if you run it on your own hardware.
      Either way, this will save you computation or money.
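
      The flow is the same in both cases: check the cache first, and only pay for inference (in dollars or GPU time) on a miss. A rough sketch, where the model name and minProximity are placeholders:

      ```ts
      import { SemanticCache } from "@upstash/semantic-cache";
      import { Index } from "@upstash/vector";
      import OpenAI from "openai";

      const cache = new SemanticCache({ index: new Index(), minProximity: 0.9 });
      const openai = new OpenAI(); // or point the SDK at a self-hosted endpoint

      async function ask(question: string): Promise<string> {
        const cached = await cache.get(question);
        if (cached) return cached; // semantic hit: no inference cost at all

        const res = await openai.chat.completions.create({
          model: "gpt-4o-mini",
          messages: [{ role: "user", content: question }],
        });
        const answer = res.choices[0].message.content ?? "";

        await cache.set(question, answer); // similar future questions become hits
        return answer;
      }
      ```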

    • @raimondszakis8337
      @raimondszakis8337 2 months ago +1

      @@guibibeau well yeah, makes sense, cache is king. Does that apply to everything, as long as the data isn't changing too frequently in ways we actually care about?