Claude Prompt Caching: Did Anthropic Create a Better Alternative to RAG?

  • Published: 27 Nov 2024

Comments • 35

  • @OscarCaicedo · 3 months ago · +17

    "380% Lower Latency." A percentage above 100% in this context is incorrect because latency cannot be reduced by more than 100%. A latency reduction of 100% is a latency of 0 ms.

    • @sanesanyo · 3 months ago · +1

      AI is making YouTubers dumber. That's the only explanation; otherwise I don't know how this could happen.

    • @vitalis · 3 months ago · +1

      We have just identified the non-LLM entity here... unless you are Grok 10 "watermelon"

    • @avidlearner8117 · 3 months ago · +1

      Wow man, go out, do something, but this is a bad look 🤣
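
A quick sanity check of the percentage arithmetic in the top comment of this thread, using an illustrative 100 ms baseline (the numbers are made up for demonstration):

```python
# Illustrative only: what "X% lower latency" means for a 100 ms baseline.
baseline_ms = 100

for pct in (50, 100, 380):
    print(f"{pct}% lower -> {baseline_ms * (1 - pct / 100):.0f} ms")
# prints 50 ms, 0 ms, and -280 ms: a 380% reduction implies negative latency.

# The claim presumably meant "3.8x faster", which would be:
print(f"3.8x faster -> {baseline_ms / 3.8:.0f} ms (roughly a 74% reduction)")
```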

  • @thenoblerot · 3 months ago · +1

    This changed my entire approach to a project. I wish they held the cache longer than 5 minutes! Even 10 or 15 would be nice, but how about an hour!? Love that it will cache images, too.

  • @epipolar4480 · 3 months ago · +1

    Script execution time is different from latency. Latency is effectively how long it takes to return the first token, and then the time to each subsequent token, and so on. This will always be a very low number, with or without caching, so for short responses such as in your second example the script execution will always be fast. For longer responses, such as the book summary, the latency makes a difference because it accumulates for each token, and there are many more tokens. I didn't look at your code and haven't used the Anthropic API, but I guess you weren't streaming the tokens, so you couldn't actually measure the latency by this method. Still, I really appreciate the video, as I was curious about this caching and this explained a lot to me. Thank you!
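
One way to actually measure time-to-first-token is to stream the response and timestamp the first chunk. A minimal sketch with the Anthropic Python SDK (the model name and prompt are placeholders, not taken from the video):

```python
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

start = time.perf_counter()
first_token_at = None

# Stream the response so the arrival of the first token is observable.
with client.messages.stream(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize the first chapter."}],
) as stream:
    for _chunk in stream.text_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()

end = time.perf_counter()
print(f"time to first token: {first_token_at - start:.2f} s")
print(f"total generation time: {end - start:.2f} s")
```

Time-to-first-token is where prompt caching should show up; the total wall-clock time for a long answer is dominated by per-token generation either way.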

  • @Quitcool · 3 months ago · +2

    The part you didn't understand happened because the sentence you added was semantically far from the rest of the book's tokens, so the model picked it out pretty easily.

  • @pioggiadifuoco7522 · 3 months ago · +2

    It would be very useful if you showed us the total cost of your testing, thanks.

  • @modoulaminceesay9211 · 3 months ago · +1

    Your videos are very helpful. I don't know when we'll get the desktop app.

  • @ramp2011 · 3 months ago

    Thank you for the video. The second time you ask a question, I am curious why you pass the context again in the system prompt if it's already cached. Can we just ask the question without sending the context in the system prompt again?
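
For context: the cached prefix still has to be sent with every request; the cache_control marker just lets the server reuse the already-processed prefix when it is identical. A minimal sketch based on Anthropic's prompt-caching beta docs (book_text, the model name, and the question are placeholders):

```python
import anthropic

client = anthropic.Anthropic()
book_text = open("book.txt").read()  # placeholder: the large, stable context

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=1024,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta header at the time
    system=[
        {"type": "text", "text": "Answer questions about the book below."},
        {
            "type": "text",
            "text": book_text,  # resent on every call; a matching prefix becomes a cache hit
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Who is the antagonist?"}],
)

# usage reports whether the prefix was written to the cache or read from it
print(response.usage)
```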

  • @radu-mirceasirbu2159 · 3 months ago

    Do you write the code yourself, or do you generate it with an LLM?

  • @kamelirzouni · 3 months ago

    Thanks, Kris!

  • @StayPolishThinkEnglish · 3 months ago

    Sorry for posting a question randomly, but do you have any tutorial for an AI voicebot for Discord?

  • @DitDev-o9t · 3 months ago

    It would be really nice to see how you talk to a book that wasn't in the training dataset (I am pretty sure that Harry Potter was there).

  • @micbab-vg2mu · 3 months ago

    Thanks for the update :)

  • @PunitaOjha01 · 2 months ago

    We are using Anthropic Claude 3.5 Sonnet on Amazon Bedrock. Since the prompt caching feature is in beta, I wanted to clarify whether it is available on Bedrock. I tried reaching out to Anthropic support about this but could not get through. It would be great if someone could answer this for me.

  • @rcj1337 · 3 months ago · +4

    How is this replacing RAG?

    • @ahtoshkaa · 3 months ago

      Example: my AI companion uses facts about me when answering. Five facts are pulled based on the average vector of the latest input; this is done after each message. But I could dump all of the facts into the cache and forgo this system entirely.
      Will it be better? Damned if I know. Probably? It requires a lot of testing. It would be awesome if Anthropic weren't this censored; I'm not sure I can even use their models in my companion without it getting triggered.
      But it's definitely not a replacement for RAG... something different, but really cool.
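
As a rough illustration of the trade-off described above, here is a hypothetical top-5 retrieval step, the part a cache-everything approach would replace by putting every fact into one cached block (the function and variable names are made up):

```python
import numpy as np

def top_k_facts(query_vec, fact_vecs, facts, k=5):
    """Rank stored facts by cosine similarity to the latest message's embedding."""
    sims = fact_vecs @ query_vec / (
        np.linalg.norm(fact_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return [facts[i] for i in np.argsort(sims)[::-1][:k]]

# RAG-style: embed the latest input, inject only the k most relevant facts.
# Cache-style: put *all* facts into one cached system block up front and skip
# retrieval entirely, trading per-request token count for simplicity and recall.
```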

  • @JNET_Reloaded · 3 months ago

    Where's the link to the code you used?

  • @JNET_Reloaded · 3 months ago

    Also, I recommend putting timing in the script!

  • @j0hnc0nn0r-sec · 3 months ago

    I'm thinking of trying the Anthropic cache with a local pgvector store or Neo4j. Might make things better… or weird. Kris could do it better. Is this a good idea?

    • @j0hnc0nn0r-sec · 3 months ago

      You can cache the “500 page book” context you find in Claude projects, btw

  • @hemanthkumar-tj4hs · 3 months ago

    What if I ask another question after caching the entire book?

  • @ahtoshkaa · 3 months ago

    Damn, I completely forgot about Google's caching. I looked at the prices. It seems like Google's cached input is 4 times cheaper than normal. In contrast, Anthropic's cached reads are 10 times cheaper, BUT creating the cache is 25% more expensive... So I have no idea what the math is here; someone help me out.
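
A rough break-even calculation using the multipliers cited in the comment above (cache writes at 1.25x the base input price, cached reads at 0.1x; these figures come from the comment, not from a current price sheet):

```python
# Cost of sending the same large prefix n times, in units of
# (base input price x prefix tokens): no caching vs. write-once-then-read caching.
def cost_without_cache(n: int) -> float:
    return 1.0 * n

def cost_with_cache(n: int) -> float:
    return 1.25 + 0.10 * (n - 1)  # one cache write, then (n - 1) cached reads

for n in (1, 2, 5, 10):
    print(n, cost_without_cache(n), round(cost_with_cache(n), 2))
# n=1:  1.0 vs 1.25  (caching loses on a one-off call)
# n=2:  2.0 vs 1.35  (the 25% write premium is already recovered)
# n=10: 10.0 vs 2.15
```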

  • @luisfelipe6368 · 3 months ago

    Nice, but still expensive; $15 per MTok output is rough. Hopefully we will see this decrease in the future, especially since OpenAI probably has something similar in the works.

  • @perschistence2651 · 3 months ago

    I do not understand why they don't just cache everything that changed when you set this flag... Why the cache points?

  • @ginocote · 3 months ago

    Five minutes is very short, and it's worse when you are programming. You can easily have more than 5 minutes between two prompts. It should be 30 minimum; I hope they will update this.

  • @newfrontiers5673 · 3 months ago · +1

    Interesting, but not a replacement for RAG, I don't think.

  • @2BeFreeFromPain · 3 months ago

    5 minutes is very short

  • @norlesh · 3 months ago

    FYI Google Gemini has been doing prompt caching for some time now.

    • @Solo2121 · 3 months ago · +2

      He mentions that at 13:33

  • @yellowboat8773 · 3 months ago

    Who actually uses RAG? I've found it so unreliable.

    • @DESX312 · 3 months ago

      It's as good as your implementation of it is. Use crappy embedding models and crappy text organization, and get crappy output. The inverse is true as well.

  • @ronaldronald8819 · 3 months ago

    Claude is getting stupid (it's being quantized). Too bad.