"380% Lower Latency." A percentage above 100% in this context is incorrect because latency cannot be reduced by more than 100%. A latency reduction of 100% is a latency of 0 ms.
This changed my entire approach to a project. I wish they held the cache longer than 5 minutes! Even 10 or 15 would be nice, but how about an hour!? Love that it will cache images, too.
Script execution time is different to latency. Latency is effectively how long it takes to return the first token, and and then the time to the next token and so on. This will always be a very low number, with or without caching, so for short responses such as in your second example the script execution will always be fast. For longer responses such as book summary the latency makes a difference as it accumulates for each token, and there are many more tokens. I didn't look at your code and haven't used the anthropic API, but I guess you weren't streaming the tokens so you couldn't actually measure the latency by this method. Still really appreciate the video as I was curious about this caching and this explained a lot to me, thank you!
The part you didn't understand happened because the meaning of you added sentence was far away in the terms of the book understanding tokens, so it got it pretty easily
Thank you for the video. Second time you are asking a question, I am curious why you are passing the context again in the system prompt if its already cached? Can we just ask the question without calling the system prompt to send the context?
We are using anthropic claude v3.5 sonnet on amazon bedrock. Since the prompt caching feature is in beta, I wanted to clarify if it is available for bedrock. I tried reaching out to Anthropic support for the same but could not get through. It will be great if someone could answer this for me?
Example: My AI companion uses facts about myself when answering. 5 facts are pulled based on the average vector of the latest input. this is done after each message. But I can dump all in facts into cache and forgo this system entirely. will it be better? damn if I know. probably? requires a lot of testing. it would be awesome if anthropic wasn't this censored. I'm not sure I can even use their models in my companion without it getting triggered. But it's definitely not a replacement for RAG... something different, but really cool
I’m thinking of trying the Anthropic cache will a local pgvector store or neo4j. Might make things better… or weird. Kris could do it better. Is this a good idea?
Damn, I completely forgot about Google's caching. Looked at the prices. It seems like Google caching is 4 times cheaper than normal. In contrast Anthropic is 10 times cheaper BUT it's more expensive to create Cache by 25%... So I have no idea what the math is here someone help me out.
Nice, but still expensive, $15 per MTok output is rough. Hopefully we will see this decrease in the future, specially since OpenAI probably has something similar on the works.
5 minutes is very short, worse when you are programming. You can easly have more than 5 min. between 2 prompts. It shoud be 30 minimum, hope they will update this.
It's as good as your implementation of it is. Use crappy embedding models and crappy text organization, and get crappy output. The inverse is true as well.
"380% Lower Latency." A percentage above 100% in this context is incorrect because latency cannot be reduced by more than 100%. A latency reduction of 100% is a latency of 0 ms.
AI is making YouTubers dumber. That's the only explanation; otherwise I don't know how this can happen.
We have just identified the non-LLM entity here... unless you are Grok 10 "watermelon"
Wow man, go out, do something, but this is a bad look 🤣
This changed my entire approach to a project. I wish they held the cache longer than 5 minutes! Even 10 or 15 would be nice, but how about an hour!? Love that it will cache images, too.
Script execution time is different from latency. Latency is effectively how long it takes to return the first token, then the time to the next token, and so on. Time to first token will always be very low, with or without caching, so for short responses such as your second example the script execution will always be fast. For longer responses such as the book summary, the latency accumulates for each token, and there are many more tokens, so it makes a real difference. I didn't look at your code and haven't used the Anthropic API, but I guess you weren't streaming the tokens, so you couldn't actually measure the latency this way. Still, I really appreciate the video, as I was curious about this caching and it explained a lot to me, thank you!
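For anyone who wants to check this themselves, here's a minimal sketch of measuring time-to-first-token with streaming. It assumes the official anthropic Python SDK; the model name and prompt are placeholders, not the video's code:

```python
# Sketch: time-to-first-token (latency) vs. total generation time,
# assuming the `anthropic` Python SDK with streaming enabled.
import time
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

start = time.perf_counter()
first_token_at = None

with client.messages.stream(
    model="claude-3-5-sonnet-20240620",  # placeholder model id
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the book in one paragraph."}],
) as stream:
    for text in stream.text_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # latency = time to first token

end = time.perf_counter()
print(f"time to first token: {first_token_at - start:.2f}s")
print(f"total generation time: {end - start:.2f}s")
```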
The part you didn't understand happened because the meaning of the sentence you added was far away from the book's content in terms of its tokens, so the model spotted it pretty easily.
It would be very useful if you showed us the cost of your entire testing, thanks.
Your videos are very helpful. I don't know when we get the desktop app.
Thank you for the video. The second time you ask a question, I am curious why you are passing the context again in the system prompt if it's already cached? Can we just ask the question without sending the context in the system prompt?
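For context, a rough sketch of why the context still gets sent (assuming the Anthropic prompt caching beta; book.txt and the model name are placeholders): the API matches the request prefix against the cache, so the same system block has to be included on every call, but when it matches it is billed at the much cheaper cache-read rate instead of being re-processed.

```python
# Sketch: the cached system block is resent each call; a matching prefix
# is served from the cache (cheaper, faster) rather than re-processed.
import anthropic

client = anthropic.Anthropic()
book_text = open("book.txt").read()  # placeholder context

def ask(question: str):
    return client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta header at the time
        system=[
            {"type": "text", "text": "You answer questions about the book below."},
            {"type": "text", "text": book_text, "cache_control": {"type": "ephemeral"}},
        ],
        messages=[{"role": "user", "content": question}],
    )

first = ask("Who is the main character?")   # cache write: the book prefix is stored
second = ask("What happens in chapter 3?")  # same prefix within 5 minutes -> cache read
```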
Do you write the code yourself, or do you generate it with an LLM?
Thanks, Kris!
Sorry for posting a question randomly, but do you have any tutorial for an AI voicebot for Discord?
It would be really nice to see how you talk to a book that wasn't in the training dataset (I am pretty sure that Harry Potter was there).
thanks for the update :)
We are using Anthropic Claude 3.5 Sonnet on Amazon Bedrock. Since the prompt caching feature is in beta, I wanted to clarify whether it is available for Bedrock. I tried reaching out to Anthropic support about this but could not get through. It would be great if someone could answer this for me.
How is this replacing RAG?
Example: my AI companion uses facts about myself when answering. Five facts are pulled based on the average vector of the latest input; this is done after each message. But I could dump all the facts into the cache and forgo this system entirely.
Will it be better? Damned if I know. Probably? It requires a lot of testing. It would be awesome if Anthropic wasn't this censored; I'm not sure I can even use their models in my companion without it getting triggered.
But it's definitely not a replacement for RAG... something different, but really cool
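Roughly what the "dump all facts into the cache" version could look like, for comparison with per-message vector retrieval (just a sketch assuming the Anthropic prompt caching beta; the facts and names are made up):

```python
# Sketch: put the whole fact store in a cached system block instead of
# retrieving the top-5 facts per message with a vector lookup.
import anthropic

client = anthropic.Anthropic()

ALL_FACTS = "\n".join([
    "- Likes hiking on weekends.",
    "- Works as a data engineer.",
    "- Allergic to peanuts.",
    # ...the entire fact store goes here, once, instead of per-message retrieval
])

def companion_reply(user_message: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
        system=[
            {"type": "text", "text": "You are a personal companion. Known facts about the user:"},
            {"type": "text", "text": ALL_FACTS, "cache_control": {"type": "ephemeral"}},
        ],
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text
```

Whether this beats retrieval depends on how big the fact store is and how often messages arrive within the 5-minute cache window, so it really does come down to testing.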
Where's the link to the code you used?
Also, I recommend putting timing in the script!
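A tiny helper along those lines, purely illustrative:

```python
# Context manager you can wrap around each API call to log wall-clock time.
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.2f}s")

# Usage:
# with timed("cached request"):
#     response = client.messages.create(...)
```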
I'm thinking of trying the Anthropic cache with a local pgvector store or neo4j. Might make things better… or weird. Kris could do it better. Is this a good idea?
You can cache the “500 page book” context you find in Claude projects, btw
What if I ask another question after caching the entire book?
Damn, I completely forgot about Google's caching. I looked at the prices. It seems like Google's cached tokens are 4 times cheaper than normal input. In contrast, Anthropic's cache reads are 10 times cheaper, BUT creating the cache costs 25% more... So I have no idea what the math is here; someone help me out.
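A rough back-of-the-envelope using Anthropic's published ratios (cache writes cost 25% more than normal input, cache reads 90% less) and Claude 3.5 Sonnet's $3/MTok input price; the Google side is only as good as the numbers quoted in this thread:

```python
# Break-even math for Anthropic prompt caching on a reused prefix.
BASE = 3.00          # $ / MTok, normal input (Claude 3.5 Sonnet at the time)
WRITE = BASE * 1.25  # first request stores the prefix (+25%)
READ = BASE * 0.10   # later requests within the 5-minute window (-90%)

def cost_with_cache(prefix_mtok: float, calls: int) -> float:
    return prefix_mtok * (WRITE + READ * (calls - 1))

def cost_without_cache(prefix_mtok: float, calls: int) -> float:
    return prefix_mtok * BASE * calls

for calls in (1, 2, 5, 20):
    print(calls,
          round(cost_with_cache(0.1, calls), 3),
          round(cost_without_cache(0.1, calls), 3))
# One call: caching costs more (the +25% write with no reuse).
# From the second call on it is already cheaper; at 20 calls it is roughly
# 6x cheaper, approaching the full 10x with very heavy reuse.
```

If Google's scheme really is a flat ~4x discount on cached reads (plus, as I understand it, an hourly storage fee), then which provider wins depends mostly on how many hits you get within the cache window.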
Nice, but still expensive; $15 per MTok output is rough. Hopefully we will see this decrease in the future, especially since OpenAI probably has something similar in the works.
I do not understand why they don't just cache everything automatically when you set this flag... Why the cache points?
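As I understand it, the cache points mark where a cacheable prefix ends, so the stable part of the prompt and the growing conversation history can be reused independently instead of the API guessing a boundary. A sketch with placeholder content, assuming the prompt caching beta:

```python
# Sketch: two cache breakpoints - one after the stable instructions,
# one after the latest turn so the whole history can be reused next call.
import anthropic

client = anthropic.Anthropic()

LONG_INSTRUCTIONS = "...several thousand tokens of stable instructions..."
history = [
    {"role": "user", "content": "earlier turn"},
    {"role": "assistant", "content": "earlier reply"},
]

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        # Breakpoint 1: everything up to here is cached as one prefix.
        {"type": "text", "text": LONG_INSTRUCTIONS, "cache_control": {"type": "ephemeral"}},
    ],
    messages=history + [
        {"role": "user", "content": [
            # Breakpoint 2: cache the conversation up to and including this block.
            {"type": "text", "text": "latest question", "cache_control": {"type": "ephemeral"}},
        ]},
    ],
)
```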
5 minutes is very short, and it's worse when you are programming. You can easily have more than 5 minutes between two prompts. It should be 30 minimum; I hope they will update this.
Interesting, but not a replacement for RAG, I don't think.
5 minutes is very short
FYI Google Gemini has been doing prompt caching for some time now.
He mentions that at 13:33
Who actually uses RAG? I've found it so unreliable.
It's as good as your implementation of it is. Use crappy embedding models and crappy text organization, and get crappy output. The inverse is true as well.
Claude is getting stupid (it's being quantized). Too bad.