Making Long Context LLMs Usable with Context Caching

Prompt Engineering

5K views

Published: Nov 17, 2024

Comments • 11

@unclecode 4 months ago ⁺³
Thanks! I found the ability to update the TTL very interesting. Imagine building an assistant application for answering questions or for customer service. On the server side, we could extend the TTL by, let's say, another 5 minutes. When a new user sends a question, we extend it again. When there are no new users, the cache simply expires. Five minutes is just an example, but it's a great way to keep your cache ready while it's needed and clear it when it isn't.
I think the minimum token requirement is likely about profit. They need a minimum number of tokens to offer the service economically; below that threshold, it wouldn't be cost-effective for them. That's my guess.
@engineerprompt 4 months ago ⁺¹
Dynamically controlling the TTL can be really helpful, and I agree the token limit is probably related to cost. I hope they implement the latency reduction soon, since that is when caching will really make sense.
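The sliding-TTL pattern described in the thread above can be sketched as follows. This is a minimal in-process simulation, not any provider's API: `SlidingTTLCache`, `touch`, and `handle_question` are hypothetical names, and the injectable `clock` exists only so the expiry logic can be checked without actually waiting. In a real deployment, `touch` is where you would call your provider's cache TTL-update endpoint (not shown here).

```python
import time
from typing import Callable, Optional


class SlidingTTLCache:
    """Hypothetical stand-in for a provider-side context cache with a sliding TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so tests can fake the passage of time
        self.content: Optional[str] = None
        self.expires_at = 0.0

    def create(self, content: str) -> None:
        # Cache the long shared context (e.g. a product manual) with a fresh TTL.
        self.content = content
        self.expires_at = self.clock() + self.ttl_seconds

    def is_alive(self) -> bool:
        return self.content is not None and self.clock() < self.expires_at

    def touch(self) -> bool:
        """Push the expiry one full TTL into the future; return False if already expired."""
        if not self.is_alive():
            self.content = None  # idle too long: release the cached context
            return False
        # Assumption: a real app would call the provider's TTL-update endpoint here.
        self.expires_at = self.clock() + self.ttl_seconds
        return True

    def handle_question(self, question: str) -> str:
        # Every incoming user question counts as activity and extends the TTL.
        if not self.touch():
            raise LookupError("context cache expired; recreate it before answering")
        # Placeholder for the model call that would reuse the cached context.
        return f"answer to {question!r} using {len(self.content)} cached characters"
```

With a 300-second TTL, a question at second 200 pushes the expiry to second 500, so a follow-up at second 450 still finds the cache alive (without the refresh it would have expired at second 300). Once five minutes pass with no questions, the next `touch` finds it expired and releases it.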
@paraconscious790 4 months ago
this is very helpful buddy, very time saving and quickly updating my own biological cache without searching for it explicitly. Thanks!
@ylazerson 2 months ago
Great video - thanks!
@engineerprompt 2 months ago
thank you.
@boooosh2007 4 months ago ⁺²
Seems similar to vector storage, but more expensive. What am I missing?
@engineerprompt 4 months ago ⁺⁶
A couple of things differentiate it from vector storage. When you retrieve info with vector-based search, you only get some "chunks", so the LLM doesn't have the whole context of the document; an approach like this provides the complete context to the LLM. Caching can also be really useful together with RAG. I agree it is going to be more expensive than vector stores, but it will potentially save on infrastructure. It will be interesting to see how it evolves.
@boooosh2007 4 months ago
@engineerprompt Yeah, chunking would have to be perfect to match the context. But if the vector representation and chunking are accurate, it should match in context quality. Time will tell, eh?
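The difference discussed in the thread above can be illustrated with a toy retriever. This is a sketch under stated assumptions: simple word-overlap scoring is swapped in for real embedding similarity, and `chunk`, `retrieve_top_k`, and `build_prompt` are hypothetical helpers, not any vector store's API. The point is only that the retrieval path hands the model the top `k` chunks, while the caching path sends the whole document once and reuses it.

```python
import re
from collections import Counter


def chunk(text: str, size: int) -> list[str]:
    # Split the document into fixed-size word windows (no overlap).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, passage: str) -> int:
    # Toy stand-in for embedding similarity: count shared word occurrences.
    q, p = tokens(query), tokens(passage)
    return sum(min(q[w], p[w]) for w in q)


def retrieve_top_k(query: str, chunks: list[str], k: int) -> list[str]:
    # Retrieval path: only the k highest-scoring chunks reach the model.
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Caching path would pass [full_document] here and reuse it across questions.
    return "\n\n".join(context) + f"\n\nQuestion: {query}"
```

With a 20-word document split into 6-word chunks and `k=1`, a query about `w7` and `w8` yields a prompt containing only the chunk `w6` through `w11`; the model never sees `w0` through `w5`. If the answer depended on those words, chunk size and scoring quality would decide the outcome, which is the trade-off the thread debates.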
@DearGeorge3 4 months ago
Great news! Thanks!!
@engineerprompt 4 months ago
thank you.
@khanra17 4 months ago
So much lazy voice 😴😴😴