Use text-embedding-3-small + Qdrant quantization to save on storage costs.
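For anyone wanting to try that combination, here is a minimal sketch with qdrant-client (the collection name, endpoint, and int8 choice are my assumptions, not from the video):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # assumed local instance

# Collection sized for OpenAI text-embedding-3-small (1536 dims),
# with int8 scalar quantization to cut vector memory roughly 4x.
client.create_collection(
    collection_name="docs",
    vectors_config=models.VectorParams(
        size=1536,
        distance=models.Distance.COSINE,
    ),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            always_ram=True,  # keep the small int8 copy in RAM for fast search
        )
    ),
)
```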
Pretty good! Very useful, as I never thought about the long-term wallet bleeding.
Just a question: I see you're mentioning the AWS X2gd EC2 instance, for example. So if I understand correctly, you want to keep all the vectors in memory. Isn't it better to just use a storage solution for this instead if the database is massive, e.g. Amazon OpenSearch Service? Storage should be cheap...
Thank you very much! This is very good to know if our app gets bigger.
Yes, something to keep in mind.
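On the question above about keeping all the vectors in memory: the back-of-the-envelope math (my own illustrative numbers, not from the video) shows why RAM, rather than disk, is usually the bottleneck and what quantization changes:

```python
# Rough RAM footprint of an in-memory vector index (illustrative numbers only;
# 1536 dims matches OpenAI's text-embedding-3-small).
num_vectors = 100_000_000            # e.g. 100M chunks
dims = 1536

bytes_float32 = num_vectors * dims * 4      # full precision
bytes_int8    = num_vectors * dims * 1      # scalar quantization, 4x smaller
bytes_binary  = num_vectors * dims // 8     # binary quantization, 32x smaller

gib = 1024 ** 3
print(f"float32: {bytes_float32 / gib:,.0f} GiB")   # ~572 GiB
print(f"int8:    {bytes_int8 / gib:,.0f} GiB")      # ~143 GiB
print(f"binary:  {bytes_binary / gib:,.0f} GiB")    # ~18 GiB
```

Object storage is indeed cheap, but low-latency vector search engines generally keep the searchable index (or at least a quantized copy of it) in RAM, which is where the instance cost comes from.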
Very interesting and important points you raised. I've seen startups completely unaware of this and, as a result, they're doomed. Many don't even use features like OpenAI's dimension reduction. Binary and scalar embedding quantization has been around since March and is incredibly powerful. Now, with Gemini's support for PDFs and long context windows, offering up to a billion tokens a day, it raises the question of when to use embeddings and RAG, and when not to. When necessary, combining this with a long context window seems like the perfect solution. I suggest you create a video showing how to use this with Gemini to fetch and cache context, which would deliver the best balance of performance and cost.
I am noticing the same thing: there are some great tools that need to be in every production pipeline, but folks are not aware of them. Funny thing, I've put together a video on the topic you suggested, combining Gemini's PDF capabilities with context caching. It will be released tomorrow. This is very powerful and definitely needs to be an option for developers in any retrieval task.
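For readers who want to experiment before that video lands, a rough sketch of the workflow with the google-generativeai Python SDK (file name, model version, and TTL below are my assumptions, not taken from the upcoming video):

```python
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")

# Upload a PDF once and cache it, so repeated questions don't re-pay for its tokens.
doc = genai.upload_file(path="report.pdf")  # hypothetical file

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",       # a caching-capable model version
    system_instruction="Answer only from the attached document.",
    contents=[doc],
    ttl=datetime.timedelta(minutes=60),        # keep the cached context for an hour
)

model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize the key findings.")
print(response.text)
```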
@engineerprompt Looking forward to that, my friend. Your ability to create educational and practical content is your superpower! This embedding video should definitely be added to your course, and you should dive deep into the details. Just this alone is enough to convince a developer to take your course! I had a couple of interviews for an AI engineer position last week, and I asked them all, "Have you seen or followed the engineerprompt channel?" To motivate you: 2 out of 5 said yes, and no surprise, their answers were better than those who hadn't seen it. So, as usual, I'll stay tuned for your next video.
Please make a video on hybrid search using the BM25 algorithm.
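Until such a video exists, a bare-bones sketch of the idea (using the rank_bm25 package and reciprocal rank fusion; the documents, query, and dense scores below are made up for illustration):

```python
import numpy as np
from rank_bm25 import BM25Okapi

docs = [
    "qdrant supports binary quantization",
    "openai embeddings can be shortened",
    "bm25 is a classic lexical ranking function",
]
query = "embedding quantization"

# Lexical side: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([d.split() for d in docs])
bm25_scores = bm25.get_scores(query.split())

# Dense side: placeholder cosine similarities; in practice these come from your embedding model.
dense_scores = np.array([0.82, 0.77, 0.15])

# Reciprocal rank fusion: combine the two rankings without worrying about score scales.
def rrf(ranks, k=60):
    return 1.0 / (k + ranks)

bm25_ranks = np.argsort(-bm25_scores).argsort() + 1
dense_ranks = np.argsort(-dense_scores).argsort() + 1
fused = rrf(bm25_ranks) + rrf(dense_ranks)
print(docs[int(np.argmax(fused))])
```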
Your channel is so insightful
Brilliant and extremely useful and relevant information as usual. Thanks!
Thank you!
Thank you, waiting for a real tutorial on a production RAG app.
That’s great! Yes, please create a video with a useful example. I‘d appreciate it! 🎉🎉
Very helpful
One question: can you explain the difference between the word "quantization" as used with embedding models (here) and the use of quantization when doing inference or fine-tuning?
Both refer to the same underlying idea. For inference, it's used to quantize the weights (numerical values) of the model (LLM), which reduces the memory (RAM) needed when you load the model for inference. In the case of embeddings, we are talking about the outputs of the model (again numerical values), which need to be stored somewhere (usually a vector store). You want to quantize them to reduce storage cost.
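To make the embedding side concrete, a tiny sketch (mine, not from the video) of quantizing a model's output vectors rather than its weights:

```python
import numpy as np

# Pretend these are float32 embeddings coming out of the model (outputs, not weights).
embeddings = np.random.randn(10_000, 1536).astype(np.float32)
print(embeddings.nbytes / 1e6, "MB as float32")   # ~61.4 MB

# int8 scalar quantization: map each dimension onto 256 levels -> 4x smaller.
lo, hi = embeddings.min(axis=0), embeddings.max(axis=0)
int8 = np.round((embeddings - lo) / (hi - lo) * 255 - 128).astype(np.int8)
print(int8.nbytes / 1e6, "MB as int8")            # ~15.4 MB

# Binary quantization: keep only the sign of each dimension -> 32x smaller.
binary = np.packbits(embeddings > 0, axis=1)
print(binary.nbytes / 1e6, "MB as binary")        # ~1.9 MB
```

The model's weights stay in full precision here; only what gets written to the vector store shrinks.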
Thanks for your very useful information.
Very nice video. Thanks!
Yes, this is exactly what I'm looking for
Using Qdrant on our own servers, RAM will be our largest expense for maintaining the database as it grows.
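One common way to push most of that out of RAM (a sketch assuming a recent qdrant-client; the names are illustrative): keep the full-precision vectors on disk and only a binary-quantized copy in memory, then rescore at query time.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="docs_binary",
    vectors_config=models.VectorParams(
        size=1536,
        distance=models.Distance.COSINE,
        on_disk=True,                    # original vectors stay on disk, not in RAM
    ),
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True)  # ~32x smaller copy in RAM
    ),
)

# At query time, oversample with the binary index and rescore with the on-disk originals.
hits = client.search(
    collection_name="docs_binary",
    query_vector=[0.0] * 1536,           # placeholder query embedding
    limit=10,
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(rescore=True, oversampling=2.0)
    ),
)
```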
What about SciPhi Triplex?
Thanks!