Embeddings vs Fine Tuning - Part 1, Embeddings

  • Published: 3 Nov 2024

Comments • 25

  • @dalevi1
    @dalevi1 A year ago

    Very good explanation of how embeddings work. After this video I got a much better understanding. Thank you very much for your video and the examples!

  • @Ev3ntHorizon
    @Ev3ntHorizon A year ago +3

    This is great. Very much looking forward to Part 2

  • @NLPprompter
    @NLPprompter A year ago +6

    This is really good. 🌟⭐⭐⭐⭐

  • @adriangabriel3219
    @adriangabriel3219 9 months ago

    Great video! Could you explain why you chose MARCO instead of taking an arbitrary embedding model from the MTEB leaderboard (e.g. BGE)? What's your opinion on fine-tuning an embedding model on the domain, in your case the touch rugby dataset?

    • @TrelisResearch
      @TrelisResearch  9 months ago

      I just wanted to pick something more standard. Leaderboards can be misleading and there's a risk of grabbing something that later turns out not to be robust.
      That said, I think it's a good idea to try leaderboard embeddings, and you can probably get better performance.
      I've never thought of fine-tuning an embedding model, but I like it! Might be a way to do better RAG!
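      For anyone curious, a rough sketch of what fine-tuning an embedding model on domain question/passage pairs could look like with sentence-transformers. The checkpoint name and the two touch-rugby pairs are illustrative assumptions, not code from the repo:

      from torch.utils.data import DataLoader
      from sentence_transformers import SentenceTransformer, InputExample, losses

      # Start from a marco dot-product style model (name assumed, not necessarily the one from the video).
      model = SentenceTransformer("sentence-transformers/msmarco-bert-base-dot-v5")

      # In practice you'd want many (question, relevant passage) pairs from your domain.
      train_examples = [
          InputExample(texts=["How many touches does a team get?",
                              "Each team has six touches before possession changes over."]),
          InputExample(texts=["How is a touchdown scored?",
                              "A touchdown is scored by placing the ball on or over the score line."]),
      ]
      train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

      # In-batch negatives: the other passages in each batch act as negatives for every question.
      train_loss = losses.MultipleNegativesRankingLoss(model)

      model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
      model.save("marco-dot-touch-rugby")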

  • @GrahamAndersonis
    @GrahamAndersonis 8 months ago +1

    Any current thoughts on ColBERTv2 vs regular BERT embeddings? Seems intriguing--not too hard to set up if you're not on WSL 2.

    • @TrelisResearch
      @TrelisResearch  8 months ago +1

      I haven't dug in but since the v2 is optimised for retrieval, it does sound intriguing!

  • @yusufkemaldemir9393
    @yusufkemaldemir9393 A year ago +1

    Great clarification! Well done!
    1- I would love to see how your code works with an M1/M2 MacBook.
    2- Have you tried ColBERT? Would you be able to productionize this notebook?
    3- Would these embeddings and sentence transformers work for 1) a single PDF of hundreds of pages, or 2) thousands of PDFs of various lengths?
    4- For the scenario in #3, should someone use i) embeddings only, ii) PEFT / fine-tuning / training an LLM on the specific documents, iii) both, iv)?
    5- When are you publishing Part 2?
    Thanks again!

    • @TrelisResearch
      @TrelisResearch  A year ago +3

      1. With embeddings let me dig in and revert. For fine-tuning, that's trickier, I'll think about it.
      2. I haven't tried ColBERT. The thing here is that marco dot-product is specifically trained for dot-product search, so you'd be comparing fine-tuned marco dot-product against some ColBERT approach. I'll add it to my list of things to do, although making fine-tuning work is higher priority.
      3. Yes, embeddings will work for hundreds or thousands of pages. One issue is that there may be so many highly similar snippets returned that you can't fit them all in the prompt that goes to the LLM (see the retrieval sketch below). This makes it dataset and question dependent.
      4. In my experience, fine-tuning is really hard. I've only gotten it to work for structured responses (like function calling) or maybe a bit for tone. Encoding information is really difficult, so my suggestion would be to try and stick to prompting and embeddings.
      5. Hopefully soon - I want to publish something that is useful, and in a lot of the fine-tunings, the results are just really bad.
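      A minimal sketch of the dot-product retrieval flow from points 2 and 3, assuming the sentence-transformers library and the msmarco-bert-base-dot-v5 checkpoint (the exact model name, the corpus chunks and the top-k value are illustrative assumptions):

      from sentence_transformers import SentenceTransformer, util

      model = SentenceTransformer("sentence-transformers/msmarco-bert-base-dot-v5")

      # Chunks of the source document (e.g. snippets cut from the touch rugby rules).
      corpus = [
          "A touchdown is scored by placing the ball on or over the score line.",
          "Each team has six touches before possession changes over.",
          "Play restarts with a tap at the mark after a penalty.",
      ]
      corpus_emb = model.encode(corpus, convert_to_tensor=True)   # (n_chunks, dim)

      query = "How many touches does a team get?"
      query_emb = model.encode(query, convert_to_tensor=True)     # (dim,)

      # Dot-product similarity between the question and every chunk.
      scores = util.dot_score(query_emb, corpus_emb)[0]           # (n_chunks,)

      # Keep only the top-k chunks so the retrieved context still fits in the LLM prompt.
      top_k = scores.topk(2)
      for score, idx in zip(top_k.values, top_k.indices):
          print(f"{score.item():.3f}  {corpus[idx.item()]}")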

  • @eduardoconcepcion4899
    @eduardoconcepcion4899 10 months ago

    Hi, great channel! Thanks for sharing. I have a question: do you have any recommendations regarding embeddings and models to use when dealing with Spanish for context and questions? Thanks in advance.

    • @TrelisResearch
      @TrelisResearch  10 months ago +1

      You could probably look at BETO (github.com/dccuchile/beto) and dig around from there.
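      A hedged sketch of turning BETO into sentence embeddings via mean pooling; the checkpoint id below is one of the BETO variants and is an assumption about which one you'd pick:

      import torch
      from transformers import AutoModel, AutoTokenizer

      model_id = "dccuchile/bert-base-spanish-wwm-uncased"  # assumed BETO checkpoint
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModel.from_pretrained(model_id)

      def embed(text: str) -> torch.Tensor:
          """Mean-pool the last hidden state into a single embedding vector."""
          inputs = tokenizer(text, return_tensors="pt", truncation=True)
          with torch.no_grad():
              hidden = model(**inputs).last_hidden_state       # (1, seq_len, dim)
          mask = inputs["attention_mask"].unsqueeze(-1)        # (1, seq_len, 1)
          return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # average over real tokens

      q = embed("¿Cuántos toques tiene cada equipo?")
      print(q.shape)  # torch.Size([1, 768])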

  • @NamLe-sz9cj
    @NamLe-sz9cj 9 months ago

    This is an amazing tutorial. Thank you!

  • @finnsteur5639
    @finnsteur5639 A year ago +2

    Thanks for the video! You helped me a lot.
    But no matter what, you'll have a large prompt in the end, right? I want my application to run on CPU with llama.cpp, and even though it's way faster on CPU than other tools specialised for GPU, it gets very slow as soon as you give it a long prompt.
    I ran a test: for a single phrase I get a response in less than a second, but for a one-page prompt it takes 4 minutes! So I cannot use embeddings for my use case (I have 40,000 pages of technical documentation and want answers over them). I guess I have to use fine-tuning, but everyone on the internet told me that fine-tuning is not for that. How would you proceed?

    • @TrelisResearch
      @TrelisResearch  A year ago

      Exactly! Embeddings mean long prompts. Fine-tuning is your option here, but it's hard.

  • @ghrasko
    @ghrasko 10 months ago +1

    I had to add an encoding parameter in the pdf_to_txt.py file to prevent an error:
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write(text)

    • @TrelisResearch
      @TrelisResearch  10 months ago

      Thanks, yeah, depending on your Python environment that may be needed. I've just added your improvement to the repo. Feel free to create issues there if you spot further bugs.
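      For reference, a minimal standalone version of that kind of PDF-to-text step with the utf-8 fix applied, assuming the pypdf library (the actual repo script may use a different reader and file paths):

      from pypdf import PdfReader

      pdf_path = "touch_rugby_rules.pdf"   # hypothetical input file
      txt_path = "touch_rugby_rules.txt"

      reader = PdfReader(pdf_path)
      text = "\n".join(page.extract_text() or "" for page in reader.pages)

      # Explicit utf-8 avoids UnicodeEncodeError on platforms whose default
      # encoding can't represent every extracted character.
      with open(txt_path, "w", encoding="utf-8") as f:
          f.write(text)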

  • @HappyDancerInPink
    @HappyDancerInPink A year ago +1

    Why are embeddings only from the first layer? Could you not pass it through N layers and attain lower dimensional embeddings that way?

    • @TrelisResearch
      @TrelisResearch  A year ago +2

      Howdy! I was too categorical on that in the video. The embeddings can be from different layers. The first layers are closer semantically to the input, then they are transformed to being more abstract in the middle, and then they come back towards word-meanings in the last layers as the output prediction is approached. A language model is trained to predict the next token - does that align well with forming an abstract representation for comparison with other text? Yes, empirically it seems so, but I don't have great intuition about why a first or a middle layer would be better. Obviously you need at least one layer because you want to get into vector space.
      Regarding dimensions, all of the layers (32 in Llama) have the same dimension, so the embedding wouldn't be lower dimensional just by counting the matrix sizes. Possibly the inner layers *are* lower rank, or could be represented by lower rank matrices (in fact, probably, because Low Rank Adapters, LoRAs, work). But maybe the first layer could also be very well approximated by a lower rank matrix.
      So overall, probably you could do either of those things, but I need to learn more and see more examples to say something deeper.
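      A small sketch of pulling embeddings from different layers, using a small BERT checkpoint purely for illustration (the same idea applies to Llama-style models). It also shows the point about dimensions: every layer's hidden state has the same width:

      import torch
      from transformers import AutoModel, AutoTokenizer

      model_id = "bert-base-uncased"  # illustrative choice, not the model from the video
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModel.from_pretrained(model_id, output_hidden_states=True)

      inputs = tokenizer("Each team has six touches before possession changes over.",
                         return_tensors="pt")
      with torch.no_grad():
          outputs = model(**inputs)

      # hidden_states is a tuple: (input embeddings, layer 1, ..., last layer).
      for i, h in enumerate(outputs.hidden_states):
          print(i, h.shape)  # every entry is (1, seq_len, 768) -- same dimension per layer

      # Mean-pool any layer you like into a sentence embedding.
      first_layer_emb = outputs.hidden_states[1].mean(dim=1)    # (1, 768)
      middle_layer_emb = outputs.hidden_states[6].mean(dim=1)   # (1, 768)
      last_layer_emb = outputs.hidden_states[-1].mean(dim=1)    # (1, 768)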

  • @SatyamKumar-qt3xw
    @SatyamKumar-qt3xw A year ago +1

    great video!!!!

  • @adriangabriel3219
    @adriangabriel3219 9 months ago

    How did you manage to match the dimension of the question embedding tensor with the text tensors? I would assume that you would have to pad the dimension of the question tensor to match the dimension of the text tensors, wouldn't you?

    • @TrelisResearch
      @TrelisResearch  9 months ago

      Do you mean my test question set?
      If so, that isn't a tensor but rather a list - it's not being sent through as a batch (although that would be a more efficient approach).

    • @adriangabriel3219
      @adriangabriel3219 9 months ago

      @@TrelisResearch I am referring to your evaluation of the embedding model on your train set. The dimension of the question embedding must match the dimension of the stacked corpus tensors to do the dot product. How did you accomplish that? I am missing something like:
      padding = target_len - question_emb.shape[0]
      question_emb = F.pad(question_emb, (0, padding))
      where target_len is the length of the longest sentence in the corpus, to make the dimensions of question and corpus match.

    • @TrelisResearch
      @TrelisResearch  9 months ago

      @@adriangabriel3219 No matter the length of the sentence you put into the embedding model, it will return a 1D vector whose length is the embedding dimension (not the sentence length). Does that help?
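      A quick way to check that, assuming sentence-transformers and a marco dot-product style checkpoint (the model name is an assumption):

      from sentence_transformers import SentenceTransformer

      model = SentenceTransformer("sentence-transformers/msmarco-bert-base-dot-v5")

      short_emb = model.encode("Six touches.")
      long_emb = model.encode("A much longer passage about the rules of touch rugby, "
                              "covering substitutions, scoring, and restarts after a touchdown.")

      # Both are 1D vectors of the embedding dimension, regardless of sentence length,
      # so question and corpus embeddings already match and no padding is needed.
      print(short_emb.shape, long_emb.shape)  # (768,) (768,)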

  • @TrelisResearch
    @TrelisResearch  A year ago

    **Running on Mac M1 or M2**
    !!! Requires at least Mac M1 or M2 with 16 GB+ !!!
    I'm making available for purchase a version of the Embedding.ipynb script for Mac M1 or M2. Video demo here: www.loom.com/share/eb45fad389364c229655567dcc3aaf0d?sid=86a9fc70-37e0-4808-b016-1707f9a34c9f

  • @caiyu538
    @caiyu538 A year ago +1

    Great