Fine-tune Mixtral 8x7B (MoE) on Custom Data - Step by Step Guide

  • Published: 9 Jan 2025
  • Science

Comments • 69

  • @jprobichaud
    @jprobichaud 1 year ago +9

    🎯 Key Takeaways for quick navigation:
    00:00 🚀 *Introduction to Fine-Tuning the Mixtral 8x7B Model*
    - Overview of the video's purpose: fine-tuning the Mixtral 8x7B model from Mistral AI on a custom dataset.
    - Mention of the popularity and potential of Mixtral 8x7B as a mixture-of-experts model.
    - Emphasis on practical considerations for fine-tuning, such as VRAM requirements and dataset details.
    01:28 🛠️ *Installing Required Packages and Data Set Overview*
    - Installation of necessary packages: Transformers, TRL, Accelerate, PyTorch, and bitsandbytes.
    - Discussion on using the MosaicML Instruct-v3 dataset for fine-tuning.
    - Overview of the dataset structure, splits, and sources.
    03:45 📝 *Formatting Data for Fine-Tuning Mixtral 8x7B*
    - Explanation of the prompt template for fine-tuning, specific to the Mixtral 8x7B Instruct version.
    - Discussion on rearranging data to make it more challenging by creating instructions from provided text.
    - Demonstration of a function to reformat the initial data into the desired prompt template.
    06:28 🧩 *Loading Base Model and Configuring for Fine-Tuning*
    - Acknowledgment of the source for the notebook and clarification that the base version is used.
    - Setting configurations and loading the model and tokenizer, along with using Flash Attention (a rough code sketch follows after this list).
    - Explanation of the importance of setting up configurations for a smooth fine-tuning process.
    08:18 🔄 *Checking Base Model Responses Before Fine-Tuning*
    - Use of a function to check responses from the base model before any fine-tuning.
    - Illustration of the base model behavior in generating responses to a given prompt.
    - Recognition that the base model tends to follow next word prediction rather than explicit instructions.
    10:06 📏 *Determining Max Sequence Length for Fine-Tuning*
    - Explanation of the importance of max sequence length in fine-tuning Mixtral 8x7B.
    - Presentation of a code snippet to analyze the distribution of sequence lengths in the dataset.
    - Emphasis on selecting a max sequence length that covers the majority of examples.
    12:20 🧠 *Adding Adapters with LoRA for Fine-Tuning*
    - Overview of the Mixtral 8x7B architecture, focusing on linear layers for adding adapters.
    - Introduction to the LoRA configuration for attaching adapters to specific layers.
    - Demonstration of setting hyperparameters and using the TRL package for supervised fine-tuning.
    14:36 🚥 *Setting Up Trainer and Initiating Fine-Tuning*
    - Verification of multiple GPUs for parallelization during model training.
    - Definition of output directory and selection of training epochs or steps.
    - Importance of configuring the trainer, including considerations for max sequence length.
    16:50 📈 *Analyzing Fine-Tuning Results and Storing Model*
    - Presentation of training and validation loss graphs, indicating a gradual decrease.
    - Acknowledgment of the need for potential longer training for better model performance.
    - Demonstration of storing the fine-tuned model weights locally and pushing to Hugging Face repository.
    17:46 🔄 *Testing Fine-Tuned Model Responses*
    - Utilization of the fine-tuned model to generate responses to a given prompt.
    - Comparison of responses before and after fine-tuning, showcasing improved adherence to instructions.
    - Acknowledgment that further training could enhance the model's performance.
    Made with HARPA AI
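
    For orientation, here is a rough sketch of the 4-bit loading step described around 06:28 above, assuming the standard Transformers/bitsandbytes APIs; the exact arguments and checkpoint revision used in the video's notebook may differ.

      # Rough sketch only: 4-bit (QLoRA-style) loading of the Mixtral base model.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

      model_id = "mistralai/Mixtral-8x7B-v0.1"  # base (non-instruct) checkpoint

      bnb_config = BitsAndBytesConfig(
          load_in_4bit=True,                      # quantize weights to 4 bits
          bnb_4bit_quant_type="nf4",
          bnb_4bit_compute_dtype=torch.bfloat16,
          bnb_4bit_use_double_quant=True,
      )

      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          quantization_config=bnb_config,
          attn_implementation="flash_attention_2",  # needs the flash-attn package
          device_map="auto",
      )

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      tokenizer.pad_token = tokenizer.eos_token     # Mixtral ships without a pad token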

  • @薇季芬
    @薇季芬 8 months ago +1

    3:37 format
    4:15 follow a different format
    4:26
    Indicates the end of user input
    4:33
    Special token indicating the end of the model response
    4:39
    you need to provide your data in this format
    5:08
    def create_prompt
    5:31
    System message
    6:16
    Load our base model
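
    To make these notes concrete, here is a minimal sketch of what such a create_prompt helper could look like for the Mixtral instruct template. The system-message wording is an assumption, and the swapped prompt/response fields follow the instruction-generation task described in the video.

      # Sketch of a create_prompt helper; field names follow mosaicml/instruct-v3.
      def create_prompt(sample: dict) -> str:
          system_message = (
              "Given the following text, write the instruction that could have "
              "produced it."  # illustrative wording, not the video's exact text
          )
          # [/INST] marks the end of the user input; the EOS token </s> marks the
          # end of the model response, so the model learns where to stop.
          return (
              f"<s>[INST] {system_message}\n\n{sample['response']} [/INST] "
              f"{sample['prompt']}</s>"
          )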

  • @MikewasG
    @MikewasG 1 year ago +1

    Thank you for sharing, this is very helpful! Looking forward to the next videos!

  • @Tiberiu255
    @Tiberiu255 1 year ago +8

    Why are you using packing in the SFTTrainer if you just said that you're going to pad the examples?
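
    For context, the two modes being contrasted here look roughly like this in TRL's SFTTrainer (a sketch against TRL versions current at the time; newer releases move some of these arguments into SFTConfig).

      from trl import SFTTrainer

      # model, train_dataset and create_prompt as in the sketches above.
      # packing=True: examples are concatenated and cut into fixed-length blocks
      # of max_seq_length tokens, so no padding is needed.
      packed_trainer = SFTTrainer(
          model=model,
          train_dataset=train_dataset,
          formatting_func=create_prompt,
          max_seq_length=1024,
          packing=True,
      )

      # packing=False: each example is tokenized on its own and padded or
      # truncated to max_seq_length.
      padded_trainer = SFTTrainer(
          model=model,
          train_dataset=train_dataset,
          formatting_func=create_prompt,
          max_seq_length=1024,
          packing=False,
      )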

  • @dev_navdeep
    @dev_navdeep 11 months ago +1

    Kudos, really simple and direct explanation.

  • @AI-Makerspace
    @AI-Makerspace 11 months ago

    Thanks for the tag @Prompt Engineering! What else is your audience requesting the most these days? Would love to find ways to create some value for them together!

    • @engineerprompt
      @engineerprompt  11 months ago +1

      Thanks for the amazing work you guys are doing! Really appreciate it. I think deployment is a topic that will be really valuable to my audience. Let's explore how to collaborate.

    • @AI-Makerspace
      @AI-Makerspace 11 months ago

      @@engineerprompt absolutely! We started delving deeper into deployment with LangServe and vLLM events in recent weeks. We'll connect to figure out next steps!

  • @IshfaqAhmed-p6d
    @IshfaqAhmed-p6d 11 months ago

    At 5:58, why is sample["response"] given as the input and sample["prompt"] given as the response?

  • @WelcomeToMyLife888
    @WelcomeToMyLife888 1 year ago +1

    Awesome content as usual! Thanks!

  • @Akshatgiri
    @Akshatgiri 10 months ago

    I've noticed that Mixtral 8x7B-Instruct (and other Mistral models) constantly repeats part of the system prompt. Have you noticed this / found a fix for it?

  • @ahmedmechergui8680
    @ahmedmechergui8680 1 year ago +2

    Thanks for the video 😃 I just have a question: is it possible to use the model through an API and also provide the source files for the data with the response?

  • @varunnegi-v7z
    @varunnegi-v7z 1 year ago +1

    Can you also make a video on fine-tuning multimodal models like LLaVA and CogVLM?

  • @kaio0777
    @kaio0777 1 year ago +1

    Can you make this for home computer use, in terms of my personal data, and teach it to use tools on your system and online?

  • @lukeskywalker7029
    @lukeskywalker7029 11 months ago

    I'm sceptical this is actually effectively training the Mixtral MoE model and not making it worse!

  • @alexxx4434
    @alexxx4434 1 year ago +2

    Thanks for the guide!
    How do you continue the fine-tuning process in a case like this?
    Can you load previous work (LoRA) and carry on, or do you need to restart?

    • @engineerprompt
      @engineerprompt  1 year ago

      I think you can do that by storing different checkpoints.
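
      Roughly, both options look like this, assuming the adapters were saved with Trainer/PEFT (the paths are placeholders):

        # Option A: resume from the latest Trainer checkpoint in output_dir.
        trainer.train(resume_from_checkpoint=True)

        # Option B: reload a previously saved LoRA adapter onto the base model
        # and keep training it ("my-mixtral-lora" is a placeholder path).
        from peft import PeftModel

        model = PeftModel.from_pretrained(
            base_model, "my-mixtral-lora", is_trainable=True
        )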

  • @shinygoomy2460
    @shinygoomy2460 10 months ago

    How do you format a prompt that has multiple requests and responses within the same context?

  • @AbhishekShivkumar-ti6ru
    @AbhishekShivkumar-ti6ru 11 months ago

    very nicely explained!

  • @Ai-Marshal
    @Ai-Marshal 10 months ago

    That's a great video. Thanks for sharing.
    After pushing the model to Hugging Face, how do you host it independently on RunPod using vLLM? When I try to do that, it gives me an error. I've tried searching a lot of videos and articles, but to no avail so far.

  • @HarmeetSingh-ry6fm
    @HarmeetSingh-ry6fm 11 months ago

    Great video. Just one question: can we use the fine-tuned model as a pickle file?

  • @VerdonTrigance
    @VerdonTrigance 10 months ago

    Hi, thanks for this step-by-step guide, but in case we want the LLM to learn something new about our domain (let's say the book Lord of the Rings) and we later want to ask our model open questions about this book (like "where does Frodo get his sword?"), what should we do? We definitely cannot prepare a dataset in the form of QnA, so it should be self-supervised training. But I never saw examples of doing this and I can't imagine how it is supposed to be done. Is it even possible? Looks like we should start from the base model, fine-tune it somehow with our book, and later apply instruction fine-tuning on top of it, right? But in that case someone still has to prepare this QnA? I'm frustrated.

    • @xXCookieXx98
      @xXCookieXx98 10 months ago

      Your use case sounds like a classic RAG one. It's not necessary to fine-tune for that. Although a fine-tuned model + RAG would probably create even better results, the effort here doesn't seem worth it. The video "Building Corrective RAG from scratch with open-source, local LLMs" from LangChain (ruclips.net/video/E2shqsYwxck/видео.html) might help you; it also includes a web search option in case the provided context isn't sufficient, which should work pretty well with things like popular books. So it's not limited to that and can be used in basically any domain. But you could also just build a RAG app without that. I would suggest a combination of a MultiQueryRetriever and a ParentDocumentRetriever for retrieving your context.
      Nevertheless, if you still want to fine-tune: from what I have learned so far, it is possible to create datasets using LLMs, e.g. you prompt an instruct LLM to create questions based on context chunks and then use those questions and chunks to create answers. You will find similar methods on this channel, e.g. "automate dataset creation for Llama-2 with GPT-4".
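
      A very rough sketch of that dataset-creation idea (the model name, prompts, and field names are placeholders; any capable instruct LLM and chunking strategy would do):

        from transformers import pipeline

        # Placeholder instruct model; a hosted API model works just as well.
        generator = pipeline(
            "text-generation",
            model="mistralai/Mistral-7B-Instruct-v0.2",
            max_new_tokens=256,
            return_full_text=False,
        )

        def make_qa_pair(chunk: str) -> dict:
            # 1) Ask the model for a question answerable from the chunk.
            question = generator(
                f"[INST] Write one question that can be answered using the "
                f"following text:\n\n{chunk} [/INST]"
            )[0]["generated_text"].strip()
            # 2) Ask it to answer that question using the chunk as context.
            answer = generator(
                f"[INST] Context:\n{chunk}\n\nQuestion: {question}\n\n"
                f"Answer using only the context. [/INST]"
            )[0]["generated_text"].strip()
            return {"prompt": question, "response": answer}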

  • @joaops4165
    @joaops4165 1 year ago +1

    Could you make a tutorial teaching how to convert a model to ggml format?

  • @rishabhkumar4443
    @rishabhkumar4443 1 year ago

    How can I use a generative model to manipulate the content of my website?
    E.g. showing a response from my site based on the prompt given by the user.

  • @Juan-n6k3c
    @Juan-n6k3c 9 months ago

    So with two 3090s this should work? And what about using multiple different GPUs for training? Like, I have one 3090 Ti 24GB and one 4060 8GB.

  • @garyhutson6270
    @garyhutson6270 11 months ago

    What were your VM instance specs? It is struggling with an A100?

  • @sysadmin9396
    @sysadmin9396 1 year ago

    Can I use this to train a model to answer questions from a list of PDFs?

  • @divyagarh
    @divyagarh 8 months ago

    Great video! Could you please consider training and deploying it in SageMaker?

    • @engineerprompt
      @engineerprompt  8 months ago

      I am going to create a video on deployment soon

  • @LakshayKumar-v2p
    @LakshayKumar-v2p 9 months ago

    Could you please share the requirements.txt? I am having version conflicts despite using an A100 GPU!

  • @lostInSocialMedia.
    @lostInSocialMedia. 1 year ago +2

    Can you fine-tune uncensored models of this with Gemini Pro AI?

    • @PotatoMagnet
      @PotatoMagnet 1 year ago +2

      The base model of Mistral is uncensored, but you can't fine-tune one model with another model. They have different architectures; you can't even merge or fine-tune between the same models of different parameter counts, like between 7B and 13B, so forget completely different models.

  • @researchforumonline
    @researchforumonline 11 months ago

    Thanks, what is the cost to do this? Server cost?

  • @abdeldjalilmouaz
    @abdeldjalilmouaz 9 months ago

    Does this require Colab Pro to work?

  • @DistortedV12
    @DistortedV12 1 year ago

    Awesome man, any idea how to get this running on a Colab GPU or get the inference cost down?

    • @engineerprompt
      @engineerprompt  1 year ago

      Probably no way at the moment to run it on the Colab GPU, but you can look at the 2-bit quantized version. If you are running this model as part of a production pipeline, I would suggest looking at API providers such as Together AI. They have really good pricing on it.

  • @DistortedV12
    @DistortedV12 1 year ago

    Are you fine-tuning the Mixtral Instruct version they just released or the base model?

  • @Zaheer-r4k
    @Zaheer-r4k 1 year ago +2

    So we can't run this in a Colab or Kaggle notebook?

    • @ilianos
      @ilianos 1 year ago +1

      In the video description it says no (not on a T4).

    • @luciolrv
      @luciolrv 1 year ago

      I could not run it on Colab's A100. It complains of lack of memory, though not by much: actually less than 1GB. Colab's "copilot" gives some suggestions such as reducing the batch size or the max_split_size_mb parameter, but that does not reduce it enough. Any ideas? Good notebook.

    • @jonjino
      @jonjino 1 year ago

      @@luciolrv It complains of less than 1GB of memory, but that's because it's loading the model a bit at a time, so the error message isn't accurate. Kaggle doesn't offer better GPUs either. You'll need to set up a VM with an A100 80GB or H100; unfortunately you'll probably just have to go through the hassle of setting one up via GCP or AWS.

  • @LeoAr37
    @LeoAr37 1 year ago

    Can't we train the quantized version on a smaller GPU instead of training the full model?

    • @engineerprompt
      @engineerprompt  1 year ago

      Even training the quantized version of the full model will need a powerful GPU. That's why LoRA is used to add extra layers that are trained instead of the actual model. Hope this helps.
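
      A minimal sketch of what attaching those adapters looks like with PEFT; the target module names follow Mixtral's linear layers, but the exact hyperparameter values here are assumptions:

        from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

        model = prepare_model_for_kbit_training(model)  # when the base is loaded in 4-bit

        lora_config = LoraConfig(
            r=16,                    # rank of the low-rank update matrices
            lora_alpha=32,
            lora_dropout=0.05,
            bias="none",
            task_type="CAUSAL_LM",
            # linear projections in Mixtral's attention and expert blocks
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "w1", "w2", "w3"],
        )

        model = get_peft_model(model, lora_config)
        model.print_trainable_parameters()  # typically well under 1% of all weights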

  • @electricskies1707
    @electricskies1707 1 year ago +1

    Can you clarify: 1 epoch would be one run over the full data (34333 steps of your trimmed data). Why would you run this for 2 epochs; does going over the data twice improve it?
    Also, how did you determine 32 was a good batch size for this data size? (This is about 0.9% of the data?)

    • @LeoAr37
      @LeoAr37 1 year ago +2

      I think the companies that trained big LLMs usually used 2-3 epochs

    • @engineerprompt
      @engineerprompt  1 year ago +2

      Batch size determines how much data is fed to your model at once. 32 is the max I could do on the available hardware; usually you will see it much lower. Regarding the epochs, you are right: in one epoch, the model will see each example once. If you have a small amount of data, you might want to go over it for multiple epochs so the model can actually learn from the data, but be careful that the model can also overfit.
      For large amounts of data (billions or trillions of tokens) it is very expensive and time-consuming to do several epochs over the data; that's why you mostly see models trained for only one or two epochs. Hope this helps.
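
      In code these knobs live in TrainingArguments; a sketch with illustrative values (gradient accumulation is one way to simulate a large batch on smaller GPUs):

        from transformers import TrainingArguments

        training_args = TrainingArguments(
            output_dir="mixtral-instruct-generation",  # placeholder directory
            per_device_train_batch_size=32,   # largest batch that fit the hardware here
            gradient_accumulation_steps=1,    # raise this instead of the batch size on smaller GPUs
            num_train_epochs=2,               # small datasets may need more than one pass; watch for overfitting
            learning_rate=2e-4,
            logging_steps=10,
        )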

  • @pallavggupta
    @pallavggupta 1 year ago

    Hi,
    I am trying to build an organisation-level AI trained on my company data.
    I would like to know how I can create a dataset from my data to be trained on Mistral AI.
    I was unable to find any tutorial on how to create a dataset for large data.

    • @conscious_yogi
      @conscious_yogi 1 year ago

      Did you find a solution for this?

    • @nishhaaann
      @nishhaaann 11 months ago

      Looking for the same thing @@conscious_yogi

  • @scortexfire
    @scortexfire 9 months ago

    How do I fine-tune without prompts and instructions? I basically want the model to "know" about a thousand very recent web articles.

    • @engineerprompt
      @engineerprompt  9 months ago

      In this case, you probably want to further pretrain the base model on your dataset (you don't need the prompt & instruction format) and then fine-tune it on an instruction dataset. Or just use RAG.
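
      A rough sketch of that "further pretraining" step: raw article text, no instruction template, plain causal-LM objective (article_texts and the model variable are placeholders):

        from datasets import Dataset
        from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                                  Trainer, TrainingArguments)

        article_texts = ["..."]  # placeholder: your scraped articles as plain strings
        raw = Dataset.from_dict({"text": article_texts})

        tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
        tokenizer.pad_token = tokenizer.eos_token

        def tokenize(batch):
            return tokenizer(batch["text"], truncation=True, max_length=1024)

        tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

        trainer = Trainer(
            model=model,  # the (LoRA-wrapped) base model from the earlier sketches
            args=TrainingArguments(output_dir="domain-pretrain",
                                   per_device_train_batch_size=4),
            train_dataset=tokenized,
            data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM
        )
        trainer.train()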

  • @AIEntusiast_
    @AIEntusiast_ 10 months ago

    I wish someone made a video going from collecting data (e.g. PDFs) to converting that into a working dataset that can be used to train a model. Everyone is using Hugging Face models and just retraining another LLM.

  • @MehdiMirzaeiAlavijeh
    @MehdiMirzaeiAlavijeh 11 months ago

    Please let me know how to create fixed forms with the structures below using a special command to the LLM:
    Give me a score out of 4 (based on the TOEFL rubric) without any explanation; just display the score.
    General Description:
    Topic Development:
    Language Use:
    Delivery:
    Overall Score:
    Identify the number of grammatical and vocabulary errors, providing a sentence-by-sentence breakdown.
    'Sentence 1:
    Errors:
    Grammar:
    Vocabulary:
    Recommend effective academic vocabulary and grammar:'
    'Sentence 2:
    Errors:
    Grammar:
    Vocabulary:
    Recommend effective academic vocabulary and grammar:'
    .......

  • @caiyu538
    @caiyu538 1 year ago

    Great

  • @tomski2671
    @tomski2671 1 year ago

    I think you can rent an H100 for $5/hour. So this would cost about $7

  • @kanshkansh6504
    @kanshkansh6504 10 months ago

    ❤👍🏼

  • @kunalr_ai
    @kunalr_ai 1 year ago +1

    Where will you get 64 GB of VRAM from?
    No idea which dataset it was fine-tuned on.
    Brother, this video is of no use.
    You got your money from the views;
    how will we make ours?

  • @bashafaris5908
    @bashafaris5908 1 year ago

    🥹‼️ I am a student with no budget at all, but I'm interested in training one of the LLMs with my own dataset.
    What are the cost-effective ways?

    • @jonjino
      @jonjino 1 year ago

      Get a 3B parameter model and play around with that. This can probably fit on the free T4 GPU in Google Colab since it's much smaller.

  • @user-jk9zr3sc5h
    @user-jk9zr3sc5h 1 year ago

    How much VRAM is necessary?

    • @engineerprompt
      @engineerprompt  1 year ago

      About 45GB

    • @user-jk9zr3sc5h
      @user-jk9zr3sc5h 1 year ago

      @@engineerprompt Do you suggest fine-tuning on the base model, and then further fine-tuning with Q&A instruct-format data?