🎯 Key Takeaways for quick navigation: 00:00 🚀 *Introduction to Fine-Tuning Mixtral 87B Model* - Overview of the video's purpose: fine-tuning Mixtral 87B model from Mistral AI on a custom dataset. - Mention of the popularity and potential of Mixtral 87B as a mixture of experts model. - Emphasis on practical considerations for fine-tuning, such as VRAM requirements and dataset details. 01:28 🛠️ *Installing Required Packages and Data Set Overview* - Installation of necessary packages: Transformers, TRL, accelerate, P torch bits, and bytes. - Discussion on using the Mosaic ML Instruct with 3 datasets for fine-tuning. - Overview of the dataset structure, splits, and sources. 03:45 📝 *Formatting Data for Fine-Tuning Mixtral 87B* - Explanation of the prompt template for fine-tuning, specific to Mixtral 87B Instruct version. - Discussion on rearranging data to make it more challenging by creating instructions from provided text. - Demonstration of a function to reformat the initial data into the desired prompt template. 06:28 🧩 *Loading Base Model and Configuring for Fine-Tuning* - Acknowledgment of the source for the notebook and clarification that the base version is used. - Setting configurations, loading the model, and tokenizer, along with using Flash attention. - Explanation of the importance of setting up configurations for a smooth fine-tuning process. 08:18 🔄 *Checking Base Model Responses Before Fine-Tuning* - Use of a function to check responses from the base model before any fine-tuning. - Illustration of the base model behavior in generating responses to a given prompt. - Recognition that the base model tends to follow next word prediction rather than explicit instructions. 10:06 📏 *Determining Max Sequence Length for Fine-Tuning* - Explanation of the importance of max sequence length in fine-tuning Mixtral 87B. - Presentation of a code snippet to analyze the distribution of sequence lengths in the dataset. - Emphasis on selecting a max sequence length that covers the majority of examples. 12:20 🧠 *Adding Adapters with Lura for Fine-Tuning* - Overview of the Mixtral 87B architecture, focusing on linear layers for adding adapters. - Introduction to Lura configuration for attaching adapters to specific layers. - Demonstration of setting hyperparameters and using the TRL package for supervised fine-tuning. 14:36 🚥 *Setting Up Trainer and Initiating Fine-Tuning* - Verification of multiple GPUs for parallelization during model training. - Definition of output directory and selection of training epochs or steps. - Importance of configuring the trainer, including considerations for max sequence length. 16:50 📈 *Analyzing Fine-Tuning Results and Storing Model* - Presentation of training and validation loss graphs, indicating a gradual decrease. - Acknowledgment of the need for potential longer training for better model performance. - Demonstration of storing the fine-tuned model weights locally and pushing to Hugging Face repository. 17:46 🔄 *Testing Fine-Tuned Model Responses* - Utilization of the fine-tuned model to generate responses to a given prompt. - Comparison of responses before and after fine-tuning, showcasing improved adherence to instructions. - Acknowledgment that further training could enhance the model's performance. Made with HARPA AI
3:37 format 4:15 follow a different format 4:26 Indicate the end of user input 4:33 special token Indicate the end of model response 4:39 you need to provide your data in this format 5:08 def create_prompt 5:31 System message 6:16 Load our based model
Thanks for the tag @Prompt Engineering! What else is your audience requesting the most these days? Would love to find ways to create some value for them together!
Thanks for the amazing work you guys are doing! really appreciate it. I think deployment is a topic that will be really valuable to my audience. Let's explore how to collaborate.
@@engineerprompt absolutely! We started delving deper into deployment with LangServe and vLLM events in recent weeks. We'll connect to figure out next steps!
I've noticed that Mixtral 8x7b-instruct ( and other mistral models ) constantly repeat part of the system prompt. Have you noticed this / found a fix for it?
Thanks for the video 😃 i just have a question , is it possible to use the model through an API and also provide the source files for the data with the response ?
Thanks for the guide! How to continue fine-tuning process such as in this case? Can you load previous work (Lora) and carry on, or do you need to restart?
That's a great video. Thanks for sharing. After pushing the model to hugging face, how to host it independently on runpod using VLLM ? When I try to do that, it gives me error. Tried searching a lot of videos and articles. But of no use so far.
Hi, thanks for this step by step guide, but in case we want LLM to learn something new about our domain (let's say it will be book Lord of the Rings) and we later want to ask our model open questions about this book (like 'where Frodo gets his sword?') what should we do? We definetely cannot prepare dataset in form of QnA, so it should self-supervised training. But I never saw examples of doing this and I can't image how it supposed to be done? Is it even possible? Looks like we should start from base model, fine-tune it somehow with our book, and later we should apply fine-tuning for instruct on top of it, right? But in this case someone still should prepare this QnA? I'm frustrated.
Your use case sounds like a classic RAG one. It's not necessary to fine-tune for that. Although a fine-tuned model + RAG would probably create even better results, the effort here doesn't seem worth it. The video Building Corrective RAG from scratch with open-source, local LLMs from langchain (ruclips.net/video/E2shqsYwxck/видео.html) might help you, it also incudes a web search option, in case the provided context isn't sufficient, which should work pretty good with things like popular books. So, it's not limited to that and can be used in basically any domain. But you could also just build a RAG app without that. I would suggest a combination of a MultiQueryRetriever and a ParentDocumentRetriever for retrieving your context. Nevertheless, if you still want to fine tune: From what I have learned so far it is possible to create datasets using LLMs: e.g. you prompt an instruct LLM to create questions based on context chunks and then use those questions and chunks to create answers. You will find similar methods on this channel e.g. "automate dataset creation for Llama-2 with GPT-4".
The base model ofmistral is uncensored, but you can't fine tune one model with another model. Both are of different architecture, you can't even merge or fine tune between same models of different parameters like between 7B and 13B either, so forget completely different models.
Probably no way at the moment to run it on the colab gpu but you can look at the 2bit quantized version. If you are running this model as part of production pipeline, I would suggest to look at api providers such as together AI. They have really good pricing on it
I could not run it in A100 of Colab. It complains of lack of memory, not too much: actually less than 1GB. The "copilot" of colab gives some suggestions such as reducing batch size or the max_split_size_mb parameter, but that does not reduce enough. Any ideas? Good notebook
@@luciolrv It complains of less than 1GB of memory, but that's because it's loading the model a bit at a time so the error message isn't accurate. Kaggle doesn't offer better GPU's either. You'll need to setup a VM with an A100 80GB or H100. Unfortunately you'll probably just have to go through the hassle of setting up a VM with one of those GPU's via GCP or AWS.
Even training the quantized version of the full model will need a powerful GPU. That's why LoRa is used to add extra layers that are trained instead of the actual model. Hope this helps
Can you clairfy, 1 epoch would be one run of the full data (34333 steps of your trimmed data) Why would you run this 2 epochs, does going over the data twice improve it? Also how did you determine 32 was a good batch size for this data size? (this is about 0.9% of the data?)
Batch size determines how much data is fed to your model at once. 32 is the max I could do on the available hardware. Usually you will see that to be much lower. In regards to the epochs, you are right. In one epoch, the model will see each example once. If you have small amount of data, you might want to go over multiple epoch so the model can actually learn from the data but you need to be careful that the model can also overfit. For large amount of data (billions or trillions of tokens) its very expensive and time consuming to have several epochs over the data, that's why you mostly see models trained for one more two epochs only. Hope this helps.
Hi, I am trying to build an organisation level AI trained on my company data I would to know how can I create dataset for my data to be trained on mistral AI I was unable to find any tutorial on how to create a dataset for large data
In this case, you probably want to further pretrain the base model with your dataset (you don't need prompt & instructions format) and then finetune it on a dataset. Or just use RAG.
i wish someone made a video from collecting data example pdf, conver that to working dataset tha can be used to train model, everyone is using huggingface models and just retrain another llm
please let me know how to create a fixed forms with the below structures with special command to LLM: Give me score out of 4 for (based on the TOEFL rubric) without any explanation, just display the score. General Description: Topic Development: Language Use: Delivery: Overall Score: Identify the number of grammatical and vocabulary errors, providing a sentence-by-sentence breakdown. 'Sentence 1: Errors: Grammar: Vocabulary: Recommend effective academic vocabulary and grammar:' 'Sentence 2: Errors: Grammar: Vocabulary: Recommend effective academic vocabulary and grammar:' .......
64 gb vram kaha se laaoge pata nahi kaunse dataset par fine tune kiya hai bhai kisi kaam ka nahi hai ye video tere paise to view se aa gaye humare paise kaise banege
🎯 Key Takeaways for quick navigation:
00:00 🚀 *Introduction to Fine-Tuning Mixtral 87B Model*
- Overview of the video's purpose: fine-tuning Mixtral 87B model from Mistral AI on a custom dataset.
- Mention of the popularity and potential of Mixtral 87B as a mixture of experts model.
- Emphasis on practical considerations for fine-tuning, such as VRAM requirements and dataset details.
01:28 🛠️ *Installing Required Packages and Data Set Overview*
- Installation of necessary packages: Transformers, TRL, accelerate, P torch bits, and bytes.
- Discussion on using the Mosaic ML Instruct with 3 datasets for fine-tuning.
- Overview of the dataset structure, splits, and sources.
03:45 📝 *Formatting Data for Fine-Tuning Mixtral 87B*
- Explanation of the prompt template for fine-tuning, specific to Mixtral 87B Instruct version.
- Discussion on rearranging data to make it more challenging by creating instructions from provided text.
- Demonstration of a function to reformat the initial data into the desired prompt template.
06:28 🧩 *Loading Base Model and Configuring for Fine-Tuning*
- Acknowledgment of the source for the notebook and clarification that the base version is used.
- Setting configurations, loading the model, and tokenizer, along with using Flash attention.
- Explanation of the importance of setting up configurations for a smooth fine-tuning process.
08:18 🔄 *Checking Base Model Responses Before Fine-Tuning*
- Use of a function to check responses from the base model before any fine-tuning.
- Illustration of the base model behavior in generating responses to a given prompt.
- Recognition that the base model tends to follow next word prediction rather than explicit instructions.
10:06 📏 *Determining Max Sequence Length for Fine-Tuning*
- Explanation of the importance of max sequence length in fine-tuning Mixtral 87B.
- Presentation of a code snippet to analyze the distribution of sequence lengths in the dataset.
- Emphasis on selecting a max sequence length that covers the majority of examples.
12:20 🧠 *Adding Adapters with Lura for Fine-Tuning*
- Overview of the Mixtral 87B architecture, focusing on linear layers for adding adapters.
- Introduction to Lura configuration for attaching adapters to specific layers.
- Demonstration of setting hyperparameters and using the TRL package for supervised fine-tuning.
14:36 🚥 *Setting Up Trainer and Initiating Fine-Tuning*
- Verification of multiple GPUs for parallelization during model training.
- Definition of output directory and selection of training epochs or steps.
- Importance of configuring the trainer, including considerations for max sequence length.
16:50 📈 *Analyzing Fine-Tuning Results and Storing Model*
- Presentation of training and validation loss graphs, indicating a gradual decrease.
- Acknowledgment of the need for potential longer training for better model performance.
- Demonstration of storing the fine-tuned model weights locally and pushing to Hugging Face repository.
17:46 🔄 *Testing Fine-Tuned Model Responses*
- Utilization of the fine-tuned model to generate responses to a given prompt.
- Comparison of responses before and after fine-tuning, showcasing improved adherence to instructions.
- Acknowledgment that further training could enhance the model's performance.
Made with HARPA AI
3:37 format
4:15 follow a different format
4:26
Indicate the end of user input
4:33
special token Indicate the end of model response
4:39
you need to provide your data in this format
5:08
def create_prompt
5:31
System message
6:16
Load our based model
Thank you for sharing, this is very helpful! Looking forward to the next videos!
why are you using packing in the SFTTrainer if you just said that you're going to pad the examples?
Can you explain ?
kudos, really simple and direct explaination.
Thanks for the tag @Prompt Engineering! What else is your audience requesting the most these days? Would love to find ways to create some value for them together!
Thanks for the amazing work you guys are doing! really appreciate it. I think deployment is a topic that will be really valuable to my audience. Let's explore how to collaborate.
@@engineerprompt absolutely! We started delving deper into deployment with LangServe and vLLM events in recent weeks. We'll connect to figure out next steps!
at 5:58, Why is the sample["response"] given as the input and sample["prompt"] is given as response
Awesome content as usual! Thanks!
Thank you 😊
I've noticed that Mixtral 8x7b-instruct ( and other mistral models ) constantly repeat part of the system prompt. Have you noticed this / found a fix for it?
Thanks for the video 😃 i just have a question , is it possible to use the model through an API and also provide the source files for the data with the response ?
can you also make a video on fine-tuning multimodal models like llava, cog-vlm
Can you make this for home computer use in terms of my personal data and tech it to use tools on your system and online
IM sceptical this actually is effectively training mixtral MoE model and not making it worse!
Thanks for the guide!
How to continue fine-tuning process such as in this case?
Can you load previous work (Lora) and carry on, or do you need to restart?
I think you can do that by storing different check points
how do you format a prompt that has multiple requests and responses within the same context???????
very nicely explained!
That's a great video. Thanks for sharing.
After pushing the model to hugging face, how to host it independently on runpod using VLLM ? When I try to do that, it gives me error. Tried searching a lot of videos and articles. But of no use so far.
did you come right?
Great video just have one question can we use the fine-tuned model as a pickle file?
Hi, thanks for this step by step guide, but in case we want LLM to learn something new about our domain (let's say it will be book Lord of the Rings) and we later want to ask our model open questions about this book (like 'where Frodo gets his sword?') what should we do? We definetely cannot prepare dataset in form of QnA, so it should self-supervised training. But I never saw examples of doing this and I can't image how it supposed to be done? Is it even possible? Looks like we should start from base model, fine-tune it somehow with our book, and later we should apply fine-tuning for instruct on top of it, right? But in this case someone still should prepare this QnA? I'm frustrated.
Your use case sounds like a classic RAG one. It's not necessary to fine-tune for that. Although a fine-tuned model + RAG would probably create even better results, the effort here doesn't seem worth it. The video Building Corrective RAG from scratch with open-source, local LLMs from langchain (ruclips.net/video/E2shqsYwxck/видео.html) might help you, it also incudes a web search option, in case the provided context isn't sufficient, which should work pretty good with things like popular books. So, it's not limited to that and can be used in basically any domain. But you could also just build a RAG app without that. I would suggest a combination of a MultiQueryRetriever and a ParentDocumentRetriever for retrieving your context.
Nevertheless, if you still want to fine tune: From what I have learned so far it is possible to create datasets using LLMs: e.g. you prompt an instruct LLM to create questions based on context chunks and then use those questions and chunks to create answers. You will find similar methods on this channel e.g. "automate dataset creation for Llama-2 with GPT-4".
Could you make a tutorial teaching how to convert a model to ggml format?
How can I use a generative model to manipulate content of my website
Ex. Showing response from my site based on prompt given by the user
So with two 3090s this should work? And what about using multiple different gpus for training? Like I have one 3090ti 24g and one 4060 8g
What were your VM instance specs. It is struggling with an A100?
Can I use this to train a model to answer questions from a list of pdfs?
Great video! Could you please consider training and deploying it in Sagemaker?
I am going to create a video on deployment soon
Could you please share the requirement.txt, i am having version conflicts despite using A100 GPU!
can you finetune Uncensored Models of this with gemini pro ai ?
The base model ofmistral is uncensored, but you can't fine tune one model with another model. Both are of different architecture, you can't even merge or fine tune between same models of different parameters like between 7B and 13B either, so forget completely different models.
Thanks, what is the cost to do this? Server cost?
requires colab pro to work?
Awesome man, any idea of how to get this running on a colab gpu or inference cost down?
Probably no way at the moment to run it on the colab gpu but you can look at the 2bit quantized version. If you are running this model as part of production pipeline, I would suggest to look at api providers such as together AI. They have really good pricing on it
Are you finetuning the mixtral instruct version they just released or base model??
In this video, just the base version
So can't we run in colab or kaggle notebook?
in the video descr it says no (not on T4)
I could not run it in A100 of Colab. It complains of lack of memory, not too much: actually less than 1GB. The "copilot" of colab gives some suggestions such as reducing batch size or the max_split_size_mb parameter, but that does not reduce enough. Any ideas? Good notebook
@@luciolrv It complains of less than 1GB of memory, but that's because it's loading the model a bit at a time so the error message isn't accurate. Kaggle doesn't offer better GPU's either. You'll need to setup a VM with an A100 80GB or H100. Unfortunately you'll probably just have to go through the hassle of setting up a VM with one of those GPU's via GCP or AWS.
Can't we train the quantized version in a smaller GPU instead of training the full model?
Even training the quantized version of the full model will need a powerful GPU. That's why LoRa is used to add extra layers that are trained instead of the actual model. Hope this helps
Can you clairfy, 1 epoch would be one run of the full data (34333 steps of your trimmed data) Why would you run this 2 epochs, does going over the data twice improve it?
Also how did you determine 32 was a good batch size for this data size? (this is about 0.9% of the data?)
I think the companies that trained big LLMs usually used 2-3 epochs
Batch size determines how much data is fed to your model at once. 32 is the max I could do on the available hardware. Usually you will see that to be much lower. In regards to the epochs, you are right. In one epoch, the model will see each example once. If you have small amount of data, you might want to go over multiple epoch so the model can actually learn from the data but you need to be careful that the model can also overfit.
For large amount of data (billions or trillions of tokens) its very expensive and time consuming to have several epochs over the data, that's why you mostly see models trained for one more two epochs only. Hope this helps.
Hi,
I am trying to build an organisation level AI trained on my company data
I would to know how can I create dataset for my data to be trained on mistral AI
I was unable to find any tutorial on how to create a dataset for large data
Did you found solution for this?
Looking for same thing@@conscious_yogi
How do I fine tune without prompt and instruction? I basically want the model to "know" about a thousand very recent web articles.
In this case, you probably want to further pretrain the base model with your dataset (you don't need prompt & instructions format) and then finetune it on a dataset. Or just use RAG.
i wish someone made a video from collecting data example pdf, conver that to working dataset tha can be used to train model, everyone is using huggingface models and just retrain another llm
please let me know how to create a fixed forms with the below structures with special command to LLM:
Give me score out of 4 for (based on the TOEFL rubric) without any explanation, just display the score.
General Description:
Topic Development:
Language Use:
Delivery:
Overall Score:
Identify the number of grammatical and vocabulary errors, providing a sentence-by-sentence breakdown.
'Sentence 1:
Errors:
Grammar:
Vocabulary:
Recommend effective academic vocabulary and grammar:'
'Sentence 2:
Errors:
Grammar:
Vocabulary:
Recommend effective academic vocabulary and grammar:'
.......
Great
I think you can rent an H100 for $5/hour. So this would cost about $7
where?
❤👍🏼
64 gb vram kaha se laaoge
pata nahi kaunse dataset par fine tune kiya hai
bhai kisi kaam ka nahi hai ye video
tere paise to view se aa gaye
humare paise kaise banege
🥹‼️I am student.. who has no budget at all..but intrested in training any of the llm with my own dataset
What are the cost effective ways?
Get a 3B parameter model and play around with that. This can probably fit on the free T4 GPU in Google Colab since it's much smaller.
How much VRAM is necessary?
About 45GB
@@engineerprompt Do you suggest fine tuning on base model, and then further fine tuning with Q&A instruct format data?