Try out Gradient here: gradient.1stcollab.com/engineerprompt
Check out the Unsloth framework. Not only does it support quantization, LoRA, and PEFT, it's also optimized for NVIDIA (built by an ex-NVIDIA engineer). The open-source version trains 2x faster; the paid one can run on more than one GPU, but the open-source version is more than enough for playing around.
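For anyone who wants to try it, here is a minimal sketch of loading Mistral 7B with Unsloth's FastLanguageModel and attaching LoRA adapters; the hyperparameter values below are illustrative, not from the video.

```python
# Minimal Unsloth sketch (values are illustrative; see Unsloth's docs).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Mistral-7B-v0.1",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization to fit consumer GPUs
)

# Attach LoRA adapters (PEFT-style) so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```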
Can you show the before and after performance of the LLM? In other words, for a given prompt, show what the response was before fine-tuning and then show the response to the same prompt after fine-tuning? I need to see if it makes a difference.
It's already in the video, right? At first the base model isn't generating the instruction correctly; it gives something else.
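For anyone wanting to run this check themselves, here is a sketch; `test_prompt`, `base_model`, and `finetuned_model` are placeholders for your own prompt and the two model objects.

```python
# Sketch of a before/after comparison: run the same prompt through the base
# and fine-tuned models and compare the outputs by eye.
def generate_response(prompt, model, tokenizer, max_new_tokens=200):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Before:", generate_response(test_prompt, base_model, tokenizer))
print("After: ", generate_response(test_prompt, finetuned_model, tokenizer))
```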
One of the best videos I've seen on fine-tuning. Just love it. Thank you
🎯 Key Takeaways for quick navigation:
00:00 🚀 *Overview of Fine-Tuning Mistral 7B on Custom Data*
- Introduction to fine-tuning large language models on task-specific datasets.
- Mention of Mistral 7B as a suitable option for fine-tuning.
- Overview of the video, including data formatting and an alternative for fine-tuning without powerful GPUs.
01:33 📦 *Understanding Instruct V3 Dataset Structure*
- Explanation of the structure of the Instruct V3 dataset.
- Details on the columns in the dataset: prompt, model response, and source.
- Identification of training and test data splits and composition of the dataset.
03:12 🧹 *Filtering and Selecting Data for Fine-Tuning*
- Filtering the dataset to focus on the Dolly Harmful and Harmless data.
- Insight into the Lambda function for dataset filtering.
- Selection of a subset of examples for both training and test sets.
04:57 🔄 *Formatting Data for Training*
- Explanation of the desired prompt template for Mistral 7B fine-tuning.
- Introduction of a Python function to transform the dataset according to the prompt template.
- Application of the prompt template to create a structured dataset.
09:57 🛠️ *Applying Prompt Template to Dataset*
- Application of the prompt template using the create_prompt function (see the sketch after this summary).
- Demonstration of how the function transforms the system prompt in a training example.
- Discussion on applying the prompt template to the entire dataset using the Python map function.
11:32 ⚙️ *Introduction to Gradient for Large Language Model Fine-Tuning*
- Introduction to Gradient as a platform for fine-tuning and serving large language models.
- Highlights of Gradient's features, including API serving and infrastructure management.
- Encouragement to explore Gradient for custom fine-tuning needs.
13:47 🔍 *Loading Model for Fine-Tuning with Low Precision*
- Loading the Mistral 7B model for fine-tuning.
- Explanation of using 4-bit precision for model loading to reduce VRAM usage.
- Introduction of the tokenizer for Mistral 7B.
14:57 🤖 *Examining Base Model Response to New Prompt Template*
- Testing Mistral 7B's response to the new prompt template.
- Observation of model behavior and its limitations with the provided instruction.
- Setting the stage for fine-tuning to improve model performance.
19:42 🚂 *Supervised Fine-Tuning Process with Low-Rank Adaptation (LoRA)*
- Explanation of the low-rank adaptation (LoRA) concept for reducing trainable weights.
- Configuration of LoRA parameters and application to the Mistral 7B model.
- Detailed discussion on hyperparameters for the fine-tuning process.
20:52 🧪 *Testing Fine-Tuned Model*
- Creation of a function to test the fine-tuned Mistral 7B model.
- Example input and model response demonstrating successful fine-tuning.
- Considerations for pushing the fine-tuned model to the Hugging Face Hub for future use.
21:47 🎓 *Recap and Future Content*
- Recap of the key steps in fine-tuning Mistral 7B on a custom dataset.
- Encouragement for feedback and suggestions for future content.
- Mention of related videos and resources for further exploration.
Made with HARPA AI
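For the 04:57 and 09:57 steps above, here is a sketch of what a create_prompt function like the one in the video can look like; the exact template text and the `"prompt"`/`"response"` column names are assumptions based on the summary, not copied from the notebook.

```python
# Hypothetical sketch of the prompt-formatting step: given the dataset's
# "response" column as input, train the model to produce the "prompt" column.
def create_prompt(sample):
    system_message = (
        "Use the provided input to create an instruction that could have "
        "been used to generate the response with an LLM."
    )
    full_prompt = (
        "<s>### Instruction:\n" + system_message +
        "\n\n### Input:\n" + sample["response"] +
        "\n\n### Response:\n" + sample["prompt"] + "</s>"
    )
    return full_prompt

# Applied to the whole dataset (note: .map() needs a dict back; see the
# question about this further down the thread):
# instruct_tune_dataset = instruct_tune_dataset.map(
#     lambda s: {"text": create_prompt(s)})
```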
Key points:
1:45 Dataset structure
2:52 Download the dataset
5:48 Combine the prompt and model response
6:39 Create prompt function
Best video on fine-tuning. Thank you so much.
Hi, nice video. I have a question for you: why are you not using the same prompt template as Mistral Instruct, in order to not confuse the model?
Awesome content! Learned a lot!
Awesome video! Thanks for sharing!
TL;DR warning: This post is about this video's value as an instructional tool for job-seeking self-learners. I hope others find use in my comments.
At the end of your video you say that if we want you to make content like this and go into a lot more detail on the training process, you would love to do that. I would love it too! Here's why:
Trying to keep up with AI as a self-study learner whose goal is to land a meaningful, well-paying job in the field is next to impossible without a well-defined course of action, content, and constraints, much like training a model. I am a paid subscriber/member of your YT channel and have watched many of your videos, all well done, but this one stood out for me as the perfect template for accelerated learning in today's fast-moving, ever-changing environment.
I am an instructional designer by formal training, specializing in accessible and accelerated learning, and I think this video's content (the amount and type of information), the visible instruction (what and how you choose to say, do, and show on screen), and the supplemental material (the notes and links) are exceptional, highly effective, and efficient. It was this video that compelled me to buy you a coffee each month. You're going to need it.
My roadmap, or learning model, to a career (and lifelong learning) in AI, specifically learning and development and affective computing, includes you, Radu Mariescu-Istodor (@Radu) at freeCodeCamp (@freecodecamp), and Elvis Saravia (@elvissaravia) as primary components. It's a model that I will continue to fine-tune. I encourage others on any self-study path, but especially those with a job as the outcome, to get a roadmap, and following you is a good place to start. I share all of this with you in an effort to offer you feedback that is a little more informative and helpful, and to thank you for your hard work.
Thank you for your support and this comment. Really appreciate it. Self-learning and tinkering are really rewarding in the long run. Love the feedback; my goal is to keep a balance between beginner-friendly and practical content. I have so much in store for next year.
Great video!
@engineerprompt What's the cost to train using the small sample from the video?
Thank you for the helpful content. I have a question: If we want to fine-tune for a chat model instead of instruct, should we change the training prompt to answer the question rather than generate the instruction?
If you are training a chat model, you just need user and response pairs. You don't need to provide an instruction in that case. You also want to add multi-turn chat examples.
@@engineerprompt Thanks for the response. In the multi-turn scenario, the training input contains an odd number of messages ending with a user message, and the target response is the last assistant message. Am I correct?
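That matches common practice. For illustration, here is a hedged sketch of formatting one multi-turn example with transformers' apply_chat_template; the messages themselves are made up.

```python
# Hypothetical multi-turn training example: the full conversation history is
# the context, and the last assistant message is the target response.
messages = [
    {"role": "user", "content": "What does LoRA do?"},
    {"role": "assistant", "content": "It trains small low-rank adapter matrices."},
    {"role": "user", "content": "How large are the adapters?"},
    {"role": "assistant", "content": "Usually only a few megabytes."},  # target
]

# tokenizer here is a chat-capable tokenizer (transformers >= 4.34).
text = tokenizer.apply_chat_template(messages, tokenize=False)
```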
Yes, please go into a lot more detail on training/fine-tuning.
Why have you chosen to generate the question alone as opposed to answering it? If I fine-tune to generate answers, what would I be missing?
fantastic video! thank you!!!
I have a question about the tokenizer used in the tutorial. Why is "mistralai/Mistral-7B-v0.1" used instead of "mistralai/Mistral-7B-Instruct-v0.1"? By the way, the model itself uses "mistralai/Mistral-7B-Instruct-v0.1". Thanks.
Best explanation
Can you please explain why your structure is better than the default one, or any other structure, for that matter? Thanks.
Very useful. You might want to train it for at least a full epoch; otherwise you will not pass through all the training data.
Yes, I totally agree with you. This was just for demonstration purposes.
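For reference, a sketch of the relevant TrainingArguments change; the other values are illustrative, not from the notebook.

```python
from transformers import TrainingArguments

# Train for a full epoch instead of a fixed number of steps, so every
# training example is seen at least once.
args = TrainingArguments(
    output_dir="./mistral-7b-finetuned",
    num_train_epochs=1,             # full pass over the training data
    # max_steps=100,                # quick-demo alternative that stops early
    per_device_train_batch_size=4,
)
```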
Can you go through how to fine-tune Mistral on a dataset for classification problems? Thanks.
Awesome video! Thanks for sharing!
May I ask if there will be any more videos related to using the Mistral model for RAG in the future?
Yes, I am planning to make a whole series on RAG. That's one of the most important and practical use cases of LLMs today.
Excellent video. What I am missing is: after fine-tuning, how can you upload Mistral plus the fine-tuned files to your personal folder on Hugging Face? Also, how can you interact with the model plus the fine-tuned data using HF text inference?
Hi, regarding the video, may I ask one question?
When I executed `instruct_tune_dataset = instruct_tune_dataset.map(create_prompt)`, I got this error: TypeError: Provided `function` which is applied to all elements of table returns a variable of type `<class 'str'>`. Make sure provided `function` returns a variable of type `dict` (or a pyarrow table) to update the dataset or `None` if you are only interested in side effects.
So I adjusted create_prompt() to return `{'prompt': full_prompt}`, re-ran the cell `instruct_tune_dataset = instruct_tune_dataset.map(create_prompt)`, and it worked.
But why is there no problem setting `formatting_func=create_prompt` for SFTTrainer when create_prompt() only does `return full_prompt`? The training process succeeded.
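The short answer (a hedged explanation, not from the video): `datasets.Dataset.map()` uses the return value to update columns, so it must be a dict; `SFTTrainer`'s `formatting_func` is only asked for the text to tokenize, so a plain string (or a list of strings, depending on the trl version) is fine. A sketch:

```python
def create_prompt(sample):
    # Illustrative template; the real one is in the notebook.
    return f"<s>[INST] {sample['response']} [/INST] {sample['prompt']}</s>"

# Dataset.map() writes columns, so it needs a dict of column -> value:
instruct_tune_dataset = instruct_tune_dataset.map(
    lambda sample: {"prompt": create_prompt(sample)}
)

# SFTTrainer only needs the final text per example, so passing
# formatting_func=create_prompt directly works without the dict wrapper.
```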
Any reason why you did not fine-tune the base Mistral (not Instruct) model?
The base model will take a lot more data and compute to fine-tune before it starts giving sensible answers. But in practice, if you have good-quality data, it's already better to fine-tune the base version with your own prompt template.
Please create a video on fine-tuning an MoE LLM using LoRA adapters, such as the Mixtral 8x7B MoE LLM.
Yes, waiting for it. Do create a fine-tuning video, and if you find any source, please share the link.
I have one question... if I'm running the code in Google Colab, where is the model saved? I mean, if I run it on Gradient, I think it saves there and then I can use it via an API. But what about when I'm using Google Colab only? How can I see the final model?
You can push the model to Hugging Face (there is a section in the notebook), or you can download the model checkpoints from Colab and run them on your local machine.
@@engineerprompt And can I run it if I save it in a local folder?
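Yes. A sketch of both options; `your-username/...` is a placeholder for your own Hub repo, and for a LoRA run, save_pretrained stores the adapter weights rather than the full model.

```python
# Option 1: push the fine-tuned model and tokenizer to the Hugging Face Hub.
model.push_to_hub("your-username/mistral-7b-finetuned")
tokenizer.push_to_hub("your-username/mistral-7b-finetuned")

# Option 2: save to a local folder and reload it later on your own machine.
model.save_pretrained("./mistral-7b-finetuned")
tokenizer.save_pretrained("./mistral-7b-finetuned")
# model = AutoModelForCausalLM.from_pretrained("./mistral-7b-finetuned")
```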
Which Mixtral 8x7B model will run well on 16GB RAM and an NVIDIA 3050?
You want to look at the quantized version (huggingface.co/TheBloke/Mixtral-SlimOrca-8x7B-GPTQ). It seems like even for the 3-bit version you need around 18GB to load the model, so it might be a stretch in your case.
When I run `merged_model = model.merge_and_unload()`, I get this error: 'MistralForCausalLM' object has no attribute 'merge_and_unload'. Can you help me with this error?
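That error usually means `model` is the plain transformers model rather than the PEFT-wrapped one; `merge_and_unload()` lives on `PeftModel`. A hedged sketch ("path/to/adapter" is a placeholder for your trained adapter directory):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model first, then wrap it with the trained LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
model = PeftModel.from_pretrained(base_model, "path/to/adapter")

# merge_and_unload() folds the adapter weights into the base model and
# returns a plain transformers model you can save or serve.
merged_model = model.merge_and_unload()
```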
Hi. Can you add subtitles (your script) so the video is easier to understand?
Sure, let me see what I can do here.
@@engineerprompt Thank you very much!
I have a company FAQ dataset. Will this train the model to answer based on it?
For FAQ, I would recommend using a RAG pipeline. Look into my localgpt project.
@@engineerprompt Thought so!
Is there any way to use RAG but have the model not phrase the answer exactly the way it's written in the source?
I tried to run this on RunPod but ran into errors. Has anyone gotten this to run on RunPod?
Nice!
Why does no one provide anything that can run locally, e.g., with an RTX 3090?
Because most of the people in this space are new to IT as a whole and have no real-world experience with software engineering, so they depend on everyone else doing everything for them instead.
Linux only?
You can do it on Windows as well if you have a GPU.
@@engineerprompt Understood. I meant in your demos?
How do you slice the data (4:30)? I want to do that.
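A sketch of the filtering/slicing step from around 3:12-4:30; the dataset name and source value are taken from the video's description of the Instruct V3 / Dolly HH-RLHF data, and the subset sizes are illustrative.

```python
from datasets import load_dataset

# Load the instruct dataset, keep only the Dolly HH-RLHF source, and take
# small train/test slices for a quick fine-tuning run.
instruct_tune_dataset = load_dataset("mosaicml/instruct-v3")
instruct_tune_dataset = instruct_tune_dataset.filter(
    lambda x: x["source"] == "dolly_hhrlhf"
)
train_subset = instruct_tune_dataset["train"].select(range(3000))
test_subset = instruct_tune_dataset["test"].select(range(200))
```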
21:17
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    device_map="auto",
    quantization_config=nf4_config,
    use_cache=False,
)
This part is not running in Colab.
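A common cause is that `nf4_config` isn't defined before this cell runs (or that bitsandbytes isn't installed). A sketch of a complete cell, assuming the notebook's NF4 quantization setup:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization config; must be defined before loading the model.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    device_map="auto",
    quantization_config=nf4_config,
    use_cache=False,
)
# If imports fail: !pip install -U transformers accelerate bitsandbytes
```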