Been using Mistral on Ollama and it is pretty amazing the way it compares to larger models like Gemini and Llama2
Super great work on the video and tutorial! Super insightful and just subscribed :) Thanks for sharing Unsloth as well! :)
Thanks, Daniel! Huge fan - appreciate what you guys are doing!
Awesome! Subscribed ❤
Great video. Could you fine-tune the model on an image dataset, like visual question answering?
It looks like some people have tried this. I suspect it will be quite challenging (since most people aren't doing this sort of fine tuning), but possible. I can't confirm either way though, since we haven't tried it. We'll put this on the video ideas list though, and may be able to make a video on it in the future.
@nodematic Thanks
Outstanding! Will this approach work for data extraction? Let's say I want all the titles of a book? Thanks!
Thanks! Yes, your fine-tuned model could definitely be focused on data extraction, like book titles.
Pretty cool video. I had two questions: 1) What is the difference between loading the model in 4-bit quantization but doing tuning in 16-bit? Previously, I loaded a model in bfloat16 and didn't have to specify anything when doing tuning - maybe I am misunderstanding. 2) Do you have any video or code recommendations for where I can see fine-tuning without using LoRA? I feel semi-committed to trying this first before going the LoRA route. Ty for the great video (and music hehe)
1) You can load the Mistral "base model" weights via 4-bit quantization, while using 16 bits for the fine-tuning adapter layers. So, for your fine-tuned model, most of the layers will be quantized, but with a few non-quantized adapter layers "on top". This is often a sweet spot between computational demands and model accuracy (generally a tradeoff) - see the sketch after this reply.
2) We don't currently have non-adapter fine-tuning videos (e.g., full-parameter fine-tuning), but will try to create a video on this topic.
Thanks for watching and good luck.
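To make point 1 concrete, here's a minimal sketch of the 4-bit base + 16-bit LoRA setup from the notebook (parameter values are illustrative, not prescriptive):

```python
from unsloth import FastLanguageModel

# Load the Mistral base weights quantized to 4-bit (saves GPU memory)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapter layers "on top" - these are the 16-bit weights
# that actually get trained during fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```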
Does this fine-tuning method work with a question and answer dataset? Do you need an instruction too? If so, what does the JSON format need to be?
Yes, it works with question and answer data. The format is flexible - the approach chosen in the video is a somewhat standard format that we've seen works well with Mistral.
For your dataset, change "### Story" to "### Question" and change "### Summary" to "### Answer". Then you could try leaving the instruction blank or doing something generic like "Write a response that appropriately answers the question, while being as clear, factual, and helpful as possible.". I suspect that will work well for you.
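Roughly, the dataset formatting could look like this (a sketch; the "question"/"answer" column names are assumptions about your JSON, and `tokenizer`/`dataset` come from the earlier notebook cells):

```python
# Hypothetical prompt template - adjust field names to match your dataset
prompt_template = """{instruction}

### Question
{question}

### Answer
{answer}"""

def format_example(example):
    text = prompt_template.format(
        instruction=("Write a response that appropriately answers the question, "
                     "while being as clear, factual, and helpful as possible."),
        question=example["question"],
        answer=example["answer"],
    )
    # Append the end-of-sequence token so the model learns where to stop
    return {"text": text + tokenizer.eos_token}

dataset = dataset.map(format_example)
```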
Great video. How do I save a model using a saving strategy instead of just getting the last one? Is there an argument to add or something? I don't want to use the one at the end of training.
The trainer automatically saves checkpoints to the output_dir directory, so to use a checkpoint (e.g., with `from_pretrained`), point to the checkpoint's folder within that output_dir. For example, `from_pretrained` might use "outputs/checkpoint-200".
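For example, controlling checkpoint frequency and loading a specific checkpoint might look like this (a sketch; the step numbers are placeholders and the checkpoint folder is assumed to contain the saved adapter):

```python
from transformers import TrainingArguments
from unsloth import FastLanguageModel

# Control how often checkpoints are written (values are examples)
args = TrainingArguments(
    output_dir="outputs",
    save_strategy="steps",
    save_steps=50,  # write a checkpoint every 50 steps
    # ...the rest of the arguments from the notebook...
)

# Later, load a specific checkpoint rather than the final weights
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/checkpoint-200",  # folder the trainer wrote
    max_seq_length=2048,
    load_in_4bit=True,
)
```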
@nodematic Thank you so much, that was very helpful. Is the number of epochs controlled by max_steps? What should I set max_steps to for a dataset of 10k examples with about 1500 tokens in each? Thank you so much, this is the best practical tutorial for fine-tuning I found on YouTube.
The number of epochs is total_steps / (dataset size / effective batch size). Effective batch size is per_device_train_batch_size * gradient_accumulation_steps * number_of_devices, so 8 if you haven't changed these values in the notebook.
For example, 500 steps, 10k examples, and an effective batch size of 8 would result in a little less than half an epoch (0.4 epochs) through the dataset. For such a large fine-tuning dataset, maybe try 3000-5000 steps, and figure out where the steps have significantly diminishing returns on the loss (you can stop the training cell when you see this, to save money). Or you could let it run through and plot the loss to help find that "sweet spot", then use the checkpoint at that training step.
Also, feel free to specify the training in terms of `num_train_epochs` rather than `max_steps`, if that makes more sense for you.
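Here's that math as a quick snippet (per_device_train_batch_size=2 and gradient_accumulation_steps=4 are assumed notebook defaults - swap in your own values):

```python
# Sanity-check the epoch math from above
dataset_size = 10_000
per_device_train_batch_size = 2   # assumed notebook default
gradient_accumulation_steps = 4   # assumed notebook default
num_devices = 1
max_steps = 500

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)                 # 8
steps_per_epoch = dataset_size / effective_batch_size  # 1250
epochs = max_steps / steps_per_epoch                    # 0.4 epochs
print(effective_batch_size, steps_per_epoch, epochs)
```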
Can you please share the colab notebook too!
Sure thing - the fine-tuning code is updated and maintained at the notebook links here github.com/unslothai/unsloth?tab=readme-ov-file#-finetune-for-free.
Awesome! Does it support fine-tuning of GLM-4?
No, I don't see that option. It looks like Unsloth has Hugging Face assets for fine-tuning Qwen and Yi though.
How can I fine-tune the LLAMA 3 8B model for free on my local hardware, specifically a ThinkStation P620 Tower Workstation with an AMD Ryzen Threadripper PRO 5945WX processor, 128 GB DDR4 RAM, and two NVIDIA RTX A4000 16GB GPUs in SLI? I am new to this and have prepared a dataset for training. Is this feasible?
The approach highlighted in the video may work if your dataset doesn't have a very high token count. Just download the notebook and run it on your local machine. I haven't tried A4000s, but they're CUDA+Ampere technology, so they should work similarly.
The fine-tuning would need to stay within 16 GB of GPU RAM, since the open-source, free Unsloth doesn't include multi-GPU support.
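If you try it, one way to keep the run on a single GPU is to restrict the visible devices before anything CUDA-related is imported (a sketch; the device index 0 is just an example):

```python
import os

# Make only one of the two A4000s visible, before torch/unsloth are imported,
# so the free single-GPU Unsloth path is used
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # should report 1
```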
@nodematic Thank you for the clarification
Hi, can you make a follow-up video where you save the GGUF fine-tuned version directly? I tried importing llama.cpp using git clone but it didn't seem to work. Really looking forward to it, keep up the good work!
Yes, we'll try to make a video on this. Thanks for the suggestion.
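In the meantime, Unsloth's notebooks include a GGUF export helper you could try, roughly along these lines (a sketch; the output folder and quantization method are just example choices):

```python
# After fine-tuning, export the merged model to GGUF.
# Unsloth handles cloning/building llama.cpp internally, so a manual
# git clone of llama.cpp shouldn't be needed.
model.save_pretrained_gguf(
    "gguf_model",                  # output directory (example name)
    tokenizer,
    quantization_method="q4_k_m",  # one common llama.cpp quantization
)
```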
very good video.
Can you do a tutorial on how to make a Parquet dataset for chat templates like ChatML or others?
Yeah, we'll try to make a tutorial on this. Thanks for the suggestion.
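As a starting point, a Parquet file with ChatML-style records might be built something like this (a sketch; the "messages" column layout and example rows are assumptions, not a required schema):

```python
from datasets import Dataset

# Hypothetical chat-format rows (system/user/assistant turns)
rows = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is LoRA fine-tuning?"},
            {"role": "assistant", "content": "LoRA trains small adapter layers..."},
        ]
    },
]

dataset = Dataset.from_list(rows)
dataset.to_parquet("chat_dataset.parquet")

# A tokenizer's chat template can then render each record into ChatML text:
# text = tokenizer.apply_chat_template(rows[0]["messages"], tokenize=False)
```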
It looks like you saw the new video (thanks for the comment). Posting the link here for others ruclips.net/video/9BN9Wz9azNg/видео.html.
Thanks!
I like my LLMs with 2 shots of tequila and a lot less moderation.... #FineTuned