Been using Mistral on Ollama and it is pretty amazing the way it compares to larger models like Gemini and Llama2
Super great work on the video and tutorial! Super insightful and just subscribed :) Thanks for sharing Unsloth as well! :)
Thanks, Daniel! Huge fan - appreciate what you guys are doing!
Awesome! Subscribed ❤
Great video. Could you fine-tune the model on an image dataset, like visual question answering?
It looks like some people have tried this. I suspect it will be quite challenging (since most people aren't doing this sort of fine tuning), but possible. I can't confirm either way though, since we haven't tried it. We'll put this on the video ideas list though, and may be able to make a video on it in the future.
@nodematic Thanks
Outstanding! Will this approach work for data extraction? Let's say I want all the titles of a book? Thanks!
Thanks! Yes, your fine-tuned model could definitely be focused on data extraction, like book titles.
Pretty cool video. I had two questions: 1) What is the difference between loading the model in 4-bit quantization but doing tuning in 16-bit? Previously, I loaded a model in bfloat16 and didn't have to specify anything when doing tuning - maybe I am misunderstanding. 2) Do you have any video or code recommendations for where I can see fine-tuning without using LoRA? I feel semi-committed to trying this first before going the LoRA route. Ty for the great video (and music hehe)
1) You can load the Mistral "base model" weights via 4-bit quantization, while using 16 bits for the fine-tuning adapter layers. So, for your fine-tuned model, most of the layers will be quantized, but with a few non-quantized adapter layers "on top". This is often a sweet spot between computational demands and model accuracy (generally a tradeoff) - see the sketch after this reply.
2) We don't currently have non-adapter fine-tuning videos (e.g., full-parameter fine-tuning), but will try to create a video on this topic.
Thanks for watching and good luck.
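To make point 1 concrete, here's a minimal sketch of the 4-bit base + 16-bit LoRA setup from the notebook (parameter values are illustrative, not prescriptive):

```python
from unsloth import FastLanguageModel

# Load the Mistral base weights quantized to 4-bit (saves GPU memory)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapter layers "on top" - these are the 16-bit weights
# that actually get trained during fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```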
Does this fine-tuning method work with a question and answer dataset? Do you need an instruction too? If so, what does the JSON format need to be?
Yes, it works with question and answer data. The format is flexible - the approach chosen in the video is a somewhat standard format that we've seen works well with Mistral.
For your dataset, change "### Story" to "### Question" and change "### Summary" to "### Answer". Then you could try leaving the instruction blank or doing something generic like "Write a response that appropriately answers the question, while being as clear, factual, and helpful as possible.". I suspect that will work well for you.
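Roughly, the dataset formatting could look like this (a sketch; the "question"/"answer" column names are assumptions about your JSON, and `tokenizer`/`dataset` come from the earlier notebook cells):

```python
# Hypothetical prompt template - adjust field names to match your dataset
prompt_template = """{instruction}

### Question
{question}

### Answer
{answer}"""

def format_example(example):
    text = prompt_template.format(
        instruction=("Write a response that appropriately answers the question, "
                     "while being as clear, factual, and helpful as possible."),
        question=example["question"],
        answer=example["answer"],
    )
    # Append the end-of-sequence token so the model learns where to stop
    return {"text": text + tokenizer.eos_token}

dataset = dataset.map(format_example)
```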
Great video. How do I save a model using a saving strategy instead of just getting the last one? Is there an argument to add or something? I don't want to use the one at the end of training.
The trainer automatically saves checkpoints to the output_dir directory, so to use a checkpoint (e.g., with `from_pretrained`), point to the checkpoint's folder within that output_dir. For example, `from_pretrained` might use "outputs/checkpoint-200".
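For example, controlling checkpoint frequency and loading a specific checkpoint might look like this (a sketch; the step numbers are placeholders and the checkpoint folder is assumed to contain the saved adapter):

```python
from transformers import TrainingArguments
from unsloth import FastLanguageModel

# Control how often checkpoints are written (values are examples)
args = TrainingArguments(
    output_dir="outputs",
    save_strategy="steps",
    save_steps=50,  # write a checkpoint every 50 steps
    # ...the rest of the arguments from the notebook...
)

# Later, load a specific checkpoint rather than the final weights
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/checkpoint-200",  # folder the trainer wrote
    max_seq_length=2048,
    load_in_4bit=True,
)
```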
@nodematic Thank you so much, that was very helpful. Is the number of epochs controlled by max_steps? What should I set max_steps to for a dataset of 10k examples with about 1500 tokens in each? Thank you so much, this is the best practical tutorial for fine-tuning I found on YouTube.
The number of epochs is total_steps / (dataset size / effective batch size). Effective batch size is per_device_train_batch_size * gradient_accumulation_steps * number_of_devices, so 8 if you haven't changed these values in the notebook.
For example, 500 steps, 10k examples, and an effective batch size of 8 would result in a little less than half an epoch (0.4 epochs) through the dataset. For such a large fine-tuning dataset, maybe try 3000-5000 steps, and figure out where the steps have significantly diminishing returns on the loss (you can stop the training cell when you see this, to save money). Or you could let it run through and plot the loss to help find that "sweet spot", then use the checkpoint at that training step.
Also, feel free to specify the training in terms of `num_train_epochs` rather than `max_steps`, if that makes more sense for you.
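Here's that math as a quick snippet (per_device_train_batch_size=2 and gradient_accumulation_steps=4 are assumed notebook defaults - swap in your own values):

```python
# Sanity-check the epoch math from above
dataset_size = 10_000
per_device_train_batch_size = 2   # assumed notebook default
gradient_accumulation_steps = 4   # assumed notebook default
num_devices = 1
max_steps = 500

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)                 # 8
steps_per_epoch = dataset_size / effective_batch_size  # 1250
epochs = max_steps / steps_per_epoch                    # 0.4 epochs
print(effective_batch_size, steps_per_epoch, epochs)
```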
Can you please share the colab notebook too!
Sure thing - the fine-tuning code is updated and maintained at the notebook links here github.com/unslothai/unsloth?tab=readme-ov-file#-finetune-for-free.
Awesome! Does it support fine-tuning of GLM-4?
No, I don't see that option. It looks like Unsloth has Hugging Face assets for fine-tuning Qwen and Yi though.
How can I fine-tune the LLAMA 3 8B model for free on my local hardware, specifically a ThinkStation P620 Tower Workstation with an AMD Ryzen Threadripper PRO 5945WX processor, 128 GB DDR4 RAM, and two NVIDIA RTX A4000 16GB GPUs in SLI? I am new to this and have prepared a dataset for training. Is this feasible?
The approach highlighted in the video may work if your dataset doesn't have a very high token count. Just download the notebook and run it on your local machine. I haven't tried A4000s, but they're CUDA+Ampere technology, so they should work similarly.
The fine-tuning would need to stay within 16 GB of GPU RAM, since the open-source, free Unsloth doesn't include multi-GPU support.
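If you try it, one way to keep the run on a single GPU is to restrict the visible devices before anything CUDA-related is imported (a sketch; the device index 0 is just an example):

```python
import os

# Make only one of the two A4000s visible, before torch/unsloth are imported,
# so the free single-GPU Unsloth path is used
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # should report 1
```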
@nodematic Thank you for the clarification
Hi, can you make a follow-up video where you save the GGUF fine-tuned version directly? I tried importing llama.cpp using git clone but it didn't seem to work. Really looking forward to it, keep up the good work!
Yes, we'll try to make a video on this. Thanks for the suggestion.
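In the meantime, Unsloth's notebooks include a GGUF export helper you could try, roughly along these lines (a sketch; the output folder and quantization method are just example choices):

```python
# After fine-tuning, export the merged model to GGUF.
# Unsloth handles cloning/building llama.cpp internally, so a manual
# git clone of llama.cpp shouldn't be needed.
model.save_pretrained_gguf(
    "gguf_model",                  # output directory (example name)
    tokenizer,
    quantization_method="q4_k_m",  # one common llama.cpp quantization
)
```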
very good video.
Can you do a tutorial on how to make a Parquet dataset for chat templates like ChatML or others?
Yeah, we'll try to make a tutorial on this. Thanks for the suggestion.
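As a starting point, a Parquet file with ChatML-style records might be built something like this (a sketch; the "messages" column layout and example rows are assumptions, not a required schema):

```python
from datasets import Dataset

# Hypothetical chat-format rows (system/user/assistant turns)
rows = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is LoRA fine-tuning?"},
            {"role": "assistant", "content": "LoRA trains small adapter layers..."},
        ]
    },
]

dataset = Dataset.from_list(rows)
dataset.to_parquet("chat_dataset.parquet")

# A tokenizer's chat template can then render each record into ChatML text:
# text = tokenizer.apply_chat_template(rows[0]["messages"], tokenize=False)
```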
It looks like you saw the new video (thanks for the comment). Posting the link here for others ruclips.net/video/9BN9Wz9azNg/видео.html.
Thanks!
I like my LLMs with 2 shots of tequila and a lot less moderation.... #FineTuned