Thank you so much for this video. Creating a dataset from GPT to fine-tune another open-source model was a smart move. It helped me create my custom dataset for Mistral 7B.
The system prompt in the notebook seems to be incorrect; TinyLlama's model card says the prompt is:
f"<|user|>
{input}</s>
<|assistant|>
{response}</s>"
I ran the notebook with it and the fine-tuned model works surprisingly well. 👍
You are right, I checked again. For some reason, I thought it was ChatML. Thanks for pointing it out.
Thank you so much! That did the trick. I originally ran the code as shown in the video and it didn't learn the fine-tuned data. I made the change you suggested and now it works as it should.
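For anyone following along: an easy way to avoid hand-writing those tags is to let the tokenizer render them. A minimal sketch, assuming the TinyLlama-1.1B-Chat-v1.0 tokenizer (which ships a chat template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [{"role": "user", "content": "Light Orange color"}]

# Renders the <|user|>/<|assistant|> tags exactly as the model card defines them,
# including the trailing assistant tag so the model knows it should respond.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```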
Amazingly good example of text classification by an LLM! It also is a great tutorial on fine tuning using PEFT with LoRA. I really like this because one can directly verify the inference (i.e. color) with one’s own eyes.
thank you.
Now looking for the Mistral people to release a Mixtral 8x1b model that will run on small-ish devices (my 16gb MacBook Pro, for instance).
Just add another 16 gigs and you'll be able to run Mixtral 8x7B just fine, 4-bit GGUF quantized. I run it on a 32 GB, CPU-only x86_64 mini PC (a fairly recent AMD Ryzen AM5 APU) and it runs amazingly well.
At that point I'd want a 16x1B so it could specialize in many topics.
@@user-qr4jf4tv2x Mixtral's mixture of experts doesn't work that way. There are no actual "dedicated experts" for different topics in there.
@@alx8439, nice to hear. Unfortunately, while Macs have amazing capabilities (look into their shared memory model sometime), you're essentially fixed with the memory you purchase it with. I bought it with 16 gigs of RAM, and it will have that until I get a new machine.
@@user-qr4jf4tv2x, focus focus focus.
What you describe becomes, at some point, just a generalist, and probably no better than a single 16b model.
A really helpful video. Thank you!
I had one question though: when fine-tuning you loaded the model in a quantized manner, whereas for inference you loaded the original model. Any specific reason for that? Wouldn't fine-tuning with the non-quantized model be considered better?
same question. you got any answer?
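For what it's worth, this is the usual QLoRA pattern: train against a 4-bit quantized base so it fits in VRAM, then attach the adapter to a full/half-precision base for inference. A rough sketch, assuming the standard transformers/peft/bitsandbytes APIs (the model id and adapter path are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder base model

# Training: load the base 4-bit quantized (QLoRA); only the small LoRA
# adapter weights are trained, and they stay in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
train_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# Inference: load the base in half precision and attach the trained adapter.
# Quantizing here too is optional and mostly a memory/quality trade-off.
infer_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
infer_model = PeftModel.from_pretrained(infer_model, "path/to/lora-adapter")
```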
I have the problem that my model often "forgets" the last number of the hexadecimal code. Depending on the input, I sometimes get a correct hexadecimal code and sometimes total nonsense because the last number is missing. Do you happen to know the reason for this (I'm completely new to ML)? I have trained the model for three epochs and otherwise left all parameters the same, as you do in your video. Apart from that, I only changed the prompt style so that it works properly. Which parameters would you advise me to play with first to get better results? Should I adjust the learning rate or perhaps train more epochs?
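A common cause of a cut-off hex code is that generation simply stops too early, so it is worth checking the generation settings before retraining. A minimal sketch (the values are illustrative, and `model`, `tokenizer`, `inputs` are assumed from the notebook):

```python
# Give the model enough room to finish the "#RRGGBB" code and decode greedily.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,                    # too small a value truncates the hex code
    do_sample=False,                      # greedy decoding is usually enough here
    pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the output is still unreliable, more epochs or a slightly lower learning rate are the usual next knobs to try.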
Brilliant! Thank you!
If you (or someone) can help me refine my understanding of LoRAs: do you need to merge a LoRA with either a base or a fine-tuned model in order to get use out of it, or can the LoRA be useful independently?
You will need to merge it back with the model for it to work. But the beauty is that you can train multiple LoRAs for different tasks and use them with the base model. Looking at LoRAs for Stable Diffusion models, there are really neat implications there.
@@engineerprompt Do you have to unload the base model each time you want to use a LoRA? Or can I have a base model that persists in VRAM and load each LoRA on top of it?
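PEFT does let you keep one base model resident in VRAM and hot-swap adapters on top of it. A rough sketch, assuming the peft adapter API (`base_model` is the already-loaded base; adapter paths and names are placeholders):

```python
from peft import PeftModel

# Attach a first adapter to the base model that is already loaded in VRAM.
model = PeftModel.from_pretrained(base_model, "path/to/colors-lora", adapter_name="colors")

# Load a second adapter alongside it and switch between the two without
# ever reloading the base weights.
model.load_adapter("path/to/pirate-lora", adapter_name="pirate")
model.set_adapter("pirate")   # subsequent generate() calls use this adapter
model.set_adapter("colors")   # ...and now this one again
```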
@@engineerprompt, thank you. I'm trying to put together some business-focused presentations on implementation of AI. Businesses want LLMs trained on their own (corporate) data, but don't want to get stuck on the 'best model of today' and not be able to carry their data into the future with new models. I think the idea of training a LoRA adapter on a data set and merging it into a model for use, but continuing to train that LoRA adapter in the background as new corporate data emerges, then periodically merging is the right approach, merging with newer (same architecture) models as they come out. Does this sound right?
@@jdray No. You will need to retrain from scratch when a new model (same architecture) comes out. XAI / Grok have some revolutionary magic for continual live input, but no one knows what that is.
Thanks for the video. One question: the program sets epochs to 3 and steps to 250, so why does the log stop at epoch = 0.47?!
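If both are set, the step limit usually wins: in TrainingArguments a positive max_steps overrides num_train_epochs, so 250 optimizer steps can end up covering only about 0.47 of an epoch for this dataset and batch size. A sketch of the relevant arguments (values illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,   # ignored once max_steps is set to a positive value
    max_steps=250,        # training stops after 250 optimizer steps (~0.47 epoch here)
    per_device_train_batch_size=4,
)
```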
Let's see some local fine-tuning. Maybe with Ollama on a Mac.
On it :)
many many thanks mister, very quick and helpful
Woah!! That was quick
I want to fine-tune on a context-based question-and-answer dataset. What prompt template can I follow? And with a specific prompt template, how does the model focus only on the answer when calculating the loss?
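One common pattern is to format context/question/answer into a single string and mask everything before the answer, so only the answer tokens contribute to the loss. A sketch using TRL's completion-only collator (the template strings and field names are illustrative, and `model`, `tokenizer`, `dataset` are assumed):

```python
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

def to_text(example):
    # The tags here are just an example format; use whatever the base model expects.
    return {
        "text": (
            f"### Context:\n{example['context']}\n"
            f"### Question:\n{example['question']}\n"
            f"### Answer:\n{example['answer']}"
        )
    }

dataset = dataset.map(to_text)

# Labels for everything before this marker are set to -100, so the loss is
# computed only on the answer portion of each example.
collator = DataCollatorForCompletionOnlyLM("### Answer:\n", tokenizer=tokenizer)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    data_collator=collator,
    tokenizer=tokenizer,
    max_seq_length=1024,
)
```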
Thank you for this video. Since you have used only input and response in the text formatter, I want to add an instruction as well. Which of these two will work for my case (or correct me if any changes are required in the formatters below)?
1. f"<|im_start|>system
{instruction}<|im_end|>
<|im_start|>user
{input}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>"
2. f" {instruction} {input} {response}"
Here is what you want to use (from the TinyLlama model card):
<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
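Translated into a training-time formatting function, that would look roughly like this (assuming the TinyLlama/Zephyr-style tags from the model card; the field names are placeholders):

```python
def format_example(example):
    # Three-turn template: system instruction, user input, assistant response.
    return (
        f"<|system|>\n{example['instruction']}</s>\n"
        f"<|user|>\n{example['input']}</s>\n"
        f"<|assistant|>\n{example['response']}</s>"
    )
```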
So, how would I use this model offline? In LM Studio for example.
Tell me also, how do you use it? Can we use it in LM Studio?
I ran the Colab notebook (run all, without changes) but I got different results. It seems that the fine-tuning did not work and the results are generic.
I had the same experience. Not giving the color hex code.
Trained for three epochs
Same, I get this output instead of the hex:
user
Light Orange color
assistant: This is a light, warm orange color with slight tinge
Time taken for inference: 2.3 seconds
Could you please take a look at the code? The color hex isn't generated. Instead, it just says:
user
Light Orange color
assistant: This is a bright and vibrant light orange shade
Time taken for inference: 1.88 seconds
@@aggtor I got the same issue. Any solution?
Hi. Suppose I need to fine-tune an LLM to create a structured, domain-specific summary from an uploaded PDF file. To create the datasets for this, I have used ChatGPT, but because of the LLM's token limit I am not able to create a dataset from long documents. Can we create such a dataset using RAG? If we are creating datasets for training, then we must include the entire document and its structured summary, which will be very, very lengthy. Is there any option to fine-tune an LLM for such large documents using RAG or any other technique?
Thank you so much!!! It's a really nice tutorial. ☺
Thanks for another great video.
How do you use it? After training it, you download it and load the model into ollama for example?
You can push that to the Hugging Face Hub and then use it like any other HF model.
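Roughly, that looks like the following (the repo name is a placeholder, and `model`/`tokenizer` are the objects from training; assumes peft's merge_and_unload):

```python
# Merge the LoRA weights into the base model and publish the result.
merged = model.merge_and_unload()
merged.push_to_hub("your-username/tinyllama-colors")
tokenizer.push_to_hub("your-username/tinyllama-colors")

# Later, anywhere else, load it like any other HF checkpoint.
from transformers import AutoModelForCausalLM
reloaded = AutoModelForCausalLM.from_pretrained("your-username/tinyllama-colors")
```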
Hello, how can I run this model locally but train it from Colab?
Amazing! Can you try it on a small document base (20 small PDFs of 15-20 pages)?
Let me see. I am working on a pipeline that will convert text into question-answer pairs for dataset generation, which can then be used for training LLMs.
Really helpful
Great video. Quick question: how do you save the model as a GGUF?
Same ....
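The usual route is to merge the adapter into a plain HF checkpoint first and then run llama.cpp's converter on that folder. A sketch (paths are placeholders, and the converter script name varies between llama.cpp versions):

```python
# 1) Merge the LoRA adapter into the base weights and save a plain HF checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")

# 2) Convert that folder to GGUF with llama.cpp, e.g.:
#    python convert_hf_to_gguf.py merged-model --outfile model.gguf
#    (check your llama.cpp checkout; the script name and flags have changed over time)
```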
Do you think we can fine-tune a knowledge graph into this model? 13B and 70B seem to be overfitting. I need to embed our knowledge graph into this.
I haven't experimented with that, but my guess would be yes.
How can I find out what data format is expected for the training input?
There's something I don't understand about fine-tuning. Why is it that there is more than one video about it - a separate one for each model? Shouldn't the code be the same and just change the repo URL for the specific model? What would be the difference if I wanted to fine-tune say Mistral or Vicuna?
I'm not an AI expert, just an enthusiast, but from what I've been able to gather it's the architectures that vary. For example, Mixtral runs on a mixture-of-experts architecture, which from what I understand is essentially 8 smaller models working together. So I think the complexities would differ from LLM to LLM, but LLMs with similar architectures could probably share the same training steps.
I need this notebook, how can I get it?
Can anyone recommend any LLM/SLM fine-tuned with Financial Statements Dataset?
How do you load a local dataset instead of from huggingface?
look at an example here: ruclips.net/video/z2QE12p3kMM/видео.html
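For a local file, the datasets library can read it directly. A sketch assuming a JSONL file whose fields mirror the Hugging Face dataset used in the notebook (the field names here are illustrative):

```python
from datasets import load_dataset

# Each line of colors.jsonl is expected to look like:
# {"description": "a light, warm orange", "color": "#FFD580"}
dataset = load_dataset("json", data_files="colors.jsonl", split="train")
```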
Why does it throw OOM, even though my GPU has 48 GB of memory?
Is that during training or inference? What batch size are you using?
@@engineerprompt Actually the colors example works well; I am using my own custom data. Take a look at sidhellman/constitution. Is it the data?
@engineerprompt Even on 80 GB it throws OOM. I did not touch any of the parameters; I left the notebook as it is. I have a question-answer dataset like @perfectpremium5996.
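Long question-answer examples use far more memory than the short color prompts, so OOM on a big GPU usually points to sequence length and batch settings. A few knobs worth trying (values illustrative, assuming the standard TrainingArguments setup from the notebook):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # smallest batch first, then scale back up
    gradient_accumulation_steps=16,   # keeps the effective batch size reasonable
    gradient_checkpointing=True,      # trades compute for a large memory saving
    bf16=True,
    optim="paged_adamw_8bit",         # 8-bit optimizer states via bitsandbytes
)
# Also consider capping max_seq_length in the trainer so very long QA pairs are truncated.
```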
such fake accent. very irritating. can do away with it when the content is good.
didn't know people have problems with accent. Grow up man