Wondering how you can fine-tune LLMs? Take a look here to see how this is done with LoRa, a popular fine-tuning mechanism: ruclips.net/video/CNmsM6JGJz0/видео.html
Video mistakes:
- At 2:30 the sum should be over j, not over i (see the sketch below). Thanks @mriz for noticing this!
- The probability distribution after selecting the top-3 words at 4:10 is not accurate; the correct values are sunny - 0.46, rainy - 0.38, the - 0.15. Thanks @koiRitwikHai for noticing this!
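For reference, a minimal sketch of the corrected formula, assuming the standard temperature-scaled softmax where the denominator sums over every index j in the vocabulary:

```python
import numpy as np

def temperature_softmax(logits, temperature=1.0):
    """p_i = exp(z_i / T) / sum_j exp(z_j / T) -- the sum in the denominator runs over j."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract the max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()        # denominator: sum over all j, i.e. the whole vocabulary

print(temperature_softmax([2.0, 1.0, 0.5], temperature=0.7))
```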
Great introduction with a clear and simple explanation/illustration. Thanks!
Thanks! Glad you found it helpful! :)
This is a really clear explanation of this concept. Loved it. Thanks!
Thanks! Happy to hear that you liked the explanation! :)
Thanks! Top p and Top k were easy to understand.
You're welcome! I'm glad to hear that those concepts were clear and easy to understand. If you have any more questions or need further clarification on this topic, feel free to ask! :)
video helped me a lot! thanks
Glad it helped! :)
Hello, in top-p, which of the 4 words will be chosen? Is it picked randomly between "sunny", "rainy", "the" and "good"?
Yes, it's random according to their distribution.
@@datamlistic so they are randomly selected, but more probable words have a higher chance of being selected?
@@Annaonawave exactly :)
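In case it helps to see the mechanics, here is a minimal top-p (nucleus) sampling sketch; the words and probabilities below are made-up illustration values, not the exact ones from the video:

```python
import numpy as np

def top_p_sample(words, probs, p=0.9, rng=None):
    """Keep the smallest set of most-probable words whose cumulative probability
    reaches p, renormalize them, and sample one word from that distribution."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                      # most probable first
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    kept = order[:cutoff]                                # indices of the kept words
    kept_probs = probs[kept] / probs[kept].sum()         # renormalize over the kept words
    return words[rng.choice(kept, p=kept_probs)]

words = np.array(["sunny", "rainy", "the", "good", "cat"])
probs = [0.40, 0.30, 0.15, 0.10, 0.05]                   # illustrative values only
print(top_p_sample(words, probs, p=0.90))
```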
Thank you for the video and the explanation of the three types of sampling for LLMs. When sampling with Temperature, Top-K and Top-P, are you using or enabling all three sampling methods at the same time?
For example, if I chose to do Top-K sampling for controlled diversity and reduced nonsense, does that mean that I will choose a low temperature as well?
Glad it was helpful! Yes, you can combine multiple sampling methods at the same time. :)
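For anyone wondering what combining them looks like in code, here is a minimal sketch using the Hugging Face transformers generate() API; "gpt2" and the parameter values are just placeholders I picked for illustration, not settings recommended in the video:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is used here only as a small, widely available placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,       # sample instead of always picking the most probable token
    temperature=0.7,      # reshape the distribution (lower = more conservative)
    top_k=50,             # keep only the 50 most probable tokens
    top_p=0.9,            # then keep the smallest set with cumulative probability >= 0.9
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```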
Let's say I use top_k=4, does the model sample 1 word out of the 4 most probable words randomly? If not, what happens?
That's exactly what happens! The model samples 1 word out of the 4 most probable words, according to their distribution (i.e. the higher the probability of a word, the more likely it is to be sampled).
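A minimal sketch of that step, assuming top_k=4 and some made-up word probabilities (not the numbers from the video):

```python
import numpy as np

rng = np.random.default_rng()

words = np.array(["sunny", "rainy", "the", "good", "cat", "blue"])
probs = np.array([0.35, 0.25, 0.15, 0.10, 0.10, 0.05])   # illustrative values only

k = 4
top_idx = np.argsort(probs)[::-1][:k]                # indices of the 4 most probable words
top_probs = probs[top_idx] / probs[top_idx].sum()    # renormalize over those 4 words
chosen = rng.choice(top_idx, p=top_probs)            # more probable -> more likely to be picked
print(words[chosen])
```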
Hi there, this was a great introduction. I am working on a recommendation query using Gemini; would you be able to help me fine-tune for the optimal topK and topP? I am looking for an expert in this to be an advisor to my team.
Unfortunately my time is very tight right now since I am working full time as well, so I can't commit to anything extra. I could however help you with some advice if you can provide more info.
The probability distribution you get after selecting top-3 words at 4:10 is not accurate. The probabilities, after normalizing the 3-word-window, should be sunny-0.46, rainy-0.38, and the-0.15.
Yep, that's correct. Thanks for the feedback! I created/recorded the video over a longer period of time and it seems that I used two versions of the numbers in doing that (and forgot to make any updates). I'm sorry if this has caused any confusion. I will add some corrections about this issue in the description/pinned comment.
P.S. Maybe it would be a good idea to round up one of the probabilities you enumerated, so they sum up to 1.
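To spell out the renormalization arithmetic: assuming the pre-selection probabilities were roughly 0.30, 0.25 and 0.10 (my guess for illustration, not confirmed numbers from the video), the corrected values fall out like this:

```python
# assumed pre-selection probabilities, for illustration only
top3 = {"sunny": 0.30, "rainy": 0.25, "the": 0.10}

total = sum(top3.values())                       # 0.65
normalized = {word: p / total for word, p in top3.items()}
print({word: round(p, 2) for word, p in normalized.items()})
# -> {'sunny': 0.46, 'rainy': 0.38, 'the': 0.15}
```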
2:30
Bro, you're wrong, the sum is not over i, but over j.
Yep, that's correct. Thanks for the feedback and sorry if this confused you! I will add a note about this mistake in the pinned comment. :)