LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p

  • Published: Feb 9, 2025

Comments • 23

  • @datamlistic
    @datamlistic  1 year ago

    Wondering how you can fine-tune LLMs? Take a look here to see how this is done with LoRa, a popular fine-tuning mechanism: ruclips.net/video/CNmsM6JGJz0/видео.html
    Video mistakes:
    - At 2:30 the sum should be for j, not for i. Thanks @mriz for noticing this!
    - The probability distribution after selecting top-3 words at 4:10 is not accurate, and they should be sunny - 0.46, rainy - 0.38, the - 0.15. Thanks @koiRitwikHai for noticing this!

  • @stev__8881
    @stev__8881 10 months ago +1

    Great introduction with a clear and simple explanation/illustration. Thanks!

    • @datamlistic
      @datamlistic  9 months ago

      Thanks! Glad you found it helpful! :)

  • @waiitwhaat
    @waiitwhaat 9 months ago

    This is a really clear explanation of this concept. Loved it. Thanks!

    • @datamlistic
      @datamlistic  9 months ago +1

      Thanks! Happy to hear that you liked the explanation! :)

  • @이수연-p1f9n
    @이수연-p1f9n 8 months ago +1

    Thanks! Top p and Top k were easy to understand.

    • @datamlistic
      @datamlistic  8 months ago

      You're welcome! I'm glad to hear that those concepts were clear and easy to understand. If you have any more questions or need further clarification on this topic, feel free to ask! :)

  • @starsmaker9964
    @starsmaker9964 7 months ago

    video helped me a lot! thanks

  • @igordias8728
    @igordias8728 11 months ago +1

    Hello, in top-p, which of the 4 words will be chosen? Is it chosen randomly between "sunny", "rainy", "the" and "good"?

    • @datamlistic
      @datamlistic  11 months ago +1

      Yes, it's random according to their distribution.

    • @Annaonawave
      @Annaonawave 10 months ago +1

      @@datamlistic so they are randomly selected, but more probable values have a higher chance of being selected?

    • @datamlistic
      @datamlistic  10 months ago +1

      @@Annaonawave exactly :)
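The exchange above can be sketched in a few lines of Python: top-p keeps the smallest set of most-probable words whose cumulative probability reaches p, renormalizes, then draws one word weighted by its probability. The word probabilities below are illustrative, not the video's exact numbers:

```python
import random

def top_p_sample(probs, p=0.9):
    """Nucleus (top-p) sampling sketch: keep the smallest set of words
    whose cumulative probability reaches p, renormalize, then draw one
    word at random weighted by the renormalized probabilities."""
    # Rank words from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for word, prob in ranked:
        nucleus.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus now covers probability mass p
    total = sum(prob for _, prob in nucleus)
    words = [w for w, _ in nucleus]
    weights = [prob / total for _, prob in nucleus]
    # Higher-probability words are proportionally more likely to be drawn.
    return random.choices(words, weights=weights)[0]

# Illustrative candidate next words; with p=0.9 the nucleus is the top 4,
# so "cat" can never be sampled.
probs = {"sunny": 0.40, "rainy": 0.30, "the": 0.15, "good": 0.10, "cat": 0.05}
print(top_p_sample(probs, p=0.9))
```

With a small p (say 0.3), only "sunny" survives the cutoff and the choice becomes deterministic, which is why low top-p makes generation more conservative.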

  • @nizhsound
    @nizhsound 1 year ago

    Thank you for the video and the explanation of the three types of sampling for LLMs. When sampling with Temperature, Top-K and Top-P, are you using or enabling all three sampling methods at the same time?
    For example, if I chose Top-K sampling for controlled diversity and reduced nonsense, does that mean I should choose a low temperature as well?

    • @datamlistic
      @datamlistic  1 year ago

      Glad it was helpful! Yes, you can combine multiple sampling methods at the same time. :)
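A sketch of how the three settings can be chained in one sampling step. The order here (temperature, then top-k, then top-p) and the example logits are assumptions for illustration; real inference libraries differ in details such as whether probabilities are renormalized between filters:

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, top_p=None):
    """Combine temperature, top-k, and top-p in a single sampling step
    (assumed order: temperature -> top-k -> top-p)."""
    # Temperature: rescale logits, then softmax into probabilities.
    scaled = {w: l / temperature for w, l in logits.items()}
    z = sum(math.exp(l) for l in scaled.values())
    probs = {w: math.exp(l) / z for w, l in scaled.items()}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:          # keep only the k most probable words
        ranked = ranked[:top_k]
    if top_p is not None:          # then shrink to the top-p nucleus
        kept, cumulative = [], 0.0
        for word, prob in ranked:
            kept.append((word, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept
    total = sum(prob for _, prob in ranked)
    return random.choices([w for w, _ in ranked],
                          weights=[prob / total for _, prob in ranked])[0]

# Illustrative logits for four candidate next words.
logits = {"sunny": 2.0, "rainy": 1.5, "the": 0.5, "good": 0.1}
print(sample_next(logits, temperature=0.7, top_k=3, top_p=0.95))
```

Note the interaction the question is getting at: top_k=1 makes the other two settings irrelevant (the choice is already deterministic), while a low temperature sharpens the distribution so that top-p keeps fewer words.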

  • @varadarajraghavendrasrujan3210
    @varadarajraghavendrasrujan3210 8 months ago

    Let's say I use top_k=4. Does the model sample 1 word out of the 4 most probable words at random? If not, what happens?

    • @datamlistic
      @datamlistic  8 months ago

      That's exactly what happens! The model samples 1 word out of the most probable 4, according to their distribution (i.e. the higher the probability of a word, the more likely it is to be sampled).
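That behavior can be sketched directly (the word probabilities here are illustrative, not the video's numbers):

```python
import random

def top_k_sample(probs, k=4):
    """Top-k sampling sketch: keep only the k most probable words,
    renormalize their probabilities, then draw one word at random
    weighted by that renormalized distribution."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    words = [w for w, _ in ranked]
    weights = [prob / total for _, prob in ranked]
    # The higher a word's probability, the more likely it is to be drawn.
    return random.choices(words, weights=weights)[0]

# Five candidate next words; top_k=4 drops the least probable one ("cat").
probs = {"sunny": 0.40, "rainy": 0.30, "the": 0.15, "good": 0.10, "cat": 0.05}
print(top_k_sample(probs, k=4))
```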

  • @matthakimi3132
    @matthakimi3132 7 months ago

    Hi there, this was a great introduction. I am working on a recommendation query using Gemini; would you be able to help me fine-tune for the optimal topK and topP? I am looking for an expert in this to be an advisor to my team.

    • @datamlistic
      @datamlistic  7 months ago

      Unfortunately my time is very tight right now since I am working full time as well, so I can't commit to anything extra. I could however help you with some advice if you can provide more info.

  • @koiRitwikHai
    @koiRitwikHai 1 year ago

    The probability distribution you get after selecting top-3 words at 4:10 is not accurate. The probabilities, after normalizing the 3-word-window, should be sunny-0.46, rainy-0.38, and the-0.15.

    • @datamlistic
      @datamlistic  1 year ago +1

      Yep, that's correct. Thanks for the feedback! I created/recorded the video over a longer period of time and it seems that I used two versions of the numbers in doing so (and forgot to make updates). I'm sorry if this has caused any confusion. I will add some corrections about this issue in the description/pinned comment.
      p.s. Maybe it would be a good idea to round one of the probabilities you enumerated up, so they sum to 1.
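The corrected numbers follow from dividing each of the top-3 raw probabilities by their sum. The raw values below are hypothetical, chosen only because their 6:5:2 ratio reproduces the corrected figures; the video's exact pre-normalization numbers aren't shown in this thread:

```python
# Hypothetical raw top-3 probabilities (6:5:2 ratio); not the video's numbers.
raw = {"sunny": 0.30, "rainy": 0.25, "the": 0.10}
total = sum(raw.values())  # 0.65
normalized = {w: round(p / total, 2) for w, p in raw.items()}
print(normalized)  # {'sunny': 0.46, 'rainy': 0.38, 'the': 0.15}
```

The two-decimal rounding is why the corrected values sum to 0.99 rather than exactly 1.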

  • @mriz
    @mriz 11 months ago

    2:30
    bro, you're wrong, the sum is not over input i, but over j

    • @datamlistic
      @datamlistic  11 months ago +1

      Yep, that's correct. Thanks for the feedback and sorry if this confused you! I will add a note about this mistake in the pinned comment. :)