Test Time Compute, Part 1: Sampling and Chain of Thought

  • Published: 14 Dec 2024

Comments • 25

  • @nguyenanhnguyen7658
    @nguyenanhnguyen7658 20 days ago

    Outstanding talk! Please keep it up.

  • @eugeniosegala
    @eugeniosegala 2 months ago

    I think this is the best channel for people who want to use LLMs.
    Most of the other creators on YT are just reading Jupyter Notebooks live (which is something I can perfectly well do on my own), but your channel is the only one that goes into enough detail to really understand and learn.
    Please never stop with these videos 🙏 I know it's a niche, but this is the type of content that is useful for businesses and goes beyond the hype.

  • @JaredWoodruff
    @JaredWoodruff 2 months ago +1

    Excellent content as usual, thanks mate!

  • @3750gustavo
    @3750gustavo 21 days ago

    A tip: in my tests it's better to either use a low temp with a low min_p to ensure a larger pool of valid tokens, or a higher temp with a high min_p to ensure a small pool where most tokens have similar probability. I found great results for math by setting temp to 0.02, top_p to 0.98, min_p to 0.02, presence penalty to 0.91 and rep penalty to 1.01; it gives accurate responses without entering a loop (a sketch of these settings follows at the end of this thread).

    • @3750gustavo
      @3750gustavo 21 days ago

      For anything beyond math, you can get great results just using temp and min_p and testing them out: as you lower temp you lower min_p, and as you increase temp you increase min_p. You can find some demos online showing these params' effects. min_p 0.1 at temp 1 already cuts the pool drastically; you tested with min_p at 0.2, and for that I would say a temp of 1.2 would be a better match.

    • @3750gustavo
      @3750gustavo 21 days ago

      Also consider some long-context tests; in my tests top_k somehow degrades response quality the longer the context gets.

    • @TrelisResearch
      @TrelisResearch 21 days ago

      Thanks for all these tips. Makes sense intuitively to me.
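
A minimal sketch of where the sampling knobs discussed in this thread live, assuming vLLM as the inference engine; the model name is an illustrative assumption and the values are the ones reported in the comments above, not settings from the video:

```python
from vllm import LLM, SamplingParams

# Illustrative model choice; any vLLM-supported checkpoint works the same way.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")

# The values reported above for math prompts: low temperature paired with low min_p.
math_params = SamplingParams(
    temperature=0.02,         # sharpen the distribution
    top_p=0.98,               # keep nearly the whole nucleus
    min_p=0.02,               # drop tokens far below the top token's probability
    presence_penalty=0.91,    # discourage reusing tokens already in the output
    repetition_penalty=1.01,  # mild penalty against loops
    max_tokens=512,
)

outputs = llm.generate(["What is 17 * 24? Think step by step."], math_params)
print(outputs[0].outputs[0].text)
```

For non-math prompts, the advice above maps to scaling temperature and min_p together, e.g. temperature=1.2 with min_p=0.2.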

  • @nathank5140
    @nathank5140 2 months ago +2

    Would be good to see Monte Carlo tree search, scoring each reasoning step… I’m toying with the idea of a genetic-algorithm variation on Monte Carlo tree search… but some way to do all this locally using ollama, and also a way to fine-tune a local model to produce reasoning steps based on the discovered best-scored reasoning steps.

    • @TrelisResearch
      @TrelisResearch 2 months ago

      Howdy. Yeah, part 2 will start doing scoring/voting.
      For MCTS, check out that video.
      Btw you can indeed do this locally if you have a GPU. What are you on, a Mac?

  • @PraveenKumar-bo7fw
    @PraveenKumar-bo7fw 1 month ago

    18:55 How will the scaling work when temp is 0? Will it not give a divide-by-zero error?

    • @TrelisResearch
      @TrelisResearch 1 month ago

      It would, except the numerator and denominator cancel in such a way that you're just left with the single most likely logit as temperature goes to zero.
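
A small numeric illustration of that limit (not code from the video), assuming standard temperature scaling of logits: softmax(logits / T) collapses onto the single most likely token as T shrinks, which is why samplers special-case temperature 0 as greedy argmax decoding rather than actually dividing by zero.

```python
import numpy as np

def softmax_with_temperature(logits, T):
    # Subtract the max logit first for numerical stability.
    z = (logits - logits.max()) / T
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
for T in [1.0, 0.5, 0.1, 0.01]:
    print(f"T={T}: {softmax_with_temperature(logits, T).round(4)}")
# The distribution sharpens toward one-hot on argmax(logits) as T -> 0,
# so in the limit only the single highest logit survives.
```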

  • @alchemication
    @alchemication 2 months ago

    Super helpful recap, in more detail than you would think you need on a daily basis!
    Would you consider doing a video about fine-tuning an LLM for proper text classification? For example Llama 3.2 8b to classify documents and return proper probabilities (so the model can be analysed and improved over time).
    It can be done by changing the last network layer using AutoModelForSequenceClassification, I just can't find any examples for Llama 3.2 yet (a rough sketch of this setup follows at the end of this thread).
    No worries if it ain't your cup of tea ;)

    • @TrelisResearch
      @TrelisResearch 2 months ago

      Yeah, that's not a bad idea. I'll add it to my list.

    • @alchemication
      @alchemication 2 months ago

      @TrelisResearch Nice, it would be great to see a comparison of accuracy between chat-completion/prompt-based classification and a softmax-based/traditional classification layer using the same model. Honestly, even the LLM providers we work with are not sure ;)
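
A rough sketch of the classification setup mentioned in this thread, assuming a recent transformers release and a Llama-family checkpoint; the model id, number of labels, and example text are illustrative, and the new classification head is randomly initialised, so it still needs fine-tuning (e.g. with Trainer) before the probabilities are meaningful:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B"  # illustrative; swap in the checkpoint you use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=4,                 # e.g. invoice / contract / report / other
    torch_dtype=torch.bfloat16,
)
# Llama tokenizers ship without a pad token; reuse EOS so padded batches work.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer(
    ["Quarterly revenue grew 12% year over year."],
    return_tensors="pt",
    padding=True,
)
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (batch, num_labels)
probs = torch.softmax(logits, dim=-1)      # per-class probabilities to analyse over time
print(probs)
```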

  • @TemporaryForstudy
    @TemporaryForstudy 2 months ago

    The thumbnails are good.

    • @TrelisResearch
      @TrelisResearch 2 months ago

      Thanks!
      I made them using ruclips.net/video/ThKYjTdkyP8/видео.html

    • @TemporaryForstudy
      @TemporaryForstudy 2 months ago

      @TrelisResearch Wow, that is great. I thought you made it using some photo-editing app.

  • @btaranto
    @btaranto 2 months ago

    Hi, I have a code completion tool. What do you think is the best configuration for the model to do code completion, to be faster and more accurate? Very nice video! Thank you so much!

    • @TrelisResearch
      @TrelisResearch 2 months ago +1

      I’d say add chain of thought: so thought first, then completion.
      Btw this can now be done with structured responses on vLLM. I’ll show it in the next video.
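
A hedged sketch of that thought-then-completion pattern using structured responses against a vLLM OpenAI-compatible server; the endpoint, model name, and JSON schema are illustrative assumptions rather than the setup from the video, and guided_json is vLLM's server-side structured-output option:

```python
from openai import OpenAI

# Point the OpenAI client at a locally running vLLM server (illustrative endpoint).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {
        "thought": {"type": "string"},      # brief reasoning about the surrounding code
        "completion": {"type": "string"},   # the code to insert at the cursor
    },
    "required": ["thought", "completion"],
}

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-7B-Instruct",  # illustrative code model
    messages=[
        {"role": "system", "content": "Complete the user's code. Think first, then answer."},
        {"role": "user", "content": "def fibonacci(n):\n    "},
    ],
    extra_body={"guided_json": schema},      # vLLM-specific structured-output field
    temperature=0.2,
)
print(response.choices[0].message.content)   # JSON with "thought" and "completion" keys
```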

  • @savanthtadepalli3968
    @savanthtadepalli3968 2 months ago

    Hey @TrelisResearch, I have a slightly off-topic request. Can you share the original Dockerfile for your RunPod one-click templates? I'm new to this field and I want to learn how to build a Docker image for an inference engine like TensorRT-LLM. This sort of Docker image building can also be used to deploy training scripts in my case. If it's not too much trouble, could you please make a short video on building a Docker image for an inference engine like TensorRT-LLM and saving it as a one-click template that can be readily deployed and expose an API endpoint?

    • @TrelisResearch
      @TrelisResearch 2 months ago +1

      Howdy, if you click on a template, the Docker image it uses is listed. You can use that to find the public Dockerfile: once you find the org, you can go back and typically find the Dockerfile on GitHub.
      Any custom Dockerfiles I have made are in the public one-click-template repo on GitHub under TrelisResearch.
      And yeah, I've had this request, so I'll note it as a bit higher priority for a new vid.

    • @savanthtadepalli3968
      @savanthtadepalli3968 2 months ago

      @TrelisResearch Can you please make the video about TensorRT-LLM engine deployment using the Triton Inference Server? Thanks.