Outstanding talk ! Please keep it up.
thanks, appreciate it
I think this is the best channel for people who want to use LLMs.
Most of the other creators on YT are just reading Jupyter notebooks live (which is something I can perfectly well do on my own), but your channel is the only one that goes into enough detail to actually understand and learn.
Please never stop with these videos 🙏 I know it's a niche, but this is the type of content that's useful for businesses and goes beyond the hype.
Appreciate it
Excellent content as usual, thanks mate!
cheers
A tip: in my tests it's better to either use a low temp with a low min p (to ensure a larger pool of valid tokens) or a higher temp with a high min p (to ensure a small pool, but where most tokens are close in probability). For math I found great results putting temp at 0.02, top p at 0.98, min p at 0.02, presence penalty at 0.91 and repetition penalty at 1.01; it gives accurate responses without entering a loop.
For anything beyond math, you can get great results just using temp and min p and testing them out: as you lower temp, lower min p, and as you increase temp, increase min p. You can find demos online showing the effects of these params; min p 0.1 at temp 1 already cuts the pool drastically. You tested with min p at 0.2; for that I'd say a temp of 1.2 would be a better match.
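The temp/min-p interplay described above can be sketched in a few lines: min-p keeps only tokens whose probability is at least `min_p` times the top token's probability, so a flatter (higher-temp) distribution needs a higher `min_p` to keep the pool comparably small. The logits and exact parameter values below are made up for illustration.

```python
import math

def min_p_filter(logits, temperature=1.0, min_p=0.1):
    """Temperature-scale logits into probabilities, then keep only tokens
    whose probability is at least min_p * (probability of the top token)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = min_p * max(probs)
    return {i: p for i, p in enumerate(probs) if p >= cutoff}

logits = [4.0, 3.5, 2.0, 0.5]
# Low temp sharpens the distribution, so a small min_p already suffices;
# high temp flattens it, so a larger min_p is needed for a similar pool.
pool_low_t = min_p_filter(logits, temperature=0.5, min_p=0.02)
pool_high_t = min_p_filter(logits, temperature=1.5, min_p=0.3)
```

With these toy numbers, both pairings end up keeping just the top two candidates, which is the balance the tip above is aiming for.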
Also consider some long-context tests; in my tests, top k somehow degrades response quality the longer the context goes.
Thanks for all these tips. Makes sense intuitively to me.
Would be good to see Monte Carlo tree search, scoring each reasoning step… I'm toying with the idea of a genetic-algorithm variation on Monte Carlo tree search… but some way to do all this locally using Ollama, and also a way to fine-tune a local model to produce reasoning steps based on the discovered, best-scored reasoning steps.
Howdy. Yeah part 2 will start doing scoring/voting.
For MCTS check out that video.
Btw you can indeed do this locally if you have a GPU. What are you on - a Mac?
18:55 How will the scaling work when temp is 0? Will it not give a divide-by-zero error?
It would, except the numerator and denominator cancel in such a way that you're just left with the single most likely logit as temperature goes to zero.
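To make that limit concrete: as the temperature shrinks, softmax(logits / T) piles essentially all the probability mass onto the largest logit, which is why samplers typically special-case temperature 0 as plain greedy argmax rather than dividing by zero. The logits below are made-up numbers.

```python
import math

def softmax_t(logits, temperature):
    """Temperature-scaled softmax, with the max subtracted for stability."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.1, 0.01):
    print(t, softmax_t(logits, t))
# As temperature -> 0, the distribution collapses onto the top logit;
# at exactly 0, implementations just take argmax instead of dividing.
```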
Super helpful recap, in more detail than you would think; useful on a daily basis!
Would you consider doing a video about fine-tuning an LLM for proper text classification? For example Llama 3.2 8b to classify documents and return proper probabilities (so the model can be analysed and improved over time).
It can be done by swapping the last network layer using AutoModelForSequenceClassification; I just can't find any examples for Llama 3.2 yet.
No worries if it ain't your cup of tea ;)
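For context on the comment above: AutoModelForSequenceClassification replaces the LM head with a linear layer over `num_labels`, and a softmax over that layer's logits gives the class probabilities. Here is a stdlib-only sketch of just that final step; the dimensions, hidden state, and weights are made-up stand-ins, not taken from any real checkpoint.

```python
import math
import random

random.seed(0)

HIDDEN, NUM_LABELS = 8, 3  # tiny stand-ins for the real model dimensions

# Made-up final hidden state and classification-head weights.
hidden = [random.gauss(0, 1) for _ in range(HIDDEN)]
W = [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(NUM_LABELS)]
b = [0.0] * NUM_LABELS

# Linear classification head: one logit per label.
logits = [sum(w_i * h_i for w_i, h_i in zip(row, hidden)) + b_k
          for row, b_k in zip(W, b)]

# Softmax turns the logits into a proper probability distribution,
# which is what lets you analyse calibration over time.
m = max(logits)
exps = [math.exp(l - m) for l in logits]
probs = [e / sum(exps) for e in exps]
predicted = probs.index(max(probs))
```

Training then just applies cross-entropy loss on these probabilities against labelled documents, which is what the `transformers` class wires up for you.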
yeah, that's not a bad idea. I'll add it to my list.
@@TrelisResearch nice, would be great to see a comparison of accuracy between chat-completion/prompt-based classification and a softmax-based/traditional classification layer using the same model. Honestly, even the LLM providers we work with aren't sure ;)
thumbnails are good
Thanks!
I made them using ruclips.net/video/ThKYjTdkyP8/видео.html
@@TrelisResearch wow, that is great. I thought you made it using some photo editing app.
Hi, I have a code completion tool. What do you think is the best configuration for the model to do code completion, to be faster and more accurate? Very nice video! Thank you so much!
I'd say add chain of thought: so thought, then completion.
Btw this can now be done with structured responses on vLLM. I’ll show in the next video.
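For reference, a thought-then-completion response can be pinned down with a JSON schema like the one below, which is the shape vLLM's guided/structured decoding accepts; the field names here are just illustrative, not an official format.

```python
import json

# Illustrative schema: force the model to emit its reasoning first,
# then the final code completion, as two separate string fields.
schema = {
    "type": "object",
    "properties": {
        "thought": {"type": "string"},
        "completion": {"type": "string"},
    },
    "required": ["thought", "completion"],
    "additionalProperties": False,
}

# Serialized form, e.g. to pass as a guided-decoding parameter
# in a request to the inference server.
schema_json = json.dumps(schema)
```

The client can then parse the JSON response and strip the `thought` field before surfacing the completion in the editor.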
Hey @TrelisResearch, I have a slightly off-topic request. Can you share the original Dockerfile for your RunPod one-click templates? I'm new to this field and I want to learn how to build a Docker image for an inference engine like TensorRT-LLM. This sort of Docker image building can also be used to deploy training scripts in my case. If it's not much trouble, could you please make a short video on building a Docker image for an inference engine like TensorRT-LLM and saving it as a one-click template that can be readily deployed and expose an API endpoint?
Howdy, if you click on a template, the Docker image is listed. You can use that to find the public Dockerfile: once you find the org, you can go back and typically find the Dockerfile on GitHub.
Any custom Dockerfiles I have made are in the public one-click-template repo on GitHub under TrelisResearch.
And yeah, I've had this request before, so I'll note it as a bit higher priority for a new vid.
@@TrelisResearch Can you please make the video on TensorRT-LLM engine deployment using the Triton Inference Server? Thanks.