Very informative and detailed video which clarifies each and every step and gives insight into the capabilities of InstructLab.
Really insightful for InstructLab.
Thanks for this great walkthrough of InstructLab, Grant!
Thanks for the kind words.
Great video, thanks for your effort!
Thank you so much for this great explanation. You make it look effortless. (It is, but only once you know how...)
I envy your machine. I use an iMac with an Intel Core i9 (128GB RAM, 2TB storage). It's a great machine, but in terms of speed it's nothing compared to your baby.
Is there a similar video but for Linux?
On Red Hat Enterprise Linux with ilab 0.17.1, after running `ilab generate --model-family merlinite --num-instructions 100` and `ilab train --iters 300 --device=cuda`, how do I serve the automatically generated ggml-model-f16.gguf? When I run `ilab serve --model-family merlinite --model-path models/ggml-model-f16.gguf` and then `ilab chat --model-family merlinite -m models/ggml-model-f16.gguf`, the chat interface gives repetitive responses that force me to ^C out (see below). I assume this has to do with passing the wrong model-family parameter, since I'm on Linux rather than a Mac (Linux auto-generates the gguf file without needing `ilab convert`). Any help would be great! Thanks!
>>> hello [S][default]
╭────── models/ggml-model-f16.gguf ─────────────────────────╮
│ Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello:Hello: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── elapsed 2.094 seconds ─╯
>>> exit
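For what it's worth, one way to narrow this down (a sketch, assuming `ilab serve` is exposing its default OpenAI-compatible endpoint on localhost:8000; adjust the host/port if yours differs) is to query the server directly with curl and see whether the repetition comes from the model itself or from the chat client:

```bash
# Hypothetical debugging step: hit the served endpoint directly,
# bypassing the ilab chat client entirely.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "models/ggml-model-f16.gguf",
        "messages": [{"role": "user", "content": "hello"}],
        "max_tokens": 64
      }'
```

If the raw completion also loops, the problem is in the generated gguf or the chat template the model-family selects, not in the chat UI.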
It seems InstructLab 0.17 automatically deletes the safetensors in the training_results/final directory. By downgrading to 0.15, I was able to re-run generate and train, and then serve with vLLM via `python -m vllm.entrypoints.openai.api_server --model /path/to/training_results/final --gpu-memory-utilization 0.98`. I'd recommend adding a flag to the `ilab model train` command in 0.17+ that lets you keep the safetensors Hugging Face format (maybe there is one I'm not aware of).
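In case it helps anyone trying the same workaround: once the vLLM server is up, it speaks the OpenAI API, so a quick smoke test looks something like this (a sketch; vLLM's default port 8000 is assumed, and the model name must match whatever path you passed to `--model`):

```bash
# Query the vLLM OpenAI-compatible server started above.
# "/path/to/training_results/final" is the same placeholder path as in the serve command.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/path/to/training_results/final",
        "prompt": "hello",
        "max_tokens": 64
      }'
```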
Good stuff ❤
Once I have trained it and verified it works as expected, how am I supposed to share it with the rest of the world? By submitting a PR against the taxonomy repo, I guess, or by uploading the trained model under my account on HF?
If you want to share added knowledge or skills with the world, yes, you would submit a PR to the taxonomy repo.
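And if you also want to publish the trained weights themselves, one common route (a sketch, assuming you have a Hugging Face account and the huggingface_hub CLI installed; `myuser/my-instructlab-model` is a hypothetical repo id) is:

```bash
# Log in once with a write token, then push the model directory to the Hub.
# Replace "myuser/my-instructlab-model" with your own repo id.
huggingface-cli login
huggingface-cli upload myuser/my-instructlab-model /path/to/training_results/final .
```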