Adding Custom Models to Ollama

  • Published: Jan 26, 2025

Comments • 104

  • @banalMinuta
    @banalMinuta 11 months ago +9

    Just found this channel, you are the G.O.A.T sir!

  • @SlavaClass
    @SlavaClass 11 months ago +7

    Have wondered several times how to do this. Thanks!
    One area I'm interested in is local function-calling LLMs (ideally with Ollama). Would love an explanation of how to get those working, if there are any reasonable solutions yet

    • @technovangelist
      @technovangelist  11 months ago

      That’s a great idea. Thanks

    • @SlavaClass
      @SlavaClass 11 months ago

      I should also mention that typescript + typesafety would be a big plus as well, perhaps with zod. But really any info on plumbing open-source solutions would be great

    • @technovangelist
      @technovangelist  11 months ago +1

      I’m a much bigger fan of typescript. It’s so much easier for me to write than python.

  • @vishalnagda7
    @vishalnagda7 10 months ago +1

    This video is exactly the solution I was looking for. And I'm also glad to see someone else using the terminal like me :P

  • @AlekseyRubtsov
    @AlekseyRubtsov 11 months ago +3

    Thank you for the video. It's easy to get the GGUF format now, and I was stupid enough to miss the fact that you can go with just a Modelfile and have it all in Ollama. Thanks.
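
    For reference, a minimal sketch of that "just a Modelfile" route (the GGUF filename, parameter value, and system prompt below are made up for illustration):

        # Modelfile (saved next to a GGUF you already have)
        FROM ./my-model.Q4_0.gguf
        PARAMETER temperature 0.7
        SYSTEM You are a concise, helpful assistant.

        # then, in the same directory:
        ollama create my-model -f Modelfile
        ollama run my-model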

  • @DinoLopez
    @DinoLopez 11 months ago +5

    I wonder if it would be possible to have very small LLMs for particular use cases, for example Java, Ruby, and Python, almost like modules. That way, if only a particular programming language is required, you only have to load that particular language model.

    • @technovangelist
      @technovangelist  11 months ago +2

      That is exactly the scenario I am most excited about

  • @AlekseyRubtsov
    @AlekseyRubtsov 11 months ago

    Thanks!

    • @technovangelist
      @technovangelist  11 months ago

      Wow. That’s the first time I have seen one of those. Thanks so much. I don’t know what to say. Thank you.

    • @AlekseyRubtsov
      @AlekseyRubtsov 11 months ago

      @@technovangelist I hope - no, I'm sure - you will see more of those. Just keep going.

  • @dib9900
    @dib9900 7 months ago +2

    How do you produce a Modelfile for embedding models?

  • @_lilnuggetwithbbqsauce3615
    @_lilnuggetwithbbqsauce3615 10 months ago +1

    thank you so much this was really helpful
    really good video 👍

  • @_lilnuggetwithbbqsauce3615
    @_lilnuggetwithbbqsauce3615 10 months ago

    I am quite new to all of this and there are some things that I don't understand. First, when I run the command hfdownloader -s : -m it tells me that hfdownloader is an unknown command even though I downloaded the executable. Secondly, I don't understand what you mean at 1:54 when you say to go to where you want your model to be downloaded, since you don't show any folder being selected, only the terminal. Could you please explain? Thank you in advance

    • @technovangelist
      @technovangelist  10 months ago

      hfdownloader probably isn't in your path. For the second part, you have to decide where to download something, same as with downloading anything from the Internet. Choose that place and run the command there.

    • @_lilnuggetwithbbqsauce3615
      @_lilnuggetwithbbqsauce3615 10 months ago

      @@technovangelist Thank you so much for such a quick answer, however things are still not so clear to me. How do I put hfdownloader in the path?

    • @technovangelist
      @technovangelist  10 months ago

      I suggest you look for tutorials on how to work with your OS, especially the command line.

    • @ibrahimabualhaol2540
      @ibrahimabualhaol2540 7 months ago

      ./hfdownloader
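
      For anyone stuck at the same point, a rough sketch of what the replies above mean (the download folder and model name are only examples; the -s and -m flags are taken from the command in the question):

        cd ~/Downloads                     # go to wherever you saved the hfdownloader binary
        chmod +x hfdownloader              # make it executable (macOS/Linux)
        ./hfdownloader -m someuser/some-model-GGUF -s ./models
        # optional: put it on your PATH so plain "hfdownloader" works from any directory
        sudo mv hfdownloader /usr/local/bin/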

  • @isaakcarteraugustus1819
    @isaakcarteraugustus1819 8 months ago

    Great video! How do I add tags (what do I have to type in the terminal) so that I can upload different quants to the same Ollama model repo?
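
    One way to push multiple quants under the same repo (assuming an ollama.com account named youruser and a locally created model for each quant) is to copy each model to a tagged name and push it:

        ollama cp mymodel-q4 youruser/mymodel:q4_0
        ollama push youruser/mymodel:q4_0
        ollama cp mymodel-q8 youruser/mymodel:q8_0
        ollama push youruser/mymodel:q8_0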

  • @PaulRobello
    @PaulRobello 7 months ago

    What markup do I need to add to the title or description to get the blue bubbles with the tag label like "7B", "70B", etc.?

  • @Pure_Science_and_Technology
    @Pure_Science_and_Technology 11 months ago +1

    The one thing that brings me anxiety is creating and using the dataset. A video on that would be great.. love your videos.

    • @technovangelist
      @technovangelist  11 months ago +1

      Can you tell me more about what you mean by that? What dataset?

    • @Pure_Science_and_Technology
      @Pure_Science_and_Technology 11 months ago

      @@technovangelist omg! I just read what I wrote. Sorry. I meant to say: when we are required to fine-tune a model for a specific domain or required functionality - the curation of data, preparing it properly, ensuring the correct settings are used, as well as the evaluation of the dataset and testing.

  • @MUHAMMADQASIMMAKAOfficial
    @MUHAMMADQASIMMAKAOfficial 11 months ago +3

    Very nice sharing 👍
    Have a nice day 😊

  • @SteveLacy-r9e
    @SteveLacy-r9e 4 months ago

    What if I created a model with a new architecture or I made an architectural tweak to an existing model? In other words, something that changes the number/type of model layers or the number of training parameters, etc. Is there a path for porting a model with this kind of customized architecture to run on Ollama? What would the process be?

  • @TimothyGraupmann
    @TimothyGraupmann 11 months ago

    (1:16) What architectures are supported? I only found these options in the comments of a PR..
    1. LlamaForCausalLM
    2. MistralForCausalLM
    3. RWForCausalLM
    4. FalconForCausalLM
    5. GPTNeoXForCausalLM
    6. GPTBigCodeForCausalLM

    • @TimothyGraupmann
      @TimothyGraupmann 11 months ago

      I see a lot of OCR models use `VisionEncoderDecoderModel`; is that supported?

    • @gparakeet
      @gparakeet 11 months ago

      The ollama/quantize Docker Hub page lists the following supported model architectures:
      1. LlamaForCausalLM
      2. MistralForCausalLM
      3. YiForCausalLM
      4. LlavaLlama
      5. RWForCausalLM
      6. FalconForCausalLM
      7. GPTNeoXForCausalLM
      8. GPTBigCodeForCausalLM
      9. MPTForCausalLM
      10. BaichuanForCausalLM
      11. PersimmonForCausalLM
      12. GPTRefactForCausalLM
      13. BloomForCausalLM

    • @technovangelist
      @technovangelist  11 months ago

      I seem to remember that isn't 100% accurate.

    • @gparakeet
      @gparakeet 11 months ago

      The ollama/quantize Docker Hub page lists the following architectures:
      1. LlamaForCausalLM
      2. MistralForCausalLM
      3. YiForCausalLM
      4. LlavaLlama
      5. RWForCausalLM
      6. FalconForCausalLM
      7. GPTNeoXForCausalLM
      8. GPTBigCodeForCausalLM
      9. MPTForCausalLM
      10. BaichuanForCausalLM
      11. PersimmonForCausalLM
      12. GPTRefactForCausalLM
      13. BloomForCausalLM

    • @technovangelist
      @technovangelist  11 months ago

      Looks like you sent that twice. Sorry, I hold comments until I approve them. In the distant past there was a spam problem, and doing it this way I ensure that I can answer every question that comes in. The interface for comments is worse than the Ollama Discord, and I want to make sure I address everything that comes in.

  • @Rewe4life
    @Rewe4life 3 months ago

    Hi,
    I am trying hard to get an embedding model for the German language going on Ollama. The architecture in the config file is named "BertForMaskedLM".
    I assume that it does not work because the architecture is not supported. I have two questions regarding that:
    - Can you tell me where I can find a list of architectures supported by Ollama? I am unable to find one.
    - Is there a way to get it working with Ollama even with the named architecture?

    • @technovangelist
      @technovangelist  3 months ago

      I don't know where that list is. And there isn't a way that I know of.

  • @LevitatingSarcophagus
    @LevitatingSarcophagus 1 month ago

    Hello Matt. I am working with an already quantized model in exl2 format. Since it is already quantized I wanted to make it compatible with Ollama. I created the Modelfile and ran the ollama create command, but I am running into an error: "unsupported content type: unknown". Could you help me out? Or at least let me know if it is even possible to convert from exl2 to GGUF?

    • @technovangelist
      @technovangelist  1 month ago

      I don't know of any converter. It may be easier just to convert from the original to GGUF and then quantize. Even really big models take just a few minutes with normal hardware.
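
      A rough sketch of that "convert the original, then quantize" route using llama.cpp (script and tool names have changed across llama.cpp versions, so check the repo you actually clone; paths and filenames here are examples):

        git clone https://github.com/ggerganov/llama.cpp
        cd llama.cpp && pip install -r requirements.txt
        # convert the original safetensors/PyTorch weights to an f16 GGUF
        python convert_hf_to_gguf.py /path/to/original-model --outfile model-f16.gguf
        # build llama.cpp, then quantize the f16 GGUF down to 4-bit
        ./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M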

  • @antonpictures
    @antonpictures 11 months ago +1

    This is nuts! How large is the ollama team? How many programmers?

    • @userou-ig1ze
      @userou-ig1ze 11 months ago

      I think it's just this guy

    • @technovangelist
      @technovangelist  11 months ago

      There are a few on the team, but under 10. I'm not one of them.

  • @giuliozeloni6684
    @giuliozeloni6684 9 months ago

    Hey, thanks for the video. I'm trying to quantize a llama3 model with the docker image shown in the video, but I think it is not supported. Will the docker image be updated?

  • @andrebarsotti
    @andrebarsotti 7 months ago

    Where can I find the architectures that are compatible with Ollama?

    • @technovangelist
      @technovangelist  7 months ago +1

      I don't know if there is any one list anymore.

  • @muraliytm3316
    @muraliytm3316 7 months ago

    Hello sir, whenever I run local models they are not using my VRAM. I saw the usage in Task Manager and it is not much. Does making it the default increase the speed of the LLM, and how can I make my VRAM the default for running any local LLM? But sorry, I only have 4GB of VRAM on an Nvidia GTX 1650.

    • @technovangelist
      @technovangelist  7 months ago

      Looks like it supports the 1650 Ti but not the 1650. You'd need to upgrade to get that. Nvidia doesn't support it with their drivers.
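
      One way to check whether Ollama is actually using the GPU: recent builds include an "ollama ps" command (older builds may not have it), and the server log also prints which GPU was detected.

        ollama run llama3 "hello"   # load any model (the name here is just an example)
        ollama ps                   # the PROCESSOR column shows e.g. "100% GPU" or "100% CPU"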

  • @SIR_Studios786
    @SIR_Studios786 8 months ago

    Can all model architectures be converted to GGUF, or is there a specific list?

    • @technovangelist
      @technovangelist  8 months ago

      Not everything but a lot of them can.

    • @SIR_Studios786
      @SIR_Studios786 8 months ago

      @@technovangelist I need speech to text. Do you know of any model that can be converted to GGUF, or can you help if there is one? I would be highly grateful.

    • @technovangelist
      @technovangelist  8 months ago

      Ollama is for text to text and text/image to text. For speech to text, take a look at OpenAI's Whisper models, which you can install locally.
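
      A minimal local Whisper sketch, assuming Python and ffmpeg are already installed (the model size and filename are arbitrary):

        pip install -U openai-whisper
        whisper recording.mp3 --model small --language en   # writes a transcript next to the audio file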

  • @TimothyGraupmann
    @TimothyGraupmann 11 months ago

    Where can we find compatible models other than on Hugging Face? Are TensorFlow Hub formats compatible?

    • @technovangelist
      @technovangelist  11 months ago

      I don’t know about that. But it’s safetensors and PyTorch models that are supported.

  • @JohnSigvald
    @JohnSigvald 3 months ago

    Thank you for this!

  • @glorified3142
    @glorified3142 11 months ago

    I wish you could cover in a video how to fine-tune TinyLlama for inference in Ollama.

    • @technovangelist
      @technovangelist  11 months ago +2

      Yes. I really want to do one on this.

    • @glorified3142
      @glorified3142 11 months ago

      @@technovangelist thanks in advance.

  • @valueray
    @valueray 10 months ago

    What's the best file format for RAG? Is there a list of what works best?

  • @cloudsystem3740
    @cloudsystem3740 8 months ago

    How do you convert GPT-2?

  • @cryptowl1901
    @cryptowl1901 11 months ago

    Thank you so much for such a great video❤❤

  • @ToweringToska
    @ToweringToska 4 months ago

    This is way too woofin' hard for me. I thought I'd simply right-click and Save As a file on Hugging Face and save it to whichever directory Ollama wants. But I need to convert the files to work, and select a quantization type? My thoughts get hazy as soon as simply getting and placing a file requires me to learn CMD commands. = ,=;

    • @technovangelist
      @technovangelist  4 months ago

      It's not for everyone. It's a dev tool first and requires some basic CLI skills...

    • @ToweringToska
      @ToweringToska 4 months ago

      @@technovangelist That's fine, I'm going to try out LM Studio next. Oobabooga was quick and easy to use, but FlashAttention in it now only supports GPUs as new as Ampere, unlike my 1080, so I'm looking for something quicker that I can use.

    • @technovangelist
      @technovangelist  4 months ago

      Omg. Ollama is sooo much easier than either of those. No question

    • @technovangelist
      @technovangelist  4 months ago

      Just get the model you want from Ollama. It's too much work getting them from HF.

    • @benevolencemessiah3549
      @benevolencemessiah3549 3 months ago

      @@technovangelist Respectfully, I'd disagree. I have some custom fine-tuned and merged models and corresponding quants that I made, but I can't quite figure out how to convert them to a Modelfile. Maybe it's the Go syntax throwing me off. But incidentally, since these are GGUF files, couldn't the relevant instruction metadata be imported automatically? And the GGUF parameters templated into the generated Modelfile? Maybe I'm missing something, but why not just use llama.cpp/GGUF files directly? I'm quite surprised how much trouble I'm having despite Ollama being quite a popular tool.
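
      One way to see how a GGUF, template, and parameters fit together is to dump the Modelfile of a model that is already in Ollama and adapt it (the model and file names here are only examples):

        ollama show llama3 --modelfile > Modelfile   # dump an existing model's Modelfile
        # edit FROM to point at your own GGUF; keep or adjust the TEMPLATE and PARAMETER lines
        ollama create my-finetune -f Modelfile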

  • @TimothyGraupmann
    @TimothyGraupmann 11 months ago

    This was such a great video, just perfect for what I wanted to learn next. And this new model would work automatically with api/chat and api/generate? Let's say we follow these steps on the Whisper model and somehow it works. The Whisper model has a translate() method. How would we add an api/translate custom endpoint to the Ollama API? Previously I used an nginx container to make a proxy so I could add my custom endpoint to that proxy. The Ollama API is Golang. It would be great if there was some kind of plugin folder that I could drop Golang scripts into that the API would automatically include. With the docker mount, if there were some predefined named mount point that Ollama monitored to automatically pull in your API additions, that would be great!

    • @technovangelist
      @technovangelist  11 months ago

      If a model works on Ollama it gets all the endpoints. But a non-llava model can't do image stuff. The Whisper model doesn't have the endpoint; it's the runner in front that does.

  • @thenetgamer2
    @thenetgamer2 9 months ago +1

    This should really include some resources to go to if one runs into problems. I've noticed that a lot of these videos you do are a bit too narrow in focus to actually help people get into this, unless they were already into something of a similar nature.

    • @michaelallen2971
      @michaelallen2971 3 months ago

      Lol, I'm still stuck on a lot of things from these videos. I really just wanted to do one of his recent videos incorporating web search into local Llama responses. But no real instructions.

  • @Codegix
    @Codegix 6 months ago

    Thanks Matt!

  • @userou-ig1ze
    @userou-ig1ze 11 months ago

    This guy is a beast, when does he sleep?! Keep it up!!! Thanks!!! I was just wondering today how to do that, especially since many models aren't provided in this standard format that's easy to import into Ollama.
    Will Ollama support the OpenAI HTTP API format, so it can be integrated more easily into AutoGen Studio? ^^
    Will Ollama support easier RAG and web requests from the console?

    • @technovangelist
      @technovangelist  11 months ago +1

      So will it support OpenAI? I doubt it. Will Ollama do RAG on the console? That's a bit out of scope for the project. But there are plenty of extensions that are doing it well.

    • @technovangelist
      @technovangelist  11 months ago

      Re OpenAI: just kidding, watch out for the next release video.

  • @PleaseOpenSourceAI
    @PleaseOpenSourceAI 11 months ago

    That was good! Thanks! I'd be interested to see how to make quantized llava models. Even better if it could be the new moondream1 vision model by vikhyat.

  • @diegocaumont5677
    @diegocaumont5677 4 months ago

    Issues with architectures that aren't supported.

    • @diegocaumont5677
      @diegocaumont5677 4 months ago

      unknown architecture LlavaQwenForCausalLM

    • @technovangelist
      @technovangelist  4 months ago

      Yup, can't work with architectures it doesn't know about.

  • @darthcryod1562
    @darthcryod1562 11 months ago

    Great video! I followed it just to try running Ollama with Mistral and the emoji model mentioned in the video, but it is painfully slow on my Win11 machine even though I have 22GB of RAM and an AMD 6800S video card. Anyone facing the same issue on Windows? I tried running it using WSL2 and it was a bit faster, but still slow compared to what these videos show. Any suggestions? I ran it on my M1 Mac with 16GB and it's even faster than WSL2.

    • @technovangelist
      @technovangelist  11 months ago

      I assume that's an AMD card. I don't think AMD support is enabled yet on Windows.

    • @darthcryod1562
      @darthcryod1562 11 months ago

      @@technovangelist Could be. Also, Windows Defender is going crazy with ollama.exe, detecting Trojan:Script/Wacatac.B!m. I will remove Ollama; it is concerning that the installer has a trojan!

    • @technovangelist
      @technovangelist  11 months ago +1

      The Ollama team is working with MSFT to get them to fix Defender, because it is broken on this. There is no trojan in Ollama, or in the hundreds of other tools using the latest Go compiler that this false positive is hitting.

  • @user-he8qc4mr4i
    @user-he8qc4mr4i 9 months ago

    This seems like a pain in the neck!

    • @technovangelist
      @technovangelist  9 months ago

      For models not already on Ollama this process is a single command and done in 5 minutes. It's pretty painless. And this process goes away soon; it's an old video. But it's still faster than anything else out there.

  • @JarppaGuru
    @JarppaGuru 9 months ago

    It already was a model. How do you make your own model from scratch? Nobody knows LOL. And we don't need to do it because you did it and shared it. Some will take your model and quantize it again LOL. Nobody starts from scratch.

    • @technovangelist
      @technovangelist  9 months ago

      The example I showed was just the model weights. In Ollama a model is everything needed to make it useful. The weights are just a part of it, along with the system prompt and template. There are plenty of places that show how to make a model from scratch. The downside is that no one has done it for less than $100k.

    • @JarppaGuru
      @JarppaGuru 9 months ago

      @@technovangelist It can be done; it just takes a long time. A PC costs 4k - how many years would you need to train LOL. I kind of meant how to make a model that just has "hi mom" and the AI answers "hi son". That wouldn't take long. I just want to know all the commands LOL.
      Let's train a model: use AI to make the questions and answers, then train lol. I don't get it xD, we already had Jarvis.

  • @puruzsuz31
    @puruzsuz31 1 month ago

    Please keep the boomer jokes to yourself.

    • @technovangelist
      @technovangelist  1 month ago

      Had to watch it again. I don’t have any jokes in this one.

    • @technovangelist
      @technovangelist  1 month ago

      Oh was that the problem? I didn’t include any. Got it

  • @Simone-ek9hb
    @Simone-ek9hb 8 months ago +1

    PS C:\programmazione\ollama\ollama-ita> docker run --rm -v .:/model ollama/quantize -q q4_0 /model
    /workdir/llama.cpp/gguf-py
    Loading model file /model/model-00001-of-00004.safetensors
    Loading model file /model/model-00001-of-00004.safetensors
    Loading model file /model/model-00002-of-00004.safetensors
    Loading model file /model/model-00003-of-00004.safetensors
    Loading model file /model/model-00004-of-00004.safetensors
    params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=8192, n_ff=14336, n_head=32, n_head_kv=8, f_norm_eps=1e-05, n_experts=None, n_experts_used=None, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=, path_model=PosixPath('/model'))
    Traceback (most recent call last):
      File "/workdir/llama.cpp/convert.py", line 1658, in <module>
        main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
      File "/workdir/llama.cpp/convert.py", line 1614, in main
        vocab, special_vocab = vocab_factory.load_vocab(args.vocab_type, model_parent_path)
      File "/workdir/llama.cpp/convert.py", line 1409, in load_vocab
        path = self._select_file(vocabtype)
      File "/workdir/llama.cpp/convert.py", line 1384, in _select_file
        raise FileNotFoundError(f"{vocabtype} {file_key} not found.")
    FileNotFoundError: spm tokenizer.model not found.
    anyone?