** GGUF Not working **
Towards the end of the video, I state that the issue with function calling with GGUF is due to the prompt format. However, the issue is that the GGUF model (unlike the base model) is responding with incorrectly formed JSON objects.
There appears to be an issue with the GGUF quantization that I need to resolve. I'll update here once resolved.
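For anyone debugging something similar, a quick hypothetical check is to try parsing each response as JSON; the strings below are placeholders standing in for a base-model response and a GGUF response, not actual outputs from the video.

```python
# Hypothetical sketch: check whether function-call responses parse as JSON.
# The two strings are placeholder examples, not real model outputs.
import json

responses = {
    "base": '{"name": "get_weather", "arguments": {"city": "Dublin"}}',
    "gguf": '{"name": "get_weather", "arguments": {"city": "Dublin"',  # truncated brace
}

for source, text in responses.items():
    try:
        json.loads(text)
        print(f"{source}: valid JSON")
    except json.JSONDecodeError as err:
        print(f"{source}: malformed JSON ({err})")
```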
Your relaxed conversation style makes it seem like AI is just following a series of if-else statements.😁😁😁
As usual pure 🔥. Thanks for putting the time and energy into your outstanding didactic videos🙌🏾
Thanks for the amazing content! Based on my downloads of various quantised models from The Bloke, 5-bit quantisation would seem to be the sweet spot if you want reduced memory usage, but you still care about quality.
Yeah - it's not as though there is one point where quality suddenly drops off. Very roughly, I'd say 8-bit, but yeah, some 6- or 5-bit quants are good too.
You post the most valuable content on AI/ML.
Tiny LLMs = Tiny Large LMs = (Tiny Large) LMs = LMs
😂 Keep an eye out for an upcoming video on Large TLLMs
Medium Language Models
... whoa
Thank you so much! Your research is highly appreciated, and this video solves the feasibility question mark in my mind! Looking forward to digging into your company and vids. 👍👍👍🎆
Awesome video. Phi-2 is now available for commercial use under the MIT License.
Man, how do you come up with ideas for the new videos?! This is pure gold! Would you consider doing something with non-English languages (considering Europe has a nice mixture of those)? I'm wondering if this is even something I should be thinking about when fine-tuning open LLMs...
Cheers! What language? And what topic?
Cheers. I am thinking about French, German, Italian, Spanish and Polish (and English obviously). But even a clue on how to deal with one extra language would be nice. I don't speak these languages, which makes it even more "fun". I have a custom dataset of 1000 FAQ-style question/answer pairs, currently in English, so that would be an example use case to play with.
Damn, did my reply not show up here? I must be losing my mind... In general I deal with English, but also German, French, Italian, Spanish and Polish. Even seeing how to fine-tune for one non-English language could be very interesting: what are some best practices, limitations, etc.? I do have a custom dataset of ~1000 FAQ-style question/answer pairs (upsampled by GPT-4 from the original ~150 questions/answers).
Another great video.
UPDATE: Phi-2 is now available - incl. for commercial use - under an MIT license!
I have a 2018 16-inch with an x86 CPU but 32 GB of RAM, and I can run the hell out of Solar, DeepSeek and Mixtral simultaneously with Ollama, or one at a time with Jan (slower).
Let's agree they are called SLMs, as in SLiM, unless you want to start using the metric system of pico, nano, micro, milli, etc. 😄 In 5 years, "big" as in "big data" will be considered small compared to the biggest.
Can you add some training videos? Like distributed training with DeepSpeed... Can Ray be used for distributed training?
Check out the Trelis fine-tuning playlist; you'll see a multi-GPU video there.
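For a rough idea of the DeepSpeed side, here is a minimal hypothetical sketch of hooking a DeepSpeed config into the Hugging Face Trainer; the model name, config file name and dummy dataset are placeholders, not what's used in the multi-GPU video.

```python
# Hypothetical sketch: Hugging Face Trainer + DeepSpeed config (placeholders throughout).
# Launch across GPUs with e.g. `deepspeed train.py` or `accelerate launch train.py`.
from datasets import Dataset
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")  # placeholder model

# Tiny dummy dataset so the script is self-contained; swap in your tokenised data.
train_data = Dataset.from_dict({"input_ids": [[1, 2, 3, 4]], "labels": [[1, 2, 3, 4]]})

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    deepspeed="ds_config_zero2.json",  # a ZeRO stage-2 config file you provide
)
Trainer(model=model, args=args, train_dataset=train_data).train()
```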
Great insights. Would low-rank training be useful for narrow tasks like text classification, for example?
Yes! Very effective for training for classification - the basic premise is the same as training for function calling (take a look at the recent vid and also the older vid on structured responses).
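As a rough illustration (not code from the video), a LoRA classification setup with PEFT might look like the sketch below; the base model, label count and target modules are assumptions.

```python
# Hypothetical sketch: LoRA fine-tuning for text classification with PEFT.
# Base model, num_labels and target_modules are assumptions, not from the video.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=3)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,           # keeps the classification head trainable
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; names vary by architecture
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically well under 1% of total parameters
```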
Nice video as always. For function calling, I'm using NexusRaven V1 on a 1070 Ti and I think it's better than GPT-4.
PS: I'm using Ollama for inference.
thanks for the tips, I'll dig in on those
@@TrelisResearch It's super fast
It would be nice to have a video about Groq (not Grok), but I don't know how much info is around at the moment.
Yeah, I'm kind of tracking it, but they don't give a way to run inference on a custom model yet AFAIK. Once they do, I think that would definitely be interesting.
Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and text for relation extraction or a specific task? Could you do it using an open-source multimodal LLM and multimodal datasets, like Video-LLaMA or others, so anyone can further their experiments with the help of your tutorial? Can you also talk about how we can boost the performance of the fine-tuned model using prompt tuning in the same video?
Yeah, I want to do a vid on multi-modal. I tried out LLaVA and was unimpressed by its performance versus OpenAI, so I thought I would delay a little bit. I'll revisit soon.
Hey, I have a question. I trained a tokenizer, changing the tokenizer's vocabulary length, then did PEFT + QLoRA (embedding, lm_head and QKV) fine-tuning. But the model does not perform well. Is it because of a lack of data? Or because I have changed the dimensions?
I'd need more info to say...
- What kind of dataset were you using and training for what application?
- Did you merge the LoRA onto the base model you trained? (You have to be careful not to lose the updated embed and lm_head layers.)
- When changing the embedding size, you have to update both the tokenizer and the model (see the sketch after this reply).
The best video for all of this is the one I did on Chat Fine-tuning.
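A rough hypothetical sketch of that workflow with PEFT; the model name, added token and module names are assumptions, and module names differ between architectures:

```python
# Hypothetical sketch: extend the vocab, resize embeddings, and keep embed/lm_head when saving.
# Model name, added token and module names are assumptions, not from the video.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.add_tokens(["<my_special_token>"])       # changes the tokenizer length

model = AutoModelForCausalLM.from_pretrained(base)
model.resize_token_embeddings(len(tokenizer))      # model must match the new vocab size

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"],  # the QKV projections
    modules_to_save=["embed_tokens", "lm_head"],    # train and save these layers in full
)
model = get_peft_model(model, lora_config)

# ... train ...

# Merging later with merge_and_unload() keeps the resized embed/lm_head because of
# modules_to_save; remember to save the updated tokenizer alongside the merged model.
```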
@@TrelisResearch Okay, I will watch the video and come back to you. Thanks 🙏
Do you know if the advanced inference setup supports native logit biasing and constrained generation via the API?
Great video, guys. Can someone help me understand when you use just the LoRA adapter weights for inference and when you merge the LoRA weights into the original model?
generally it's best to merge because inference is slower unmerged (there's an extra addition step to apply the adapter).
The reason not to merge is that you can store the adapter (which is small) separately [if that's useful].
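For illustration, here is a minimal hypothetical sketch of the two options with PEFT (the model and adapter paths are placeholders):

```python
# Hypothetical sketch of the two options; model and adapter paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model")

# Option 1: keep the adapter separate - a small file that's easy to swap, but inference
# pays an extra addition step to apply the adapter on each forward pass.
model = PeftModel.from_pretrained(base, "my-lora-adapter")

# Option 2: fold the adapter into the base weights once and serve the merged model.
merged = model.merge_and_unload()
merged.save_pretrained("merged-model")
```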
@@TrelisResearch Thanks for your reply, got it. Please continue with your content; it has helped me a lot.
Hi Ronan, could you please do a tutorial on GuardRails?
interesting idea, let me add that to the list of potential vids
You remind me of Andrej Karpathy.
Can I run the fine-tuned DeepSeek LLM on a Raspberry Pi 4 with 4 GB of RAM?
Plz reply, I need to know.
It's probably going to be really slow, but perhaps your best option is to look at llamafile, because that is properly optimised for CPU. Possibly you could also try models like SmolLM.
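llamafile ships as a single executable, but if you would rather stay in Python, a rough equivalent with llama-cpp-python (not the tool named in the reply; the model file is a placeholder) looks like this:

```python
# Hypothetical sketch: CPU-only inference on a small quantised GGUF with llama-cpp-python.
# The model file is a placeholder; pick something small enough for 4 GB of RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="smollm-360m-instruct.Q4_K_M.gguf",
    n_ctx=1024,
    n_threads=4,   # the Raspberry Pi 4 has 4 cores
)
out = llm("Q: What is the capital of France?\nA:", max_tokens=32)
print(out["choices"][0]["text"])
```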
What's the best way to get clients for these types of solutions?
Howdy! Are you asking how to come up with applications for tiny LLMs? I.e., use cases/markets where having tiny LLMs is useful?
Even a second-hand GTX 1070 laptop would be able to handle the 4-bit quantised variant.
Can any be loaded on an iPhone?
In principle yes, although I haven’t dug into that yet. I’ll add to my potential videos list
I'm using MLC Chat / ChatterUI on Android, but I think they have iPhone versions too.
In the meantime, the Phi-2 model has changed its licensing to a permissive one.
Do you find Mozilla's Llamafile project interesting or useful? As someone who dabbles, I'm still not sure how to think about it.
Thanks for sharing. I just had a look and it looks like a strong option to get a chat going. Would be nice if they added Phi as an option. As you saw in this vid, a 4-bit quant is still too big for my machine.
Btw, when llama.cpp is installed and you run ./server, there's also a simple chat interface on the localhost port.