Meta's Llama3 - The Mistral Killer?

  • Published: 20 Sep 2024

Comments • 12

  • @TimothyMusson • 11 days ago

    I'm really impressed with the 27B version of Gemma2. It's working well for me as a usefully competent Russian language conversation partner/tutor, which is pretty amazing for something small enough to run locally. Mistral (7B) and Llama3 (8B) weren't quite sharp enough.

  • @proterotype • 4 months ago

    Great stuff as usual. Some good insights about Llama3. You know what I’ve noticed I have trouble with? Preparing a dataset for LoRA training for the different model types (e.g. Mistral or Llama), especially the template formats, including things like [INST] and [/INST], and how each variant uses them. You might be the man to clear this up for me!
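    One common way to sidestep the per-model template differences is to let each tokenizer build the prompt rather than hand-writing [INST] tags. A minimal sketch, assuming the transformers library and the Hugging Face model IDs shown (substitute whatever checkpoints you actually use):

    ```python
    # Minimal sketch: let each tokenizer apply its own chat template,
    # instead of hard-coding [INST]/[/INST] or <|start_header_id|> by hand.
    from transformers import AutoTokenizer

    messages = [
        {"role": "user", "content": "Summarize the plot of Hamlet in one sentence."},
    ]

    # Model IDs are assumptions; swap in your local model names if they differ.
    for model_id in ["mistralai/Mistral-7B-Instruct-v0.2",
                     "meta-llama/Meta-Llama-3-8B-Instruct"]:
        tok = AutoTokenizer.from_pretrained(model_id)
        prompt = tok.apply_chat_template(messages, tokenize=False,
                                         add_generation_prompt=True)
        print(f"--- {model_id} ---\n{prompt}")
    ```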

    • @decoder-sh • 4 months ago +1

      I’m curious to learn more about your fine-tuning setup! Also, are you fine-tuning from the base model or the instruct-tuned version? Llama3 Instruct uses a bit of a unique format as far as I can tell: huggingface.co/blog/llama3#how-to-prompt-llama-3
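      For reference, the Llama3 Instruct layout from that post looks roughly like the string below; this is a sketch only, with the system and user text as placeholders (check the linked page for the authoritative version):

      ```python
      # Rough sketch of the Llama3 Instruct prompt layout (see the HF blog linked above).
      llama3_prompt = (
          "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
          "You are a helpful assistant.<|eot_id|>"
          "<|start_header_id|>user<|end_header_id|>\n\n"
          "Hello!<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\n\n"
      )
      ```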

    • @proterotype • 4 months ago

      I just got access to the Llama3 base model and downloaded it (using your ‘Import Open Source Models to Ollama’ video!). That link you shared in your comment was the exact template format I was looking for. But your question about the base model inspired me to pull that model too. I think I’ll try training Llama3 Instruct using LoRA with a rank of 8, then tackle the base training. First I’ve gotta prep the dataset lol
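      As a rough sketch of that rank-8 setup, assuming the peft and transformers libraries (everything other than r=8 is a placeholder, not a recommendation):

      ```python
      # Minimal sketch: attach a rank-8 LoRA adapter to Llama3 Instruct with peft.
      from transformers import AutoModelForCausalLM
      from peft import LoraConfig, get_peft_model

      base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

      lora_cfg = LoraConfig(
          r=8,                      # the rank discussed above
          lora_alpha=16,            # placeholder scaling factor
          lora_dropout=0.05,        # placeholder dropout
          target_modules=["q_proj", "v_proj"],  # a common choice for Llama-family models
          task_type="CAUSAL_LM",
      )

      model = get_peft_model(base, lora_cfg)
      model.print_trainable_parameters()
      ```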

  • @somerset006 • 4 months ago +1

    I think the conclusion here is that neither model is capable enough, in that the results will always be rather unpredictable. Maybe they're both good candidates for fine-tuning, to answer a smaller range of questions correctly or extract the right information from documents reliably? I'd be curious to know.

  • @ts757arse • 4 months ago

    I've been trying this and have been a little disappointed. I think that's down to the unreasonable hype. At the moment I'm using Mixtral and getting around 7 t/s, which is perfectly adequate for my needs. The 8B model just isn't there, and Meta did say that there simply isn't room in the model for all the detail and knowledge.
    I do have the 70B model and I can run it locally, as I have 128GB of RAM. But that's running on CPU, and it's just too slow to do any real validation before considering it for proper use.
    At the moment, unless Meta bring out something in the middle or until MoE versions of this come out, Mixtral is performing far too well for me to go to the effort of replacing it.

    • @decoder-sh • 4 months ago

      I bet you could find some Llama3 MoE models on Hugging Face; I would expect they could perform better than even Mixtral. With that said, I'd expect an MoE to beat an 8B model every time.
      What quantization was your 70B model?

    • @ts757arse • 4 months ago +1

      @decoder-sh I downloaded Q4_K_M and Q6, I think. But running entirely on CPU (no help at all getting my GPU involved, and I think it ran slower with offload), it's no surprise that it was around 1 t/s!
      I shall have a gander on Hugging Face. The other requirement for me is it being uncensored, which limits things a fair bit. It's possible that the pruning of the training data to remove potentially harmful content will mean it's less useful for me even when the Dolphin fine-tune is applied.
      I think there's a middle-of-the-road size of model coming. I suspect that'll be the most interesting.
      Just this evening I managed to finally get Mixtral+RAG to do exactly what I needed and produce around an hour's worth of work from a single prompt. I think I'll work on scaling that process, given the potential there, and see where the community takes Llama-3.
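      A minimal sketch of a local Mixtral+RAG pipeline along those lines, assuming Ollama is running with the mixtral and nomic-embed-text models pulled, plus the ollama and chromadb Python packages (the documents and prompt here are placeholders):

      ```python
      # Minimal sketch: retrieve relevant reference text, then draft a document with Mixtral.
      import ollama
      import chromadb

      # Index a few reference documents (placeholders for the real guidance texts).
      docs = ["Guidance on spotting hostile reconnaissance at data centres...",
              "Guidance on perimeter security for stadiums..."]

      client = chromadb.Client()
      collection = client.create_collection("guidance")
      for i, doc in enumerate(docs):
          emb = ollama.embeddings(model="nomic-embed-text", prompt=doc)["embedding"]
          collection.add(ids=[str(i)], documents=[doc], embeddings=[emb])

      # Retrieve the most relevant chunks for the task at hand.
      query = "Draft hostile-recon guidance for a school site."
      q_emb = ollama.embeddings(model="nomic-embed-text", prompt=query)["embedding"]
      hits = collection.query(query_embeddings=[q_emb], n_results=2)
      context = "\n\n".join(hits["documents"][0])

      # Ask Mixtral to draft the document using the retrieved context.
      response = ollama.chat(model="mixtral", messages=[
          {"role": "user", "content": f"Using this reference material:\n{context}\n\n{query}"}
      ])
      print(response["message"]["content"])
      ```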

    • @decoder-sh • 4 months ago

      I know that Llama3 8B Dolphin is already out. I haven't used mergekit yet, but I bet someone will make a Llama3 Dolphin 16B merge soon.
      May I ask what your use case is for your Mixtral+RAG setup? An hour of work saved is huge, congrats on your progress!

    • @ts757arse • 4 months ago +1

      @decoder-sh Yeah, I think we've spoken before. I have a physical security company and we do assessments and consultancy. A very time-consuming part of that is writing up operational guidance, which varies from place to place.
      So, for preventing / spotting hostile recon, the guidance will be very different for a data centre, stadium or school.
      The RAG database has a lot of texts in it, including guidance on hostile recon from several sources (including me). With a single prompt, it'll build that into a decent starter document. I can easily spend half a day on each document, so having that starting point saves a significant chunk of time. It's not the finished product, but it easily shaves an hour off the task.
      The next step is to see if I can put my site visit data into a structured document and have it generate specific guidance for each client.
      The problem with all this is that it sounds so simple, but fiddling with it all until you get the results you want takes a lot of time.
      Previously the setup was to help with coverage: ensuring I'd considered all angles of approach for a client. For that it has been incredibly effective.

    • @decoder-sh • 4 months ago

      @ts757arse Oh yeah, of course! Good to hear from you again. Do you use a different prompt for each section of your output document? I'm making up sections, but e.g. would you have one prompt for the "electronic network defense" section and a different prompt for the "physical access defense" section?