Love your videos. Keep them coming!
Thanks very much! I will for sure.
Good work mate, keep it up!
Thanks very much! Been meaning to send you a note!
It was a very fun and instructive video. It would be interesting to see a comparison of the same game coded by the free or paid versions of the other providers.
Thanks very much! I have used the same prompt in some separate testing videos for paid providers, though perhaps next time I will throw in a quick comparison in the video as well!
@OminousIndustries yes, you're right. Apologies for not checking before asking. But a 1-to-1 comparison would still be cool. Maybe in the future you could paste links to related videos in the description down below 😊
@@marcomerola4271 I agree, direct comparisons are a good idea. Good thought on the additional links. I will keep note of this for future videos with similar testing conditions!
Nice test! I was planning to build a dual 3090 system, but I guess I need to reconsider. This was slower than expected. Would a dual 5090 setup perform only somewhat better, or closer to 2x?
I can't speak to this aside from what I have experienced, but FWIW I have had EXL2 70B models running in text-gen-webui that were much faster than this. I have seen some discussion on the speed differential between GGUF and EXL2, but I am not knowledgeable enough to make any definitive statements on this - just personal anecdotes.
Not sure how much faster, but a dual 4090, let alone a dual 5090, should be a nice speed increase based on some of the user benchmarks I have seen on r/localllama.
Better late than never. Sweet model. Been using the 8-bit, and the 128K context really smokes.
You are giving me a quantization inferiority complex mentioning the 8-bit LOL! It is a rather impressive model indeed.
Would be interesting to know the difference between Llama 3.0, 3.1, 3.2, and 3.3 in 4-bit quant. I've got hardware running 70B in 8 bits, but I still can't make the jump from 3.0 to 3.1 or 3.2; from my own testing it seems like 3.0 with 8K context is still superior to the 128K models (although I didn't test 3.3 yet). I'm testing on real-world use cases.
I would assume they benchmark better going from .0 to .1/.2/.3 etc., but like you say, real-world use cases are often more important than benchmarks for folks like us.
On these locally hosted models, it would be interesting to know how many tokens per second you're getting back.
Good thought, I will try to get speed results for local testing in the future.
@OminousIndustries Just run: ollama run modelname --verbose
It will give you full statistics after each response, including tokens per second.
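For example (the model name below is just a placeholder for whatever you have pulled):

ollama run llama3.3:70b --verbose
# After each response Ollama prints timing stats; the "eval rate" line is the
# generation speed in tokens/s, and "prompt eval rate" covers prompt processing.
# Exact labels can vary a little between Ollama versions.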
Come on man, where's that mic you've been talking about? I know you can afford it. /s
When you add it, your vids will level up. Thanks for the walkthroughs!!!
I spent the mic budget on a ChatGPT Pro subscription LOL. Thanks for the kind words. I actually have a nice AKG mic I used to use for music-related tasks, so perhaps I will hook that up to the system and use it for screen recording audio.
Is AnythingLLM managing the hosting of the local model? I am interested in running multiple GPUs and am trying to figure out the best way forward as far as performance goes.
No, Ollama is handling the hosting of the local model here; AnythingLLM is just providing a user interface for interacting with the model.
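To make the split a bit more concrete: any front end (AnythingLLM, a script, plain curl) is just calling Ollama's local HTTP API, which listens on port 11434 by default. A minimal sketch, assuming a Llama 3.3 70B model has already been pulled under that tag:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3:70b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
# The JSON response also includes eval_count and eval_duration (in nanoseconds),
# so tokens per second works out to eval_count / (eval_duration / 1e9).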
@@Bijanbowen Ah, good to know, thanks.
What do you recommend? Linux (and which distro), or Windows?
I personally prefer Ubuntu, but if someone is used to Windows and does not want to have to troubleshoot a lot, it might not be a bad idea to stick with Windows haha
How much RAM did it take?
It was using about 19 GB on one card and 22 GB on the other, so a total of roughly 41 GB for this Q4_K_M quant.
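For reference, one quick way to see the per-card usage while the model is loaded (assuming NVIDIA GPUs with the standard driver tools installed):

nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv
# Prints per-GPU memory usage; the ~19 GB and ~22 GB figures above come from
# the memory.used column of each card.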