Phi 4 Local AI LLM Review - Is This Free Local ChatGPT Alternative Good?

  • Published: 24 Jan 2025

Comments • 55

  • @julienarpin5745
    @julienarpin5745 14 days ago +6

    Phi-4 underperformed on my personal benchmark as well. I think the synthetic training data cost it a lot of nuance in its considerations.

  • @JoshVoyles
    @JoshVoyles 15 days ago +6

    I'm using Phi4 for second-brain writing and general summarization, ideas, and misc stuff. I have noticed weird stuff, but I usually just need to tweak a word or two and re-prompt. When I get close to what I'm after, it's pretty good. Q4 on a 3060.

    • @erikjohnson9112
      @erikjohnson9112 13 days ago

      @JoshVoyles It MIGHT be Q4 working against you? I realize you may not have an option, but if you have enough CPU RAM, perhaps try identical requests using the FP16 version (just to help answer the quantization question).
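      A minimal sketch of the A/B test suggested here, assuming Ollama is serving both a quantized and an FP16 build of Phi-4 locally (the exact tag names, e.g. phi4:14b-q4_K_M and phi4:14b-fp16, are assumptions; use whatever you have pulled):

          # Hypothetical helper: send the same prompt to two quantizations and compare outputs.
          import requests

          PROMPT = "Summarize the following notes in three bullet points: ..."

          def ask(model_tag: str) -> str:
              # Ollama's /api/generate endpoint; temperature 0 keeps the comparison more repeatable.
              r = requests.post(
                  "http://localhost:11434/api/generate",
                  json={
                      "model": model_tag,
                      "prompt": PROMPT,
                      "stream": False,
                      "options": {"temperature": 0},
                  },
                  timeout=600,
              )
              r.raise_for_status()
              return r.json()["response"]

          for tag in ("phi4:14b-q4_K_M", "phi4:14b-fp16"):  # assumed tag names
              print(f"--- {tag} ---")
              print(ask(tag))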

  • @testales
    @testales 15 days ago +10

    The array question wasn't passed, and there's no room for interpretation because it didn't stick to the very basic requirement of A = 0. It even printed the correct assignments of letters to number keys on a phone, so it could have seen from its own output that A is not on 0. It's just like answering a "yes" or "no" question with a wall of text but no "yes" or "no"! I really hate it when even basic requirements in prompts are completely ignored. But you are right, all these "unscientific" random tests by regular people often give a much better picture than all those benchmarks these LLMs apparently have been optimized for. When it comes to memory footprint though, you shouldn't ignore the context size! I think Ollama uses by default just the ancient value of 2048, which is completely useless for code generation and any reasonable dialog (a minimal num_ctx sketch follows this thread). The very minimum should be 8k, which was also the limit for Llama 3.0. Phi-4 has a limit of 16k if I remember correctly, while even the 8B Llama now has 128k.

    • @DigitalSpaceport
      @DigitalSpaceport  15 days ago +3

      Yeah, it did get that wrong after all, you're right. I was imagining a phone pad that doesn't exist, for some reason, now that I'm looking at mine. I did set the context to 16k for it, which is its max ctx size, in its model settings before testing. I just didn't show it in the video.

    • @netroy
      @netroy 14 days ago

      Regarding the context length, check out unsloth's phi4 fixes that came out yesterday.
      Might be worth trying it for coding with a working 128k context length.

    • @DigitalSpaceport
      @DigitalSpaceport  22 hours ago

      I need to do an unsloth tune!
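      Referring back to the context-size point above: a minimal sketch of overriding Ollama's default num_ctx per request, assuming a local Ollama server with phi4 pulled (16384 mirrors the value used in the video; adjust to taste):

          # Hypothetical example: request a 16k context window from Ollama instead of the 2048 default.
          import requests

          r = requests.post(
              "http://localhost:11434/api/generate",
              json={
                  "model": "phi4",
                  "prompt": "Write a Python function that maps letters to phone keypad digits.",
                  "stream": False,
                  "options": {"num_ctx": 16384},  # context window; Ollama's default is only 2048
              },
              timeout=600,
          )
          r.raise_for_status()
          print(r.json()["response"])

      The same setting can be baked into a model variant via a Modelfile (FROM phi4 plus PARAMETER num_ctx 16384) so front-ends pick it up automatically.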

  • @MaJetiGizzle
    @MaJetiGizzle 14 days ago +1

    My understanding of Phi-4 is that Microsoft wanted to build a model that divorced reasoning from data memorization, which seems to have been the case: it did better on the more reasoning-based questions and worse on the ones where it needed to output something from its training data.

  • @sinokrene8598
    @sinokrene8598 15 days ago +3

    Nice. Have you thought about testing the new Mac Mini M4 Pro? It seems cost-efficient, and it would be easy to do since Ollama is optimised for Mac. Thanks

  • @autoboto
    @autoboto 15 days ago +2

    Would you have gotten better answers with phi4:14b-fp16? Thanks for the review of Q8.

    • @DigitalSpaceport
      @DigitalSpaceport  15 days ago +4

      I doubt it would be double, but I will test it this evening or over the weekend.

    • @Nworthholf
      @Nworthholf 15 days ago

      I personally was never able to tell the difference between Q8 and fp16 (aside from speed) on any model I tried

  • @ebswv8
    @ebswv8 15 days ago +3

    Thanks for the test. I am planning to download this weekend for some testing of my own. In your opinion, what model is the best "daily driver" for users with around 12 GB of VRAM?

    • @DigitalSpaceport
      @DigitalSpaceport  15 days ago +8

      Llama 3.2-Vision 11B Q4 for that GPU VRAM size as a DD, swapped out for Qwen 2.5 Coder 7B Q8 for specialized code-related tasks. That should also leave you enough room to run your embeds.

  • @jeffwads
    @jeffwads 15 days ago +3

    Phi-4 has difficulty keeping track of context. It overlooks things within the prompts in my testing.

    • @DigitalSpaceport
      @DigitalSpaceport  15 days ago +1

      Feels like it has gaps in knowledge as well. Not sure, but it also seems to be ultra safety-aligned to the point of being annoying. Maybe they took it too far?

    • @sinokrene8598
      @sinokrene8598 15 days ago

      I am getting the same problems. What do you suggest for maintaining some kind of memory when using an LLM? It's fine if you're using the expensive models, but locally they struggle when the chat gets really long. Perhaps there is a way to keep a project going without having to start another chat.

  • @fatihpoyraz
    @fatihpoyraz 15 days ago

    Thanks, good. Especially the comparisons companies make at 00:16 (we are better than this, we are better than that): with which model and what kind of system prompt do they test? So we are waiting for test videos on that performance. How real is the claim of a new LLM? Because in AI training, a model can be trained to get good results on those tests. What about the real thing!

  • @MisterOcean5
    @MisterOcean5 15 days ago +1

    Could be interesting to use Phi-4 in an agentic setup together with different APIs and real-world context, or in a simple bot with RAG and a vector DB as the data source, aka a helpdesk.

    • @Person-hb3dv
      @Person-hb3dv 15 days ago

      Yeah that's what I wanted to see

  • @justinknash
    @justinknash 15 days ago

    Appreciate your videos. Newbie to AI, coming from a DevOps background and expanding my homelab based on your videos and recommendations. Now I just need to sell a kidney to buy a 5090. :-) Currently running Ollama (Llama 3.1) on a 4060 Ti (8GB sad face).

    • @DigitalSpaceport
      @DigitalSpaceport  15 days ago +2

      🤗 chanting: 8 GB VRAM MODERN GPUS SHOULD NOT BE A THING

    • @justinknash
      @justinknash 15 days ago

      @DigitalSpaceport Ya, I'm kicking myself for not springing for a 4070 Super. Hoping prices come down and I will upgrade.

    • @Qewbicle
      @Qewbicle 15 days ago

      How about getting an older GPU with more VRAM? I've been fine on a 2070 Super Max-Q for Stable Diffusion, maybe a few seconds longer on a large image. Then you can do your thing and upgrade later after they release other versions. If you keep buying the expensive ones that don't quite cut it, together they keep you from hitting your goal.

  • @p73737E
    @p73737E 10 days ago

    Hey champ, super interesting. What's the shell tool to check GPU performance called?

    • @DigitalSpaceport
      @DigitalSpaceport  8 days ago

      nvtop is the util. It's a Linux tool, so it should work when you remote in, after you apt install nvtop.

  • @hprompt166
    @hprompt166 14 days ago

    Question: you have activity on all your 3090s; were there any configuration changes to produce that? I have 2 P2000s and I only get activity on 1.
    I have tested both P2000s individually and they work fine.

  • @kirostar12
    @kirostar12 13 days ago

    Do you think Open WebUI is better than LM Studio? In LM Studio I don't have the ability to run 2 GPUs at the same time, and when I load the Llama 3.3 70B model my RTX 4090 gives me very slow answers. It would also be nice to be able to connect 4 more computers to one model; this would give a lot of flexibility with large models. So far, only the Llama 3.3 70B model is doing reasonably well across different types of situations. It answers comprehensively and makes few mistakes. It also responds correctly with almost no errors if you work with this model in another language, which impressed me.

  • @SongsOfTheFenix
    @SongsOfTheFenix 15 days ago +1

    It used kilometers but said miles. It calculated correctly using km and km/h, it just told you miles.

  • @AndrewErwin73
    @AndrewErwin73 14 days ago

    I can't seem to make Phi work with CUDA (on my GPU)... so, not really useful to me.

  • @ChrisCebelenski
    @ChrisCebelenski 14 days ago

    I haven't been too impressed with the MS models in the past. I'll check this one out. Honestly I'm getting tired of the responses from Llama models. (And why are there still models derived from Llama 3.1, like the Dolphin you did earlier? They really can't compete anymore.) I've also added a new set of cards to my rigs, so I have some choices now... In addition to the 4x 4060 Ti 16GB machine, I have a slightly newer box with some 3rd-gen Xeon Scalable Golds, and I'm trying to decide where to go with it. I have an RTX A5000, which I think will be put in there. It also came with dual RTX 2080s, but for power reasons I think I'm going to put in an RTX 8000, which is a semi-affordable 48GB VRAM card. I could also throw in a variety of other cards, like some Tesla enterprise cards (P40, T4) and also an RTX 5000 Ada gen, but I think I have a better use for that elsewhere at the moment. Welcoming some thoughts! (BTW, the A5000 can plot a C5 K32 Chia plot in less than a minute. Impressive.)

    • @DigitalSpaceport
      @DigitalSpaceport  14 days ago

      I'm about to sell my dual A5000s (non-Ada) as they are still priced high and it's a good exit time, methinks. I am looking to snag more *shocked face* 3090s, which I think we'll see folks selling cheaper in the next few weeks. The 8000 is interesting; I was looking at those just recently. If you are going big model, that is the best path for modern-ish GPUs currently. That 3rd-gen Scalable has PCIe 4.0, so if you do image gen often that could be ideal. Tossing fast NVMe onto the gen4 slots in a hyperx could also give you a nice shared storage setup if you do that. I read about a new approach to training via sparse latents over networking that is out. It may make training over home-class networked GPU setups viable. Getting rid of old gear is my personal '25 vibe, but whether I will do it is the question.

  • @themarksmith
    @themarksmith 15 days ago

    I have an RTX 3060 (12 GB) and I am looking for a model which can help me with Python/web dev/code tasks. Do you have any recommendations for suitable models to try within Open WebUI?

    • @cariyaputta
      @cariyaputta 14 days ago +1

      Qwen 2.5 Coder 14B Instruct Q8 with CPU offload.

    • @themarksmith
      @themarksmith 14 days ago

      @cariyaputta Thanks dude, will give it a go!

  • @sauxybanana2332
    @sauxybanana2332 15 days ago +1

    14B can only do so much... You have a 3090; this doesn't even qualify as a load for the VRAM.

  • @patsk8872
    @patsk8872 15 days ago

    Most people don't have 20 GB of VRAM. Any way this can be put into 16?

    • @DigitalSpaceport
      @DigitalSpaceport  15 days ago +1

      Yeah, run the Q4 version instead of the Q8 and it will fit in 16.

    • @Nworthholf
      @Nworthholf 15 days ago

      Ideally Q4_K_L, if available. Steer away from _S; the _L or even _M of the next lower quant will probably be better in every single way.

  • @DeepThinker193
    @DeepThinker193 15 days ago +2

    For me, Llama 3.1 was the best open-source one. 3.2 was garbage. 3.3 is good, much better than 3.2 but slightly worse than 3.1, depending on the context of the question.

    • @sil778
      @sil778 14 days ago

      Llama 3.2 3B is the best summarization tool, especially multilingual.

  • @danilorodriguez4665
    @danilorodriguez4665 15 days ago

    Tested Q4 using LM Studio; not impressed, will give it a pass.

  • @johnbell1810
    @johnbell1810 6 days ago

    Only got 2 x 3060 12GB so I'm out!

    • @solosailorsv8065
      @solosailorsv8065 1 day ago

      If VRAM is the primary spec, why not use a ~$120 USD HP J0G95A NVIDIA Tesla K80, for example? GPU computing processor, 2 GPUs, 24 GB GDDR5, PCI Express 3.0 x16, fanless.

  • @RebelliousX
    @RebelliousX 14 days ago

    I feel Phi-3.5 was better 🤷🏻‍♂️

  • @RBEmerson
    @RBEmerson 10 days ago

    I was intrigued by how poorly Phi4 did with the driving exercise. At the time I looked into the problem with the mileage being so far off, I couldn't remember where in FL you ended the trip, so I picked Austin, TX (sure of that part) and Pensacola, FL (SWAG). And I got roughly 1100 miles. Huh? OK, tell me how you (Phi4) went, and its routing was Austin, San Antonio, Houston, New Orleans, Mobile, Pensacola. Google said 700+ miles vs. 1100 from Phi4. Say what? Ah, Google "cheated" and took a direct, non-Interstate route to Houston before hitting I-10. Fine, add San Antonio (which is heading SW to go E, BTW). Alright, I-35 to I-10 to Pensacola and we're now up to 820+ miles. Well, heck, use I-12 to bypass NOLA and we're down to 800 miles. OK, Phi4, 'splain me how you're driving a third further... the response was, in my slightly cynical view, self-justifying noise. I eventually backed Phi4 into a logical corner where it conceded it was not 100% right in its response. However, I was unable to get Phi4 to tell me where it went with the 35%-error mileage; it just insisted on giving a less-than-accurate response to the problem.
    All of the above is by way of saying "if Phi4 got this wrong, what else is it getting wrong?" "ollama rm phi4:latest".
    PS: Granted, Phi4 was unable to display a map of its routing, while Google Maps did its usual map display. Seeing an equivalent map from Phi4 might be...um...instructive. Maybe. It's hard to forget the self-justifying noise (e.g., exact starting and ending addresses impact the total distance - but by up to a third of the total drive???).

    • @DigitalSpaceport
      @DigitalSpaceport  8 days ago

      Yeah, this was a new question I added to probe spatial/geophysical knowledge of the real world, which could be useful in systems that augment or offset the traditional maps software we all use, and it appears there is a decent amount of room for improvement. It should, I would think, be referencing stored data on these relationships from some ingested periodical. It could also interpolate between fixed positions and get an estimate fairly easily, I would think. What it did here, someone pointed out, might be using the km distance as miles; the route works out to almost exactly 1100 km. We do need that accuracy in these models eventually, I think, so it is a valid thing to expect.
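      A quick sanity check on the km-vs-miles theory, assuming the ~700-mile Google Maps figure cited above is roughly right:

          # Hypothetical check: does Phi-4's ~1100 figure look like kilometers mislabeled as miles?
          KM_PER_MILE = 1.609
          google_route_miles = 700  # approximate road distance cited in the comment above
          print(google_route_miles * KM_PER_MILE)  # ~1126 km, close to the ~1100 Phi-4 reported as "miles"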

  • @mlsterlous
    @mlsterlous 13 days ago

    It's always funny to me when people talk about those huge and expensive NVIDIA cards to run these models, while people like me don't even have a dedicated video card and still run this Phi-4 model just fine locally. I have a Ryzen 7 7735HS mini-PC with integrated graphics. Using the Q4_K_M GGUF in llama_cpp is totally enough for good results (a minimal llama-cpp-python sketch follows this thread). It works with Vulkan for better speed and uses 4 GB of virtual VRAM out of 32 GB of total RAM. The speed is about 7+ tokens/s, but for casual, ordinary people it's fast enough.

    • @JG27Korny
      @JG27Korny 5 days ago

      If you buy second hand it is not that expensive.
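      Picking up the llama_cpp setup above: a minimal sketch using the llama-cpp-python bindings (the file name phi-4-Q4_K_M.gguf is a placeholder; for iGPU acceleration the package must be built with the Vulkan backend, otherwise it falls back to CPU):

          # Hypothetical example: load a quantized Phi-4 GGUF and generate a short completion.
          from llama_cpp import Llama

          llm = Llama(
              model_path="phi-4-Q4_K_M.gguf",  # placeholder path to the downloaded GGUF
              n_ctx=8192,        # context window
              n_gpu_layers=-1,   # offload as many layers as the backend allows
          )

          out = llm.create_chat_completion(
              messages=[{"role": "user", "content": "Give me three bullet points summarizing RAID levels."}],
              max_tokens=256,
          )
          print(out["choices"][0]["message"]["content"])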