I'm SO happy that Apple did this. The only thing we need from Apple now is to make their Apple Silicon capable of running AAA games, and then there will be no reason for Windows to exist.
It is capable through Parallels, and it can even run Switch games via emulation.
Nice coverage. I’m really keen to do a fine-tune on a Mac natively, taking full advantage of the hardware finally. :)
Yay! Nice edit!
Thanks, finally the right version :)
Excellent content as always. Keep it up!
OK, is it like PyTorch, TensorFlow, etc.? Can we use trained LLMs on it? And how do we do an AI model deployment and optimization pipeline with it?
This is the EDITED VERSION 😃
i wanna see the blooper reel!
@@The_8Bit haha, you missed the unedited one :D Maybe we will release those for all the videos :D
Oh, very nice video! I hope you do more MLX coverage; for many people this will be the only viable option, since high amounts of VRAM (>16GB) aren't accessible otherwise. It would be interesting to see a performance comparison between MLX and llama.cpp-based models, like in LM Studio.
Thanks, will look into it further if there is interest.
I wish I was smart enough to understand everything you did in this video. Can someone maybe answer this instead? Does this mean I'll eventually be able to run a 7B model from Hugging Face with better performance on my M2 MacBook Air, once the models are updated/converted to this format?
I use llama.cpp on my Mac to do a lot and it's pretty fast. I'm wondering whether this will actually make it faster.
You need an Activity Monitor that shows Neural Engine activity. Is the NE being used here?
The NE is only used when running ML models, not for building them. This framework is for building models.
@@Sam16842 Does the training process only use the GPU??
@@darkwoodmovies yes
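If you want to check this yourself, here is a minimal sketch, assuming the mlx.core device API: MLX targets the CPU and GPU (not the Neural Engine), and defaults to the GPU on Apple Silicon.

import mlx.core as mx

# MLX defaults to the GPU on Apple Silicon (unified memory, so no host/device copies)
print(mx.default_device())

# pin computation to the CPU instead, e.g. to compare throughput
mx.set_default_device(mx.cpu)
print(mx.default_device())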
When running I get the following error:
[INFO] Loading model from disk.
Traceback (most recent call last):
  File "/Users/MacBookAir/mlx-examples/llama/llama.py", line 356, in <module>
    model = load_model(args.model)
  File "/Users/MacBookAir/mlx-examples/llama/llama.py", line 306, in load_model
    weights = mx.load(str(model_path / "weights.npz"))
RuntimeError: [load] Failed to open file Llama-2-7b-chat-mlx/Llama-2-7b-chat.npz/weights.npz
Why is it not able to locate the weights? Everything else has been completed according to the instructions.
The GitHub file has been modified; you need to wait for Hugging Face to fix it.
I encountered a similar error. The run script auto-appends weights.npz, so specifying it in the path yourself means it appears twice and the command fails. This worked for me (it runs horribly slowly on my Mac M1 Pro 16GB machine):
python llama.py Llama-2-7b-chat-mlx/ Llama-2-7b-chat-mlx/tokenizer.model --prompt "hello"
The terminal command was executed with the mlx environment active, in the directory ~/Llama-2-7b-chat-mlx/llama
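For anyone confused about why the original command failed: the pattern from the traceback (line 306 of llama.py) joins "weights.npz" onto whatever path you pass, so passing the .npz file itself produces a doubled path. A minimal sketch of the two cases:

from pathlib import Path

# wrong: passing the weights file itself, so "weights.npz" gets appended again
model_path = Path("Llama-2-7b-chat-mlx/Llama-2-7b-chat.npz")
print(model_path / "weights.npz")   # Llama-2-7b-chat-mlx/Llama-2-7b-chat.npz/weights.npz

# right: passing the folder that contains weights.npz
model_path = Path("Llama-2-7b-chat-mlx")
print(model_path / "weights.npz")   # Llama-2-7b-chat-mlx/weights.npz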
Can you make a video explaining how to use MLX and Whisper on a Mac?
Sure, will do
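In the meantime, a minimal sketch of what that could look like, assuming the standalone mlx-whisper package (pip install mlx-whisper); the audio file name here is just a placeholder:

import mlx_whisper

# transcribe a local audio file; returns a dict containing the decoded text
result = mlx_whisper.transcribe("speech.wav")
print(result["text"])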
Is MLX the same as ferret?
No, MLX is a framework. Ferret is a model :)
@@engineerprompt I would really enjoy a basic tutorial on Ferret and MLX.
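To make the framework-vs-model distinction concrete, here is a minimal MLX sketch (just array ops and a gradient); primitives like these are what a model such as Ferret would be built out of:

import mlx.core as mx

# a scalar-valued function of an array
def f(x):
    return (x ** 2).sum()

# mx.grad returns a new function that computes df/dx
grad_f = mx.grad(f)
x = mx.array([1.0, 2.0, 3.0])
print(grad_f(x))  # array([2, 4, 6], dtype=float32)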
Can you explain how to use this model with local documents?
Yes, looking into it
@@engineerprompt Is there a video that explains chatting with documents offline, other than with this model?
Would be good to run on ANE
Thank you very much
Hi, it was a very informative video, thanks a ton!
I have a small question. When I run it, I get:
"[INFO] Loading model from Llama-2-7b-chat-mlx/weights.npz.
Press enter to start generation"
but it doesn't behave like in your video: it uses neither the full capacity of the GPU nor of the CPU, and it takes a long time to respond.
System specs: 16GB RAM & M2 Pro chip.
I think there were updates to the mlx package. I am thinking about making more content on the updated package. Coming soon :)
What's your config, may I know? M2 Max 16GB/24GB? I was trying to run it on an M1 Air 16GB, but I think I was a bit ambitious; it's horribly slow :(
Mine is M2 Max 96GB
Please, for love's sake, make the advertisements come in at a logical point, not mid-sentence. Your videos are generally quiet, so I pump up the volume, and then an ad comes on, breaking my speakers and my ears. Pick a spot where an advertisement should start, then take a pause and say "when we come back…". Please.
Just use an adblocker
Using the app
YouTube just adds these in; the creator doesn't say where the ads go.
Umm, isn't this exactly what llama.cpp was originally for? Literally native inference on Apple chips.
Yes, but this goes beyond LLMs