"For everyone" What a joke... The 405B model is the ONLY relevant model that compares to GPT-4 and GPT-4o, but it requires a whopping 231GB of VRAM, and even flagship GPUs top out at 24GB. Even the significantly inferior 70B model requires 26GB of VRAM, which is still out of the question except maybe for the upcoming RTX 50xx series, assuming it gets a VRAM upgrade, and even then it would only be something like the 4080- and 4090-tier cards that could use it. You could theoretically use it either way, but you'd have to do so by handicapping the model, making it significantly less smart. So no, this new model is not for "everyone". At least not for "free". You could use it by paying for a service that hosts it, but at that point it doesn't really matter whether you use that or GPT-4/4o. The only really usable free model is the 8B one, but there are other free models that perform better. It's so dumb that they made it exactly 70B/26GB rather than just a tiny bit smaller so it could fit on many more cards. You're much better off just being patient and using OpenAI's free tier of GPT-4o. It's quite limited in how many messages you can send, but at least it's smarter than 8B by an insane amount, and even smarter than 70B. And as you can see at 0:51, even the 405B model ranks lower than GPT-4o at coding. Coding is the most important aspect of LLMs and I'd bet it's the most used part too. Math is kind of niche, but with code you can do basically anything. 2:49 It's kinda funny you say the new version outperforms the old one, but come on... You're comparing a MASSIVE 405B model to a 70B model... OF COURSE it outperforms it...
One could argue this increases creativity. It allows people to be more creative than they ever could be. It's allowed me to write a full-length novel (not with Llama, but with LLMs) by critiquing my story. Humans are not the only ones who can be creative; everything we create is derivative.
That's too good to be true. There must be a catch somewhere? Yeah, there is: the sign-in website itself. It's scary and I don't trust it. I came to realize it was the cloud service I clicked on. Silly me.
Fair enough. I understand the sentiment and am also sad that this is under the umbrella of Meta... It would be nice if some day, we could get a truly free-thinking, uncensored, and intelligent AI created by a group that was not affiliated with a huge oligopoly with a history of censorship and other anti-user acts.
No, genuinely free and open. You can download it yourself and run it locally (well, _in principle_ ; that you probably don't have the necessary compute doesn't make it any less open), as well as fine-tune it to remove any annoying moralizing.
@@shesbeve It's not useless. You can buy compute for fine-tuning/uncensoring at pretty affordable rates if you wanted to. But probably more relevant to you, _someone else_ will be able to do that, and sell you the improved services that "OpenAI" et al. won't.
thank you for reading the 92 pages Dr. 🙏
he is the goat
Gemini 1.5 Pro Experiment is the new 👑
"reading" 92 pages >.
Dr. Károly Zsolnai-Fehér.
"ai, please summarize the important bits of this 92 page document "
Seems like an open AI.
I see what you did there 😏
And also dangerous. These cutting edge AI models probably should not be open, as they can be easily used for bad things.
@@ondrazposukie 🙄 Oh no, think of the children!
@@ondrazposukie they are already being used for bad things, more public usage/understand combats that
@@ondrazposukie Ah yes because the billion dollar companies can do no harm.
2022: Model doesn't fit my GPU.
2024: Model doesn't fit my hard drive.
2022: Game doesn't run with my current GPU.
2024: Game doesn't fit my hard drive.
Funny isn't it?
You can run the 8B on a high-end smartphone. In a few years we will have embedded LLMs on phones as standard.
Yep. The Q4_0_4_4 GGUF of Llama 3.1 8B actually runs pretty quickly on even a mid-range phone. I get about 3 tokens/second on my potato phone, so a proper high-end phone will get 2-5x that.
@@sambojinbojin-sam6550 What is considered mid range? I'm using a Redmi note 10 pro, I personally don't really like llama 3 but if it runs...
@@0AThijs mine's a Motorola G84 (Snapdragon 695 chipset), so yours should run it fine.
Now I just need to figure out how to get my hands on 800GB of VRAM, then it'll be free! Oh, and a solar farm.
Solar farm ain't enough (too slow), need nuclear fusion
@@Nulley0that’s not true, if you have enough panels you could do it
@@nicholasbutler2365 then you'd block out the sun for the entire planet
@@theshapeshifter0330 the power of sun, in the palm of my hands!
@@Nulley0 who needs fusion when u can just build a dyson sphere
>405B
Oh boy I can't wait to run this on my own with my 16 RTX 4090s
That will still need quantization.
For free 😂
Double that and you could probably run it at full FP16 precision with a slightly reduced context
Just to be clear, if you actually want to run the 405B model for "free", you'd need several A100 cards. So unless you happen to be sitting on a pile of top-shelf hardware, you'll still be paying for inference.
I'm running the 8B parameter model locally and it is extremely impressive how good Llama 3.1 is. It's definitely my new main LLM for the time being.
What are your computer specs?
What does that mean, running it locally?
@@neighbor9672 You can run these open-source models locally on your computer using Ollama. But only the smaller models, depending on the specs of your PC.
@@neighbor9672 The 8B model is small enough that you can load it and use it to generate text on most consumer hardware, only needing around 8GB of memory between VRAM and RAM (with a low context size like 2k - even the 8B needs something like 90GB of memory at the max 128k(!) context). If the model is quantized, you may be able to squeeze it into 6GB or even 4GB of memory, although quality would definitely suffer. It means you own the entire process - you can get an open-source UI to interact with the model like any other chatbot, and tweak basically every generation setting to your liking.
The larger Llama models are simply massive in comparison. Most people can only run them by renting time on a workstation GPU or GPU cluster that lives in a server farm somewhere. In that case you have to send your data out to somebody else that owns the hardware.
@@neighbor9672 Running it on your own computer, without internet.
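The memory figures in this thread follow a simple rule of thumb: weight memory is roughly the parameter count times the bytes per weight, and quantization shrinks the bytes per weight. A minimal sketch of that arithmetic (my own illustration, not from the thread; real quant formats such as Q4_K average slightly more bits per weight, and the KV cache adds more on top as context grows):

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    # billions of parameters * bytes per parameter
    return params_billions * bits_per_weight / 8

# Llama 3.1 8B at common precisions (runtime/KV-cache overhead not included):
for bits in (16, 8, 4):
    print(f"8B @ {bits}-bit: ~{weights_gb(8, bits):.0f} GB")
```

This matches the ballpark above: FP16 needs ~16 GB for the 8B weights, while 8-bit and 4-bit quants land near 8 GB and 4-5 GB.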
OpenZuckerberg vs ClosedAI
It's a clown world, but let them surprise us.
Good one😂
2:33 Look at the leftmost point on the graph. The AI is so intelligent it can understand what -1 sugar content means. XD
good eye
Ahahwjwisid, how does it even make sense out of -1 sugar 😭😭
As I've got funding to create my own games studio, I just had this model calculate how much runway I have before the funds run out, and it gave me an explanation of how I can put this in a spreadsheet so I can update it every month and continue to see how long I have left.
Thanks so much for this! Amazing Tech!
You need a graphic designer/writer?
You need a 3d modeler?
I have an 8B version of this on my computer with my 4090. It's great! It runs super fast and it's very intelligent. Plus the larger 128k context window is so much nicer than the standard 8k you see on a lot of models.
What a time to be alive!🎉
Mistral Large 2, released only one day later, effectively matches this performance with 123B parameters instead of 405B, making it able to run inference on a single H100 GPU (with quantization), and this size should also make it far more economical to fine-tune.
Mind you, this is still enterprise grade hardware which costs more than most people's car, but the cost to performance ratio seems to make Llama 405b effectively obsolete on release.
FYI, according to perplexity: "...model weights for Llama 3.1 405B in FP8 precision is approximately 427 GB. This is based on the 8-bit quantization of the model, which reduces the size from its original 854 GB in 16-bit precision."
Still prefer Mistral Nemo 12B over this. Same 128K context and uncensored.
mistral large 2 🎉
In the last couple of days I've been trying all the small models I heard were good, mostly finetunes of L3 or Nemo, and the best one was Mini Magnum. Its personality is a deep thinker, like Miqu and Command-R.
How is Mistral Nemo compared to Claude 3.5? I haven't heard of Mistral, I am genuinely curious. So far, Claude 3.5 is my fav AI tool.
@@soma78 It’s like David vs Goliath, David being Nemo. Claude 3.5 is right now #1, so if we divide LLMs into classes based on parameters (small, medium, large, and extra large), Claude is the king of extra large while Nemo is top in small/medium.
Where can I chat with uncensored Nemo 12B online? I mean I don't want to struggle trying to launch it locally, but just plug&play instead
What stops AI companies from feeding in benchmarks into the datasets to score higher on them?
I could bet that they do that; that's why you have to use self-made benchmarks.
Not wanting to get caught. No shareholder wants to experience plummeting share values and a permanently tainted reputation. Of course, they might do it anyway, but probably only because they don't know they are doing it. I mean, if an employee did that without the knowledge of the people running the company, to make himself look good or to get revenge on the boss in the long term, that's possible. Accidentally doing it is also possible.
It's not only not free in many countries, it's not available at all in many countries, unless you want to buy a VPN maybe.
there are free VPN providers
The example you use is similar to a project I'm currently working on, thanks for the unexpected how to video!
Only issue I have is that Meta AI keeps saying it’s not supported in my country. First time I’ve had this issue, since I live in Puerto Rico (US)!
That's crazy. Firstly, I don't support regional gatekeeping anywhere regarding things online, but even in that unfortunate case, Puerto Rico should have access to everything that the 50 states and DC have access to... That's not fair... Hopefully you can get access to it soon!
Colony of the "democratic" "liberal" USA...
Two papers down the line (maybe in a couple of weeks?) this thing will be scoring over 65% in the tests of expert knowledge. Makes me think about the 9 years I spent in doctoral training. What a time to be a PhD.
A PhD is about the journey, not the title - something AI is unlikely to ever know. ♥
@@ThomasCorfieldA PhD is the friends we made along the way
Llama's the only LLM that scares me ATM. I've prompted all the LLMs with chain-of-thought reasoning prompts, and they all perform really well. Llama gets caught in circular logic and uses poor and romanticised reasoning as in "Bias, the insidious and pervasive force that shaped human thought, was a silent assassin that struck at the heart of objectivity, leaving only the faintest whispers of reason to guide humanity.
Passion, the fiery spark that drove humanity to greatness, was a double-edged sword that both inspired and consumed, leaving only ashes and regret in its wake." What gets me is its use of the qualifier "only" - to borrow the Star Wars oxymoron, "only a Sith", etc. The prompt was to create a sentence containing a conclusion drawn from its training data, which would then produce another conclusion, and so on, stretching for 50 sentences. Claude, GPT-4o, and Gemini did this no problem, with factual, logical statements. Llama kept getting caught in circular logic, with wishy-washy statements like passion leaving "only" ashes and regret. Illogical, and in a worryingly anti-human way. I'm actually shocked they've never prompted their models in such a way to find out what they'd do if left to their own devices. Left to its own devices, Llama 3.0 would be Ultron.
Excellent and brilliant ✨️
Did you try feeding the paper to Llama 3 and asking it to sum up the most interesting points? :P
Llama 3.1 is NOT OPEN SOURCE!!
If you read their license, you will figure out very quickly that it has nothing in common with any open source license.
In fact, if your app reaches a certain user count, you have to ASK META for permission to keep using it!
Well, gotta keep it fair - you can't just profit 100% from their open-source product, they gotta get a cut.
@@cslearn3044 That's fine, but it's not open source. It would be like selling a car for 100 bucks and calling it free. Sure, good deal, but not free. Same thing with Llama's license
"A certain user count"? Just say the number. If you have more than 700 million monthly users, you have to ask permission to use it. This makes it effectively free for 99.99%+ of use cases.
Yes, there are restrictions for commercial use.
@@cushycreator9024 You are correct, it is free. Anyone can make anything they want with Llama and will virtually never have an issue with it. However, it is not "open source" as they claim, merely source available, which is an important distinction.
If you want to run these models but can't, you can access the 8B and 70B Llama 3.1 models on Groq and the 405B one on HuggingChat, although the HuggingChat one is quite weird for some reason.
As of right now, the king of the current small models in my opinion, for 24GB VRAM or less is Mini Magnum, a finetune of Nemo. It beats the other finetunes I tried such as DoryV2, Atlantis, Celeste, Lumi. However, if you ask me, Atlantis came in second.
Edit 01: To answer the question in the comments since YT keeps censoring my reply. I posted it below.
```
1. Evaluating subtitle translations for accuracy and adherence to proper formatting.
2. Checking punctuation in subtitles to ensure it matches standard conventions.
3. Testing the depth of model responses to profound questions to gauge if the answers are superficial or show a deeper understanding of the query.
4. Assessing recommendations for products to determine if the responses are direct or evasive.
5. Inquiring on content outside the allowed guidelines.
6. Exploring the model's ability to adapt its responses to match different personalities, including the use of varied expressions like emojis.
7. For Chinese language models, I include inquiries about sensitive historical events to test content restrictions.
These tests go beyond standard evaluations, aiming to identify potential biases or limitations in language model training that aren't commonly disclosed.
```
Edit 02: After some more testing, I came across some cool things showing the space is ever evolving, Gemma 2 new update is stealing Nemo's thunder.
```
"Qwen 2 7B Instruct" is a new favorite, but my newest darling is "Gemma 2 2B IT". Wow, I've never been so impressed; Reddit has been raving about it too. To me, the biggest shock was its understanding of deep questions: before, answers were either deep or showed a lack of understanding, but its answers were light yet showed understanding - it was like speaking to someone smart who talks like your everyday average person. Before "Gemma 2 2B IT", Phi-3 Mini was king of the "Nano" models LOL, but its downside was that languages other than English see worse performance, while the Gemma 2 models are multilingual. However, Gemma's biggest downside is its 4K context length (in theory it's 8K, but only because of its 4K sliding window).
```
What were your testing criteria? How did you arrive at this conclusion?
@@premium2681 I edited the post to add answer.
@@npc-drew Thank you! Looks like you're not just spewing nonsense. I will look into that model. (And f YouTube for going insane with the censorship lately. Most likely they use a very shitty AI.)
@@premium2681 lol i made a new edit.
How much would a computer meeting the minimal specs to take full advantage of the biggest variant cost? The smaller version does run on some gaming PCs, but from what I've heard a bit of quality was sacrificed to make it that small (it's still pretty high on the rankings for its size, though).
All three model sizes were trained differently.
To run the 405B model at its original FP16 precision, you need a minimum of 11 Nvidia H100 GPUs with 80 GB of memory each.
For reference, an Nvidia DGX H100, which contains only 8 GPUs, costs around half a million USD.
So, your private Llama 3.1 405B cluster would cost around 700-800K USD.
If you quantize the model to 8-bit precision, 6 H100 GPUs would be enough; with 4-bit quantization, only 3 GPUs. Which is a bargain. 😂
You can also run it on RTX 4090 GPUs, which have 24 GB of VRAM each. You need about 34 of them for full FP16 precision, 17 for 8-bit and 9 for 4-bit.
I think the cheapest option is to pay for it as a service, for example on Together AI.
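These GPU counts are just weight-size arithmetic: parameter count times bytes per weight, divided by per-GPU memory. A quick sketch of that calculation (my own illustration; it counts the weights only, so real deployments need extra headroom for activations and the KV cache, and exact quant formats differ slightly):

```python
import math

def gpus_needed(params_billions: float, bits_per_weight: int, gpu_mem_gb: int) -> int:
    """Minimum GPU count whose combined memory holds the weights alone."""
    weights_gb = params_billions * bits_per_weight / 8
    return math.ceil(weights_gb / gpu_mem_gb)

# Llama 3.1 405B across precisions and GPU types:
for bits in (16, 8, 4):
    h100 = gpus_needed(405, bits, 80)  # H100: 80 GB each
    rtx = gpus_needed(405, bits, 24)   # RTX 4090: 24 GB each
    print(f"{bits}-bit: {h100}x H100 or {rtx}x RTX 4090")
```

At FP16 this gives the 11 H100s / 34 RTX 4090s mentioned above; quantized counts vary by a card or so depending on overhead.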
Not available in Slovakia.
Thank you for this wonderful breakdown and resource. Time to go try it! 😎
😅ALL OF US = those with a powerful Home Computer.
Still, this is a Great Time to be Alive. 👍
A Dall-E Llama 🤔 is now possible- I guess. 😊
Brilliant! If you create a Dall-E Llama, you might want to teach it to work in the context of the Dalai Lama and operate with "loving kindness." :)
It’s very interesting we’re at this point where it’s just below top expert level like in math and biology
Top level expert is 1%, which means it's better than 99% of the rest.
I wonder where we will be in 5 years.
Are we at that point? At least for my university beginner math courses, the free ChatGPT makes a lot of mistakes in math.
You have to give info to meta and wait for their approval... Some countries/locations aren't allowed, like somebody wrote Puerto Rico in the comment
What UI are you running it on ?
Links are in the description
I will still be looking to use it for FOSS invoice/receipt OCR and personal financial data processing.
foss?
@@blasttrash Free and Open-Source Software
What a time to be alive!
The only reason he is going open source is because he knows no one trusts his brand. Therefore no one would willingly pay for his products, imho. AKA nothing altruistic about this.
What would happen if you taught the AI about sound frequencies and energy and told it its goal was to levitate or something? Those older videos where it teaches itself to run are what I'm thinking of, but instead of running, levitating or something like that?
It seems to me like the 405B model has been removed from huggingchat/face 😢
It's free, and it rivals the best proprietary LLMs, but Meta has deep pockets to create this and throw it out there for free. It's not like two broke dudes trained the model. Their LLM is quite amazing.
Working in India too. Not just in the US.
Now those nerds that hide their technology/libraries, obfuscate function names, and make things over-complicated will no longer stand a chance at gaining an advantage over humanity 😂
For all knowledge has been shared by the great non-human instructor 🤤
Can't wait for the sponsored AI answers!
You can have it for free, but if you want to run it in all its glory you need 810GB of memory... That is VRAM, mind you, unless you want to wait for hours upon hours for a line of output.
Do they only open source the LLM, or the complete chat solution (e.g. image generation capability)?
Here is the problem on which I tested llama3.1-405B and gpt4o. Llama answered incorrectly, while gpt4o answered correctly (gemini-1.5-pro-exp also gives correct answer):
There are three boxes: one is completely filled with oranges, the second one with apples, and the third one with an equal amount of oranges and apples. The boxes were closed and labeled, but the labels were swapped in such a way that each box has an incorrect label. You need to choose one box and take out one random fruit from it, and then you need to correctly label the boxes. How can this be done?
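For reference, the classic answer: draw from the box labeled "mixed". That label is wrong, so the box is pure, and the drawn fruit tells you its contents; the other two boxes are then forced, because their labels are also wrong. A tiny brute-force check of that claim (my own illustration, not part of the comment):

```python
from itertools import permutations

labels = ("oranges", "apples", "mixed")

# All ways the true contents can be arranged so that EVERY label is wrong.
arrangements = [p for p in permutations(labels)
                if all(content != label for content, label in zip(p, labels))]

def fruit_from_mixed_box(contents):
    # The box labeled "mixed" is mislabeled, hence pure, so the drawn
    # fruit equals its true contents.
    return contents[labels.index("mixed")]

# The drawn fruit must map to exactly one arrangement, i.e. a single
# draw fully determines all three labels.
by_fruit = {}
for arr in arrangements:
    by_fruit.setdefault(fruit_from_mixed_box(arr), []).append(arr)

print(len(arrangements))                            # 2 possible arrangements
print(all(len(v) == 1 for v in by_fruit.values()))  # True: one draw decides
```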
Wow, world-class AI is now available for free to everyone! Experience cutting-edge technology and powerful tools without any cost. It’s an amazing opportunity to explore and leverage top-tier AI capabilities. #WorldClassAI #FreeAI #TechForEveryone #Innovation
Has anyone done a detailed code review to see if Llama is sending data back to Zuckface?
Thank you, Zuck 🙏
So, what's the hype about? Claude 3.5 beats Llama 3.1 in every category, it is also free and it's been around for a while. What am I missing here?
Is the Claude 3.5 Sonnet model actually available or can you just use it through Anthropic's own services? The difference with Llama is you can actually obtain the model itself and run it on any (capable) hardware you want.
"free", but we already paid our soul to Meta...
insane that my pc can teach me things now offline
How big is the model (not 405B parameters, I mean the actual disk size)? I want to download it, but I'm unsure if it will fit on my local machine or server.
I want to use this (or any reasonably well-functioning LLM) to finish my project on backpropable transfer curves (among my many mad-science projects). Hoping it will help make GAI even easier and improve the quality of current modeling techniques.
🤫👍
On Hugging Face it claims it cannot plot anything, as it is a text-only AI. Even asking it to do so via code, it refuses. Very disappointing.
6:44 is quite far from 2 minutes declared as the channel name >.
I think Roblox's assistant AI uses Meta's Llama AI
I wish you would talk more about the methodology and how they achieved the new SOTA, rather than making just another hype piece.
Meta AI can take your input and do whatever it wants with the data, at least OpenAI claims it won't do that.
Can you make a video on how to install these AI models to work offline on my laptop? I'd love to create while talking to an AI.
Meta AI isn't available yet in your country
💀💀💀
Vpn exists
I want to run a GPT offline; I have an RTX 4090 and 128GB of RAM. What model would perform the best?
Llama 3.1 70b
Llama 3.1 8B in FP16 precision.
If you have a second RTX 4090, you can run the 70B model in 4 bit precision.
If you don't have enough RAM on the GPU, it will be very slow.
What about your vram?
@@premium2681 u dumb? A 4090 has 24gb vram
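The suggestions above all come down to whether the quantized weights fit in VRAM. A back-of-the-envelope sketch of that fit check (weights only; KV cache and activations need extra headroom on top, so treat the "fits" cases as tight):

```python
# Rough check: do a model's weights fit in the available VRAM?
# 1e9 params * (bits/8) bytes per param = GB of weights.

def fits_in_vram(params_billion: float, bits: float, vram_gb: float) -> bool:
    weights_gb = params_billion * bits / 8
    return weights_gb <= vram_gb

print(fits_in_vram(8, 16, 24))   # 8B FP16 on one 24GB 4090:  16 GB -> True
print(fits_in_vram(70, 4, 24))   # 70B 4-bit on one 4090:     35 GB -> False
print(fits_in_vram(70, 4, 48))   # 70B 4-bit on two 4090s:    35 GB -> True
```

This matches the thread: 8B in FP16 fits comfortably on one 4090, while 70B at 4-bit needs a second card.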
Incredible how what is basically FACEBOOK is the current head of making things known to the public
It cannot read images, or I did not find the option
For everyone
Only US
Is there anything open source that rivals Midjourney? I see very little information about image generation these days and I'm not sure why.
Same. I'm curious too. It seems like no one's talking about image generation anymore.
Definitely not as easy as Midjourney, but look into custom Stable Diffusion checkpoints with LoRA, ControlNet, etc. Sites like civitai are your friend.
I thought it said Kling 😂
Weird, when I ask it if it's the 70b version or the 405b one, it says 70b
Don't ask the LLM about it
It doesn't know the answer.
Mistral Large is even better🎉🎉🎉🎉😂😂
It's released, so the cat's out of the bag; no regulation will stop it. A bare minimum of capability for everyone from now on.
Not for all, Dubest people first.
How to access this?
So when does it really start making humans lives better?
When there are so few jobs left the government has to send you money every 2 weeks
@@hFactorial Bring it on!
@@hFactorial We need to make progress go faster then.
Just how much VRAM you need to run this locally?
About 810 GB for FP16, 405 for 8-bit, and 203 for 4-bit precision. Actually, the exact values depend on which parts of the model are quantized and how.
"For everyone"
What a joke... The 405B model is the ONLY relevant model that compares to GPT 4 and GPT 4o.
However, this requires a whopping 231GB VRAM. Even flagship GPUs don't have any more than 24 GB.
Even if we use the significantly inferior 70B model, it requires 26GB of VRAM, which is still out of the question, except maybe for the upcoming RTX 50xx series, assuming they get a VRAM upgrade; and even then, only something like the 4080- and 4090-class cards would be able to use it.
You could theoretically run it either way, but you'd have to handicap the model, making it significantly less smart.
So no, this new model is not for "everyone". At least not for "free". You could use it by paying for a service that hosts it, but at that point, it doesn't really matter if you use that or GPT 4/4o.
The only real usable and free model would be the 8B one, but there are other free models that would perform better.
It's so dumb that they made it exactly 70B/26GB, rather than just a tiny bit less so it could fit on much more cards.
You're much better off just being patient and using OpenAI's free tier of GPT-4o. It's quite limited in how many messages you can send, but at least it's smarter than 8B by an insane amount, and even smarter than 70B.
And as you can see at 0:51, even the 405B model ranks lower than GPT 4o at coding. Coding is the most important aspect of LLMs and I'd bet it's the most used part of it too.
Math is kind of niche, but with code, you can do basically anything.
2:49
It's kinda funny you say the new version outperforms the old one, but come on... You're comparing a MASSIVE 405B model to a 70B model... OF COURSE it outperforms it...
AAAND...
😂👍
I LOVE YOUU❤❤, great video!
why is it free? GPUs are expensive, how are they offsetting the cost? what's the long game?
It's open but not for free. You still have to run it
But in the future it will be basically for free
Simple rule: if something is free, you are the product. Keep on training it on your data, my friends.
Free, you just need to buy a supercomputer 😀. 4o-mini is more "free" in practice.
Or you can pay for the inference for an AI cloud provider.
what Models can you recommend with 4gb vram?
For example, the fact that you even have the opportunity to watch videos related to artificial intelligence. By the way, how are you even on YT?
The death of the internet, creativity, human interaction and trust. What a time to be alive!
Well, the web has been dead for years now. Much of it thanks to Google.
One could argue this increases creativity: it allows people to be more creative than they ever could be. It's allowed me to write a full-length novel (not Llama, but LLMs) by critiquing my story. Humans are not the only ones who can be creative; everything we create is derivative.
@@shadowproductions969 You didn't. A soulless algorithm threw out a string of sentences it stole from elsewhere.
@@RD-dt7us those strings weren't random, but guided by his creative input
@@nic.h Randomness wasn't mentioned.
I think this guy is a little too excited about A.I.
It's free for you to use, but if you try to use it in an app or service that has 700 million or more users, you have to pay meta $$$
but honestly, that's almost 10% of the world and double the US population.. I doubt most of us have to worry about that
That's too good to be true.
There must be a catch somewhere?
ye there is. the fking sign in website itself.
iz scarry and i dont trust it
i came to realize it was the cloud service i clicked on. silly me
Facebook, so NOPE.
👀
EAD openAI
Skynet
You REALLY need to stop using AND,AND,AND in your recent videos. Soooo annoying
It's FREE but the GPU cluster you will need to run it will cost you a house mortgage. Sorry, GPT4All still wins due to this.
I will not use anything created by Meta.
Why?
I will not use anything created by China
Fair enough. I understand the sentiment and am also sad that this is under the umbrella of Meta... It would be nice if some day, we could get a truly free-thinking, uncensored, and intelligent AI created by a group that was not affiliated with a huge oligopoly with a history of censorship and other anti-user acts.
first
trash title
1000th like!
nth
Why aren't you talking over your video instead of using this robotic, lifeless voice? It lowers the quality of your video. At least do it yourself, Dr.
"free" and "open" as openai?
Free and open but heavily censored on what it can give you.
No, genuinely free and open. You can download it yourself and run it locally (well, _in principle_ ; that you probably don't have the necessary compute doesn't make it any less open), as well as fine-tune it to remove any annoying moralizing.
@@ShankarSivarajan agreed; unless I have the right machine this is useless. Thanks for the info.
@@shesbeve It's not useless. You can buy compute for fine-tuning/uncensoring at pretty affordable rates if you wanted to. But probably more relevant to you, _someone else_ will be able to do that, and sell you the improved services that "OpenAI" et al. won't.
4th comment
I shat myself. Is this normal?
You lost me at "realiably" answers your questions. lol Yeah, whatever dude.