For a 0.1 version this model is great. It's still not fully trained and it's already better than SD3 in many regards while being truly open source.
Using 24GB of VRAM, it doesn't even seem to be a good model, but we'll see in the coming months.
@@taucalm ahaha 24gb
Oh, Nerdy Rodent, he really makes my day; showing us AI, in a really British way.
😅 makes my day every time I hear it
@@TruecrimeNas Same here XD
I can actually confirm it even runs with 6GB VRAM but only very slowly. "Very slowly" as in "It takes 10 minutes or more to generate a single 1024x1024 image".
Mmm… speedy 😊 Thanks for letting me know!
Turn off "CUDA System Memory Fallback" and get "CUDA out of memory error". If "CUDA System Memory Fallback" is enabled, then very slow RAM is used instead of fast VRAM. (can be seen in task manager)
"Can it do hands?"
"Can it do muddy red Wellies?"
Yes. Yes, it can. Welp, passed all my highest priority tests.
The essentials!
No it can't. Even women don't have six fingers or four hands.
For what it's worth, it runs fine on my 4090 mobile with 16GB of VRAM, albeit a bit slow. I was even able to do a batch size of 4 at 832x1216.
Nice!
Yup, works for me too, but it can't do big t*tty goth NSFW
Looks like it's using about 14GB for me doing 832x1152 at about 2.6s/it. A batch of 4 runs, using 15/16GB VRAM.
It seems to be at least better than the bare SD 1.5 model - and look what the community has made out of that. So a few tweaks and finetunes down the line and we have an interesting SD competitor. Keep us updated!
I was able to run SD 1.5 with 4GB VRAM though
@@DanielDota I'm sure that number will go down over time
Unfortunately, the community being composed mainly of that rare breed of entitled high-end-rig gamers and whiny big-booba weeb connoisseurs, they've already started shitting on the model and speculating about why they can't achieve their lewds. I wish Simo the best of luck; I'd understand if his first model is also his last.
Always good to have different actors on the scene! A bit of competition is always nice, and it's only in beta, so I guess we will see more from them!
Not A Bad First Impression.
Hopefully it's good enough for the community to update their tools for it.
A competitor to SD3 is rather needed right now.
Shows a lot of promise for an early beta, hope to see this come to InvokeAI soon!
Comfy still looks way more complicated to me compared to A1111 so I haven't taken the plunge yet, but it's still interesting to see new things via Comfy.
PS Nerdy, I like your short theme music at the end. Reminds me of early Stranglers. Would be appropriate if you actually have Rattus Norvegicus in your LP collection.🐀
tbh this looks like the next big thing to me; cloneofsimo brought us LoRAs, and this model can only get better given its license.
It's a nice showcase of what seems to be an early-access version of the final model. Right now it's pretty slow with uni_pc (1.4s/it on a 3090 at 1024x1024) and produces nice results, but nothing groundbreaking.
We also got basically no guidance on how to use this thing efficiently or which CFG values and schedulers to use. Still, I'm very hopeful about the future of this model compared to what Stability AI has been making! :)
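(For anyone else experimenting: a rough sketch of how you'd sweep schedulers and CFG if the model loads through a standard diffusers pipeline; the repo id below is a placeholder, not the model's actual one, since as noted there's no official guidance yet.)

```python
import torch
from diffusers import DiffusionPipeline, UniPCMultistepScheduler

# hypothetical repo id; substitute whatever the weights are actually published under
pipe = DiffusionPipeline.from_pretrained(
    "some-org/new-open-model", torch_dtype=torch.float16
).to("cuda")

# swap in UniPC (what ComfyUI calls uni_pc), reusing the pipeline's scheduler config
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# guidance_scale is the CFG value; sweep a few to see what the model likes
for cfg in (3.5, 5.0, 7.0):
    image = pipe(
        "a photo of a rodent in muddy red wellies",
        num_inference_steps=30,
        guidance_scale=cfg,
    ).images[0]
    image.save(f"test_cfg{cfg}.png")
```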
Also, as far as I can tell, this thing hasn't been trained on anything considered NSFW. Prompting "naked woman" produces a woman's face with a bunch of hands on the sides, and poses are also mixed up. So this thing may be "open source", but it's not trying to break the norms and make an uncensored model.
Always a good catch Nerdy!! 😊
It looks very promising. Unfortunately, the 24GB requirement is going to be a hard limit on how much the community uses it.
Mixed reports on this. Some people in the replies have said they can run it just fine on 16GB and 12GB of VRAM.
It's also hard to gauge just how much VRAM it's using if you're testing it with a batch size higher than 1.
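(One way to actually measure it instead of eyeballing Task Manager: a sketch using PyTorch's allocator stats, assuming `pipe` is a hypothetical, already-loaded pipeline object. Note this only counts PyTorch's own allocations, which is part of why Task Manager usually shows a bigger number.)

```python
import torch

for batch_size in (1, 2, 4):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    # `pipe` is whatever text-to-image pipeline you're testing with
    pipe("a test prompt", num_images_per_prompt=batch_size)
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"batch {batch_size}: peak {peak_gb:.1f} GB allocated")
```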
@@pn4960 If it runs on AMD RDNA 3, then an iGPU can practically ignore VRAM requirements, since iGPUs use system RAM as VRAM on Windows. RDNA 3 has an NPU onboard.
I am using an RTX 3060 with 12GB VRAM and it works fine, only a little slow: about 2 mins 10 secs per image on average.
It obviously needs work, and ecosystem staples like ControlNet, but open source providing an escape route from SD's enshittification is a massive W.
Me and my 4GB of VRAM crying rn
That's ok, you just need to download another 20GB of VRAM and you'll be all set!
@@abaj006 You can also just plug some VRAM into an available USB port :D
I am using an RTX 4070 Ti with 16GB VRAM and it works great, about 33 secs per image
Thanks, this was indeed very interesting 🙂
I have a feeling it doesn't have training data with labeled styles at all. Most likely the data was bulk-captioned with a vision model.
Amazing content as always! I'm hoping for some new model or workflow that would allow automating 3D or 2D game and animation assets.
What does this model do differently? And how many fewer iterations does it take to create something good?
Thanks
And thank you! :)
Thanks for this video. You lost me at 24 GB VRAM... 😂 One of the biggest mistakes I made building my current PC was going for an NVIDIA 4080 with "only" 16 GB VRAM. If I had known what was in store for me, I would have decided to sell a kidney and buy a 4090 instead.
It runs with 16GB too. Also, you can get a cheap used P40 (~$300) with 24GB of VRAM as an additional card if you want
Save the kidney and get a used 3090
It works in Automatic1111 but won't do text properly; it does text, but it's gibberish. Has anyone solved this issue? If I run it in Comfy it is very slow, like maybe one image an hour. Even running SDXL models in Comfy is painfully slow. I'm using a Mac M1. Maybe there's already a solution for this, but I'm not aware of one.
Oh dear, lost me at the VRAM requirements. 4GB not gonna cut it eh.
Can I run this on my laptop? RTX 3050 Ti
I tried it yesterday; limb quality is like SD3's. Let's wait one more year.
24GB of VRAM. Ok, not really relevant to me then.
How about "Vchitect Latte"?
My 6GB notebook doesn't like this, but it's actually pretty cool that people can run it locally for free
24gb??
24GB VRAM? For what? Most community-trained SD 1.5 models do a better job. But thanks for the nice video.
👋
Does this model contain censorship in any capacity?
Woman on the grass - you got what you wanted!! 😂
not usable for now ...
Can't wait to spend $10k on a graphics card so I can generate women with 4 hands 🙄
lol
Then spend $5k to get 2 hands instead
Just spend $300 on a used P40 to get 24GB of VRAM
"Truly open source" isn't needed if people stopped abusing "open source" on models that should be "open weight".
fixed seed bro
you lost me at the 24GB VRAM
Almost a year has passed since Dalle3's release, and all these open-source models still can't reach its level, sadly.
It's more like it depends on the composition. Dalle3 is still bad at people compared to open-source models, but it's better at more complicated prompts and can do decent text. At the same time, Ideogram does better text and logos, so right now there is no good do-everything model.
@@southcoastinventors6583 No, not really. Try to make a scene with several humans hugging or shaking hands. Most open-source models, including this new one, will mess everything up. Dalle3 performs very well. Its only weaknesses are bad photorealism (probably on purpose) and heavy censorship.
@@KlimovArtem1 ControlNet and depth maps can easily do those. The point is that Dalle3 is bad at people compared to other models and not as good as Ideogram at text and logos.
This is more an issue with text encoders. T5 models tend to perform similarly to Dalle 3 because they process prompts similarly. CLIP is more of a tokenizer than a text encoder. It's mostly just a waiting game now for community adoption of these sorts of models.
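(To make the difference concrete, a small transformers sketch, with off-the-shelf checkpoints standing in for whatever encoders a given model actually ships: CLIP's text tower is contrastively trained and capped at 77 tokens, while a T5 encoder is a full language-model encoder with per-token features and a much longer context, which is a big part of why T5-conditioned models follow long prompts more like Dalle3 does.)

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer

prompt = "a watercolor painting of a rodent reviewing image models"

# CLIP text encoder: contrastive training, hard 77-token context window
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
clip_in = clip_tok(prompt, padding="max_length", max_length=77, return_tensors="pt")
with torch.no_grad():
    clip_emb = clip_enc(**clip_in).last_hidden_state  # shape (1, 77, 768)

# T5 encoder: a full LM encoder, variable length, richer per-token features
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base")
t5_in = t5_tok(prompt, return_tensors="pt")
with torch.no_grad():
    t5_emb = t5_enc(**t5_in).last_hidden_state  # shape (1, seq_len, 768)

print(clip_emb.shape, t5_emb.shape)
```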
Can it do NSFW?
lul beta
Hard pass for now.