I really appreciate this kind of video, it saves everyone a lot of time but also opens up new lines of enquiry. You could go even more granular with these. There's definitely some weird weighting in the model. I'd love more like this, I'm trying to find photographic styles and oddities. Who knows, maybe we can find some magic in latent space.
I'm impressed with Flux's ability to understand plain English prompts rather than a set of keywords, and also impressed that it works on my 4GB VRAM card (5 to 20 minutes per image), but I'm playing with SD1.5 models that produce better images... mostly.
Hmm, looks like at least right now Flux is mostly good for pretty women and food. Latent Vision was saying it did better with more verbose sentences rather than keywords, like it was trained on generated image descriptions.
Not exactly verbose; if you put a novel in, it will ignore it. But you can use more complex sentences so it understands the way words are connected. So if you say "blue dress" it won't make the rest of the image blue. You get better results than SDXL with a prose prompt, but SDXL still produces a better image IMHO with a more structured prompt. Despite its oddities, Flux is a fantastic model. I think a lot of the captions were LLM generated, which might explain the boring houses. I hate LLM captions, they are mostly garbage, but human captioning would be impossible.
A porridgeburger! Priceless... thanks for the research.
What a fun project to do. Thanks for going through the pain of putting it together so we didn't have to 😄. It was really interesting.
SimpleTuner now works with FLUX for fine-tuning!
😊
Yup. Flux is terrible at styles or artists. Waiting on LoRAs for that.