This is pretty incredible. That's some serious fine-grained control, and it's starting to feel like a real tool rather than a lottery game.
Dude..... another game changer. Honestly, your work helps the most with composition. With further development of this node we could have specific prompt inputs for things like subject, style, maybe even background and such. I gave up on prompting and always prompt only the bare minimum, while mainly using img2img and the different IPAdapter models. This will help a lot with prompting way more precisely. Thanks for all the work you are doing.
Hello, I just want to say that I really appreciate the way you deliver your ComfyUI concepts. Of all the content videos out there, I learn the most from yours. Keep up the great job! I look forward to your videos.
Amazing! I was reading an article titled "Anthropic's Breakthrough: Understanding Frontier AI" by Ignacio de Gregorio Nobleja, where they trained a sparse autoencoder (SAE) model that could 'dissect' neuron activations into more fine-grained data (the interpretable features) but also reconstruct the original data, going back to the original activations. Your R&D made me think of this article.
The architect strikes again!!
This is the first time I’ve badly wanted to contribute to a project
DO IT!
The part where you just changed the haircut (prompt) without losing the rest of the image made me realize the potential of this technique - fascinating ❤
it's time for me to target-train my LoRA on specific layers depending on the subject!!!
it's gonna be so GOOOOOD
THANK YOU !!!!
my thoughts exactly!
Hey, I want that too! :D
Can you already target a layer while training?
@@hilbrandbos yes, in a standard LoRA.
I do like LoCon better, but I found that Kohya SS doesn't let us target blocks (with LoCon).
@@lefourbe5596 Oh, I have to look that feature up in Kohya then... of course we first have to find out which block does what; it would be really handy if, when training a style, you could address only the style block.
There was a paper a long time back about doing Textual Inversion this way, an embedding per block rather than an embedding for the whole model, which apparently gave much better results.
I'd definitely like to see that in play!
Do you remember the title of the paper? I'd like to check it out. Thx❤
@@Nikki29oC Unfortunately not, it was probably 1.5 years ago now.
Every time I run into a specific problem you release a video with the solution within a week. Thank you so much! I was wondering how to do this to stop a bleeding issue just last night. Excited to try this out. Keep up the amazing work.
So cool! Experiments like these push progress forward.
You keep coming out with things you would think the creators of the models would have thought of. Great work as always
It just might be that a creator who understands the model's structure and functioning to the degree required to get this idea doesn't actually exist. A lot of the progress has come from iterative papers where the author takes an existing structure, changes a couple of things and makes a new paper from the results. Also, getting an understanding of how the model functions requires using the model a lot, which is not necessarily something the people who designed it are interested in doing.
There can also be a kind of blindness that comes from being too deep into the models. You can end up with observations from early versions that are no longer true for later versions, which you never recheck, and that can blind you to possibilities.
It's very often the case that breakthroughs come from someone who doesn't understand working towards understanding. Someone who doesn't understand has a more flexible mind and is thus more likely to discover something new than someone who already has an understanding.
"And probably all Christian too".....the Irish catching strays for no reason 😂
This is incredible. I will have to watch it two or three more times to get a real understanding. Thanks for the lesson.
I did some quick tests and at first glance I noticed fewer aberrations, better hands, and backgrounds that make more sense; e.g. I was getting a couch with mismatched backrest sizes and this fixed it. It also gets the colors and the style of the photo much better with IPAdapter. I'll do better tests tomorrow. Thanks for sharing this!
Wow. Just tried it a few minutes ago. It's like you are in control of the prompt. Genius!!
Just starting this video but want to say: You are a great teacher and your sense of humor is spot on!
Thank you for that. I'm learning how to make LoRAs/models and trying to understand the blocks during training to speed things up and make the model output better quality. Can't wait for the findings from the tests!
Wow! Matteo, you deliver as always. This could be the next big thing to get more control over inference. I'm excited to see how this will evolve.
This is absolutely amazing. I look forward to seeing the development of this process.
Thank you for sharing your ideas Matteo. My bet is that this isn't something that has never been conceptualized before; it likely has, but like so many other breakthroughs it's locked behind closed source.
This is probably the most exciting news I've seen for gen AI in a while, definitely the seed of something big. Great work!
Great work. Good to see that fine-grained regional tools are progressing. First Omost and now this. Matteo, you really are a magician.
Thanks!
I posted long ago, when SD 1.5 came out, that what we lack in these models is control. The answer from OpenAI's DALL·E models was more natural-language prompting, which is a failure to understand the problem. When ControlNet and IPAdapters came out, it seemed like the right direction. There are other parts of the entire pipeline (encoding, embedding, latent operations) which could have more nodes to control the input/output. For example you could have different scheduling for each UNet block, or take a UNet block from a different model. I would split all the UNet blocks into separate nodes.
Yeah, I already do something similar in the latent space: I have a workflow with 3 separate Advanced KSamplers and I manually adjust the amount of denoising handled by each one (basically I'm scheduling manually). Dividing the workload across 3 Advanced KSamplers improves the quality of the generation BY A FUCKING LOT, and you can even add new details at specific points by knowing at which steps such concepts get added (small details usually show up closer to the end steps, while the composition is defined in the first 25% of the total steps).
So definitely, having separate schedulers for each UNet block would improve AI generations by a lot.
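Roughly what that staged split looks like as a plain-Python sketch; sample_range is just a stand-in for an Advanced-KSampler-style pass over a step range (not real ComfyUI API), and the stage fractions and prompts are only illustrative:

```python
# Sketch of splitting one denoising schedule across three staged samplers.
# sample_range() stands in for an Advanced-KSampler-style pass over a step range.

TOTAL_STEPS = 30

# (stage label, fraction of total steps, prompt emphasis used in that stage)
STAGES = [
    ("composition", 0.25, "wide shot, full body, dramatic lighting"),
    ("structure",   0.50, "clean anatomy, clothing folds, background layout"),
    ("details",     0.25, "skin texture, hair strands, fabric weave"),
]

def sample_range(latent, prompt, start_step, end_step, total_steps):
    """Placeholder for one sampler pass that only denoises steps
    [start_step, end_step) and returns the latent with leftover noise
    so the next stage can pick up where this one stopped."""
    print(f"steps {start_step:2d}-{end_step:2d} of {total_steps}: {prompt}")
    return latent  # a real implementation would run the sampler here

def staged_sample(latent, stages=STAGES, total_steps=TOTAL_STEPS):
    start = 0
    for _label, fraction, prompt in stages:
        end = min(total_steps, start + round(fraction * total_steps))
        latent = sample_range(latent, prompt, start, end, total_steps)
        start = end
    return latent

staged_sample(latent={"samples": None})
```

If I remember the node right, in ComfyUI each stage maps to a KSamplerAdvanced with start_at_step/end_at_step set to that range and return_with_leftover_noise enabled on every stage except the last.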
This is so cool, the results look great! I will have a play with the block impacts.
For the times when you want "fast results" rather than "fine-grained control", I could imagine it being interesting to split a single prompt into separate UNet block inputs using some kind of text classification.
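As a very rough sketch of that idea (the block names and keyword buckets are pure guesses for illustration; a real version would use a proper classifier or an LLM instead of substring matching):

```python
# Naive sketch: split a comma-separated prompt into per-block buckets by
# keyword matching, with everything unmatched falling back to the "all" input.
# Block names and keyword lists are made up for illustration.

BLOCK_KEYWORDS = {
    "input_8":  {"standing", "sitting", "full body", "close-up"},   # composition-ish
    "output_0": {"hair", "haircut", "beard", "freckles"},           # subject details
    "output_1": {"watercolor", "oil painting", "photo", "anime"},   # style-ish
}

def route_prompt(prompt: str) -> dict:
    routed = {"all": []}
    for chunk in (c.strip() for c in prompt.split(",") if c.strip()):
        target = "all"
        for block, keywords in BLOCK_KEYWORDS.items():
            if any(kw in chunk.lower() for kw in keywords):
                target = block
                break
        routed.setdefault(target, []).append(chunk)
    return {block: ", ".join(chunks) for block, chunks in routed.items() if chunks}

print(route_prompt("a woman standing in a park, long red hair, watercolor style"))
# -> {'input_8': 'a woman standing in a park',
#     'output_0': 'long red hair',
#     'output_1': 'watercolor style'}
```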
Very nice, and your findings match what I found with perturbed attention (PAG) as I was playing around with it. Funny, I was just saying I would love to do what this does, and here it is.
OMG it's exactly what I was thinking about!
Thank you for your efforts and your work. You're a genius.
I played with block merging a long time ago and found that about 5 of the output blocks influenced body composition, hair and clothing, but I no longer have my list. From memory it was something like blocks 4, 6, 7 and 8 or 9, but I'll leave it to the experts, as you only seem to have 5 output blocks and not the 10 the model merger I used had.
If this gets paired up with Omost, it's a whole different level of image generation we can achieve.
Edit: Omost currently targets areas to determine the composition, but if, along with the area, it could also target specific blocks, that would be next-level.
Gonna try it today. Thanks for all the work and the tutorials, Matteo, you are awesome.
Another gem from Matteo, thank you! It's indeed very promising.
This is amazing. I've been wanting a way to interact with the U-Net without having to send random numbers at it that I don't know how to scale. Something like this for LoRA would be pretty amazing: only applying it to certain layers in a node like this. The conditioning here will make it easy to see where to focus.
Have you checked the LoRA block weight extension?
@@Seany06 I think so, but I didn't really have a grasp of what each one affects and when. Still don't really, but this definitely helps. I was also thinking of trying to apply ControlNet conditioning to specific inputs/outputs with this.
I played with PAG in April, and my feeling was that saturation increased a lot; I didn't expect it to change the content in addition to changing the brightness. It's kind of like a lora-block-weight operation, except here it's a direct checkpoint operation, right? There might be something to learn from it. I hope this turns out to be definitive, not metaphysical/superstition. We need more precision for AI.
Matteo's creation of such precise tools elevates technology to the level of art. The specificity inherent in these tools rivals that of art itself
I laughed loudly at the sudden "they're all probably Christian too". Great video as always, Matteo.
If anyone could prove that some AI artists should have copyright protection over their image generations, it would be you. Definitely no "Do It" button here. Amazing stuff, and thank you for taking the time to grind through all the possibilities and then breaking it down for the rest of us dummies ;-)
You are the tip of the open source spear.
This is really interesting. Always looking where no one else does.
This is incredible! Thank you for all your work!
This channel is so underrated.
I'm keeping a low profile
GOLD CONTENT. For real.
Astonishing! Thank you for your work!
It would be great to have similar control over IPAdapter conditioning in the future
I guess it would be great to try this thing with Stability's Revision technique
It's kinda possible already with the undocumented "mad scientist" node :)
Thank you so much for all the work and all the great content!! You are the best!! 🥰
It would be awesome to be able to customize the labels of all the inputs. Great work Matteo
I have actually been experimenting with Kohya Deep Shrink to increase resolution, doing blockwise (0-32) alterations. Now, with this, if I prompt for attention to body proportions on in8, it seems like I get much better output for full-body poses. In Kohya, a downscale factor of 2 on block 3 with an ending weight between 0.35 and 0.65 seems to do the trick, producing double-SDXL-resolution output.
Why is it every time you post something, I'm like, "this, this is the direction SD will be heading in. Everyone is going to be using this." Oh, it is because that is exactly what happens. I cannot wait to try this.
Wow... this looks very powerful, thank you very much!
Thank you for the excellent video and all your work :D you truly rock dude ^^
After watching I was wondering: could the injection method theoretically be adapted to use ControlNets as an input as well, using the injection to target the blocks you want the ControlNet applied to?
I only ask because when using ControlNets I've observed input bleeding, similar to the prompt bleeding. It may be a way to achieve modification via ControlNet without losing as much consistency of the original character.
Thank you for all your hard work and passion :)
1:51 I doubled over laughing. Thanks for the video, as always. The best.
I knew I was smelling the week's game changer! How can I contribute?
High quality content! Thanks Mateo!
great video as always! better prompting FINALLY!!!
Thanks!
Very interesting, thanks for the great work!
mate this is crazy! well done, and thank you!
Controlling weights per block in a LoRA was a game changer for me, but this takes it to another level!
Any docs you can point to as a reference for that, please? :)
I feel I'm witnessing history in the making
Amazing !!! Genius !!! A new masterpiece is coming
Reminds me of scientists who use an MRI to identify the parts of the brain that react to language or specific words... so you are creating a sort of map of understanding.
Question: how much of an effect do LoRAs have on the inputs and outputs?
How do you mean? We work on the cross-attention; the LoRA weights have already been added at that point. The kind of influence it has depends on the LoRA.
We could choose which blocks to apply LoRA weights to
@latentvision Like vintage put it: LoRAs only influencing certain blocks. I haven't had time to test what I'm asking, so it may not be a very well structured question. I'll have some time this weekend and will come back 🤣
@@vintagegenious well put, that's what my brain was struggling to spit out😂 good job mate.
@@amorgan5844 😁
I wonder what a textual inversion would do with this, like the character-turnaround ones; in some cases the details of the character can be lost. This makes me think you could use CharTurner on just one of these inputs, an IPAdapter for a character reference, and the prompts to help guide it a bit.
That’s amazing work, Matteo!
I'm wondering if this can be applied to training LoRAs and doing fine tuning?
It might help with fine-tuning, yeah
Yes, this is already possible with B-LoRA. Look it up.
Such an interesting finding. I have to try it myself. 👍😘
you are the Satoshi of Stable Diffusion!
How do the layers map to the denoise process? Might the latter layers be good for detail prompts?
How about a debug sort of node that takes a prompt and outputs a set of images, one for each UNet block in isolation? Maybe useful, but I suspect this is going to vary a lot between prompts and the finetune being used. I remember seeing heatmap-style visualizations for attention heads in the past; maybe that can be done here? (Rough sketch of the sweep below.)
Preventing color bleeding would be nice. If there was a way to tease out which blocks look at color for foreground objects vs background, that would be useful.
Yes, we need to download this and report our findings
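Following up on the debug-node idea above, a sweep could look roughly like this; the block list is a guess at the node's SDXL cross-attention inputs and generate() is a hypothetical wrapper around the injection workflow, not the actual API:

```python
# Sketch of a block-sweep debug pass: render the same seed once per block,
# injecting the prompt into only that block each time, plus a baseline.

BLOCKS = (
    [f"input_{i}" for i in (4, 5, 7, 8)]
    + ["middle_0"]
    + [f"output_{i}" for i in range(6)]
)

def generate(per_block_prompts: dict, seed: int) -> str:
    """Placeholder: run the workflow with the given per-block prompts."""
    return f"image(seed={seed}, injected={sorted(per_block_prompts) or 'none'})"

def block_sweep(prompt: str, seed: int = 42) -> dict:
    results = {"baseline": generate({}, seed)}        # no injection, for comparison
    for block in BLOCKS:
        results[block] = generate({block: prompt}, seed)
    return results

for name, image in block_sweep("a red-haired woman in a misty forest").items():
    print(f"{name:>10}: {image}")
```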
I'm looking forward to experimenting with this! One thing stands out to me right out of the gate, though: shouldn't I get the same render if I pass the same prompt to a regular sampler and to the "all" pin on the injector while using the same seed for both? What is the technical reason for getting two slightly different images rather than two exact duplicates?
the results are very similar but not the same because the embeds are applied in different places
I get a tensor size error when I try. Also, if I try to write a patched model with Save Checkpoint, the patch does not seem to get included in the result. I believe the error is still there, just being ignored. When I render the resulting model in a KSampler, it throws the tensor size error and won't continue. torch 2.4.1 and the current version: "stack expects each tensor to be equal size"
Your humour is amazing! :D
This is amazing!
How are the different layers mapped? Is it unique to the checkpoint or is it the same for each base model?
usually it's all the same, but some models do weird stuff with the blocks so it might not always work (eg: pony models might be different)
@@latentvision thanks for the reply. I hooked up an SD 1.5 checkpoint to the standard node and it didn't work so well. I ended up having to use the index node and got decent results, but it took a few hours to figure out by trial and error. I like the concept of being able to fine-tune details at the UNet level.
I'd be interested to test out ControlNet conditioning sent to specific blocks
When Matteo says suffer, I just think joy! 😄
no kink shaming
That is crazy! Good job
How does it work with ControlNet? Can we use a separate ControlNet for each block for better influence?
What happens when you use a controlnet on the various inputs?
that's technically feasible, haven't tested it yet
Do you think something going into input 4 still affects everything after it, or does this node by design affect only that specific part of the UNet? Very interesting stuff. I'd love to contribute, maybe by setting up a wildcard setup with random connections, or even just by donating my GPU time to your custom workflow and reporting the results.
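A minimal sketch of what that wildcard exploration could look like; every name here (blocks, fragments, the rendering step) is a placeholder for illustration:

```python
# Sketch of the wildcard idea: randomly assign prompt fragments to blocks,
# log every assignment with its seed, and dump the log so results can be
# reported back later.
import json
import random

BLOCKS = ["input_4", "input_7", "input_8", "middle_0", "output_0", "output_1"]
FRAGMENTS = ["red mohawk", "foggy harbor", "oil painting", "low angle shot"]

def random_assignment(rng: random.Random) -> dict:
    return {fragment: rng.choice(BLOCKS) for fragment in FRAGMENTS}

log = []
for trial in range(5):
    rng = random.Random(trial)            # seeded, so every trial is reproducible
    assignment = random_assignment(rng)
    # ...here the actual workflow would render an image using this assignment...
    log.append({"trial": trial, "seed": trial, "assignment": assignment})

print(json.dumps(log, indent=2))          # the report to share back
```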
Cool stuff! Is it only working for Turbo models? I've got an error at the KSampler
I know this is Stable Diffusion, but could this same architecture be put to use in CLIP/VQGAN? I have TBs of retro (in AI time) complete CLIP/VQGAN step reels with known seeds ;)
Wonderful idea! Trying to test it out now. Ran into an error off the bat aha. Have you ever seen this by chance? 'Error occurred when executing KSampler:
stack expects each tensor to be equal size, but got [1, 231, 2048] at entry 0 and [1, 77, 2048] at entry 1'
it seems there is a certain max number of tokens per prompt; shortening each prompt fixes this (for me)
yeah at the moment it only works with simple prompts (no concat or long prompts). I'll fix that if there's enough interest
@@latentvision No worries. Thank you for the help!
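For anyone hitting the same error: the 77 in the message is CLIP's per-encoder token window, so a quick check like this (using the standard CLIP tokenizer from Hugging Face transformers; the per-block prompt dict is just an example) can flag which prompt is too long before sampling:

```python
# Quick pre-check: flag any per-block prompt that exceeds CLIP's 77-token
# window. The [1, 231, 2048] vs [1, 77, 2048] mismatch in the error suggests
# one prompt was chunked into 3 x 77 tokens while another stayed at 77.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompts = {
    "all":      "a portrait of a woman in a garden",
    "output_0": "an extremely long and overly detailed prompt " * 20,  # way past 77
}

for block, prompt in prompts.items():
    n_tokens = len(tokenizer(prompt)["input_ids"])  # includes BOS/EOS tokens
    status = "OK" if n_tokens <= 77 else "TOO LONG"
    print(f"{block:>8}: {n_tokens:3d} tokens  {status}")
```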
You might be able to use an LLM to automatically try prompting different blocks and a different model to analyze the outputs, like the RAM++ model...
How is this technically different (or similar) from using options in the IPAdapter like ease in, ease out, etc?
this is more surgical but also takes a lot of time
We need to come up with a system for systematic testing and reporting of block functions in Stable Diffusion, so they can be added to the model information on CivitAI
ah! that would be great!
It seems like it would be fairly easy to do. We'd just need to build a set of keywords along with what they represent (e.g. composition, pose, medium) and then present the "base" image as well as the one with the block prompt and have a user rating of 1) how different it is from the base and 2) how well the second one represents the keyword. This would identify which blocks were sensitive to which concept class and concept.
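A minimal sketch of how that could be collected and aggregated, assuming the ratings come from human reviewers; the block names, concept classes and scores below are placeholder example data:

```python
# Sketch of the proposed reporting scheme: collect ratings of
# (1) difference from the base image and (2) how well the block-prompted
# image matches the keyword, then aggregate per (block, concept class).
from collections import defaultdict
from statistics import mean

# (block, concept_class, keyword, difference_rating, keyword_match_rating) on 1-5
ratings = [
    ("output_0", "subject", "red mohawk",           4, 5),
    ("output_0", "style",   "watercolor",           2, 1),
    ("input_8",  "pose",    "sitting cross-legged", 5, 4),
    ("input_8",  "style",   "watercolor",           1, 1),
]

by_block_concept = defaultdict(list)
for block, concept, keyword, difference, match in ratings:
    by_block_concept[(block, concept)].append((difference, match))

for (block, concept), scores in sorted(by_block_concept.items()):
    differences, matches = zip(*scores)
    print(f"{block:>9} / {concept:<8} "
          f"avg difference {mean(differences):.1f}, "
          f"avg keyword match {mean(matches):.1f}")
```

Ranking blocks by those two averages per concept class would give exactly the "which block is sensitive to what" table that could live next to a model's CivitAI page.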
how are you generating the landscape without a prompt?
My comments keep disappearing. Maybe because I put a link in them? Anyway, I wanted to know if this is somehow related to the Omost technology that was recently introduced.
no, that's a different thing. they just do targeted prompting powered by an LLM
Back to your BEST Matteo. 👋👋👋👋👋
engineer> "Where are the engineers?" 😅
---
comment for the algorithm folks :)
I am not an engineer 😅
@@latentvision, no you are an art magician!
@@latentvision You are; a title doesn't make you an engineer.
Nice, thank you for sharing. Is it possible to do this with Flux?
kinda, I talked about it in my flux video, but it's not as effective
You're always bringing good surprises
To be overly frank, Matteo, we all have numerous "dangerous" ideas, but we lack the talent to code them, meaning that after some time the intelligent people who actually work with coding languages and know the suffering involved in the task get fed up with our "wonderful" imagination, and after some more time we learn not to share our "ingenious" ideas so broadly; all this (for most of us) by the age of 10. That's why I think a Discord server like the one you have can be a better-suited "nest" for this idea-sharing process, because it centralizes people with the same passion. You could say Reddit, but the amount of archived posts with wrong information (often the most upvoted) no longer available for open discussion is appalling, and by this point Reddit is about as accurate as a wiki fandom when it isn't straight up an advertising platform.
I think it's thanks to people like you that we have, say, open-source software for overclocking GPUs, image editors and so forth, and my big brainstorming lapses won't help in any way. I thought you understood the individual block weights in depth before this video, and now I'm like "how did he make IPAdapter work in the first place?", while also knowing that is an overstatement and part of your quirky humor.
What I want now is a checkbox in OneTrainer to train specific blocks depending on the concept I want to train; should I pester Nerogar with my "genius" brain-fart? 🤔
The Leonardo da Vinci of Generative AI 🤯
What if the role of each UNet block is different for unrelated models (like SDXL vs Pony: they seem the same in terms of architecture, but merging SDXL and Pony gives bad results)?
from my testing most checkpoints react in the same way. maybe with the exception of heavily stylized checkpoints (possibly merged with a style lora)
Damn, this is genius!!!
5:20 why the long face
Yer a wizard, Matteo!
What does the ConditioningZeroOut do?
Well, it's not everyone's thing to test out where to plug in different prompt parts until it works.
So maybe one could find out which topics, in general, each of those split inputs is sensitive to. Ideally, if you prompt an image, another AI would split the sentence across these inputs. But I also wonder whether the sensitivity of these blocks is always the same everywhere, or whether with another model the "korean" input wouldn't do anything because in that model the block is responsible for maybe only the style of the image.
"standard" models if they are not over trained seem to react the same way, yes. There might be some issues with stuff like pony models I'm sure.