This is pretty incredible. That's some serious fine-grained control, and it's starting to feel like a real tool rather than a lottery game.
Dude..... another game changer. Honestly, your work helps the most with composition. With further development of this node we could have specific prompt inputs for things like subject, style, maybe even background and such. I gave up on prompting and always prompt only the bare minimum, while mainly using img2img and the different IPAdapter models. This will help a lot with prompting way more precisely. Thanks for all the work you are doing.
Hello, I just want to say that I really appreciate the way you deliver your ComfyUI concepts. Of all the content videos out there, I learn the most from yours. Keep up the great job! I look forward to your videos.
Amazing! I was reading an article titled "Anthropic's Breakthrough: Understanding Frontier AI" by Ignacio de Gregorio Nobleja, where they trained a sparse autoencoder (SAE) model that could 'dissect' neuron activations into more fine-grained data (the interpretable features) but also reconstruct the original data, going back to the original activations. Your R&D made me think of this article.
The architect strikes again!!
This is the first time I’ve badly wanted to contribute to a project
DO IT!
The part where you just changed the haircut (prompt) without losing the rest of the image made me realize the potential of this technique - fascinating ❤
it's time for me to target-train my LoRA on specific layers depending on the subject!!!
it's gonna be so GOOOOOD
THANK YOU !!!!
my thoughts exactly!
Hey, I want that too! :D
Can you already target a layer while training?
@@hilbrandbos yes, in a standard LoRA.
I do like LoCon better, but I found that Kohya SS doesn't let us target blocks (with LoCon).
@@lefourbe5596 Oh, I have to look that feature up in Kohya then... of course we first have to find out which block does what; it would be really handy if, when training a style, you could address only the style block.
There was a paper a long time back about doing Textual Inversion this way, an embedding per block rather than an embedding for the whole model, which apparently gave much better results.
I'd definitely like to see that in play!
Do you remember the title of the paper? I'd like to check it out. Thx❤
@@Nikki29oC Unfortunately not, it was probably 1.5 years ago now.
Every time I run into a specific problem you release a video with the solution within a week. Thank you so much! I was wondering how to do this to stop a bleeding issue just last night. Excited to try this out. Keep up the amazing work.
So cool! Experiments like these push progress forward.
You keep coming out with things you would think the creators of the models would have thought of. Great work as always
It just might be that a creator who understands the model's structure and functioning to the degree required to get this idea doesn't actually exist. A lot of the progress has come from iterative papers where the author takes an existing structure, changes a couple of things and makes a new paper from the results. Also, getting an understanding of how the model functions requires using the model a lot, which is not necessarily something the people who designed it are interested in doing.
There can also be a kind of blindness that comes from being too deep into the models. You can end up with observations from early versions that are no longer true for later versions, which you never recheck, and that can blind you to possibilities.
It's very often the case that breakthroughs come from someone who doesn't understand working towards understanding. Someone who doesn't understand has a more flexible mind and is thus more likely to discover something new than someone who already has an understanding.
"And probably all Christian too".....the Irish catching strays for no reason 😂
This is incredible. I will have to watch it two or three more times to get a real understanding. Thanks for the lesson.
I did some quick tests and at first glance I noticed fewer aberrations, better hands, and backgrounds that make more sense; e.g. I was getting a couch with mismatched backrest sizes and this fixed it. It also gets the colors and the style of the photo much better with IPAdapter. I'll do better tests tomorrow. Thanks for sharing this!
Wow. Just tried it a few minutes ago. It's like you are in control of the prompt. Genius!!
Just starting this video but want to say: You are a great teacher and your sense of humor is spot on!
Thank you for that. I'm learning how to make LoRAs/models and trying to understand the blocks during training to speed things up and make the model output better quality. Can't wait for the findings from the tests!
Wow! Matteo, you deliver as always. This could be the next big thing to get more control over inference. I'm excited to see how this will evolve.
This is absolutely amazing. I look forward to seeing the development of this process.
Thank you for sharing your ideas Matteo. My bet is that this isn't something that has never been conceptualized before; it likely has, but like so many other breakthroughs it's locked behind closed source.
This is probably the most exciting news I've seen for gen AI in a while, definitely the seed of something big. Great work!
Great work. Good to see that fine-grained regional tools are progressing. First Omost and now this. Matteo, you really are a magician.
Thanks!
I posted long ago, when SD 1.5 came out, that what we lack in these models is control. The answer from OpenAI's DALL·E models was more natural-language prompting, which is a failure to understand the problem. When ControlNet and IPAdapters came out, it seemed like the right direction. There are other parts of the entire pipeline (encoding, embedding, latent operations) which could have more nodes to control the input/output. For example you could have different scheduling for each UNet block, or take a UNet block from a different model. I would split all the UNet blocks into separate nodes.
Yeah, I already do something similar in the latent space: I have a workflow with 3 separate Advanced KSamplers and I manually adjust the amount of denoising handled by each one (basically I'm scheduling manually). Dividing the workload across 3 Advanced KSamplers improves the quality of the generation BY A FUCKING LOT, and you can even add new details at specific points by knowing at which steps such concepts get added (small details usually show up closer to the end steps, while the composition is defined in the first 25% of the total steps).
So definitely, having separate schedulers for each UNet block would improve AI generations by a lot.
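Roughly what that staged split looks like as a plain-Python sketch; sample_range is just a stand-in for an Advanced-KSampler-style pass over a step range (not real ComfyUI API), and the stage fractions and prompts are only illustrative:

```python
# Sketch of splitting one denoising schedule across three staged samplers.
# sample_range() stands in for an Advanced-KSampler-style pass over a step range.

TOTAL_STEPS = 30

# (stage label, fraction of total steps, prompt emphasis used in that stage)
STAGES = [
    ("composition", 0.25, "wide shot, full body, dramatic lighting"),
    ("structure",   0.50, "clean anatomy, clothing folds, background layout"),
    ("details",     0.25, "skin texture, hair strands, fabric weave"),
]

def sample_range(latent, prompt, start_step, end_step, total_steps):
    """Placeholder for one sampler pass that only denoises steps
    [start_step, end_step) and returns the latent with leftover noise
    so the next stage can pick up where this one stopped."""
    print(f"steps {start_step:2d}-{end_step:2d} of {total_steps}: {prompt}")
    return latent  # a real implementation would run the sampler here

def staged_sample(latent, stages=STAGES, total_steps=TOTAL_STEPS):
    start = 0
    for _label, fraction, prompt in stages:
        end = min(total_steps, start + round(fraction * total_steps))
        latent = sample_range(latent, prompt, start, end, total_steps)
        start = end
    return latent

staged_sample(latent={"samples": None})
```

If I remember the node right, in ComfyUI each stage maps to a KSamplerAdvanced with start_at_step/end_at_step set to that range and return_with_leftover_noise enabled on every stage except the last.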
This is so cool, the results look great! I will have a play with the block impacts.
For the times when you want "fast results" rather than "fine-grained control", I could imagine it being interesting to split a single prompt into separate UNet block inputs using some kind of text classification.
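As a very rough sketch of that idea (the block names and keyword buckets are pure guesses for illustration; a real version would use a proper classifier or an LLM instead of substring matching):

```python
# Naive sketch: split a comma-separated prompt into per-block buckets by
# keyword matching, with everything unmatched falling back to the "all" input.
# Block names and keyword lists are made up for illustration.

BLOCK_KEYWORDS = {
    "input_8":  {"standing", "sitting", "full body", "close-up"},   # composition-ish
    "output_0": {"hair", "haircut", "beard", "freckles"},           # subject details
    "output_1": {"watercolor", "oil painting", "photo", "anime"},   # style-ish
}

def route_prompt(prompt: str) -> dict:
    routed = {"all": []}
    for chunk in (c.strip() for c in prompt.split(",") if c.strip()):
        target = "all"
        for block, keywords in BLOCK_KEYWORDS.items():
            if any(kw in chunk.lower() for kw in keywords):
                target = block
                break
        routed.setdefault(target, []).append(chunk)
    return {block: ", ".join(chunks) for block, chunks in routed.items() if chunks}

print(route_prompt("a woman standing in a park, long red hair, watercolor style"))
# -> {'input_8': 'a woman standing in a park',
#     'output_0': 'long red hair',
#     'output_1': 'watercolor style'}
```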
Very nice, and your findings match what I found with perturbed attention (PAG) as I was playing around with it. Funny, I was just saying I would love to do what this does, and here it is.
OMG it's exactly what I was thinking about!
Thank you for your efforts and your work. You're a genius.
I played with block merging a long time ago and found that about 5 of the output blocks influenced body composition, hair and clothing, but I no longer have my list. From memory it was something like blocks 4, 6, 7 and 8 or 9, but I'll leave it to the experts, as you only seem to have 5 output blocks and not the 10 the model merger I used had.
If this gets paired up with Omost, it's a whole different level of image generation we can achieve.
Edit: Omost currently targets areas to determine the composition, but if, along with the area, it could also target specific blocks, that would be next-level.
Gonna try it today. Thanks for all the work and the tutorials, Matteo, you are awesome.
Another gem from Matteo, thank you! It's indeed very promising.
This is amazing. I've been wanting a way to interact with the U-Net without having to send random numbers at it that I don't know how to scale. Something like this for LoRA would be pretty amazing: only applying it to certain layers in a node like this. The conditioning here will make it easy to see where to focus.
Have you checked the LoRA block weight extension?
@@Seany06 I think so, but I didn't really have a grasp of what each one affects and when. Still don't really, but this definitely helps. I was also thinking of trying to apply ControlNet conditioning to specific inputs/outputs with this.
I played with PAG in April, and my feeling was that saturation increased a lot; I didn't expect it to change the content in addition to changing the brightness. It's kind of like a lora-block-weight operation, except here it's a direct checkpoint operation, right? There might be something to learn from it. I hope this turns out to be definitive, not metaphysical/superstition. We need more precision for AI.
Matteo's creation of such precise tools elevates technology to the level of art. The specificity inherent in these tools rivals that of art itself
I laughed loudly at the sudden "they're all probably Christian too". Great video as always, Matteo.
If anyone could prove that some AI artists should have copyright protection over their image generations, it would be you. Definitely no "Do It" button here. Amazing stuff, and thank you for taking the time to grind through all the possibilities and then breaking it down for the rest of us dummies ;-)
You are the tip of the open source spear.
This is really interesting. Always looking where no one else does.
This is incredible! Thank you for all your work!
This channel is so underrated.
I'm keeping a low profile
GOLD CONTENT. For real.
Astonishing! Thank you for your work!
It would be great to have similar control over IPAdapter conditioning in the future
I guess it would be great to try this thing with Stability's Revision technique
It's kinda possible already with the undocumented "mad scientist" node :)
Thank you so much for all the work and all the great content!! You are the best!! 🥰
It would be awesome to be able to customize the labels of all the inputs. Great work Matteo
I have actually been experimenting with Kohya Deep Shrink to increase resolution, doing blockwise (0-32) alterations. Now, with this, if I prompt for attention to body proportions on in8, it seems like I get much better output for full-body poses. In Kohya, a downscale factor of 2 on block 3 with an ending weight between 0.35 and 0.65 seems to do the trick, producing double-SDXL-resolution output.
Why is it every time you post something, I'm like, "this, this is the direction SD will be heading in. Everyone is going to be using this." Oh, it is because that is exactly what happens. I cannot wait to try this.
Wow... this looks very powerful, thank you very much!
Thank you for the excellent video and all your work :D you truly rock dude ^^
After watching I was wondering: could the injection method theoretically be adapted to use ControlNets as an input as well, using the injection to target the blocks you want the ControlNet applied to?
I only ask because when using ControlNets I've observed input bleeding, similar to the prompt bleeding. It may be a way to achieve modification via ControlNet without losing as much consistency of the original character.
Thank you for all your hard work and passion :)
1:51 I doubled over laughing. Thanks for the video, as always. The best.
I knew I was smelling the week's game changer! How can I contribute?
High quality content! Thanks Mateo!
great video as always! better prompting FINALLY!!!
Thanks!
Very interesting, thanks for the great work!
mate this is crazy! well done, and thank you!
Controlling weights per block in a LoRA was a game changer for me, but this takes it to another level!
Any docs you can point to as a reference for that, please? :)
I feel I'm witnessing history in the making
Amazing !!! Genius !!! A new masterpiece is coming
Reminds me of scientists who use an MRI to identify the parts of the brain that react to language or specific words... so you are creating a sort of map of understanding.
Question: how much of an effect do LoRAs have on the inputs and outputs?
How do you mean? We work on the cross-attention; the LoRA weights have already been added at that point. The kind of influence it has depends on the LoRA.
We could choose which blocks to apply LoRA weights to
@latentvision Like vintage put it: LoRAs only influencing certain blocks. I haven't had time to test what I'm asking, so it may not be a very well structured question. I'll have some time this weekend and will come back 🤣
@@vintagegenious well put, that's what my brain was struggling to spit out😂 good job mate.
@@amorgan5844 😁
I wonder what a textual inversion would do with this, like the character-turnaround ones; in some cases the details of the character can be lost. This makes me think you could use CharTurner on just one of these inputs, an IPAdapter for a character reference, and the prompts to help guide it a bit.
That’s amazing work, Matteo!
I'm wondering if this can be applied to training LoRAs and doing fine tuning?
It might help with fine-tuning, yeah
Yes, this is already possible with B-LoRA. Look it up.
Such an interesting finding. I have to try it myself. 👍😘
you are the Satoshi of Stable Diffusion!
How do the layers map to the denoise process? Might the latter layers be good for detail prompts?
How about a debug sort of node that takes a prompt and outputs a set of images, one for each UNet block in isolation? Maybe useful, but I suspect this is going to vary a lot between prompts and the finetune being used. I remember seeing heatmap-style visualizations for attention heads in the past; maybe that can be done here? (Rough sketch of the sweep below.)
Preventing color bleeding would be nice. If there was a way to tease out which blocks look at color for foreground objects vs background, that would be useful.
Yes, we need to download this and report our findings
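Following up on the debug-node idea above, a sweep could look roughly like this; the block list is a guess at the node's SDXL cross-attention inputs and generate() is a hypothetical wrapper around the injection workflow, not the actual API:

```python
# Sketch of a block-sweep debug pass: render the same seed once per block,
# injecting the prompt into only that block each time, plus a baseline.

BLOCKS = (
    [f"input_{i}" for i in (4, 5, 7, 8)]
    + ["middle_0"]
    + [f"output_{i}" for i in range(6)]
)

def generate(per_block_prompts: dict, seed: int) -> str:
    """Placeholder: run the workflow with the given per-block prompts."""
    return f"image(seed={seed}, injected={sorted(per_block_prompts) or 'none'})"

def block_sweep(prompt: str, seed: int = 42) -> dict:
    results = {"baseline": generate({}, seed)}        # no injection, for comparison
    for block in BLOCKS:
        results[block] = generate({block: prompt}, seed)
    return results

for name, image in block_sweep("a red-haired woman in a misty forest").items():
    print(f"{name:>10}: {image}")
```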
I'm looking forward to experimenting with this! One thing stands out to me right out of the gate, though: shouldn't I get the same render if I pass the same prompt to a regular sampler and to the "all" pin on the injector while using the same seed for both? What is the technical reason for getting two slightly different images rather than two exact duplicates?
the results are very similar but not the same because the embeds are applied in different places
I get a tensor size error when I try. Also, if I try to write a patched model with Save Checkpoint, the patch does not seem to get included in the result. I believe the error is still there, just being ignored. When I render the resulting model in a KSampler, it throws the tensor size error and won't continue. torch 2.4.1 and the current version: "stack expects each tensor to be equal size"
Your humour is amazing! :D
This is amazing!
How are the different layers mapped? Is it unique to the checkpoint or is it the same for each base model?
usually it's all the same, but some models do weird stuff with the blocks so it might not always work (eg: pony models might be different)
@@latentvision thanks for the reply. I hooked up an SD 1.5 checkpoint to the standard node and it didn't work so well. I ended up having to use the index node and got decent results, but it took a few hours to figure out by trial and error. I like the concept of being able to fine-tune details at the UNet level.
I'd be interested to test out ControlNet conditioning sent to specific blocks
When Matteo says suffer, I just think joy! 😄
no kink shaming
That is crazy! Good job
How does it work with ControlNet? Can we use a separate ControlNet for each block for better influence?
What happens when you use a controlnet on the various inputs?
that's technically feasible, haven't tested it yet
Do you think something going into input 4 still affects everything after it, or does this node by design affect only that specific part of the UNet? Very interesting stuff. I'd love to contribute, maybe by setting up a wildcard setup with random connections, or even just by donating my GPU time to your custom workflow and reporting the results.
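A minimal sketch of what that wildcard exploration could look like; every name here (blocks, fragments, the rendering step) is a placeholder for illustration:

```python
# Sketch of the wildcard idea: randomly assign prompt fragments to blocks,
# log every assignment with its seed, and dump the log so results can be
# reported back later.
import json
import random

BLOCKS = ["input_4", "input_7", "input_8", "middle_0", "output_0", "output_1"]
FRAGMENTS = ["red mohawk", "foggy harbor", "oil painting", "low angle shot"]

def random_assignment(rng: random.Random) -> dict:
    return {fragment: rng.choice(BLOCKS) for fragment in FRAGMENTS}

log = []
for trial in range(5):
    rng = random.Random(trial)            # seeded, so every trial is reproducible
    assignment = random_assignment(rng)
    # ...here the actual workflow would render an image using this assignment...
    log.append({"trial": trial, "seed": trial, "assignment": assignment})

print(json.dumps(log, indent=2))          # the report to share back
```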
Cool stuff! Is it only working for Turbo models? I've got an error at the KSampler
I know this is Stable Diffusion, but could this same architecture be put to use in CLIP/VQGAN? I have TBs of retro (in AI time) complete CLIP/VQGAN step reels with known seeds ;)
Wonderful idea! Trying to test it out now. Ran into an error off the bat aha. Have you ever seen this by chance? 'Error occurred when executing KSampler:
stack expects each tensor to be equal size, but got [1, 231, 2048] at entry 0 and [1, 77, 2048] at entry 1'
it seems there is a certain max number of tokens per prompt; shortening each prompt fixes this (for me)
yeah at the moment it only works with simple prompts (no concat or long prompts). I'll fix that if there's enough interest
@@latentvision No worries. Thank you for the help!
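For anyone hitting the same error: the 77 in the message is CLIP's per-encoder token window, so a quick check like this (using the standard CLIP tokenizer from Hugging Face transformers; the per-block prompt dict is just an example) can flag which prompt is too long before sampling:

```python
# Quick pre-check: flag any per-block prompt that exceeds CLIP's 77-token
# window. The [1, 231, 2048] vs [1, 77, 2048] mismatch in the error suggests
# one prompt was chunked into 3 x 77 tokens while another stayed at 77.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompts = {
    "all":      "a portrait of a woman in a garden",
    "output_0": "an extremely long and overly detailed prompt " * 20,  # way past 77
}

for block, prompt in prompts.items():
    n_tokens = len(tokenizer(prompt)["input_ids"])  # includes BOS/EOS tokens
    status = "OK" if n_tokens <= 77 else "TOO LONG"
    print(f"{block:>8}: {n_tokens:3d} tokens  {status}")
```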
You might be able to use an LLM to automatically try prompting different blocks and a different model to analyze the outputs, like the RAM++ model...
How is this technically different (or similar) from using options in the IPAdapter like ease in, ease out, etc?
this is more surgical but also takes a lot of time
We need to come up with a system for systematic testing and reporting of block functions in Stable Diffusion, so they can be added to the model information on CivitAI
ah! that would be great!
It seems like it would be fairly easy to do. We'd just need to build a set of keywords along with what they represent (e.g. composition, pose, medium) and then present the "base" image as well as the one with the block prompt and have a user rating of 1) how different it is from the base and 2) how well the second one represents the keyword. This would identify which blocks were sensitive to which concept class and concept.
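A minimal sketch of how that could be collected and aggregated, assuming the ratings come from human reviewers; the block names, concept classes and scores below are placeholder example data:

```python
# Sketch of the proposed reporting scheme: collect ratings of
# (1) difference from the base image and (2) how well the block-prompted
# image matches the keyword, then aggregate per (block, concept class).
from collections import defaultdict
from statistics import mean

# (block, concept_class, keyword, difference_rating, keyword_match_rating) on 1-5
ratings = [
    ("output_0", "subject", "red mohawk",           4, 5),
    ("output_0", "style",   "watercolor",           2, 1),
    ("input_8",  "pose",    "sitting cross-legged", 5, 4),
    ("input_8",  "style",   "watercolor",           1, 1),
]

by_block_concept = defaultdict(list)
for block, concept, keyword, difference, match in ratings:
    by_block_concept[(block, concept)].append((difference, match))

for (block, concept), scores in sorted(by_block_concept.items()):
    differences, matches = zip(*scores)
    print(f"{block:>9} / {concept:<8} "
          f"avg difference {mean(differences):.1f}, "
          f"avg keyword match {mean(matches):.1f}")
```

Ranking blocks by those two averages per concept class would give exactly the "which block is sensitive to what" table that could live next to a model's CivitAI page.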
how are you generating the landscape without a prompt?
My comments keep disappearing. Maybe because I put a link in them? Anyway, I wanted to know if this is somehow related to the Omost technology that was recently introduced.
no, that's a different thing. they just do targeted prompting powered by an LLM
Back to your BEST Matteo. 👋👋👋👋👋
engineer> "Where are the engineers?" 😅
---
comment for the algorithm folks :)
I am not an engineer 😅
@@latentvision, no you are an art magician!
@@latentvision You are; a title doesn't make you an engineer.
Nice, thank you for sharing. Is it possible to do this with Flux?
kinda, I talked about it in my flux video, but it's not as effective
You're always bringing good surprises
To be overly frank, Matteo, we all have numerous "dangerous" ideas, but we lack the talent to code them, meaning that after some time the intelligent people who actually work with coding languages and know the suffering involved in the task get fed up with our "wonderful" imagination, and after some more time we learn not to share our "ingenious" ideas so broadly; all this (for most of us) by the age of 10. That's why I think a Discord server like the one you have can be a better-suited "nest" for this idea-sharing process, because it centralizes people with the same passion. You could say Reddit, but the amount of archived posts with wrong information (often the most upvoted) no longer available for open discussion is appalling, and by this point Reddit is about as accurate as a wiki fandom when it isn't straight up an advertising platform.
I think it's thanks to people like you that we have, say, open-source software for overclocking GPUs, image editors and so forth, and my big brainstorming lapses won't help in any way. I thought you understood the individual block weights in depth before this video, and now I'm like "how did he make IPAdapter work in the first place?", while also knowing that is an overstatement and part of your quirky humor.
What I want now is a checkbox in OneTrainer to train specific blocks depending on the concept I want to train; should I pester Nerogar with my "genius" brain-fart? 🤔
The Leonardo da Vinci of Generative AI 🤯
What if the role of each UNet block is different for unrelated models (like SDXL vs Pony: they seem the same in terms of architecture, but merging SDXL and Pony gives bad results)?
from my testing most checkpoints react in the same way. maybe with the exception of heavily stylized checkpoints (possibly merged with a style lora)
Damn, this is genius!!!
5:20 why the long face
Yer a wizard, Matteo!
What does the ConditioningZeroOut do?
Well, it's not everyone's thing to test out where to plug in different prompt parts until it works.
So maybe one could find out which topics, in general, each of those split inputs is sensitive to. Ideally, if you prompt an image, another AI would split the sentence across these inputs. But I also wonder whether the sensitivity of these blocks is always the same everywhere, or whether with another model the "korean" input wouldn't do anything because in that model the block is responsible for maybe only the style of the image.
"standard" models if they are not over trained seem to react the same way, yes. There might be some issues with stuff like pony models I'm sure.