You deserve more subs. Well done and I dig the unique presentation.
Big thanks :)
@@FiveBelowFiveUK Thank you for the work. I used to do open source development, and the endless amount of thankless work and barbs you take just for helping can be tiring. So I respect anyone who jumps into the fire to move the community forward.
Excellent research and improvement. The future of AI video is bright, running locally!
Yes - it's a crazy advancement in my opinion! Agreed :)
Amazing study on this amazing video model! Thank you, brother
The pleasure is all mine - thanks for watching!
I like 🎉 the passion and precision of useful data
Good looking Brother 👌
🤝
Great work mate! 👏👏
Thank you! Cheers!
Great stuff 👍
Big Thanks !!
A wonderful deep dive which is very rare to see! Is video2video possible with Mochi?
Until there is a VAE encoder, all we can do is use IPAdapter or Vision-type nodes to get tokens from an image and then prompt with them. This is only an approximation, not true img2video - but I don't see why a future update couldn't add it in time; I think the model is less than a week old. I'll be covering it for sure! ~
And thanks so much :)
@@FiveBelowFiveUK Great! The most relieving thing is that there's no technical reason the model itself would not bend to that purpose. This model is already so good that it's only a question of time before open source AI video generation is crazy epic!
Most models seem to fail at motion, and video2video is a huge help with that. For example, I have been creating music videos mostly using Runway, but with ComfyUI we will get more control instead of playing an expensive lottery.
In my opinion, to have a genuinely usable video tool for music videos and movies, instead of animated portraits and talking heads, these inputs would be needed:
1. Driving video for motion
2. Reference image for background
3. Reference image for character
4. Text prompt for controlling background action, camera movement and such
Keep on keeping on!
Thanks! Good work. Which workflow gives the best result after your testing?
For maximum quality, the latent sideloading workflows in V6 are almost perfect - certainly peak as far as this model is concerned - and you can also double the steps. Some people use 200 steps, but I found 50-100 was enough. 49 frames was enough for my use case, but I know some want longer clips; V6 Fast is a good example for shorter videos. However, the reason it required so much power to decode the latent files (on Runpod) was that I did not use VAE tiling - which sounds insane, because tiling is how this was even able to run locally in the first place - but I wanted to show that the quality always suffered from ghosting no matter what tiling setting you used. It's the tile seams showing, you see.
Amazing. BTW, how do you create your video avatar? That's awesome
What do you mean? It's just me :)
I used to use After Effects and rotoscoping, but now it's all in a ComfyUI workflow. If you watch the first video on the channel, the secret is hiding in plain sight: Depth + OpenPose, and a wireless mouse :)
34:32 Where do I put those flash attention files?
Because it's so technical I have not even covered it yet, but the short story is that you put the files in the ComfyUI folder (where the startup .bat files for ComfyUI are) and run "pip install filename.whl" - but this can break things, so again, I hope to return to it in a future update.
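For anyone who wants to try it anyway before I cover it properly, here is a rough sketch of what that looks like on the portable Windows build - the wheel filename is a placeholder and the python_embeded path is an assumption based on the standard portable layout, so match both to your own install and Python/torch/CUDA versions:

:: run from the ComfyUI folder that holds the startup .bat files (portable build assumed)
:: the wheel name is a placeholder - use the exact file you downloaded for your Python/torch/CUDA build
.\python_embeded\python.exe -m pip install flash_attn-<matching-version>.whl
:: quick sanity check that the wheel actually matched your environment
.\python_embeded\python.exe -c "import flash_attn; print('ok')"

If that import fails, the wheel did not match your setup - which is exactly the "this can break things" part, so keep a backup of your environment.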
Nice tutorial mate :) I got it running on Runpod, but the results had biiiig ghosting, not usable, and the i2v did not look at my image at all. Also the Mochi model is not stored on the network storage, so each time it starts it loads it again :(. Also, when I just use the decode side, the tiled VAE is not selected and then I get an OOM on an A40 :/ Cheers, Janosch
Thanks for the feedback - it's really helpful for everyone to see what results you get!
I must admit that the Mochi loader nodes are pulling in the models every time, but I think I can improve that by updating my Runpod provisioning script! Expect an update on that front soon ;)
Regarding the decoder OOMs with no tiling - I think this was a 100GB VRAM model (!), so we are still squeezing it in even on 48GB. I wanted to offer the "full fat" option for people who use Runpod as their primary platform.
I only decode 2-second clips without VAE tiling; video VAE decoding takes an insane amount of VRAM because of all the frames.
@@FiveBelowFiveUK No result yet mate hahah, I think my Runpod had a headache. Also I was not aware that there is an i2v for Mochi? Perhaps that's why it's not working as intended? But keep up the great work :)
I think people who recommend cloud/subscription services that cost money in any way don't understand why most people are interested in generating locally.
The whole idea of generating locally is that it's free - no additional payments required beyond the one they've already made for the PC.
I agree to some degree, but cloud services like MiniMax also generate much faster than any high-end PC can. Putting 5 in a queue and generating 2 at a time, sometimes within a few minutes, simply isn't happening locally. I choose to use both cloud and local for now.
It's not only about money. It's about keeping your data as private and safe as possible (assuming that's even remotely possible with AI technology).
Sure, but renting a 48GB VRAM card on Runpod is cheaper than my electric bill, so if it's a matter of money, some people might wanna take that into account.
@@quercus3290 I understand a lot of the reasoning and calculations behind it, but I always struggle when I ask myself 'why'.
Some people probably make models, videos and images as a hobby - I make images for fun myself; it literally replaced gaming for me. Though I can't help but notice that there are also a lot of people who start with generative AI thinking they're gonna make some cash, or fame, or both - sinking money into expensive, time-constrained services.
@@TheGalacticIndian What's privacy anyway?
I've not watched yet but can you do img2video with this?
AFAIK there is no VAE encoder, so all you can do is "Vision to Image", which will approximate an image using a complex description/captions. However, this is only an update away if the author decides to add it - I was even tempted to try writing one myself. It's still early days, so I decided to cover a few other things before coming back to it - hope that helps!
My favorite stick figure is back!
This guy is brilliant
Big Thanks everyone :)
How much VRAM does this tool need?
Depending on the setup - I showed Q8_0 quantized with CLIP FP16 on the CPU, so that would be ~20GB, running on a 4090/3090.
However, there are many quantized setups, and people in the community are running under 16GB. I cannot confirm it, but it is possible to squeeze down to 12GB if you offload CLIP to the CPU, although that required over 32GB of system RAM - there are so many optimization options these days that it's hard to test them all.
For the safest bet, and with the best quality locally, 20GB is where I stake the signpost on this one.
Setting up the CUDA toolkit and Visual Studio is a pain in the ass
Agreed - that is why I have not even covered adding flash attention. To be honest, it's the difference between 20 minutes and 10 minutes; I can wait :)
@@FiveBelowFiveUK Are there any alternatives that you would recommend?
All nodes in your workflows are giving float errors, how can I solve this?
Sounds like you need to update PyTorch to at least 2.5.0 - use the .bat in the ComfyUI update folder that updates with dependencies.
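For reference, on the portable Windows build that usually means either running the "update with dependencies" .bat in the update folder, or doing it by hand - the python_embeded path and the cu124 index below are assumptions, so match them to your own layout and CUDA build:

:: run from the folder with the startup .bat files (portable build assumed)
:: cu124 is only an example index - pick the one that matches your CUDA version
.\python_embeded\python.exe -m pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
:: confirm the version afterwards - 2.5.0 or newer is what the fix above asks for
.\python_embeded\python.exe -c "import torch; print(torch.__version__)"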
@@FiveBelowFiveUK OK, thanks, I'll update. Do you have any optimized workflow to run on an RTX 3060 with 12GB VRAM?
Does anybody know what this means: "LayerUtility: PurgeVRAM"? I can't use Install Missing Nodes on it - I've restarted and searched.
github.com/chflame163/ComfyUI_LayerStyle -- this should be what you need; I think it's in the ComfyUI Manager. It unloads VRAM to help fit the models onto your GPU - I place those nodes in the workflow to help with the crazy load these video models use. Full details are in the links in the description!
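If the Manager search doesn't turn it up, here is a manual install sketch - the custom_nodes path is assumed from the standard ComfyUI layout, and the repo is the one linked above:

:: from your ComfyUI custom_nodes folder
git clone https://github.com/chflame163/ComfyUI_LayerStyle.git
:: restart ComfyUI afterwards; if it complains about missing Python packages,
:: install the repo's requirements file (if it ships one) with the same Python that runs ComfyUI

After a restart, the LayerUtility: PurgeVRAM node should show up in the node search.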
Feel free to ask more questions if you have them :)
Can I do it on my 3090?
Yes, there are people in my Discord using a 3090 with this model - you would use the V6 Fast settings or V5 (Q4/Q8 + T5 FP8 Scaled). I will be making new versions to support lower VRAM this week; I had some other things to cover first :) I explain all the different setups in the articles if you can't wait :)
@FiveBelowFiveUK Thanks broski
Change the MIKE!
Watts the problem
Yeah, I keep on looking away from the mic haha. It's a new mic, you see - I'm still getting used to it being on the desk and not on a boom arm.
hahaha :)
Does this work with an AMD GPU?
I would not like to guess, as I do not have an AMD GPU - all I can say is try it and see?
If it doesn't, let us know, because it will help others :)
🙂🙂
I don't mean to complain, because you're providing free info, but that filter on the audio is very distracting. I would prefer no frills and to just hear you clearly as you are. Filters might still be cool, but the one you're using now changes the EQ too much - the low end drops out completely, and it sounds like the speaker cable is halfway unplugged.
Valid criticism is valid - I think it was an Adobe audio preset going funky after changing to a new mic, combined with the new mic also having a crazy cut-off. Hoping it's solved in newer videos - thanks for the feedback; it lets me know to fix it!