Yes, why is there no motion control like ControlNet in AD...? All these companies spend so much on training and developing new AI video models, but no add-on features...
Well, another conspiracy theory is that they want you to spend more on credits to keep generating 😂
Maybe we can sometimes combine it with ToonCrafter, LivePortrait or other options to get more control
Yes, I don't like that it has no control. Again, it feels like gambling in txt2vid and img2vid. Wait for the next seed number and see what we get.
I'd be curious to see how CogVideoX 5B performs when you feed it a very simple 3D blocking animation as a base. The Blender Palladium tool seems to be built for this, and the samples they're showing with CogVideoX seem solid and coherent.
The CogVideoX 5B txt2vid was performing well, and some generated results are coherent for objects. But that was text-only, so we can't input any 3D object or image.
For this one, img2vid, the percentage of great image-to-video results doesn't seem high.
I tested it, and from my results about 3 out of 10 img2vid generations pass, while the rest morph or deform...
And for 3D animation, between Blender and AI video models, I'm going to choose Blender at this moment.
Maybe there will be a breakthrough in AI video models later.
@@TheFutureThinker I'm more interested in vid2vid with the source video being a rough animation from Blender. Using that with ControlNet and AnimateDiff should work. Just curious if this model would produce better results. Will give it a try once I get v2v with AnimateDiff working.
@@SteveWarner Yes, v2v with AD just works. With ControlNet it can be controlled, and it's also very flexible depending on the user's creativity.
Hey, I need some help.
I've been getting this error:
'CogVideoXTransformer3DModel' object has no attribute 'single_transformer_blocks'
Any help?
Hilarious how your voice becomes so sad when talking about high memory consumption... 😂
I feel ya bro.
It's all working fine, but when I set the frame count above 49 I get some sort of mosaic tile output. Is there any way to make this work? I really need this to run longer.
So far MotionDirector is not compatible with this model, right? So you can't get better control through a pre-trained LoRA. 😟😟😟
I assume you are talking about MotionDirector for ComfyUI. That one is built for the AnimateDiff model; for a DiT AI video model it does not work. So if we have to do a motion LoRA for CogVideoX, it will supposedly have to be trained on this model itself.
Again, all AI video models are gambling: you click a few times, wait for the generation, and hope one of them is what you are looking for.
For Runway v2v or AnimateDiff v2v, yes, we can apply them to AI video, but it's not a direct one-flow way to get a result like what we used to do with ControlNet.
Although this is working fine on my RTX 3090 at home on Windows, it's giving a weird error on RunPod's Linux servers, even though other things I've run there have been basically fine. It marks the model's CLIP file as corrupted, even though the MD5 hash is right.
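For anyone hitting the same thing, one quick sanity check is to recompute the hash on the RunPod side and compare it with the one from the Windows machine. A minimal Python sketch (the file path is just a placeholder for wherever your CLIP/text-encoder file actually lives):

    import hashlib

    def md5sum(path, chunk_size=1 << 20):
        # Stream the file in chunks so multi-GB model files don't fill RAM
        h = hashlib.md5()
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                h.update(chunk)
        return h.hexdigest()

    # Placeholder path: point this at the file the error complains about
    print(md5sum("models/clip/t5xxl_fp16.safetensors"))

If the hashes match on both machines, the file itself is intact and the "corrupted" message is more likely coming from a library version mismatch or an interrupted download cache.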
I just wonder about the reason for the bad results, because another video showed hints from the GitHub project: the model is trained with long prompts, so short prompts will generate bad results.
I think so. But an img2vid model shouldn't need a long prompt; no other AI video model on the market needs one, since it works from the vision of the reference image. If a model needs a long prompt even when it's doing img2vid, that means it's not good enough. Why require users to downgrade or put in more work just to make an AI work like other AIs, right? :)
@@TheFutureThinker Appreciate you explaining this!
But can we use something like Blender or another 3D editor, export the 3D motion, and run it through AI video?
Then that goes back to the older techniques some 3D artists on RUclips talk about: create the 3D in Blender, then bring it into AnimateDiff v2v, and they think it's awesome.
For business, new AI models should offer a breakthrough.
Error occurred when executing CogVideoSampler:
'list' object has no attribute 'shape'
Any suggestions?!
This happened to me before; I added an image resize node after the load image node, and then Comfy could run the process. That's how I solved it on my system.
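If you'd rather prepare the image outside ComfyUI, a rough equivalent of that resize step in plain Python would be something like this (filenames are placeholders, and 720x480 is the resolution the model is limited to, as mentioned further down in the comments):

    from PIL import Image

    # Placeholder filenames; 720x480 matches the resolution this I2V model expects
    img = Image.open("input.png").convert("RGB")
    img = img.resize((720, 480), Image.LANCZOS)
    img.save("input_resized.png")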
Quick question: is it possible to train a Flux dev model (checkpoint) on an RTX 4080? How? What repo?
Um... not sure if Flux has a checkpoint model trainer yet, but it has training scripts written in Python.
@@TheFutureThinker So the ones on Civitai are using the main open-source code?
@@cgraider I guess so; you can ask the checkpoint creator. But on the Flux Hugging Face, they have the training code for it, and basically it can run in the cloud or locally.
Of course, when training a checkpoint, it's better to run in the cloud.
@@TheFutureThinker thanks.
What does this error mean? How do I fix it? Given groups=1, weight of size [3072, 16, 2, 2], expected input[26, 32, 60, 90] to have 16 channels, but got 32 channels instead
Looks like Diffusers needs an update?
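If it helps, a quick way to confirm which Diffusers build is actually being loaded before reinstalling anything is a small check like this (just a sanity-check sketch; the reply above suggests the channel mismatch comes from an outdated Diffusers rather than the workflow itself):

    import diffusers
    import torch

    # If this prints an old release, upgrading Diffusers is the first thing to try
    print("diffusers:", diffusers.__version__)
    print("torch:", torch.__version__)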
Sequential image upscale ?
Ideally .exr.
Motion blur synthesis next ?
Yes, I want to try this as well. Great one.
Is this limited to 720x480 ratio/resolution?
Yes, only this res.
This takes forever on my 4080 Super and i9-14900K.
Am I doing something wrong??
No, this model is literally bullshit. If, for once, the process doesn't break down, you'll get a terribly bad video clip. This model is a complete waste of electricity.
No, you are not; that's normal.
This model does max out all the VRAM you have.
A 4090 + 3950 is churning through 1 cycle in 5 min.
No, that sounds about right. It takes 10 minutes to generate a video with my RTX 3090 😑. I heard there is a "Fun" 2B version that takes about 2 minutes per generation.
I like listening to your voice, it's calming. Make more videos, and I'll try my workflow for text-to-image-to-video-to-video.
Why does my I2V model not appear, only 2B and 5B?
Update to the latest version and it will appear in the drop-down menu.
Hi, as usual CogVideo doesn't like me :)
That's the error I got
CogVideoXPatchEmbed( (proj): Conv2d(32, 3072, kernel_size=(2, 2), stride=(2, 2)) (text_proj): Linear(in_features=4096, out_features=3072, bias=True) ) does not have a parameter or a buffer named pos_embedding.
Looks like it's killing your PC 😅
Btw, have you updated Diffusers? It's in requirements.txt.
@@TheFutureThinker OK, the error is gone, thank you, but as before it got stuck at the sampler. No more CogVideo for me :)
@@luisellagirasole7909 You don't need CogVid for TikTok LOL.
oh yes.....
Benji the Leader
I am just a coordinator😅
Cool to be local, have privacy, and be able to generate whatever you want, but what’s the use of a model that generates low-quality video, with distortions, and only for 6 seconds?
The problem is, when a model has a large parameter count and gives you high quality, you can't even run it locally on a consumer PC. Maybe do some research before commenting, and see what I mentioned about the Transformer architecture.
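To put a rough number on that: just holding the weights of a 5B-parameter transformer in 16-bit precision already takes on the order of 10 GB of VRAM, before the text encoder, VAE, and activations are counted. A back-of-the-envelope sketch, assuming fp16/bf16 storage:

    # Rough estimate only: 5 billion parameters stored at 2 bytes each (fp16/bf16)
    params = 5e9
    bytes_per_param = 2
    weights_gib = params * bytes_per_param / (1024 ** 3)
    print(f"~{weights_gib:.1f} GiB just for the transformer weights")  # ~9.3 GiB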
@@TheFutureThinker Well, I think not all artists or people from a multimedia background understand how AI runs, so they comment like this.