BTW to join the rest of the community, get the latest on AI, and learn in depth about AI join my FREE discord server! discord.gg/mattvidpro
👋 hi
AI is like a box of chocolates. I ate one.
2025 will be insane.
for more than just AI news
2030…
I guess they say that every year now
@ Yeah, that’s totally true.
2026 will be even more insane than 2025, and so on…
I really didn't want to descend into fascism just as agi is emerging...
I think the answer to the consistent characters problem could be an image generator designed specifically for template grids of characters, covering many angles, poses, and expressions. It would also need a way to take that same template and edit just the clothes, hairstyle, or even scars. Then we would need an AI image/video generator that can take several character templates and make them interact with each other. Finally, it would be great to also have templates for places, like a home, a forest, or an office, and even for objects, like swords, specific clothes, meals, etc. I think that could be the easiest way for the end user to make consistent media storytelling.
Scenario AI is doing this kind of thing
Yeah, but we need it all integrated into a single main software. Current state-of-the-art methods are still very messy and not user-friendly. Let's see who makes it first in a very smooth, integrated way.
Completely agree
Exciting times!!!
The pace of improvements is incredible
@8:50 I think what's most impressive is the consistency of "Nirvana" under the jacket even though it's covered up several times.
For consistent characters, speaking as someone who has worked with AI 3D models since the beginning: you need to create a 3D model as a hard reference, and then everything else can be generated normally. Otherwise you're always going to have warping with people in motion. It would also fix the hand warping issue 💯!
I was thinking the same thing. The 3D model reference would help a lot in depicting different angles accurately.
The groundbreaking thing about the O1 model, in my understanding, is its ability to scale over time. Imagine this: our brains are capable of quick responses, mostly about things we have remembered at some point, like fundamental mathematical operations (e.g., multiplication tables). Current LLMs work similarly. We ask a question, and it immediately provides an answer.
But our brain is also capable of solving larger problems if we take more time to think about them.
The O1 model works in a similar way. Instead of giving a fast answer, it "thinks" about the problem to reach a conclusion.
The breakthrough here is that the O1 model is capable of scaling over time, rather than just over parameters.
And, as Sam Altman said, they want to extend the time it spends thinking about a problem to hours, then days, and even up to weeks.
That's the amazing thing.
---
Could an engineer build a car in one week?
No.
Could a thousand engineers build a car in one week?
Probably.
Could one engineer build a car in ten years?
Yes.
The space age is real.
Could you have imagined that when you started this channel playing Minecraft, you'd be playing in a real-time, AI-simulated Minecraft world 11 years later?
5:25 This part is about being able to combine people, objects, and environments together in a video.
Not all heroes wear capes.
I've said this before... we won't see GPT-4.5 or GPT-5. Altman said "GPT-5 won't be out this year", meaning it won't be out at all. We'll see o2, which might be an updated GPT-4 model with better reasoning and more efficient transformer code (as part of integrating reasoning) to reduce compute requirements. I don't know why this is so hard for some people to accept.... GPT-3.5 -> GPT-4 -> GPT-4o -> GPT-o1 -> GPT-o2 -> etc....
I find the concept of "world models" really intriguing. If a sophisticated enough world model were to exist, it might look a lot like the world we live in....
Fractal existence.
I am eagerly waiting for your Suno V4 demo. A few people already got early access. I hope you do too.
The childish, reckless excitement felt due to the uncontrolled advancement of artificial intelligence.
This video is incredibly insightful! The advancements in AI video generation are truly promising. I can't wait to see what the future holds. Thanks for the content. Did you ever try AI VR experiences? it's wild
3:39 If you've ever lucid dreamed you know that's exactly what happens: you look down at your hands, look up, and poof, new place, courtesy of the subconscious
Matt, an off-topic question: the "Adam" AI voice is being used all over for commercials and voiceovers now. Can you please tell me what human this voice was based on? It sounds very, very familiar; I want to say I heard the human voice on a Discovery Channel documentary voice-over.
Is there an image generator with consistent characters? For comics and such?
The best I’ve found is Krea using a style reference. Not perfect though.
The ultimate in consistency will be for the AI to build the scene and characters in 3d, and then animate for video. The video would then have an additional pass to smooth the 2d image that's generated and remove "artifacts". Once the 3d assets are created, they'll always be available to use again later. It's important to recognize that the image generators we currently have DO NOT do this. They generate 2d images frame after frame, which presents the potential for morphing and distortion as the video progresses from frame to frame. That doesn't happen with 3d assets.
Now just imagine generating an image in Midjourney, making it talk with Runway Act One, then add camera orbiting in DimensionX!
Video Generation = interesting information!
😁👍😎
Thanks, very helpful content!
12:28 Here's my imagined use. You have a shaky video, stabilize video without having to zoom in, then autofill the edges so it looks whole.
Great video, thank you!
Thanks Matt , God bless everybody ❤4rmZambia 🇿🇲, the comp update of vidu looks interesting for sure and yes they have kind of solved it for sure 🎉
14:37 Hey bro, this is the real deal; this tool will do very well. Can you imagine how many people will buy more clothes, plus they'll feel more satisfied with all their purchases? The more people get to see this AI tool, the more popular it will get. 15:05 People will love to see their size and body shape when using this awesome tool. 16:23 People will love to see whether certain clothing colors suit them, and whether certain outfits suit their body. 20:49 And to add to this concept, if the AI chatbot can ask you, the user, relevant follow-up questions about your main question, then it can develop a deeper and clearer context to return an accurate answer to you.
That thumbnail was misleading. He only mentioned consistent characters for a couple of minutes in the video, and that was the main reason I clicked on it, and even then it was only for a few facial expressions. What about consistent body and costume design, camera angles, and lighting?
I think you missed the actual part where I discussed character consistency, man. Vidu 1.5; it's the second topic I covered.
@@MattVidPro okay I must have missed that part
Most humans start modeling the minds of other actors very early on (mom/dad), and the generalized Minecraft-style imagination for playing out possible scenarios is what researchers have been studying as world models to predict physics. It should work in conjunction with some internal monologue like o1 has.
Lets go baby thanks Matt!
As a gamer, I really hope for AI-generated games; imagine creating an entire experience from a text prompt. However, I think the first step is to fix the AI's memory recall. You know, when you look at something, then look away and look back, and it has disappeared. So that's one of the most important things to fix.
“Holy shit! This guy’s taking Roy off the grid! This guy doesn’t have a social security number for Roy!”
Elevenlabs music would be a nice surprise.
Tx again Matt 😎
This AI gaming stuff should be done with 2d platform games first for its reduced processing and I’m sure that would be FIRE!!!
If DimensionX works well enough, it could be used to create instant stereoscopic 3D by moving the camera a little bit to the left or right.
damn, you're right. I didn't even think about this. You could probably fine tune a model specifically for that case as well.
where is the part with the consistent characters?
OMW to start making my own movie
Cool
Edify Video = Amazing potential
I'm still waiting for OpenAI to actually release its multi-modal model. I was so hoping to use that to generate consistent characters. But it's still the same text-based mode as always.... :(
Wouldn't it make more sense to run a normal engine underneath and just hallucinate the parameters as graphics?
I actually spoke about this in the video, yes, I believe that is what we will see first, but as tech improves we might be able to see entirely AI generated games.
@@MattVidPro believe it or not, I had this fever dream about a year ago where I knew I was in an AI-generated Half-Life map with Ravenholm zombies ... I woke up sweating. 3spooky5me
Ready for Matt the movie
The red haired lady, 9:55 is very similar to the presenter of Alizarin Crimson on Jackson's Arts Supplies?
ruclips.net/user/shortsl_eE1ERBpzY?si=_9kNjvcBCcoGjAe3
I remember when VR used to be three polygons that were meant to be pterodactyls
The future of Minecraft = Mindcraft lol 😂
Imagine future models being trained on brainwave data collected from wearable tech like some fancy headwear or hat that records your thoughts and uses your passive data stream to train hyper focused algorithms designed to tailor a personalized feedback loop of reality.
VR, AR, or even just good old-fashioned screens could go from being little portals into curated echo chambers to becoming living, dynamic systems that engage with us on a deeply personal level based on our own interests. Imagine a video game that plugs into your subconscious or unconscious urges, where the game is for you to consciously navigate the archetypal structures and forces of your own psychology: basically a technologically, digitally mediated psychedelic trip.
I don't imagine it is coming any time soon, but I often catch myself wondering whether my life, or what I think is my life, is actually just an experience of the latest and greatest that future immersive AI cerebral entertainment is capable of delivering, at a cost that I'm apparently able to afford.
openAI should call their next model "big dipper" to compete with google gemini
Would Tokenformer help large language models to understand like humans?
I have been playing with Hailuo (best for photo to video, but only 5 seconds) and Luma (photo to video where you can extend your videos)
thanks for vid
27:22
Can you imagine standing in front of your *SMART* mirror and seeing how the clothes look on you before you buy them online?
every robot will say robotheism is the one true religion.
😂😂😂😂😂, Christianity is and the robots kind of know it now 😅
@ humans don’t have free will so christianity is false and every robot will tell you that. just go to your most advanced ai and ask it if humans have free will and see what it says.
@@Copa20777 Reshinduism is.
What if you could shop online for clothes, point your phone, or webcam at you, and try on the clothes online in a virtual mirror?
No wonder Nintendo tried to flag YouTube gameplay
Finally AI minecraft doesn't have dementia
Ok I played it and it's only 20% better, but at least it's a little better
Yes, @MattVidPro, I totally agree with you. And what I did, I took the CIA prompt and executed it. Then, I got really reported by ChatGPT. I mean, really. He nailed me. And then, I erased all my memory from him. Then, I converted that into JSON. And I told him to put that JSON inside my memory. I mean, our memory. And it's really something. I also included our previous conversations. I made a really, really long JSON. And I executed everything together. And just saying to him, Okay, now, ChatGPT, save this. Remember this in your memory. And he did. Now, what happened? He always calls me by my first name. And it's really something. It's really good. I mean, you feel the connection. It's level up. It's warmer.
Twitter? What's Twitter?
It is time to get out of this fancy AI GIF stage and become something useful for short stories and consistency. Get on with it, I am starting to go into snore mode again.
Blade runner
Running man
Predictive programming
AI WORLD.................................................................(DOMAIN EXPANSION)
I just wish someone destroys advanced voice mode and nothing more. Open source of course
I only trust the Video Arena benchmark, regardless of other claims about performance. Currently, Kling 1.5 holds the highest Elo score at 1278, followed by Minimax with a score of 1200. Runway's latest version, Runway3a, only scored 1032, while Luma's version 1.6 scored 1055. Kling, the video generator developed by Kuaishou, the Chinese short-video company, outperforms them by a significant margin. Smaller startups like Runway and Luma just can't compete with Kuaishou's resources. I really hope Sora is released soon to bring revolutionary changes to video generation technology.
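To put those Elo gaps in perspective, here is a rough sketch of what they would mean head-to-head. It assumes the arena uses the classic Elo formula with the usual 400-point scale, which is my assumption and not something the benchmark is confirmed to do; the `expected_score` helper is just an illustrative name, and the ratings are the ones quoted above.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo formula.

    Assumption: standard 400-point logistic scale; the arena's actual
    rating method may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as quoted in the comment above (not independently verified).
ratings = {"Kling 1.5": 1278, "Minimax": 1200, "Luma 1.6": 1055, "Runway3a": 1032}
leader = "Kling 1.5"

for name, rating in ratings.items():
    if name != leader:
        p = expected_score(ratings[leader], rating)
        print(f"{leader} vs {name}: expected score {p:.2f}")
```

Under that assumption, a 78-point lead over Minimax works out to roughly a 61% expected preference rate, so "highest Elo" is a real edge but not a blowout against the runner-up.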
All these groundbreaking advancements seem somewhat useless when you only get three-second clips.
☺️🍓❤️
Sora was already outdated before it came out .. Kling 1.5, Minimax and the ones shown in the video here are already ahead of Sora ... that's what happens when they get too obsessed with "safety" to a paralyzing degree.
Ahhh goodbye woke Disney employees
Now they can sleep :D
I love south park
all the ai sites are woke
Claude was really good when it first came out but its rapidly becoming the CNN of AI tools.
WTF? What do you call “woke”? Do you mean “not sleeping”?
OpenAI is irrelevant, there is this company ClosedAI, big dogs!
First
WoW hOw CoOl! YoU aRe ThE mOsT cOoL pErSoN i HaVe EvEr MeT
@@PolyPenguinDev I didn't ask for your opinion
@@fizzypizzel6477 Oh gosh, not another annoying person with that "first comment" spam...
cuz everybody has a $2000 RTX 4090 .... go play minecraft.
thanks, I'll pass.
Why are you impressed by the Minecraft??
It's an enormously inefficient way to make something that doesn't work.
Why use an enormous processor to make something with no data storage.. you'd be impressed by running a fridge on a nuclear power plant if it used AI?
As I stated in the video it is a proof of concept. Recreating minecraft is just a tech demo. If you can solve the issues that emerge in the tech demo you could have a game that theoretically has infinite endings, characters, stories, really anything you can imagine. Not limited by what videogames are traditionally limited by.
I'm not impressed by the character consistency... at all 🙂 Very, very far from anything useful. Also, we just see a few seconds. The character faces tend to morph the longer the animation is.
gross, twitter links. no thank you. I'll just search lucid ai at that point.
✅ reported as misleading
✅ unsubed
No one cares, he gains thousands of subs per new video
Why play games? It's a scar on the brain if you do it more than twenty minutes twice a week. There are so many more exciting things to do that you can learn from, like interacting with other humans. Sex is the best. This is real not facetious.
I can't tell if you're being facetious or serious with that comment.
@@Doomblade3890 That's interesting. I never would have thought someone would take it that way. One needs to know brain physiology and social science to understand its truth. It's hella real, and there's nothing funny about it. Thanks for responding. Rob