The Turbulent Rise of AI Avatars
- Published: Jun 26, 2024
- Deepfake? We don't really use that phrase around here. We call it "AI Avatars" now.
NVIDIA Contest Link: nvda.ws/422biZ6
Getting Started Blog: nvda.ws/3O7f8up
Dev Blog: nvda.ws/490uadi
This integration is in partnership with NVIDIA
check out my leaderboard website at:
leaderboard.bycloud.ai/
Research Sauces
Animate Anyone
[Paper] arxiv.org/pdf/2311.17117.pdf
[Project Page] github.com/HumanAIGC/AnimateA...
[Unofficial Implementation 1] github.com/guoqincode/Open-An...
[Unofficial Implementation 2] github.com/MooreThreads/Moore...
MagicAnimate
[Project Page] showlab.github.io/magicanimate/
[Paper] arxiv.org/abs/2311.16498
[Code] github.com/magic-research/mag...
DreaMoving
[Project Page] dreamoving.github.io/dreamoving/
[Paper] arxiv.org/abs/2312.05107
Outfit Anyone
[Project Page] humanaigc.github.io/outfit-an...
[Demo] huggingface.co/spaces/HumanAI...
Cloth2Tex
[Project Page] tomguluson92.github.io/projec...
[Code] github.com/HumanAIGC/Cloth2Tex
[Paper] tomguluson92.github.io/projec...
VividTalk
[Project Page] humanaigc.github.io/vivid-talk/
[Paper] arxiv.org/pdf/2312.01841.pdf
DreamTalk
[Project Page] dreamtalk-project.github.io/
[Paper] arxiv.org/abs/2312.09767
[Code] github.com/ali-vilab/dreamtalk
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music 1] massobeats - peach prosecco
[Music 2] massobeats - lavender
[Profile & Banner Art] / pygm7
[Video Editor] @askejm
0:00 Intro
2:11 Animate Anyone
5:17 MagicAnimate
6:40 DreaMoving
8:09 Outfit Anyone
8:48 Cloth2Tex
9:24 VividTalk
9:58 DreamTalk
10:46 NVIDIA Gen AI Contest
11:16 Outro
have no purpose just like my website
leaderboard.bycloud.ai/
check it out 😎
On the claim that GPS "had no purpose" when it was first researched in 1973: that's absolutely wrong. The US DoD needed a more reliable positioning and navigation system for its planes, rockets and missiles to prepare for the next war, especially after suffering a lot of pain in the Vietnam War. Later the government decided to release it for public use, as it would really help the economy, like A LOT.
Ya this is true, most tech has roots in warfare and ability to win wars.
things are moving scary fast even next 5 years are unimaginable right now
It will get to the point where you can't trust anything you see on a screen
oh crap, we're already at the event horizon aren't we?
@@beowulf2772 i feel like we passed it not long after ai voices got scary good
@@serronserron1320 Oop, Apple is trying to put a screen on your face 24/7, we're going to be living in the matrix
@@serronserron1320 yo many image sites changed their policy just yesterday they are allowing ai images on because no tool or their mod team can distinguish it... we already reached the point, also stable cascade a new architecture based image model is here.
Imagine if you record yourself to see if the clothes you're interested in look good on you or not, no longer having to go to test the clothes in the physical store.
Then you order the clothes and they don't fit
Finally, I can single-handedly create an anime using AI for everything
Don't use it for the story
@@Redrandddhentai story
you aint gonna make it
anime... right...
yeah, using a single hand
Well, TikTok really helped increase the speed of dev. The new model looks interesting with depth detection.
Where is the data from?
*TIKTOK*
The main use case as of right now is adverts, I am 70% sure that a few recent adverts from TEMU used this.
Motion capture is an expensive tool that the film industry has used for decades now. It got even more popular thanks to all the VTubers, VTuber-like influencers and VRChat. So this technology has some serious business use cases. Not to mention the technology that might emerge from this one. Creating and animating 3D models could be made very easy. You could even edit the movement of some people in the postprocessing of a movie / video. It might have even more business potential than generating static images, which we have all come to accept as a go-to solution for graphic design.
Yep, it will likely revolutionize VFX at least. Perhaps even getting good enough for animation companies to replace a large amount of their animation staff.
It's also obviously of interest for intelligence departments around the world.
Going through the leaderboard you referenced at the end, it was actually one of the coolest things I have seen recently. I did a master's thesis on AI in the arts in just 2020 and the difference in tech since then is honestly insane
IMO this will be cool for giving AIs their own avatars. Imagine being in a Zoom call with an AI that generates its video feed on the fly, either photoreal or straight up anime.
Your AI-generated assistant can't help you right now. They are simulating a video game in the background, behind a fake avatar of themselves working.
I'd like to imagine Neuro-sama with her own 3d functional avatar that acts according to her own will
imagine being so socially inept that you hop on calls asking Optimus prime for advice 🤦♂️
Imagine being so trendily skeptical that you completely overlook useful or suitable usecases of it down the road.
@@aaagaming2023 And yet we can infer the actual intent of the OP in spite of deflection.
"I can't think of any practical use cases for this."
Suuure you can't.
There's totally no use cases for this technology that spring to mind. Least of all shady ones.
My AI journey started from this channel few yrs back and I still keep checking your channel with the same excitement every time. Thanks mate🙂
i died when Dont ask CHina where they got data from .....
One use-case I can't wait for is virtual group chats like in GITS, where we all have our own avatars -- not unlike metaverse or vrchat, but like no wearables, just ai and software.
r34 is going to be WIIILD
It already is
lets not beat around the bush whoever manages to turn this into lewd stuff is gonna be rich as fuck
already done, the porn industry always drives development in new areas.
@@dronesflier7715 I'd even wager that the distribution of porn was a major contributing factor in the development of the internet itself.
Ya, like the video started with him asking 'why are these being made?' lol.
Jeeze, we got next level parasocial relationships with streamer/ai companion hybrids coming tomorrow. 😅
@@dronesflier7715 just like war
So, the biggest use, is if you want to make some sort of content without having to show your actual face allowing a bit more anonymity. While giving your views more to visually go off of than if you only had audio. So they can get attached to you. Without having to share too much of yourself.
Like "Vtubers" refer to streamers, or in some-cases youtubers who use a digital avatar that lip synchs to their audio. And maybe
Or in VR more generally, to make games seem more immersive.
But, when you design your own 3d model that isn't trying to be completely realistic and hook it up to some kind of motion tracking it often looks way better.
That basic idea has been popping up in big movies for awhile. Like the star wars prequels. And we have been able to make that kind of thing look good for over 10 years. However, it isn't as arbitrary.
There are also image to 3d model conversion tools that are decent. (with only a couple issues.) So I have seen some decent stuff with stringing image to 3d, use AI rigging tool.
Take key frame from image. And then maybe give it some other poses/keyframes and letting the AI or a math equation interpolate.
Like the V-tuber on the right shown at 3:05 would be really easy to do with that method. And the one on the left would be a little harder, but it would just take a couple of shaders you could download off the internet to get something like that. (But probably doing a better job with the hands.)
As you mentioned, being able to generate AI dances, poses, or animations to sell clothes seems like it could be worth some money.
The use case is probably big: identity and social, which are basic human needs.
Can any of these animate lips from input audio on a streaming video?
Who says these are useless? Saves a ton on photo shoots and video for brands 🤔I could keep going I mean if you’re a business this list is endless
Realtime AI avatars are a very real problem to solve before we can have AI-rendered, fully photorealistic virtual worlds, so I guess it's good the time and money's going in now.
AI content is always going to be distinguished by quality vs quantity. Like with the music generation stuff. People who dont have an ear for music will still put out crap no matter how good the AI gets.
0:21 what program/software is that?
I wonder if they could improve fingers by also including a reference for that. But I haven't learned the details of how that skeleton movement is obtained. Maybe including fingers in that would require some fundamental improvement.
But the more they make this movement thing look better, it will only serve to highlight anything that looks wrong. Now we see this and think oh neat look at them dance, but if the quality gets higher we will think oh gosh look at those hands
Practical use cases;
Porn,
Animation,
Stylized game graphics,
Manipulating the general public,
Movies,
generated video content
You forgot porn. Also, porn, and porn.
@@Ithirahad and hentai Damn forgot them all :/
What a comprehensive introduction! Thanks for sharing, the video is so charming and funny hahahaha
I am thankful for you citing all mentioned papers in the description. It's appreciated
Love how the AI boom has been lots of open source and of course China locks it behind $$$
stop being lazy..if u understand the reference material, u cn reverse engineer anything
I mean, the big US software companies are doing the same, but sure, there are exceptions. ClosedAI, and their overlord Microsoft, being the two most well known examples of locked doors AI research.
Is this the same Animate Anything that is now available in comfyui?
I see this as an absolute win for game development since animation will become easier for indie developers
about the face animation, i could see a use of it for dubbing and reduce the need to "butcher" the translation in favor of making the current mouth movement sync well.
Great video as always!
i got the nastiest most evil looking ai ad before this
He doesn't use an AdBlock or YT Premium!
Lmao
Couple of notes. If you look at the paper, it didn’t actually come from nowhere. They’re comparing their model to several previous ones released by other companies earlier that year. It is just AA that broke into public consciousness.
While the data comment was funny, it is actually a particular dataset that various companies have been using for a few years now. Alibaba actually trained 3 AA models. There was their main one trained on a variety of data.
Then there was a fashion one to compare to one previous model and the TikTok one to compare to a different model. The controversy about it being fake because of emulating existing videos is missing the point. A previous model trained on some of the TikTok vids and then took a skeleton from one of the other videos and animated it and compared how close it got to the original video with a numeric score. Then Alibaba did the same thing and showed they got a higher score. So really that part of their video was purely a comparison to another model and you shouldn't take it literally (like videos in the training data might have used similar meme songs, making it predict facial expressions better etc).
The part of their video to focus on is the one animating anime characters and iron man and all that. That’s what was using their main trained model.
All this facial and video AI stuff and all I want is opensource TTS that is realistic and doesn't require an internet connection to some 3rd party api/server. :/
not gonna lie, dancing cute anime girls are worth the effort.
Feels like we're one step closer to creating real anime girls 💀.
The AI developers when they finally achieve that despite people saying "anime girls aren't real" for years: _They called me a mad man._
Equating AI avatars and the Slinky. Well done. Well. Done.
How did you come up with "zero practical use cases"? I can make a fake me for videocalls and not be subject to showing what I'm doing or where I am.
And fortnitedancing.
no use case? just imagine how much companies can save by using fakes instead of actors to record guides, tutorials, commercials, etc
What I learned from this video is, that it is absolutely not worth it for more or less casual people to be at the bleeding edge of AI, because it WILL be outdated and replaced by something better like only 3 months later. The speed of development is crazy.
The end result will be using a photo of a person and a snippet of audio to generate a character in a videogame that you can interact with
Oh I think it will eventually be possible but not maybe in the next 5 years or so. Realism is still extremely hard to copy and for every attempt to make things as real as possible there's also counter movements to debunk fake footage and such.
you can use it for social media
Superstar Cop, bottom left, 1:11 "I don't want to get better, I want to get worse"
Has anyone seen something like VividTalk but in realtime? I'd like to stream my llm into some video as well as audio (currently with 11labs)
The idea that this technology doesn't have a use case is an absurd proposition. First of all, media and film production is a multi-billion dollar industry where a handful of superstar acts take the lion's share of the acting budget, regularly signing multi-million dollar contracts. Imagine being able to have Leonardo DiCaprio or Scarlett Johansson or any number of big name actors star in your independent film for pennies on the dollar. Or imagine the animation of fantasy characters being done simply by having a human act out the part and replacing them with the fantasy avatar.
Going beyond film, this technology has use cases for simplifying animation in video games and drastically reducing time and cost for character development.
In the future this technology will evolve to real time virtual reality avatars that are customized to your design and depict your every action perfectly as you perform it.
Beyond that, this research will broaden our ability to not just replicate character behaviors accurately, but to gain fine-grained control over them through text-to-video commands. "Make them jump" will convert directly to an accurate and believable video of someone jumping.
The applications of this technology are innumerable.
2:00 bycloud lives in a 4d world confirmed
Love when the Chinese researchers sneaked in quite a few memes in those demo footages
There is a lot of Money in it for Content Creators especially removing limits to budget
Dude fire vid
"0 practical usecase" I don't know what to say, it's literally in front of your eyes
Use case:
It will level the playing field on content creation. There is often a bias to faced content over faceless and a bias towards the pretty Vs the not so pretty...
With this tech everyone will have a pretty face for their content and the only factor left will be the contents quality.
I'll point out this guy's animated cloud head as my example. He believes it helps. >_
It will get easier and easier, and movie studios will love to hire someone once, have them sign a contract, never hire that person again, and use them in movies for all time and save billions of dollars.
Eventually creating a big budget AAA movie with cost basically nothing. AI to write, AI to Make and maybe one person to review.
You can't honestly believe that gps was created without a use? like seriously, just think about it.
This could be used to produce anime series and movies more cheaply. Draw a character in 2D, then film human actors performing the actions, then use this software to animate the character by combining the video and the illustration.
We are using AI to accelerate AI, using AI. Our researchers have used AI to research the uses of AI, using AI.
not in the way you're thinking of it.
Literally add with a 3D character and make an anime.
0:17 Porn the answer is porn.
Alibaba group and Tencent Arc my beloved ❤❤❤
Zero practical use cases? Imagine not having to pay actors.
Holy shit, this is a game changer for vtubing
not everything is money, some is just research.
hello video historian watching from the future to analyze the first ai videos
So they took the data from tiktok users on iPhones and grabbed the depth camera info, right?
Go Donnie!
the chinese are so far ahead, not only a lot of research comes from there but they also found a way to monetize it immediately - I believe being able to upload your body into this shopping app and then generate yourself wearing clothes should be a huge advantage for a company that's selling clothes to sell more
Mh, what makes you think that? Last known status is that the state is very behind in pivotal ml research because it is very hard to make sure that generative models don’t show or say stuff that is “not welcome”….
I like the fact that you claim that this has no use cases when most users see tons of them (in the comment section of your own video) .
This tech is a merge of multiple techniques, including OpenPose and different ControlNets, and at the speed this is evolving 90% of us will be out of a job pretty soon :P The model can definitely use an SDXL model that has been trained on license-free content.
Most people who don't know how the tech works prefer to say it's stealing artwork and copyrights and is an evil creation.
It's ok to be afraid, but that won't stop it from happening anyway. Progress is progress.
Yeah it's absolutely mental that he said this has zero practical uses and he can't see a way to make money with this.
It's such a surreally asinine and uncreative statement that it actually made me question the validity of everything else in the video lmao.
In 15 seconds of thinking, I can come up with at least 5 extremely profitable and practical uses for this, from porn (obvious) to military/intelligence applications (US publishing a video of Putin saying something he didn't, etc). He himself mentioned multiple valid uses regarding online shopping in the video...
I mean, Jesus man... you think maybe something like this could help in the entertainment industry in the creation of animated shows or CGI in fantasy scenes, etc? The uses are almost infinite. And that's without considering the tech advances this new tech will spark in the future (like how static image creation allowed video image creation). Like who knows what kinds of metaverse-related tech could spawn after this video creation/animation tech is mastered?
I know a lot of great use-cases for this *today* for creators, I don't know why this isn't obvious. Not having to hire actors when you want to make an indy film as a hobby with no money? Also video game engines will use this technology for all its models very soon.
35 seconds into the video - people need this to create more content and faster. I and my company would greatly benefit from an actual good solution because it'd be easier to create branding, short-form content for youtube shorts or tiktok.
eh, this will only drive the demand for private content platforms even higher.
Sakura being turned into an ai avatar is more horrifying than anything she has done in Heaven’s Feel LOL
No one going to talk about the left ear?
So it's for unconventionally looking people who wouldn't be appreciated by the masses that they're trying to make money off of?
If we don't live in a matrix already, certainly someone will make a matrix not far from now
pretty sure we all know what this is gonna be used for...
Thank you for calling it "twitter"
neuro sama, an ai vtuber
"GPS was invented without any use case" Are you just making things up? GPS was meant for naval navigation from the start, do you think they just sent up time synced satellites and then shopped around for use cases? It was always for navigation. Where do you get your information that you do not cite?
i think this is being used to create AI generated models to allow men to also milk simps. i seen an article about at least one which has this kind of fluid animation in an AI model.
Computer, load up Celery Man please
I’m gonna put myself in Harry Potter movies
Putin released an AI video at the tower wishing everyone happy holidays, because he was paranoid he'd get shot by a sniper if he went out. So AI avatar technology was helpful for him.
it's not for us, it's so other people can own copies of us while make them for them
nice
Hehe. I know what I am gonna do today.
Finally, a good use for TikTok
as a car content creator i approve this video
Why do it? Because it's interesting and entertaining. Bonus if YOU do it first and find some other use case for it later, and you can brag about your patented technology.
hohahhaha lad ur doing great l like this recap , keep it up
everyone is freaking out that you will speak with a robot instead of a government official in the future
i can't wait for that, nothing worse than talking to human npc...
If people in general feeling talking to the AI NPCs are better than talking to average human, I guess the flesh humanity is doomed. (Pretty low bar if you use dating app as the litmus test lol)
Appreciate your videos, however this one has a concerning shift in tone. Stating this research as both having "no purpose" and being "problematic", while your judgment of its value is dependent on whether it can "generate big money", is a grotesque mischaracterization and a warped capitalistic valuation. This research is fundamental to effectively guiding a synthesized character through animations in a way that is consistent. It is integral for, say, the creation of any animated series or synthesized movie going forward. I'm greatly looking forward to a time when anyone can create their own animation series / show / movie. As we have seen, there are times when fan projects produce really great works from their passion, rather than the lukewarm creations of corporates who design by committee and end up diluting the vision. The "turbulent" response of Twitter is irrelevant; they don't represent us. The stats reveal that only a small percent of people are responsible for 90% of the replies on Twitter. They are a vocal minority appearing larger than reality. Do not mistake their deranged opinions for consensus.
Huh
The unfortunate sinophobia in the video is sad. 😔
No no no it's hostility towards ccp.
First!
To go to the human gulag.
I think this technology is very useful even today for generating an AI avatar. Keep in mind that due to the one-child policy in China there are way more males than females living in the country. The need for a virtual companion is much greater there than in the West.
Are you racist against chinese people or just ignorant towards the data procurement strategies of western companies?
can it turn the avatar around and look away from the camera? I notice donald trump dancing has a suit with a red tie on both the front and back.
EDIT: alibaba seems to have nailed the turning around thing
here before asmongold reacts to this to say women are finished
Onlyfans models are gonna get outdated very fast
And that's a good thing
@@constantinethecataphract5949 100% agree