The *unofficial* Two Minute Papers discord server is now available. If you wish to volunteer/help, please let the mod(s) know! Thank you so much! reddit.com/r/twominutepapers/comments/f9u640/discord_or_slack/
I'm in
is good
Two Minute Papers when did you become a doctor?
Why is it unofficial?
@@FeatherSlowfall Because the official Two Minute Papers channel doesn't own it.
last time I was this early, Dr Károly Zsolnai-Fehér wasn't a doctor yet
Lol me 2, I'm glad he is now. He deserves it.
Same here. I'm so glad that he's called Doctor now.
You missed a video then
He was born a doctor
I could never in a hundred years figure out how to write his name.
I was so surprised when you said “doctor” that I dropped my papers. I wish someone warned me to hold on to them. Congratulations!
What a time to be alive - Congrats Doc
Károly is a Doctor now? Congrats man!
Merlin Kater watch the previous video.
He's not the kind with access to any mind-altering drugs
I was thinking the same thing
Use AI to generate random stories
Use AI to generate photos from stories
Use AI to generate videos from photos
Use AI to upscale and make it 60fps
An AI movie?
Oh yes please, damn, I want to see that Frankenstein monster of a movie. Yes please.
There's already a movie whose screenplay was generated by AI, then fully acted out.
But that was a few years back, prior to transformers.
I wonder how things have progressed since then.
Yes!
Hahaha
There's already AI music, not recommended though :))
Oh, you got your doctor's degree? Congratulations!
You are very kind, thank you. So happy!
Someone call a doctor.. My A.I. has just choked on a paper jam.
@@opendstudio7141 silly AI, I think you need to teach it to hold onto its papers better
@@king999art WHAT A TIME TO BE ALIVE!
Computer, enhance! *Rotate in 3D!*
"Now, show the perpetrator who was standing behind the corner!"
Stop! Enhance that reflection on Epstein's glasses!... Yes! I think we got... Her???
Risto Paasivirta 😂😂
First time I heard you say "Doctor". It rolled off your tongue smoothly! Beautiful! Congrats on your achievement! You have always been a PhD to us all.
Just casually dropping that "Doctor"...
It's OK, I'd do it too.
3:21 Mini Cooper. Meenee cupper. Mghig eroigoig.
Luke Faulkner www
Bird. birb. bew.
You're totally rocking that doctor title.
3:20 Interesting that those cars ended up with completely different headlights compared to the source photos. Wonder why that would be.
Looks like it generates the most common car model, and the same for the bird.
Disclaimer, I don't know much about any of this past a surface level. But I'd imagine it might be because the reference materials used to train the AI have more generic lights. The lights on the 3D render of the Mini Cooper look like they're from an old Volkswagen Golf, which would be an ideal candidate for hatchbacks because of its lack of distinct features; it's a generic car. I'd imagine that's what happens with the birds too: the AI just pulls from the reference images used to train it and applies them accordingly.
@@mustakeenbari_serena_silentium Right, but you'd think that it wouldn't change the parts that it CAN see, and just reconstruct the rest.
@@CGPacifica You're right, it could pull from the input itself since it can see it clearly, so why pull from the training material?
I'm just thinking out loud but perhaps it recognized the input as a "vehicle" and knowing it's a vehicle means it has to be at least somewhat symmetrical. Maybe the lights, or just the features on the far side (the part it can't see) are too much for it to infer and so it just approximates it to reference material instead?
It looks like it's not so much converting a 2d image to 3d, but instead creating a 3d object from what it was trained on and using the 2d image as a source for basic features-- almost like if someone was describing the photo and someone else was creating a 3d object based on that description.
Doctor! Congratulations 🎉🎊🍾🎈
You missed a video. The video before had a funny moment in the intro
This low level of detail already has applications, such as making game prototypes with more fidelity than whiteboxing: just step outside or walk a trail and capture stand-in objects. Or, and this is the big one, just feed mockup images to it.
This explains visual perception to a high degree. It shows how people deconstruct the scene they're present in into objects, and it also shows why people can project their awareness into space and look at things from another perspective. It also explains out of body experiences.
The only thing I don't understand:
why does the AI image look so different in the same perspective as the input photo?
Couldn't they project the texture from the camera perspective onto the model, so at least the visible part of the model looks better?
3:35
They could and it'd be trivial, but it's not what they're trying to do. For a commercial application, Google Earth for example, it'd be an obvious improvement, but I believe here they're trying to generate everything from scratch to create a more generalised and comprehensive method. This method will eventually work just from word descriptors, for example, while a projection-mapped approach wouldn't.
If we did that it would, essentially, be cheating. The point is to see what the neural network is "seeing" and how it builds the image. They will of course continue to make the image better until it can one day create a perfect, photorealistic recreation.
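To make the projection idea in this thread concrete, here's a minimal sketch (my own illustration, not something from the paper) of sampling per-vertex colors from the input photo with a simple pinhole camera; the focal length, coordinate convention and array layout are all assumptions.
```python
import numpy as np

def project_photo_onto_mesh(vertices, image, focal=1.0):
    """vertices: (N, 3) mesh vertices in camera space (z > 0).
    image: (H, W, 3) float photo in [0, 1]. Returns per-vertex colors."""
    H, W, _ = image.shape
    # Pinhole projection to normalized image coordinates, then to pixel indices.
    x = focal * vertices[:, 0] / vertices[:, 2]
    y = focal * vertices[:, 1] / vertices[:, 2]
    u = np.clip(((x + 1.0) * 0.5 * (W - 1)).round().astype(int), 0, W - 1)
    v = np.clip(((y + 1.0) * 0.5 * (H - 1)).round().astype(int), 0, H - 1)
    colors = image[v, u]  # the visible side reuses real pixels from the photo
    return colors         # occluded vertices would still need a learned texture
```
The occluded side is exactly where this breaks down, which is presumably why the paper generates the whole texture instead.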
What they ought to do is have AI 2 test the finished output of AI 1 against that original angle and either reward or punish AI 1 for better and worse versions of the same object.
This would train it on specific objects after which you'd have to train it on categories of things (e.g., 👎🐦 ➡ 👍🐦, 👎🦉➡ 👍🦉, 👎🦚 ➡ 👍🦚, 👎🦆 ➡ 👍🦆 ⏩ good bird)
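As a rough illustration of that "AI 2 judges AI 1" idea, here is what a GAN-style judge could look like in PyTorch. This is a hypothetical sketch of the comment's proposal, not how this particular paper is trained, and the layer sizes are arbitrary.
```python
import torch
import torch.nn as nn

class Judge(nn.Module):
    """'AI 2': scores how plausible a re-rendered view of the object looks."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, view):
        return self.net(view)

def reconstruction_reward(judge, rendered_view):
    # 'AI 1' is rewarded when the judge rates its render as realistic.
    return judge(rendered_view).mean()
```
Training would alternate between teaching the judge to separate real photos of the category from renders, and teaching the reconstruction network to fool it.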
The hummingbird example - clearly this is a bird, so object detection / categorisation would tell the net that it needs two wings since it's a bird! Humans can fill in the details because they know that a bird has two wings, and it is in flight, so any different angle is going to show the bird's wings... 2 more papers...
Doctor? Congratulations!!! Well done!!!
This channel has been a huge inspiration to me.
Every time I am amazed. Thanks for putting the time in these videos!
Cool work! A few thoughts:
A system like this is theoretically impossible if I want to use it for 'any' object, because a single input image does not contain enough information for the computer to know what the other sides look like or what 'thickness' anything has. The only reason it does know here is because it was trained on specific examples. If I were to give it a random foreign object that was not in the training set, it would likely not be able to figure some things out.
Second, I doubt the resulting model is at all a starting point for a 3D artist. They can't just 'add in the details' like you say, because the mesh itself likely has poor edge loops and is 'unhandleable' or unoptimized for any sort of software to work with it well. The same goes for the generated UV and texture. Nothing beats the modeling and unwrapping skills of a human, because they truly understand what makes the most sense to do. That car UV looked really inefficient, for example, using 4 different spots for every wheel when it could have been just one.
Still, it provides an interesting starting point and may venture into better applications in the future.
Thanks doc! This is awesome. Also congratulations!
What a time to be alive!
Did u just say doctor???????? Congratulations !!!!!!! May the papers be with you
Did you notice he said "Doctor"... Congrats on your thesis dude
Nice to hear that you are now a doctor :)
Hope you keep up the good videos
This would serve as a huge bonus not only for robots but for anything relating to entertainment, especially modeling for video games and movies. Really awesome to see.
I think what I love so much about these videos is that you can see that the technique is still not perfect, and that in the future we will likely look back on this and scoff, but still see it as one of many stepping stones in computer generation. You get the sense you’re watching history in the making.
Now THIS is scary, congrats on your Doctorate!
Congrats on finishing your Ph. D, Károly!! Hoping to publish soon, as well! You're an inspiration!!
Congratulations doctor.
This software is to die for!
Congrats Karoly!!! You became a doctor!
It's pretty impressive generating the 3D, but it looks like it would be better off projecting the photo onto the mesh after that.
I did that once with a 3D face mesh inferred from a photo (using VRN), and you are absolutely right, the projected texture result looked much better.
Wow it's amazing,
Thanks dr.
3:00
input - The dude she tells me not to worry about
CMR - Me
This one got me good. 😄
This technique has been around for over a decade! I saw a documentary called CSI NY back in 2009 where an investigator zoomed, rotated and enhanced a photo using a computer program like this.
What if it's given 3 or more pictures?
It would greatly improve the photogrammetry process.
Yeah, I'm definitely thinking this will enhance photogrammetry for the time being, rather than outright replacing it. It'll probably make scanning reflective objects and parsing out shadows a lot easier, so objects ideally wouldn't have to be shot in diffuse ambient light to extract textures and minimize reflections.
It would make sense; of course, there would be a lot of complexity to overcome. Conceptually you should be able to use trigonometry on 3 separate images to make a really accurate model, especially if the program knew the exact angle each picture was taken from.
As somebody who has reimplemented this paper for a dissertation, it does (usually) already use multiple views during training for better 3D comprehension. This is not mandatory, but tends to help a lot with the 3D shape
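For the trigonometry idea a couple of comments up: with known camera angles that is classical multi-view triangulation rather than anything learned. A minimal sketch, assuming standard (3, 4) projection matrices and matched pixel observations of the same point:
```python
import numpy as np

def triangulate_point(projections, pixels):
    """projections: list of (3, 4) camera matrices P = K [R | t].
    pixels: matching list of (u, v) observations of the same point."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])   # each view contributes two linear constraints
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)        # least-squares solution: last right singular vector
    X = vt[-1]
    return X[:3] / X[3]                # homogeneous -> Euclidean 3D point
```
The learned method differs in that it fills in the unseen side from category knowledge, which triangulation alone can never do.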
You could still improve this neural network if it distinguished between objects and represented their approximate geometry in space. In addition, it could determine where the wheel or any other part should be located, then color the details and assemble the whole car separately; if it is low-poly modeling from one object, it could determine where the glass is, and so on. As a result, from the photograph it would be able to assemble everything it knows from the details inside the space: cars, animals, people.
If you added a classification/recognition + modification neural network, to this one, it should get pretty close to perfect.
Whatever features the AI is unsure of, it can just assume that it's like other birds that it knows about, like humans do.
When human beings turn 2D into 3D, we fill in whatever we don't know with what we expect. I.e. when we're looking at that bird, we'd automatically fill in specific details about feathers, head shape, etc., that we normally see in birds.
Congratulations on Doctor! What a time to be alive!
DOCTOR Károly Zsolnai-Fehér! What a time to be alive!
Congratulations Doctor!
The name "Dr Kàroly Zsolnai-Fehér" just sounds right. Good addition to the intro.
This opens up many many possibilities.
This thing is revolutionary!!
Congratulations, doctor. 🎉🎊
That's amazing!!! Creating PS1-level graphics from a single photograph?!? That's insane! How would the results look if we took a synthesized image from previous works and fed it into this one? Would it end up with several artifacts? And if so, could we train a network to search for artifacts and further refine the generation process of both of the generating networks?
Neural Networks are so cool 😲.
Got to study them 😅
This system is a huge step for automation, so autonomous driving systems can generate occluded or far sides of objects the way our minds do automatically!
Congratulations on your successful thesis defense. I'm sure it was a long hard road, and it's amazing to see you set off on your new journey.
Where did you find time to get your Ph.D. between making all these videos?
how do you generate training data for this? do you have pics of the thing from different angles?
2:34 ... that's something
This teases the possibility of finally having a useful robotic house cleaner. Especially with iterative feedback from the actual environment into the planner.
no one:
me reading "ours": *soviet music starts playing*
Congrats Doctor :D
You're a doctor? Awesome congrats man!
I imagine this could be improved by an algorithm that mirrors the objects, since most natural and man-made objects are symmetrical. If the network could somehow find the mirror of least resistance (most overlap with mirror object) in the least destructive way, then small faults like single wings could be filtered out.
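One way to picture that "mirror of least resistance": reflect the predicted shape across a candidate plane and measure how far the reflection lands from the original. A toy sketch of such a symmetry penalty (my own illustration, not part of the published method):
```python
import numpy as np

def symmetry_penalty(vertices, axis=0):
    """vertices: (N, 3) predicted points; mirror across the plane where coordinate `axis` = 0."""
    mirrored = vertices.copy()
    mirrored[:, axis] *= -1.0
    # Distance from each mirrored point to its nearest original point.
    d = np.linalg.norm(mirrored[:, None, :] - vertices[None, :, :], axis=-1)
    return d.min(axis=1).mean()   # ~0 for a symmetric shape; large for a missing wing
```
Minimising this over a few candidate planes would favour the most symmetric completion and could catch small faults like a single wing.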
I bet that in 10 years from now we will be able to watch all our favorite movies in complete 3D reconstruction in real time, having the original movie as reference only. In 20 years from now, we will enjoy them as hologram projection movies.
Doctor! Very much compliments! Congratulations!⭐⭐⭐🎊🎉🥳
Hey, I'm working on a publication based on this research and would love to add a citation of your paper, but I'm having problems executing the code (Issue #6 on your github repo).
Would appreciate assistance!
I wonder if you could use a library of existing 3D models, then use image analysis to work out what is in the image, and find a relevant 3D model to use as a starting point.
Can it work with multiple images to improve the 3D model?
The first thing I thought of when I watched this was that this algorithm could be part of a network that watches old movies, creates models of all the objects in a scene and then renders an enhanced version of the same footage. We've already seen RNNs that can ensure temporal coherence, and physics-based simulations that would provide enough information to perhaps even enhance footage considered to be damaged beyond repair. Is it very far fetched to think that a similar approach might be able to do the same with audio? I'm absolutely fascinated and can't wait to see someone try that.
How can I use this project for an avatar-based startup? If you could help, I'd be glad. Thanks.
If I were a police officer looking at those 3D car models, the blurry CMR looked more accurate to the shapes and branding styles of the car manufacturer than the newer one did. It was hard to tell what brand of car the newer one was making.
I love not understanding anything, but still being surprised by stuff like this.
Any big progress since this video? (2 years old). Very impressive
congrats doc!
Is it a similar technique used to create the 3D representation of a city in Google Maps?
Amazing video btw !
@@Crayphor I remember hearing how the developers behind Microsoft Flight Simulator 2020 used AI to help build the cities in that game from real maps. Something like this would probably have an even better effect. Perhaps, one day, we can have Street View without the need for cars with 360-degree cameras to drive around the world.
Google uses photogrammetry technique.
I think Google Maps is using something based on the Category-specific Mesh Reconstruction method, or at least related to it. You usually see those jagged artifacts in Maps which appear in CMR.
@@Crayphor like trees!
Thank you all for the knowledge!
Happy new PhD.
Thank you.
And what a paper!
I can imagine those robots in the streets in the very near future, like 10 years from now.
Congratulations Doctor ;)
Could this be applied to 2D satellite imagery to generate 3D surfaces?
Maybe giving the network the 3D axis and light direction could help a lot, or even mirroring one face of the object so the final model is actually symmetrical. I feel like a few features could greatly improve the end result.
Can you hybrid this with photogrammetry to create ultra accurate and error resistant photogrammetry?
Thank you doctor.. now teach us more about A.I., because we are curious..!
Congratulations Dr. Zsolnai-Fehér!
This is one of those things that is incredibly technically difficult but can't always be appreciated. People who aren't aware of what goes into those renders might look at them and think "wow, that's incredibly fuzzy and inaccurate. It barely even looks like a _____!" But I'm very glad to be able to appreciate this technology with the others watching :)
Imagine retraining the network with two slightly differently displaced images just like your eyes!
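Two slightly displaced views are basically a stereo pair; for rectified cameras the extra information is the per-pixel disparity, which maps to depth as below. The focal length and baseline numbers are just illustrative assumptions.
```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a pixel matched across a rectified stereo pair."""
    return focal_px * baseline_m / max(disparity_px, 1e-6)

# e.g. a 700 px focal length, a 6.5 cm "eye" baseline and 10 px of disparity
# put the point roughly 4.6 m away.
print(depth_from_disparity(700, 0.065, 10))
```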
THE TIME HAS FINALLY COME FOR 3D
Two more papers down the line™ and it can include physics on those objects
Maybe this could improve street view when transitioning from a position to the next.
*3D artists* : *damn dem machines , dey took 'er jobs*
what a time to be alive!!! i love this so much!
This is such a nice and curious community you gathered here! Congratulation on creating something socially significant in the digital age with digital measures :)
I really hope we'll be able to rebuild a 3D world from street view images without having to drive all the streets over again.
Probably gonna happen anyway; those images get outdated. In my area there are images of stores, chains, etc., which are long gone.
@@joshinils The problem is that there are still plenty of places that have only been photographed once quite a while ago. Definitely "popular" places will have their imagery updated anyway, but it would still be great to have a way to cover the less popular areas.
Cheers for what you are doing doc!!!!
This kind of software can be very useful for easily creating video game assets from images. Is there anywhere I can download it?
Is there already some free software for making 3D from video or multiple photos?
This would be great for game development.
Hoping for some software 4 papers down the line 👌🏻
Was it only trained on birds and cars? If so, how much effort does it take to get it capable of handling other objects?
Thank god I never learned any kind of 3d modelling /sculpting.
I knew this day would come.
You'd be surprised at what's coming in other areas of these fields..
Yes this will benefit every designer...
Why do most projects use Torch?
I could imagine a followup paper that combines multiple inputs to create a more accurate 3D image.
Dr! Congratulations!!!
I wish you'd talk more about the limitations of such techniques.
TMP: "Hold onto your papers!"
Also TMP: *Rotates SCP 173 on a platter*
Love the new "Hold on to your papers" icon
3:17 - clearly (by the fact that the lights are from a completely different car) -- they are pulling in models from elsewhere, so why don't they look as good as CAD images?
Hmm. Weird. Nice spot
This will make Google Maps street view so much better!
What would be clever would be if it could accurately remove the natural shading and lighting in the input image. So that lighting and shading can be supplied by regular means by a GPU. Otherwise you'll get shade on shade if the models are used in a regular graphics engine.
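A crude picture of what "removing the natural shading" could mean: if per-pixel normals and a light direction were estimated, baked-in Lambertian shading could be divided out to leave an approximate albedo the engine can re-light. All of this is an illustrative assumption, not something the paper does.
```python
import numpy as np

def remove_lambertian_shading(image, normals, light_dir, ambient=0.1):
    """image: (H, W, 3) in [0, 1]; normals: (H, W, 3) unit vectors; light_dir: (3,)."""
    light_dir = light_dir / np.linalg.norm(light_dir)
    shading = ambient + np.clip(normals @ light_dir, 0.0, 1.0)      # (H, W) shading map
    albedo = image / np.maximum(shading[..., None], 1e-3)
    return np.clip(albedo, 0.0, 1.0)   # texture without baked-in light, ready to re-light
```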
Great vid Dr.!
This is gonna be so useful in game development.