The *unofficial* Two Minute Papers discord server is now available. If you wish to volunteer/help, please let the mod(s) know! Thank you so much! reddit.com/r/twominutepapers/comments/f9u640/discord_or_slack/
I'm in
is good
Two Minute Papers when did you become a doctor?
Why is it unofficial?
@@FeatherSlowfall Because the official Two Minute Papers channel doesn't own it.
last time I was this early, Dr Károly Zsolnai-Fehér wasn't a doctor yet
Lol me 2, I'm glad he is now. He deserves it.
Same here. I'm so glad that he's called Doctor now.
You missed a video then
He was born a doctor
I could never in a hundred years figure out how to write his name.
I was so surprised when you said “doctor” that I dropped my papers. I wish someone warned me to hold on to them. Congratulations!
What a time to be alive - Congrats Doc
Károly is a Doctor now? Congrats man!
Merlin Kater watch the previous video.
He's not the kind with access to any mind-altering drugs
I was thinking the same thing
Use AI to generate random stories
Use AI to generate photos from stories
Use AI to generate videos from photos
Use AI to upscale and make it 60fps
An AI movie?
Oh yes please, damn, I want to see that Frankenstein monster of a movie. Yes please.
There's already a movie whose screenplay was generated by AI, then fully acted out.
But that was a few years back, prior to transformers.
I wonder how things have progressed since then.
Yes!
Hahaha
There's already AI music, not recommended though :))
Oh, you got your doctor's degree? Congratulations!
You are very kind, thank you. So happy!
Someone call a doctor.. My A.I. has just choked on a paper jam.
@@opendstudio7141 silly AI, I think you need to teach it to hold onto its papers better
@@king999art WHAT A TIME TO BE ALIVE!
Computer, enhance! *Rotate in 3D!*
"Now, show the perpetrator who was standing behind the corner!"
Stop! Enhance that reflection on Epstein's glasses!... Yes! I think we got... Her???
Risto Paasivirta 😂😂
First time I heard you say "Doctor". It rolled off your tongue smoothly! Beautiful! Congrats on your achievement! You have always been a PhD to us all.
Just casually dropping that "Doctor"...
It's OK, I'd do it too.
3:21 Mini Cooper. Meenee cupper. Mghig eroigoig.
Luke Faulkner www
Bird. birb. bew.
You're totally rocking that doctor title.
3:20 Interesting that those cars ended up with completely different headlights compared to the source photos. Wonder why that would be.
Looks like it generates the most common car model, and the same for the bird.
Disclaimer, I don't know much about any of this past a surface level. But I'd imagine it might be because the reference materials used to train the AI have more generic lights. The lights on the 3D render of the Mini Cooper look like they're from an old Volkswagen Golf, which would be an ideal candidate for hatchbacks because of its lack of distinct features; it's a generic car. I'd imagine that's what happens with the birds too: the AI just pulls from the reference images used to train it and applies them accordingly.
@@mustakeenbari_serena_silentium Right, but you'd think that it wouldn't change the parts that it CAN see, and just reconstruct the rest.
@@CGPacifica You're right, it could pull from the input itself since it can see it clearly, so why pull from the training material?
I'm just thinking out loud but perhaps it recognized the input as a "vehicle" and knowing it's a vehicle means it has to be at least somewhat symmetrical. Maybe the lights, or just the features on the far side (the part it can't see) are too much for it to infer and so it just approximates it to reference material instead?
It looks like it's not so much converting a 2d image to 3d, but instead creating a 3d object from what it was trained on and using the 2d image as a source for basic features-- almost like if someone was describing the photo and someone else was creating a 3d object based on that description.
Doctor! Congratulations 🎉🎊🍾🎈
You missed a video. The video before had a funny moment in the intro
This low level of detail already has applications, such as making game prototypes with more fidelity than whiteboxing: just step outside or walk a trail and capture stand-in objects. Or, and this is the big one, just feed mockup images to it.
This explains visual perception to a high degree. It shows how people deconstruct the scene they're present in into objects, and it also shows why people can project their awareness into space and look at things from another perspective. It also explains out of body experiences.
The only thing I don't understand:
why does the AI image look so different in the same perspective as the input photo?
Couldn't they project the texture from the camera perspective onto the model, so at least the visible part of the model looks better?
3:35
They could and it'd be trivial, but it's not what they're trying to do. For a commercial application, Google Earth for example, it'd be an obvious improvement, but I believe here they're trying to generate everything from scratch to create a more generalised and comprehensive method. This method will eventually work just from word descriptors, for example, while a projection-mapped approach wouldn't.
If we did that it would, essentially, be cheating. The point is to see what the neural network is "seeing" and how it builds the image. They will of course continue to make the image better until it can one day create a perfect, photorealistic recreation.
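To make the projection idea in this thread concrete, here's a minimal sketch (my own illustration, not something from the paper) of sampling per-vertex colors from the input photo with a simple pinhole camera; the focal length, coordinate convention and array layout are all assumptions.
```python
import numpy as np

def project_photo_onto_mesh(vertices, image, focal=1.0):
    """vertices: (N, 3) mesh vertices in camera space (z > 0).
    image: (H, W, 3) float photo in [0, 1]. Returns per-vertex colors."""
    H, W, _ = image.shape
    # Pinhole projection to normalized image coordinates, then to pixel indices.
    x = focal * vertices[:, 0] / vertices[:, 2]
    y = focal * vertices[:, 1] / vertices[:, 2]
    u = np.clip(((x + 1.0) * 0.5 * (W - 1)).round().astype(int), 0, W - 1)
    v = np.clip(((y + 1.0) * 0.5 * (H - 1)).round().astype(int), 0, H - 1)
    colors = image[v, u]  # the visible side reuses real pixels from the photo
    return colors         # occluded vertices would still need a learned texture
```
The occluded side is exactly where this breaks down, which is presumably why the paper generates the whole texture instead.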
What they ought to do is have AI 2 test the finished output of AI 1 against that original angle and either reward or punish AI 1 for better and worse versions of the same object.
This would train it on specific objects after which you'd have to train it on categories of things (e.g., 👎🐦 ➡ 👍🐦, 👎🦉➡ 👍🦉, 👎🦚 ➡ 👍🦚, 👎🦆 ➡ 👍🦆 ⏩ good bird)
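As a rough illustration of that "AI 2 judges AI 1" idea, here is what a GAN-style judge could look like in PyTorch. This is a hypothetical sketch of the comment's proposal, not how this particular paper is trained, and the layer sizes are arbitrary.
```python
import torch
import torch.nn as nn

class Judge(nn.Module):
    """'AI 2': scores how plausible a re-rendered view of the object looks."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, view):
        return self.net(view)

def reconstruction_reward(judge, rendered_view):
    # 'AI 1' is rewarded when the judge rates its render as realistic.
    return judge(rendered_view).mean()
```
Training would alternate between teaching the judge to separate real photos of the category from renders, and teaching the reconstruction network to fool it.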
The hummingbird example - clearly this is a bird, so object detection / categorisation would tell the net that it needs two wings since it's a bird! Humans can fill in the details because they know that a bird has two wings, and it is in flight, so any different angle is going to show the bird's wings... 2 more papers...
Doctor? Congratulations!!! Well done!!!
This channel has been a huge inspiration to me.
Every time I am amazed. Thanks for putting the time in these videos!
Cool work! A few thoughts:
A system like this is theoretically impossible if I want to use it for 'any' object, because a single input image does not contain enough information for the computer to know what the other sides look like or what 'thickness' anything has. The only reason it does know here is because it was trained on specific examples. If I were to give it a random foreign object that was not in the training set, it would likely not be able to figure some things out.
Second, I doubt the resulting model is at all a starting point for a 3D artist. They can't just 'add in the details' like you say, because the mesh itself likely has poor edge loops and is 'unhandleable' or unoptimized for any sort of software to work with it well. The same goes for the generated UV and texture. Nothing beats the modeling and unwrapping skills of a human, because they truly understand what makes the most sense to do. That car UV looked really inefficient, for example, using 4 different spots for every wheel when it could have been just one.
Still, it provides an interesting starting point and may venture into better applications in the future.
Thanks doc! This is awesome. Also congratulations!
What a time to be alive!
Did u just say doctor???????? Congratulations !!!!!!! May the papers be with you
Did you notice he said "Doctor"... Congrats on your thesis dude
Nice to hear that you are now a doctor :)
Hope you keep up the good videos
This would serve as a huge bonus not only for robots but for anything relating to entertainment, especially modeling for video games and movies. Really awesome to see.
I think what I love so much about these videos is that you can see that the technique is still not perfect, and that in the future we will likely look back on this and scoff, but still see it as one of many stepping stones in computer generation. You get the sense you’re watching history in the making.
Now THIS is scary, congrats on your Doctorate!
Congrats on finishing your Ph. D, Károly!! Hoping to publish soon, as well! You're an inspiration!!
Congratulations doctor.
This software is to die for!
Congrats Karoly!!! You became a doctor!
It's pretty impressive generating the 3D, but it looks like it would be better off projecting the photo onto the mesh after that.
I did that once with a 3D face mesh inferred from a photo (using VRN), and you are absolutely right, the projected texture result looked much better.
Wow it's amazing,
Thanks dr.
3:00
input - The dude she tells me not to worry about
CMR - Me
This one got me good. 😄
This technique has been around for over a decade! I saw a documentary called CSI NY back in 2009 where an investigator zoomed, rotated and enhanced a photo using a computer program like this.
What if it's given 3 or more pictures?
It would greatly improve the photogrammetry process.
Yeah, I'm definitely thinking this will enhance photogrammetry for the time being, rather than outright replacing it. It'll probably make scanning reflective objects and parsing out shadows a lot easier, so objects ideally wouldn't have to be shot in diffuse ambient light to extract textures and minimize reflections.
It would make sense; of course, there would be a lot of complexity to overcome. Conceptually you should be able to use trigonometry on 3 separate images to make a really accurate model, especially if the program knew the exact angle each picture was taken from.
As somebody who has reimplemented this paper for a dissertation, it does (usually) already use multiple views during training for better 3D comprehension. This is not mandatory, but tends to help a lot with the 3D shape
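For the trigonometry idea a couple of comments up: with known camera angles that is classical multi-view triangulation rather than anything learned. A minimal sketch, assuming standard (3, 4) projection matrices and matched pixel observations of the same point:
```python
import numpy as np

def triangulate_point(projections, pixels):
    """projections: list of (3, 4) camera matrices P = K [R | t].
    pixels: matching list of (u, v) observations of the same point."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])   # each view contributes two linear constraints
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)        # least-squares solution: last right singular vector
    X = vt[-1]
    return X[:3] / X[3]                # homogeneous -> Euclidean 3D point
```
The learned method differs in that it fills in the unseen side from category knowledge, which triangulation alone can never do.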
You could still improve this neural network if it distinguished between objects and represented their approximate geometry in space. In addition, it could determine where the wheel or any other part should be located, then color the details and assemble the whole car separately; if it is low-poly modeling from one object, it could determine where the glass is, and so on. As a result, from the photograph it would be able to assemble everything it knows from the details inside the space: cars, animals, people.
If you added a classification/recognition + modification neural network, to this one, it should get pretty close to perfect.
Whatever features the AI is unsure of, it can just assume that it's like other birds that it knows about, like humans do.
When human beings turn 2D into 3D, we fill in whatever we don't know with what we expect. I.e. when we're looking at that bird, we'd automatically fill in specific details about feathers, head shape, etc., that we normally see in birds.
Congratulations on Doctor! What a time to be alive!
DOCTOR Károly Zsolnai-Fehér! What a time to be alive!
Congratulations Doctor!
The name "Dr Kàroly Zsolnai-Fehér" just sounds right. Good addition to the intro.
This opens up many many possibilities.
This thing is revolutionary!!
Congratulations, doctor. 🎉🎊
That's amazing!!! Creating PS1-level graphics from a single photograph?!? That's insane! How would the results look if we took a synthesized image from previous works and fed it into this one? Would it end up with several artifacts? And if so, could we train a network to search for artifacts and further refine the generation process of both of the generating networks?
Neural Networks are so cool 😲.
Got to study them 😅
This system is a huge step for automation, so autonomous driving systems can generate occluded or far sides of objects the way our minds do automatically!
Congratulations on your successful thesis defense. I'm sure it was a long hard road, and it's amazing to see you set off on your new journey.
Where did you find time to get your Ph.D. between making all these videos?
how do you generate training data for this? do you have pics of the thing from different angles?
2:34 ... that's something
This teases the possibility of finally having a useful robotic house cleaner. Especially with iterative feedback from the actual environment into the planner.
no one:
me reading "ours": *soviet music starts playing*
Congrats Doctor :D
You're a doctor? Awesome congrats man!
I imagine this could be improved by an algorithm that mirrors the objects, since most natural and man-made objects are symmetrical. If the network could somehow find the mirror of least resistance (most overlap with mirror object) in the least destructive way, then small faults like single wings could be filtered out.
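One way to picture that "mirror of least resistance": reflect the predicted shape across a candidate plane and measure how far the reflection lands from the original. A toy sketch of such a symmetry penalty (my own illustration, not part of the published method):
```python
import numpy as np

def symmetry_penalty(vertices, axis=0):
    """vertices: (N, 3) predicted points; mirror across the plane where coordinate `axis` = 0."""
    mirrored = vertices.copy()
    mirrored[:, axis] *= -1.0
    # Distance from each mirrored point to its nearest original point.
    d = np.linalg.norm(mirrored[:, None, :] - vertices[None, :, :], axis=-1)
    return d.min(axis=1).mean()   # ~0 for a symmetric shape; large for a missing wing
```
Minimising this over a few candidate planes would favour the most symmetric completion and could catch small faults like a single wing.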
I bet that in 10 years from now we will be able to watch all our favorite movies in complete 3D reconstruction in real time, having the original movie as reference only. In 20 years from now, we will enjoy them as hologram projection movies.
Doctor! Very much compliments! Congratulations!⭐⭐⭐🎊🎉🥳
Hey, I'm working on a publication based on this research and would love to add a citation of your paper, but I'm having problems executing the code (Issue #6 on your github repo).
Would appreciate assistance!
I wonder if you could use a library of existing 3D models, then use image analysis to work out what is in the image, and find a relevant 3D model to use as a starting point.
Can it work with multiple images to improve the 3D model?
The first thing I thought of when I watched this was that this algorithm could be part of a network that watches old movies, creates models of all the objects in a scene and then renders an enhanced version of the same footage. We've already seen RNNs that can ensure temporal coherence, and physics-based simulations that would provide enough information to perhaps even enhance footage considered to be damaged beyond repair. Is it very far fetched to think that a similar approach might be able to do the same with audio? I'm absolutely fascinated and can't wait to see someone try that.
How can I use this project for an avatar-based startup? If you could help, I'd be glad. Thanks.
If I were a police officer looking at those 3D car models, the blurry CMR looked more accurate to the shapes and branding styles of the car manufacturer than the newer one did. It was hard to tell what brand of car the newer one was making.
I love not understanding anything, but still being surprised by stuff like this.
Any big progress since this video? (2 years old). Very impressive
congrats doc!
Is it a similar technique used to create the 3D representation of a city in Google Maps?
Amazing video btw !
@@Crayphor I remember hearing how the developers behind Microsoft Flight Simulator 2020 used AI to help build the cities in that game from real maps. Something like this would probably have an even better effect. Perhaps, one day, we can have Street View without the need for cars with 360-degree cameras to drive around the world.
Google uses photogrammetry technique.
I think Google Maps is using something based on the Category-specific Mesh Reconstruction method, or at least related to it. You usually see those jagged artifacts in Maps which appear in CMR.
@@Crayphor like trees!
Thank you all for the knowledge!
Happy new PhD.
Thank you.
And what a paper!
I can imagine those robots in the streets in the very near future, like 10 years from now.
Congratulations Doctor ;)
Could this be applied to 2D satellite imagery to generate 3D surfaces?
Maybe giving the network the 3D axis and light direction could help a lot, or even mirroring one face of the object so the final model is actually symmetrical. I feel like a few features could greatly improve the end result.
Can you hybrid this with photogrammetry to create ultra accurate and error resistant photogrammetry?
Thank you doctor.. now teach us more about A.I., because we are curious..!
Congratulations Dr. Zsolnai-Fehér!
This is one of those things that is incredibly technically difficult but can't always be appreciated. People who aren't aware of what goes into those renders might look at them and think "wow, that's incredibly fuzzy and inaccurate. It barely even looks like a _____!" But I'm very glad to be able to appreciate this technology with the others watching :)
Imagine retraining the network with two slightly differently displaced images just like your eyes!
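Two slightly displaced views are basically a stereo pair; for rectified cameras the extra information is the per-pixel disparity, which maps to depth as below. The focal length and baseline numbers are just illustrative assumptions.
```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a pixel matched across a rectified stereo pair."""
    return focal_px * baseline_m / max(disparity_px, 1e-6)

# e.g. a 700 px focal length, a 6.5 cm "eye" baseline and 10 px of disparity
# put the point roughly 4.6 m away.
print(depth_from_disparity(700, 0.065, 10))
```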
THE TIME HAS FINALLY COME FOR 3D
Two more papers down the line™ and it can include physics on those objects
Maybe this could improve street view when transitioning from a position to the next.
*3D artists* : *damn dem machines , dey took 'er jobs*
what a time to be alive!!! i love this so much!
This is such a nice and curious community you gathered here! Congratulation on creating something socially significant in the digital age with digital measures :)
I really hope we'll be able to rebuild a 3D world from street view images without having to drive all the streets over again.
Probably gonna happen anyway; those images get outdated. In my area there are images of stores, chains, etc., which are long gone.
@@joshinils The problem is that there are still plenty of places that have only been photographed once quite a while ago. Definitely "popular" places will have their imagery updated anyway, but it would still be great to have a way to cover the less popular areas.
Cheers for what you are doing doc!!!!
This kind of software can be very useful for easily creating video game assets from images. Is there anywhere I can download it?
Is there already some free software for making 3D from video or multiple photos?
This would be great for game development.
Hoping for some software 4 papers down the line 👌🏻
Was it only trained on birds and cars? If so, how much effort does it take to get it capable of handling other objects?
Thank god I never learned any kind of 3d modelling /sculpting.
I knew this day would come.
You'd be surprised at what's coming in other areas of these fields..
Yes this will benefit every designer...
Why do most projects use Torch?
I could imagine a followup paper that combines multiple inputs to create a more accurate 3D image.
Dr! Congratulations!!!
I wish you'd talk more about the limitations of such techniques.
TMP: "Hold onto your papers!"
Also TMP: *Rotates SCP 173 on a platter*
Love the new "Hold on to your papers" icon
3:17 - clearly (by the fact that the lights are from a completely different car) -- they are pulling in models from elsewhere, so why don't they look as good as CAD images?
Hmm. Weird. Nice spot
This will make Google Maps street view so much better!
What would be clever would be if it could accurately remove the natural shading and lighting in the input image. So that lighting and shading can be supplied by regular means by a GPU. Otherwise you'll get shade on shade if the models are used in a regular graphics engine.
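A crude picture of what "removing the natural shading" could mean: if per-pixel normals and a light direction were estimated, baked-in Lambertian shading could be divided out to leave an approximate albedo the engine can re-light. All of this is an illustrative assumption, not something the paper does.
```python
import numpy as np

def remove_lambertian_shading(image, normals, light_dir, ambient=0.1):
    """image: (H, W, 3) in [0, 1]; normals: (H, W, 3) unit vectors; light_dir: (3,)."""
    light_dir = light_dir / np.linalg.norm(light_dir)
    shading = ambient + np.clip(normals @ light_dir, 0.0, 1.0)      # (H, W) shading map
    albedo = image / np.maximum(shading[..., None], 1e-3)
    return np.clip(albedo, 0.0, 1.0)   # texture without baked-in light, ready to re-light
```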
Great vid Dr.!
This is gonna be so useful in game development.