Stereo 3D Vision (How to avoid being dinner for Wolves) - Computerphile

Computerphile

145K views

Published: 14 Jan 2025

Comments • 133

@Jader7777 9 years ago ⁺³²
Best Computerphile video:
- Clean desk
- Tidy shelf
- Nice hair
- Classic perforated printing paper
- Popped collar
@Triantalex 1 month ago
false.
@dvoraj20 9 years ago ⁺¹⁷⁸
Frankly... this ended where I hoped it would start.
@unvergebeneid 9 years ago ⁺⁴
+Jan Dvořák Well, there _are_ extra bits. Don't know if they contain what you wanted to see but I wanted to make sure you checked.
@hoopshank 9 years ago ⁺⁶⁸
+Jan Dvořák Watch it backwards?
@simoncarlile5190 9 years ago ⁺¹
+hoopshank lololol awesome response
@nickwoodward819 7 years ago ⁺⁵
whereas your comment started where I hoped it would end.
@DeJayHank 9 years ago ⁺²⁴
Since pictures can be a bit noisy due to sensor imperfections, and the information gained from a single pixel isn't that much, Stereo vision algorithms often utilize Block Matching. It means that instead of finding a single pixel in the other image, you look at a "block" of pixels around it (a 5x5 block for example) and see if a very similar block can be found in the other picture. It is much more robust to single pixels being very distorted due to noise etc. but of course takes more computing time since now you have to work with 25 pixels for each matching test, instead of just 1.
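A minimal sketch of the block matching described above, assuming a rectified grayscale pair stored as nested lists, so that a match always lies on the same row. The names `sad` and `match_block`, the 5x5 default block and the `max_disp` search limit are illustrative choices, not anything taken from the video.

```python
def sad(left, right, row, col_l, col_r, half):
    """Sum of absolute differences between two (2*half+1) x (2*half+1) blocks."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_l + dx] - right[row + dy][col_r + dx])
    return total


def match_block(left, right, row, col, half=2, max_disp=16):
    """Return the disparity (in pixels) whose block in `right` best matches
    the block centred at (row, col) in `left`.

    Assumption: the pair is rectified and `left` comes from the left camera,
    so the match sits on the same row, shifted left by 0..max_disp pixels.
    The caller must keep (row, col) at least `half` pixels from the border.
    """
    best_disp, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:  # candidate block would leave the image
            break
        cost = sad(left, right, row, col, c, half)
        if cost < best_cost:
            best_disp, best_cost = d, cost
    return best_disp
```

With `half=2` every candidate costs 25 pixel comparisons instead of 1, which is the extra computing time the comment mentions.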
@giphe 1 year ago
That makes a lot of sense, thanks for the additional info!
@giphe 1 year ago
So is it kind of like performing a convolution from the sets of pixels from one image onto the other image and finding the closest match from that?
@Triantalex 1 month ago
??
@unvergebeneid 9 years ago ⁺⁵
You can also move one eye/camera to get "actual" 3D because it's mathematically the same as two eyes for static scenes and under certain circumstances it even resembles the 3D "qualia" if you want to call it that.
@ayanangshudasmajumder112 4 months ago
The best explanation of stereo-matching on the Internet.
@pirouettenerd2675 8 years ago ⁺⁷
Needs a part 2.
@BGBTech 9 years ago ⁺²
I once did it with low enough CPU use to run (sort of) on a Raspberry Pi at ~ 10 fps, but it was pretty crude (output depth map was 80x60, and a bit noisy/glitchy. input was a pair of 320x240 images from parallel webcams).
partial optimization was making use of blocky-VQ so that for a block of pixels (4x4), you can determine early if they are out of range (or are a nearly exact match). it was based on a trick I had used for doing motion compensation in video compression during capture (motion compensation helps somewhat with compression).
the trick greatly reduced the amount of pixel-by-pixel checks. also it worked internally using a dichromatic colorspace, partly to save space and also because it was cheaper to only compare two axes (whereas a single Y axis loses the ability for it to distinguish things based on color, reducing accuracy somewhat).
I had also tried unsuccessfully to use some designs based around Haar wavelets.
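The idea of deciding early that a block is out of range can be illustrated with a much simpler trick than the vector-quantisation scheme described above: stop summing as soon as the running cost can no longer beat the best match found so far. This is a sketch of that early-exit idea only, not a reconstruction of the commenter's code; `sad_early_exit` and its arguments are hypothetical names.

```python
def sad_early_exit(block_a, block_b, best_so_far):
    """Sum of absolute differences between two equal-sized blocks.

    Returns None as soon as the running total reaches `best_so_far`,
    because the candidate can no longer be the best match.
    The check runs once per row to keep the inner loop cheap.
    """
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
        if total >= best_so_far:
            return None  # abandoned: already no better than the current best
    return total
```

Most candidate blocks are poor matches, so most comparisons stop after a row or two rather than visiting every pixel.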
@DaanLuttik 9 years ago ⁺⁵
Could you do one about how you account for reflective objects or objects in a space where lighting isn't ambient? I have some ideas on how one could do this. But I'd like to hear the smart tricks surrounding this topic.
@FabrizioBianchi 9 years ago ⁺¹
Mike is the best speaker in the channel. AI guy comes a close second.
@DeJayHank 9 years ago ⁺¹⁶
Would have liked if you showed an example of disparity map obtained from some stereo vision algorithm. It can help show what the result would look like.
@Kruglord 9 years ago
+DeJayHank Google "disparity map" and look at the image results, you'll see a bunch of examples of what that might look like.
@DeJayHank 9 years ago ⁺⁷
+Kruglord Well I know what it looks like because I've been working with Stereo matching algorithms with and without OpenCV, but I thought it would have been a good addition in the video for people who haven't seen.
@Kruglord 9 years ago ⁺¹
Great video, I think a good follow up video might be how people have approached the correspondence problem, such as using SIFT or SURF points.
@LiborTinka 9 years ago
+Kruglord Don't forget MSER, these are especially useful (area) features for wide-baseline stereo. Feature detectors and descriptors alone are an interesting topic.
@Pr0toc01 6 years ago ⁺¹
At 6:12 he says this is only possible because they know the positions of the 2 cameras, and that if they don't know the camera positions they have to search through the whole image.
Question: Once you have done this one, two,...,ten times can you compute the cameras' relative position?
Once you calculate the cameras' relative position, could you then use that to make any future searches easier?
@AhpgZfoc4s 9 years ago ⁺²
Looks like the cameraman didn't have his customary dozen shots of espresso.
@TheDrawdex 9 years ago ⁺³
I remember doing this for a final project. :D This would've been awesome.
@TheAprone 9 years ago
When the 3D picture was zoomed into the screen at 1:32, I just barely was able to make out the image before they started messing with it. It doesn't match the simulated "answer" they displayed a few seconds later.
@steve1978ger 9 years ago ⁺³
I played around with stereo vision using OpenCV, and found it really hard. Computers are still years away from having 3d vision like even a rather simple animal.
@BriMR 9 years ago ⁺⁴
+steve1978ger That's why stereo vision is no longer used in most cases, the current laser and RGBD cameras are easier and more effective in depth feature recognition.
@gigige5928 2 years ago
....'like even rather a simple animal' - only predators have binocular vision which in turn are the most advanced biological life on this planet, like humans. overlapping field of vision might appear in some avians, but because totally different reasons (flight) and only for a few degrees, unlike predators who have true stereoscopic vision, there is no simple animal with that kind of trait
@steve1978ger 2 years ago
@@gigige5928 - okay, very simple animals may not have binocular vision, but "only predators" is an overgeneralization.
@jll8520 9 years ago ⁺¹
1:42 Shots fired at the Fine Bros
@tombombadillo1 9 years ago
just thought I'd make a point that I find I judge the distance to an object with one eye primarily with focussing; refocussing on an object until it's the least blurry, and having your brain estimate how far away it is from memory.
@DeJayHank 9 years ago ⁺¹
+Hayden Muscat Yeah there is a lot more going on in the brain for distance estimation, like just knowing what size cars usually are or a mug or whatever is enough to decently judge its distance from you even with a single image.
There are algorithms that judge depth by taking several pictures from one camera with different focus for every image. This wouldn't quite work on moving objects though since it takes time to get enough pictures for it to be useful.
@LyCaNid 5 years ago
Great explanation!
@abdulrahmanalmoamaralmadan7843 7 years ago
Thank you for your great explanation. Amazing!
@sungjinchun1094 4 years ago
Wow nice explanation! thanks
@chrispomeroyYT 9 years ago
I know you've already talked about color spaces, which was very interesting, but could you get Mike to do an episode on Gamma / Gamma Correction?
@WaqarRashid 8 years ago
It's a really helpful channel and has lots of interesting videos but I can't find some videos in order, some videos are hidden from the channel and there is only a small number of playlists. Is there any website where I can access these videos in some order? Thanks.
@LiborTinka 9 years ago
Reminds me of the "Fundamental Matrix Song", which is about the matrix connecting the two images. For more mathematically oriented people :-) Multiple-view geometry can be fun.
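For readers who have not met it, the fundamental matrix ties a point in one image to its match in the other. In the usual multiple-view geometry notation (the symbols below are standard textbook ones, not taken from the video):

$$\mathbf{x}'^{\top} F \, \mathbf{x} = 0, \qquad \mathbf{l}' = F \, \mathbf{x}$$

Here $\mathbf{x}$ and $\mathbf{x}'$ are the homogeneous pixel coordinates of the same scene point in the first and second image, $F$ is a $3 \times 3$ matrix of rank 2 defined up to scale, and $\mathbf{l}'$ is the epipolar line in the second image along which the match must lie. Since $F$ can be estimated from eight or more point matches, a handful of correspondences found by searching the whole image is enough to restrict later searches to a line.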
@OSrBurns 8 years ago
Oh god, it's a headache to calculate all the points in analytic geometry, but it is possible to use the focus of the camera to create a useful constant.
@Spongman 9 years ago ⁺¹
Monocular vision gets accurate depth from micro focus changes. Otherwise how does your eye know how to focus when you close one eye?
@pratherat 8 years ago
If I were to attempt this, I would create a coincidence map from the left and right image, representing how much each left pixel matched the corresponding right pixel, using an offset of a series of intervals from -somevalue to +somevalue. Somevalue weighed against each coincidence pixel should yield a kind of edge map depicting the differential offsets of similar pixels, interpreted as distance.
I dream about developing some kind of inverse GPU card. Instead of taking a set of 3-D polygons and rendering them onto a 2-D screen, this would take a set of 2 or more camera inputs and render them into a set of polygons in 3-D. Given my history of "invention", this has already been done.
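The coincidence map over a range of offsets can be sketched per scanline as below. This is one reading of the comment, assuming grayscale rows stored as lists and a simple absolute intensity difference as the "how much it matched" score; `disparity_row` and `max_offset` are hypothetical names.

```python
def disparity_row(left_row, right_row, max_offset):
    """For each pixel in left_row, return the offset in
    [-max_offset, max_offset] at which right_row matches it best.

    The score is the absolute intensity difference of single pixels,
    so the result is noisy on real images; comparing blocks of pixels
    instead of single pixels is the usual remedy.
    """
    width = len(left_row)
    offsets = []
    for x in range(width):
        best_offset, best_diff = 0, float("inf")
        for offset in range(-max_offset, max_offset + 1):
            xr = x - offset  # candidate column in the right row
            if 0 <= xr < width:
                diff = abs(left_row[x] - right_row[xr])
                if diff < best_diff:
                    best_offset, best_diff = offset, diff
        offsets.append(best_offset)
    return offsets
```

Larger offsets correspond to nearer points, so the returned list is a crude one-row depth map.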
@TenSeiKenZX 9 years ago
I wish I had Dr Mike Pound as a lecturer
@LLHLMHfilms 9 years ago ⁺⁸
The Rubik's cube on the sled isn't solved!!!! It's driving me crazy!!!!!!
@jesper86broberg 8 months ago
I'm considering using TOF or Stereo 3D for a QA vision project for small details (o-rings) in a production. What are the pros and cons? To me, just from reading, TOF seems the better option in most aspects, but you seem more geared towards stereo 3D? Thank you for the nice videos!
@peterjamesfinn 9 years ago
Instead of keeping the cameras in a fixed alignment, why not have some permanent feature in between the cameras of a known size (like our nose for our eyes) that allows you to calibrate each frame separately?
Sure you will get a bit of obscuring, but it makes the 3D position calculation less reliant on keeping the relative position and angle of the two cameras absolutely fixed. You would need to use a bit of inference (or possibly something like staccato) to fill in the positions of the blank bit.
@Kruglord 9 years ago
+peterjamesfinn The video glosses over this fact (for good reason) but the camera calibration step they mentioned is actually a very important and relatively complex step in the whole system. Because the relative position of the two cameras determines the measurement of the rest of the system, even very small errors in their estimated position can have huge impacts on the precision of everything else. For this reason, in stereo-vision systems the cameras tend to be rigidly mounted together, and their position determined in a separate step before they're used in any other measurements.
Now, it's common to have a system that only has a single camera, that takes lots of pictures and moves around the scene alone rather than having a pair. This method relies on a simultaneous calculation of the camera's position at each photo location, and the location of the features in each photo. This can be fairly accurate as well, but it also has its limitations. Specifically, the scale is indeterminate without additional observations, so you can only record the shape of the scene, not the size. Also, as mentioned in the video, the camera's locations can only be determined through features common to different images, which leads us into the correspondence problem. There are methods available to solve this problem, but they're complex and take a lot of time to calculate, even with today's powerful computers.
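For the rigidly mounted, parallel-camera case, the depth relation behind these remarks can be sketched as follows. It assumes rectified images and a pinhole camera model; `focal_px`, `baseline_m` and the example rig numbers are made up for illustration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a rectified, parallel stereo pair: Z = f * B / d.

    focal_px     -- focal length in pixels (from calibration)
    baseline_m   -- distance between the two camera centres in metres
    disparity_px -- horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point seen by both cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, 10 cm baseline.
near = depth_from_disparity(700, 0.10, 70)           # large disparity, close point
far = depth_from_disparity(700, 0.10, 7)             # small disparity, distant point
far_off_by_one = depth_from_disparity(700, 0.10, 6)  # same point with a 1 px error
```

Because depth varies as 1/d, a one-pixel error moves the distant point by far more than it would move the near one, which is why small calibration errors are so costly. The baseline also fixes the overall scale, which is exactly what a single moving camera lacks unless the distance it moved is known.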
@trunc8 5 years ago
So finally, how is the occlusion problem solved (feature hidden in one view, existing in the other)?
@tedchirvasiu 9 years ago ⁺²
Where has the object-oriented programming video gone?
@JegErHolyNoah 9 years ago
+Ted Chirvasiu Yeah I'd like to know too
@RomainQ 9 years ago
That wolf sound effect at the beginning... I'm sure I've heard it in many different games but I can't find a source for it!
@ancientapparition1638 9 years ago
Dota 2 when the clock hits night time.
@RomainQ 9 years ago
+Ancient Apparition I also heard it in Dofus, WoW and HoTS
Could it be possible to recreate this with 2 mobile phones? You could calculate the distance between the two phones using GPS location, but the precision could possibly be too bad.
@SyukriLajin 9 years ago ⁺¹
No need for GPS, wifi/bluetooth triangulation could do it more precisely. Of course we would need 3 or more phones.
@tetradb_ 9 years ago
+Purple Blaze Google (not sure of the project name, but it's to do with Google Earth/Maps) and Microsoft (Photo Tourism) already do this using people's images, then they build up a 3D map of an area using those individual images. I don't think that accurate (or any) GPS data is a requirement, I think the algorithms used are clever enough to extrapolate the position.
@highwayrunner9771 4 years ago
part II please
@piwithatsme 9 years ago
Moving your head also works to help see depth with one eye
@OsamaRana 9 years ago ⁺¹
+WiWiPiWiWi Essentially you're using one eye to gather information that you'd normally get with two eyes
@DeJayHank 9 years ago ⁺²
+WiWiPiWiWi That would work with a camera as well, but only if the object(s) you are taking pictures of is stationary during the whole process, and you know how far you moved the camera.
@harshitkhandelwal1243 5 years ago
Why is he not looking at the camera?
@noahwilliams8996 9 years ago
But how do we know what direction that line is going in?
@ACDCBoy62 9 years ago
+Noah Williams It's a straight line between the camera and the object.
@noahwilliams8996 9 years ago
Elias Simon
No I mean the one that the other camera has that can be quickly checked to see if it has the same value.
@Larbydarg 9 years ago
So how do you determine the depth of a point in one view if it's occluded in the other view? Can you?
@Kruglord 9 years ago ⁺¹
+MrAlbedo39 You can't really determine the coordinate of a point that only appears in one image. All you can say is that it exists and falls on the epipolar line outside the bounds of the second image, so it is either too close or too far to be seen by both cameras.
@Larbydarg 9 years ago
+Kruglord I'm more interested in how that occluded point ends up being represented in the 3-D result. Does it appear as a flaw that must be manually corrected?
@ITR 9 years ago
Wouldn't ultrasound be easier or cheaper?
@Thomcat 9 years ago
+MMMIK13 Accuracy, lack of colour information, limited depth...
At my work we use tens of fixed IR cameras to do motion capture, there is no way it could be reasonably done with ultrasound.
@ITR 9 years ago
***** Hmm, but in the example at the beginning he talked about using Stereo 3D Vision as an alternative to lasers, and in that case, do you think ultrasound would work better?
@SyukriLajin 9 years ago
Or radar. Google's Project Soli is doing exactly this. Pretty interesting
@aka5 9 years ago
+Syukri Lajin Radar uses ultrasound dunnit, isn't that what the previous guy was referring to?
@SyukriLajin 9 years ago ⁺¹
Akașșș ultrasound uses.. sound. radar uses electromagnetic waves. if i'm not wrong
@MrRJReynolds 9 years ago
How do the cameras function when there is specularity?
@Kruglord 9 years ago
+R.J. Reynolds Specularity (highly reflective surfaces) generally causes the correspondence to break down, resulting in what effectively appears to be occlusion in the depth map.
@rufioh 9 years ago
Would this be easier with three cameras instead of 2?
@ahmetmelihafsar2352 4 years ago
I don't think so. For example if you use 3 cameras named a, b, c, you have to draw the triangles for a-b, b-c, a-c. It would be more accurate, but it would cost 3 times the processing power.
@raj61091 4 years ago
I learnt how to explain something from your video, also a bit about stereo vision
@chrisradford1157 6 years ago
Does anyone know if I can use this method if I have the cameras' GPS coordinates when the images were taken? Can I calibrate the cameras using that data and follow the same method?
@memorablename5187 7 years ago
I had a question on an exam: 'Is stereo vision possible with only 1 camera? If so, what ancillary data is needed?' How would you guys answer this????
@mitigatekeeps1371 7 years ago
No.
@demetriuspsf 9 years ago
I work with this technique to reconstruct faces from photos.
@GISP 9 years ago
Can it be used in real time?
E.g. in a car, to give 100% accuracy on distance to stuff?
Can it be used in Augmented Reality applications and games in real time?
@TestDrivenUK 9 years ago ⁺¹
+GISP Subaru have a system (called 'EyeSight' strangely enough) that uses 'stereo' cameras mounted high in the windscreen to detect obstacles and warn the driver if they get too close, applying the brakes if necessary.
@Nulono 8 years ago
1:49 *farther
@unveil7762 9 months ago
@titaniumdiveknife 9 years ago ⁺³
I understood about a quarter of that. that's enough for today.
@AnimeReference 9 years ago
If somehow my eyes were moved further apart would I have distorted vision?
Is the distance between the eyes of a human even constant? or are they in the same spot since birth?
I wouldn't bother with a system for eye distance calibration if I were a ... god?
@Tfin 9 years ago ⁺¹
+Jake Surname You might have trouble for a while, but you'd adjust, because your pattern matching ability is still much better than a computer's.
@liquidminds 9 years ago
+Jake Surname since you won't wake up one morning with your eye-distance being completely different, it doesn't matter. even if you grow a little, the changes are so small, that you can easily adapt.
If you woke up and your vision was impaired by
@Tfin 9 years ago ⁺²
You can simulate moving your eyes with a series of mirrors. Take one of those toy periscopes, and turn it sideways. Now your eyes are suddenly very far apart.
@Frrk 9 years ago
+R3Testa Suddenly, superhuman depth perception :)
@camius1 7 years ago
I'm trying to implement this using IR cameras in real-time without any luck haha
@remybrandt8347 8 years ago
use filters on the receiver. That kills sunshine.
@99Davidcool 4 years ago ⁺¹
This is easier with a plenoptic camera
@olatunjifelix2102 4 years ago
great
@elerosvecchio 7 years ago
MIKE! Finish the Rubik's cube damn it
@j7ndominica051 9 years ago
I can't see anything special in the "magic eyes" picture. There are 10 by 8 repeated patterns of noise.
@Sazoji 9 years ago
+j7ndominica0 I see two cubes popping towards me diagonal to each other and a square and a circle with a circle inside popping in
@tankmohit 9 years ago ⁺²
Am I the only one who finds Dr Mike Pound speaks like Christian Bale?
@MarcelRobitaille 9 years ago ⁺¹
I love how he says "free d"
@OH5EDP 9 years ago
Ye fock'n non brit :D
@MarcelRobitaille 9 years ago ⁺¹
+Jimi Leander I'm Canadian eh. The Queen is on my money. Pretty British if you ask me.
@avro549B 9 years ago
+Marcel Robitaille He's close to having a speech impediment, (e.g. notice "ovver"). It may be an English public (expensive private) school accent/affectation.
@rich1051414 8 years ago
+avro549B "Peasant accent" :D
@jadoo16815125390625 9 years ago ⁺⁴
This started very abruptly. It would have been nicer to have a gooder introduction.
@leestons 6 years ago ⁺⁷
"gooder"
@joelproko 9 years ago ⁺⁸
What a boring magic image :(
@TheMasonX23 8 years ago ⁺²
+joelproko Stereograms can be so cool, with detailed shapes and customized backgrounds, and the example they go with is some simple shapes over static...
@oldcowbb 4 years ago
i was expecting more nerdy stuff :(
@afroninjadeluxe 9 years ago
So the human brain knows the distance between the eyes?
@TrollingAround 9 years ago
Who in their right mind leaves an unsolved Rubik's cube on a shelf in the background of a video? No idea what the video was about as I was totally distracted. :-(
@canguar 9 years ago ⁺¹
does the brain work similarly, i wonder
@calfischer1149 9 years ago
in what way?
@Germanywithtripti101 4 years ago
Correspondence problem
@RedSquirrelVanguard 6 years ago
It's always triangles!
@MadMonkey126 9 years ago
Your hair is on fleek! Wow did I just say fleek?
@calt03 7 years ago ⁺¹
I wanna pound dr pound😇🤔😅 he is so cute
@turbotrading7910 7 years ago
liked because of the wolf story
@jebus6kryst 9 years ago
Everyone's a LOBO!.
You do realize that there has never been a reported wolf attack in the Americas.
@PwnUIDo 9 years ago
You do realize saying something doesn't make it true.
@jebus6kryst 9 years ago
I Am The Way
You do realize you are communicating with me on a device that can show if I am right or wrong. I am sorry if I do not cite my work in the YouTube comment section.
@melihaslan9509 3 years ago
What I understand? Nothing
@hanniffydinn6019 9 years ago
Try driving a car with one eye closed....!
@realcygnus 9 years ago
he should have explained some of the maths
@IonoTheFanatics 9 years ago ⁺¹
+realcygnus ??? but the math there is really secondary to the principal core of the mechanism
ie: how to simplify the problem to make solving it actually feasible instead of matching pixel by pixel on the entire image
@Kruglord 9 years ago
+realcygnus The maths tends to use a lot of linear algebra, which is probably beyond the scope of these videos.
@fivforfivfor 2 years ago
I already know the answer Because I have already solved this issue (light cones) ...🧐🧐🧐...
@sam08g16 9 years ago
computerphile is way too nerdy for the normal human being
@bcn1gh7h4wk 9 years ago
basically, "how to solve a problem with 3 variables"
fix one, know another, and math the result out.
that's a thousand-year-old principle.... and people still fail to apply it to daily situations.
