Best Computerphile video:
- Clean desk
- Tidy shelf
- Nice hair
- Classic perforated printing paper
- Popped collar
false.
Frankly... this ended where I hoped it would start.
+Jan Dvořák Well, there _are_ extra bits. Don't know if they contain what you wanted to see but I wanted to make sure you checked.
+Jan Dvořák Watch it backwards?
+hoopshank lololol awesome response
whereas your comment started where I hoped it would end.
Since pictures can be a bit noisy due to sensor imperfections, and a single pixel doesn't carry much information, stereo vision algorithms often use block matching. Instead of searching for a single pixel in the other image, you take a "block" of pixels around it (a 5x5 block, for example) and check whether a very similar block can be found in the other picture. This is much more robust to individual pixels being badly distorted by noise etc., but of course it takes more computing time, since you now have to work with 25 pixels for each matching test instead of just 1.
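A minimal sketch of the block matching described above, in Python with NumPy. It assumes an already rectified grayscale pair (so a match always lies on the same row, shifted left in the right image) and a sum-of-absolute-differences (SAD) cost; the function name and parameters are illustrative, not taken from the video:

```python
import numpy as np

def block_match_sad(left, right, max_disp=16, half=2):
    """Naive SAD block matching on a rectified grayscale pair.

    For each pixel in `left`, compare the (2*half+1)^2 block around it with
    blocks in `right` shifted left by d = 0..max_disp along the same row,
    and keep the d with the smallest sum of absolute differences.
    """
    h, w = left.shape
    L = left.astype(np.int32)  # avoid uint8 wrap-around when subtracting
    R = right.astype(np.int32)
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = L[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = None, 0
            # Only try shifts that keep the candidate block inside the image.
            for d in range(0, min(max_disp, x - half) + 1):
                cand = R[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.abs(ref - cand).sum()
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

It is deliberately naive (nested loops over rows, columns and shifts); libraries such as OpenCV's StereoBM implement the same idea with prefiltering, texture thresholds and far faster cost aggregation.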
That makes a lot of sense, thanks for the additional info!
So is it kind of like performing a convolution from the sets of pixels from one image onto the other image and finding the closest match from that?
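Roughly, yes, although strictly it is cross-correlation (the block is slid over the other image without being flipped, as it would be in a convolution), and for a rectified pair the slide only has to run along one row. A sketch using zero-mean normalized cross-correlation (ZNCC), a common alternative to SAD that tolerates brightness and contrast differences between the two cameras; the helper names are hypothetical:

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size blocks (-1..1)."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def best_shift_zncc(left, right, y, x, max_disp=16, half=2):
    """Slide the block around left[y, x] along row y of `right`
    and return the shift with the highest correlation score."""
    ref = left[y - half:y + half + 1, x - half:x + half + 1]
    scores = []
    for d in range(0, min(max_disp, x - half) + 1):
        cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
        scores.append(zncc(ref, cand))
    return int(np.argmax(scores))
```

Because the mean is subtracted and the result is divided by both blocks' energy, a right image that is uniformly darker or has lower contrast still scores 1.0 at the correct shift, where a plain SAD cost would not reach zero.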
??
You can also move one eye/camera to get "actual" 3D because it's mathematically the same as two eyes for static scenes and under certain circumstances it even resembles the 3D "qualia" if you want to call it that.
The best explanation of stereo matching on the Internet.
Needs a part 2.
I once got it running with low enough CPU use to work (sort of) on a Raspberry Pi at ~10 fps, but it was pretty crude: the output depth map was 80x60 and a bit noisy/glitchy, and the input was a pair of 320x240 images from parallel webcams.
Part of the optimization was using blocky VQ (vector quantization), so that for a block of pixels (4x4) you can determine early whether it is out of range (or a nearly exact match). It was based on a trick I had used for motion compensation in video compression during capture (motion compensation helps somewhat with compression).
The trick greatly reduced the number of pixel-by-pixel checks. It also worked internally in a dichromatic colorspace, partly to save space and partly because it was cheaper to compare only two axes (whereas a single Y axis loses the ability to distinguish things based on color, reducing accuracy somewhat).
I had also tried, unsuccessfully, some designs based around Haar wavelets.
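The vector-quantization trick above isn't spelled out in enough detail to reproduce, so the sketch below swaps in a simpler, well-known early-exit idea from block motion estimation, partial distortion elimination: accumulate the block cost row by row and abandon a candidate as soon as it can no longer beat the best match so far. Assumptions: rectified grayscale input, a SAD cost, and hypothetical function and parameter names; this is not the commenter's method.

```python
import numpy as np

def match_with_early_exit(left, right, y, x, max_disp=16, half=2):
    """SAD search for one pixel that abandons a candidate shift as soon as its
    running cost can no longer beat the best cost found so far.

    Returns (best_disparity, number_of_block_rows_actually_compared).
    """
    L = left.astype(np.int32)
    R = right.astype(np.int32)
    ref = L[y - half:y + half + 1, x - half:x + half + 1]
    best_cost, best_d, rows_done = np.inf, 0, 0
    for d in range(0, min(max_disp, x - half) + 1):
        cand = R[y - half:y + half + 1, x - d - half:x - d + half + 1]
        cost = 0
        for r in range(ref.shape[0]):
            cost += int(np.abs(ref[r] - cand[r]).sum())
            rows_done += 1
            if cost >= best_cost:  # this candidate can no longer win: stop early
                break
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d, rows_done
```

The row counter makes the saving visible: once an exact match has been found, every later candidate is rejected after a single row instead of the full block.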
Could you do one about how you account for reflective objects, or objects in a space where the lighting isn't ambient? I have some ideas on how one could do this, but I'd like to hear the smart tricks surrounding this topic.
Mike is the best speaker on the channel. The AI guy comes a close second.
I would have liked it if you had shown an example of a disparity map obtained from a stereo vision algorithm. It would help show what the result looks like.
+DeJayHank Google "disparity map" and look at the image results, you'll see a bunch of examples of what that might look like.
+Kruglord Well, I know what it looks like because I've been working with stereo matching algorithms with and without OpenCV, but I thought it would have been a good addition to the video for people who haven't seen one.
Great video, I think a good follow up video might be how people have approached the correspondence problem, such as using SIFT or SURF points.
+Kruglord Don't forget MSER, these are especially useful (area) features for wide-baseline stereo. Feature detectors and descriptors alone are an interesting topic.
At 6:12 he says it's only possible because they know the positions of the two cameras, and that if they don't know the camera positions they have to search through the whole image.
Question: once you have done this one, two, ..., ten times, can you compute the cameras' relative position?
Once you've calculated the cameras' relative position, could you then use it to make any future searches easier?
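Yes, in principle. With eight or more reliable point matches you can estimate the fundamental matrix F, which encodes the cameras' relative geometry; if both cameras share known intrinsics K, the essential matrix E = K^T F K can be decomposed into the relative rotation and the direction of the translation (its length stays unknown). Once F is known, the match for a pixel x only has to be searched along the epipolar line F x. A minimal sketch of the normalized eight-point algorithm in NumPy, assuming noise-free matches given as N x 2 pixel arrays; real data would also need outlier rejection such as RANSAC, and the function names are hypothetical:

```python
import numpy as np

def hartley_normalize(pts):
    """Translate points to their centroid and scale to mean distance sqrt(2).

    Returns homogeneous normalized points (N x 3) and the 3 x 3 transform T.
    """
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return ph, T

def eight_point(x1, x2):
    """Estimate F with x2_h^T F x1_h = 0 from N >= 8 matches (N x 2 pixel arrays)."""
    p1, T1 = hartley_normalize(np.asarray(x1, dtype=float))
    p2, T2 = hartley_normalize(np.asarray(x2, dtype=float))
    # Each match contributes one linear equation in the 9 entries of F (row-major).
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 so that all epipolar lines pass through a single epipole.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization and fix the arbitrary overall scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

In practice, OpenCV's `cv2.findEssentialMat` and `cv2.recoverPose` wrap the RANSAC estimation and the decomposition into rotation and translation direction.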
Looks like the cameraman didn't have his customary dozen shots of espresso.
I remember doing this for a final project. :D This would've been awesome.
When the 3D picture was zoomed into the screen at 1:32, I was just barely able to make out the image before they started messing with it. It doesn't match the simulated "answer" they displayed a few seconds later.
I played around with stereo vision using OpenCV, and found it really hard. Computers are still years away from having 3D vision like even a rather simple animal.
+steve1978ger That's why stereo vision is no longer used in most cases, the current laser and RGBD cameras are easier and more effective in depth feature recognition.
'like even a rather simple animal': only predators have binocular vision, and they in turn are the most advanced biological life on this planet, like humans. An overlapping field of vision might appear in some birds, but for totally different reasons (flight) and only over a few degrees, unlike predators, which have true stereoscopic vision. There is no simple animal with that kind of trait.
@@gigige5928 - okay, very simple animals may not have binocular vision, but "only predators" is an overgeneralization.
1:42 Shots fired at the Fine Bros
Just thought I'd make a point: I find I judge the distance to an object with one eye primarily by focussing, refocussing on the object until it's least blurry and having my brain estimate how far away it is from memory.
+Hayden Muscat Yeah there is a lot more going on in the brain for distance estimation, like just knowing what size cars usually are or a mug or whatever is enough to decently judge its distance from you even with a single image.
There are algorithms that judge depth by taking several pictures from one camera with different focus for every image. This wouldn't quite work on moving objects though since it takes time to get enough pictures for it to be useful.
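A minimal depth-from-focus sketch along the lines described above, assuming a static scene, a registered focal stack (same framing in every shot) and a known focus distance for each frame. The sharpness measure used here (windowed mean of the squared Laplacian) is one common choice, not necessarily what any particular system uses, and the function names are hypothetical:

```python
import numpy as np

def laplacian(img):
    """4-neighbour discrete Laplacian with edge padding."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def box_mean(img, half=2):
    """Mean over a (2*half+1)^2 window via an integral image (edge padded)."""
    k = 2 * half + 1
    p = np.pad(img, half, mode="edge")
    c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def depth_from_focus(stack, focus_distances, half=2):
    """For each pixel, return the focus distance of the frame in which the
    local sharpness (windowed mean of the squared Laplacian) is highest."""
    sharp = np.stack([box_mean(laplacian(f) ** 2, half) for f in stack])
    best = np.argmax(sharp, axis=0)
    return np.asarray(focus_distances)[best]
```

The depth resolution is limited to the spacing of the focus distances in the stack, and textureless regions have no sharpness peak at all, so they get an essentially arbitrary answer.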
Great explanation!
Thank you for your great explanation . Amazing!
Wow nice explanation! thanks
I know you've already talked about color spaces, which was very interesting, but could you get Mike to do an episode on Gamma / Gamma Correction?
It's a really helpful channel with lots of interesting videos, but I can't find some videos in order: some videos are hidden from the channel and there are only a few playlists. Is there any website where I can access these videos in some order? Thanks.
Reminds me of the "Fundamental Matrix Song", which is about the matrix connecting the two images. For more mathematically oriented people :-) Multiple-view geometry can be fun.
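For the mathematically oriented, the core fundamental-matrix relations in standard multiple-view-geometry notation, with x and x' the homogeneous pixel coordinates of the same scene point in the first and second image, K and K' the camera intrinsics, and R, t the relative pose (the last equality holds up to scale):

```latex
\mathbf{x}'^{\top} F\,\mathbf{x} = 0, \qquad
\mathbf{l}' = F\,\mathbf{x}, \qquad
\mathbf{l} = F^{\top}\mathbf{x}', \qquad
\operatorname{rank}(F) = 2, \qquad
E = K'^{\top} F K \simeq [\mathbf{t}]_{\times} R
```

Here l' is the epipolar line in the second image on which the match of x must lie, which is why knowing the camera geometry reduces the search from the whole image to a single line.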
Oh god, it's a headache to calculate all the points with analytic geometry, but it is possible to use the focus of the camera to create a useful constant.
Monocular vision gets accurate depth from micro focus changes. Otherwise how does your eye know how to focus when you close one eye?
If I were to attempt this, I would create a coincidence map from the left and right image, representing how much each left pixel matched the corresponding right pixel, using an offset of a series of intervals from -somevalue to +somevalue. Somevalue weighed against each coincidence pixel should yield a kind of edge map depicting the differential offsets of similar pixels, interpreted as distance.
I dream about developing some kind of inverse GPU card. Instead of taking a set of 3-D polygons and rendering them onto a 2-D screen, this would take a set of 2 or more camera inputs and render them into a set of polygons in 3-D. Given my history of "invention", this has already been done.
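The coincidence-map idea in the first paragraph is close to what the literature calls a cost volume (or disparity space image): one difference image per candidate offset, which is then aggregated locally and reduced per pixel. A vectorised sketch, assuming a rectified grayscale pair, only non-negative offsets (the match in the right image lies to the left), absolute difference as the coincidence measure and a box filter for aggregation. Picking the lowest-cost offset per pixel ("winner takes all") is one simple way to turn the map into disparity, and hence distance; the function name and the penalty value are hypothetical:

```python
import numpy as np

def coincidence_disparity(left, right, max_disp=16, half=2, penalty=1e6):
    """Cost-volume ("coincidence map") disparity for a rectified grayscale pair.

    Slice d of the volume holds |left(y, x) - right(y, x - d)|, averaged over a
    (2*half+1)^2 window; the returned disparity is the per-pixel arg-min over d.
    Columns with no counterpart at offset d (x < d) get a large finite penalty.
    """
    L = left.astype(np.float64)
    R = right.astype(np.float64)
    h, w = L.shape
    k = 2 * half + 1
    volume = np.empty((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        diff = np.full((h, w), penalty)
        diff[:, d:] = np.abs(L[:, d:] - R[:, :w - d])
        # Box filter via an integral image (edges replicated).
        p = np.pad(diff, half, mode="edge")
        c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
        volume[d] = (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)
    return np.argmin(volume, axis=0)
```

The penalty is finite rather than infinite so that the integral-image arithmetic never produces NaN; this computes the same thing as the loop-based block matcher, just one whole offset at a time, which is the kind of work that maps well onto a GPU.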
I wish I had Dr Mike Pound as a lecturer
The Rubik's cube on the shelf isn't solved!!!! It's driving me crazy!!!!!!
I'm considering using TOF or stereo 3D for a QA vision project checking small details (O-rings) in production. What are the pros and cons of each? To me, just from reading, TOF seems the better option in most respects, but you seem more geared towards stereo 3D? Thank you for the nice videos!
Instead of keeping the cameras in a fixed alignment, why not have some permanent feature in between the cameras of a known size (like our nose for our eyes) that allows you to calibrate each frame separately?
Sure you will get a bit of obscuring, but it makes the 3D position calculation less reliant on keeping the relative position and angle of the two cameras absolutely fixed. You would need to use a bit of inference (or possibly something like staccato) to fill in the positions of the blank bit.
+peterjamesfinn The video glosses over this fact (for good reason) but the camera calibration step they mentioned is actually a very important and relatively complex step in the whole system. Because the relative position of the two cameras determines the measurement of the rest of the system, even very small errors in their estimated position can have huge impacts on the precision of everything else. For this reason, in stereo-vision systems the cameras tend to be rigidly mounted together, and their position determined in a separate step before they're used in any other measurements.
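A worked example of why small calibration errors matter, assuming an ideal rectified pair where depth is Z = f·B/d (f the focal length in pixels, B the baseline, d the disparity in pixels), so a disparity error Δd gives a depth error of roughly Z²·Δd/(f·B). The rig numbers are hypothetical, and the small-angle estimate of how far a yaw misalignment shifts disparities is an approximation:

```python
import math

def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Ideal rectified stereo: Z = f * B / d (f, d in pixels; B, Z in metres)."""
    return f_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 10 cm baseline.
f, B = 700.0, 0.10

# Small-angle approximation: a yaw error of theta shifts disparities by ~ f * theta.
# Here we assume the error happens to reduce the measured disparity.
shift = f * math.radians(0.1)  # about 1.22 px

for Z_true in (1.0, 10.0):
    d_true = f * B / Z_true
    Z_est = depth_from_disparity(f, B, d_true - shift)
    print(f"true {Z_true:5.2f} m  disparity {d_true:5.1f} px  estimated {Z_est:6.2f} m")
```

With these numbers a tenth of a degree of misalignment turns an object at 10 m into roughly 12.1 m, while an object at 1 m only moves to about 1.02 m: the error grows roughly with the square of the distance.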
Now, it's also common to have a system with only a single camera, which takes lots of pictures while moving around the scene, rather than a pair. This method relies on simultaneously calculating the camera's position at each photo location and the location of the features in each photo. This can be fairly accurate as well, but it also has its limitations. Specifically, the scale is indeterminate without additional observations, so you can only record the shape of the scene, not its size. Also, as mentioned in the video, the camera locations can only be determined through features common to different images, which leads us into the correspondence problem. There are methods available to solve this problem, but they're complex and take a lot of time to calculate, even with today's powerful computers.
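To see why the scale is indeterminate with a single moving camera, here is a minimal NumPy check under an ideal pinhole model, with hypothetical intrinsics, poses and points: multiplying every scene point and the camera translation by the same factor s produces exactly the same pixel coordinates, so the images alone cannot tell a small nearby scene from a large distant one.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of N x 3 world points for camera pose (R, t)."""
    Xc = X @ R.T + t              # world -> camera coordinates
    x = Xc @ K.T
    return x[:, :2] / x[:, 2:3]   # perspective divide removes any common scale

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R = np.eye(3)
t = np.array([-0.5, 0.0, 0.0])   # second photo taken 0.5 units to the side
X = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 7.0], [-1.0, 0.3, 4.0]])

s = 3.0  # scale the whole world AND the camera translation by s
img1 = project(K, np.eye(3), np.zeros(3), X)
img2 = project(K, R, t, X)
img1_s = project(K, np.eye(3), np.zeros(3), s * X)
img2_s = project(K, R, s * t, s * X)
```

Both pairs of images agree to floating-point precision; an extra measurement such as a known baseline or an object of known size is what fixes s.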
So finally, how is the occlusion problem solved (a feature hidden in one view but present in the other)?
Where did the object-oriented programming video go?
+Ted Chirvasiu Yeah I'd like to know too
That wolf sound effect at the beginning... I'm sure I've heard it in many different games, but I can't find a source for it!
Dota 2 when the clock hits night time.
+Ancient Apparition I also heard it in Dofus, WoW and HoTS
Could it be possible to recreate this with two mobile phones? You could calculate the distance between the two phones using their GPS locations, but the precision might be too poor.
No need for GPS; Wi-Fi/Bluetooth triangulation could do it more precisely. Of course, we would need three or more phones.
+Purple Blaze Google (not sure of the project name, but it's to do with Google Earth/Maps) and Microsoft (Photo Tourism) already do this using people's images, building up a 3D map of an area from those individual images. I don't think accurate (or any) GPS data is a requirement; I think the algorithms used are clever enough to work out the positions.
part II please
Moving your head also works to help see depth with one eye
+WiWiPiWiWi Essentially you're using one eye to gather information that you'd normally get with two eyes
+WiWiPiWiWi That would work with a camera as well, but only if the object(s) you are taking pictures of is stationary during the whole process, and you know how far you moved the camera.
Why isn't he looking at the camera?
But how do we know what direction that line is going in?
+Noah Williams It's a straight line between the camera and the object.
+Elias Simon No, I mean the line in the other camera's image that can be quickly checked to see if it has the same value.
So how do you determine the depth of a point in one view if it's occluded in the other view? Can you?
+MrAlbedo39 You can't really determine the coordinates of a point that appears in only one image. All you can say is that it exists and that it falls on the epipolar line outside the bounds of the second image, so it is either too close or too far to be seen by both cameras.
+Kruglord I'm more interested in how that occluded point ends up being represented in the 3-D result. Does it appear as a flaw that must be manually corrected?
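Usually it isn't corrected by hand. A common approach is to detect such pixels automatically with a left-right consistency check: compute disparities once with the left image as reference and once with the right image as reference, and discard pixels where the two disagree. Those pixels are then left as holes (invalid values) or filled from neighboring background disparities. A minimal sketch, assuming integer disparity maps from a rectified pair and using -1 as a hypothetical "invalid" marker:

```python
import numpy as np

def left_right_check(disp_left, disp_right, tol=1, invalid=-1):
    """Invalidate pixels whose left->right and right->left disparities disagree.

    disp_left[y, x] = d means left pixel x matches right pixel x - d;
    disp_right[y, x'] = d' means right pixel x' matches left pixel x' + d'.
    Occluded pixels typically fail this test.
    """
    h, w = disp_left.shape
    out = np.asarray(disp_left, dtype=np.int64).copy()
    ys, xs = np.mgrid[0:h, 0:w]
    xr = xs - out                          # column each left pixel maps to on the right
    inside = (xr >= 0) & (xr < w)
    back = np.zeros_like(out)
    back[inside] = np.asarray(disp_right)[ys[inside], xr[inside]]
    bad = ~inside | (np.abs(out - back) > tol)
    out[bad] = invalid
    return out
```

Pixels whose match would fall outside the right image are invalidated too, which covers the "point seen by only one camera" case from the previous reply.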
Wouldn't ultrasound be easier or cheaper?
+MMMIK13 Accuracy, lack of colour information, limited depth...
At my work we use tens of fixed IR cameras to do motion capture, there is no way it could be reasonably done with ultrasound.
***** Hmm, but in the example at the beginning he talked about using Stereo 3D Vision as an alternative to lasers, and in that case, do you think ultrasound would work better?
Or radar. Google's Project Soli is doing exactly this. Pretty interesting.
+Syukri Lajin Radar uses ultrasound dunnit, isn't that what the previous guy was referring to?
+Akașșș Ultrasound uses... sound. Radar uses electromagnetic waves, if I'm not wrong.
How do the cameras function when there is specularity?
+R.J. Reynolds Specularity (highly reflective surfaces) generally causes the correspondence to break down, resulting in what effectively appears as occlusion in the depth map.
Would this be easier with three cameras instead of 2?
I don't think so. For example, if you use three cameras named a, b and c, you have to draw the triangles for a-b, b-c and a-c. It would be more accurate, but it would cost three times the processing power.
I learnt how to explain something from your video, and also a bit about stereo vision.
Does anyone know if I can use this method if I have the cameras' GPS coordinates from when the images were taken? Can I calibrate the cameras using that data and follow the same method?
I had a question on an exam: 'Is stereo vision possible with only one camera? If so, what ancillary data is needed?' How would you guys answer this?
No.
I work with this technique to reconstruct faces from photos.
Can it be used in real time?
E.g. in a car, to give 100% accurate distances to things?
Can it be used in augmented reality applications and games in real time?
+GISP Subaru have a system (called 'EyeSight' strangely enough) that uses 'stereo' cameras mounted high in the windscreen to detect obstacles and warn the driver if they get too close, applying the brakes if necessary.
1:49 *farther
I understood about a quarter of that. That's enough for today.
If somehow my eyes were moved further apart would I have distorted vision?
Is the distance between a human's eyes even constant, or have they been in the same spot since birth?
I wouldn't bother with a system for eye distance calibration if I were a ... god?
+Jake Surname You might have trouble for a while, but you'd adjust, because your pattern matching ability is still much better than a computer's.
+Jake Surname Since you won't wake up one morning with your eye distance completely different, it doesn't matter. Even if you grow a little, the changes are so small that you can easily adapt.
If you woke up and your vision was impaired by
You can simulate moving your eyes with a series of mirrors. Take one of those toy periscopes, and turn it sideways. Now your eyes are suddenly very far apart.
+R3Testa Suddenly, superhuman depth perception :)
I'm trying to implement this in real time using IR cameras, without any luck haha
Use filters on the receiver. That kills sunshine.
This is easier with a plenoptic camera
great
MIKE! Finish the Rubik's cube, damn it
I can't see anything special in the "magic eyes" picture. There are 10 by 8 repeated patterns of noise.
+j7ndominica0 I see two cubes popping out towards me, diagonal to each other, and a square and a circle with a circle inside popping in.
Am I the only one who thinks Dr Mike Pound speaks like Christian Bale?
I love how he says "free d"
Ye fock'n non brit :D
+Jimi Leander I'm Canadian eh. The Queen is on my money. Pretty British if you ask me.
+Marcel Robitaille He's close to having a speech impediment, (e.g. notice "ovver"). It may be an English public (expensive private) school accent/affectation.
+avro549B "Peasant accent" :D
This started very abruptly. It would have been nicer to have a gooder introduction.
"gooder"
What a boring magic image :(
+joelproko Stereograms can be so cool, with detailed shapes and customized backgrounds, and the example they go with are some simple shapes over static...
I was expecting more nerdy stuff :(
So the human brain knows the distance between the eyes?
Who in their right mind leaves an unsolved Rubik's cube on a shelf in the background of a video? No idea what the video was about as I was totally distracted. :-(
Does the brain work similarly, I wonder?
In what way?
Correspondence problem
It's always triangles!
Your hair is on fleek! Wow did I just say fleek?
I wanna pound dr pound😇🤔😅 he is so cute
liked bcoz of wolf story
Everyone's a LOBO!
You do realize that there has never been a reported wolf attack in the Americas.
You do realize saying something doesn't make it true.
+I Am The Way You do realize you are communicating with me on a device that can show whether I am right or wrong. I am sorry if I do not cite my work in the YouTube comment section.
What I understand? Nothing
Try driving a car with one eye closed....!
he should have explained some of the maths
+realcygnus ??? But the math there is really secondary to the core principle of the mechanism,
i.e. how to simplify the problem so that solving it is actually feasible, instead of matching pixel by pixel across the entire image.
+realcygnus The maths tends to use a lot of linear algebra, which is probably beyond the scope of these videos.
I already know the answer Because I have already solved this issue (light cones) ...🧐🧐🧐...
Computerphile is way too nerdy for the normal human being.
Basically, "how to solve a problem with 3 variables":
fix one, know another, and math the result out.
That's a thousand-year-old principle... and people still fail to apply it to daily situations.