Visual Odometry with Monocular Camera For Beginners: A Project in OpenCV

  • Published: 14 Oct 2024

Comments • 86

  • @NicolaiAI
    @NicolaiAI  1 year ago +4

    Join My AI Career Program
    www.nicolai-nielsen.com/aicareer
    Enroll in My School and Technical Courses
    www.nicos-school.com

  • @stevennovotny9568
    @stevennovotny9568 10 months ago +8

    Nice overview of a VO process. Well done. However, I think using a unit translation to get your second set of homogeneous coordinates, and then using those points to ultimately calculate scale, will always give you something close to 1. This issue is hidden by the fact that the ground truth in your dataset changes by approximately one unit per frame. You can check this by skipping several frames: the displacement should be greater, but your algorithm will still give a scale of about one.
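
    For context, the relative-scale step being discussed compares distances between the same triangulated 3D points across two consecutive reconstructions; a minimal sketch of that idea (a rough illustration, not code from the repo):

        import numpy as np

        def relative_scale(Q1, Q2):
            # Q1, Q2: (N, 3) arrays of the same N landmarks triangulated from the
            # previous and the current frame pair. The scale is the mean ratio of
            # distances between consecutive points in the two point clouds.
            d1 = np.linalg.norm(Q1[:-1] - Q1[1:], axis=-1)
            d2 = np.linalg.norm(Q2[:-1] - Q2[1:], axis=-1)
            return np.mean(d1 / d2)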

  • @Elku
    @Elku 2 years ago +11

    Great video, I used your package and modified it a bit to my liking.
    You do have one correction to make, though: transf = vo.get_pose(q1, q2) can return an infinite value in [0, 3] and [1, 3], especially when a video shows something stopping.
    Adding transf = np.nan_to_num(transf, neginf=0, posinf=0) fixes the issue.
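
    For readers applying this fix, a small sketch of where the guard could sit (vo.get_pose, q1, q2 and cur_pose are from the video's code as quoted above; the helper itself is just illustrative):

        import numpy as np

        def safe_update(cur_pose, transf):
            # Zero out +/-inf entries that get_pose can return when the scene
            # stops moving, then accumulate the frame-to-frame transform.
            transf = np.nan_to_num(transf, neginf=0.0, posinf=0.0)
            return cur_pose @ np.linalg.inv(transf)

        # usage inside the frame loop:
        # cur_pose = safe_update(cur_pose, vo.get_pose(q1, q2))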

  • @alessandrotorresani5915
    @alessandrotorresani5915 2 years ago +5

    Thank you for this amazing material! Can't wait to see the next steps with bundle adjustment 🙂

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thanks for watching!

  • @thomassouza5853
    @thomassouza5853 2 years ago +4

    This is amazing, I will definitely use this in my final course work.

  • @rajithamdesilva
    @rajithamdesilva 1 year ago +3

    Thanks! Found your channel today and it's an absolute gem for beginner computer vision researchers.

    • @NicolaiAI
      @NicolaiAI  1 year ago +1

      Thanks a lot for the nice words!

  • @JoseAlejandroDonate
    @JoseAlejandroDonate 1 year ago +2

    Thanks for this excellent tutorial. I have a question. Basically, I took a look at your approach to see if I could get my pose estimation's relative scale working correctly. It didn't work. The relative scale computed is, most of the time, a number close to 1. Any recommendations on this?

  • @19_dharshak87
    @19_dharshak87 9 months ago +1

    Hey! Awesome video. Just a tiny question: I was not able to find the image_r folder in the KITTI dataset. Could anyone help with that?

  • @serialsensor2756
    @serialsensor2756 3 months ago

    You can recover scale by, e.g., introducing the assumption that the points you use are from the street in front of you. I did that to stabilize a self-balancing robot with VO (there is a video on my account).
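
    One way to turn that assumption into numbers (a rough sketch, not from the video): if you know the camera's mounting height above the road and have triangulated points that you trust to lie on the road, the ratio of the two heights gives the missing scale.

        import numpy as np

        def ground_plane_scale(ground_points, camera_height_m):
            # ground_points: (N, 3) triangulated road points in the (up-to-scale)
            # camera frame, with +y pointing down towards the road.
            # camera_height_m: known camera height above the road in metres.
            estimated_height = np.median(ground_points[:, 1])  # robust to outliers
            return camera_height_m / estimated_height  # multiply t by this factor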

  • @gildrondavid3229
    @gildrondavid3229 2 years ago +1

    Really helpful vid! Might have some questions later on though. Thanks man and keep up the amazing work!

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thank you so much! Really appreciate it and feel free to ask whatever questions u have

  • @TheEdmaster87
    @TheEdmaster87 8 months ago

    I have started my master's thesis project in VIO, so this is quite interesting. I will use sensor fusion though, and not just VO.

  • @hofitroy1
    @hofitroy1 2 years ago +2

    So what is the difference between optical flow and visual odometry? Which one is better to use for real-time "location" estimation and navigation, for example for drones?

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      There are some similarities. Optical flow can also be used to track features from frame to frame. What to use depends on your feature extractor and system and so on. I'm gonna make a stereo visual odometry video too where I use optical flow to track feature points
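
      For anyone curious what feature tracking with optical flow looks like in practice, a minimal sketch using OpenCV's pyramidal Lucas-Kanade tracker (a generic example, not the matcher used in this video):

          import cv2
          import numpy as np

          def track_features(prev_gray, next_gray):
              # Detect corners in the previous grayscale frame and track them into
              # the next frame with pyramidal Lucas-Kanade optical flow.
              prev_pts = cv2.goodFeaturesToTrack(
                  prev_gray, maxCorners=1000, qualityLevel=0.01, minDistance=7)
              next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
                  prev_gray, next_gray, prev_pts, None)
              good = status.ravel() == 1  # keep only successfully tracked points
              return prev_pts[good].reshape(-1, 2), next_pts[good].reshape(-1, 2)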

  • @poproduction3994
    @poproduction3994 1 year ago +1

    Hey Nicolai, can you please share any resource from which I can learn to integrate bundle adjustment into this code (basically get vSLAM working)? Thanks for the tutorial.

  • @azmyin
    @azmyin 2 years ago +1

    In decompose_essential_mat, the technique you used to find the correct [R, t] pair when decomposing the essential matrix: is it a heuristic you implemented from scratch, or is there a published paper explaining the method?
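
    For reference, the textbook way of picking among the four candidate [R, t] pairs is the cheirality check (described, e.g., in Hartley and Zisserman's Multiple View Geometry): triangulate the matches with each candidate and keep the pair that puts the most points in front of both cameras. A minimal sketch of that check, not the exact code from the video:

        import cv2
        import numpy as np

        def pick_pose(E, K, pts1, pts2):
            # pts1, pts2: (N, 2) matched pixel coordinates; K: 3x3 intrinsics.
            R1, R2, t = cv2.decomposeEssentialMat(E)
            best, best_count = None, -1
            for R, tv in [(R1, t), (R1, -t), (R2, t), (R2, -t)]:
                P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
                P2 = K @ np.hstack([R, tv])
                X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous
                X /= X[3]
                depth1 = X[2]                 # depth in the first camera
                depth2 = (R @ X[:3] + tv)[2]  # depth in the second camera
                count = int(np.sum((depth1 > 0) & (depth2 > 0)))
                if count > best_count:
                    best, best_count = (R, tv), count
            return best  # the (R, t) with the most points in front of both cameras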

  • @Zap12348
    @Zap12348 1 year ago +1

    Why are we taking the 0th and 2nd index of the translation vector in "gt_path.append((gt_pose[0, 3], gt_pose[2, 3]))"?
    Isn't that (tx, tz), whereas we need (tx, ty)?
    Or is it because, in the 2D world with respect to the camera, the z of the camera is the y of the real 2D world?
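
    For reference, in the KITTI camera frame x points right, y points down and z points forward, so a top-down (bird's-eye) plot of the trajectory uses the x and z components of the translation, which is why indices 0 and 2 are taken. A tiny sketch:

        def topdown_point(pose):
            # pose: 4x4 homogeneous camera pose. y is the vertical axis in the
            # camera frame, so the ground-plane path lives in the x-z plane.
            return float(pose[0, 3]), float(pose[2, 3])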

  • @shehanhere
    @shehanhere 6 months ago

    I'm interested in exploring the possibility of applying visual odometry as an initial step in camera matchmoving. Any thoughts?

  • @teetanrobotics5363
    @teetanrobotics5363 2 years ago +1

    Amazing tutorial bro!! Keep it going.

  • @benaamediajo
    @benaamediajo 1 year ago

    Thank you for your video!
    How can we create VO for 360 cameras like insta360 x3 if at all possible? Also, is calibrating such a camera possible (equirectangular images)?

  • @bociek125
    @bociek125 2 years ago +1

    Great work! Would love to see more videos on the topic of optical vision where the camera is moving.

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thanks for watching! Will definitely do more of those

  • @amineaitallala3420
    @amineaitallala3420 1 year ago +1

    Thank you. Do you have any idea how to implement this with another feature detector? I tried with SIFT and it didn't work so well.

    • @NicolaiAI
      @NicolaiAI  1 year ago

      Thanks for watching! Yeah u can use all the feature detectors from opencv

  • @SweAwesome
    @SweAwesome 2 years ago +1

    Thank you so much for the great video! Just one question:
    If we end up not using the KITTI dataset, then how would we go about creating the first projection matrix used at 32:08? (self.P)

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      Camera calibration. Thanks a lot for watching!

    • @ifedayoolusanya5202
      @ifedayoolusanya5202 11 months ago

      Hi, so we can convert our camera calibration values into a 3-by-4 matrix to get self.P, right? @@NicolaiAI
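
      For what it's worth, a minimal sketch of that last step, assuming cv2.calibrateCamera (or any other calibration) has already given you the 3x3 intrinsic matrix K and you place the camera at the origin:

          import numpy as np

          def projection_from_intrinsics(K):
              # Build a KITTI-style 3x4 projection matrix P = K [I | 0] from the
              # 3x3 camera matrix; undistort images first if distortion matters.
              return K @ np.hstack([np.eye(3), np.zeros((3, 1))])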

  • @anshXR
    @anshXR 1 year ago

    Instead of using decompose essential matrix and then finding the correct pose by triangulation of points, can I just use the cv2.recoverPose() method? It does the same thing by itself.
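
    That shortcut exists; a minimal sketch of it (pts1/pts2 are matched pixel coordinates, K the intrinsic matrix; note that the returned t is unit length, so scale still has to come from elsewhere):

        import cv2

        def pose_from_matches(pts1, pts2, K):
            # Estimate E with RANSAC, then let recoverPose do the decomposition
            # and the in-front-of-both-cameras (cheirality) test internally.
            E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                           prob=0.999, threshold=1.0)
            _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
            return R, t

    One possible reason to keep the manual decomposition is that the triangulated points it produces can be reused for the relative-scale estimate.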

  • @rubenponsaers9124
    @rubenponsaers9124 7 months ago

    Nice video, trying to do it on my own data. Aren't the extrinsic parameters different for each image? So how is it possible to use them for your whole image sequence?

    • @NicolaiAI
      @NicolaiAI  7 months ago

      Have it running in another video with live cameras

  • @azmyin
    @azmyin 2 years ago +1

    Great video. However, why is your algorithm not calculating the Z direction of the pose??

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      It is, but only x and y are visualized

    • @azmyin
      @azmyin 2 years ago

      Oh great.

  • @saadazhar4175
    @saadazhar4175 2 years ago

    Great video, have you done the video on optimizations as well?

  • @ChaitanyaKrishnabodduluri
    @ChaitanyaKrishnabodduluri 1 year ago +1

    Monocular camera odometry suffers from scale drift, right? The pose (R, t) doesn't have any units here, right?

    • @NicolaiAI
      @NicolaiAI  1 year ago

      It does. We take the relative scale into account. I go over that in the code. But it will be another accumulating error for the odometry

  • @karanbirchahal3268
    @karanbirchahal3268 2 years ago

    Amazing video, really great job. Will you implement SLAM as well?

  • @nomuchohan
    @nomuchohan 1 year ago

    Hi Nicolai! I've been following your channel for a long time and have learned quite a lot of things from you since I started following you... I am writing because the project I am stuck on this time is by far the hardest one I've ever come across. It's a freelance project I got from somewhere, and it seems like I have exhausted all my options on how to actually get it done. I need your help with the project, or at the very least suggestions on how I can approach the problem statement or solve it. So, I'll give a brief summary of the project:
    I have to come up with a system that can map football players from the video frame to a 2D field image and get their velocities, accelerations, etc. I have used YOLOv7 for detection of the players in the video frame and I am using Euclidean distance to keep track of the centroid of the selected player. Now, I want to design a system to map this player onto a 2D field image and get the player's acceleration and velocity. I tried a perspective transform but it does not seem feasible, as I would have to click on four separate corners every frame if I want to map. I want this process to be automated. Is there any way you can help me? Note: throughout the video the camera angle will not stay constant; it will keep changing. It's a PTZ camera. Please help me with the above.
    Thank you.

    • @VikasRajpurohit-t2s
      @VikasRajpurohit-t2s 1 year ago

      Here's a step-by-step approach to achieve this:
      1. Camera calibration: perform camera calibration to obtain the camera's intrinsic matrix and distortion coefficients. You can use a chessboard pattern and OpenCV's cv2.calibrateCamera function for this. For a PTZ camera with a variable field of view, you may need to calibrate the camera multiple times as the camera angle changes.
      2. Object detection and tracking: use YOLOv7 or any other object detection algorithm to detect football players in video frames and extract their bounding boxes. Implement an object tracker (e.g., Kalman filter, CentroidTracker) to track players between frames based on their bounding boxes.
      3. Perspective transform (bird's-eye view): obtain the 2D field image that you want to map the players onto and define four points on it corresponding to the four corners of the field. Implement an automatic method (e.g., feature matching) to estimate the perspective transformation between the field image and the camera view in each frame (see the sketch after this list).
      4. Optical flow: use optical flow algorithms (e.g., Lucas-Kanade, Farneback) to estimate the motion vectors of players between consecutive frames. Based on the motion vectors and the camera's frame rate, calculate the players' velocities and accelerations in the 2D field coordinate system.
      5. Combine data: using the perspective transform, map the players' positions from the camera view to the 2D field image, then combine the positional data with the calculated velocities and accelerations to obtain the desired player tracking information.
      Might help !!!
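
      To make the perspective-transform step concrete, a minimal sketch of the mapping itself once four field landmarks have been located in both the broadcast frame and the 2D field image (automating the landmark detection per frame, e.g. via feature matching or line detection, is the hard part and is not shown here):

          import cv2
          import numpy as np

          def map_players_to_field(frame_pts, field_pts, player_feet_px):
              # frame_pts: (4, 2) pixel coords of known landmarks in the camera view
              # field_pts: (4, 2) coords of the same landmarks in the field image
              # player_feet_px: (N, 2) pixel coords of player foot points
              H = cv2.getPerspectiveTransform(np.float32(frame_pts),
                                              np.float32(field_pts))
              pts = np.float32(player_feet_px).reshape(-1, 1, 2)
              return cv2.perspectiveTransform(pts, H).reshape(-1, 2)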

  • @poproduction3994
    @poproduction3994 1 year ago

    Great tutorial, seriously it's great, thanks for putting this out!!!! I have one question:
    when i == 0 of the pose estimation we are using
    cur_pose = gt_pose
    and after that, at i == 1, we are using
    cur_pose = np.matmul(cur_pose, np.linalg.inv(transf))
    so in the second iteration we are using the pose we have from ground truth and multiplying it by the pose we have calculated.
    What if we don't have ground truth? How will we calculate cur_pose for i = 1 then?
    Thanks in advance

    • @poproduction3994
      @poproduction3994 1 year ago

      Never mind, I got the answer in your live camera trajectory video. Thanks a bunch

    • @akilarsath6499
      @akilarsath6499 8 months ago

      @@poproduction3994 Can you tell the solution?
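
      For anyone else landing on this thread: without ground truth you can simply seed the trajectory with the identity matrix, so every estimated pose is expressed relative to the first camera frame. A minimal sketch (vo.get_matches, vo.get_pose and num_frames stand in for however you obtain matched keypoints per frame; only get_pose is quoted in the comments above):

          import numpy as np

          cur_pose = np.eye(4)  # start at the origin instead of gt_pose
          path = []
          for i in range(1, num_frames):
              q1, q2 = vo.get_matches(i)    # matched keypoints for frame i
              transf = vo.get_pose(q1, q2)  # frame-to-frame transform
              cur_pose = cur_pose @ np.linalg.inv(transf)
              path.append((cur_pose[0, 3], cur_pose[2, 3]))  # x-z, top-down path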

  • @zaidarif218
    @zaidarif218 2 years ago +1

    Thanks man, this is awesome. Keep up the good work ;)

  • @CuongNguyen-kq7cx
    @CuongNguyen-kq7cx 10 months ago

    Wow, great!!! Can I use a Raspberry Pi cam for that?

  • @tselin7611
    @tselin7611 2 years ago +1

    Hello,
    I tried to feed live frames into the code. However, it yielded a very high constant bias and noise. Is there any way to reduce the constant bias and noise?
    Thanks!

    • @NicolaiAI
      @NicolaiAI  2 years ago

      U can use different filters. Try a low-pass filter to start with
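
      As a starting point, a first-order (exponential) low-pass filter on the estimated positions could look like the sketch below; it smooths noise at the cost of lag, and it will not remove a constant bias, which usually needs better calibration or sensor fusion:

          import numpy as np

          class LowPassFilter:
              # First-order exponential smoothing: smaller alpha = smoother, laggier.
              def __init__(self, alpha=0.2):
                  self.alpha = alpha
                  self.state = None

              def update(self, measurement):
                  m = np.asarray(measurement, dtype=float)
                  if self.state is None:
                      self.state = m
                  else:
                      self.state = self.alpha * m + (1 - self.alpha) * self.state
                  return self.state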

  • @sattarmonjezi4396
    @sattarmonjezi4396 2 years ago +1

    Thank You for this video.

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thanks for watching! Hope that it can help u

  • @jaskiratsingh9710
    @jaskiratsingh9710 2 years ago +1

    How did you create the calibration and poses txt files? Is there any code for that? Please share if there is

    • @NicolaiAI
      @NicolaiAI  2 years ago

      That's from the KITTI dataset
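
      For reference, both files in the KITTI odometry dataset are plain text with twelve numbers per relevant line (a flattened 3x4 matrix); a minimal loader sketch:

          import numpy as np

          def load_poses(path):
              # One ground-truth pose per line: a flattened 3x4 matrix, promoted to 4x4.
              poses = []
              with open(path) as f:
                  for line in f:
                      T = np.array(line.split(), dtype=np.float64).reshape(3, 4)
                      poses.append(np.vstack([T, [0, 0, 0, 1]]))
              return poses

          def load_calib(path):
              # The first line of calib.txt is "P0: <12 numbers>", the projection
              # matrix of the left grayscale camera; its 3x3 part is the intrinsics.
              with open(path) as f:
                  values = f.readline().split(':', 1)[1].split()
              P = np.array(values, dtype=np.float64).reshape(3, 4)
              return P[:3, :3], P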

  • @TheWeibing
    @TheWeibing 2 years ago +1

    Thanks! But I do have a question. How do we obtain our own pose data without referring to the KITTI dataset?

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      Thanks for watching! If u want to find the poses of ur own data u can just replace the images with ur own. But then u won't have the ground Truth poses

    • @TheWeibing
      @TheWeibing 2 years ago +1

      Is the ground truth pose mandatory? Does the code work without the ground truth text file? I read about it the other day and it seemed to be important for obtaining scale information?

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      @@TheWeibing it's not mandatory but then u kinda don't know how ur system performs

    • @TheWeibing
      @TheWeibing 2 years ago

      @@NicolaiAI Hey, just wondering, what does the 2nd row, 4th column term in the output transformation matrix represent? [x, ?, y]

  • @maaitrayodas5630
    @maaitrayodas5630 1 year ago +1

    Hey, can you please share the article or paper the theory is taken from?

    • @NicolaiAI
      @NicolaiAI  1 year ago

      Have not used a specific article or paper

    • @maaitrayodas5630
      @maaitrayodas5630 1 year ago

      @@NicolaiAI I was intrigued by the scale calculation using triangulation, and then estimating R and t, without getting the initial R and t from the built-in cv2.recoverPose

  • @ChaitanyaKrishnabodduluri
    @ChaitanyaKrishnabodduluri 1 year ago +1

    What's the use of getting a pose without scale?

    • @NicolaiAI
      @NicolaiAI  1 year ago

      Actually we do. We take the relative scale into account. I go over that in the code

  • @ConsultingjoeOnline
    @ConsultingjoeOnline 4 months ago +1

    Very cool!

  • @santos4027
    @santos4027 2 years ago +1

    THANK YOU!!!

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thanks for watching!

  • @dynamicgecko1213
    @dynamicgecko1213 2 years ago

    Thank you for these videos man. I really appreciate it.
    I forked the repo to replicate results. Where can we get the "lib" module?

    • @LoayAltal
      @LoayAltal 1 year ago

      It's a folder next to the Python script, in his GitHub repo

  • @gbo10001
    @gbo10001 1 year ago

    can it also work with object tracking?

  • @rajparikh7730
    @rajparikh7730 1 year ago

    I have used this code with my own 1350 input images.
    The only problem I'm facing is that this model is not able to run the images sequentially.
    What I've noticed is that it first runs a few images sequentially (say 30-50) and then it goes back to the start (0-10).
    I don't know what to do.
    Please help

  • @jealouseggs5619
    @jealouseggs5619 2 years ago

    can i run this on ROS for a NAO robot?

  • @gjgb8836
    @gjgb8836 6 months ago +1

    What is the name of the GitHub repo?

  • @ashishgarg4965
    @ashishgarg4965 1 year ago

    Can you make a video on visual SLAM?

  • @Theo-cn2cy
    @Theo-cn2cy 7 months ago

    Where can I get the link for his discord server?

  • @goroyeh1898
    @goroyeh1898 1 year ago

    Great tutorials! May I ask what the full pipeline at 8:43 is? The part after local optimization is occluded by your handsome face 😆

  • @iminaboroberts8516
    @iminaboroberts8516 7 months ago

    @NicolaiAI how can I get the dataset?

  • @alirezasoltani3049
    @alirezasoltani3049 2 years ago +1

    Thanks

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thanks for watching! Hope that u can use it

  • @nicolasnicolas-iz5ke
    @nicolasnicolas-iz5ke 2 years ago +1

    Something is strange, your method does not estimate the magnitude of translation (only its direction) and somehow it is pretty close to ground truth

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Nope, the translation includes the magnitude. The whole transformations of the camera poses are estimated

  • @AnkitVashisht
    @AnkitVashisht 2 years ago +1

    Bro didn't use PnP?