Bounding Box Prediction | Yolo | Essentials of Object Detection

  • Published: 9 Sep 2024
  • This tutorial explains finer details about the bounding box coordinate predictions using visual cues.

Comments • 36

  • @sashalyuklyan5195 3 months ago +1

    Thank you a lot for your videos! Selection of subjects in your series is excellent, every tutorial offers very interesting information.

  • @sheerazahmad3131 1 year ago +3

    Just completed this series. This is gold❤. Thank you so much❤️
    Are you working on new videos as well?

    • @KapilSachdeva 1 year ago

      I have plans for it; have been busy lately and hence the delay. 🙏

    • @sheerazahmad3131 1 year ago

      @KapilSachdeva Got it. I will be waiting for more content.

  • @mohammadyahya78 1 year ago +1

    Indeed, this is the best explanation so far on this topic. Hopefully you can explain other concepts in YOLO after anchor box such as what is t_o?

  • @johngrabner 1 year ago +1

    You are an excellent instructor, thank you. The rationale for multiple priors per cell is not explained. It is unlikely to be inconsequential, since they are common in object detection. I suspect that it helps in multiple ways. For example: the neural network will smear its objectness assessment over a number of cells. If multiple objects are present in a picture, then the multiple objects' smears will overlap (superposition). By having multiple priors per cell, this smearing can be diluted, essentially a hash to separate predictions so the superposition does not go above the detection threshold. I also suspect that the encoded ground truth of height and width is easier to learn by using these multiple priors per cell. Does this make sense, and are there more ways that multiple priors per cell help the trainability of these models?

    • @KapilSachdeva 1 year ago

      The reason for multiple priors per cell is to incorporate the different sizes of objects belonging to different classes. And as such, even for a single class it is not a bad idea to use different scales and aspect ratios.

    • @johngrabner 1 year ago

      @KapilSachdeva I suppose I am looking for why it is a good idea to have multiple aspect ratios per cell. If I have a sufficient number of cells, then it is unlikely for two objects to be perfectly aligned with the same cell. So I suspect it is not to disambiguate multiple perfectly overlapped objects. Consider setting the prior to say each cell has one object that is full page, and letting the ground truth train the actual value. I suspect this will not work as well as multiple aspect ratios in each cell. If so, why?

    • @KapilSachdeva 1 year ago

      > Setting the prior to say each cell has one object that is full page, and let the ground truth train the actual value.
      Not sure if I understand the above line.
      Regarding disambiguating multiple perfectly overlapped objects:
      The objects may very well overlap in the "feature map". A cell in the feature map (grid) covers many pixels in the original image, so it is possible that a given cell contains the centers of multiple objects.

    • @johngrabner 1 year ago

      @KapilSachdeva Good point, thank you.
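
The point in this thread that one feature-map cell covers many input pixels can be illustrated with a small sketch. The stride of 32 and the two object centers below are hypothetical values chosen only for illustration (a 416x416 input downsampled by 32 gives a 13x13 grid); `cell_of` is an illustrative helper, not part of any YOLO codebase.

```python
def cell_of(center_xy, stride):
    """Return the (column, row) of the grid cell that contains a pixel-space center."""
    x, y = center_xy
    return int(x // stride), int(y // stride)

# Hypothetical setup: each grid cell covers a 32x32 pixel patch of the input image.
stride = 32
cat_center = (200.0, 210.0)      # assumed center of one object, in pixels
bicycle_center = (215.0, 220.0)  # assumed center of a second, nearby object

print(cell_of(cat_center, stride))      # (6, 6)
print(cell_of(bicycle_center, stride))  # (6, 6)
```

Both centers fall into the same cell, so a single prior per cell could be assigned to only one of them; with several priors per cell, each object can be matched to a different prior.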

  • @lordfarquad-by1dq 1 year ago +1

    Hello, when are you going to publish new videos? This series is really helpful. Subbed and looking forward to the new vids.

    • @KapilSachdeva 1 year ago +1

      Very soon. Apologies for the delay.

    • @lordfarquad-by1dq 1 year ago

      @KapilSachdeva Thank you, looking forward to it. Would you recommend any books that go through these topics extensively?

    • @KapilSachdeva 1 year ago +1

      I am not aware of any book. Even if one exists, it will be outdated just after it is published.
      The best way to learn is to first read the papers, then read the official code, and try to implement it yourself.
      The most challenging aspect of this domain (object detection) is that even though the people (researchers) are brilliant, they are not good at software development. This makes their code very hard to read. This is why I am showing some code as part of these tutorials as well.

    • @lordfarquad-by1dq 1 year ago

      @KapilSachdeva Thank you, looking forward to more of your tutorials.

  • @wiputtuvaynond761 1 year ago

    Thank you very much for the excellent explanation of how it works. I have a question. As you described in the anchor box video, only one predefined anchor box is first selected based on the IoU threshold, and the prediction is learnt by that anchor. Is it necessary that the anchor box sizes are determined beforehand to correspond to the known ground truth objects?
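
The anchor selection described in this question can be sketched as follows. This is a minimal sketch, assuming the common simplification of comparing only widths and heights (as if the prior and the ground-truth box shared a center); the prior sizes and the helper names `shape_iou` and `best_prior` are hypothetical. For context, the YOLOv2 paper obtains its prior sizes by k-means clustering over the training-set box dimensions, so the priors there are indeed derived from the known ground truth.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes assumed to share the same center, so only (width, height) matter."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union

def best_prior(gt_wh, priors):
    """Index of the prior whose shape overlaps the ground-truth box the most."""
    return max(range(len(priors)), key=lambda i: shape_iou(gt_wh, priors[i]))

# Hypothetical priors as (width, height) in grid-cell units.
priors = [(1.0, 1.0), (2.0, 4.0), (5.0, 3.0)]
gt_wh = (1.8, 3.5)  # assumed ground-truth width and height

print(best_prior(gt_wh, priors))  # 1
```

The tall ground-truth box is matched to the tall prior, which then becomes responsible for learning that object.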

  • @indianmanhere 1 year ago +1

    Sir, my question is how the bounding boxes are made: how are the locations and boxes for, say, a cat, a man, and a cycle all produced at a single time? I know how to select the best one and how to write the feature vectors.
    Please kindly respond.

    • @KapilSachdeva 1 year ago

      Just to confirm:
      Are you asking how the ground truth tensors are prepared so that we can pass them to the loss function along with the predictions? And how the ground truth tensors are prepared in such a way that label assignment is done for all of them?

    • @indianmanhere 1 year ago +1

      @KapilSachdeva I mean how the bounding boxes are made for every object in an image.
      When we write the vector for every grid cell, which consists of the class probabilities and the coordinates, how are they (the coordinates/bounding boxes) calculated, and using which algorithm?
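
One possible reading of this thread is how a single annotated box becomes the numbers stored in a grid cell's target vector. The sketch below assumes a YOLOv2-style encoding, a hypothetical stride of 32, and a hypothetical prior given in grid-cell units; `encode_box` is an illustrative helper and not the code of any official implementation.

```python
import math

def encode_box(gt_cxcywh, prior_wh, stride):
    """Turn a pixel-space ground-truth box (cx, cy, w, h) into (cell, targets) for one prior."""
    cx, cy, w, h = gt_cxcywh
    gx, gy = cx / stride, cy / stride         # box center in grid-cell units
    col, row = int(gx), int(gy)               # the cell that contains the center
    off_x, off_y = gx - col, gy - row         # center offset inside that cell, in [0, 1)
    t_w = math.log(w / stride / prior_wh[0])  # inverse of b_w = p_w * exp(t_w)
    t_h = math.log(h / stride / prior_wh[1])  # inverse of b_h = p_h * exp(t_h)
    return (col, row), (off_x, off_y, t_w, t_h)

# Assumed example: a 64x128 pixel box centered at (208, 112), matched to a 2x4 prior.
print(encode_box((208.0, 112.0, 64.0, 128.0), (2.0, 4.0), 32))
# ((6, 3), (0.5, 0.5, 0.0, 0.0))
```

The ground-truth coordinates themselves come from the human annotations in the dataset; this step only re-expresses them relative to a cell and a prior so they can be compared with the network's outputs in the loss.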

  • @issanasralli4529 1 year ago +1

    Thanks a lot!!! Good video!!
    Please, when will you explain how the height and width are updated, i.e. b_w = p_w * e^(t_w)?
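
The width and height formula mentioned in this comment can be tried out directly. This is a minimal sketch of the YOLOv2-style decoding b_w = p_w * e^(t_w) and b_h = p_h * e^(t_h); the prior size and the values of t_w and t_h are hypothetical, and `decode_wh` is an illustrative name.

```python
import math

def decode_wh(t_w, t_h, p_w, p_h):
    """Scale the prior's width and height by exp of the raw network outputs."""
    return p_w * math.exp(t_w), p_h * math.exp(t_h)

p_w, p_h = 2.0, 4.0  # hypothetical prior size in grid-cell units

print(decode_wh(0.0, 0.0, p_w, p_h))  # (2.0, 4.0)

# Positive outputs enlarge the prior, negative outputs shrink it,
# and the result can never become negative.
wider, shorter = decode_wh(math.log(2.0), -math.log(2.0), p_w, p_h)
print(wider, shorter)
```

With t_w = t_h = 0 the prediction is exactly the prior, so the network only has to learn a multiplicative correction to a box shape that is already plausible.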

  • @dinoyjohny5211 5 months ago

    Thank you so much

  • @mohammadyahya78 1 year ago

    If we have 5 prior bounding boxes, will we do the calculations of b_x, b_y, b_w, b_h 5 times and then find the IoU between them and the ground truth bounding boxes?

  • @mohammadyahya78 1 year ago

    Why don't we use the bounding box prior in the calculations of b_x and b_y
    at 7:29, please?

    • @KapilSachdeva 1 year ago +1

      We are using the bounding box prior. cx and cy are those of the prior bounding box that we have associated with the ground truth box.

    • @mohammadyahya78 1 year ago

      @KapilSachdeva Thanks! What about p_x and p_y then, please? I thought these were the prior bounding box coordinates.

    • @KapilSachdeva 1 year ago +1

      First, it is not p_x and p_y. It is p_w and p_h, i.e. the width and height of the prior box.
      A bounding box is defined using (cx, cy, pw, ph).

    • @mohammadyahya78 1 year ago

      @KapilSachdeva Thank you. Lastly, I was trying to ask why we don't use p_x and p_y in the calculations of b_x and b_y instead of c_x and c_y, please?

    • @KapilSachdeva 1 year ago +1

      In YOLO, a bounding box is defined using (cx, cy, w, h).
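
The roles of c_x, c_y versus p_w, p_h discussed in this thread can be made concrete with a small sketch. It assumes the notation of the YOLOv2 paper, where (c_x, c_y) is the offset of the grid cell from the top-left corner of the image and the center is decoded as b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y; the cell index and raw outputs below are hypothetical values.

```python
import math

def sigmoid(t):
    """Squash a raw network output into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def decode_center(t_x, t_y, c_x, c_y):
    """Predicted center in grid-cell units: the cell offset plus a fraction of one cell."""
    return c_x + sigmoid(t_x), c_y + sigmoid(t_y)

# Hypothetical cell at column 6, row 3 of the grid.
print(decode_center(0.0, 0.0, 6, 3))  # (6.5, 3.5)

# Even extreme raw outputs keep the center inside the same cell.
b_x, b_y = decode_center(10.0, -10.0, 6, 3)
print(6 < b_x < 7, 3 < b_y < 4)  # True True
```

Under this reading the prior supplies only a width and a height, while its position is simply the cell it is attached to, which is why no separate p_x and p_y appear in the formulas.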

  • @rampavanmedipelli6152 1 year ago +1

    Please explain the NECKS part

    • @KapilSachdeva 1 year ago

      Will do. I am stuck on a few problems and hence not getting any free cycles, but I will definitely get to them.

    • @KapilSachdeva 1 year ago

      done 😀