You are an excellent instructor, thank you. The rationale for multiple priors per cell is not explained. Unlikely to be inconsequential since they are common in object detection. I suspect that it helps in multiple ways. For example: Neural network will smear its objectiveness assessment over a number of cells. If multiple objects are present in a picture, then the multiple object's smears will overlap (superposition). By having multiple priors per cell, this smearing can be diluted, essentially a hash to separate predictions do the superposition does not go above the detection threshold. I also suspect that the encoded ground truth of height and width are easier to learn by using these multiple priors per cells. Does this make sense and are there more ways that multiple priors per cell help the trainability of these models?
Reason for multiple priors per cell is to incorporate the different size of objects belonging to different classes. And as such even for a single class it is not bad idea to use different scales and aspect rations.
@@KapilSachdeva I suppose I am looking for why it is a good idea to have multiple aspect ratios per cell. If I have a sufficient number of cells, then unlikely for two objects to be perfectly aligned with that cell. So I suspect it is not to disambiguate multiple perfectly overlapped objects. Setting the prior to say each cell has one object that is full page, and let the ground truth train the actual value. I suspect this will not work as good as multiple aspect ratios in each cell. if so, why?
> Setting the prior to say each cell has one object that is full page, and let the ground truth train the actual value. Not sure if I understand above line. Regarding disambiguate multiple perfectly overlapped objects - The objects may very well overlap in the "feature map". A cell in the feature map (grid) covers many pixels in the original image so it is possible that a given cell contains the center of multiple objects.
I am not aware of any book. Even if it exists it will get outdated just after it is published. The best way to learn is to first read the papers and then read the official code and try to implement by yourself. The most challenging aspect in this domain (object detection) is that even though the people (researchers) are brilliant they are not good at software development. This makes their code very hard to read. This is why as part of these tutorials I am showing some code as well.
Thank you very much for excellent explanation how it work. I have the question. As you have described in anchor box video, the only one predefined anchor box is firstly selected based on IoU threshold and the prediction is learnt by that anchor. Is it neccessary that anchor box sizes are prior determined corresponding to the known ground truth objects?.
Sir question is how boundary boxes are made like how a cat man cycle all locations and boxes are made at a single time, i know how to select the best one and to write feature vectors.... Please kindly respond
Just to confirm - Are you asking how ground truth tensors are prepared so that we can pass them in the loss function along with the predictions? And the ground truth tensor are prepared in such a way that label assignment is done for all them?
@@KapilSachdeva i mean how Boundary boxes are made for every objects in a image Like when we are writing vectors for every grid which consists of probability of class and coordinates, how they(coordinates/boundary boxes) are calculated using which algorithm
If we have 5 priori bounding boxes, we will do the calculations b_x, b_y, b_w, b_h 5 times and then find IoU between them and ground truth bounding boxes?
Thank you a lot for your videos! Selection of subjects in your series is excellent, every tutorial offers very interesting information.
🙏
Just completed this series. This is gold❤. Thank you so much❤️
Are you working on new videos as well?
I have plans for it; have been busy lately and hence the delay. 🙏
@@KapilSachdeva Got it. I will be waiting out for more content
Indeed, this is the best explanation so far on this topic. Hopefully you can explain other concepts in YOLO after anchor box such as what is t_o?
🙏 Yes, working on those.
You are an excellent instructor, thank you. The rationale for multiple priors per cell is not explained. Unlikely to be inconsequential since they are common in object detection. I suspect that it helps in multiple ways. For example: Neural network will smear its objectiveness assessment over a number of cells. If multiple objects are present in a picture, then the multiple object's smears will overlap (superposition). By having multiple priors per cell, this smearing can be diluted, essentially a hash to separate predictions do the superposition does not go above the detection threshold. I also suspect that the encoded ground truth of height and width are easier to learn by using these multiple priors per cells. Does this make sense and are there more ways that multiple priors per cell help the trainability of these models?
Reason for multiple priors per cell is to incorporate the different size of objects belonging to different classes. And as such even for a single class it is not bad idea to use different scales and aspect rations.
@@KapilSachdeva I suppose I am looking for why it is a good idea to have multiple aspect ratios per cell. If I have a sufficient number of cells, then unlikely for two objects to be perfectly aligned with that cell. So I suspect it is not to disambiguate multiple perfectly overlapped objects. Setting the prior to say each cell has one object that is full page, and let the ground truth train the actual value. I suspect this will not work as good as multiple aspect ratios in each cell. if so, why?
> Setting the prior to say each cell has one object that is full page, and let the ground truth train the actual value.
Not sure if I understand above line.
Regarding disambiguate multiple perfectly overlapped objects -
The objects may very well overlap in the "feature map". A cell in the feature map (grid) covers many pixels in the original image so it is possible that a given cell contains the center of multiple objects.
@@KapilSachdeva good point, thank you
hello , when are you going to publish new videos? this series is really helpful, subbed and looking forward for the new vids
Very soon. Apologies for the delay.
@@KapilSachdeva thank you and looking forward to it , would you recommend any books that extensively go through these topics ?
I am not aware of any book. Even if it exists it will get outdated just after it is published.
The best way to learn is to first read the papers and then read the official code and try to implement by yourself.
The most challenging aspect in this domain (object detection) is that even though the people (researchers) are brilliant they are not good at software development. This makes their code very hard to read. This is why as part of these tutorials I am showing some code as well.
@@KapilSachdeva thank you and looking forward for more of your tutorials
Thank you very much for excellent explanation how it work. I have the question. As you have described in anchor box video, the only one predefined anchor box is firstly selected based on IoU threshold and the prediction is learnt by that anchor. Is it neccessary that anchor box sizes are prior determined corresponding to the known ground truth objects?.
More or less yes.
Sir question is how boundary boxes are made like how a cat man cycle all locations and boxes are made at a single time, i know how to select the best one and to write feature vectors....
Please kindly respond
Just to confirm -
Are you asking how ground truth tensors are prepared so that we can pass them in the loss function along with the predictions? And the ground truth tensor are prepared in such a way that label assignment is done for all them?
@@KapilSachdeva i mean how Boundary boxes are made for every objects in a image
Like when we are writing vectors for every grid which consists of probability of class and coordinates, how they(coordinates/boundary boxes) are calculated using which algorithm
thanks a lot!!! good video!!
Please, when you will explain to us how updating height and width bw=pw*e(tw)
🙏
Thankyou so much
If we have 5 priori bounding boxes, we will do the calculations b_x, b_y, b_w, b_h 5 times and then find IoU between them and ground truth bounding boxes?
Yes.
Why we don't use the bounding box prior in the calculations of b_x and b_y
in 7:29 please?
We are using the bounding box prior. cx and cy are that of prior bounding box that we have associated with the ground truth box.
@@KapilSachdeva Thanks! What about p_x and p_y please then? I though these are the prior bounding box coordinates
First it is not p_x and p_y. It is p_w and p_h i.e. width and height of the prior box.
A bounding box is defined using (cx,cy,pw,ph)
@@KapilSachdeva Thank you. Lastly, I was trying to say why we don't use p_x and p_y in the calculations of b_x and b_y instead c_x and c_y please?
In yolo, bounding box is defined using (cx,cy,w,h)
Please explain the NECKS part
Will do. Stuck at few problems and hence not getting any cycles. But will definitely get to them.
done 😀