I have read multiple blog posts on YOLO, along with the original paper, but this video provides the intuition at a different level. Amazing!
The same concept is used in YOLO v3, but instead of a softmax activation over all classes, logistic regression is applied to each class independently (meaning an object can belong to two classes at once).
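For anyone who wants to see the difference concretely, here's a minimal NumPy sketch (the class scores are made-up numbers):

```python
import numpy as np

logits = np.array([2.0, 1.5, -1.0])   # raw class scores for one box (made-up)

# YOLO v1/v2 style: softmax, so class probabilities compete and sum to 1
softmax = np.exp(logits) / np.exp(logits).sum()
print(softmax)   # ~[0.60, 0.37, 0.03]

# YOLO v3 style: an independent sigmoid per class, each its own [0, 1] probability
sigmoid = 1.0 / (1.0 + np.exp(-logits))
print(sigmoid)   # ~[0.88, 0.82, 0.27]: two classes can both be "on" at once
```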
So, at 1:49 we give the pc value to the 2nd anchor box, and not to the 1st, because it had the higher IoU. To generalize: check whether there's an object in the grid cell; if there is, assign the associated pc value to the anchor box with the highest IoU.
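A small sketch of that assignment rule, under the usual simplification that anchors and ground-truth boxes are compared by shape alone (co-centered boxes; the numbers are made up for illustration):

```python
def best_anchor(gt_w, gt_h, anchors):
    """Pick the anchor whose shape has the highest IoU with the ground-truth
    box, comparing shapes only (both boxes imagined centered at the same point)."""
    best, best_iou = None, 0.0
    for i, (aw, ah) in enumerate(anchors):
        inter = min(gt_w, aw) * min(gt_h, ah)          # overlap of co-centered boxes
        iou = inter / (gt_w * gt_h + aw * ah - inter)  # intersection over union
        if iou > best_iou:
            best, best_iou = i, iou
    return best, best_iou

anchors = [(0.2, 0.5), (0.6, 0.25)]        # one tall, one wide prior (made-up sizes)
print(best_anchor(0.25, 0.6, anchors))     # a tall object matches anchor 0, IoU ~0.67
```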
YOLO algorithm *CORRECTION*
At time 5:00, for the slide titled "Outputting the non-max suppressed output" the text should read "For each grid cell" instead of "For each grid call".
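For reference, a minimal sketch of the non-max suppression step that slide describes, using the thresholds from the lecture (drop boxes with pc below 0.6, suppress overlaps with IoU of 0.5 or more); YOLO runs this once per class:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_thresh=0.5, score_thresh=0.6):
    """Drop low-pc boxes, then repeatedly keep the highest-pc box
    and discard any remaining box that overlaps it too much."""
    remaining = sorted(
        ((s, b) for s, b in zip(scores, boxes) if s >= score_thresh),
        key=lambda sb: sb[0], reverse=True)
    kept = []
    while remaining:
        score, best = remaining.pop(0)      # highest remaining pc
        kept.append((score, best))
        remaining = [(s, b) for s, b in remaining if iou(best, b) < iou_thresh]
    return kept
```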
Thank you very much for all your YOLO videos. They are just great :)
@0:56 I think something is wrong. According to the YOLO paper we have [S, S, (B*5 + C)], which means each cell has one set of C class probabilities, but here you said that each anchor box has its own C class probabilities, i.e. [S, S, B*(5+C)].
[S, S, (B*5 + C)] is YOLOv1. He is talking about YOLOv2. The two models are pretty different in the label encoding and also in the definition of the loss, if I understood correctly. fairyonice.github.io/Part_4_Object_Detection_with_Yolo_using_VOC_2012_data_loss.html
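To make the shape difference concrete with the lecture's numbers (a 3x3 grid, 2 anchors, 3 classes):

```python
S, B, C = 3, 2, 3                # grid size, boxes/anchors per cell, classes

yolov1_depth = B * 5 + C         # 2*5 + 3 = 13: one set of class probs per cell
yolov2_depth = B * (5 + C)       # 2*(5+3) = 16: class probs per anchor box
print((S, S, yolov1_depth))      # (3, 3, 13)
print((S, S, yolov2_depth))      # (3, 3, 16), the 3x3x16 used in the video
```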
@@yumik4990 YOLOv1 predicts 2 boxes for each grid cell. From what I saw, YOLOv1 doesn't predefine 2 anchor boxes; it predicts the boxes directly from the grid cell. And during training, YOLOv1 assigns only 1 box per grid cell to an object, the one with the highest IoU. I don't see how this stays consistent during training: in the 1st epoch box 1 can have the higher IoU, but in the 2nd epoch box 2 can have the higher IoU.
the best AI teacher, thank you
@5:28 How come the bounding box sizes are different? How is the bounding box size changing?
Have the same question. How exactly are the bounding boxes being predicted at every step?
Bounding boxes are parameters to be learned/trained, basically a continuous/regression output, hence the predicted bounding boxes change in size. Anchor boxes do not; they are fixed in size.
This algorithm simplified the bounding box regression by having a 3x3 (or some other) grid output, right? What I didn't understand is how anchor boxes are used in this algorithm...
YOLO and Faster R-CNN have something in common, and that is anchor boxes, which are used to simulate the famous image pyramids commonly seen in conventional SVM-classifier training. Regardless of why we use anchors, YOLO v2 uses 5 anchor boxes (instead of 9, unlike Faster R-CNN) for each cell of the 3x3 grid here. Faster R-CNN uses 9 of them, but not per grid cell; it slides them over the feature maps produced by an intermediate layer of a CNN. As far as I understood, YOLO v2 uses 5 anchor boxes at each cell of its 13x13 output feature map and predicts 5 coordinates for each box. Since it constrains the location prediction (by using grids, which Faster R-CNN does not use), the parametrization is easier to learn and it makes the network more stable. Hope you got the point. ;)
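For what it's worth, the YOLO9000 paper chooses those 5 anchor shapes by running k-means over the training-set box dimensions with d = 1 - IoU as the distance. A rough sketch, assuming box_whs is an (N, 2) array of ground-truth (w, h) pairs (function names are mine):

```python
import numpy as np

def shape_iou(wh, centers):
    """IoU between one (w, h) and each center, with boxes sharing a center."""
    inter = np.minimum(wh[0], centers[:, 0]) * np.minimum(wh[1], centers[:, 1])
    return inter / (wh[0] * wh[1] + centers[:, 0] * centers[:, 1] - inter)

def kmeans_anchors(box_whs, k=5, iters=50, seed=0):
    """k-means with distance d = 1 - IoU over ground-truth box shapes."""
    rng = np.random.default_rng(seed)
    centers = box_whs[rng.choice(len(box_whs), size=k, replace=False)]
    for _ in range(iters):
        assign = np.array([np.argmax(shape_iou(wh, centers)) for wh in box_whs])
        centers = np.array([box_whs[assign == j].mean(axis=0) if np.any(assign == j)
                            else centers[j] for j in range(k)])
    return centers
```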
@@smart_world7928 k
Yeah, I didn't get it either. It just increases the size of the output: instead of making one prediction per cell, the algorithm now makes two predictions per cell. I don't understand what the purpose of the predefined boxes is here.
@@azharhussian4326 Did you get it in the end? I've been through every post on the internet about anchor boxes and no one has come close to correctly explaining how they are used in the process. Pretty frustrating.
@@thelonespeaker Yeah, still kind of stuck. But anchor boxes are usually used in the loss function.
I am not clear on how this works at inference time. How can I map the model's output bounding boxes back into original-image coordinates? Kindly give me the mathematics for how to compute it.
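One common answer, following the YOLO v2 parametrization (the variable names are mine, and the anchor sizes are assumed to be given as fractions of the image):

```python
import math

def decode_box(tx, ty, tw, th, row, col, anchor_w, anchor_h, S, img_w, img_h):
    """Map one anchor's raw outputs in grid cell (row, col) to pixel coordinates.

    Sigmoids keep the predicted center inside its cell; exp() rescales the
    fixed anchor prior. S is the grid size (e.g. 3 or 19)."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    bx = (col + sig(tx)) / S * img_w          # center x in pixels
    by = (row + sig(ty)) / S * img_h          # center y in pixels
    bw = anchor_w * math.exp(tw) * img_w      # width in pixels
    bh = anchor_h * math.exp(th) * img_h      # height in pixels
    return (bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2)  # (x1, y1, x2, y2)
```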
At 6:01, how is this lady's bounding box made? I thought there was a separate CNN for each grid cell... can somebody explain?
The 2 bounding boxes come from the 2 anchor boxes. Maybe this question comes from your previous question on the previous video.
Is this YOLO or YOLO 9000? According to the YOLO paper, I think the y should be 3x3x((2x5)+3), so y is 3x3x13. Is this right?
amazing educator
Thank you Andrew !
OK, so how many objects can one cell of YOLOv1 predict? The article says 'we only predict one set of class probabilities per grid cell regardless of the number of boxes'? It seems that the article skirts around the fact that the model can only predict at most 1 object/cell, but the wording above does not exclude, for example, the case when all B objects belong to the same class. So how many?
I think the answer is: One cell can predict, at maximum, one object for every anchor box.
great explanation
What are the values of the "don't care" question marks? Is it up to the labeler, or is there a convention?
Did you get the answer?
@@vaneEAE From my research it seems it's just up to the labeler; I haven't found any convention anywhere.
@@jamieabw4517 I saw that these terms are not considered in the loss function. Therefore, it is of no interest to know what value these terms take.
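Right: in a squared-error version of the loss like the one sketched in this course, the "?" entries are simply multiplied by a 0/1 objectness mask, so they never produce a gradient. A rough sketch (the tensor layout is made up for illustration):

```python
import numpy as np

def yolo_loss_sketch(pred, target):
    """pred/target: arrays of shape (S, S, A, 5 + C) laid out as
    [pc, bx, by, bw, bh, c1..cC] per anchor per cell."""
    obj_mask = target[..., 0]              # 1 where an object is assigned, else 0
    # Confidence trains everywhere; box/class terms only where obj_mask == 1,
    # so the "don't care" (?) entries contribute nothing to the gradient.
    conf_loss = np.sum((pred[..., 0] - target[..., 0]) ** 2)
    box_loss = np.sum(obj_mask[..., None] * (pred[..., 1:5] - target[..., 1:5]) ** 2)
    cls_loss = np.sum(obj_mask[..., None] * (pred[..., 5:] - target[..., 5:]) ** 2)
    return conf_loss + box_loss + cls_loss
```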
If we divide the image into a 3*3 = 9 grid of small boxes, why do we still need the box coordinate variables bx, by, bh, bw?
The object may not lie at the center of that grid cell. The bounding box coordinates specify the box that actually fits the object.
Is Non-Max suppression used during training?
Clear and good enough. Thank you.
Is this a graduate or undergraduate level course?
Is YOLO a deep learning algorithm???
How is this grid cell segmentation actually encoded in the neural network? Is it encoded at all?
If I understood correctly, the segmentation is only encoded into the training data, and the network is supposed to "learn" to output the y=3x3x16 that matches the locations of the objects relative to the grid cell on the training data. In other words, the network has no information about any image grid.
In the previous videos, it's shown that the grid is actually the "cut down" version of the image after passing through multiple convolution layers.
That's the grid!
Amazing tutorial!! Thank you so much!
Thank you!!
Let's say I have an object spanning 3 of the grid cells. Then the outputs of all 3 of those grid cells should be identical, with the same values of bx, by, bh, bw. Am I correct?
Not really. The object should be assigned to the cell that contains the object's center. The remaining cells should predict "background".
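Exactly: only the cell containing the object's midpoint gets the object. A tiny sketch of that rule, assuming coordinates normalized to [0, 1]:

```python
def assign_cell(x_center, y_center, S=3):
    """Return (row, col) of the grid cell responsible for an object whose
    center is at (x_center, y_center), both normalized to [0, 1]."""
    col = min(int(x_center * S), S - 1)
    row = min(int(y_center * S), S - 1)
    return row, col

# An object spanning several cells is still assigned to exactly one:
print(assign_cell(0.5, 0.8))  # (2, 1): only this cell's target vector gets pc = 1
```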
Thanks for the video, it brought me back to light:)
I however still have a question: in the YOLO v1 paper it is described that the final convolutional output layer is a tensor of dimension 7x7x1024 (Darknet), and then the detection follows, where grid cells of dimension 7x7 are defined. My assumption here is: since the dimension of the conv output is the same as the grid's, can one say that one grid cell represents one pixel, and hence the detection proceeds one "pixel" at a time?
One grid cell represents a small 'crop' (e.g. 20x20 pixels) of the image, not necessarily a single pixel.
Another thing to note: the algorithm processes all grid cells simultaneously, in one shot. It doesn't process them sequentially.
@@MrAmgadHasan How can YOLO predict the final output at 3 different scales? YOLOv3 has 3 scales with different feature maps.
I read some documents and I know YOLO uses HSV; can you explain why?
Can someone tell me: at training time we use the term "anchor box", but at prediction time it becomes "bounding box", is that right? At prediction time anchor boxes aren't used, only bounding boxes, right?
Yes, you are correct. The anchor box is only used to check the IoU match with the ground-truth bounding box. If the IoU between an anchor box and the ground-truth box of a particular object is greater than 0.5, then we consider that anchor responsible for the object, and its label becomes [object confidence = 1 (for the object with IoU > 0.5), the bounding box coordinates of that object, class label = 1 for that object's class and 0 for the remaining classes].
what if an object spans more than one grid cell?
They often do. The bh and bw ground truths are defined taking into account the size of the original image, meaning that an object assigned to a cell can have bh and bw that extend beyond the boundaries of the cell. That's not a problem, because you regress towards these values; the cell serves only to mark where the object should be detected. If the center point of the object is in one cell, then the target vector for that cell is the only one that will carry bx, by, bh, bw for that object.
thank you
How do you define the anchor boxes' boundaries?
How do you get the values c1, c2, c3?
c1, c2 and c3 are the classification part of the algorithm. They basically mean "if the bounding box intersects an object, what is its type?". During training, your data should be annotated, so each bbox has a position and, if applicable, a class. When you train your NN, you check whether a box is over some IoU threshold with an object, and if it is, then you train c1, c2, c3 like any other classifier.
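So with the lecture's 3 classes (say pedestrian, car, motorcycle), an annotated car would give a target vector along these lines (the numbers are made up):

```python
#     pc    bx    by    bw    bh   c1  c2  c3
y = [1.0, 0.40, 0.70, 0.30, 0.25,  0,  1,  0]   # c2 = 1: it's a car
# For anchors/cells with no assigned object: pc = 0 and the rest are "don't care".
```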
How to get the programming exercise
What happens when the expected training output was assigned to bounding box 1, but the network's output was box 2, and the coordinates in the expected output were incorrectly marked close to box 1 when they should have been close to box 2?
Just wanted to let you know that this video has been ripped and re-uploaded:
ruclips.net/video/3Pv66biqc1E/видео.html
I know, I thought that too! Actually, Andrew Ng is the founder of Deeplearning.ai, so technically it isn't a re-upload; he must've just wanted to consolidate everything.
source code?