Great show and unbelievable explanation. Thank you for your tremendous effort.
love the "toilet" regonition at 11:17😂
lol
😂
Industry defining talk!
Great presentation: clear, thoughtful and fun!
You've saved me a lot of time. Thank you!
Yeah, reading the conference paper is quite cumbersome and I was having a hard time understanding it. This video made it simpler to understand.
wonderful presentation.
amazing work!
Incredible presentation
The video takes me to another dimension hahaha
Awesome
AWESOME VIDEO!!!!
Revisiting a classic, in awe of the master.
Informative & entertaining!
Love Yolo ❤️❤️❤️👌👌👌
Great talk!
Great work!
excellent conference on YOLO.
*It is excellent software for use in dash cameras, for capturing and video-recording people around your belongings, or anyone keying your car*
2:30 .... Wow... !
[Edited] 11:50 .... Awe-Inspiring !
How does it work at inference time? I am not able to get it. Each output gives a value in a normalized range. Now how can I map the bounding box back onto the original image? Kindly tell me the mathematics, how to compute it. This is where I'm stuck. Help me🙏
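For anyone else stuck on this, here is a minimal sketch of how YOLOv1-style outputs are usually mapped back to pixel coordinates. It assumes `x, y` are offsets within the grid cell in [0, 1] and `w, h` are fractions of the whole image; the function name `decode_box` and its signature are mine, not from the paper or darknet:

```python
def decode_box(tx, ty, tw, th, col, row, S, img_w, img_h):
    """Map one grid cell's normalized prediction to pixel coordinates.

    tx, ty: center offset inside cell (col, row) of an S x S grid, in [0, 1].
    tw, th: box width/height as fractions of the full image.
    Returns (x1, y1, x2, y2) corners in pixels.
    """
    cx = (col + tx) / S * img_w   # box center x in pixels
    cy = (row + ty) / S * img_h   # box center y in pixels
    w = tw * img_w                # box width in pixels
    h = th * img_h                # box height in pixels
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

For example, the center cell of a 7×7 grid on a 448×448 image, with tx = ty = 0.5 and tw = th = 0.5, decodes to a 224×224 box centered at (224, 224).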
super
how does a grid cell predict a box that is bigger than itself?
cool!
2:43 More than 105% sure that there is a person when there is not.
Video is still on and he is still in front of the camera. 105 is still weird, but okay.
@@phoneplaysguitar it should be between 0 and 100 though lmao, how do you get 105%
Is it possible to integrate the YOLO algorithm with arduino or raspberry pi using a webcam?
Sir, where can I get the complete code? Please help, I am working on this project.
Thank you for the video.
I did not get "NMS and threshold detections". Could you explain a bit more?
NMS (Non-Maximum Suppression): take the bounding box with the max confidence value.
NMS only keeps the bounding box with the maximum confidence among overlapping boxes, suppressing the others that have high intersection-over-union with it.
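The two replies above can be sketched in a few lines. This is a generic greedy NMS, not the official darknet implementation; the helper `iou` and the threshold value are my assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

So two heavily overlapping detections of the same object collapse to the single most confident one, while a distant box survives untouched.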
YOLO is so fucking hilarious.. it's a big "fuck you" to all those scientists who take things a bit too seriously. I love this kind of thing and it gets me motivated in the science field, given that science is for the most part very dry and easily makes you depressed. Just thinking about the fact that "YOLO" will probably be mentioned in my master's thesis is so good :D 0:01 That picture is top notch.
A milestone in CV.
I just had one question: when we know where the ground-truth centre of the object is, why can't we scan just that area or a nearby area? Why do we scan the whole image?
Yes, you're correct that when we know where the ground-truth center is, we could just scan that area. The problem is generalization: the model would only be good at that specific instance, and when the object happens to be located in another region of the image, as is often the case in the test set, the model would fail completely. That defeats the training objective, and no real learning would have taken place. Hope that makes sense. Thanks for reading.
@@ogsconnect1312 oh my god !
That's the single point I was confused about the whole time. Thanks a lot, buddy.
When you say "don't adjust the class probabilities or coordinates" when no object is centered in that grid cell, you mean simply skip that cell and move to the next, right? So you only backpropagate those terms when there is an object centered in that cell. Am I getting it right?
Hello, I'm also looking for the answer to the same question. Did you figure it out?
which laptop do you have ?
Something with a Titan X for sure... it's on his GitHub page.
This new method is going to be the future of object detection... so fast and accurate. Is he running on a Windows or Linux PC?
linux
Why do they use 2 bounding boxes for 1 cell? For localization, 1 bounding box per cell should be enough, no? In OpenCV, for example, object detection draws only 1 bounding box around an object.
I think they are called anchor boxes.
I guess the chance of more than 2 anchor boxes being needed in the same grid cell is relatively low if you use a large grid... check out Andrew Ng's video on YOLO on deeplearning.ai's channel.
How do you calculate P(class | object)?
The model itself produces the class probabilities via softmax(logits).
I still don't get it...
This will make it easier (make sure you watch the previous videos as well to understand the building blocks): ruclips.net/video/9s_FpMpdYW8/видео.html . Hope it helps!
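To make the "softmax(logits)" reply above concrete, here is a tiny sketch of how raw scores become probabilities. Note this is illustrative only; YOLOv1 itself regresses the conditional class scores directly rather than applying a softmax:

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

Two equal logits give equal probabilities of 0.5 each, and the larger a logit is relative to the others, the closer its probability gets to 1.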
Hey, I'm new to the field of Convolutional Neural Networks.
I have a presentation in school on YOLO and I need some help.
Can someone please explain how the output of the convolution layer works?
The input to the first convolution layer is a 448×448×3 tensor, and its output is a 224×224×64 tensor with a 7×7 filter.
I understand that the depth is 64 because of 64 different filters (features).
Thank you!
Because the stride is 2?
103% probability that it's a person. Something fishy in your calculation.
Can you share the source code with me?
github.com/pjreddie/darknet
toilet lol
poor presentation
Hey, I'm new to the field of Convolutional Neural Networks.
I have a presentation in school on YOLO and I need some help.
Can someone please explain how the output of the convolution layer works?
The input to the first convolution layer is a 448×448×3 tensor, and its output is a 224×224×64 tensor with a 7×7 filter.
I understand that the depth is 64 because of 64 different filters (features).
Thank you!
Output shape = (W − K + 2P)/S + 1, where W = input size, K = kernel size, P = padding, S = stride.
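Plugging YOLO's first layer into that formula gives exactly the 224 in question. Assuming padding P = 3 for the 7×7 kernel (a "same"-style padding choice on my part; the formula and the 448→224 numbers are from the thread above):

```python
def conv_out(W, K, P, S):
    """Spatial output size of a convolution: (W - K + 2P)/S + 1."""
    return (W - K + 2 * P) // S + 1

# First YOLO conv layer: 448x448 input, 7x7 kernel, padding 3, stride 2.
print(conv_out(448, 7, 3, 2))  # -> 224
```

So the halving from 448 to 224 comes from the stride of 2, and the 64 channels come from the 64 filters, just as the comment above guessed.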