You Only Look Once: Unified, Real-Time Object Detection

  • Published: 6 Feb 2025
  • This video is about You Only Look Once: Unified, Real-Time Object Detection

Comments • 66

  • @engineer.alqupatimohammed6942
    @engineer.alqupatimohammed6942 2 years ago +13

    Great show and unbelievable explanation. Thank you for your tremendous effort.

  • @ΦιλιπποςΚουμπάρος
    @ΦιλιπποςΚουμπάρος 5 years ago +58

    love the "toilet" recognition at 11:17😂

  • @hristovrigazov3150
    @hristovrigazov3150 5 years ago +11

    Industry defining talk!

  • @beteaberra
    @beteaberra 3 years ago +1

    Great presentation: clear, thoughtful and fun!

  • @quang-namvu407
    @quang-namvu407 6 years ago +3

    you've saved me a lot of time. Thank you!

    • @chesterkylescolita3393
      @chesterkylescolita3393 6 years ago

      Yeah, reading the conference paper is quite cumbersome and I was having a hard time understanding it. This video made it simpler to understand.

  • @Bamboo_gong
    @Bamboo_gong 3 years ago +1

    wonderful presentation.

  • @lorinma
    @lorinma 8 years ago +14

    amazing work!

  • @user-ahmed-
    @user-ahmed- 2 years ago

    Incredible presentation

  • @hamzakholti-e7j
    @hamzakholti-e7j 5 months ago

    The video takes me to another dimension hahaha

  • @xylineone
    @xylineone 3 years ago +1

    Awesome

  • @friskybiscuits10
    @friskybiscuits10 7 years ago +1

    AWESOME VIDEO!!!!

  • @srd6263
    @srd6263 3 years ago +1

    Revisiting a classic, paying respects to the master.

  • @christians6295
    @christians6295 7 years ago

    Informative & entertaining!

  • @nguyenanhnguyen7658
    @nguyenanhnguyen7658 3 years ago

    Love Yolo ❤️❤️❤️👌👌👌

  • @christianreiser779
    @christianreiser779 8 years ago

    Great talk!

  • @lyltencent
    @lyltencent 7 years ago

    Great work!

  • @dompower500
    @dompower500 5 years ago

    Excellent talk on YOLO.

  • @zuam7645
    @zuam7645 4 years ago

    It is excellent software for use in dash cameras, for capturing and recording video of people around your belongings or keying your car.

  • @crabsynth3480
    @crabsynth3480 7 years ago

    2:30 .... Wow... !
    [Edited] 11:50 .... Awe-Inspiring !

  • @sahil-7473
    @sahil-7473 3 years ago +1

    How does it work at inference time? I'm not able to get it. Each output gives a value in the range -1 to 1. Now how can I bring the bounding box back into the original image? Kindly tell me the mathematics for computing it. This is where I'm stuck. Help me🙏
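
To the mapping question above: in YOLO v1 each cell predicts the box center as an offset inside its grid cell and the width/height as fractions of the whole image, so getting pixel coordinates back is just rescaling. A minimal sketch under those assumptions (the function name and the 448×448 input size are illustrative):

```python
def to_pixels(row, col, x, y, w, h, grid=7, img_w=448, img_h=448):
    """Map one YOLO v1-style box prediction back to pixel coordinates.

    (x, y) is the box center's offset inside grid cell (row, col), in [0, 1];
    (w, h) are the box width/height as fractions of the whole image.
    Returns (x1, y1, x2, y2) corners in pixels.
    """
    cx = (col + x) / grid * img_w   # center x in pixels
    cy = (row + y) / grid * img_h   # center y in pixels
    bw, bh = w * img_w, h * img_h   # box size in pixels
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)

# Example: a box centered in the middle cell, covering half the image.
print(to_pixels(3, 3, 0.5, 0.5, 0.5, 0.5))  # (112.0, 112.0, 336.0, 336.0)
```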

  • @sumod12
    @sumod12 4 years ago

    super

  • @alexdalton4535
    @alexdalton4535 2 years ago

    how does a grid cell predict a box that is bigger than itself?

  • @莫莫-l4k
    @莫莫-l4k 5 years ago

    cool!

  • @dominiksulzer1338
    @dominiksulzer1338 7 years ago +9

    2:43 more than 105% sure that there is a person when there is not.

    • @phoneplaysguitar
      @phoneplaysguitar 4 years ago +1

      Video is still on and he is still in front of the camera. 105 is still weird, but okay.

    • @alexdalton4535
      @alexdalton4535 2 years ago

      @@phoneplaysguitar it should be between 0 and 100 though lmao, how do you get 105%

  • @apurbaroy8411
    @apurbaroy8411 3 years ago

    Is it possible to integrate the YOLO algorithm with an Arduino or Raspberry Pi using a webcam?

  • @sherlockskey4131
    @sherlockskey4131 2 years ago +1

    Sir, where can I get the complete code? Please help, I am working on this project.

  • @hyunseokjeong7994
    @hyunseokjeong7994 8 years ago +1

    Thank you for the video.
    I did not get "NMS and threshold detections".
    Could you explain a bit more?

    • @elbouziadyabderrahim8086
      @elbouziadyabderrahim8086 6 years ago +2

      NMS (Non-Maximum Suppression): keep the bounding box with the max confidence value

    • @sridharkashyap9603
      @sridharkashyap9603 4 years ago +1

      NMS only keeps the highest-confidence bounding box among overlapping boxes whose intersection over union exceeds a threshold
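
Combining the two replies: greedy NMS sorts boxes by confidence, keeps the best one, and discards any remaining box that overlaps it above an IoU threshold. A rough sketch in plain Python (the `[x1, y1, x2, y2]` box format and the 0.5 threshold are assumptions):

```python
def iou(a, b):
    # Intersection over union of two [x1, y1, x2, y2] boxes.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of surviving boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)           # highest-confidence remaining box
        keep.append(best)
        # Drop every box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Example: two heavily overlapping boxes and one separate box.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```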

  • @nano7586
    @nano7586 5 years ago +14

    YOLO is so fucking hilarious.. it's a big "fuck you" to all those scientists who take things a bit too seriously. I love this kind of thing, and it gets me motivated in the science field, given that science is for the most part very dry and easily makes you depressed. Just thinking about the fact that "YOLO" will probably be mentioned in my master's thesis is so good :D 0:01 That picture is top notch.

  • @XiuyuYang
    @XiuyuYang 1 year ago

    A milestone in CV.

  • @vikrantchoudhary4411
    @vikrantchoudhary4411 4 years ago

    I just had one question: when we know where the ground-truth center of the object is, why can't we scan just that area or nearby? Why do we scan the whole image?

    • @ogsconnect1312
      @ogsconnect1312 4 years ago +3

      Yes, you're correct that when we know where the ground-truth center is, we could just scan that area. The problem is generalization: the model would only be good at that specific instance, and when the object happens to be located in another region of the image, as is often the case in the test set, the model fails completely. That defeats our training objective, and learning wouldn't have taken place in that respect. Hope it makes some sense? Thanks for reading.

    • @vikrantchoudhary4411
      @vikrantchoudhary4411 4 years ago

      @@ogsconnect1312 Oh my god!
      That's the single point I was confused about the whole time. Thanks a lot, buddy.

  • @huawei2091
    @huawei2091 7 years ago

    When you say "don't adjust the class probabilities or coordinates" if no object is centered in that grid cell, do you mean simply pass on that cell and move to the next? So you only backpropagate the network when there is an object centered in that cell. Am I getting it right?

    • @migrantama
      @migrantama 6 years ago

      Hello, I'm also looking for the answer to the same question. Did you figure it out?
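
For what it's worth, in the YOLO v1 loss the no-object cells are not skipped entirely: they still receive a confidence penalty, just down-weighted by lambda_noobj = 0.5, while the coordinate terms (weighted by lambda_coord = 5) and class terms apply only where an object is centered. A per-cell sketch of that masking; the dict-based tensor layout is purely illustrative:

```python
def cell_loss(pred, target, has_object, lambda_coord=5.0, lambda_noobj=0.5):
    """Squared-error loss for one grid cell, YOLO v1 style."""
    # Confidence is penalized in EVERY cell, just down-weighted
    # when no object is centered there.
    conf_err = (pred["conf"] - target["conf"]) ** 2
    if not has_object:
        return lambda_noobj * conf_err
    # Coordinate and class terms only apply where an object is centered.
    coord_err = sum((pred[k] - target[k]) ** 2 for k in ("x", "y", "w", "h"))
    class_err = sum((p - t) ** 2 for p, t in zip(pred["cls"], target["cls"]))
    return lambda_coord * coord_err + conf_err + class_err
```

So gradients do flow through the no-object cells, but only via the confidence output; the other predictions in those cells are left alone.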

  • @anoushakhan7896
    @anoushakhan7896 6 years ago +4

    which laptop do you have ?

  • @manojguha2046
    @manojguha2046 7 years ago +1

    This new method is going to be the future of object detection... so fast and accurate. Is he running on a Windows or Linux PC?

  • @Dennis-nn5tc
    @Dennis-nn5tc 7 years ago

    Why do they use 2 bounding boxes for 1 cell? For localization, one bounding box per cell should be enough, no? In OpenCV, for example, object detection draws only one bounding box around an object.

    • @chesterkylescolita3393
      @chesterkylescolita3393 6 years ago

      I think these are called anchor boxes.

    • @maheswaranparameswaran8532
      @maheswaranparameswaran8532 4 years ago +1

      I guess the chance of more than 2 objects landing in the same grid cell is relatively low if you use a large grid. Check out Andrew Ng's video on YOLO on deeplearning.ai's channel.

  • @shaz7163
    @shaz7163 7 years ago +1

    How do you calculate p(class|object)?

    • @Splish_Splash
      @Splish_Splash 1 year ago

      The model itself produces the class probabilities via softmax(logits)
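
Expanding on that reply: a softmax over the class logits gives the conditional probabilities Pr(class_i | object), which at test time are multiplied by the predicted box confidence to get class-specific scores. A small sketch (the logit and confidence values are made up):

```python
import math

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Conditional class probabilities Pr(class_i | object) from the class logits.
class_probs = softmax([2.0, 0.5, -1.0])

# Class-specific confidence = Pr(class_i | object) * box confidence.
box_confidence = 0.8
scores = [box_confidence * p for p in class_probs]
print(scores)
```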

  • @pooorman-diy1104
    @pooorman-diy1104 4 years ago

    I still don't get it...

    • @alchemication
      @alchemication 4 years ago

      This will make it easier (make sure you watch the previous videos as well to understand the building blocks): ruclips.net/video/9s_FpMpdYW8/видео.html . Hope it helps!

  • @TheVasanthbuddy
    @TheVasanthbuddy 8 years ago +14

    103% probability that it's a person. Something fishy in your calculation

  • @sandaminirathnayaka876
    @sandaminirathnayaka876 7 years ago

    Can you share the source code with me?

  • @xinyizhang3766
    @xinyizhang3766 2 years ago

    toilet lol

  • @rusMusDie
    @rusMusDie 7 years ago

    poor presentation

  • @anikethshetty992
    @anikethshetty992 7 years ago

    Hey, I'm new to the field of Convolutional Neural Networks.
    I have a presentation in school on YOLO and I need some help.
    Can someone please explain how the output of the convolution layer works?
    The input to the first convolution layer is a 448*448*3 tensor, and its output is a 224*224*64 tensor with a 7*7 filter.
    I understand that the depth is 64 because of the 64 different filters (features).
    Thank you!

    • @dhruba1992
      @dhruba1992 3 years ago +2

      Output size = (W - K + 2P)/S + 1, where W = input size, K = kernel size, P = padding, S = stride (with integer division, i.e. floor)
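
Plugging YOLO's first layer into that formula: with a 448-pixel input, a 7×7 kernel, and stride 2, a padding of 3 (an assumption here, the usual "same"-style padding for a 7×7 kernel) yields the 224×224 output the question mentions:

```python
def conv_output_size(w, k, p, s):
    # floor((W - K + 2P) / S) + 1
    return (w - k + 2 * p) // s + 1

# First YOLO conv layer: 448x448x3 input, 64 filters of 7x7, stride 2.
# The depth (64) comes from the number of filters, not from this formula.
print(conv_output_size(448, 7, 3, 2))  # 224
```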