C 5.0 | Object Localization | Bounding Box Regression | CNN | Machine Learning | EvODN

  • Published: 27 Oct 2024

Comments • 93

  • @Cogneethi
    @Cogneethi  5 years ago

    See the full course on Object Detection: ruclips.net/p/PL1GQaVhO4f_jLxOokW7CS5kY_J1t1T17S and subscribe to my channel.
    If you found this tutorial useful, please share it with your friends (WhatsApp/iMessage/Messenger/WeChat/Line/KaTalk/Telegram) and on social media (LinkedIn/Quora/Reddit).
    Tag @cogneethi on twitter.com
    Let me know your feedback at cogneethi.com/contact

  • @anandphilip
    @anandphilip 3 years ago +4

    This is the simplest, most lucid explanation for the topic I've heard.

  • @PJDuro
    @PJDuro 4 years ago +5

    There are very few good explanations of this on the net, including in the online courses. You really made it easier to understand, thanks!


    • @Replcate
      @Replcate 1 year ago

      Hey, can you help with the coding parts? I am a beginner.

  • @ImranKhan-tc8jz
    @ImranKhan-tc8jz 4 years ago +2

    Thank you so much, sir. I was looking for an explanation and unfortunately could not get it even after watching a lot of YouTube videos and reading articles. But when I saw this video, my confusion cleared up. Thanks again.

  • @ganeshchalamalasetti2884
    @ganeshchalamalasetti2884 3 years ago +1

    That was an amazing explanation and insight about the bounding box regressor. Rare video about this topic. I appreciate your efforts.

    • @Cogneethi
      @Cogneethi  3 years ago +1

      Thanks Ganesh!

    • @ganeshchalamalasetti2884
      @ganeshchalamalasetti2884 3 years ago

      @@Cogneethi By the way, are you still planning to cover the YOLO framework?

    • @Cogneethi
      @Cogneethi  3 years ago

      @@ganeshchalamalasetti2884 Not yet. I'm caught up with some projects, so I'm not getting the time. Maybe later.

    • @ganeshchalamalasetti2884
      @ganeshchalamalasetti2884 3 years ago

      @@Cogneethi Makes sense. All the best 👍

  • @SouravDas-eg8ok
    @SouravDas-eg8ok 4 years ago +1

    Very good videos. Short and crisp. Easy to understand. Thank you very much.

  • @shobhitsharma1022
    @shobhitsharma1022 2 years ago

    Which model is good for detecting bounding boxes of customer demographics on national ID cards?

  • @valentinfontanger4962
    @valentinfontanger4962 4 years ago

    This is excellent! If you want to understand YOLO, YOU NEED TO WATCH THIS VIDEO!!! I had no idea how bounding boxes were predicted. Now it's clear; the only thing I have to figure out is how to split my last layers into a classifier and a regressor.
    Thanks!!!!!

    • @Cogneethi
      @Cogneethi  4 years ago

      Welcome Valentin!

    • @Replcate
      @Replcate 1 year ago

      @@Cogneethi The theory explanation is good, but can you also share the code for this?

  • @dynocodes
    @dynocodes 3 years ago

    The most simplified video on YouTube; keep it up, bro. Hats off for your explanation. I would like to learn the coding part for it.

  • @anonymosranger4759
    @anonymosranger4759 4 years ago +2

    Amazing video!!! You deserve more subs!

  • @waterspray5743
    @waterspray5743 2 years ago

    When training the neural network, should all the other bounding boxes be zeros? Say there are three classes: *people, boat, tv.* If the image contains only a boat, what is the ground truth for people and tv?

    • @Cogneethi
      @Cogneethi  2 years ago

      We will still have some bounding box estimate for the other classes, but they will not be considered. Only the BBox corresponding to the highest-scoring class is considered.
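As a rough sketch of what "only the highest-scoring class is considered" means at inference time (the shapes, class names, and numbers here are hypothetical, not from the video):

```python
import numpy as np

# Hypothetical setup: for one image, the network outputs per-class scores
# of shape (num_classes,) and per-class boxes of shape (num_classes, 4).
def pick_box(cls_scores, bbox_preds):
    """Keep only the box predicted for the highest-scoring class."""
    best = int(np.argmax(cls_scores))   # index of the winning class
    return best, bbox_preds[best]       # its 4 coordinates (x1, y1, x2, y2)

scores = np.array([0.1, 0.7, 0.2])      # e.g. people, boat, tv
boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                  [200.0, 250.0, 600.0, 400.0],   # boat box
                  [5.0, 5.0, 50.0, 50.0]])
cls, box = pick_box(scores, boxes)      # the other classes' boxes are discarded
```

The other classes' box estimates still exist in the output tensor; they are simply never read.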

  • @sahasradalghara9904
    @sahasradalghara9904 3 years ago

    Do we need 4 output neurons for the bounding box regressor?

  • @BiancaAguglia
    @BiancaAguglia 4 years ago +1

    You're a good teacher. I know that, as you said on your website, video/audio recording is time consuming and is not for the faint of heart 😀, but I hope you'll continue to do these tutorials.
    One question: do you have examples of actually coding the neural networks you explain in your videos? I looked on your website and your github account but didn't find anything. It might be that I didn't do a very good job at searching. 😊

    • @Cogneethi
      @Cogneethi  4 years ago

      @Bianca,
      Regarding the code, I have not posted it on github/website. I will probably post it to github. I will comment here and let you know as soon as I do. (But first I have to find it; I don't know where I saved it :( )
      Thank you for the encouragement. I will try to do more of these as and when I find time. :)
      And let me know if I have made any mistakes and how I can improve, since, like you, I am still learning and not an expert yet!

    • @Cogneethi
      @Cogneethi  4 years ago +1

      Meanwhile, these are the libraries that I used in this tutorial:
      HOG: scikit-image.org/docs/dev/auto_examples/features_detection/plot_hog.html
      SVM: scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC
      VGG & Faster RCNN: github.com/endernewton/tf-faster-rcnn

    • @Cogneethi
      @Cogneethi  4 years ago

      Code and PPT are here: drive.google.com/drive/folders/120KC9i3F0WMhqksngS-dWS1iJNP-mXAv?usp=sharing

  • @kinsung85
    @kinsung85 3 years ago

    Since the feature maps separate into two paths, one for classification and the other for bounding box regression, how does one know which classification result (object1, object2, object3) belongs to which bounding box (box1, box2, box3)?
    What I mean is, the result could be object1 -> box2, but in your explanation, object1 -> box1.

    • @kinsung85
      @kinsung85 3 years ago

      My question is: how do we decide which object belongs to which bounding box?

  • @marlene5547
    @marlene5547 2 years ago

    Thank you so much for explaining this!!

  • @pallawirajendra
    @pallawirajendra 4 years ago +1

    Very clear explanation. Keep creating.

  • @yusufbalik6784
    @yusufbalik6784 2 years ago +1

    Sir, what you said is theoretically very good, thank you. I trained a model with Faster RCNN. The model gives me good outputs and draws the boxes, but I can't get the coordinates of the boxes in code. How can I do this?

  • @chyldstudios
    @chyldstudios 2 years ago

    Amazing explanation!

  • @secretfolder8870
    @secretfolder8870 2 years ago

    At 4:10, what do you mean by modifying the last FC layer to obtain the BBox coordinates? Are these coordinates obtained from 4 layers of the FC stack or from only the last FC layer? Please clarify.

  • @tiasm919
    @tiasm919 4 years ago

    There should be 2 loss functions used here, right?
    1. For the classification layer
    2. For the regression layer
    Do we sum the losses before backpropagating through the network? What I understand is: we use the classification loss to backpropagate "only" through the classification layer, AND we use the regression loss to backpropagate "only" through the regression layer (only for the 4 neurons corresponding to the predicted class in the classification layer). Both losses will be "added" at the neurons of the last shared FC layer, say FC7, through the backprop steps of both branches (which split into the class and regression layers).
    Is this right? Can you please clarify this for me?

  • @pavithrans7153
    @pavithrans7153 4 years ago +2

    How to change the last fully connected layer to give the co-ordinates of bounding boxes?

    • @Cogneethi
      @Cogneethi  4 years ago +1

      The output you get in the last FC layer depends on the loss function that you use. If you use softmax and use class labels to calculate the loss, then eventually, after many steps of training, it will learn to predict the correct class labels. Here you need just 1 output per class. This is a case of 'classification'.
      In the code, you might use something like this: pytorch.org/docs/stable/nn.html?highlight=loss#torch.nn.CrossEntropyLoss
      Instead, if you use L2 loss and use the coordinates of the 'ground truth' bounding box as input to the L2 loss function, then after many training steps it will learn to predict the coordinates of the bounding box. Only, in this case, since a bbox needs 4 points, your last FC layer will have 4 outputs per class. This is a case of 'regression'.
      In the code, you might use something like this: pytorch.org/docs/stable/nn.html?highlight=loss#torch.nn.MSELoss
      In general, it all depends on what output you are expecting and what kind of loss function you are using. Based on this, you decide the number of outputs in your last FC layer.
      If you see pytorch.org/docs/stable/search.html?q=loss&check_keywords=yes&area=default, there are different types of loss functions suitable for different use cases.
      Let me know if I need to elaborate further.

    • @Cogneethi
      @Cogneethi  4 years ago

      Let's say your last fully connected layer before classification and bbox regression is called 'fc7'.
      Then, from this, to get the classification probabilities, you do:
      cls_score = fully_connected(fc7, num_classes, ...)
      cls_prob = softmax(cls_score)
      # here number of outputs = num_classes
      And to get the bbox coordinates, you do:
      bbox = fully_connected(fc7, num_classes * 4, ...)
      # here number of outputs = num_classes * 4
      That is all there is to it. In fact, when I was studying, I too was confused by it. But after seeing the code, it was clear.
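The two-head pseudocode above can be fleshed out as a framework-agnostic numpy sketch. The layer sizes and random weights below are made up for illustration; a real network would learn `W_cls` and `W_box` by backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, fc7_dim = 3, 8                  # hypothetical sizes

fc7 = rng.standard_normal(fc7_dim)           # shared feature vector from the backbone

# Classification head: fc7_dim -> num_classes scores, then softmax.
W_cls = rng.standard_normal((num_classes, fc7_dim))
cls_score = W_cls @ fc7
cls_prob = np.exp(cls_score) / np.exp(cls_score).sum()

# Regression head: fc7_dim -> num_classes * 4 box coordinates.
W_box = rng.standard_normal((num_classes * 4, fc7_dim))
bbox = (W_box @ fc7).reshape(num_classes, 4)  # one (x1, y1, x2, y2) per class
```

Because both heads read the same `fc7` vector, row i of `bbox` and entry i of `cls_prob` refer to the same class by construction; that is why the classification result and the box stay paired.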

    • @Cogneethi
      @Cogneethi  4 years ago

      The coding part I have covered a bit more in the last chapter, 'Faster-RCNN'.
      ruclips.net/video/09DRku3USAs/видео.html

    • @RnFChannelJr
      @RnFChannelJr 4 years ago

      @@Cogneethi If possible, may I see the code implementing this lecture?

    • @tiasm919
      @tiasm919 4 years ago

      Does the last FC layer branch into 2 different layers:
      1. For classification, consisting of Nclass+1 (background) neurons and a softmax function
      2. For bounding box regression, consisting of Nclass*4 neurons (4 for each box coordinate)
      Is this true? (I believe this is what pavithran was asking.)
      Edit: reading the code you provided in the first reply of this comment, I believe what I said is right. Thanks :)

  • @kirushikeshdb1885
    @kirushikeshdb1885 3 years ago

    I have a doubt. You told us that we do backpropagation based on the L2 loss between the actual and predicted bounding box coordinates. But we actually have bounding box coordinates only for the correct class, which means all the coordinates for the incorrect classes are fixed to zero.

  • @aakashr4974
    @aakashr4974 3 years ago

    Can you please explain the loss function? Do you put the correct coordinates only for the actual label?

    • @Cogneethi
      @Cogneethi  3 years ago +1

      It is the L2 loss used in this example: heartbeat.fritz.ai/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0
      "Do you put the correct coordinates only for the actual label?"
      That is correct.
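In code, "the correct coordinates only for the actual label" usually becomes a mask on the regression output: the L2 loss is computed only over the 4 values belonging to the ground-truth class, so the other classes' box outputs receive zero gradient. A minimal numpy sketch (shapes, values, and the helper name are hypothetical):

```python
import numpy as np

def masked_l2_loss(bbox_pred, bbox_gt, gt_class):
    """L2 loss over only the 4 coordinates of the ground-truth class.

    bbox_pred: (num_classes, 4) predicted boxes, one row per class
    bbox_gt:   (4,) ground-truth box for the labeled class
    gt_class:  index of the labeled class
    """
    diff = bbox_pred[gt_class] - bbox_gt   # other classes contribute nothing
    return float(np.sum(diff ** 2))

pred = np.array([[1.0, 1.0, 2.0, 2.0],
                 [190.0, 240.0, 610.0, 410.0],   # boat prediction
                 [0.0, 0.0, 0.0, 0.0]])
gt = np.array([200.0, 250.0, 600.0, 400.0])
loss = masked_l2_loss(pred, gt, gt_class=1)      # only row 1 is penalized
```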

  • @akhilsraj6698
    @akhilsraj6698 4 years ago +2

    Thank you!!

  • @rs9130
    @rs9130 2 years ago

    Thank you for the good explanation. Can we find an approximation of the bbox from semantic segmentation masks?

  • @trungphamduc8271
    @trungphamduc8271 3 years ago

    Many thanks for this video.

  • @vineethgogu2309
    @vineethgogu2309 3 years ago

    Hello sir,
    can we completely remove/wipe out text from an image, using Python libraries like easyocr or pytesseract?

    • @Cogneethi
      @Cogneethi  3 years ago

      Maybe you can first identify the text position, crop it out, and use an 'image in-painting' technique to fill the gaps.
      Not sure about the quality, but it should work.

    • @vineethgogu2309
      @vineethgogu2309 3 years ago

      @@Cogneethi Hey, why not make a video on it and explain how it actually works? It would be very beneficial to me.

    • @vineethgogu2309
      @vineethgogu2309 3 years ago

      Moreover, I identified the text position in an image using EasyOCR.
      Have a try with easyocr?

    • @Cogneethi
      @Cogneethi  3 years ago

      @@vineethgogu2309 Unfortunately, as of now, I don't have the bandwidth for new videos.
      But I found a blog on inpainting which might help:
      heartbeat.fritz.ai/guide-to-image-inpainting-using-machine-learning-to-edit-and-correct-defects-in-photos-3c1b0e13bbd0
      paperswithcode.com/task/image-inpainting
      Once you have the text coordinates from any OCR library, you can just set those pixels to ones or zeros and try inpainting.

    • @vineethgogu2309
      @vineethgogu2309 3 years ago

      @@Cogneethi Thank you so much, sir, for providing the blog links.
      👍👍👍👍👍👍👍

  • @mukundsrinivas8426
    @mukundsrinivas8426 4 years ago +1

    wonderful video

  • @maschleimichael16
    @maschleimichael16 4 years ago

    Thanks for the explanation, it is very clear and easy to understand.

  • @harishkumaranandan5946
    @harishkumaranandan5946 4 years ago

    Hi, between 6:00 and 6:03, regarding the initial bbox that you mentioned as a hypothetical one: will it be a bbox of one of the feature maps among the stack feeding the last FC layer? Also, given that the last FC layer basically has the stack of different features as a vector, will the stack have the entire boat as one of the features, with the ground-truth coordinates and bbox regression using the L2 loss to narrow down on that one as the location? Basically, is it backtracking from the 4 ground-truth bbox coordinates to bbox coordinates in the feature-vector space for any input?

    • @Cogneethi
      @Cogneethi  4 years ago

      No, the entire boat will not be a feature.
      It is difficult to gauge exactly what is happening.
      You may check this: distill.pub/2020/circuits/zoom-in/
      and this:
      ruclips.net/video/AgkfIQ4IGaM/видео.html
      The network basically learns patterns in the data, and based on the patterns it will approximately gauge the location of the object.
      And we fine-tune the detection part based on the ground truth.

    • @harishkumaranandan5946
      @harishkumaranandan5946 4 years ago

      @@Cogneethi Hi, thanks for getting back. I had a look at the links and am also aware of the visualization toolbox. However, the thing still not clear to me is the initial hypothetical bbox coordinates that you mentioned. Let us say that when we start training, even before the first backpropagation step, the network's FC layer will have a stack of feature maps, and as you say, none of them will be an entire boat or whatever object we are trying to detect. If this is the case, then the output bbox coordinates that we try to predict are around an entire boat, right? So how does the stack of feature maps in the FC layer, with each representing just a part of a feature extracted through the CNN operations coupled with pooling, stride, etc., return a set of bbox coordinates that is supposed to represent a bbox for an entire boat?
      2nd question: if the 1st bbox coordinate is hypothetical, then it does not correlate with the features of the boat, and it is through the ground truth and L2 loss that we are forcing the network to spit out the final numbers? Or, if the initial bbox coordinates do correlate with or are formed from the features of a boat, can you show an explanation similar to the one you gave for HOG+SVM of how we form the bbox coordinates from the features in the FC layer stack (the transformation), even though they are not accurate?

    • @Cogneethi
      @Cogneethi  4 years ago +1

      @@harishkumaranandan5946
      Sorry, at this point in time, I don't have an easy answer to the 1st question.
      Regarding the 2nd one, I will have to dig deeper into visualization and show some examples as you suggested.
      I have received similar queries from other viewers.
      But to briefly answer your 2nd question:
      Yes, initially the network just spits out random numbers.
      Later on, as the training proceeds and the network sees 1000s of boat images, we use the ground-truth values to force the network to learn the correct bbox coordinates from the feature maps extracted by the CNN.
      This way, the network learns to read the feature maps and guess the correct bbox values.
      I have given some sort of imperfect demo at the end of the 8th chapter. That might help your intuition a bit.
      Meanwhile, I will keep this in mind when I try to expand the course; I will probably include more visualizations for better understanding.

    • @harishkumaranandan5946
      @harishkumaranandan5946 4 years ago

      @@Cogneethi Hi Cogneethi, thanks for getting back. I appreciate it. I will keep in touch.

  • @DrMukeshBangar
    @DrMukeshBangar 2 years ago

    Great 👍🙏

  • @medhavimonish41
    @medhavimonish41 4 years ago +1

    Best explanation, thank you sir.

  • @Life_on_wheeel
    @Life_on_wheeel 3 years ago

    Thanks for crystal clear explanation.

  • @annie157
    @annie157 3 years ago

    How did you find the coordinates (200,250) and (600,400)? I didn't understand; please explain.

  • @studentpointofview9328
    @studentpointofview9328 4 years ago +1

    amazing tutorial!

  • @sathishbabu3867
    @sathishbabu3867 4 years ago +1

    Sir, how to find the distance from the camera to the bounding box?

  • @RnFChannelJr
    @RnFChannelJr 4 years ago +1

    Hello sir, thanks for the explanations. But if possible, can you explain that theory with real source code?

    • @Cogneethi
      @Cogneethi  4 years ago

      Unfortunately, I have not covered the coding part. Maybe at a later date I will add some kind of explanation for the code.
      But in the end, I have covered the code for Faster RCNN.
      ruclips.net/video/09DRku3USAs/видео.html
      drive.google.com/drive/folders/120KC9i3F0WMhqksngS-dWS1iJNP-mXAv?usp=sharing
      github.com/endernewton/tf-faster-rcnn

  • @SouravDas-eg8ok
    @SouravDas-eg8ok 4 years ago

    Please let me know which libraries you have used for coding.

    • @Cogneethi
      @Cogneethi  4 years ago

      @Saurav, these are the libraries that I used in this tutorial:
      HOG: scikit-image.org/docs/dev/auto_examples/features_detection/plot_hog.html
      SVM: scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC
      VGG & Faster RCNN: github.com/endernewton/tf-faster-rcnn

  • @aviseklahiri3864
    @aviseklahiri3864 4 years ago

    Thanks for the awesome tutorial. Just one doubt: does this video assume that a given image has only 1 instance of the object?

    • @Cogneethi
      @Cogneethi  4 years ago

      Yes, that is the assumption in case of 'Localization'. For 'Detection' there will be multiple objects in an image.

  • @chandrabindu4440
    @chandrabindu4440 3 years ago

    It would be extremely fruitful if you explained the code along with this theory.

    • @Cogneethi
      @Cogneethi  3 years ago

      Yes, I think that is one mistake I made, which I realized later. "If" I make some videos in the future, I will definitely include code. :)

  • @durgabhavanitirumala9632
    @durgabhavanitirumala9632 4 years ago +1

    Sir, could you provide me some good reference videos to understand how YOLOv3 works?

  • @Replcate
    @Replcate 1 year ago

    The theory explanation is good, but can you also share the code for this?

  • @khabarsilva6850
    @khabarsilva6850 4 years ago

    Good explanation 👍

  • @anirudhbabu8496
    @anirudhbabu8496 4 years ago

    What if the camera sensor outputs bounding boxes as a 3rd-order polynomial?
    How to decode it?

    • @Cogneethi
      @Cogneethi  4 years ago

      Sorry, I don't know about this.

  • @vcvracarkad
    @vcvracarkad 4 years ago

    Can anyone link to a Keras implementation of the object detector model?

  • @durgabhavanitirumala9632
    @durgabhavanitirumala9632 4 years ago

    You did not cover the YOLO model?

    • @Cogneethi
      @Cogneethi  4 years ago

      Not yet; will do so in a few months' time.

  • @abdussametturker
    @abdussametturker 3 years ago +1

    Thank you

  • @adityarajora7219
    @adityarajora7219 4 years ago

    Explain YOLO... it would be a great help.

    • @Cogneethi
      @Cogneethi  4 years ago

      Yes, in a few months.

  • @sahilmakandar773
    @sahilmakandar773 4 years ago +1

    very good

  • @vikramreddy5631
    @vikramreddy5631 4 years ago

    How do you get the expected values to compare against?

    • @Cogneethi
      @Cogneethi  4 years ago

      It is manually annotated for each image by a person. See this: ruclips.net/video/e4G9H18VYmA/видео.html