C4W2L05 Network In Network

  • Published: 29 Dec 2024

Comments • 34

  • @nikilragav
    @nikilragav 5 months ago

    If I didn't already know what was going on, I'd be supremely confused by this explanation 3:31...
    The channel @AnimatedAI has a great explanation on 1x1 convolutions.
    The way I think about it is you've made a previous layer with a bunch of filters. So maybe you have one filter detecting vertical edges, another doing horizontal edges, another doing angled edges, another detecting red to black transitions, another for yellow colors, etc etc. Now you've got a stack of those images, and you want to go thru each pixel and combine the results of those filters with some weights. So if you want to get complete edge detections, you might add the horizontal edge channel + vertical edge channel + diagonal edge channel (and not include the yellow channel results or the red to black channel). That's what the 1x1 convolution is doing. Mixing the results of the various filters.
    Maybe you have a second 1x1 filter channel that is trying to isolate yellow objects next to red objects (like mustard bottles next to ketchup bottles, idk). Then the second filter channel might heavily weight the yellow channel pixels and the red-black channel pixels but ignore the other channels.
    You inherently need mixing like this if you want to eventually get to "detect a dog's face".
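
The channel-mixing intuition above can be sketched in a few lines of NumPy (a hypothetical toy example with made-up channel names, not from the video): a 1x1 convolution is just a weighted sum across the input channels at every pixel, with no spatial neighborhood involved.

```python
import numpy as np

# Toy feature maps: height x width x channels, e.g. 4 channels from
# hypothetical vertical-edge, horizontal-edge, diagonal-edge, and
# yellow-detector filters in the previous layer.
h, w, c_in = 6, 6, 4
feature_maps = np.random.rand(h, w, c_in)

# One 1x1 filter = one weight per input channel. This filter mixes the
# three edge channels equally and ignores the yellow channel.
edge_mixer = np.array([1.0, 1.0, 1.0, 0.0])

# Apply the 1x1 convolution: at every pixel, take the weighted sum of
# the channel values.
combined_edges = feature_maps @ edge_mixer  # shape: (6, 6)

# Check one pixel by hand: it is just a dot product across channels.
assert np.isclose(combined_edges[0, 0], feature_maps[0, 0] @ edge_mixer)
```

A second 1x1 filter (say, one weighting the yellow and red-black channels) would produce a second output channel the same way, so K such filters yield a 6x6xK output.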

  • @lvdiful
    @lvdiful 2 months ago

    For the example at 6:12, with the 32 1x1 filters: even though the channel count drops to 32, wouldn't each output channel be almost identical, differing only by the values of its 1x1 filter? Is this correct? What is a typical use case for this example?

  • @lennonli9100
    @lennonli9100 4 years ago +5

    Isn't it just a regular filter with 1x1 dimensions that isn't used for edge detection, but instead changes the channel dimension or adds non-linearity?

  • @JagtarSingh-pv9mn
    @JagtarSingh-pv9mn 5 years ago +3

    Is there information sharing happening across the channels in this case?

  • @siddhantvats9088
    @siddhantvats9088 3 years ago +2

    Curious: if we use filters of some other spatial size but fewer of them, won't that also reduce the resulting number of channels?

  • @nikhilrana8800
    @nikhilrana8800 5 years ago +2

    When we multiply a 1*1*32 filter with the 6*6*32 input, we get a number for each of the 32 channels; then we take the sum and apply the ReLU function to it. Am I right?

    • @sammathew243
      @sammathew243 5 years ago +1

      Yes

    • @shaelanderchauhan1963
      @shaelanderchauhan1963 3 years ago +1

      @@sammathew243 and ReLU outputs the number itself if it is greater than zero, and 0 if the number is less than or equal to 0, right?
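
The multiply-sum-ReLU computation discussed in this thread can be checked numerically (a minimal sketch with made-up values): one 1x1x32 filter at one pixel position is 32 multiplications, one sum, then ReLU.

```python
import numpy as np

rng = np.random.default_rng(0)
input_volume = rng.standard_normal((6, 6, 32))  # 6x6 image, 32 channels
filt = rng.standard_normal(32)                  # one 1x1x32 filter

# At a single pixel (i, j): elementwise multiply across the 32 channels,
# sum the 32 products, then apply ReLU to the scalar result.
i, j = 2, 3
pre_activation = np.sum(input_volume[i, j, :] * filt)  # scalar
output = max(float(pre_activation), 0.0)               # ReLU
```

Repeating this at all 36 pixel positions gives one 6x6 output channel per filter.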

  • @btobin86
    @btobin86 5 years ago +8

    I'm not quite sure what he means when he says the output is the # of filters. Isn't the output of one of those 1 x 1 x 32 (in this case) filters just a single real number?

    • @chaitragn3379
      @chaitragn3379 5 years ago

      Maybe it's a channel (R, G, B), and the number of filters is different.

    • @anasfirdousi
      @anasfirdousi 5 years ago +2

      The output after applying filters = ( n - f + 1 ) x ( n - f + 1 ) x # of filters
      n = input dimension, which is 6 x 6, so n = 6
      f = filter dimension, which is 1 x 1, so f = 1
      # of filters = 32
      So the final output after applying all filters will be:
      ( 6 - 1 + 1 ) x ( 6 - 1 + 1 ) x 32 = 6 x 6 x 32
      The formula n - f + 1 holds when stride = 1; watch: ruclips.net/video/smHa2442Ah4/видео.html
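
The same shape arithmetic can be verified with a quick sketch (hypothetical shapes matching the comment above):

```python
import numpy as np

n, f, num_filters, stride = 6, 1, 32, 1
input_volume = np.zeros((n, n, 192))          # e.g. 6x6 input with 192 channels
filters = np.zeros((num_filters, f, f, 192))  # 32 filters, each 1x1x192

# Output spatial size: (n - f) / stride + 1, i.e. n - f + 1 when stride = 1.
out_spatial = (n - f) // stride + 1
output_shape = (out_spatial, out_spatial, num_filters)
```

With f = 1 and stride = 1 the spatial size is unchanged; only the channel count becomes the number of filters.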

    • @NikhilAngadBakshi
      @NikhilAngadBakshi 5 years ago +2

      Yes, the output of each filter at each location is a single real number. Therefore, for one 1x1 filter, the output over all locations in the input volume is an image with depth = 1. Usually we have multiple filters, and hence the output depth equals the number of filters.

    • @sandyz1000
      @sandyz1000 5 years ago +3

      ( 1 x 1 x 32 ) is the volume of one filter, and there are n such ( 1 x 1 x 32 ) filters, where n = number of filters. Each ( 1 x 1 x 32 ) filter gives a scalar value at each pixel position across the 32 input channels, and n is the number of channels in the output.

    • @טללהט-ו7ו
      @טללהט-ו7ו 2 years ago +2

      This is old, but I will answer for future viewers: in my understanding, the # of filters IS NOT 32. The number of filters is the number of different 1x1x32 filters you apply, so if you apply Z different 1x1x32 filters, you get 6x6xZ here.

  • @MeAndCola
    @MeAndCola 5 years ago +1

    Is this also the case for normal-sized filters? Filters aren't applied in 2D, per channel, but rather in 3D, across all the channels?

    • @XxXMrGuiTarMasTerXxX
      @XxXMrGuiTarMasTerXxX 4 years ago

      As far as I understood, yeah. But I think sometimes (especially at the beginning of the network) the filter is shared across the three RGB channels. That is, instead of, for example, a 3x3x3 filter, you have only a 3x3x1 filter and the parameters are shared. However, this is a trick, and the filters are still applied in 3D.

    • @MrAmgadHasan
      @MrAmgadHasan 1 year ago

      Yes. Every filter in a CNN has 3 dimensions (height, width, depth), with depth equal to the depth of the input feature maps.
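
To make the 3D nature concrete, here is a minimal hand-rolled convolution sketch (toy shapes, not any particular library's API): each filter spans the full input depth, and each filter produces exactly one output channel.

```python
import numpy as np

def conv2d_valid(x, filters):
    """Naive 'valid' convolution. x: (H, W, C_in); filters: (K, f, f, C_in)."""
    H, W, C_in = x.shape
    K, f, _, fc = filters.shape
    assert fc == C_in  # every filter spans the FULL input depth
    out = np.zeros((H - f + 1, W - f + 1, K))
    for k in range(K):
        for i in range(H - f + 1):
            for j in range(W - f + 1):
                # 3D dot product: spatial patch times all channels at once
                out[i, j, k] = np.sum(x[i:i+f, j:j+f, :] * filters[k])
    return out

x = np.random.rand(6, 6, 3)           # RGB-like input
filters = np.random.rand(4, 3, 3, 3)  # four 3x3 filters, each depth 3
y = conv2d_valid(x, filters)          # output shape: (4, 4, 4)
```

A 1x1 convolution is just the special case f = 1: the spatial patch shrinks to a single pixel, and only the dot product across channels remains.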

  • @gauravfotedar
    @gauravfotedar 4 months ago

    I don't see the point of what he said about 1x1 convolutions reducing channels. For any convolution filter, 3x3, 5x5, or any size, the number of output channels is always determined by the number of filters, not the filter size. So if you have 192 input channels and use 32 3x3 filters, that also reduces the channel dimension to 32, just like using 32 1x1 filters. So why decouple reducing height/width from reducing channels? Filters of any size do both at the same time anyway.

    • @gauravfotedar
      @gauravfotedar 4 months ago

      Okay, one use case is explained in the next Inception motivation video.
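
One concrete reason to decouple them is cost. A back-of-the-envelope comparison (using the 192-to-32-channel numbers from the comment above, and a hypothetical 28x28 spatial size) shows the 1x1 reduction needs 9x fewer parameters and multiplications than a 3x3 reduction to the same channel count:

```python
c_in, c_out = 192, 32
h, w = 28, 28  # hypothetical spatial size, only for the multiplication count

params_1x1 = 1 * 1 * c_in * c_out  # 6,144 weights
params_3x3 = 3 * 3 * c_in * c_out  # 55,296 weights

# Multiplications, assuming 'same' padding and stride 1 so both
# produce an h x w x c_out output.
mults_1x1 = h * w * params_1x1
mults_3x3 = h * w * params_3x3
```

This is why architectures like Inception use a cheap 1x1 layer to shrink the channel count before the expensive larger filters.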

  • @subhamjha8917
    @subhamjha8917 1 year ago

    In what situations is it useful? Can you please provide a case study/example?

    • @whitesaladchips
      @whitesaladchips 8 months ago

      To reduce the number of channels.

    • @deeplearningexplained
      @deeplearningexplained 6 months ago

      1. GoogLeNet Inception network
      2. ResNet when they get to more than 50 layers

  • @urarakono442
    @urarakono442 2 years ago

    Does the yellow block of size 1*1*32 have the same number across all 32 voxels?

    • @MrAmgadHasan
      @MrAmgadHasan 1 year ago

      No. It can have 32 different weights.

  • @katyhessni
    @katyhessni 1 year ago

    Thanks

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    good explanation

  • @MabrookAlas
    @MabrookAlas 1 year ago

    Great 👍🏼

  • @sadenb
    @sadenb 5 years ago

    Can a Siamese network be built on 1x1 convolutions if we have precomputed 1-D features?

    • @sandyz1000
      @sandyz1000 5 years ago +2

      You can use an Inception network, which uses 1x1 convolutions, to compute the embedding for the Siamese network. Siamese, meaning "same" or "similar", refers to the twin branches; in the final layer, contrastive loss / triplet loss is used to optimize the network so that similar vectors end up at a smaller distance than dissimilar vectors, with a certain margin.

    • @som6553
      @som6553 8 months ago

      @@sandyz1000 When should one use it? And when not?

  • @computerlifesupport
    @computerlifesupport 2 years ago

    thaanks (⌐■_■)

  • @EranM
    @EranM 6 years ago

    shrink!

  • @trexmidnite
    @trexmidnite 3 years ago

    Scary movie