You ease our minds after other professors have complicated them, and we are thankful for that!! 🙏
At 2:05, how does a pooling operator change the number of channels from 192 to 32? Does pooling over channels make sense?
I think a NIN was applied after the pooling to make the number of channels match, but I'm not sure why its height and width are still 28*28 after pooling.
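A minimal NumPy sketch of the pooling branch as described above, assuming a 3x3 max pool with stride 1 and 'same' padding followed by 32 filters of size 1x1 (the function names are hypothetical, and random weights stand in for learned ones; bias and ReLU are omitted). Pooling works on each channel separately, so it keeps all 192 channels and, with 'same' padding, the 28x28 size; the 1x1 convolution is what brings the channels down to 32.

```python
import numpy as np

def max_pool_same(x, f=3):
    """Max pooling with stride 1 and 'same' padding on an (H, W, C) array.

    Each channel is pooled on its own, so the channel count never changes.
    """
    p = (f - 1) // 2  # assumes an odd pool size
    # Pad height and width with -inf so padded cells can never be the max.
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    h, w, _ = x.shape
    out = np.full(x.shape, -np.inf)
    # Take the max over each f x f neighbourhood by comparing shifted copies.
    for i in range(f):
        for j in range(f):
            out = np.maximum(out, xp[i:i + h, j:j + w, :])
    return out

def conv_1x1(x, weights):
    """1x1 convolution: mixes channels at every pixel. weights: (C_in, C_out)."""
    return x @ weights

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 192))
pooled = max_pool_same(x)                               # (28, 28, 192)
out = conv_1x1(pooled, rng.standard_normal((192, 32)))  # (28, 28, 32)
print(pooled.shape, out.shape)
```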
Pooling and CONV are actually similar: the output size of each of them can be calculated by n(l) = floor[(n(l-1) + 2p(l) - f(l)) / s(l)] + 1. Sometimes we want to keep the output the same shape as the input, that is, n(l) = n(l-1). Usually we set s = 1, which simplifies this to n(l-1) = n(l-1) + 2p(l) - f(l) + 1, so we can keep the shape by setting the padding to p(l) = (f(l) - 1)/2. For example, if we have a 5 by 5 pooling filter, our padding should be 2.
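A small Python sketch of the formula in the comment above (the function names are made up for illustration). Integer division `//` plays the role of the floor for positive sizes, and `same_padding` assumes an odd filter size and stride 1, as in the derivation.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output size along one spatial dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def same_padding(f):
    """Padding p = (f - 1) / 2 that keeps n unchanged when s = 1 (odd f only)."""
    if f % 2 == 0:
        raise ValueError("p = (f - 1) / 2 is only a whole number for odd f")
    return (f - 1) // 2

# Columns: filter size, 'same' padding, output with padding, output without padding
for f in (1, 3, 5):
    p = same_padding(f)
    print(f, p, conv_output_size(28, f, p), conv_output_size(28, f))
# 1 0 28 28
# 3 1 28 26
# 5 2 28 24
```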
The size of the filters in pooling would be 28 * 28 * 192, and the number of filters would be 32.
But I don't have any ideas about your second question, sorry.
That's because we are using same padding! And by the way, what is a NIN?
I am also confused about it.
best explanation available online 👍
also free.
How to choose the number of filters? 3:50
Why does the 1x1 use 64, the 3x3 use 128, and so on?
Well, there is no hard rule for that. The rule of thumb is: the more filters, the more features you are extracting. Also keep in mind that not all features are important (some people think that more features always make a better model; this is not true at all), and too many filters can lead to overfitting and computational overhead. Choosing the filter size is totally problem-specific; it is a hyperparameter in itself. Lastly, you could do a hyperparameter search to find the best filter size (that would be insane, because you have multiple layers and each layer has its own filters).
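To make the "computational overhead" point concrete: a Conv2D layer with n filters of size f x f x c_in has (f*f*c_in + 1)*n trainable parameters (the +1 is each filter's bias), so the cost grows linearly with the number of filters. The helper below is a hypothetical illustration; its first three calls reproduce the Param # column of the Keras summary posted later in this thread.

```python
def conv_params(f, c_in, n_filters, bias=True):
    """Trainable parameters of a Conv2D layer with n_filters filters of size f x f x c_in."""
    per_filter = f * f * c_in + (1 if bias else 0)
    return per_filter * n_filters

print(conv_params(3, 192, 32))   # 55328
print(conv_params(3, 32, 64))    # 18496
print(conv_params(3, 64, 128))   # 73856
# Doubling the number of filters doubles the parameter count.
print(conv_params(3, 192, 64))   # 110656
```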
You can change p (the padding) to keep the output at 28x28.
@@akashkewar Hey, could you explain how 28*28 stays 28*28 after the 3*3 filters, and also for the others? I get that for 1*1 it stays 28*28, because (28-1+1) = 28. By the same logic, isn't it (28-3+1) = 26 for 3*3?
@@gourabmukhopadhyay7211 Good point indeed.
And this (28*28 staying 28*28 after the 3*3 filters) is done by setting padding='same'.
So the output shape will be 28 * 28 every time.
Check out the code below and see the result.
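```py
# Stacking Conv2D layers with padding='same' keeps the spatial size at 28x28;
# only the number of channels (the number of filters) changes.
from keras.layers import Conv2D
from keras.models import Sequential

model = Sequential()
# 32 filters of size 3x3x192 on a 28x28x192 input -> 28x28x32
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 192), padding='same'))
# 64 filters of size 3x3x32 -> 28x28x64
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
# 128 filters of size 3x3x64 -> 28x28x128
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
model.summary()
```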
OUTPUT
```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_3 (Conv2D)            (None, 28, 28, 32)        55328
conv2d_4 (Conv2D)            (None, 28, 28, 64)        18496
conv2d_5 (Conv2D)            (None, 28, 28, 128)       73856
=================================================================
Total params: 147,680
Trainable params: 147,680
Non-trainable params: 0
_________________________________________________________________
```
@@RohanPaul-AI Yes, I had also figured out that the padding was 'same'. But thank you anyway for taking the time to comment here, as it helped me confirm it.
7:59 The output of the convolution should be 24*24*32, since 28*28 convolved with a 5*5 filter gives (28-5+1) = 24.
Not necessarily; padding can produce an output with the same dimensions.
@@aangulog But it wasn't mentioned that we are using padding. You are correct, though, that we can get that output using padding.
@@harshniteprasad5301 Maybe it's implied, because you can say the same for the stride.
give this guy a good mic
😅
🤣 lmao
If you're watching on a laptop/PC, try toggling on the stable audio option.
Question regarding the 1x1 conv strategy at 6:00:
I understand that this trick reduces the number of parameters, but what I don't understand is how it is comparable to the original 5x5 conv.
From my understanding, this would create completely different features, because the 5x5 conv no longer uses the original input of the layer but the output of the 1x1 conv. So what's the point?
Update: Ah okay, he mentions this thought at the end of the video. It seems there is no big impact on performance "if you choose the reduction right".
3:12 rap right there :D
Absolutely lucid, as ever. 👏
What does "same" mean? Does it mean having the same height and width as the previous layer?
Exactly. As he mentions, you will need to add padding for that.
Essentially, it pads the input so that, after the filter is applied, the output image has the same width and height as the input.
Andrew is HUGEEE!☺
@8:84, why do we need to multiply by the output 28*28*16 instead of 1*1*192*16?
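A sketch of the usual way to count the cost: each of the 28*28*16 output values is a dot product with a 1*1*192 filter, so the total is (28*28*16) * (1*1*192) multiplications, while 1*1*192*16 on its own is only the number of weights. The helper name is hypothetical; the layer sizes are the ones discussed in the video (a 28x28x192 input, a 16-channel 1x1 bottleneck, and 32 filters of size 5x5 with 'same' padding).

```python
def conv_multiplies(out_h, out_w, out_c, f, in_c):
    """Multiplications for a conv layer: (number of output values) x (cost of each one).

    Each of the out_h * out_w * out_c output values is a dot product of an
    f x f x in_c filter with a patch of the input.
    """
    return (out_h * out_w * out_c) * (f * f * in_c)

direct = conv_multiplies(28, 28, 32, 5, 192)      # 5x5 conv straight on 28x28x192
bottleneck = conv_multiplies(28, 28, 16, 1, 192)  # 1x1 conv down to 16 channels
after = conv_multiplies(28, 28, 32, 5, 16)        # 5x5 conv on the 28x28x16 result
print(direct, bottleneck, after, bottleneck + after)
# 120422400 2408448 10035200 12443648
```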
That was very useful for me, thank you so much
thank you very very much🥲🥲🥲🥲🥲🥲
Why did you use max pooling with same padding? I thought the purpose of max pooling was to reduce the dimensions??
GREAT SIR
Amazing, sir. Thank you!
Really helpful!
Nice explanation. Need to watch it again.
matt damon
What I don't understand is how an input image could have 192 channels. Is there a common type of usage where inputs don't consist only of R, G and B channels?
I think the input he is talking about feeds an inception module that sits somewhat deeper in the inception network. If you look closely at the overall inception model, there are a lot of hidden layers before this module kicks in. So he is talking about a 'general' inception module rather than the overall architecture itself.
If you are familiar with the idea of edge detectors, these 192 channels are used to detect many different features in the image, or in other words, to extract features. I guess you are watching the videos from the middle; I suggest you go through the whole playlist and watch the videos one by one.
Also, don't forget that this 28x28x192 input could be the concatenated output of the previous inception module, which probably sits quite deep in the model; that's why the number of channels is high.
I had the same doubt, but here 192 represents the concatenation of the results from different kernels passed over the image.
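To see where a deep channel dimension comes from, here is a minimal NumPy sketch of the concatenation step in the video's inception module: the four branch outputs (64, 128, 32 and 32 channels, all 28x28) are stacked along the channel axis. Zeros stand in for real activations; the result would be the input of the next module.

```python
import numpy as np

h, w = 28, 28
# Placeholder outputs of the 1x1, 3x3, 5x5 and pooling branches.
branches = [np.zeros((h, w, c)) for c in (64, 128, 32, 32)]
out = np.concatenate(branches, axis=-1)  # stack along the channel axis
print(out.shape)  # (28, 28, 256)
```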
Now I gotta watch Inception again.. 🤔
What I cannot understand is how, after applying a 5x5 or 3x3 filter, we still have a 28x28 output, since as we saw in an earlier lecture we can find it by (nh+2p-f)/s + 1.
The answer is padding
Also, floor((nh+2p-f)/s + 1)
Inception Network Motivation *CORRECTION*
At 3:00, Andrew should have said 28 x 28 x 192 instead of 28 x 28 x 129. The subtitles have been corrected.
Why does he use so many filters? Can anyone explain it to me?
Why is the output dimension still 28*28?
Same Padding. You add the exact amount of padding so that your output dimension is the same as your input
@@justforfun4680 I also think so
I have a question. Let's say at the first convolution layer we apply 32 filters to a grayscale image; then the output of the first layer would be 32 matrices, or say 32 filtered images. Then, if we apply 64 filters at the second layer, does it mean that we are applying 64 different filters to each of the 32 filtered images???? And would the output of the second layer be 64*32 = 2048 filtered images??? Please clarify if anyone can.
You apply filters for feature extraction, not to recreate filtered images.
@@manu1983manoj then what would it be?
@@kirandeepsingh9144 Based on the filters, it will extract features, which will give you scaled-down matrices. The dimensions will depend on the filter dimensions.
The 64 filters must have a lower dimensionality than the 32 activation maps. A simple rule is that when you decrease the dimensionality of a filter, the number of activation maps (outputs) from that filter increases, assuming a constant stride. Basically, to extract more precise features from the input activation maps, you increase the number of filters and reduce their dimensionality.
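For the 32 → 64 question above, a NumPy sketch with assumed sizes (28x28 maps, 3x3 filters, stride 1, no padding; none of these are given in the question). Each of the 64 filters is 3x3x32, i.e. it spans all 32 input maps and sums over them, so the second layer outputs 64 maps, not 64*32 = 2048. The function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, w):
    """Stride-1 'valid' convolution (cross-correlation, as CNN libraries compute it).

    x: (H, W, C_in) activations; w: (f, f, C_in, C_out) filters.
    Every filter covers ALL C_in input channels and is summed over them,
    so each filter yields exactly one output map.
    """
    f = w.shape[0]
    windows = sliding_window_view(x, (f, f), axis=(0, 1))  # (H-f+1, W-f+1, C_in, f, f)
    return np.einsum("hwcij,ijco->hwo", windows, w, optimize=True)

rng = np.random.default_rng(0)
layer1_out = rng.standard_normal((28, 28, 32))  # 32 maps from the first layer
filters = rng.standard_normal((3, 3, 32, 64))   # 64 filters, each 3x3x32
layer2_out = conv2d_valid(layer1_out, filters)
print(layer2_out.shape)  # (26, 26, 64)
```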
Is there a way to reduce 28 * 28 * 16 as far as possible?
Is it possible to reduce it to 28 * 28 * 1?
Yeah, I'm wondering too. Does it hurt the data to reduce the 3rd dimension that low at the bottleneck layer?
It would have been nice if a comparison of the computations required for 1x1 and 3x3 convolutions had been provided.
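A rough comparison, counting multiplications as (number of output values) * (f*f*c_in). The 64-filter 1x1 branch and the 128-filter 3x3 branch on a 28x28x192 input are from the video; the 96-channel 1x1 bottleneck before the 3x3 is an assumed value (it is what GoogLeNet uses for this block), and the helper name is hypothetical.

```python
def conv_multiplies(out_h, out_w, out_c, f, in_c):
    """Multiplications for a stride-1 conv: (output values) x (f * f * in_c) each."""
    return (out_h * out_w * out_c) * (f * f * in_c)

# Both branches read the same 28x28x192 input and keep 28x28 via 'same' padding.
one_by_one = conv_multiplies(28, 28, 64, 1, 192)       # 1x1 branch, 64 filters
three_by_three = conv_multiplies(28, 28, 128, 3, 192)  # 3x3 branch, 128 filters, applied directly

# Assumed variant: a 1x1 bottleneck to 96 channels first, then the 3x3 conv.
with_bottleneck = (conv_multiplies(28, 28, 96, 1, 192)
                   + conv_multiplies(28, 28, 128, 3, 96))

print(one_by_one)       # 9633792
print(three_by_three)   # 173408256
print(with_bottleneck)  # 101154816
```

Per output value, a 3x3 filter costs 9 times as much as a 1x1 filter over the same channels; with twice as many filters here, the direct 3x3 branch ends up 18 times as expensive as the 1x1 branch.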
There are some nasty and offensive commercials that come on while viewing this video; I think Andrew *should* do something about it.
When I work on the Coursera course Artificial intelligence using tensorflow and run assignment number 3, it says the kernel died and will restart automatically.
He keeps skipping most of the topics.
No, he does not skip any topics. These videos are from Coursera and have questions in between, which is why there are cuts in the video.