Found this gem after wasting my time on several 'fancy' deeplearning video tutorials.
"If you can’t explain something in simple terms, you don’t understand it."
- Feynman
can't agree more
Yeah, they all just throw around fancy words like Keras, TensorFlow, blah blah blah.
THANK YOU for ending my 4-day, 9-hour search on understanding the CNN first layer's input data structure/computation.... Moving on to the next step.
Same here👍👍👍
The way Andrew deconstructed the 3D convolution into a simple series of steps just goes to show how great teachers can accelerate learning manifold.
Best explanation I've found about convolutions over multiple channels. Thanks.
He explains this so well that I want to binge the entire playlist.
Blessed are the people who are passionate about neural networks and made it into Stanford to attend a lecture given by this legend.
Such a calm, clear, and graphically well-presented explanation. Thanks.
Finally, someone who can clearly explain the material!
Thank you so much! This video helped me understand CNNs so much!
The most effective way of explaining the depth (number of channels) of a CNN.
Thank you, sir. We are lucky to have great people like you in this life.
By far the best explanation I have ever seen. So simple and crisp!
I had one doubt though, professor: can we use CNNs with data other than images? If so, what does the filter size represent then? And how do we interpret the features of the data in terms of the number of input channels?
Excellent. Convolution over volumes was bugging me for a long time.
thanks for clarifying that the filter is channel deep
Dude, I was really searching for this for 2 days, but there was no clear explanation of volumes. Thanks a lot.
Great! So a conv64 basically applies 64 different filters on segments of the input.
The formula in the summary is wrong; it should be (n x n x c) input, (f x f x c x z) filter, and (n-f+1 x n-f+1 x z) output dimensions, for z output filters and c input channels. So the full set of convolution filters is a 4D tensor.
You just accounted for the fact that there could be more than one filter, and hence the same number of output channels. I think the professor wrote the summary with regard to having one filter. Not necessarily wrong, I guess.
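The dimension rule being debated above can be checked directly. Below is a hedged sketch (not from the video): a naive "valid" multi-filter convolution in NumPy, with made-up random inputs, just to confirm that an (n, n, c) input and (f, f, c, z) filter bank give an (n-f+1, n-f+1, z) output.

```python
import numpy as np

def conv_volumes(image, filters):
    """Naive 'valid' convolution over volumes: (n,n,c) * (f,f,c,z) -> (n-f+1, n-f+1, z)."""
    n, _, c = image.shape
    f, _, c_f, z = filters.shape
    assert c == c_f, "filter depth must match the number of input channels"
    out = np.zeros((n - f + 1, n - f + 1, z))
    for k in range(z):                      # one 2D output slice per filter
        for i in range(n - f + 1):
            for j in range(n - f + 1):
                patch = image[i:i+f, j:j+f, :]            # (f, f, c) volume
                out[i, j, k] = np.sum(patch * filters[..., k])
    return out

image = np.random.rand(6, 6, 3)             # 6x6 RGB image, as in the lecture
filters = np.random.rand(3, 3, 3, 2)        # z = 2 filters, each 3x3x3
print(conv_volumes(image, filters).shape)   # (4, 4, 2)
```

With z = 1 this reduces to the single-filter 4x4 case from the summary, which is consistent with both readings above.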
First nine numbers from the red channel, then the 3x3 beneath from the green channel, then the 3x3 beneath from the blue channel? I didn't understand that. Aren't we taking a 3x3 from each color channel?
WOW, I finally get it, thanks.
Thank you, sir! You are the real hero.
The one and only Jeetu bhaiya of deep learning :)
Can we use different filter sizes in the multiple-filter case? And what will the output shape be then?
Did you get an answer?
So, in every pixel of the 4x4 convolved matrix, you put the sum of the products of the kernel values with the respective 3x3 patch of the input image, for every channel (RGB)? Meaning you sum the outputs of the dot products (kernel with the respective pixels of the image) over every channel into the number for one pixel of the convolved matrix?
Great video. Thank you!
Very clear explanation, thank you.
OK, so at first I was a little confused by what adding all the filters at the end means. Say the pixel at position (0,0) for R, G, B is 20, 10, 30; after applying the filter, adding all the channels means [20, 10, 30] and not [60]. Correct me if I am wrong.
Thank you so much sir
Thanks for the video. I have a question: why does convolving 6x6x3 * 3x3x3 = 4x4 (which is 2D)? We convolved 3D objects, so shouldn't the output be 3D?
You are my hero. Thank you so much
@3:11: So do we add the 3 convolutions to get the value of the 4x4 feature map?
Yes. Instead of thinking of it as 3x 2D convolutions added together, try thinking of it as 1x 3D convolution. It's still an element-wise product and sum of the cube of filters (or kernel) and a 3D portion of the stack of images.
Why do we add the 3 convolutions? Why not take their average value?
@@robbellis5944Why do we add the 3 convolutions instead of taking their average value?
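The "one 3D convolution" view in this thread can be made concrete with a tiny hedged sketch (not from the video; the numbers are made up): one output pixel is just the element-wise product of the 3x3x3 filter cube with a 3x3x3 patch of the image, collapsed into a single sum.

```python
import numpy as np

# One 3x3x3 image patch and one 3x3x3 filter (kernel of all ones here,
# purely for illustration).
patch  = np.arange(27).reshape(3, 3, 3)   # values 0..26
kernel = np.ones((3, 3, 3))

# Element-wise product, then one sum over all 27 products -> a single scalar.
value = np.sum(patch * kernel)
print(value)   # 351.0, i.e. 0 + 1 + ... + 26
```

As for summing versus averaging: dividing by 3 would just scale every output by a constant, and the learned filter weights can absorb any such constant, so the convention is a plain sum.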
5:51, I don't understand why the result is not 4x4x3, but 4x4. So where are the 3 layers?
It seems like each layer's dot-product result is added up into a single number. That means you have 3x3x3 = 27 multiply operations whose results are summed.
3:21 answered my question: add up all those numbers.
Why is the RGB convolution output not a 4x4x***3*** image?
The filter is applied to all 3 layers at once in a step, to get a single output. Simple
Since after applying the 3-channel filter across the 3 channels of the image, we get a single output, so **1** is the 3rd dimension of the output!
@@Vishnupratap You know it's not possible to apply that filter to all 3 layers at once programmatically; it must be done iteratively. But I think what Andrew did not say is that when you apply the filter to each layer you get a single value: the summation of the 3 per-layer outcomes goes into the 4x4 matrix. That is why you don't get 4x4x3 but 4x4x1.
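The summation described in this thread can be sketched in a few lines. This is a hedged illustration (not from the video) with random made-up inputs: the three per-channel products are summed into one number per output position, which is exactly why the channel dimension collapses.

```python
import numpy as np

rng = np.random.default_rng(0)
image  = rng.random((6, 6, 3))   # 6x6 RGB image
kernel = rng.random((3, 3, 3))   # one channel-deep 3x3x3 filter

out = np.zeros((4, 4))           # 6 - 3 + 1 = 4 positions per axis
for i in range(4):
    for j in range(4):
        # Convolve each channel separately, then sum the three results
        # into ONE value; nothing is stacked per channel.
        per_channel = [np.sum(image[i:i+3, j:j+3, ch] * kernel[:, :, ch])
                       for ch in range(3)]
        out[i, j] = sum(per_channel)

print(out.shape)   # (4, 4) -- the channel dimension has collapsed
```

Looping per channel, as above, gives the same numbers as treating the filter as one 3x3x3 cube, since the grand sum over 27 products can be grouped channel by channel.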
Thank you , very well explained :)
It's funny how concepts like this can be so confusing when you don't know them. I had no idea the conv layers had an extra non-configurable dimension, and going from 3D to 2D confused me.
beautiful
Why is the output 2-dimensional? If you convolve a 2D image with a 2D filter, you get a 2D output. Wouldn't this mean that if you convolve a 3D image (R, G, B) with a 3D filter, the output should be 3-dimensional as well?
Edit:
I think I get it now. It's because the size of the 3rd dimension is the same for both the filter and the RGB image, so it only has to convolve over the z-axis once, producing a 3rd-dimension size of 1 in the output. So technically the output is 3-dimensional; it's just that the 3rd dimension has a size of 1, which is basically 2D.
If you convolved an RGB image with a 2x2x2 filter, then the output would be 3-dimensional.
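The reasoning in the edit above follows from the "valid" convolution size rule applied to the depth axis as well. A hedged sketch (the 2x2x2 filter is a hypothetical from the comment, not something the lecture uses):

```python
# "Valid" convolution output size along any axis: input_size - filter_size + 1.
# Applied to the depth (z) axis, it explains why a full-depth filter
# produces an output depth of 1.
def out_depth(channels, filter_depth):
    return channels - filter_depth + 1

print(out_depth(3, 3))   # 1: a full-depth 3x3x3 filter on RGB -> effectively 2D output
print(out_depth(3, 2))   # 2: a hypothetical 2x2x2 filter would slide once in z -> 3D output
```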
brilliant!
Awesome. I have 4 questions that have had me scratching my head for the last 2 weeks. In my conv layer 1, I specified 32 filters; does that mean 32 different features will be extracted from each image sequentially? I am using grayscale images of 28x28x1. Is it possible to make the filters apply in parallel? Next, in the case of multiple filters, are the filters applied to the image in parallel or sequentially? How do I influence the conv layer to use multiple filters? My last question is: how do I override the default filter with a custom filter type?
@MattAufF5 Thanks a lot. But I still have one nagging question... let's say 32 filters (feature detectors) are applied to a single image; won't that cause any contention?
@MattAufF5 awesome, thanks a ton
How does a 3x3 convolution give a 4x4 output?
6:02, I was expecting the output to be 4x4x3. Why was it just 4x4?
I think it is because he is using the 3 filters as a cube. Thus, after the multiplication, you should sum everything. For the output to be 4x4x3, I think it would be necessary to have 3 filters, one for each channel.
nice explanation
Thank you!!!
Thanks a lot!!
Are the filter values trainable?
That is the whole point.
Why are you stacking the features on top of each other? I don't get it!
Normally, don't we just SUM UP the features so we have only one layer of features (e.g. horizontal + vertical edges)?
That really confused me as well. I had to step back and understand how a computer reads an image. A computer reads an image as, for example, a 6x6x3 volume. Breaking it down, you have a 6x6 matrix for the red color, 6x6 for green, and 6x6 for blue. They refer to the colors as 'depth' or 'channels'. With that being said, when you convolve the filters with the input image, you have to apply them to all 3 'channels' (colors). That's why one filter is, again as an example, 3x3x3. Watch just the introduction part of this video: ruclips.net/video/umGJ30-15_A/видео.html
@@ericksonramos4622 So what does adding all three filter outputs at the end mean? Does the RGB [23, 45, 23] convert to the single value 51??
@@amitnair92 I don't quite follow what you said; elaborate more. You don't add the filter data together. You slide, or convolve, them with the input image.
Thank you so much!
Are these 3D convolutions ?
Is it possible to have an input of 1x1x155 and a filter of 1x1x155 for pixel classification?
Kavita Bhosale I might be wrong, but I think that won't be of much use. Such a network will just learn to match the input with the training images; it won't be able to extract lower-level features such as edges. It would probably show impressive performance on the training set but would not generalise well. Hoping for feedback from specialists on the topic.
A 1px x 1px input is much, much too tiny; it would not generalize.
The output of the RGB channels after convolution must be 4x4x3, right?
Is it possible that the number of filter channels greater than the number of input channels?
Hello, please, is it possible to use 256x256x3 images with the LeNet architecture?
yes it is
You need to explain the content.
a god
I am a beginner in the field of deep learning. Is there anyone who can help me with my project?