I've been wanting a video like this for a long time, you nailed the balance between showing and explaining. I subscribed and liked the video.
Thank you my friend. Glad you got value out of it! 🔥
This is an excellent video explaining how convolutional layers work at a glance! Fantastically well done!
Man, you're amazing! This is a perfect visualization. Thank you, and may God give you health!
Haha, thank you for the kind words my friend :D
Hi there! Pretty cool video :D
One suggestion I would have would be to show how the output looks if we change the input.
For example, at 2:50 and 4:30 a 2x10 and 4x10 grid respectively where the ith column is the layer output for the ith digit.
This is art! Never delete this video bro!
You are fantastic! I look forward to watching your next videos. please produce mooore videos!
Thank you brother. Working on it! 😃🚀
Literally thank you so much for your video. You don’t know how much relief you brought me in understanding it! Your voice is so clear and your explanations were just spot on and easy to understand. I’ve spent like weeks trying to figure out how the pieces go together and you have completed the puzzle of confusion in my mind. Thank you so much !!!
Thank you very much for this explanation. I have been struggling to understand how exactly convolutions work, and you explained it perfectly. Thanks keep up the good work bro.
Excellent video. We needed this, thank you. Keep it up!
One of the best video explaining this topic. What a great job and thank you!
why is the background music soo soothing xD I want to focus not lost in thoughts xD
Haha, next time my friend. I will turn down the music 🚀
WWOWW! I need a video visualizing CNN from layer to layer like that, and I encountered ur channel. The best CNN visualization for me now. Tks u!
This is my favourite channel. I especially like it when you explain everything so nicely. I wish you a lot of success with the channel and happy life.
That’s great to hear. I wish you all the best! 😃
Nice. I remember working on digit recognition using handcoded analysis of pixel runs a long time ago. It never worked properly 😂 And it was computationally intensive.
I gotta say your animation is great and the way you explain it is really good. It feels like 3blue 1brown and just as well explained. Keep the good work!
Thank you my friend 🔥
This was really helpful... Thank you so much for the visualization... Keep up the good work... Looking forward to your future uploads.
Amazing work! Please continue to make more videos on other ML topics. I find your videos are really helpful to understand the concepts.
Amazing visualization! I wish for more short vids like this!
thank you really needed this
Excellent. 3 videos and 1.89k subscribers!!! It's amazing, and you are amazing.
Thank you my friend! 💯
Wonderful explanation. I appreciate it.
Great video! You got me interested!
Nice walkthrough!
I am glad you enjoyed the content!
Great Viz!!!
Great video, Thank you!
Great video! Such a clear and accessible explanation! Thank you, it helps me a lot! :)
Nice video! Thank you!
the best channel❤
Thank you from a Silicon Valley startup.
Hello user-zn1hm6tk1o, do let me know if you're in need of an intern!
Very cool.
Thank you so much!
4:10 How are the outputs of the two convolutions for each filter combined to get 1 filter map / filter?
So, you multiply the values from one filter with the corresponding input layer. Then you add together the outputs + the bias. Then you pass them through an activation function (in this case the sigmoid). After this you move the filter one step to the right (since the stride is one) and do the same thing until everything is covered.
This is hard to explain in text. I will do my best to get out an in depth video by this weekend where everything will be explained in detail! 💯
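In case code is easier to follow than text, here is a rough NumPy sketch of one such step. The numbers are just placeholders I made up, not the actual weights from the video:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy example: a patch of the input the same size as the filter,
# and one filter with one slice per input channel. Placeholder values only.
patch  = np.random.rand(2, 3, 3)   # channels x height x width
kernel = np.random.rand(2, 3, 3)
bias   = 0.1

# Multiply element-wise, sum everything, add the bias, then activate.
output_pixel = sigmoid(np.sum(patch * kernel) + bias)
print(output_pixel)
```

Sliding the filter one step to the right and repeating this gives the next pixel of the output map.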
@@far1din I think I get it. Anyway thanks for making these videos, they help a lot.
@@ratfuk9340 ruclips.net/video/jDe5BAsT2-Y/видео.html
it's really informative for me. Tks bro!!
Thank you brother. I will try to post a more in depth video soon!
Thank you for the source code. this will help me to create some content for my syllabus. with love
Glad it was helpful although not the most ideal code 😂
keep em coming brodie!
For sure! :)
Love your video.
Wow. Such high-quality content with great effort. You deserve a million subscribers. Hope you'll get popular soon and make more videos 🙌
Haha, stay tuned! 🚀🔥
nice work sir
Awesome video🫶
So we use filters to catch the features, and max pooling to capture the most active feature? Thus, if we use max pooling of higher dimensions, we lose the other, less significant features, which reduces the model complexity but results in a loss of information?
That is the idea and it seems to be working in practice. However, people use average-pooling and min-pooling also. It all comes down to what yields the best result in the end! :D
A lot of this is just trial and error until the best performance is achieved.
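If it helps, here is a tiny NumPy sketch of 2 x 2 max pooling versus average pooling on a made-up feature map (placeholder numbers, nothing to do with the video):

```python
import numpy as np

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 5, 7],
    [1, 1, 3, 4],
], dtype=float)

# Split the 4x4 map into 2x2 windows and keep one value per window.
windows = feature_map.reshape(2, 2, 2, 2)
max_pooled = windows.max(axis=(1, 3))   # strongest response per window
avg_pooled = windows.mean(axis=(1, 3))  # average response per window

print(max_pooled)  # [[6. 2.] [2. 7.]]
print(avg_pooled)  # [[3.5 1.25] [1. 4.75]]
```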
thank u!!
Bravo dude!
4:24 kind of wild how, after this layer, there's no resemblance or perceived logic, to our eyes, in how it's characterizing the digit. Can you show maybe 10 examples each of the 10 different digits at this step (just the output)? I wonder if they will look similar in pattern to each other right before the fully connected final layer.
It is very strange how all of this just «works». I will try to do that for one of my next videos! 😊
For the first convolution, you have two filters through which you pass the original image, and you get two images which have been passed through the filters. But for the second convolution, you have 4 filters for each of the two images, so a total of 8 filters for the two images: 4 for one and 4 for the other. But in the visualization, unlike the first convolution, you run both filters through both images in parallel and then combine them. So instead of getting 8 images from the second convolution, we get 4. Why?
Great question my friend. This is one of the reasons why I felt I had to make this animation.
First of all, in the second convolution, we have one “image” consisting of two channels. You can think of an RGB (colored) image. These images consist of three channels that you stack together. See Reference 1 for an image. This basically implies that if you have a 28 x 28 pixel RGB image, when you extract the values for the three channels (Red, Green and Blue) you will have three matrices. The dimensions are therefore actually 28 x 28 x 3, where 3 is the number of channels.
Now, when you build a convolutional neural network, you choose how many filters you want in each layer. If you have X channels in the inputted image, each of the filters will have X channels. Example:
Input = 28 x 28 x 3 and you choose five filters with dimensions 5 x 5; you will technically have five filters that are 5 x 5 x 3. Let’s say you run the convolution. Your outputted activation layer will have the dimensions 24 x 24 x 5, where 5 is the number of channels.
This implies that if you choose to have a convolutional layer next with two filters with the dimensions 7 x 7, these will actually have the dimensions 7 x 7 x 5.
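If it helps to see those shapes in code, here is a minimal sketch using PyTorch purely as an illustration (the filter counts and sizes are the ones from my example above, not the ones used in the video):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)   # one RGB image: 3 channels, 28 x 28 pixels

conv1 = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=5)
a1 = conv1(x)                   # each of the 5 filters is really 5 x 5 x 3
print(a1.shape)                 # torch.Size([1, 5, 24, 24])

conv2 = nn.Conv2d(in_channels=5, out_channels=2, kernel_size=7)
a2 = conv2(a1)                  # each of the 2 filters is really 7 x 7 x 5
print(a2.shape)                 # torch.Size([1, 2, 18, 18])
```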
I hope this made sense; I know it can be a bit confusing. I highly recommend you watch Andrew Ng’s video on convolutions over volumes (Reference 2), as he explains the concept very well.
Reference 1: media.geeksforgeeks.org/wp-content/uploads/Pixel.jpg
Reference 2: ruclips.net/video/KTB_OFoAQcc/видео.html
Great job.. U got a new subscriber 🎉😊
Hi, how do you make the 5*5 kernel? Is it just random?
It is initialized randomly if that is what you were asking! :)
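For a concrete picture, a framework like PyTorch simply fills the kernel with small random numbers when the layer is created (the exact initialization scheme depends on the framework), and training then nudges those numbers:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5)
print(conv.weight.shape)  # torch.Size([1, 1, 5, 5]) -> one 5x5 kernel
print(conv.weight)        # random starting values, later updated by training
```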
I'm curious, why is the first convolution using ReLU and then later convolutions using sigmoid?
Edit: Also, when convolving over the previous convolution-max pooling output, we have two images, how are the convolutions from these two separate images combined? Is it just adding them together?
Hey Brandon!
1. The ReLU and Sigmoid are just serving as examples to showcase the different activation functions. This video is just a «quick» visualization from the longer, in-depth version.
2. Not sure if I understood, but if you're referring to the filters, they are added. I go through the math behind this with visualizations in the in-depth video. I believe it should clarify your doubts! 😄
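Just to make point 1 concrete, both activations take the same summed value and squash it differently. A quick toy comparison (example numbers, nothing to do with the video):

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])   # example pre-activation values

relu_out    = np.maximum(0.0, z)             # negatives clipped to 0
sigmoid_out = 1.0 / (1.0 + np.exp(-z))       # everything squashed into (0, 1)

print(relu_out)     # [0. 0. 0. 1. 3.]
print(sigmoid_out)  # roughly [0.12 0.38 0.5 0.73 0.95]
```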
@@far1din Will be checking that out, thanks!
What do you use for making videos?
Manim!
www.manim.community
Great video, a quick question: do the kernel values (in every conv. layer) change during training of the CNN? Does the kernel change itself so as to only extract exactly the shapes and patterns required for classification?
Convolutional kernels will train their weights such that they “learn” whatever feature detection is most desirable.
I believe that is a “yes” to your question.
1. Do the kernel values change? It depends. Normally they do, but you can also freeze the network and train, for example, only the last 3 layers. This is common for transfer learning: ruclips.net/video/FQM13HkEfBk/видео.html
2. It definitely looks like that is the case if you look at a network post training. Why it extracts certain shapes and patterns is still unknown to me.
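To illustrate point 1, freezing in a framework like PyTorch is usually just a matter of switching off gradients for the layers you do not want to train. A minimal sketch with a hypothetical little CNN (not the one from the video):

```python
import torch.nn as nn

# Hypothetical small CNN for 28 x 28 grayscale digits, just to illustrate freezing.
model = nn.Sequential(
    nn.Conv2d(1, 2, 5), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 24 -> 12
    nn.Conv2d(2, 4, 5), nn.ReLU(), nn.MaxPool2d(2),   # 12 -> 8  -> 4
    nn.Flatten(),
    nn.Linear(4 * 4 * 4, 10),
)

# Freeze the convolutional part: these kernel values will no longer change.
for layer in model[:6]:
    for param in layer.parameters():
        param.requires_grad = False

# Only the final linear layer would now be updated during training.
```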
I didn't get why, in the 1st convolution, after passing the 28*28 image through the 5*5 filter, we ended up with an image of size 24*24. The stride is 1, hence the filter should march perfectly to the end of the image without bypassing certain pixels. I'm probably missing something, but what?
Brother, for example, let’s say you have an image with the size of 7 x 7 and a filter that is 5 x 5. You start at the left, where the left side of the filter touches the left side of the image. You slide it to the end where the filter’s **right** side touches the right side of the image. Since the filter is 5 x 5, you can only slide it two times to the right. Essentially, 5 «squares» of the image are already covered when you start, and there are only two more left.
I highly recommend you draw this out and «scale down» the image to 7 x 7 or something for clarification. I’ll also link a video that shows the process.
Watch the animations in this video for clarification: ruclips.net/video/tQYZaDn_kSg/видео.html
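The general relationship, if it helps, is output size = (input size − filter size) / stride + 1. A quick check in Python:

```python
def conv_output_size(input_size, filter_size, stride=1):
    # Number of positions the filter can take along one dimension (no padding).
    return (input_size - filter_size) // stride + 1

print(conv_output_size(7, 5))    # 3  -> the 7 x 7 toy example above
print(conv_output_size(28, 5))   # 24 -> the 28 x 28 image from the video
```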
@@far1din Bro, I've got it !!! Thank You so much for a great clarification and a link. That helped a lot.
Why are there 2 parts of each filter in layer 2? Is this just for the different color channels?
Yes my friend. If there were four channels that were convolved over, then each filter would have had four channels as well.
The number of channels in the filter will mirror that of the input! 💯
best
🚀
Why use a 2nd convolution with 8 filters?
Hello my friend, the number of filters is typically chosen by you. A lot of deep learning is trial and error. I used 4 filters in the second convolution, and these filters had two channels. Therefore, there are technically four filters. There is no underlying reason for this, as this video was just meant to visualize the process. I could have gone with 10 filters, but the video would have been impossible to render.
How do you create the visualizations?
Manim!
docs.manim.community/en/stable/
If you start from the last layer with a 1 on the 7 and 0 on the rest and then do the reverse process, what would the image look like?
My friend, I do not understand. Could you explain in detail? :)
@@far1din I mean trying to reverse the process from a given output all the way to a generated image and see a visualization of what the network "understands" of a 7, for example. I was seeing some videos of people reversing AI outputs in this way for Deep Dream and GPT, and they found very weird stuff, like, for example, very weird tokens in GPT that cause the model to completely break.
@@figloalds I believe I've seen this tried somewhere and it'll look nothing like a '7'. Basically it can't generate images; you'd just get random, gibberish, static-looking pixels turned on. I also think if you input a smiley face or a letter, unless the model is specifically trained for a "not a number" scenario, you may get an inconclusive output or a falsely confident prediction of one number.
Even though the math of the process checks out, nothing about this whole concept makes any intuitive sense, which is why I find it fascinating tbh lol
@@Leuel48Fan Yes haha. I believe I have seen this done on a dumbbell. It was very noisy, but you could still see the outline. However, I was unable to find it now 🤷🏾♀️
@@figloalds Ahaa, yes I know what you are talking about. This is a great idea. This would be what the computer perceives as the ideal «7» or ideal «1». I will try to make a video on that! 🚀
🧡🖤
Love it! Sub+
Then you verify it’s a 7, and the AI algorithm gets better. Does the machine learn what a 7 is, or does it learn how to get positive feedback from us? If we use advanced AI to ask advanced questions, do we actually know for certain it is not lying to us to get good feedback? Advanced AI like ChatGPT-4, which is rapidly advancing and is already much smarter than 99% of humans, is potentially a lying, manipulating, horrible thing. It’s better to make 100% sure it is truthful and to fully understand its intentions before making it smarter and more powerful, training it with more data, more computers, etc…
Hello, my friend. Here is my take on this topic. First of all, these kinds of AI models are mathematical models that learn to get the "correct" response. This is called supervised learning, where the model is trained on labelled data. It is hard to define whether or not a model is lying, as you have to define lying. Is it lying if the answers are correct? The concerns are valid, as models can converge towards outputting responses that are not correct but perceived as such. This is something researchers and users should be wary about.
On the other hand, human intelligence and deep neural networks seem to take different paths. Even though deep neural networks are inspired by how the brain works and are capable of many of the "intelligent tasks" we throw at them, the learning process is not the same. Modern deep learning models use backpropagation while humans don't (or at least, that is the perception for now). Although we come to the same answer, the consciousness factor might differ. Therefore, it is hard to tell if a model is lying because lying is something I would like to think of as a conscious act.
You may find it beneficial to watch an interview with Geoffrey Hinton, as he discusses some of the concerns you've raised: ruclips.net/video/qpoRO378qRY/видео.html
@@far1din I would argue that humans do use backpropagation. When you make a mistake in life, how many times have you looked back and asked yourself what you would have done differently to minimize the negative effects?