Inside a Neural Network - Computerphile

  • Published: 6 Jan 2025

Comments • 312

  • @pw7225
    @pw7225 8 years ago +355

    Dr Pound is the best lecturer here. Very clear, intelligently funny, interesting topics.
    Would deserve his own channel

  • @theREAL9er
    @theREAL9er 8 years ago +186

    The pictures he printed of the layers helped me grasp the concept so much better than other videos, so thank you

  • @aplcc323
    @aplcc323 7 years ago +4

    Computerphile, you single-handedly helped me regain my interest in computer science.
    Thank you very much for all your videos (:

  • @mohammaddawas481
    @mohammaddawas481 6 years ago +20

    This is the best explanation of what is going on inside a neural net! Now I can imagine it more clearly.
    Thanks a lot!

  • @ProphecySam
    @ProphecySam 8 years ago

    I've been studying neural networks for the last couple of months and haven't come across any resource that explains it this perfectly. You have made it so easy with the visualization.
    I'd really appreciate more videos on topics like RNNs and how to set the number of layers, filters, etc. (hyperparameters).

  • @rhoneletobe
    @rhoneletobe 8 years ago +1

    So useful. As a CS student, this was more helpful than a ton of other DLNN stuff I've seen online. Thank you!

  • @oliviamay
    @oliviamay 8 years ago +16

    Loving these videos with Dr. Pound, keep it up!

  • @SomethingUnreal
    @SomethingUnreal 8 years ago +61

    It'd be really interesting to take a network trained to detect random objects as seen by a camera, then give it the live feed from a camera and watch the activation of each neuron in real time as the object moves about in the camera's view, or as the camera rotates around the object, etc. I guess the earlier layers would change a lot, while the deeper layers (which have a better idea of what constitutes an object) would change less.

    • @Vulcapyro
      @Vulcapyro 8 years ago +5

      Projects like this have been done, but only in the sense that they usually just output the most probable class(es) because that's usually the only real way to deal with the amount of information. For modern networks you should be able to visualize activations of a single layer in real-time, but the number of pixels you'd need for a given layer can range from thousands to millions. So doable, but probably not easy to visually parse just by looking at it.

    • @smileyball
      @smileyball 8 years ago +1

      Consider looking into visualization approaches (like saliency/heat maps/deconvolutional neural network) and approaches that focus on maximal activation (like Google's DeepDream)

    • @MadJDMTurboBoost
      @MadJDMTurboBoost 7 years ago +1

      SomethingUnreal I'd imagine that if it was programmed properly and trained long enough, it might look similar to an fMRI.
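A minimal sketch of how one layer's activations can be laid out as a single picture, in the spirit of the printed layer images in the video and of @Vulcapyro's point about pixel counts. It assumes each feature map is a plain list of rows and rescales every map independently to 0-255; the function name `tile_feature_maps` and the toy input are hypothetical, not the video's code. At 24x24x20 (the first-layer output size quoted elsewhere in this thread), one frame already holds 11,520 activations.

```python
import math

def tile_feature_maps(maps, pad=1):
    """Arrange equally sized 2D feature maps (lists of rows) into one mosaic.

    Each map is rescaled independently to 0..255 so weak and strong maps are
    both visible; padding pixels between tiles are left at 0.
    (Hypothetical helper for illustration.)
    """
    n = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    cols = math.ceil(math.sqrt(n))          # roughly square grid of tiles
    rows = math.ceil(n / cols)
    height = rows * h + (rows - 1) * pad
    width = cols * w + (cols - 1) * pad
    mosaic = [[0] * width for _ in range(height)]
    for k, fmap in enumerate(maps):
        lo = min(min(row) for row in fmap)
        hi = max(max(row) for row in fmap)
        scale = 255.0 / (hi - lo) if hi > lo else 0.0   # a flat map becomes all 0
        top = (k // cols) * (h + pad)
        left = (k % cols) * (w + pad)
        for i in range(h):
            for j in range(w):
                mosaic[top + i][left + j] = int(round((fmap[i][j] - lo) * scale))
    return mosaic

# Usage: 20 fake 24x24 activation maps, the size of the first layer's output.
maps = [[[(i * j + k) % 7 for j in range(24)] for i in range(24)] for k in range(20)]
mosaic = tile_feature_maps(maps)
print(len(mosaic), "x", len(mosaic[0]))   # 99 x 124
```

Recomputing the mosaic for every camera frame would give the kind of live view described in the comment above; the hard part, as @Vulcapyro notes, is reading it.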

  • @Aarrmehearties
    @Aarrmehearties 3 years ago +6

    Massively interesting and well presented, even for my aging neural network!

  • @astropgn
    @astropgn 8 years ago +19

    In the last video I asked what the images looked like in these various convolutions. I knew that they wouldn't look anything like the input image, but I was very curious to see the process anyway.
    And now you make a video answering exactly what I wanted! Thank you so much! :)

  • @tho207
    @tho207 8 years ago +5

    what a fantastic explanation, I loved the digits convolution representation
    hope to see more videos about this!
    (RNNs)

  • @talhatariqyuluqatdis
    @talhatariqyuluqatdis 8 years ago

    This guy is my second favourite on computerphile. Lovin these demos

  • @aungthuhein007
    @aungthuhein007 7 years ago +33

    Would love to have someone like him as my professor in my life!

  • @Kitsudote
    @Kitsudote 2 years ago

    Oh wow, this video made me understand neural networks in an insanely deep way. Thank you!

  • @dolibert
    @dolibert 6 years ago

    Mike and Rob, the stars of computerphile.
    Great content and nice puns.
    Keep it up guys

  • @Pfaeff
    @Pfaeff 8 years ago +71

    Doesn't google use those captchas as a crowd sourced labeling technique for their own deep learning stuff?

    • @emosp0ngebob
      @emosp0ngebob 8 years ago +8

      There's a Computerphile video on that too somewhere, again a Google project. You get shown two words, and the computer knows one of them and not the other, so when you type both words in, the computer learns what the unknown word is. That's for transcribing libraries and things... I can't remember which Computerphile video it was, though.

    • @zubirhusein
      @zubirhusein 8 years ago +1

      They do

    • @naturegirl1999
      @naturegirl1999 4 years ago +1

      I wonder if there’s a website that someone can go to to do the image things to help train the deep learning systems?

  • @black_platypus
    @black_platypus 8 years ago

    Now I see it, too... For some reason, YT gave me the video at a lower resolution (not watching it in full screen mode, I hadn't noticed), and I was thinking "I don't understand all the people complaining about the video being "wobbly", the video looks fine to me"... Then I saw I wasn't watching it at 50 fps, so I changed the quality.
    I, too, find it a bit weird. I guess stabilization doesn't quite kick in as hard when there are more frames to interpolate between?

  • @cazino4
    @cazino4 4 years ago

    Excellent video! Visually seeing the neurons light up blew me away... It was like looking at an artificial, scaled-down brain being imaged...

  • @Tggfredcsawsdfgbhhhhu
    @Tggfredcsawsdfgbhhhhu 4 years ago

    wow....
    I didn't expect to understand any of that, but it was all explained perfectly. It made sense. Awesome video

  • @CarterColeisInfamous
    @CarterColeisInfamous 8 years ago

    13:16 What he wants to say is that if the images are segmented, then it's much easier. The segmentation problem is hard; that's why Google CAPTCHAs are all mushed up on each other. Google apparently fixed the segmentation problem by just training it to recognize multiple pairs of letters.

  • @scientistgeospatial
    @scientistgeospatial 5 years ago +1

    The best tutorial ever! Cheers, Mike!

  • @dino130395
    @dino130395 8 years ago

    Seeing that a lot of people are confused by this video being 50 fps, I'd like to clear that up. 50 fps is a standard frame rate for television and video in PAL regions such as the UK and most of Europe. 60 fps is a common standard for computer-generated images, like animations or games. Sure, you can use either rate for both, but high-frame-rate TV broadcasts in those regions are generally 50 fps, not 60.
    The scale for (PAL) TV: 25, 50, 100, 200 Hz
    The scale for computers: 30, 60, 120 Hz
    (Hz = fps)

  • @crazygood150
    @crazygood150 8 years ago +4

    who needs dual monitors when you have dual PC! Great video btw

  • @Wurzelbrumpft1
    @Wurzelbrumpft1 8 years ago +1

    love this series about machine learning

  • @mimArmand
    @mimArmand 6 years ago

    Very cool!
    At 5:24: grayscale is quite a few bits deep; 1-bit depth would be pure black and white, which is not the case in your images. It looks like you have at least 16 grey levels, if not standard 8-bit grayscale (256 levels).

  • @CarterColeisInfamous
    @CarterColeisInfamous 8 years ago

    14:36 Google's CAPTCHA API morphs the filter the more samples you get, until the training data is useless. Those image-based CAPTCHAs are also being broken after all the success of ImageNet.

  • @zakeryclarke2482
    @zakeryclarke2482 7 years ago +3

    Please do a video on the maths of forward and back propagation and how they are implemented

  • @davidm.johnston8994
    @davidm.johnston8994 7 years ago

    Thank you Mike, and thank you Shaun, this video is really helping me in my quest! I'm making a small game in which I'm trying to make an AI using the TensorFlow library.

  • @davidscarlett5097
    @davidscarlett5097 8 years ago +1

    How are the outputs of the multiple kernels at each layer managed? Are they somehow merged so that the kernels of the next layer all process the same input? Or do the 20 kernels of layer 2 operate on the 20 outputs of the layer 1 kernels respectively? And if the latter, then what happens when moving from a 20 kernel layer to a 50 kernel layer? Would some of the 20 kernels of the previous layer be duplicated twice, and others duplicated three times to make up the inputs to the 50 kernels in the new layer?

  • @shiphorns
    @shiphorns 8 years ago

    After watching this, the one thing I don't feel is completely explained is where the convolution kernel values come from. At first he says they are things "like Sobel edge detectors", but later says they are not manually entered but rather learned values. That leaves the obvious question of how they are initialized. Do they start as just matrices with random entries? During training, how are they adjusted? Is "training" some kind of iterative search for kernel values that give the strongest response (e.g. the values that most consistently and uniquely identify the one digit being learned and most strongly reject the other 9 digits)? I could use a bit more explanation of what the training process looks like and how it adjusts all the kernels.

  • @probE466
    @probE466 8 years ago +5

    Could you maybe link to the actual code? Would be interesting to look at the implementation

  • @displayoff
    @displayoff 8 years ago +34

    I love this man.

  • @SinanAkkoyun
    @SinanAkkoyun 5 years ago +2

    Dr Mike Pound, please make a tutorial series on Q-learning! In depth

  • @beautifulsmall
    @beautifulsmall 3 years ago

    I wrote Python convolution algorithms on bitmaps around this time just to teach myself Python; filters and convolutions are amazing to see in action. It's a little scary to see how far we've come now in 2021. Covid hasn't stopped SW engineers.

  • @harborned
    @harborned 8 years ago

    Fantastic video! Interesting to see "inside the mind" of a neural network

  • @jambalaya201
    @jambalaya201 4 years ago +1

    How do you even visualize the output of the NN?
    Crazy, this perspective is so insightful.

  • @alwinpriven2400
    @alwinpriven2400 7 years ago +1

    3:43 but wouldn't it mean that the digit's 2? because we're starting at index 0, and index 0 is 0, so index 2 is 2.

    • @bentaye
      @bentaye 7 years ago +1

      Index 0 is digit 1
      Index 1 is digit 2
      ..
      Index 8 is digit 9
      Index 9 is digit 0

    • @alwinpriven2400
      @alwinpriven2400 7 years ago

      oh ok.

  • @JohnHollowell
    @JohnHollowell 8 years ago +2

    I wonder, can you work backwards somewhat to get a general idea of what the original image looked like from the convolution layers?

    • @RoySchl
      @RoySchl 8 years ago +1

      I don't think so. That would be like asking "what 5 numbers did I multiply to get 3600?" There is only one possibility if you do the multiplication, but many possibilities when you try to guess backwards, and with those convolutions it's the same thing, just exponentially worse.
      Basically, you drop a lot of information.

    • @DagarCoH
      @DagarCoH 8 years ago

      Well, I'd say yes, as if you have all the information every layer puts out, you just have to reverse the process the first layer did on the data it gave. Since you have many processes on the one image, there should be much redundancy and therefore a high certainty. If you however only have the output of the sixth convolution layer, I highly doubt that you could get much out of it.

    • @compuholic82
      @compuholic82 8 years ago +1

      Partially. The problem is that (in general) a convolution is not a reversible operation. However, you can apply something known as a "matched filter", which is basically a convolution with the transposed filter kernel. If you go backwards through the network you can (to some degree) reconstruct the input signal. If you look at this paper you can see what the reconstructions look like: arxiv.org/pdf/1311.2901v3.pdf
      And just to prevent confusion: the author calls it "deconvolution", but what he describes in his paper is not a deconvolution in the signal-processing sense. He is applying a "matched filter".
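Both replies above can be checked on a toy 1D case: an edge kernel maps two different inputs to the same output (so the forward pass cannot be undone exactly), while the transposed operation that @compuholic82 calls a matched filter spreads each output back onto the inputs it came from. This is a hedged sketch of only the filtering step; it ignores pooling and nonlinearities, and the names `correlate_valid` and `correlate_transpose` are illustrative, not from the paper.

```python
def correlate_valid(x, w):
    """Forward pass of a 1D conv layer (CNN-style: no kernel flip, no padding).
    The output is shorter than the input: len(x) - len(w) + 1 values."""
    m = len(w)
    return [sum(w[k] * x[i + k] for k in range(m)) for i in range(len(x) - m + 1)]

def correlate_transpose(y, w, n):
    """Transpose (adjoint) of correlate_valid: each output value is spread back,
    weighted by the same kernel, onto the n inputs it was computed from."""
    x = [0.0] * n
    for i, yi in enumerate(y):
        for k, wk in enumerate(w):
            x[i + k] += wk * yi
    return x

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

edge = [1, 0, -1]                       # a 1D difference (edge) kernel
x1 = [1, 2, 3, 4, 5]
x2 = [2, 3, 4, 5, 6]                    # x1 shifted up by a constant
print(correlate_valid(x1, edge))        # [-2, -2, -2]
print(correlate_valid(x2, edge))        # [-2, -2, -2]  same output: information lost

y = [1, 2, 3]
x_back = correlate_transpose(y, edge, len(x1))
print(x_back)                           # [1.0, 2.0, 2.0, -2.0, -3.0]
# Adjoint identity <Kx, y> == <x, K^T y>, which is what makes it the "transpose":
print(dot(correlate_valid(x1, edge), y), dot(x1, x_back))   # -12 -12.0
```

Note that `x_back` is not `x1`: the transpose projects information back, it does not invert the layer.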

  • @sirivellamadhuphotos
    @sirivellamadhuphotos 5 years ago

    At 5:58 I got the point I was searching for. Thank you very much.

  • @jonathanstrasburg3609
    @jonathanstrasburg3609 7 years ago +1

    I realize this isn't likely to get a reply this late, but I'm trying to replicate the configuration of this network. What activation function are you using for the first fully connected layer? Is it dotplus with a renormalization? I'm assuming FC2 is a softmax layer, so maybe they are both softmax.

  • @shanesrandoms
    @shanesrandoms 8 years ago

    Enjoying the neural net videos. Looks like ANNs are coming back after I hadn't really seen much of them since the 90s.
    I remember my first exposure to the math and theory behind this was an assembly program on my 8-bit C64 back in the late 80s, creating a 3-layer backpropagation network.

  • @feliceserra106
    @feliceserra106 8 years ago +2

    Why are there 4x4x50 neurons after the last conv-layer?
    I get 4x4x(20^2)x(50^4) neurons, if every 5x5 kernel runs over every image from the previous layer.
    I'm confused.. maybe the kernels in the following layers are 3-dimensional? Like 20 5x5x20 kernels in the second layer?

    • @feliceserra106
      @feliceserra106 8 years ago

      Now I understand. Thanks a lot!

    • @feliceserra106
      @feliceserra106 8 years ago +1

      Got it, thank you!

    • @tenalexandr1991
      @tenalexandr1991 8 years ago

      I have the same puzzle. Could you enlighten me?

    • @feliceserra106
      @feliceserra106 8 years ago

      In short: "Each kernel is 5x5xD, where D is the number of features in the previous layer."
      I don't know why their answers are not showing up on YouTube. Maybe a Google+ thing.
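The quoted answer ("each kernel is 5x5xD") resolves the 400-channel puzzle raised in several comments: one kernel spans all D input maps, and its D per-channel responses are summed into a single output map, so N kernels always give N output maps. Below is a small pure-Python sketch of that rule, assuming stride 1, no padding, and no nonlinearity (a real layer would usually apply something like ReLU afterwards); the helper names are hypothetical.

```python
def conv2d_valid(x, w):
    """Single-channel 'valid' 2D cross-correlation, stride 1, no padding."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(w[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def conv_layer(inputs, kernels, biases):
    """inputs : D feature maps, each H x W.
    kernels: N kernels, each a stack of D slices (k x k), i.e. k x k x D.
    Returns N maps. Each output map SUMS one kernel's per-channel responses
    over all D input maps, so N kernels give N maps (not N * D).
    Assumption: no activation function is applied here."""
    outputs = []
    for kernel, bias in zip(kernels, biases):
        if len(kernel) != len(inputs):
            raise ValueError("kernel depth must equal the number of input maps")
        per_channel = [conv2d_valid(x, w) for x, w in zip(inputs, kernel)]
        oh, ow = len(per_channel[0]), len(per_channel[0][0])
        outputs.append([[bias + sum(p[i][j] for p in per_channel) for j in range(ow)]
                        for i in range(oh)])
    return outputs

# Usage: 3 input maps of 6x6 and two 3x3x3 kernels -> 2 output maps of 4x4.
inputs = [[[1] * 6 for _ in range(6)] for _ in range(3)]
kernels = [[[[1] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]
out = conv_layer(inputs, kernels, biases=[0, 1])
print(len(out), len(out[0]), len(out[0][0]))  # 2 4 4
print(out[0][0][0], out[1][0][0])              # 27 28
```

With the video's sizes, the same rule turns 20 maps of 24x24 into one 20x20 map per 5x5x20 kernel.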

  • @haakonvt
    @haakonvt 8 years ago

    As always, a splendid video! However, every single clip taken from the angle where the pictures of the convolutions are visible is out of focus. Pity.

  • @pequenoZero
    @pequenoZero 8 years ago

    It would be nice if you talked a bit about how much data is needed for a CNN to be useful at all. The datasets in this video seem extremely big. Specifically, it would be nice to have an idea of how well it works with many "categories" and a small amount of data.

  • @SapphireCrook
    @SapphireCrook 8 years ago +2

    I never knew YT even supported 50FPS. :O
    Also, cool computer learning. Today is a day of new smarts.

  • @Embedonix
    @Embedonix 8 years ago

    Can you share the caffe scripts you used please?

  • @andreyguskov1697
    @andreyguskov1697 8 years ago

    If the first convolution layer has 20 filters and the second one has 20, does this mean that each C2 filter processes all 20 images from C1? That would make 400 images for the C2 output.

  • @the1exnay
    @the1exnay 8 years ago

    Would I be wrong in thinking that if you gave a convolutional neural network the ability to control where to click and what to type, gave it enough convolutions and kernels (perhaps beyond what current computers can handle), and trained it enough, then it would be able to solve any CAPTCHA, even a new one with a different interface that still used the same basic principles?

  • @SleeveBlade
    @SleeveBlade 8 years ago +1

    Really interesting! I would be interested to see if it is possible to start from the final convolution and see which image fits it best, as in 'what looks the most like a 2'.

    • @ruben307
      @ruben307 8 years ago

      It would be interesting to know if there can be totally different pictures that would just get the same number, similar to hash collisions.

    • @black_platypus
      @black_platypus 8 years ago

      Well sure, that's basically the same concept (irreversible / one-way transformations giving you an abstract result)

    • @quakquak6141
      @quakquak6141 8 years ago

      I saw somewhere a neural network that was trained to fool convolutional neural networks, sometimes it produced normal images (in this case it would have produced a 2) other times it produced something that looked almost like pure noise but it was still able to fool the networks

  • @lohphat
    @lohphat 7 years ago

    How do you replicate the learned connections to other systems? How is the "knowledge" abstracted for transport, backup, and further improvements?
    With discrete programming, the instructions are compact and finite and are easily copied.
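One way to see the answer to @lohphat's question: after training, the "knowledge" is nothing more than the numbers in the kernels and weight matrices, so copying a network means copying those arrays (together with a description of the architecture they belong to). The sketch below is an illustration under stated assumptions: it uses Python's `json` module and a hypothetical tiny layer; real frameworks use their own, usually binary, file formats, but the idea is the same.

```python
import json

def count_parameters(obj):
    """Count the numbers stored in nested dicts/lists of weights."""
    if isinstance(obj, dict):
        return sum(count_parameters(v) for v in obj.values())
    if isinstance(obj, list):
        return sum(count_parameters(v) for v in obj)
    return 1  # a single weight or bias

def export_weights(layers):
    """Serialize every learned number to text that any machine can read back."""
    return json.dumps(layers)

def import_weights(text):
    """Rebuild the same nested structure from the exported text."""
    return json.loads(text)

# Hypothetical tiny layer: two 3x3 kernels of depth 1, and one bias per kernel.
layers = {
    "conv1": {
        "kernels": [
            [[[0.1, 0.0, -0.1]] * 3],   # kernel 0: one 3x3 slice
            [[[0.2, 0.2, 0.2]] * 3],    # kernel 1: one 3x3 slice
        ],
        "biases": [0.0, 0.5],
    }
}
text = export_weights(layers)           # could be written to a file and copied anywhere
restored = import_weights(text)
print(count_parameters(layers), restored == layers)   # 20 True
```

For scale, a first layer of 20 kernels of 5x5x1 plus one bias each holds 20 x (25 + 1) = 520 numbers; further training simply starts from the restored values.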

  • @marcoswappner8331
    @marcoswappner8331 5 years ago

    Let's see if someone can help me out here. The first layer here outputs 20 24×24 images (or a 20-channel image) after performing all the convolutions. The second layer will output 20 20×20 images. But how are they constructed? How do they combine the 20 channels from the previous layer? I mean, they are not applying all 20 filters to each of the 20 channels; that'd be a 400-channel output. Do they simply add up the convolutions for each channel? So channel 1 of layer 2 is the sum of the convolutions between kernel 1 of layer 2 and each of the 20 channels of layer 1?

  • @mikejones-vd3fg
    @mikejones-vd3fg 8 years ago

    So how do the neural networks do this? Are there speed advantages to this network vs. just regular processing?

  • @hotfrost_
    @hotfrost_ 2 years ago

    Thank you so much. This was very helpful.

  • @CDmc98
    @CDmc98 8 years ago

    Shouldn't one be able to generate characters (letters, whatever) by going the other way around? I'm thinking: what if you tell it to generate a picture from a fully connected layer?

  • @CarterColeisInfamous
    @CarterColeisInfamous 8 years ago

    how were the kernels generated for this one?

  • @caw25sha
    @caw25sha 8 years ago

    I am one of those strange people who draws a horizontal bar through the number 7. How would you deal with that? Would you need a separate set of 7+bar training digits (in effect an 11th character) and then map both 7 and 7+bar back to 7?

  • @tamebeverage
    @tamebeverage 4 years ago

    Excuse me if I have missed something obvious, but I'm not sure I understand what the input of, say, C2 is. Is it a sort of average of all of the images produced by C1?

  • @DracarmenWinterspring
    @DracarmenWinterspring 8 years ago

    With all the edge detection going on, would it be harder to recognize a 4 if some versions had the top parts join at an angle, like the 4 in this font, versus the open version as in the video? Likewise a 7 with or without the strike through it? I mean, does it remember some kind of average of all the objects in a class or all of them / all of the sufficiently different ones (which might be hard for a large database)?

  • @thecakeredux
    @thecakeredux 4 years ago

    I imagined that each layer uses all its kernels on all the images of the previous layer. But that can't be right, hearing that the last convolutional layer here only outputs in a size of 50*4*4. Does that mean that there essentially are "kernel pipelines"? So kernel0 of layer1 will only be fed with the output of kernel0 of layer0?

  • @ericklestrange6255
    @ericklestrange6255 4 years ago

    Amazing, I didn't know you could visualize the high-rank features.

  • @gryzman
    @gryzman 8 years ago +2

    What's the library/program Dr Mike is using please ?

    • @realcygnus
      @realcygnus 8 years ago

      Big Corporate Top Secret

    • @MrVbarroso
      @MrVbarroso 8 years ago +7

      It's called Caffe.

    • @kolby4078
      @kolby4078 8 years ago +1

      Using Caffe on Linux.

  • @shifter65
    @shifter65 5 years ago

    Love the visualization!

  • @andre.queiroz
    @andre.queiroz 8 years ago +1

    For people interested in this experiment: you can actually do it in the Machine Learning course (Stanford) on Coursera.

  • @Locut0s
    @Locut0s 8 years ago +1

    Very interesting! I wonder if this gives some insight into how neurones in our brains work on a very basic level?

    • @cmptrn825
      @cmptrn825 8 years ago

      Points for making me look at my screen with my head turned 90 degrees to the left until I realize I look like a crazy person

  • @s.e.7268
    @s.e.7268 4 years ago

    it was very enjoyable, thanks for the video.

  • @bofk7306
    @bofk7306 8 years ago

    Can you look at your last but one fully connected layer and calculate the typical "distance" between different digits? E.g. just euclidean distance on the normalized terms in FC1.
    Would those distances depend on your neural network you're using or would they be similar across all successful neural networks? That is could you say something like a 1 and a 7 are typically closer than a 0 and a 4.
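The experiment proposed above could be sketched like this: collect the FC1 activation vector for each test image, average the vectors per digit, normalize the averages, and compare the class means pairwise. The features below are made-up 2D vectors purely to show the mechanics; no claim is made about what real FC1 distances look like, and the function names are hypothetical.

```python
import math

def class_means(features, labels):
    """Average feature vector (e.g. FC1 activations) for each label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(f), 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def unit(v):
    """Scale a vector to length 1 (the 'normalized terms' in the question)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def pairwise_distances(means):
    """Euclidean distance between every pair of normalized class means."""
    keys = sorted(means)
    u = {y: unit(means[y]) for y in keys}
    return {(a, b): math.dist(u[a], u[b])
            for i, a in enumerate(keys) for b in keys[i + 1:]}

# Usage with made-up 2D "features" (real FC1 vectors would be much longer):
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
labels = [1, 1, 7, 4]
d = pairwise_distances(class_means(feats, labels))
print(sorted(d))   # [(1, 4), (1, 7), (4, 7)]
```

Running the same function on FC1 features from two independently trained networks would be one way to probe whether such distances are network-specific.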

  • @TheGameFreak013
    @TheGameFreak013 6 years ago

    How do you know how many kernels, layers, etc. are best suited for your needs?

    • @m3n4lyf
      @m3n4lyf 6 years ago

      That is an excellent question! Unfortunately it requires at least a moderate amount of knowledge in the subject matter to answer, so I doubt you'll be getting a satisfactory response from this resource any time soon.

  • @Will-lt4by
    @Will-lt4by 8 years ago

    Can someone explain how the final convolutional layer is 4x4x50? My understanding based on the previous Neural Network video is that the first convolution will produce an output of 24x24x20, but then wouldn't the next convolution, which has 20 kernels, produce 20 images of the first image layer of the 20 produced from the first convolution, and then another 20 on the second image layer of the 20 produced from the first convolution, such that at the end of the second layer you'd have a 20x20x400 output, and so forth until at the end you'd have 4x4x(some large number) not 4x4x50?

    • @kazedcat
      @kazedcat 8 years ago

      You decide the depth of each layer. So the first layer will have 20 different 5x5x1 kernels, but the second layer will have 20 different 5x5x20 kernels. Then after that he uses 50 kernels of 5x5x20 and then 50 kernels of 5x5x50 until the last layer before the fully connected network.

    • @fasligand7034
      @fasligand7034 7 years ago

      Wow. I didn't realize that kernels also got multi-dimensional along the way. Thanks!

  • @Zorbeltuss
    @Zorbeltuss 8 years ago

    A thought that I've gotten when thinking about this and the previous episode, would it be possible to "reverse" the order of the convolutional neural network, getting a sort of idealized result, probably not extremely useful in most cases, but likely somewhat usable for seeing what extra data can be used to train it for more accurate results or perhaps some sort of data generation.
    Doing the same for a standard neural network would not result in any useable data I know, but it seems like it might be possible with the convolutional one.

  • @AlexanderTrefz
    @AlexanderTrefz 8 years ago +2

    Would you not have 11 output options? 0-9 and NaD (Not-a-Digit)?

    • @OddlyTugs
      @OddlyTugs 8 years ago

      This is single-digit recognition; multi-digit/character recognition is a whole 'nother can of worms. When there are no activations on the output layer, you know it is not a digit.

    • @AlexanderTrefz
      @AlexanderTrefz 8 years ago +1

      Francois Molinier that makes sense.

    • @andre.queiroz
      @andre.queiroz 8 years ago

      +Francois Molinier That's not how it works. The grayscale pixels represent the probability of it being a given number. If you input something that isn't a digit, you will probably see a few bright grays or a couple of almost-whites.

  • @RG-jv2nv
    @RG-jv2nv 8 years ago

    This really clarified the previous video :)

  • @Remix00zero
    @Remix00zero 8 years ago

    So would it be possible to use convolutional neural networks for something general like arbitrary image matching or are they limited to narrowly trained applications like the one here?

    • @2Cerealbox
      @2Cerealbox 8 years ago

      You can. I think google uses a neural net for their "visually similar images" feature.

    • @paulhendrix8599
      @paulhendrix8599 8 years ago

      nice picture.

  • @thejll
    @thejll 1 year ago

    What if the training images have digits drawn at different scales?

  • @3zehnutters
    @3zehnutters 8 years ago +21

    Is there a GitHub link to the project, Mike?

    • @recklessroges
      @recklessroges 8 years ago +1

      github or it didn't happen ;-) "I call GnuImageManip'Prog"

    • @CodeAbstract
      @CodeAbstract 8 years ago +3

      Why do you need a GitHub page? He just literally explained the full architecture of the CNN (convolutional neural network) he built. Now, if you want to test this for yourself, you can easily implement everything he said. Just find a programming language that is supported by the libraries you need for the task. He even mentioned what he used, but you could also look at Lua with Torch, for example. All the layers he mentioned are already available in those libraries, so you won't need to code any of them yourself, just put them together.

  • @TheCritic609
    @TheCritic609 8 years ago +1

    Great Video!

  • @HyzerFlip
    @HyzerFlip 8 years ago +1

    Love the out of focus shots on the pictures...

  • @deepquest
    @deepquest 8 years ago

    It's a really good lecture for understanding what is going on inside a NN. I am using NNs for target classification in thermal images. Is a NN a good approach for that, or should I go for another option?

  • @elanvanbiljon5237
    @elanvanbiljon5237 8 years ago

    Is there any chance you could upload a copy of the source code for the CNN somewhere (or even pseudocode)? I am sure many people would greatly appreciate it :D

  • @david2sdad
    @david2sdad 7 years ago

    Is CAPTCHA a method to filter out bots, or is it a way to coerce humans into training an AI?

  • @timl2k11
    @timl2k11 6 months ago

    The person who interviews this guy doesn’t ask enough questions.

  • @DrBlort
    @DrBlort 8 years ago

    How do you decide what the convolution kernels should be? Is that important, or could they be defined randomly at the beginning?

    • @rory4987
      @rory4987 5 years ago +1

      Neural network weights are set randomly and then learnt
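This one-line answer (and @shiphorns' question further up) can be illustrated with a toy: start a 3x3 kernel from small random values and repeatedly nudge it downhill on a squared-error loss. To keep it self-contained, the "teacher" signal here is the output of a known Sobel kernel on random images, which is an assumption for illustration only; in a real CNN the error comes from the digit-classification loss and is backpropagated through every layer, and nothing forces the learned kernels to resemble Sobel filters. The function names, learning rate and step count are arbitrary choices.

```python
import random

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # target used only for this toy

def conv2d_valid(x, w):
    """'Valid' 2D cross-correlation of a square image with a square kernel."""
    k = len(w)
    n = len(x) - k + 1
    return [[sum(w[a][b] * x[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(n)] for i in range(n)]

def train_kernel(target_kernel, steps=1000, lr=0.1, size=8, seed=0):
    """Learn a 3x3 kernel from a small random start by gradient descent on the
    mean squared error between its output and the target kernel's output."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]  # random init
    for _ in range(steps):
        x = [[rng.random() for _ in range(size)] for _ in range(size)]   # fresh random image
        pred = conv2d_valid(x, w)
        target = conv2d_valid(x, target_kernel)
        m = len(pred)
        grad = [[0.0] * 3 for _ in range(3)]
        for i in range(m):
            for j in range(m):
                err = pred[i][j] - target[i][j]
                for a in range(3):
                    for b in range(3):
                        # d(err**2)/d w[a][b] = 2 * err * x[i+a][j+b], averaged over pixels
                        grad[a][b] += 2 * err * x[i + a][j + b] / (m * m)
        w = [[w[a][b] - lr * grad[a][b] for b in range(3)] for a in range(3)]  # step downhill
    return w

learned = train_kernel(SOBEL_X)
for row in learned:
    print([round(v, 2) for v in row])   # should end up close to SOBEL_X
```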

  • @Sebi0043
    @Sebi0043 6 years ago

    Thank you very much! Very helpful video!

  • @germangb8752
    @germangb8752 8 years ago

    I'm taking a two credit course in deep learning next week!

  • @sedthh
    @sedthh 8 years ago +1

    please do more on this

  • @jacobdawson2109
    @jacobdawson2109 8 years ago

    I am curious: would it be possible to run this sort of neural network in reverse to produce the sort of "Deep Dream" images you can see on the Internet? For instance, instead of asking the network 'what digit does this image resemble?', ask 'what does a 2 look like?'
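The idea behind this question (and @SleeveBlade's "what looks the most like a 2") is usually implemented as gradient ascent on the input: freeze the trained weights, start from a plain image, and keep changing pixels in the direction that raises the chosen class score. The sketch below uses a hypothetical linear "class 2" scorer with an L2 penalty so the gradient can be written by hand; for a real CNN the gradient would come from backpropagation, and DeepDream-style images rely on additional tricks on top of this.

```python
def maximize_class_score(weights, reg=0.5, lr=0.1, steps=200, start=0.5):
    """Gradient ascent on the INPUT (weights stay fixed) to find the image that
    maximizes score(x) = sum(w_i * x_i) - reg * sum(x_i ** 2),
    with pixels clipped to [0, 1]. The linear scorer is a stand-in assumption."""
    x = [start] * len(weights)                                   # flat grey start image
    for _ in range(steps):
        grad = [w - 2 * reg * xi for w, xi in zip(weights, x)]   # d score / d x
        x = [min(1.0, max(0.0, xi + lr * g)) for xi, g in zip(x, grad)]
    return x

# Hypothetical "class 2" weights for a 4-pixel image.
w2 = [0.3, -0.2, 2.0, 0.0]
print([round(v, 3) for v in maximize_class_score(w2)])   # [0.3, 0.0, 1.0, 0.0]
```

Pixels with positive weight light up, pixels with negative weight go dark; on a trained CNN the same loop produces the "idealized digit" images people ask about, often noisy unless regularized.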

  • @G3rain1
    @G3rain1 8 years ago +2

    It would have been way more interesting to see different examples of the same number and how they translate into the same output.

  • @fejfo6559
    @fejfo6559 8 years ago +3

    I'm still confused ... how can you have so many different convolutions? Isn't a convolution a very specific operation?

    • @florianh.6256
      @florianh.6256 8 years ago

      As I understand it, "convolution" in this context just means that you apply some sort of filter (or function) to your data and use the results to apply another set of filters. Filter can mean anything here. For this task he used some filter that he ran on the image, from the looks of it a Sobel, which highlights edges that have a certain angle. After doing this a few times in a row with different filters, you get those 4x4 images, which are brighter the more edges at specific angles were in the picture.
      I guess I'm trying to tell you that convolution is not the filter, but rather the method of generating feature-specific data from your original data.

    • @alexthi
      @alexthi 8 years ago +4

      Convolution is a very specific operation between two things : an image and a kernel. By choosing different kernels, the result will vary : you may start detecting horizontal edges, thin diagonal lines, dark spots...

    • @TheHDreality
      @TheHDreality 8 years ago +7

      A convolution is a function that takes a grid of values (an image) and returns another image. It works by replacing each pixel in the image with a function of the pixels around it, so a "blur" kernel will replace each pixel with the average value of a grid of pixels centred around it. But other options exist: there are convolutions that do many different things, like making diagonal edges sharper, or removing small spots and such.
      Each convolution produces another image, meaning you can easily chain them together, much like you can blur an image in photoshop (which uses a convolution) and then call edge detection (also a convolution) on the result.
      This channel has a video on convolutions that explains it better than I can.

    • @fejfo6559
      @fejfo6559 8 years ago

      TheHDreality I already watched that episode, but they only talked about a blur and an edge detector; they didn't talk about how it can be modified with parameters at all.
      But I think I understand now that it just gives every pixel in the grid a weight and comes up with a total, which it divides by a number.

    • @alexthi
      @alexthi 8 years ago +1

      Florian H. As a math student, I know of function convolution, which is a binary operator very similar to what is shown here. But you're saying there exists more general convolution, usable on databases. This seems interesting. Could you please give me a link to a reliable source?
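To make the replies above concrete: a convolution kernel is just a small grid of weights, each output pixel is the weighted sum of the pixels under it, and the "divide by a number" step is simply folded into the weights (a 3x3 box blur uses 1/9 everywhere). On @alexthi's question, CNN libraries generally compute cross-correlation, i.e. the mathematical convolution without flipping the kernel; since the weights are learned, the flip makes no practical difference. The kernels below are standard textbook examples, not the ones learned in the video, and the function name is illustrative.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; each output pixel is the weighted sum of
    the pixels under it. No kernel flip (cross-correlation), as in CNN libraries."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(cols)] for i in range(rows)]

BOX_BLUR = [[1 / 9] * 3 for _ in range(3)]          # average: every weight is 1/9
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]      # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]      # responds to horizontal edges

image = [[0, 0, 9, 9] for _ in range(4)]   # dark left half, bright right half
print(convolve2d_valid(image, SOBEL_X))     # [[36, 36], [36, 36]]
print(convolve2d_valid(image, SOBEL_Y))     # [[0, 0], [0, 0]]
blur = convolve2d_valid(image, BOX_BLUR)
print([[round(v, 6) for v in row] for row in blur])   # [[3.0, 6.0], [3.0, 6.0]]
```

Same operation, different weights: that is how one network can hold many different "convolutions", and training just chooses the weights.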

  • @punkkap
    @punkkap 8 years ago

    Brilliant video.

  • @PhilStrahl
    @PhilStrahl 8 years ago +2

    It’s like a David Lynch movie to me: I almost think I understood it and then everything just becomes a convoluted mess and I feel dumber than before...

  • @0xChRS
    @0xChRS 8 years ago +24

    whoa why is it at 50 fps?

    • @ChristopherPuzey
      @ChristopherPuzey 8 years ago +47

      europe is poor and can't afford the extra 10

    • @deadalusdx5637
      @deadalusdx5637 8 years ago +4

      muh socialism

    • @AuroraNora3
      @AuroraNora3 8 years ago +1

      Except for Scandinavia

    • @CJSwitz
      @CJSwitz 8 years ago +17

      It's based on PAL (25/50/100 Hz), whereas North America is based on NTSC (30/60/120 Hz). There is no technical reason for it anymore, but it was originally because the frame rate was clocked off the AC power grid, which ran at 60 Hz in North America and 50 Hz in Europe.

    • @ChristopherPuzey
      @ChristopherPuzey 8 years ago

      "So my monitor runs at 60 fps"

  • @DrNaserRazavi
    @DrNaserRazavi 8 years ago

    Actually, the correct pronunciation of Le-Net is Lo-Net. "Le" in French is like "the" in English, but just for masculine nouns.

  • @vikassrivastava2680
    @vikassrivastava2680 7 years ago

    How can I get these algorithms if I want to run them on my machine?

  • @ahmadibraheem1141
    @ahmadibraheem1141 4 years ago

    Hey I have a question. After the first conv layer, we are left with 20 images of 24*24 pixels. Do these 20 images transform into one 24*24 sized image, to be given as an input to the next conv. layer?

    • @TheAbdelwahab83
      @TheAbdelwahab83 2 years ago

      No. After the first conv layer you have a volume (24*24*20), and it is the input to the next conv layer, whose kernels are of size (5*5*20). If you apply one such kernel to that input volume you get one image of size 20*20, and because you have 50 filters of (5*5*20), your output will be 20*20*50.
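The shape bookkeeping in this reply can be written as a small helper. It assumes a 28x28 single-channel input (implied by the 24x24 first-layer output with 5x5 kernels, since 28 - 5 + 1 = 24), stride 1, no padding, and one bias per kernel. The second path additionally assumes a 2x2 max-pool after each convolution, as in Caffe's LeNet MNIST example; that is one way to arrive at the 4x4x50 mentioned in other comments, but it is an assumption about this video's network, not something the thread confirms. Function names are hypothetical.

```python
def conv_layer_shape(h, w, depth_in, n_kernels, k, stride=1, pad=0):
    """Output shape and parameter count of a conv layer whose kernels are
    k x k x depth_in (one bias per kernel)."""
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    params = n_kernels * (k * k * depth_in + 1)
    return (out_h, out_w, n_kernels), params

def pool_shape(h, w, depth, size=2):
    """Non-overlapping size x size pooling keeps depth, divides height and width."""
    return (h // size, w // size, depth)

# Path described in the reply: conv(20 kernels of 5x5x1) -> conv(50 kernels of 5x5x20)
shape1, p1 = conv_layer_shape(28, 28, 1, n_kernels=20, k=5)
shape2, p2 = conv_layer_shape(*shape1, n_kernels=50, k=5)
print(shape1, p1)   # (24, 24, 20) 520
print(shape2, p2)   # (20, 20, 50) 25050

# Assumed LeNet-style path with 2x2 pooling after each conv:
s = pool_shape(*shape1)                                       # (12, 12, 20)
s = pool_shape(*conv_layer_shape(*s, n_kernels=50, k=5)[0])   # conv -> (8, 8, 50), pool
print(s)            # (4, 4, 50)
```

Either way, the depth of each kernel is fixed by the previous layer and the number of output maps by the number of kernels.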

  • @fethiourghi
    @fethiourghi 5 years ago

    The best explanation of CNN's thanx

  • @MeAtHome5
    @MeAtHome5 5 years ago +1

    I always do a *very* firm two. 06:55

  • @Lost_Evanes
    @Lost_Evanes 8 years ago

    Maybe I didn't get the idea, but why are there 10 classes for digits and not 11?
    Because if I give an image of "A" to this network, it will probably tell me "this is 1" or "this is 4" instead of giving a negative answer like "it's none of the 10 digits".

    • @SrssSteve
      @SrssSteve 7 years ago

      Дмитрий Сулин Because what you are expecting is that one, and only one, of the ten output nodes would have a very high confidence, say 0.9 or 0.95 of 1.0. The other nodes would be near 0. If the input image didn’t match any number 0 to 9, then all the nodes would output a low or very low confidence value.
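A caveat to the reply above, assuming the output layer is a softmax (as @jonathanstrasburg3609 guessed; this is an assumption, not confirmed in this thread): the ten outputs always sum to 1, so they cannot all be low, and the top one is at least 0.1. Rejecting a non-digit therefore means thresholding the largest probability, and even then a trained network can be confidently wrong on inputs unlike its training data, which is why adding an explicit "not a digit" class with labelled examples is the other common option. The 0.9 threshold and the function names below are arbitrary illustrations.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_or_reject(logits, threshold=0.9):
    """Return (digit, prob) if the top class is confident enough, else (None, prob).
    The threshold value is an arbitrary assumption."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return (best, probs[best]) if probs[best] >= threshold else (None, probs[best])

uniform = classify_or_reject([0.0] * 10)   # no class stands out
print(uniform)                             # (None, 0.1)
confident = [0.0] * 10
confident[3] = 10.0
print(classify_or_reject(confident)[0])    # 3
```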

  • @kachrooabhishek
    @kachrooabhishek 3 years ago

    i just love this guy

  • @Sean_735
    @Sean_735 8 years ago +4

    Why would you not show us what it does when you put in a random squiggle? That'd be cool.

    • @CodeAbstract
      @CodeAbstract 8 years ago +2

      Since the end layers are fully connected, it will have to pick one of the 10 output classes. So it will just output some number wrongly, and it won't have any output like "No Number". That said, you could extend the network in smart ways, like adding another class. The eleventh class would then mean 'NaN' (Not a Number), but you would of course have to label additional samples as 'NaN' for training.

  • @FlumenSanctiViti
    @FlumenSanctiViti 8 years ago

    I hadn't even noticed the video was at 50 fps. Probably because most of the video is out of focus.