The day he gave this lecture, nobody would have known that one day he would be awarded the Nobel Prize in Physics for this algorithm and neural networks.
There is a saying: "Geoff Hinton knows exactly how the brain works, because it is the 52nd time he has discovered a new way the brain works."
He is converging upon a solution.
@@michaelgismondi9861 Divide and conquer each subproblem, backtracking if the solution doesn't match, then moving forward. A simple algorithm of learning and relearning: coming up with new forms. Imagination seems like the best tool: memorizing sensory input and reproducing it. How do you connect many complex ideas? Each idea can be visualised as a graph with nodes (constants/functions, i.e. values) and edges (operations) between symbols.
"Synapses are much cheaper than experiences so it makes sense to throw a lot of synapses at each experience." Mind blown.
Yes, interesting. Question though: why would he approximate the number of experiences by life seconds and not, say, milliseconds? And can't we have multiple experiences at the same time? One could arguably treat any excitation of any sense neuron as a separate experience, and that would dramatically change the pseudo-math underlying Hinton's statement...
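For context, the pseudo-math in question can be sketched as back-of-the-envelope arithmetic. The lifespan and synapse counts below are rough, commonly cited estimates (not figures from the talk), just to show why the "one experience per second vs. per millisecond" choice shifts but does not break Hinton's point:

```python
# Rough arithmetic behind "synapses are much cheaper than experiences".
# All figures are order-of-magnitude estimates, not exact values.

SECONDS_PER_YEAR = 60 * 60 * 24 * 365          # ~3.15e7
lifetime_seconds = 80 * SECONDS_PER_YEAR       # ~2.5e9 "experiences" at 1/sec
synapses = 10 ** 14                            # commonly cited human estimate

# Parameters per experience: hugely over-parameterized by ML standards.
params_per_experience = synapses / lifetime_seconds
print(f"~{params_per_experience:.0f} synapses per second-level experience")

# Counting one experience per millisecond only shifts this by 1000x;
# parameters still vastly outnumber experiences.
params_per_ms_experience = synapses / (lifetime_seconds * 1000)
print(f"~{params_per_ms_experience:.0f} synapses per millisecond-level experience")
```

Either way the brain sits in the many-more-parameters-than-data regime Hinton describes later in the talk.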
13:06 "Just the furniture", lol... gotta love Hinton's dry humour.
In response to 3:33: Are you familiar with the idea of "back EMF"? AC motors only have an explicit forward signal, and yet are able to be regulated by a speed controller via back emf. It seems like an analogous concept could be at play in the brain. (I'm still watching)
Absolutely well done and definitely keep it up!!! 👍👍👍👍👍
I would've loved to have the slides separate and be able to see GH's hands all the time. Makes more sense to me somehow.
I wish my Ph.D. classes were like this, not just professors telling you to do research and then criticizing your work even though they don't really know it. LOL
Teachers have lost their sense of empathy for students nowadays. They just brag about their patents and papers instead of understanding and helping. They don't even come and help in labs; they're too busy with themselves. They say the Indian education system is not good; I say the education system is not good anywhere. Teaching has become an underrated profession.
And here we are.
Neuroscientists have also found backpropagating spikes in dendrites, so I'm surprised that wasn't mentioned.
@1:22:54
Q: Just curious … could you repeat your statement about why statistics can’t explain ahh how these .. work so well?
Hinton: My major complaint about statistics, at least in this talk, uhm was they studied models that are very different regime than the brain. They studied models where umm the data isn’t all that high dimensional and you have not that many parameters and not much data. That’s the history of statistics. And what they call big data is ... if you have a billion training sample .. ..that’s called big data. Uhmm from the brain's point of view. It’s got a billion training sample or more than a billion training samples. And it’s small data because you have got so many more parameters. So it’s in a different regime. Where it can’t just assume that … I mean at Google they're telling this to me all the time. They say you don’t need to worry about regularization. We’ve got so much data that it’s just not a problem. Well that’s just not true. For some of these problems they’ve only got a trillion examples. Unless you’ve got 10 to the fifteen parameters, that’s no good. You’d better regularize.
The second-to-last sentence starts with 'And if', not 'Unless'.
Thanks for this talk. Super interesting!
Enlightening presentation! The audio could be improved.
by using backpropagation on the audio input
+Pragy Agarwal Fortunately only at the beginning.
Sorry to say, but Dr. Hinton occasionally seems unable to hide his sense of humor! (But you'd have to get a hint on.) -- This, sir, was a most informative and entertaining lecture! Thanks :)
"Just the furniture." lol.
The way the cortex actually learns is much easier and simpler than this. The main objection is - where does the cortex get the error from?
For a given observation we ask: what is the probability of this model producing this observation? A low probability suggests an error in the model. More broadly, does the model produce outputs (predictions) with the same distribution as the observations? The cross entropy of the two distributions is a fundamental measure of their difference, measured as how much information is required to correct the error. See heliosphan.org/cross-entropy.html
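A minimal sketch of the cross-entropy measure described above, for discrete distributions in plain Python (a toy illustration, not from the linked page):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log2 q(x).

    p: observed (true) distribution, q: model's predicted distribution.
    Measured in bits: the average message length needed to encode
    samples from p using a code optimized for q. It is minimized
    (and equals the entropy of p) when q matches p exactly.
    """
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

observed = [0.5, 0.5]
good_model = [0.5, 0.5]
bad_model = [0.9, 0.1]

print(cross_entropy(observed, good_model))  # 1.0 bit: equals the entropy of p
print(cross_entropy(observed, bad_model))   # > 1.0: the extra bits measure model error
```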
Sure, but luckily the cortex doesn't need to calculate error. It is simply a feedforward Hebbian network. Although dendrites do use backpropagation, the biological mechanism is very different and much more efficient than NN implementations.
The brain may not need to calculate an error quantity directly, but at some level there needs to be a notion of what states are desirable and which aren't, and a means of determining which direction to modify weights to obtain the desired states. Fundamentally for a system to follow such a scheme there needs to be some information flowing in reverse (feedback).
"the biological mechanism is very different and much more efficient then NN implementations."
please elaborate
What you're describing is a cybernetic system. Most machine learning systems are based on cybernetic principles. However, cortical neurons use Spike Timing Dependent Plasticity to passively model the streaming input data. There is no notion of what is desirable or correct/incorrect.
A synaptic connection acts less like a 'weight' that propagates various signal strengths (spike trains), and more like a binary 'switch'. The more backpropagation signals the 'switch' gets, the more permanent it becomes; otherwise the switch weakens until it's 'off'.
Biological back propagation is very different because it operates on very simple local rules, oblivious to any notion of correct/incorrect. There is no higher-level/higher-order signal that back propagates through the whole network; that would be highly inefficient for the brain. Instead the permanence of synaptic connections in dendrite segments corresponds to the activity of the neurons they're connected to. With enough neural and synaptic genesis and pruning, each cell is able to effectively react to hundreds of different neural patterns and partake in the prediction of current activity. This system is repeated in cortical regions arranged in a hierarchy of inference feedforward/feedback white-matter connections. The feedback is inference only, nothing to do with error.
The cortex does operate loosely on cybernetic principles during goal-oriented behavior, but that's much higher-level, operating between regions rather than individual neurons or columns.
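The "binary switch with permanence" idea in that comment can be sketched as a toy model (loosely in the spirit of HTM-style synapses; the threshold and increment values here are made-up illustration numbers, not biological constants):

```python
class BinarySynapse:
    """Toy synapse: a binary switch gated by a scalar 'permanence'.

    The synapse transmits (is 'connected') only when permanence crosses
    a threshold. A purely local rule strengthens it on correlated activity
    and decays it otherwise -- no global error signal is involved.
    """
    CONNECT_THRESHOLD = 0.5

    def __init__(self, permanence=0.3):
        self.permanence = permanence

    @property
    def connected(self):
        return self.permanence >= self.CONNECT_THRESHOLD

    def update(self, coactive, increment=0.1, decrement=0.05):
        # Local rule: strengthen on coactivity, otherwise decay toward 'off'.
        if coactive:
            self.permanence = min(1.0, self.permanence + increment)
        else:
            self.permanence = max(0.0, self.permanence - decrement)

s = BinarySynapse()
print(s.connected)            # False: starts below the connection threshold
for _ in range(3):
    s.update(coactive=True)   # repeated coactivity makes the switch permanent
print(s.connected)            # True
```

The point of the sketch is only that the update rule reads nothing but local activity: there is no "correct/incorrect" signal anywhere in it.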
I think Geoff already answered it at 1:04:00
Very interesting. Most research seems to go into unsupervised deep learning, which makes sense. I wonder what will come out of it.
Interesting! Why is it interesting to be able to generate images that look real?
Because it's a way of determining how good a model is at modelling the data set we give it, without using labelled data. I.e. we can just provide a set of images, train on cross entropy error, and through necessity, the end result is a model that contains detectors for all the relevant features in the images, from low-level features such as edges and basic shapes, right up to high-level features such as 'cat', 'car', 'face', etc. See heliosphan.org/generative-models.html
regressing towards desired output
Can someone explain why spikes are better than real values, or provide an appropriate link?
They are "good" under the premise that the brain's "real" way of "learning" is "best". According to the lecture, the brain doesn't use real values, so that is why real values aren't as good as spikes. I don't see how a "spike" isn't a real value. It sounds to me like a binary spike up or not, 0/1, but I am not sure that is what is meant. One spike could be high, one could be low... so that would mean they have more than just binary values. In my opinion, real values are limited because they often lack the extreme values, like 0 and whatever the extreme limits are. Also, this point reminds me of an article I read about hunter-gatherers: they don't use many numbers, just 0-3, and beyond that it's just 'a lot', or something like that.
@@Sam5LC spikes are binary values in the sense that neurons cannot produce specific spike heights, so the exact spike height will be of no use to the neural net.
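A toy illustration of the distinction this thread is circling: individual spikes are binary (no usable height), yet a real value can still be conveyed by the firing rate. This is just the standard rate-coding idea, not a claim about what the brain actually does:

```python
import random

def spike_train(rate, steps=10_000, seed=0):
    """Binary spike train: at each timestep, fire (1) with probability `rate`.

    Each individual spike carries no magnitude -- only 0 or 1 -- yet the
    real value `rate` is recoverable from the average firing rate over time.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible illustration
    return [1 if rng.random() < rate else 0 for _ in range(steps)]

train = spike_train(0.37)
print(sorted(set(train)))        # [0, 1]: the spikes themselves are binary
print(sum(train) / len(train))   # ≈ 0.37: the real value lives in the rate
```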
The content is great but the sound is very bad
We decrease the bass of the audio, because bass oscillations are fat and take up space. After we decrease the bass, we boost the 1 kHz region, and a little less we also boost the higher region of the spectrum, but not up to the far high frequencies. Of course we have to compress and then use make-up gain. If we simply make it loud and allow spill-over, we get noisy distortion. Sound is crucial when we communicate through it! We have to record correctly at the start, because distorted files can never become acceptable! Before you develop something far off, develop scientifically something you use every day! The clever person, before starting to lecture, creates a model of lecturing and methods to keep lectures at high quality for good! Knowledge is memory. The memory here isn't the best we humans can do!
Can anyone summarize this video?
~1 minute starting @1:10:27
totally unlike anything statisticians have studied
This talk is being touted (elsewhere) as "while he may have been critical of it, this video shows that/why Geoff Hinton is actually okay with ML relying on feedforward/back-propagation and why feedback is not that important". As an outsider to AI (meaning ML), I came expecting that. I think I found the "that" GH is okay with BP in ML, but not the "why", other than "it works" ... for ML. And the treatment of feedback was unsatisfying.
Do the brain's feedback mechanisms use derivatives? If they don't use real values, just "spikes", why is it so easy for neurons to constantly calculate derivatives and hand them off to other neurons for easy processing? Maybe the rate of change over time isn't being treated as a derivative. Maybe calculus and stats are limiting our understanding of neural nets / cognition / learning / what to do with 'big data'.
Could the "learning" in ML be replaced with a word meaning "static pattern reproduction and updating based on past observed patterns and gaps with present observations" (success being "t=0, I see an image of a static set of furniture in a room. t=1 Now I have updated the image. t=2 Again, it's updated."), "Machine SPRUBPOBGPO" M-SPRU-POGO sounds pretty cool.
Here is where I got the link to this video: www.quora.com/Why-is-Geoffrey-Hinton-suspicious-of-backpropagation-and-wants-AI-to-start-over see post by James Morris.
Interesting
forward-forward :-)
spikes rather than actual real numbers
grossly misled by so-called statisticians and scientists
Bayesian statistics are liberal
hugely more params but better integrated, more posterior
huge parameters vs. training cases