Great, insightful talk! Learned a lot here.
Wonderful 👍
Great 🎉
I can't understand the fundamental assumption (first introduced at 11:30) that the weights are random variables. This is only true for an untrained network at the very first step of training. Thereafter, and certainly for a trained network, the weights are not random.
As a matter of fact, much effort has gone into extracting or controlling the distribution of the weights, quite explicitly in the case of variational autoencoders, for example. And one can empirically observe that many CNN kernels develop obvious low-entropy patterns, particularly in the first layers.
Can anyone help me to understand what I am missing here?
Yeah, I have the same question, and more like it. In spite of reading a few books on deep learning, I still have a lot of questions. It's like the more I read, the more questions arise.
I wondered the same thing. My guess is that a non-random distribution of weights in the network would be encoded in a prior. It wouldn't be a true GP anymore and you presumably couldn't derive anything in closed form, but the general Bayesian/GP formalism may still hold. Just my guess.
If you consider all possible types of networks and all possible types of applications, the optimized weights are random variables that have to follow the CLT.
@@n00bphd84 That's just saying "everything is random" in a different way. But, as soon as you start to train an actual network, that is no longer true. And then none of this applies.
@@JohnUrbanic-m3q I may be wrong since I'm not an expert, but random variables in probability do not necessarily indicate randomness in the informal sense.
They are random variables in the sense that they are modelled as such. And even if, after training, we do converge to a set of values for the weights, each is still a random variable in the sense that it is one realization out of many, one draw from a distribution.
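To make that concrete, here is a minimal NumPy sketch (my own illustration, not code from the talk; all names and numbers below are placeholders I picked). If you draw the weights of a wide one-hidden-layer network from a simple prior and evaluate the output at one fixed input across many independent weight draws, the outputs come out approximately Gaussian, which is the CLT/GP intuition being discussed.

```python
# Sketch of the "weights as random variables" view: weights are draws from a
# prior, e.g. w ~ N(0, 1/fan_in), and any trained network is one realization.
import numpy as np

rng = np.random.default_rng(0)
d_in, width, n_draws = 3, 512, 2000
x = rng.normal(size=d_in)            # one fixed input

outputs = np.empty(n_draws)
for i in range(n_draws):
    # sample a fresh one-hidden-layer network from the prior
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(width, d_in))
    w2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)
    h = np.tanh(W1 @ x)              # hidden activations
    outputs[i] = w2 @ h              # scalar output f(x) for this draw

# Across draws, f(x) is approximately Gaussian (CLT over the hidden units);
# jointly over several inputs it would be approximately a multivariate
# Gaussian, i.e. a GP prior over functions.
print(outputs.mean(), outputs.std())
```

The point of the sketch is only that the randomness lives in the ensemble of possible networks, not in any single trained one.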
This is a very interesting talk, but I will tilt against the windmill of obnoxious jargon and point out that the word "compute" is a VERB and there is nothing wrong with the word "computATION"!
Apparently anything can be called "first principles" these days. As long as the title sounds cool, any BS material is sold as top tier in machine learning research.
🙏🏻
Similarity, equivalence = Duality.
Input (thesis) is dual to output (synthesis) -- problem, reaction, solution or the Hegelian dialectic.
Neural networks conform to the Hegelian dialectic.
Your mind is a reaction or impulse response to input problems -- thesis, anti-thesis, synthesis.
Cause is dual to effect -- correlation.
"Always two there are" -- Yoda.
Concepts are dual to percepts -- the mind duality of Immanuel Kant.
Drugs are bad
@@ChaseCoble-u7h Antinomy (duality) is two truths that contradict each other -- Immanuel Kant.
"This sentence is false" -- the sentence.
If the sentence is true then the sentence is false.
If the sentence is false then the sentence is true -- antinomy.
The sentence is both true and false at the same time -- duality.
Syntax is dual to semantics -- languages, communication or information.
If mathematics is a language then it is dual.
Structure (syntax) is dual to function (semantics) -- protein folding in biology.
Protein shape or structure determines its function, goal or purpose -- protein folding is dual.
All codes, languages are dual -- DNA is a code.
Large language models are using duality -- neural networks.
The double helix or DNA should be called the dual helix -- the code of life is dual.
Clockwise is dual to anti-clockwise -- the Krebs energy cycle is dual.
Trees or plants emit oxygen and absorb carbon dioxide -- clockwise.
Mammals or humans emit carbon dioxide and absorb oxygen -- anti-clockwise.
You exist because the Krebs energy cycle changed direction millions of years ago.
Good is dual to bad -- drugs or you are using duality.
Your mind is an impulse response to the external world -- the Hegelian dialectic.
Mind (syntropy, synergy) is dual to matter (entropy, energy) -- Descartes or Plato's divided line.
Your mind is syntropic as you make predictions.
Syntropy (knowledge, prediction) is dual to increasing entropy (lack of knowledge) -- the 4th law of thermodynamics!
Lacking is dual to non lacking -- knowledge or information.
Synergy is dual to energy -- energy is dual, the Krebs energy cycle!
@@hyperduality2838 I've dealt with both, and antinomy is just a sidestep to attempt to avoid the objectivity that naturally comes from proving absurdity. The amount of categorical misattributions you just participated in is absolutely ridiculous.
@@ChaseCoble-u7h Antinomy (duality) is two truths that contradict each other -- Immanuel Kant.
Absolute truth is dual to relative truth -- Hume's fork.
Truth is dual to falsity -- propositional logic.
"This sentence is false" -- the sentence.
If the sentence is true then the sentence is false.
If the sentence is false then the sentence is true -- antinomy.
The sentence is true and false both at the same time -- duality.
Truth is a dual concept.
Duality means that there are new laws of physics:-
Syntropy (knowledge, prediction) is dual to increasing entropy (lack of knowledge) -- the 4th law of thermodynamics!
Lacking is dual to non lacking.
Knowledge is dual according to Immanuel Kant -- synthetic a priori knowledge.
If knowledge is dual then information is dual.
Objective information (syntax) is dual to subjective information (semantics) -- information is dual.
Average information (entropy) is dual to co or mutual information (syntropy) -- information is dual.
Teleological physics (syntropy) is dual to non teleological physics (entropy).
Knowledge or science is syntropic -- duality!
"Through imagination and reason we turn experience into foresight (prediction)" -- Spinoza describing syntropy.
Converting experience or knowledge into predictions is a syntropic process -- teleological.
Your mind reacts to problems and synthesizes solutions -- the Hegelian dialectic.
Enantiodromia is the unconscious opposite or opposame (duality) -- Carl Jung.
There is also a 5th law of thermodynamics but you are having problems with understanding the 4th law.
Your mind is dual according to Immanuel Kant, Yoda is correct.
@@ChaseCoble-u7h Categories (syntax, form) are dual to sets (semantics, substance) -- Category theory.
Category theory is the study of duality!
Injective is dual to surjective, synthesizing bijective or isomorphism -- the Hegelian dialectic.