Note to future organisers: Please give Judea an attached mic rather than a hand held =)
I have noticed that if we frame a statement in terms of conditional probabilities instead of correlation, we can infer some information about the causal structure. If P(B|A) != P(B) (equivalently, P(A,B) != P(A)*P(B)), then one of the following must be true: (1) A causes or prevents B, (2) B causes or prevents A, (3) some other factor or set of factors C causes or prevents both A and B, or (4) the association is due to random chance in the sample. There is an almost linear relationship between the distance of P(B|A) from P(B) and the probability of being in the first two states. Meaning, as an example, as P(B|A) grows larger than P(B), so does the probability that we are in either state 1 or 2. In many cases I can say with relative confidence whether we are in states 1 or 2, or in 3 or 4.
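A minimal sketch of that dependence check in Python (all counts here are invented purely for illustration):

# Sketch: estimate P(B) and P(B|A) from a 2x2 contingency table and compare.
# The numbers are hypothetical; the point is the comparison, not the data.

def dependence_gap(n_ab, n_a_not_b, n_not_a_b, n_not_a_not_b):
    """Return P(B|A) - P(B) estimated from joint counts."""
    n = n_ab + n_a_not_b + n_not_a_b + n_not_a_not_b
    p_b = (n_ab + n_not_a_b) / n
    p_b_given_a = n_ab / (n_ab + n_a_not_b)
    return p_b_given_a - p_b

# A gap near 0 is consistent with independence; a large gap means A and B
# are associated, but it says nothing about WHICH of the four explanations
# (A->B, B->A, common cause, chance) actually holds.
print(dependence_gap(40, 10, 20, 30))  # 0.8 - 0.6 = 0.2 -> associated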
23:53---We want the "software" to modify itself. Can a neural network posit counterfactuals? A neural net can predict, and then measure how closely that prediction matches a training set or constraining parameters. But when is one prediction of value, and when is that same prediction not of value? When is one prediction more "correct" than another, or than all others, and when is it more useful to replace it with another prediction? So the mechanism we're looking for is how to decide between predictions, how to govern them, in order to achieve something closer to a human level of counterfactual positing, and, more importantly, when: by rank, order, and context. What we're looking for is a pattern recognition that isn't constrained by resembling objects, but by resembling functions.
When do we activate one prediction over another? That can be programmed as a function, and more interestingly, as a series of functions.
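One way to picture that (a hypothetical sketch, nothing from the talk; all names are invented): treat the governor itself as a function that scores each candidate prediction function on held-out data and activates the winner.

# Hypothetical sketch: a "governing" function that selects among
# candidate prediction functions by held-out squared error.

def govern(predictors, x_val, y_val):
    """Return the predictor with the lowest squared error on (x_val, y_val)."""
    def error(f):
        return sum((f(x) - y) ** 2 for x, y in zip(x_val, y_val))
    return min(predictors, key=error)

# Two toy candidate functions of one variable:
candidates = [lambda x: 2 * x, lambda x: x + 1]
best = govern(candidates, x_val=[1, 2, 3], y_val=[2, 4, 6])
print(best(10))  # 20 -- the first candidate wins on this data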
35:49---Back propagation makes counterfactuals possible.
13:45--Predicate logic and symbolic logic are forms of math... or, math is a form of logic. Predicate logic COULD be represented with numbers, obviously. But without the presence of functional arrays, all you can get the computational platform to do is recognize co-occurrences. Yet... Siri and Alexa use predicate logic to mimic language understanding to great effect (NLP). So what's missing?
Brilliant!!!
28:29---Simpson's Paradox... when to go with one data set or another -isn't that answerable by regression models? When is age relevant to a prediction and when is it not? Leave out age and you get one answer; leave it in and you get another. That's the difference between the two graphs, no? A variable? And the relationship between variables can be addressed by function(s). It's when we apply different functions to variables and between variables, when we create new variables this way, that we are often able to enhance and surpass the original model. Can't we set up a neural net, a governing or discriminating entity, to determine which pattern of functions, variable sets, and interactions is more explanatory of the data under consideration, of the new occurrences of patterns we wish the AI to recognize? So the counterfactual isn't what matches the data, but rather the series or occurrences of functions. The causal relationships can all be represented by functions or series of functions and can be data-agnostic - patterns of functions can be recognized across disparate data sets despite context, or depending on context - both are possible. No? (See the toy example below.)
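For concreteness, here is a toy Simpson's-paradox reversal in Python, with hypothetical counts in the spirit of the classic kidney-stone example: the treatment wins inside every age group yet loses in the pooled data, so the "right" answer depends on whether you condition on age.

# Toy Simpson's paradox (hypothetical counts): (successes, total) per arm.
# "treated" wins inside EVERY age group but loses when ages are pooled,
# because treatment was assigned more often to the harder (old) group.

groups = {
    "young": {"treated": (81, 87),   "control": (234, 270)},
    "old":   {"treated": (192, 263), "control": (55, 80)},
}

def rate(successes, total):
    return successes / total

for name, g in groups.items():
    print(name,
          "treated:", round(rate(*g["treated"]), 3),   # young 0.931, old 0.730
          "control:", round(rate(*g["control"]), 3))   # young 0.867, old 0.688

# Pooled over age, the comparison flips:
for arm in ("treated", "control"):
    s = sum(groups[g][arm][0] for g in groups)
    t = sum(groups[g][arm][1] for g in groups)
    print("pooled", arm, round(s / t, 3))  # treated 0.780 < control 0.826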
Our first function is association, and it comes from algebra. Our second function is back propagation/the chain rule, and that comes from calculus. All of our current power in data science and AI comes from these two functions. Why stop there?
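As a concrete illustration of that second function (a minimal hand-rolled sketch, no framework assumed): the chain rule propagating a gradient through two composed functions, which is all back propagation does at each node.

# Minimal chain-rule/backprop sketch for y = (w * x)^2, computed by hand.
# Forward pass computes the value; backward pass multiplies local derivatives.

def forward_backward(w, x):
    a = w * x              # forward: inner function
    y = a ** 2             # forward: outer function
    dy_da = 2 * a          # backward: d(a^2)/da
    da_dw = x              # backward: d(w*x)/dw
    dy_dw = dy_da * da_dw  # chain rule: dy/dw = dy/da * da/dw
    return y, dy_dw

y, grad = forward_backward(w=3.0, x=2.0)
print(y, grad)  # 36.0 and 24.0 (= 2 * (3*2) * 2)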
I think the overall message is the danger of an unknown variable which, if conditioned on, reverses the pattern observed in the data.
We can observe and measure age. But there are things that are hard to observe and measure, such as human intelligence and perseverance.
Science simplified. Thanks! :)
Great talk, I really like his books.
Correlation is not causation