Yoshua Bengio | From System 1 Deep Learning to System 2 Deep Learning | NeurIPS 2019

  • Published: 10 Dec 2019
  • Slides: www.iro.umontreal.ca/~bengioy/...
    Summary:
    Past progress in deep learning has concentrated mostly on learning from a static dataset, primarily for perception tasks and other System 1 tasks which are done intuitively and unconsciously by humans. However, in recent years, a shift in research direction and new tools such as soft attention and progress in deep reinforcement learning are opening the door to novel deep architectures and training frameworks for addressing System 2 tasks (which are done consciously), such as reasoning, planning, capturing causality, and obtaining systematic generalization in natural language processing and other applications. Such an expansion of deep learning from System 1 tasks to System 2 tasks is important to achieve the old deep learning goal of discovering high-level abstract representations, because we argue that System 2 requirements will put pressure on representation learning to discover the kind of high-level concepts which humans manipulate with language. We argue that, towards this objective, soft-attention mechanisms constitute a key ingredient to focus computation on a few concepts at a time (a "conscious thought"), as per the consciousness prior and its associated assumption that many high-level dependencies can be approximately captured by a sparse factor graph. We also discuss how the agent perspective in deep learning can help put more constraints on the learned representations to capture affordances, causal variables, and model transitions in the environment. 
Finally, we propose that meta-learning, the modularization aspect of the consciousness prior and the agent perspective on representation learning should facilitate re-use of learned components in novel ways (even if statistically improbable, as in counterfactuals), enabling more powerful forms of compositional generalization, i.e., out-of-distribution generalization based on the hypothesis of localized (in time, space, and concept space) changes in the environment due to interventions of agents.
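The soft-attention mechanism mentioned in the summary can be sketched in a few lines (a minimal illustrative example under my own assumptions, not code from the talk): a query is compared against a set of keys, a softmax turns the scores into weights, and the output is a weighted average of the values, so computation is softly focused on a few concepts at a time.

```python
import numpy as np

def soft_attention(query, keys, values):
    """Differentiable soft selection: score each key against the query,
    normalize the scores with a softmax, and return the weighted
    average of the values along with the attention weights."""
    scores = keys @ query                    # one score per key, shape (n,)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights

# Toy example: three "concepts"; the query is most aligned with the second key.
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.5]])
values = np.array([[10.0], [20.0], [30.0]])
query = np.array([0.0, 2.0])

out, w = soft_attention(query, keys, values)
print(w.argmax())  # the second concept (index 1) receives the most weight
```

Because the softmax weights are differentiable, the whole selection can be trained end to end by gradient descent, which is what makes soft attention attractive for focusing computation on a few elements without a hard, non-differentiable choice.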

Comments • 32

  • @evankim4096 4 years ago +7

    I like that Yoshua approaches the theory of neural networks in the language of probability at its core.

  • @AR-iu7tf 4 years ago +5

    Prof. Bengio is perhaps one of the key voices (if not the only one) who so clearly articulates, in great detail, what is lacking in DL to date and what one path forward could be (and he is kind enough to give links to all the relevant references). Few exhibit such intellectual honesty and earnestness in helping the rest of us understand what to expect in the future.
    Wish I had teachers like him when I went to school.

  • @CristianGarcia 4 years ago +39

    Yoshua: "Conscience is the next big thing"
    Next job offering: AI Conscience Engineer

    • @ACogloc 4 years ago +1

      Following job: Conscient AI

    • @cristhian4513 4 years ago

      XD you're too much

    • @CosmiaNebula 3 years ago

      From computer science to comscience.

    • @SterileNeutrino 1 year ago

      Like a YouTube video, the AI will be able to convince you of anything and its opposite.

  • @gangfang8835 1 year ago

    It took me a month to fully understand everything he discussed in this presentation (at a high level). I think this is the future. Would love to hang out and discuss if anyone is in Toronto.

  • @wehitextracellularidiombit4907 4 years ago +2

    Who's the speaker who introduced Mr YB? Is she a researcher too?

  • @keghnfeem4154 4 years ago +3

    Hello Yoshua.

  • @arjunashok4956 3 years ago

    The link for the slides doesn't work! Please update it!

  • @araldjean-charles3924 1 year ago +1

    A big chunk of knowledge may be preverbal. Look at our cats, dogs, and other mammals.

  • @leo.budimir 4 years ago +5

    "In our community, the C-word (consciousness) ..." =D

    • @immortaldiscoveries3038 4 years ago

      Transformers. Deep Learning. Training. Hard Problem. Fixed-Size set. Prune.....I could keep going...

  • @Dmdmello 4 years ago +3

    Isn't causality just a special case of correlation across time? At least that's how it seems to work for human intuition about causal effects. If so, I don't see why the fact that modern neural nets only learn correlations should prevent them from also learning causal relations.

    • @viswanathgangavaram7385 4 years ago +4

      I suggest reading "The Book of Why" by Judea Pearl, especially the first two chapters.

    • @ans1975 4 years ago +1

      I am not sure about that, but I want to pose another question: what happens if you invert time?
      Does this thought experiment help in looking for clarifications?
      The correlation of x at time t with x' at time t' should not be affected by that change.
      But a causal relation should be affected, as far as I can see.
      I am now at the conference; if I manage to meet Bengio I may ask him directly and report his answer here.

    • @Dmdmello 4 years ago +2

      @Shikhar Srivastava
      "Say we're in a given state. If events A & B are simply correlated, and B occurs, we can consequently say there's a probability of A occurring.
      Now if event B is caused by event A, and B occurs, then A has already occurred, so the probability of A occurring forward in time is independent of B: we don't expect A to occur simply because B has occurred. However, if A has occurred, then B must occur with the known probability P(B|A). Hence the directionality of the relationship."
      Let me see if I understand... basically, the problem is that simultaneous correlation between A and B, say in a Bayesian net, cannot capture the fact that a future occurrence of A is conditionally independent of a past occurrence of B, whereas a future occurrence of B is still conditionally dependent on a past occurrence of A; hence the asymmetry problem.
      Then why not treat A at time t=0 and A at time t>0 as different events? Then it wouldn't make sense to compute P(A at t>0 | B at t=0), since there would be no connection in the graph of relations. Doesn't that solve the dependence/independence asymmetry, since the A that preceded B would still be dependent on B, but the A that comes after would just be another variable? I guess the problem is that representing sequential relations in a Bayesian net is unfeasible, but this is not difficult for neural nets such as RNNs, which can capture those sequential relationships.

    • @viswanathgangavaram7385 4 years ago

      @ans1975 Just inverting time doesn't mean anything in the causal world; the causal world says that the antecedent follows the precedent because the precedent causes the antecedent.

    • @viswanathgangavaram7385 4 years ago

      @UCGTnKVtLrM0sI8QZbeEFo7Q Even though almost all causal relations are encoded in the data, it seems that without a causal model it is somewhat impossible to infer those causal relations from the data (even with RNNs).
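The asymmetry debated in this thread can be made concrete with a tiny simulation (my own illustrative sketch, not something from the talk or the comments): in a structural model where A causes B, the observational correlation between A and B is symmetric, but intervening on A shifts B while intervening on B leaves A untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structural causal model: A -> B, each with independent noise.
A = rng.normal(size=n)
B = 2.0 * A + rng.normal(size=n)

# Observational correlation is symmetric: corr(A, B) == corr(B, A) ≈ 0.89.
corr = np.corrcoef(A, B)[0, 1]

# Intervention do(A := 1): B is regenerated from its mechanism, so its
# mean shifts to about 2.0.
A_do = np.full(n, 1.0)
B_after = 2.0 * A_do + rng.normal(size=n)

# Intervention do(B := 1): A's mechanism does not mention B, so A keeps
# its mean of about 0.0 even though corr(A, B) was strong.
A_after = rng.normal(size=n)

print(round(corr, 2), round(B_after.mean(), 1), round(A_after.mean(), 1))
```

The point of the sketch: correlation alone cannot distinguish the two intervention outcomes; you need the structural (causal) model to know which variable's mechanism is cut by an intervention, which is the argument Pearl makes and that a purely correlational learner misses.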

  • @cafeliu5401 4 years ago

    The big name is digging a new pit (opening up a new research direction).

  • @catsaresocute650 2 years ago

    I am a horrible sister. I just went to someone in the room, and I just wanted to make sure he doesn't get into too much trouble. Always make double and triple sure that the abusive persons know a meeting has been agreed multiple times, so that they can't deny it; school is so good for that too, because it can't be rejected socially without going into the realm of neglect, like saying a sister may not teach her brother how to do things. Maybe I need to keep a book of interactions with Jackob so I have a better case?