Deep Foundations
  • 82 videos
  • 67,628 views
Vector Quantization and Multi-Modal Models
110 views

Videos

Contrastive Coding
216 views · 3 months ago
This lecture describes contrastive coding, a fourth fundamental loss function of deep learning.
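
As a minimal sketch of the general idea, assuming an InfoNCE-style formulation with in-batch negatives (the lecture's exact loss may differ):

import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE-style) loss for a batch of paired embeddings.

    z1, z2: [batch, dim] tensors; row i of z1 and row i of z2 form a positive
    pair, and every other row in the batch serves as a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature    # [batch, batch] similarity scores
    labels = torch.arange(z1.size(0))   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce_loss(torch.randn(8, 128), torch.randn(8, 128))  # usage sketch
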
Guidance for Diffusion Models
286 views · 3 months ago
In practice, diffusion models need a poorly understood alteration of their objective function called "guidance". This talk gives the timeline of the rapid development from the introduction of guidance in May 2021 to DALL-E 2 in March 2022.
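
For reference, one common form of this alteration is classifier-free guidance, which mixes conditional and unconditional noise estimates with a guidance scale w (illustrative notation; the talk's conventions may differ):

    \hat{\epsilon}(x_t, c) = (1 + w)\,\epsilon_\theta(x_t, c) - w\,\epsilon_\theta(x_t, \varnothing)

At w = 0 this reduces to the ordinary conditional model; larger w pushes samples toward the conditioning signal c at some cost in diversity.
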
Diffusion1
215 views · 4 months ago
This is an introduction to the mathematics of diffusion models.
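
A compact statement of the forward (noising) process that this mathematics is built around, with noise schedule \beta_t and \bar\alpha_t = \prod_{s \le t} (1 - \beta_s) (a standard formulation; notation may differ from the lecture):

    q(x_t \mid x_0) = \mathcal{N}\big(\sqrt{\bar\alpha_t}\, x_0,\; (1 - \bar\alpha_t)\, I\big)

The reverse (denoising) model is trained to invert this corruption one step at a time.
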
VAE
141 views · 4 months ago
This lecture describes the ELBO loss function that defines variational autoencoders (VAEs) as a third fundamental equation (together with the cross-entropy loss and the GAN adversarial objective).
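
For reference, the ELBO in its standard form (sign and notation conventions may differ in the lecture); the VAE is trained by maximizing the right-hand side over the encoder q_\phi and the decoder p_\theta:

    \log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
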
GANs
83 views · 4 months ago
The formulation of GANs plus a variety of applications of GANs and discriminative loss.
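
The adversarial objective in its original minimax form (practical variants, such as the non-saturating generator loss, modify the generator's term):

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]
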
Lecture 8b
158 views · 4 months ago
The Occam Guarantee (the Free Lunch Theorem), the PAC-Bayes Theorem (real-valued model parameters and L2 regularization guarantees), Implicit Regularization, Calibration, Ensembles, Double Descent, and Grokking.
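
A representative Occam-style guarantee of the kind covered here (stated loosely for a loss bounded in [0, 1]; the lecture's exact form and constants may differ): with probability at least 1 - \delta over a training sample of size N, simultaneously for all models h with prior probability P(h),

    L(h) \le \hat{L}(h) + \sqrt{\frac{\ln \frac{1}{P(h)} + \ln \frac{1}{\delta}}{2N}}
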
SDE
229 views · 5 months ago
Gradient Flow, Diffusion Processes (Brownian Motion), Langevin Dynamics and the Stochastic Differential Equation (SDE) model of SGD
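
As a small illustration of the Langevin-dynamics piece (a sketch, not the lecture's code), the discretized update adds Gaussian noise, scaled by a temperature-like parameter, to each gradient step:

import torch

def langevin_step(theta, grad, lr=1e-3, temperature=1.0):
    """One step of discretized Langevin dynamics.

    theta: parameter tensor; grad: gradient of the loss at theta.
    The injected noise has variance 2 * lr * temperature, the scaling under
    which the iterates approximate the continuous-time SDE.
    """
    noise = torch.randn_like(theta) * (2 * lr * temperature) ** 0.5
    return theta - lr * grad + noise

# usage sketch: noisy descent on the quadratic loss 0.5 * ||theta||^2
theta = torch.ones(10)
for _ in range(1000):
    theta = langevin_step(theta, grad=theta, lr=1e-2, temperature=0.1)
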
SGD
271 views · 5 months ago
A presentation of vanilla SGD, momentum, and Adam, with an analysis based on understanding temperature and its relationship to hyper-parameter tuning.
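
For concreteness, the three update rules in schematic form (a sketch with common default hyper-parameters; the lecture's temperature analysis is not reproduced here):

def sgd_step(p, g, lr):
    """Vanilla SGD: step against the gradient."""
    return p - lr * g

def momentum_step(p, g, v, lr, mu=0.9):
    """Momentum: keep a running average v of gradients and step along it."""
    v = mu * v + g
    return p - lr * v, v

def adam_step(p, g, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: per-coordinate step sizes from bias-corrected first/second moments."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # t is the 1-based step count
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (v_hat ** 0.5 + eps), m, v
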
Transformer
262 views · 5 months ago
Language Modeling, Self-Attention, and the Transformer.
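
As a compact sketch of the core operation, single-head scaled dot-product self-attention (the lecture covers the full multi-head transformer):

import torch
import torch.nn.functional as F

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape [seq, d]."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / K.shape[-1] ** 0.5   # [seq, seq] pairwise similarities
    weights = F.softmax(scores, dim=-1)     # each position attends over the sequence
    return weights @ V                      # weighted mixture of value vectors

d = 64
out = self_attention(torch.randn(10, d), torch.randn(d, d), torch.randn(d, d), torch.randn(d, d))
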
Some Fundamental Architectural Elements
221 views · 5 months ago
This lecture describes the motivation for ReLU, initialization methods, normalization layers, and residual connections.
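
A minimal sketch combining these elements, a pre-norm residual block with ReLU and layer normalization (the lecture's exact architecture may differ):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-norm residual block: x + MLP(LayerNorm(x)) with a ReLU nonlinearity."""

    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))

    def forward(self, x):
        # The identity path keeps gradients flowing even when the MLP is poorly scaled.
        return x + self.ff(self.norm(x))

y = ResidualBlock(64)(torch.randn(8, 64))
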
History to 2024
707 views · 6 months ago
This is an overview of the history of deep learning. It reviews the history starting from the introduction of the neural threshold unit in 1943, focusing mainly on the "current era" that begins in 2012 with AlexNet.
Lecture 5: Language Modeling and the Transformer
289 views · 1 year ago
Lecture 4: Initialization, Normalization, and Residual Connections
327 views · 1 year ago
Lecture 3: Einstein Notation and CNNs
410 views · 1 year ago
Lecture 1: A Survey of Deep Learning
770 views · 1 year ago
More Recent Developments
547 views · 2 years ago
Vector Quantized Variational Auto-Encoders (VQ-VAEs).
8K views · 2 years ago
Progressive VAEs
460 views · 2 years ago
Gaussian Models and the Perils of Differential Entropy
410 views · 2 years ago
Variational Auto-Encoders (VAEs)
820 views · 2 years ago
VAE Lecture 1
454 views · 2 years ago
SGD Lecture 1
273 views · 3 years ago
2021 Developments
518 views · 3 years ago
Back-Propagation
834 views · 3 years ago
Back-Propagation with Tensors
1K views · 3 years ago
The Educational Framework (EDF)
1K views · 3 years ago
Minibatching
562 views · 3 years ago
Trainability
527 views · 3 years ago
Einstein Notation
501 views · 3 years ago

Comments

  • @nserver109 · 1 month ago

    Wonderful!

  • @DorPolo-x5g · 1 month ago

    great video.

  • @ees7416 · 2 months ago

    fantastic course. thank you.

  • @saikalyan3966 · 3 months ago

    Uncanny

  • @stevecaya · 4 months ago

    At minute 16 is priceless. The teacher is not sure if there have been any big advances in AI other than this small thing called GPT-3. Ha ha ha ha. Nothing big about that model other than that it would turn out to be the biggest consumer app to 100 million users in history. And usher in the AI age to the general public. Dude, how did you miss that one…ouch.

  • @moormanjean5636 · 4 months ago

    This is such a helpful video thank you

  • @martinwafula1183 · 5 months ago

    Very timely tutorial

    • @solomonw5665 · 5 months ago

      *released 3 years ago 🫠

    • @quickpert1382 · 5 months ago

      @solomonw5665 Timed to show up in RUclips recommendations right after KANs were released. For him it was quite timely; for me, too late already.

  • @shivamsinghcuchd · 5 months ago

    This is gold!!

  • @aditya_a · 9 months ago

    Narrator: deep networks were NOT saturated lol

  • @K3pukk4 · 1 year ago

    what a legend!

  • @yorailevi6747 · 1 year ago

    Thanks, I was just searching for this idea.

  • @zeydabadi · 1 year ago

    Could you elaborate on “… j ranges over neurons at that position …” ?

  • @verystablegenius4720 · 1 year ago

    terrible exposition - doesn't seem to understand it himself either. "we should do the verification" even your notation is not clear. Also: "unary potential" is called a BIAS. Just read a stat. mech. book before making these videos, sigh.

  • @andrewluo6088 · 1 year ago

    After watching this video, I finally understand.

  • @stupidoge · 2 years ago

    Thanks for your interpretation. I have a clear understanding of how this equation works. (If possible, I'd still like some more detailed teaching on each part of the equation.) All in all, thanks for your help!!!

  • @AmitKumarPradhan57 · 2 years ago

    I understood that when Ps = Pop, the contrastive divergence goes to zero since the distributions of Y_hat and Y are the same. It's not clear to me why the gradient also goes to zero. Thank you in advance. PS: I took this course last quarter.

  • @Jootawallah · 3 years ago

    Another question: why is the gate function G(t) not just an independent parameter between 0 and 1? What do we gain from making it a function of h_t-1 and x? In the end, SGD would find good values for G(t) even if it were an independent parameter.

  • @Jootawallah · 3 years ago

    Is there an explanation for why the three gated RNN architectures here differ in performance? Why is the LSTM, the architecture with the most parameters, not the most effective one? In fact, neither is the simplest one the most effective. It's the intermediate one that takes the gold medal. But why?

    • @davidmcallester4973 · 3 years ago

      A fair comparison uses the same number of parameters for each architecture --- you can always increase the dimension of the hidden vectors. Some experiments have indicated that at the same number of parameters all the gated RNNs behave similarly. But there is no real analytic understanding.

  • @Jootawallah · 3 years ago

    I don't understand, what is the benefit of using a gated, i.e. residual, architecture? You talk about gates allowing forgetting or remembering, but why would we want to forget anyway? Also, whether G is zero or one, we always remember the previous state h_t-1 in some way! So I don't get it ...

    • @davidmcallester4973 · 3 years ago

      A vanilla RNN just after initialization does not remember the previous hidden state because the information is destroyed by the randomly initialized parameters. Vanilla RNNs could probably be improved with initializations that are better at remembering the previous state, but the structure of a gated RNN seems to cause SGD to find parameter settings with better memory than happens by running SGD on vanilla RNNs.

    • @Jootawallah · 3 years ago

      @davidmcallester4973 So is this again just a matter of residual architectures providing a lower bound on the gradient, and thus preventing it from vanishing?

  • @Jootawallah · 3 years ago

    On slide 11, shouldn't it be self.x.addgrad(self.x.grad*...) ? self.grad isn't defined, right?

  • @Jootawallah · 3 years ago

    Illuminating!

  • @Jootawallah · 3 years ago

    So if KL(p|q) = 0, does it mean that p = q up to a constant? Or are there other symmetries to take into account?

  • @jonathanyang2359 · 3 years ago

    Thanks! I don't attend this institution, but this was an extremely clear lecture :)

  • @addisonweatherhead2790 · 3 years ago

    The intuition around the 13 minute mark was really helpful! I've been trying to understand this paper for a few days now, and this has really made its goal and reasoning more succinct. Thanks!

  • @bernhard-bermeitinger · 3 years ago

    Thank you for this video, however, please don't call your variable ŝ 😆 (or at least don't say it out loud)

  • @kaizhang5796 · 3 years ago

    Great lecture! May I ask how to choose the conditional probability of node i given its neighbors in a continuous case? Thanks!

    • @davidmcallester4973 · 3 years ago

      If the node values are continuous and the edge potentials are Gaussian then the conditional probability of a node given its neighbors is also Gaussian.

    • @kaizhang5796 · 3 years ago

      David McAllester thanks! If each node has d-dimensional features and node i has k neighbors, how do we determine the parameters of this Gaussian p(i | k neighbors)?

    • @kaizhang5796 · 3 years ago

      Should I multiply k Gaussians together, where the mean of each Gaussian is one of the k neighbors?

  • @LyoshaTheZebra · 3 years ago

    Thanks for explaining that! Great job. Subscribed!

  • @sdfrtyhfds · 3 years ago

    Also, what if you skip the quantization during inference? Would you still get images that make sense?

    • @davidmcallester4973 · 3 years ago

      Do you mean "during generation"? During generation you can't skip the quantization because the pixel-CNN is defined to generate the quantized vectors (the symbols).

    • @sdfrtyhfds · 3 years ago

      @davidmcallester4973 I guess that during generation it wouldn't make much sense; I was thinking more in the direction of interpolating smoothly between two different symbols.

  • @sdfrtyhfds · 3 years ago

    Do you train the pixel CNN on the same data and just not update the VAE weights while training?

    • @davidmcallester4973 · 3 years ago

      Yes, the vector quantization is held constant as the pixel CNN is trained.

  • @bastienbatardiere1187 · 3 years ago

    You are not even taking the example of a graph with loops, which is the whole point of LBP. Moreover, please introduce the notation a little; we should be able to understand even if we did not watch your previous videos. Nevertheless, it's great that you teach such a method.

  • @mim8312 · 3 years ago

    Now the cutting-edge scientists are working on the future AIs. Creating an AI from a combination of multiple AIs, which reportedly is similar to how our brain functions, with different portions performing specific functions, and which can then understand and perform a set of completely different tasks better than humans? What could go wrong?

  • @mim8312 · 3 years ago

    I think that too many people are focusing on the game, which I also follow, as if this were an ordinary player. Since I have significant knowledge, and since I believe that Hawking and Musk were right, I am really anxious about the self-taught nature of this AI. This particular AI is not the worrisome thing, albeit it has obvious potential applications in military logistics, military strategy, etc. The really scary part is how fast this was developed after AlphaGo debuted. We are not creeping up on the goal of human-level intelligence. We are likely to shoot past that goal amazingly soon without even realizing it, if things continue progressing as they have. The early, true AIs will also be narrow and not very competent or threatening, even if they become "superhuman" in intelligence. They will also be harmless, idiot savants at first.

    Upcoming threat to humanity: the scary thing is the fact that computer speed (and thereby, probably eventually AI intelligence) doubles about every year, and will likely double faster when super-intelligent AIs start designing chips, working with quantum computers as co-processors, etc. How fast will our AIs progress to such levels that they become indispensable -- while their utility makes hopeless any attempts to regulate them or retroactively impose restrictions on beings that are smarter than their designers? At first, they may have only base functions, like the reptilian portion of our brain.

    However, when will they act like Nile crocodiles and react to any threat with aggression? Ever gone skinny dipping with Nile crocodiles? I fear that very soon, before we realize it, we will all be doing the equivalent of skinny dipping with Nile crocodiles, because of how fast AIs will develop by the time that the children born today reach their teens or middle age. Like crocodiles that are raised by humans, AIs may like us for a while. I sure hope that lasts. As the announcer in Jeopardy said long ago about a program that was probably not really an advanced AI, I, for one, welcome our future AI overlords.

  • @zv8369 · 3 years ago

    5:50 The reference was meant to be Poole et al. rather than Chen et al.? arxiv.org/abs/1905.06922

  • @zv8369 · 3 years ago

    Could you please provide a reference for your statement at 11:18: *"cross-entropy objective is an upper bound on the population entropy"*

    • @zv8369 · 3 years ago

      I think I got around to understanding why this is the case. Entropy, H(X), is the minimum number of bits required for representing X. Cross-entropy is minimized when q matches the true distribution p, and that minimum value is exactly the entropy of p; otherwise the cross-entropy is larger than the entropy. Therefore, cross-entropy is an upper bound on the entropy! I didn't do a good job describing this, but hope this helps!
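
      [In symbols: H(p, q) = H(p) + KL(p ‖ q) ≥ H(p), with equality exactly when q = p, so the cross-entropy reaches the entropy only at the population distribution.]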

    • @siyaowu7443 · 1 year ago

      @zv8369 Thanks! It helps me a lot!