- 14 videos
- 102,561 views
Lennart Svensson
Added 29 Jun 2012
Guest lecture by Wayve, 2023
The video shows a guest lecture by Oleg Sinavski from Wayve. The lecture was part of my course "SSY340 -- Deep Machine Learning", which is given at Chalmers University of Technology. More details are provided below.
0:00 Self-driving vehicles and Software 2.0
4:14 Wayve's approach to self-driving vehicle control
07:45 Limitations of AV2.0
10:01 GAIA-1
16:35 LINGO
28:00 Questions to Oleg Sinavski
Title: Language and Video-Generative AI in Autonomous Driving
Abstract:
Recent advances in Large Language Models (LLMs) demonstrate the possibility of achieving human-level capabilities in generating explanations and reasoning using end-to-end text models. Meanwhile, explainability and reasoning remai...
Views: 839
Videos
Deep Generative Models, Stable Diffusion, and the Revolution in Visual Synthesis
2.5K views · 2 years ago
We had the pleasure of having Professor Björn Ommer as a guest lecturer in my course SSY340, Deep machine learning at Chalmers University of Technology. Chapters: 0:00 Introduction 8:10 Overview of generative models 15:00 Diffusion models 19:37 Stable diffusion 26:10 Retrieval-Augmented Diffusion Models Abstract: Recently, deep generative modeling has become the most prominent paradigm for lear...
BERT: transfer learning for NLP
10K views · 3 years ago
In this video we present BERT, which is a transformer-based language model. BERT is pre-trained in a self-supervised manner on a large corpus. After that, we can use transfer learning and fine-tune the model for new tasks, obtaining good performance even with a limited annotated dataset for the specific task that we would like to solve (e.g., a text classification task). The original paper: arxiv....
Transformers - Part 7 - Decoder (2): masked self-attention
20K views · 3 years ago
This is the second video on the decoder layer of the transformer. Here we describe the masked self-attention layer in detail. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ruclips.net/p/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje Slides are available here: chalmersuniversity.box.com/s/c2a6...
Transformer - Part 6 - Decoder (1): testing and training
9K views · 3 years ago
This is the first of three videos about the transformer decoder. In this video, we focus on describing how the decoder is used during testing and training, since this is helpful for understanding how the decoder is constructed. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ...
Transformer - Part 8 - Decoder (3): Encoder-decoder self-attention
9K views · 3 years ago
This is the third video about the transformer decoder and the final video introducing the transformer architecture. Here we mainly learn about the encoder-decoder multi-head self-attention layer, used to incorporate information from the encoder into the decoder. It should be noted that this layer is also commonly known as the cross-attention layer. The video is part of a series of videos on the...
Transformers - Part 5 - Transformers vs CNNs and RNNs
4.1K views · 4 years ago
In this video, we highlight some of the differences between the transformer encoder and CNNs and RNNs. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ruclips.net/p/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9mc24msbz...
Transformers - Part 4 - Encoder remarks
4.2K views · 4 years ago
In this video we highlight a few properties of the transformer encoder. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ruclips.net/p/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9mc24msbz60xf2
Transformers - Part 2 - Self attention complete equations
9K views · 4 years ago
In this video, we present the complete equations for self-attention. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ruclips.net/p/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9mc24msbz60xf2
Transformers - Part 3 - Encoder
10K views · 4 years ago
In this video, we present the encoder layer in the transformer. Important components of this presentation are multi-head attention, positional encodings, and the architecture of the encoder blocks that appear inside the encoder. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivat...
Transformers - Part 1 - Self-attention: an introduction
18K views · 4 years ago
In this video, we briefly introduce transformers and provide an introduction to the intuition behind self-attention. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ruclips.net/p/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44...
Flipping a Master's course
938 views · 7 years ago
In this video we describe how we flipped a Master's course (SSY320) at Chalmers University of Technology during the fall of 2014. We also go through what the teacher and the students thought about the experience.
An introduction to flipped classroom teaching
1.4K views · 7 years ago
In this video we explain the concept of flipped classroom teaching and discuss some of the arguments for why it is a good approach.
Generalized optimal sub-pattern assignment metric (GOSPA)
4K views · 7 years ago
This video presents the GOSPA paper: A. S. Rahmathullah, Á. F. García-Fernández, and L. Svensson, "Generalized optimal sub-pattern assignment metric," in 2017 20th International Conference on Information Fusion, Xi'an, P.R. China, Jul. 2017. arxiv.org/abs/1601.05585 The paper received the best paper award at the Fusion conference in 2017, and Matlab code to compute the GOSPA metric is availabl...
Very few people know these concepts well enough to give a detailed explanation with formulae. Thanks a ton. I had a lot of questions, and this video helped resolve them.
A very clear and amazingly detailed explanation of such a complex topic. It would be nice to have more videos related to ML from you!
This is great
Does 5:45-8:15 refer to the old RNN training method? And hence the next video covers the real transformer decoder?
Thank you for your work; these are incredible videos. But there is one thing I didn't understand. During the training phase, the entire correctly translated sentence is given as input to the decoder, and masked self-attention is used to prevent the transformer from "cheating". How many times does this step happen? If it only happened once, the hidden words would not be usable during training. During the training phase, does backpropagation occur after each step, and does the mask then move, hiding fewer words?
Pretty bad example. Even if we have trainable W_Q and W_K, what if there was a new sentence where we had "Tom" and "he"? W_Q will still make word 9 point to "Emma" and "she".
Hi, great video! I have just one question about this. When we compute Z = K^T Q, where K^T is the transpose of K, we are doing Z = (W_K X)^T (W_Q X) = X^T W_K^T W_Q X. Now, calling M = W_K^T W_Q, we have Z = X^T M X. So why are we decomposing M into W_K^T W_Q? In the end we use only the product of W_K and W_Q, so why do we learn both separately and not just learn M directly? Thank you
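One common answer to this question (my own illustration; the sizes below are typical transformer choices, not taken from the video): W_Q and W_K usually project down to a smaller dimension d_k, so W_K^T W_Q is a low-rank factorization of M with far fewer parameters than a full d_model x d_model matrix. A minimal NumPy sketch of the parameter count:

```python
import numpy as np

# Typical transformer sizes (illustrative, not from the video).
d_model, d_k = 512, 64

# Learning M directly would need a full d_model x d_model matrix:
params_direct = d_model * d_model        # 262,144 parameters

# Learning W_Q and W_K separately needs two d_k x d_model matrices,
# whose product W_K^T @ W_Q is a rank-d_k approximation of M:
params_factored = 2 * d_k * d_model      # 65,536 parameters

rng = np.random.default_rng(0)
W_Q = rng.normal(size=(d_k, d_model))
W_K = rng.normal(size=(d_k, d_model))
M = W_K.T @ W_Q                          # (d_model, d_model), rank <= d_k
print(params_direct, params_factored, np.linalg.matrix_rank(M))
# -> 262144 65536 64
```

The factorization also keeps the per-head query and key vectors explicit, which is what multi-head attention splits apart and recombines.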
We need an example with a batch being fed into this. What would the rows in the batches be? What would Y look like? Only then is it possible to really see how the masks work.
The first sentence of the video solved my problem 😅 "what enables us to parallelize calculation during training"
These videos are wonderful, thank you for putting in the work. Everything was communicated so clearly and thoroughly. My interpretation of the attention mechanism is that the result of the similarity (weight) matrix multiplied by the value matrix gives us an offset vector, which we then add to the value and normalize to get a contextualized vector. It's interesting in the decoder, we derive this offset from a value vector in the source language, add it to the target words and it is still somehow meaningful. I presume that it is the final linear layer which ensures that this resulting normalized output vector maps coherently to a discrete word in the target language. If we can do this across languages, I wonder if this can be done across modalities.
Thanks. That sounds like an accurate description of cross-attention (what I refer to as encoder-decoder attention). It can certainly be used across modalities and there are many papers describing just that. The common combination is probably images and language but images and point clouds, video and audio, and many other combinations can be found in the literature.
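For readers following this exchange, here is a minimal NumPy sketch of the encoder-decoder (cross-) attention pattern being discussed; all shapes and weight names are illustrative, not taken from the videos:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, W_Q, W_K, W_V):
    """Queries come from the decoder; keys and values from the encoder.

    decoder_states: (n_dec, d), encoder_states: (n_enc, d).
    Returns (n_dec, d_v): one encoder-informed vector per target position.
    """
    Q = decoder_states @ W_Q.T        # (n_dec, d_k)
    K = encoder_states @ W_K.T        # (n_enc, d_k)
    V = encoder_states @ W_V.T        # (n_enc, d_v)
    A = softmax(Q @ K.T / np.sqrt(Q.shape[1]), axis=-1)  # (n_dec, n_enc)
    return A @ V

rng = np.random.default_rng(0)
d, d_k, d_v = 16, 8, 16
out = cross_attention(rng.normal(size=(3, d)), rng.normal(size=(5, d)),
                      rng.normal(size=(d_k, d)), rng.normal(size=(d_k, d)),
                      rng.normal(size=(d_v, d)))
print(out.shape)  # (3, 16)
```

The same pattern carries over across modalities: if encoder_states come from image patches or audio frames instead of source-language tokens, only the encoder side changes.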
Great job, Dr. Lennart. Everyone should learn from you.
The linear projection equations have operands in the wrong order.
Excellent, thank you!
Best video on masking!
Thank you for the nice explanation. I think you forgot to mention that, in order to get zeros after the softmax, you need to set the masked values (the upper triangle of the matrix) to negative infinity.
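To make the commenter's point concrete, a small NumPy sketch (illustrative notation, not the video's) of masking the upper triangle with -inf so the softmax produces exact zeros for future positions:

```python
import numpy as np

def masked_softmax_rows(scores):
    """Causal mask followed by a row-wise softmax.

    Entries above the diagonal are set to -inf, so exp(-inf) = 0 and
    each row attends only to itself and to earlier positions.
    """
    n = scores.shape[0]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    e = np.exp(masked - masked.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

Z = np.arange(16, dtype=float).reshape(4, 4)   # toy score matrix
print(np.round(masked_softmax_rows(Z), 3))     # exactly zero above the diagonal
```

Because every row is masked at once, the whole target sentence is handled in a single forward pass per training step; the mask never "moves" during training, which also addresses the training question a few comments above.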
Excellent video! Thank you!
Hi Lennart, where did you retrieve all the details that you are presenting here? I mean, have you perhaps studied/analyzed the source code of an existing implementation of transformer-based models? I haven't found this detailed an explanation anywhere else. Bravo! And thank you.
What is the difference between attention and self-attention?
The intuition buildup was amazing; you clearly explained why we need learnable parameters in the first place and how they help relate similar words. Thanks for the explanation.
Just binged the entire playlist, helped me understand the intuitions behind the math. I hope you make more videos :)
Excellent video, as all of yours are. Thank you, I have learned a lot. One thing I'm not clear on is why we need the free parameters W. And how might those be trained? Thank you again.
Would you be interested in working together on, say, creating a "better" model than transformer models? I believe we can integrate reasoning, similar to how humans reason, in the higher layers of the transformer.
Hi Professor, are there recorded lectures from your courses, or web links to what you teach? I love your clear, precise, and well-paced coverage of the concepts here! Many thanks.
Thanks for your kind words. I have an online course in multi-object tracking (on RUclips and edX) but it is model-based instead of learning-based. Hopefully, I will soon find time to post more ML material.
Beautifully explained, thank you. Transformers are so simple yet powerful.
These are such clear explanations, thanks so much.
Thank you professor for this amazing series on the transformer!
Best RUclips video explaining Transformer ever!
Thank you!
Thank you!
Dear Lennart, that was awesome. Could you please make a tutorial in Python as well? :)
Best explanation! Thank you, Mr. Svensson.
The best transformer video!
Best explanation of self-attention I've seen so far. This is gold.
best explanation
Grateful forever
At 7:14, I thought the notation would be sm(Z_11, Z_12) and sm(Z_21, Z_22) for the second column... Is that correct?
Very helpful. Thank you!
After training the model, when we give an unknown source sentence to the model, how does it predict or decode the words?
One of the earlier videos focuses on this process. Have you watched the entire series?
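To complement the reply, a minimal sketch of the standard greedy decoding loop used at test time (my own illustration; `model`, the token ids, and the vocabulary size are placeholders, not an API from the videos):

```python
import numpy as np

def greedy_decode(model, src_tokens, bos_id=1, eos_id=2, max_len=50):
    """Autoregressive inference: start from a beginning-of-sentence
    token and repeatedly re-run the decoder on the prefix generated so
    far, appending the most probable next token until end-of-sentence.

    `model(src, tgt)` is a placeholder interface assumed to return
    next-token logits of shape (len(tgt), vocab_size).
    """
    tgt = [bos_id]
    for _ in range(max_len):
        logits = model(src_tokens, tgt)       # (len(tgt), vocab_size)
        next_id = int(np.argmax(logits[-1]))  # greedy pick for last position
        tgt.append(next_id)
        if next_id == eos_id:
            break
    return tgt

# Toy stand-in for a trained transformer, just to make the loop runnable:
# it emits tokens 3, 4, 5 and then the end-of-sentence token.
def toy_model(src, tgt):
    logits = np.full((len(tgt), 10), -1e9)
    next_token = len(tgt) + 2 if len(tgt) < 4 else 2
    logits[-1, next_token] = 0.0
    return logits

print(greedy_decode(toy_model, src_tokens=[5, 6, 7]))  # [1, 3, 4, 5, 2]
```

In practice, beam search is often used instead of the greedy argmax, but the prefix-by-prefix structure of the loop is the same.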
"Revolution in visual synthesis" is an excellent label of this epoch. Good video and helps people understand how inverting diffusion helps us arrive towards the concepts of "stable" diffusion.
Thanks a lot Lennart. What a crisp and clear explanation of BERT.
Lennart, you are the RUclips Wizard of Transformers!
Well done!
A first class explanation of self attention- the best on RUclips.
I really love this series; many thanks to you, sir. But may I ask: if I want additional sources to study the transformer from, what do you recommend?
Thanks a lot; this is the only complete course about transformers that I have found. One question: why is K = [q_1 q_2 ... q_(nE)] and not K = [k_1 ...]? (Or is it a typo?)
That's definitely a typo! Thanks for pointing it out. I might actually adjust the videos later this fall to correct typos like this.
Very easy to follow
"The audio in the video is not good."
This is the best explanation of the Transformer I have found on the web. Can you do another set of videos for T5?
Undoubtedly, these 8 videos best explain transformers. I tried other videos and tutorials, but you are the best.
Awesome!