These lectures deserve to be recognized as the bible of machine learning
Prof Ng is the man!
As a layman clicking buttons to try get a better understanding of large language models I thought I was making some progress, then I watched this video, now I think I should go back to primary school 😢
Have you been watching the previous videos in the playlist? If so, I'm very surprised this video was the one you found challenging. Feels like a breather after the last few videos
It was like I was blind about linear activation functions, but now I'm gifted with vision :)
*computer vision
I needed this, thank you!
nice explanation
Great video!!! I am still confused about why ReLU works when its properties are quite linear. I mean, I know it's a piecewise-linear function and therefore does not meet the mathematical definition of a linear function. But by using ReLU, the output is still just a linear combination. Perhaps some neurons don't 'contribute', but the output is still the mathematical result of a linear combination of numbers.
Santosh Gupta Hi, in my understanding, the aim of a neural network is to simulate a function that usually cannot be represented by a closed-form expression. A linear activation function would cause the final result to be linear, which is useless, as the professor explained. ReLU avoids this problem: a linear combination of several outputs that have passed through ReLU is equivalent to a piecewise function, and although each piece is linear, the whole can be viewed as an approximation of another, more complex function. It is just like what we do in computer graphics: we do not draw a curve directly, we draw a lot of straight line segments to simulate a curve. Hope this is helpful.
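To make that "straight lines to simulate a curve" idea concrete, here is a minimal NumPy sketch (not from the video; the knots and weights are just illustrative) showing that a weighted sum of shifted ReLUs traces a parabola as a piecewise-linear curve:

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Approximate f(x) = x**2 on [0, 1] with a sum of shifted ReLU "hinges".
# Each hinge adds a change of slope at its knot.
knots = np.linspace(0.0, 1.0, 6)
x = np.linspace(0.0, 1.0, 101)
target = x ** 2

seg_slopes = np.diff(knots ** 2) / np.diff(knots)   # slope of x**2 on each segment
hinge_weights = np.diff(seg_slopes, prepend=0.0)    # slope *changes* at each knot

approx = sum(w * relu(x - k) for w, k in zip(hinge_weights, knots[:-1]))
print("max abs error:", np.abs(approx - target).max())   # ~0.005 with these 6 knots

More knots means shorter line segments and a smaller error, which is the same trick as drawing a curve with many short straight lines.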
Thanks!!!
For a function to be linear, its slope must be constant throughout. Since ReLU has a little kink at zero, it is a non-linear function.
Hi, I am confused. Does ReLU kill the neuron only during the forward pass, or also during the backward pass?
An activation function like ReLU is applied during forward propagation, and its derivative is used during backward propagation when you take the gradients and update the weights and biases. So a unit whose ReLU output is zero also passes zero gradient backward for that example.
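For what it's worth, here is a tiny NumPy sketch (my own illustration, not course code) of a single ReLU layer in both passes:

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative of ReLU: 1 where z > 0, 0 where z < 0 (0 at z = 0 by convention).
    return (z > 0).astype(float)

z = np.array([-2.0, 0.5, 3.0])        # pre-activations of three hidden units
a = relu(z)                           # forward pass: [0. , 0.5, 3. ]

upstream = np.array([1.0, 1.0, 1.0])  # gradient arriving from the layer above
local = upstream * relu_grad(z)       # backward pass: [0., 1., 1.]
# The first unit output 0 in the forward pass, so it also passes 0 gradient
# backward -- its incoming weights get no update from this example.
print(a, local)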
superb
I've got a question: does the use of a non-linear activation increase model capacity?
Exactly my thoughts. I came here to gain insight into how non-linear functions like ReLU drastically change the usefulness of hidden layers. Instead it just reiterates the common knowledge that linear functions produce no value when used on hidden layers.
Definitely yes. Without non-linearity you can only express linear functions. With non-linearities such as ReLU and sigmoid you can approximate any continuous function on a compact set (search "Universal approximation theorem" for more information).
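A quick numerical check of the first point, as a sketch with made-up weights (nothing from the lecture): two stacked layers with an identity activation collapse to a single linear layer, while ReLU in between breaks that collapse.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Two layers with identity activation: W2 @ (W1 x + b1) + b2 ...
deep_linear = W2 @ (W1 @ x + b1) + b2

# ... is exactly one linear layer with W = W2 W1 and b = W2 b1 + b2.
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(deep_linear, collapsed))   # True: the hidden layer adds nothing

# With ReLU in between there is no single (W, b) reproducing the map for all x,
# which is exactly what makes the hidden layer worthwhile.
with_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
print(np.allclose(deep_linear, with_relu))   # generally False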
@@almoni127 Sure, but look at that statement and tell me if it is always true, even for a single-layer network. How can we claim that model capacity has increased after using a non-linear activation instead of a linear activation, since it is not quantifiable? How much has capacity increased?
@@TheThunderSpirit A good analogue is Boolean circuits. With no hidden layers you don't have much expressiveness. Already with one hidden layer you have universality, albeit the required hidden-layer size might be too large. With arbitrarily deep networks you can approximate all polynomial computations.
@@almoni127 Thank you
It's so ridiculous that a video whose speaker is Chinese only has Korean subtitles!!
You know you can actually choose English subtitles in the settings...
And hopefully you could add Chinese subtitles to the video.
From the videos above, I understood that ReLU is a linear function... the rest are non-linear functions. But how can we consider the sigmoid function as binary?? A binary function always gives output as either 0 or 1, but the sigmoid function varies between -infinity and +infinity and crosses the y-axis at 0.5?
ReLU is NOT a linear function, and neither are the sigmoid function or tanh, for example. Also, the sigmoid function's output does not vary between -inf and +inf; it is in the range (0, 1).
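That's easy to check numerically; a small sketch (my own, just illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The *input* can range over all of (-inf, +inf); the *output* stays in (0, 1).
for z in [-100.0, -5.0, 0.0, 5.0, 100.0]:
    print(z, sigmoid(z))
# -100 -> ~3.7e-44, 0 -> 0.5, 100 -> 1.0 only to float precision; the output is
# never exactly 0 or 1, so sigmoid "squashes" rather than giving a hard binary output.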
@@Mats-Hansen How can you say ReLU is a non-linear function? It's a combination of two linear functions.
@@saurabhshubham4448 Yes it is. But a combination of two linear functions doesn't need to be linear.
@@Mats-Hansen But whenever you apply it to an input, the output will be linear, and thus ReLU doesn't help in adding non-linearity to the model.
@@saurabhshubham4448 The output is linear, yes, but as far as I understand you lose the linear dependencies between the weights. Let's take two weights, w1 = -0.2 and w2 = 0.5, and a linear function, say f(x) = 3x + 0.2. Then f(w1) = -0.4 and f(w2) = 1.7. This function preserves the difference (up to a linear factor) between the two weights: w2 - w1 = 0.7, and f(w2) - f(w1) = 3*(w2 - w1) = 2.1. You will always have this with linear functions, but with a function like ReLU you will not (only interesting, of course, if at least one of the weights is negative). Now maybe this math is just nonsense, but I think I have a point here somewhere. In a sense you cannot write the old weights as a "linear combination" of the new weights.
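For what it's worth, the arithmetic there checks out; here is a small plain-Python sketch of the same calculation plus the ReLU comparison (illustrative only):

w1, w2 = -0.2, 0.5

def f(x):          # the affine map from the comment above
    return 3 * x + 0.2

def relu(x):
    return max(0.0, x)

print(f(w1), f(w2))                   # ~ -0.4 and 1.7 (up to float rounding)
print(f(w2) - f(w1), 3 * (w2 - w1))   # both ~ 2.1: the difference is scaled by exactly 3

print(relu(w1), relu(w2))             # 0.0 and 0.5
print(relu(w2) - relu(w1), w2 - w1)   # 0.5 vs 0.7: the negative side is clipped,
# so differences are no longer preserved up to a fixed factor once a value is negative.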
This person has a lot of knowledge; he picks one thing and then starts explaining another and another. This disease is called explainailibalible, sorry.
Grandpa telling bedtime story.........
Then get out of here. SOB!!!
Yeps.. ✌️
Some of us very much appreciate his way of teaching.
Your English is difficult to understand. I keep going back to figure out what you mean by some words.
lol he is a doctor at Stanford
Maybe learning the basics such as ReLU or sigmoid will help? I don't think these are everyday English words.
Do you know who this guy is? You should cherish this opportunity that such a great talent teaches you online. Also, this is the first time I have seen someone pick on his English. Maybe it is time for you to improve your listening skills...
Such a sick person you are. Your brain is messed up, you crazy fool. (translated from Telugu)
his English sucks!