Thank you! It was very clear and the practical example (GAN) was very helpful. I always wondered what Lipschitz continuity was, but didn't dare to ask!
Thank you for explaining these questions in more detail! :)
Great!
Woohoo, high five for team #without_the_z!
This was a very precise explanation of why GAN training is difficult.
I wasn't prepared for GANs and the trauma they caused me during my master's.
At 1:48, do you mix up L1 with L2?
Dang it. Thanks! That somehow slipped through.
The gradient regularisation was not clear. Could you please refer me to an easy-to-understand resource?
(Very crude): if you have a function f (in this case, your network) and you can make sure that the gradient of the function is always 1, i.e. f'(x) = 1, then f(x) must be linear (because its derivative is 1 everywhere), which means there are no local optima. For a local optimum you'd need f'(x) = 0, which the regularisation prevents from happening. The paper in the description (WGAN with GP) explains it quite well; there's a rough code sketch below.
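If it helps, here's a minimal sketch of that penalty as used in WGAN-GP, written in PyTorch. Everything named below (the toy critic, the lambda_gp weight, the random stand-in data) is made up for illustration; the only part taken from the paper is the idea of pushing the critic's gradient norm towards 1 at points interpolated between real and fake samples:

```python
import torch
import torch.nn as nn

# Hypothetical toy critic; in practice this is your discriminator network.
critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Penalise the critic's gradient norm for deviating from 1,
    evaluated at random interpolations of real and fake samples."""
    batch_size = real.size(0)
    eps = torch.rand(batch_size, 1, device=real.device)  # per-sample mixing weight
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    # Gradient of the critic's output w.r.t. its inputs (the f'(x) above).
    grads, = torch.autograd.grad(
        outputs=scores,
        inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # keep the graph so the penalty itself is trainable
    )
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    # The regularisation: push the gradient norm towards 1 via (||grad|| - 1)^2.
    return lambda_gp * ((grad_norm - 1) ** 2).mean()

# Usage: the penalty is simply added to the critic's loss.
real = torch.randn(8, 2)  # stand-in for real data
fake = torch.randn(8, 2)  # stand-in for generator output
loss = critic(fake).mean() - critic(real).mean() + gradient_penalty(critic, real, fake)
loss.backward()
```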