This is great, but I have to agree with the other comments: if there is more you can say about the details of automatic differentiation that would be helpful!
More details can be found in the 18.337 Parallel Computing and Scientific Machine Learning course! ruclips.net/video/zHPXGBiTM5A/видео.html
Great content, but this should have been a 30min video.
Does anyone know a place where I can go to properly understand this concept?
If you mean a place where it's really explained basically, like you'd teach it to a school kid, I have not found anything like that yet. But if you look up the basics of Dual Numbers and apply them to the elements of Calculus, then you can build it up quite quickly:
For instance, at school you are taught that
f'(x) = (f(x + dx) - f(x)) / dx   (in the limit as dx → 0)
Let's call dx "a.epsilon"; now rearrange the above to get
f(x+a.epsilon) = f(x) + f'(x).a.epsilon
So the input to your function is a dual number and so is its output. As somebody already pointed out (@John Doe), the coefficient of epsilon in this case will be 1, since that's the derivative of x wrt x.
Notice that the above actually yields the Chain Rule FOR FREE! 😀
f(g(x + epsilon)) = f( g(x) + g'(x).epsilon )   NB: The coefficient of epsilon here is g'(x)
= f(g(x)) + f'(g(x)).g'(x).epsilon
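To see the chain rule falling out "for free", here is a minimal sketch in Python (a hypothetical translation of the same dual-number idea — the class name Dual and the helpers sin_d/sqr are made up for illustration): a dual carries (value, derivative), the product rule is coded once, and composing functions propagates g'(x) automatically.

```python
import math

class Dual:
    def __init__(self, f, d):
        self.f, self.d = f, d            # function value and derivative part

    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.f * other.f, self.d * other.f + self.f * other.d)

def sin_d(x):
    # sin with its derivative propagated: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.f), math.cos(x.f) * x.d)

def sqr(x):          # g(x) = x * x
    return x * x

# f(g(x)) with f = sin, g = x^2, evaluated at x = 1.5; seed dx/dx = 1
x = Dual(1.5, 1.0)
y = sin_d(sqr(x))

# chain rule by hand: cos(x^2) * 2x — matches without ever coding the chain rule
print(y.f, y.d)
```

Nobody wrote the chain rule anywhere; it emerges from the arithmetic on the epsilon coefficients, exactly as in the expansion above.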
If you now were to add the generalized form of the Exponentiation Rule, i.e.
(u^v)' = (u^v) * (u' (v/u) + v' log(u) )
to the Product and Quotient Rules, you'll be able to do calculations involving powers (including Sqrt) WITHOUT needing to resort to Babylonian or any other iterative algorithms!
This is what it looks like in Julia, following the style of Edelman in this video. (I hope I typed it in correctly, cut-n-paste unavailable right now)
^(x::D, y::D) = D(( x.f[1]^y.f[1],            # first part of Dual, i.e. the function evaluation
                    x.f[1]^y.f[1] *           # (u^v) *
                    (x.f[2]*y.f[1]/x.f[1]     # ( u'(v/u)
                     + y.f[2]*log(x.f[1]))))  #   + v' log(u) )
e.g. > D((2,1))^0.5
=== > D((1.41421..., 0.353553...)) # see - no Babylonian algorithm used!
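The claimed result for D((2,1))^0.5 checks out. Here is a runnable sketch of the same rule in Python (the Dual class is hypothetical, mirroring the Julia D above; constants are promoted with derivative 0, as the notebook's convert does):

```python
import math

class Dual:
    def __init__(self, f, d):
        self.f, self.d = f, d

    def __pow__(self, other):
        if not isinstance(other, Dual):
            other = Dual(float(other), 0.0)   # a constant exponent carries derivative 0
        u, du, v, dv = self.f, self.d, other.f, other.d
        val = u ** v
        # (u^v)' = (u^v) * (u' * (v/u) + v' * log(u))
        return Dual(val, val * (du * v / u + dv * math.log(u)))

r = Dual(2.0, 1.0) ** 0.5
print(r.f, r.d)   # ≈ 1.41421..., 0.353553...
```

The derivative part is sqrt(2) * (1 * 0.5/2 + 0) = sqrt(2)/4 ≈ 0.353553 — the comment's numbers, with no iterative square-root algorithm anywhere.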
Makes sense (luckily I once programmed a square-root Newton algorithm back when Java BigInteger didn't have it yet). And I was like: who is this guy? Ah, a famous professor.
wow. great stuff
There is a bug in the convert function. Rather than initializing the gradient to 0 you should be initializing it to 1. The reason is that the conversion should map a real number to the identity morphism, which for the category of differentiable functions is a function whose derivative is 1. You even knew it was wrong, because when you ran blocks 8 and 9 you used 1 as the initial gradient, not 0.
Nice catch. As a Julia beginner, figuring out from first principles, I wondered about that...
I think it is correct to initialise the derivative to zero: this is forward mode, so for a constant scalar c, its tangent w.r.t. the input x, dc/dx, should always be zero. For x itself, however, dx/dx = 1 by definition. If instead all non-duals' derivatives were initialised to 1, you would be computing a directional derivative (in the direction [1,1,1,1,...]) instead of partials.
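A tiny Python sketch (hypothetical Dual class, mirroring the notebook's D and convert) makes the seeding point concrete: only the input variable gets seed 1, and promoted constants get seed 0.

```python
class Dual:
    def __init__(self, f, d):
        self.f, self.d = f, d

    def __mul__(self, other):
        # promote a plain number with derivative seed 0 — this is what convert should do
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.f * other.f, self.d * other.f + self.f * other.d)

    __rmul__ = __mul__

x = Dual(3.0, 1.0)   # the input variable: seeded with dx/dx = 1
y = 5.0 * x          # the constant 5 is promoted with derivative 0

print(y.f, y.d)      # value 15.0, derivative 5.0
```

d(5x)/dx = 5, as expected. Had the constant been promoted with seed 1, the derivative part would come out as 5 + x = 8 — the wrong answer — which is why convert initialising to 0 is correct here.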
I’m not sure from the documentation, how to take a derivative at a certain value of x.
can you provide the link for this notebook?
It's in the Julia tutorials repository on github:
github.com/JuliaAcademy/JuliaTutorials/blob/master/introductory-tutorials/intro-to-julia/AutoDiff.ipynb
Wow, really interesting approach. Does it require knowing the derivative in the first place?
No, you do not need to know the derivative of the entire expression, just derivatives of simple building blocks (like multiplication or exponentiation.)
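That is the key point: you only hand-code derivative rules for the primitives, and compound expressions get their derivatives automatically. A small Python sketch (hypothetical Dual class and exp_d helper, for illustration):

```python
import math

class Dual:
    def __init__(self, f, d):
        self.f, self.d = f, d

    def __mul__(self, other):
        # the only hand-coded rules are for primitives, like this product rule...
        return Dual(self.f * other.f, self.d * other.f + self.f * other.d)

def exp_d(x):
    # ...and this rule for exp: (e^u)' = e^u * u'
    e = math.exp(x.f)
    return Dual(e, e * x.d)

# yet the derivative of the compound expression x * exp(x) emerges with no extra work
x = Dual(2.0, 1.0)
y = x * exp_d(x)
print(y.d)   # equals e^2 * (1 + 2), i.e. d(x*e^x)/dx at x = 2
```

Nobody ever wrote down the derivative of x * exp(x); it falls out of the building-block rules.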
There is no subject in this video. Neither autodiff nor the dual number were explained properly.
Julia seems to be competitive with Matlab. Matlab is powerful for interactive scientific
computation, and easy to use. Julia is too overwhelming for me.
Overwhelming? No!! It's as easy as Python!!
@@ayush9psycho I still think matlab is easier (I use julia, matlab and swift). Matlab makes machine learning feel like you’re walking on a cloud, it just works and works very well.
Unfortunately this language has got those shitty "end" statements also ...