You are awesome 👍👏😊 Nitish, I have recently joined your channel... I'm getting to learn so many good and new things, and that too in Hindi... it sticks straight in the brain... Thanks for being part of my data journey... love you 3000 ❤❤❤
Your deep learning playlist is helping me so much, I have my AI/ML paper in 3 days 😭.. your videos are helping me a lot
which year?
@@hritikroshanmishra3630 which clg?
don't use these learnings against India
The playlist is literally a gem, especially because it explains the applications as well as the 'math' behind them!
Sir, keep uploading. Big, big, big fan of your teaching, full support to you sir... your videos are helping me a lot... God bless you
God bless you sir🙏you’re helping so many lives💯
Your deep learning playlist is helping me so much, please sir upload the CNN lectures as soon as possible
Getting very good knowledge from this channel.... Jay Swaminarayan
Very informative
Thanks for your efforts.
This playlist is amazing..........
wish you and your channel keep growing!
Suppose we have an input X with n components and a linear neuron with random weights W that spits out an output Y, which can be written as:
Y = W_1 X_1 + W_2 X_2 + ⋯ + W_n X_n
We know that the variance of each product term W_i X_i is:
Var(W_i X_i) = E[X_i]² Var(W_i) + E[W_i]² Var(X_i) + Var(W_i) Var(X_i)
If we assume that the X_i and W_i are all independent and identically distributed (Gaussian with zero mean), then E[X_i] = E[W_i] = 0, so the first two terms vanish and the variance of Y works out to:
Var(Y) = Var(W_1 X_1 + W_2 X_2 + ⋯ + W_n X_n) = Var(W_1 X_1) + Var(W_2 X_2) + ⋯ + Var(W_n X_n) = n Var(W_i) Var(X_i)
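As a sanity check, here is a quick simulation of that identity — a minimal sketch, assuming unit-variance Gaussian inputs and an arbitrary weight scale of 0.05 (both choices are illustrative, not from the comment):

```python
import numpy as np

# Verify Var(Y) = n * Var(W_i) * Var(X_i) empirically.
rng = np.random.default_rng(0)
n, trials = 256, 10_000

X = rng.normal(0.0, 1.0, size=(trials, n))    # inputs: zero mean, Var = 1
W = rng.normal(0.0, 0.05, size=(trials, n))   # weights: zero mean, Var = 0.0025
Y = (W * X).sum(axis=1)                       # Y = W_1 X_1 + ... + W_n X_n

print(Y.var())             # empirical Var(Y), close to...
print(n * 0.05**2 * 1.0)   # ...the predicted n * Var(W_i) * Var(X_i) = 0.64
```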
So the variance of the output is the variance of the input, scaled by n Var(W_i). Hence, if we want the variance of Y to equal the variance of X, the term n Var(W_i) should equal 1, which means the variance of each weight should be:
Var(W_i) = 1/n_in
This is the Xavier initialization formula: pick the weights from a Gaussian distribution with zero mean and a variance of 1/n_in, where n_in is the number of input neurons of the weight tensor. That is how Xavier (Glorot) initialization is implemented in the Caffe library.
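A minimal NumPy sketch of this fan-in rule (the layer sizes here are illustrative assumptions):

```python
import numpy as np

# Fan-in Xavier variant: draw weights from N(0, 1/n_in) so n_in * Var(W_i) = 1.
def xavier_fan_in(n_in, n_out, rng=np.random.default_rng(42)):
    std = np.sqrt(1.0 / n_in)                  # Var(W_i) = 1/n_in
    return rng.normal(0.0, std, size=(n_in, n_out))

W = xavier_fan_in(256, 128)
print(W.var())   # ~1/256 ≈ 0.0039
```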
Similarly, if we go through backpropagation, we apply the same steps to the gradients and get:
Var(W_i) = 1/n_out
To keep the variance of the input and of the output gradient the same, these two constraints can only be satisfied simultaneously if n_in = n_out. In general, however, the n_in and n_out of a layer are not equal, so as a compromise Glorot and Bengio suggest using their average, proposing that:
Var(W_i) = 1/n_avg, where n_avg = (n_in + n_out)/2
So the idea is to initialize the weights from a Gaussian distribution with mean 0.0 and standard deviation:
σ = √(2/(n_in + n_out)), i.e. variance σ² = 2/(n_in + n_out)
Note that when the number of input connections is roughly equal to the number of output connections, you recover the simpler equation:
σ² = 1/n_in
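And a sketch of the averaged rule, again with assumed layer sizes; in Keras the analogous built-in is kernel_initializer="glorot_normal", which uses a truncated normal at the same scale:

```python
import numpy as np

# Averaged (Glorot) variant: sigma^2 = 2/(n_in + n_out).
def glorot_normal(n_in, n_out, rng=np.random.default_rng(42)):
    std = np.sqrt(2.0 / (n_in + n_out))        # sigma = sqrt(2/(n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

W = glorot_normal(256, 128)
print(W.var())   # ~2/384 ≈ 0.0052
```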
I'm at that point in my academics where I feel lost without you XD
Sir please don't stop making videos
Your videos actually help us 🙏
Hey, will you be covering momentum optimization, Nesterov accelerated gradient, AdaGrad, RMSProp, Adam and Nadam in future videos of your deep learning playlist?
Yes
@@campusx-official please continue your dl series 🥺🥺🥺
A prince, a lion-hearted brother....😍
Outstanding effort for everyone, especially for new students like me. Please tell me about Keras???? What's the date???
Great sir, thank you sir
How do you upload your data to a Raspberry Pi or Arduino? Do I need to buy a laptop with a heavy graphics card, or can I simply go with a Dell Inspiron 14 5514 or HP Pavilion Aero 13?
Thank you so much sir
from Pakistan
Thank You Sir.
How do I join your full course? In the 100-day Python learning playlist there aren't 100 videos... Is there any other site, or do you only upload videos on YouTube? I want to do a proper course.
Sir, I am not getting better or equal results when I set the weights manually compared to setting them using a kernel initializer
Brother, please bring videos on projects 🙏🏻
Hello brother, have you got a job?
Thanks
Sir please upload more videos
It sounds like you have a runny nose. Take care brother.
5:00 who remembers XAVIER BHAIYA? 😂😂
Superb
best
🙏🙏🙏🙏🙏🙏
God level teacher ❤️🤌🏻