@15:41 "with great complexity comes....great power" with great power comes great responsibility. with great responsibility comes great expectations. with great expectations comes great sacrifice. with great sacrifice comes great reward. And thus... the objective function was maximized
@@rahulpramanick2001But alas we only seek from the function great reward, and not the greatest reward. For achieving such greatness, you need a dash of convexity apart from the aforementioned complexity!
I think the objective loss function (yi_hat-yi)^2 is correct. It minimizes the error for all the samples while training which are i = 1 to N. What you did was write the error function in granularly. bith are needed.
In a = b+w*h formula either w should be transposed or w size should be (no.of outputs by no.of inputs). only then the matrix multiplication w*h happens as expected.
Ya, It completely depends on how you represent the X vectors... If you make it a column vector or a row vector, the matrix will be re-written accordingly! get the idea, and you can do the math yourself... with so many courses out there, different people do it differently, but the idea remains the same... while writing the formula, write down the vector/matrix dimensions and proceed accordingly... in the end, the summation formula should hold...
We are trying to fit the model for ‘N’ number of training data. So we are trying to minimise the error of training data as a collection. And since the output is a vector he sums error in each elements of a vector also. Gradient descent algorithm will work only if f(x) is a real number.
So actual y_i corresponding to each training example i, will be a k dimensional vector, with 1 at co-ordinate of the vector for the class it belongs to and 0 for the rest. That is, if the example lies in class 'p', then 'pth' co-ordinate of the vector y_i will be 1 and 0 for rest of the dimensions. Now our NN can spit out arbitrary k dimension vector. So our loss function is sample mean of element wise difference of the 2 vectors.
Find if following is a Linearly Separable Problem or not. ((¬A OR B) AND 1) OR 0 Also create a Neural Network for given equation with a suitable set of weights.
@15:41 "with great complexity comes....great power" with great power comes great responsibility. with great responsibility comes great expectations. with great expectations comes great sacrifice. with great sacrifice comes great reward.
And thus... the objective function was maximized
But we have to minimize it here.
@@nishkarshtripathi6123 Thank you for the correction!
min f(x) = -max(-f(x))
and thus the great sacrifices were not in vain :-)
Awesome!!!
@@RahulMadhavan this is only true if the maximum attained is the global maximum.
@@rahulpramanick2001 But alas we only seek from the function great reward, and not the greatest reward.
For achieving such greatness, you need a dash of convexity apart from the aforementioned complexity!
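For the record, the identity behind the quip, stated over the same domain (and it is convexity that upgrades a local optimum into the global one):

```latex
\min_{x} f(x) = -\max_{x}\bigl(-f(x)\bigr),
\qquad
\arg\min_{x} f(x) = \arg\max_{x}\bigl(-f(x)\bigr)
```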
I think the objective loss function (y_i_hat - y_i)^2 is correct. It minimizes the error over all the training samples, i = 1 to N. What you did was write the error function out granularly; both are needed.
y_hat and y are both of dimension k; they are column vectors. Had the same doubt, thanks 👍
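Putting the two comments together: with y_hat_i, y_i in R^k and N training examples, the aggregate objective is the granular errors summed over both indices (assuming the slide's 1/N averaging):

```latex
\min_{\theta}\; \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{k} \bigl(\hat{y}_{ij} - y_{ij}\bigr)^{2}
```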
In the a = b + W*h formula, either W should be transposed or W's size should be (no. of outputs × no. of inputs); only then does the matrix multiplication W*h work as expected.
Ya, it completely depends on how you represent the x vectors... If you make them column vectors or row vectors, the matrix will be rewritten accordingly! Get the idea, and you can do the math yourself... With so many courses out there, different people do it differently, but the idea remains the same. While writing the formula, write down the vector/matrix dimensions and proceed accordingly; in the end, the summation formula should hold (see the sketch below).
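A minimal NumPy sketch of that dimension bookkeeping (the layer widths n_in, n_out and random values are made up for illustration):

```python
import numpy as np

n_in, n_out = 4, 3               # hypothetical layer widths

b = np.zeros(n_out)              # bias: one entry per output neuron
h = np.random.randn(n_in)        # previous layer's activations (column vector)

# Convention 1: W has shape (n_out, n_in), so a = b + W @ h works directly.
W = np.random.randn(n_out, n_in)
a1 = b + W @ h                   # shape (n_out,)

# Convention 2: W has shape (n_in, n_out); then the transpose is needed.
W2 = np.random.randn(n_in, n_out)
a2 = b + W2.T @ h                # same shape (n_out,)

# Either way, a[i] = b[i] + sum_j W[i, j] * h[j]: the summation formula holds.
```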
There is a slight mistake in the formula: it should be a_i = b_i + W_i^T * h_(i-1).
It makes sense when you check which weight w_ij is multiplied by which input x_j.
Shouldn't W_L at 6:31 be 'kxn' and not the other way around?
Can anyone please explain the last error?
What does summation over i instances mean?
We are trying to fit the model for 'N' training examples, so we are trying to minimise the error of the training data as a collection. And since the output is a vector, he sums the error over each element of the vector as well. The gradient descent algorithm will work only if f(x) is a real number (a scalar).
So the actual y_i corresponding to each training example i will be a k-dimensional vector, with a 1 at the coordinate for the class it belongs to and 0 for the rest. That is, if the example lies in class 'p', then the p-th coordinate of the vector y_i will be 1 and the rest of the dimensions 0. Now our NN can spit out an arbitrary k-dimensional vector, so our loss function is the sample mean of the element-wise squared difference of the two vectors.
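In code, the summation over instances is just a double sum averaged down to a scalar. A minimal NumPy sketch (the shapes N, k and the sample labels are made up for illustration):

```python
import numpy as np

N, k = 5, 3                            # hypothetical: 5 examples, 3 classes

labels = np.array([0, 2, 1, 2, 0])     # true class index for each example
y = np.eye(k)[labels]                  # one-hot targets, shape (N, k)

y_hat = np.random.rand(N, k)           # arbitrary network outputs, shape (N, k)

# Sum the squared error over both the examples (i) and the vector
# components (j), then average over N -- a single scalar, which is
# what gradient descent needs.
loss = np.mean(np.sum((y_hat - y) ** 2, axis=1))
print(loss)
```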
It will be min (1/k)(fun), not min (1/n)(fun).
Find if the following is a linearly separable problem or not:
((¬A OR B) AND 1) OR 0
Also create a neural network for the given equation with a suitable set of weights.
If you look at it closely, it's just an OR function: the expression simplifies to ¬A OR B, which is linearly separable (see the sketch below).
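Since ¬A OR B is linearly separable, a single perceptron is enough. One suitable set of weights (my own pick, not from the question) is w_A = -1, w_B = 1, bias = 0.5 with a step activation:

```python
def perceptron(A, B, w_A=-1.0, w_B=1.0, b=0.5):
    """Single neuron with a step activation computing (not A) or B."""
    return int(w_A * A + w_B * B + b > 0)

# Verify against the truth table of ((¬A OR B) AND 1) OR 0 = ¬A OR B.
for A in (0, 1):
    for B in (0, 1):
        assert perceptron(A, B) == int((not A) or B)
        print(A, B, "->", perceptron(A, B))
```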
@7:38, b11 = b12 = b13?
Not necessarily; each component of the bias vector can take a different value.
(Y)
Sir, you look like Khan Sir.