wow... this 11 min video took me 2 hours to understand most of it. You did a really good job putting ALL that information in such a short amount of time. Great job Siraj, keep up the good work!
I really appreciate all the work you're doing with these videos. Sorry for my caustic comments before. I am a rank amateur. Your videos are getting better and better.
Hey Siraj, you're AWESOME! Nothing less. I am watching your videos to learn Machine Learning while my college admissions are going on. Never stop, cuz i too want to see AI solved in my lifetime.
Did I forget to mention that your videos are easy to understand? Sorry for that.
This video went from 0 - 100 real quick.
Kind of late, but could somebody explain why the random weight matrix at 2:15 is multiplied by 2 and then has 1 subtracted? I tried without them and it worked pretty much the same, but I'm doing the simple AF one...
Thank you, Mas Siraj, I was given an assignment because of you
You're welcome, Mas Novan
-Mas Siraj
Thanks for the amazing info mate.
In the fast.ai course, they say one should learn the code first and then the theory, but in my opinion you prove them wrong.
Thanks again my friend.
Those beats... deserved a rewind all on their own. A beat souffle I would say.
The flute beat is mine. Hurricane. Video is on my channel.
hello Siraj, make a video about what you need before you start learning machine learning, like the basic math and the programming language to learn first.
sorry for my English, I'm Brazilian, thanks
Can someone clarify the part at 2:26 about dot product and matrix multiplication? It says that they're the same, while they're completely different, dot product producing a scalar, and matrix multiplication producing a matrix.
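A short NumPy sketch may help here. `np.dot` computes a scalar dot product when given two 1-D vectors. When given 2-D arrays, it performs matrix multiplication instead, and each entry of the result is the dot product of a row of the first matrix with a column of the second. That is probably why the video treats the two as the same operation. The arrays below are made up for illustration.

```python
import numpy as np

# Two 1-D vectors: np.dot returns a single scalar (the dot product).
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
scalar = np.dot(a, b)                    # 1*4 + 2*5 + 3*6
print(scalar)                            # 32.0

# Two 2-D arrays: np.dot performs matrix multiplication.
X = np.array([[0, 0, 1],
              [1, 1, 1]], dtype=float)   # shape (2, 3)
W = np.array([[0.5],
              [-1.0],
              [2.0]])                    # shape (3, 1)
product = np.dot(X, W)                   # shape (2, 1): entries 2.0 and 1.5

# Every entry of the matrix product is a row-of-X . column-of-W dot product.
manual = np.array([[np.dot(X[i], W[:, j]) for j in range(W.shape[1])]
                   for i in range(X.shape[0])])
print(np.allclose(product, manual))      # True
```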
this channel is so underrated
Japi Sandhu he is a great communicator.
All the talk about neural networks, from conferences to individual series, is cool, but what a lot of people aren't clarifying is exactly how to apply it to a real-world example. It's like giving a person an engine and showing how the engine itself works, but one person may want a car engine, another a boat engine, another a jet engine, and another whatever engine the Starship Enterprise uses. So in reality, there isn't much information on how programmers can apply neural networks to whatever problem they have.
+Partisan Black see my intro to deep learning playlist
***dammit man, Your explanations are awesome!
This video is awesome!!!! Thank you so much :)
I liked the "LOVE" equation, it was too good.... Thanks Siraj :)
lol thanks
Learning from you is amazing.
Take a shot everytime he says function.
Great vid btw
In In[43] at 2:22, can someone tell me what this line means:
*synaptic_weights = 2 * np.random.random((3,1)) - 1*
What is the significance of (3,1) - 1, and why did his code work without affixing 'np' at the beginning (like I did)?
And why np.random.random (random twice)?
np was "affixed" (you mean imported) by
import numpy as np
It sounds to me like you are a total beginner, but I'm going to answer anyway.
(3,1) is a Python data structure called a tuple; it packs 2 values into one variable. Here it describes the dimensions of the output matrix, which will be 3 rows and one column. random() returns a matrix of random values in the half-open interval [0, 1), and the matrix size is specified by this tuple.
np is the numpy module you imported
np.random is the random number generator inside of numpy
np.random.random((3,1)) calls the function random() on the random number generator, requests matrix dimensions 3x1
2*np.random.random((3,1)) multiplies every value in this 3x1 matrix by 2, giving random values in [0, 2)
2*np.random.random((3,1))-1 then subtracts one from each value, giving random values in [-1, 1)
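Here is a quick check of the ranges described above, using the same legacy `np.random.random` call as the video. The fixed seed is an addition for reproducibility. Rescaling moves the weights from [0, 1) to [-1, 1), so they are centred on zero and can start out negative. One common reason for this is to avoid every weight starting positive. On a tiny network like this one, training often works either way, which matches what the earlier commenter observed.

```python
import numpy as np

np.random.seed(1)  # fixed seed (an addition here) so runs are reproducible

raw = np.random.random((3, 1))        # shape (3, 1), values in [0, 1)
synaptic_weights = 2 * raw - 1        # same shape, values in [-1, 1)
print(synaptic_weights.shape)         # (3, 1)

# With many draws the difference in centre is easy to see:
# [0, 1) averages about 0.5, while [-1, 1) averages about 0.
many = 2 * np.random.random(100_000) - 1
print(many.min() >= -1.0, many.max() < 1.0)   # True True
print(abs(many.mean()) < 0.01)                # True
```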
nono, I knew np was numpy, my question was that he didn't import it and yet his code worked
I always consider myself a beginner in everything, but I am brand new to python
Ohhh i get it now , u r a BOSS
danke very mush
He simply didn't bother to show the import. If you look at his code on github you'll see it's there
12345a scroll up, there is his import
Explaining LSTM and Conv Net implementations would be very helpful in upcoming tutorials!
i will
hey siraj, are you a speedreader/speedlearner? if yes, please try to make a video series on your fast learning style too
great vid again.....really helpful
This video is GOLD!!!!
Hey, how do we optimize the total number of hidden layers required and number of neurons present in each layer for a model.
e.g., an image recognition problem might be solved with 2 hidden layers of 100 neurons each, but the same problem can also be solved with 5 layers of 400 neurons each.
So how do we choose these numbers?
Can anyone please explain to me
why the derivative of the sigmoid function is taken as x*(1 - x)?
Hi... Could you please explain the difference and relationship between big data, data science, machine learning and neural networks? Please please make a video on that.
Hi Siraj! Here's my solution for this week's coding challenge: github.com/jrios6/Math-of-Intelligence/tree/master/4-Self-Organizing-Maps
I read over your notebook and liked the nice, simple vectorized code. I'm trying to understand the general intuition behind how you did the MNIST example. Correct me if I'm wrong, but your output lattice of nodes is 20 x 20, so you have 400 weight vectors in dimension 784 (the number of pixels in an image). You then represented this as a 3D matrix of size 20x20x784, which holds the final weights after training. It's not clear to me what you're doing next. Are you using these 400 weight vectors to form 400 clusters in your data, and then plotting each image in the clusters on the 20x20 lattice to get the visualization?
hammad shaikh Yeah, you're right. To visualise the 3D tensor, we have to transform it into a 2D matrix first. So each 784-dimensional weight vector is converted to a 28x28 matrix and placed according to its parent node in the lattice.
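Below is a minimal NumPy sketch of the reshaping described above. It assumes the trained weights are stored in an array of shape (20, 20, 784). Each 784-dimensional weight vector becomes a 28x28 tile, and the tiles are laid out in the same grid as their lattice nodes. `lattice_to_image` is a hypothetical helper name, not taken from the notebook.

```python
import numpy as np

def lattice_to_image(weights, tile=28):
    """Turn a (rows, cols, tile*tile) SOM weight tensor into one 2-D image.

    Node (r, c) ends up in image[r*tile:(r+1)*tile, c*tile:(c+1)*tile].
    """
    rows, cols, dim = weights.shape
    if dim != tile * tile:
        raise ValueError(f"expected {tile * tile} weights per node, got {dim}")
    tiles = weights.reshape(rows, cols, tile, tile)        # one 28x28 tile per node
    # Reorder axes to (rows, tile, cols, tile), then merge into a single image.
    return tiles.transpose(0, 2, 1, 3).reshape(rows * tile, cols * tile)

# Random stand-in for trained weights (a real SOM would be trained on MNIST).
weights = np.random.default_rng(0).random((20, 20, 784))
image = lattice_to_image(weights)
print(image.shape)  # (560, 560)
```

The result could then be displayed with matplotlib, e.g. `plt.imshow(image, cmap='gray')`.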
u rule Ong
Excellent lecture bro, but I have some doubts... Why do neural networks need a hidden layer with multiple neurons? Why can't it work with one neuron in the hidden layer? Moreover, the same inputs are connected to each neuron in the hidden layer, which would give the same output. Do we need to give each neuron a different set of weights so that one neuron differs from another? What is each neuron in the hidden layer computing?
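On the "same inputs give the same output" point: that only happens if every hidden neuron starts with the same weights. The sketch below uses a hypothetical 3-4-1 sigmoid network on a made-up XOR-style data set. It takes one gradient step from an identical initialisation and one from a random initialisation. With identical weights, every hidden neuron gets the same update and stays a copy of the others. Random starting weights are one common way to break that symmetry, so that each neuron can end up computing a different feature.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up toy data: 4 examples, 3 inputs (last column acts like a bias).
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def train_step(W1, W2, lr=1.0):
    """One full-batch gradient step for a 3-n-1 sigmoid network (squared error)."""
    h = sigmoid(X @ W1)                    # hidden activations, shape (4, n)
    out = sigmoid(h @ W2)                  # output, shape (4, 1)
    d_out = (y - out) * out * (1 - out)    # output delta, slope taken from the activation
    d_h = (d_out @ W2.T) * h * (1 - h)     # hidden deltas, shape (4, n)
    return W1 + lr * X.T @ d_h, W2 + lr * h.T @ d_out

# Identical initialisation: every hidden neuron has the same incoming weights.
W1_same, W2_same = train_step(np.full((3, 4), 0.5), np.full((4, 1), 0.5))

# Random initialisation in [-1, 1).
rng = np.random.default_rng(0)
W1_rand, W2_rand = train_step(2 * rng.random((3, 4)) - 1, 2 * rng.random((4, 1)) - 1)

# Column j of W1 holds the incoming weights of hidden neuron j.
print(np.allclose(W1_same, W1_same[:, :1]))   # True: all neurons still identical
print(np.allclose(W1_rand, W1_rand[:, :1]))   # False: neurons differ
```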
Siraj, could you kindly provide us with an example (tutorial) of how to properly update a trained deep learning model with new data (let's say from a sensor)?
love ur vids, keep up the good work!
thank u!
Siraj, I wonder.
The sigmoid function is y = 1 / (1 + e^-x). Its derivative is equal to e^x / (e^x + 1)^2
Why are you using a different function, x*(1 - x), as the derivative in this video?
that is exactly what I was thinking as well. The derivative can be rewritten as s(x)*(1-s(x)), where s(x) is sigmoid function, but definitely not as x*(1-x). His training seems to be working though :O
I get it now. I'm probably used to a different order of computation. The error is defined as the partial derivative of the cost function w.r.t. the weighted input z (W*x + b). To calculate the error in the last layer, by the chain rule you have: error = dC/da * ds/dz, where C is the cost function, a is the activation in the last layer and s is the sigmoid/activation function. To compute the exact value of the second term you would plug z into sigmoid prime, but Siraj plugs in the activation (sigmoid already applied), and that's why the sigmoid doesn't have to be applied again inside the derivative function.
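For reference, here is the algebra connecting the two forms of the derivative discussed in this thread, writing $\sigma$ for the sigmoid and $a = \sigma(z)$ for a layer's activation:

$$
\begin{aligned}
\sigma(x) &= \frac{1}{1+e^{-x}}, \qquad 1-\sigma(x) = \frac{e^{-x}}{1+e^{-x}} \\
\sigma'(x) &= \frac{e^{-x}}{(1+e^{-x})^{2}} = \frac{e^{x}}{(e^{x}+1)^{2}} \\
&= \frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}} = \sigma(x)\,\bigl(1-\sigma(x)\bigr) \\
\sigma'(z) &= a\,(1-a) \quad \text{when } a = \sigma(z)
\end{aligned}
$$

So `x*(1-x)` is the sigmoid derivative only when the `x` passed in is already the activation $a$, not the raw input $z$.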
@Simon Mandlik
I still don't understand. See:
activation(np.array([2.0,1.0,-1.0]),True) and
np.array([2.0,1.0,-1.0])*(1-np.array([2.0,1.0,-1.0]))
generate the same result. I don't see how x*(1-x) is the same as
S(x)*(1-S(x))?
So, it does appear that nonlin returns x*(1-x) when deriv=True. However, when it is called, the x passed to it is already the output of a sigmoid (L1), which makes it the same thing. I guess writing it as x just saves applying the sigmoid again.
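Here is a small numerical check of this point. It assumes a `nonlin` function written the way it is described above: sigmoid normally, and `x*(1-x)` when `deriv=True`. Passing the sigmoid *output* into the derivative branch matches a finite-difference estimate of the slope, while passing the raw input does not. That is also why `activation(np.array([2.0, 1.0, -1.0]), True)` simply returns x*(1-x) for those raw numbers.

```python
import numpy as np

def nonlin(x, deriv=False):
    """Sigmoid, or its derivative expressed in terms of the sigmoid's OUTPUT."""
    if deriv:
        return x * (1 - x)          # only correct if x is already sigmoid(z)
    return 1 / (1 + np.exp(-x))

z = np.array([2.0, 1.0, -1.0])      # raw (pre-activation) inputs
a = nonlin(z)                       # activations, playing the role of L1

# Central finite-difference estimate of d sigmoid / dz at each z.
eps = 1e-6
numeric = (nonlin(z + eps) - nonlin(z - eps)) / (2 * eps)

print(np.allclose(nonlin(a, deriv=True), numeric))   # True: feed the activation
print(np.allclose(nonlin(z, deriv=True), numeric))   # False: feeding z is wrong
```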
awesome siraj
@Siraj Raval can you make a video about overfitting?
And of course solutions to those problems...
Because I tried many times to complete your previous challenge, but sometimes I have big overfitting problems: on the training data I get 100% accuracy, but on the test data ~10% -_- (of course I check the best prediction in TensorBoard, but that isn't a great solution. Correct me if I am wrong :D)
Do you have any better solution for overfitting problems?
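One standard remedy for a 100% train / ~10% test gap is early stopping. You keep the weights from the epoch with the best *validation* score and stop once it hasn't improved for a while, which automates the "check the best prediction in TensorBoard" step. Below is a framework-agnostic sketch of just the selection logic. The `patience` value and the loss curve are made up. Other common remedies include more data, data augmentation, dropout and L2 weight decay.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops at stop_epoch, after `patience` epochs without the validation
    loss improving by more than `min_delta`; the weights to keep are the ones
    saved at best_epoch.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, waited = epoch, loss, 0   # improvement: reset counter
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch                      # give up here
    return best_epoch, len(val_losses) - 1                    # never triggered

# Made-up curve: validation loss falls, then rises as the model starts to overfit.
val = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.70]
print(early_stopping_epoch(val, patience=3))   # (3, 6)
```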
Hey Siraj, why do we add the gradients after we backpropagate them instead of subtracting them? We're going for the minimum, right?!
I think it's because the results are negative? or am I just dumb :D
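The answer comes down to the sign of the error term. Assume squared error $E = \tfrac12\sum (y - \hat y)^2$ (the video may scale it differently). Then `error = y - output` is the *negative* of $\partial E/\partial \hat y$, so adding `X.T @ (error * slope)` to the weights is the same as subtracting the gradient. The sketch below checks this against a finite-difference gradient for a made-up single-layer sigmoid network.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((4, 3))                        # made-up inputs
y = np.array([[0.0], [1.0], [1.0], [0.0]])    # made-up targets
W = 2 * rng.random((3, 1)) - 1

def loss(W):
    return 0.5 * np.sum((y - sigmoid(X @ W)) ** 2)

# "Add" form used in video-style code: error = y - output.
out = sigmoid(X @ W)
adjustment = X.T @ ((y - out) * out * (1 - out))

# Finite-difference gradient dE/dW, one weight at a time.
eps = 1e-6
grad = np.zeros_like(W)
for i in range(W.shape[0]):
    step = np.zeros_like(W)
    step[i, 0] = eps
    grad[i, 0] = (loss(W + step) - loss(W - step)) / (2 * eps)

print(np.allclose(adjustment, -grad))         # True: adding adjustment == subtracting gradient
print(loss(W + 0.1 * adjustment) < loss(W))   # True: the loss goes down
```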
Hi Siraj, kindly provide links to learn machine learning, I am new to this field
Siraj Raval, thank you for all these great videos
Can you speak a little bit slower? Our first language isn't English
This video is very helpful
Hi Siraj, all of your videos are playable offline except this one. I'm trying to learn machine learning, and I downloaded all of your videos to watch while travelling to work. Hopefully in a few weeks I can send an entry to your GitHub contests. Anyway, can you change the setting so this one can be saved offline?
hmm use keepvid dot com
Wow amazing!
I have a problem with the last line of code. In your notebook you have this:
#testing
print(activate(np.dot(array([0, 1, 1]), syn0)))
[ 0.99973427 0.98488354 0.01181281 0.96003643]
When I just copy-pasted this I got a NameError. Then I added 'from numpy import array' and got a different result from the activation function: [ 0.36375058]. What's the problem?
PS. You have a mistake in this code: github.com/llSourcell/neural_networks/blob/master/simple_af_network.ipynb. In the lines
#Use it to compute the gradient
layer2_gradient = l2_error*activate(layer2,deriv=True)
the parameter is l2_error, but it should be layer2_error. Thank you
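Two things may be going on here. The number of values printed equals the number of columns in `syn0`. The notebook's four outputs suggest a `syn0` with shape (3, 4), i.e. a hidden layer, while a single value suggests a (3, 1) matrix. That reading is an assumption, since the full notebook isn't shown here. The sketch below also uses `layer2_error` consistently in the gradient line, as suggested above. All weights are random placeholders, so the printed numbers won't match the notebook.

```python
import numpy as np

def activate(x, deriv=False):
    """Sigmoid; with deriv=True, x must already be a sigmoid output."""
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
x = np.array([0, 1, 1])                         # np.array avoids the NameError from a bare array()

syn0_hidden = 2 * rng.random((3, 4)) - 1        # hypothetical: 3 inputs -> 4 hidden units
syn0_single = 2 * rng.random((3, 1)) - 1        # hypothetical: 3 inputs -> 1 output
print(activate(np.dot(x, syn0_hidden)).shape)   # (4,)  four values, like the notebook
print(activate(np.dot(x, syn0_single)).shape)   # (1,)  one value, like the comment

# One output-layer update for a hypothetical 3-4-1 network, same name throughout.
syn1 = 2 * rng.random((4, 1)) - 1
target = np.array([1.0])
layer1 = activate(np.dot(x, syn0_hidden))       # shape (4,)
layer2 = activate(np.dot(layer1, syn1))         # shape (1,)
layer2_error = target - layer2
layer2_gradient = layer2_error * activate(layer2, deriv=True)
syn1 += np.outer(layer1, layer2_gradient)       # (4, 1) weight update
```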
why is the print function parameter censored? hehe =)
It is not censored it is a reference to "black box"
Hey, I was unable to install PIL via pip, so I changed 3 lines and it worked:
import matplotlib.pyplot as plt
from scipy.misc import toimage
# from pillow import Image

def show(self):
    plt.imshow(toimage(self.weights.astype('uint8'), mode='RGB'))
    plt.show()
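Note that `scipy.misc.toimage` has since been removed from SciPy (deprecated in 1.0 and dropped in 1.2), so the snippet above fails on current versions. Pillow is installed with `pip install Pillow` but imported as `from PIL import Image`, which explains why `from pillow import Image` doesn't work. A possible replacement skips both and hands a uint8 array straight to matplotlib. It assumes `self.weights` is a (rows, cols, 3) array of RGB values. The clipping to [0, 255] is an extra safeguard added here, because casting out-of-range floats to uint8 doesn't saturate. `to_rgb_uint8` and `show_weights` are hypothetical helper names.

```python
import numpy as np

def to_rgb_uint8(weights):
    """Clip a (rows, cols, 3) weight array to [0, 255] and cast it to uint8."""
    arr = np.asarray(weights, dtype=float)
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError(f"expected shape (rows, cols, 3), got {arr.shape}")
    return np.clip(arr, 0, 255).astype(np.uint8)

def show_weights(weights):
    """Display the SOM colour map; matplotlib is imported only when needed."""
    import matplotlib.pyplot as plt
    plt.imshow(to_rgb_uint8(weights))   # imshow accepts (M, N, 3) uint8 RGB arrays
    plt.show()

# Inside the class, show() could then be:  def show(self): show_weights(self.weights)
```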
Hey Siraj, plz make lots of tutorial videos on neural networks, for students (just like me) who want to learn about ANNs.
akash vaid perhaps if you support him on Patreon.
i dont know.
I have countless neural network videos, see my intro to deep learning playlist. I will make more
thank you for math intelligence video
King of memology!
yo - what's the name of the song? it's amazing!
Hurricane - Jef Kearns
Please make a video on how to configure, train and use the new TensorFlow Object Detection API with our own dataset and model
When time matters in the input sequence, that's where RNNs come in. Good.
What are the prerequisites for ML?
Calculus + basic knowledge of programming
Abhishek Singh Chauhan patience
Nick Ellis I mean to say which programming language
Abhishek Singh Chauhan well Python. but honestly that doesn't matter as much as having the patience to go line by line and equation by equation and trusting that yr brain will make sense of it all. also a good "statistics vocabulary" , and familiarity with Linear Algebra and matrix operations lol.
Aditya Abhyankar voice responsive automated system like assistant
Why don't we use each channel/layer as a form of captured time? Say 5 second length as a captured time. Then use that as a channel/layer and apply it to the system. Action as a symbol.
Can anyone tell me how we calculated gradient?
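Here is a sketch of how the gradient is typically computed in a single-layer sigmoid network like the one in the video. The training data (the output equals the first input), the seed and the iteration count are assumptions for illustration, not taken from the video. Each pass does four things:

* compute the output;
* take the error `y - output`;
* scale the error by the sigmoid slope `output * (1 - output)`;
* sum the per-example contributions with `X.T @ ...` to get the weight adjustment.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical training set: the target is simply the first input column.
X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]], dtype=float)
y = np.array([[0, 1, 1, 0]], dtype=float).T

rng = np.random.default_rng(1)
weights = 2 * rng.random((3, 1)) - 1          # random start in [-1, 1)

for _ in range(10000):
    output = sigmoid(X @ weights)             # forward pass, shape (4, 1)
    error = y - output                        # how far off each prediction is
    delta = error * output * (1 - output)     # error scaled by the sigmoid slope
    weights += X.T @ delta                    # sum each example's contribution per weight

print(np.round(sigmoid(X @ weights)).ravel())   # [0. 1. 1. 0.], matching y
print(sigmoid(np.array([1, 0, 0]) @ weights))   # new input, close to 1
```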
Was crackin up at 1:15
You got it goin on ...
breh why are people still using sigmoid? I thot ReLu was superior
As far as I know, sigmoid is still used to get probabilities, i.e. values between 0 and 1, which ReLU doesn't give you (sigmoid for binary classification problems, softmax for multiclass classification problems). So they are used only in the last layer. ReLU largely avoids the vanishing gradient problem, so it's used in the hidden layers so that errors can be propagated back effectively.
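Here is a rough numerical illustration of the vanishing-gradient point. The sigmoid's slope is at most 0.25 (at z = 0), so a chain of sigmoid layers multiplies the backpropagated signal by at most 0.25 per layer. ReLU's slope, by contrast, is exactly 1 for positive inputs. This ignores the weights, which also enter the product, so it is a simplification rather than a full analysis.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

def relu_prime(z):
    return (z > 0).astype(float)    # slope 1 for z > 0, 0 otherwise

z = np.linspace(-6, 6, 1201)
print(sigmoid_prime(0.0))           # 0.25, the largest slope the sigmoid has

layers = 10
print(0.25 ** layers)                          # best case through 10 sigmoid layers, about 9.5e-07
print(relu_prime(np.array([2.0])) ** layers)   # [1.] along an active ReLU path

# Sigmoid outputs lie strictly between 0 and 1, so they can be read as probabilities.
print(sigmoid(np.array([-3.0, 0.0, 3.0])))
```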
what Nandan said is true.
Simplified AF network, not familiar with that one.
Hmm. yeah. technical definition is a single layer feedforward network. older terminology is perceptron. i shouldve said that instead. thanks
No, keep it. You're entertaining AF! Best channel for learning AI. Keep up the good work.
Why would you need to practice dutch?
i live in amsterdam
is this Deep Learning or is this Art?
both
*On Sigmoid* : I was just reading about it
The derivative of a sigmoid function
S'(x) = S(x) * (1-S(x))
But here you did:
S'(x) = x * (1-x)
*Can someone please explain?*
oly shit i got it now
u were being cheeky smart with puttin those 2 important parts in one function
or mayb i am a dumb shit
god, i m so laggin you
They are the same thing, just written differently: in the code, the x passed into the derivative is already the sigmoid output S(x).
I'm a business programmer and I just have one thing to say. If I ever have to program like this to keep a job ... I'm screwed. What the hell is a "sigmoid function?"
The sigmoid function is a type of activation function for neural networks. Search "activation functions siraj" on YouTube and watch that vid, you'll love it
when you realize siraj knows several languages and probably actually went to go practice his Dutch
yay
Hi Siraj
hi
I heard practising Dutch, which triggered me because I'm a Dutchie.
BTW, if you want a suggestion for learning material I'm using Duolingo and I like it.
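What's the correct way to say:
Met een windmolen in het hoofd slaan (literally "to hit in the head with a windmill")
or
Door een windmolen in het hoofd slaan (literally "to hit in the head through a windmill")
?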
Duolingo is good, but I think practice is a better way to learn than a boring course in whatever language you want. For example, my English is not perfect, but I learned a lot making the English and Spanish subtitles for Math of Intelligence. Before that, I collaborated on other videos about philosophy, memes, reviews, etc. without basic English, and now I'm here, typing to you.
Bob Crunch, I have never heard either of those sentences. But that's possible because even I sometimes don't know an idiom that's not used very often. But I'd choose 'Met een windmolen in het hoofd slaan', because you can hit someone in the head with a windmill (as in a windmill-toy), but one can't literally 'through a windmill hit in the head'. Note that the word 'door' means 'through' in this sentence.
I heard from a Dutch native speaker that "Hit in the head by a windmill" was an idiom for someone who is crazy or maybe someone with a bad idea. Thanks for the reply.
Ngl, siraj has the weirdest sense of humour
saved4
I came here looking to learn the math of intelligence and left looking for a math tutor. :|
:( i will do better
Thanks! I’m looking forward to your future videos!
BTW your short-hair girl friend is beautiful
thx shes not my gf u guys r my gf
"Clowning the explanation of neural networks". Is this some kind of american talk show?
siraj please dont upload a videos per week. please upload maybe like 2 or 3 in a week.
'don't upload a videos?' was this a typo i dont understand
what he meant was, upload more videos in a week 'cause we are hooked now
we need a dose of your mind
Siraj Raval yup that was a typo I meant 1 video per week
kk thx