I'm graduating with a Master's in Computer Science; my research is in NLP (I use a lot of RNNs), but your videos always give me small insights that help me understand deep learning more "deeply" haha. Just wanted to say that, you're great Siraj!
hell yes great to hear
These videos are definitely getting better.
Luis, he for sure optimizes his video production with Adam
thanks luis i optimize
Hi Siraj, I got a chance to watch a few of your videos. I've been an ML researcher for 8 years, but I found your teaching method awesome.
Anyone could learn from it. Great work
Thank you so much for this video! I was just about to start researching the differences between the SGD optimization algorithms. Thank you so much for saving me so much time and making a video that has all the pertinent information in a very informative and understandable way. I love your videos so much. Thank you, Siraj, you are my favorite person on the internet. Don't stop what you're doing. You're helping so many people learn so much information that can be sometimes hard to find. Thanks!!!
Your last few videos have been so on point! Very interesting things that are useful for someone who already knows a decent amount of ML and NNs, but not NNs so deeply.
1:59 was simply awesome
glad u liked
@2:30 Siraj, you should change that graphic of the function y=x^2. The function you have shown there is not x^2 and could confuse people. You're talking about decreasing the x value in the negative direction of the gradient from x=2.3 to x=1.4 to x=0.7, basically moving from a high x on the right towards smaller x values on the left. This decreases the x value, yet the graphic shows movement from left to right. Some newbies may be confused by that. But great vid overall.
Thank you for stating this, as I was confused about the gradient values (the slopes of the tangents to the cost function) in relation to the graph. I was looking for such a comment here and you provided it.
duuuuuude, you're the best. I wasn't able to understand these concepts (reading overly complicated articles), but now it's becoming clear. Thanks a lot. For example, I realized that Adam is the best solver after weeks of grid-search tests, but I didn't know why... and now it's clear.
Bro, that was awesome when you said "Oh Gradient Descent, lead us to convergence!!"
Aye what an explanation man, big ups, you make an already interesting topic way more interesting. Thanks Siraj!
awesome video man. Never seen a guy explain something so technical in such an ebullient way!!
I can't believe how useful this video is. Rad! Thanks Siraj
You are improving a lot in your presentation style, Siraj! Talking slower and clearer is really working for your material. Great work👍
This video has way fewer views than it should have...
I really hope that more people will find you and your great content!
Yes, more on the evolution of DL algorithms please. It's really hard to decide which algorithm to use in which situation most of the time! Thanks Siraj for these great videos
Siraj is a robot. His videos keep getting better and better.
*Laughs* I try to hate Siraj because of his overly animated video format, but I actually really like him: "Generalization is the hallmark of intelligence" is true and a rare statement, congratulations!
your ode to gradient descent is priceless
I've seen some videos on image classification that use RMSprop, like:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=["accuracy"])
Why do we use "rmsprop", and where do we use it?
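In case it helps anyone: RMSprop keeps an exponentially decaying average of squared gradients and divides the step by its square root, so weights with consistently large gradients take smaller steps. It's a common default for image classification because it copes well with noisy, non-stationary gradients. A minimal NumPy sketch of the update rule (decay, lr and eps are illustrative defaults, not the exact Keras internals):

import numpy as np

def rmsprop_update(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    # cache is the running average of squared gradients, E[g^2]
    cache = decay * cache + (1 - decay) * grad ** 2
    # divide by sqrt(E[g^2]): large recent gradients shrink the effective step
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

Passing optimizer='rmsprop' to model.compile just swaps this rule in place of plain SGD.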
Great videos Siraj. Keep up the awesome work
Just finished reading "Ch 4: Numerical Computation" from Bengio's "Deep Learning" book, and I actually understood what you were talking about! haha
I have trouble understanding the part from 7:10 to 7:35. First, I don't get why a square root in the denominator of the fraction causes the learning rate to decrease (dividing by small numbers returns big numbers). And I have trouble understanding the term E[g^2].
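In case this helps: the square root sits in the denominator, so the effective step is lr / sqrt(accumulated squared gradients); because Adagrad keeps adding up squared gradients, that denominator only grows and the step only shrinks. E[g^2] is just notation for that accumulated (or averaged) squared gradient. A tiny sketch, assuming the plain Adagrad rule:

import numpy as np

def adagrad_update(w, grad, accum, lr=0.01, eps=1e-8):
    # accum sums g^2 over all steps, so it never decreases
    accum = accum + grad ** 2
    # dividing by sqrt(accum) makes the effective learning rate shrink over time
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

Adadelta and RMSprop replace the ever-growing sum with a decaying average so the learning rate doesn't shrink to nothing.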
But what is the difference between having momentum vs. just having a higher learning rate? After everything in an update step is complete the result is the same, is it not? Basic momentum calculation -> may overshoot because we are still "moving"; and high learning rate -> we overshoot as well.
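They aren't the same: a higher learning rate only scales the current gradient, while momentum accumulates a velocity over past gradients, so it keeps speed in directions that agree across steps and cancels directions that keep flipping sign. A toy script to see the difference on a noisy 1-D quadratic (all values illustrative):

import numpy as np

rng = np.random.default_rng(0)
w_sgd, w_mom, v = 5.0, 5.0, 0.0
for _ in range(100):
    g_sgd = w_sgd + rng.normal()   # noisy gradient of f(w) = w^2 / 2
    g_mom = w_mom + rng.normal()
    w_sgd -= 0.1 * g_sgd           # large learning rate, current gradient only
    v = 0.9 * v + 0.01 * g_mom     # velocity averages the past gradients
    w_mom -= v                     # comparable effective step size, less jitter
print(w_sgd, w_mom)                # both end near 0; the momentum run is smoother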
Awesome! Loved the visual demos
@Siraj: Although ADAM is objectively the best call, I have noticed that it can't be generalised. Especially in computer vision problems, I have found that most of the time vanilla SGD works better than other advanced methods; or at other times SGD works better for the first n epochs, and only after that does ADAM give a good contribution. What do you think about this?
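A rough Keras sketch of that two-phase idea (SGD for the first epochs, then switch to Adam). The tiny model and random data below are only placeholders so the snippet runs; re-compiling keeps the learned weights and only resets the optimizer state:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD, Adam

x_train = np.random.rand(256, 20)                  # placeholder data
y_train = np.random.randint(0, 2, size=(256, 1))

model = Sequential([Dense(32, activation='relu', input_shape=(20,)),
                    Dense(1, activation='sigmoid')])

# phase 1: vanilla SGD with momentum for the first n epochs
model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9), metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, verbose=0)

# phase 2: re-compile with Adam; weights carry over, only the optimizer changes
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, verbose=0)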
you are always improving, thanks
Another question: what about Stochastic Average Gradient descent, the SAG solver used in Python's sklearn?
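For reference, sklearn's 'sag' solver is Stochastic Average Gradient: it keeps a memory of per-sample gradients and updates with their average, which converges quickly on large convex problems like logistic regression. A small usage sketch with synthetic data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# solver='sag' uses Stochastic Average Gradient under the hood
clf = LogisticRegression(solver='sag', max_iter=1000).fit(X, y)
print(clf.score(X, y))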
In the visualization (8:14), the NAG and momentum methods follow the adaptive gradient methods, but you said that they go in the wrong direction. Can you make that clearer?
You're awesome. Thanks for making these videos!! They really help and are entertaining as well.
I feel that your course is getting more and more scattered. If you could build an exciting project where a lot of the knowledge points you have taught can be used, it would be very helpful for deepening our understanding of those knowledge points.
So I suggest once again that you do a self-driving car project: Raspberry Pi + camera + remote-control car. There would be a lot of problems to solve, and a lot of the knowledge points you teach could be used in such a project.
libai tony exactly my thoughts on this
Awesome video. I checked my video speed settings twice to see whether YouTube was having issues or Siraj is just speaking slowly ;)
Hi Siraj,
just wanted to compliment everything that you do, but the last three videos in particular - the pacing and the overviews were awesome (usually your videos are a bit too fast for me, and I have to go over them again and again...) :) And a question - I am working with Keras right now (it is just sooo much easier and more intuitive compared to TF, for which in 5 tutorials I see 5 different coding approaches and TF parameters used for effectively the exact same network) and thought about 2 options for deploying:
1. export model and weights, load them in TF, and do everything according to your video
2. save model and weights, and make a small script in Keras that loads the model and does prediction.
Thoughts?
Thanks!
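For option 2, a minimal sketch of staying in Keras for inference (the file name and input shape are placeholders; model.save stores architecture, weights and optimizer state in one HDF5 file):

import numpy as np
from keras.models import load_model

# after training elsewhere: model.save('my_model.h5')
model = load_model('my_model.h5')
x_new = np.random.rand(1, 20)      # shape must match the training inputs
print(model.predict(x_new))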
Is gradient descent with all those optimizations better than a genetic algorithm? Should I use only backpropagation with gradient descent?
Hello. I hope you will answer, as it's really important for me. I'm currently working on a project and my task is to generate meaningful, unique text from a set of keywords. It doesn't need to be long, at least a couple of sentences. I'm pretty sure I have to use an LSTM, but I cannot find any good examples of generating meaningful text. I've seen a few randomly generated ones, but that's all. I would be grateful for any advice. Thank you in advance.
You explained SGD, momentum, NAG, Adagrad, Adadelta and Adam. Can you explain RMSprop, and when we should use it?
hey siraj, great video, do you have the code for the stochastic gradient descent animation?
Holy shit Siraj, the video quality has gotten so amazing. :)
@Siraj Raval 04:09 do you have example Python code for this plot?
Fantastic video. Keep up the great work Siraj.
You are amazing. Thank you so much for these videos. So entertaining and great content!
@Siraj could you explain more about Adagrad and Adadelta? I noticed that from 7:09 to 7:35 the video is confusing. You stated that for Adadelta the *running average at a time step depends on only the previous average and the current gradient*, but a few seconds earlier you said that *E[g^2]t is the sum of all past squared gradients*. I feel these two statements are mutually exclusive; could you explain more, please?
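For what it's worth, the two statements are compatible: the recursion only references the previous average and the current gradient, but if you unroll it, it is an exponentially weighted sum over all past squared gradients. A tiny check, assuming E[g^2]_t = gamma * E[g^2]_{t-1} + (1 - gamma) * g_t^2:

gamma = 0.9
grads = [0.5, -1.0, 2.0, 0.1]          # made-up gradient values

# recursive form: needs only the previous average and the current gradient
avg = 0.0
for g in grads:
    avg = gamma * avg + (1 - gamma) * g ** 2

# unrolled form: a decaying sum over *all* past squared gradients
unrolled = sum((1 - gamma) * gamma ** (len(grads) - 1 - i) * g ** 2
               for i, g in enumerate(grads))

print(avg, unrolled)                   # identical: same thing, two descriptions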
So an optimiser's efficiency depends on the amount of data available? What's the best optimiser for all the data?
Can you explain about lambda and gamma variables in nesterov_method.py?
I have no idea where it comes from. It doesn't seem to be part of the equation in the video.
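I can't see that exact file, but in the standard Nesterov write-up gamma is the momentum coefficient (around 0.9), and the learning rate (eta) sometimes shows up in code under names like lambda or lr. A sketch of the usual update with illustrative names; grad_fn stands in for whatever computes the gradient:

import numpy as np

def nag_update(w, v, grad_fn, lr=0.01, gamma=0.9):
    # gamma: momentum coefficient; lr: learning rate (may be named lambda in some code)
    lookahead = w - gamma * v                  # peek ahead along the current velocity
    v = gamma * v + lr * grad_fn(lookahead)    # gradient evaluated at the lookahead point
    return w - v, v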
Having to endure a semester of Control Systems Engineering, the resemblance between Nesterov's accelerated gradient and PID controllers is uncanny. The momentum alone acts like the I(ntegral) term in a PI controller, accelerating convergence while adding its own oscillations. Meanwhile, Nesterov's modification and D(erivative) term both serve to "brake" the momentum/integral from overshooting. I wonder, what other control theories could be applied?
My mind is blown! Gradient descent telling me to subscribe now to optimize my future!
It depends right? Different situations might favor different ones.
yes
Siraj, how much maths should I know in order to learn Machine Learning? I understand I have to know Linear Algebra, Calculus and Probability, but to what extent?
I am not Siraj, but I can suggest another way of learning things: start by learning what you find interesting (for you, probably something related to AI; you are awesome! ;) ). Learn that, and when you encounter maths, only then learn the maths that you need. This will prevent you from wasting your time on knowledge that you don't need.
I laughed at myself when I first had to learn what a basic sigma does in order to understand softmax in an ML course on Udacity (yeah, I wanted to learn ML without even knowing that! xD), but now I know far more than that and I am on my way to learning and creating so much more, and so can anyone!
Ah that's awesome! How far are you into the Udacity ML course and what do you think of it? I want to take the ML course by Andrew Ng at Stanford, but I need prior knowledge of linear algebra and calculus. I am learning both at the moment, but I'm not too sure how much of each I need. What would you recommend?
StyleTrick, I am still a beginner in ML as well, so I don't have the best opinion on that ^^ I stopped the Intro to Machine Learning course at lesson three; it's not my first Udacity class, and as usual it's a little slow to go through all the videos, and sometimes it really goes into every single step; I wish there were more details. I was too impatient, so I jumped right into TensorFlow, and this I recommend no matter what course you want to take! Try the "Get Started" on their website, try to understand and run "MNIST for ML Beginners" and "Deep MNIST for Experts"; you'll see how much deep learning reduces the error rate, and also how fast it is to make a model with this library compared to the way it's taught in the Stanford course! I saw the ML course you talked about, and it really goes into the details. I am stating the obvious, but when you build any code, you can do more, and do it faster, by using libraries. If you are more interested in AI research, my guess is that it's good, but is it better than the Udacity class? That's for you to tell! It depends on how you want to use AI in the end. Whatever you do, the only thing that matters is to keep trying at it as much as you can. Seriously, I'm ditching calls and snaps from my friends; I know that's bad, but it doesn't even bother me ^^ Try to learn TensorFlow on their official website if you haven't yet!
Thanks so much for the information! I've heard great things about TensorFlow. It's just that I'm studying some linear algebra and calculus to get some in-depth knowledge of ML. Btw, can I jump into TensorFlow straight away, like with no prereqs?
wait 5 days
Hey Siraj, what do you think of information security or cyber security as a career vs machine learning as a career? Could you make a video on different career options like web development, cyber security and machine learning?
And do we use "gradient" as a term only for derivatives of multivariable functions, or for both single- and multi-variable functions?
Why not NAdam (Adam with Nesterov momentum)?
In love with this video! It makes my graduation work easier, thanks :)
Thanks for sharing the knowledge.. this was really helpful in understanding the core concept. There is a lot to digest, but it's still very useful..!!
thanks man this video really helped me understand this concept.
1:51 THAT looked so good. :D
Why did you not mention RMSprop, which is very useful?
RMSprop not covered?
Man, your memes are out of this world.
Hi, I just came across your videos and ML and I am loving it... I recently saw an example on YouTube of a Google employee training a mobile phone to identify labels on candy bars and chocolate wrappers... Would you kindly do one from beginning to end? None of the code shows how to capture the label in real time and do the recognition on the device itself... Thank you once again. You are really terrific doing all this for all of us hungry for knowledge
Adagrad is not borrowing the idea from Nesterov. Nesterov emphasizes the momentum of the observed gradients, while Adagrad emphasizes the importance of less frequently seen (sparse) updates. This makes the transition in the middle of the video a bit confusing. Great video though.
Hey Siraj, love your videos. Can you do a video on batch normalization and batch renormalization?
Thank you
can you do a video on speech generation with wavenet?
I just wanted to tell you that this video and the similar one on the activation functions are by far the ones that helped me the most! It really helps getting started with ML.
I guess it is probably much more difficult (if possible at all), but I would love to get a similar intuition for the appropriate size and number of layers in a deep neural network, depending on what I wanted to do with it. Or at least, how to tackle the task of finding the best choices for them. Are you planning to do something like that? :)
I cant focus. his shirt is distracting.
mk17173n lmfaoooo
can't*
typical indian
Great stuff! Thanks Siraj!
Hey Siraj,
I was wondering if you could have a live session where we decide on some topic beforehand (something new and really challenging) and all of us contribute and try out some new network structures (brainstorming with ideas from arxiv-sanity). I'm thinking of something like the web sessions companies have amongst employees, but now you have the power to do it with a lot more people.
I want to talk about general ideas in this field, and there are just too many being published every day :D
Hey man, I'm in real need of help with the Adam algorithm.
I just can't grasp it. I can't find anywhere what the terms in the algorithm mean.
Like, what would Mt and Vt mean? I know it's the mean and the variance, but I don't understand what you mean by that. And what does t stand for?
You also said that it adapts and "learns" the learning rate, making it a parameter instead of a hyperparameter. Then why is it still in the math? Is that N the initial learning rate that I have to tweak?
All I can understand is the e (a really small number just so it doesn't divide by 0) and beta, being another hyperparameter. And O, being each weight or bias, but I don't know what the t's stand for.
Thanks for all the videos btw. Your videos usually don't go into a lot of detail, but it's from them that I find out these new optimization functions and so on exist in the first place. Usually I just run off to Google to look up and learn the details of what you show in the videos myself, but in this case I can't learn it anywhere and I'm desperate lol.
Thanks for the help
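Roughly, and going by the original Adam paper rather than the video: m_t and v_t are exponentially decaying averages of the gradient and of the squared gradient (estimates of the first and second moment, i.e. the mean and the uncentered variance); t is just the update count, used to correct the bias from starting both averages at zero. The learning rate (eta, which may be what you read as N) is still a hyperparameter you pick; what Adam adapts is the per-parameter scaling via v_t. Theta (the O) are the weights/biases. A sketch of one update step:

import numpy as np

def adam_update(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # t: time step (1, 2, 3, ...), used for bias correction below
    m = beta1 * m + (1 - beta1) * grad            # m_t: decaying mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # v_t: decaying mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected estimates,
    v_hat = v / (1 - beta2 ** t)                  # since m and v start at 0
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)   # lr stays a hyperparameter
    return theta, m, v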
Great video. I would have been interested in knowing more about Nadam too.
Hello Siraj, I enjoy your videos so very much! 😊 But I have a question. I myself don't use TensorFlow or any similar library; I enjoy coding the models completely from scratch in C++ and implementing the training algorithm the same way. How useful would such a skill set be in the market?
Ron, it is always recommended to use a library for deep learning, as most deep learning training needs lots of calculations that are best done with GPUs and distributed computing, which is where established libraries like TensorFlow excel. But as a beginner you can hand-code everything from scratch to understand the concepts.
Speaking of the market you asked about, there are machine learning research labs and big software companies like Google that you can join.
Great video as always! Have you seen the Levenberg-Marquardt algorithm used with any deep learning frameworks? It is available in the Neural Network Toolbox in MATLAB and I have found it gives better results than Adam for single-layer NNs.
One of Siraj's best IMHO
You didn't specify how to submit the Adam optimizer implementation, or I missed it in the README.
This is a very good video. I just wanted to add, from this paper (arxiv.org/abs/1705.08292), that adaptive optimizers don't always seem to generalise so well.
Please, can you do a tutorial on generating images from text descriptions with TensorFlow?
Absolutely awesome
Three cheers for Gradient Descent. Hip hip, hooray!
Where was Siraj when I had to take Numerical Analysis!!
Amen to that ode to gradient descent.
You are always the best
1:52-2:02 was fabulous
Bro, please do a video on face recognition using TensorFlow. If you have done some work on it, please share it with me. I am in grave need. Thanks, love your channel.
Dude, thank you for the video. Can you suggest some references for this topic?
Helps a lot. Thanks!
You are awesome. Thank you for this nice video!
Love your shirt siraj :)
Very Enlightening T-shirt, for a moment I thought of asking you about my future 😂
2:01 Siraj : To convergence!
ML squad : *To convergence!*
woot!!
5:33 Was awesome
9:04 to 9:10 it changed from a lower case f to capital F in mid sentence xD How did that happen?
...lead us to singularity....
*To singularity!*
i will
This video is part of the Deep Learning/Neural Network Playlist in Siraj's channel; FYI B)
Okay Siraj, I've (nearly) finished my first semester of Data Science, and I want to start learning more. I need some book suggestions that will help guide me from a decent knowledge of calculus, statistics, probability, Python and R to something more machine learning and 'real world problem' oriented.
Are you a fan of the O'Reilly series of Data Science books, or do you have some other favourites? Just give me everything; I want a lot of options and a lot of reading to do over the semester break
#wanttocontributetothecreationofthesingularityinsteadofdestroyingmyliver
Introduction to Statistical Learning and then Elements of Statistical Learning. They use R.
You're awesome! Also, cool shirt bro
you make amazing videos. thank you
Maybe you should do a follow-up on this video reflecting the latest comments from ICLR 2018 regarding Adam.
The neural networks in my brain are classifying this video as SPAM
Amazing video!!
@Siraj:
Here is my submission for this week's challenge:
github.com/rhnvrm/mini-projects/blob/master/adam/adam_implementation.ipynb
rohan so good to see u post, keep it up next time
Hi everyone ! Can someone explain Nadam ?
U are God of this field man haha
Adam was developed in 2014... imagine what type of optimizations the near future has for neural networks, eh?
Adaptive Adam. We should call it Eve.
We need to train a neural network to predict step.
Thank you