- 34 videos
- 48,111 views
AMILE - Machine Learning with Christian Nabert
Joined 5 Nov 2020
AMILE - lectures about machine learning and artificial intelligence. Logically structured content, beautifully illustrated for easy and deep understanding.
Dropout - a Method to Regularize the Training of Deep Neural Networks [Lecture 6.4]
What is dropout? Why use inverted dropout and how does it work? Why does dropout regularize the neural network?
Dropout randomly drops neurons during the training process. The idea is to imitate the fault tolerance of the brain. The keep probability specifies the probability with which a neuron is kept in the current training step. The most popular implementation of dropout is the so-called inverted dropout, which gives efficient, stable predictions on the test data set. In general, dropout regularizes the neural network and leads to a more robust network, similar to L2 regularization.
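A minimal numpy sketch of inverted dropout as described above (illustrative only; the function name and the keep_prob default are assumptions, not code from the lecture):

```python
import numpy as np

def inverted_dropout(activations, keep_prob=0.8, training=True):
    """Inverted dropout for one layer's activations (sketch)."""
    if not training:
        # At test time nothing is dropped and no rescaling is needed --
        # that is exactly why the inverted variant gives stable predictions.
        return activations
    # Keep each neuron with probability keep_prob ...
    mask = np.random.rand(*activations.shape) < keep_prob
    # ... and scale the survivors by 1/keep_prob so the expected
    # activation is unchanged during training.
    return activations * mask / keep_prob
```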
515 views
Videos
How does Batch Normalization really work? [Lecture 6.3]
149 views · 3 years ago
What is the idea of batch normalization? How can batch normalization stabilize the training of deep neural networks? Batch normalization is based on a similar idea to normalization in data preprocessing. Basically, batch normalization is a layer standardization with mean and variance for a mini-batch. Including batch normalization requires a modification of the backpropagation algorithm. The effect o...
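As a rough sketch of the layer standardization described above (illustrative; gamma and beta are the usual learnable scale and shift, and the names are assumptions):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Standardize a mini-batch (rows = samples) per feature, then rescale."""
    mu = x.mean(axis=0)                    # mean over the mini-batch
    var = x.var(axis=0)                    # variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardized activations
    return gamma * x_hat + beta            # learnable scale and shift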
Increasing Variance of Deep Neural Networks - Xavier Initialization [Lecture 6.2]
212 views · 3 years ago
Why does the variance increase across the layers of a deep neural network? Why is this a bad thing? How can Xavier initialization tackle the problem? Changes in variance (simplified scenario with a neuron in the middle of a deep neural network, change in variance across layers, assumptions such as linearization and standard normal distribution, variance equation, interpretation: increasing vari...
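A one-line numpy sketch of Xavier initialization (a common textbook form, not necessarily the exact variant derived in the video):

```python
import numpy as np

def xavier_init(n_in, n_out):
    # Weight variance 1/n_in keeps the output variance of a (roughly
    # linear) neuron close to its input variance, so the variance does
    # not grow across layers; the symmetric Glorot variant uses
    # 2/(n_in + n_out) instead.
    return np.random.randn(n_in, n_out) * np.sqrt(1.0 / n_in)
```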
Vanishing Gradient Problem - Why it is Difficult to Train Deep Neural Networks [Lecture 6.1]
260 views · 3 years ago
Why is it difficult to train deep neural networks? Why can gradients vanish or explode? The vanishing gradient problem is presented using an example of a deep neural network with sigmoid activation, under the assumptions of a bounded derivative and a standard normal initialization. We find out that ReLU activation is a good choice and see how vanishing and exploding gradients occur.
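A tiny numeric illustration of the effect (my own sketch, using the bounded-derivative assumption mentioned above): the sigmoid derivative is at most 0.25, so a product of many such factors shrinks exponentially with depth.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # bounded above by 0.25

for depth in (5, 10, 20):
    z = np.zeros(depth)           # pre-activations at z = 0, the best case
    print(depth, np.prod(sigmoid_grad(z)))   # 0.25**depth -> vanishes fast
```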
Gradient Based Training of Neural Networks [Lecture 5.7]
370 views · 3 years ago
"Why not use finite differences to train neural networks? Why not use BFGS? What are the differences between vanilla, batch and stochastic gradient descent?" Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Why Neural Networks for Machine Learning? ruclips.net/video/NaYvohpr9No/видео.html Part 2: Building Neural Networks - Neuron, Single Layer Perceptron, Multi Layer Pe...
Derive Backpropagation Algorithm for Neural Network Training [Lecture 5.6]
429 views · 3 years ago
"How is the backpropagation algorithm related to the delta rule? Why is backpropagation so efficient? How to derive the backpropagation algorithm?" Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Why Neural Networks for Machine Learning? ruclips.net/video/NaYvohpr9No/видео.html Part 2: Building Neural Networks - Neuron, Single Layer Perceptron, Multi Layer Perceptron r...
Delta Rule for Neural Network Training as Basis for Backpropagation [Lecture 5.5]
2.1K views · 3 years ago
"How can we train neural networks efficiently? How is the famous backpropagation algorithm derived?" Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Why Neural Networks for Machine Learning? ruclips.net/video/NaYvohpr9No/видео.html Part 2: Building Neural Networks - Neuron, Single Layer Perceptron, Multi Layer Perceptron ruclips.net/video/MqeGZcqkrn0/видео.html Part 3:...
How Neural Networks Really Work - From Logistic to Piecewise Linear Regression [Lecture 5.4]
413 views · 3 years ago
"Can we understand the black box neural networks? Why neural networks often are just composed of linear sections." Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Why Neural Networks for Machine Learning? ruclips.net/video/NaYvohpr9No/видео.html Part 2: Building Neural Networks - Neuron, Single Layer Perceptron, Multi Layer Perceptron ruclips.net/video/MqeGZcqkrn0/виде...
Activation Function of Neural Networks - Step, Sigmoid, Tanh, ReLU, LeakyReLU, Softmax [Lecture 5.3]
656 views · 3 years ago
"Why is the activation function the most crucial component in neural networks? What are the differences of different activation functions? When to use which?" Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Why Neural Networks for Machine Learning? ruclips.net/video/NaYvohpr9No/видео.html Part 2: Building Neural Networks - Neuron, Single Layer Perceptron, Multi Layer P...
Building Neural Networks - Neuron, Single Layer Perceptron, Multi Layer Perceptron [Lecture 5.2]
562 views · 3 years ago
"What is a neuron and how does it work? From a single neuron to a layer of neurons to multiple layers of neurons." Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Why Neural Networks for Machine Learning? ruclips.net/video/NaYvohpr9No/видео.html Part 2: Building Neural Networks - Neuron, Single Layer Perceptron, Multi Layer Perceptron ruclips.net/video/MqeGZcqkrn0/виде...
Why Neural Networks for Machine Learning? [Lecture 5.1]
253 views · 3 years ago
"What is the idea behind artificial neural networks?" Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Why Neural Networks for Machine Learning? ruclips.net/video/NaYvohpr9No/видео.html Part 2: Building Neural Networks - Neuron, Single Layer Perceptron, Multi Layer Perceptron ruclips.net/video/MqeGZcqkrn0/видео.html Part 3: Activation Function of Neural Networks - Step,...
When to Stop the Training of a Decision Tree? - Hyperparameters of Decision Trees [Lecture 4.3]
452 views · 3 years ago
"Hyperparameters are used to control the conditions for a split. This is necessary to avoid overfitting." Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Decision Tree Classifier - Main Ideas! ruclips.net/video/FxtJKkILyng/видео.html Part 2: How to Make a Decision Tree - Mathematical Theory of Training with Gini Impurity ruclips.net/video/jM4hGZSUl4E/видео.html Part 3:...
How to Make a Decision Tree - Mathematical Theory of Training with Gini Impurity [Lecture 4.2]
434 views · 3 years ago
"Understand the mathematical theory of decision tree training. The gini impurity is used to build a local cost function and with a greedy strategy, this leads to the CART algorithm." Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Decision Tree Classifier - Main Ideas! ruclips.net/video/FxtJKkILyng/видео.html Part 2: How to Make a Decision Tree - Mathematical Theory of...
Decision Tree Classifier - Main Ideas! [Lecture 4.1]
331 views · 3 years ago
"What is a decision tree? Understand the theory of decision trees, a basic algorithm in machine learning." Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Decision Tree Classifier - Main Ideas! Part 2: How to Make a Decision Tree - Mathematical Theory of Training with Gini Impurity ruclips.net/video/jM4hGZSUl4E/видео.html Part 3: Hyperparameters of Decision Trees rucli...
Support Vector Regression - in Comparison to Linear Regression [Lecture 3.6]
23K views · 3 years ago
"How to use the support vector machine for regression problems? Why it is different to linear regression?" Subscribe the channel ruclips.net/channel/UCgQlZ6kefvYeHDe YkFluA Part 1: Support Vector Machines - Main Ideas! ruclips.net/video/54_KNIdPn6E/видео.html Part 2: Training of Support Vector Machines ruclips.net/video/54_KNIdPn6E/видео.html Part 3: Derive the Dual Formulation for Support Vect...
Soft Margin for Support Vector Machines [Lecture 3.5]
642 views · 3 years ago
Kernel Trick for Support Vector Machines [Lecture 3.4]
1.3K views · 3 years ago
Derive the Dual Formulation for Support Vector Machines [Lecture 3.3]
2.8K views · 3 years ago
Training of Support Vector Machines [Lecture 3.2]
1.2K views · 3 years ago
Support Vector Machines - Main Ideas! [Lecture 3.1]
1.9K views · 3 years ago
How to Evaluate Classification Models - Confusion Matrix and Precision-Recall Curve [Lecture 2.7]
251 views · 3 years ago
What is the Meaning of Cross Entropy/ Log Loss as Cost Function for Classification? [Lecture 2.6]
800 views · 3 years ago
Cross Entropy vs. MSE as Cost Function for Logistic Regression for Classification [Lecture 2.5]
3.9K views · 3 years ago
One vs One and One vs All for Multiple Class Classifications [Lecture 2.4]
315 views · 3 years ago
Softmax Regression as a Generalization of Logistic Regression for Classification [Lecture 2.3]
1.4K views · 3 years ago
Classification with the k-Nearest Neighbor Algorithm - kNN [Lecture 2.2]
217 views · 3 years ago
Logistic Regression and the Classification Task of Machine Learning [Lecture 2.1]
487 views · 3 years ago
Regularization - Early Stopping, Ridge Regression (L2) and Lasso Regression (L1) [Lecture 1.6]
387 views · 3 years ago
Cross Validation - How to Select the best Machine Learning Model? [Lecture 1.5]
317 views · 3 years ago
Data Preprocessing - Normalization, Outliers, Missing Data, Variable Transformation [Lecture 1.4]
342 views · 3 years ago
I found this the best explanation on the internet! Still, I have one question: the primal optimization is minimizing the weight vector. In the dual optimization, are we minimizing the lambdas as well?
Thank you. To your question: If you are using the dual formulation, you are optimizing the lambdas. The cost function contains only lambdas. Then, you can directly calculate the weights with w = sum_i lambda_i * y_i * x_i.
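A one-line numpy sketch of the weight recovery stated in this reply (the function name is mine; labels are assumed to be in {-1, +1}):

```python
import numpy as np

def weights_from_dual(lambdas, X, y):
    # w = sum_i lambda_i * y_i * x_i; only the support vectors contribute,
    # since all other data points have lambda_i = 0.
    return (lambdas * y) @ X      # lambdas, y: shape (N,); X: shape (N, d)
```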
I have searched all over the internet and this is by far the best explanation of the derivation of the Dual formula for an SVM.
Super clear, thank you so much!!
hello, what do the subscripts i and j represent in the double summation?
The subscripts i and j represent summations over the N data points used to train the model.
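For reference, the double summation in question is presumably the standard dual objective of the (hard-margin) SVM, written here in its textbook form rather than quoted from the video:

```latex
\max_{\lambda}\;\sum_{i=1}^{N}\lambda_i
 - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}
   \lambda_i\lambda_j\,y_i y_j\,\mathbf{x}_i^{\top}\mathbf{x}_j
\quad\text{s.t.}\quad \lambda_i \ge 0,\qquad \sum_{i=1}^{N}\lambda_i y_i = 0
```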
Hello. Which method should I follow to get the best lambda value on Ridge regression? For example I have a small dataset and I want to manually (by hand) get the best lambda value. How do I do this?
Hi. This is pretty difficult to answer, especially without more detailed knowledge about your data. But typical values are 0.1, 1.0, 10.0. Best regards Christian
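A hand-rolled way to compare those candidate values on a validation split, using the closed-form ridge solution (a sketch assuming a plain hold-out set; function and variable names are mine):

```python
import numpy as np

def best_ridge_lambda(X_tr, y_tr, X_val, y_val, lambdas=(0.1, 1.0, 10.0)):
    """Pick the lambda with the lowest validation MSE."""
    d = X_tr.shape[1]
    scores = {}
    for lam in lambdas:
        # closed form: w = (X^T X + lambda*I)^(-1) X^T y
        w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
        scores[lam] = np.mean((X_val @ w - y_val) ** 2)
    return min(scores, key=scores.get)
```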
very intuitive ! Thank You !
so clear
Nice explanation and presentation, thanks
Many thanks for the clear step-by-step explanation!
Thank you so much! This really helped me!
great explanation.
AWESOME !!!!!!
Excellent !!!
Amazing video. I was wondering about the difference between MSE and Cross Entropy, and nothing on the internet gives as clear and detailed an explanation as you do. So sad you stopped producing new videos; this kind of high-quality explanation is what we need. May I ask how you learned all this stuff?
Thank you for your nice comment. I hope to find time to make new videos, but the lectures are, besides my daily job, just for fun... and fun time is rare at the moment ;-) But to your question: After my PhD in physics, I started a job as a data scientist in industry. I wanted to learn all the ML basics and read some books. To reflect on my knowledge, as a side project, I wrote a lecture on ML for my former university. I held the lecture and realized that I missed a lot of explanations and proofs. And, as you also mentioned, for some of these it is difficult to find anything on the internet. But often this does not matter (if you have some knowledge about mathematical proofs - I took a lot of math and physics courses at university). Start by looking up how a statement like "convex shape" is defined. Then write down the mathematical formulation of all your "assumptions", like the cost function and so on. These have to be substituted into the "left side" of the statement. Now, of course, the tricky part is to calculate that the "right side" of the statement shows up. This needs some experience, which you might gain by studying math, physics, or something similar. But I think the most important thing is to start by writing the first steps down on paper and not to be afraid that the problem is too difficult. Hope this gives you a quick insight... Best regards
Enjoyed the video. Is it possible to get the slides of the presentation?
Hi, sorry, right now there is no website where you can download the slides. If they become available, I will post a note here on RUclips.
Hi! These playlists are incredibly effective. I use them for all my work. But I just want to ask: where are the references? I just wonder, and want to add them to my work. Are there any sources like papers, review articles, or books that you used to prepare these videos? If yes, I really want to examine them further. Thanks for your labour. Keep going please :)
Thank you for your comment and question. I can strongly recommend, among others, the books "Hands-On Machine Learning with Scikit-Learn and Tensorflow" by Aurelien Geron, "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, and "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The first book gives a nice lightweight introduction with Python implementations, whereas the latter two take a more mathematically based approach to machine learning. Besides these books, I recommend the videos by Andrew Ng ruclips.net/user/Deeplearningai and Patrick Winston ruclips.net/user/mitocw . Best regards Christian
nice #amflearning #amflearningbydoing
awesome #amflearning #amflearningbydoing
you are awesome, #amflearning #amflearningbydoing
wonderful !
Why must we use the sigmoid? If we don't use the sigmoid, the loss function will be a convex function.
Thanks for your question/comment. Yes, in general, you can modify either the loss function or the classification function to obtain a convex optimization task for classification. Logistic regression makes use of the sigmoid function for solving classification tasks; it is a very commonly used function for classification. But indeed, you can use different functions for classification as well. For example, a trivial choice is the linear function, which gives you a convex optimization problem. However, this function has problems when it comes to outliers (ruclips.net/video/9fTHqy51N_o/видео.html). Because the logistic function has nice properties when it comes to classification, the easy way is to modify the loss function.
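A small numeric check of the convexity claim for a single training point (x = 1, y = 1); this is my own illustration, not the video's derivation. Negative second finite differences reveal non-convex regions:

```python
import numpy as np

sig  = lambda w: 1.0 / (1.0 + np.exp(-w))
mse  = lambda w: (sig(w) - 1.0) ** 2      # squared error through the sigmoid
xent = lambda w: -np.log(sig(w))          # cross entropy for the same point

w = np.linspace(-6, 6, 201)
curv = lambda f: f(w[2:]) - 2 * f(w[1:-1]) + f(w[:-2])  # ~ f'' up to a factor
print((curv(mse)  < 0).any())   # True  -> MSE on the sigmoid is non-convex
print((curv(xent) < 0).any())   # False -> cross entropy stays convex
```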
Fantastic video. Well done!
great job
good video, i really like the bullet points and the fact that you talk through them
This is the clearest explanation of SVM optimization math-wise. Other channels just don’t have the patience or the partial derivative skill to clarify the details of every step.
great lecture
The data inside the margin are NOT taken into account; only the data outside are.
Thank you sir, it's clearly explained.
Very nice mate keep it on.
Thanks a lot
Excellent simplification of a complex concept... surprising that there are so few views... in fact one of the best explanations I have come across... thank you for your efforts...
good explanation!
Thnx sir😊
Thank you for good lecture
Thank U!! I didn't know how the support vector regression model worked! But now I know because of you~
So in the higher-order dimension, SVR is not seeking to find "two separate groups" like in SVM, but is trying to fit a line between the points?
Thanks for your question. Yes, Support Vector Regression (SVR) does not separate groups in your data. SVR fits a curve to the data points, and this curve is used for a regression problem to predict a concrete numeric value for new data. To find the best curve, margin boundaries are specified which include all data points (at least for hard margins). The curve for the predictions is then in the middle of the margin boundaries. The curve is linear for the linear SVR and can be non-linear if the kernel trick is applied (see: ruclips.net/video/6fqONx4mMI8/видео.html). In contrast, a Support Vector Machine (SVM) used for classification (also SVC) separates two groups of different classes by a curve. Then, new data is classified with respect to this curve. I hope this explanation helps. If you still have questions, please don't hesitate to ask.
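To see the "fit a curve, don't separate groups" point in code, a tiny scikit-learn sketch (the data and parameter values are arbitrary assumptions):

```python
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0, 6, 50).reshape(-1, 1)   # one input feature
y = np.sin(X).ravel()                      # numeric target, not class labels

linear_svr = SVR(kernel="linear", epsilon=0.1).fit(X, y)  # line + margin tube
rbf_svr    = SVR(kernel="rbf", epsilon=0.1).fit(X, y)     # kernel trick -> curve
print(rbf_svr.predict([[1.5]]))            # concrete numeric prediction
```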
Thanks a lot for the crystal clear lectures !
ReLU automatically has a rather brutal degree of curvature. Perhaps 2 sided Parametric ReLU could be a better choice and also allows zero curvature initialization --- AI462 neural networks.
Thanks for your comment. I agree, a two-sided parametric ReLU is a good choice. You have to find a tradeoff between incorporating non-linearity and, at the same time, keeping gradients of approximately 1. Most often I prefer LeakyReLU, but more varied slopes can be advantageous depending on the use case.
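A sketch of the two-sided parametric ReLU discussed here (my own formulation of the commenter's f_i; the default slopes are assumptions):

```python
import numpy as np

def two_sided_prelu(z, a=0.25, b=1.0):
    # Slope a for z < 0 and slope b for z >= 0; with a = b = 1 the unit
    # starts as the identity, i.e. the zero-curvature initialization
    # mentioned above, and both slopes can be learned per neuron.
    return np.where(z < 0, a * z, b * z)
```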
Thanks for this tutorial, but I didn't understand something: when we want to predict a test vector using a one vs. one model, and suppose the label of this test vector is A, how do we proceed?
Thanks for your question and sorry for the late response. Here is a concrete example for one vs. one classification: Let's assume we have 4 classes A, B, C, and D. 1.) First, one-hot encoding is applied to the data to obtain a vector representation, i.e., we have 4 variables in a vector which can be 0 or 1. Then, the test vector has 4 entries and each entry can be either 0 or 1. For example, a test vector for class A is (1,0,0,0). 2.) Let's consider a logistic regression model. For the 4 different classes, we have the following logistic regression models: (1) a model for A or B; (2) a model for B or C; (3) a model for C or D; (4) a model for A or C; (5) a model for A or D; (6) a model for B or D. Each model typically outputs a probability value between 0 and 1 to decide between the two classes. For example, model (1) outputs the probability for class B, and the complement is the probability for class A. So, if the model outputs 0.8, this model would prefer class B over class A. 3.) Application of the models: assume an input which corresponds to a test vector of class A. If all models are well trained, we expect models (1), (4), and (5) to vote for class A. So class A has a score of +3. Models (2), (3), and (6) vote randomly for the other classes. However, none of these classes can have a score greater than 2 (because none of the 3 classes B, C, D appears in all 3 of the remaining models). So, in total, the one vs. one approach selects class A. I hope this explanation helps. If there are still questions, please do not hesitate to ask.
@@amile-machinelearningwithc4547 Thanks a lot
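To make the voting in the reply above concrete, a tiny sketch (binary_predict is a hypothetical placeholder for the six trained pairwise classifiers; nothing here is code from the video):

```python
from itertools import combinations
from collections import Counter

def ovo_predict(x, classes, binary_predict):
    """One vs. one: every pairwise model votes, the majority class wins."""
    votes = Counter()
    for c1, c2 in combinations(classes, 2):    # (A,B), (A,C), ..., (C,D)
        votes[binary_predict(x, c1, c2)] += 1  # each model returns c1 or c2
    return votes.most_common(1)[0][0]

# With classes ("A", "B", "C", "D") this enumerates exactly the six
# pairwise models (1)-(6) from the reply above.
```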
Convolution, weighted sums, and fast transforms (FFT, Hadamard) are dot products, or collections of them. Max pooling and ReLU are switches. You can swap what is adjusted in a neural net: you can use a fast transform as a fixed collection of dot products with adjustable (parametric) activation functions like fi(x)=ai.x for x<0, fi(x)=bi.x for x>=0, i=0 to m. To stop the first transform from taking a spectrum, you can apply a fixed, randomly chosen (or sub-random) pattern of sign flips to the input of the net. Such a net then is: sign flips, transform, functions, transform, functions, ..., transform. How can that work? Each dot product is a statistical summary measure and filter looking at all the neurons in the prior layer, responding to what it sees, and able to modulate its response to what it sees. The cost per layer is n·log2(n) add/subtract operations and n multiplies, and it uses 2n parameters, where n is the width of the net.
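A sketch of the architecture this comment describes (my own reading of it, not an established library API; n must be a power of two):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform: n*log2(n) add/subtract operations."""
    x = x.copy()
    h, n = 1, len(x)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

def fixed_filter_layer(x, a, b, signs):
    # sign flips -> fixed transform -> parametric activations
    # f_i(x) = a_i*x for x < 0, b_i*x for x >= 0  (2n parameters per layer)
    z = fwht(signs * x)
    return np.where(z < 0, a * z, b * z)
```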
ReLU is a switch🤔 f(x)=x is connect, f(x)=0 is disconnect. A light switch in your house is binary on/off, yet it connects and disconnects a continuously variable AC voltage signal. A ReLU net is a switched composition of dot products. The dot product of a number of (switched) dot products is still a dot product. When all the switch states become known, the net collapses to a simple matrix.