Neural Network learns Sine Function with custom backpropagation in Julia
- Published: 16 Jul 2024
- Reverse-Mode Automatic Differentiation (the generalization of the backward pass) is one of the magic ingredients that makes Deep Learning work. For a simple Multi-Layer Perceptron, we will implement it from scratch. Here is the code: github.com/Ceyron/machine-lea...
To me, autodiff is something extremely beautiful. It is different from both symbolic and numeric differentiation and, in its reverse mode, has constant complexity with respect to the number of parameters. In other words, it takes only one forward and one backward evaluation of a Neural Network to obtain a gradient. This derivative information is then useful for training the network by means of gradient descent, one of the simplest optimization algorithms.
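To illustrate the idea (this is a minimal sketch, not the video's implementation), reverse-mode AD can be built from "pullbacks": each operation returns its result together with a closure that maps an output cotangent back to input cotangents. One forward pass records the pullbacks; one backward pass applies them in reverse order:

```julia
# Primitive: y = W * x, with a pullback for both W and x
function matmul_with_pullback(W, x)
    y = W * x
    pullback(ȳ) = (ȳ * x', W' * ȳ)      # (∂L/∂W, ∂L/∂x)
    return y, pullback
end

σ(z) = 1 / (1 + exp(-z))
function sigmoid_with_pullback(z)
    a = σ.(z)
    pullback(ā) = ā .* a .* (1 .- a)    # ∂L/∂z
    return a, pullback
end

# One forward evaluation records the pullbacks ...
W = randn(3, 2); x = randn(2)
y, back_mm = matmul_with_pullback(W, x)
a, back_σ  = sigmoid_with_pullback(y)

# ... one backward evaluation yields all gradients
ā = ones(3)             # seed cotangent, e.g. for L = sum(a)
z̄ = back_σ(ā)
W̄, x̄ = back_mm(z̄)    # ∂L/∂W and ∂L/∂x in a single sweep
```

The cost of the backward sweep is independent of the number of parameters, which is the "constant complexity" mentioned above.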
This video will be a hands-on implementation of the core pillars for Neural Network training with a particular focus on the backward pass.
Timestamps:
00:00 Introduction
00:17 About Deep Learning and focus on backward pass
00:50 Simplifications for the script
01:10 About the (artificial) dataset
01:41 Weights & Biases defined by layers
02:03 Four major steps for training Neural Networks
02:37 Theory of Forward Pass
05:00 Theory of Parameter Initialization (Xavier Glorot Uniform method)
05:50 Theory of Backward Pass (backpropagation)
11:36 Theory of Learning by gradient descent
12:26 More details on the backward pass
13:12 Working interactively with Julia REPL session
13:27 Imports
13:47 Define Hyperparameters
14:26 Random Number Generator
14:36 Generate (artificial) dataset
15:56 Scatter plot of dataset
17:22 Define Sigmoid nonlinear activation function
17:39 Lists for parameters and activation functions
18:05 Parameter initialization
21:11 Implement (pure) forward pass
22:43 Plot initial network prediction
24:07 Implement forward loss computation (MSE)
25:28 Primal pass for backpropagation
27:53 Implement pullback for loss computation
28:56 Backward pass
34:27 Define activation function derivatives
35:42 Sample call to backward pass
37:00 Wrapping it in a training loop
39:14 First run of the training loop
39:32 Bug Fixing and re-running the training
40:27 Prediction with the final fit (plus plot)
41:21 Plotting the loss history
41:54 Summary
42:58 Outro
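The four major training steps from the timestamps (Glorot initialization, forward pass, backward pass, gradient descent) can be sketched end-to-end for the sine-fitting task. The hyperparameters and variable names below are illustrative assumptions, not the video's exact code:

```julia
using Random, Statistics

rng = MersenneTwister(42)

# Artificial dataset: noiseless sin(x) on [0, 2π], samples as columns
x = reshape(collect(range(0.0, 2π; length=200)), 1, :)
y = sin.(x)

layer_sizes = [1, 20, 20, 1]
n_layers = length(layer_sizes) - 1
σ(z) = 1 / (1 + exp(-z))
dσ(z) = σ(z) * (1 - σ(z))

# Xavier Glorot uniform init: U(-l, l) with l = √(6 / (fan_in + fan_out))
Ws = [(2 .* rand(rng, layer_sizes[i+1], layer_sizes[i]) .- 1) .*
      sqrt(6 / (layer_sizes[i] + layer_sizes[i+1])) for i in 1:n_layers]
bs = [zeros(layer_sizes[i+1]) for i in 1:n_layers]

# Forward (primal) pass, caching pre-activations and activations
function forward(x)
    zs, as = Matrix{Float64}[], Matrix{Float64}[x]
    for i in 1:n_layers
        z = Ws[i] * as[end] .+ bs[i]
        push!(zs, z)
        push!(as, i < n_layers ? σ.(z) : z)   # linear output layer
    end
    return zs, as
end

η = 0.05                # learning rate (illustrative)
losses = Float64[]
for epoch in 1:2000
    zs, as = forward(x)
    n = size(x, 2)
    push!(losses, mean(abs2, as[end] .- y))   # MSE loss
    δ = (as[end] .- y) .* (2 / n)             # cotangent at the output
    for i in n_layers:-1:1
        gW = δ * as[i]'                       # ∂L/∂W[i]
        gb = vec(sum(δ; dims=2))              # ∂L/∂b[i]
        if i > 1
            δ = (Ws[i]' * δ) .* dσ.(zs[i-1])  # propagate BEFORE updating W[i]
        end
        Ws[i] .-= η .* gW                     # gradient descent step
        bs[i] .-= η .* gb
    end
end
```

Note that the cotangent δ must be propagated through `Ws[i]` before that weight matrix is updated; doing it in the other order is a classic backprop bug.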
-------
📝 : Check out the GitHub Repository of the channel, where I upload all the handwritten notes and source-code files (contributions are very welcome): github.com/Ceyron/machine-lea...
📢 : Follow me on LinkedIn or Twitter for updates on the channel and other cool Machine Learning & Simulation stuff: / felix-koehler and / felix_m_koehler
💸 : If you want to support my work on the channel, you can become a Patreon here: / mlsim
🪙: Or you can make a one-time donation via PayPal: www.paypal.com/paypalme/Felix...
-------
⚙️ My Gear:
(Below are affiliate links to Amazon. If you decide to purchase the product or something else on Amazon through this link, I earn a small commission.)
- 🎙️ Microphone: Blue Yeti: amzn.to/3NU7OAs
- ⌨️ Logitech TKL Mechanical Keyboard: amzn.to/3JhEtwp
- 🎨 Gaomon Drawing Tablet (similar to a WACOM Tablet, but cheaper, works flawlessly under Linux): amzn.to/37katmf
- 🔌 Laptop Charger: amzn.to/3ja0imP
- 💻 My Laptop (generally I like the Dell XPS series): amzn.to/38xrABL
- 📱 My Phone: Fairphone 4 (I love the sustainability and repairability aspect of it): amzn.to/3Jr4ZmV
If I had to purchase these items again, I would probably change the following:
- 🎙️ Rode NT: amzn.to/3NUIGtw
- 💻 Framework Laptop (I do not get a commission here, but I love the vision of Framework. It will definitely be my next Ultrabook): frame.work
As an Amazon Associate I earn from qualifying purchases.
-------
Wonderful video, just started with Julia and saw your video. Thanks a lot!
Thanks for the comment and welcome to the channel. 😊 Hope you'll enjoy it, it's a beautiful and very promising language.
Great video again! I guess there is a little mistake in line 211. The mean function already sums the elements of the delta vector. In my opinion, it should be loss = 0.5 * mean(delta.^2). What do you think? Thanks for the tutorial 😊😊
@@aykutcayir64 Interesting point. 👍 In the example considered here, there is no difference between the two, since we only have one spatial dimension (over which we sum). More generally, I think there is some debate about which to use (mean or summation). If you want a norm induced by a function space (i.e., a vector norm on the degrees of freedom of an FE interpolation of a function), you would use the MSE in space as well; if you just want a finite-dimensional norm, you would use the SSE. Over the batch dimension, I would say that only the MSE is consistent with the expectation.
What do you think?
And of course, thanks for the kind feedback :)
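The point discussed in this thread can be sketched numerically (illustrative snippet, not the video's code): sum- and mean-based losses differ only by a constant factor, and over a singleton dimension they coincide exactly.

```julia
using Statistics

δ = randn(5)                    # residuals for 5 samples, 1 output each

sse = 0.5 * sum(δ .^ 2)         # sum of squared errors
mse = 0.5 * mean(δ .^ 2)        # mean squared error = sse / 5

# With one output dimension per sample, summing over that (singleton)
# dimension is the same as averaging over it:
d = reshape(δ, 1, :)            # 1 output dim × 5 samples
per_sample_sum  = vec(sum(d .^ 2; dims=1))
per_sample_mean = vec(mean(d .^ 2; dims=1))
```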
@@aykutcayir64 On a different note: if you create a new comment (instead of replying in the comment thread), the visibility for me is higher, because new comments are shown in the YouTube Studio app, whereas replies only appear in the YouTube notification feed. :)
Wonderful video! Could you make a video on parallel or CUDA computing with Julia as well?
Great idea 👍
Certainly, there will be more sophisticated Julia videos covering these topics. I already have the videos for the next weeks planned, so it's something for the long-term content. I will add it to my to-do list.
@@MachineLearningSimulation I would appreciate Gaussian process regression theory and a TFP implementation :). Maybe you could pick that up in the future :)
@@mullermann2899 Already on my to-do list, but I always get dragged away by topics close to my own research :D. Probabilistic ML has fallen a bit out of focus for me personally. I hope I can pick it up soon, but I'd better not make promises about when. A big part of why I create the videos is because I enjoy it a lot, and I am afraid of losing that once I stop following my own agenda. :D Hope you can understand that.
@@MachineLearningSimulation I can truly understand that. No problem :). I was suggesting it because of my own research haha 😂