I can't say enough how great the videos in this series are. Although Java isn't great for machine learning, this definitely pushed me toward learning more about it!!
Thank you! :) Java is really slow; modern frameworks all use GPU support (which is like 10,000 times faster). But this series is only about understanding.
Making it also helped me get a better understanding of the topic, and I am getting many requests asking if I can help with neural networks. I also worked for my university in that field.
This topic is incredibly complex to begin with, but it gets easier with time.
You will eventually understand that you are simply calculating the gradient of your error function and changing the weights accordingly.
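In symbols (just restating that update rule): every training step is w_new = w_old - eta * dE/dw for each weight w, where eta is the learning rate and E is the error function.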
If you have any questions, feel free to ask them :)
Thanks a lot. The subject was presented in an interesting and useful way. The source code of the GUI is missing, though. Could you put it on the web?
Thanks for putting this series out there! I've been interested in NNs for a while but didn't know where to start. Java is my main language, so this should be a good exercise to get into it. I haven't completed the series yet, though. In fact, I watched this video second, then 10, then 9, and then 8, before I realized that your playlist is backwards, haha. So... thanks again, but you might want to recreate the playlist.
Jagarti Dev oh, thank you for telling me, and I am glad you like it. Sorry for that.
If you have any questions, just let me know. What you could do is code a neural network object-oriented: objects for neurons, connections, layers, etc. It is obviously much slower than what I did, but it's good for understanding the material as well.
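For illustration, a minimal sketch of what such an object-oriented structure could look like (class names like Neuron, Connection and Layer are made up here, not taken from the series):

import java.util.ArrayList;
import java.util.List;

// One incoming connection: a weight plus a reference to the source neuron.
class Connection {
    Neuron from;
    double weight = Math.random() * 2 - 1; // small random initial weight
    Connection(Neuron from) { this.from = from; }
}

// A neuron sums its weighted inputs and applies a sigmoid activation.
class Neuron {
    List<Connection> inputs = new ArrayList<>();
    double output;
    void calculate() {
        double sum = 0;
        for (Connection c : inputs) sum += c.weight * c.from.output;
        output = 1.0 / (1.0 + Math.exp(-sum));
    }
}

// A layer owns its neurons and fully connects them to the previous layer.
class Layer {
    Neuron[] neurons;
    Layer(int size, Layer previous) {
        neurons = new Neuron[size];
        for (int i = 0; i < size; i++) {
            neurons[i] = new Neuron();
            if (previous != null)
                for (Neuron p : previous.neurons)
                    neurons[i].inputs.add(new Connection(p));
        }
    }
}

A forward pass is then just looping over the layers and calling calculate() on every neuron. Much slower than the array-based version from the videos, but each concept maps to exactly one class.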
@@finneggers6612 I'll be trying all kinds of things, I think. I tried to use the open source lib Neuroph a long time ago but didn't have a clue what I was looking at. After what I've seen in this series and another video I saw somewhere, I'm excited to get cracking on my own experiments. I'll be sure to reach out if I have questions. Thanks so much!
How can I get that guy's UI implementation for drawing the numbers? I could not find the code on his channel :(
I can send you the code via mail if you wish. Or any other alternative?
@@finneggers6612 email would be perfect! If the file is too large, maybe with pCloud transfer?
anizivzivadze86@gmail.com
Thank you so much
@@finneggers6612 Hey there, I'd be really interested too in looking at the code. I just binge watched the entire series! May I send you an email on the email address provided in your channel description? Looking forward to hearing from you!
@@Opdelta122 That's why I put that mail there ;) I just hope it's working, because I've been having a few problems with it the last couple of days.
@@finneggers6612 Amazing! I just sent you an email. I'll let you know if it works for me!
Hey man, first off, excellent videos. I learned a hell of a lot about neural networks, which I needed for college.
So, I'm working on a regression-type neural network, and I really didn't know there was this difference from your type of network until I ran tests and searched things up. I was wondering if you're still into this and could maybe make a video about a regression-type network? Or if you know a video of someone doing it in Java and explaining it really well like you do. I'm searching for how exactly this linear regression thing works, but most of the videos use other programming languages and don't explain it really well.
Well, linear regression and neural networks are not really the same thing, but let me explain it to you:
First: yes, I am still into that.
A: A neural network is not linear, so linear regression isn't really a thing there.
If you take only a single layer of a network, you get a linear function IF there is no activation function on that layer.
B: Let's say we do linear regression on some data. Basically, we want to find a function f(x) = a*x + b that fits the data as well as possible.
Now, we do exactly the same with neural networks.
We define our cost function and minimize it. In fact, we use the same error measure: the mean squared error (for regression tasks it is the usual choice).
So we define E(a,b) = (f(x_i) - y_i)^2 for a data point (x_i, y_i).
You can use pen and paper to write down the derivatives of that with respect to a and b, and apply gradient descent to find the optimal a and b.
If you have any questions about this, feel free to ask me. If I did not understand your question correctly, feel free to say that as well, and I will do some research to help you if you like.
I've just used my whiteboard to derive the formulas to optimize f(x) = ax + b.
With E(a,b) = (a*x_i + b - y_i)^2, the gradient is:
dE/da = 2 * (a*x_i + b - y_i) * x_i
dE/db = 2 * (a*x_i + b - y_i)
Now, if you apply this update iteratively for every point (x_i, y_i) in your dataset, you will eventually end up with a good a and b.
f(x) = ax + b as written doesn't cover higher dimensions, but the theory is the same. Just use a neural network with 1 layer and no activation function and voilà: your function f(x), a.k.a. your neural network, is linear, and you can apply gradient descent, a.k.a. backpropagation, for regression.
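A minimal sketch of that in Java (the data points and learning rate are invented for the example; it is plain stochastic gradient descent on E(a,b) = (a*x_i + b - y_i)^2):

public class LinearRegression {
    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {3, 5, 7, 9, 11}; // generated from y = 2x + 1
        double a = 0, b = 0;
        double eta = 0.01; // learning rate
        for (int epoch = 0; epoch < 10000; epoch++) {
            for (int i = 0; i < x.length; i++) {
                double error = a * x[i] + b - y[i];
                a -= eta * 2 * error * x[i]; // dE/da = 2 * (a*x_i + b - y_i) * x_i
                b -= eta * 2 * error;        // dE/db = 2 * (a*x_i + b - y_i)
            }
        }
        System.out.println("a = " + a + ", b = " + b); // should approach a = 2, b = 1
    }
}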
@@finneggers6612 Maybe I just used the wrong words or something.
ruclips.net/video/LvIa0-ZKCrc/видео.html
I found out I was messing up from this video, at 5:41.
I was writing classification code when I wanted a regression one, lol. And every time I look it up, people talk about this linear regression with the y = mx + b function, as if this function were necessary to predict outputs from the input data.
I just can't get what exactly they're doing or talking about. I can see why you'd find the y = mx + b function for all the input and output data, but just that.
Hi man, amazing lessons. Just one question: how did you use your input in this program? Where did you add the input (learn input and teach input)?
video 7/8
@@finneggers6612 thanks man
Great video series! How do you know how many neurons should be in each layer? My input layer has a size of 300 and the output layer 2. (From index 0-149, five of the inputs will be 1 and the rest 0; same for index 150-299.) Currently I've tried 300-600-100-2 for my layers, but did not get great accuracy. Given enough training time, should I add more neurons/layers or fewer?
There is no simple approach to finding the best network structure (topology). There have been some attempts, and they basically go like this:
Start with a small network. If underfitting occurs (bad accuracy), increase the network size as long as results keep improving (see the sketch below).
Most of the time, I just choose something that seems sensible to me. But there are still a few "rules" you might want to follow:
- Don't go from many neurons straight to very few (100-2). Do something more like 100-60-10-2.
- Try to reduce the layer sizes gradually (there are many examples where this is not the case), but maybe you should try: 300-100-20-2.
But most important: get enough training data. Especially when there are so many different possibilities (like you have).
You should also consider finding a better encoding for the input. 10 activated out of 300 is very little; if you could somehow increase that number, it would be great. Feel free to e-mail me if you want to show me your project or have any questions.
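A rough sketch of that grow-until-it-stops-improving idea (hypothetical: evaluate() stands in for your own train-and-test routine, and I'm assuming the Network constructor from this series takes the layer sizes as varargs, so an int[] can be passed directly):

int[][] candidates = {
        {300, 100, 2},
        {300, 100, 20, 2},
        {300, 150, 60, 20, 2}
};
double best = 0;
for (int[] topology : candidates) {
    Network net = new Network(topology); // varargs accept the int[] directly
    double accuracy = evaluate(net);     // train on one set, test on a separate one!
    if (accuracy <= best) break;         // the bigger network stopped helping
    best = accuracy;
}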
@@finneggers6612 Thank you for such a thorough reply. Would it be better if I inverted the input (changing it to 290 1's and 10 0's)? In regards to getting more training data, I have around 100k samples; do you reckon that is enough?
100k? Yeah, probably :D
"Would it be better if I inverted the input?": I am not sure; it might make things worse. Could you describe what the input stands for? I would like to know the context.
@@finneggers6612 The input data looks like this: 0 132 3 68 26 94 97 86 39 15 64. I'm trying to predict the outcome of a LoL game based on champion select. The first number represents win/loss, and the rest are the champions that were picked. The champion ids are used as indices into the input array and get the value 1 (the last 5 picks use an index of 150 + the id).
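To illustrate that encoding, a rough sketch using the example line above (I'm assuming champion ids run from 0-149 and that the first five picks belong to one team, the last five to the other; the win/loss target encoding is just a guess):

int label = 0; // first number of the line: win/loss
int[] picks = {132, 3, 68, 26, 94, 97, 86, 39, 15, 64};

double[] input = new double[300];
for (int i = 0; i < picks.length; i++) {
    // the first 5 picks land on indices 0-149, the last 5 are shifted by 150
    int index = (i < 5) ? picks[i] : 150 + picks[i];
    input[index] = 1;
}
// two output neurons, e.g. (1, 0) = win and (0, 1) = loss
double[] target = (label == 1) ? new double[]{1, 0} : new double[]{0, 1};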
@@finneggers6612 I'm very new to neural networks and programming in general; I just figured that since there are so many different possibilities, I would want more input neurons (just based on intuition, haha). I do have Discord, but the code I've written is very uninteresting (just a slightly modified mnist.java). I will check out the GitHub code, but at this point my understanding of neural networks is too limited for me to follow it, I reckon.
Hi, @Finn Eggers! Just wanted to say that your tutorials are amazing; they helped me a lot in understanding the "coding" part of NNs. I am really impressed with your project, i.e. visualizing how a NN learns by plotting dots and then grouping them by coloring the area nearby. Just wondering if there is any possibility to get access to the full code for this example? In any case, thanks again :3 and keep it up!
Thank you!
I was looking for that specific code and found something better. The code isn't beautiful in either project, but I have redone that project and added a 3rd dimension.
github.com/Luecx/Luecxjee/tree/master/src/projects/interpolator
This is my personal AI library plus some server stuff; the interpolator uses the AI part. Feel free to check it out.
In the example, the dots can be placed in 3D, and therefore the neural network takes 3 values as input and predicts the values in between the dots. I had some fun with that and used a class from someone else to put everything into a Minecraft world to view it in 3D.
@@finneggers6612 Thanks a lot for the reply! I will take a look at it asap)
Can you link to the git repo?
Interesting video. I'm looking forward to the next tutorials :)
How about a small competition?
My network (784, 300, 120, 60, 10): I trained it for a few hours, and now it identifies 29950 / 30001 pictures correctly...
Did anyone get a better network?
I am afraid your accuracy is the highest I've seen so far.
I've had one with an accuracy of 99.81%.
(Just to clarify, you are not testing on the data you trained on, right?)
I could let my PC run overnight and train a network for a few hours as well. Let's just clarify how big our training data and our testing data are.
xD I just realized that the whole dataset is only 30,000 pictures, not 60,000 like the full MNIST dataset... (I thought I was training on the first 30,000 (0-30000) and testing on the last 30,000 (30000-60000).)
Sorry, my fault...
Yeah I figured that out myself. Don't know why. Maybe the way I am reading the data is wrong or something...
Did you copy the dataset from someone else?
'Cause I found a guy on GitHub with a file of the same size... XP
christian smbl yeah, I did. Well, I copied it from somewhere on the web, but I did mention that I did not write that code myself :)
I've been thinking lately about how powerful Samsung's 8K AI technology is. Just imagine: 8K -> 7680×4320 is 33,177,600 pixels. We only have 28x28 here. Their input data is huuuge, haha.
Yeah, but they first apply convolutional layers, which might be discussed later :)
There you have shared weights, which means you have maybe 1000 weights in total instead of billions.
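A quick back-of-the-envelope example of why weight sharing matters: a conv layer with 12 filters of size 5x5 on a single-channel image has 12 * 5 * 5 + 12 = 312 parameters, regardless of image size. A fully connected layer from a 28x28 input to just 120 neurons already needs 784 * 120 = 94,080 weights, and on an 8K frame (33,177,600 pixels) a single fully connected neuron alone would need over 33 million.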
Is it possible to turn this fully connected network into a convolutional neural network? If so, how would that work?
Julian Abhari: not directly, no, but I had the same idea, so I've written a new structure in Java that supports different layer types with different activation functions.
I started a series about that where the fully connected layer is already implemented.
I did not yet have time to make the video about convolution, but the code is already finished.
You can take a look at that series, and for the convolution check out the GitHub repo.
I am sure you could transform this code into a convolutional one, but that's pretty much what I did in the other structure.
You need to store and access your weights differently because of weight sharing.
That's pretty much it.
I checked out the GitHub repo, but I'm a bit confused about the ConvLayer and how to use it. The Mnist class in the repository has some commented-out code for a convolutional architecture. When I tried it, I only got a max success rate of 8.75%. Then I changed the error and activation functions to MSE and Sigmoid and could only get a success rate of 54%. Do you know why it turned out so poorly?
Well the thing about the conv. implementation is that it is rather slow.
I might have pushed the wrong code.
Could you tell me which main method you were running :) or post the code of the main method here?
I will check that now and see if I get the same result.
I will create a main method with working code. You will see your name attached.
Okay, so it might work now. Not sure if the push actually went through; it doesn't really work for some reason.
ConvLayer conv1;
// Input: 1 channel, 28x28 pixels (MNIST).
NetworkBuilder builder = new NetworkBuilder(1, 28, 28);
// Judging by the resulting dimensions: 12 feature maps, 5x5 kernel,
// stride 1, padding 2 -> output stays 28x28.
builder.addLayer(conv1 = new ConvLayer(12, 5, 1, 2)
        .biasRange(0, 0)
        .weightsRange(-2, 2)
        .setActivationFunction(new ReLU()));
builder.addLayer(new PoolingLayer(2)); // 28x28 -> 14x14
// 30 feature maps, 5x5 kernel, stride 1, no padding -> 10x10.
builder.addLayer(new ConvLayer(30, 5, 1, 0)
        .biasRange(0, 0)
        .weightsRange(-2, 2)
        .setActivationFunction(new ReLU()));
builder.addLayer(new PoolingLayer(2)); // 10x10 -> 5x5
// Flattens the 30x5x5 feature maps into one vector for the dense layers.
builder.addLayer(new TransformationLayer());
builder.addLayer(new DenseLayer(120)
        .setActivationFunction(new ReLU()));
builder.addLayer(new DenseLayer(10)
        .setActivationFunction(new Softmax()));
Network network = builder.buildNetwork();
network.setErrorFunction(new CrossEntropy());
network.printArchitecture();

TrainSet trainSet = createTrainSet(0, 999);
network.train(trainSet, 3, 10, 0.0001);
testTrainSet(network, trainSet, 1);
This takes 2-3 minutes and returns:
Testing finished, RESULT: 833 / 1000 -> 83.3 %
Huh, that's odd. I just tried that exact code in the Mnist class, and it took 30 seconds but only got 87 / 1000 -> 8.7%. I think the problem must then be my current implementation of CrossEntropy, Softmax, ConvLayer, or PoolingLayer. I got those classes from your GitHub repo, but you're right that the last commit happened on May 20, 2018. Thanks, btw, for helping me out :)
IntelliJ is pretty weird when it comes to console output; it sometimes eats a line or something. I always test with a standalone .jar artifact, so it works flawlessly.
PS: Your project is pretty nice; I will try to do something similar later.
Yeah, that's what I do most of the time as well :)
I guess it didn't print the output of the "load" method? I was really confused when making this video, so I had to skip about a minute (as can be seen somewhere in the video) to figure out how to load the network.
Weird, because there is obviously a System.out.println() line for that. Either you deleted it by accident, or IntelliJ is misbehaving.
Well I might have deleted it by accident :/