underrated channel, very good content
It is so rewarding to watch your videos. They can be challenging but following along is so worth it for your profound insights!
This guy is brilliant!!
The core assumption here is that every variable is known to us. As they say, we don't know what we don't know. So although the maths can add up for all the known variables, how does one account for the unknown ones?
I think this is why the brain has so many neurons, why it constantly changes and grows, and why it takes about 5 years for a human to be able to interact correctly with the world.
I mean, a human doesn't even use their hands properly until about 2-3 years... The immense computation the brain needs to interact with the world exists because of the immense number of variables present in the objective universe.
Thank you. Although I'm quite far removed from anything related to statistics, your material appears to offer some deep insights into fundamental questions about who we are. Quite captivating.
Oh hell yeaaah, another brainxyz video! Thanks for another amazing video!
Your videos always intrigue me. Really enjoyed this one too.
Amazing content, so simple and beautiful!! Keep up the amazing work!! I was going through the GitHub tutorial and wanted to ask whether there is a way to use this method for logistic regression as well. Could you point to the best way to solve this? I used the implementation from the Boston dataset; to turn it into logistic regression I just rounded off the output predictions, so since the problem was binary classification it would output 0 or 1. Is this approach all right, or are there better ways to turn it into logistic regression, and would this method still be appropriate for a multiclass classification problem?
I've been working on a thought experiment that's like a textured blanket in a landscape you can't really see. Imagine being in a dark room where the blanket is the only thing you can see. You can place it over objects, and it will show their unique shapes. As you do this, the blanket gets "stains" and "creases," recording information and becoming a tool that shows overlaps of information.
Let's think of a cheetah. The image of the cheetah in your mind is based on previous observations, like a part of your mental "blanket." You can compare that part with other parts of your "blanket," like house cats, to create new connections between them.
This idea helps me understand complex subjects, and I've been turning it into a digital version. I create complex networks of images and use a method that spreads dissimilar energy, or pixel value, between similar images. If you have two images of a field with different subjects, the combined image will show the same field with both subjects. Without input data you get some pretty cool multidimensional crystal-like fractals, but update one of the images in the network with a live video feed and the data propagates in a way that highlights and saves physical changes in temporal data distributed throughout the network.
This is a very interesting thought experiment with a nice connection to a practical approach to classification in machine learning. Consider the setting you described: the blanket is a landscape that takes certain shapes depending on the object underneath. This landscape can be described by a parametric model. If you have two models with the same number of parameters, each modeling a different landscape, you can compare the similarity of the two landscapes by measuring their distance in the space of parameters.
THE GOAT RETURNS
How are you so underrated? Oh well, I guess hidden gems aren't so bad.
Welcome back, Mr. Hunar ❤
Is it accurate to say that the RLS formula w = sum(y*x) / sum(x*x) is the covariance(x, y) divided by the variance(x), i.e. the regression coefficient formula? Could you please clarify this? Computing the covariance also involves computing the mean of x and dividing by the number of observations, i.e. cov(x, y) = sum((x - xbar) * (y - ybar)) / num_samples, and this was not carried out in the video.
Can you explain why this was not done, and how it is still able to yield the gradient values? I was wondering whether both processes are the same and what their differences are.
Your help would be highly valued here.
Yes, these methods are related. The reason I didn't remove the mean is that the simulated data were already centered around zero, i.e. their mean was zero.
As for your other comment regarding logistic regression and multiclass classification, we have added more examples to the GitHub repo. Hope that helps.
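For anyone following along, the equivalence is easy to check numerically: on zero-mean data, sum(y*x)/sum(x*x) and cov(x, y)/var(x) agree (up to the sampling error of the mean), because the centering terms vanish. A minimal sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                   # zero-mean by construction
y = 3.0 * x + 0.1 * rng.standard_normal(1000)   # true slope = 3.0

w_rls = np.sum(y * x) / np.sum(x * x)           # the formula from the video
w_cov = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # cov(x, y) / var(x)

print(w_rls, w_cov)  # nearly identical, both close to 3.0
```

With data that is not centered, the two quantities diverge, which is why real datasets should have their means removed first.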
Very interesting video. The method you outlined is very cool and I haven't seen it before. I would love to see what efforts have been made to apply it to traditional machine learning tasks, or ways to help solve the non-linearity problem.
I've created a brain model already which remembers changes in its memory. You can find it in my videos
Interesting ideas! I understand maybe half of it, but it's still mind-blowing to me.
Hi, I tried your RLS_Neural_Network code, and while checking your RLS algorithm I noticed that the way you present the gain vector is a little different from the original, which has a regularization part: the original is c = 1/(1 + hph), whereas yours is c = 1/hph. However, after I tried the original one the results were off... the output weights were significantly shrunk, e.g. from 300 down to 0.1. Do you know why this happened? Appreciate your video!
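For reference, the textbook RLS update with the "1 +" term in the gain can be sketched as below; the variable names and the initial value of P are illustrative, not taken from the repo. The "1 +" corresponds to an assumed unit observation-noise variance, and the initial P acts as a ridge-like prior: if P is initialized small, the implicit penalty is strong and the weights get shrunk toward zero, which may explain the shrinkage you observed.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 2000, 3
X = rng.standard_normal((N, M))
true_w = np.array([1.5, -2.0, 0.7])
y = X @ true_w + 0.1 * rng.standard_normal(N)

w = np.zeros(M)
P = np.eye(M) * 1e3                        # large initial P = weak prior
for x_t, y_t in zip(X, y):
    Px = P @ x_t
    c = 1.0 / (1.0 + x_t @ Px)             # textbook gain: note the "1 +"
    w = w + c * (y_t - x_t @ w) * Px       # move weights toward the new sample
    P = P - c * np.outer(Px, Px)           # shrink the covariance estimate

print(w)  # close to true_w
```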
How would you compare this method to liquid neural nets?
I think we could have a chance at beating Galactus if we and Alpha Centauri already have a protocol in place that could be put into motion as soon as the threat is detected. I suggest we start investing in Gundam-level armies, just in case.
Hi there, really like the videos! I was working to implement your multiple regression solution for the Boston dataset but kept running into issues, as the model was very poor. Is the implementation in the video able to solve the Boston housing dataset?
Hi, I have added a "Boston housing" example to the tutorial folder of the source code. This method has similar performance to multiple regression using matrix inversion (pinv). In any case, working with real-world data requires some necessary pre-processing before doing any regression analysis. This includes centering your dataset, adding a bias term, removing skews & outliers, etc.
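A minimal sketch of that pre-processing on made-up data (the column scales and offsets here are hypothetical): center and scale each feature, append a bias column, then fit.

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical raw data with mixed scales and offsets per column
X = rng.standard_normal((200, 3)) * [10.0, 0.1, 5.0] + [50.0, 2.0, -7.0]
y = X @ np.array([1.0, -3.0, 0.5]) + 4.0 + rng.standard_normal(200)

Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # center and scale each feature
Xb = np.hstack([Xc, np.ones((200, 1))])     # bias column absorbs the offsets
w = np.linalg.lstsq(Xb, y, rcond=None)[0]

mse = np.mean((Xb @ w - y) ** 2)
print(mse)  # close to the noise variance (1.0)
```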
@@brainxyz This algo is so elegant, omg!!! I just have one more question: why do you walk back in the section for i in range(M-1, -1, -1)? I tried using for i in range(M-1) but the error is much larger. Any reason why this happens (why does the order matter at all)?
Also, why is it necessary to update y iteratively again in the last for loop, in the section y -= wy[i] * xs[i]? Since our x[i] values are already orthogonal, they shouldn't really interfere with each other, so subtracting the x feature components from y shouldn't really change anything, right??? I am finding it a little challenging to work out answers for these code sections.
If you can, please focus more on your explanation of the first question. But I really appreciate your work and contribution. Such an interesting viewpoint you have, keep it up, please! 👏
@@BB-ko3fh
Not all the xs become orthogonal by the end of the first loops; only the last one, x[-1], becomes orthogonal to all the other variables, hence you walk back using a reverse loop starting from the last x. The variable before it, x[-2], is still not totally orthogonal because the effect of x[-1] has not been removed from it. That is why we subtract the effect of the last x from y and do that recursively for all the other variables 6:48
It's OK if you find it challenging; I needed many days to wrap my head around it and make the algorithm work. The concept is easy, but applying it recursively and correctly to all the variables is the brain-twisting part.
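To make the walk-back concrete, here is a small numpy sketch of the same idea (not the repo's exact code; this version fully orthogonalizes the variables with a forward Gram-Schmidt pass for simplicity). The reverse pass is still required: the coefficient read off an orthogonalized variable is only correct once the effects of all later original variables have been removed from the residual, which is exactly why the loop runs backwards and updates y (here r) at every step.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 500, 4
X = rng.standard_normal((N, M))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.01 * rng.standard_normal(N)

# forward pass: successive orthogonalization (Gram-Schmidt);
# by the end, xs[-1] is orthogonal to every earlier variable
xs = [X[:, i].copy() for i in range(M)]
for i in range(M):
    for j in range(i + 1, M):
        xs[j] -= (xs[j] @ xs[i]) / (xs[i] @ xs[i]) * xs[i]

# reverse pass: estimate each coefficient, then remove that variable's
# effect from the residual before moving to the previous coefficient
w = np.zeros(M)
r = y.copy()
for i in range(M - 1, -1, -1):
    w[i] = (r @ xs[i]) / (xs[i] @ xs[i])
    r -= w[i] * X[:, i]      # subtract using the ORIGINAL variable

print(w)  # matches the ordinary least-squares solution
```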
@@brainxyz Thanks for clarifying further, it makes much more sense now. By the end of the first loop, you orthogonalized only the last variable, not all the features!
@@brainxyz This video also reminded me of this YouTube video: ruclips.net/video/qpoRO378qRY/видео.html . It's Geoffrey Hinton talking about deep learning, where he states that backpropagation is not really how the brain works and that we should find better, more efficient ways to train models; your approach to it made so much sense. Just thought I'd share it in case you were not aware. It would be a huge breakthrough if you could find a way to make this method map to non-linear functions.
I would like to volunteer to assist in your projects. I have access to human capital and building facilities. Please let me know. This has the potential to cause a paradigm shift that sets us on a path to a new level of civilization.
Thank you, sir🎉
Hunar, your very insightful ideas and conjectures deserve a better mic quality than this!
yes
This is why I learned coding 😅
Doctor, could you please also make videos in Kurdish?
Thanks for the comment.
I will try to prepare new videos in Kurdish as well.
well done!
I don't remember seeing this yesterday
O_O
So basically, we need to develop Psychohistory to defeat Galactus. I'm all for that!
:D
Is there a simple implementation of multiclass logistic regression? I was going through the RLS text prediction and it is a little hands-on; is it possible to illustrate this with a simpler dataset? @brainxyz
Like which dataset?
you can open an issue on the Github repo and give more details about the dataset or a link to the dataset so we can add more info about how to solve it.
@@brainxyz It is the iris dataset. It has three labels to categorise (hence it's not binary classification), so I presume the method will be the same whether it is for 3 labels or 10.
@@BB-ko3fh
I have added the iris dataset example to the tutorial folder of the GitHub repo.
Interestingly, for this dataset a simple threshold method worked better than converting the ys to multiple one-hot classes.
Hope that helps.
@@brainxyz Just had a look at the dataset. Is there any way to incorporate the softmax function into the logistic regression? Also, why is the accuracy so drastically different between the two approaches? Is this difference dataset-dependent, or is the first approach generally poorer than the second (where does the bias stem from)? And is there any way of turning the second (much shorter) approach into one that captures each of the predicted classes while maintaining its accuracy?
@@brainxyz any explanation would be appreciated
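One common way to get multiclass predictions out of a plain least-squares solver (applicable to the iris case too) is one-hot regression plus argmax; a true softmax logistic regression has no closed form and needs an iterative solver such as gradient descent or IRLS. A minimal sketch on made-up blob data (nothing here is taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(4)
# hypothetical 3-class toy data: Gaussian blobs around separate centers
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + rng.standard_normal((50, 2)) for c in centers])
labels = np.repeat([0, 1, 2], 50)

Y = np.eye(3)[labels]                          # one-hot targets, one column per class
Xb = np.hstack([X, np.ones((150, 1))])         # add a bias term
W = np.linalg.lstsq(Xb, Y, rcond=None)[0]      # one least-squares fit per class

pred = np.argmax(Xb @ W, axis=1)               # predicted class = largest score
acc = np.mean(pred == labels)
print(acc)
```

The argmax step is what makes this behave like a classifier even though each column is just a linear regression; how well it matches a softmax model is dataset-dependent.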
O_O