Thank you for the video. I have a question: why did you classify BMI? Why don't you use the BMI values as they are? I mean, can continuous values of the dependent variable be used in an ANN?
Thank you for your clear description of how we can use SPSS for ANN modeling and for determining the relative importance of the independent variables. But I have one question: can you explain more about importance and normalized importance, and how they relate to the percentage contribution of the independent variables?
While in an ANN it is not possible to interpret the synaptic weights in the same fashion as, for example, the coefficients in a regression analysis, the relative importance gives you an idea of which variables have the strongest or weakest effect on the dependent variable(s).
Amazingly explained. I did the analysis for my data, but there is something you don't explain that I am having trouble with: how do I predict future values? If I use the synaptic weights and new input data to predict a future value, the prediction is not correct. I am afraid it has something to do with the normalization the program applies in the architecture when rescaling the dependent variables, which I cannot remove. Could you explain that?
Assuming that you accounted for the fact that each model also carries an error (visible, for example, in the relative error), I would share your suspicion that it is due to the rescaling of the dependents. However, as I do not know the exact programming underlying the implementation, I cannot say for sure.
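A minimal sketch of where the rescaling enters when a prediction is computed by hand from the synaptic weights (the Parameter Estimates table). Assumptions, to be checked against your own model settings: one hidden layer with hyperbolic tangent activation, an identity output activation, and standardized rescaling of the covariates and of the scale dependent, using means and standard deviations from the training data. All numbers are hypothetical and factors are left out for brevity. Skipping step 1 or step 4 is enough to get predictions on the wrong scale.

```python
import math

def predict(inputs, cov_stats, hidden_w, hidden_b, out_w, out_b, dep_stats):
    """Forward pass of a one-hidden-layer MLP for a single scale dependent.

    inputs    : raw covariate values for one new case
    cov_stats : (mean, sd) per covariate, used to standardize the inputs
    hidden_w  : hidden_w[j][i] = weight from input i to hidden unit j
    hidden_b  : bias of each hidden unit
    out_w     : weight from each hidden unit to the output unit
    out_b     : bias of the output unit
    dep_stats : (mean, sd) of the dependent, used to undo its rescaling
    """
    # 1. Rescale the new inputs exactly as during training (standardization assumed).
    z = [(x - m) / s for x, (m, s) in zip(inputs, cov_stats)]
    # 2. Hidden layer: hyperbolic tangent of bias plus weighted sum (assumed activation).
    hidden = [math.tanh(b + sum(w * zi for w, zi in zip(row, z)))
              for row, b in zip(hidden_w, hidden_b)]
    # 3. Output layer: identity activation (assumed), still on the rescaled scale.
    y_scaled = out_b + sum(w * h for w, h in zip(out_w, hidden))
    # 4. Transform back to the original units of the dependent.
    dep_mean, dep_sd = dep_stats
    return y_scaled * dep_sd + dep_mean

# Hypothetical numbers for two covariates and two hidden units.
print(predict(inputs=[25.0, 170.0],
              cov_stats=[(30.0, 8.0), (172.0, 9.0)],
              hidden_w=[[0.4, -0.2], [0.1, 0.3]],
              hidden_b=[0.05, -0.1],
              out_w=[0.7, -0.5],
              out_b=0.2,
              dep_stats=(24.0, 3.5)))
```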
Good job. I want to ask: if your data are not split into a training and a testing set, can you still use neural network analysis? And can you use this kind of analysis for "cause and effect", or for relationships and associations?
In linear regression, the effect of each independent variable can be assessed (whether it has an effect, by looking at the significance value), and we can also determine whether the effect is positive or negative. Can the same be done in an ANN?
Very nice video. However, I have a question. Research papers conduct ten-fold cross-validation with a data partition of 90:10 (or 70:30) for training and testing. They use the root mean square error (RMSE) of each ANN run to assess the accuracy of the NN model. Then the mean and standard deviation of the RMSE values (over the n ANNs) are calculated. After that, a sensitivity analysis (relative importance) is calculated for each input, giving the average relative importance and the normalized relative importance (over the n ANNs). How are all these results calculated?
I'll try to answer your question as far as I understand it.
Regardless of which partitioning you use, 90:10 or 70:30 (which can be selected in the Partitions tab), SPSS draws a random sample of your observations, so the results, in particular what SPSS calls the sum of squares error, may vary depending on the selection. The sum of squares error is calculated by squaring the difference between the actual and the estimated dependent for every observation and summing all of these up. For the RMSE, you divide this sum of squares by the number of observations and take the square root, which gives you something like the average error. If you run your ANN n times, you get n RMSE values, from which you can calculate the mean and the standard deviation.
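A minimal sketch of the RMSE bookkeeping, assuming you saved the predicted values of each run (for instance via the Save tab) and can tell training and test cases apart, e.g. by assigning cases with your own partitioning variable. It computes the RMSE directly from actual and predicted values instead of from the sum of squares error in the SPSS output, since I am not certain how that reported value is scaled. The numbers are made up.

```python
import math
import statistics

def rmse(actual, predicted):
    """Root mean square error of one run: sqrt(sum of squared errors / n)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / len(actual))

def summarize_runs(runs):
    """runs: one (actual, predicted) pair per ANN run or fold, e.g. test cases only.
    Returns the per-run RMSE values plus their mean and sample standard deviation."""
    values = [rmse(actual, predicted) for actual, predicted in runs]
    return values, statistics.mean(values), statistics.stdev(values)

# Hypothetical test-set values from three runs.
runs = [
    ([3.1, 2.4, 5.0], [2.9, 2.8, 4.6]),
    ([1.2, 4.4, 3.3], [1.0, 4.9, 3.1]),
    ([2.2, 3.8, 4.1], [2.5, 3.5, 4.4]),
]
values, mean_rmse, sd_rmse = summarize_runs(runs)
print(values, mean_rmse, sd_rmse)
```

Running the same function once on the training cases and once on the test cases of each run gives the separate training and testing RMSE values that papers usually report.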
To check the stability of your results, you can calculate the RMSE in the same way but only for a subset of your runs (every x-th run or a random sample). This shows you whether some runs produce much higher errors.
For how the absolute importance values are calculated, you might want to check the SPSS Modeler Algorithms Guide and look up the implemented procedure under KNN Output statistics.
The normalized importance is derived from the absolute importance: the most important covariate is set to 100%, and each of the others is expressed as a percentage of it (its importance divided by the largest importance).
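To relate the two importance columns to a "percentage contribution", a small sketch: normalized importance divides each importance by the largest one, whereas dividing by the sum gives shares that add up to 100%. The importance values below are hypothetical.

```python
def normalized_importance(importance):
    """Express each importance as a percentage of the largest importance."""
    top = max(importance.values())
    return {name: 100.0 * value / top for name, value in importance.items()}

def percentage_share(importance):
    """Express each importance as a percentage of the total importance."""
    total = sum(importance.values())
    return {name: 100.0 * value / total for name, value in importance.items()}

# Hypothetical importance values, made up for illustration only.
importance = {"age": 0.40, "height": 0.35, "weight": 0.25}
print(normalized_importance(importance))
print(percentage_share(importance))
```

The normalized column answers "how important relative to the top predictor", the share answers "which part of the total importance"; only the latter adds up to 100%.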
Thank you very much. Is it possible to explain ANN in Minitab?
I have not yet considered doing tutorials on Minitab. If I get to it I will post a corresponding link here.
@jensk.perret6794 Thank you
Sir, for neural network analysis in SPSS, what should the threshold percentage or value be (variable importance table)? E.g. I want to go for SEM, and I want to include the items of my factor with a higher percentage to get the best model fit and remove the less important items from a factor to predict the output. So can the threshold percentage be 60%, 70% or above?
Maybe estimate all of the links of the SEM separately and decide on this basis whether some aspects might be excluded? Then again, the model should have a theoretical motivation behind it, so all variables, even insignificant ones, should remain in the model.
Thanks, unique content on RUclips!
Thank you for the video. It's educative. My question is: what's the idea behind dividing the variables into "training" and "testing"? Thank you.
There is always the problem of in-sample versus out-of-sample accuracy of models. Not every model with high explanatory power is also well suited for forecasting. Here the training data are used to estimate the weights of the network; the test data are not used in the training process. Once training is finished, the model is used to forecast both the training and the test data set. The error rate for the training set gives the in-sample error, while the error rate for the test set gives the out-of-sample error.
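A small sketch of the in-sample versus out-of-sample idea. To keep it short, a straight-line least-squares fit stands in for training the network (a deliberate swap, not what SPSS does); the data are synthetic. The model only sees the training cases, and its error is then measured on both parts.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; a simple stand-in for training the network."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def line_rmse(xs, ys, a, b):
    """RMSE of the fitted line on the given cases."""
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))

def train_test_errors(xs, ys, train_share=0.7, seed=1):
    """Randomly split the cases, fit on the training part only and return
    (in-sample RMSE on training cases, out-of-sample RMSE on test cases)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(train_share * len(idx))
    train, test = idx[:cut], idx[cut:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    in_sample = line_rmse([xs[i] for i in train], [ys[i] for i in train], a, b)
    out_of_sample = line_rmse([xs[i] for i in test], [ys[i] for i in test], a, b)
    return in_sample, out_of_sample

# Synthetic noisy data: y = 2x + 1 plus noise.
rng = random.Random(0)
xs = [float(x) for x in range(30)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
print(train_test_errors(xs, ys))
```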
Thank you for the video. Can you help me with one question: how can I find out which formulas the neural network in SPSS uses? For example, which functions are used in the calculations; I mean the mathematical explanation.
If you could rephrase your question maybe I could help you. Do you want to know which options SPSS offers or what the underlying math behind these approaches is?
@jensk.perret6794 Do you have an email address or something similar? I need this to finish my study, if you would help me.
Thanks for the lovely video. I want to ask how I can combine clustering with a neural network in SPSS. How do I cluster using k-means to filter the data and then use the filtered (training) data as input for the neural network?
As I understand your question, you want to assign each observation to a particular cluster in a first step and use this cluster membership as an input in the ANN part? If this is the case, just click Save in the k-means dialog and select Cluster membership. This will generate a new variable containing the cluster memberships.
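A minimal sketch of the two steps, outside SPSS: a plain k-means (Lloyd's algorithm) that returns a cluster number per case, followed by indicator coding, which is roughly how a categorical cluster-membership variable enters a network as a factor. The starting centers (simply the first k cases) and the data are assumptions for illustration; SPSS chooses its own initial centers.

```python
def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))

def kmeans_membership(points, k, iterations=20):
    """Lloyd's k-means; returns a cluster number 1..k for every point.
    Initial centers are the first k points (an assumption for this sketch)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [lab + 1 for lab in labels]

def one_hot(labels, k):
    """Indicator coding of cluster membership (one column per cluster)."""
    return [[1 if lab == c else 0 for c in range(1, k + 1)] for lab in labels]

# Hypothetical two-variable data with two obvious groups.
data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
clusters = kmeans_membership(data, k=2)
print(clusters)
print(one_hot(clusters, k=2))
```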
Thank you Sir for sharing ANN in SPSS. Appreciated!
Could you please explain how I can calculate the R square for the ANN model, so that I can compare it with the R square of my PLS-SEM?
Simply put, you cannot. What you could do is use either a suitable pseudo R square, or 1 - relative error, or 1 - squared relative error as an alternative measure of model fit.
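A minimal sketch of "1 - relative error" computed from observed and predicted values of one scale dependent. It assumes the relative error is the sum of squares error of the model divided by that of a null model predicting the mean of the dependent for every case; please check this definition against the SPSS documentation for your version. The data are hypothetical.

```python
def relative_error(actual, predicted):
    """Sum of squares error of the model divided by the sum of squares error of a
    null model that predicts the mean of the dependent for every case (assumed definition)."""
    mean = sum(actual) / len(actual)
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_null = sum((a - mean) ** 2 for a in actual)
    return sse_model / sse_null

def one_minus_relative_error(actual, predicted):
    """Fit measure: 1 means perfect fit, 0 means no better than predicting the mean."""
    return 1.0 - relative_error(actual, predicted)

# Hypothetical observed and predicted values of one scale dependent.
observed = [4.0, 6.0, 5.0, 9.0, 7.0]
predicted = [4.5, 5.5, 5.0, 8.0, 7.5]
print(one_minus_relative_error(observed, predicted))
```

Unlike a regression R square, this value can become negative if the network does worse than the mean, and no significance test comes with it.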
@jensk.perret6794 Thanks 🙏 Sir, could you please make a video on this?
@muhammadrizqisiregar6202 R squared is defined as the share of the variance of the dependent variable that is explained by the independent variables. The first problem is that in the context of an ANN we do not work with variances, and the goal is not the reduction of variance. Second, the ANN approach allows for multiple dependent variables, which makes it even more problematic to work with a single R squared statistic.
Does this mean that it is impossible to calculate an R squared statistic according to the above definition? Not necessarily. For each dependent variable you can calculate the variance and check which share of it is explained by the outputs generated by the ANN. This is, however, not recommended, since it mixes two different approaches. Additionally, you will not be able to use established thresholds for effect sizes or run tests based on it, like an F-test. Thus, you will get a nice statistic whose explanatory content is questionable, which is why I would rather advise against it.
Thank you for the wonderful video. I've read in many papers about calculating the R-squared statistic.
The formula for calculating it is: R² = 1 - RMSE/S², where S² is the variance of the test data's desired output (as in this paper: 10.1038/s41598-022-24532-8).
But I really don't know how to calculate it. I would appreciate your help, thank you.
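For the quoted formula to be unitless, the numerator has to be in squared units like S², i.e. the RMSE has to be squared (equivalently, the MSE is used). A short derivation, assuming S² is computed with denominator n over the same test cases as the error, where y_i are the observed test outputs, ŷ_i the network's predictions and ȳ their mean:

```latex
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
S^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2, \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
     = 1 - \frac{n\,\mathrm{MSE}}{n\,S^2}
     = 1 - \frac{\mathrm{MSE}}{S^2}
     = 1 - \frac{\mathrm{RMSE}^2}{S^2}.
\end{aligned}
```

If S² is instead the sample variance with n − 1 in the denominator, the ratio MSE/S² is multiplied by (n − 1)/n, so the result differs slightly.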
Good explanation, carefully explained. Thanks!
Please, could you help me with data preparation for a general regression neural network and its analysis using SPSS?
Can you provide the data set?
Is it possible to use the same trained network to predict more outputs? I mean, I'm running an MLP network and saving the predicted results obtained from my input variables, one of which is the year 2018. Then I need to use the same network with the same variables but change the year variable to 2019, 2020, ... and so on, to predict the behaviour over time. If I rerun the network every time I change the year I need to predict, the network changes a bit each time. What I need is to save the weights and use them with the new variables. Is this possible in SPSS?
Depending on how I interpret your question, I could give you two answers:
1. You could use the export menu to export the estimated weights, which can later be used to apply the model to other datasets.
2. To my knowledge, at least in SPSS 23, it is not possible to export network weights and later re-import them if you have additional data points and want the network to incorporate the new information. In that case you might have to rerun it on the whole new dataset.
@jensk.perret6794 Thank you for your answer. I've been searching, but as you said, I think in SPSS it is not possible to export the estimated weights and then use them to apply the model to other datasets. In Matlab you can: you save your network, give another dataset to the network and get the predictions; but here it seems not to be possible.
Excuse me, why didn't you put the variables in the factors? And what is the main difference between factors and covariates here? Thank you so much!
It depends on the scale level: categorical variables go into Factors, while scale (continuous) variables go into Covariates. If you are looking for a more in-depth yet understandable description, check out the description by IBM: www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/neural_network/idh_idd_mlp_variables.html
Can you write more clearly how we can interpret the results? Or is there a note or comment on this? I do not know much English. Thank you so much.
If you tell me which part of the results you mean, I could elaborate on it.
@jensk.perret6794 Thanks. How do we interpret the NETWORK INFORMATION part: the parameter estimates, the hidden layer and the output layer? I'm using a translator to try to understand, so please excuse my English.
@ess9938 The general problem with ANNs is that, compared to a (linear) regression approach, it is very hard, if not impossible, to interpret the neural weights in any comparable fashion, since all links in an ANN are inherently non-linear. Thus, the interpretation would always depend on the starting solution.
Personally, I would only use the importance list to get a feeling for which of the inputs has the highest and the lowest impact. If you take the estimated ANN and simulate different inputs (Monte Carlo simulation), you might get an idea of how the output depends on the input: either in general, if all variables are randomized, or ceteris paribus, if you randomize only one variable and keep the others constant.
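A minimal sketch of such a simulation. It assumes you have a function that returns the network's prediction for an input vector; `toy_network` below is a hypothetical stand-in, not an SPSS model, and drawing inputs uniformly within a range is an assumption (any plausible distribution could be used). Randomizing one input gives the ceteris paribus view, randomizing all gives the general view.

```python
import math
import random
import statistics

def simulate(model, baseline, ranges, draws=1000, seed=0):
    """Monte Carlo simulation on a fitted model.

    baseline : input vector at which the non-randomized inputs are held
    ranges   : {input index: (low, high)}; only these inputs are drawn uniformly
    Returns mean, minimum and maximum of the simulated outputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(draws):
        x = list(baseline)
        for i, (low, high) in ranges.items():
            x[i] = rng.uniform(low, high)
        outputs.append(model(x))
    return statistics.mean(outputs), min(outputs), max(outputs)

# Hypothetical stand-in for the trained network's prediction function.
def toy_network(x):
    return math.tanh(0.8 * x[0] - 0.3 * x[1]) + 0.1 * x[2]

baseline = [0.0, 0.0, 0.0]
# Ceteris paribus: randomize only input 0.
print(simulate(toy_network, baseline, {0: (-2.0, 2.0)}))
# In general: randomize all inputs.
print(simulate(toy_network, baseline, {0: (-2.0, 2.0), 1: (-2.0, 2.0), 2: (-2.0, 2.0)}))
```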
Thank you so much for the very useful tutorial!!
Sir, I have one query. All of this is fine, and I got the output by following the procedure you described. But my doubt is that in the paper they report testing and training results under the heading "RMSE value of an artificial neural network". I do not understand how to get those testing and training values using the SPSS multilayer perceptron method.
I think your question goes in the same direction as the one by Walid. In my reply there I explain how one can get the RMSE values. Let me know if this helps.
@jensk.perret6794 Thank you, sir, for replying to my query. Do you have a video on the procedure you described for finding the testing and training values reported under the heading "RMSE value of an artificial neural network"? It is my humble request that you share the link if you have such a video. It would be a great help for writing my paper. I got stuck on this, sir.
Dear Mr. Rishabh Shekhar, could you please tell me how you solved the problem? I have exactly the same problem as you. It would be a great help for me. Please help: zapan@cu.ac.bd
RISHABH SHEKHAR please help
Jens K. Perret please help
Hi, I have a question regarding the analysis of time series with MLP artificial neural networks (note: the data come from a sales chain). How is this done in this program? Please help with that.
To my knowledge, the standard implementation of SPSS does not per se allow using an MLP approach to analyze time series data. Maybe the SPSS Modeler would help.
That said, you can take a roundabout way to use an ANN, at least in part, for your aim by generating variables containing, for example, lags of the original data series, which creates an AR component. In addition, you might use the variable containing the time ID as a kind of trend component.
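A minimal sketch of that roundabout way: turning one series into an input table with p lagged values (AR component) and a time index (trend) per row, with the current value as the target. In SPSS itself such columns can be created with lag transformations (e.g. the Lag function under Transform > Create Time Series) before entering them as covariates. The first p periods have no complete lags and drop out. The sales figures are hypothetical.

```python
def lagged_design(series, p):
    """Turn one series into an input table for an MLP.

    Row for period t: [t, y[t-1], ..., y[t-p]]; target: y[t].
    The lagged values act as an AR component, t as a simple trend."""
    rows, targets = [], []
    for t in range(p, len(series)):
        rows.append([t] + [series[t - k] for k in range(1, p + 1)])
        targets.append(series[t])
    return rows, targets

# Hypothetical monthly sales figures.
sales = [120, 132, 128, 140, 151, 149, 160]
rows, targets = lagged_design(sales, p=2)
for r, y in zip(rows, targets):
    print(r, "->", y)
```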
@jensk.perret6794 Hi Professor, thank you for the answer. I am using the IBM SPSS software, which offers the artificial neural network (MLP) method for prediction. I hope you can explain the prediction of a time series in the way described above in a short video. If you wish, I will send you a time series. Greetings to you.
Hi Sir. Thank you for the explanation. Do you have an SPSS Modeler 18.4 version of this tutorial? I am trying to implement an ANN for my non-linear regression model. I hope you can advise. Thank you.
This video is just meant to illustrate the possibilities of the ANN implementation in the normal SPSS Statistics; the Modeler would allow for more sophisticated approaches.
Thank you for the video
Wonderful video
What's the meaning of the layers in this case? Thank you.
I will answer both of your questions in this reply. A central idea behind using an ANN is to provide forecasts. When building models for forecasting, the model is usually constructed based on some kind of sample. The corresponding model fit is the in-sample quality. If we want to make a forecast, we apply the model to previously unknown data, so the same quality is no longer assured. To get a better idea of the quality of the forecasts, the out-of-sample quality, we divide our data set into two subsets. The first (training) set is used to estimate the model parameters and reports the in-sample quality. The second (test) set is used to estimate the forecast quality, i.e. the out-of-sample quality.
As in a regression approach, the layers are first the inputs and the outputs. Unlike regression, however, we also have in-between layers, the hidden layers, which are latent variables. If you are familiar with structural equation models, you might compare it a bit to these types of models.
What is the difference between feature and covariate?
Thank you for the nice video
Can you add Turkish subtitles?
Sir, I have the cases and deaths of a disease for India, state-wise. Can I apply a neural network?
I suspect that you have panel or at least time series data. Thus, I would recommend a time series approach, in particular since you did not mention any additional independent variables. If you nevertheless want to go with an ANN, you could try to make it work with time-lagged variables. If you decide on time series analysis in SPSS, you might like to check out: ruclips.net/video/GE5g2fiEvA8/видео.html
Sir... I have used an ARIMA model for forecasting. So can't I use a neural network for forecasting too? I thought of comparing the models.
@ranjanibhat4417 As I mentioned in my earlier answer, the problem is that in the standard basic SPSS version the ANN, as a multilayer perceptron, requires an input and an output. While you can mirror the AR part, and with some preprocessing even the I part, the MA part would be rather problematic to mirror in SPSS' ANN implementation.
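A sketch of the preprocessing meant by "the I part": the network is trained on the differenced series (or on lags of it), and its forecasts of differences are cumulated back into levels. The MA part is harder because MA terms are lagged forecast errors, which are not observed inputs before a model has been estimated. The series is hypothetical.

```python
def difference(series):
    """First differences d[t] = y[t] - y[t-1]; the 'I' step of an ARIMA(p, 1, q) model."""
    return [b - a for a, b in zip(series, series[1:])]

def undifference(last_level, predicted_diffs):
    """Cumulate predicted differences back into levels, starting at the last observed level."""
    levels, level = [], last_level
    for d in predicted_diffs:
        level += d
        levels.append(level)
    return levels

# Hypothetical series: model the differences, then convert forecasts back to levels.
series = [100, 104, 109, 115]
print(difference(series))
print(undifference(series[-1], [6, 7]))
```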
If your main goal is to compare ARIMA with an ANN and you're not limited to the basic version of SPSS, you might take a look at the paper by Buhari and Adamu (2012), "Short-Term Load Forecasting Using ANN". They do something comparable using Matlab, but it should also be realizable in the SPSS Modeler.
If you are just looking for a different or better approach to forecasting in the basic SPSS version, you might try the Temporal Causal Models under Forecasting.
How to predict future values?
How to use the model, Sir?
I might answer your question if you elaborate what you mean by "use the model".
@jensk.perret6794 I'm sorry, I mean: how do I use the model to predict the output for data samples outside the training/test set?
@jensk.perret6794 I'm trying to implement an ANN for QSAR (quantitative structure-activity relationship) to predict drug activities with physicochemical properties as predictors. My problem right now is that I can't find the model in the SPSS output, so I don't know how to use the model resulting from the ANN analysis.
@idabagusalitraisugiharta5226 I suspect you are asking where you can find something comparable to the coefficients from a regression analysis. In this case the only candidate might be the synaptic weights. They, however, cannot be interpreted as easily as regression coefficients; in fact, it is almost impossible to make sense of them. If you want to know how to use the model for other predictions, take a look at the other comments, where the problems of doing this are already discussed.
@jensk.perret6794 Do you mean using a different approach?