What was the motivation behind choosing functions like sigmoid and ReLU as activation functions?
The motivation for sigmoid in an ANN is the same as in Logistic Regression: it makes the log-odds of the output a linear function of the inputs (features) fed into that unit. ReLU is chosen for its piecewise linearity, since linear functions are easy to handle and fast to compute.
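A quick numerical check of the log-odds statement, as a minimal sketch in plain Python. The helper names (`sigmoid`, `log_odds`, `relu`, `neuron_probability`) and the weights `w`, `b` are made up for illustration and do not come from the lecture.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Log-odds (logit) of a probability p, the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

def relu(z):
    """ReLU: identity for positive inputs, zero otherwise (piecewise linear)."""
    return max(0.0, z)

# Hypothetical weights and bias for a single neuron with two features.
w, b = [0.8, -1.5], 0.3

def neuron_probability(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear pre-activation
    return sigmoid(z)

x = [2.0, 1.0]
z = sum(wi * xi for wi, xi in zip(w, x)) + b
p = neuron_probability(x)
# The log-odds of the sigmoid output recover the linear function w.x + b.
print(z, log_odds(p))
print(relu(-3.0), relu(2.5))
```

The two numbers on the first printed line should agree up to floating-point rounding, which is exactly the "log-odds is linear in the inputs" property.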
Is it possible that, for a particular dataset, the accuracy we get with ANNs (even after optimization) is lower than with Logistic Regression/SVMs? Or do ANNs always give better accuracy than Logistic Regression/SVMs?
In theory, an ANN can always be at least as good as any other ML algorithm, since by the universal approximation theorem a large enough network can approximate a continuous function to any desired accuracy. But in practice, it is quite possible for ANN accuracy to be lower than that of other algorithms, since training an ANN is not easy for all datasets. And always remember the "no free lunch theorem"!
To add to what @EvolutionaryIntelligence said: if enough data is available, ANNs are as powerful as any other method. But if your dataset is small, the ANN might often underperform compared to Logistic Regression. Thus, choosing the right algorithm is an art in itself.
I'd encourage you to look at this link:
scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Does implementing residual connections (which are usually used in CNNs) in fully connected neural networks help improve the performance of ANNs?
Residual connections (as in ResNet models) are used in Deep Learning (CNNs, RNNs, etc.) because a large number of hidden layers aggravates the vanishing gradient problem. In ordinary ANNs, I have not seen residual connections used, since these networks are usually not very deep.
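To see why depth aggravates vanishing gradients and how a skip connection helps, here is a toy sketch in plain Python. It assumes a chain of scalar sigmoid layers with no weights at all, which is a deliberate simplification and not how a real network is built; the function `input_gradient` is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never exceeds 0.25

def input_gradient(x, depth, residual):
    """Return d(output)/d(input) for a toy chain of `depth` scalar layers.

    Plain layer:    h <- sigmoid(h)
    Residual layer: h <- h + sigmoid(h)
    """
    h, grad = x, 1.0
    for _ in range(depth):
        local = sigmoid_grad(h)      # derivative of sigmoid at this layer's input
        if residual:
            grad *= 1.0 + local      # the identity path contributes the 1
            h = h + sigmoid(h)
        else:
            grad *= local            # each factor is at most 0.25
            h = sigmoid(h)
    return grad

plain = input_gradient(0.5, depth=20, residual=False)
skip = input_gradient(0.5, depth=20, residual=True)
print(plain, skip)
```

In the plain chain the gradient is a product of 20 factors that are each at most 0.25, so it collapses towards zero. In the residual chain every factor is at least 1, so the gradient cannot vanish in this toy setting. With only a few layers the plain product is still of usable size, which matches the remark that shallow networks rarely need skip connections.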
How do we determine the number of times we need to apply the backpropagation step?
Typically it's a good idea to iterate until your error (or testing accuracy) stabilises at some value.
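One way to turn "iterate until the error stabilises" into code is a tolerance-plus-patience rule. The sketch below is only an illustration: the function `train_until_stable`, its parameters `tol` and `patience`, and the toy linear-fit problem are all assumptions, and a plain gradient step stands in for backpropagation.

```python
def train_until_stable(step, evaluate, tol=1e-6, patience=5, max_epochs=10_000):
    """Run `step()` repeatedly until `evaluate()` stops improving.

    Stops once the error has failed to improve by more than `tol`
    for `patience` consecutive epochs, or after `max_epochs`.
    Returns the number of epochs actually run.
    """
    best = float("inf")
    stalled = 0
    for epoch in range(1, max_epochs + 1):
        step()
        err = evaluate()
        if best - err > tol:
            best, stalled = err, 0   # meaningful improvement: reset the counter
        else:
            stalled += 1
            if stalled >= patience:
                return epoch
    return max_epochs

# Toy problem: fit y = 2x + 1 with one weight and one bias.
train = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
held_out = [(x, 2.0 * x + 1.0) for x in (-0.5, 0.5, 1.5)]
params = {"w": 0.0, "b": 0.0}

def mse(data):
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in data) / len(data)

def gradient_step(lr=0.1):
    n = len(train)
    gw = sum(2 * (params["w"] * x + params["b"] - y) * x for x, y in train) / n
    gb = sum(2 * (params["w"] * x + params["b"] - y) for x, y in train) / n
    params["w"] -= lr * gw
    params["b"] -= lr * gb

epochs = train_until_stable(gradient_step, lambda: mse(held_out))
print(epochs, params)
```

The error is measured on held-out points rather than on the training points, so the loop stops when further updates no longer help on data the model has not been fitted to.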
Does increasing the number of hidden layers beyond a certain optimum decrease the accuracy of the algorithm? If so, can it happen that, once the accuracy has started to decrease, it suddenly starts increasing again as we keep adding hidden layers?
For a given classification problem, there is generally a certain optimum number of hidden layers, in the vicinity of which you get the best training and testing accuracy. With fewer hidden layers, your accuracy will most likely go down. With more layers, either your training accuracy will go down or you may start overfitting, depending on various factors.
Sir, does an ANN suffer from the overfitting problem because it almost completely approximates the function given by the training dataset?
Yes, an ANN can definitely overfit, which is usually handled with regularisation techniques such as dropout. However, the possibility of overfitting in an ANN is not as high as in Decision Trees.
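For readers who have not seen dropout, here is a minimal sketch of the common "inverted dropout" variant in plain Python. The function name `dropout` and its parameters are chosen for illustration; real frameworks provide their own layers for this.

```python
import random

def dropout(activations, drop_prob, training=True, rng=random):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed with probability `drop_prob`
    and the survivors are scaled by 1 / (1 - drop_prob), so the expected
    activation is unchanged. At inference the input passes through as-is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)            # fixed seed so the run is repeatable
hidden = [1.0] * 10_000
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
print(sum(dropped) / len(dropped))   # stays close to 1.0 on average
```

Because a different random subset of units is switched off on every training pass, the network cannot rely on any single unit, which is what makes dropout act as a regulariser.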
Are the number of hidden layers in an ANN and the number of nodes per hidden layer affected by the dataset we have, or are they independent of the dataset?
Of course they are affected by the dataset, and this choice is an important part of the design process. Theoretically, an ANN with a single hidden layer can approximate an arbitrary (continuous) function, but in practice such an ANN is very difficult to train, which is why we use multi-layer ANNs.
Does the universal approximation theorem also work for discontinuous functions?
A discontinuous function can be approximated by a steep continuous function. All that an ANN does is approximate.
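As a small illustration of the "steep continuous function" idea, the sketch below compares a unit step with a sigmoid whose slope is controlled by a factor `k`. The function names and the choice of grid and `margin` are assumptions made for this example.

```python
import math

def step(x):
    """Discontinuous target: jumps from 0 to 1 at x = 0."""
    return 1.0 if x >= 0 else 0.0

def steep_sigmoid(x, k):
    """Continuous stand-in for the step; larger k means a steeper transition."""
    z = k * x
    # Numerically stable form that avoids overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def max_error_away_from_jump(k, margin=0.05):
    """Largest |step - sigmoid| over a grid on [-1, 1], excluding |x| < margin."""
    xs = [i / 100.0 for i in range(-100, 101)]
    return max(abs(step(x) - steep_sigmoid(x, k)) for x in xs if abs(x) >= margin)

for k in (1, 10, 100, 1000):
    print(k, max_error_away_from_jump(k))
```

The error away from the jump shrinks as `k` grows. Exactly at the jump the sigmoid still outputs 0.5, so the approximation is good everywhere except in an arbitrarily narrow band around the discontinuity.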
It would be helpful if you could please implement this in Matlab, sir.
Matlab does have its own advantages, but I prefer Python for various reasons. Matlab is useful mainly if you wish to integrate your Machine Learning code with some of its toolboxes.