One of the issues with NNs is that there is no way to find out the relative significance of the input variables and how they affect the output response. At least I couldn't find a way when I tried using them a couple of years back. Secondly, there isn't a way to cluster similar cases together to analyze what rules bind different decisions together. Also, one cannot find the partial dependence of the output on a given input variable. All these problems are solved by the random forest classifier, which I have been working with and really like.
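For anyone who wants to try this, here is a minimal sketch of the first two diagnostics using scikit-learn's RandomForestClassifier. The library is just my choice for illustration (I was using Breiman's own code and the R port), and the proximity computation is my own rendering of Breiman's idea:

    # Sketch: variable importance and Breiman-style proximities.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # 1. Relative significance of each input variable.
    print(rf.feature_importances_)

    # 2. Proximities: two cases are similar when they land in the same
    #    leaves, which is what lets you cluster cases and inspect the
    #    decisions that bind them together.
    leaves = rf.apply(X[:200])        # (n_samples, n_trees) leaf ids
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=-1)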
Okay, I think I should accept that I am probably outdated or misinformed about the things one can do with neural nets. Re: random forests, I was referring to the Leo Breiman implementation on the UC Berkeley website. Andy Liaw of Merck also maintains the R port of Breiman's algorithm (http://cran.r-project.org/web/packages/randomForest/index.ht...).
I know neural nets have been used successfully for really compute-intensive applications like face recognition or vehicle autopilot, but for applications like poker, predicting car sales, or the data mining done by Walmart and the like, there has been increasing use of simpler regression-based and decision-tree-based approaches. There is an impression in the machine learning community that a NN is a black box, which is probably the reason for its falling popularity, if indeed it is falling.
It is true, though, that more powerful models are less explainable. But in return you get more compact modeling and training via gradient descent, which is far faster than the combinatorial optimization involved in ensembles of trees. (Trust me, I've implemented both.)
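For concreteness, here is a toy numpy sketch of the kind of gradient-descent loop I mean; all the names and shapes are my own illustration:

    # One-hidden-layer net trained by plain gradient descent on squared
    # loss. Each update is a cheap differentiable step, as opposed to
    # the discrete split-search needed to grow a tree ensemble.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 4))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)
    W1 = rng.normal(scale=0.5, size=(4, 16)); b1 = np.zeros(16)
    w2 = rng.normal(scale=0.5, size=16);      b2 = 0.0
    lr = 0.5

    for step in range(2000):
        h = np.tanh(X @ W1 + b1)          # hidden activations
        out = h @ w2 + b2                 # linear output
        err = (out - y) / len(y)          # grad of 0.5*mean(err^2) wrt out
        w2 -= lr * (h.T @ err); b2 -= lr * err.sum()
        g_h = np.outer(err, w2) * (1 - h ** 2)
        W1 -= lr * (X.T @ g_h); b1 -= lr * g_h.sum(axis=0)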
there isn't a way to cluster similar cases together to analyze what rules bind different decisions together
"PCA analysis of activations of the hidden layer. Could also do k-means clustering of activations of hidden layer."
I concur. However, for 2-D or 3-D visualization you should use the more recently developed t-SNE algorithm instead of PCA or other alternatives; t-SNE does a far better job of preserving local structure.
Software (in Matlab and Python) is available at:
http://ict.ewi.tudelft.nl/~lvandermaaten/t-SNE.html
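To make this concrete, here is a rough end-to-end sketch. The tooling (scikit-learn's MLPClassifier, KMeans, and TSNE) is my own choice; the page above has the reference Matlab/Python t-SNE implementations:

    # Extract hidden-layer activations from a trained net, cluster them
    # with k-means, and embed them in 2-D with t-SNE.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.neural_network import MLPClassifier
    from sklearn.cluster import KMeans
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)
    net = MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                        max_iter=500, random_state=0).fit(X, y)

    # Manual forward pass to the hidden layer (relu), since
    # MLPClassifier does not expose activations directly.
    H = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])

    clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(H)
    embedding = TSNE(n_components=2, random_state=0).fit_transform(H)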
Also, one cannot find the partial dependence of the output on a given input variable.
If your model assumes that the output is a non-linear combination of inputs, then yes, it is hard to express the output in terms of a linear decomposition. But that was your choice of modelling assumption, presumably because linear models are insufficiently powerful to fit the underlying variations.
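That said, partial dependence in Friedman's sense is model-agnostic: you can estimate it empirically for any black box, a neural net included, by clamping one input to a grid of values and averaging the model's predictions over the data. A rough sketch (the function and names are my own illustration):

    # Model-agnostic partial dependence of the output on feature j.
    import numpy as np

    def partial_dependence(predict, X, j, grid):
        """Average prediction as feature j is forced to each grid value."""
        out = []
        for v in grid:
            Xv = X.copy()
            Xv[:, j] = v              # clamp feature j for every case
            out.append(predict(Xv).mean())
        return np.array(out)

    # e.g. grid = np.linspace(X[:, j].min(), X[:, j].max(), 20)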
"The random forest classifier looks interesting, I'll have to investigate that. Any suggestions for papers/tutorials?"
Random forests were developed by Leo Breiman (RIP); his 2001 paper "Random Forests" in Machine Learning is the canonical place to start.
"Ensemble methods in machine learning" by Diettrich (2000) compares different tree ensemble methods. He concludes that boosting an ensemble of decision trees is better, except when the data are very noisy, in which randomized trees are better. (Boosting is when you focus on the examples that the model is currently doing the worst. Randomized instead works on random subsets of examples.) The main reason boosting is worse than randomized trees in the noisy case is because the AdaBoost exponential loss is sensitive to outliers. Which is to say, AdaBoost boosts the wrong loss function. Boosting an appropriate choice of loss function (perhaps a regularized log-loss) is probably superior to randomized trees in most circumstances.
"Improved boosting algorithms using confidence-rated predictions" by Schapire and Singer (2000) is a great introduction to boosting.
Around the same time, Llew Mason and Jerome Friedman independently demonstrated that boosting is essentially fitting an additive model by gradient descent in function space, at each round selecting the base learner that points down the steepest loss gradient. So you should follow up by looking at their work.
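The gradient view is easiest to see for squared loss, where the negative gradient is just the residual, so each round fits a new tree to whatever the current model gets wrong. A toy sketch of the idea (my own code, not anyone's reference implementation):

    # Gradient boosting as an additive model: F(x) = f0 + lr * sum h_m(x).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost(X, y, n_rounds=100, lr=0.1):
        f0 = y.mean()                          # constant initial model
        pred = np.full(len(y), f0)
        trees = []
        for _ in range(n_rounds):
            residual = y - pred                # -grad of 0.5*(y - F)^2
            h = DecisionTreeRegressor(max_depth=3).fit(X, residual)
            pred += lr * h.predict(X)          # small step down the gradient
            trees.append(h)
        return f0, trees

    def predict(f0, trees, X, lr=0.1):
        return f0 + lr * sum(t.predict(X) for t in trees)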
Your statements are all false and I can refute them later when I am not on my phone.
If you want to criticize neural nets as tricky for newcomers, with a steep learning curve, that is fair. But it is irresponsible criticism like yours that has given neural nets an undeserved bad reputation in the machine learning community.