One of the issues with NNs is that there is no way to find out the relative significance of the input variables and how they affect the output response. At least I couldn't find a way when I tried using them a couple of years back. Secondly, there isn't a way to cluster similar cases together to analyze what rules bind different decisions together. Also, one cannot find the partial dependence of the output on a given input variable. All these problems are solved by the random forest classifier, which I have been working with and really like.
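For anyone who wants to try this, here is a minimal sketch of the first two diagnostics using scikit-learn's RandomForestClassifier. The library is just my choice for illustration (I was using Breiman's own code and the R port), and the proximity computation is my own rendering of Breiman's idea:

    # Sketch: variable importance and Breiman-style proximities.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # 1. Relative significance of each input variable.
    print(rf.feature_importances_)

    # 2. Proximities: two cases are similar when they land in the same
    #    leaves, which is what lets you cluster cases and inspect the
    #    decisions that bind them together.
    leaves = rf.apply(X[:200])        # (n_samples, n_trees) leaf ids
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=-1)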
Okay, I think I should accept that I am probably outdated or misinformed about the things one can do with neural nets. Re: random forests, I was referring to the Leo Breiman implementation on the UC Berkeley website. Andy Liaw of Merck also maintains the R port of Breiman's algorithm (http://cran.r-project.org/web/packages/randomForest/index.ht...).
I know neural nets have been used successfully for really compute-intensive applications like face recognition or vehicle autopilot, but for applications like poker, predicting car sales, or the data mining done by Walmart and the like, there has been increasing use of simpler regression-based and decision-tree-based approaches. There is an impression in the machine learning community that a NN is a black box, which is probably the reason for its falling popularity, if indeed it is falling.
It is true, though, that more powerful models are less explainable. But in return you get more compact modeling and training via gradient descent, which is far faster than the combinatorial optimization involved in ensembles of trees. (Trust me, I've implemented both.)
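For concreteness, here is a toy numpy sketch of the kind of gradient-descent loop I mean; all the names and shapes are my own illustration:

    # One-hidden-layer net trained by plain gradient descent on squared
    # loss. Each update is a cheap differentiable step, as opposed to
    # the discrete split-search needed to grow a tree ensemble.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 4))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)
    W1 = rng.normal(scale=0.5, size=(4, 16)); b1 = np.zeros(16)
    w2 = rng.normal(scale=0.5, size=16);      b2 = 0.0
    lr = 0.5

    for step in range(2000):
        h = np.tanh(X @ W1 + b1)          # hidden activations
        out = h @ w2 + b2                 # linear output
        err = (out - y) / len(y)          # grad of 0.5*mean(err^2) wrt out
        w2 -= lr * (h.T @ err); b2 -= lr * err.sum()
        g_h = np.outer(err, w2) * (1 - h ** 2)
        W1 -= lr * (X.T @ g_h); b1 -= lr * g_h.sum(axis=0)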
there isn't a way to cluster similar cases together to analyze what rules bind different decisions together
"PCA analysis of activations of the hidden layer. Could also do k-means clustering of activations of hidden layer."
I concur. However, for 2-D or 3-D visualization you should use the more recently developed t-SNE algorithm instead of PCA or other alternatives; t-SNE does a far better job of preserving local structure.
Software (in Matlab and Python) is available at:
http://ict.ewi.tudelft.nl/~lvandermaaten/t-SNE.html
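To make this concrete, here is a rough end-to-end sketch. The tooling (scikit-learn's MLPClassifier, KMeans, and TSNE) is my own choice; the page above has the reference Matlab/Python t-SNE implementations:

    # Extract hidden-layer activations from a trained net, cluster them
    # with k-means, and embed them in 2-D with t-SNE.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.neural_network import MLPClassifier
    from sklearn.cluster import KMeans
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)
    net = MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                        max_iter=500, random_state=0).fit(X, y)

    # Manual forward pass to the hidden layer (relu), since
    # MLPClassifier does not expose activations directly.
    H = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])

    clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(H)
    embedding = TSNE(n_components=2, random_state=0).fit_transform(H)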
Also, one cannot find the partial dependence of the output on a given input variable.
If your model assumes that the output is a non-linear combination of inputs, then yes, it is hard to express the output in terms of a linear decomposition. But that was your choice of modelling assumption, presumably because linear models are insufficiently powerful to fit the underlying variations.
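That said, partial dependence in Friedman's sense is model-agnostic: you can estimate it empirically for any black box, a neural net included, by clamping one input to a grid of values and averaging the model's predictions over the data. A rough sketch (the function and names are my own illustration):

    # Model-agnostic partial dependence of the output on feature j.
    import numpy as np

    def partial_dependence(predict, X, j, grid):
        """Average prediction as feature j is forced to each grid value."""
        out = []
        for v in grid:
            Xv = X.copy()
            Xv[:, j] = v              # clamp feature j for every case
            out.append(predict(Xv).mean())
        return np.array(out)

    # e.g. grid = np.linspace(X[:, j].min(), X[:, j].max(), 20)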
"The random forest classifier looks interesting, I'll have to investigate that. Any suggestions for papers/tutorials?"
Random forests were developed by Leo Breiman (RIP); his 2001 paper "Random Forests" in Machine Learning is the canonical place to start.
"Ensemble methods in machine learning" by Diettrich (2000) compares different tree ensemble methods. He concludes that boosting an ensemble of decision trees is better, except when the data are very noisy, in which randomized trees are better. (Boosting is when you focus on the examples that the model is currently doing the worst. Randomized instead works on random subsets of examples.) The main reason boosting is worse than randomized trees in the noisy case is because the AdaBoost exponential loss is sensitive to outliers. Which is to say, AdaBoost boosts the wrong loss function. Boosting an appropriate choice of loss function (perhaps a regularized log-loss) is probably superior to randomized trees in most circumstances.
"Improved boosting algorithms using confidence-rated predictions" by Schapire and Singer (2000) is a great introduction to boosting.
Around the same time, Llew Mason and Jerome Friedman independently demonstrated that boosting is essentially fitting an additive model by gradient descent in function space, at each round selecting the base learner that points down the steepest loss gradient. So you should follow up by looking at their work.
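The gradient view is easiest to see for squared loss, where the negative gradient is just the residual, so each round fits a new tree to whatever the current model gets wrong. A toy sketch of the idea (my own code, not anyone's reference implementation):

    # Gradient boosting as an additive model: F(x) = f0 + lr * sum h_m(x).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost(X, y, n_rounds=100, lr=0.1):
        f0 = y.mean()                          # constant initial model
        pred = np.full(len(y), f0)
        trees = []
        for _ in range(n_rounds):
            residual = y - pred                # -grad of 0.5*(y - F)^2
            h = DecisionTreeRegressor(max_depth=3).fit(X, residual)
            pred += lr * h.predict(X)          # small step down the gradient
            trees.append(h)
        return f0, trees

    def predict(f0, trees, X, lr=0.1):
        return f0 + lr * sum(t.predict(X) for t in trees)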
Your statements are all false and I can refute them later when I am not on my phone.
If you want to criticize neural nets as tricky for newcomers, with a steep learning curve, that is fair. But it is irresponsible criticism like yours that has given neural nets an undeserved bad reputation in the machine learning community.