Hm. So that helps with high-frequency noise. Any progress on what to do when the...

gabrielgoh · on April 4, 2017

Author here - I believe the problem of a "stiff system" you're referring to is exactly the problem of pathological curvature!

Some points not touched on in the article: if the individual dimensions are on different scales, this problem can be easily fixed with a diagonal preconditioner. Even something like Adam or Adagrad (unconventional in this domain, I know) can be used.
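To make the diagonal preconditioner point concrete, here is a minimal sketch on a toy quadratic. The matrix `A`, the vector `b`, the step sizes, and the helper names `gradient` and `descend` are all assumptions chosen for illustration, not anything from the article. It compares plain gradient descent against the same iteration with each gradient entry divided by the matching diagonal entry of `A` (a Jacobi preconditioner).

```python
def gradient(A, b, x):
    # Gradient of f(x) = 0.5 * x^T A x - b^T x, which is A x - b.
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]


def descend(A, b, step, scale, tol=1e-8, max_iter=100_000):
    # Gradient descent where coordinate i of the gradient is multiplied by scale[i].
    x = [0.0] * len(b)
    for k in range(max_iter):
        g = gradient(A, b, x)
        if max(abs(gi) for gi in g) < tol:
            return x, k
        x = [xi - step * si * gi for xi, si, gi in zip(x, scale, g)]
    return x, max_iter


# Toy problem (assumed for illustration): badly scaled, minimizer is [1, 1].
A = [[1000.0, 10.0], [10.0, 1.0]]
b = [1010.0, 11.0]

# Plain gradient descent: a step of 1/trace(A) is below 1/lambda_max, so it is stable.
plain_x, plain_iters = descend(A, b, step=1.0 / 1001.0, scale=[1.0, 1.0])

# Diagonal (Jacobi) preconditioner: divide each gradient entry by A[i][i].
jacobi = [1.0 / A[i][i] for i in range(len(b))]
pre_x, pre_iters = descend(A, b, step=0.5, scale=jacobi)

print("plain:", plain_iters, "iterations")
print("preconditioned:", pre_iters, "iterations")
```

On this example the rescaled iteration should need far fewer steps, because dividing by the diagonal brings the two coordinates onto a comparable scale. Adam and Adagrad do something loosely similar by estimating a per-coordinate scale from gradient history rather than reading it off the matrix.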

There's also a small industry around more sophisticated preconditioners for the linear systems that arise in PDEs; see multigrid, for example, or preconditioned conjugate gradient.
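For reference, a rough sketch of preconditioned conjugate gradient with the simplest possible preconditioner (the diagonal of the matrix). The test matrix is an assumed toy: a 1D Laplacian-style tridiagonal stencil with rows and columns rescaled by very different factors. The function names `matvec`, `dot`, and `pcg` are hypothetical helpers, and a real PDE solver would use sparse storage and a stronger preconditioner such as multigrid.

```python
import math


def matvec(A, v):
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(A, b, tol=1e-12, max_iter=1000):
    # Conjugate gradient for symmetric positive definite A,
    # preconditioned by M = diag(A) (Jacobi).
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]   # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    threshold = tol * math.sqrt(rz)              # relative, in the M^{-1} norm
    for k in range(max_iter):
        if math.sqrt(rz) <= threshold:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Assumed toy system: tridiag(-1, 2, -1) scaled by s on both sides (still SPD).
s = [1.0, 10.0, 100.0, 1000.0]
T = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(4)]
     for i in range(4)]
A = [[s[i] * T[i][j] * s[j] for j in range(4)] for i in range(4)]
b = matvec(A, [1.0, 1.0, 1.0, 1.0])              # so the exact solution is all ones

x, iters = pcg(A, b)
print("solution:", x)
print("iterations:", iters)
```

The diagonal preconditioner undoes the row and column scaling here, so the iteration effectively runs on the well-behaved stencil underneath.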

Animats · on April 4, 2017

The stiffness may be local. It definitely is in physical simulations with hard collisions. Machine learning data is usually normalized into [0, 1], so if you get a really steep slope, something is pathological.

bigger_cheese · on April 4, 2017

I'm not an expert on anything covered in the article, but we have a similar physics-based model at my work (complex non-linear equations). We use a technique called Sequential Quadratic Programming (SQP) to find an optimal solution. My understanding is that this gives better results than gradient descent, but it only works if the functions are continuous.

This could be worth looking into for you.