This is something I am interested in and considering as a possible future for my career, coming from a math background and starting to dip my toes into ML.
One thing I worry about: how useful is it actually going to be to create a rigorous mathematical framework?
I am afraid it might end up like mathematical physics, where the field is almost a century behind theoretical and experimental physics and playing catch-up.
What do others think?
One thing to consider is the distinction between deep neural networks as mathematical objects and machine learning as currently practiced.
Lately, there have been quite a few theories of neural networks as ideal nonlinear approximators [1]. Similarly, people have shown many ways in which gradient descent tends to reach a global optimum of a regularized curve-fitting objective [2]. Which is to say, if your development cycle is gather data, train, test, deploy, we know this approximates the data almost ideally; you can't really do much better than a deep network.
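(To make the convex toy version of that concrete -- this is my own numpy sketch, not anything from [2] -- here is gradient descent on a regularized least-squares objective, where the global optimum really is reachable; the deep, non-convex case is exactly where the cited results do the heavy lifting.)

```python
import numpy as np

# Toy, convex stand-in for "regularized curve fitting": ridge regression.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

lam = 0.1        # regularization strength
lr = 0.01        # learning rate
w = np.zeros(5)

def loss(w):
    # Mean squared error plus an L2 penalty.
    return np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w
    w -= lr * grad

print(loss(w))   # converges to the unique global optimum, since the objective is convex
```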
But we know that in practice, when deployed, deep neural networks actually have many limitations (compared to our intuitions, or to human performance, etc.). There are some obvious explanations. Of course, they're limited by our ability to gather data and by the biases of that data. But even more, they're limited to situations where you have large chunks of unchanging data that you can extrapolate from.
Given that deep networks are more or less perfect for the train-test-deploy cycle, it seems like the problem is with the cycle itself. And it's easy to see that human beings somehow act "intelligently" without using this cycle. So figuring out an alternative to it might be something to look at.
imo, instead of developing new theory entirely from the ground up, it's more useful to address & work in the context of understanding longstanding open problems/phenomena. Theoretical insight and the framework should follow.
> This is something I am interested in and considering as a possible future for my career, coming from a math background and starting to dip my toes into ML.
This is a great time and place to be! Neural networks are the twenty-first-century Fourier series; it's just that we don't yet understand them. We can easily run them (synthesis), but we are missing the analysis. There's a lot of math to do here.
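To make the analogy concrete, here is a toy numpy sketch (my own illustration, nothing more): for Fourier series we have both halves -- synthesis builds the signal, and analysis (here, the FFT) hands the coefficients back. For deep nets we currently only have the synthesis half.

```python
import numpy as np

# Synthesis: build a signal from known Fourier coefficients.
t = np.linspace(0, 1, 256, endpoint=False)
signal = 2.0 * np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)

# Analysis: recover the amplitudes again via the FFT.
amps = 2 * np.abs(np.fft.rfft(signal)) / len(t)
print(np.round(amps[[3, 7]], 3))   # [2.0, 0.5] -- the amplitudes at frequencies 3 and 7
```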
> This is something I am interested in and considering as a possible future for my career, coming from a math background and starting to dip my toes into ML.
I'm in the exact same position.
I think mathematical explanations and (substantiated) intuition for the things that practitioners discover would be very useful. Maybe an all-explaining grand theory of deep learning is possible.
Relate it to free-energy minimization; that's very hot right now. Read Smolensky's 1986 article in PDP on the harmonium; it was the first restricted Boltzmann machine.
Yes, that's arguably the dominant paradigm. It's interesting because of the relationship to thermodynamics -- again, check out Smolensky's paper in PDP; he was a postdoc (along with Geoff Hinton) at UCSD's cognitive science department, run by Don Norman.
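For anyone who wants to see the object in question, here's a minimal sketch (my own toy code, not taken from Smolensky's paper) of the free energy of a binary restricted Boltzmann machine; training pushes this quantity down on the data.

```python
import numpy as np

def rbm_free_energy(v, a, b, W):
    # F(v) = -a.v - sum_j log(1 + exp(b_j + (W v)_j)),
    # the standard free energy of a binary RBM with visible bias a,
    # hidden bias b, and weight matrix W (hidden x visible).
    return -(a @ v) - np.logaddexp(0.0, b + W @ v).sum()

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
a = rng.normal(size=n_visible)
b = rng.normal(size=n_hidden)
W = rng.normal(size=(n_hidden, n_visible))

v = rng.integers(0, 2, size=n_visible).astype(float)  # one binary visible vector
print(rbm_free_energy(v, a, b, W))
```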
The whole pipeline from PyTorch or TensorFlow and Python to LLVM, to GPUs or TPUs, is absolutely rigorous. Much more rigorous, in fact, than normal, hand-written mathematics as you find it in, e.g., a typical Annals of Mathematics publication or a mathematical textbook!
I think what you really have in mind is a simple model of modern deep learning that is not fully accurate, but still useful. Let me argue by analogy: you are looking for something that is to deep learning what the lambda-calculus is to the Haskell compiler.
One of the main simplifications in programming language theory is replacing finite-precision arithmetic (which is painfully complex) with mathematical integers and real numbers (which are much simpler). Would a theory of deep learning based on mathematical reals be similarly valuable? The stunning success of floating-point formats like bfloat16 [1] suggests otherwise, since arithmetic precision in deep learning is closely connected to important learning phenomena such as overfitting and regularisation.
I am tempted to be provocative and say that you are really looking for less rigour!
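To illustrate just how coarse that arithmetic is, here's a small sketch (my own example, assuming PyTorch and its bfloat16 support) comparing bfloat16 with float32:

```python
import torch

# bfloat16 keeps float32's exponent range but only 7 explicit mantissa bits
# (roughly 2-3 decimal digits), so small increments get rounded away.
x32 = torch.tensor(1.0 + 1e-3, dtype=torch.float32)
print(x32.item())                       # ~1.001
print(x32.to(torch.bfloat16).item())    # rounds back to 1.0

# Accumulating many small "gradient-like" updates makes the effect stark.
acc32 = torch.zeros((), dtype=torch.float32)
acc16 = torch.zeros((), dtype=torch.bfloat16)
for _ in range(10_000):
    acc32 += 1e-4
    acc16 += 1e-4                       # additions below the precision are lost

print(acc32.item())   # close to 1.0
print(acc16.item())   # stalls far below 1.0
```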
We don't know which model architecture will work for which problem, or why it would or wouldn't. We experiment until we find something that works, and can sometimes try to guess why it did. But none of this knowledge is formalized in a way that can reliably predict performance on future problems. We engineer solutions to individual problems, but don't build a rigorous body of knowledge that helps us with the next ones.
The problem you describe (knowing which model architecture will work for which problem) is not due to a lack of rigour, but to the Turing-complete expressive power of neural networks.
What you can do is use the existing rigour to derive stronger properties for restricted classes of NNs (just as you can prove stronger properties for simple subsystems of the lambda-calculus), and that is an interesting field of study.
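As a toy example of the kind of property you can get for a restricted class (my own sketch, with random placeholder weights rather than a trained model): for a feed-forward ReLU network, the product of the layers' spectral norms provably upper-bounds the network's Lipschitz constant.

```python
import numpy as np

# Placeholder weights for a 3-layer feed-forward ReLU network.
rng = np.random.default_rng(0)
weights = [
    rng.normal(size=(16, 8)),   # layer 1: 8 -> 16
    rng.normal(size=(8, 16)),   # layer 2: 16 -> 8
    rng.normal(size=(1, 8)),    # layer 3: 8 -> 1
]

bound = 1.0
for W in weights:
    bound *= np.linalg.norm(W, 2)   # spectral norm = largest singular value

print(bound)   # any inputs x, y satisfy |f(x) - f(y)| <= bound * ||x - y||
```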
> The problem you describe (knowing which model architecture will work for which problem) is not due to a lack of rigour, but to the Turing-complete expressive power of neural networks.
Non-recurrent neural networks are not Turing-complete in any sense.
In the sense that for non-trivial applications, we struggle to define what approach is successful and why it's successful without leaning on empirical metrics.
That's not due to a lack of rigour, but to complexity!
You cannot, in general, predict arbitrary program properties without running the programs; that's the essence of Rice's theorem [1]. And neural networks are extremely expressive: recurrent NNs are Turing complete, and by the famous Universal Approximation Theorems even feed-forward NNs can approximate essentially any function. That expressive power precludes general and simple mathematical "silver bullets" that would help you overcome those challenges.
I think you're confusing rigorous processes with rigorous theory. The analogy of steam engines seems appropriate: they were initially built with (sort of) rigorous processes for metal forming, joining, and so on. It wasn't just people tinkering in their backyards. Yet that was done without the theory of thermodynamics; those rigorous engineers didn't know a Helmholtz free energy from a free-energy machine. Once we had thermodynamics, it didn't directly influence how to form metal parts or lubricate pistons, but it did put hard constraints on the possible performance of engines, and it made it easier to predict how an engine would perform before building it, even outside the range of what had already been tested -- e.g. jet and rocket engines, which required high confidence in the theory's predictions before anyone would invest the money needed to develop working models in the middle of wartime scarcity.
Not the OP, but I've built several ML pipelines in the last few years. Even if every step in the process is very rigorous and uses solid software, there are still challenges. ML models are very difficult (or impossible) to test; the usual testing approaches don't work at all for ML features.
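For a flavour of what an ML "test" typically degrades into, here's a hypothetical sketch (synthetic labels and a stand-in model, purely illustrative): you assert that an aggregate metric on held-out data clears a threshold, rather than asserting exact outputs.

```python
import numpy as np

# Purely illustrative: synthetic labels and a stand-in "model" so this runs on
# its own. A real pipeline would load a trained model and a held-out set.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=1000)       # held-out labels (synthetic)
preds = np.zeros_like(y_val)                # stand-in model: always predicts 0

ACCURACY_FLOOR = 0.45                       # threshold picked for this toy setup

def test_model_accuracy():
    accuracy = np.mean(preds == y_val)
    # The assertion is statistical, not exact: retraining or re-splitting the
    # data can legitimately move this number, which is exactly why conventional
    # exact-output unit tests don't fit ML features.
    assert accuracy >= ACCURACY_FLOOR

test_model_accuracy()
```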