I think this is way too much for a pure CS person. It's unlikely they will make a big contribution on the math side without being a mathematician first, e.g. an applied mathematician who moved into CS.
For ML, the OP already has linear algebra, which is sufficient. Deep neural networks are trained with backprop, which is basically high-school math (the chain rule; see the sketch below). You could have mentioned ODEs and sensitivity analysis, which I think are more relevant than convex optimization. For NNs we don't even care about identifiability, from either the statistics or the dynamical-systems point of view. NNs blow away SVMs and almost everything else except random forests in some domains. Both of these have the interesting property that, for the most part, nobody understands them except as black boxes. Boosting is another example. It really is stranger than fiction.
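To make the "high-school math" point concrete, here's a minimal sketch of backprop for a one-hidden-layer regression net in NumPy. The data, shapes, and learning rate are all made up for illustration; the point is that every gradient is just the chain rule applied layer by layer.

```python
# Minimal backprop sketch: one hidden layer, MSE loss, plain gradient descent.
# All values here are toy/illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 4))        # 32 samples, 4 features (toy data)
y = rng.standard_normal((32, 1))        # toy regression targets

W1 = rng.standard_normal((4, 8)) * 0.1  # input -> hidden weights
W2 = rng.standard_normal((8, 1)) * 0.1  # hidden -> output weights
lr = 0.1

for step in range(100):
    # Forward pass
    h = np.tanh(X @ W1)                 # hidden activations
    y_hat = h @ W2                      # predictions
    loss = np.mean((y_hat - y) ** 2)    # MSE loss

    # Backward pass: the chain rule, one layer at a time
    d_yhat = 2 * (y_hat - y) / len(y)   # dL/dy_hat
    dW2 = h.T @ d_yhat                  # dL/dW2
    d_h = d_yhat @ W2.T                 # dL/dh
    d_pre = d_h * (1 - h ** 2)          # through tanh: d tanh(u)/du = 1 - tanh(u)^2
    dW1 = X.T @ d_pre                   # dL/dW1

    # Gradient step
    W1 -= lr * dW1
    W2 -= lr * dW2
```

Everything past this point (why it generalizes so well, why SGD finds good optima) is where the math gets hard; the mechanics above are not.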
That being said, I think statistics/probability theory and Bayesian stats/networks are useful for any scientist to know.
I would talk to your advisor: they'll be able to tell you what's important and what to learn and focus on.
It took a while, but there's been a lot of work over the last 5 years or so explaining neural nets' performance, from papers showing PAC learnability for specific architectures (https://arxiv.org/abs/1710.10174), to work arguing that most local optima are close to global optima (http://www.offconvex.org/2016/03/22/saddlepoints/), to work arguing that the optimization error incurred (as distinct from approximation and estimation error) serves as a form of regularization for deep neural networks.
And understanding how these things work helps improve and speed up these methods and models: it's hybrid algorithms that are enabling performance on time-series data and more complex tasks. The future will almost certainly use neural networks as components of many algorithms, but I doubt the full machinery will be simple feed-forward nets of ever-increasing size.