Calculus on Computational Graphs: Backpropagation (colah.github.io)
55 points by inetsee on Aug 31, 2015 | 9 comments


It's also known as "automatic differentiation" -- it's quite different from numerical/symbolic differentiation.

More information here:

- https://justindomke.wordpress.com/2009/02/17/automatic-diffe...

- https://wiki.haskell.org/Automatic_Differentiation

The key idea is extending common operators (+, -, product, /, key mathematical functions) that usually operate on _real numbers_ to tuples of real numbers (x, dx) (the quantity and its derivative with respect to some variable) such that the operations preserve the properties of differentiation.

For instance (with abuse of notation):

    - (x1, dx1) + (x2, dx2) = (x1 + x2, dx1 + dx2).
    - (x1, dx1) * (x2, dx2) = (x1 * x2, x1 * dx2 + x2 * dx1).
    - sin((x, dx)) = (sin(x), cos(x) * dx).
Note that the right element of the tuple can be computed precisely from quantities readily available from the inputs to the operator.

It also extends to gradients of scalar functions of many variables (common in machine learning) by carrying a vector of partial derivatives in place of the single dx.
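
To make the tuple idea concrete, here is a minimal forward-mode sketch in Python; the `Dual` class and the example function are illustrative, not taken from Ceres or the post:

    import math

    class Dual:
        """A (value, derivative) pair; operators apply the usual rules."""
        def __init__(self, x, dx=0.0):
            self.x, self.dx = x, dx

        def __add__(self, other):
            return Dual(self.x + other.x, self.dx + other.dx)

        def __mul__(self, other):
            # Product rule: d(uv) = u*dv + v*du
            return Dual(self.x * other.x,
                        self.x * other.dx + other.x * self.dx)

    def sin(d):
        # Chain rule: d(sin u) = cos(u) * du
        return Dual(math.sin(d.x), math.cos(d.x) * d.dx)

    # d/dx [x * sin(x)] at x = 2: seed dx = 1 for the variable of interest.
    x = Dual(2.0, 1.0)
    y = x * sin(x)
    print(y.x)    # 2*sin(2)
    print(y.dx)   # sin(2) + 2*cos(2)

For a gradient of a many-variable scalar, dx becomes a vector of partials (or you run one forward pass per input variable).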

It's beautifully implemented in Google's Ceres optimisation package:

https://ceres-solver.googlesource.com/ceres-solver/+/1.8.0/i...


I don't think you're describing quite the same thing. You're talking about forward-mode differentiation, whereas backpropagation corresponds to what's usually called adjoint-mode differentiation (I think it's called reverse-mode in the post). The difference is in computational efficiency when the number of parameters is large.
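
A rough sketch of the contrast, with made-up names (`Var`, `backward`) rather than anything from the post: the forward pass records the graph and each node's local derivatives, and a single backward sweep then accumulates d(output)/d(node) for every node at once, which is why reverse mode wins when there are many parameters and one scalar output.

    import math

    class Var:
        """A node in the computational graph."""
        def __init__(self, value, parents=()):
            self.value = value
            self.grad = 0.0
            self.parents = parents      # pairs of (parent, d self / d parent)

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

    def sin(v):
        return Var(math.sin(v.value), [(v, math.cos(v.value))])

    def backward(output):
        # Topologically order the graph, then push adjoints from output to inputs.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(output)
        output.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

    # Gradient of f(a, b) = a*b + sin(a) at (2, 3) from one backward sweep.
    a, b = Var(2.0), Var(3.0)
    f = a * b + sin(a)
    backward(f)
    print(a.grad)   # b + cos(a) = 3 + cos(2)
    print(b.grad)   # a = 2

Forward mode would need one pass per input to get the same gradient.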


Why do you say this is quite different from symbolic differentiation? It seems every function (including addition, multiplication, etc.) is simply augmented with a description of its total derivative, and these derivatives are combined via the chain rule, which is just how symbolic differentiation would normally be carried out as well.


Yes, but in practice the difference is vast. You can apply automatic differentiation to (almost) arbitrary code with conditional branches, arbitrary inputs (by treating them as unknown constants with respect to the independent variables), recursion, and for/while loops with an unknown number of iterations. Trying to describe all that using piecewise or recursive functions and then symbolically differentiating it is either a nightmare or impossible. Hence, automatic differentiation.
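
For instance, a toy forward-mode pair (same idea as the sketch above, redefined so this snippet stands alone; the function name `shrink` is just for illustration) differentiates straight through a data-dependent while loop, because the loop condition only inspects the value part:

    class Dual:
        """Toy forward-mode pair (value, derivative)."""
        def __init__(self, x, dx=0.0):
            self.x, self.dx = x, dx
        def __mul__(self, other):
            return Dual(self.x * other.x,
                        self.x * other.dx + other.x * self.dx)

    def shrink(x, threshold=1e-3):
        # The number of iterations depends on the input value; symbolically
        # this is a piecewise mess, but AD just carries dx through each step.
        y = x
        while y.x > threshold:
            y = y * x
        return y

    x = Dual(0.5, 1.0)        # seed dx = 1 to differentiate wrt x
    y = shrink(x)
    print(y.x, y.dx)          # x**k and k*x**(k-1) for the k actually executed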


Anyone reading this who hasn't already should do themselves a favour and read the other articles on colah's blog. Beautifully presented demonstrations of ML algorithms, a number of them running live in your browser: http://colah.github.io/


My demonstration Scala automatic differentiation library: http://www.win-vector.com/blog/2010/06/automatic-differentia...


This is beautiful. I've never seen a more concise yet powerfully clear explanation of backpropagation. The explanation is fundamental in that it relies on the fewest axioms.


Love these tutorials. Not sure that the LaTeX font is the right choice for a web page though.


This is easily the best explanation of back-propagation I have found on the web. Nice work.



