
Incorrect Pytorch gradients with Apple MPS backend...

Yep, this kind of thing can happen. I found and reported incorrect gradients for Apple's Metal-backed TensorFlow conv2d in 2021 [1].

(Pretty sure I've seen incorrect gradients with another PyTorch backend, but that was a few years ago and I don't seem to have raised an issue to refer to...)

One might think this class of errors would be caught by a test suite. Autodiff can be tested quite comprehensively against numerical differentiation [2]. (Although this example is from a much simpler lib than PyTorch, so I could be missing something.)

[1] https://github.com/apple/tensorflow_macos/issues/230

[2] https://github.com/sradc/SmallPebble/blob/2cd915c4ba72bf2d92...
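To make the numerical-differentiation check concrete, here is a minimal sketch in plain Python. The function `f` and its hand-written `analytic_grad` are made-up stand-ins (assumptions, not taken from the linked repo); in a real test the analytic side would be whatever the autodiff backend returns.

```python
import math
import random

def f(x):
    # Hypothetical scalar test function: sum of v * tanh(v).
    return sum(v * math.tanh(v) for v in x)

def analytic_grad(x):
    # Stand-in for the autodiff result: d/dv [v * tanh(v)].
    return [math.tanh(v) + v * (1.0 - math.tanh(v) ** 2) for v in x]

def numerical_grad(func, x, eps=1e-6):
    # Central differences: two forward evaluations per input element.
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((func(xp) - func(xm)) / (2.0 * eps))
    return grad

random.seed(0)
x = [random.uniform(-2.0, 2.0) for _ in range(5)]
err = max(abs(a - n) for a, n in zip(analytic_grad(x), numerical_grad(f, x)))
print("max abs difference:", err)
```

The step size `eps` and any pass/fail tolerance are judgment calls; lower-precision dtypes would need a larger step and a looser tolerance.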



I’ve also found that some versions of torch get quite different inference results on MPS, even setting gradients aside. See https://gist.github.com/gcr/4d8833bb63a85fc8ef1fd77de6622770


Yeah, luckily, you can unit test these and fix them. They are not concurrency bugs (again, luckily).

BTW, numerical differentiation is only of limited use for testing: each input element needs its own forward evaluations, so the cost grows quickly with big matrices. It is much easier and more effective to test against multiple implementations.
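A rough sketch of what testing against multiple implementations can look like, in plain Python. The two functions below are hypothetical stand-ins that both compute the input gradient of a 1D "valid" cross-correlation; with a real framework the two sides would more likely be the same op run on two backends (say CPU and MPS), which is an assumption about the setup rather than something shown here.

```python
import random

def conv_grad_scatter(upstream, w, n):
    # Implementation A: each output position scatters its upstream
    # gradient back onto the inputs it read.
    gx = [0.0] * n
    for i, c in enumerate(upstream):
        for k, wk in enumerate(w):
            gx[i + k] += c * wk
    return gx

def conv_grad_gather(upstream, w, n):
    # Implementation B: each input position gathers from every output
    # position that read it.
    gx = []
    for j in range(n):
        total = 0.0
        for k, wk in enumerate(w):
            i = j - k
            if 0 <= i < len(upstream):
                total += upstream[i] * wk
        gx.append(total)
    return gx

random.seed(0)
n, m = 32, 5  # input length and kernel length (arbitrary choices)
w = [random.gauss(0.0, 1.0) for _ in range(m)]
upstream = [random.gauss(0.0, 1.0) for _ in range(n - m + 1)]

a = conv_grad_scatter(upstream, w, n)
b = conv_grad_gather(upstream, w, n)
max_diff = max(abs(p - q) for p, q in zip(a, b))
print("max abs difference:", max_diff)
```

This costs one gradient computation per implementation regardless of input size. The catch is that agreement only shows the two are consistent, not that either is right.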


You can easily test a gradient g using only the forward pass by checking f(x+h) ~ f(x) + dot(g, h) for a small random h.
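A minimal sketch of that directional check in plain Python. The function `f`, the correct `grad_f`, and the deliberately broken `bad_grad_f` are all made up for illustration; the normalisation by the norms of g and h and the use of several random directions are my own choices, not part of the comment above.

```python
import math
import random

def f(x):
    # Hypothetical scalar function: sum of v^2 * sin(v).
    return sum(v * v * math.sin(v) for v in x)

def grad_f(x):
    # Correct gradient: d/dv [v^2 sin v] = 2 v sin v + v^2 cos v.
    return [2.0 * v * math.sin(v) + v * v * math.cos(v) for v in x]

def bad_grad_f(x):
    # Deliberately wrong gradient (second term dropped).
    return [2.0 * v * math.sin(v) for v in x]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def directional_error(func, grad, x, scale=1e-6, trials=5):
    # Compare f(x+h) - f(x) with dot(g, h) for a few small random h.
    # Cost: one gradient plus (trials + 1) forward passes, independent of len(x).
    g = grad(x)
    fx = func(x)
    worst = 0.0
    for _ in range(trials):
        h = [scale * random.gauss(0.0, 1.0) for _ in x]
        actual = func([xi + hi for xi, hi in zip(x, h)]) - fx
        predicted = sum(gi * hi for gi, hi in zip(g, h))
        worst = max(worst, abs(actual - predicted) / (norm(g) * norm(h)))
    return worst

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(50)]
good = directional_error(f, grad_f, x)
bad = directional_error(f, bad_grad_f, x)
print("correct gradient:", good)
print("broken gradient: ", bad)
```

A single random direction can miss an error that happens to be nearly orthogonal to h, which is why the sketch takes the worst case over a few directions.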



