I'm really happy to hear that it works on those sorts of cases. I tried a few images during testing, such as faces with unusually long noses, and was pleased with how they came out. The novelty comes from a simple approach to a usually quite complex problem (i.e. posing the problem as a semantic segmentation problem, to produce a spatially aligned volume).
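To make that idea concrete, here is a minimal sketch (not the paper's actual code) of what "reconstruction as segmentation" means: the network predicts, for every voxel in a grid aligned with the input image, whether that voxel is inside the face, and you train it with an ordinary per-voxel segmentation loss. Shapes, names, and the random stand-in "network output" are all hypothetical.

```python
import numpy as np

# Hypothetical 4x4x4 voxel grid, spatially aligned with the input image plane.
# Each voxel is labelled "inside the face" (1) or "outside" (0), so 3D
# reconstruction reduces to per-voxel binary segmentation.
rng = np.random.default_rng(0)
D = 4
target = (rng.random((D, D, D)) > 0.5).astype(np.float64)  # ground-truth occupancy
logits = rng.standard_normal((D, D, D))                    # stand-in for network output

probs = 1.0 / (1.0 + np.exp(-logits))  # per-voxel sigmoid
eps = 1e-12
# Voxel-wise binary cross-entropy, averaged over the whole volume --
# exactly the loss shape used for 2D semantic segmentation, lifted to 3D.
bce = -np.mean(target * np.log(probs + eps)
               + (1 - target) * np.log(1 - probs + eps))

# A surface estimate falls out by thresholding the aligned volume.
occupancy = (probs > 0.5).astype(np.float64)
```

The appeal is that the output volume stays pixel-aligned with the input image, so no model fitting or correspondence step is needed at inference time.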
One of the authors here. You are quite right that there aren't many fine details. We believe this limitation is due to a lack of large, high-quality training sets. The data we trained on was very smooth, which means our method is unable to pick out features such as wrinkles and dimples.
What about the "Maybe try rendering an image of a 3d model of a face, then attempt to reconstruct it from the rendered image and measure the displacement from the original model." approach for generating high resolution training data?
That's actually a great idea, because then you can take a diff between the output 3D and the actual 3D model used to generate the images, as opposed to what I assume is currently just diffing the generated profile against a profile shot of the subject.
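As a rough illustration of that evaluation idea (a sketch, not anything from the paper): if the reconstruction was produced from a rendering of a known ground-truth model, the two vertex sets are in correspondence, and the error is just a per-vertex displacement. The arrays below are synthetic stand-ins.

```python
import numpy as np

# Hypothetical ground-truth and reconstructed vertex arrays (N x 3), assumed
# to be in correspondence because the reconstruction came from a rendering of
# the ground-truth model itself.
rng = np.random.default_rng(1)
gt_vertices = rng.random((100, 3))
# Stand-in "reconstruction": ground truth plus a little noise.
recon_vertices = gt_vertices + rng.normal(scale=0.01, size=(100, 3))

# Per-vertex Euclidean displacement, then a mean error over the mesh.
displacement = np.linalg.norm(recon_vertices - gt_vertices, axis=1)
mean_error = float(displacement.mean())
```

With real data the reconstruction would first need to be aligned (e.g. by a rigid transform) to the ground truth before diffing, since the network output lives in its own coordinate frame.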