Even if the effects were not properly accounted for in the calibration of thermometers (which I doubt, in general), they would be unlikely to cause a systematic error that shifts slowly over decades. The authors of the report have identified and explored other, more plausible sources of such systematic errors, such as how the measurements are performed.
Indeed it does, but I cannot see how that could possibly refute the point I am making. How could that cause a systematic error that shifts slowly over decades?