I think given enough students, 3 data points over 4 years is a perfectly appropriate dataset from which to measure longitudinal effects.
I don't necessarily disagree that the test may not have measured anything useful, or that there was some sampling bias, but there's nothing inherently wrong with 3 data points per person over four years.
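To make the "given enough students" point concrete, here's a rough simulation sketch. Every number in it is made up for illustration and none of it comes from the study: I'm assuming an average gain of 2 points per year, student-to-student variation in that gain, measurement noise with an SD of 5 points, and tests at years 0, 2 and 4. It fits a straight line to each student's three scores and averages the slopes.

```python
import random
import statistics


def ols_slope(times, scores):
    """Least-squares slope of scores against times for one student."""
    t_mean = statistics.fmean(times)
    s_mean = statistics.fmean(scores)
    num = sum((t - t_mean) * (s - s_mean) for t, s in zip(times, scores))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def simulate_mean_slope(n_students, true_mean_slope=2.0, seed=0):
    """Simulate n_students with 3 scores each; return (mean slope, std. error).

    All parameters here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    times = [0.0, 2.0, 4.0]  # assumed test dates: three points over four years
    slopes = []
    for _ in range(n_students):
        intercept = rng.gauss(50.0, 10.0)        # assumed starting score
        slope = rng.gauss(true_mean_slope, 1.0)  # assumed per-student yearly gain
        scores = [intercept + slope * t + rng.gauss(0.0, 5.0) for t in times]
        slopes.append(ols_slope(times, scores))
    mean = statistics.fmean(slopes)
    stderr = statistics.stdev(slopes) / n_students ** 0.5
    return mean, stderr


if __name__ == "__main__":
    for n in (20, 500, 5000):
        mean, stderr = simulate_mean_slope(n)
        print(f"n={n:5d}  mean slope={mean:.2f}  std. error={stderr:.2f}")
```

Any single student's slope from three noisy points is quite unreliable, but the standard error of the average shrinks roughly with the square root of the number of students. Note that this only speaks to precision: it does nothing about a test that measures the wrong thing or a biased sample.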