
Are these inferential statistics not designed to be in-sample?

I would imagine predictive statistics rely more on out-of-sample evaluation, using metrics like precision and recall.



If you do it right…

That’s the problem: these metrics often come from overfitted models or in-sample evaluation, and they give a completely unrealistic picture of expected generalization performance.

I’m at the point where I never trust performance metrics anymore. Or rather, the worse they are, the more I trust them!


I feel like you might be conflating a couple of things, though I'm not a DS so could be off base here.

My reading of the OP's description is that the vendors were offering interpolative predictions but did not use a test/train split of the data. This is in contrast to extrapolative predictions, which I would call out-of-sample.

Because they didn't use a test/train split, they achieved extremely good accuracy: they were testing on the same data they trained on. Even if you call that "in-sample", you still can't use the same data for both training and testing.
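
A minimal sketch of what that gap looks like (the synthetic data and model choice here are my own assumptions, not the vendors' actual setup):

    # Shows how scoring on the training data inflates accuracy
    # compared with a held-out test set. Requires scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Noisy synthetic classification problem (flip_y adds label noise).
    X, y = make_classification(n_samples=2000, n_features=20,
                               flip_y=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # An unconstrained decision tree can memorize its training data.
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

    # "In-sample" accuracy: scored on the same rows the model was fit on.
    print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))  # ~1.0

    # Held-out accuracy: a more honest estimate of generalization.
    print("test accuracy: ", accuracy_score(y_test, model.predict(X_test)))    # noticeably lower

The near-perfect training score is exactly the kind of number that looks great in a vendor deck but says nothing about how the model will do on data it hasn't seen.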



