Hacker News

This is why one of my principles is to be skeptical of outliers. Often they are not real and therefore misrepresent the true data.

It's one reason the median is often preferred over the mean as a first look, and why it's worth temporarily throwing out outliers just to see what things look like without them.
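A minimal sketch of the point, using made-up response-time samples where one value is an outlier: the mean gets dragged far from the bulk of the data while the median barely moves.

```python
import statistics

# Hypothetical response times in ms; the last value is an outlier
# (a stuck sensor, a logging bug, etc.).
samples = [102, 98, 105, 110, 99, 101, 5000]

print(statistics.mean(samples))    # ~802.1, dragged up by one point
print(statistics.median(samples))  # 102, barely affected

# "Throwing out the outlier just to see": the mean of the rest is ~102.5,
# so the single point was carrying almost all of the original mean.
print(statistics.mean([x for x in samples if x < 1000]))
```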



Similar to Twyman's Law: “Any figure that looks interesting or different is usually wrong.”

https://en.m.wikipedia.org/wiki/Twyman%27s_law



The lesson I took from this is that it's important to dig into how any piece of data was sourced.


This advice is insane. Except in specific settings (a misbehaving sensor, a survey respondent who clearly picked answers at random), outliers are just outlying values and should be kept in the analysis, or at most clipped/winsorized. When submitting to a scientific journal, admitting that outliers were removed without first inspecting why they were there can be enough for an instant rejection, and rightly so.


Twyman's law doesn't say you should ignore those outliers; it just predicts that they are more likely to be mistakes than genuine.


I like using the Olympic style of scoring, where they lop off the top and bottom scores to account for the cranky or overly lenient judges.
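That scheme is a trimmed mean with one value cut from each end. A minimal sketch with made-up judges' scores:

```python
def olympic_mean(scores):
    """Drop the single highest and lowest score, average the rest."""
    s = sorted(scores)
    trimmed = s[1:-1]
    return sum(trimmed) / len(trimmed)

# One cranky judge gives a 5.0; it simply falls out of the average.
print(olympic_mean([9.5, 9.7, 9.6, 9.9, 5.0]))  # 9.6
```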



