Hacker News

Luckily, lawyers in the 1970s already figured this out, by anticipating the research on causality that computer scientists and mathematicians like Judea Pearl and Peter Spirtes did in the 1990s. Really!

In the first instance, you just can't use race as a feature, since it is a protected characteristic. But you might also be worried that protected characteristics can generally be easily identified by looking for innocuous traits that correlate with them (since people tend to cluster into communities). For example, if you know an American's ZIP code and their three favorite musicians, you can determine their race with an accuracy in the high 90s (percent). (Basically, the US is still just as segregated now as it was a hundred years ago, and black and white Americans tend to listen to different music.)
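To make the proxy worry concrete, here is a minimal sketch. The records are made up for illustration, and the helper names (`fit_proxy`, `proxy_accuracy`) are hypothetical. It guesses a protected group from ZIP code alone by majority vote and reports how often the guess is right on the same toy data.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (zip_code, protected_group). Made-up data,
# not real demographics.
records = [
    ("10001", "A"), ("10001", "A"), ("10001", "A"), ("10001", "B"),
    ("20002", "B"), ("20002", "B"), ("20002", "B"), ("20002", "A"),
]

def fit_proxy(rows):
    """Map each proxy value to the group most often seen with it."""
    counts = defaultdict(Counter)
    for proxy, group in rows:
        counts[proxy][group] += 1
    return {proxy: c.most_common(1)[0][0] for proxy, c in counts.items()}

def proxy_accuracy(rows, mapping):
    """Fraction of rows whose group is guessed correctly from the proxy."""
    hits = sum(1 for proxy, group in rows if mapping.get(proxy) == group)
    return hits / len(rows)

mapping = fit_proxy(records)
# In-sample accuracy on the toy data; a real check would use held-out data.
accuracy = proxy_accuracy(records, mapping)
print(mapping, accuracy)
```

If a formally neutral feature predicts the protected attribute this well, dropping the attribute itself from the model does not remove it from the decision.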

So after the US Civil Rights Act was passed in 1964, the courts came up with the idea of "disparate impact" (the Supreme Court's 1971 decision in Griggs v. Duke Power Co. is the usual starting point) -- when doing something like hiring, you are not allowed to base the decision on features that disproportionately affect one group rather than another, even if they are formally neutral, unless the feature is directly related to the ability to do the job.
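A rough sketch of how one might screen for disparate impact in practice: compare selection rates across groups. The counts below are hypothetical. The 0.8 cutoff is the "four-fifths" rule of thumb from the US EEOC's Uniform Guidelines on Employee Selection Procedures, which is not mentioned above and is only a screening heuristic, not the legal test itself.

```python
def selection_rates(outcomes):
    """outcomes maps group -> (number selected, number of applicants)."""
    return {group: selected / total for group, (selected, total) in outcomes.items()}

def impact_ratio(outcomes):
    """Lowest group selection rate divided by the highest one."""
    rates = selection_rates(outcomes)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so there is no disparity to measure
    return min(rates.values()) / highest

# Hypothetical hiring counts, invented for illustration.
hypothetical = {"group_a": (48, 80), "group_b": (12, 40)}

ratio = impact_ratio(hypothetical)
# Assumption: 0.8 is the four-fifths rule-of-thumb threshold.
flagged = ratio < 0.8
print(ratio, flagged)
```

A flagged ratio only says the feature set deserves scrutiny; whether the features are job-related is the question that follows.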

In other words, you have to show that the features you are basing the decision on _causally influence_ the outcome you care about (here, the ability to do the job), exactly like you see in Pearl's causal diagrams or structural equation models or whatever. E.g., if you want to hire a math professor, you can base the decision on the articles they published in math journals, but you can't base it on whether they like old school Goa trance.

So, what about black box neural networks, where you don't know which features are being used? In this case, it's pretty clear that you shouldn't use them directly when deciding on a home loan, because the law wants to know what features are in use, and you can't answer the question of whether you're redlining when you have a black box. However, using black box techniques to learn (eg) the best random forest model to use is fine, because the resulting model lets you easily see which factors are going into the decision before deploying it.
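One generic way to check which inputs a model actually relies on is permutation importance: shuffle one feature column at a time and see how much the predictions degrade. This is a technique I'm swapping in for illustration, not something specific to random forests or required by the law. The `black_box` function and the applicant data are hypothetical stand-ins.

```python
import random

def permutation_importance(predict, rows, labels, repeats=5, seed=0):
    """Average drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    importances = []
    for j in range(len(rows[0])):
        total_drop = 0.0
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            total_drop += baseline - accuracy(shuffled)
        importances.append(total_drop / repeats)
    return importances

# Hypothetical applicants: (math_papers_published, likes_goa_trance).
applicants = [(n, n % 2) for n in range(20)]

def black_box(row):
    # Stand-in for an opaque model; internally it only looks at papers.
    return row[0] >= 10

# Using the model's own outputs as labels measures what the model relies on.
labels = [black_box(r) for r in applicants]
papers_importance, trance_importance = permutation_importance(
    black_box, applicants, labels
)
print(papers_importance, trance_importance)
```

A feature whose importance stays at zero is not driving the decision; a nonzero score on a feature that has nothing to do with the job is the thing to investigate before deploying.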

FWIW, people have been doing this for decades already. (I did stuff like this back in the 1990s.)


