It just is a thing, tbh. It manifests in the data pretty clearly.
In aggregate, in large data sets, race comes through - especially once you combine a few datapoints. For example, when I worked at a fintech company: with just household income and zip code, we could infer race with >80% accuracy [0]. Add a few more datapoints, and that would very quickly climb toward 95%.
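To give a sense of how little it takes, here's a minimal sketch of that kind of proxy inference. The dataset, column names, and model choice are all hypothetical stand-ins, not what we actually used - the point is just that two "innocent" fields are enough to train on:

    # Hypothetical sketch: inferring race from two "non-sensitive" fields.
    # File name and column names are made up for illustration.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score

    df = pd.read_csv("customers.csv")  # hypothetical dataset

    # Zip code is categorical, not numeric: one-hot encode it so the model
    # learns per-neighborhood demographics instead of a fake ordering.
    X = pd.get_dummies(df[["zip_code"]].astype(str)).join(df[["household_income"]])
    y = df["self_reported_race"]  # ground-truth label from some other source

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    model = GradientBoostingClassifier().fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))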
That inference was an _actual_ party-trick[1] demo we did, alongside de-anonymizing coworkers based on car model, zip code, and bank name.
[0] I worked as a SecEng, and we were trying to prove that we were(n't) inadvertently targeting race, for compliance reasons. In the end, the business recognized the risk and made the changes required to prevent this.
[1] We were doing this to make a case for stricter controls and stronger isolation/security measures for storing non-PII data. The business also saw the light on this. Sometimes we'd narrow them down to 30 or 40 people in their zip code, and sometimes (such as a coworker with an old Bentley), it was an instant hit.
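The de-anonymization side is even simpler: it's essentially a uniqueness count over quasi-identifiers. A hedged sketch (file and column names hypothetical) of how quickly a "group" collapses to one person:

    # Hypothetical sketch: k-anonymity style uniqueness check.
    # How many records share each (car model, zip code, bank) combination?
    import pandas as pd

    df = pd.read_csv("non_pii_export.csv")  # hypothetical "non-PII" dataset

    quasi_ids = ["car_model", "zip_code", "bank_name"]
    group_sizes = df.groupby(quasi_ids).size()

    # Records whose quasi-identifier combination is unique are effectively
    # re-identified the moment you know those three facts about someone.
    unique_rows = (group_sizes == 1).sum()
    print(f"{unique_rows} of {len(group_sizes)} combinations map to exactly one person")
    print(group_sizes.sort_values().head())  # the old-Bentley-in-this-zip cases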