Hacker News

Changing every 10th word defeats that strategy but doesn't defeat a cryptographic bias.

Also, the cost of storing every paragraph hash might eventually add up, even if at the moment it would be negligible compared to the generation cost.



One solution is to store a hash of every n-gram for n from 2 up to some maximum, then report what percentage of n-grams of each length were hits.
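
A rough sketch of that idea, in case it helps. Everything here is an assumption for illustration: the function names (`ngram_hashes`, `build_index`, `hit_rates`), word-level n-grams split on whitespace, SHA-256 truncated to 8 bytes, and n from 2 to 5. Hit rates are computed over distinct n-grams in the candidate text.

```python
import hashlib

def ngram_hashes(text, n):
    """Return the set of truncated SHA-256 hashes of all word n-grams in text."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + n]).encode("utf-8")).digest()[:8]
        for i in range(len(words) - n + 1)
    }

def build_index(generated_texts, n_values=(2, 3, 4, 5)):
    """Store the n-gram hashes of every generated text, grouped by n."""
    index = {n: set() for n in n_values}
    for text in generated_texts:
        for n in n_values:
            index[n] |= ngram_hashes(text, n)
    return index

def hit_rates(index, candidate):
    """For each n, return the fraction of the candidate's distinct n-grams found in the index."""
    rates = {}
    for n, stored in index.items():
        hashes = ngram_hashes(candidate, n)
        rates[n] = len(hashes & stored) / len(hashes) if hashes else 0.0
    return rates

index = build_index(["the quick brown fox jumps over the lazy dog"])
rates = hit_rates(index, "the quick brown cat jumps over the lazy dog")
```

With one word swapped, the short n-grams still mostly hit while the long ones mostly miss, so the per-length percentages give a rough picture of how heavily the text was edited.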

Did someone say Bloom Filter??
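
For anyone who hasn't seen one, a minimal Bloom filter sketch. The class name, the 2^20-bit size, the 7 hash functions, and deriving the bit positions from one SHA-256 digest via double hashing are all illustrative assumptions, not anything a provider is known to use. Membership tests can return false positives but never false negatives.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array; lookups may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from two 64-bit halves of one digest
        # (double hashing: h1 + i * h2).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

seen = BloomFilter()
seen.add("the quick brown")
```

The storage is fixed at `num_bits / 8` bytes no matter how many n-grams go in; the price is a false-positive rate that climbs as the filter fills up.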


They literally already store the whole conversation...



