
I would think tweets would be skewed, no? Slang, memes, shrtnd txt 2 avoid char lmts, http://urls/, amongst others?


No corpus is immune from comparison, and each will have statistical parameters that reflect its original selection criteria. Perhaps Mayzner's corpus, apparently based on a sample from literature, exhibits a bias away from the abbreviated forms widely used in written communication today.
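The point about selection criteria is easy to check for yourself. Below is a minimal sketch that computes case-insensitive letter frequencies for two text samples and measures how far apart the two distributions are. The two sample strings are made-up stand-ins, not data from Mayzner's corpus or any real tweet dataset. Total variation distance is just one simple choice of metric, and the helper names are hypothetical.

```python
from collections import Counter
import string


def letter_frequencies(text: str) -> dict:
    """Relative frequency of each ASCII letter in `text`, ignoring case and non-letters."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: counts[c] / total for c in string.ascii_lowercase}


def total_variation(p: dict, q: dict) -> float:
    """Total variation distance: 0 means identical distributions, 1 means disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical toy samples (assumption): one "literary", one "texting" style.
formal = "It was the best of times, it was the worst of times, it was the age of wisdom."
informal = "omg thx 4 the link lol, c u l8r, brb gtg"

p_formal = letter_frequencies(formal)
p_informal = letter_frequencies(informal)

print(f"TV distance: {total_variation(p_formal, p_informal):.3f}")
print(f"'e' share: formal {p_formal['e']:.3f}, informal {p_informal['e']:.3f}")
```

With real corpora you would feed in far more text. Even so, vowel-dropping abbreviations like "thx" and "l8r" already pull the share of letters such as "e" down in the informal sample.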

So, if you wanted to tune your text prediction software for your phone...


Precisely. I was looking for predicting informal communication patterns, not formal book/newspaper style.
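To make the "tune it for your phone" idea concrete, here is a minimal sketch of a word-bigram next-word suggester trained only on informal messages. It assumes simple whitespace tokenization and uses a hypothetical list of example messages. Real phone keyboards use much richer models, so this only shows how the training corpus shapes the suggestions.

```python
from collections import Counter, defaultdict


def train_bigrams(messages):
    """Count, for each word, which words follow it (lowercased, whitespace-split)."""
    model = defaultdict(Counter)
    for line in messages:
        words = line.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model


def suggest(model, prev_word, k=3):
    """Return up to k most frequent followers of prev_word; empty list if unseen."""
    followers = model.get(prev_word.lower())
    if not followers:
        return []
    return [word for word, _ in followers.most_common(k)]


# Hypothetical informal training data (assumption, not a real dataset).
messages = ["c u l8r", "c u soon", "c u l8r lol", "thx c u tmrw"]
model = train_bigrams(messages)
print(suggest(model, "u"))
```

Trained on book-style text, the same code would never offer "l8r" after "u". That is the skew the thread is about.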



