I would think tweets would be skewed, no? Slang, memes, shrtnd txt 2 avoid char ...

talaketu · on Jan 6, 2013

No corpus is immune from comparison, and each will have statistical parameters that reflect its original selection criteria. Perhaps Mayzner's corpus, apparently based on a sample from literature, exhibits a bias away from the abbreviated forms widely used in written communication today.
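As a rough illustration of how selection criteria show up in the statistics, here is a minimal Python sketch. It assumes plain per-letter counts as the "statistical parameters" and uses total variation distance as one simple way to compare two corpora. The two sample strings are invented for illustration and are not drawn from Mayzner's data or any real Twitter corpus.

```python
import string
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each ASCII letter a-z in `text`, case-insensitive.

    Digits, punctuation and whitespace are ignored, so the "2" in "txt 2" is not counted.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: counts[ch] / len(letters) for ch in sorted(counts)}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance: 0.0 for identical distributions, 1.0 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical toy samples, invented for illustration only.
formal = "I would think that messages on Twitter would be skewed by slang and abbreviations."
informal = "i wd think tweets wd b skewed, slang n memes, shrtnd txt 2 avoid char limits"

p_formal = letter_frequencies(formal)
p_informal = letter_frequencies(informal)

# Vowels are a likely place for abbreviated text to diverge, since "shrtnd" drops them.
for letter in "aeiou":
    print(letter, round(p_formal.get(letter, 0.0), 3), round(p_informal.get(letter, 0.0), 3))
print("TV distance:", round(total_variation(p_formal, p_informal), 3))
```

Total variation is used here only because it needs no smoothing; a measure such as KL divergence would blow up for any letter that appears in one sample but not the other. Real corpora would of course need far more text than two sentences before the numbers mean anything.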

So, if you wanted to tune your text prediction software for your phone...

chime · on Jan 7, 2013

Precisely. I was looking to predict informal communication patterns, not formal book/newspaper style.