It was on HN a while back, but when I used it on real-world sites like WSJ, it seemed to pick up HTML code, whitespace characters and/or text from a sidebar.
I ditched it at the time, but I may try to start using it again if I can get it to work with ebooks.
How come I never stumbled upon this!?
Thank you very much.