Wow, I was wondering why there were so many new GitHub stars yesterday, and now I've found the reason. :)
Thanks for your interest in tesseract.js; it makes all the work worthwhile. I also have to thank @antimatter15 for creating this library. Without him we could not have come this far.
I have read all the comments, and I would like to offer my two cents on some of the questions:
1. Is tesseract.js pure JavaScript?
Yes, it is 100% JavaScript, and it leverages a WebAssembly port of the original tesseract-ocr. (This means we compile the C/C++ source code to WebAssembly, powered by Emscripten.)
2. The accuracy of tesseract.js is poor.
In my experience, it is hard to get perfect results without applying additional techniques to your source images. You may need to do some preprocessing, and sometimes train a custom traineddata file. It is not easy, but it is the price of high accuracy.
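As an illustration of the kind of preprocessing meant above, here is a minimal sketch that converts RGBA pixels (laid out like a browser `ImageData.data` array) to grayscale and binarizes them with Otsu's method. This is a generic image-processing technique, not something tesseract.js does for you in this exact form, and the function names (`toGrayscale`, `otsuThreshold`, `binarize`) are hypothetical helpers for this example only.

```javascript
// Hypothetical preprocessing helpers (not part of tesseract.js).
// Input: RGBA bytes in the same layout as ImageData.data.

// Convert RGBA pixels to one luma value per pixel (BT.601 weights).
function toGrayscale(rgba) {
  const gray = new Uint8ClampedArray(rgba.length / 4);
  for (let i = 0, j = 0; i < rgba.length; i += 4, j++) {
    gray[j] = Math.round(0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]);
  }
  return gray;
}

// Otsu's method: pick the threshold t that maximizes the between-class
// variance of the two classes [0..t] and [t+1..255].
function otsuThreshold(gray) {
  const hist = new Array(256).fill(0);
  for (const v of gray) hist[v]++;

  const total = gray.length;
  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let sumBg = 0, weightBg = 0, bestVar = -1, best = 0;
  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    if (weightBg === 0) continue;        // no pixels at or below t yet
    const weightFg = total - weightBg;
    if (weightFg === 0) break;           // every pixel is at or below t
    sumBg += t * hist[t];
    const meanBg = sumBg / weightBg;
    const meanFg = (sumAll - sumBg) / weightFg;
    const between = weightBg * weightFg * (meanBg - meanFg) ** 2;
    if (between > bestVar) {
      bestVar = between;
      best = t;
    }
  }
  return best;
}

// Produce black/white RGBA pixels: luma > threshold becomes white, else black.
function binarize(rgba) {
  const gray = toGrayscale(rgba);
  const threshold = otsuThreshold(gray);
  const pixels = new Uint8ClampedArray(rgba.length);
  for (let j = 0; j < gray.length; j++) {
    const v = gray[j] > threshold ? 255 : 0;
    pixels[4 * j] = pixels[4 * j + 1] = pixels[4 * j + 2] = v;
    pixels[4 * j + 3] = 255; // fully opaque
  }
  return { pixels, threshold };
}
```

In a browser you would typically read pixels with `getImageData`, write the result back into an `ImageData` on a canvas, and hand that canvas to tesseract.js (for example `Tesseract.recognize(canvas, 'eng')` in v2 and later; check the docs for the version you use). A single global threshold works well for evenly lit scans; unevenly lit photos usually need adaptive thresholding or other cleanup instead.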
3. Cloud OCR services are much more accurate.
Yes, that's true. But tesseract.js gives you an in-browser, offline option for OCR, which is useful for scenarios like PWAs and highly confidential image content (which you don't want to send to a server). Tesseract.js is not a silver bullet, but it is handy sometimes.
Hope you enjoy this library, and feel free to leave us a comment!