JeromeWu's comments

JeromeWu · on Nov 5, 2020

You are right, I cannot override the FFmpeg license. Lol

nostrebored · on Nov 5, 2020

Might be important to note! I can see a lot of places this would be applicable for my work but I wouldn't be able to release content using this.

Really great work though, and I'm sure it will help a lot of people!

JeromeWu · on Nov 5, 2020

You can also read the same post on my blog: https://jeromewu.github.io/build-ffmpeg-webassembly-version-...

JeromeWu · on Nov 5, 2020

In fact, the header must be set on the server side, which means I cannot do that because I am using GitHub Pages. You can find more details here: https://github.com/ffmpegwasm/ffmpeg.wasm/issues/102
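For anyone hosting on a server they control, here is a rough sketch of what setting such a header could look like. The comment above does not name the header, so this assumes it means the cross-origin isolation pair (`Cross-Origin-Opener-Policy` and `Cross-Origin-Embedder-Policy`) that browsers require before exposing SharedArrayBuffer. The helper name `withCrossOriginIsolation` is hypothetical and not part of ffmpeg.wasm.

```typescript
// Hypothetical helper (not part of ffmpeg.wasm): merges the two
// cross-origin isolation headers into whatever headers a server
// already sends. Assumption: these are the headers the comment refers to.
function withCrossOriginIsolation(
  headers: Record<string, string> = {}
): Record<string, string> {
  return {
    ...headers,
    // Keeps the page in its own browsing context group.
    "Cross-Origin-Opener-Policy": "same-origin",
    // Only allows cross-origin resources that explicitly opt in.
    "Cross-Origin-Embedder-Policy": "require-corp",
  };
}
```

The returned object can be passed to whatever response API the server uses. A static host that does not let you configure response headers cannot send these, which matches the limitation described above.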

pteraspidomorph · on Nov 5, 2020

I noticed you were using github and completely failed to think of that! My bad.

JeromeWu · on Nov 5, 2020

Sounds cool! Maybe you can check my build script and see if anything is helpful. :)

https://github.com/ffmpegwasm/ffmpeg.wasm-core/tree/n4.3.1-w...

JeromeWu · on Nov 5, 2020

It is on the last line of the Installation section, but yes, ffmpeg.wasm still relies on SharedArrayBuffer for multi-threading.
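If you want to check up front whether a browser will work, a small feature test along these lines may help. It is only a sketch: `canUseSharedMemory` and `SharedMemoryScope` are hypothetical names, not ffmpeg.wasm API, and it assumes that a missing `crossOriginIsolated` flag (older browsers) should not count as a failure.

```typescript
// Minimal shape of the global object that the check needs.
interface SharedMemoryScope {
  SharedArrayBuffer?: unknown;
  crossOriginIsolated?: boolean;
}

// Hypothetical helper: returns true when SharedArrayBuffer looks usable.
function canUseSharedMemory(
  scope: SharedMemoryScope = globalThis as unknown as SharedMemoryScope
): boolean {
  // The constructor is absent when the browser does not expose it.
  if (typeof scope.SharedArrayBuffer !== "function") {
    return false;
  }
  // Environments that predate crossOriginIsolated leave it undefined;
  // only an explicit false is treated as "not allowed" here.
  return scope.crossOriginIsolated !== false;
}
```

Passing the scope as a parameter keeps the check easy to exercise outside a browser. In a page you would call it with no arguments.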

JeromeWu · on Dec 22, 2019

Wow, I was wondering why there were so many new github stars yesterday, and here I found the reason why. :)

Thanks for being interested in tesseract.js, it makes all the work worth it. I also have to thank @antimatter15 for creating this library; without him we could not have come this far.

I have read all the comments, and here I would like to offer my two cents on some of the questions:

1. Is tesseract.js pure JavaScript?

Yes, it is 100% JavaScript, and it uses a WebAssembly port of the original tesseract-ocr (meaning we compile the C/C++ source code to JavaScript and WebAssembly with Emscripten).

2. The accuracy of tesseract.js is poor.

In my experience, it is hard to get perfect results without applying additional techniques to your source images. You may need to do some preprocessing and sometimes train custom traineddata. It is not easy, but it is the price of high accuracy.

3. Cloud OCR services are much more accurate

Yes, that's true. But tesseract.js provides an in-browser, offline option for OCR, which is useful for scenarios like PWAs and highly confidential image content (which you don't want to send to a server). Tesseract.js is not a silver bullet, but it is handy sometimes.
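On the preprocessing mentioned in point 2: one common step is turning the image into pure black and white before running OCR. Below is a minimal sketch of fixed-threshold binarization over RGBA pixel data, the layout a canvas `ImageData.data` array uses. The function name `binarize` and the default threshold of 128 are illustrative assumptions, not part of tesseract.js, and real images often need an adaptive threshold instead.

```typescript
// Hypothetical helper: converts RGBA pixels to black or white using a
// fixed luma threshold. Input is left untouched; a new array is returned.
function binarize(
  rgba: Uint8ClampedArray,
  threshold: number = 128
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    // Perceptual brightness using the Rec. 601 luma weights.
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    const value = luma >= threshold ? 255 : 0;
    out[i] = value;     // red
    out[i + 1] = value; // green
    out[i + 2] = value; // blue
    out[i + 3] = rgba[i + 3]; // keep the original alpha
  }
  return out;
}
```

In a browser you could read pixels with `getImageData` on a canvas 2D context, run them through a function like this, write them back with `putImageData`, and hand the canvas to the OCR step.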

Hope you enjoy this library, and feel free to leave us any comments!