Hacker News | rthy's comments

I have been experimenting with tokenization in Rust in https://github.com/rth/vtext, mostly relying on Unicode segmentation plus a few custom rules. It also came out around 10x faster than spaCy at comparable precision (see the benchmarks section of the README).
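To give a rough idea of what "generic segmentation plus a few custom rules" can look like, here is a minimal Python sketch. It is not vtext's API or its actual rule set: the `tokenize` function and the two rules (numbers with separators, hyphenated words and contractions) are made up for illustration, and a Unicode-aware `\w` regex is only a crude stand-in for real Unicode word segmentation.

```python
import re

# Alternatives are tried left to right: the custom rules come first,
# then a generic fallback. All rules here are illustrative assumptions.
TOKEN_RE = re.compile(
    r"""
    \d+(?:[.,]\d+)+        # custom rule: numbers with separators (3.14, 1,000)
  | \w+(?:[-']\w+)*        # custom rule: keep hyphenated words and contractions whole
  | [^\w\s]                # fallback: any other non-space character is its own token
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into tokens using the rules above (whitespace is dropped)."""
    return TOKEN_RE.findall(text)


tokens = tokenize("Don't over-think 1,000 tokens!")
print(tokens)
```

Putting the special cases ahead of the generic pattern is what keeps "1,000" and "over-think" in one piece; without them the fallback would split on the comma and the hyphen.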


Alternatively, the toolz package ( https://toolz.readthedocs.io/en/latest/ ) is a nice way of getting some additional functional programming capabilities while using the standard CPython interpreter.
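As a small illustration of the style, here is a sketch using `pipe` and `frequencies` from toolz. The `word_counts` helper is made up for this example, and the `except ImportError` branch defines simplified stand-ins only so the snippet still runs where toolz is not installed.

```python
from collections import Counter
from functools import reduce

try:
    from toolz import frequencies, pipe
except ImportError:
    # Simplified stand-ins for this example only (assumption: toolz missing).
    def pipe(data, *funcs):
        """Thread data through funcs, left to right."""
        return reduce(lambda acc, f: f(acc), funcs, data)

    def frequencies(seq):
        """Count how often each item occurs in seq."""
        return dict(Counter(seq))


def word_counts(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return pipe(text, str.lower, str.split, frequencies)


counts = word_counts("the quick fox and the lazy dog and the cat")
print(counts["the"], counts["and"])
```

The pipeline reads top to bottom in the order the steps run, which is the main readability gain over nesting the calls.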


An example of compiling the CPython interpreter to WebAssembly can be found in Pyodide (https://github.com/iodide-project/pyodide/).

