Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

When we started the project, we did write tokenizers by hand. I mention that in the blog post. You can write some very fast tokenizers by hand, even in JavaScript. Of course they won't be as fast as hand written tokenizers in C, but you'd be surprised how well the code of a hand written tokenizer in JavaScript can be optimized by a JS engine, at least I was :). IR Hydra 2 is a great tool to visualise v8's IR representation of JS code [1]. It is a shame it is not built into the Chrome Dev Tools.

In the end, we simply could not write tokenizers for all languages by hand. And our users wanted to take their themes with them when switching to VS Code. That's why we added support for TM grammars and TM themes, and in hindsight I still consider it to be a very smart decision.

[1] http://mrale.ph/irhydra/2/



What about a parser generator that takes something like a BNF-type language and generates optimal JS/TS code on the fly, similar to Lex/Yacc? (The BNF would be portable, the generated code would be a cache.)

I can see that reusing .tmLanguage files saves a lot of work, but that format is atrocious -- hard to both read and write. (I once wrote a parser/highlighter for it in ObjC, it was not a lot of fun.)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: