- all the regular expressions in TextMate (TM) grammars are based on oniguruma, a regular expression library written in C.
- the only way to interpret the grammars and get anywhere near original fidelity is to use the exact same regular expression library (with its custom syntax constructs).
- in VSCode, our runtime is Node.js and we can use a native Node module that exposes the library to JavaScript.
- in the Monaco Editor, we are constrained to a browser environment where we cannot do anything similar.
- we have experimented with Emscripten to compile the C library to asm.js, but performance was very poor even in Firefox (10x slower) and extremely poor in Chrome (100x slower).
- we can revisit this once WebAssembly gets traction in the major browsers, but we will still need to consider the browser matrix we support. For example, if we support IE11 and only Edge adds WebAssembly support, what will the experience be in IE11?
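On the browser-matrix point, here is a minimal TypeScript sketch of runtime feature detection, so an embedder could load a WebAssembly build of the regex engine where it exists and fall back elsewhere (e.g. IE11). The backend labels and function names are hypothetical, and validating an empty module only tells you WebAssembly is present; it says nothing about performance.

```typescript
// Hypothetical backend labels; neither refers to a real package.
type RegexBackend = "wasm-oniguruma" | "js-fallback";

// The smallest valid WebAssembly module: the "\0asm" magic number followed by version 1.
const EMPTY_WASM_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

function supportsWebAssembly(): boolean {
  try {
    return (
      typeof WebAssembly === "object" &&
      typeof WebAssembly.validate === "function" &&
      WebAssembly.validate(EMPTY_WASM_MODULE)
    );
  } catch {
    // Some environments expose the global but throw when it is used.
    return false;
  }
}

function pickRegexBackend(): RegexBackend {
  return supportsWebAssembly() ? "wasm-oniguruma" : "js-fallback";
}

console.log(pickRegexBackend());
```

The fallback branch is where the IE11 question bites: whatever "js-fallback" means has to be good enough on its own.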
Sorry if I'm missing something, but why do you care about supporting other browsers? Isn't VSCode built on Electron, which is a self-contained server/browser environment?
The article says "It was shipped in the form of the Monaco Editor in various Microsoft projects, including Internet Explorer's F12 tools"; presumably some of those projects also embed it into webpages.
Is there a reason the oniguruma syntax can't be translated into other regex dialects? It'd require a fairly feature-rich engine, since oniguruma supports practically everything, but is Chrome missing something fundamental that can't be mimicked? E.g. [[:alpha:]] could be converted to [a-zA-Z] or the Unicode equivalent (though I doubt this particular one is a problem).
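A naive TypeScript sketch of that kind of translation: POSIX bracket classes are rewritten into JavaScript Unicode property escapes, which need the u flag. The mapping table and the translatePosixClasses helper are illustrative assumptions that only approximate oniguruma's Unicode semantics, and a plain string replace ignores escaping and negated classes such as [[:^alpha:]].

```typescript
// Approximate JS equivalents for a few POSIX classes (assumed mapping, not exhaustive).
const POSIX_TO_JS: Record<string, string> = {
  alpha: "\\p{Alphabetic}",
  digit: "\\p{Nd}",
  alnum: "\\p{Alphabetic}\\p{Nd}",
  upper: "\\p{Uppercase}",
  lower: "\\p{Lowercase}",
  space: "\\s",
};

// Replace every "[:name:]" with its JS equivalent; the surrounding brackets stay put,
// so "[[:alpha:]_]" becomes "[\p{Alphabetic}_]". Unknown classes are rejected.
function translatePosixClasses(source: string): string {
  return source.replace(/\[:(\w+):\]/g, (match: string, name: string) => {
    const replacement = POSIX_TO_JS[name];
    if (replacement === undefined) {
      throw new Error(`unsupported POSIX class: ${match}`);
    }
    return replacement;
  });
}

// The translated source only works with the "u" flag, which enables \p{...}.
const alphaOnly = new RegExp(translatePosixClasses("^[[:alpha:]]+$"), "u");
console.log(alphaOnly.test("héllo")); // true
console.log(alphaOnly.test("abc1")); // false
```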
Oniguruma has lookbehind assertions, which are only now coming to JavaScript in ESNext, and, more importantly, recursion, if I understand correctly[0].
There may be more; I stopped reading the docs at that point to comment here. It also has features absent from JS that can be more or less polyfilled, like possessive quantifiers and sticky matching (it does sticky matching with an escape sequence, \G, though, so it can only be polyfilled using this trick[1] if the escape applies to the whole regexp rather than part of it).
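A TypeScript sketch of the two "more or less polyfillable" features. Possessive quantifiers can be emulated because JavaScript lookaheads are atomic: once (?=(a+)) succeeds, the engine never backtracks into it, so \1 consumes exactly what was captured. A leading \G maps to the sticky y flag. The execAt helper name is hypothetical, and, as noted above, the sticky flag only helps when \G anchors the whole pattern, not one branch of it. Recursion has no comparable trick.

```typescript
// Greedy a+a backtracks, giving one "a" back so the final a can match.
const greedy = /^a+a/;
// Oniguruma's possessive a++a never gives anything back, so it cannot match "aaa".
// JS has no ++, but an atomic lookahead plus a backreference behaves the same way.
const possessiveEmulated = /^(?=(a+))\1a/;

// Emulate a pattern that starts with \G: the match must begin exactly at `pos`.
function execAt(re: RegExp, text: string, pos: number): RegExpExecArray | null {
  const flags = re.flags.includes("y") ? re.flags : re.flags + "y";
  const sticky = new RegExp(re.source, flags);
  sticky.lastIndex = pos;
  return sticky.exec(text);
}

console.log(greedy.test("aaa")); // true
console.log(possessiveEmulated.test("aaa")); // false
console.log(execAt(/\d+/, "ab12", 2)?.[0]); // 12
console.log(execAt(/\d+/, "ab12", 0)); // null: a non-sticky search would have found "12"
```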
Maybe slightly off-topic, but is there any chance we could see a VSCode based on Edge/Chakra? I prefer your font rendering over your competitors', and that's a pretty big issue for an editor.
It's harder than it sounds if you want to support many languages. The Sublime syntaxes repo I use has 34,000 lines of grammars, whereas my engine is only 3,000 lines of code. If you count all the tmLanguage files for niche languages available online, it's probably hundreds of thousands of lines, and that's in a pretty dense format. The whole point of using tmLanguage files is that people don't care how fast highlighting is for other languages if there is no highlighting for theirs.
I could get way better performance by rewriting all those grammars as compiled parsers in Rust (which Xi offers as an option: https://github.com/google/xi-editor), but it would take an absurd amount of effort.
Thank you, that explains a lot, but it raises a new question:
Why don't we use other text editors' grammars that are simpler or quicker to parse in JS? I have no idea about the technicalities, but Vim or Emacs grammars, for instance?
I can't speak to Emacs, but I know that Vim's syntax definitions are (a) not as powerful as TM's, (b) a nightmare to maintain, and (c) heavily reliant on features specific to Vim's loony regular-expression engine (like variable-width look-around).
My experience is that most syntax highlighters and their definition formats (Scintilla, nano, almost anything that's based on JavaScript) are very limited/naïve compared to TextMate's. It doesn't have to be that way -- TM can certainly be improved upon -- but it is.
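To make the comparison a bit more concrete, here is a toy TypeScript tokenizer loosely modeled on TextMate's rule types: single-token match rules plus begin/end regions whose nested patterns apply only inside them, one of the things that makes TM grammars expressive. The Rule shape, scope names, and tokenizeLine are hypothetical and far simpler than the real tmLanguage format (no captures, no state carried between lines, no backreferences from begin to end), and it uses JavaScript regexes rather than oniguruma. At each position it tries the open region's end pattern first, then the nested patterns in order.

```typescript
// Toy rule types loosely modeled on TextMate grammars (hypothetical, not the real schema).
type Rule =
  | { kind: "match"; regex: RegExp; scope: string }
  | { kind: "region"; begin: RegExp; end: RegExp; scope: string; patterns: Rule[] };

interface Token {
  text: string;
  scopes: string[];
}

// Return the text `re` matches starting exactly at `pos`, or null.
// Zero-width matches are ignored so the loop below always makes progress.
function stickyMatch(re: RegExp, text: string, pos: number): string | null {
  const flags = re.flags.replace("g", "");
  const sticky = new RegExp(re.source, flags.includes("y") ? flags : flags + "y");
  sticky.lastIndex = pos;
  const m = sticky.exec(text);
  return m !== null && m[0].length > 0 ? m[0] : null;
}

function tokenizeLine(line: string, grammar: Rule[], rootScope: string): Token[] {
  // Each frame is an open region: its end pattern, nested patterns, and scope stack.
  const stack: { end: RegExp | null; patterns: Rule[]; scopes: string[] }[] = [
    { end: null, patterns: grammar, scopes: [rootScope] },
  ];
  const tokens: Token[] = [];
  let plain = ""; // unmatched text, emitted under the current frame's scopes
  let pos = 0;

  const flush = (scopes: string[]) => {
    if (plain.length > 0) tokens.push({ text: plain, scopes: [...scopes] });
    plain = "";
  };

  while (pos < line.length) {
    const frame = stack[stack.length - 1];

    // The region's end pattern is tried before its nested patterns.
    const endText = frame.end !== null ? stickyMatch(frame.end, line, pos) : null;
    if (endText !== null) {
      flush(frame.scopes);
      tokens.push({ text: endText, scopes: [...frame.scopes] });
      stack.pop();
      pos += endText.length;
      continue;
    }

    let matched = false;
    for (const rule of frame.patterns) {
      const text = stickyMatch(rule.kind === "match" ? rule.regex : rule.begin, line, pos);
      if (text === null) continue;
      flush(frame.scopes);
      const scopes = [...frame.scopes, rule.scope];
      tokens.push({ text, scopes });
      if (rule.kind === "region") {
        stack.push({ end: rule.end, patterns: rule.patterns, scopes });
      }
      pos += text.length;
      matched = true;
      break;
    }

    if (!matched) {
      plain += line[pos]; // no rule applies here; keep scanning
      pos += 1;
    }
  }
  flush(stack[stack.length - 1].scopes);
  return tokens;
}

const grammar: Rule[] = [
  { kind: "match", regex: /\b(?:if|else|return)\b/, scope: "keyword.control" },
  {
    kind: "region",
    begin: /"/,
    end: /"/,
    scope: "string.quoted.double",
    patterns: [{ kind: "match", regex: /\\./, scope: "constant.character.escape" }],
  },
];

for (const token of tokenizeLine('return "a\\"b";', grammar, "source.toy")) {
  console.log(JSON.stringify(token.text), token.scopes.join(" "));
}
```

The escaped quote inside the string ends up with three nested scopes (source, string, escape), and only the unescaped closing quote pops the string region.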
Yeah, I'm not totally sure what was meant by that. They're plain text, and thus parsable.
Maybe they meant that browsers don't usually have access to the file system, but that's changing, and it doesn't apply anyway since they're using Electron and have Node.js at their disposal.
That doesn't sound right... but then again I don't know enough about TextMate grammars to argue.