Hacker News | alexdima's comments

Hi, I'm a developer on VS Code and wanted to clarify that there was actually a logical memory leak in there.

The problem is that I tried to reproduce it back in October while running from sources, and I could not reproduce any leak. I believed the issue was about the memory usage reported by the OS vs the memory usage reported by the logical JS heap snapshot, which is a common topic for garbage-collected runtimes such as JS/v8. That is why we believed the issue was about the v8 VM not freeing memory back to the OS after we had freed it logically.

But it turns out that when I tried the steps again just now, there is a real memory leak in there! It only reproduces when running a built version of VS Code, such as VS Code Insiders or VS Code Stable. The built versions of VS Code offer file type extension recommendations: for example, when you open a `.php` file, a recommendation comes in pointing you to the PHP extension. When running from sources, those recommendations do not show up and the code responsible for them never executes.

Thank you for bringing this to my attention, and I'm sorry for misunderstanding the initial issue.


Hi, I'm a developer working on VS Code.

VS Code is architected so that extensions are not eagerly activated by default. Each extension can declare a list of activation events, e.g. opening a file of a certain language, invoking a specific command, starting debugging, etc. See, for example, the quite long list of activation events for our built-in TypeScript extension [1].

However, we also offer an activation event called `*`, which means an extension can ask to be activated on startup. Some over-eager extensions might be using `*` to start up as soon as you open a VS Code window. You can find those extensions at any time via F1 > Developer: Show Running Extensions, which shows the subset of extensions currently running and whether or not they were activated on startup.
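To make this concrete, extensions declare these in their package.json manifest. A minimal sketch (the extension name and the specific events are made up, though the event forms follow the documented shapes):

```json
{
  "name": "my-php-tools",
  "activationEvents": [
    "onLanguage:php",
    "onCommand:myPhpTools.runLinter",
    "onDebug"
  ]
}
```

An extension declaring `"*"` instead would be activated on every startup.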

Moreover, that view can guide you through profiling the extension host and can help you easily figure out whether any extension is consuming excessive CPU. This was added quite recently [2].

[1] https://github.com/Microsoft/vscode/blob/1638acdd62d94bc4d99...

[2] https://code.visualstudio.com/updates/v1_19#_running-extensi...


VS Code sometimes performs better than VS. Can you elaborate on the reasons? Are there plans to migrate some of the optimizations or architectural lessons back to VS?


Not a VSCode developer, but I imagine VS and VSCode are so completely different under the hood that performance comparisons between the two are not very useful. Not only are they architecturally completely separate animals, but VS probably has a lot of legacy baggage that needs to stay for backwards compatibility, which VSCode lacks.


This is an amazing guide for finding misbehaving extensions. Thank you for your work on VS Code.


This is a great insight, thanks!

I wonder if there is also a place where we can view the activation events of an extension from within VS Code. If there isn't, I guess I can always find that info in the extension's (hopefully available) repository.


Hovering over the activation time will show the reason the extension is active. e.g. "Activated because you opened a javascript file", "Activated on start-up", etc.


It could do the opposite too. When a loaded extension has not been used for some time (maybe the user switched to a project with a different language) and no active code depends on it, the editor should unload it to free up resources.


F1 doesn't seem to work for me on macOS (could be my non-Apple keyboard), but I see that the command palette also gives access to Show Running Extensions - very cool!


Hi, I'm a VS Code developer.

Can you please open a new issue and describe, "as you would to a five-year-old", what's not working as expected?

Even from reading this comment thread I simply don't get it, so it might be something very easy to fix, once I do understand what it's about.

Thank you for your patience! :)


Whoop! Would love to—thanks for your time :)


When we started the project, we did write tokenizers by hand; I mention that in the blog post. You can write some very fast tokenizers by hand, even in JavaScript. Of course they won't be as fast as hand-written tokenizers in C, but you'd be surprised how well the code of a hand-written tokenizer in JavaScript can be optimized by a JS engine; at least I was :). IRHydra 2 is a great tool to visualise v8's IR representation of JS code [1]. It is a shame it is not built into the Chrome Dev Tools.
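To give a feel for the style, here is a toy hand-written tokenizer. Nothing in it comes from the actual VS Code source; it only recognizes numbers, identifiers, whitespace, and punctuation, but it shows the monomorphic, charCodeAt-based scanning that JS engines optimize well:

```javascript
// Toy hand-written tokenizer: scans a line with charCodeAt instead of
// regular expressions. Each token records its type and [start, end) offsets.
function tokenizeLine(line) {
  const tokens = [];
  const n = line.length;
  let i = 0;
  while (i < n) {
    const start = i;
    const ch = line.charCodeAt(i);
    let type;
    if (ch >= 48 && ch <= 57) { // 0-9
      type = 'number';
      while (i < n && line.charCodeAt(i) >= 48 && line.charCodeAt(i) <= 57) i++;
    } else if ((ch >= 97 && ch <= 122) || (ch >= 65 && ch <= 90) || ch === 95) { // a-z, A-Z, _
      type = 'identifier';
      while (i < n) {
        const c = line.charCodeAt(i);
        if ((c >= 97 && c <= 122) || (c >= 65 && c <= 90) ||
            (c >= 48 && c <= 57) || c === 95) i++;
        else break;
      }
    } else if (ch === 32 || ch === 9) { // space, tab
      type = 'whitespace';
      while (i < n && (line.charCodeAt(i) === 32 || line.charCodeAt(i) === 9)) i++;
    } else {
      type = 'punctuation';
      i++;
    }
    tokens.push({ type, start, end: i });
  }
  return tokens;
}
```

Because the hot loop only touches integers and a single string, an engine like v8 can keep it entirely in optimized machine code.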

In the end, we simply could not write tokenizers for all languages by hand. And our users wanted to take their themes with them when switching to VS Code. That's why we added support for TM grammars and TM themes, and in hindsight I still consider it to be a very smart decision.

[1] http://mrale.ph/irhydra/2/


What about a parser generator that takes something like a BNF-type language and generates optimal JS/TS code on the fly, similar to Lex/Yacc? (The BNF would be portable, the generated code would be a cache.)

I can see that reusing .tmLanguage files saves a lot of work, but that format is atrocious -- hard to both read and write. (I once wrote a parser/highlighter for it in ObjC, it was not a lot of fun.)


I hate slowness and inefficiency too; that's why I try to make the editor as fast as possible :). But at least in this case it is not the dynamic nature of JS that is to blame, but rather the nature of TM grammars. TM grammars consist of rules built on regular expressions, which need to be evaluated constantly; to implement a correct TM grammar interpreter, you must evaluate them.

I've looked in the past for optimization opportunities in C land (mostly through better caching), which yielded quite nice results [1][2]. I'd love it if you wanted to take a look too.

At this point, 90% of tokenization time is spent in C, matching regular expressions in oniguruma. More precisely, regular expressions are executed 3,933,859 times to tokenize checker.ts, the 1.18MB file. That is with some very good caching in node-oniguruma, and it speaks to the inefficiency of the TM grammars' regex-based design more than anything else.

It is definitely possible to write faster tokenizers, especially when writing them by hand (even in JS); see for example the Monaco Editor [3], where we use the TypeScript compiler's lexer as a tokenizer.

At least in this case, inefficiencies are not caused by our runtime.

[1] https://github.com/atom/node-oniguruma/pull/40

[2] https://github.com/atom/node-oniguruma/pull/46

[3] https://microsoft.github.io/monaco-editor/


Do you pre-process the regular expressions into a common DFA, or does oniguruma do that for you? That would seem like the natural design for this.

It's non-trivial because TextMate grammars seem like they're just a little bit too general to be convenient. So there's definitely a trade-off. But if I really wanted to get as fast as possible, I would try to see if I could get there.


Yes, those comparisons at the end show differences in rendering caused by the "approximations" used prior to VS Code 1.9. They were all caused by the difference between the ranking rules of CSS selectors and the ranking rules of TM scope selectors.


- all the regular expressions in TM grammars are based on oniguruma, a regular expression library written in C.

- the only way to interpret the grammars with anywhere near the original fidelity is to use the exact same regular expression library (with its custom syntax constructs). In VS Code our runtime is node.js, so we can use a native node module that exposes the library to JavaScript

- in the Monaco Editor, we are constrained to a browser environment where we cannot do anything similar

- we have experimented with Emscripten to compile the C library to asm.js, but performance was very poor even in Firefox (10x slower) and extremely poor in Chrome (100x slower).

- we can revisit this once WebAssembly gets traction in the major browsers, but we will still need to consider the browser matrix we support, i.e. if we support IE11 and only Edge adds WebAssembly support, what will the experience be in IE11, etc.


Sorry if I'm missing something, but why do you care about supporting other browsers? Isn't VSCode built on Electron which is a self contained server/browser environment?


Monaco is a standalone component which also works in the browser: https://microsoft.github.io/monaco-editor/

The article says "It was shipped in the form of the Monaco Editor in various Microsoft projects, including Internet Explorer's F12 tools"; presumably some of those projects also embed it into webpages.


Because the base Monaco editor is used in other products besides Code. The example I've seen thrown around is configuration editing in Azure consoles.


Is there a reason the oniguruma syntax can't be translated into other regexes? It'd require a fairly feature-rich engine since oniguruma supports practically everything, but is Chrome missing something fundamental that can't be mimicked? E.g. [[:alpha:]] could be converted to [a-zA-Z] or the UTF equivalent (though I doubt this particular one is a problem).
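Something like this mechanical rewrite is what I have in mind. A toy sketch handling just two POSIX classes (the function name and the translation table are made up for illustration):

```javascript
// Naive translation of POSIX character classes into JS character classes.
// A real translator would cover the full oniguruma syntax, not just two cases.
const translations = {
  '[[:alpha:]]': '[a-zA-Z]',
  '[[:digit:]]': '[0-9]'
};

function translatePosixClasses(pattern) {
  return pattern.replace(/\[\[:(alpha|digit):\]\]/g, m => translations[m]);
}
```

The translated pattern can then be compiled with the browser's own RegExp engine.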


They have lookbehind, which is only now coming in ESNext, and, more importantly, recursion, if I understand properly [0].

Maybe more things; I stopped reading the docs at that point to comment here. They have other features absent from JS that can be more or less polyfilled, like possessive quantifiers and sticky matching (they do it with an escape sequence, though, so it can only be polyfilled using this trick [1] if the escape character applies to the whole regexp rather than part of it).

[0] https://github.com/kkos/oniguruma/blob/1983356862bc3ff795d77...

[1] https://github.com/slevithan/xregexp/issues/152


Yeah, if lookbehinds aren't supported yet, that'd be death. Thanks!


Maybe slightly offtopic, but is there any chance we could see a VSCode based on Edge/Chakra? I prefer your font rendering over your competitors', and that's a pretty big issue for an editor.



We don't show leading/trailing whitespace diffs unless the diff consists only of leading/trailing whitespace changes.

This is sort of what we do when diffing:

* when we need to compare two buffers, we represent them as two arrays of lines

* we then proceed to trim() each line in both arrays

* we then use a greedy optimization where the first N and the last M lines that are equal (post trimming) in both arrays are dropped from further computation

* we then run a LCS algorithm over the remaining lines to find the diffs
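The steps above can be sketched roughly like this. This is not the actual VS Code implementation (which handles many more cases); it is a minimal illustration of trim, greedy prefix/suffix dropping, and a textbook LCS:

```javascript
// Sketch of the diff pipeline described above: trim every line, greedily
// drop the equal prefix and suffix, then run a dynamic-programming LCS
// over the remaining window and walk it back to emit edits.
function diffLines(aLines, bLines) {
  const a = aLines.map(l => l.trim());
  const b = bLines.map(l => l.trim());

  // Greedy: skip equal (post-trim) lines at the start...
  let start = 0;
  while (start < a.length && start < b.length && a[start] === b[start]) start++;
  // ...and at the end.
  let endA = a.length, endB = b.length;
  while (endA > start && endB > start && a[endA - 1] === b[endB - 1]) { endA--; endB--; }

  // Classic LCS table over the remaining window.
  const m = endA - start, n = endB - start;
  const lcs = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      lcs[i][j] = a[start + i - 1] === b[start + j - 1]
        ? lcs[i - 1][j - 1] + 1
        : Math.max(lcs[i - 1][j], lcs[i][j - 1]);
    }
  }

  // Walk the table backwards to recover insertions and deletions.
  const edits = [];
  let i = m, j = n;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && a[start + i - 1] === b[start + j - 1]) { i--; j--; }
    else if (j > 0 && (i === 0 || lcs[i][j - 1] >= lcs[i - 1][j])) {
      edits.push({ op: 'insert', line: b[start + --j] });
    } else {
      edits.push({ op: 'delete', line: a[start + --i] });
    }
  }
  return edits.reverse();
}
```

Because the lines are trimmed before comparison, a change that only touches leading/trailing whitespace produces an empty diff, matching the behavior described above.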

It is important to note that the same two arrays of lines can have multiple longest common subsequences of equal length. This method [1] could get some love and could try to recover better in some of these cases.

[1] https://github.com/Microsoft/vscode/blob/5b9ec15ce526decc5dd...


Here [1] is a working sample showing how to register a completion item provider for a language.

The monaco-typescript [2] plugin shows how a lot of the language API can be used.

[1] https://microsoft.github.io/monaco-editor/playground.html#ex...

[2] https://github.com/Microsoft/monaco-typescript/blob/master/s...
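For a rough idea of the shape, a provider is just a plain object with a `provideCompletionItems` function. This sketch is illustrative only; the exact API surface differs between Monaco versions, and the label, snippet text, and numeric enum values here are made up:

```javascript
// Sketch of a Monaco completion item provider. The provider object itself
// is plain JavaScript, so it can be built and inspected outside the editor.
const myCompletionProvider = {
  provideCompletionItems: function (/* model, position */) {
    return {
      suggestions: [
        {
          label: 'ifelse',
          kind: 17,            // stands in for monaco.languages.CompletionItemKind.Snippet
          insertText: 'if (${1:condition}) {\n\t$0\n} else {\n\t\n}',
          insertTextRules: 4   // stands in for InsertAsSnippet
        }
      ]
    };
  }
};

// In a real page this would be wired up with something like:
// monaco.languages.registerCompletionItemProvider('myLanguage', myCompletionProvider);
```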


Thanks, that should be a good start. [1] does not demonstrate context-dependent completions, but I guess that should be possible, and maybe [2] helps there. It's also nice that IntelliSense for the Monaco API is enabled in the playground.

