
How can you say it's "compressed" over "compiled" when you are actually parsing it into an AST and then (iiuc) converting that to binary? That's exactly what compilers do. You are in fact going to a new source format (whatever syntax/semantics your binary AST is encoded with) so you really are compiling.

To be fair, these two concepts are similar and I may be totally misunderstanding what this project is about, so let me test my understanding. You are saying wasm bytecode is one step too early and a true "machine code" format would be better able to improve performance (especially startup time). I'm not following wasm development, but from comments here I am gathering that wasm is too low-level and you want something that works on V8. Is that what this project is about?

On a side note, it's truly a testament to human nature that the minute we get close to standardizing on something (wasm), someone's gotta step up with another approach.



> How can you say it's "compressed" over "compiled" when you are actually parsing it into an AST and then (iiuc) converting that to binary? That's exactly what compilers do. You are in fact going to a new source format (whatever syntax/semantics your binary AST is encoded with) so you really are compiling.

I am not sure, but there may be a misunderstanding about the word "binary". While "binary" is often used to mean "native", that is not the case here. Here, "binary" simply means "not text", just as images or zipped files are binary.

A compiler typically goes from a high-level language to a lower-level language, losing data. I prefer calling this a compression mechanism, insofar as you can decompress without loss (well, minus layout and possibly comments). Think of it as the PNG of JS: yes, you need to read the source code/image before you can compress it, but the output is still the same source code/image, just in a different format.
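To make the round-trip claim concrete, here is a minimal sketch that uses Python's standard `ast` module as a stand-in. This is only an analogy: BinAST targets JavaScript and has its own encoding, which this does not model. The sample source and the names `source`, `tree`, `restored` and `same_tree` are made up for illustration. It requires Python 3.9+ for `ast.unparse`.

```python
import ast

# Sample source with a comment and irregular spacing (illustrative only).
source = """
# compute a total
def total(xs):
    return sum(  x*2 for x in xs )   # doubled
"""

# "Compress": parse the text into a tree. Layout and comments are dropped here.
tree = ast.parse(source)

# "Decompress": regenerate source text from the tree.
restored = ast.unparse(tree)

# The regenerated text parses back to the same tree, so nothing that
# matters to the program's meaning was lost.
same_tree = ast.dump(ast.parse(restored)) == ast.dump(tree)

print(restored)
print("same tree:", same_tree)
```

The regenerated text differs from the original in spacing and has no comments, which matches the "minus layout and possibly comments" caveat above.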

> You are saying wasm bytecode is one step too early and a true "machine code" format would be better able to improve performance (especially startup time). I'm not following wasm development, but from comments here I am gathering that wasm is too low-level and you want something that works on V8. Is that what this project is about?

No native code involved in this proposal. Wasm is about native code. JS BinAST is about compressing your everyday JS code. As someone pointed out in a comment, this could happen transparently, as a module of your HTTP server.

> On a side note, it's truly a testament to human nature that the minute we get close to standardizing on something (wasm), someone's gotta step up with another approach.

Well, we're trying to solve a different problem :)


Posting this in the hope that it might help some people grok what they are actually doing:

When I first discovered what Yoric and syg were doing, the first thing that I thought of was old-school Visual Basic. IIRC when you saved your source code from the VB IDE, the saved file was not text: it was a binary AST.

When you reopened the file in the VB6 IDE, the code was restored to text exactly the way that you had originally written it.


Interesting. Do you know of any technical documentation on the topic?


The parent might have been thinking of QuickBasic, which saved programs as byte-code, along with formatting information to turn it back into text: http://www.qb64.net/wiki/index.php/Tokenized_Code

VB6 projects were actually a textual format - even the widget layout: https://msdn.microsoft.com/en-us/library/aa241723(v=vs.60).a...

As someone who runs a WYSIWYG app builder startup (https://anvil.works), I can attest that this is a Really Good Idea for debugging your IDE.


BBC Basic did this too: keywords were stored as single bytes (using the values 128-255, which ASCII leaves unused). Apart from that, the program code wasn't compiled; it was just interpreted directly. Very smart design when everything had to fit in 32KB of RAM.
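A rough sketch of that kind of keyword tokenization, in Python. The token table below is hypothetical and does not use the real BBC Basic token values; the function names `tokenize` and `detokenize` are also made up. As a simplification, keywords are replaced everywhere, including inside string literals, which a real tokenizer would presumably avoid.

```python
# Hypothetical token table: NOT the real BBC Basic byte values.
TOKENS = {"PRINT": 0x80, "GOTO": 0x81, "FOR": 0x82, "NEXT": 0x83}
KEYWORDS = {byte: word for word, byte in TOKENS.items()}


def tokenize(line: str) -> bytes:
    """Replace each keyword with a single byte in the 128-255 range."""
    if not line.isascii():
        raise ValueError("input must be 7-bit ASCII so bytes >= 128 stay free")
    out = bytearray()
    i = 0
    while i < len(line):
        for word, byte in TOKENS.items():
            if line.startswith(word, i):
                out.append(byte)
                i += len(word)
                break
        else:
            # Not a keyword: store the character as its ASCII code.
            out.append(ord(line[i]))
            i += 1
    return bytes(out)


def detokenize(data: bytes) -> str:
    """Expand token bytes back into keywords; other bytes are plain ASCII."""
    return "".join(KEYWORDS.get(b, chr(b)) for b in data)


line = 'PRINT "HI": GOTO 10'
packed = tokenize(line)
print(len(line), "characters ->", len(packed), "bytes")
print(detokenize(packed) == line)
```

Because every byte below 128 is ordinary ASCII and every byte at or above 128 is a keyword, the stored form can be expanded back to the exact original text.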


> Here, "binary" simply means "not text", just as for instance images or zipped files are binary.

If it's not text, then what is it? I'm not sure "not text" is a good definition of the word "binary".

> A compiler typically goes from a high-level language to a lower-level language, losing data.

I don't agree. I don't think there is any loss of data: the compiled-to representation should cover everything you wanted to do (not counting tree-shaking or comment removal, I suppose).

> I prefer calling this a compression mechanism, insofar as you can decompress without loss (well, minus layout and possibly comments).

Ahh, so you mean without losing the original textual representation of the source file.

> Wasm is about native code

Here you are making claims about their project that are just not the whole picture. Here's the one-line vision from their homepage[1]:

> WebAssembly or wasm is a new portable, size- and load-time-efficient format suitable for compilation to the web.

With that description in mind, how do you see BinAST as different?

> Well, we're trying to solve a different problem :)

I think you might be misunderstanding what wasm is intended for. Here's a blurb from the wasm docs that is pertinent:

> A JavaScript API is provided which allows JavaScript to compile WebAssembly modules, perform limited reflection on compiled modules, store and retrieve compiled modules from offline storage, instantiate compiled modules with JavaScript imports, call the exported functions of instantiated modules, alias the exported memory of instantiated modules, etc.

The main difference I can gather is that you are intending BinAST to allow better reflection on compiled modules than wasm intends to support.

Here's another excerpt from their docs (and others have mentioned this elsewhere):

> Once GC is supported, WebAssembly code would be able to reference and access JavaScript, DOM, and general WebIDL-defined objects.

[1]: http://webassembly.org/

[META: Wow, I thought downvotes were for negative or offtopic comments]


OK, I understand the distinction now.

I ran a Google search for "js to wasm" and found a ticket on the WebAssembly GitHub that explained it all: https://github.com/WebAssembly/design/issues/219


Thanks for the link, it will certainly prove useful in the future.


Text means a sequence of characters conforming to some character encoding. Yoric's binary AST is not a sequence of characters conforming to a character encoding.

Compilation maps a program in a higher level language to a program in a lower level language. The map is not required to be one-to-one: the colloquial term for this is "lossy."
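A toy illustration of a map that is not one-to-one, sketched in Python. The stack-machine instruction names (`PUSH`, `LOAD`, `ADD`, `MUL`) and the function `compile_expr` are invented for this example and do not correspond to any real compiler or to wasm. The compiler folds constants, so two different source expressions produce the same output and the original text cannot be recovered from it.

```python
import ast


def compile_expr(src: str) -> list:
    """Compile a tiny arithmetic expression to toy stack-machine code."""

    def emit(node):
        if isinstance(node, ast.Constant):
            return [("PUSH", node.value)]
        if isinstance(node, ast.Name):
            return [("LOAD", node.id)]
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            left, right = emit(node.left), emit(node.right)
            op = "ADD" if isinstance(node.op, ast.Add) else "MUL"
            both_constant = (
                len(left) == 1 and left[0][0] == "PUSH"
                and len(right) == 1 and right[0][0] == "PUSH"
            )
            if both_constant:
                # Constant folding: the original operands are discarded here.
                a, b = left[0][1], right[0][1]
                return [("PUSH", a + b if op == "ADD" else a * b)]
            return left + right + [(op,)]
        raise ValueError("unsupported expression")

    return emit(ast.parse(src, mode="eval").body)


first = compile_expr("x + 2 * 3")
second = compile_expr("x + 6")
print(first)
print(first == second)
```

Both inputs compile to the same instruction list, so the output alone cannot tell you which one was written. A format meant for source compression has to avoid exactly this kind of collapse.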


To quote the original post:

> If you prefer, this Binary AST representation is a form of source compression, designed specifically for JavaScript, and optimized to improve parsing speed.



