Meta is a very large organization, and I'm willing to believe that a good chunk of Meta FAIR (the lab releasing all of this stuff) truly does care about advancing AI safety and is doing great work along these lines. I'm not disagreeing with your point about the company as a whole being led by its financial incentives, but let's also give ourselves permission to celebrate this work by this group of people.
Looks like there are both ARM and x86 versions, according to the docs. You'd probably need two different binaries, but you still get cross-OS portability for each architecture.
  % curl -O https://cosmo.zip/pub/cosmos/bin/basename
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100  663k  100  663k    0     0   440k      0  0:00:01  0:00:01 --:--:--  441k
  % chmod +x basename
Kind of. On some supported systems it does run directly as a single binary; on others, the executable is first parsed as a shell script. There's definitely more to it than a single binary.
Reminds me of a cool tool I once used, uudecode.com: a DOS binary that used only 7-bit characters and could decode uuencoded (a base64 predecessor) files. It was useful for getting attachments through e-mail in the face of all kinds of filters.
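For anyone curious, the 7-bit trick is easy to demo with Python's standard binascii module: every encoded byte lands in the printable ASCII range (each 6-bit group is offset by 32), which is exactly why it slipped through those filters.

```python
import binascii

# uuencode packs 3 input bytes into 4 output characters,
# each offset into printable ASCII (value + 32).
encoded = binascii.b2a_uu(b"Cat")
print(encoded)  # b'#0V%T\n' -- the leading '#' encodes the length (3)

assert binascii.a2b_uu(encoded) == b"Cat"
# Everything except the trailing newline is printable 7-bit ASCII.
assert all(32 <= b < 128 for b in encoded.rstrip(b"\n"))
```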
Just skimmed so far and didn't see any reference to the Simplified Transformer block of https://arxiv.org/abs/2311.01906 (and it seems they also left out grouped query attention, too, as pointed out by another comment).
While lazy me wants them to explain how their approach compares to these approaches, it looks like their exposition is pretty clear (quite nice for a preprint!) and I guess I'll just have to actually read the paper for real to see for myself.
Given how well I've seen Simplified Transformer blocks work in my own playground experiments, I would not at all be surprised if other related tweaks work out well even on larger scale models. I wish some of the other commenters here had a bit more curiosity and/or empathy for these two authors who did a fine job coming up with and initially testing out some worthwhile ideas.
Related to the idea that "if you train on data you don't own, you shouldn't own the resulting model": since big public datasets like The Pile include CC BY-SA items, is anyone considering bringing the argument that model weights are derivative works that must be "shared alike"?
As you guessed, the history tracking is one of the killer features. Imagine it being super easy to edit the history of a REPL session (delete, reorder, merge, and edit contents of each command) and rerun... That's a notebook! Notebooks also allow for markdown input and rich HTML output (which is killer for plotting) making it possible to polish your REPL history into a document you'd actually want to share with a colleague to explain something like a data analysis workflow.
I actually started in notebooks and then learned to love the REPL as a simplified "scratchpad notebook." I'd say in many ways notebooks are an improvement that cater heavily to REPL-lovers, but that for some quick tasks, the extra complexity isn't always worth it.
In general, if you keep the source position of every nontrivial token, and you keep the raw source code, then yes, you can print those pinpoint error messages regardless of whether your trees are lossless. Also, if you want to include the filename in your messages (perhaps because, unlike Lox, your language supports imports), then you'll need more than just lossless trees to store the necessary information.
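To make that concrete, here's a rough Python sketch (Token and pinpoint are names I made up for illustration, not from any particular implementation): a line/column on each token plus the raw source is all you need to render a caret-style message.

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str
    text: str
    line: int   # 1-based line in the raw source
    col: int    # 1-based column of the token's first character

def pinpoint(source: str, tok: Token, message: str,
             filename: str = "<stdin>") -> str:
    """Render a caret-style error from the raw source + token position."""
    src_line = source.splitlines()[tok.line - 1]
    caret = " " * (tok.col - 1) + "^" * max(len(tok.text), 1)
    return f"{filename}:{tok.line}:{tok.col}: error: {message}\n{src_line}\n{caret}"

source = "var x = 1 +* 2;"
print(pinpoint(source, Token("STAR", "*", 1, 12), "unexpected '*'"))
```

Note that the tree never enters into it: the lossless-vs-lossy question is orthogonal as long as tokens carry their positions.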
I'm not sure exactly how Rust and rust-analyzer keep track of the info necessary to their excellent error messages and diagnostics, but I wouldn't be surprised if pinpoint messages were not the primary motivation for rust-analyzer to do lossless parsing.
Disclaimer: I wrote this blog post. If this were an "Ask HN" post, though, the question would be "What next after reading Crafting Interpreters?" I have only done the tree-walk interpreter half of the book, but I'm already excited to move beyond Lox, and I'm curious to hear what others have done in this situation.
I'd do the second half of the book, it's got plenty of important techniques that aren't taught in the first half.
Once you've done that, you can see how it's done "for real" in a production setting by reading the source code to Wren (by the same author). [0]
Wren's implementation maps very closely to the C implementation of Lox.
At that point, you're ready to do your own language. As far as compilation techniques go, you'll still be missing the "middle-end" of the compiler, which uses an SSA IR. I don't recommend implementing this yourself; I'd look into MLIR (from the LLVM project) if you want to actually work on the middle-end. You can create one or more dialects that are unique to your language and implement your own compiler transformations. There are lots of existing papers and projects on GitHub that demonstrate this.
I agree on the suggestion to do part two, it's where things get really fun!
One thing you can do with the finished Lox (or Monkey, if you prefer WACIG) before going into the world of intermediate representations is implementing a peephole optimizer. You look for reducible patterns in the bytecode, and replace them with optimized bytecode. You can also look for certain patterns and replace naive implementations with native builtins/intrinsics. You can work with the raw bytes of the bytecode, so you don't need to introduce an IR just yet.
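For a flavor of what that looks like, here's a toy Python sketch over a made-up tuple-based bytecode (the opcodes are hypothetical, not Lox's or Monkey's): scan a small window at the tail of the output and rewrite known patterns in place.

```python
def peephole(code):
    """One pass of pattern-based rewrites over a small instruction window."""
    out = []
    for instr in code:
        out.append(instr)
        # Pattern: push a constant, then immediately pop it -> dead, drop both.
        if len(out) >= 2 and out[-2][0] == "CONST" and out[-1][0] == "POP":
            del out[-2:]
            continue
        # Pattern: CONST a, CONST b, ADD -> fold to CONST (a+b).
        if (len(out) >= 3 and out[-3][0] == "CONST"
                and out[-2][0] == "CONST" and out[-1][0] == "ADD"):
            folded = ("CONST", out[-3][1] + out[-2][1])
            del out[-3:]
            out.append(folded)
    return out

code = [("CONST", 1), ("CONST", 2), ("ADD",),
        ("CONST", 5), ("POP",), ("RET",)]
print(peephole(code))  # [('CONST', 3), ('RET',)]
```

The same window-scanning shape extends to the intrinsics idea: recognize a known call sequence and swap in a dedicated opcode.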
The Apex compiler at Salesforce does the vast majority of its optimizations as peeps.
EDIT: I _just_ wrote another comment to someone asking similar questions a few days ago. Here's a link to the parent question, check out my thoughts as well as others in the thread. https://news.ycombinator.com/item?id=36119915
Wow, the whole concept of a peephole optimizer is a bit mind blowing to me. I'm appreciating all the reasons to power through to writing a bytecode VM as the next step.
I'm not sure how far down the compiler I actually will enjoy going vs. exploring ideas around type systems, linters, etc. up near the AST level, but if I do venture down this advice will certainly come in handy!
I don't have an academic background, and I'll agree that the majority of books I've picked up haven't covered the subject very well. It's a pretty common technique, though, for assembly and for bytecode, so I've learned by reading implementations. `peephole.c` in CPython is particularly easy to read with a basic understanding of the CPython API. It's a very limited implementation, but the idea goes far. Lots of things that you might expect the compiler to optimize in an IR can be done directly on the bytecode instead.
Thank you for the suggestions! I was actually just searching about MLIR today after reading some Julia language community discussions on the new Mojo language that uses MLIR.
Personally, I found "Writing an Interpreter in Go" better than "Crafting Interpreters". It has a follow-up book titled "Writing a Compiler in Go", which (similar to the second half of Crafting Interpreters) implements a virtual machine. The books take a lot of inspiration from Crafting Interpreters.
If this sounds at all appealing to you, I would check it out. I found Monkey so far somewhat more compelling than Lox (might be personal preference, though), and I thought the exposition (and code quality) was overall better. Go is also very easy to pick up.
I continued by taking a compilers course and an interpreters course at my university. In the interpreters course, a friend and I designed a new language and implemented it.
It turns out designing a language is pretty hard, but it was quite fun, and I'm proud of the project. I also really wanted to implement a typechecker, so we did that too.
I am wondering why the julia lox interpreter is so slow. Could it be that there is a lot of type unstable code or could it be an issue with julia's garbage collector? (I can relate to the comment that it is quite hard to find usable information from julia's profiler).
I'm pretty sure it is type instability. The faster way to do this would be with something like https://github.com/YingboMa/Unityper.jl which would fix that. The problem with profiling type-unstable code in Julia is that the instability makes everything slow, so the profiler just shows a big mess of everything being slow. We do need better resources, but I have no idea what they would look like.
I actually played with Unityper.jl and SumTypes.jl, but my conclusion was that if I was going to depart from dispatch on Julia types in my code, I might as well just stick to an untyped tree, since either way I'd have to have a single `evaluate` function for interpreting any kind of node.
Reconsidering now, it seems that there might be benefits beyond type dispatch to having a typed syntax tree, so maybe I'll give that a shot as a next step!
There currently aren't great tools for figuring out why type-unstable code is slow. @code_warntype and Cthulhu can help find type-stability problems, but they're pretty hard to use for newer users (or on big functions).
I just finished the bytecode interpreter side, and I have similar questions. The next areas of interest for me are reasonable AST representations in C and learning how to translate a semantics from a PL paper to a runtime. Anyone have any recommendations?
I'm late to the party, but I want to say thank you for sharing this. It's inspiring to look at how much you've built and (hopefully) enjoyed the process of building! I'm loving everything -- your site, your language design, your docs, your builtin libraries, your dev tools. Beyond impressive. People like you are the ones who make HN one of my favorite places on the internet.
For context on where I'm coming from: about two weeks ago I picked up Crafting Interpreters [1] for fun. I'm finding your clear-yet-concise Compiler internals [2] to be particularly compelling reading, and jumping back and forth between those "how this all works" docs and a live example of the language you actually built doing a WASM-compiled tree-blowing-in-the-wind animation is just... just wow. So freaking cool!
I also enjoyed reading the comment thread that inspired you to start on Yaksha, and seeing how this project had its wholesome start as inspiration-by-programming-hero. I hope you recognize that, a few years later, you've ascended from inspiree to inspirer. I also hope you're still having tons of fun building out Yaksha!
I was torn about whether to share my own post, but I figured the HN crowd might include a few others who will also really geek out about this topic and appreciate it. It's mathematical optimization and forecasting used to guide giant batteries hooked up to the electrical grid, after all.
There is some accompanying code I got to share publicly, too, if you want to run this yourself [1]. While I'm at it, I'll also mention some papers for anyone who wants a true deep dive [2, 3].
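To give a flavor of the problem class (this is emphatically not the model from the post -- see the linked code for the real thing): a toy dynamic program that schedules a lossless battery, with unit charge/discharge rate, against known prices.

```python
def plan(prices, cap=2):
    """Backward DP over state of charge: value[s] = best achievable
    revenue from state s onward, moving at most 1 unit of energy per step."""
    T = len(prices)
    value = [0.0] * (cap + 1)          # terminal: leftover charge worth 0
    policy = []
    for t in reversed(range(T)):
        best = [float("-inf")] * (cap + 1)
        act = [0] * (cap + 1)
        for s in range(cap + 1):
            for a in (-1, 0, 1):       # -1 = discharge (earn), +1 = charge (pay)
                if 0 <= s + a <= cap:
                    v = -prices[t] * a + value[s + a]
                    if v > best[s]:
                        best[s], act[s] = v, a
        value = best
        policy.append(act)
    policy.reverse()
    soc, actions = 0, []               # simulate forward from an empty battery
    for t in range(T):
        a = policy[t][soc]
        actions.append(a)
        soc += a
    return value[0], actions

print(plan([10, 50]))  # (40.0, [1, -1]): buy low, sell high
```

The real problem layers on round-trip efficiency, power/energy ratings, degradation, and (crucially) price uncertainty, which is where the forecasting side comes in.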