I like this, but unfortunately it doesn't solve one annoying problem: lexical scope doesn't work and it will fail in an unexpected way.
If you reference something lexically, your code fails at runtime. Want to use an import? You have to use import() inside the closure you pass to spawn(). TypeScript doesn't know this. Your language server doesn't know this. Access a variable that shadows a built-in global? Now you're accessing the built-in global.
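A quick sketch of the failure mode (hypothetical: `spawn` stands in for the library's function, and the file and variable names are made up):

```typescript
declare function spawn<T>(fn: () => Promise<T>): Promise<T>; // stand-in signature

import { readFile } from "node:fs/promises";

const limit = 10; // lives in the parent's lexical scope

const ok = await spawn(async () => {
  // TypeScript and your editor are happy with both lines below, but the closure
  // is serialized and re-evaluated in a worker where neither binding exists.
  // The workaround is `const { readFile } = await import("node:fs/promises")`
  // inside the closure, and nothing in the tooling tells you that.
  const data = await readFile("input.txt", "utf8"); // ReferenceError at runtime
  return data.length > limit;                       // `limit` is gone too
});
```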
The only way this could even be addressed is by having a full-on parser. Even then you can't guarantee things will work.
I think the only "fix" is for JS to introduce new syntax for a function that can't access lexical scope, returning a value that is either a subclass of Function or has a cheeky symbol set on it. At least then, it'll fail at compile time.
There is a simple solution to this problem, but it's not very popular: do the same thing Workers do, require using a separate file. All the tooling works out of the box, you have no issues with lexical scoping, etc. The only downside is it's (currently) clunky to work with, but that can be fixed with better interfaces.
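For what it's worth, a rough sketch of what that looks like with Node's worker_threads (the file names and the heavy function are made up; the browser equivalent is `new Worker(new URL(...), { type: "module" })`):

```typescript
// worker.ts: an ordinary module, so imports, TypeScript, and bundlers all
// see exactly what they expect.
import { parentPort } from "node:worker_threads";
import { someHeavyFunction } from "./heavy"; // hypothetical import

parentPort?.on("message", (input: number) => {
  parentPort?.postMessage(someHeavyFunction(input));
});

// main.ts: no closure is serialized; the worker is addressed by file path.
import { Worker } from "node:worker_threads";

const worker = new Worker(new URL("./worker.js", import.meta.url));
worker.postMessage(42);
worker.on("message", (result) => console.log("result:", result));
```

The clunkiness is mostly the message-passing boilerplate, which is exactly the part better interfaces could wrap.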
I've been using a functionally identical implementation of this since I wrote it in my startup's codebase a decade ago. It's really handy, but definitely not without edge case issues. I've occasionally had to put in workarounds for false positive TypeScript/lint errors or a tool in the bundling pipeline trying to be too clever and breaking the output.
Overall it's great, and I'm glad to see a generic implementation of it which will hopefully become a thriving open source project, but ultimately it's a kludge. What's really needed is for JS to introduce a native standardized version of this construct which TypeScript and the rest of the ecosystem have to play nice with.
A linter rule provided by the library could be helpful here. I know it's just a workaround but probably easier than going for a solution that does compile time checks.
This should be the expected behavior when multithreading. It is the expected behavior when executing a child process, such as node’s child_process.fork.
It's expected behavior functionally, but knowing nothing of the spawn function, it's unexpected syntactically. Fork doesn't behave this way, though, because it executes a module by path, not a function.
Fork and normal worker threads always enter a script, so there's clearly no shared lexical scope. This spawn method executes a function, but that function can't interact with the scope outside it.
While I agree with GP that this should be the expected behavior, your comment raises what I think is a large problem/wild-goose-chase in ‘modern’ language designs implementing concurrency.
The push from language designers (this applies across the high/low level spectrum and at all ranges of success for languages) to make concurrent code ‘look just like’ linearly read, synchronous, single-threaded code is pervasive and seems to avoid large pushback by users of the language. The complaints that should be made against this syntax design become complaints that code doesn’t do what developers think it should.
My position is that concurrent (and parallel) code IS NOT sequential code and languages should embrace those differences. The move to or design of async/await is often explicitly argued for from this position. But the semantic differences in concurrent code IMO should not be obscured or obfuscated by seeking to conform that code to sequential code’s syntax.
I’d love a way to be able to specify that sort of thing. I wrote a little server-side JSX rendering layer, and event handlers were serialized to strings, and so they had similar restrictions.
I'm not sure I'd describe the one you linked as "perfectly similar". At least to me, there are a couple of obvious problems:
- Folding the corners of a rectangle an infinite number of times doesn't make it a circle, it just means it has an infinite number of corners.
- The folded corners always make right triangles, no matter how small they are. If you put the non-hypotenuse legs of a right triangle against a circle, no matter how infinitely small the legs are, the corner of the legs will never touch the edge of the circle: an infinitely small triangle can't have all three points be the same point (or it's not a triangle). Which means the area of the folded rectangle will always exceed the area of the circle it's mimicking, even with infinite folds.
- As the folds become smaller and smaller, the arc of the circle (relative to the size of the triangles against it) becomes straighter and straighter. Which means each successive fold scrunches up more perimeter while becoming less and less circle-like.
There's also the intuition that the circumference of the circle must be less than the perimeter of the square, so if the perimeter of the polygon isn't decreasing as it gets closer to the circle, it doesn't approximate it better than the square itself.
I.e., the perimeter doesn't approach the circumference in value because it doesn't change.
It's an interesting thing to think through though, and maybe a good point about how arguments can seem intuitive at first but be wrong. On the other hand, I'm not sure that's any more true of visual proofs than other proofs.
I think your attempt to rebut the proof is flawed too. The problem in your reasoning is mixing up "arbitrarily many" and "infinitely many".
There's no convergence after a finite number of steps. But at infinity, the canonical limit of this construction method is a circle. And because it is a circle, the circumference at infinity "jumps" to 2*pi. This is quite counterintuitive but perfectly legit in mathematical analysis. It's just one of many wacky properties of infinity.
I kind of ran into this when I was in high school and was introduced to limits.
For me the quandary was a "stair step" shape dividing a square with length of side "s" ("stairs" connecting two opposite diagonal corners). You could increase the number of steps—they get smaller—but the total rise + run of the stairs remains the same (2s). At infinity I reasoned you had a straight, diagonal line that should have been s√2 but was also still 2s in length.
At the very least you can say that the area enclosed approaches that of a right triangle (at infinity), but the perimeter stays stubbornly the same and not that of a right triangle at all.
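In limit language, a sketch of the standard resolution: the staircases converge to the diagonal, but arc length isn't continuous under that kind of convergence.

```latex
% Each n-step staircase \gamma_n has total length 2s (rise plus run never changes),
% while the curves converge uniformly to the diagonal, whose length is s\sqrt{2}:
\lim_{n\to\infty} \operatorname{length}(\gamma_n) = 2s
\qquad\text{whereas}\qquad
\operatorname{length}\Bigl(\lim_{n\to\infty}\gamma_n\Bigr) = s\sqrt{2}.
% Arc length is only lower semicontinuous under uniform convergence, so the
% limit of the lengths need not equal the length of the limit curve. The
% enclosed area, by contrast, does converge, which is why the area comes out
% right while the perimeter doesn't.
```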
This is indeed the common way most people encounter this. The proof of the difference in the limits for the perimeter vs. the area is in the first answer to the Stack Exchange question in the G(^n)P: https://math.stackexchange.com/a/12907
But imagine this was a domain you weren't familiar with, you didn't know that pi != 4, you didn't know that the proof was false going into it. Could you have come up with a list of problems so quickly?
For what it's worth, I'm not sure that problems 1 and 2 are actually genuine problems with the proof. You can approximate the length of a curve with straight lines by making them successively smaller. This is the first version of calculus that students learn. Problem 3 is the crux.
> But imagine this was a domain you weren't familiar with, you didn't know that pi != 4, you didn't know that the proof was false going into it. Could you have come up with a list of problems so quickly?
No, but if I didn't know anything about the domain, literally any proof (correct or incorrect) would seem fine. But then it's not really "proving" anything. Knowing enough for the proof to make sense but still unconditionally accepting assertions like "if you fold the corners an infinite number of times, it makes a circle" strikes me as odd.
> I'm not sure that problems 1 and 2 are actually genuine problems with the proof. You can approximate the length of a curve with straight lines by making them successively smaller.
But that's not what's happening here: the lines are straight, but you'd approximate the length of the curve with the hypotenuses, not the legs of the folds. Surely as you repeat this process you wouldn't think "wow, the circumference of this circle is actually equal to the perimeter of the original square." You'd have to disbelieve your own eyes and intuition and knowledge of circles to accept that this is true and hopefully you'd think "maybe I'm doing this wrong."
That's not to say 1 and 2 alone prove the visual proof incorrect, but they demonstrate that it is doing something wrong. Proofs that are correct don't have inconsistencies.
In math you have to disbelieve your own eyes and intuition an awful lot. Not in this case, I grant you. But there are plenty of counterintuitive results.
These are obvious problems to someone who has studied enough math/geometry/calculus to know why one form of "adding boxes together" gets you a curve and another does NOT.
Are visual proofs meant for someone who hasn't studied math? I wouldn't expect them to prove anything to someone who hasn't. Any proof that's incorrect could reasonably fool a layman.
It's unclear whether you could build a JIT that meaningfully benefits from typescript types.
1. Hidden classes can't be created from TS interfaces because they don't represent the full data of the underlying object
2. You don't really ever want to compile code the first time you see it, because that takes a lot of memory and extra CPU cycles. By the time code has run enough to be worth compiling, you probably have enough profile data to optimize better than you could with data from the types anyway.
3. Many of the juiciest optimizations come from types that aren't representable in TS, like integers (see the sketch after this list).
4. Including all the types for all your code and deps (literally all the .d.ts) is huge, and the size increase alone might nullify any performance benefit.
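To illustrate point 3, a small sketch of why the annotations are too coarse for this:

```typescript
// TypeScript has a single `number` type, so none of these annotations carry
// the int/float distinction a JIT actually optimizes around:
let count: number = 3;  // the engine can keep this as a small integer (Smi)
count = 3.5;            // ...now it needs a heap/double representation
count = 2 ** 53;        // ...and now it's outside the Smi range entirely

// The static type says `number` in every case; only runtime type feedback
// reveals which machine representation is in play, which is why V8-style
// JITs profile rather than trust source-level annotations.
```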
This is missing the point. If I want to instruct Claude to never write a database query that doesn't hit a preexisting index, where exactly am I supposed to document that? You can either choose:
1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)
2. You add a docs folder (congrats, you've just done exactly what the author suggests under Progressive Disclosure)
Moreover, you can't just do it all in a README, for the exact reasons that the author lays out under "CLAUDE.md file length & applicability".
CLAUDE.md simply isn't about telling Claude what all the parts of your code are and how they work. You're right, that's what documenting your code is for. But even if you have READMEs everywhere, Claude has no idea where to put code when it starts a new task. If it has to read all your documentation every time it starts a new task, you're needlessly burning tokens. The whole point is to give Claude important information up front so it doesn't have to read all your docs and fill up its context window searching for the right information on every task.
Think of it this way: incredibly well documented code has everything a new engineer needs to get started on a task, yes. But this engineer has amnesia and forgets everything they've learned after every task. Do you want them to have to reonboard from scratch every time? No! You structure your docs in a way so they don't have to start from scratch every time. This is an accommodation: humans don't need this, for the most part, because we don't reonboard to the same codebase over and over. And so yes, you do need to go above and beyond the "same old good best practices".
think about how this thing is interacting with your codebase. it can read one file at a time. sections of files.
in this UX, is it ergonomic to go hunting for patterns and conventions? if you have to linearly process every single thing you look at every time you do something, how are you supposed to have “peripheral vision”? if you have amnesia, how do you continue to do good work in a codebase given you’re a skilled engineer?
it is different from you. that is OK. it doesn’t mean it’s stupid. it means it needs different accommodations to perform as well as you do. accommodations IRL exist for a reason, different people work differently and have different strengths and weaknesses. just like humans, you get the most out of them if you meet and work with them where they’re at.
You put a warning where it is most likely to be seen by a human coder.
Besides, no amount of prompting will prevent this situation.
If it is a concern, then you put in a linter or unit tests to prevent it altogether, or make a wrapper around the tricky function with a warning in its doc strings.
I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.
But they are right, Claude routinely ignores stuff from CLAUDE.md, even with warning bells etc. You need a linter preventing things. Like drizzle sql` templates: it just loves them.
You can make affordances for agent abilities without deviating from what humans find to be good documentation. Use hyperlinks, organize information, document in layers, use examples, be concise. It's not either/or unless you're being lazy.
> no amount of prompting will prevent this situation.
Again, missing the point. If you don't prompt for it and you document it in a place where the tool won't look first, the tool simply won't do it. "No amount of prompting" couldn't be more wrong; it works for me and all my coworkers.
> If it is a concern then you put a linter or unit tests to prevent it altogether
Sure, and then it'll always do things its own way, run the tests, and have to correct itself. Needlessly burning tokens. But if you want to pay for it to waste its time and yours, go for it.
> I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.
It's not about avoiding mistakes! It's about having it follow the norms of your codebase.
- My codebase at work is slowly transitioning from Mocha to Jest. I can't write a linter to ban new Mocha tests, and it would be a pain to keep a list of legacy Mocha test suites. The solution is to simply have a bullet point in the CLAUDE.md file that says "don't write new Mocha test suites, only write new test suites in Jest". A more robust solution isn't necessary and doesn't avoid mistakes, it avoids the extra step of telling the LLM to rewrite the tests.
- We have a bunch of terraform modules for convenience when defining new S3 buckets. No amount of documenting the modules will have Claude magically know they exist. You tell it that there are convenience modules and to consider using them.
- Our ORM has findOne that returns one record or null. We have a convenience function getOne that returns a record or throws a NotFoundError to return a 404 error. There's no way to exhaustively detect with a linter that you used findOne and checked the result for null and threw a NotFoundError. And the hassle of maybe catching some instances isn't necessary, because avoiding it is just one line in CLAUDE.md.
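For concreteness, those three bullets end up as just a few lines in the file, something like this (the module path is illustrative):

```markdown
## Conventions

- Testing: we are migrating from Mocha to Jest. Never add new Mocha test
  suites; write all new test suites in Jest.
- Infrastructure: prefer our convenience Terraform modules (e.g.
  modules/s3-bucket) when defining new S3 buckets.
- Data access: use getOne (throws NotFoundError, surfaces as a 404) instead
  of hand-rolling findOne + null check + throw.
```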
> Yes there is? Though this is usually better served with a type checker, it’s still totally feasible with a linter too if that’s your bag
It's not, because you would have to implement a full static analyzer that traces where the result of a `findOne` call is checked for `null` and then check that the condition always leads to a `NotFoundError`. At best you've got a linter that only works some of the time, at worst you've just made your linter terribly slow and buggy.
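For reference, the wrapper being described looks roughly like this (the ORM shape here is hypothetical); a linter would have to prove that every hand-rolled `findOne` + null check + throw is equivalent to it, which is exactly the whole-program analysis problem:

```typescript
class NotFoundError extends Error {}

// Hypothetical sketch of the ORM surface and the convenience wrapper.
interface Repo<T> {
  findOne(where: Partial<T>): Promise<T | null>;
}

async function getOne<T>(repo: Repo<T>, where: Partial<T>): Promise<T> {
  const record = await repo.findOne(where);
  if (record === null) {
    // Surfaces as a 404 further up the stack.
    throw new NotFoundError("record not found");
  }
  return record;
}
```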
> these tools still ignore that line sometimes so I still have to check for it myself.
You can also use your README (and in my own private project, I do!). But for folks who don't want their README clogged up with lots of facts about the project, you have CLAUDE.md
1. Create a tool that can check if a query hits a preexisting index (rough sketch below)
2. Either force Claude to use it (hooks) or suggest it (CLAUDE.md)
3. Profit!
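Step 1 can be pretty small. A rough sketch, assuming Postgres and the `pg` client (the heuristic and the sample query are made up), that you'd then wire into a hook or mention in CLAUDE.md:

```typescript
// Run EXPLAIN on a candidate query and fail if the plan falls back to a
// sequential scan, as a crude proxy for "does not hit an existing index".
import { Client } from "pg";

async function usesIndex(sql: string): Promise<boolean> {
  const client = new Client(); // connection comes from the usual PG* env vars
  await client.connect();
  try {
    const { rows } = await client.query(`EXPLAIN (FORMAT JSON) ${sql}`);
    const plan = JSON.stringify(rows[0]["QUERY PLAN"]);
    return !plan.includes("Seq Scan");
  } finally {
    await client.end();
  }
}

// Exit non-zero so a hook or CI step can block the change.
usesIndex("SELECT * FROM users WHERE email = 'a@b.c'").then((ok) => {
  if (!ok) {
    console.error("query does not appear to hit an existing index");
    process.exit(1);
  }
});
```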
As for "where stuff is", for anything more complex I have a tree-style graph in CLAUDE.md that shows the rough categories of where stuff is. Like the handler for letterboxd is in cmd/handlerletterboxd/ and internal modules are in internal/
Now it doesn't need to go in blind but can narrow down searches when I tell it to "add director and writer to the letterboxd handler output".
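Something along these lines (abridged; only the letterboxd and internal paths are real, the rest of the wording is illustrative):

```markdown
## Where things live

- cmd/handlerletterboxd/  - the letterboxd handler (one directory per handler)
- internal/               - shared internal modules
```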
Learned this the hard way. Asked Claude Code to run a database migration. It deleted my production database instead, then immediately apologised and started panicking trying to restore it.
Thankfully Azure keeps deleted SQL databases recoverable, so I got it back in under an hour. But yeah - no amount of CLAUDE.md instructions would have prevented that. It no longer gets prod credentials.
This is a neat idea, but it's extremely light (no pun intended) on real details. Translating a simulation into real hardware that can do real computation in a reliable manner is properly hard. As much as I'd love to be an optimist about this project, I have to say I'll believe it when I see it actually running on a workbench.
If it does work, I think one of the biggest challenges will be adding enough complexity to it for it to do real, useful computation. Running the equivalent of GPT-2 is a cool tech demo, but if there's not an obvious path to scaling it up, it's a bit of a dead end.
Oh absolutely... this is kitchen-table level at this point. There is a clear path to a really huge number of parameters, but a bunch of things need to be proven first. Like... can the detector meaningfully read what comes out the end of the optical chain?
I own three electric motorcycles and respectfully disagree. You can't make tube and canvas that let a passenger survive getting t-boned by a Yukon Denali or an F-250. One high-profile accident with a mother and her child getting peeled off the road with a coal shovel is all it'll take to kill such a form factor forever.
The problem isn't the form factor you're describing, it's that you can't put those on the road with 1000+ horsepower machines that are 50 times heavier. And on top of that, a lot of people just don't want to give up their heated massage seats and connected infotainment and removable third row or whatever crap they pack in minivans these days.
The elements of the form factor implied here are already on the road. Series hybrid bikes exist today. Fully faired bikes exist today. A fully faired recumbent tricycle could get you to work, clean and dry, on a dime's worth of energy. Cities like Barcelona and Taipei, which already move on gas scooters, would smell immensely better if e-bikes took over.
American pickup trucks with their butch-looking front ends that kill a lot of children are a stupid idea under any circumstances. But evidently we have to live with that death and destruction until they rust out. Kids are already dying because of the stupidity and we have not got what it takes to stop it. It means other places will benefit from better mobility sooner.
A fully faired recumbent tricycle will get you killed in a city with poor bike-ability. I have friends that have been in the hospital for weeks because of bike accidents in SF and NYC, which are arguably the exact kinds of places where you'd want bikes to replace cars. But instead, we have "Vision Zero" projects that still have staggeringly far to go.
I don't disagree with you: it would be great if we could replace more cars with bikes, but the reality is that there's almost nothing serious we can do in the US to undo the omnipresence of massive vehicles in most cities.
I agree with your comment, but I'll be a little pedantic for a minute:
As a Charger Daytona owner, I'd love to call the Mach-E a Mustang, but it's really just borrowing the brand. Ford has said unequivocally that they'll never make an all-electric muscle car, which is a real shame. The Mach-E is a great car if you're turned off by a Model Y, but you wouldn't choose it over a Mustang GT or a Charger Daytona or a Camaro.
> Ford has said unequivocally that they'll never make an all-electric muscle car
What’s the thinking here? Pandering to some market segment? It sounds like they are rearranging the deck chairs on the Titanic.
Edit: I tried looking into the comment. It seems he was referring to Mustangs specifically, which is weird as they do make an electric one (assuming you agree it’s a ‘real’ mustang).
The Mach-E isn't a muscle car. The comment was specifically about the Mustang coupe, which they do not have an electric version of.
Honestly, it's befuddling to me. There are a lot of folks who could get talked into an electric muscle car; the manufacturers just have to know how to sell it. I own a Charger Daytona and literally every car guy I show it to has interest; I genuinely think Dodge just doesn't know how to market and sell it. I'm 100% confident that the right marketing agency could sell 100k of these, but the "it'll never be a Mustang" cohort is far louder than the "wow that thing rips" crowd.
If I take a Ford Focus and call it a Mustang, is it? Arguably, no. Mustangs have a distinctive style, feel, feature set, intended audience. It's a matter of what people expect when they buy the thing.
The Mach-E kind of snuck in. I believe they intended to make more electric Mustang-branded cars, but things changed internally and priorities shifted. Lots of women really like Mustangs, and the Mach-E is positioned to appeal to many of the same people: it makes sense to use it as a kind of Trojan horse to ease folks into EVs with a brand they already like. But if you took a Mach-E, hid the name, and asked folks "Is this a Mustang?", the answer you'd get is "No".
I don't think what the article writes about matters all that much. Gemini 3 Pro is arguably not even the best model anymore, and it's _weeks_ old, and Google has far more resources than Anthropic does. If the hardware actually were the secret sauce, Google would be wiping the floor with everyone else.
But they're not.
There are a few confounding problems:
1. Actually using that hardware effectively isn't easy. It's not as simple as jacking up some constant values and reaping the benefits. Actually using the hardware is hard, and by the time you've optimized for it, you're already working on the next model.
2. This is a problem that, if you're not Google, you can just spend your way out of. A model doesn't take a petabyte of memory to train or run. Regular old H100s still mostly work fine. Faster models are nice, but Gemini 3 Pro having half the latency of Opus 4.5 or GPT-5.1 doesn't add enough value to matter to really anyone.
3. There's still a lot of clever tricks that work as low hanging fruit to improve almost everything about ML models. You can make stuff remarkably good with novel research without building your own chips.
4. A surprising amount of ML model development is boots on the ground work. Doing evals. Curating datasets. Tweaking system prompts. Having your own Dyson sphere doesn't obviate a lot of the typing and staring at a screen that necessarily has to be done to make a model half decent.
5. Fancy bespoke hardware means fancy bespoke failure modes. You can search stack overflow for CUDA problems, you can't just Bing your way to victory when your fancy TPU cluster isn't doing the thing you want it to do.
I think you are addressing the issue from a developer's perspective. I don't think TPUs are going to be sold to individual users anytime soon. What the article is pointing out is that Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space.
For example, OpenAI has announced trillion-dollar investments in data centers to continue scaling. They need to go through a middle-man (Nvidia), while Google does not, and will be able to use their investment much more efficiently to train and serve their own future models.
> Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space
Performance per dollar doesn't "win" anything though. Performance (as in speed) hardly cracks the top five concerns that most folks have when choosing a model provider, because fast, good models already exist at price points that are acceptable. That might mean slightly better margins for Google, but ultimately isn't going to make them "win"
It's not slightly better margins, we are talking about huge cost reductions on the main expense which is compute. In a context where companies are making trillion dollar investments, it matters a lot.
Also, performance and user choice are definitely impacted by compute. If they ever find a way to replace a job with LLMs, those who can throw more compute at it for a lower price point will win.
Google owns 14% of Anthropic, and Anthropic is using Google TPUs, as well as AWS Trainium and of course GPUs. It isn't necessary for one company to create both the winning hardware and the winning software to be part of the solution. In fact, with the close race in software, hardware seems like the better bet.
But price per token isn't even a directly important concern anymore. Anyone with a brain would pay 5x more per token for a model that uses 10x fewer tokens with the same accuracy. I've gone all in on Opus 4.5 because even though it's more expensive, it solves the problems I care about with far fewer tokens.
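Back of the envelope: 5x the per-token price on one tenth the tokens is half the spend per solved task, before you even count the time saved.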
Slightly more seriously: what you say makes sense if and only if you're projecting Sam Altman and assuming that a) real legit superhuman AGI is just around the corner, and b) all the spoils will accrue to the first company that finds it, which means you need to be 100% in on building the next model that will finally unlock AGI.
But if this is not the case -- and it's increasingly looking like it's not -- it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply. And the article is arguing that company will be Google.
I think you are missing the point. They are saying "weeks old" isn't very old.
> it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply.
I don't see how that follows at all. Quality and distribution both matter a lot here.
Google has some advantages but some disadvantages here too.
If you are on AWS GovCloud, Anthropic is right there. Same on Azure, and on Oracle.
I believe Gemini will be available on the Oracle Cloud at some point (it has been announced) but they are still behind in the enterprise distribution race.
OpenAI is only available on Azure, although I believe their new contract lets them strike deals elsewhere.
On the consumer side, OpenAI and Google are well ahead of course.
Last week it looked like Google had won (hence the blog post), but now almost nobody is talking about Antigravity and Gemini 3 anymore, so yeah, what OP says is relevant.
It definitely depends on how you're measuring. But the benchmarks don't put it at the top for many ways of measuring, and my own experience doesn't put it at the top. I'm glad if it works for you, but it's not even a month old and there are lots of folks like me who see it as definitely worse for classes of problems that 3 Pro could be the best at.
Which is to say, if Google was set up to win, it shouldn't even be a question that 3 Pro is the best. It should be obvious. But it's definitely not obvious that it's the best, and many benchmarks don't support it as being the best.
On point 5, I think this is the real moat for CUDA. Does Google have tools to optimize kernels on their TPUs? Do they have tools to optimize successive kernel launches on their TPUs? How easy is it to debug on a TPU (arguably CUDA could use work here, but still...)? Does Google help me fully utilize their TPUs? Can I warm up a model on a TPU, checkpoint it, and launch the checkpoints to save time?
I am fairly pro-Google (they invented the LLM, FFS...) and recognize the advantages (price/token, efficiency, vertical integration, established DCs w/ power allocations) but also know they have a habit of slightly sucking at everything but search.
I've really only found benefit in the return type of functions, where you can say that a parameter satisfies a type (with the runtime return value being a boolean). This lets you use `if (isPerson(foo))`, and TypeScript will narrow the type appropriately inside the conditional.
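A minimal sketch of that pattern (`Person` and the shape check are made up):

```typescript
interface Person {
  name: string;
}

// A user-defined type guard: the `foo is Person` return type tells the
// compiler to narrow `foo` wherever the function returns true.
function isPerson(foo: unknown): foo is Person {
  return typeof foo === "object" && foo !== null && "name" in foo;
}

function greet(foo: unknown) {
  if (isPerson(foo)) {
    // Narrowed to Person in this branch, so .name is safe to access.
    console.log(`Hello, ${foo.name}`);
  }
}
```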