Hacker News | westoncb's comments

Taking on a 'slow' software project with the kind of attention to quality (inside and out) that I had pre-AI. It's a tool I'll use myself, LLM-related, but not any kind of radical idea; its main value is in careful UX design/efficiency, engineering quality, and aesthetics.

I've been shooting for the moon with one experimental idea after another (like many others) testing out LLM capabilities as they develop, for at least 2yrs now.

I'm still very excited about how these new tools are changing the nature of software development work, but it's easy to get into this frenetic mode with it, and I think the antidote is along the lines of 'slowing down'.


> but you could give me two black boxes that act the same externally, one written as a single line, single-character variables, etc. etc. etc. and another written to be readable, and I wouldn't care so long as I wasn't expected to maintain it.

The reality of software products is that they are in nearly all cases developed/maintained over time, though--and whenever that's the case, the black box metaphor fails. It's an idealization that only works for single moments in time, and yet software development typically extends through the entire period during which a product has users.

> I read OPs "good code" to mean "highly aesthetic code" (well laid out, good abstractions, good comments, etc. etc.)

The above is also why these properties you've mentioned shouldn't be considered merely aesthetic: the software's likelihood of having tractable bugs, of keeping performance concerns manageable, or of adapting quickly to the demands of its users and the changing ecosystem it's embedded in is affected by matters of abstraction selection, code organization, and documentation.


He also remade Quake a couple of weeks ago (in three.js as well, I believe).


https://mrdoob.com/#/160/threejs_quake

(It's also his homepage now, but I included the full link for posterity.)

--

Edit: How do you actually play? I keep getting trapped in the Shareware Dimension!


Interesting that compaction is done using an encrypted message that "preserves the model's latent understanding of the original conversation":

> Since then, the Responses API has evolved to support a special /responses/compact endpoint that performs compaction more efficiently. It returns a list of items that can be used in place of the previous input to continue the conversation while freeing up the context window. This list includes a special type=compaction item with an opaque encrypted_content item that preserves the model’s latent understanding of the original conversation. Now, Codex automatically uses this endpoint to compact the conversation when the auto_compact_limit is exceeded.
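For anyone wiring this up themselves, the flow is roughly the following (a minimal sketch: the endpoint path and the fact that it returns reusable items are from the quote above; the request/response field names and the model name are my assumptions, not a documented spec):

    import os, requests

    API = "https://api.openai.com/v1"
    HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    MODEL = "gpt-5"  # placeholder model name

    # Conversation items accumulated by the agent loop so far (toy example).
    items = [{"role": "user", "content": "Refactor the parser module."}]

    # 1. Compact the conversation (field names here are assumptions).
    compacted = requests.post(f"{API}/responses/compact", headers=HEADERS,
                              json={"model": MODEL, "input": items}).json()

    # 2. The returned list -- including the opaque type=compaction item that
    #    carries encrypted_content -- replaces the old input on the next turn.
    items = compacted["items"]
    items.append({"role": "user", "content": "Now add tests."})

    turn = requests.post(f"{API}/responses", headers=HEADERS,
                         json={"model": MODEL, "input": items}).json()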


Their compaction endpoint is far and away the best in the industry. Claude's has to be dead last.


Help me understand: how is a compaction endpoint not just a prompt + a json_dump of the message history? I would understand if the prompt were the secret sauce, but you make it sound like there is more to a compaction system than just a clever prompt?


They could be operating in latent space entirely maybe? It seems plausible to me that you can just operate on the embedding of the conversation and treat it as an optimization / compression problem.


Yes, Codex compaction is in the latent space (as confirmed in the article):

> the Responses API has evolved to support a special /responses/compact endpoint [...] it returns an opaque encrypted_content item that preserves the model’s latent understanding of the original conversation


Is this what they mean by "encryption" - as in "no human-readable text"? Or are they actually encrypting the compaction outputs before sending them back to the client? If so, why?


"encrypted_content" is just a poorly worded variable name that indicates the content of that "item" should be treated as an opaque foreign key. No actual encryption (in the cryptographic sense) is involved.


This is not correct; encrypted content is in fact encrypted content. For OpenAI to be able to support ZDR (zero data retention), there needs to be a way for you to store reasoning content client-side without being able to see the actual tokens. The tokens need to stay secret because they often contain reasoning related to safety and instruction following. So OpenAI gives it to you encrypted and keeps the decryption keys on their side, so it can be re-rendered into tokens when given to the model.

There is also another reason: to prevent some attacks related to injecting things into reasoning blocks. Anthropic has published some studies on this. By using encrypted content, OpenAI can rely on it not being modified. OpenAI and Anthropic have started to validate that you're not removing these messages between requests in certain modes like extended thinking, for safety and performance reasons.
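For reference, the round trip looks something like this with the Responses API (a sketch from memory; double-check the parameter names against the current docs, and the model name is a placeholder):

    from openai import OpenAI

    client = OpenAI()

    # With store=False (the ZDR-style mode), ask for reasoning items back in
    # encrypted form so they can be carried client-side between turns.
    resp = client.responses.create(
        model="gpt-5",  # placeholder
        input=[{"role": "user", "content": "Plan the refactor."}],
        store=False,
        include=["reasoning.encrypted_content"],
    )

    # Pass every output item -- including the reasoning item whose
    # encrypted_content only OpenAI can decrypt -- back verbatim on the next turn.
    next_input = list(resp.output) + [{"role": "user", "content": "Do step 1."}]
    followup = client.responses.create(
        model="gpt-5",
        input=next_input,
        store=False,
        include=["reasoning.encrypted_content"],
    )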


Are you sure? For reasoning, encrypted_content is for sure actually encrypted.


Hmmm, no, I don't know this for sure. In my testing, the /compact endpoint seems to work almost too well for large/complex conversations, and it feels like it cannot contain the entire latent space, so I assumed it keeps pointers inside it (a la previous_response_id). On the other hand, OpenAI says it's stateless and compatible with Zero Data Retention, so maybe it can contain everything.


They say they do not compress the user messages, but yeah, its purpose is to do very lossy compression of everything else. I'd expect it to be small.


Ah, that makes more sense. Thanks!


Their models are specifically trained for their tools. For example, the `apply_patch` tool: you would think it's just another file-editing tool, but its unique diff format is trained into their models. It also works better than the generic file-editing tools implemented in other clients. I can also confirm their compaction is best in class. I've implemented my own client using their API, and gpt-5.2 can work for hours and process millions of input tokens very effectively.
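In a hand-rolled client, the simplest way I've found to lean on that training is to expose a function tool with that name and hand the emitted patch text to your own applier. A sketch (the JSON schema and the apply_patch_to_workspace helper are my own, not OpenAI's spec):

    import json
    from openai import OpenAI

    client = OpenAI()

    # Function tool named to match what the model was trained on (schema is a guess).
    APPLY_PATCH_TOOL = {
        "type": "function",
        "name": "apply_patch",
        "description": "Apply a patch in the model's native diff format to the workspace.",
        "parameters": {
            "type": "object",
            "properties": {"input": {"type": "string", "description": "The patch text."}},
            "required": ["input"],
        },
    }

    resp = client.responses.create(
        model="gpt-5",  # placeholder
        input=[{"role": "user", "content": "Rename foo() to bar() in utils.py."}],
        tools=[APPLY_PATCH_TOOL],
    )

    for item in resp.output:
        if item.type == "function_call" and item.name == "apply_patch":
            patch_text = json.loads(item.arguments)["input"]
            apply_patch_to_workspace(patch_text)  # hypothetical helper: parse + write files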


Maybe it's a model fine tuned for compaction?


Yes, agree completely.


Is it possible to use the compaction endpoint independently? I have my own agent loop that I maintain for my domain-specific use case. We built a compaction system, but I imagine this one performs better.


Yes you can and I really like it as a feature. But it ties you to OpenAI…


I would guess you can if you're using their Responses API for inference within your agent.


How does this work for other models that aren’t OpenAI models?


It wouldn’t work for other models if it’s encoded in a latent representation of their own models.



That depends on the content of the SVGs. Of course you can write a script to do a very literal kind of conversion regardless, but in practice a lot of interpretation would be required, and that could be done by an LLM. A simple case is an SVG that's a static presentation of a button; the intended React component could handle hover and click states, change the cursor appropriately, set an aria-label, etc. For anything but trivial cases, a script isn't going to get you far.
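Something like the following is what I have in mind for the LLM route (just a sketch; the prompt, model name, and directory layout are placeholders):

    import pathlib
    from openai import OpenAI

    client = OpenAI()

    PROMPT = (
        "Convert this SVG into a TypeScript React component. Infer intent: if it "
        "looks like a button, add hover/click handling, cursor styling, and an aria-label."
    )

    pathlib.Path("components").mkdir(exist_ok=True)
    for svg in pathlib.Path("assets").glob("*.svg"):
        resp = client.responses.create(
            model="gpt-5",  # placeholder
            input=f"{PROMPT}\n\n{svg.read_text()}",
        )
        (pathlib.Path("components") / f"{svg.stem}.tsx").write_text(resp.output_text)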


That's about how it came across for me as well: ignoring my actual content and joking about generalizations related to key words.

Project is cool overall, love the xkcd-like comic idea—but prompting and/or model-selection could use some work. I'd like to take a crack at tuning it myself :)


It sounds more like you just made an overly simplistic interpretation of their statement, "everything works like I think it should," since it's clear from their post that they recognize the difference between some basic level of "working" and a well-engineered system.

Hopefully you aren't discouraged by this, observationist, pretty clear hansmayer is just taking potshots. Your first paragraph could very well have been written by a professional SWE who understood what level of robustness was required given the constraints of the specific scenario in which the software was being developed.


I've been on a break from coding for about a month but was last working on a new kind of "uncertainty reducing" hierarchical agent management system. I have a writeup of the project here: https://symbolflux.com/working-group-foundations.html


> So, they found an underlying commonality among the post-training structures in 50 LLaMA3-8B models, 177 GPT-2 models, and 8 Flan-T5 models; and, they demonstrated that the commonality could in every case be substituted for those in the original models with no loss of function; and noted that they seem to be the first to discover this.

Could someone clarify what this means in practice? If there is a 'commonality' why would substituting it do anything? Like if there's some subset of weights X found in all these models, how would substituting X with X be useful?

I see how this could be useful in principle (and obviously it's very interesting), but I'm not clear on how it works in practice. Could you e.g. train new models with that weight subset initialized to this universal set? And how 'universal' is it? Just for models of certain sizes and architectures, or is it in some way more durable than that?


It might be worth it to use that subset to initialize the weights of future models, but more importantly, you could save a huge number of computational cycles by using the lower-dimensional weights at inference time.


Ah interesting, I missed that possibility. Digging a little more, though, my understanding is that what's universal is a shared basis in weight space, and particular models of the same architecture can express their specific weights via coefficients in a lower-dimensional subspace using that universal basis (so we get weight compression and simplified param search). But it also sounds like the extent to which there will be gains during inference is still up in the air?

Key point being: the parameters might be picked off a lower-dimensional manifold (in weight space), but this doesn't imply that lower-rank activation-space operators will be found. So the translation to inference time isn't clear.


My understanding differs and I might be wrong. Here's what I inferred:

Let's say you fine-tune a Mistral-7B. Now, there are hundreds of other fine-tuned Mistral-7Bs, which means it's easy to find the universal subspace U of the weights of all these models combined. You can then decompose the weights of your specific model using U and a coefficient matrix C specific to your model. Then you can convert any operation of the form `out = W*h` to `out = U*(C*h)`. Both U and C are of much smaller dimension than W, so the number of matrix operations as well as the memory required is drastically lower.
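Roughly, in numpy terms (the dimensions and the W ≈ U·C factorization here are illustrative assumptions, not numbers from the paper):

    import numpy as np

    d_out, d_in, k = 4096, 4096, 256   # k << d_in: rank of the shared subspace

    # Hypothetical shared basis U (common across fine-tunes) and
    # model-specific coefficients C, so that W is approximated by U @ C.
    U = np.random.randn(d_out, k)      # shared across models
    C = np.random.randn(k, d_in)       # specific to one fine-tuned model
    h = np.random.randn(d_in)          # an activation vector

    out_full     = (U @ C) @ h         # materializes the full d_out x d_in matrix
    out_factored = U @ (C @ h)         # never forms W; two small matmuls instead

    assert np.allclose(out_full, out_factored)

    # Storage: full W has d_out*d_in = ~16.8M entries;
    # the factored form has k*(d_out + d_in) = ~2.1M entries.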


Prior to this paper, no one knew that X existed. If this paper proves sound, we now know that X exists at all.

No matter how large X is, one copy of X baked into the OS / into the silicon / into the GPU / into CUDA, is less than 50+177+8 copies of X baked into every single model. Would that permit future models to be shipped with #include <X.model> as line 1? How much space would that save us? Could X.model be baked into chip silicon so that we can just take it for granted as we would the mathlib constant "PI"? Can we hardware-accelerate the X.model component of these models more than we can a generic model, if X proves to be a 'mathematical' constant?

Given a common X, theoretically, training for models could now start from X rather than from 0. The cost of developing X could be brutal; we've never known to measure it before. Thousands of dollars of GPU per complete training at minimum? Between Google, Meta, Apple, and ChatGPT, the world has probably spent a billion dollars recalculating X a million times. In theory, they probably would have spent another billion dollars over the next year calculating X from scratch. Perhaps now they won't have to?

We don't have a lot of "in practice" experience here yet, because this was first published 4 days ago, and so that's why I'm suggesting possible, plausible, ways this could help us in the future. Perhaps the authors are mistaken, or perhaps I'm mistaken, or perhaps we'll find that the human brain has X in it too. As someone who truly loathes today's "AI", and in an alternate timeline would have completed a dual-major CompSci/NeuralNet degree in ~2004, I'm extremely excited to have read this paper, and to consider what future discoveries and optimizations could result from it.

EDIT:

Imagine if you had to calculate 3.14159 from basic principles every single time you wanted to use pi in your program. Draw a circle to the buffer, measure it, divide it, increase the memory usage of your buffer and resolution of your circle if necessary to get a higher-precision pi. Eventually you want pi to a billion digits, so every time your program starts, you calculate pi from scratch to a billion digits. Then, someday, someone realizes that we've all been independently calculating the exact same mathematical constant! Someone publishes Pi: An Encyclopedia (Volume 1 of ∞). It suddenly becomes inconceivably easier to render cones and spheres in computer graphics! And then someone invents radians, because now we can map 0..360° onto 0..τ; no one predicted radians at all, but they're incredibly obvious in hindsight.

We take for granted knowledge of things like Pi, but there was a time when we did not know it existed at all. And then for a long time it was 3. And then someone realized the underlying commonality of every circle and defined it plainly, and now we have Pi Day, and Tau Day, because not only do we know it exists, but we can argue about it. How cool is that! So if someone has discovered a new 'constant', then that's always a day of celebration in my book, because it means that we're about to see not only things we consider "possible, but difficult" become "so easy that we celebrate their existence with a holiday", but also things that we could never have remotely dreamed of before we knew that X existed at all.

(In less tangible analogies, see also: postfix notation, which was repeatedly invented over decades (by e.g. Dijkstra) as a programming advance, or the movie "Arrival" (2016) as a linguistic advance, or the BLIT Parrot (don't look!) as a biological advance. :)


If what you suggest here is even remotely fact, I see two antipodal trajectories the authors secretly huddled and voted on:

1. As John Napier, who freely, generously, gifted his `Mirifici' for the benefit of all.

2. Here we go, patent trolls, have at it. OpenAI et al. burning the midnight oil to grab as much real estate on this as they can to erase any (even future?) debt stress, deprecating the AGI Philosopher's Stone in favor of first owning everything conceivable from a new miraculous `my precious' ring; not `open', closed.

