Hacker News | nylonstrung's comments

It sounds like a "cursed problem". Are there any contemporary techniques that show any promise?

Great article. I foresee people rediscovering 'Test Driven Development', probably with a new buzzword slapped on it

Reinforcement learning with program feedback

Agentic Reassurance Patterns

I'm not sold on diffusion models.

Other labs like Google have them, but those models have simply trailed the Pareto frontier for the vast majority of use cases

Here's more detail on how price/performance stacks up

https://artificialanalysis.ai/models/mercury-2


I’d push back a bit on the Pareto point.

On speed/quality, diffusion has actually moved the frontier. At comparable quality levels, Mercury is >5× faster than similar AR models (including the ones referenced on the AA page). So for a fixed quality target, you can get meaningfully higher throughput.

That said, I agree diffusion models today don’t yet match the very largest AR systems (Opus, Gemini Pro, etc.) on absolute intelligence. That’s not surprising: we’re starting from smaller models and gradually scaling up. The roadmap is to scale intelligence while preserving the large inference-time advantage.


This understates the possible headroom as technical challenges are addressed - text diffusion is significantly less developed than autoregression with transformers, and Inception are breaking new ground.

Very good point: if as much energy and money as has gone into ChatGPT-style transformer LLMs were put into diffusion, there's a good chance it would outperform in every dimension

I changed my mind: this would be perfect for a fast edit model ala Morph Fast Apply https://www.morphllm.com/products/fastapply

It looks like they are offering this in the form of "Mercury Edit" and I'm keen to try it
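For anyone unfamiliar with the "fast apply" idea: the edit model emits an abbreviated file with markers standing in for unchanged spans, and a fast merge step expands those markers from the original. Here's a toy sketch of that merge; the marker string and the anchor-matching heuristic are my own illustration, not Morph's or Inception's actual algorithm:

```python
# Toy "fast apply" merge sketch. Assumes each marker is followed by an
# "anchor" line that appears verbatim in the original file; real apply
# models handle replacements and fuzzy anchors far more robustly.
MARKER = "# ... existing code ..."

def fast_apply(original: str, edit: str) -> str:
    """Expand MARKER lines in `edit` with the matching spans of `original`."""
    orig = original.splitlines()
    lines = edit.splitlines()
    out, pos, i = [], 0, 0
    while i < len(lines):
        if lines[i].strip() == MARKER:
            if i + 1 < len(lines):
                anchor = lines[i + 1]
                # Copy unchanged original lines up to the anchor.
                while pos < len(orig) and orig[pos] != anchor:
                    out.append(orig[pos])
                    pos += 1
            else:
                # Trailing marker: keep the rest of the original.
                out.extend(orig[pos:])
                pos = len(orig)
        else:
            out.append(lines[i])
            # Keep the original-file cursor in sync on unchanged lines.
            if pos < len(orig) and orig[pos] == lines[i]:
                pos += 1
        i += 1
    return "\n".join(out)
```

The appeal for a diffusion model is that emitting the short edit is cheap, and the mechanical merge is nearly free.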


It's extremely similar to the fake "agentic" crypto plays a year ago

Where Goatseus Maximus and stuff supposedly created coins and invested autonomously.

Obviously it was BS but it fueled a huge amount of attention and speculation


I hate these Lovable-generated slopsites

BAR is incredible, probably the best RTS right now

This is way too dense; you need to distill your thesis and most interesting ideas into a short post if you expect people to spend time reading a 417-page PDF

You're crazy if you think the target demo of "business leaders" and "thought leaders" isn't going to dump it into their favorite LLM first thing and prompt their way to a summary.

So much water and resources being wasted by "thought leaders" posting performative BS on LinkedIn (just count "It is not X, it is Y" style posts).

The "muse vs. writer" framing is a good start, but the real issue is the source of inspiration. An AI prompted on a blank slate will only ever generate a sophisticated average of its training data. The workflow is broken. A better system doesn't start with "What should I write?" but with "What have I learned?" Using AI to synthesize your unique takeaways from high-signal content you've already consumed—a podcast, a talk—is how you scale authenticity, not just words.

I'm the founder of Castifai.com, which is built for this. It systematizes the "muse" by creating a workflow that starts with content you consume (talks, podcasts) and turns your insights into authentic drafts, solving the input problem.


This isn't a content problem; it's a systems problem. The pressure to create without a pipeline for genuine insights leads to these templates. Authentic thought leadership should be a byproduct of a consumption and synthesis workflow, not a forced, separate task. I've been working on solving this - first for myself and then for others - by building a tool for this called Castifai. It's a consumption-first workflow that helps turn insights from content you already consume into authentic posts, so you're sharing what you know, not just filling a quota. (I'm the founder). You can try it at castifai.com

Directionally correct, but it's important to note that the water wasted sustaining the insufferable human is much higher than the water wasted producing the tokens

I'm not the author, I just got sent the link by someone else :)

This wasn't a16z monolithically speaking as a firm, it was Anish Acharya talking on a podcast.

Seems like he's focused on fintech and not involved in many of their LLM investments


It has all the trappings of NIH syndrome.

Reinventing the wheel without explaining why existing tools didn't work

Creating buzzwords ("blueprints", "devboxes") for concepts that are not novel and already have common terms

Yet they embrace MCP of all things as a transport layer: the one part of the common "agentic" stack that genuinely sucks and actually needs to be reinvented


They mention "Why did we build it ourselves" in the part1 series: https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-...

However, it is also light on material. I would like to hear more technical details; they're probably intentionally secretive about it.

I do, however, believe that building an agent highly optimized for your own codebase and process is possible. In fact, I am pretty sure many companies do this; it's just not yet in the ether.

Otherwise, one of the most interesting bits from the article was

> Over 1,300 Stripe pull requests (up from 1,000 as of Part 1) merged each week are completely minion-produced, human-reviewed, but containing no human-written code.


"human reviewed"

"LGTM..."

I feel like code review is already hard and underdone; the 'velocity' here is only going to make that worse.

I am also curious how this works when the new crop of junior devs does not have enough experience to review code, yet is no longer getting that experience from writing it.

Time will tell I guess.


Agents can already do the review by themselves. I'd be surprised if they review all of the code by hand; they probably can't mention it due to the regulatory nature of the field. From what I have seen, agentic review tools are already at the 80th to 90th percentile: out of 10 randomly picked engineers, they will provide more useful comments than most of them.

the problem with LLM code review is that it's good at checking local consistency and minor bugs, but it generally can't tell you if you are solving the wrong problem or if your approach is a bad one for non-technical reasons.

This is an enormous drawback and makes LLM code review more akin to a linter at the moment.


I mean, if the model can reason about making changes in a large-scale repository, then this implies it can also reason about a change somebody else made, no? I kind of agree and disagree with you at the same time, which is why I said most of the engineers, but I believe we are heading towards models being able to completely autonomously write and review their own changes.
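The autonomous write-then-review loop is mechanically simple; the open question is whether the review pass catches anything. A toy sketch, where `llm` is a stand-in callable (prompt -> str), not any real API:

```python
# Toy sketch of a model writing and then reviewing its own change.
# `llm` is a hypothetical prompt->str callable; prompts are made up.
def write_with_self_review(task: str, llm, max_rounds: int = 3) -> str:
    draft = llm(f"Implement: {task}")
    for _ in range(max_rounds):
        review = llm(f"Review this change for bugs:\n{draft}")
        if "LGTM" in review:  # reviewer is satisfied, stop iterating
            return draft
        draft = llm(f"Revise the change per this review:\n{review}\n---\n{draft}")
    return draft  # give up after max_rounds and return the last draft
```

The skeptic's point maps onto this directly: if the same weights produce both `draft` and `review`, the loop mostly catches local bugs, not wrong-problem errors.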

There's a good chance that in the long run LLMs can become good at this, but this would require them e.g. being plugged into the meetings and so on that led to a particular feature request. To be a good software engineer, you need all the inputs that software engineers get.

If you read thoroughly through the Stripe blog, you will see that they already feed their model this or a similar type of information. Being plugged into the meetings might just mean feeding the model the meeting minutes, or letting the model listen to the meeting and transcribe it. Both seem possible even today.

What are the common terms for those? (I have heard "devbox" across multiple companies, and I'm not in the LLM world enough to know the other parts.)

I was an early MCP hater, but one thing I will say for it is that it's useful as a common interface for secure centralization. I can control auth and policy centrally via an MCP gateway in a way that would be much harder if I had to stitch together API proxies, CLIs, etc. to provide capabilities.
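Concretely, the win is having one enforcement point in front of all tools. A toy sketch of that pattern; everything here (Policy, Gateway, the tool names) is hypothetical and uses no real MCP SDK:

```python
# Toy sketch of central auth/policy at a gateway in front of tool backends.
from dataclasses import dataclass, field

@dataclass
class Policy:
    # tool name -> roles allowed to call it
    allowed_roles: dict[str, set[str]] = field(default_factory=dict)

    def check(self, role: str, tool: str) -> bool:
        return role in self.allowed_roles.get(tool, set())

class Gateway:
    def __init__(self, policy: Policy, upstreams: dict):
        self.policy = policy
        self.upstreams = upstreams  # tool name -> handler callable

    def call_tool(self, role: str, tool: str, **args):
        # One enforcement point, instead of policy scattered across
        # per-backend API proxies and CLIs.
        if not self.policy.check(role, tool):
            raise PermissionError(f"role {role!r} may not call {tool!r}")
        return self.upstreams[tool](**args)
```

Because every tool call funnels through `call_tool`, adding audit logging or rate limits later is one change, not N.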

>Reinventing the wheel without explaining why existing tools didn't work

Won't that be the new normal with all these AI agents?

No frameworks, no libraries, just let AI create everything from scratch again


resume driven development

Agree that routing is becoming the critical layer here. vLLM's IRIS is really promising for this: https://blog.vllm.ai/2026/01/05/vllm-sr-iris.html

There's already some good work on router benchmarking which is pretty interesting

