> The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed.
This quote is completely and totally irrelevant. Nobody is saying they should code a new Outlook. If they did code something, it would be significantly smaller in scope and rigorously tested like spacebound programs in the past were. "New space-engineering-grade code created with actual engineering practices" is absolutely going to be more reliable than "old bloated commercial shitware". But I guess software engineering is a lost art, so it can't be helped.
It's also going to take a hell of a lot longer and cost more than buying an Outlook license. If I were lead on that project, you'd have an uphill battle convincing me to spend $100k+ on an email solution unless you could point to specific, serious deficiencies in the existing off-the-shelf solutions.
Software Engineering is far from a lost art: part of the practice is intelligently making cost-benefit decisions.
The current solution is literally causing problems in space. Space-grade engineering is expensive, but having things go wrong on your already very expensive mission is even more expensive.
Sure, but people who didn't know better until this particular incident do not deserve the title "engineer". Being able to classify and manage risks before they happen is engineering 101.
That problem would be much less likely with a minimalist, battle-tested OSS solution whose maintainers and users have decidedly different priorities than those governing something like Outlook or even Thunderbird.
The higher the stakes the more valuable minimalism becomes.
Actually, this could be a case where it's useful. Even if it only catches half the complaints, that's still a lot of data, far more than ordinary telemetry used to collect.
Opus doubled in speed with version 4.5, leading me to speculate that they had promoted a Sonnet-sized model. The new, faster Opus was the same speed as Gemini 3 Flash running on the same TPUs. I think Anthropic's margins are probably the highest in the industry, but they have to chop that up with Google by renting their TPUs.
People used to bet on ships sinking and sailors drowning.
Till they learned better.
Edit:
This was common until Parliament passed the Marine Insurance Act of 1745.
Before that, speculators could take out "wagering policies" on vessels they had no connection to. This created "coffin ships" - unseaworthy vessels sent to sea because the insurance payout for a wreck was worth more than the ship itself. The law introduced "insurable interest," meaning you cannot bet on a disaster unless you stand to lose something if it happens. This removed the incentive for sabotage and murder for profit.
Modern prediction markets are heading toward the same problem. Betting on train delays or bridge collapses without having any stake gives bad actors a reason to cause them. If the cost of sabotage is lower than the payout, the market effectively pays for the disaster to happen.
That was far crazier than I expected going into it... To the point I've seen Hollywood movies with far more believable plots that people would find unrealistic.
I do this too, but then you need some method to handle it, because now you have to read and test and verify multiple work streams. It can become overwhelming. In the past week I had the following problems from parallel agents:
Gemini running a benchmark: everything ran smoothly for an hour, but on verification it had hallucinated the model used for judging, invalidating the whole run.
Another task used Opus and I manually specified the model to use. It still used the wrong model.
This type of hallucination has happened to me at least 4-5 times in the past fortnight using Opus 4.6 and gemini-3.1-pro. GLM-5 does not seem to hallucinate as much.
So if you are not actively monitoring your agent and making the corrections, you need something else that is.
You need a harness, yes, and you need quality gates the agent can't tamper with, ones that just kick the work back with a stern message to fix the problems. Otherwise you're wasting your time reviewing incomplete work.
Your point being? A proper harness will mostly catch things like that. Even a low-end model can be employed to write test plans and do consistency checks that mostly weed out stuff like that. Hence: you need a harness, or you'll spend your time worrying about dumb stuff like this.
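The kick-back loop described above can be sketched in a few lines. This is a minimal illustration, not anyone's actual harness: `run_agent` and `check` are hypothetical callables standing in for your agent invocation and for a verification step (a test suite, a consistency check by a cheap model, etc.) that runs outside the agent's reach.

```python
def run_quality_gate(task, run_agent, check, max_retries=3):
    """Run the agent, verify its work, and kick failures back.

    run_agent(prompt) sends a prompt to the coding agent (hypothetical).
    check() returns (ok, report), where report describes any failures;
    crucially, it runs outside the agent's control, so the agent
    cannot edit or skip the gate itself.
    """
    prompt = task
    for _ in range(max_retries):
        run_agent(prompt)
        ok, report = check()
        if ok:
            return True
        # Kick the work back with a stern message and the evidence.
        prompt = (
            "Your previous attempt failed verification. "
            "Fix ALL of the following problems before resubmitting:\n"
            + report
        )
    return False
```

The point of the structure is that only work that passes `check` ever reaches human review; everything else loops back to the agent automatically.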
Glancing at what it's doing is part of your multitasking rounds.
Also, instead of just prompting, it helps to have the AI first write a quick plan of exactly what it will do, including class names, branch names, file locations, specific tests, etc., before I hit go, since the plan is smaller and quicker to correct than the code.
That takes more wall clock time per agent, but gets better results, so fewer redo steps.
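One way to make that plan-first step mechanical is to have the agent emit the plan as structured data and refuse to "hit go" until it is complete. A minimal sketch, assuming the plan comes back as JSON; the field names here are my own invention, pick whatever your prompt asks for:

```python
import json

# Hypothetical fields mirroring the plan contents mentioned above.
REQUIRED_FIELDS = ["branch", "files", "classes", "tests"]

def review_plan(plan_json):
    """Parse the agent's plan and reject it if anything is missing.

    Raises ValueError listing the missing fields, so you correct the
    small plan instead of the large diff it would have produced.
    """
    plan = json.loads(plan_json)
    missing = [f for f in REQUIRED_FIELDS if not plan.get(f)]
    if missing:
        raise ValueError(f"Plan incomplete, missing: {missing}")
    return plan
```

A complete plan passes through untouched; an incomplete one fails fast, before any code is written.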