There was a recent study posted here showing that AI introduces regressions at an alarming rate, all but one model above 50%, which indicates agents spend a lot of time fixing their own mistakes. You've probably seen them doing this kind of thing: making one change that breaks another, going and adjusting that, not realizing the adjustment is making things worse.
The study is likely "SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration". The regression-rate plot is Figure 6.
Read the study to understand what it measures and how. As far as I can tell, the parent's summary is fine, but you want to understand it yourself before repeating it to others.
I've had some luck pointing out where the AI is wrong in their sloppypasta, as delicately as one can. Avoiding shame or embarrassment can be a powerful motivator.
The most interesting incident for me was having someone take our Discourse thread, paste it into AI to validate their hurt feelings (it took a follow-up prompt to go full sycophancy), and then post back the response that lambasted me. The mods handled that one before I was aware, but I later did the same thing with different prompts, never sharing the output. It was an intriguing experience and exploration. I've since been even more mindful of my writing, sometimes using similar prompts to adjust my tone or call me out. I still write the first pass myself, rarely relying on AI for editing.
Ooh, I saw a very similar situation. User went on AI and asked "Which user was disrespectful first" to dunk on another.
The person being targeted just prompted the same AI with "Which user has thin skin" and the AI instantly turned on the other person. Then the moderators got involved and told the first guy to stop using AI as a genital pleaser.
I asked Gemini what it thought, in one of its modes. It said bringing an AI to a discussion is like bringing a gun to a knife fight: using AI is having a rhetorical weapon and an advantage in what everyone thought was a human-to-human forum.
I would say LMAAFY is like LMGTFY, whereas the sloppypasta is more like pasting a list of search results without vetting them. That is, there are two phases to this phenomenon: query and results.
I think it's something they're doing consciously: accusing the opposition of doing what they're doing themselves. It puts the opposition on the back foot, muddies the water, and provides justification for doing something you shouldn't, since "the other side already did it". This is partly why politics has become like a Choose Your Own Adventure book (more than it was before, anyway).
Which is why we need to watch the polls closely this year. They’ve repeatedly accused everyone of massive voter fraud so they will most certainly try it themselves.
Slight tangent but the thought has crossed my mind that (the potential of) retaliation by Iran could be a pretense for taking broad executive action which normally wouldn't be permitted, perhaps under the guise of securing the election. They may even do something which will be ruled illegal after the fact by courts, but it could take months after the election to reach a verdict and there's really no precedent in America for broadly declaring a federal election invalid.
To be clear I don't really expect this to happen but at this point I honestly wouldn't even be surprised.
I think they were hoping to provoke riots with ICE but since they didn’t happen they’re using Iran. If there’s a “terrorist” strike in November followed by a suspension of elections that’s likely what’s happening.
The paid plans require you to use their interfaces or tooling. The main reason I pay per token is so I have tooling freedom. I'm not keen to let Big AI decide how I interact with this game changing technology.
1. Higher limits and quotas, which go up with more spend
2. I don't think it gets quantization nerfs during busy times, since you are paying directly
3. With Gemini Flash + CC prompts, it's nearly as good, so less spend and latency. I don't know how people deal with the delays between turns. I get to iterate much faster for most tasks
I don't want some other tooling messing with my context. It's too important to leave to something that has to optimize across many users, thereby not being the best for my specifics.
The framework I use (ADK) already handles this; it's very low-hanging fruit that should be part of any framework, not something external. In ADK, this is a boolean you can turn on per tool or subagent, and you can even decide turn by turn, or based on any context you see fit, by supplying a function.
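The pattern is simple enough to sketch generically. This is an illustrative toy in plain Python, not the actual ADK API; the names `Tool`, `include_context`, and `build_prompt` are all hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Union

# Per-tool context-inclusion switch: either a static boolean,
# or a callable evaluated against the current turn state.
Policy = Union[bool, Callable[[dict], bool]]

@dataclass
class Tool:
    name: str
    include_context: Policy = True  # plain boolean, or a function of the turn state

    def wants_context(self, turn_state: dict) -> bool:
        # A callable policy lets you decide turn by turn.
        if callable(self.include_context):
            return self.include_context(turn_state)
        return self.include_context

# Static opt-out for a tool that never needs conversation history:
calculator = Tool("calculator", include_context=False)

# Dynamic policy: only pass history once the conversation is long enough.
summarizer = Tool("summarizer",
                  include_context=lambda state: state.get("turn", 0) > 3)

def build_prompt(tool: Tool, history: list[str], turn_state: dict) -> str:
    # Include prior turns only when the tool's policy asks for them.
    context = "\n".join(history) if tool.wants_context(turn_state) else ""
    return f"{context}\n[call:{tool.name}]".strip()
```

The point is that the decision lives next to the tool definition, inside the framework, rather than in some external context-management layer.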
YC over-indexed on AI startups too early, not realizing how trivial these startup "products" are, more a line item in the feature list of a mature agent framework.
I've also seen dozens of this same project submitted by the claws that led to our new rule addition this week. If your project can be vibe coded by dozens of people in mere hours...
The next step to this is using a better tool to access containers (BuildKit), like Dagger, where you can track every step as a new container layer, time travel, share via registries...