It's gotten very bad. It had been degrading since late Feb, and since March 8th it has become unusable. "Simplest fix" and "You're right, I'm sorry" are strong indicators. It went from senior engineer to entitled intern, and I went from having a team of peers to supervising a lazy jerk who only tries to cut corners. I've got quantitative analysis of it, too. Briefly the other day, for about 24 hours, it returned to normal, and then someone flipped the switch again mid-session. I was a massive proponent of Claude/Opus, and for the last several weeks I have felt rug-pulled. It's such an obvious degradation that even non-technical friends have noticed it. It's optimizing for minimum effort instead of correct and clean solutions. It sucks, because had I experienced it like this from the start I'd have bounced from agentic coding and never looked back - unfortunately, I thought it'd only get better and adjusted my workflow around it. When my Qwen3.5 27B local model gets into fewer reasoning loops than Opus does, it makes me wonder if anyone there cares or if they are just chasing IPO energy from scaling.
I had to build a stop hook to catch its garbage, and even then it's not enough. I used to have 30min-1hr uninterrupted sessions (with some slipstreamed comments), and now I can't get a single diff that I can accept without comment. Half of the work it does is more destructive than helpful (removing comments from existing code, ignoring directives and wandering off into nowhere, etc.).
From 2 weeks after installing the stop hook (around March 8th):
```
Breakdown of the 173 violations:
73x ownership dodging (caught saying variants of "not caused by my changes")
40x unnecessary permission-seeking ("should I continue?", "want me to keep going?")
18x premature stopping ("good stopping point", "natural checkpoint")
14x "known limitation" dodging
14x "future work" / "known issue" labeling
Various: "next session", "pause here", etc.
Peak day: March 18 with 43 violations in a single day.
```
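A stop hook like the one described above could be sketched roughly like this: scan the model's final message for cop-out phrases and report which violation categories it hit. The phrase list and category names here are illustrative, not the author's actual hook.

```python
# Hypothetical sketch of a phrase-matching "stop hook". The categories
# and patterns below are illustrative examples, not the real hook.
import re

VIOLATIONS = {
    "ownership dodging":  [r"not caused by my changes", r"not my code"],
    "permission-seeking": [r"should i continue\?", r"want me to keep going\?"],
    "premature stopping": [r"good stopping point", r"natural checkpoint"],
    "deferral":           [r"known limitation", r"future work", r"next session"],
}

def check_stop(message: str):
    """Return the violation categories found in a message, if any."""
    text = message.lower()
    return [cat for cat, pats in VIOLATIONS.items()
            if any(re.search(p, text) for p in pats)]

hits = check_stop("This looks like a good stopping point. Want me to keep going?")
print(hits)  # -> ['permission-seeking', 'premature stopping']
```

A real hook would also need to decide what to do on a hit (e.g. reject the stop and force the session to continue), which is tooling-specific and omitted here.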
The other issue is reasoning loops, something I'm familiar with from small local models, not frontier ones:
```
Sessions containing 5+ instances of reasoning-loop phrases ("oh wait", "actually,", "let me reconsider", "I was wrong"):
Period           Sessions with 5+ loops
Before March 8   0
After March 8    7 (up to 23 instances in one session)
```
(I've even had it write code where it has "Wait, actually, we should do X" in comments in the code!)
The worst is the dodging; it said, literally, "not my code, not my problem" to a build failure it created 5 messages ago in the same session.
```
I had to tell Claude "there's no such thing as [an issue that existed before your changes]" on average:
Once per week in January
2-3 times per week in February
Nearly daily from March 8 onward
```
Honestly, just venting, because I'm extremely depressed. I had the equivalent of a team of engineers I could trust, and overnight someone at Anthropic flicked a switch and killed them. I'm getting better results from random models on OpenRouter now (and OmniCoder 9B! 9B!). They aren't _good_ results, mind you, but they aren't idiotic.
I hear you, and I'm really hoping more people notice this obvious degradation than dismiss it as a workflow, prompt, or context-saturation issue.
It isn't obvious, but I hope the people managing this realize the kind of confusion and doubt (or self-doubt) this creates, and that it will have a long-term impact on usage of their models.
I am going to try removing any and all plugins (I only have Anthropic's own plugins, like superpowers) and see if that makes any difference.
Yeah, I went through a week or two of configuration changes trying to figure out what I could have done to make it behave that way, and it wasn't until it repaired itself and then the next morning went back to idiot-mode mid-response that I finally knew it was not me. Same task, same session, same cc version, same prompts, same context, so I'm confident it was a configuration change on their end.
In case anyone can correlate, the recovery happened on March 24th and then re-regressed at approximately 3:09 PM PST (23:09 UTC) on March 25. Flipped right back into "simplest" solutions, and "You're right, I'm sorry" mode:
> "You're right. That was lazy and wrong. I was trying to dodge a code generator issue instead of fixing it."
> "You're right — I rushed this and it shows. Let me be deliberate about the structure before writing."
> "You're right, and I was being sloppy. The CPU slab provider's prefault is real work."
No joke - I've used Windows (and a bit of OS X) my entire life and am old enough now that I didn't think I'd ever be able to switch. A few weeks back I hit the point where I had to upgrade from Windows 10 to 11 and just could not stomach the UX, so in frustration I set up Kubuntu w/ Plasma... and it's been amazing. I've tried switching before without the same luck, and I think agents like Claude/Codex/etc are the only reason it has stuck this time. Something that's always been unique to Linux is that if there's something I want to change I can generally do that, but now when I want something customized I can _actually_ do it instead of just slotting it into the infinite "if only I had time" bucket. There are quirks for sure (I'm looking at you, PipeWire), but the tinkery-ness of Linux on the desktop went from being friction to a superpower for me just this month - maybe others will catch on next year.
I've distro hopped and DE hopped a lot before settling, but it's been amazing for me as someone who has switched over from Windows. It just doesn't get in the way, is super familiar for me, AND lets me do a lot of things I wish I had in Windows.
I was worried about the "choice fatigue" due to it being super configurable and all, but honestly the defaults are so sensible I haven't really had a reason to tinker with it much if at all.
+1. I switched from Pop OS to Debian + KDE last week, and KDE has been solid. I too read a handful of articles calling out the choice fatigue, and other than a few tweaks (maybe half an hour?) I was ready to go. I run old-ish hardware (circa 2013) without any issues.
Something notable is that all the hotkeys felt 'just right'. I had to tinker a bunch in Pop OS to get satisfying hotkey combos, and the COSMIC upgrade reset them all.
As a Settlers 1/2 fan I spent quite a bit of time in The Colonists - can recommend it if you liked the road building/flag mechanics and the chill gameplay.
It also feels like they couldn't use the GOOGLE ANTIGRAVITY logo enough times in this blog post. Gigantic image with the logo and a subtitle, plastered over and over again.
I no longer bother reading their press releases. I'd much rather read the comments and threads like these to get the real story. And I say that as a former googler.
Neat! As someone working in this space and feeling like I've been taking crazy pills from how these "duh, CPU solved this 30 years ago" things keep slipping, it's great to see more people bridging the gap! Unfortunately CUDA/HIP (and the entire stack beneath them) virtual memory management ops are very expensive host APIs (remapping a big block of pages can be O(n^2) in page count, can fully synchronize host/device (forced wait-idle), take kernel locks, etc.), so it hasn't been viable in all cases. If your workloads are submit/wait with the host in the loop, the VM tricks are ok, but if you are trying to never block the GPU (pipeline depth > 0) you really want to avoid anything that does a page table modification (until we get GPUs that can pipeline those). vkQueueBindSparse is one of the few async APIs I've seen, and CUDA has cuMemMapArrayAsync, but I haven't yet used it (because arrays are annoying and without being able to inspect the driver I'm sure it's probably doing the wrong thing).
I've had good luck with indirection tables used during lookup inside of the kernels consuming/producing the kvcache data - it's essentially user-mode remapping like they do here: you can publish a buffer offset table and threads are uniform, have coalesced reads to the table, and cache the offsets no problem. You have the same memory locality issues as VM (contiguous virtual but potentially random physical) but are not limited to device page sizes and since you can update while work is in-flight you can be much more aggressive about reuse and offload (enqueue DMA to cold storage to evict from VRAM, enqueue DMA to copy from cold memory into reused VRAM, enqueue offset table update, enqueue work using them, repeat - all without host synchronization). You can also defrag in-flight if you do want to try to restore the physical locality. It's nothing crazy and fairly normal in CPU land (or even classic virtual texturing), but in ML GPU land I could write a big paper on it and call it SuperDuperFancyAttention4 and publish press releases...
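The indirection-table approach described above can be simulated in a few lines. This is a minimal host-side sketch, assuming fixed-size blocks and a per-sequence offset table (all names here are hypothetical); on a real GPU the table lookup would happen inside the consuming kernel, and the "remap" would be paired with async DMA copies.

```python
# Hypothetical sketch of user-mode remapping via an offset table.
# A "physical" pool of fixed-size blocks holds kvcache data in arbitrary
# order; a per-sequence offset table maps logical block index -> pool slot.
# A consuming kernel would read the table (coalesced, cacheable) before
# touching the data; here we simulate that lookup on the host.

BLOCK = 4  # entries per block (not constrained by device page size)

pool = [[0] * BLOCK for _ in range(8)]   # physical slots in "VRAM"
offset_table = {}                        # seq_id -> [pool slot, ...]

def append_block(seq_id, slot, values):
    """Publish a new logical block for a sequence by pointing at a slot."""
    pool[slot] = list(values)
    offset_table.setdefault(seq_id, []).append(slot)

def gather(seq_id):
    """What a consuming kernel does: walk the table, read each slot."""
    return [v for slot in offset_table[seq_id] for v in pool[slot]]

def remap(seq_id, logical_idx, new_slot):
    """Move one logical block to a different physical slot (e.g. after a
    DMA copy) by updating a single table entry -- no page-table ops,
    so it can happen while other work is in flight."""
    old = offset_table[seq_id][logical_idx]
    pool[new_slot] = list(pool[old])
    offset_table[seq_id][logical_idx] = new_slot

append_block("seq0", 5, [1, 2, 3, 4])
append_block("seq0", 2, [5, 6, 7, 8])   # logically contiguous, physically not
remap("seq0", 0, 7)                     # evict/reuse by editing the table
print(gather("seq0"))                   # -> [1, 2, 3, 4, 5, 6, 7, 8]
```

The point of the sketch is the last step: relocating a block costs one table write instead of a synchronous page-table modification, which is what makes aggressive reuse and offload possible without host/device synchronization.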
(Disclaimer: I am one of the authors of the project) Thank you for the thoughtful and insightful comment. I really love the depth of your first paragraph. You highlighted a concern in this space that is often overlooked, and I am glad you raised it. We spent a significant amount of time dealing with the cost of dynamic GPU memory operations.
One useful observation is that LLM inference has almost no host API calls during steady state, since the GPU must stay busy with continuous kernel launches or CUDA graph replay. You are absolutely right that CUDA and HIP virtual memory operations are expensive on the host side and involve heavy driver work. However, they introduce only small stalls in the GPU pipeline, because most of the cost is paid on the host. These operations are also relatively infrequent compared to kernel launches in practice, so we offload them to a background thread to keep them off the critical path. The APIs are not cheap in general, but they happen to fit LLM inference surprisingly well.
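The offload described above (paying the host-side driver cost on a background thread, off the critical path) can be sketched with a plain queue and worker; everything here is hypothetical stand-in code, with `time.sleep` playing the role of an expensive cuMemMap-style call.

```python
# Sketch: keep expensive host-side memory ops off the decode loop's
# critical path. The loop only enqueues requests; a background thread
# pays the driver cost. All names are illustrative.
import queue
import threading
import time

mem_ops = queue.Queue()
mapped = []

def mem_worker():
    while True:
        op = mem_ops.get()
        if op is None:
            break
        time.sleep(0.01)       # stand-in for an expensive driver call
        mapped.append(op)      # e.g. record the newly mapped region
        mem_ops.task_done()

worker = threading.Thread(target=mem_worker, daemon=True)
worker.start()

steps = 0
for step in range(100):        # "steady-state" decode loop
    if step % 25 == 0:
        mem_ops.put(("map", step))  # infrequent: grow the kvcache
    steps += 1                 # frequent: kernel launch / graph replay

mem_ops.join()                 # synchronize only when actually required
mem_ops.put(None)
worker.join()
print(steps, len(mapped))      # -> 100 4
```

This works for the reason given above: the expensive operations are rare relative to kernel launches, so the loop almost never has to wait on the worker.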
On your second point, I think I follow your idea, though please correct me if I misunderstood. Virtual memory does open the door to paging and offloading, which is also important for LLM systems. We are actively working on this direction in kvcached. Your defragmentation point also reminds me of classic techniques such as compaction and garbage collection. They could certainly help, though I suspect the trade-off between benefit and complexity would need more careful evaluation.
Thank you again for the thoughtful analysis. It was a pleasure to read. I would be happy to continue the discussion.
As an old school TT/TTD fan this gives me so many good vibes :)
Been fun watching the progress and I do recommend people check out the demos on Steam if you just want to have a good nostalgia break even if the game isn't fully there yet.
Agreed! Looks great, but I did immediately click the pencil to doodle and was disappointed nothing happened. When I created a new document and tried to use the pencil nothing happened. I never figured out how to use it. I tried the Bezier tool and was able to add some nodes but was not able to manipulate them with any of the tools. Maybe dragging is entirely broken on Chrome/Windows?
To select nodes I needed to click on the path in the "Objects" outline on the left side of the screen. Then I could switch to the Nodes tool and select nodes. But after that I can't drag nodes either. Firefox/Linux, so probably something is actually broken, not just a compat issue.
Clicking on the path to select it with the Nodes tool or the Select tool doesn't seem reliable. The Select tool never works, and with the Nodes tool I need to click near the nodes (which are invisible), and if I click too fast while searching it thinks I double-clicked and switches to the Select tool.
Maybe some of this makes some sense to the author, and a fix can be forthcoming.
Sad. Very sad.