The pendulum is swinging back slightly, but I wouldn’t pronounce it dead just yet.
We are seeing a decline of American hegemony, accelerated by this current regime. And the ascendancy of a non-democratic superpower.
However, the largest chunk of GDP and growth still sits firmly in democratic countries, and very consequential American elections are happening this year and in 2028.
There's no way to stop them federally without a full coup, since they are administered by the states. The US has a long history of not cancelling elections but suppressing votes (e.g. literacy tests, gerrymandering, closing polling locations, etc.).
I would look more for voting place shenanigans, voter ID laws with only a weird subset of IDs allowed, radical gerrymandering, and stuff like that. Some of it will be blatantly partisan but also people are using justifications like "restoring trust in elections" to advocate for things that reduce the general franchise. They don't need to do a lot since a few percent is enough to swing the general balance of things.
Hungary isn't the only illiberal democracy within the EU - France, Italy, Slovakia, Romania, Poland, Cyprus, Malta, Slovenia, Latvia, Belgium, Lithuania, Croatia, and Bulgaria are all either Illiberal/Flawed Democracies or Hybrid Regimes according to the EIU ranking [0].
Now that Babis is back in power with the backing of SPD and AUTO, Czechia will also revert to being an Illiberal/Flawed Democracy.
Furthermore, all states on the cusp of EU membership (Albania, Montenegro) are also Illiberal/Flawed Democracies.
> largest chunk of GDP and growth still sits firmly in democratic countries
The only Full Democracies in the 10 largest GDPs are Germany, Japan, and the UK. Japan under Takaichi Sanae is pro-Trump, and Germany is likely to see the AfD break its cordon sanitaire by 2029.
Functional doesn't mean "more democratic". What matters is institutions, jurisprudence, and norms.
And having been through the experience of opening a large foreign office in Czechia, there absolutely is a democratic deficit (sure, it's extremely efficient, but only because we kept a handful of decisionmakers close and could "phone a (now deceased) friend" in a non-democratic manner).
The index you just cited is calculated out of five sub-numbers, one of which is literally "functioning of government", and Czechia for some reason gets a rather low 6.4 on it, less than Greece.
First, this is not my experience, and second, much like you I don't think that this is particularly relevant to the democratic character of the country.
I also would like to hear more about the democratic deficit you describe. Most problems around opening anything are caused by bureaucracy, which is obliged to follow norms produced by the lawmakers. Some of these norms are stupid, but that does not mean that they are undemocratic. Voters have the right to be stupid and to elect stupid representatives who produce stupid norms.
The crux of "democratic character" is providing as even a playing field as possible institutionally, organizationally, and politically. If functioning is subpar, or requires "hacks" or misaligned institutions, that undermines democratic character itself.
Chest-thumping while ignoring the real degradation of institutions in a large portion of Europe is only going to put you back in the same position as the US.
> I also would like to hear more about the democratic deficit you describe
I'd rather not, given the incumbent in power and how small the Cybersecurity FDI community in Czechia is. Maybe Vsquare, just not you.
I'd expect a degradation to start in Civil Liberties scores with ANO's plan to abolish the license fee, merge CT and CRo, and then move to a fully state-funded operating model for the NewCo.
I also expect the political culture score to start steadily dropping as SPD and AUTO's competition to "own" the far-right leads to the intensification of culture war discourse, and potentially forces ANO to start opportunistically shifting right as well.
I don't expect "functioning of government" scores to shift significantly either, as the same issues that persisted when I helped my former employer enter Czechia still remain.
Our PortCos will continue to remain in CZ because once you build that network it makes everything so much easier (and because Israeli founders and operators continue to have a soft spot for CZ), but the manner in which we need to operate in Czechia and maintain closeness with the right people isn't that different from emerging markets.
And that I feel is the crux of the issue in Czechia and much of the CEE - once you know the right 20-30 people or their friends or colleagues, you get the red carpet. Otherwise, it's an uneven playing field.
The model that is being discussed for the public broadcasters is that they will be financed by a certain fixed percentage of the country's GDP, and I don't think that there will be any merging of CT and CRo; there is no agreement on that in the coalition.
"intensification of culture war discourse" Compared to what? There isn't much space left to increase the heat.
"potentially forces ANO to start opportunistically shifting right as well."
ANO is a pensioners' party and, given our fertility rate, this is their goldmine. They don't really have to expand their electorate; it expands on its own.
"once you know the right 20-30 people or their friends or colleagues"
Isn't that why people fight to get into Ivy League universities or Ecole Normale Superieure? I am not sure if there is any single nation on Earth where personal connections are unimportant.
> Isn't that why people fight to get into Ivy League universities or Ecole Normale Superieure
Going to Harvard or Yale doesn't mean I have the ability to call a couple people who can pressure someone at the SEC to speed up the review of an S-1 or can pressure a city council to re-zone agricultural land to residential land to build a housing complex, or (using your earlier Eton example) find a SpAd who can put pressure at the SFO to get them off my back.
And more critically, if I find someone to do that, then my competitor will find out and take me to court, and 2-3 years are burnt in negotiating a settlement.
On the other hand, if someone even finds out that I do something like that in CZ, they have no choice but to roll with it because otherwise they will be frozen out from dealflow or ignored when asking for a favor.
And this is why institutions matter, and why degradation of institutions is worrisome: it increases the risk profile of opportunities and incentivizes zero-sum thinking.
> The model that is being discussed for the public broadcasters is that they will be financed by a certain fixed percentage of the country's GDP
Yet the power of the purse for state media will be removed from the media itself and given to the state, thus reducing CT and CRo's independence. This disincentivizes the publication of politically controversial statements.
-----
Just because the US is seeing degradation of institutions does not mean much of Europe is not facing similar problems.
There are hundreds of elderly people in prison right now in the UK, charged with supporting terrorism because they opposed a racism-inspired, Nazi-style genocide. Greta was among them.
Zero have ever threatened or supported any kind of violence against any person, ever.
Social media posts on this topic are treated the same way as holding up a poster in public.
European countries' leaderships were each put in place by their responsible CIA compartments supporting liberal candidates/parties and undermining the competition.
With the current conservative US admin they are supposed to interact with, they don't know what to do and will likely do nothing.
The CIA is fanatical about following the State Department's foreign policy. Aside from gathering intelligence, they just take the State Department's lead.
A lot of "CIA influence" isn't the CIA at all, but the US Government, usually State or DoD, projecting soft power.
I know this sounds pedantic. But whenever someone starts talking about the CIA like it's responsible for "supporting liberal candidates", all seriousness leaves the room.
> CIA is fanatical about following the State Department's foreign policy
From past personal experience, inter-service autonomy over policymaking is tightly guarded, and arguments always end up with the NSA (advisor, not the agency) where the president essentially becomes the tiebreaker.
Under the current administration, this rivalry has gotten much more intense due to the relatively hands-off management style that has been adopted.
I'm sure fights happen all the time over inter-service autonomy. There was a book written recently about very nasty fighting between the CIA and DEA over whether to support a group of anti-communist guerillas who financed themselves by running drugs.
The CIA and DEA switched positions repeatedly: one day the CIA wanted to support them to fight communism, and the DEA wanted to cut them off to stop the supply of drugs. When communism fell, the CIA saw the group as a liability who knew too much, while the DEA wanted to pay them to destroy their drug labs and plant licit crops.
The group ended up destroying their drug labs, and focusing on money laundering, ransomware, and crypto-scams, which neither the CIA nor DEA cared about.
But the CIA is very consistent in following State Department policies. They jealously guard their ability to deliver intelligence that conflicts with State Department priorities, but they don't have any strong priorities that conflict with those of State.
I'm sure things need to be ironed out by the NSA/NSC. That's normal. But the CIA isn't going to fight the State Department like they fight the DEA.
I'm open to correction on this. Maybe I'm just not understanding the situation.
> I'm open to correction on this. Maybe I'm just not understanding the situation
It's much more gray simply because there are multiple agencies per department that can interpret and conduct intelligence operations.
The current administration also decided to adopt the private sector practice of letting "middle managers" conduct and implement what they want on their own and only disturb "upper management" if there are irreconcilable differences.
This is why policies change on a dime in the current administration.
Where would you say the "CIA influence" is the strongest, so I could see better what you mean?
I've observed that it's the messy process of democracy that has put the people in power. Sure, big countries (i.e., mostly Russia) would like to tilt governments their way, but it isn't succeeding. I can tell you, though, that local Facebook pages for newspapers are full of strange comments, seemingly from Russian trolls (but I have no proof).
Agreed... also fwiw I don't think that language-dependent games are as much of a barrier as they used to be. I've built a game recently that I easily localized, first with real-time AI translations and then later with more static language translations, roughly the pattern sketched below.
Anyway, I think this would be an amazing thing to let other people contribute to, as there is an entire industry of hypercasual games which could easily be ported to this, minus the annoying ads.
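The localization pattern, roughly: ship static translations where they exist and fall back to a real-time AI call for anything missing. A minimal Python sketch; translate_with_ai is a placeholder for whatever translation API you'd actually use, and the strings are made up:

    # Static translations shipped with the game, filled in over time.
    STATIC = {
        ("de", "Draw a card"): "Ziehe eine Karte",
    }

    def translate_with_ai(lang: str, text: str) -> str:
        # Stand-in for a real-time model call; any translation API slots in here.
        return f"[{lang} via AI] {text}"

    def localize(lang: str, text: str) -> str:
        # Prefer the vetted static string; fall back to a live AI translation.
        return STATIC.get((lang, text)) or translate_with_ai(lang, text)

    print(localize("de", "Draw a card"))    # static hit
    print(localize("de", "Skip your turn"))  # falls back to the AI path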
I think the issue with language-dependent games is not just knowing the correct translation - as OP points out, it's more about being funny or clever on the spot, which usually requires a certain level of understanding of the nuances of the language.
Exactly this! Translating the games themselves is not a big deal, as that can be automated (although the quality of LLM translations is not always the best), but when it comes to user-generated responses given in a quick timeframe, that's when non-native English players struggle the most, at least in our own friend groups.
In this case I think it came from the very top down: Benioff has been very bullish on AI, and they've pretty much re-branded behind their Agentforce offerings.
Also probably a part of their go-to-market strategy. If they can prove it internally they can sell it externally.
I'm trying to. I don't know how you know it's working. Maybe; sometimes I do feel present, like I am in this building, this town, right now.
It is funny, I bought an old phone of mine from the 2010s. I had a different mindset back then (trying to make a shit ton of money through ads on a website). That did not happen, but I had this ambition and tried to make a lot of dumb apps. I'm trying to get back to that mental state, as now I can make like anything; back then I didn't even know how to generate a CSR, like come on, you amateur!
I use the phone as a grounding tool for meditation, and to try to go back in time to what I was thinking back then. I also loaded it with old cloud photos from that time. It doesn't have internet.
Oh yeah, what does work for grounding you to reality is when you lose internet. Then you're grounded in reality, bored. What do I do with myself now?
after a while it should feel like a refreshing nap. during the meditation itself, you're just doing a simple task and going along with it without resistance, like when sleeping. eventually the idea of "non-doing" will make more sense.
another way to look at it: upon waking each morning, you start with an empty glass. from this point, everything that enters your realm of awareness accumulates in this glass and at some point it will start overflowing if you don't manage what you're accumulating. meaning you can only effectively work with a certain amount of "stuff on your mind". So you shouldn't make a habit of carrying stress from the morning commute all day into affecting your afternoon meetings, for example.
take a few deep breaths and let the morning commute pass, and your glass is empty again. allow the glass to fill up with morning work, noticing and managing points of friction so they don't linger more than necessary. if you notice yourself getting overwhelmed or stressed about everything that comes up, you're overflowing and would likely benefit from some meditation. as you meditate more, it becomes more effortless so you won't be reliant on "doing meditation" as much.
the Plum Village app has a meditation bell that rings on a schedule (default is every 15 minutes). they recommend you take a few deep breaths to re-center and state your intention(s) for the moment. I started using it earlier this year and it has a noticeable effect over time, would highly recommend trying it out all day if possible. or at least during times where you're trying to do focused work but have a tendency to get distracted.
Thanks, I like that waking up idea, it is nice being fresh/blank.
It's crazy how our lives just run on autopilot (following some schedule, scheduled to pay bills, if I do this and that I'm good). The meditation/mindfulness will be good for getting grounded/being in the moment. The worries too, trying to stop those.
Honest to God, the best meditation for me is Focus 1 from The Gateway Tapes. I'm not sure the woo of astral projection and energy control is for me at all. But when it comes to blanking my mind and existing purely in the exact right now, that does it for me.
> Have you ever tried meditation? Does a great job scratching that ‘boredom’ itch…
No, but I am considering getting a working Amiga, a CRT and just writing some games for it.
All I had growing up was a C64, and I remember how peaceful I felt when I was designing and writing my (simple) games for it. I hankered all through my childhood for an Amiga; any Amiga.
TBH, I might even settle for a C128; just the thrill of writing software with some paper manuals next to me, no internet and no distractions.
Exactly this... I think there will be a golden age of Excel-replacement SaaS solutions with highly customized UX and workflows for vertical use cases. But, at the same time, a lot more competition. Regardless, it will be great for users/companies with these specific problems.
In my experience, the golden age of indie software is about to begin. LLMs and coding agents will make building vertical and niche software much more cost effective.
In the last 3 months, I’ve built and launched a SaaS app to help my sister manage her florist business, and already have other paying customers. Without LLMs, this would have never been feasible because of dev time and/or costs.
> Without LLMs, this would have never been feasible because of dev time and/or costs.
This implies that the ultimate payoff will be quite small, doesn't it? I would think that a "golden age" requires gold, so to speak. A lucrative software business should eventually return profits after costs in the long run.
To me, it doesn't sound like a golden age if the idea is just to break even on development.
Are we just talking about a hobby here, or about becoming a professional indie software developer? Those are two vastly different outcomes. If you can't quit your day job, I wouldn't call it a golden age.
In another comment you said, "it will be great for users / companies with these specific problems." https://news.ycombinator.com/item?id=46360019 But this seems to be changing the subject. The article author is a software developer trying to make a living. A golden age for florists, for example, is not necessarily a golden age for indie software developers.
I agree. As seen in other comments as well, it’s an engineer’s instinct to believe that producing more creates more value. In reality, value is determined by scarcity and usefulness, not output alone.
This. Companies are champing at the bit about developer productivity and how they can do 10x more. What is not clear is, even if they can fire 90% of their engineers (assuming the 10x productivity gain is real), how they expect that even a tiny sliver of that 90% cannot replicate the products - with AI. And if we are in such a world, how are those companies' valuations justified any more?
Yeah exactly. It is basically a commodity at this point - and in commodities margins are like 3% and there is nothing you can compete on except price - which becomes a race to the bottom. And there is no booming industry where this is the case.
But you can still compete on price or provide proper localization. In your link they share that they are based in the UK and available in 7 countries. Something that took half a year and a few devs can now be done in 1 month by one indie living in a cheaper country who charges 1/5th and is still happy about it.
Yup, I echo this sentiment. We're about to flourish.
It's never been cheaper and easier to build real value. It's also never been cheaper and easier to build real crap - but the indie devs who care will build more value with higher velocity and independence. And good indie development will carry with it an air of quality that the larger crap will struggle to compete with (at the edges). Not that they'll care, because the big players will be making more money off the entrenched behemoths.
But as an indie dev, your incentive structures are far different and far more manageable.
Betteridge's law applies here: if the author truly believed the thesis, they would have declared it as a statement rather than a question.
There's such an opportunity for people to actually explore ideas whose prototyping cost, in both time and money, would have been too high to be worth it earlier.
And even outside that perspective, there's a lot of broken corpo software now. The indie hackers are fighting back. See Helium by imputnet, for example. Ghostty by the revered Mitchell Hashimoto is another example of something I use daily that is relatively indie.
Corpo-slop seems to be enshittifying at an exponential rate due to decision paralysis and general management talent decay.
As the models have progressively improved (able to handle more complex code bases, longer files, etc.), I've started using this simple framework on repeat, which seems to work pretty well at one-shotting complex fixes or new features.
[Research] Ask the agent to explain current functionality as a way to load the right files into context.
[Plan] Ask the agent to brainstorm the best-practices way to implement a new feature or refactor. "Brainstorm" seems to be a keyword that triggers a better questioning loop for the agent. Ask it to write a detailed implementation plan to an md file.
[Clear] Completely clear the context of the agent: better results than just compacting the conversation.
[Execute plan] Ask the agent to review the specific plan again; sometimes it will ask additional questions, which repeats the planning phase. This loads only the plan into context; then have it implement the plan.
[Review & test] Clear the context again and ask it to review the plan to make sure everything was implemented. This is where I add any unit or integration tests if needed. Also run test suites, type checks, lint, etc.
With this loop I’ve often had it run for 20-30 minutes straight and end up with usable results. It’s become a game of context management and creating a solid testing feedback loop instead of trying to purely one-shot issues.
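For what it's worth, the phases are mechanical enough to script. A rough sketch in Python, assuming a hypothetical `agent` CLI where every invocation starts from fresh context (which stands in for the [Clear] steps); the real commands depend on your tool (claude-code, codex, etc.), and the notes/*.md paths and the feature named here are made up:

    import subprocess

    def run_phase(prompt: str) -> str:
        # Hypothetical CLI; a fresh process = a fresh context window.
        result = subprocess.run(["agent", "--print", prompt],
                                capture_output=True, text=True, check=True)
        return result.stdout

    # [Research] load the right files into context and persist the findings.
    run_phase("Explain how the billing module currently works. "
              "Write your findings to notes/research.md.")

    # [Plan] brainstorm, then persist the plan so it survives the context reset.
    run_phase("Read notes/research.md. Brainstorm the best-practices way to add "
              "proration support, then write a detailed plan to notes/plan.md.")

    # [Execute plan] fresh context: only the plan file comes back in.
    run_phase("Read notes/plan.md and implement it step by step.")

    # [Review & test] fresh context again; verify the work against the plan.
    run_phase("Read notes/plan.md and review the changes: was everything "
              "implemented? Run the test suite, type checks, and lint.")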
As of Dec 2025, Sonnet/Opus and GPT Codex are both trained for this, and most good agent tools (i.e. opencode, claude-code, codex) have prompts to fire off subagents during an exploration (use the word "explore"), so you should be able to do the Research step without the extra steps of writing plans and resetting context. I'd save that expense unless you need some huge multi-step verifiable plan implemented.
The biggest gotcha I found is that these LLMs love to assume that code is C/Python, just written in your favorite language of choice. Instead of considering that something should be encapsulated into an object to maintain state, it will instead write 5 functions, passing the state as parameters between each function. It will also consistently ignore most of the code around it, even if it could benefit from reading it to know what specifically could be reused. So you end up with copy-pasta code, and unstructured copy-pasta at best.
The other gotcha is that Claude usually ignores CLAUDE.md. So for me, I first prompt it to read that, and then I prompt it to explore. Then, with those two rules, it usually does a good job following my request to fix, or add a new feature, or whatever, all within a single context. These recent agents do a much better job of throwing away useless context.
I do think the older models and agents get better results when writing things to a plan document, but I've noticed recent Opus and Sonnet usually end up just writing the same code to the plan document anyway. That usually ends up confusing the model, because it can't connect the plan to the code around the changes as easily.
>Instead of considering that something should be written encapsulated into an object to maintain state, it will instead write 5 functions, passing the state as parameters between each function.
Sounds very functional, testable, and clean. Sign me up.
I know this is tongue in cheek, but writing functional code in an object oriented language, or even worse just taking a giant procedural trail of tears and spreading it across a few files like a roomba through a pile of dog doo is ... well.. a code smell at best.
I have a user prompt saved, called "clean code", that makes a pass through the changes to remove unused code, DRY things up, and refactor - literally the high points of Uncle Bob's Clean Code. It works shockingly well at taking AI code and making it somewhat maintainable.
>I know this is tongue in cheek, but writing functional code in an object oriented language, or even worse just taking a giant procedural trail of tears and spreading it across a few files like a roomba through a pile of dog doo is ... well.. a code smell at best.
After forcing myself over years to apply various OOP principles using multiple languages, I believe OOP has truly been the worst thing to happen to me personally as an engineer. Now I believe what you actually see is just an "aesthetics" issue; moreover, it's purely learned aesthetics.
> As of Dec 2025, Sonnet/Opus and GPT Codex are both trained for this, and most good agent tools (i.e. opencode, claude-code, codex) have prompts to fire off subagents during an exploration (use the word "explore"), so you should be able to do the Research step without the extra steps of writing plans and resetting context. I'd save that expense unless you need some huge multi-step verifiable plan implemented.
Does the UI show clearly what portion was done by a subagent?
The UI (terminal) in Claude Code will tell you if it has launched a subagent to research a particular file or problem. But it will not be highlighted for you, simply shown in its record of prompts and actions.
Nothing will really work when the models fail at the most basic of reasoning challenges.
I've had models do the complete opposite of what I've put in the plan and guidelines. I've had them go re-read the exact sentences, and still see them come to the opposite conclusion, and my instructions are nothing complex at all.
I used to think one could build a workflow and process around LLMs that extract good value from them consistently, but I'm now not so sure.
I notice that sometimes the model will be in a good state, and do a long chain of edits of good quality. The problem is, it's still a crap-shoot how to get them into a good state.
In my experience this was an issue 6-8 months ago. Ever since Sonnet 4 I haven’t had any issues with instruction following.
Biggest step-change has been being able to one-shot file refactors (using the planning framework I mentioned above). 6 months ago refactoring was a very delicate dance and now it feels like it’s pretty much streamlined.
I recently ran into two baffling, GPT-3.5-era, completely backwards misinterpretations of an unambiguous sentence, once each in Codex and CC/Sonnet, a few days apart, in completely different scenarios (both very early in the context window). And to be fair, they were notable partially as an "exception that proves the rule", where it was surprising to see, but OP's example can definitely still happen in my experience.
I was prepared to go back to my original message and spot an obvious-in-hindsight grey area/phrasing issue on my part as the root cause but there was nothing in the request itself that was unclear or problematic, nor was it buried deep within a laundry list of individual requests in a single message. Of course, the CLI agents did all sorts of scanning through the codebase/self debate/etc in between the request and the first code output. I'm used to how modern models/agents get tripped up by now so this was an unusually clear cut failure to encounter from the latest large commercial reasoning models.
In both instances, literally just restating the exact same request with "No, the request was: [original wording]" was all it took to steer them back and didn't become a concerning pattern. But with the unpredictability of how the CLI agents decide to traverse a repo and ingest large amounts of distracting code/docs it seems much too over confident to believe that random, bizarre LLM "reasoning" failures won't still occur from time to time in regular usage even as models improve given their inherent limitations.
(If I were bending over backwards to be charitable/anthropomorphize, it would be the human failure mode of "I understood exactly what I was asked for and what I needed to do, but then somehow did the exact opposite, haha oops brain fart!" but personally I'm not willing to extend that much forgiveness/tolerance to a failure from a commercial tool I pay for...)
It's complicated. Firstly, don't love that this happens. But the fact you're not willing to provide tolerance to a commercial tool that costs maybe a few hundred bucks a month but are willing to do so for a human who probably costs thousands of bucks a month is revealing of a double standard we're all navigating.
It's like the fallout when a Waymo kills a "beloved neighborhood cat". I'm not against cats, and I'm deeply saddened at the loss of any life, but if it's true that, (comparable) mile for mile, Waymos reduce deaths and injuries, that is a good thing - even if they don't reduce them to zero.
And to be clear, I often feel the same way - but I am wondering why and whether it's appropriate!
For me I was just pointing out some interesting and noteworthy failure modes.
And it matters. If the models struggle sometimes with basic instruction following, they can quite possibly make insidious mistakes in large, complex tasks that you might not have the wherewithal or time to review.
The thing about good abstractions is that you should be able to trust them in a composable way. The simpler or more low-level the building blocks, the more reliable you should expect them to be. With LLMs you can't really make this assumption.
I mean, we typically architect systems depending on humans around an assumption of human fallibility. But when it comes to automation, randomly still doing the exact opposite even if somewhat rare is problematic and limits where and at what scale it can be safely deployed without needing ongoing human supervision.
For a coding tool it's not as problematic, as hopefully you vet the output to some degree, but it still means I don't feel comfortable using them as expansively (like the mythical personal assistant doing my banking and replying to emails, etc.) as they might otherwise be used with more predictable failure modes.
I’m perfectly comfortable with Waymo on the other hand, but that would probably change if I knew they were driven by even the newest and fanciest LLMs as [toddler identified | action: avoid toddler] -> turns towards toddler is a fundamentally different sort of problem.
I'm curious in what kinds of situations you are seeing the model do the opposite of your intention consistently where the instructions were not complex. Do you have any examples?
Mostly Gemini 3 Pro: when I ask it to investigate a bug and provide fixing options (I do this mostly so I can see whether the model loaded the right context for large tasks), Gemini immediately starts fixing things, and I just can't trust it.
Codex and Claude give a nice report, and if I see they're not considering this or that, I can tell 'em.
But why is it a big issue? If it does something bad, just reset the worktree and try again with a different model/agent. They are dirt cheap at $20/mo and I have 4 subscriptions (Claude, Codex, Cursor, Zed).
Same, I have multiple subscriptions and layer them. I use Haiku to plan and send a queue of tasks to codex and gemini, whose command lines can be scripted.
The issue to me is that I have no idea what the code looks like, so I have to have a reliable first-layer model that can summarize the current codebase state so I can decide whether the next mutation moves the project forward or reduces technical debt. I can delegate much more that way, while Gemini's "do first" approach tends to result in many dead ends that I have to unravel.
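By scripted I mean something like this sketch; treat `codex exec` and `gemini --prompt` as approximations (exact flags vary by version), and the tasks as made-up examples:

    import subprocess

    # Task queue produced by the cheap planning model (Haiku).
    tasks = [
        "Fix the off-by-one in pagination (notes/plan.md, step 1)",
        "Add an integration test for the export endpoint (step 2)",
    ]

    # Round-robin the queue across the scriptable CLI agents.
    agents = [["codex", "exec"], ["gemini", "--prompt"]]

    for i, task in enumerate(tasks):
        cmd = agents[i % len(agents)] + [task]
        # Each call runs in its own fresh context; keep output for review.
        out = subprocess.run(cmd, capture_output=True, text=True)
        print(f"--- task {i + 1} via {cmd[0]} ---\n{out.stdout}")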
The issue is that if it's struggling sometimes with basic instruction following, it's likely to be making insidious mistakes in large, complex tasks that you might not have the wherewithal or time to review.
The thing about good abstractions is that you should be able to trust them in a composable way. The simpler or more low-level the building blocks, the more reliable you should expect them to be. With LLMs you can't really make this assumption.
I'm not sure you can make that assumption even when a human wrote that code. LLMs are competing with humans not with some abstraction.
> The issue is that if it's struggling sometimes with basic instruction following, it's likely to be making insidious mistakes in large, complex tasks that you might not have the wherewithal or time to review
Yes, that's why we review all code even when written by humans.
We've taken those prompts, tweaked them to be more relevant to us and our stack, and have pulled them in as custom commands that can be executed in Claude Code, i.e. `/research_codebase`, `/create_plan`, and `/implement_plan`.
It's working exceptionally well for me, it helps that I'm very meticulous about reviewing the output and correcting it during the research and planning phase. Aside from a few use cases with mixed results, it hasn't really taken off throughout our team unfortunately.
I don't do any of that. I find with GitHub Copilot and Claude Sonnet 4.5, if I'm clear enough about the what and where, it'll sort things out pretty well, and then there's only reiteration of code styling or reuse of functionality. At that point it has enough context to keep going. The only time I might clear the whole thing is if I'm working on an entirely new feature where the context is too large and it gets stuck summarising the history. Otherwise it's good. But this is in Codespaces. I find the Tasks feature much harder - almost a write-off when trying to do something big. Twice I've had it go off on some strange tangent and build the most absurd thing. You really need to keep your eyes on it.
Yeah I found that for daily work, current models like Sonnet/Opus 4.5, Gemini 3.0 Pro (and even Flash) work really well without planning as long as I divide and conquer larger tasks into smaller ones. Just like I would do if I was programming myself.
For planning large tasks like "setup playwright tests in this project with some demo tests" I spend some time chatting with Gemini 3 or Opus 4.5 to figure out the most idiomatic easy-wins and possible pitfalls. Like: separate database for playwright tests. Separate users in playwright tests. Skipping login flow for most tests. And so on.
I suspect that devs who use a formal-plan-first approach tend to tackle larger tasks and even vibe code large features at a time.
I’ve had some luck with giving the LLM an overview of what I want the final version to do, but then asking it to perform smaller chunks. This is how I’d approach it myself — I know where I’m trying to go, and will implement smaller chunks at a time. I’ll also sometimes ask it to skip certain functionality - leaving a placeholder and saying we’ll get back to it later.
Same. I find that if I can piecemeal explain the desired functionality and work as I would pairing with another engineer that it’s totally possible to go from “make me a simple wheel with spokes” to “okay now let’s add a better frame and brakes” with relatively little planning, other than what I’d already do when researching the codebase to implement a new feature
It's quite interesting because it makes me wonder how we make it efficient and predictable. The human language is just too verbose. There must be some DSL, some more refined way to get to the output we need. I don't know whether it means you actually just need to provide examples or something else. But you know code is very binary, do this do that. LLMs are really just too verbose even in this format right now. That higher layer really needs a language. I mean I get it. It's understanding human language and converting it to code. Very clever. But I think we can do better.
This is essentially my exact workflow. I also keep the plan markdown files around in the repo to refer agents back to when adding new features. I have found it to be a really effective loop, and a great way to reprime context when returning to features.
Exactly this. I clear the old plans every few weeks.
For really big features or plans I'll ask the agent to create Linear issue tickets to track progress for each phase over multiple sessions. The only MCP I usually have loaded is Linear, but I'm looking for a good way to transition it to a skill.
In general, anything with an API is simply saying "find the auth token at ~/.config/foo.json". It mostly knows the REST endpoints and can figure out the rest.
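The glue it writes for that is usually the obvious thing; a sketch like this (the token path is from the instruction above, while the endpoint, JSON shape, and "token" key are placeholders):

    import json
    from pathlib import Path
    from urllib.request import Request, urlopen

    # "find the auth token at ~/.config/foo.json"
    token = json.loads((Path.home() / ".config" / "foo.json").read_text())["token"]

    # It mostly knows the REST endpoints already; this one is a stand-in.
    req = Request("https://api.example.com/v1/issues",
                  headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        print(json.load(resp))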
I'm uneasy having an agent implement several pages of plan and then writing tests and results only at the end of all that. It feels like getting a CS student to write and follow a plan to do something they haven't worked on before.
It’ll report, “Numbers changed in step 6a therefore it worked” [forgetting the pivotal role of step 2 which failed and as a result the agent should have taken step 6b, not 6a].
Or “there is conclusive evidence that X is present and therefore we were successful” [X is discussed in the plan as the reason why action is NEEDED, not as success criteria].
I _think_ that what is going wrong is context overload, and my remedy is to have the agent update every step of the plan with results immediately after action and before moving on to act on the next step.
When things seem off I can then clear context and have the agent review results step by step to debug its own work: "Review step 2 of the results. Are the stated results consistent with the final conclusions? Quote lines from the results verbatim as evidence."
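To make that checkable without burning context, I keep the plan structured enough that a dumb script (not the agent) can flag steps with missing or failed results before anything moves on. A sketch, assuming a made-up plan format of "## Step N: ..." headings, each followed by a "Result: ..." line once acted on:

    import re
    import sys
    from pathlib import Path

    plan = Path("notes/plan.md").read_text()
    # Split the plan into per-step chunks on the assumed headings.
    steps = re.split(r"^## Step ", plan, flags=re.MULTILINE)[1:]

    for step in steps:
        title = step.splitlines()[0]
        result = re.search(r"^Result:\s*(.+)$", step, flags=re.MULTILINE)
        if result is None:
            sys.exit(f"Step {title}: no result recorded - review before continuing.")
        if "fail" in result.group(1).lower():
            sys.exit(f"Step {title}: recorded failure - downstream steps may be invalid.")

    print("All steps have recorded, non-failing results.")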
100%. The reason I thought of this is that I'm constantly telling developers to break their work down into smaller pieces so that they can focus and the customer sees value sooner.
One of the things I like about LLM coding is that I don't need to become a psychologist in order to persuade other humans to approach their work in a manner I'd prefer.
Highly recommend using agent based hooks for things like `[review & test]`.
At a basic level, they work akin to git-hooks, but they fire up a whole new context whenever certain events trigger (e.g. another agent finishes implementing changes) - and that hook instance is independent of the implementation context (which is great, as for the review case it is a semi-independent reviewer).
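Concretely, the hook body can be a small script that receives the trigger event and spins up the independent reviewer. A sketch, assuming the hook mechanism passes event details as JSON on stdin and treats a non-zero exit as "block" (roughly how Claude Code's hooks behave); the `agent` reviewer command, its flags, and the payload field name are all placeholders:

    #!/usr/bin/env python3
    import json
    import subprocess
    import sys

    # Event payload from the triggering tool; field names vary by tool.
    event = json.load(sys.stdin)
    print(f"hook fired: {event.get('hook_event_name', '?')}", file=sys.stderr)

    # Fire up a *separate* context to review, independent of the
    # implementation context that just finished.
    review = subprocess.run(
        ["agent", "--print",
         "Review the uncommitted diff against notes/plan.md. "
         "Reply REJECT with reasons if anything was skipped or broken."],
        capture_output=True, text=True)

    if "REJECT" in review.stdout:
        # Non-zero exit reports the failure back to the triggering agent.
        print(review.stdout, file=sys.stderr)
        sys.exit(2)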
I agree this can work okay, but once I find myself doing this much handholding I would prefer to drive the process myself. Coordinating 4 agents and guiding them along really makes you appreciate the mythical-man-month on the scale of hours.
Did some early qualitative testing on this. Definitely seems easier for Claude to handle than Playwright MCP servers for one-off web dev QA tasks. Not really built for e2e testing, though, and lacks the GUI features of Cursor's latest browser integration.
Also seems quite a bit slower (needs more loops) to do general web tasks strictly through the browser extension compared to other browser-native AI-assistant extensions.
Overall, a great step in the right direction. Looks like this will be table stakes for every coding agent (CLI or VS Code plugin, browser extension [or native browser]).
> We are seeing a decline of American hegemony, accelerated by this current regime. And the ascendancy of a non-democratic superpower.
> However, the largest chunk of GDP and growth still sits firmly in democratic countries, and very consequential American elections are happening this year and in 2028.
The real question is, will Europe find its spine?