All of those things are smells imo; you should be very wary of any code output from a task that causes that much thrashing. In most cases it's better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope).
A person has a supervision budget. They can supervise one agent in a hands-on way, or many mostly-hands-off agents. Even though there's some thrashing, assistants still get farther as a team than a single micromanaged agent. At least that's my experience.
Just curious, what kind of work are you doing where agentic workflows are consistently able to make notable progress semi-autonomously in parallel? Hearing people are doing this, supposedly productively/successfully, kind of blows my mind given my near-daily in-depth LLM usage on complex codebases spanning the full stack from backend to frontend. It's rare for me to have a conversation where the LLM (usually Opus 4.6 these days) lasts 30 minutes without losing the plot. And when it does last that long, I usually become the bottleneck in terms of having to think about design/product/engineering decisions; having more agents wouldn't be helpful even if they all functioned perfectly.
I've passed that bottleneck with a review task that produces engineering recommendations along six axes (encapsulation, decoupling, simplification, deduplication, security, reducing documentation drift) and an ideation task that, per component, suggests a new feature idea, an improvement to an existing feature, and a way to expand a feature to be more useful. These two generate constant bulk work that I move into a new chat, where it's grouped by changeset and sent to a sub-agent to protect the context window.
What I'm doing mostly these days is maintaining a goal.md (project direction) and spec.md (coding and process standards, global across projects). I'm also developing new macro tasks; one in progress is meant to automatically build PNG mockups and self-review.
I work on a 1M LOC, 15-year-old repo. Like you, it's across the full stack. Bugs in certain pieces of complex business logic would have catastrophic consequences for my employer. Basically I peel poorly-specified work items off my queue, each into its own worktree and session at high reasoning/effort, and provide a well-specified prompt.
These things eat into my supervision budget:
* LLM loses the plot and I have to nudge (like you)
* Thinking hard to better specify prompts (like you)
* Reviewing all changes (I do not vibe code except for spikes or other low-risk areas)
* Manual things I have to do (things I have not yet automated with agent-authored scripts)
* Meetings
* etc
So, yes, my supervision budget is a bottleneck. I can only run 5-8 agents at a time because I have only so much time in the day.
Compare that with a single agent at high reasoning/effort: I'm sitting waiting for it to think. Waiting for it to find the code area I'm talking about takes time. So do compiling, running tests, fixing compile errors, and a million other things.
Any time I find myself sitting and waiting, this is a signal to me to switch to a different session.
I see your comment's been downvoted; I'd like to hear from those who downvoted it as to why. Doesn't his current venture with his AGI startup Keen Technologies deserve to be called out as a potential conflict of interest here?
Yes, but likely in the exact inverse than what is implied here. Carmack has generational wealth, he is likely fine financially regardless of how AI pans out. The many individuals who feel they should be financially compensated for code they open sourced are likely far more invested financially in that particular outcome.
Are you telling me that my home assistant enabled humidity sensors in my garden that trigger the arduino hose valve could just be replaced by a watering can??
I'm on FB primarily because my local buy-nothing group is on it, so I am logging in multiple times a day. I'm so used to this slop it's pretty funny at this point, but as is the case with all social media, you tune your algorithm as you engage. At this point it pushes things like cooking videos and hockey clips more than the AI slop for me.
Sometimes I'll go down a rabbit hole of clicking AI generated videos just because my curiosity is piqued, and then I'll be stuck getting that slop fed to me for the next week. I have to make a mental note to actively disengage with it as quickly as possible to tip the algo in the other direction.
I can't think of an interviewer who interjects his viewpoint more, or tries harder to get his guest to acknowledge and agree with his typically shallow analysis, than Lex. The only redeeming quality of his podcast is the guests he gets. I don't think Dwarkesh is great, but he's leagues better.
I just don't understand this view on Lex Fridman at all.
Fridman is quite good at letting the guest speak. The whole show is exceptionally good at keeping a conversation moving.
I think there are technical haters on Lex but that is stupid because Lex is in sales. He is selling a podcast. From a sales perspective, Lex is incredibly good.
It is like saying the chef is only a good cook because of the quality of the ingredients. Yes, exactly. The chef isn't a farmer growing their own organic vegetables for the dishes. The art is in the choice and ability to source quality ingredients and then bring it all together as a full dish.
I guess you're right - getting your podcast big enough that it becomes a necessary checkbox for book/media tours is a skill.
You're correct that he brings absolutely nothing to the podcast, but he interrupts plenty - usually with superficial pet theories about the "oneness of the universe" or "how all we need is love, actually". He never seems well prepared for his guest beyond a chatgpt summary, never gets any kind of interesting answer out of a guest that they weren't already going to give, just absolutely zero criticality to anything in the interview.
A podcast with guests is an interview. Interviewing is a skill. The difference between a good and bad interviewer is night and day.
Helping out with a freelance project I built 15 years ago. It didn’t end on the best of terms, but the relationship has since been repaired (and I’m much better at managing my time now)
It’s been fun to come back to, most of the code I wrote still drives the business (it’s just far outdated).
I was pretty early on in my career when I wrote it, so seeing my mistakes and all the potential areas to improve has been very interesting. It’s like buying back your old high school Camaro that you used to wrench on.
Adaptive cruise control requires some degree of lane detection. It has to figure out what car it's actually following, not merely what car is in front of it. (The road is turning, the car in front of you can easily not be the car you are actually behind.)
Lane keep keeps your car in the lane so you can stop paying attention just like cruise control keeps you going the same speed so you can stop paying attention… they don’t.
They are just aids that ease fatigue on long trips.
The "fatigue" from long trips is hardly a result of having to keep in a lane.
It's more the result of being awake, doing effectively nothing, for a long time. Lane keep assist is a useless technology for 99% of the population, and the 1% who need it likely shouldn't be driving a car anyway.
The more we "aid" fatigue, the longer drivers will attempt to drive. This cannot be a good outcome. The worst driving occurs when one is practically half asleep.
I’m not referring to mental fatigue, but the physical ergonomic fatigue simply from continually activating muscles in a narrow range of motion even over a couple of hours.
If you’ve ever driven a 1970s truck you’ll know that continually correcting the steering will wear you out after just a couple of hours. Modern rack and pinion steering is a lot more comfortable, and lane keep is a further comfort improvement.
I dunno, when you've made about 10,000 clay pots it's kinda nice to skip to the end result; you're probably not going to learn a ton with clay pot #10,001. You can probably come up with some pretty interesting ideas for what you want the end result to look like from the outset.
I find myself being able to reach for the things that my normal pragmatist code monkey self would consider out of scope - these are often not user facing things at all but things that absolutely improve code maintenance, scalability, testing/testability, or reduce side effects.
Depends on the problem. If the complexity of what you are solving is in the business logic or, generally low, you are absolutely right. Manually coding a signup flow #875 is not my idea of fun either. But if the complexity is in the implementation, it’s different. Doing complex cryptography, doing performance optimization or near-hardware stuff is just a different class of problems.
> If the complexity of what you are solving is in the business logic or, generally low, you are absolutely right.
The problem is rather that programmers who work on business logic often hate programmers who are actually capable of seeing (often mathematical) patterns in the business logic that could be abstracted away; in other words: many business logic programmers hate abstract mathematical stuff.
So, in my opinion/experience this is a very self-inflicted problem that arises from the whole culture around business logic and business logic programming.
Coding signup flow #875 should be as easy as using a snippet tool or a code generator. Everyone who explains why using an LLM is a good idea sounds like they're living in the stone age of programming. There are already industrial-grade tools to get things done faster. Often so fast that I feel time is being wasted describing it in English.
In my experience AI is pretty good at performance optimizations as long as you know what to ask for.
Can't speak to firmware code or complex cryptography, but my hunch is that if it's in its training dataset and you know enough to guide it, it's generally pretty useful.
Most optimizations come down to making sure you don't do unnecessary work and that you use the hardware effectively. The standard techniques are all you need 99% of the time you're doing performance work. The hard part about performance is dedicating the time to it and not letting it regress as you scale the team. With AI you can have agents constantly profiling the codebase, identifying and optimizing hotspots as they get introduced.