I agree with most of this, with one important exception: you should have some form of sandboxing in place before running any local AI agent. The easiest way to do that is with .claude/settings.json[0].
This is important no matter how experienced you are, but it's arguably most important when you don't know what you're doing.
0: or if you don't want to learn about that, you can use Claude Code Web
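For anyone who hasn't seen it, the deny-list format looks roughly like this (a minimal sketch; the specific rule patterns are just illustrative):

```json
{
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Bash(cat ./.env)"
    ],
    "allow": [
      "Bash(npm run test)"
    ]
  }
}
```

Rules are matched per tool, which is part of why the next comment's objection about command variations comes up.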
The part about permissions with settings.json [0] is laughable. Are we really supposed to list all potential variations of harmful commands? In addition to `Bash(cat ./.env)`, we would also need to add `Bash(cat .env)`, `Bash(tail ./.env)`, `Bash(tail .env)`, `Bash(head ./.env)`, `Bash(sed '' ./.env)`, and countless others... while at the same time allowing something like `npm` to run?
I know the deny list is only for automatic denial, and that any command not explicitly allowed will pause and wait for user confirmation. But it still reminds me of the rationale the author of the Pi harness [1] gave for why Pi will have no built-in permission feature (emphasis mine):
> If you look at the security measures in other coding agents, *they're mostly security theater*. As soon as your agent can write code and run code, it's pretty much game over. [...] If you're uncomfortable with full access, run pi inside a container or use a different tool if you need (faux) guardrails.
As you mentioned, this is a big feature of Claude Code Web (or Codex, Antigravity, or whatever the other companies' equivalents are): they handle the sandboxing.
Yes. I don't bother with that. I feel like the risk of Claude Code running amok is pretty low, and I don't have it do long-running tasks that exceed my desire to monitor it. (Not because I'm worried about it breaking things; I just don't use the tool that way.)
Let's not fool ourselves here. If a security feature adds any amount of friction at all, and there's a simple way to disable it, users will choose to do so.
I'm sure most folks run Claude without isolation or sandboxing. It's a terrible idea, but even most professional software developers don't think much about security.
There are many decent options (cloud VMs, local VMs, Docker, the built-in sandboxing). My point is just that folks should research and set up at least one of them before running an agent.
How did you contain Claude Code? Did you virtualize it? I just set up a simple firejail script for it. Not completely sure if it's enough but it's at least something.
You can download the devcontainer CLI and use it to start a Docker container with a working Claude Code install, simple firewall, etc. out of the box. (I believe this is how the VSCode extension works: It uses this repo to bootstrap the devcontainer).
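For the curious, the flow looks roughly like this (a sketch, assuming Docker is running and you've copied a `.devcontainer/` config such as the one from the claude-code repo into your project):

```shell
# Install the Dev Containers CLI (Node.js required)
npm install -g @devcontainers/cli

# Build and start the container described by .devcontainer/ in the
# current project; the container, not your host, runs the agent
devcontainer up --workspace-folder .

# Run a command inside the running container, e.g. the agent itself
devcontainer exec --workspace-folder . claude
```

The nice part of this setup is that the container's firewall and filesystem boundary do the sandboxing, so you don't have to enumerate dangerous commands yourself.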
This is broadly true, but not comparable when you get into any detail. The mistakes current frontier models make are more frequent, more confident, less predictable, and much less consistent than mistakes from any human I'd work with.
IME, all of the QA measures you mention are more difficult and less reliable than understanding things properly and writing correct code from the beginning. For critical production systems, mediocre code has significant negative value to me compared to a fresh start.
There are plenty of net-positive uses for AI. Throwaway prototyping, certain boilerplate migration tasks, or anything that you can easily add automated deterministic checks for that fully covers all of the behavior you care about. Most production systems are complicated enough that those QA techniques are insufficient to determine the code has the properties you need.
> The mistakes current frontier models make are more frequent, more confident, less predictable, and much less consistent than mistakes from any human I'd work with.
My experience is literally 180 degrees from this statement. And you don't normally get to choose the humans you work with; at best you may be involved in the interview process, but that doesn't tell you much. I have seen so much human-written code in my career that, in the right hands, I'll take (especially latest-frontier) LLM-written code over average human code any day of the week and twice on Sunday.
This is a couple of years old now, but at one point Janelle Shane found that the only reliable way to avoid being flagged as AI was to use AI with a certain style prompt
I had the same experience as peer comments. I'm on Pixel 8 and Google Fi. When I check for updates, I'm told I'm up-to-date with the last update being over a month old.
You should see an "unvote" or "undown" link to the right of the timestamp (i.e. the opposite side from where the vote arrows were). It's fairly subtle.
Yeah, I never send a PR out without reviewing each commit myself and adding GitHub comments when I think it's relevant. Sometimes a PR is clear enough that I don't feel the need to add comments, though.
I'd say "good old days" thinking is probably involved, but not the full explanation. Over the past few decades, software has gone from a fairly obscure profession to being seen as a great way (maybe the best way) to make a lot of money. In absolute numbers, there are probably at least as many engaged, curious engineers as before. There are almost certainly drastically more uninterested engineers who are there partially or fully because of the money, though.
Dunno. I've been at this since the late '80s, and have run into precious few developers who were interested in software and programming for its own sake. For most of them it was just a job.
Have you noticed any change in that trend in the past year or two, or is it continuing to get better?