It doesn’t take much. Let’s say you want an assistant that can tell you about im...

__MatrixMan__ · 2025-04-08T02:57:59 1744081079

I'm not sure I'd characterize those two things as "it doesn't take much," that's quite a lot to give to an untrusted entity.

wat10000 · 2025-04-08T12:25:00 1744115100

My whole point is that you must consider this entity to be untrusted, which is pretty strongly at odds with having it act as an agent. It can’t both have access to private data and the outside world.

__MatrixMan__ · 2025-04-08T18:56:59 1744138619

I guess it's just that I've given up on expecting them to be able to police themselves. Even if there was some fundamental change which made it plausible, it would likely be implemented by somebody I don't know or trust--so I'm going to be locking it down via OS-level controls anyway. And since I'm going to do that, doesn't the self-policing part then become redundant?

If it's not allowed to do something, I'd rather it just show me the error it got when it tried and leave it to me to tweak the containment or not. Having it refuse because it's not allowed according to its own internal logic just creates a whole separate set of less-common error messages that I'll have to search for, each of which is opaquely equivalent to one that we have decades of experience with. There is a battle-hardened interface for this sort of thing and reimplementing it internally to the LLM just isn't worth the squeeze.

I will confess that I've previously run untrusted agents (e.g. from CircleCI) as my own user without giving them due scrutiny. And shame on me for doing so. I just don't think that my negligence would be any greater had it contained an LLM.