Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> The agent should be treated as an untrusted user in your client,

An untrusted user in a client is a hacker/invasor, not an agent.



That’s not really a reason not to treat the agent like it’s “rogue”. The point is, if it accepts any untrusted inputs then, from a security perspective, it is possible for any given (untrusted) input to contain a prompt injection payload that jailbreaks the model and tells it to do things it shouldn’t do.

As such, it can be told to do bad stuff in a way that can’t be prevented and therefore should not be given read access to anything you don’t want others to know about, nor write access to any data of which you care about the integrity.


That is out of scope of the service. What kind of user agent the actual user deputizes to interact with a service, is the user's own choice and responsibility. In general, it's not something a service can solve on their end.


Services can certainly make this safer by providing means to get more restricted credentials, so that users can deputize semi-trusted delegates, such as agents vulnerable to injection.

The important point being made in this discussion is that this is already a common thing with OAuth, but mostly unheard of with web sessions and cookies.


an untrusted, but permitted, user is why sandboxes exist. There are plenty of times you want to allow an untrusted user to have capabilities in a system, that's why you restrict those capabilities.


a sandboxed user is not an untrusted user of the client but an unstrusted user of the host, that is why the client is sandboxed.


sandboxing is a general term for actor isolation, and its context agnostic.

For example, when you use the sandbox attribute on an iframe in a web application, it's not the user that's untrusted, it's some other user that's attempting to trigger actions in your client.


I've thought more about this and I think the only way to make completely sure that sensitive data does not get leaked is by making sure it never makes it into the models context in the first place.

The issue is even if the MCP-B extension makes it so the user has to give confirmation when the agent want's to call a tool on a new domain after interacting with another domain, there is no clear way to determine if a website is malicious or not.

A solution to this might be to give server owners the ability to write the restricted data to extension storage on tool response instead of returning it to the models context. Instead, a reference to this location in extension storage get's passed to the model. The model then has the ability to "paste" this value into other website via tool call without ever actually seeing the value itself.

That way, MCP-B can put lots of warnings and popups when this value is requested to be shared.

Any thoughts?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: