One way is trying to sneak in a specific structure/pattern that is difficult for...

nonethewiser · on July 25, 2023

> One way is trying to sneak in a specific structure/pattern that is difficult for a human to notice when reading

This seems like a total non-starter. That can only negatively impact the answers. A solution needs to be totally decoupled from answer quality.

thewataccount · on July 25, 2023

The paper I linked in the parent comment presents that as the "Simple proof of concept" on page 2 and, like you said, outlines its limitations: it hurts performance and is also easily detectable and determinable.

Their improved method instead only replaces tokens when there are many good choices available, and skips replacing tokens when there are few good choices. Take "The quick brown fox jumps over the lazy dog": "The quick brown" is not replaceable because changing it would severely harm the quality.

Essentially it's only replacing tokens where it won't harm the performance.

It's worth noting that any watermarking will likely harm the quality to some degree - but it can be minimized to the point of being viable.
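
To make the "only replace where there are many good choices" idea concrete, here is a toy sketch. It is not the paper's algorithm: the hash-based "green" partition, the `min_prob` and `min_choices` thresholds, and the function names are all assumptions made up for illustration.

```python
import hashlib


def is_green(prev_token: str, candidate: str) -> bool:
    """Hypothetical partition: hash (previous token, candidate) and
    treat roughly half of all candidates as 'green' (watermarked)."""
    digest = hashlib.sha256(f"{prev_token}|{candidate}".encode("utf-8")).digest()
    return digest[0] % 2 == 0


def pick_token(prev_token: str, probs: dict, min_prob: float = 0.15,
               min_choices: int = 3) -> str:
    """Pick the next token, nudging toward 'green' tokens only when
    several candidates are about equally good."""
    best = max(probs, key=probs.get)
    # Candidates the model considers reasonable, most likely first.
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])
    good = [tok for tok, p in ranked if p >= min_prob]
    if len(good) < min_choices:
        # Few good choices ("The quick brown ..."): leave the token alone.
        return best
    for tok in good:
        if is_green(prev_token, tok):
            return tok
    # No green candidate among the good ones: fall back to the best token.
    return best


# Only one plausible continuation, so nothing is replaced.
print(pick_token("quick", {"brown": 0.97, "red": 0.02, "blue": 0.01}))
# Several plausible continuations, so a green one is preferred if available.
print(pick_token("the", {"big": 0.25, "large": 0.25, "huge": 0.25, "vast": 0.25}))
```

A detector that knows the same partition would then count how many tokens in a text are green and flag texts where that fraction is suspiciously high.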

yttribium · on July 25, 2023

You can do this by injecting non-visible Unicode (LTR / RTL markers, zero-width separators, the various "space" analogs, homographs of "normal" characters), but it can obviously be stripped out.
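
A rough sketch of both halves of that, assuming a made-up encoding where a zero-width space stands for bit 0 and a zero-width non-joiner for bit 1. The function names are hypothetical, and the stripping step only catches Unicode format characters (category `Cf`), not homographs or alternative space characters.

```python
import unicodedata

# Assumed encoding for illustration: ZWSP = 0, ZWNJ = 1.
ZERO_WIDTH = {"0": "\u200b", "1": "\u200c"}


def embed(text: str, bits: str) -> str:
    """Hide one bit after each of the first len(bits) characters.
    Bits beyond the length of the text are silently dropped."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if i < len(bits):
            out.append(ZERO_WIDTH[bits[i]])
    return "".join(out)


def extract(text: str) -> str:
    """Recover the hidden bit string, if any survives."""
    reverse = {v: k for k, v in ZERO_WIDTH.items()}
    return "".join(reverse[ch] for ch in text if ch in reverse)


def strip_invisible(text: str) -> str:
    """Drop all format characters (ZWSP, ZWNJ, ZWJ, LRM, RLM, ...)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


marked = embed("hello world", "1011")
print(extract(marked))                   # hidden bits are recoverable
print(extract(strip_invisible(marked)))  # and gone after stripping
```

Note that stripping every `Cf` character is blunt: it also removes joiners that some scripts and emoji sequences legitimately need.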