> It's not implausible that an LLM could generate a verbatim article it was never even trained on if you pushed on it hard enough, especially if it was trained on writing in a similar style and other coverage of the same event.
That would be independent creation, not infringing copying, and copyright law doesn't prohibit independent creation. This defense isn't available to OpenAI because there is no dispute OpenAI ingested the NYTimes articles in the first place. There is no plausible way OpenAI could claim it never had access to the articles it is reproducing verbatim.
Rather than sneeringly explaining how LLMs work without any eye toward the laws at issue, maybe do yourself the favor of learning about those laws so you can spare us this incessant "no, let me explain how they work, it's fine I swear!" shtick.
It would be both. Or to put it a different way, how would you distinguish one from the other?
> This defense isn't available to OpenAI because there is no dispute OpenAI ingested the NYTimes articles in the first place.
The question remains whether ingesting the article is the reason it gets output in response to a given prompt, when it could have happened either way.
And in cases where you don't know, emitting some text is not conclusive evidence that it was in the training data. Most of the text emitted by LLMs isn't verbatim from the training data.
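For what it's worth, the verbatim question itself is mechanically checkable. A minimal sketch of n-gram overlap detection follows; the function names and the 8-word n-gram size are my own illustrative choices, not anything from the case or from OpenAI's tooling:

```python
# Minimal sketch: flag how much of a model's output appears verbatim in a
# reference corpus. Names and the n-gram size are illustrative only.

def word_ngrams(text: str, n: int = 8) -> set[str]:
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(output: str, corpus: list[str], n: int = 8) -> float:
    """Fraction of the output's n-grams that occur verbatim in the corpus."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0
    corpus_grams: set[str] = set()
    for doc in corpus:
        corpus_grams |= word_ngrams(doc, n)
    return len(out_grams & corpus_grams) / len(out_grams)

# A high score establishes textual identity, but by itself it can't tell
# you whether the text came from training data or from a prompt that
# already contained it -- which is exactly the point in dispute here.
```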
> Rather than sneeringly explaining how LLMs work without any eye toward the laws at issue, maybe do yourself the favor of learning about those laws so you can spare us this incessant "no, let me explain how they work, it's fine I swear!" shtick.
This is a case of first impression. We don't really know how the courts will rule yet. But "there exists some input that causes it to output the article" isn't any kind of offensive novelty; lots of boring existing stuff does that when the input itself is based on the article.
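To make "boring existing stuff" concrete, a trivial echo program already satisfies "there exists some input that causes it to output the article." Purely illustrative sketch:

```python
# Purely illustrative: the most boring possible "system" for which some
# input yields a verbatim copy of any article -- the input that already
# contains the article. Clearing this bar says nothing about whether
# the article was ever in any training data.

def complete(prompt: str) -> str:
    return prompt  # an echo; cat(1) and the clipboard behave the same way

article = "Full text of some NYTimes article..."
assert complete(article) == article
```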
>It would be both. Or to put it a different way, how would you distinguish one from the other?
No, it's not both. Have you made any effort to understand the law here? Copyright doesn't prohibit independent creation; I'm not sure how much simpler I can make that for you. In one scenario there is copying, in the other there isn't, and when the facts show copying, it's infringement.
>The question remains whether ingesting the article is the reason it gets output in response to a given prompt, when it could have happened either way.
This can't be serious. You're saying there's no difference between ingesting the article and then outputting it versus never ingesting it and outputting it anyway. Do you have anything at all to back that up?
>This is a case of first impression. We don't really know how the courts will rule yet. But "there exists some input that causes it to output the article" isn't any kind of offensive novelty; lots of boring existing stuff does that when the input itself is based on the article.
"First impression" (something you claim) doesn't mean ignore existing copyright law. One side is arguing this isn't first impression at all, it's just rote copying.
> But "there exists some input that causes it to output the article" isn't any kind of offensive novelty
You said it's novel; I called it plain copying.
>lots of boring existing stuff does that when the input itself is based on the article.
Again, as I said above: that would be independent creation, which copyright law doesn't prohibit. That defense isn't available to OpenAI, because there is no dispute OpenAI ingested the very NYTimes articles it is reproducing verbatim.