There are definitely some good arguments why it wouldn't be defensible (e.g. the phone book argument), but of course this is all new and remains to be seen.
The other interesting aspect of this is that they're classifying it as unauthorized content distribution. Meta was already distributing the weights, but limited their distribution to "researchers" with approved credentials. It was one of those researchers who leaked the weights originally. So it's not like they were reverse engineered from a binary or exfiltrated out of FB HQ. That might be an important bit of nuance.
The phone book argument is super interesting. I wonder if it would hold, though: Wikipedia suggests it relies on the re-user having a new “selection & arrangement.” Kinda tough to rearrange model weights!
Yeah, it seems that would require a fairly fundamental shift in how copyright is understood in the U.S. Not something that should happen during routine processing of DMCA notices.
The EU is different: it recognizes copyright-like rights in databases and database works, which is why the cavalier attitude of U.S.-oriented organizations to these matters tends to annoy me. For example, the FSF does not actually check that certain non-code data files are legally unencumbered. They merely disclaim any copyright of their own. But for all we know, that could be wishful thinking.
AI weights are as much a mechanical, non-creative result of the training set and the network's initial state as other AI outputs are a mechanical, non-creative result of the prompt and the network's trained state. Apart from the input being bigger in the weights case than in the output case for the same model, I don't see a difference.
I agree. But it's not like we could force companies to release the weights of their models. We might obtain them in other ways, such as unauthorized leaks, or reverse engineering. But if the company didn't intend to release them, does that mean they're protectable as a trade secret? And if so, are trade secrets enforceable in the same way as copyrights?
(Also, another element to this case is that GitHub is owned by Microsoft, which has a conflict of interest with Meta in terms of ChatGPT vs. LLaMA.)
(Also: great username!)
EDIT: DMCA here [0]. It does sound like they're asserting copyright on the "content," i.e. presumably the weights themselves.
[0] https://github.com/github/dmca/blob/master/2023/03/2023-03-2...