Good question. Is there a copy of the actual DMCA notice available? I'd like to ...

thfuran · on March 24, 2023

Is that a defensible claim in the US? No human designed those weights; they're the output of some algorithm run on a bunch of data.

chatmasta · on March 24, 2023

There are definitely some good arguments why it wouldn't be defensible (e.g. the phone book argument), but of course this is all new and remains to be seen.

The other interesting aspect of this is that they're classifying it as unauthorized content distribution. Meta was already distributing the weights, but limited their distribution to "researchers" with approved credentials. It was one of those researchers who leaked the weights originally. So it's not like they were reverse engineered from a binary or exfiltrated out of FB HQ. That might be an important bit of nuance.

curiousllama · on March 24, 2023

The phone book argument is super interesting. I wonder if it would hold, though: Wikipedia suggests it relies on the re-user having a new “selection & arrangement.” Kinda tough to rearrange model weights!

aftbit · on March 24, 2023

What about by fine-tuning using LoRA? That would introduce new layers and re-arrange the data for additional uses.
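
For context, here is a minimal sketch of what a LoRA-style update does mechanically, assuming the usual formulation in which the base weight matrix stays frozen and a trainable low-rank product is added to it. The matrix sizes, values, and helper names are hypothetical and chosen only for illustration. Nothing here reflects LLaMA's actual layers.

```python
# Toy illustration of a LoRA-style update. All names, sizes, and values
# are hypothetical; this is not any particular library's API.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Frozen base weights W (4x4), standing in for one layer of a pretrained model.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

# Trainable low-rank factors: B is 4x1 and A is 1x4, so the rank r is 1.
B = [[0.5], [0.0], [0.0], [0.0]]
A = [[0.0, 2.0, 0.0, 0.0]]

delta = matmul(B, A)         # 4x4 update with rank 1
W_effective = add(W, delta)  # weights the fine-tuned layer effectively uses

# The adapter is stored separately and is smaller than the base matrix.
base_params = sum(len(row) for row in W)
adapter_params = sum(len(row) for row in B) + sum(len(row) for row in A)
```

In this sketch the original `W` is never modified. The adapter (`A` and `B`) is a separate, smaller set of numbers that only has an effect when combined with the base weights.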

fweimer · on March 24, 2023

Yeah, it seems that would require a fairly fundamental shift in how copyright is understood in the U.S. Not something that should happen during routine processing of DMCA notices.

The EU is different: it recognizes copyright-like rights in databases and database works, which is why the cavalier attitude of U.S.-oriented organizations to these matters tends to annoy me. For example, the FSF does not actually check that certain non-code data files are legally unencumbered. They merely disclaim any copyright of their own. But for all we know, that could be wishful thinking.

chatmasta · on March 24, 2023

Early indications [0] [1] from the US Copyright Office seem to suggest AI-produced works will not be copyrightable:

> If a work's traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it.

However, this guidance seems primarily about the stuff you create _with_ AI, not necessarily the model weights used in the AI itself.

[0] https://www.federalregister.gov/documents/2023/03/16/2023-05...

[1] Discussed on HN: https://news.ycombinator.com/item?id=35191206

dragonwriter · on March 24, 2023

AI weights are as much a mechanical, non-creative result of the training set and the network's initial state as other AI outputs are a mechanical, non-creative result of the prompt and the network's trained state. So, apart from the input being bigger in the weights case than in the output case for the same model, I don't see a difference.

chatmasta · on March 24, 2023

I agree. But it's not like we could force companies to release the weights of their models. We might obtain them in other ways, such as unauthorized leaks, or reverse engineering. But if the company didn't intend to release them, does that mean they're protectable as a trade secret? And if so, are trade secrets enforceable in the same way as copyrights?

(Also, another element to this case is that GitHub is owned by Microsoft, who has a conflict of interest with Meta in terms of ChatGPT vs. LLaMA)

dragonwriter · on March 24, 2023

The DMCA can only be used for copyright-protected content; it's a copyright-law rule, not a general C&D.