Hacker News

You can't copyright recipes.

https://www.copyright.gov/comp3/chap300/ch300-copyrightable-...

> 313.4(F) Mere Listing of Ingredients or Contents

> A mere listing of ingredients or contents is not copyrightable and cannot be registered with the U.S. Copyright Office. 37 C.F.R. § 202.1(a).

> Examples:

> A list of ingredients for a recipe.

However, you can copyright a cookbook.

> The Office may register a work that explains how to perform a particular activity, such as a cookbook or user manual, provided that the work contains a sufficient amount of text, photographs, artwork, or other copyrightable expression.

https://www.copyrightlaws.com/copyright-protection-recipes/

> If you have a collection of recipes, for example in a cookbook, the collection as a whole is protected by copyright. Collections are protected even if the individual recipes themselves are in the public domain.

https://en.wikipedia.org/wiki/Copyright_in_compilation

> In the copyright law in the United States, such copyright may exist when the materials in the compilation (or "collective work") are selected, coordinated, or arranged creatively such that a new work is produced. Copyright does not exist when content is compiled without creativity, such as in the production of a telephone directory. In the case of compilation copyright, the compiler does not receive copyright in the underlying material, but only in the selection, coordination, or arrangement of that material.

And so the curation and tagging of a collection of works is itself copyrightable.

The model weights are produced without the creativity necessary for copyright, but I believe (I am not a lawyer) they can be sufficiently transformative not to be encumbered as a derivative work.

The output of the model is ineligible for copyright as it was created by a machine and copyright in the US requires human authorship.

The human publishing a work created by the model may be publishing a work that is sufficiently similar to an existing one, either deliberately (prompt: a mouse in the style of Disney with red pants) or through accidental memorization in the model ( https://arstechnica.com/information-technology/2023/02/resea... ). That human needs to be diligent in verifying that anything they publish is not derivative of a copyrighted work.



Given that the content used to create the model was created mostly by humans, what separates this from copyright being granted when a collection of text files (source code) is run through a highly mechanized process (compilation) to produce a copyrightable work (Adobe Photoshop)?

Sometimes there are expanded rights on the text files (e.g. LGPL or public domain), and yet a mechanical process applied to those text files, together with some creativity in the accompanying text files (source code calling that library), still yields a copyrightable work (any binary that calls an LGPL library or uses public domain code). This is to say, Facebook needs to show some level of creativity, and opinions about the contents of their data set would count ("This subreddit is toxic, that subreddit is good stuff...").

If recipe books are copyrightable, I have a hard time seeing ML models as not being covered.


This is described in more detail in Copyright in Derivative Works and Compilations: https://www.copyright.gov/circs/circ14.pdf

> Compilations of data or compilations of preexisting works (also known as “collective works”) may also be copyrightable if the materials are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes a new work. When the collecting of the preexisting material that makes up the compilation is a purely mechanical task with no element of original selection, coordination, or arrangement, such as a white-pages telephone directory, copyright protection for the compilation is not available.


Interesting! My "I'm not a lawyer" read of that is that if Facebook did actually inject some opinion, such as that a specific subreddit is toxic, then the model would be covered under copyright.


I don't believe that would be quite right.

Suppose Facebook had a collection of posts and then had humans go through and tag and filter them for... let's say... "from 'bros'" (a slightly silly example, but one that implies some curation of the data).

That collection of posts (the Bro Data Set) would be something that could be copyrighted as a collection (setting aside the "is this a derivative work of the posts" question).

Going from the collection of posts to a model, however, is a purely mechanical process. There is no human creative element in creating the model from the collection of posts. Thus the model wouldn't be sufficiently creative to have a copyright of its own.

The question of "is the model infringing on the copyrights" is one that is open and interesting. I (not a lawyer) would side with the view that the model is sufficiently transformative that, while it can't be copyrighted itself, it isn't infringing on the copyrights of the material that was used to train it. HOWEVER, it may produce infringing works when prompted to do so, either intentionally or unintentionally.

Going back to the cookbook. If you create a cookbook of seafood recipes (recipes are not copyrightable, but the cookbook is because it is curated data) and I take that cookbook and apply the mechanical change of "double the recipes - 4 oz of salmon becomes 8 oz and serves 2 becomes serves 4", my collection of recipes isn't copyrightable because all I did was apply math to it. Likewise, the result of taking a collection of posts (or pictures) and applying math to it can't be copyrighted.
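
To make the "all I did was apply math" point concrete, here is a minimal Python sketch of the doubling step. The Recipe and Ingredient types and the function names are made up for illustration; only the salmon numbers come from the example above. The thing to notice is that nothing is selected, dropped, or rearranged: every value in the output is a fixed function of the input.

```python
from dataclasses import dataclass, replace


# Hypothetical data model, invented for illustration only.
@dataclass(frozen=True)
class Ingredient:
    name: str
    amount: float
    unit: str


@dataclass(frozen=True)
class Recipe:
    title: str
    serves: int
    ingredients: tuple[Ingredient, ...]


def scale_recipe(recipe: Recipe, factor: int) -> Recipe:
    """Multiply every ingredient amount and the serving count by `factor`."""
    scaled = tuple(
        replace(item, amount=item.amount * factor) for item in recipe.ingredients
    )
    return replace(recipe, serves=recipe.serves * factor, ingredients=scaled)


def scale_cookbook(cookbook: list[Recipe], factor: int) -> list[Recipe]:
    # Same recipes, same order: no curation happens in this step.
    return [scale_recipe(recipe, factor) for recipe in cookbook]


cookbook = [Recipe("Grilled salmon", 2, (Ingredient("salmon", 4.0, "oz"),))]
doubled = scale_cookbook(cookbook, 2)
print(doubled[0])
```

Running the same function on the same cookbook always gives the same doubled cookbook, which is the sense in which the step is "purely mechanical".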



