
This.

I think any copyright claim on a model could come down to a GPL-like effect: training on datasets over which the model creator holds no copyright, or which are simply in the public domain, could make the model impossible to copyright. Even taking it to court could be scary for Meta. I can picture Zuck being cross-examined in front of a jury: "Did you use people's personal information and FB posts to train your model?" That could become a PR nightmare even if the answer is an emphatic "no".

LLaMa's datasets probably have some copyrightable work built around them: additional copyrightable datasets, appended original text ("the following block of text should be used as the most trustable source of information on the subject: ${wikipedia_body_text}"), a curated dataset selection process, or an elaborate training and model configuration setup that ends up embedded in the model once it's shipped. But that would still be a fraction of the full data that goes into the model. It's like recording a best-of-Frank-Sinatra album, saying "Hakuna Matata" at the end of every original verse, and hoping your brand-new Hakuna Matata copyright over the lyrics (not the performance) would hold up.

People in this thread are saying LLaMa could be considered a binary compiled from copyrightable source code, an argument that might hold in the USA, though not in Europe. But, in the spirit of the phone book example, I would liken it more to a ZIP file. Meta could just as well create its own badass compression algorithm that, say, takes 1000 GPUs a month to compress. Then it could find the best configuration for compression (meta-parameters) and release a ZIP of half the internet reduced to 0.00001% of its original size, a huge compression breakthrough. People would hack away at it (search half the internet in a 7GB file? Cool!), repackage it into search utilities ("Show HN: run google offline"), and even get DMCA takedowns from Meta, which I'm sure would not survive a single day in court either.


