> - under what license the generated code falls under?
Is it even copyrighted? Generally, my understanding is that to be copyrightable a work has to be the output of a human creative process, and this doesn't seem to qualify (I am not a lawyer).
Isn't it subject to the licenses of the code the model was created from? The learning is basically just an automated transformation of that code, which would still be under the original license. Otherwise I could just run a minifier, or some other more elaborate code transformation, on a FOSS project such as the Linux kernel and relicense it under whatever I like.
That does not sound right to me, but IANAL, and I also have not really looked at how this specific model is (or these models are) generated.
If I were training an AI on existing code, I'd be quite cautious: group the training data by classes of compatible licences, ask the user what their project's licence is, and then only use the compatible parts of the models. Anything else seems not really ethical and rather uncharted territory in law to me. That may not mean much, as IANAL and am just some random voice on the internet, but FWIW I have at least tried to understand quite a few FOSS licences to decide what I can and cannot use in projects.
Does anybody know of relevant cases about AI models and the input data they were built from, ideally in the US or a European country?
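To make the idea concrete, here is a minimal Python sketch of that grouping step. Everything in it is made up for illustration: the `ALLOWED_SOURCES` table, the repository names and the function names are hypothetical, and the compatibility table is a deliberately crude assumption rather than a statement of what the licences actually permit.

```python
from collections import defaultdict

# ASSUMPTION: a crude, illustrative compatibility table keyed by the user's
# project licence (SPDX-style identifiers). It is not a legal reference.
ALLOWED_SOURCES = {
    "MIT": {"MIT", "BSD-3-Clause"},
    "Apache-2.0": {"MIT", "BSD-3-Clause", "Apache-2.0"},
    "GPL-3.0-only": {"MIT", "BSD-3-Clause", "Apache-2.0", "GPL-3.0-only"},
}


def group_by_licence(corpus):
    """Group (repo_name, licence) pairs into one bucket per licence."""
    groups = defaultdict(list)
    for repo, licence in corpus:
        groups[licence].append(repo)
    return dict(groups)


def usable_repos(corpus, project_licence):
    """Return the repos whose licence is allowed for the given project licence.

    Unknown project licences get an empty result, i.e. the cautious default.
    """
    allowed = ALLOWED_SOURCES.get(project_licence, set())
    groups = group_by_licence(corpus)
    return sorted(
        repo
        for licence, repos in groups.items()
        if licence in allowed
        for repo in repos
    )


# Hypothetical training corpus.
corpus = [
    ("alpha", "MIT"),
    ("beta", "GPL-3.0-only"),
    ("gamma", "Apache-2.0"),
    ("delta", "BSD-3-Clause"),
]

print(usable_repos(corpus, "MIT"))  # ['alpha', 'delta']
```

A real system would presumably need one model (or one filtered dataset) per licence class, which is exactly why the question of what the training data contains matters.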
This is a great point. If I recall correctly, prior to Microsoft's acquisition of Xamarin, Mono had to go out of its way to avoid accepting contributions from anyone who'd looked at the (public but non-FOSS) source code of .NET, for fear that they might reproduce some of what they'd seen rather than genuinely reverse engineering.
Is this not subject to the same concern, but at a much greater scale? What happens when a large entity with a legal department discovers an instance of Copilot-generated copyright infringement? Is the project owner liable, is GitHub/Microsoft liable, or would a court ultimately tell the infringee to deal with it and eat whatever losses occur as a result?
In any case, I hope that GitHub is at least limiting any training data to a sensible whitelist of licenses (MIT, BSD, Apache, and similar). Otherwise, I think it would probably be too much risk to use this for anything important/revenue-generating.
> In any case, I hope that GitHub is at least limiting any training data to a sensible whitelist of licenses (MIT, BSD, Apache, and similar). Otherwise, I think it would probably be too much risk to use this for anything important/revenue-generating.
I'm going to assume that there is no sensible whitelist of licenses until someone at GitHub is willing to go on the record that this is the case.
> I hope that GitHub is at least limiting any training data to a sensible whitelist of licenses (MIT, BSD, Apache, and similar)
Yes, and even those licences require preservation of the original copyright attribution and licence text. MIT gives some wiggle room with the phrase "substantial portions", so the whitelist might just be MIT and WTFPL.
(Not a lawyer, and only at all familiar with US law, definitely uncharted territory)
No, I don't believe it is, at least to the extent that the model isn't just copy and pasting code directly.
Creating the model implicates copyright law, because that's creating a derivative work. It's probably fair use (transformative, not competing in the marketplace, etc.), but whether or not it is fair use is GitHub's problem and liability, and only if they didn't have a valid license (which they should have for any open source inputs, since they're not distributing the model).
I think the output of the model is just straight up not copyrighted though. A license is a grant of rights, you don't need to be granted rights to use code that is not copyrighted. Remember you don't sue for a license violation (that's not illegal), you sue for copyright infringement. You can't violate a copyright that doesn't exist in the first place.
Sometimes a "license" is interpreted as a contract rather than a license, in which you agreed to terms and conditions to use the code. But that didn't happen here: you didn't agree to terms and conditions, you weren't even told them, and there was no meeting of minds, so that can't be held against you. The "worst case" here (which I doubt applies, since I doubt this AI implicates any contract-like licenses) is that GitHub violated a contract they agreed to. I don't think that implicates you, though: you aren't a party to the contract, there was no meeting of minds, and you have a code snippet free of copyright received from GitHub...
So if I make AI that takes copyrighted material in one side, jumbles it about, and spits out the same copyrighted material on the other side, I have successfully laundered someone else's work as my own?
Wouldn't GitHub potentially be responsible for the infringement by distributing the copyrighted material knowing that it would be published?
I exempted copied segments at the start of my previous post for a reason: I don't really know. I doubt it works, because judges tend to frown on absurd outcomes.
Where does copying end though?
If an AI "retypes" it, not only with some variable renaming but some transformations that are not just describable by a few code transformations (neural nets are really not transparent and can do weird stuff), it wouldn't seem like a copy when just comparing parts of it, but it effectively would be one, as it was an automated translation.
Probably, copying ends when the original creative elements are unrecognizable. Renaming variables actually goes a long way to that, also having different or standardized (and therefore not creative) whitespace conventions, not copying high level structure of files, etc.
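As a rough illustration of why renaming and reformatting alone hide very little from a tool, here is a small Python sketch using the standard `tokenize` and `difflib` modules. The `normalise` and `similarity` helpers and the sample snippets are hypothetical; the sketch only handles Python source, and it measures token similarity, which is an assumption about how one might compare code and says nothing about what a court would consider recognizable.

```python
import difflib
import io
import keyword
import tokenize

# Token types that only carry layout or comments, not program structure.
LAYOUT = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalise(source):
    """Return the token strings of `source` with layout and comments dropped
    and every identifier replaced by a placeholder (id0, id1, ...)."""
    names = {}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # The same identifier always maps to the same placeholder.
            out.append(names.setdefault(tok.string, f"id{len(names)}"))
        else:
            out.append(tok.string)
    return out


def similarity(a, b):
    """Similarity ratio (0.0 to 1.0) of the two normalised token sequences."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


original = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
renamed = "def plus(x,   y):\n        return x + y\n"

print(similarity(original, renamed))  # 1.0
```

The two snippets differ as raw text, yet their normalised token streams are identical. Transformations that restructure the code, which is what the parent comment worries a neural net might do, would not be caught by something this simple.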
The functional parts of code are not copyrightable, only the non functional creative elements.
> The functional parts of code are not copyrightable, only the non functional creative elements.
1. Depends heavily on the jurisdiction (e.g., software patents are a thing in America but not really in most European countries)
2. A change to a copyrightable work, creative or not, would still mean that you created a derived work where you'd hold some additional rights, depending on the original license, but not that it would now be only in your creative possession.
E.g., check §5 of https://www.gnu.org/licenses/gpl-3.0.en.html
3. What do you have in mind when you say "functional parts"? Some basic code structure like an `if () {} else {}` -> sure, but anything algorithmic can be seen as copyrightable, and whatever (creative or not) transformation you apply, at its core the result is a derived work. That's just a fact and the definition of derived work.
Now, would that matter in court? That depends not only on 1., but also very much on the specific case. Most trivial cases would probably be ruled out, unless an org invests enough lawyer power or sues in a court favourable to its case (OLG Hamburg, anyone?). Most small stuff would be thrown out as not substantial enough, or die even before reaching any court.
But that actually scares me a bit when thinking about it in this context. It seems to me that, assuming you're right, all this would significantly erode the power of copyleft licenses like the (A)GPL.
Especially if a non-transparent (e.g., AI) code laundry, let's call it, were deemed a lawful way to strip out copyright. As it is non-transparent, it wouldn't be immediately clear whether a change is creative or not, to use the criterion for copyright you used. This would break basically the whole FOSS community, and with it all major projects (Linux, coreutils, Ansible, git, WordPress, just to name a few), basically 80% of core infrastructure.
> Is it even copyrighted? Generally, my understanding is that to be copyrightable a work has to be the output of a human creative process, and this doesn't seem to qualify (I am not a lawyer).
See also, monkeys can't hold copyright: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...