
I think this would fall under any reasonable definition of fair use. If I read GPL (or proprietary) code as a human, I still own code that I later write. If copyright were enforced on the outputs of machine-learning models based on all the content they were trained on, it would be incredibly stifling to innovation. Requiring legal access to the training data while granting full ownership of the output seems like a sensible middle ground.


Certainly not. If I memorize a line of copyrighted code and then write it down in a different project, I have copied it. If an ML model does the same thing as my brain — memorizing a line of code and writing it down elsewhere — it has also copied it. In neither case is that "fair use".


1) This is not a human, it's software.

2) If I write a program that copies parts of other GPL-licensed software into my proprietary code, does a sufficiently complicated copying algorithm absolve me of the GPL?


Clearly this requires some level of judgement, but that isn't new: determining what is and isn't plagiarism requires a similar judgement call.


What if I put a licence on my GitHub repositories that explicitly forbids the use of my code for machine-learning models?


My interpretation of the GitHub TOS section D4 is that it would give GitHub the right to parse your code and/or make incidental copies regardless of what your license states.

https://docs.github.com/en/github/site-policy/github-terms-o...

This is the same reason it doesn't matter if you put up a license that forbids GitHub from including your repository in backups or the search index.


Then the person training the models wouldn't be legally accessing your code.


And so it begins: we start applying human rights to AIs.

Not a critique of your point, which I was just about to bring up myself.



