You bring up a really good point. I'm super curious what the legality and ethics around training machines on licensed or even proprietary code would be. IIRC there are implications around code you can build if you've seen proprietary code (I remember an article from HN about how bash had to be written by someone who hadn't seen the unix shell code or something like that).

How would we classify that legally when it comes to training and generating code? Would you argue the machine is just picking up best practices and patterns, or would you say it has gained specifically-licensed or proprietary knowledge?



I would argue that a trained model falls under the legal category of "compilation of facts".

More generally, keep in mind that the legal world, despite an apparent focus on definitions, is very bad at dealing with novelty, and most of it ends up justifying existing practices a posteriori.


You might argue that, but you would likely be wrong.

Even a search engine is not merely a "compilation of facts". A trained model is the result of analysis and reasoning, albeit automated.


A search engine provides snippets of other people's data; you can point explicitly to where it got that text from. A trained model generates its own new data under the influence of millions of different sources. It's an entirely different situation.


> (I remember an article from HN about how bash had to be written by someone who hadn't seen the unix shell code or something like that).

I believe you're referring to Clean Room Design[1].

[1] https://en.wikipedia.org/wiki/Clean_room_design

