You bring up a really good point. I'm super curious what the legality and ethics around training machines on licensed or even proprietary code would be. IIRC there are implications around code you can build if you've seen proprietary code (I remember an article from HN about how bash had to be written by someone who hadn't seen the unix shell code or something like that).

How would we classify that legally when it comes to training and generating code? Would you argue the machine is just picking up best practices and patterns, or would you say it has gained specifically-licensed or proprietary knowledge?



I would argue that a trained model falls under the legal category of "compilation of facts".

More generally, keep in mind that the legal world, despite an apparent focus on definitions, is very bad at dealing with novelty, and most of it ends up justifying existing practices a posteriori.


You might argue that, but you would likely be wrong.

Even a search engine is not merely a "compilation of facts". A trained model is the result of analysis and reasoning, albeit automated.


A search engine provides snippets of other people's data; you can point explicitly to where it got that text from. A trained model generates its own new data under the influence of millions of different sources. It's an entirely different situation.


> (I remember an article from HN about how bash had to be written by someone who hadn't seen the unix shell code or something like that).

I believe you're referring to Clean Room Design[1].

[1] https://en.wikipedia.org/wiki/Clean_room_design

