This feels like (yet another) extension of copyright. Whilst I'm not sure I completely disagree with you, I want people to acknowledge that copyright is not the natural state of the universe. Prior to (I think) 1790 there was no copyright, and human beings managed minor things like, you know, the Renaissance and stuff like that.
Copyright was invented and enforced and the results have been a mixed bag. It seems to suffer from a ratchet effect where the law only ever increases the scope to which copyright applies and never decreases it.
However intuitive your sense of your moral rights is, what matters is the net benefit to society, and we should be very careful what we wish for.
If creating LLMs based on copyrighted data is found to be legal, all that will do is allow giant companies to sell copyrighted work without crediting the original authors, while leaving everyone else in the dirt.
> all that will do is allow giant companies to sell copyrighted work without crediting the original authors, while leaving everyone else in the dirt.
I'm not sure I follow. But even accepting your premise, I'm not sure how it would favour giant companies over anyone else. The models are already in the wild and anyone can use them. In some ways, large companies are less likely to do anything that might open them up to legal risks or PR downsides.
Maybe this is more of a Napster moment than it is a big tech powergrab?
GPT is owned by Microsoft, LLaMA by Facebook, and Bard by Google. If you trained a model on Google's public properties and started distributing it (or its output) for money, you'd be sued into oblivion real quick.
My point was that the models exist, people are fine tuning them and/or releasing open clones. There are models of comparable power to the state of the art without any controlling interest from a big tech company.
The leaked Google memo covered this in detail, and it's what makes me want to question the "AI is owned by big tech" angle.
Price/profit != value. Sure, Hollywood movies bring in a ton of money, but I get way more value from daily indie youtubers than a blockbuster released once a month.
> Prior to (I think) 1790 there was no copyright and human beings managed minor things like, you know, the renaissance and stuff like that.
Curious whether the introduction of copyright is what led to an explosion of products and innovation, since it suddenly gave people an incentive to monetize their ideas. I doubt the Renaissance happened due to a lack of copyright. I think it's more due to social, political and health circumstances than to the lack of protection of one's work. We, in Europe, suffered from disease, famine, and war, to the point where we reached the conclusion that enough is enough: we need rules for the game.
There doesn’t seem to be evidence that copyright increases innovation. Indeed, in some areas with no IP protection we actually see more innovation (fashion, for example).
> it's about the net benefit to society and we should be very careful what we wish for.
Seems like we have a classic trolley problem.
On one track, compensating copyright holders is required for LLMs, and it's going to be very expensive to acquire all of this copyrighted info, meaning only the biggest companies can afford to do it.
On the other track, compensating copyright holders is not required, LLMs (led by big tech) capture most of the economic value from every incremental piece of content created by humans in perpetuity, consolidating wealth in the hands of a few shareholders and insiders.
> On one track, compensating copyright holders is required for LLMs, and it's going to be very expensive to acquire all of this copyrighted info, meaning only the biggest companies can afford to do it.
There is also a third track: most of the abundant code is open source, or unlicensed content (which is still protected in the US, afaik). If corporations can't monetize it, we win, because models either need to be open source or payment for training becomes required.
I'm not sure it's certain yet whether AI is going to lead to more consolidation or actually have the opposite effect.
Whilst history tends to make me suspect the former, the recently leaked Google memo gave me pause for thought. AI is already out there and can already be trained on consumer hardware. It's ever so slightly possible that big tech won't be able to hoard the benefits this time.
Open source models are possible if we pick the second option. Lots of innovation in the AI scene is happening thanks to open source models being available to the general public.