If I spent a lot of time reading open source repos on GitHub to teach myself to code, and then went out and got a high-paying job based on that knowledge, is that ethical? This seems roughly analogous to what the machine is doing.
Regardless of the legality, one of these situations is clearly ethical compared to the other. In the case where you get a job based on your knowledge of GPL software, you still must respect the license if you use that code commercially (i.e. at your new job). And yes, if you reproduce GPL code you "learned" from, you are violating the license.
A company ingesting an entire GPL codebase, without warning or any way to opt out, in order to create a closed-source feature that it alone will profit from is clearly not the same as an individual reading the code and getting a job based on those ideas.
You're intentionally ignoring the difference in scale to make them seem the same.
> no warning that somebody/something is ingesting the repo
An individual reading code on their own time is not the same as ingesting terabytes to train a machine. No matter how much you believe that AI works similarly to human learning (it doesn't), they are not comparable.
> private profit
Again, an individual reading code in order to work for a salary is orders of magnitude different from a company ingesting terabytes of code so it can create a new feature. Claiming these things are the same only makes sense if you ignore the massive differences in scale and the differences between how humans and machines learn.
> Is it unethical to use software for profit?
When the licenses explicitly say that using the software for profit requires attribution, the answer is clearly yes. My code on GitHub is licensed such that if you use it, you must say where it came from. The only way this isn't at the very least unethical (because it goes against my wishes as owner of the code) is if you argue that GitHub isn't "using" the code, which clearly isn't true, because if everyone were able to opt out there wouldn't be a product for GitHub to be working on at all.