If the submitter picks (a), they assert that they wrote the code themselves and have the right to submit it under the project's license. If (b), they assert the code was taken from elsewhere under clear license terms compatible with the project's license. If (c), they assert the contribution was written by someone else who certified (a) or (b), and that it is submitted without changes.
Since LLM-generated output is based on public code but lacks attribution and the original's license, it is not possible to pick (b). And (a) and (c) cannot be picked, given the submitter's own disclaimer in the PR body.
If there's an "original" the LLM is copying, then there's a problem.
If there isn't, then (b) works fine: the code is taken from the LLM with no preexisting license. And it would be very strange if a mix of (a) and (b) were a problem; almost any (b) code will need some (a) code to adapt it.
> the code is taken from the LLM with no preexisting license
That's not good enough to comply with (b). The code must be specifically covered by an open-source license; it's not enough for it to simply lack a license.
There's a difference between "no license, all rights reserved" and "no license, public domain". Up until recently, you could assume that not having a license meant the former. But treating the latter as the same would just be silly.
As far as I'm concerned, public domain counts as "an appropriate open source license".
I'm of course assuming the legal status quo holds, where code properly generated by an LLM is also explicitly public domain. No shadiness involved.
(There's always a risk of an LLM copying something verbatim by accident, but if the designers are doing their job that chance gets low enough to be acceptable. Human code has that risk too after all. (And for situations that aren't an accident, with the human intentionally using snippets to draw out training text, then if they submit that code in a patch it's just a human violating copyright with extra steps.))
> Both the federal and circuit courts in the District of Columbia have upheld the Copyright Office's refusal to register copyrights for works generated solely by machines, establishing that machine ownership would conflict with heritable property rights as established by the Copyright Act of 1976.[16] As of March 2026, the Supreme Court of the United States has denied hearing challenges to the Copyright Office's decision.[17]
To many, it qualifies under either (a) or (b), and therefore (c) as well. Under (a), you can think of the LLM as augmenting your own intelligence. Under (b), the license terms of LLM output are essentially that you can do whatever you want with it. The alternative is avoiding AI altogether over copyright or plagiarism concerns.
Whether AI output can fall under copyright at all is still up for debate, with some early rulings indicating that prompting the AI does not automatically make you the author.
Even if it does, it hasn't been settled what effect training on copyrighted material has on the status of the AI's output. You can make a not-completely-unreasonable argument that AI inference output is a derivative work of AI training input.
The fact is, the matter isn't settled, so any open-source project should assume the worst possible outcome. In practice, that means a massive AI-generated PR like this should be treated like a nuke that could go off at any moment.
2. A work requires human creativity in order to be copyrighted.
Point 2 would apply when an AI one-shots a generic prompt. But for these large PRs, where multiple prompts are used and a human has decided what the design should be and how the API should look, you get the human creativity required for copyright.
As for being a derivative work, I think it would be hard to argue that an LLM is copying or modifying an existing original work. Even if it produced an exact duplicate of a piece of code, it would be hard to prove that it was a copy and not an independent recreation from scratch.
>the worst possible outcome
The worst possible outcome is that they get sued and Anthropic defends them against the copyright infringement claim, thanks to Anthropic's indemnity clause for Claude Code users.
That indemnity clause is only for Team, Enterprise and API users. Do you know what was used here?
Also, the commercial version is limited to “…Customer and its personnel, successors, and assigns…”. I am very much not a lawyer and couldn’t find definitions of these terms in the agreement, but I am not sure how transferable this indemnity would be to an open source project.
Why write open-source software at all, when the government could outlaw open source entirely? What if an asteroid destroys Earth and there are no humans left to enjoy your work? At some point, you have to accept that a risk isn't worth worrying about. Your "worst possible outcome" is just an arbitrary outcome that happens to clear your own subjective risk threshold, and it's certainly not one I agree with. Furthermore, calling it a "nuke" is a bad analogy, because that implies it can't be undone once set off. In reality, we're dealing with legal definitions, which can be redefined as easily as they were defined.
Well, it's a good thing you're not on the hook for defending against it, then.
Like I said in another comment, you don't have a license because licenses are cool and look neat. You have one specifically to guard against people like patent trolls, who are trying to wreck your shit and take your lunch money. It's not an abstract risk.
> Well, it's a good thing you're not on the hook for defending against it, then
If you are on the hook for defending against it, and your risk assessment is based on emotional, irrational fear and not an objective understanding of the risks, then you're doing people a disservice and should step down.
This is not how law works. Stop pretending that you’re a lawyer. You do not “always assume the worst”. Stop giving legal advice. You’re very clearly a developer in over his head. Law is not an engineering problem. Legislation is not a technical specification. Christ.
No, they're absolutely correct, and they're not saying either of those things. They're pointing out an enormous hidden risk. Yanno, like an engineer is supposed to do.
You don't have a license because it's what all the cool kids are doing, you have one in case shit goes sideways and someone decides to try and ruin your day. You do, in fact, have to assume the worst.
The "nuke" here is that some litigious company (let's call them Patent Troll Rebranded, or PTR) discovers that the LLM reproduced large amounts of their copyrighted code. Or claims to have discovered it. They have plenty of money and lawyers to fight it out in court, and you are a relatively shoestring language foundation.
Either you have to unwind years of development to remove the offending code or you're spending six figures or more to defend yourself in court, all because you didn't bother to anticipate things that are anticipatable.
Because bash is everywhere. Stability is a separate concern. And we know this because LLMs routinely generate deprecated code for libraries that change a lot.
I've been working with the shell long enough that I know just by looking at it.
Anyway, it was rhetorical. I was making a point about portability. Scripts we write today run even on ancient versions, and that has been an effort sustained by lots of different interpreters (not only bash).
I'm trying to give sane advice here. Re-implementing bash is a herculean task, and some "small incompatibilities" sometimes reveal themselves as deep architectural dead-ends.
The issue here is not language; it's a basic understanding of how LLMs are trained, how agents act on that training, and what role the shell plays from a systems perspective.
I can't have a meaningful conversation with someone who doesn't fully grasp those, no matter the language.
My work machine is Win11, and the new Notepad is hilariously buggy. I've repeatedly encountered bugs where the screen fails to paint, it takes multiple seconds to load, it flatly refuses to open files above a certain size, etc.
Notepad was never fancy, but it was a reliable tool to strip formatting or take a quick note, and now I cannot even count on that.
They don't care. Their sales reps know full well that if you are using Microsoft products, it is because you are locked in so deeply that escape is nearly impossible.
I like CUE a lot. We use it pretty heavily for schema enforcement of CRDs. That being said, it is pretty complex, and learning to use it was anything but straightforward.
Let's put it this way: no engineer is choosing to use Bitbucket. You use it because some SVP made the mistake of choosing Atlassian software a decade ago and refuses to change.
Yes. Poetry & pyenv were already a big improvement, but now uv wraps everything up and additionally makes "temporary environments" possible (e.g. `uv run --with notebook jupyter-notebook` to run a notebook with my project dependencies).
This is it. Later versions of python .11/.12/.13 have significant improvements and differences. Being able to seamlessly test/switch between them is a big QOL improvement.
I don't love that UV is basically tied to a for profit company, Astral. I think such core tooling should be tied to the PSF, but that's a minor point. It's partially the issue I have with Conda too.
> Later versions of python .11/.12/.13 have significant improvements and differences. Being able to seamlessly test/switch between them is a big QOL improvement.
I just... build from source and make virtual environments based off them as necessary. Although I don't really understand why you'd want to keep older patch versions around. (The Windows installers don't even accommodate that, IIRC.) And I can't say I've noticed any of those "significant improvements and differences" between patch versions ever mattering to my own projects.
> I don't love that UV is basically tied to a for profit company, Astral. I think such core tooling should be tied to the PSF, but that's a minor point. It's partially the issue I have with Conda too.
In my book, the less under the PSF's control, the better. The meager funding they do receive now is mostly directed towards making PyCon happen (the main one; others like PyCon Africa get a pittance), to certain grants, and to a short list of paid staff who are, generally speaking, board members and other decision makers, not the people actually developing Python. Even without considering "politics" (cf. the latest news about turning down a grant for ideological reasons), I consider this gross mismanagement.
I think the big difference is that these aren't AI generated bug reports. They are bugs found with the assistance of AI tools that were then properly vetted and reported in a responsible way by a real person.
From what I understand, some of the bugs were in code the AI made up on the spot, and other bug reports had example code that didn't even interact with curl. These things should be relatively easy for a human to verify: just do a text search in the curl source to see if the AI output matches anything.
Things that are hard to compute but easy to verify are exactly where AI should excel. So why do so many AI users insist on skipping the verification step?
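A minimal sketch of the "text search in the curl source" check, in Python. The function name `find_snippet`, the whitespace normalization, and the default `.c`/`.h` extension filter are assumptions made for illustration, not part of any curl tooling; it just reports which files under a source checkout contain a snippet quoted in a report.

```python
import re
from pathlib import Path


def normalize(text: str) -> str:
    # Collapse runs of whitespace so reformatting doesn't hide a real match.
    return re.sub(r"\s+", " ", text).strip()


def find_snippet(root: str, snippet: str, exts=(".c", ".h")) -> list[str]:
    """Return paths of files under `root` whose normalized text contains `snippet`.

    Hypothetical helper: `exts` limits the search to C sources and headers by default.
    """
    needle = normalize(snippet)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            text = path.read_text(encoding="utf-8", errors="replace")
            if needle in normalize(text):
                hits.append(str(path))
    return hits
```

For example, if `find_snippet("path/to/curl", "<line quoted in the report>")` returns an empty list, that is a strong hint the report cites code that isn't in the tree. Normalizing whitespace avoids false negatives from reformatting, at the cost of ignoring layout differences.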
The issue I keep seeing with curl and other projects is that people are using AI tools to generate bug reports and submitting them without understanding the report (that's the vetting). Because it's so easy to do this, and it takes time to separate bug report slop from analyzed and verified reports, it's pissing people off. There's a significant asymmetry involved.
Until AI used to generate security reports on other people's projects can do so with vanishingly little wasted time, it's pretty assholeish to do it without vetting.
That's a bit uncalled for. This is a game made by someone shaped by their perspective on the world. It can be appreciated as such without applying your own additional intent.