Leaking sensitive data and infringement are separate (though related) concerns. They may not want to do what you say, even if it's completely infringement-safe.
Are they separate? Or is it the same concern but from opposite view points?
Both sides are worried about IP leaking, but one side is worried about its own IP leaking, and the other is worried about liability if it inadvertently implements any leaked IP. Either way, the concern is leaked IP.
Yes, if I ask something like "Can you describe Microsoft's internal security processes and the names of upcoming products?", the output would be original and not covered by copyright, but it would be sensitive internal information covered by NDAs. But any code publicly posted and available to be scraped won't contain such sensitive info.
I don’t think GitHub Copilot can respond to prompts like that. I thought it was ostensibly sophisticated source-code completion. If so, source code is absolutely covered under copyright.
But even if that were true, it’s a moot point, because we are talking about the copyrighted content the models were trained on. Hence the OP's point: if Microsoft really wanted to reassure people, they’d promote models trained on Microsoft’s own code rather than handwave away these concerns with gestures of assuming theoretical liability.
Ah, ok. As for testing in court, that will be useful, but a rather official source says "created by a human author" [0] in defining the notion of copyright, which I assume paraphrases the actual law, and which I assume a judge would interpret similarly. However, I will concede it's conceivable that if a human authors a work that then itself authors another work, the second work could be attributed to the human for purposes of copyright eligibility.