viccuad · on June 29, 2021

> You should read the FAQ at the bottom of the page; I think it answers all of your questions: https://copilot.github.com/#faqs

Read it all, and the questions still stand. Could you, or anyone on your team, point me to where the questions are answered?

In particular, the FAQ doesn't assure that the "training set from publicly available data" is free of license or patent violations, nor does it say whether that code is considered tainted for a particular use.

res0nat0r · on June 29, 2021

From the FAQ:

> GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set.

I'm guessing this covers it. I'm not sure whether code that someone posted online, while explicitly saying you're not allowed to look at it, being ingested into this system along with billions of other inputs could somehow make you liable in court for some kind of infringement.

IncRnd · on June 29, 2021

That doesn't cover it, since that is a technical answer for a non-technical question. The same questions remain.

viccuad · on June 29, 2021

That doesn't cover patent violations, license violations, or compatibility between licenses, which would be the most numerous and non-trivial cases.

res0nat0r · on June 29, 2021

How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?

Does everyone in this thread contact their lawyers after cutting and pasting a mergesort example from Stack Overflow that they've modified to fit their needs? Seems folks are reaching a bit.

IncRnd · on June 29, 2021

For that very reason, many companies have policies that forbid copying code from online (especially from StackOverflow).

dlubarov · on June 29, 2021

That mitigates copyright concerns, but patent infringement can occur even if the idea was independently rediscovered.

IncRnd · on June 29, 2021

I was answering a specific question, "How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?" The answer is that many companies have forbidden that specific action in order to remove the risk from that action.

You are expanding the discussion, which is great, but that doesn't apply in answer to that specific question.

There are answers in response to your question, however. For example, many companies use software for scanning and composition analysis that determines the provenance and licensing requirements of software. Then, remediation steps are taken.

dlubarov · on June 30, 2021

Not sure what you're getting at. Are you suggesting that independent discovery is a defense against patents? Or are you clear that it isn't a defense, but just arguing that something from the internet is more likely to be patented than something independently invented in-house? Maybe that's true, but it doesn't really answer the question of

> How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?

The only real answer is a patent search.

IncRnd · on June 30, 2021

There are different ways to handle risk, such as avoidance, reduction, transfer, and acceptance. I was answering a specific question as to how people manage risk in a given situation. In answer, I related how companies reduce the risk. I was not talking about general cases of how to defend against the risk of patents, but about the specific case of reducing the risk of adding externally found code to a product.

My answer described literally what many companies do today. It was not a theoretical pie in the sky answer or a discussion about patent IP.

To restate, the real-world answer I gave for, "How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?" is often "Do not take code from the Internet."

woah · on June 29, 2021

I think a patent violation with Copilot is exactly the same scenario as if you violated a patent yourself without knowing it.

notsureaboutpg · on June 29, 2021

Sounds like using Copilot can introduce GPL'd code into your project and make your project bound by the GPL as a result...

0.1% is a lot when you use 100 suggestions a day.
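To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the FAQ's "about 0.1%" applies to each suggestion independently, which the FAQ does not actually state, and the 250 working days per year is my own assumption for illustration.

```python
# Rough estimate of how often verbatim training-set snippets would show up,
# ASSUMING each suggestion independently has a 0.1% chance of containing one.
# The independence assumption and WORK_DAYS are illustrative, not from the FAQ.

RATE = 0.001      # "about 0.1% of the time", per the FAQ quote
PER_DAY = 100     # suggestions used per day, from the comment above
WORK_DAYS = 250   # assumed working days per year (hypothetical)


def expected_verbatim(rate: float, suggestions: int) -> float:
    """Expected number of suggestions containing a verbatim snippet."""
    return rate * suggestions


def prob_at_least_one(rate: float, suggestions: int) -> float:
    """Probability that at least one suggestion contains a verbatim snippet."""
    return 1.0 - (1.0 - rate) ** suggestions


for label, n in [("day", PER_DAY), ("year", PER_DAY * WORK_DAYS)]:
    print(
        f"per {label}: expected {expected_verbatim(RATE, n):.1f}, "
        f"P(at least one) = {prob_at_least_one(RATE, n):.4f}"
    )
```

Under those assumptions that is roughly a 1-in-10 chance on any given day, and a near certainty over a year. It says nothing about whether any such snippet is actually GPL-licensed.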

samtheprogram · on June 29, 2021

The most important question, whether you own the code, is sort of maybe vaguely answered under “How will GitHub Copilot get better over time?”

> You can use the code anywhere, but you do so at your own risk.

Something more explicit than this would be nice. Is there a specific license?

EDIT: also, there are multiple sections to the FAQ, notice the drop-down... under “Do I need to credit GitHub Copilot for helping me write code?”, the answer is also no.

Until a specific license (or explicit lack thereof) is provided, I can’t use this except to mess around.

dvaun · on June 29, 2021

None of the questions and answers in this section hold information about how the generated code affects licensing. None of the links in this section contain information about licensing, either.

netcraft · on June 29, 2021

I don't see the answer to a single one of their questions on that page - did you link to where you intended?

Edit: you have to click the things on the left, I didn't realize they were tabs.

kitsune_ · on June 30, 2021

Sorry Nat, but I don't think it really answers anything. I would argue that training on GPL code makes Copilot a derivative work of said code. I mean, if you look at how a language model works, then it's pretty straightforward. The term "code synthesizer" alone insinuates as much. I think this will probably ultimately be tested in court.

rozab · on June 29, 2021

This page has a looping back button hijack for me