Huh, so Copilot was trained on a mix of correct and incorrect code and it
learned to reproduce both correct and incorrect code. It's a language model,
after all. It predicts the most likely next sequence of tokens. It doesn't know
anything about correct and incorrect code. If the most likely next sequence of
tokens is incorrect, it doesn't care; it's still the most likely next sequence.
And I guess it makes sense that Copilot was trained in this way, even if it
wasn't just a language model. How do you even begin to separate correct from
incorrect code on the entire freaking GitHub?
But I think TFA serves best to show the worst way to use Copilot. I haven't
tried it but I suspect that it would do a lot better if it was asked to generate
small snippets of code, rather than entire functions. Not "returns the current
phase of the moon" but "concatenates two strings". That would also make it much
easier to eyeball the generated code quickly without too many mistakes making it
through.
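To make concrete what I mean by a snippet-level prompt: something like the comment below, where the whole completion fits on one screen and can be checked by hand in a few seconds. This is my own sketch of what a plausible completion could look like, not actual Copilot output:

    #include <stdlib.h>
    #include <string.h>

    /* concatenates two strings into a newly allocated buffer */
    char *concat(const char *a, const char *b) {
        size_t la = strlen(a), lb = strlen(b);
        char *out = malloc(la + lb + 1);
        if (out == NULL) return NULL;
        memcpy(out, a, la);
        memcpy(out + la, b, lb + 1); /* the +1 also copies the terminating NUL */
        return out;
    }

A mistake in something this size jumps out at you, in a way that a subtly buggy moon-phase function never would.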
Of course, you could do the same thing with a bunch of macros, rather than a
large language model that probably cost a few million to train...
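For the string-literal case it really is a one-liner; CONCAT_LIT is just a name I made up for illustration:

    #include <stdio.h>

    /* adjacent string literals are concatenated by the compiler */
    #define CONCAT_LIT(a, b) a b

    int main(void) {
        puts(CONCAT_LIT("hello, ", "world"));  /* prints "hello, world" */
        return 0;
    }

No multi-million-dollar training run required, though admittedly it only covers the cases you thought to write a macro for.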