Huh, so Copilot was trained on a mix of correct and incorrect code and it
learned to reproduce both correct and incorrect code. It's a language model,
after all. It predicts the most likely next sequence of tokens. It doesn't know
anything about correct and incorrect code. If the most likely next sequence of
tokens is incorrect, it doesn't care; it's still the most likely next sequence.
And I guess it makes sense that Copilot was trained in this way, even if it
wasn't just a language model. How do you even begin to separate correct from
incorrect code on the entire freaking GitHub?
But I think TFA serves best to show the worst way to use Copilot. I haven't
tried it but I suspect that it would do a lot better if it was asked to generate
small snippets of code, rather than entire functions. Not "returns the current
phase of the moon" but "concatenates two strings". That would also make it much
easier to eyeball the generated code quickly without too many mistakes making it
through.
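To make concrete what I mean by a snippet-level prompt: something like the comment below, where the whole completion fits on one screen and can be checked by hand in a few seconds. This is my own sketch of what a plausible completion could look like, not actual Copilot output:

    #include <stdlib.h>
    #include <string.h>

    /* concatenates two strings into a newly allocated buffer */
    char *concat(const char *a, const char *b) {
        size_t la = strlen(a), lb = strlen(b);
        char *out = malloc(la + lb + 1);
        if (out == NULL) return NULL;
        memcpy(out, a, la);
        memcpy(out + la, b, lb + 1); /* the +1 also copies the terminating NUL */
        return out;
    }

A mistake in something this size jumps out at you, in a way that a subtly buggy moon-phase function never would.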
Of course, you could do the same thing with a bunch of macros, rather than a
large language model that probably cost a few million to train...
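For the string-literal case it really is a one-liner; CONCAT_LIT is just a name I made up for illustration:

    #include <stdio.h>

    /* adjacent string literals are concatenated by the compiler */
    #define CONCAT_LIT(a, b) a b

    int main(void) {
        puts(CONCAT_LIT("hello, ", "world"));  /* prints "hello, world" */
        return 0;
    }

No multi-million-dollar training run required, though admittedly it only covers the cases you thought to write a macro for.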