A reader of my article sent me this note, pointing me to some relevant hot-off-the-presses work (reproduced with permission):
-----
A group recently evaluated the performance of Codex (and another model) on 18 programming languages:
https://nuprl.github.io/MultiPL-E/
The high-order bits are:
1. The latest and greatest Codex is now twice as good on its own benchmark suite as the original version published a year ago.
2. It's just as good on Python as it is on JS, Scala, C++, Swift, TypeScript... and other languages are not too far behind. It's not bad at bash of all things.
Paper here: https://arxiv.org/abs/2208.08227