These AI systems are all riding on fair use anyway (at least for now), so it doesn't matter what license you choose because they're never accepting it in the first place.
Where did Microsoft say that GPL code was excluded from the training dataset? Omitting attribution already violates MIT/BSD-style licenses, so I don't think MS is deterred by the prospect of violating licenses.