I think this would fall under any reasonable definition of fair use. If I read GPL (or proprietary) code as a human I still own code that I later write. If copyright was enforced on the outputs of machine learning models based on all content they were trained on it would be incredibly stifling to innovation. Requiring obtaining legal access to data for training but full ownership of output seems like a sensible middle ground.
Certainly not. If I memorize a line of copyrighted code and then write it down in a different project, I have copied it. If an ML model does the same thing as my brain - memorizing a line of code and writing it down elsewhere - it has also copied it. In neither case is that "fair use".
2) if I write a program that copies parts of other GPL licensed SW into my proprietary code, does that absolve me of GPL if the copying algorithm is complicated enough?
My interpretation of the GitHub TOS section D4 would give give GitHub the right to parse your code and/or make incedental copies regardless of what your license states.