I'm guessing that GPUs aren't 100% standard-cell designs. GPUs are datapath-intensive, with many identical functional units (adders, subtractors, multipliers, etc.). These circuit blocks are 'relatively standard' in that maybe a small number of unique components accounts for 70-80% or more of the instances in the GPU's datapath. Even just a 'semi-custom' relayout of those highly repetitive blocks could result in huge area and power savings (which could in turn be spent on upping the clock frequency).
Not saying this is what they do (because I don't know any engineers who work there), but I wouldn't be surprised.
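
To put rough numbers on that argument, here's a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name `overall_area_saving` is made up, the 30% per-block saving is a placeholder rather than a measured figure, and it treats the 70-80% share of instances as if it were the same share of datapath area, which need not hold on a real chip.

```python
def overall_area_saving(coverage, block_saving):
    """Fraction of total datapath area saved.

    coverage     -- fraction of datapath area taken up by the repeated blocks
                    (assumed equal to their share of instances)
    block_saving -- fractional area reduction from a semi-custom relayout of
                    each of those blocks (hypothetical value)
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= block_saving <= 1.0):
        raise ValueError("both arguments must be fractions in [0, 1]")
    # Only the covered part of the datapath shrinks; the rest is unchanged.
    return coverage * block_saving


if __name__ == "__main__":
    # Hypothetical inputs: 70% and 80% coverage, 30% saving per block.
    for coverage in (0.70, 0.80):
        saving = overall_area_saving(coverage, block_saving=0.30)
        print(f"coverage {coverage:.0%}: overall datapath saving {saving:.0%}")
```

The point is just that the overall saving scales with how much of the datapath the repeated blocks cover, so hand-tuning a handful of heavily reused blocks pays off far more than tuning one-off logic.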
I didn't mean that they don't customize some of the chip, lol. I guess that's what I made it sound like, sorry.