Scrolling back to where this thread was interesting... "layout by hand"
Looking at chip images of NV30, it looks like it's completely auto-routed. It's almost as if they've done everything with standard logic cells.
And when I read this I saw it as a vague confirmation. It would also explain why ATI can fit a full register set for all quads in flight, while nV has significantly fewer registers. A hand-made cache (hand-made cells, duplicated with a "memory generator") will be significantly more space-efficient than an auto-routed register array defined in VHDL.
This is definitely different from most designs, even if you don't count the extremely strange auto-routed memory.
Looking at just about any modern high-end CPU, you can see that there are a lot of small units with a regular pattern in them. Those small units can be memories/register banks, ALUs, or other things with a regular structure. Those are not auto-routed. They are carefully hand-crafted, possibly with some automation to fine-tune some parameters.
These hand-made blocks are then placed by hand on the chip. The people doing the layout should know which blocks are closely connected, and place them accordingly.
Finally the rest of the logic is "filled in" with auto-routing. It can be "glue logic" between the already finished blocks, or other more irregular logic, which an auto-router does better than a human.
I'm not sure, but maybe new layout software can push the fixed blocks around to clear out space where it wants some auto-routed stuff.
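To make the three steps concrete, here is a toy sketch in Python. It is not how any real layout tool works; the grid, the Macro class, the floorplan function and the block sizes are all made up for illustration. Hand-made blocks get fixed, hand-chosen positions, and every cell left over is handed to the "auto-router" (marked with an s for standard cells).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Macro:
    """A hand-made block with a hand-chosen position (hypothetical model)."""
    name: str
    x: int  # left column on the die grid
    y: int  # top row on the die grid
    w: int  # width in grid cells
    h: int  # height in grid cells


def floorplan(width, height, macros):
    """Place fixed macros first, then fill the leftover area with standard cells."""
    grid = [["." for _ in range(width)] for _ in range(height)]

    # Steps 1 and 2: hand-crafted blocks, placed by hand.
    for m in macros:
        for row in range(m.y, m.y + m.h):
            for col in range(m.x, m.x + m.w):
                if not (0 <= row < height and 0 <= col < width):
                    raise ValueError(f"{m.name} does not fit on the die")
                if grid[row][col] != ".":
                    raise ValueError(f"{m.name} overlaps another block")
                grid[row][col] = m.name[0].upper()

    # Step 3: whatever is left is "filled in" with auto-routed logic.
    filled = 0
    for row in grid:
        for col, cell in enumerate(row):
            if cell == ".":
                row[col] = "s"
                filled += 1
    return grid, filled


# Made-up example: an 8x4 die with three hand-made blocks.
macros = [
    Macro("regfile", 0, 0, 3, 2),
    Macro("alu", 3, 0, 2, 2),
    Macro("cache", 5, 1, 3, 3),
]
grid, filled = floorplan(8, 4, macros)
for row in grid:
    print("".join(row))
print("cells left for auto-routed logic:", filled)
```

A die that looks completely auto-routed would correspond to calling floorplan with an empty macro list, so that every cell ends up as s.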
So there can certainly be a lot of hand-routing in a 100M transistor design, and it could give some real speed-up or area savings. But that doesn't mean that you route every single transistor individually.
PS
I've seen an image of Prescott, and while the image wasn't very sharp, it's still clear enough to say that there was some hand-made layout in there.