Jawed
Considering that G71 can hide 220 clock cycles of latency at 650MHz, you'd expect G80 to need in the region of 400+ clock cycles to hide the same latency at 1350MHz (the same span of time works out to roughly 457 cycles at the higher clock).
Even if G71 has a huge margin for "error", say a factor of 2, then that still means that G80 would require ~200 clock cycles to hide latency.
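A quick sketch of that scaling, assuming the latency to be hidden is fixed in wall-clock time (nanoseconds) so that only the clock rate changes. The function name is hypothetical, purely for illustration:

```python
# Sketch: convert a latency budget in cycles at one clock rate into cycles
# at another clock rate, assuming the latency is fixed in nanoseconds.

def scaled_latency_cycles(cycles: float, from_mhz: float, to_mhz: float) -> float:
    """Return how many cycles at to_mhz cover the same time as `cycles` at from_mhz."""
    nanoseconds = cycles * 1000.0 / from_mhz  # one cycle at f MHz lasts 1000/f ns
    return nanoseconds * to_mhz / 1000.0

g80_cycles = scaled_latency_cycles(220, 650, 1350)
print(f"G80 at 1350MHz: ~{g80_cycles:.0f} cycles")                   # ~457
print(f"if G71 over-provisions by 2x: ~{g80_cycles / 2:.0f} cycles")  # ~228
```

Both results round comfortably to the "400+" and "~200" figures above.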
If you take as a typical worst case that only vec2 instructions are available to hide latency, then each instruction takes 2 clocks per batch of 16 fragments, i.e. 8 fragments per clock. Over 200 clocks that's a total of 1600 fragments that must be in flight, all working through a single instruction.
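The fragment count is just issue rate times the latency window. A minimal sketch, assuming a cluster works on batches of 16 fragments and a vec2 instruction occupies 2 clocks per batch; the function and parameter names are hypothetical:

```python
# Sketch: fragments that must be in flight in one cluster to cover a
# latency window, assuming 16-fragment batches and a fixed number of
# clocks per instruction (2 for vec2 on scalar ALUs).

def fragments_in_flight(latency_clocks: int, batch: int = 16, clocks_per_instr: int = 2) -> int:
    fragments_per_clock = batch // clocks_per_instr  # 16 / 2 = 8 for vec2
    return fragments_per_clock * latency_clocks

print(fragments_in_flight(200))  # 1600
```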
If you assign four fp32 vec4 registers (4 × 16 = 64 bytes) to each of those fragments, then you get a minimum register file size of 1600 × 64 = 102400 bytes (100KB) per cluster.
Across G80's 8 clusters that's 800KB of register file for the entire GPU. That's still small compared with R580 or Xenos (both are 1152KB as far as I can tell, though for R580 that figure excludes the vertex shader register file).
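The register file arithmetic as a sketch, assuming four vec4 fp32 registers per fragment, 8 clusters, and 1KB = 1024 bytes. The helper name is hypothetical:

```python
# Sketch: register file implied by 1600 fragments per cluster, each holding
# four vec4 fp32 registers. Assumes 8 clusters and 1KB = 1024 bytes.

BYTES_PER_FP32 = 4
COMPONENTS_PER_VEC4 = 4
CLUSTERS = 8

def register_file_bytes(fragments: int, vec4_regs: int = 4) -> int:
    bytes_per_fragment = vec4_regs * COMPONENTS_PER_VEC4 * BYTES_PER_FP32  # 64
    return fragments * bytes_per_fragment

per_cluster = register_file_bytes(1600)
total_kb = per_cluster * CLUSTERS / 1024
print(per_cluster, total_kb)  # 102400 800.0
```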
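And that assumes the worst-case bilinear latency cannot be hidden (when only a scalar instruction is available to hide latency), which is a stretch in my view. A scalar instruction takes 1 clock per 16 fragments, so hiding the same 200 clocks needs twice as many fragments in flight. In other words, I suspect the register file in G80 is actually twice this size.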
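Comparing the vec2 and scalar cases side by side shows where the factor of two comes from. This is a sketch under the same assumptions as above (16-fragment batches, 64 bytes per fragment, 8 clusters, 1KB = 1024 bytes); the function name is hypothetical:

```python
# Sketch: total register file needed to cover a latency window, for a given
# number of clocks per instruction (2 for vec2, 1 for scalar). Assumes
# 16-fragment batches, 64 bytes per fragment, 8 clusters, 1KB = 1024 bytes.

def register_file_kb(latency_clocks: int, clocks_per_instr: int,
                     batch: int = 16, bytes_per_fragment: int = 64,
                     clusters: int = 8) -> float:
    fragments = (batch // clocks_per_instr) * latency_clocks
    return fragments * bytes_per_fragment * clusters / 1024

vec2_kb = register_file_kb(200, clocks_per_instr=2)
scalar_kb = register_file_kb(200, clocks_per_instr=1)
print(vec2_kb, scalar_kb)  # 800.0 1600.0
```

Halving the clocks per instruction doubles the fragments (and registers) needed to cover the same window.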