Yeah, don't get me wrong - I actually think 192 SPs at 2.4GHz+ is very likely. And even though I think it's likely, I'd still be pleasantly surprised if it actually happens!
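Just as a back-of-the-envelope sanity check on the rumour (assuming MADD + co-issued MUL, i.e. 3 flops per SP per clock at best): 192 SPs x 2.4GHz x 2 flops is roughly 922 GFLOPS of MADD throughput, or ~1.38 TFLOPS with the MUL counted. G80's 128 SPs at 1.35GHz work out to ~346 GFLOPS MADD-only by the same arithmetic, so that would be well over a 2.5x jump on paper.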
When the NV4x architecture was upgraded into the G7x line, many people pointed to the beefed-up second ALU (it was a MUL in the GeForce 6 series and became a full MADD in the GeForce 7s).
Could the MUL in G80 be "upgraded" to a MADD unit in the upcoming G92?
Would that (like in the earlier GF6 -> GF7 transition) increase the per-clock efficiency of the GPU or, rather, complicate things even further in the DX10 era?
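Just on the raw numbers: MADD + co-issued MUL tops out at 3 flops per SP per clock (2 + 1), while MADD + MADD would allow 4, so on paper it's a ~33% per-clock gain - before asking how often that second unit is actually free, since on G80 the MUL seems to be shared with the SFU/interpolation path and is only intermittently available for general shading anyway.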
> According to Nvidia, the multithreaded 180 shaders found in the Nvidia GeForce 8800 can offer developers "essentially unlimited instruction bandwidth," said Andy Keane, the general manager of the GPU computing group at Nvidia.

G80 obviously doesn't have 180 shaders. 180 is close to the 192 people are talking about for G92, but it's a really screwy number (20*9??). Could this just be a typo, or did Andy let something slip?
> Would it be possible that NV adds per cluster 32 SPs, which are only MADD?

Yes, it is possible in theory to change the ratio of MADD ALUs to SFU units. And by SFU units, I mean the multipurpose SFU/interpolator/MUL units, in case that wasn't clear enough.
> Is the ratio of regular ops to SFU ops really high enough for Nvidia to go wider with the main ALUs while keeping the number of 1/4-speed SFU units the same without creating a serious bottleneck there?

Think of the SFUs (excluding the MUL) as doing either one attribute interpolation per cycle or one SFU op every 4 cycles.
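Not something from the thread, but to make the question concrete, here's a minimal sketch of the kind of microbenchmark you'd use to probe that ratio - the kernel names, block/iteration counts, and the __sinf choice are all just illustrative assumptions:

```cuda
// Minimal sketch of a throughput probe for the MAD:SFU ratio question.
// Compare a MAD-only instruction stream against one that forces SFU work.

#include <cstdio>

__global__ void mad_only(float *out, float a, float b, int iters)
{
    float x = threadIdx.x * 0.001f;
    for (int i = 0; i < iters; ++i)
        x = x * a + b;            // maps to one MAD per iteration
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

__global__ void sfu_heavy(float *out, float a, float b, int iters)
{
    float x = threadIdx.x * 0.001f;
    for (int i = 0; i < iters; ++i)
        x = __sinf(x * a + b);    // MAD plus a transcendental that hits the SFU
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

int main()
{
    const int blocks = 64, threads = 256, iters = 1 << 16;
    float *d_out;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    // Time each kernel with events; if the main ALUs get wider while the SFU
    // count stays put, the gap between these two should grow.
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    cudaEventRecord(t0);
    mad_only<<<blocks, threads>>>(d_out, 1.0001f, 0.0001f, iters);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms_mad; cudaEventElapsedTime(&ms_mad, t0, t1);

    cudaEventRecord(t0);
    sfu_heavy<<<blocks, threads>>>(d_out, 1.0001f, 0.0001f, iters);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms_sfu; cudaEventElapsedTime(&ms_sfu, t0, t1);

    printf("MAD-only: %.2f ms, SFU-heavy: %.2f ms\n", ms_mad, ms_sfu);
    cudaFree(d_out);
    return 0;
}
```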
Hmmm, yeah, increasing the width from 8 to 12 (not 16 to 24, remember, double-pumping! And I was talking at the multiprocessor-level here, which is what matters for warp size) would indeed be problematic in terms of CUDA optimization backwards compatibility. So that would indeed seem like it would exclude the possibility of changing the SFU ratio easily, hmmm.
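For anyone wondering what that backwards-compatibility worry looks like in practice, here's the sort of G80-era reduction idiom (a generic sketch assuming 256-thread blocks, not anyone's actual code) that hard-codes the 32-wide warp falling out of 8 double-pumped lanes; a 12-wide multiprocessor, with 24- or 48-thread warps, would quietly invalidate the barrier-free tail:

```cuda
// Classic reduction tail that hard-codes the 32-wide warp.
// Assumes blockDim.x == 256 and a power-of-two block size.

#define WARP_SIZE 32   // assumption baked into lots of early CUDA code

__global__ void block_sum(const float *in, float *out)
{
    __shared__ float s[256];
    unsigned int tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Tree reduction down to one warp's worth of partial sums.
    for (unsigned int stride = blockDim.x / 2; stride > WARP_SIZE; stride >>= 1) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();
    }

    // The final steps rely on all 32 threads of a warp running in lockstep,
    // so no barrier is used -- exactly the kind of shape-dependent trick
    // that wouldn't survive a change in warp width.
    if (tid < WARP_SIZE) {
        volatile float *vs = s;
        vs[tid] += vs[tid + 32];
        vs[tid] += vs[tid + 16];
        vs[tid] += vs[tid + 8];
        vs[tid] += vs[tid + 4];
        vs[tid] += vs[tid + 2];
        vs[tid] += vs[tid + 1];
    }
    if (tid == 0)
        out[blockIdx.x] = s[0];
}
```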
> So that would indeed seem like it would exclude the possibility of changing the SFU ratio easily, hmmm.

Well, the SFUs will prolly continue to come in pairs per SIMD, so you can choose whichever ratio suits your needs, e.g. 4:1 or 8:1. It's just that the granularity of that ratio is rather large. I think the underlying 1:4 SFU:interpolation ratio is all that can be considered fixed.
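In concrete G80 terms (assuming I have the per-multiprocessor counts right: 8 SPs plus 2 SFUs), that's 4:1 today; keep the pair of SFUs and widen to 16 SPs and you jump straight to 8:1, with nothing in between unless they start adding SFUs singly.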
> I'm wondering how much backwards compatibility matters at this point in time. My reading of the CUDA docs was along the lines of "G80 is this shape, but all bets are off what the shape will be for future architectures".

For top-line performance I suspect G80's future variants will only use multiples of the "base" that G80 sets. If you change the "shape" then all the boundaries between blocks of threads, when you're modelling the input data layout, move to "non-integer" locations. A 24x24 block is "half" way between a 16x16 block and a 32x32 block.
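To illustrate the "shape" point with made-up numbers (a 1024x1024 grid of data and a hypothetical 24-wide block, nothing from the CUDA docs): a 16x16 tile divides it exactly, while a 24x24 tile leaves ragged partial tiles at the edges, which is the non-integer-boundary headache in a nutshell:

```cuda
// Illustration of how block "shape" interacts with data layout.
// Dimensions (1024x1024, 16x16 vs 24x24 tiles) are made up for the example.

#include <cstdio>

// Ceiling division: number of tiles needed to cover 'n' elements.
// In a real launch this feeds dim3 grid(div_up(width, 16), div_up(height, 16)).
static int div_up(int n, int tile) { return (n + tile - 1) / tile; }

int main()
{
    const int width = 1024, height = 1024;

    // G80-friendly shape: 16x16 = 256 threads, and 1024 is a multiple of 16,
    // so every tile is full and every tile boundary lands on a clean offset.
    printf("16x16 tiles: %d x %d, leftover %d x %d pixels\n",
           div_up(width, 16), div_up(height, 16), width % 16, height % 16);

    // Hypothetical 24-wide shape: 1024 / 24 is not an integer, so the last
    // row/column of tiles is only partially filled and every kernel needs
    // explicit edge handling.
    printf("24x24 tiles: %d x %d, leftover %d x %d pixels\n",
           div_up(width, 24), div_up(height, 24), width % 24, height % 24);

    return 0;
}
```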
Well, I could see how a dp/int32/int64 divider, possibly with the ability to do square roots/inverse square roots, would be really nice, but I'd imagine that such a circuit would be either quite large or quite slow.
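For what it's worth, the usual way to dodge a big dedicated divide/sqrt unit is to refine a cheap approximation in software; something roughly like this sketch (purely illustrative Newton-Raphson refinement, not a claim about what NVIDIA actually does in hardware):

```cuda
// Sketch of the iterative alternative to a dedicated divider: seed with a
// cheap approximate estimate, then refine with Newton-Raphson steps.

#include <cstdio>

__device__ float recip_nr(float b)
{
    float x = __fdividef(1.0f, b);   // fast approximate reciprocal as the seed
    x = x * (2.0f - b * x);          // Newton step: error roughly squares
    x = x * (2.0f - b * x);          // second step gets near full fp32 precision
    return x;
}

__device__ double rsqrt_nr(double b)
{
    double x = (double)rsqrtf((float)b);  // fp32 estimate seeds the fp64 result
    x = x * (1.5 - 0.5 * b * x * x);      // Newton step for 1/sqrt(b)
    x = x * (1.5 - 0.5 * b * x * x);
    return x;
}

__global__ void test(float *r, double *s, float a, double b)
{
    *r = recip_nr(a);
    *s = rsqrt_nr(b);
}

int main()
{
    float *d_r; double *d_s;
    cudaMalloc(&d_r, sizeof(float));
    cudaMalloc(&d_s, sizeof(double));
    test<<<1, 1>>>(d_r, d_s, 7.0f, 2.0);

    float r; double s;
    cudaMemcpy(&r, d_r, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&s, d_s, sizeof(double), cudaMemcpyDeviceToHost);
    printf("1/7 ~ %.7f, 1/sqrt(2) ~ %.15f\n", r, s);

    cudaFree(d_r); cudaFree(d_s);
    return 0;
}
```

The point being that each Newton step costs only a couple of MADDs off an approximate seed, which is why hardware designers tend to prefer that over burning area on a full-rate wide divider.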