Yas
There was talk of a clock-speed-based increase in "pipelines" around the R420 launch, hints at "double pumped" operation, comparisons to Intel's Netburst, and agreements concerning design tools to achieve significantly higher speed operation.
Viewing the numbers listed, the base number seems to be the first, and the mysterious number seems to be the 3rd one. Focusing on that, and assuming accuracy in the numbers, I see two sets of numbers that seem especially important: The "12 pipe" and "4-1-3-2" Wavey is making some sort of hint about, and the 16-1-1-1 and 16-1-3-1 between the R520 and R580.
One thing that makes sense is "ROPs" and then ALUs per ROP, but I don't think that fits...that seems too drastic a change in transistor count for the R520 to R580 change. That doesn't rule it out, but it doesn't seem to fit a sane refresh path.
What does seem to fit is having a design intended to achieve that type of throughput without adding silicon, which fits with some of the indicators listed at the beginning (if they're not simply fiction). That is, by having ALU processing multi-"pumped" per clock.
- This would maintain the "locked" pixel/ROP/ALU pipeline relationship (silicon-wise) that would seem to explain the "R3xx legacy"
- This would correspond to some speculation I've had concerning how some of their mobile-technology solutions could be of benefit to desktop parts in terms of performance (there seems to be varying clock usage in mobile parts already, geared toward minimum power instead of maximum performance)
- It might offer an alternative route to performance scaling, depending on the leakage, power, heat, etc. profile of execution units capable of this type of scaling on a given process
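To make the arithmetic behind this guess concrete, here is a rough sketch. It assumes (and this is only my reading, not anything confirmed) that the first number is the physical pipe count and the third is how many ALU operations each pipe issues per clock. The function name is made up for illustration.

```python
# Hypothetical reading of the "a-b-c-d" strings: field 1 = physical pixel pipes,
# field 3 = ALU operations issued per pipe per base clock (the "multi-pumped" guess).
# Fields 2 and 4 are ignored here because their meaning is still unknown.
def effective_alu_pipes(config: str) -> int:
    fields = [int(part) for part in config.split("-")]
    if len(fields) != 4:
        raise ValueError(f"expected four numbers, got {config!r}")
    pipes, alu_per_pipe = fields[0], fields[2]
    return pipes * alu_per_pipe


if __name__ == "__main__":
    for config in ("4-1-3-2", "16-1-1-1", "16-1-3-1"):
        print(config, "->", effective_alu_pipes(config))
```

Under that reading, 4-1-3-2 comes out as 12 "effective pipes", which would line up with the "12 pipe" hint, while 16-1-1-1 stays at 16 and 16-1-3-1 jumps to 48. If the third number means something else, none of this holds.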
I can't evaluate how feasible this is right now, but it seems like the type of thing that makes sense and is planned by both IHVs: ATI made announcements last year that seem to relate directly to it, and there is evidence of nVidia separating clocking to increasing degrees going forward.
Using this guess does seem to indicate that the R520 would be an "underperformer" without high base clock speeds, but it might also indicate a similarity between the R580 and R520 that relates directly to the issues reported around the "R520" delay, depending on how this is implemented.
...
There are problems with this guess, and some remaining mysteries. What does the "2" at the end mean, and why does only the RV350 have it? DDR2? Is it the latest generation of high-clocked DDR1 for everything else? Also, why the apparently huge jump from R520 to R580? This guess does perhaps explain how it might be achievable in a refresh, but not why such a large jump in performance would be attempted. Along with this, it is significant that there is no "2" in this column between "1" and "3"...both together seem to strongly indicate that this guess is wrong, unless there is some implementation detail to explain it.
Also, there doesn't seem to be a listing for vertex processing in the numbers. The 2nd could be "TMUs per pipe", which would fit as well for the idea of R3xx lineage, but the last remains a mystery...why would the middle range have a larger number than any other?
Finally, why would the R420 have 16 pipes and the next generation have the same count? The R580 would certainly(!) address this if the 3rd number relates to ALU throughput somehow, but the R520 would mainly seem a fairly "dissatisfactory" stepping stone in relation. I could guess that the R420 might have already implemented something like this (making it a jump from 8 double-pumped to 16), but this wouldn't seem to fit the ROP/pixel processing relationship guesses.
...
Hmm, well, the numbers could just be wrong or incomplete, but this guess doesn't seem to hold together accurately with what is known. I hope it might touch on some relevant things, though.