AMD: R9xx Speculation

You say the benchmarks are likely to be fake, but I think they show results that are well within expectations. If the rumours of 1920 shaders in groups of 4 are true, then you have a 20% increase in shaders and up to a 20% increase in efficiency. Compounded (1.2 x 1.2), that's up to 44% faster than Evergreen. There could be more, since memory bandwidth seems to go up by 33%. If you add increased tessellation power and consider early drivers, a 30% boost at the moment sounds very reasonable to me, maybe around 40% at launch and about 50% when drivers mature. FWIW, IMO, YMMV etc. :)
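For what it's worth, the 44% is just the two 20% gains compounded. Here is a quick Python sketch of that arithmetic. The variable names are mine, the 1600-shader figure for Cypress is not stated in the post, and it assumes the two gains multiply cleanly, which real workloads won't guarantee:

```python
# Assumption: Cypress (Evergreen) has 1600 shaders; 1920 is the rumoured count.
shader_gain = 1920 / 1600   # 20% more shaders
# Assumption: the "up to 20%" efficiency gain applies in full (best case).
efficiency_gain = 1.20

# Independent gains multiply rather than add.
combined = shader_gain * efficiency_gain
uplift_pct = round((combined - 1) * 100)

print(f"Best-case uplift vs. Evergreen: +{uplift_pct}%")
```

The memory bandwidth increase is left out on purpose, since the post only says there "could be more" without putting a number on its effect.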


You are right, the benchmark scores are within expectations if the 1920 SPs and 4D shader rumors are true... but nevertheless, a fake score is a fake score, and if Fermi's history of results is anything to go by, expectations don't always come true.
 
1920 SPs / 4-wide VLIW = 480 TPs. 480 TPs / 16 TPs per SIMD = 30 SIMDs. 30 SIMDs * 4 TMUs per SIMD = 120 TMUs. How big would this GPU be? Bigger than R600?

That would bring down the ALU:TMU ratio from 5:1 to 4:1. I don't see any reason for doing it.
It could be 20 TPs per SIMD = 24 SIMDs, aka 96 TMUs. The ALU:TMU ratio would stay the same. It makes sense with the last rumor from Charlie of Cayman being 380 mm^2, so about 15% bigger than Cypress, since TMUs take up the most space on the die.
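To make the two layouts easier to compare, here is a rough Python sketch of the same chain of divisions. The function name `layout` is made up for illustration, and the Cypress line (1600 SPs, 5-wide) is my own addition as a baseline. It reports SPs per TMU, so 20 vs. 16 is the same proportion as the 5:1 vs. 4:1 quoted above:

```python
def layout(sps, vliw_width, tps_per_simd, tmus_per_simd=4):
    """Return (thread processors, SIMDs, TMUs, SPs per TMU)."""
    tps = sps // vliw_width          # one TP is one VLIW unit
    simds = tps // tps_per_simd
    tmus = simds * tmus_per_simd
    return tps, simds, tmus, sps / tmus

# Assumption: Cypress baseline of 1600 SPs, 5-wide VLIW, 16 TPs per SIMD.
cypress = layout(1600, 5, 16)
# Rumoured 1920 SPs, 4-wide, keeping 16 TPs per SIMD.
cayman_16 = layout(1920, 4, 16)
# Same rumour, but with 20 TPs per SIMD as suggested in this post.
cayman_20 = layout(1920, 4, 20)

for name, cfg in [("Cypress", cypress), ("16 TPs/SIMD", cayman_16), ("20 TPs/SIMD", cayman_20)]:
    tps, simds, tmus, ratio = cfg
    print(f"{name}: {tps} TPs, {simds} SIMDs, {tmus} TMUs, {ratio:.0f} SPs per TMU")
```

Only the 20-TP variant keeps the Cypress ratio, which is the whole point of the suggestion.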
 
For one thing, I hope this new refresh would finally bring acceptable AF quality. Come on [strike]ATi[/strike] AMD, Nvidia beat you four years ago with G80 -- it's about damn time, already! :rolleyes:
 
1920 SPs / 4-wide VLIW = 480 TPs. 480 TPs / 16 TPs per SIMD = 30 SIMDs. 30 SIMDs * 4 TMUs per SIMD = 120 TMUs. How big would this GPU be? Bigger than R600?
Only if the 4-wide rumor is true, otherwise we're talking about "only" 96 TMUs.

Also, iirc in RV770 the 10 SIMDs only occupied ~50% of the die area, and the 4 TMUs in each SIMD were small compared to the 80 SPs. Sure, DX11-compatible TMUs might be more complex, but we're at 40nm now, so I don't think 24 TMUs would make that much of a difference.

In any case, even R600 was much smaller than GF100 is, and not much bigger than GF104, so as long as the performance is right, even that kind of size shouldn't be much of a problem.


For one thing, I hope this new refresh would finally bring acceptable AF quality. Come on [strike]ATi[/strike] AMD, Nvidia beat you four years ago with G80 -- it's about damn time, already! :rolleyes:
I hope so, too. Well, ever since Evergreen, AMD has technically been the leader when it comes to angle independence, but the problem is that they under-sample too much for performance reasons, which results in that annoying shimmering. Even G80 and G92 are better in that respect.
 
20 TPs is impossible. It's 4, 8, 16, 32...
Hmm, what is a TP again? In any case, I don't think SIMD width being a power of two is really a hard requirement in theory. It needs to be a multiple of 4 for sure, but I'd say non-power-of-two widths should be doable without too many changes. I think it's very unlikely that AMD is going to mess with that, though (it's already on the high side anyway).
 
For RV770, I measure 29% for the ALUs including redundancy and LDS and another 12% for the TMUs.
That's also what I had in mind, 28.something % for the ALUs/regs/redundancy and about 41% together with the TMUs and L1 texture caches. I guess it got a bit more with Cypress, though. However, it should not be that much of a problem for the die size to bump that up a bit, especially as ATI appears to still use only a 256-bit memory controller.

Just to throw out some numbers: if ATi had decided to add 50% more ALUs and TMUs to RV790 instead of just increasing the frequency compared to RV770 (which needed some die area too), it would have resulted in about 52mm² (+ a few mm² for scheduling) more die area. Charlie speculated about a similar increase from Cypress to Cayman.
The rumored number of added units would be relatively the same (+50%, twice as many in absolute numbers, but we got a shrink in between), and the savings from going to a 4-slot VLIW design and the additional transistors for removing some of the bottlenecks in Evergreen may cancel out. That means 380-400mm² appears to be a solid estimate for a hypothetical Cayman with 480 slightly slimmed-down units and a 256-bit memory interface.
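For anyone who wants to check the 52mm² figure, here is a back-of-the-envelope Python sketch. The die sizes are my assumptions (commonly quoted figures of roughly 256mm² for RV770 and 334mm² for Cypress, neither stated in this thread), and the 41% share is the ALU+TMU+L1 figure from the measurement above:

```python
# Assumptions, not from the thread: commonly quoted die sizes in mm^2.
RV770_DIE_MM2 = 256.0
CYPRESS_DIE_MM2 = 334.0

alu_tmu_share = 0.41   # ALUs/regs/redundancy plus TMUs and L1 texture caches
extra_units = 0.50     # hypothetical +50% ALUs and TMUs

shader_area = RV770_DIE_MM2 * alu_tmu_share   # area taken by the SIMDs
added_area = shader_area * extra_units        # area for 50% more of them

# Relative growth implied by the 380-400 mm^2 estimate for Cayman.
growth_low = 380.0 / CYPRESS_DIE_MM2 - 1
growth_high = 400.0 / CYPRESS_DIE_MM2 - 1

print(f"SIMD area on RV770: {shader_area:.1f} mm^2")
print(f"+50% units: {added_area:.1f} mm^2 extra")
print(f"Cayman vs. Cypress: +{growth_low:.0%} to +{growth_high:.0%}")
```

With those assumed die sizes the extra area lands right around the 52mm² mentioned, before adding anything for scheduling.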
 
20 ALUs 4-wide per SIMD, instead of 16 ALUs 5-wide per SIMD.
The ALU count per SIMD stays the same.
Why is it impossible?
Because it would change the wavefront size to some strange value like 80, which isn't going to happen. And just imagine the pain it would be to address something in the local memory with non-power-of-two sizes. A local memory with 20 or 40 banks? Will it be 40kB instead of 32kB now?
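A small Python sketch of why 80 is awkward. It assumes a wavefront is issued over 4 clocks on each SIMD (so 16 TPs gives the usual 64), and the helper names are made up for illustration. With a power-of-two bank count the bank index is a single bit mask, while 20 banks would need a real modulo:

```python
CYCLES_PER_WAVEFRONT = 4  # assumption: one wavefront issues over 4 clocks

def wavefront_size(tps_per_simd):
    return tps_per_simd * CYCLES_PER_WAVEFRONT

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def bank_by_mask(addr, banks):
    # Cheap in hardware, but only correct when banks is a power of two.
    return addr & (banks - 1)

def bank_by_modulo(addr, banks):
    # Correct for any bank count, at the cost of a divider.
    return addr % banks

for tps in (16, 20):
    size = wavefront_size(tps)
    print(f"{tps} TPs per SIMD: wavefront of {size}, power of two: {is_power_of_two(size)}")

# The mask trick agrees with modulo for 32 banks but not for 20.
print(bank_by_mask(37, 32), bank_by_modulo(37, 32))
print(bank_by_mask(37, 20), bank_by_modulo(37, 20))
```

The bank counts here are only examples to show the addressing problem, not a claim about the actual LDS layout.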
 
What circumstances allowed for RV770's die shot to be released?
Will they ever happen again?
There was never a full rationale given for releasing the RV770 shot, much less a reason why there still isn't one for Cypress.
 