AMD: R9xx Speculation

Isn't that Caicos on the left? Looks like 32 TMUs and 16 ROPs. Can't really make out the number of SPs, though, but I guess 160×4 is a safe bet.
 
Turks? Where is Turks?

Can't really make out the number of SPs, though, but I guess 160×4 is a safe bet.

I would say anything between 100 and 160. It doesn't look like all RPEs must have the same number of SPs.

RPE sounds very much like marketing...:rolleyes:
 
confidential237549.jpg


RPE?

My idea: one RPE is one "ultra threaded dispatch processor" and one rasterizer (and one Shader/TMU block).
 
Isn't that Caicos on the left? Looks like 32 TMUs and 16 ROPs. Can't really make out the number of SPs, though, but I guess 160×4 is a safe bet.

That would mean Turks is the slowest chip, which doesn't make much sense, but it's just a name, so who knows.

edit:
IMO it looks like this
Chip: Caicos - Barts - Cayman
SPs: 160x4 - 320x4 - 480x4
TMUs: 32 - 64 - 96
ROPs: 16 - 32 - 48 (could be 32 too, but 48 would "fit" the rest better)
RPEs: 1 - 2 - 3
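To make the scaling guess above easy to check, here is a minimal sketch that multiplies one per-RPE block by the RPE count. The per-RPE figures (160x4 SPs, 32 TMUs, 16 ROPs) are only this thread's reading of a blurred slide, and `scale_config` is a made-up helper name.

```python
# Per-RPE building block as guessed in the thread from a blurred slide.
# These are speculative readings, not confirmed specifications.
PER_RPE = {"SPs": 160 * 4, "TMUs": 32, "ROPs": 16}

# RPE counts per chip, also the thread's guess.
RPE_COUNT = {"Caicos": 1, "Barts": 2, "Cayman": 3}


def scale_config(rpes, per_rpe=PER_RPE):
    """Return the unit totals for a chip built from `rpes` identical RPEs."""
    return {unit: count * rpes for unit, count in per_rpe.items()}


for chip, rpes in RPE_COUNT.items():
    print(chip, scale_config(rpes))
```

If Cayman really has 32 ROPs rather than 48, the ROP row would not follow this per-RPE scaling, which is the one spot where the guess is shaky.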
 
RPE could just be like Cypress's dual setup, it doesn't have to be all the way like the GPCs.
Is Cayman 96/32/3 behind the blur? It's not 4 at least.
 
RPE could just be like Cypress's dual setup, it doesn't have to be all the way like the GPCs.
It could; however, it is more logical that an RPE is like a GPC, because it scales perfectly with core count:

Caicos : 1 RPE = 640 SPs
Barts : 2 RPE = 1280 SPs (640x2)
Cayman : 3 RPE = 1920 SPs (640x3)
 
As long as the slides are sufficiently blurred we can make sensible configurations out of them, instead of calling fake due to inconsistent numbers :LOL:
 
It could; however, it is more logical that an RPE is like a GPC, because it scales perfectly with core count:

Caicos : 1 RPE = 640 SPs
Barts : 2 RPE = 1280 SPs (640x2)
Cayman : 3 RPE = 1920 SPs (640x3)

The problem is that you know what Nvidia did with Fermi, and so you assume AMD will do the same.
 
As long as the slides are sufficiently blurred we can make sensible configurations out of them, instead of calling fake due to inconsistent numbers :LOL:
Makes me wonder... why blur the rest of the specs? I understand the need for blurring the upper and lower parts, but why the specs?

If Cayman is 480 4-way SPs, that makes 30x16-way SIMDs or 10 SIMDs per RPE, for 3xRPEs.
At least that takes care of the wavefront problem; it now sits at 64, as it should.
 
If Cayman is 480 4-way SPs, that makes 30x16-way SIMDs or 10 SIMDs per RPE, for 3xRPEs.

Don't forget the TMUs. 96 TMUs and 30 SIMDs - this can't work.

Or we go back to the R600 style but this time with 2 clocks latency.
So one RPE then has one TMU-SIMD (32 TMUs) and five 32-way SIMDs.
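The arithmetic behind "this can't work" can be sketched as below, assuming 4-wide units and TMUs grouped in quads as on RV770. All chip figures are the thread's speculation, and `simd_count` is a hypothetical helper.

```python
from fractions import Fraction

VLIW_WIDTH = 4  # the thread's "4-way SP" assumption


def simd_count(alu_lanes, simd_width, vliw_width=VLIW_WIDTH):
    """Number of SIMDs when each SIMD holds `simd_width` VLIW units."""
    lanes_per_simd = simd_width * vliw_width
    if alu_lanes % lanes_per_simd:
        raise ValueError("ALU lanes do not divide evenly into SIMDs")
    return alu_lanes // lanes_per_simd


cayman_lanes = 480 * VLIW_WIDTH               # 1920 ALU lanes (speculated)
cayman_simds = simd_count(cayman_lanes, 16)   # 16-way SIMDs
quad_tmus = 96 // 4                           # 96 TMUs grouped as quads

# RV770 style wants a whole number of quad-TMUs per SIMD.
quads_per_simd = Fraction(quad_tmus, cayman_simds)

# R600-style alternative: 32-way SIMDs inside one 640-lane RPE.
rpe_simds_32way = simd_count(160 * VLIW_WIDTH, 32)

print(cayman_simds, quads_per_simd, rpe_simds_32way)
```

With 16-way SIMDs the quad-TMU to SIMD ratio is not a whole number, which is why a dedicated TMU block per RPE (R600 style) or shared TMUs get suggested instead.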
 
Yeah. But the problem is still the same: 32 TMUs and 160x4 TPs - this can't work (RV770 style).
That's no different from 64 TMUs and 320x4 ALU lanes.

In both cases it seems an RPE consists of 10 SIMDs. With 32 TMUs.

Which would imply TMUs are shared within an RPE by all the SIMDs ...
Or R600 is back.
... or something along the lines of the patents I've been talking about, where TMUs are shared by SIMDs. The patents talk about a "processor" producing two filtered results independently and also sharing texel data (unfiltered texels, not texel results) amongst L1s, with 2 TMUs seemingly sharing an L1. Those two concepts would appear to tally with this peculiar setup.

R600 shares only results, I think, not original texel data (or, if you prefer, texel data isn't shared amongst L1s, only amongst L2s). Though it would be funny if a ring-bus appeared.

Barts Pro, presumably, has SIMDs turned off. It presumably also has TMUs turned off. So with SIMDs being much larger than TMUs, it's likely that while only 1 quad-TMU per RPE is turned off, 2 or more SIMDs would be turned off.

e.g. 1024 ALU lanes and 56 TMUs.
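A quick sketch of how the 1024/56 example falls out, assuming Barts is 2 RPEs with 10 SIMDs (64 lanes each) and 8 quad-TMUs per RPE. Those counts are this thread's guesses, and `cut_down` is a hypothetical helper.

```python
LANES_PER_SIMD = 16 * 4   # 16-way SIMD of 4-wide units, as assumed in the thread
TMUS_PER_QUAD = 4


def cut_down(rpes, simds_per_rpe, quads_per_rpe, simds_off, quads_off):
    """ALU lanes and TMUs left after disabling units in every RPE."""
    lanes = rpes * (simds_per_rpe - simds_off) * LANES_PER_SIMD
    tmus = rpes * (quads_per_rpe - quads_off) * TMUS_PER_QUAD
    return lanes, tmus


# Full Barts as speculated: 2 RPEs, 10 SIMDs and 8 quad-TMUs (32 TMUs) each.
full = cut_down(2, 10, 8, simds_off=0, quads_off=0)
# Hypothetical Barts Pro: 2 SIMDs and 1 quad-TMU disabled per RPE.
pro = cut_down(2, 10, 8, simds_off=2, quads_off=1)
print(full, pro)
```

So the example corresponds to losing 20% of the ALU lanes but only 12.5% of the TMUs.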

I have to admit I've got a queasy feeling about the "non-integer-multiple" SIMD:quad-TMU thing going on here.
 
Wasn't R600 just like RV770? I.e. it used 4 shader clusters (80 SPs each), with a texture quad block for each cluster?

This is the main difference: The R600 design had a decoupled TMU-SIMD. So the R600 had five SIMDs: one TMU-SIMD and four Shader-SIMDs.
 
Is it me, or is there a potential imbalance (the opposite case to Fermi) in Cayman's specs, with only 32 ROPs but 30 SIMDs (48 pixels?), regarding pixel throughput from the fragment pipeline to the back-end?

p.s.:

47099457.jpg
 
This is the main difference: The R600 design had a decoupled TMU-SIMD. So the R600 had five SIMDs: one TMU-SIMD and four Shader-SIMDs.
I see, thanks for the heads-up.

Is there a possibility that we are tackling the wrong side of the problem? Cayman could have 24 SIMDs (80 SPs each) and maintain the right texture arrangement (4x24 = 96)?

I know the wavefront problem would persist, but what is more likely: a change in wavefront size (with the subsequent load on the compiler possibly degrading performance) or a change in the texture quad arrangement? Which is the least harmful option? Doesn't TMU sharing add latency and conflicts?
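The two options can be put side by side with a rough sketch. It assumes 1920 ALU lanes, 4-wide units, 96 TMUs, and a wavefront issued over 4 clocks as on earlier ATI SIMDs. All of that is speculation carried over from this thread, and `simd_option` is a made-up helper.

```python
CLOCKS_PER_WAVEFRONT = 4  # assumption: a wavefront issues over 4 clocks


def simd_option(total_lanes, simds, vliw_width, tmus):
    """Summarise one way of splitting the speculated Cayman into SIMDs."""
    lanes_per_simd = total_lanes // simds
    units_per_simd = lanes_per_simd // vliw_width
    return {
        "lanes_per_simd": lanes_per_simd,
        "wavefront": units_per_simd * CLOCKS_PER_WAVEFRONT,
        "tmus_per_simd": tmus / simds,
    }


option_30 = simd_option(1920, 30, 4, 96)  # 64-lane SIMDs
option_24 = simd_option(1920, 24, 4, 96)  # 80-lane SIMDs
print(option_30)
print(option_24)
```

Under these assumptions, 30 SIMDs keep the wavefront at 64 but leave a fractional TMU count per SIMD, while 24 SIMDs give a clean quad-TMU per SIMD at the cost of an 80-wide wavefront.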
 