AMD: R9xx Speculation

IIRC, ATi's RBEs are tied to the memory controller so you wouldn't have two chips with the same memory interface and different number of RBEs as you have listed here with Barts and Cayman.
FWIW, the ROPs are tied to the MC on Nvidia hardware just the same, but, as others have said, the ratio can change (I haven't yet seen products in the Fermi family with a different ratio, but I definitely expect GF108 to have fewer ROPs even though it also seems to have a 128bit interface).

Rv770 16ROP/256bit
Cypress 32ROP/256bit
I think a better example would be chips from the same family:
rv770 - 16 ROP/256bit
rv740 - 16 ROP/128bit
Juniper 16ROP/128bit
Redwood 8ROP/128bit
 
My speculation:
Cayman XT: 1920SP(30 SIMD)/120TMU/32ROP/256bit
Barts XT: 1280SP(20 SIMD)/80TMU/16ROP/256bit
Turks XT: 512SP(8 SIMD)/32TMU/8ROP/128bit
Caicos: 128SP(2 SIMD)/8TMU/4ROP/64bit
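
For anyone checking the arithmetic: every line in that list works out to the same per-SIMD ratio. A quick sketch, assuming 64 SPs and 4 TMUs per SIMD (those two constants are inferred from the numbers above, not confirmed by anything AMD has said, and `config` is just a helper name made up for illustration):

```python
# Assumption: 64 SPs and 4 TMUs per SIMD, inferred from the speculated list.
SP_PER_SIMD = 64
TMU_PER_SIMD = 4


def config(simds):
    """Return (SP count, TMU count) for a given number of SIMDs."""
    return simds * SP_PER_SIMD, simds * TMU_PER_SIMD


# SIMD counts taken from the speculated line-up above.
speculated = {"Cayman XT": 30, "Barts XT": 20, "Turks XT": 8, "Caicos": 2}

for name, simds in speculated.items():
    sps, tmus = config(simds)
    print(f"{name}: {sps}SP({simds} SIMD)/{tmus}TMU")
```

Changing `SP_PER_SIMD` shows how the SIMD and TMU counts would shift under a different per-SIMD width.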

I think Turks would be slower than Juniper. If Turks matched Juniper's performance, it would require a 6-pin connector, which is highly unlikely for a Redwood replacement.

I agree on SP, ROPs and interfaces.

However, I think it's possible AMD went for 128 SP per SIMD, so we may see only half the number of TMUs. Evergreen's TMUs are weak compared to any post-G7x Nvidia TMUs (with optimizations turned off [AI off], even a 5870 takes a stronger hit from enabling 16xAF than, for example, an 8800 GT), so I hope for better TMUs, not more (I wouldn't mind getting both, of course...).

Also, for Cayman I wouldn't be surprised if the full chip actually had 32 (or if I turn out to be right, 16) SIMDs, although they might not enable all of them, even on the XT.
 
And wrong for that matter ...

Yes. Think so too. The mistake was the 6990. It was the first time I had heard of that card.

I would think of something like that:

Antilles XT (2xCayman XT)
Antilles Pro (2xCayman Pro)
Cayman XT 1920 SPs/120TMUs
Cayman Pro approx. 1700 SPs, e.g. 1728 SPs/108TMUs
Cayman LE approx. 1500 SPs, e.g. 1536 SPs/96TMUs
Barts XT 1280 SPs/80TMUs
Barts Pro 1152 SPs/72TMUs
 
I know the ratio of ROPs to 64-bit MC channel can change between generations of GPUs, but have they ever varied within the same generation?
 
As posted above,
Redwood 128bit / 8 ROPs
Juniper 128bit / 16 ROPs

RV770 256bit 16 ROPs
RV740 128bit 16 ROPs
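
To put numbers on it, here is a small sketch of the ROPs-per-64-bit-channel ratio for the chips listed in this thread. The bus widths and ROP counts are the ones posted above; the 64-bit channel width is the one used in the question, and `rops_per_channel` is just an illustrative helper name:

```python
def rops_per_channel(bus_bits, rops, channel_bits=64):
    """ROPs per memory channel, assuming the bus splits into equal channels."""
    channels = bus_bits // channel_bits
    return rops / channels


# (bus width in bits, ROP count) as posted in this thread.
chips = {
    "RV770": (256, 16),
    "RV740": (128, 16),
    "Juniper": (128, 16),
    "Redwood": (128, 8),
}

for name, (bus_bits, rops) in chips.items():
    ratio = rops_per_channel(bus_bits, rops)
    print(f"{name}: {ratio:g} ROPs per 64-bit channel")
```

Going by those figures, the ratio differs within both pairs (RV770 vs RV740, Juniper vs Redwood).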
 
Yes. Think so too. The mistake was the 6990. It was the first time I had heard of that card.

I would think of something like that:

Antilles XT (2xCayman XT)
Antilles Pro (2xCayman Pro)
Cayman XT 1920 SPs/120TMUs
Cayman Pro approx. 1700 SPs, e.g. 1728 SPs/108TMUs
Cayman LE approx. 1500 SPs, e.g. 1536 SPs/96TMUs
Barts XT 1280 SPs/80TMUs
Barts Pro 1152 SPs/72TMUs
Yeah I'm thinking along the exact same numbers too. I also expect Antilles to use downclocked Caymans.
 
We have a very very nice kind of speculation here:



http://semiaccurate.com/forums/showthread.php?p=67549#post67549

http://semiaccurate.com/forums/showpost.php?p=67541&postcount=655


Very nice, indeed. That's what I expect too. :oops:

Cayman XT: 429 US$, 35% faster than HD 5870 on average
Cayman Pro: 339 US$, 25% faster than HD 5870 on average, matching GTX 480 512 SP performance wise
Cayman LE: 269 US$, 15% faster than HD 5870 on average

This doesn't really make sense to me. Why would the spread in performance relative to the HD 5870 be equal to the spread in performance between the 5850 and 5870, and yet the spread in prices be 60% ($60) more between them?

What would probably make more sense given equal pricing is:

Cayman LE $269 = 5870
Cayman Pro $339 20% faster than 5870
Cayman XT $429 40% faster than 5870 + 2GB RAM.
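
A rough way to compare the two line-ups is to look at the price step and the performance step between tiers. The sketch below uses only the prices and percentages quoted in this thread (performance expressed as a percentage of the HD 5870, so 135 means 35% faster); `steps` is a hypothetical helper, and none of these figures are confirmed:

```python
def steps(values):
    """Differences between consecutive tiers, e.g. LE -> Pro -> XT."""
    return [b - a for a, b in zip(values, values[1:])]


prices = [269, 339, 429]       # LE, Pro, XT in US$ (speculated)
rumoured = [115, 125, 135]     # % of HD 5870 performance, first proposal
alternative = [100, 120, 140]  # % of HD 5870 performance, second proposal

print("price steps:", steps(prices))
print("rumoured perf steps:", steps(rumoured))
print("alternative perf steps:", steps(alternative))

# Performance points per dollar for each tier under both proposals.
for label, perf in (("rumoured", rumoured), ("alternative", alternative)):
    ratios = [round(p / price, 3) for p, price in zip(perf, prices)]
    print(label, "perf per $:", ratios)
```

The price steps are the same in both cases; only the performance gained per step differs.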
 
Oh, I thought they were doing interpolation using regular ALU arithmetic. If they have special interpolation instructions and an abundance of ALUs how does it still become a bottleneck?
It isn't really a bottleneck, but previously interpolation was "free" in the sense that it didn't consume any ALU slots. I don't think it's a big deal though, and it typically isn't felt anywhere except on the GPUs which have twice the tex:ALU ratio (Cedar).
 
Very few cases have shown this to be a bottleneck, at least above the value products. In most cases it's proven to be neutral from a performance perspective and sometimes beneficial (see the other thread on Starcraft performance, for instance).

Note, the change was made because it is part of the DX11 spec.
 
Dave, can you clarify this for me? Does "part of the DX11 spec" mean that DX11 requires interpolation to be done by the shaders, or that DX11 requires interpolation to be done in a more precise/flexible way than your previous hardwired interpolators, so that it made sense to have it done by the shaders instead of redesigning a fixed-function unit for it?

Oh, and btw: Could you please save HD 6870 again as you already saved HD 4850? I want 2 gig as a standard... :)
 
A reason to (at least partially) move interpolation to the shader cores is the new pull model for attribute evaluation in DX11, as shaders now have various ways to dynamically request the evaluation of an attribute.
 
Ok then.

600 series or less < or = 6.25A
700 series < or = 12.5A
800 series < or = 18.75A
900 series < or = 25A
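
Those current limits map onto familiar board power classes if you assume everything is drawn from a 12 V rail (that voltage is my assumption, it is not stated in the list). A quick sketch, with `max_power_watts` as a made-up helper name:

```python
# Assumption: all current is drawn at 12 V (not stated in the list above).
RAIL_VOLTS = 12.0


def max_power_watts(amps, volts=RAIL_VOLTS):
    """Upper bound on board power for a given current limit: P = V * I."""
    return volts * amps


# Current limits per series, as listed above.
limits = {"600 or less": 6.25, "700": 12.5, "800": 18.75, "900": 25.0}

for series, amps in limits.items():
    print(f"{series} series: <= {max_power_watts(amps):g} W")
```

Under that assumption the tiers land on 75 W steps, which lines up with the 150 W and 225 W figures mentioned in the quoted post.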

eh, the problem with this quote of yours,

A good way to prevent people from taking a lower end SKU and overclocking it to much higher performance is to limit the available power.
So for instance if some of the above speculation is true then Barts Pro with a power <150W could not be turned into Barts XT under overclock conditions and the same is true for Cayman Pro with <225W which cannot be turned into Cayman XT.

You don't limit the available power by leaving off another PCIe connector; you do it by cutting down the Vgpu. The lack of another PCIe connector (or the TDP figure) doesn't have much to do with how high you can go on air or on water; on LN2, maybe.

http://67.90.82.13/forums/showpost.php?p=3479660&postcount=1247
 
That 98.5% is an eerily precise number. I don't think a simulator could get you that much accuracy, and it's not as if that information would be leaked anyway.

That article from AT he cited is specious. We already know that ALUs are not a bottleneck for shaders most of the time. But die area and power? A bottleneck indeed. Think performance per mm2 or per W, not per clock.
 
Some guy at anandtech.com forums "leaked" that a 4d shader has 98.5% of the performance of a 5d and everyone seems to be running with it...

http://forums.anandtech.com/showpost.php?p=30402647&postcount=497

Not 100% sure that the above post is the first reference to the 98.5% performance but it is the first I saw. Also I don't ever recall that tempered guy having any sort of inside info... though I am not a member of AT so I can't really review his posts, I do read the GPU forum quite a bit for a laugh.
 