AMD: R9xx Speculation

rpg.314 · Nov 21, 2010

DarthShader said:
http://bbs.expreview.com/viewthread.php?tid=37918&from=recommend_f

Only 160GB/s bandwidth? Only 1250mhz memory clock?

Looks real but WEIRD at the same time. Basically same mem bw, 30 SIMDs but prolly organized as two clusters and SIMDs and TMUs decoupled all over again.
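
For anyone checking the numbers, the two figures quoted above are consistent with each other if the bus is 256 bits wide. That bus width is an assumption, since nobody in the thread states it. A rough sketch of the arithmetic follows; the function name is made up for illustration, and the Cypress figures (1200 MHz, 256-bit) are the public HD 5870 specs rather than anything from the slide:

```python
def gddr5_bandwidth_gb_s(memory_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s for a GDDR5 interface.

    GDDR5 moves 4 bits per pin per memory-clock cycle, so a 1250 MHz
    memory clock corresponds to the "5 GHz" effective rate.
    """
    data_rate_mtps = memory_clock_mhz * 4        # mega-transfers per second per pin
    bytes_per_transfer = bus_width_bits / 8      # whole bus, in bytes
    return data_rate_mtps * bytes_per_transfer / 1000  # MB/s -> GB/s


ASSUMED_BUS_WIDTH_BITS = 256  # assumption: not stated anywhere in the thread

rumoured_cayman = gddr5_bandwidth_gb_s(1250, ASSUMED_BUS_WIDTH_BITS)
cypress_hd5870 = gddr5_bandwidth_gb_s(1200, 256)  # public HD 5870 specs

print(rumoured_cayman)  # 160.0
print(round(cypress_hd5870, 1))
```

Under that assumption the rumoured card has only about 4% more raw bandwidth than Cypress, which is what "basically same mem bw" refers to.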

no-X · Nov 21, 2010

leoneazzurro said:
So either the slide is fake, or there is a typo, or the TMUs are "decoupled" now (à la R600)?

Maybe this change is somewhat related to the patent that Jawed linked some time ago?

SimBy · Nov 21, 2010

2 poly/clock?

Alexko · Nov 21, 2010

no-X said:
I think the number of TMUs is more interesting. R7xx/8xx had 4 TMUs per SIMD, but Cayman doesn't.

Well it would fit 24 SIMDs with 80 SPs each… but obviously that contradicts the slide itself. Weird. So either the TMUs are decoupled or it's yet another fake.

Jawed · Nov 21, 2010

Hmm, so this means ALU:TEX is 5:1 (in terms of cycles) rather than 4:1 as it has been for years now. So perhaps there's something in those patent applications that I've linked several times.

I expect this will be fine for games, 80 TMUs in Cypress seem to be wasted anyway.

Compute applications which depend on L1->ALU bandwidth might be a bit constrained. Though there's always the possibility that TEX->ALU bandwidth could be beefed up. If, as one of the patent applications seems to suggest, ALUs can write to the L1s, then that'll be more interesting...

2 polys per clock is definitely what we want to see.

After Barts's revealing that 16 ROPs ~ 32 ROPs as far as performance goes, I think it's reasonable to expect Cayman to be significantly more bandwidth efficient, and for 32 Cayman ROPs to be worth significantly more than 32 Cypress ROPs.

I can't see anything here that looks faked, and I'm cautiously optimistic it'll work out well...

One possible arrangement?:

30 SIMDs - each 16 ALUs with 64 ALU lanes
12 octo-TMUs - totalling 96 TMUs
Each set of 10 SIMDs has 4 octo-TMUs

Or?:

30 SIMDs - each 16 ALUs with 64 ALU lanes
12 octo-TMUs - totalling 96 TMUs
Each set of 15 SIMDs has 6 octo-TMUs

I dare say the latter accords with 2 polys per clock.
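
A quick sanity check of the counts in this post, as a sketch only. The Cayman numbers are the rumoured ones from the slide as read above; the Cypress row uses the known 20 SIMDs, 5 lanes per ALU, and 80 TMUs; the `Layout` class and `split_evenly` helper are hypothetical names invented for this example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    simds: int
    alus_per_simd: int
    lanes_per_alu: int
    tmus: int

    @property
    def alus(self) -> int:
        return self.simds * self.alus_per_simd

    @property
    def lanes(self) -> int:
        return self.alus * self.lanes_per_alu

    @property
    def alu_tex_ratio(self) -> float:
        # ALU instruction slots per texture fetch, per clock
        return self.alus / self.tmus


def split_evenly(simds: int, octo_tmus: int, sets: int) -> tuple[int, int]:
    """Return (SIMDs per set, octo-TMUs per set); fail if the split is uneven."""
    if simds % sets or octo_tmus % sets:
        raise ValueError("resources do not divide evenly into that many sets")
    return simds // sets, octo_tmus // sets


cypress = Layout(simds=20, alus_per_simd=16, lanes_per_alu=5, tmus=80)
cayman = Layout(simds=30, alus_per_simd=16, lanes_per_alu=4, tmus=12 * 8)  # rumoured

print(cypress.lanes, cypress.alu_tex_ratio)  # 1600 4.0
print(cayman.lanes, cayman.alu_tex_ratio)    # 1920 5.0
print(split_evenly(30, 12, sets=3))          # (10, 4)  first arrangement
print(split_evenly(30, 12, sets=2))          # (15, 6)  second arrangement
```

Both arrangements account for the same 30 SIMDs and 96 TMUs; they differ only in whether the chip is cut into three sets or two.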

fellix · Nov 21, 2010

The specs from this slide surprisingly coincide with my early prediction for Cayman's architectural layout here.

Mianca · Nov 21, 2010

leoneazzurro said:
So either the slide is fake, or there is a typo, or the TMUs are "decoupled" now (à la R600)?

The rumor about "decoupled" TMUs is rather old now. It came from the same source that indicated a 6 module architecture (with each module sharing 4x4 TMUs and 1 dedicated tessellator unit) ...

See my earlier post.

PSU-failure · Nov 21, 2010

Jawed said:
30 SIMDs - each 16 ALUs with 64 ALU lanes

12 octo-TMUs - totalling 96 TMUs

Each set of 15 SIMDs has 6 octo-TMUs

I dare say the latter accords with 2 polys per clock.

Makes sense, considering Barts probably has 2x8 SIMD, there would be the same macro-redundancy in Cayman (1 spare SIMD for each bank of 16).

I was also thinking AMD could deviate from Hemlock and use high yield, partly disabled dies (perfect if the die is big as it'll allow for $400-500-650 prices, or something similar).

Jawed · Nov 21, 2010

fellix said:
The specs from this slide surprisingly coincide with my early prediction for Cayman's architectural layout here.

My interpretation of the patent applications is that an octo-TMU can deliver 4 texturing results, based on 64-bit texels (e.g. fp16 RGBA texels), per clock to an ALU.

So one arrangement we might see for a shader engine, assuming 2 shader engines (quick and dirty photochop from your picture, fellix):

This consists of 3 clusters - each containing 5 SIMDs and 2 octo-TMUs. Each octo-TMU can deliver its results to any of the 5 pairs of ALUs aligned with it, delivering 2 quads of results to the respective ALU quads or a single quad of results to one or the other of the pair.
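
As a rough cross-check, the hierarchy described above multiplies out to the same chip totals as the slide. This is only a sketch: the constant names are invented for illustration, and scaling the "4 fp16 results per clock" reading of the patent applications up to a chip-wide rate is an extrapolation, not something the patents or the slide state:

```python
# Hierarchy as described in the post (speculative, not confirmed specs)
SHADER_ENGINES = 2
CLUSTERS_PER_ENGINE = 3
SIMDS_PER_CLUSTER = 5
OCTO_TMUS_PER_CLUSTER = 2
TMUS_PER_OCTO_TMU = 8

# Interpretation of the patent applications: 4 results per clock
# for 64-bit texels (e.g. fp16 RGBA) from each octo-TMU.
FP16_RESULTS_PER_OCTO_TMU_PER_CLOCK = 4


def chip_totals() -> dict[str, int]:
    """Multiply the per-cluster counts up to whole-chip totals."""
    clusters = SHADER_ENGINES * CLUSTERS_PER_ENGINE
    octo_tmus = clusters * OCTO_TMUS_PER_CLUSTER
    return {
        "simds": clusters * SIMDS_PER_CLUSTER,
        "octo_tmus": octo_tmus,
        "tmus": octo_tmus * TMUS_PER_OCTO_TMU,
        "fp16_rgba_results_per_clock": octo_tmus * FP16_RESULTS_PER_OCTO_TMU_PER_CLOCK,
    }


totals = chip_totals()
print(totals)
# {'simds': 30, 'octo_tmus': 12, 'tmus': 96, 'fp16_rgba_results_per_clock': 48}
```

So 2 engines x 3 clusters x 5 SIMDs gives the 30 SIMDs, and 2 x 3 x 2 octo-TMUs gives the 96 TMUs.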

rpg.314 · Nov 21, 2010

fellix said:
The specs from this slide surprisingly coincide with my early prediction for Cayman's architectural layout here.

But do 2 tris/clk fit with that too? If so, how?

fellix · Nov 21, 2010

rpg.314 said:
But do 2 tris/clk fit with that too? If so, how?

One triangle per clock fits well with Cypress' two SIMD blocks, so what's the trouble here?
Just making an analogy.

no-X · Nov 21, 2010

It would be 50% more efficient - now 1 of 2 rasterizers is nearly always idling. With 2 setups and 3 rasterizers, only one of them would be idling...

DavidGraham · Nov 21, 2010

Now we can safely say that HD 5970 will be at least 30% more powerful than HD 5870.

Wirmish · Nov 21, 2010

Maybe it's another fake... :???:

... maybe not.

· Difference in size between the "3" and the "0".
· Different font for the "30" and the "32".
· GDDR5 @ 5 GHz seems too slow especially since 2Gb @ 6 GHz is available from Hynix (H5GQ2H24MFR-R0C), Samsung (K4G20325FC-HC03), and Elpida (EDW2032BABG-60-F).
· Number of TMUs vs SIMD seems a little strange.

gongo · Nov 21, 2010

That is a lot of SPs! Is it still 4+1? I guess more SPs are needed for MLAA with only 32 ROPs. I wonder why they did not bump up Barts' SP count; it looks like a perf gap will form between Cayman and Barts. If the 2GB VRAM is true... on to the bandwidth: I know AMD's recent GPUs are not bandwidth limited. I guess they are designed with GDDR5 limitations in mind, or will bandwidth finally hold back Cayman's massive SP count? I think Cayman will be powerful... $499?

SimBy · Nov 21, 2010

Wirmish said:
Maybe it's another fake... ... maybe not.

Maybe, maybe not. I guess it's a photo of a presentation slide on a projection screen, taken from an angle, so that may very well be the reason for the things you pointed out.

neliz · Nov 21, 2010

SimBy said:
Maybe, maybe not. I guess it's a photo of a presentation slide on a projection screen, taken from an angle, so that may very well be the reason for the things you pointed out.

Lol, zeroes don't shrink when they come closer

Tridam · Nov 21, 2010

It's a fake based on the slide #72 exposed at the event in LA last month.

SimBy · Nov 21, 2010

Good call then Wirmish.

caveman-jim · Nov 21, 2010

neliz said:
I think everyone seems to be happy with a 5-day lead on the cards (including the weekend), just like the GTX 580.

The full time guys are, the part time guys want 2 weeks.
