AMD: R8xx Speculation

mao5 · Aug 19, 2009

Chiphell insists it's a dual-core design

neliz · Aug 19, 2009

mao5 said:
Chiphell insists it's a dual-core design

Much like they insisted R600 would wh00p G80's butt?

Jawed · Aug 19, 2009

Jawed said:
each 5-way SIMD is 16 strands wide (just like now) but the ALU:TEX is doubled to 8:1 - this leads to a massive increase in compute density as the TUs currently cost ~29% of a cluster's area, and this would reduce that penalty to a mere ~17%. Put another way that would double per cluster compute for 71% more area.

If each TU is single-cycle fp16, that means each TU could be ~70% bigger. This huge increase in TU area might be motivation to increase ALU:TEX.

If ALUs also get 10% bigger per SIMD (say for new features) and the redundancy/LDS section gets 40% bigger per SIMD (mostly due to LDS doubling in size), then I estimate that the cluster size, overall, will grow by about 110% (before the 40nm shrink).

After 40nm shrink that would be about 11.9mm² per cluster, or about 11% bigger than on RV770. In this scenario the TU would be about 24% of the cluster, which represents a small saving in comparison with RV770.
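For anyone who wants to check the arithmetic, here is a rough sketch of the estimate above. The percentages are the ones quoted in the post; the 55nm starting node for RV770 and ideal area scaling to 40nm are assumptions added for illustration, and the function names are made up.

```python
# Inputs are the post's own estimates. The 55nm -> 40nm ideal area scaling
# is an assumption added here for illustration, not a figure from the post.

TU_SHARE_RV770 = 0.29   # TUs as a fraction of an RV770 cluster's area
TU_GROWTH = 1.70        # a single-cycle fp16 TU assumed ~70% bigger
CLUSTER_GROWTH = 2.10   # cluster grows ~110% before the shrink


def doubled_alu_cluster(tu_share):
    """Double everything except the TUs; return (relative area, new TU share)."""
    non_tu = 1.0 - tu_share
    area = tu_share + 2.0 * non_tu
    return area, tu_share / area


def ideal_shrink(old_nm, new_nm):
    """Area scaling factor between two nodes, assuming perfect scaling."""
    return (new_nm / old_nm) ** 2


area, share = doubled_alu_cluster(TU_SHARE_RV770)
print(f"8:1 cluster: {area - 1:.0%} more area, TUs are {share:.0%} of it")

big_tu_share = TU_SHARE_RV770 * TU_GROWTH / CLUSTER_GROWTH
print(f"With 70% bigger TUs: TUs are {big_tu_share:.1%} of the cluster")

after_shrink = CLUSTER_GROWTH * ideal_shrink(55, 40)
print(f"Cluster after shrink: {after_shrink - 1:.0%} bigger than on RV770")
```

Under those assumptions the numbers line up with the post: 71% more area and a ~17% TU share for the plain 8:1 option, then roughly 24% TU share and an ~11% bigger cluster after the shrink for the big-TU option.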

So, ahem, guesstimating:

Juniper 181mm² - 5 clusters, 800 ALUs, 20 TUs, 16 RBEs, 128-bit
Cedar 220mm² - 8 clusters, 1280 ALUs, 32 TUs, 16 RBEs, 128-bit
Cypress 316mm² - 12 clusters, 1920 ALUs, 48 TUs, 32 RBEs, 256-bit
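
The unit counts in that list follow directly from the cluster layout being speculated about (two 16-strand, 5-way SIMDs sharing one quad-TU). A minimal sketch, where the `Cluster` class and `chip` helper are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    simds: int = 2      # "dual-shader": two SIMDs per cluster (speculation)
    strands: int = 16   # each SIMD is 16 strands wide
    lanes: int = 5      # 5-way ALUs
    tus: int = 4        # one shared quad-TU

    @property
    def alus(self):
        return self.simds * self.strands * self.lanes

    @property
    def alu_tex_ratio(self):
        # Ratio of strands to TUs, i.e. the "8:1" in the post
        return self.simds * self.strands // self.tus


def chip(clusters, cluster=Cluster()):
    """Total ALUs and TUs for a chip built from identical clusters."""
    return {"alus": clusters * cluster.alus, "tus": clusters * cluster.tus}


for name, n in [("Juniper", 5), ("Cedar", 8), ("Cypress", 12)]:
    print(name, chip(n))
```

This only reproduces the ALU and TU columns; the die sizes, RBE counts and bus widths are separate guesses that do not fall out of the cluster count.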

Jawed

Fusion · Aug 19, 2009

According to the latest rumors, the launch of AMD's ATI RV870 flagship graphics card is imminent (sometime next month).

On an Asian bulletin board, it seems that the RV870 has surfaced with specs and all. It supposedly comes with a 384-bit memory interface, but that info seems shady and somewhat inaccurate, so we'll leave it as is.

Other colleagues already published a 3DMark Vantage score in the performance preset for the mainstream solution RV840 (Juniper) a few days ago. That chip reached P9500, though they did not give any further details about the test system. That places it between the HD 4850 and HD 4870.

The same source also has a result for the flagship product RV870 (Cypress) on hand. They did not disclose an exact performance number, only a range between P16000 and P18000, which is almost twice the RV840 and passes the HD 4870 X2.

* Cypress/RV870: P16000-P18000
* Juniper/RV840: P95xx
* Redwood/RV830: P46xx

Though 3DMark says very little, it's safe to assume that the new flagship card will double performance compared to the last generation.

http://www.guru3d.com/news/ati-rv870-scores-p18000-in-vantage/

CarstenS · Aug 19, 2009

Jawed said:
"Dual-shader" might mean that one cluster contains two 5-way SIMDs, both of which share a quad-TU. This provides two options:

* each 5-way SIMD is 8 strands wide, i.e. a thread is 32 wide - this improves branch incoherence penalties substantially
* each 5-way SIMD is 16 strands wide (just like now) but the ALU:TEX is doubled to 8:1 - this leads to a massive increase in compute density as the TUs currently cost ~29% of a cluster's area, and this would reduce that penalty to a mere ~17%. Put another way that would double per cluster compute for 71% more area.

Jawed

Maybe it just refers to the direly needed dual-setup-unit, which is indicated by the double-complex scheduler?

Jawed · Aug 19, 2009

CarstenS said:
Maybe it just refers to the direly needed dual-setup-unit, which is indicated by the double-complex scheduler?

You mean the AAAABBBB instruction-issue?

Jawed

Jawed · Aug 19, 2009

http://techpulse360.com/2009/08/12/...en-graphics-chips-you-wont-believe-your-eyes/

[Update 2] AMD will host Evergreen's official launch on the aircraft carrier U.S.S. Hornet, moored in Alameda, Calif.

CarstenS · Aug 19, 2009

Jawed said:
You mean the AAAABBBB instruction-issue?

Jawed

Not sure what you mean, so I cannot tell if I mean the same.

no-X · Aug 19, 2009

Jawed said:
So, ahem, guesstimating:

Juniper 181mm² - 5 clusters, 800 ALUs, 20 TUs, 16 RBEs, 128-bit

Cedar 220mm² - 8 clusters, 1280 ALUs, 32 TUs, 16 RBEs, 128-bit

Cypress 316mm² - 12 clusters, 1920 ALUs, 48 TUs, 32 RBEs, 256-bit

Jawed

Your Juniper specs are very RV670-like. The only significant difference is the higher number of ALUs. I can't believe that Juniper could be the same size as RV670, given that RV670 is on a 1.9x bigger manufacturing process...

Jawed · Aug 19, 2009

CarstenS said:
Not sure what you mean, so I cannot tell if I mean the same.

Two threads are issued on the ALUs, with thread A issued as a single instruction over four cycles AAAA, then thread B takes its turn. So the SIMD looks like it's executing two threads at the same time.
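
A toy illustration of that issue order. The four cycles are taken here as thread width divided by SIMD width (a 64-wide thread on a 16-strand SIMD), which is inferred from the 8-strand/32-wide example earlier in the thread; the function names are made up for the sketch.

```python
def cycles_per_instruction(thread_width=64, simd_width=16):
    """Cycles needed to push one instruction for a whole thread through the SIMD."""
    return thread_width // simd_width


def issue_pattern(threads=("A", "B"), instructions=2, cycles=None):
    """Return the per-cycle issue order as a string, one letter per cycle."""
    if cycles is None:
        cycles = cycles_per_instruction()
    order = []
    for _ in range(instructions):
        for thread in threads:
            # One instruction from this thread occupies the SIMD for `cycles` cycles
            order.append(thread * cycles)
    return "".join(order)


print(issue_pattern())  # AAAABBBBAAAABBBB
```

Each thread only gets a new instruction every eight cycles, which is what hides the ALU pipeline latency while the SIMD appears to run both threads at once.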

Jawed

Jawed · Aug 19, 2009

no-X said:
Your Juniper specs are very RV670-like. The only significant difference is the higher number of ALUs. I can't believe that Juniper could be the same size as RV670, given that RV670 is on a 1.9x bigger manufacturing process...

Well, there's a black hole labelled D3D11 that seems to be sucking up die - it's just a guess...

For what it's worth, I'm wary of these big TUs, just because they have such a large hit on 8-bit performance. But, ahem, ATI tried once before. The question is: when does the tipping point come that makes them worthwhile?

Jawed

no-X · Aug 19, 2009

I'd like to know what the major cause of R600's TMU performance is (compared to today's TMUs): the native FP16 support or the point samplers. I'm still thinking about the low (60%) performance difference between HD2900XT and HD4890 in current games without FSAA.

mczak · Aug 19, 2009

Jawed said:
For what it's worth, I'm wary of these big TUs, just because they have such a large hit on 8-bit performance. But, ahem, ATI tried once before. The question is: when does the tipping point come that makes them worthwhile?

Maybe never, because at that point it would make more sense to just use the shader ALUs for filtering?

trinibwoy · Aug 19, 2009

I think pretty much any single-cycle FP16 implementation would have to provide double throughput for INT8, right? Otherwise that's a lot of wasted space.

Jawed · Aug 19, 2009

trinibwoy said:
I think pretty much any single-cycle FP16 implementation would have to provide double throughput for INT8, right? Otherwise that's a lot of wasted space.

That's why I've been querying how expensive it is to meet the filtering precision specification of D3D11 for 8-bit textures. Single-cycle fp16 TUs are 70% more expensive in R600, apparently. Maybe less (if there is some overhead associated with process/library from back then).

Jawed

fellix · Aug 19, 2009

It is not just the bigger tex units for the 16-bit compute lanes, but what about the texture cache and load bandwidth for sampling etc.? This must also be accounted for in the transistor budget.
Anyway, from that perspective, should we expect some advancement in AF quality this time?

Davros · Aug 19, 2009

@jawed
Why do you care about 8-bit textures in this day and age?

Jawed · Aug 19, 2009

fellix said:
It is not just the bigger tex units for the 16-bit compute lanes, but what about the texture cache and load bandwidth for sampling? This must also be accounted for in the transistor budget.

The bandwidth is already there. 128 bits of unfiltered data are as fast as 32 bits. L1 is very small, so doubling it is hardly a world of pain - I think RV770 doubled L1 per cluster in comparison with RV670, so for performance parity against the per unit capability of RV670 the cache is already there.

fellix said:
Anyway, in that perspective, should we expect some advancement in AF quality this time?

I guess so, but I don't know what D3D11's filtering specifications cover.

Going back to no-X's comment, Juniper appears to be RV730's replacement. HD4670 never offered a compelling alternative to HD3870 in terms of absolute performance as far as I can tell - not enough bandwidth - maybe drivers have changed things since its launch?

RV730's 32 TUs were squandered, though it has pretty much the same fp16 filtering rate as HD3870 (does that imply that fp16 rate is more important at the low end?). HD3870 has the stupid 2x Z rate, squandering its bandwidth.

So, ahem, in comparison with RV670, putting 25% more 8-bit and 16-bit texturing into Juniper would still be a performance win, assuming that GDDR5 is there to give some tasty extra bandwidth.

RV740->Cedar, that's more tricky as my speculation says there's only a doubling in fp16 rate and a doubling in ALUs.

Jawed

Jawed · Aug 19, 2009

Davros said:
@jawed
Why do you care about 8-bit textures in this day and age?

Pretty much all of what you see on surfaces in games originates as 8-bit textures.

Jawed

CarstenS · Aug 19, 2009

Jawed said:
Two threads are issued on the ALUs, with thread A issued as a single instruction over four cycles AAAA, then thread B takes its turn. So the SIMD looks like it's executing two threads at the same time.
Jawed

Thanks, but no, that's not what I meant. I was merely referring to the "rumor" that started with Olick's presentation on id Tech 6 and wormed its way through doubled ROP-count on all DX11 chips compared to their DX10 predecessors (which may very well be more than just a rumor).

no-X said:
I'd like to know what the major cause of R600's TMU performance is (compared to today's TMUs): the native FP16 support or the point samplers. I'm still thinking about the low (60%) performance difference between HD2900XT and HD4890 in current games without FSAA.

Be warned, though CB is using the same driver this time, they'd have to revert to medium details quite often in order to let the X800 and 6800 compete at all.

Jawed said:
That's why I've been querying how expensive it is to meet the filtering precision specification of D3D11 for 8-bit textures. Single-cycle fp16 TUs are 70% more expensive in R600, apparently. Maybe less (if there is some overhead associated with process/library from back then).
Jawed

Would the TA have to be beefed up significantly, or just a bit, in order to support the required 16k textures?

