AMD: R8xx Speculation

neliz · Aug 18, 2009

Forrest said:
RV870 (Cypress)

SIMD : 20
Shader Clock : 850
Memory Clock : 1200
Bandwidth : 153Gbps
1024Gb GDDR5

enjoy

Oh dear, that doesn't sound right... Someone took the 1600 SPs and connected them to 1.2GHz GDDR5. What are we to do now?

w0mbat · Aug 18, 2009

I'm already enjoying it! Like a cold shower.

DegustatoR · Aug 18, 2009

Forrest said:
SIMD : 20

1200/20 = 60, 60 < 80
1280/20 = 64, 64/5=12,8 -)
The way I see it, 20 SIMDs are only possible with 1600+ SPs.
For 1200 or 1280 SPs the number of SIMDs will be less than 20, probably in the 10-16 range.

neliz · Aug 18, 2009

DegustatoR said:
1200/20 = 60, 60 < 80
1280/20 = 64, 64/5=12,8 -)
The way I see it, 20 SIMDs are only possible with 1600+ SPs.
For 1200 or 1280 SPs the number of SIMDs will be less than 20, probably in the 10-16 range.

RV770 had 800SP in 10 SIMD units.

DegustatoR · Aug 18, 2009

neliz said:
RV770 had 800SP in 10 SIMD units.

So?
The number of SPs per SIMD probably won't be less than in RV770.
So 20 SIMDs for RV870 will give us 1600+ SPs. And if we're lucky, with 120 SPs per SIMD (a 50% increase over RV770) we'll get 2400 SPs.
But that would mean that Juniper has 1200 of them, nearly twice as many as RV740, wouldn't it?
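The arithmetic in this exchange can be sketched in a few lines. This is only an illustration: it assumes each SIMD is built from five-wide ALU units, as on RV770, and the helper name `simd_layout` is made up for the example.

```python
# SP-per-SIMD arithmetic from the posts above.
# Assumption: SIMDs are built from 5-wide ALU units, as on RV770
# (800 SPs / 10 SIMDs = 80 SPs = 16 five-wide units per SIMD).
VLIW_WIDTH = 5


def simd_layout(total_sps, simds):
    """Return (SPs per SIMD, five-wide units per SIMD) as floats."""
    per_simd = total_sps / simds
    return per_simd, per_simd / VLIW_WIDTH


rv770_per_simd, _ = simd_layout(800, 10)

for total in (1200, 1280, 1600, 2400):
    per_simd, units = simd_layout(total, 20)
    notes = []
    if not units.is_integer():
        notes.append("not a whole number of five-wide units")
    if per_simd < rv770_per_simd:
        notes.append("narrower than an RV770 SIMD")
    print(total, per_simd, units, "; ".join(notes) or "plausible")
```

With 20 SIMDs, only the 1600 and 2400 SP cases come out at least as wide as an RV770 SIMD with a whole number of five-wide units, which is the point being argued.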

sc3252 · Aug 18, 2009

Forrest said:
RV870 (Cypress)

SIMD : 20
Shader Clock : 850
Memory Clock : 1200
Bandwidth : 153Gbps
1024Gb GDDR5

enjoy

The only thing I am worried about is the memory bandwidth... wow, they really went cheap this time, only 19GB/s. Of course, I bet that is just a typo.

If this spec is true, would having ~150GB/s really hinder performance? Since the 4850 has a little more than half the bandwidth of the 4870 and yet still performs fairly well, I could imagine this having very little performance impact. I mean, really, what is everyone expecting? Pairing it with anything faster would cost a pretty penny, something AMD seems to be against now.

Jawed · Aug 18, 2009

I've mused before that the TUs might return to R600-style single-cycle fp16 processing. I'm now wondering if Evergreen unifies TUs and RBEs.

One of the key points of D3D11 is that resources are written/read more fluidly than ever before. Additionally, some of the addressing math for textures and render targets is the same as well as some of the blending. A lot of the fluidity comes from compute shader, but pixel shading provides for writing/reading resources.

So, what if each cluster now contains a combined TU/RBE?

In typical game situations the two are not normally both running flat-out - only some dodgy synthetics from yesteryear work that way. The functional overlap isn't really the biggest deal - to me it's more important the way that memory is more of a write/read resource in D3D11, whereas prior versions kept writing and reading as separate passes.

Jawed

Thowllly · Aug 18, 2009

Forrest said:
RV870 (Cypress)

SIMD : 20
Shader Clock : 850
Memory Clock : 1200
Bandwidth : 153Gbps
1024Gb GDDR5

enjoy

Only 19GB/s of BW (32-bit bus?), as sc3252 pointed out, but on the other hand it has 128GB of RAM!
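For reference, the unit jokes above work out as follows. This is a rough sketch: it relies on GDDR5 moving four bits per pin per memory-clock cycle, and the 256-bit bus width is an assumption that does not appear in the leaked spec. The function name is made up for the example.

```python
def gddr5_bandwidth_gb_s(memory_clock_mhz, bus_width_bits):
    """Peak bandwidth in GB/s for a GDDR5 interface."""
    # GDDR5 transfers 4 bits per pin per memory-clock cycle.
    transfers_per_second = memory_clock_mhz * 1e6 * 4
    return transfers_per_second * bus_width_bits / 8 / 1e9


# Reading the leaked units literally (bits, not bytes):
print(153 / 8)    # "153Gbps" in GB/s
print(1024 / 8)   # "1024Gb" in GB

# The same 1200 MHz memory clock on two bus widths:
print(gddr5_bandwidth_gb_s(1200, 32))    # the "32-bit bus?" joke
print(gddr5_bandwidth_gb_s(1200, 256))   # assumed 256-bit bus
```

The 32-bit case lands at about 19 GB/s and the assumed 256-bit case at about 153.6 GB/s, so the leaked figure is most likely GB/s with the wrong unit typed.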

Squilliam · Aug 18, 2009

Wait, is this legit?

neliz · Aug 18, 2009

Squilliam said:
Wait, is this legit?

no.. this is legit:

Code:

Cypress ~P16xxx - P17xxx - P18xxx
Juniper XT ~P95xx
Redwood ~P46xx

CarstenS · Aug 18, 2009

Jawed said:
AMD has gone on record saying this is the most radical change since R600...

Well, it'd better be, else no DX11, right?

mboeller · Aug 18, 2009

What about MIMD?

According to Ailuros, one of the advantages of the SGX compared to other GPUs is its MIMD core, because performance is higher, especially with small triangles. With the "new" tessellation in DX11, maybe AMD thought that they need MIMD to get higher performance with small triangles.

Ailuros · Aug 18, 2009

It would be too much trouble to explain to the world that the so far "superscalar" units aren't as superscalar as the new ones

***edit: on a more serious note, I'm not so sure AMD/NVIDIA really need MIMD units for the time being; it seems to me that changes like that might come in the more distant future, if they haven't come up with an even more efficient idea in the meantime.

rjc · Aug 18, 2009

neliz said:
no.. this is legit:

Code:

Cypress ~P16xxx - P17xxx - P18xxx
Juniper XT ~P95xx
Redwood ~P46xx

Compared to the 4670 at launch, vr-zone got around ~P35XX (it probably scores higher now, but I can't find a recent bench). There is a chance the above Redwood is configured with GDDR5 also.

Edit:
Looking at the mobile GPU power profiles posted further back in this thread, Redwood = Madison:

Madison HD5750M 20-30W
Madison HD5730M 20-25W
Madison HD5650M 15-20W

Compare to:
RV730 HD4670M 28-30W
RV730 HD4650M 12-25W

That looks to be potentially an ~30% performance improvement in the same power profile.
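A quick check of that ~30% figure from the two 3DMark scores quoted above. This is a sketch under stated assumptions: the "xx" digits are unknown, so each score is treated as a range (P4600-P4699 and P3500-P3599), and the helper name is made up for the example.

```python
def improvement_pct(new_score, old_score):
    """Percentage improvement of new_score over old_score."""
    return (new_score / old_score - 1) * 100


# Assumption: "~P46xx" and "~P35XX" could be anywhere in these ranges.
redwood_low, redwood_high = 4600, 4699
hd4670_low, hd4670_high = 3500, 3599

nominal = improvement_pct(redwood_low, hd4670_low)
worst = improvement_pct(redwood_low, hd4670_high)
best = improvement_pct(redwood_high, hd4670_low)

print(f"nominal {nominal:.1f}%, range {worst:.1f}% to {best:.1f}%")
```

Whichever end of the ranges applies, the gain stays roughly between 28% and 34%, consistent with the estimate in the post.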

gamervivek · Aug 18, 2009

Forrest said:
RV870 (Cypress)

SIMD : 20
Shader Clock : 850
Memory Clock : 1200
Bandwidth : 153Gbps
1024Gb GDDR5

enjoy

hudd mama hudd.

trinibwoy · Aug 18, 2009

neliz said:
no.. this is legit:

Code:

Cypress ~P16xxx - P17xxx - P18xxx
Juniper XT ~P95xx
Redwood ~P46xx

Cypress ~HD 4870 X2
Juniper ~HD 4870
Redwood ~HD 3870

According to the ORB. Toss in the rumoured Hemlock and Trillian parts and.....

neliz · Aug 18, 2009

Hemlock:

http://www.chiphell.com/uploadfile/2009/0818/20090818094853224.jpg

6-pin + 8-pin power connectors, and the PCB is a little bit longer than RV870's.

Cypress pic was here:

http://www.chiphell.com/uploadfile/2009/0728/20090728094608914.png

Edit: Fan P/N is exactly the same as on the HD2900XT: 7121030500G

rpg.314 · Aug 18, 2009

trinibwoy said:
Cypress ~HD 4870 X2
Juniper ~HD 4870
Redwood ~HD 3870

According to the ORB. Toss in the rumoured Hemlock and Trillian parts and.....

So they indeed managed to double the perf/$. Assuming Cypress sells at $300, of course. Cool.

TimothyFarrar · Aug 18, 2009

Jawed said:
I'm now wondering if Evergreen unifies TUs and RBEs.

Hello Jawed. If they went this way, it would be interesting to see how they handle general RT read/write access (UAVs).

Typically one would assume that RTs are divided into tiles and those tiles are distributed across the RBEs (ROPs) with a fixed mapping per RT. So the raster stage just sends out fragments to the ALUs associated with the RBEs for the fragments' destination tile, with both TU and RBE local to the cluster. It seems like the fixed ALU/RBE tile mapping needs to stay just to ensure draw ordering.

Given a unified TU/RBE, random RT access implies read/write from non-local RBEs (or local fetch and some crazy "tile" cache coherency, which I'm guessing is highly unlikely).

Almost seems like a good idea to just distribute both RBE and TU access (distribute tiles) across the chip similar to how global access is distributed to MCs ... except for various problems like TU filtering requires neighboring texels! So TU access probably stays local with read only cache.

How do you see a unified TU/RBE working?
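The locality problem described in this post can be illustrated with a toy model. Everything here is hypothetical: the 8x8 tile size, the count of four RBEs (one per cluster), the checkerboard mapping, and the function names are assumptions for the sketch, not anything known about the hardware.

```python
TILE_SIZE = 8   # assumed tile edge in pixels
NUM_RBES = 4    # hypothetical RBE count, one RBE per cluster


def rbe_for_pixel(x, y):
    """Fixed mapping: interleave tiles across RBEs in a checkerboard."""
    tile_x, tile_y = x // TILE_SIZE, y // TILE_SIZE
    return (tile_x + tile_y) % NUM_RBES


def local_fraction(cluster, width, height):
    """Fraction of render-target pixels whose RBE sits in `cluster`."""
    local = sum(
        1
        for y in range(height)
        for x in range(width)
        if rbe_for_pixel(x, y) == cluster
    )
    return local / (width * height)


# Rasterized fragments are always routed to the cluster that owns
# their tile, so ordinary draws stay local. A shader doing random
# RT access from cluster 0 only hits its own RBE this often:
print(local_fraction(0, 64, 64))
```

In this toy setup only a quarter of arbitrary render-target addresses are local to a given cluster, so random read/write access would mostly have to reach non-local RBEs.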

pjbliverpool · Aug 18, 2009

trinibwoy said:
Cypress ~HD 4870 X2
Juniper ~HD 4870
Redwood ~HD 3870

According to the ORB. Toss in the rumoured Hemlock and Trillian parts and.....

I may be completely wrong, but I thought Juniper was the highest-end single-chip solution and Cypress was going to be a dual-chip solution?

If that's the case, then what makes this better than the previous generation?

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup?

Within 1 or 2 weeks

Within a month

Within couple months

Very late this year

Not until next year
