AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to ATI's upcoming RV870 lineup?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within couple months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
RV870 (Cypress)

SIMD : 20
Shader Clock : 850
Memory Clock : 1200
Bandwidth : 153Gbps
1024Gb GDDR5

enjoy :)

Oh dear, that doesn't sound right... someone took the 1600 SPs and connected them to 1.2 GHz GDDR5. What are we to do now?
 
1200/20 = 60, and 60 < 80 (RV770's SPs per SIMD)
1280/20 = 64, and 64/5 = 12.8, which isn't a whole number of 5-wide units ;-)
The way I see it, 20 SIMDs are only possible with 1600+ SPs.
For 1200 or 1280 SPs the number of SIMDs will be less than 20, probably in the 10-16 range.
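The divisibility argument above can be checked mechanically. This is a minimal sketch, assuming (as the post does) RV770-style SIMDs built from whole 5-wide units with at least 80 SPs each; `fits` and `valid_simd_counts` are hypothetical helpers for illustration, not anything from AMD.

```python
# Sketch: which total SP counts can be split across a given number of SIMDs.
# Assumptions (not confirmed specs): each SIMD is made of whole 5-wide VLIW
# units, as in RV770, and holds at least 80 SPs (RV770: 800 SPs / 10 SIMDs).
VLIW_WIDTH = 5          # SPs per VLIW unit, RV770-style
MIN_SPS_PER_SIMD = 80   # RV770: 800 SPs in 10 SIMDs


def fits(total_sps: int, simds: int = 20) -> bool:
    """True if total_sps divides evenly into `simds` SIMDs made of whole
    5-wide units, with no fewer SPs per SIMD than RV770."""
    if total_sps % simds:
        return False
    per_simd = total_sps // simds
    return per_simd % VLIW_WIDTH == 0 and per_simd >= MIN_SPS_PER_SIMD


def valid_simd_counts(total_sps: int, lo: int = 10, hi: int = 20) -> list[int]:
    """SIMD counts in [lo, hi] that satisfy fits()."""
    return [s for s in range(lo, hi + 1) if fits(total_sps, s)]


for sps in (1200, 1280, 1600, 2400):
    print(sps, sps / 20, fits(sps), valid_simd_counts(sps))
```

Under these assumptions 1200 SPs only fit 10, 12 or 15 SIMDs and 1280 SPs only fit 16, which matches the 10-16 range guessed above.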

RV770 had 800 SPs in 10 SIMD units.
 
RV770 had 800 SPs in 10 SIMD units.
So?
The number of SPs per SIMD probably won't be lower than in RV770.
So 20 SIMDs for RV870 would give us 1600+ SPs. And if we're lucky and get 120 SPs per SIMD (a 50% increase over RV770), we'd get 2400 SPs.
But that would mean Juniper has 1200 of them, nearly twice as many as RV740, wouldn't it?
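A quick sketch of the same arithmetic, assuming 20 SIMDs for Cypress and that Juniper is exactly half a Cypress (both are rumours here, not confirmed specs); RV740 is taken as its 640 SPs.

```python
# Totals discussed above. Assumptions (not confirmed specs): 20 SIMDs for
# Cypress, and Juniper being exactly half of Cypress.
RV740_SPS = 640  # RV740 (HD 4770): 640 SPs


def total_sps(simds: int, sps_per_simd: int) -> int:
    return simds * sps_per_simd


cypress_80 = total_sps(20, 80)     # RV770-sized SIMDs
cypress_120 = total_sps(20, 120)   # +50% SPs per SIMD
juniper = cypress_120 // 2         # hypothetical "half a Cypress"

print(cypress_80, cypress_120, juniper, juniper / RV740_SPS)
```

This prints 1600 2400 1200 1.875, so a half-Cypress Juniper would carry roughly 1.9x the SPs of RV740.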
 
RV870 (Cypress)

SIMD : 20
Shader Clock : 850
Memory Clock : 1200
Bandwidth : 153Gbps
1024Gb GDDR5

enjoy :)

The only thing I'm worried about is the memory bandwidth... wow, they really went cheap this time, only 19 GB/s. Of course, I bet that's just a typo.

If this spec is true, would ~150 GB/s really hinder performance? Since the 4850 has a little more than half the bandwidth of the 4870 and still performs fairly well, I could imagine this having very little performance impact. I mean, what is everyone really expecting? Pairing it with anything faster would cost a pretty penny, and that's something AMD seems to be against now.
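For reference, peak bandwidth is memory clock × transfers per clock × bus width / 8. A minimal sketch, assuming a 256-bit bus (the leak doesn't state the width) and the usual 4 transfers per quoted clock for GDDR5 and 2 for GDDR3; the HD 4850/4870 figures use their stock memory clocks.

```python
def bandwidth_gb_s(mem_clock_mhz: float, transfers_per_clock: int,
                   bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


# Assumption: 256-bit bus for the rumoured part, as on RV770.
cypress = bandwidth_gb_s(1200, 4, 256)  # GDDR5: 4 transfers per quoted clock
hd4870 = bandwidth_gb_s(900, 4, 256)    # GDDR5 at 900 MHz
hd4850 = bandwidth_gb_s(993, 2, 256)    # GDDR3: 2 transfers per clock

print(round(cypress, 1), round(hd4870, 1), round(hd4850, 1))
print(round(hd4850 / hd4870, 2))  # 4850 as a fraction of 4870
print(round(153 / 8, 1))          # "153Gbps" read literally as bits -> GB/s
```

Under these assumptions 1200 MHz GDDR5 gives about 153.6 GB/s, so "153Gbps" reads like GB/s with the wrong unit; taken literally as bits it would be the ~19 GB/s mentioned above. The 4850 lands at about 55% of the 4870's bandwidth.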
 
I've mused before that the TUs might return to R600-style single-cycle fp16 processing. I'm now wondering if Evergreen unifies TUs and RBEs.

One of the key points of D3D11 is that resources are written and read more fluidly than ever before. Additionally, some of the addressing math for textures and render targets is the same, as is some of the blending. A lot of the fluidity comes from compute shader, but pixel shading also supports writing/reading resources.

So, what if each cluster now contains a combined TU/RBE?

In typical game situations the two are not normally both running flat-out - only some dodgy synthetics from yesteryear work that way. The functional overlap isn't really the biggest deal - what matters more to me is that memory is more of a write/read resource in D3D11, whereas prior versions kept writing and reading as separate passes.

Jawed
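To make the functional overlap concrete: bilinear filtering (TU) and standard src-alpha blending (RBE) are both built from the same linear interpolation, and a texture fetch and a render-target write can share the same address math. A minimal sketch, assuming a hypothetical 8x8 tiled layout and straight alpha blending; it illustrates the idea only and says nothing about Evergreen's actual hardware.

```python
# Hypothetical illustration of TU/RBE overlap; tile size, layout and blend
# mode are assumptions, not a description of real hardware.

def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def bilinear(t00, t10, t01, t11, fx, fy):
    """TU side: bilinear filter = two horizontal lerps, then a vertical lerp."""
    return lerp(lerp(t00, t10, fx), lerp(t01, t11, fx), fy)


def alpha_blend(src, dst, alpha):
    """RBE side: src*alpha + dst*(1 - alpha) is the same lerp."""
    return lerp(dst, src, alpha)


def tiled_offset(x, y, width_in_tiles, tile=8, bytes_per_pixel=4):
    """Byte offset of pixel/texel (x, y) in a hypothetical tiled surface;
    the same math could serve a texture fetch and a render-target write."""
    tx, ty = x // tile, y // tile
    ix, iy = x % tile, y % tile
    tile_index = ty * width_in_tiles + tx
    return (tile_index * tile * tile + iy * tile + ix) * bytes_per_pixel


print(bilinear(0.0, 1.0, 0.0, 1.0, 0.5, 0.5), alpha_blend(1.0, 0.0, 0.25))
print(tiled_offset(9, 1, width_in_tiles=4))
```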
 
What about MIMD?

According to Ailuros, one of the SGX's advantages over other GPUs is its MIMD core, which delivers higher performance, especially with small triangles. With the "new" tessellation in DX11, maybe AMD thought they needed MIMD for higher performance with small triangles.
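A toy model of why batch width matters for small triangles. It loudly assumes that each batch holds fragments from only one triangle, and compares a 64-wide batch (RV770's wavefront size) with an idealised width of 1 standing in for fully independent MIMD pipes. Real hardware can pack work differently, so only the trend is meaningful.

```python
import math

# Toy model (assumption): a triangle of N pixels occupies ceil(N / width)
# batches of `width` lanes, with no packing of several triangles per batch.


def lane_utilization(pixels: int, width: int) -> float:
    """Fraction of SIMD lanes doing useful work for one triangle."""
    batches = math.ceil(pixels / width)
    return pixels / (batches * width)


for px in (4, 16, 64, 100, 1000):
    print(px, lane_utilization(px, 64), lane_utilization(px, 1))
```

In this model a 4-pixel triangle keeps only 1/16 of a 64-wide batch busy, while the width-1 case stays fully used.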
 
It would be too much trouble to explain to the world that the so far "superscalar" units aren't as superscalar as the new ones ;)

***edit: on a more serious note, I'm not so sure AMD/NVIDIA really need MIMD units for the time being; it seems to me that changes like that might come in the more distant future, if they haven't come up with an even more efficient idea in the meantime.
 
No, this is legit:

Code:
Cypress ~P16xxx - P17xxx - P18xxx
Juniper XT ~P95xx
Redwood ~P46xx

For comparison, VR-Zone got about P35xx for the 4670 at launch (it probably scores higher now, but I can't find a recent bench). There's a chance the Redwood above is also configured with GDDR5.

Edit:
Looking at the mobile GPU power profiles posted earlier in this thread, Redwood = Madison:

Madison HD5750M 20-30W
Madison HD5730M 20-25W
Madison HD5650M 15-20W

Compare to:
RV730 HD4670M 28-30W
RV730 HD4650M 12-25W

That looks to be potentially an ~30% performance improvement in the same power profile.
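The ~30% figure is just the ratio of the two leaked scores. Since the "xx" digits are unknown, the round hundreds below are placeholders (an assumption), so treat the result as approximate.

```python
# Placeholder scores (assumption): the unknown "xx" digits are taken as 00.
redwood_score = 4600   # Redwood ~P46xx
hd4670_score = 3500    # HD 4670 at launch, ~P35xx (VR-Zone)

gain = redwood_score / hd4670_score - 1
print(f"{gain:.0%}")  # 31%
```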
 
I'm now wondering if Evergreen unifies TUs and RBEs.

Hello Jawed. If they went this way, it would be interesting to see how they handle general RT read/write access (UAVs).

Typically one would assume that RTs are divided into tiles and those tiles are distributed across the RBEs (ROPs) with a fixed mapping per RT. So the raster stage just sends fragments to the ALUs associated with the RBE that owns each fragment's destination tile, with both TU and RBE local to the cluster. It seems the fixed ALU/RBE tile mapping needs to stay, if only to ensure draw ordering.
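A sketch of the fixed tile-to-RBE mapping described above, assuming a hypothetical 8x8-pixel tile, 4 RBEs and a simple diagonal interleave (none of these are known RV870 values). Because the owner depends only on pixel position, fragments from successive draws to the same pixel land in the same in-order queue, which is what preserves draw ordering.

```python
from collections import defaultdict

# Hypothetical values for illustration only.
TILE = 8      # tile edge in pixels
NUM_RBE = 4   # number of RBEs


def owner_rbe(x: int, y: int) -> int:
    """Fixed mapping: the RBE that owns the tile containing pixel (x, y)."""
    tx, ty = x // TILE, y // TILE
    return (tx + ty) % NUM_RBE  # diagonal interleave of tiles across RBEs


def route(fragments):
    """Send (draw_id, x, y) fragments to their owning RBE's queue.
    Each queue keeps submission order, so two draws hitting the same
    pixel always reach the same RBE in draw order."""
    queues = defaultdict(list)
    for draw_id, x, y in fragments:
        queues[owner_rbe(x, y)].append((draw_id, x, y))
    return dict(queues)


print(route([(0, 3, 3), (0, 9, 3), (1, 3, 3)]))
```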

Given a unified TU/RBE, random RT access implies read/write from non-local RBEs (or local fetch and some crazy "tile" cache coherency, which I'm guessing is highly unlikely).

It almost seems like a good idea to just distribute both RBE and TU access (distribute tiles) across the chip, similar to how global memory access is distributed across the memory controllers (MCs)... except for various problems, like TU filtering requiring neighboring texels! So TU access probably stays local, with a read-only cache.
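The filtering problem in a sketch: one bilinear sample reads a 2x2 texel footprint, which can straddle up to four tiles. Assuming a hypothetical 8-texel tile and the D3D convention that texel centres sit at integer + 0.5, a sample at a tile corner touches four tiles, and so up to four different owners if tiles were distributed across the chip.

```python
import math

TILE = 8  # hypothetical tile edge, in texels


def bilinear_footprint_tiles(u: float, v: float, tile: int = TILE) -> set:
    """Tiles touched by one bilinear sample at texel-space (u, v), with
    texel centres at integer + 0.5 (D3D convention)."""
    x0 = math.floor(u - 0.5)  # left texel of the 2x2 footprint
    y0 = math.floor(v - 0.5)  # top texel of the 2x2 footprint
    return {((x0 + dx) // tile, (y0 + dy) // tile)
            for dx in (0, 1) for dy in (0, 1)}


print(len(bilinear_footprint_tiles(4.0, 4.0)))  # 1: fully inside one tile
print(len(bilinear_footprint_tiles(8.0, 8.0)))  # 4: straddles a tile corner
```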

How do you see a unified TU/RBE working?
 
Cypress ~HD 4870 X2
Juniper ~HD 4870
Redwood ~HD 3870

According to the ORB. Toss in the rumoured Hemlock and Trillian parts and.....

I may be completely wrong, but I thought Juniper was the highest-end single-chip solution and Cypress was going to be a dual-chip solution?

If that's the case, then what makes this better than the previous generation?
 