AMD: R7xx Speculation

Tchock · May 28, 2008

Lux_ said:
The wording could indicate that FLOPS could "scale" somewhere around 1 TFLOPS on 4870X2.

The Terascale article only involved the 4850/4870 SKUs.

So most probably it is on 1 GPU. And if the die is really 250mm^2, it's quite an accomplishment regardless of how much real power they can squeeze out before being CPU-limited (GPGPU) or limited in other factors.
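For reference, the "1 TFLOPS on one GPU" figure can be sanity-checked with a rough sketch. The function names and the sample inputs (800 scalar lanes, 625 MHz) are illustrative assumptions, not numbers from this thread, and a multiply-add is counted as 2 FLOPs per lane per clock.

```python
def peak_gflops(lanes: int, clock_mhz: float, flops_per_lane_per_clock: int = 2) -> float:
    """Theoretical peak in GFLOPS: lanes x FLOPs per clock x clock rate."""
    return lanes * flops_per_lane_per_clock * clock_mhz / 1000.0


def clock_needed_mhz(target_gflops: float, lanes: int, flops_per_lane_per_clock: int = 2) -> float:
    """Clock (MHz) required to reach a target peak rate with a given lane count."""
    return target_gflops * 1000.0 / (lanes * flops_per_lane_per_clock)


if __name__ == "__main__":
    # Hypothetical configuration, chosen only to illustrate the arithmetic.
    print(peak_gflops(lanes=800, clock_mhz=625))
    print(clock_needed_mhz(target_gflops=1000, lanes=800))
```

This is peak arithmetic only; it says nothing about how much of that rate survives CPU or other limits.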

Quasar, remove the scaling optimization part (aka plan RV670: Steroids), then it's still plausible. I'll admit I'm probably a passerby on all of this, but quite a few values aren't as straightforward as expected.

Really don't want to revert to the RV635->RV670 discussion all over again, but from that (perhaps skewed) perspective 60mm^2 is more than enough to include the units. 40 ALUs (x5), 8 TMUs, 12 ROPs, and a double width memory interface. Register sizes are vague (They should have made it bigger with a 2+x ALU power increase, compared to current 670->770 scaling) so I'll let that go.

How did the component sizes fare with R520/580/670 in terms of TMU/ROPs and the associated baggage, respectively? That should get us out of this sea of potential scaling variables.

mboeller · May 28, 2008

It seems AMD/ATI thinks that the RV770Pro is already faster than the 9800GTX

Link: http://www.tomshardware.tw/550,news-550.html

Jawed · May 28, 2008

Hmm, I've just realised that the presence of the PLX chip on HD4870X2 (as apparently revealed by the "impression" in the surface of the cooler) doesn't tell us anything about the link between each GPU.

If I redraw the picture I made earlier:

Code:

    MC        MC   MC        MC
      \------/       \------/
      |      |  bri  |      |
-------      =========      -------
|     |      |  dge  |      |     |
|     /------\       /------\     |
|   MC        MC   MC        MC   |
|                                 |
|                                 |
----------------   ----------------
PCI Express    |   |    PCI Express
                PLX
                 |
                 |
                CPU

the PLX chip is still required to connect two GPUs to a single CPU.

Jawed

MfA · May 28, 2008

Jawed said:
Since textures normally consume hundreds of MB doing that would nullify most of the benefit of sharing.

The biggest benefit of sharing is being able to parallelize rendering of a single frame without running into the problems supertiling ran into (dynamic textures and to a lesser extent redundant work with the vertex shaders). Not saving a lot of memory in comparison to Crossfire rendering is not a big deal.

PS. all in all I think the solution is practical, needing less than a third of main memory bandwidth by my guesstimates, but rather ambitious ... so unfortunately unlikely.

Lukfi · May 28, 2008

=>Jawed: Right, the PLX is needed to separate out the PCIe lines. But I think that if there really was a bridge allowing the chips to share their memory pools, the board layout would be different - something like this:

Code:

[GPU]===[GPU]
  |       |
  +-[PLX]-+
      |

and not like this (the actual layout):

Code:

[GPU]-[PLX]-[GPU]
        |

…where the interconnect path is longer. And the longer the path, the lesser the bandwidth (frequency) you can get, am I right? We do know for a fact that there is a bus between the GPUs, it's the good ol' CrossFire interconnection which doesn't mind longer distances. I must admit, I have no idea about its bandwidth, but it probably won't be any ball-busting figure. If ATi was serious about having a shared memory pool, I think they'd put the chips closer together to allow for faster interconnect.

ShaidarHaran · May 28, 2008

There is no bridge chip for memory transactions. It's a DMA architecture unified memory pool. Simple stuff perfected in your PC by CPU/memory/chipset designers years ago.

The PLX chip is just for PCI-e transactions.

MfA · May 28, 2008

Lukfi said:
And the longer the path, the lesser the bandwidth (frequency) you can get, am I right?

The longer the path the more power it takes ... but other considerations might be more important (cooling, ground/power plane isolation).

Silent_Buddha · May 28, 2008

What's the possibility that Rv770 supports a 512 bit memory interface but only uses 256 bits for communication with memory while 256 bits is used for chip to chip communication?

EDIT - Hmmm, GDDR 5 would allow using 128 bit to each pool of memory while keeping the same bandwidth as Rv770 pro with 256 bits + GDDR 3. And using 128 bits for chip to chip communication/sharing of memory.

That would provide each chip with relatively equal access to both pools of memory, assuming a NUMA like memory configuration.
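The bandwidth arithmetic behind this idea can be sketched as follows. The per-pin data rates are placeholder assumptions (GDDR5 taken as twice GDDR3), not confirmed specifications for any RV770 board, and the function name is hypothetical.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps_per_pin / 8.0


# Assumed per-pin rates, for illustration only.
GDDR3_RATE = 2.0  # Gbit/s per pin (assumption)
GDDR5_RATE = 4.0  # Gbit/s per pin (assumption: double GDDR3)

if __name__ == "__main__":
    wide_gddr3 = bandwidth_gb_s(256, GDDR3_RATE)    # 256-bit bus with GDDR3
    narrow_gddr5 = bandwidth_gb_s(128, GDDR5_RATE)  # 128-bit bus with GDDR5
    print(wide_gddr3, narrow_gddr5)
```

Under those assumed rates the two configurations come out equal, which would leave the other 128 bits free for chip-to-chip traffic.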

Or is the Rv770 die too small to support a 512 bit memory interface?

The other possibility is that that's not actually a PLX chip, but a "northbridge" through which both chips access the memory pool. 256 bit memory interface from each chip to that northbridge and then 256/512 bit from that chip to the memory. This doesn't seem nearly as elegant though.

Regards,
SB

Jawed · May 28, 2008

MfA said:
The biggest benefit of sharing is being able to parallelize rendering of a single frame without running into the problems supertiling ran into (dynamic textures and to a lesser extent redundant work with the vertex shaders). Not saving a lot of memory in comparison to Crossfire rendering is not a big deal.

PS. all in all I think the solution is practical, needing less than a third of main memory bandwidth by my guesstimates, but rather ambitious ... so unfortunately unlikely.

As far as I can tell you're hypothesising that one GPU performs all vertex shading and transmits some of the resulting fragment workload to the other GPU.

I'm unclear how this scenario avoids the dynamic textures problem you're alluding to, though.

Jawed

Jawed · May 28, 2008

Lukfi said:
And the longer the path, the lesser the bandwidth (frequency) you can get, am I right?

There's no doubt that graphics card memory chips tend to be fairly close to the GPU and the distance we're seeing between the two dies on HD4870X2 seems substantially further. I'm not sure how this compares with memory attached to CPUs. Also, I'm not sure if power is a solution as suggested by MfA.

We do know for a fact that there is a bus between the GPUs, it's the good ol' CrossFire interconnection which doesn't mind longer distances.

I presume that connection is separate and will exist for HD4870X2, enabling a pair of them to work together (in AFR, presumably). Presumably only the "master" GPU in a pair will talk to the "master" on the other X2 board.

I must admit, I have no idea about its bandwidth, but it probably won't be any ball-busting figure. If ATi was serious about having a shared memory pool, I think they'd put the chips closer together to allow for faster interconnect.

The PLX chip is huge. The PCI Express 2 version of the chip is substantially smaller but still a monster - or, at least the package is a monster. As far as layout of X2 is concerned, there may be no choice - even when accounting for the layers available in the PCB.

Jawed

MfA · May 28, 2008

Because the parts of dynamic textures rendered remotely could still be accessed across the bridge.

PS. LVDS and its cousins are pretty forgiving ... most of us have computers using 3 GHz signaling on unshielded cables.

Lukfi · May 28, 2008

ShaidarHaran said:
There is no bridge chip for memory transactions. (...) The PLX chip is just for PCI-e transactions.

I know that. I was just saying that it would perhaps have made better sense if the two GPUs were closer together on the PCB. But then again, maybe there are other, more important factors or they simply didn't have any other choice like Jawed said.

Jawed said:
I presume that connection is separate and will exist for HD4870X2, enabling a pair of them to work together (in AFR, presumably). Presumably only the "master" GPU in a pair will talk to the "master" on the other X2 board.

This kind of bridging should be the same as with the R680, so we might as well look there for answers. I remember a photo of four Radeons plugged into one motherboard and they were connected in a chain-like manner, as opposed to 3-way SLI which requires a ring-like bridging. So perhaps it's similar with the X2 cards.

Jawed · May 28, 2008

MfA said:
Because the parts of dynamic textures rendered remotely could still be accessed across the bridge.

So you're suggesting that this is actually supertiling, with sharing solely for dynamic textures. OK. It would be a start...

Jawed

MfA · May 28, 2008

Well supertiling also doubled the vertex load, so it's potentially a little better than that.
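A minimal sketch of that trade-off, assuming a checkerboard tile split and assuming every GPU transforms every vertex (a simplification; the function names are made up for illustration):

```python
def tile_owner(tile_x: int, tile_y: int, num_gpus: int = 2) -> int:
    """Checkerboard assignment: neighbouring tiles alternate between GPUs."""
    return (tile_x + tile_y) % num_gpus


def supertiling_workload(num_vertices: int, tiles_x: int, tiles_y: int, num_gpus: int = 2):
    """Per-GPU work: pixel tiles are split, but vertex work is repeated on each GPU."""
    tiles = [0] * num_gpus
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            tiles[tile_owner(tx, ty, num_gpus)] += 1
    # Assumption: each GPU processes the full vertex stream.
    return [{"vertices": num_vertices, "tiles": t} for t in tiles]


if __name__ == "__main__":
    for gpu, work in enumerate(supertiling_workload(1000, 4, 4)):
        print(gpu, work)
```

The pixel tiles are divided between the GPUs while the total vertex work across the pair is twice the single-GPU figure.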

Jawed · May 28, 2008

MfA said:
Well supertiling also doubled the vertex load, so it's a little better than that.

A doubled VS workload should be trivial for CrossFire unified GPUs, which is why I'm puzzled not to see it happening.

Does anyone know if supertiling is available on R6xx GPUs? Has it been benchmarked?

Geometry shader (particularly stream out) does make the prospect of each GPU doing all of the pre-setup work pretty unattractive, though. Maybe that's the deal breaker, even though it's irrelevant for the vast majority of games.

Jawed

Lukfi · May 28, 2008

SuperTiling doesn't work at all AFAIK.

rwolf · May 28, 2008

Lukfi said:
=>Jawed: Right, the PLX is needed to separate out the PCIe lines. But I think that if there really was a bridge allowing the chips to share their memory pools, the board layout would be different - something like this:

Code:

[GPU]===[GPU]
  |       |
  +-[PLX]-+
      |

and not like this (the actual layout):

Code:

[GPU]-[PLX]-[GPU]
        |

…where the interconnect path is longer. And the longer the path, the lesser the bandwidth (frequency) you can get, am I right? We do know for a fact that there is a bus between the GPUs, it's the good ol' CrossFire interconnection which doesn't mind longer distances. I must admit, I have no idea about its bandwidth, but it probably won't be any ball-busting figure. If ATi was serious about having a shared memory pool, I think they'd put the chips closer together to allow for faster interconnect.

http://pc.watch.impress.co.jp/docs/2007/1204/kaigai404_01l.gif

This diagram shows a multi-chip module that has a crossfire connector and a single connection to the PLX from the module. The chips are so close they are packaged together.

Kaotik · May 28, 2008

rwolf said:
http://pc.watch.impress.co.jp/docs/2007/1204/kaigai404_01l.gif

This diagram shows a multi-chip module that has a crossfire connector and a single connection to the PLX from the module. The chips are so close they are packaged together.

It's old speculation, and the R700 heatsink (at least the claimed one) shows straight away that it's not possible.

no-X · May 28, 2008

I think it's more than speculation, but it seems that the current R7xx is a bit different from the original R7xx design. Something like R400/R420...

Slyne · May 28, 2008

The way the market has been shown to work, if you want to gain market share you have to have a better product for less money. Now, the slide with the $329 price could be fake, but if we accept it then there is no need for a PhD in economics to make the following statement:
Trying to push an inferior product with a higher price when you own the smaller part of a market where your competitor's product reigns supreme in the collective mind is suicidal.

Therefore we must assume that RV770 is more than a 4xZ per clock RV670, or ATI will have lost the performance section of the market, after abandoning the high-end last year. And what would the motivation be to go that way? Playing it safe? Companies don't play it safe when they're the outsider, they play it safe when they're comfortably in the lead. ATI's own 9800 was playing it safe, but now is not the time.

To gain the performance lead, all R520 needed was more ALUs, and R580 got them. This time (and in that price category), it's accepted that what RV670 needs is more TMUs. Why would ATI go any other direction?
