AMD: R7xx Speculation

compres · May 29, 2008

If the cooler pic is true, then this is unlikely, as you say. If the approach is an MCM, I see this as doable.

ShaidarHaran · May 29, 2008

Slyne said:
Somehow I don't see a shared ring make it all the way from one end of the PCB to the other end, if the cooler pic that was leaked recently is to be trusted. That would pointlessly complicate clock management IMHO.

I never said it would be easy...

The (relatively) cheap GPUs offset the increased PCB cost. If R700 were an MCM, I don't think this conversation would be necessary.

Have we seen any (confirmed) 4780 X2 PCB shots yet?

Kaotik · May 29, 2008

ShaidarHaran said:
I never said it would be easy...

The (relatively) cheap GPUs offset the increased PCB cost. If R700 were an MCM, I don't think this conversation would be necessary.

Have we seen any (confirmed) 4780 X2 PCB shots yet?

Nope, only the claimed cooler.

Mintmaster · May 29, 2008

ShaidarHaran said:
Yes, a shared ring is the likely approach here.

You have any idea how many pads you need on the chip for that? Even if you had both chips on the same substrate, there's no room to just add 1000+ pads.

Once upon a time I flirted with the same idea, but it's just not realistic. You really do need a custom high speed connection to pull it off.

Jawed · May 29, 2008

Mintmaster said:
You really do need a custom high speed connection to pull it off.

And what I'm wondering is if they can use GDDR5 protocol to link two GPUs... GDDR5 appears to have symmetric training (i.e. both ends can train their interfaces) which I imagine is a crucial feature when linking two "equal" chips.

Jawed

compres · May 29, 2008

What about what they did with the Xenos core? Not similar at all?

Also, what prevents AMD from making the communication a fast serial interface (fewer pins)?

Mat3 · May 29, 2008

Jawed said:
And what I'm wondering is if they can use GDDR5 protocol to link two GPUs... GDDR5 appears to have symmetric training (i.e. both ends can train their interfaces) which I imagine is a crucial feature when linking two "equal" chips.

Like dual-port RAM?

Arty · May 29, 2008

but it looks like the 770 and the 260 will be about equal for most things.

This means that buying a GT260 board will cost about 50 per cent more than an R770 for equivalent performance.

Courtesy: Inq

Arun, you might have another passenger on the 800SP train ..

mczak · May 29, 2008

Jawed said:
Sigh I even attempted the calculation more than once, though now I'm getting 512GB/s
ARGH, I meant 4 texels per pixel not 16.

Ok, with 4 texels per pixel I agree, I come up with 512GB/s too.

I was trying to come up with the worst-case texturing bandwidth, assuming that fp16 texels aren't compressed and considering bilinear filtering with minification of at least 50% and no mipmaps, so that every pixel is filtered from 4 distinct texels. Anyway, it was a total bust.

I suspect the "real" worst case could even be worse since the granularity of a single texel fetch (i.e. the texture cache line size) could be more than 8 bytes (in fact it could be burst length * channel width?).
But anyway, I think this worst case is not relevant. You're not expected to get full performance if you don't use mipmaps...
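For anyone wanting to redo the arithmetic above, here is a rough sketch in Python. It takes the 8-byte fp16 texel and the 4 texels per pixel from the posts. The 16 Gpixel/s rate is not stated anywhere in the thread; it is simply the value that makes the 512GB/s figure come out, so treat it as an assumption. The burst length and channel width are likewise placeholder values, chosen only to show how a coarser fetch granularity would inflate the worst case.

```python
# Sketch of the worst-case texture bandwidth arithmetic (all names are illustrative).

BYTES_PER_FP16_TEXEL = 4 * 2  # 4 channels x 2 bytes each = 8 bytes

# ASSUMPTION: back-derived so that 4 texels/pixel gives 512 GB/s; not stated in the thread.
ASSUMED_PIXELS_PER_SECOND = 16e9


def texture_bandwidth_gb_s(pixels_per_second, texels_per_pixel, bytes_per_fetch):
    """Bandwidth in GB/s (1 GB = 1e9 bytes) if every texel needs its own fetch."""
    return pixels_per_second * texels_per_pixel * bytes_per_fetch / 1e9


def fetch_granularity_bytes(burst_length, channel_width_bits):
    """Smallest memory transaction in bytes: burst length x channel width."""
    return burst_length * channel_width_bits // 8


# 4 distinct texels per pixel, 8 bytes fetched per texel.
ideal = texture_bandwidth_gb_s(ASSUMED_PIXELS_PER_SECOND, 4, BYTES_PER_FP16_TEXEL)

# ASSUMPTION: burst length 8 on a 32-bit channel, purely as an example.
granule = fetch_granularity_bytes(8, 32)
padded = texture_bandwidth_gb_s(ASSUMED_PIXELS_PER_SECOND, 4, granule)

print(ideal, granule, padded)
```

With those placeholder values each texel fetch drags in four times the data actually needed, which is the sense in which the "real" worst case could be worse than the 8-byte estimate.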

Yeah I agree, textures would need tiling to help with load-balancing. The pathological case is possible even with a single GPU (i.e. reading all texels from just one memory channel instead of from all four, say).

Ah right forgot about that. This is probably more fine-grained though than what the tiling across chips would be? This is a slight problem here, for better balancing you'd want small tiles but for better cache efficiency you'd want large tiles.

I assume that the RBEs are assigned to screen space tiles so that they only use GPU-attached, not foreign, memory. So there should be no cross-GPU traffic relating to colour/Z/stencil operations. The only render target related cross-GPU traffic should occur when stitching-together the tiles from the constituent GPUs.

Yes, that's what I'm thinking too. RBEs not using local memory just plain doesn't make sense.
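The tile-size trade-off and the "RBEs only touch local memory" idea can be illustrated with a small Python sketch. Nothing in the thread says how tiles would actually be assigned to GPUs; the checkerboard rule below is a hypothetical scheme picked only because it is easy to reason about.

```python
# Sketch: checkerboard assignment of screen-space tiles to GPUs (hypothetical scheme).

def tile_owner(x, y, tile_size, num_gpus=2):
    """Return the index of the GPU whose RBEs would own pixel (x, y)."""
    tile_x, tile_y = x // tile_size, y // tile_size
    return (tile_x + tile_y) % num_gpus


def pixels_per_gpu(width, height, tile_size, num_gpus=2):
    """Count how many pixels of a width x height region land on each GPU."""
    counts = [0] * num_gpus
    for y in range(height):
        for x in range(width):
            counts[tile_owner(x, y, tile_size, num_gpus)] += 1
    return counts


# A 64x64 region of heavy work, e.g. one dense object on screen.
coarse = pixels_per_gpu(64, 64, tile_size=64)  # a single tile: one GPU does everything
fine = pixels_per_gpu(64, 64, tile_size=16)    # 4x4 tiles: work splits evenly
print(coarse, fine)
```

Small tiles spread a localized hot spot across both chips, while large tiles keep neighbouring pixels (and their texture fetches) on one chip, which is friendlier to the cache. The sketch only shows the balancing half of that trade-off.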

Jawed · May 29, 2008

compres said:
What about what they did with the Xenos core? Not similar at all?

Xenos is interesting because there are two apparently (or at least potentially) quite different buses there.

The Xenon<->Xenos bus is an IBM design I believe, providing 21.6GB/s.

The Xenos<->EDRAM bus is ATI's, 32GB/s.

The former has to work across the mainboard so is like other CPU<->northbridge connections I guess (since Xenos functions as XB360's northbridge).

The latter is designed for on-substrate communications, so is theoretically "easier". Erm...

Also, what prevents AMD from making the communication a fast serial interface (fewer pins)?

http://en.wikipedia.org/wiki/Hypertransport

Apparently the full whack configuration will support 41.6GB/s, which would be useful :)

Certainly in older discussions on this topic HT has come up as means for connecting GPUs.
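As a sanity check on the 41.6GB/s figure, here is a minimal Python sketch. It assumes the number refers to HyperTransport 3.0 at its top 2.6GHz clock, double data rate, 32-bit links, counting both directions together; those parameters are my reading of the spec rather than anything stated in the post.

```python
# Sketch: peak HyperTransport link bandwidth (parameter values are assumptions).

def ht_bandwidth_gb_s(clock_ghz, link_width_bits, directions=2):
    """Aggregate bandwidth in GB/s for a double-data-rate link."""
    transfers_per_second = clock_ghz * 1e9 * 2  # DDR: two transfers per clock
    bytes_per_transfer = link_width_bits / 8
    return transfers_per_second * bytes_per_transfer * directions / 1e9


full_whack = ht_bandwidth_gb_s(2.6, 32)               # both directions
one_way = ht_bandwidth_gb_s(2.6, 32, directions=1)    # a single direction
print(round(full_whack, 1), round(one_way, 1))
```

Under those assumptions the headline figure is an aggregate, so a single direction gets half of it.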

Jawed

Jawed · May 29, 2008

Mat3 said:
Like dual-port RAM?

not sure what you mean.

Jawed

MfA · May 29, 2008

Jawed said:
What's got me dubious is the physical implementation of the ring bus. Each chip needs to have 2 ring-bus ports, each port seems to need to be in the region of 50GB/s in each direction, I suspect.

In each direction? That's an aggregate bandwidth almost equal to main memory bandwidth with GDDR5 ... I'd say even a 50 GB/s bidirectional link would be more than enough.
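To put rough numbers on that comparison, a small Python sketch follows. The 50GB/s per direction and the two ports per chip come from the quoted post; the 256-bit bus and 4Gbps per pin used for the GDDR5 side are hypothetical values picked for illustration, not confirmed specs.

```python
# Sketch: ring-bus port bandwidth vs. local memory bandwidth (memory figures are assumptions).

def port_aggregate_gb_s(per_direction_gb_s, directions=2):
    """Total bandwidth of one port, counting both directions."""
    return per_direction_gb_s * directions


def memory_bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * gbps_per_pin / 8


port = port_aggregate_gb_s(50)             # one port, both directions
both_ports = 2 * port                      # two ports per chip, as in the quoted post
memory = memory_bandwidth_gb_s(256, 4.0)   # ASSUMPTION: 256-bit GDDR5 at 4 Gbps per pin
print(port, both_ports, memory)
```

With those placeholder memory figures a single port at 50GB/s each way already lands in the same ballpark as local memory bandwidth, and two such ports would exceed it.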

Mat3 · May 29, 2008

Jawed said:
not sure what you mean.

This kind of thing:

http://pdfserv.maxim-ic.com/en/an/AN62.pdf

Jawed · May 29, 2008

MfA said:
In each direction? That's an aggregate bandwidth almost equal to main memory bandwidth with GDDR5 ... I'd say even a 50 GB/s bidirectional link would be more than enough.

I wrote that just over a year ago :!:

Jawed

Jawed · May 29, 2008

Mat3 said:
This kind of thing:

http://pdfserv.maxim-ic.com/en/an/AN62.pdf

Have to admit I was scratching my head thinking of dual-ported RAM from the good old days of video memory (to enable simultaneous scan-out and update) - didn't realise you meant as a staging post to move data between processors.

Is there anything out there that's fast enough?... I presume this is a technique that's died.

Jawed

Jawed · May 29, 2008

mczak said:
I suspect the "real" worst case could even be worse since the granularity of a single texel fetch (i.e. the texture cache line size) could be more than 8 bytes (in fact it could be burst length * channel width?).

I was working on the hidden assumption that burstiness would average out when boiling this down to packets of texture data.

But anyway, I think this worst case is not relevant. You're not expected to get full performance if you don't use mipmaps...

Oh well.

Ah right forgot about that. This is probably more fine-grained though than what the tiling across chips would be? This is a slight problem here, for better balancing you'd want small tiles but for better cache efficiency you'd want large tiles.

As far as I can tell, the R6xx texture cache system effectively works with a tile size that's matched to the memory system:

3-D rendering texture caching scheme

The set-associativity of the cache is a whole other question though.

Jawed

Shtal · May 30, 2008

HAL said:
http://www.nordichardware.com/news,7809.html

Both the GeForce GTX and Radeon HD 4800 series will arrive in about three weeks. Each series will bring two new cards to the market; GeForce GTX 280 and 260, and Radeon HD 4870 and 4850. There is a big difference between the cards though as the GeForce GTX series is enthusiast range, while Radeon HD 4800 series is more mid-range. There have been talks of what GeForce GTX 280 can do in Vantage, but it has now been completed with figures for the other cards.

These are of course in no way official and we can't say for certain where they come from. The only thing we know is that the numbers are not unreasonable, but some information about the rest of the system would be nice. ATI performance (with all cards) is still subpar due to poor drivers, and should improve in Vantage with coming releases. The numbers that are circulating the web are something like this;

Graphics card           Vantage Xtreme profile*
GeForce GTX 280         41xx
GeForce GTX 260         38xx
GeForce 9800GX2         36xx
GeForce 8800 Ultra      24xx
Radeon HD 4870 XT       26xx
Radeon HD 3870X2        25xx
Radeon HD 4850 Pro      20xx
Radeon HD 3870          14xx
* 1920x1200 4AA/16AF

Then R700 (4870X2) should score 48xx-49xx points at these settings.

AlphaWolf · May 30, 2008

Shtal said:
Then R700 (4870X2) should score 48xx-49xx points at these settings.

Those numbers (the ones posted by HAL) look really close to the ones that were posted here and then later proven fake. Well at least the ATI values.

Shtal · May 30, 2008

AlphaWolf said:
Those numbers look really close to the ones that were posted here and then later proven fake. Well at least the ATI values.

Thanks!

Shtal · May 30, 2008

Nvidia GT200 sucessor tapes out

http://www.theinquirer.net/gb/inquirer/news/2008/05/29/nvidia-gt200-sucessor-tapes

Question!

EDIT:

The answer to that is to tape out the GT200b yesterday. It has taped out, and it is a little more than 400mm^2 on a TSMC 55nm process

There are several problems with the GT200, most of which are near fatal. The first is the die size, 576mm^2, bigger than most Itanics.
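The two die sizes quoted above are at least consistent with a straight shrink. A minimal Python sketch, assuming ideal area scaling with the square of the node ratio (real shrinks usually do a bit worse, since pads and analog blocks don't scale as well):

```python
# Sketch: ideal die-area scaling between process nodes (an idealized assumption).

def ideal_shrink_area_mm2(area_mm2, old_node_nm, new_node_nm):
    """Die area after a pure optical shrink, scaling with the square of the node ratio."""
    return area_mm2 * (new_node_nm / old_node_nm) ** 2


gt200b_estimate = ideal_shrink_area_mm2(576, 65, 55)
print(round(gt200b_estimate, 1))  # roughly 412 mm^2
```

That lines up with the "a little more than 400mm^2" claim, though it says nothing about whether the claim itself is accurate.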

I guess R700 (4870X2) may be more efficient / better than the first-revision GT200 on 65nm. In other words, R700 may be a better alternative to the 65nm GT200.
