AMD: R7xx Speculation

Status
Not open for further replies.
Somehow I don't see a shared ring making it all the way from one end of the PCB to the other, if the recently leaked cooler pic is to be trusted. That would pointlessly complicate clock management IMHO.

I never said it would be easy...

The (relatively) cheap GPUs offset the increased PCB cost. If R700 were an MCM, I don't think this conversation would be necessary.

Have we seen any (confirmed) 4780 X2 PCB shots yet?
 
Nope, only the claimed cooler.
 
Yes, a shared ring is the likely approach here.
You have any idea how many pads you need on the chip for that? Even if you had both chips on the same substrate, there's no room to just add 1000+ pads.

Once upon a time I flirted with the same idea, but it's just not realistic. You really do need a custom high speed connection to pull it off.
 
You really do need a custom high speed connection to pull it off.
And what I'm wondering is if they can use GDDR5 protocol to link two GPUs... GDDR5 appears to have symmetric training (i.e. both ends can train their interfaces) which I imagine is a crucial feature when linking two "equal" chips.

Jawed
 
What about what they did with the Xenos core? Not similar at all?

Also, what prevents AMD from running the communication over a fast serial interface (fewer pins)?
 
And what I'm wondering is if they can use GDDR5 protocol to link two GPUs... GDDR5 appears to have symmetric training (i.e. both ends can train their interfaces) which I imagine is a crucial feature when linking two "equal" chips.

Like dual-port RAM?
 
but it looks like the 770 and the 260 will be about equal for most things.

This means that buying a GT260 board will cost about 50 per cent more than an R770 for equivalent performance.

Courtesy: Inq

Arun, you might have another passenger on the 800SP train .. :p
 
:oops: Sigh I even attempted the calculation more than once, though now I'm getting 512GB/s :???:
ARGH, I meant 4 texels per pixel not 16.
Ok, with 4 texels per pixel I agree I come up with 512GB/s too :).
I was trying to come up with the worst-case texturing bandwidth, assuming that fp16 texels aren't compressed and considering bilinear filtering with minification of at least 50% and no mipmaps, so that every pixel is filtered from 4 distinct texels. Anyway, it was a total bust :oops:
I suspect the "real" worst case could even be worse since the granularity of a single texel fetch (i.e. the texture cache line size) could be more than 8 bytes (in fact it could be burst length * channel width?).
But anyway, I think this worst case is not relevant. You're not expected to get full performance if you don't use mipmaps...
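
For anyone following the arithmetic above, here is a minimal sketch of the worst-case calculation. The 16 Gpixel/s filtered-pixel rate is purely an assumption, back-derived so that 4 texels per pixel lands on the 512GB/s quoted above; it is not a figure from this thread. The 32-byte fetch (burst length 8 on a 32-bit channel) is likewise a hypothetical value used only to illustrate the granularity point.

```python
# Worst-case texture bandwidth sketch. All rates below are assumptions,
# chosen only so the arithmetic reproduces the 512GB/s quoted in the thread.

BYTES_PER_FP16_TEXEL = 8  # 4 channels x 16 bits, uncompressed


def texture_bandwidth_gbs(gpixels_per_s, texels_per_pixel,
                          fetch_bytes=BYTES_PER_FP16_TEXEL):
    """GB/s needed if every texel is a distinct fetch of fetch_bytes bytes."""
    return gpixels_per_s * texels_per_pixel * fetch_bytes


ASSUMED_GPIXELS_PER_S = 16.0  # hypothetical filtered-pixel rate

# Bilinear with no mipmaps: 4 distinct texels per pixel.
bilinear = texture_bandwidth_gbs(ASSUMED_GPIXELS_PER_S, 4)
# The mistaken 16 texels per pixel inflates the figure fourfold.
mistaken = texture_bandwidth_gbs(ASSUMED_GPIXELS_PER_S, 16)
# If each texel drags in a whole 32-byte fetch (assumed burst 8 x 32 bits).
coarse = texture_bandwidth_gbs(ASSUMED_GPIXELS_PER_S, 4, fetch_bytes=32)

print(bilinear, mistaken, coarse)
```

Under these assumptions the coarse-granularity case costs as much as the mistaken 16-texel case, which is why the "real" worst case could be worse than the 8-bytes-per-texel figure.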

Yeah I agree, textures would need tiling to help with load-balancing. The pathological case is possible even with a single GPU (i.e. reading all texels from just one memory channel instead of from all four, say).
Ah right, forgot about that. This is probably more fine-grained than what the tiling across chips would be, though? That's a slight problem here: for better balancing you'd want small tiles, but for better cache efficiency you'd want large tiles.
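
A toy sketch of that trade-off, assuming a hypothetical diagonal interleave of square tiles across four memory channels (the real R6xx/R7xx mapping is not described in this thread). It counts how many texels of a 16x16 footprint land on each channel and how many distinct tiles the footprint touches.

```python
from collections import Counter

NUM_CHANNELS = 4  # "all four, say" from the post above


def channel_for_texel(x, y, tile_size):
    """Hypothetical mapping: square tiles interleaved diagonally over channels."""
    return ((x // tile_size) + (y // tile_size)) % NUM_CHANNELS


def footprint_stats(x0, y0, width, height, tile_size):
    """Return (texels per channel, number of distinct tiles touched)."""
    load = Counter()
    tiles = set()
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            load[channel_for_texel(x, y, tile_size)] += 1
            tiles.add((x // tile_size, y // tile_size))
    return dict(load), len(tiles)


# Small tiles: the footprint spreads evenly but touches many tiles.
small_load, small_tiles = footprint_stats(0, 0, 16, 16, tile_size=4)
# Large tiles: one tile covers everything, so one channel takes all the traffic.
large_load, large_tiles = footprint_stats(0, 0, 16, 16, tile_size=32)

print(small_load, small_tiles)
print(large_load, large_tiles)
```

With 4-texel tiles each channel serves a quarter of the footprint across 16 tiles; with 32-texel tiles a single tile on a single channel serves all 256 texels, which is the pathological case mentioned above.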

I assume that the RBEs are assigned to screen space tiles so that they only use GPU-attached, not foreign, memory. So there should be no cross-GPU traffic relating to colour/Z/stencil operations. The only render target related cross-GPU traffic should occur when stitching-together the tiles from the constituent GPUs.
Yes, that's what I'm thinking too. RBEs not using local memory just plain doesn't make sense.
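
As a rough illustration of RBEs sticking to local memory, here is a sketch that assigns screen-space tiles to two GPUs in a checkerboard. The 32-pixel tile and the checkerboard pattern are assumptions made for the example, not anything known about R700.

```python
NUM_GPUS = 2
TILE = 32  # hypothetical tile edge in pixels


def owner_gpu(x, y, tile=TILE, num_gpus=NUM_GPUS):
    """GPU whose local memory would hold colour/Z/stencil for pixel (x, y)."""
    return ((x // tile) + (y // tile)) % num_gpus


def pixels_per_gpu(width, height, tile=TILE, num_gpus=NUM_GPUS):
    """Count the pixels each GPU owns, walking the screen tile by tile."""
    counts = [0] * num_gpus
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Edge tiles may be clipped by the screen boundary.
            area = min(tile, width - tx) * min(tile, height - ty)
            counts[owner_gpu(tx, ty, tile, num_gpus)] += area
    return counts


counts = pixels_per_gpu(1920, 1200)
print(counts)
```

In this sketch every pixel has exactly one owner, so render-target traffic only crosses between GPUs when the finished tiles are stitched together.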
 
What about what they did with the Xenos core? Not similar at all?
Xenos is interesting because there are two apparently (or at least potentially) quite different buses there.

The Xenon<->Xenos bus is an IBM design I believe, providing 21.6GB/s.

The Xenos<->EDRAM bus is ATI's, 32GB/s.

The former has to work across the mainboard so is like other CPU<->northbridge connections I guess (since Xenos functions as XB360's northbridge).

The latter is designed for on-substrate communications, so is theoretically "easier". Erm...

Also, what prevents AMD from running the communication over a fast serial interface (fewer pins)?
http://en.wikipedia.org/wiki/Hypertransport

Apparently the full whack configuration will support 41.6GB/s, which would be useful :smile:
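
For reference, the 41.6GB/s figure falls out of the top HyperTransport 3.0 configuration: a 2.6GHz clock, double data rate, a 32-bit link, counted in both directions. A small sketch of that arithmetic (the function name is just for illustration):

```python
def link_bandwidth_gbs(clock_ghz, width_bits, transfers_per_clock=2, directions=2):
    """Aggregate bandwidth in GB/s of a point-to-point link."""
    bytes_per_transfer = width_bits / 8
    return clock_ghz * transfers_per_clock * bytes_per_transfer * directions


# HyperTransport 3.0 at full width and full clock.
ht3_aggregate = link_bandwidth_gbs(2.6, 32)
ht3_one_way = link_bandwidth_gbs(2.6, 32, directions=1)

print(round(ht3_aggregate, 1), round(ht3_one_way, 1))
```

Note that the headline number is an aggregate; each direction carries half of it.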

Certainly in older discussions on this topic HT has come up as a means of connecting GPUs.

Jawed
 
What's got me dubious is the physical implementation of the ring bus. Each chip needs to have 2 ring-bus ports, each port seems to need to be in the region of 50GB/s in each direction, I suspect.
In each direction? That's an aggregate bandwidth almost equal to main memory bandwidth with GDDR5 ... I'd say even a 50 GB/s bidirectional link would be more than enough.
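
To put rough numbers on this exchange, here is a sketch comparing the proposed ring-bus ports with a GDDR5 memory interface. The 256-bit bus and the 4 Gbps per-pin rate are assumptions picked for illustration; neither is confirmed in this thread.

```python
import math

PER_DIRECTION_GBS = 50.0   # figure suggested in the post above
PORTS_PER_CHIP = 2         # two ring-bus ports per chip
ASSUMED_GBPS_PER_PIN = 4.0  # hypothetical GDDR5-class signalling rate
ASSUMED_MEM_BUS_BITS = 256  # hypothetical memory bus width


def data_pins_needed(bandwidth_gbs, gbps_per_pin):
    """Data pins needed for one direction (no clock, power or ground pins)."""
    return math.ceil(bandwidth_gbs * 8 / gbps_per_pin)


def memory_bandwidth_gbs(bus_bits, gbps_per_pin):
    """Peak bandwidth of a memory bus in GB/s."""
    return bus_bits * gbps_per_pin / 8


port_aggregate = 2 * PER_DIRECTION_GBS
chip_aggregate = PORTS_PER_CHIP * port_aggregate
memory = memory_bandwidth_gbs(ASSUMED_MEM_BUS_BITS, ASSUMED_GBPS_PER_PIN)
pins_one_way = data_pins_needed(PER_DIRECTION_GBS, ASSUMED_GBPS_PER_PIN)
pins_per_chip = pins_one_way * 2 * PORTS_PER_CHIP

print(port_aggregate, chip_aggregate, memory, pins_one_way, pins_per_chip)
```

Under these assumptions a single port at 50GB/s each way is already in the same ballpark as the whole memory interface, and two such ports need several hundred data pins before any clocks, power or ground are counted.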
 
I suspect the "real" worst case could even be worse since the granularity of a single texel fetch (i.e. the texture cache line size) could be more than 8 bytes (in fact it could be burst length * channel width?).
I was working on the hidden assumption that burstiness would average out when boiling this down to packets of texture data.

But anyway, I think this worst case is not relevant. You're not expected to get full performance if you don't use mipmaps...
:LOL: Oh well.

Ah right, forgot about that. This is probably more fine-grained than what the tiling across chips would be, though? That's a slight problem here: for better balancing you'd want small tiles, but for better cache efficiency you'd want large tiles.
As far as I can tell, the R6xx texture cache system effectively works with a tile size that's matched to the memory system:

3-D rendering texture caching scheme

The set-associativity of the cache is a whole other question though.

Jawed
 
http://www.nordichardware.com/news,7809.html

Both the GeForce GTX and Radeon HD 4800 series will arrive in about three weeks. Each series will bring two new cards to the market: GeForce GTX 280 and 260, and Radeon HD 4870 and 4850. There is a big difference between the cards, though, as the GeForce GTX series is enthusiast range, while the Radeon HD 4800 series is more mid-range. There has been talk of what GeForce GTX 280 can do in Vantage, and it has now been complemented with figures for the other cards.

These are of course in no way official and we can't say for certain where they come from. The only thing we know is that the numbers are not unreasonable, but some information about the rest of the system would be nice. ATI performance (with all cards) is still subpar due to poor drivers, and should improve in Vantage with coming releases. The numbers circulating on the web are something like this:


Graphics card         Vantage Xtreme profile*
GeForce GTX 280       41xx
GeForce GTX 260       38xx
GeForce 9800GX2       36xx
GeForce 8800 Ultra    24xx
Radeon HD 4870 XT     26xx
Radeon HD 3870X2      25xx
Radeon HD 4850 Pro    20xx
Radeon HD 3870        14xx
* 1920x1200 4AA/16AF

;)


Then R700 (4870X2) should score 48xx-49xx points at these settings.
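
The post doesn't show its working, so the following is only one possible way to sanity-check a figure like that. It assumes the 4870X2 would scale over a single 4870 the way the 3870X2 scales over the 3870 in the table above, and it treats each "xx" score as a range.

```python
def score_range(text):
    """Turn a rounded score like '26xx' into its (low, high) bounds."""
    base = int(text.replace("x", "0"))
    return base, base + 99


hd3870 = score_range("14xx")
hd3870x2 = score_range("25xx")
hd4870 = score_range("26xx")

# Dual-GPU scaling observed in the table, at its extremes.
scaling_low = hd3870x2[0] / hd3870[1]
scaling_high = hd3870x2[1] / hd3870[0]

# Apply the same scaling to the single 4870 (an assumption).
estimate_low = hd4870[0] * scaling_low
estimate_high = hd4870[1] * scaling_high

print(round(scaling_low, 2), round(scaling_high, 2))
print(round(estimate_low), round(estimate_high))
```

On that assumption the estimate spans roughly 4300 to 5000 points, so 48xx-49xx sits in the upper part of the range and would need near best-case scaling.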
 
Nvidia GT200 successor tapes out

http://www.theinquirer.net/gb/inquirer/news/2008/05/29/nvidia-gt200-sucessor-tapes

Question!

EDIT:
The answer to that is to tape out the GT200b yesterday. It has taped out, and it is a little more than 400mm^2 on a TSMC 55nm process
There are several problems with the GT200, most of which are near fatal. The first is the die size, 576mm^2, bigger than most Itanics.


I guess R700 (4870X2) may be more efficient / better than the first GT200 revision on 65nm. In other words, R700 may be a better alternative to the 65nm GT200.
 