Somehow I don't see a shared ring making it all the way from one end of the PCB to the other, if the cooler pic that was leaked recently is to be trusted. That would pointlessly complicate clock management IMHO.
I never said it would be easy...
The (relatively) cheap GPUs offset the increased PCB cost. If R700 were an MCM, I don't think this conversation would be necessary.
Have we seen any (confirmed) 4870 X2 PCB shots yet?
You have any idea how many pads you need on the chip for that? Even if you had both chips on the same substrate, there's no room to just add 1000+ pads.
Yes, a shared ring is the likely approach here.
And what I'm wondering is if they can use GDDR5 protocol to link two GPUs... GDDR5 appears to have symmetric training (i.e. both ends can train their interfaces) which I imagine is a crucial feature when linking two "equal" chips.
You really do need a custom high speed connection to pull it off.
but it looks like the 770 and the 260 will be about equal for most things.
This means that buying a GT260 board will cost about 50 per cent more than an R770 for equivalent performance.
Ok, with 4 texels per pixel I agree, I come up with 512GB/s too.
Sigh, I even attempted the calculation more than once, though now I'm getting 512GB/s.
ARGH, I meant 4 texels per pixel not 16.
I suspect the "real" worst case could even be worse since the granularity of a single texel fetch (i.e. the texture cache line size) could be more than 8 bytes (in fact it could be burst length * channel width?).
I was trying to come up with the worst case texturing bandwidth. I assumed that fp16 texels aren't compressed, and considered bilinear filtering with minification of at least 50% and no mipmaps, so that every pixel is filtered from 4 distinct texels. Anyway it was a total bust.
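For anyone who wants to redo the arithmetic, here is a minimal sketch of the worst-case estimate discussed above. The thread gives 4 texels per pixel, 8 bytes per fp16 texel and a 512GB/s result, but never states the pixel rate, so the 16 Gpixel/s figure below is purely an assumption back-solved from that result. The function name and the `fetch_granularity` parameter are hypothetical too; the latter only models the point that a single fetch might pull in more than 8 bytes.

```python
def worst_case_texture_bandwidth(pixels_per_second, texels_per_pixel=4,
                                 bytes_per_texel=8, fetch_granularity=8):
    """Bytes/s of texture traffic if every texel needs its own memory fetch."""
    # A fetch can't be smaller than the memory system's granularity,
    # so each texel costs at least `fetch_granularity` bytes.
    bytes_per_fetch = max(bytes_per_texel, fetch_granularity)
    return pixels_per_second * texels_per_pixel * bytes_per_fetch


# ASSUMPTION: not stated in the thread, chosen so the result matches 512GB/s.
ASSUMED_PIXEL_RATE = 16e9
GB = 1e9

# 4 distinct fp16 texels per pixel, no reuse
print(worst_case_texture_bandwidth(ASSUMED_PIXEL_RATE) / GB)   # 512.0
# the mistaken 16 texels per pixel
print(worst_case_texture_bandwidth(ASSUMED_PIXEL_RATE, texels_per_pixel=16) / GB)
# a (hypothetical) 32-byte fetch granularity instead of 8 bytes
print(worst_case_texture_bandwidth(ASSUMED_PIXEL_RATE, fetch_granularity=32) / GB)
```

Both of the last two cases come out four times higher than the 512GB/s figure, which is why the texel count and the fetch granularity matter so much here.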
Ah right, forgot about that. This is probably more fine-grained though than what the tiling across chips would be? This is a slight problem here: for better balancing you'd want small tiles, but for better cache efficiency you'd want large tiles.
Yeah I agree, textures would need tiling to help with load-balancing. The pathological case is possible even with a single GPU (i.e. reading all texels from just one memory channel instead of from all four, say).
Yes, that's what I'm thinking too. RBEs not using local memory just plain doesn't make sense.
I assume that the RBEs are assigned to screen space tiles so that they only use GPU-attached, not foreign, memory. So there should be no cross-GPU traffic relating to colour/Z/stencil operations. The only render target related cross-GPU traffic should occur when stitching together the tiles from the constituent GPUs.
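To make the tile-size trade-off concrete, here is a rough sketch of screen-space tiles being handed out to GPUs. Nothing here is known about how R700 actually does it: the checkerboard interleave, the function names and the tile sizes are all assumptions for illustration.

```python
def tile_owner(x, y, tile_size, num_gpus=2):
    """GPU that owns the screen-space tile containing pixel (x, y)."""
    # ASSUMPTION: tiles are interleaved checkerboard-style across GPUs.
    tile_x, tile_y = x // tile_size, y // tile_size
    return (tile_x + tile_y) % num_gpus


def load_per_gpu(pixels, tile_size, num_gpus=2):
    """Count how many of the given pixels land on each GPU."""
    counts = [0] * num_gpus
    for x, y in pixels:
        counts[tile_owner(x, y, tile_size, num_gpus)] += 1
    return counts


# A 64x64 pixel "hot spot" of heavy work in one corner of the screen.
hot_spot = [(x, y) for x in range(64) for y in range(64)]

print(load_per_gpu(hot_spot, tile_size=16))  # [2048, 2048]
print(load_per_gpu(hot_spot, tile_size=64))  # [4096, 0]
```

With small tiles the hot spot is split evenly, while with tiles as large as the hot spot one GPU gets all of it. The sketch says nothing about cache efficiency, which is the reason you would want the larger tiles in the first place.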
Xenos is interesting because there are two apparently (or at least potentially) quite different buses there.
What about what they did with the Xenos core? Not similar at all?
http://en.wikipedia.org/wiki/Hypertransport
Also, what prevents AMD from making communication a fast serial interface (fewer pins)?
In each direction? That's an aggregate bandwidth almost equal to main memory bandwidth with GDDR5 ... I'd say even a 50 GB/s bidirectional link would be more than enough.
What's got me dubious is the physical implementation of the ring bus. Each chip needs to have 2 ring-bus ports, each port seems to need to be in the region of 50GB/s in each direction, I suspect.
I wrote that just over a year ago.
In each direction? That's an aggregate bandwidth almost equal to main memory bandwidth with GDDR5 ... I'd say even a 50 GB/s bidirectional link would be more than enough.
Have to admit I was scratching my head thinking of dual-ported RAM from the good old days of video memory (to enable simultaneous scan-out and update) - didn't realise you meant as a staging post to move data between processors.
I was working on the hidden assumption that burstiness would average out when boiling this down to packets of texture data.
I suspect the "real" worst case could even be worse since the granularity of a single texel fetch (i.e. the texture cache line size) could be more than 8 bytes (in fact it could be burst length * channel width?).
Oh well.
But anyway, I think this worst case is not relevant. You're not expected to get full performance if you don't use mipmaps...
As far as I can tell, the R6xx texture cache system effectively works with a tile size that's matched to the memory system:
Ah right, forgot about that. This is probably more fine-grained though than what the tiling across chips would be? This is a slight problem here: for better balancing you'd want small tiles, but for better cache efficiency you'd want large tiles.
http://www.nordichardware.com/news,7809.html
Both the GeForce GTX and Radeon HD 4800 series will arrive in about three weeks. Each series will bring two new cards to the market; GeForce GTX 280 and 260, and Radeon HD 4870 and 4850. There is a big difference between the cards though as the GeForce GTX series is enthusiast range, while Radeon HD 4800 series is more mid-range. There have been talks of what GeForce GTX 280 can do in Vantage, but it has now been completed with figures for the other cards.
These are of course in no way official and we can't say for certain where they come from. The only thing we know is that the numbers are not unreasonable, but some information about the rest of the system would be nice. ATI performance (with all cards) is still subpar due to poor drivers, and should improve in Vantage with coming releases. The numbers that are circulating the web are something like this;
Graphics card            Vantage Xtreme profile*
GeForce GTX 280          41xx
GeForce GTX 260          38xx
GeForce 9800GX2          36xx
GeForce 8800 Ultra       24xx
Radeon HD 4870 XT        26xx
Radeon HD 3870X2         25xx
Radeon HD 4850 Pro       20xx
Radeon HD 3870           14xx
* 1920x1200 4AA/16AF
The answer to that is to tape out the GT200b yesterday. It has taped out, and it is a little more than 400mm^2 on a TSMC 55nm process.
There are several problems with the GT200, most of which are near fatal. The first is the die size, 576mm^2, bigger than most Itanics.