R700 Inter-GPU Connection Discussion

To do single-frame parallel rendering without duplicating the vertex load, half the dynamic textures and half the transformed vertices have to go over the link ... that's quite a lot of data for what seems to be a x16 PCI-e 2.0 speed link.
 
I don't think so. Kyle was probably the first, some 6 months ago, to back up the theories that R700 would be more than just CF on a card, which now seems extremely likely.
I think that with that post he's hinting that R800, even if it is a multi-GPU card, will be seen by the system as a single-GPU card.

I doubt he was the first. I'd been saying it for quite some time myself, and there were slides out there for years detailing some sort of "shared memory" scheme. I believe HSI schemes had been detailed too, or at least implied...
 
Vertices can be streamed and textures can be compressed. Also, ~8GB/s bi-directionally ain't half-bad.
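
For reference, the ~8GB/s figure falls out of the PCI-e 2.0 numbers: 5 GT/s per lane, 8b/10b encoding, 16 lanes. Here is a rough sketch of that arithmetic, per direction. It ignores packet and protocol overhead, so real throughput would be somewhat lower, and the function names and the 60 fps target are only illustrative assumptions.

```python
# Back-of-the-envelope PCI-e 2.0 budget, per direction.
# Assumption: raw line rate only, no packet/protocol overhead.
TRANSFERS_PER_S_PER_LANE = 5_000_000_000  # PCI-e 2.0: 5 GT/s per lane
PAYLOAD_BITS, LINE_BITS = 8, 10           # 8b/10b line coding


def link_bytes_per_second(lanes=16):
    """Usable bytes per second in one direction for a given lane count."""
    payload_bits = TRANSFERS_PER_S_PER_LANE * PAYLOAD_BITS // LINE_BITS * lanes
    return payload_bits // 8


def per_frame_budget_mb(fps, lanes=16):
    """Megabytes that fit through the link in one frame time (hypothetical helper)."""
    return link_bytes_per_second(lanes) / fps / 1e6


if __name__ == "__main__":
    print(link_bytes_per_second() / 1e9, "GB/s per direction")
    print(round(per_frame_budget_mb(60), 1), "MB per frame at 60 fps")
```

So at 60 fps there is roughly 133MB per frame in each direction to play with, before any overhead.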
 
I'm pretty sure we'll still see AFR, but that doesn't mean we won't have textures split across the memory pools. Whether PCI-e is the only link or there is another link (as has been suggested multiple times), bandwidth from static textures is usually not too high, especially when using DXT.
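
To put numbers on the DXT point: DXT1 packs each 4x4 texel block into 8 bytes and DXT3/DXT5 into 16 bytes, against 64 bytes for the same block in uncompressed RGBA8. A minimal sketch of the resulting footprint for a single mip level follows; the helper names and the 1024x1024 example are assumptions picked for illustration.

```python
# Size of one mip level, DXT-compressed versus uncompressed RGBA8.
DXT1_BLOCK_BYTES = 8    # 4x4 texels in 8 bytes
DXT5_BLOCK_BYTES = 16   # 4x4 texels in 16 bytes (same for DXT3)
RGBA8_TEXEL_BYTES = 4


def dxt_size_bytes(width, height, block_bytes):
    """Bytes for one mip level; dimensions are rounded up to whole 4x4 blocks."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * block_bytes


def rgba8_size_bytes(width, height):
    """Bytes for the same level stored as uncompressed 32-bit RGBA."""
    return width * height * RGBA8_TEXEL_BYTES


if __name__ == "__main__":
    w = h = 1024  # illustrative texture size
    raw = rgba8_size_bytes(w, h)
    for name, block in (("DXT1", DXT1_BLOCK_BYTES), ("DXT5", DXT5_BLOCK_BYTES)):
        size = dxt_size_bytes(w, h, block)
        print(name, size // 1024, "KiB,", raw // size, ": 1 vs RGBA8")
```

That is 8:1 for DXT1 and 4:1 for DXT5 relative to RGBA8, so whatever a static texture fetch costs over the link, it costs a fraction of the uncompressed figure.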

Space occupied by render-to-texture, including shadow and reflection maps, would obviously be doubled, and I'm sure that games using under 512MB would duplicate everything, but with a good enough link and smart memory management there should be a lot more than 512MB available to a 3D app if needed without too big of a performance drop.

SFR without duplicating vertex work will take quite some time, IMO. Interestingly the software solution is quite similar to tiling on the 360...
 
Yeah the slide's legit, I'm just wondering if GPU-Z detecting RAM is actually just GPU-Z looking it up in a database like other things or if it actually is detecting 1GB for that GPU. So either the R700 is 2 x 1 GB or 1GB shared... hm!

Well, we already have a picture of the card with 16 RAM chips on both sides. That seems to indicate 2x1GB.
 
Now that I look at the chips closer, it's 2x512MB total on the board.

So assuming the card in the picture and the screenshot card are the same, it looks like ATi really does have working shared memory.

EDIT: Hey, what's up with my ID?
 
The pics have 8 chips per side, 4 per GPU on back and 4 per GPU on front.
However, there were 2x1GB models of the 3870X2 too, weren't there? So nothing says the slide couldn't just have two such cards in it. Then again, we can hope it really is a shared memory pool per card.
 
Can anyone make out the labels on the chips?
 
Don't several in-order CPUs that lack predication hardware do this already? Cell, for instance.

Combining both paths of a branch is possible with many ISAs, and Cell can do this as well.

Whether it's done through predicated instructions or conditional moves on a given architecture, the effect of executing down both paths of a branch can be achieved.
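
As a rough illustration of what "executing down both paths" means in software, here is a sketch of if-conversion written in Python: both sides of the branch are always computed and a mask-based select keeps one result, which is roughly what a conditional move or select instruction gives a compiler. The function names are made up for the example, and Python is only standing in for the idea; it says nothing about how any particular ISA encodes it.

```python
def select(cond, if_true, if_false):
    """Branch-free integer select: if_true when cond holds, else if_false."""
    mask = -int(bool(cond))  # -1 (all ones) when true, 0 when false
    return (if_true & mask) | (if_false & ~mask)


def abs_diff_branchy(x, y):
    """Reference version with an ordinary branch."""
    if x > y:
        return x - y
    return y - x


def abs_diff_if_converted(x, y):
    """Same result, but both paths are evaluated and one is selected."""
    then_path = x - y  # always computed
    else_path = y - x  # always computed
    return select(x > y, then_path, else_path)


if __name__ == "__main__":
    for x, y in ((3, 10), (10, 3), (-5, 5), (7, 7)):
        assert abs_diff_if_converted(x, y) == abs_diff_branchy(x, y)
    print("both versions agree")
```

The cost is that the work of both paths is always paid for, which is why it only makes sense for short branches.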

No significant architecture I know of does this in hardware.

And if we see AFR, what happens in those cases where the application wants to reuse geometry in the subsequent frame?
Um, what do we see now, actually?
Right now, I believe each frame's geometry setup is done fresh.
I'm not sure how to safely reuse geometry between frames without checking to see if it needs to be set up again.
 
My spidey-sense is still telling me what we're going to see here is incremental rather than revolutionary.

Not that useful progress can't be made incrementally!
 
I can't think of many reasons that you would use a geometry shader's output for multiple frames. I doubt any game does that today.

More complicated are things like position data used in cloth and water simulation along with some image persistence techniques. The only way to handle that is to transfer the texture across the link immediately after updating it.
 
I was thinking of stuff like waterfalls or explosions generated with the GS. It was only my assumption that it preserves data across multiple frames. Wouldn't it?

But your examples work just the same, and your solution implies there's a lot of data passed back and forth every frame. Even with the 8GB/s link, I think it may become a bottleneck.
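
A quick way to sanity-check the bottleneck worry is to estimate how long the persistent surfaces would take to cross the link each frame. The sketch below uses the 8GB/s figure from this thread; the surface list (sizes and formats) is entirely hypothetical, and it assumes the copies are not hidden behind rendering.

```python
LINK_BYTES_PER_S = 8e9  # the ~8GB/s figure quoted in this thread


def transfer_ms(width, height, bytes_per_texel, link=LINK_BYTES_PER_S):
    """Milliseconds to push one surface across the link at full rate."""
    return width * height * bytes_per_texel / link * 1e3


def frame_fraction(total_ms, fps):
    """Share of one frame time consumed by total_ms of transfers."""
    return total_ms / (1000.0 / fps)


# Hypothetical persistent data that the other GPU needs every frame.
SURFACES = [
    (1024, 1024, 16),  # cloth/water position map, RGBA32F
    (2048, 2048, 4),   # shadow map, 32-bit depth
    (1920, 1200, 8),   # previous-frame colour buffer, RGBA16F
]

if __name__ == "__main__":
    total = sum(transfer_ms(w, h, bpt) for w, h, bpt in SURFACES)
    print(round(total, 2), "ms per frame on the link")
    print(round(100 * frame_fraction(total, 60)), "% of a 60 fps frame")
```

With these made-up surfaces it comes to about 6.5 ms, a bit under 40% of a 60 fps frame, so whether it hurts depends heavily on how much of the transfer can overlap with other work.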
 
If you look at my posting history, you'll see that I haven't been a big fan of multi-GPU for just this reason. The two big problems are persistent data and wastage of memory.

With a good link, both can be solved to a certain degree (I'm pretty sure ATI's Froblins demo is going to be trouble for R700). With an insanely fast link, multi-GPU design can probably be as fast as monolithic GPU design in all scenarios, but we're a long way from that.
 