R700 Inter-GPU Connection Discussion

MfA · Jun 30, 2008

To do single-frame parallel rendering without duplicating the vertex load, half the dynamic textures and half the transformed vertices have to go over the link ... that's quite a lot of data for what seems to be a 16x PCI-e 2.0 speed link.

ShaidarHaran · Jun 30, 2008

A.L.M. said:
I don't think so. Kyle was probably the first to back up, like 6 months ago, the theories about an R700 that was not just CF on a card, which now seems extremely likely.
I think that with that post he's hinting at the fact that R800, even if it is a multi-GPU card, will be seen by the system as a single-GPU card.

I doubt he was the first. I'd been saying it for quite some time myself, and there were slides out there for years detailing some sort of "shared memory" scheme. I also believe HSI schemes had been at least implied, if not detailed...

ShaidarHaran · Jun 30, 2008

MfA said:
To do single-frame parallel rendering without duplicating the vertex load, half the dynamic textures and half the transformed vertices have to go over the link ... that's quite a lot of data for what seems to be a 16x PCI-e 2.0 speed link.

Vertices can be streamed and textures can be compressed. Also, ~8GB/s in each direction ain't half bad.
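As a rough check on that figure, the arithmetic can be sketched from the published PCI-e 2.0 signalling numbers (5 GT/s per lane, 8b/10b encoding). The function name is made up for illustration, packet and protocol overhead are ignored, and it is only an assumption that the inter-GPU link behaves like a 16-lane PCI-e 2.0 connection.

```python
def pcie2_bandwidth_gb_per_s(lanes: int) -> float:
    """Raw payload bandwidth per direction for a PCI-e 2.0 link, in GB/s.

    Hypothetical helper: ignores packet headers and flow-control overhead.
    """
    transfers_per_s = 5_000_000_000        # 5 GT/s per lane
    # 8b/10b encoding: only 8 of every 10 transferred bits carry payload.
    payload_bits_per_s = transfers_per_s * lanes * 8 // 10
    payload_bytes_per_s = payload_bits_per_s // 8
    return payload_bytes_per_s / 1e9


print(pcie2_bandwidth_gb_per_s(16))  # 8.0 (per direction)
```

Real-world throughput would sit somewhat below this once protocol overhead is counted.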

Mintmaster · Jun 30, 2008

MfA said:
To do single-frame parallel rendering without duplicating the vertex load, half the dynamic textures and half the transformed vertices have to go over the link ... that's quite a lot of data for what seems to be a 16x PCI-e 2.0 speed link.

I'm pretty sure we'll still see AFR, but that doesn't mean we won't have textures split across the memory pools. Whether PCI-e is the only link or there is another link (as has been suggested multiple times), BW from static textures is usually not too high, especially when using DXT.

Space occupied by render-to-texture, including shadow and reflection maps, would obviously be doubled, and I'm sure that games using under 512MB would duplicate everything, but with a good enough link and smart memory management there should be a lot more than 512MB available to a 3D app if needed without too big of a performance drop.
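A minimal sketch of that memory argument, assuming a simple model in which some data (render targets and the like) is mirrored in every pool while the remaining space in each pool holds unique data that the other GPU reads over the link. The function name and the 128MB figure are hypothetical, not taken from any real driver.

```python
def unique_capacity_mb(pool_mb: int, gpus: int, duplicated_mb: int) -> int:
    """Unique data (in MB) that fits across all pools in this toy model.

    `duplicated_mb` is stored once per GPU; the rest of each pool holds
    data that exists in only one pool and is shared over the link.
    """
    if duplicated_mb > pool_mb:
        raise ValueError("duplicated data must fit in a single pool")
    split_mb = gpus * (pool_mb - duplicated_mb)
    return duplicated_mb + split_mb


# Everything duplicated: the app sees only one pool's worth of memory.
print(unique_capacity_mb(512, 2, 512))  # 512
# Assumed 128MB of mirrored render targets, the rest split across pools.
print(unique_capacity_mb(512, 2, 128))  # 896
```

The model says nothing about the performance cost of reading the split data remotely, which is the part that depends on the link.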

SFR without duplicating vertex work will take quite some time, IMO. Interestingly the software solution is quite similar to tiling on the 360...

Karma · Jun 30, 2008

ZerazaX said:
Yeah the slide's legit, I'm just wondering if GPU-Z detecting RAM is actually just GPU-Z looking it up in a database like other things or if it actually is detecting 1GB for that GPU. So either the R700 is 2 x 1 GB or 1GB shared... hm!

Well, we already have a picture of the card with 16 RAM chips on both sides. That seems to indicate 2x1GB.

Karma Police · Jun 30, 2008

Now that I look at the chips closer, it's 2x512MB total on the board.

So assuming the card in the picture and the screenshot card are the same, it looks like ATi really does have working shared memory.

EDIT: Hey, what's up with my ID?

Anarchist4000 · Jun 30, 2008

Karma said:
Well, we already have a picture of the card with 16 RAM chips on both sides. That seems to indicate 2x1GB.

16 total chips or 16 chips on each side? If the pool is shared 2x512MB would be plenty of room I'd think.

Kaotik · Jun 30, 2008

The pics have 8 chips per side, 4 per GPU on back and 4 per GPU on front.
However, there were 2x1GB models of the 3870X2 too, weren't there? So nothing says the slide couldn't just have 2 such cards there. Then again, we can hope it really is a shared memory pool per card.

Karma Police · Jun 30, 2008

Kaotik said:
The pics have 8 chips per side, 4 per GPU on back and 4 per GPU on front.
However, there were 2x1GB models of the 3870X2 too, weren't there? So nothing says the slide couldn't just have 2 such cards there. Then again, we can hope it really is a shared memory pool per card.

Can anyone make out the labels on the chips?

Pantagruel's Friend · Jun 30, 2008

Mintmaster said:
I'm pretty sure we'll still see AFR

and if we see AFR, what will we see in those cases when the application wants to reuse geometry in the subsequent frame?
um. what do we see now, actually? :?:

3dilettante · Jun 30, 2008

ShaidarHaran said:
Do not several in-order execution CPUs lacking predication hw do this already? Cell, for instance.

Combining both paths of a branch is possible with many ISAs, and Cell can do this as well.

Whether it's done through predicated instructions or conditional moves for a given architecture, the effect of executing down both paths of a branch can be derived.

Doing it in hardware is not done in any significant architecture I know of.
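To illustrate what "executing down both paths" means in software, here is a minimal sketch of if-conversion: both sides of the branch are computed unconditionally and the condition only selects a result. The two functions are made-up examples, and the arithmetic select stands in for what a predicated instruction or conditional move would do on a real ISA.

```python
def branchy(x: int) -> int:
    """Ordinary control flow: only one path is executed."""
    if x < 0:
        return -x * 2
    return x + 1


def if_converted(x: int) -> int:
    """Both paths are evaluated; the condition becomes a select."""
    on_true = -x * 2          # result of the "taken" path
    on_false = x + 1          # result of the "not taken" path
    mask = int(x < 0)         # 1 if the condition holds, else 0
    return mask * on_true + (1 - mask) * on_false


# The two versions agree for every input tried here.
print(all(branchy(x) == if_converted(x) for x in range(-8, 9)))  # True
```

The trade-off is that the work of both paths is always paid for, which only makes sense when the paths are short and free of side effects.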

Pantagruel's Friend said:
and if we see AFR, what will we see in those cases when the application wants to reuse geometry in the subsequent frame?
um. what do we see now, actually?

Right now, I believe each frame's geometry setup is done fresh.
I'm not sure how to safely reuse geometry between frames without checking to see if it needs to be set up again.

Pantagruel's Friend · Jun 30, 2008

3dilettante said:
Right now, I believe each frame's geometry setup is done fresh.

so, when a single card happily reuses geometry shader output from the previous frame, the dual card solutions are simply forced to redo the geometry calculations? it sounds possible. it also sounds ugly :???:

Geo · Jun 30, 2008

My spidey-sense is still telling me what we're going to see here is incremental rather than revolutionary.

Not that useful progress can't be made incrementally!

Mintmaster · Jun 30, 2008

I can't think of many reasons that you would use a geometry shader's output for multiple frames. I doubt any game does that today.

More complicated are things like position data used in cloth and water simulation along with some image persistence techniques. The only way to handle that is to transfer the texture across the link immediately after updating it.

Pantagruel's Friend · Jun 30, 2008

I was thinking of stuff like waterfalls or explosions generated with GS. it was only my assumption that it preserves data across multiple frames - wouldn't it?

but your examples work just the same. and your solution implies there's a lot of data passed back and forth every frame. even with the 8GB/sec link, I think it may become a bottleneck.
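A back-of-the-envelope sketch of that concern, using made-up numbers: a hypothetical 1024x1024 position texture with four 32-bit floats per texel, the 8GB/s link figure from earlier in the thread, and an assumed 60 fps target. Latency, protocol overhead and any overlap with rendering are ignored.

```python
def transfer_ms(size_bytes: int, link_gb_per_s: float) -> float:
    """Time in milliseconds to push `size_bytes` over the link."""
    return size_bytes / (link_gb_per_s * 1e9) * 1e3


def frame_share(size_bytes: int, link_gb_per_s: float, fps: float) -> float:
    """Fraction of one frame's time budget spent on the transfer."""
    frame_ms = 1e3 / fps
    return transfer_ms(size_bytes, link_gb_per_s) / frame_ms


# Hypothetical persistent texture: 1024 x 1024 texels, 16 bytes each.
texture_bytes = 1024 * 1024 * 4 * 4

print(round(transfer_ms(texture_bytes, 8.0), 3))      # 2.097 ms
print(round(frame_share(texture_bytes, 8.0, 60), 3))  # 0.126 of a frame
```

One such texture looks affordable under these assumptions, but the cost scales linearly with the amount of persistent data updated per frame, so a handful of them would eat most of the frame.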

Mintmaster · Jul 1, 2008

Pantagruel's Friend said:
but your examples work just the same. and your solution implies there's a lot of data passed back and forth every frame. even with the 8GB/sec link, I think it may become a bottleneck.

If you look at my posting history, you'll see that I haven't been a big fan of multi-GPU for just this reason. The two big problems are persistent data and wastage of memory.

With a good link, both can be solved to a certain degree (I'm pretty sure ATI's Froblins demo is going to be trouble for R700). With an insanely fast link, multi-GPU design can probably be as fast as monolithic GPU design in all scenarios, but we're a long way from that.

compres · Jul 1, 2008

What if the link is as fast as the memory interface?

Mintmaster · Jul 1, 2008

compres said:
What if the link is as fast as the memory interface?

Still not as good as monolithic design, but it may be close enough 95% of the time.

rwolf · Jul 1, 2008

Mintmaster said:
Still not as good as monolithic design, but it may be close enough 95% of the time.

Let's see...
- lower cost
- higher yields
- better performance

If they can share the same framebuffer then there goes your argument.

rwolf · Jul 1, 2008

Radeon HD 4870X2 in the nude, 15% better than 4870 CrossFire

http://www.nordichardware.com/news,7900.html

That's what I'm talking about.
