Will PCI Express Signal the Return of Dual GPU Systems?

Well, you can already get a separate TV card, so there's little reason to be forced into upgrading a TV card as well as your 3D hardware.

The only reason you might want to turn GPUs into coprocessors would be for either multi-screen applications or offline rendering. The additional cost of just the display units is such a small part of the total card's cost that I don't see a reason to go in the direction of coprocessors for desktop PCs.
 
you don't need to see it... :D

But you see that you just can't lose with that. You can only gain.

Yes, multimonitor support and hw-accelerated offline rendering are two things we could get more easily then. There's much more, too: it would allow much more innovation in hw, as people could design specific hw for specific tasks. Today you always have to provide a full GPU replacement; tomorrow you could create some SIMD or MIMD DSP for anything. It could then get used to process graphics data (rasterizing, raytracing, evaluating radiosity matrices, voxel rendering, etc.), or audio data (or both at the same time :D), or scientific data.

What we have today is nothing: a restricted fixed pipeline with two tiny parts that are now called programmable.

It's nothing compared to what we could have. Hardware that can do this already exists; it just uses non-standard buses.
 
Except how would you deal with the massive memory bandwidth needs of a graphics coprocessor? Not to mention the significant bandwidth needs between the graphics coprocessor and the DAC.
 
Doing the math, I don't see many problems. It depends on how you manage your data, of course, but it works out okay.
 
Chalnoth said:
Except how would you deal with the massive memory bandwidth needs of a graphics coprocessor? Not to mention the significant bandwidth needs between the graphics coprocessor and the DAC.


Although I think it unlikely that we'll see a return to V2-type technology, I think the alternate scan-line rendering technique would work pretty well--it would just be very expensive to make and purchase, of course. At 1600x1200 the max resolution each card would have to handle would be 800x600, which most current cards can do easily. Likewise 2048x1536 3d support would be much easier (1024x768, each card max.) The DAC limits would be easy to deal with as they'd be governed by the monitor.inf capability, etc., through the driver.

But you'd need API support, of course, as 3dfx had with GLIDE (or maybe not--I can see how the driver might handle all of it transparently to the API.) To get 256MB of effectively dedicated RAM you'd need 512MB between the two cards, and I think a pass-through cable would be necessary for synchronization, while routing the monitor from the last card in the pair. You'd just duplicate textures when you loaded them to the cards in local RAM, but they could exist normally in system RAM otherwise. But the one major improvement over the V2 model would be that you'd no longer need a dedicated 2D card in the mix. Onboard bandwidth is already so much higher than AGP 8x that I wouldn't see that as much of a problem, really.

I think it is definitely doable, but probably unlikely. Sure would be interesting to see, though...:) One caveat here would be the limited resolutions of the LCDs which are becoming more popular, and also how to get FSAA/AF working synchronously (V2 didn't have to contend with any of that.) I like the alternate scan-line approach as I can't think of a better method of evenly dividing the workload between the cards--certainly AFR and screen segmenting wouldn't begin to do it as efficiently as the alternate scan-line approach. Really is an intriguing thought...
 
WaltC said:
Although I think it unlikely that we'll see a return to V2-type technology, I think the alternate scan-line rendering technique would work pretty well--it would just be very expensive to make and purchase, of course. At 1600x1200 the max resolution each card would have to handle would be 800x600, which most current cards can do easily. Likewise 2048x1536 3d support would be much easier (1024x768, each card max.) The DAC limits would be easy to deal with as they'd be governed by the monitor.inf capability, etc., through the driver.
Actually, it would be 1600x600 and 2048x768 that each card would have to handle..
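Here's the arithmetic as a quick Python sketch (nothing vendor-specific, just a plain even/odd line split; the function name is mine):

Code:
# Per-card workload under alternate scan-line rendering: each card keeps
# the full scanline width but renders only every other line.
def per_card_workload(width, height, num_cards=2):
    return width, height // num_cards

for w, h in [(1600, 1200), (2048, 1536)]:
    cw, ch = per_card_workload(w, h)
    print(f"{w}x{h}: each card renders {cw}x{ch} "
          f"({cw * ch} pixels, half of {w * h})")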
 
I would think that with the increasing complexity of rendering tasks, like shaders and AA running at high speed and data rates, a scan line interleaving method would be counterproductive.

With AA sampling between odd and even scan lines, wouldn't you wind up having to pass color data for every pixel between the cards? That would kind of ruin the resolution advantage, since you would wind up making each card maintain a separate copy of the other card's render results.

If so, the GPUs are going to be doing a lot of sitting around for latencies much worse than video memory access.

With shaders, especially when they start having more complex flow control, it is also possible that they will try to access pixel data outside the card's own scanlines. In addition, interleaving might help with fill-rate demands, but might wind up making the cards do duplicate computing for non-fillrate-bound tasks, unless it is possible to write shaders that can somehow themselves be interleaved.

Perhaps a more efficient method that deals with pixel effects would be to just divide the screen in half, which would only require a small band of hopefully just two lines to be passed at the center (not sure about reflection shaders that wind up reflecting across the boundary). This mid-transfer could also be used to synchronize rendering, just in case one half of the screen is more demanding (perhaps some kind of load balancing could be implemented for a dynamic division).
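Rough sketch of what I mean, in Python (purely illustrative -- the two-row overlap band and the 1600x1200 figures are just assumptions):

Code:
# Split-screen scheme: card 0 renders rows [0, boundary), card 1 renders
# rows [boundary, HEIGHT). Only a narrow band around the seam would have
# to be exchanged, instead of every pixel as with interleaved AA sampling.
WIDTH, HEIGHT = 1600, 1200
BYTES_PER_PIXEL = 4
OVERLAP_ROWS = 2              # assumed rows shared across the boundary

def split(boundary):
    return (0, boundary), (boundary, HEIGHT)

def seam_traffic_bytes(overlap_rows=OVERLAP_ROWS):
    # Colour data passed between the cards per frame, for the shared band only.
    return WIDTH * overlap_rows * BYTES_PER_PIXEL

top, bottom = split(HEIGHT // 2)
print("card 0 rows:", top, "  card 1 rows:", bottom)
print("per-frame seam traffic:", seam_traffic_bytes(), "bytes")
# ~12.8 KB per frame for a 2-row band, versus megabytes per frame if each
# card had to mirror the other card's entire colour buffer.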

The division at half-screen runs a little against the scan-line nature of CRTs, but maybe that wouldn't matter if the cards were deferred renderers?
 
WaltC said:
Although I think it unlikely that we'll see a return to V2-type technology, I think the alternate scan-line rendering technique would work pretty well--it would just be very expensive to make and purchase, of course.
The problem with scanline rendering is that texture memory bandwidth requirements increase with the number of processors, reducing the overall benefit in memory bandwidth performance from adding more memory datapaths. Quite simply, with scanlines separated from one another, texture cache hits go way down.

AFR, on the other hand, is harder to load-balance, but is more efficient in memory bandwidth, as no data need be shared (screen segmenting is almost as good in this regard, but may be even harder to load-balance). Of course, we all remember the ATI Rage Fury MAXX, which used AFR. Its main problem was that unless triple buffering was enabled, its performance was all over the place, with subsequent frames taking very different amounts of time.
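To illustrate the pacing problem, here's a toy AFR timeline in Python (the frame costs and the scheduling model are made up, not how any real driver works):

Code:
# Toy AFR model: frame i goes to GPU i % 2, each GPU starts its next frame
# as soon as its previous one finishes, and frames are presented in order.
costs = [20, 20, 40, 10, 30, 20]      # made-up per-frame render times (ms)
finish = []
gpu_prev_finish = [0, 0]

for i, cost in enumerate(costs):
    gpu = i % 2
    done = gpu_prev_finish[gpu] + cost
    gpu_prev_finish[gpu] = done
    finish.append(done)

present = []
for f in finish:                      # in-order presentation
    present.append(max(f, present[-1]) if present else f)

deltas = [b - a for a, b in zip(present, present[1:])]
print("present times:", present)      # frames tend to arrive in bursts...
print("deltas:       ", deltas)       # ...so frame-to-frame spacing is uneven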

Anyway, the best solution is to merely use one chip, which I think will remain possible.
 
Althornin said:
Actually, it would be 1600x600 and 2048x768 that each card would have to handle..

Heh...:) Right....I sort of divided each scanline in half, for some reason.
 
Another alternative to SLI and AFR is to have one GPU render the upper half of the frame and the other render the lower half. This avoids the texture caching problem of SLI and the latency/uneven frame rate problem of AFR, but needs a bit of load balancing between the upper and lower halves to reach optimal performance (which can be done by dynamically adjusting the boundary between the two areas up and down).
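Roughly like this, as a Python sketch (the step size and the measured times are invented; a real driver would presumably be smarter about it):

Code:
# Dynamic load-balancing for a top/bottom split: nudge the boundary toward
# the GPU that finished first, based on the previous frame's render times.
HEIGHT = 1200
STEP = 16                       # assumed adjustment granularity, in rows

def rebalance(boundary, time_top, time_bottom):
    if time_top > time_bottom:        # top half was the slow one:
        boundary -= STEP              # give the top card fewer rows
    elif time_bottom > time_top:
        boundary += STEP
    return max(STEP, min(HEIGHT - STEP, boundary))

boundary = HEIGHT // 2          # card 0 renders rows [0, boundary)
for t_top, t_bottom in [(22.0, 14.0), (20.0, 15.0), (18.0, 17.0)]:
    boundary = rebalance(boundary, t_top, t_bottom)
    print("new boundary row:", boundary)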
 
Voodoo 5 didn't use SLI like the V2 did. It used variable-sized bands of about 32 pixels per band, IIRC. This of course was to improve texture cache hits.
 
IIRC, the bands that Voodoo5 used had programmable widths, from 1 to 128 pixels; 32 may have been the best compromise between texture cache misses and load imbalance at the time.
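In other words, something along these lines (my reading of it -- assuming horizontal bands of scanlines dealt out to the chips in turn):

Code:
# Band-interleaved split: the screen is cut into bands of N lines,
# assigned to the chips round-robin.
def chip_for_line(y, band_height=32, num_chips=2):
    return (y // band_height) % num_chips

# With 32-line bands: chip 0 gets lines 0-31, 64-95, ...; chip 1 gets 32-63, ...
for y in (0, 31, 32, 63, 64):
    print(f"line {y:4d} -> chip {chip_for_line(y)}")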

I seem to remember that the solution I described was pushed by a company called Metabyte, but it never really caught on for whatever reason.

And a question: R300 supports multi-chip setups - there is at least one company out there (IIRC Evans & Sutherland?) selling systems with 4 R300s - how is the workload distributed over the chips?
 
arjan de lumens said:
IIRC, the bands that Voodoo5 used had programmable widths, from 1 to 128 pixels; 32 may have been the best compromise between texture cache misses and load imbalance at the time.

I seem to remember that the solution I described was pushed by a company called Metabyte, but it never really caught on for whatever reason.

And a question: R300 supports multi-chip setups - there is at least one company out there (IIRC Evans & Sutherland?) selling systems with 4 R300s - how is the workload distributed over the chips?

IIRC, wasn't the V5 64MB, with 32MB dedicated to each GPU--so that effectively you were limited to the same restrictions as a 32MB card? It seems to me that since the new reference designs are so much more powerful in every way than something akin to a V5, if anything setting up two would be a lot easier, especially with reference to load balancing. It might even be that screen segmenting simply wouldn't be the workload issue it was three years ago, since although the hardware horsepower has increased by several hundred percent, the desired resolutions are the same.
 
arjan de lumens said:
IIRC, the bands that Voodoo5 used had programmable widths, from 1 to 128 pixels; 32 may have been the best compromise between texture cache misses and load imbalance at the time.

I seem to remember that the solution I described was pushed by a company called Metabyte, but it never really caught on for whatever reason.

And a question: R300 supports multi-chip setups - there is at least one company out there (IIRC Evans & Sutherland?) selling systems with 4 R300s - how is the workload distributed over the chips?

I believe E&S uses 2 GPUs. CAE uses 4 on a single board. CAE uses an interconnect board to interface 4 cards in one system.

I can't see multiple boards going commercial. The cost would be outrageous. Maybe "mass" production would lower the cost, but who would buy $1000+ cards?

CAE could have sold a single board with 4 GPUs 2 years ago but we didn't see any market for it.
 
WaltC said:
I think the alternate scan-line rendering technique would work pretty well

I disagree.
Think about an application that spends 90% of its rendering time rendering into textures (shadow maps, environment maps, post-processing buffers).
How would such an architecture deal with that efficiently?
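Just to put rough numbers on it (all of the target sizes below are made up): if both cards need each render-target result, either both render the pass in full or one renders it and copies it across.

Code:
# Back-of-envelope cost of shipping render-to-texture results between two
# cards, assuming one card renders each off-screen pass and copies it over.
BYTES_PER_TEXEL = 4
passes = {                          # hypothetical per-frame render targets
    "shadow_map":    (1024, 1024),
    "env_map_face":  (256, 256),
    "post_buffer":   (1600, 1200),
}

per_frame = sum(w * h * BYTES_PER_TEXEL for (w, h) in passes.values())
fps = 60
print(f"per frame: {per_frame / 2**20:.1f} MB, "
      f"at {fps} fps: {per_frame * fps / 2**20:.0f} MB/s across the link")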
 