Will PCI Express Signal the Return of Dual GPU Systems?

With the transition to PCI Express, it seems almost inevitable that some form of SLI dual-GPU support should make a comeback. It actually makes perfect sense with the way the graphics market is looking to slow down product cycles. Why have a spring refresh product with a 5% performance increase when just popping in a second video card, or even filling your ENTIRE SYSTEM with cards, say 4-6 of them, would double, triple, even quadruple GPU performance?

Whether it's SLI, AFR, or some other similar method, it seems like a no-brainer for this to happen.

Thoughts?
 
Well I hope this happens! 8)

However, certainly with the first PCI Express motherboards, only one PCI Express slot will be an x16. No doubt this will stop SLI-type setups to start with.

But like you say, this has to be a no brainer for ATI / nVidia et al. They can only sell more cards this way, once motherboards support this type of setup.
 
I wonder how well PCI Express can be used to synchronise several GPUs.

There is a bit of a latency penalty in the serial nature of PCI Express, but that is probably surmountable.

The point-to-point nature of standard PCI Express could be a stumbling block, since data would have to be routed through a controller before being sent to the other cards.

The advanced switching specification could probably allow for a very powerful means of communication and synchronization, but it seems to require additional work and expense in design and implementation, since it no longer relies on a host-centric model. One would probably need a motherboard, bridges, and cards designed either to specifically handle advanced switching or sport some kind of dual capability.

This would probably be overkill for most board makers who are producing for the consumer, and the fabric model used seems to be earmarked for communications. In addition, the number of links necessary to feed data to several video cards might make things even more complicated.

Even then, there would probably be intermediate switches between the cards in a fabric setup, unless the cards were designed with their own native switches, in which case they would probably have to be installed in multiples.

To best minimize intermediary delays, maybe they could put dedicated point to point links between the cards, but that would probably make the reduced pin count and easier trace design moot, and would run counter to the general expansion role of PCI Express, since the connectors and whatnot are going to get pretty non-standard.
 
I don't think it will happen that easily...

If SiS... err... I mean XGI succeeds in bringing a solution worth taking into consideration, then the other players will consider it too.

As XGI's cards are the only current, real example of a multi-GPU card, I'm going to start with them.

The GPUs have 80 million transistors each, which means they'll have 160 million transistors on each card. That's 30% more than nVIDIA has on their NV35.

I will not compare these GPUs with the R3xx line from ATi, as ATi uses 0.15 micron technology on their top-of-the-line products, and anyway the hardware implementation of the R3xx line is more than a wonder to me... it probably has to do with the cross-licensing deals ATi has with Intel, but that's another story.

So... 30% more transistors translates into roughly 30% more die area, which translates into a more expensive GPU (per wafer) and also into lower yields.

Pay attention to those last two words, "lower yields": that probably won't be the case with XGI, as we are not talking about one 160 million transistor GPU but about two 80 million ones... fewer transistors, higher yields, so this differentiates XGI from nVIDIA in this case.

Conclusion 1:

- The only disadvantage seems to be that they have to pay more per wafer, or rather get fewer chips (for dual-GPU card setups) per wafer.

- The yields are likely to be higher than on NV35, for example...
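
Just to put rough numbers on the yield argument, here's a quick back-of-the-envelope sketch. The defect density, die areas and wafer size are made up purely for illustration, not real fab data; it only shows why two half-size dies can come out ahead of one big die under a simple Poisson yield model.

```python
# Back-of-the-envelope yield comparison: one big die vs. two half-size dies.
# All numbers (defect density, die areas, wafer size) are assumed for illustration.
import math

DEFECT_DENSITY = 0.5    # defects per cm^2 -- assumed
WAFER_AREA     = 706.9  # cm^2 (300 mm wafer, ignoring edge loss)

def poisson_yield(die_area_cm2, d0=DEFECT_DENSITY):
    """Probability that a die of the given area has zero defects."""
    return math.exp(-d0 * die_area_cm2)

big_die   = 2.0  # cm^2, hypothetical single ~160M-transistor chip
small_die = 1.0  # cm^2, hypothetical ~80M-transistor chip

good_big   = (WAFER_AREA / big_die)   * poisson_yield(big_die)    # ~130 good dies
good_small = (WAFER_AREA / small_die) * poisson_yield(small_die)  # ~429 good dies

# A dual-GPU card needs two good small dies, yet still wins per wafer here,
# because yield falls off exponentially with die area in this model.
print(f"big-die cards per wafer:  {good_big:.0f}")
print(f"dual-GPU cards per wafer: {good_small / 2:.0f}")
```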

Now let's talk about the cost of the PCB... NV35 uses a 256-bit bus PCB, while the memory bus on the dual XGI cards will be a 128-bit one... I don't think this will be a big cost issue... most likely the price will be about the same, considering that the XGI dual PCBs will be complex too.

The R&D costs fall, but not quite in the way you would expect...

Let's say ATi invested huge amounts of money and work into their R420 project... in the first 6 months after launch we will have a refresh of the line with faster chips offering 15 to 20% more performance in the best case... after 12 months ATi won't have to invest the same huge amount to develop a different architecture; instead it will launch a dual setup where the only R&D costs are the more complex PCB and a few other issues like the VPU interconnect bus.

This kind of move will most likely bring the desired ~60% performance delta over the 6-month refresh (roughly 92% over the first product based on that architecture) for the new yearly refresh of the product line.
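
To make those percentages concrete (using the best-case figures from above, not measured numbers), the arithmetic works out like this:

```python
# Hypothetical performance scaling over one architecture's lifetime,
# using the best-case figures from the post above.
base    = 1.00            # launch product
refresh = base * 1.20     # 6-month refresh: +20% in the best case
dual    = refresh * 1.60  # 12-month dual-GPU refresh: +60% over the refresh

print(f"refresh vs. launch: +{(refresh - 1) * 100:.0f}%")  # +20%
print(f"dual vs. launch:    +{(dual - 1) * 100:.0f}%")     # +92%, the figure quoted above
```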

The most important thing is that the R&D costs for the yearly refresh are now around 10% of what they would be under the original strategy.

Also, the manufacturing process is now mature, so the cost per chip is smaller too, which helps compensate for the total cost of the card with its more complex PCB.

So... a 24-month architecture cycle will also offer a more stable platform for programmers and developers to work with.

This, in my opinion, cannot go on forever the way 3dfx thought it could, with 4x GPU setups and all.

Reasons:

- the amount of card memory;
- the redundancy ratio of the setup;
- cost per GPU (a GPU will never cost as little as a low-cost mobo chipset);
- the changes in the API and new implementations.

For this last reason in particular, an R500 will most certainly appear even in this multi-GPU scenario, and... just as with previous architecture changes... its performance will be greater than the older dual setups.

Then... we can all proceed to the next dual setup for another 24 months while waiting for DX11. :)


The most important development in such a scenario:

Sure, the first time around the dual setups won't bring anything extraordinarily innovative, BUT... if this is the way to go... from the second generation of such an implementation, the companies adopting this method must develop a way of SHARING the video memory in the most EFFICIENT way possible. This will be, in my view, the most important development, as it would drive the price of the card something like 20% lower while bringing more performance.

PCI Express will bring the necessary bandwidth for such devices, along with the increased power delivery for mid-range solutions.

Conclusion 2:

- I don't think anybody will ever use dual-card setups again, as in two slots on the mobo. It's too expensive and too inefficient, as the second card won't have as much bandwidth as the main one, and they'd have to use cables, which suffer from signal loss... etc.
 
David G. said:
Conclusion 2:

- I don't think anybody will ever use dual-card setups again, as in two slots on the mobo. It's too expensive and too inefficient, as 1. the second card won't have as much bandwidth as the main one and 2. they'd have to use cables, which suffer from signal loss... etc.

1. Why wouldn't the second card have as much bandwidth? If they're both the same card, and are both in PCIE-16x slots...

2. Why do you think 3dfx's Voodoo2 SLI cable was so ridiculously short? And besides, with PCIE, such a cable won't be necessary, as the signal can go straight to the other card.
 
Tagrineth said:
And besides, with PCIE, such a cable won't be necessary, as the signal can go straight to the other card.

Well, when you do this, you have about 0.6 GB/s (1600x1200x32 @ 75 Hz) going out of one card and into the other, assuming your monitor is only connected to one video card. I'm not an expert on PCI Express, so I don't know what kind of impact that would have. If it significantly burdens the memory controller, the CPU could see a lag on every other frame.

I suppose it will be possible in the long run for power hungry users.
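
For what it's worth, the ~0.6 GB/s figure is easy to sanity-check; this is just the raw framebuffer traffic at that resolution and refresh rate, ignoring any protocol overhead:

```python
# Bandwidth needed to ship a finished 1600x1200, 32-bit frame to the other card
# 75 times per second (raw pixel data only, no PCI Express protocol overhead).
width, height = 1600, 1200
bytes_per_pixel = 4   # 32-bit colour
refresh_hz = 75

bytes_per_second = width * height * bytes_per_pixel * refresh_hz
print(f"{bytes_per_second / 1e9:.2f} GB/s")  # ~0.58 GB/s
```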
 
Tagrineth said:
1. Why wouldn't the second card have as much bandwidth? If they're both the same card, and are both in PCIE-16x slots...
Most motherboards, if not all, will not have more than one PCIE-16x slot.

2. Why do you think 3dfx's Voodoo2 SLI cable was so ridiculously short? And besides, with PCIE, such a cable won't be necessary, as the signal can go straight to the other card.
That would take up PCI Express bandwidth, and given the large amount of data that must be transferred, would probably not be a good way to do things. Perhaps a separate, dedicated cable would be better (i.e. a digital cable connecting the two cards within the case, but it would have to be very high bandwidth).

The easiest solution, in the end, would be a simple passthrough cable like the one the Voodoo2 used. This would work best if the passthrough used the DVI interface.
 
Tagrineth said:
And besides, with PCIE, such a cable won't be necessary, as the signal can go straight to the other card.

Technically this is not the case. PCIE is point-to-point, meaning there can only be a device on one end and a switch on the other end of a link. The switch will have to route the data to the other card, at least in the standard version.

There's an advanced switching variant that I'm less read up on that might allow for something to minimize this.

It could also be possible to lay down some kind of non-standard connection, where the video slots could have a number of lanes running between them in addition to the lanes they use to access the host bridge. If the cards then had PCIE bridges integrated into them, they could create their own little graphical backplane.

However, that would mean non-standard slots for non-standard hardware that would still require non-standard means of coordinating memory and processor access (though this coordination is inherent to any system having multiple cards).

It is feasible that these extra lanes could be positioned somehow so that standard cards could still use the slots, but then that is a pretty big investment for a motherboard maker and logic designer to make for a feature that is rather unlikely to be used.
 
Early on in the development of PCI-Express, there was the suggestion that it could be used not only as a motherboard interface protocol, but also as a USB-type interface. If this remains in the spec, then the cards could, if they had their own PCI-Express bridges, provide their own link with an additional cable.
 
The twin VSA-100 Voodoo 5500, the unreleased quad VSA-100 Voodoo 6000, the twin-chip Rage Fury MAXX, the multi-chip Voodoo2, multi-card Voodoo2 SLI, and any other consumer 3D card released with more than one chip were not "multi-GPU" cards. They were cards with more than one rasterizing chip, but all of them lacked geometry processing / T&L, now known as vertex shaders.

XGI's Volari V8 Duo (and any other multi GPU card they might have) would be the first consumer card(s) with more than one 3D processor. In this context, a 3D processor means something that takes the entire graphics load -off- the CPU.
 
<brainstorm>
I had an interesting(?) halfway solution that might be possible. :) I don't remember if I wrote it here before, but here it goes.

If I read (and remembered) the spec correctly, a PCIE-16x slot consists of 16 parallel full-duplex serial ports (lanes). The standard allows reconfiguration of slots and cards to use as many ports as possible. If a PCIE-8x card is put into a PCIE-16x slot, the mb will only use the first 8 ports.

Even if it's too much to put two 16x slots in the chipset, it might be possible to put two PCIE-8x slots there. (I'm not talking from any experience here. I'm just thinking that it might not be too much extra logic if the bandwidth/port count is the same.) And then make it reconfigurable to work as one 16x slot.

That chipset would work in a normal mb with one 16x slot without too much overhead. (Just some overhead in-chip, nothing at mb level.) It would also work in a mb with two 8x slots.

Now for the interesting part: if the 2x8x mb used 16x slots where ports 8-15 were connected between the slots, it would be a nice triple-use mb.
1) You could put one 16x card in it, with a pass-through card in the second slot. The pass-through card is just an empty circuit board that connects port N with port N+8, and thus makes all 16 ports available for the card in slot 1.

2) You could put two 8x cards in it, which then have 8 ports between them for their own use. Two separate PCIE-8x slots are still quite fast, and the fast link between the cards is handy for transferring screen output and textures generated dynamically on the GPU.

3) You could put an Nx card in the first slot and a 1x card in the second (if the other 1x slots are full). This makes your gfx card effectively max 8x (even if it's a 16x card), but it's still quite fast.

So instead of making a mb with one 8x and four 1x slots (1*8x+4*1x), you could do (with very little overhead) a mb that can be reconfigured by the end user to 1*16x+3*4x, 2*8x+3*4x, or 1*8x+4*4x. And if the mb manufacturer wants to reduce their number of PCBs, it could be done so the manufacturer could use the same PCB for a regular 1*16x+4*4x mb. (There's a toy sketch of the reconfiguration logic right after this brainstorm.)
</brainstorm>
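
Here's the toy sketch of the reconfiguration logic in that brainstorm, just to make the three use cases explicit. The slot widths and negotiation rules are my guesses at how such a chipset might behave, not anything taken from the actual PCI Express spec:

```python
# Toy model of the reconfigurable 2x8 / 1x16 chipset idea above.
# slot2_card = None means the second slot holds the pass-through card
# that loops its 8 ports back to slot 1.

def configure_lanes(slot1_card: int, slot2_card):
    """Return (slot1_width, slot2_width) given the widths of the inserted cards."""
    if slot2_card is None:
        # Pass-through card: slot 1 can use all 16 ports (capped by the card itself).
        return (min(16, slot1_card), 0)
    # Both slots populated: each negotiates down to at most 8 ports.
    return (min(8, slot1_card), min(8, slot2_card))

print(configure_lanes(16, None))  # (16, 0) -- one x16 card plus pass-through
print(configure_lanes(8, 8))      # (8, 8)  -- two x8 graphics cards
print(configure_lanes(16, 1))     # (8, 1)  -- x16 card drops to x8, plus an x1 card
```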
 
I think this might be a great idea for the workstation market (Quadro type cards) but will never happen for the mainstream/enthusiast cards.
 
What I hope, and think could come, would be the split between the rendering card and the actual display card..

So that you could have tons of different cards that you can just plug in to connect a monitor to (or a TV.. doesn't matter).. they don't render anything, they are just there to provide another display adapter for the system..

And on the other hand, the current kind of GPU gets removed, and into the x16 port you plug a more general additional CPU, which has complex MIMD capabilities to do vertex shading, pixel shading, or even raytracing.. simply a coprocessor..

This could come.. and it should come.. so Matrox could do the high-end display adapter modules, and, say, ATI or nVidia the graphics coprocessor :D or similar :D

It would free the 3D rendering from the actual displaying.. which is an important step which HAS to happen anyway..
 
davepermen said:
What I hope, and think could come, would be the split between the rendering card and the actual display card..

...

It would free the 3D rendering from the actual displaying.. which is an important step which HAS to happen anyway..

:?: Sounds like a step back to the Voodoo 2 days. Why does this have to come?
 
Because it would allow you to plug in new components without having to replace your whole GPU system at once..

Say you have some All-in-Wonder system in there, but need the feature set of an FX GPU.. you could then..

No, the main reason is that the way we're seeing 3D graphics today is too restricted for real further movement, and the way hardware goes today forces us into all-or-nothing solutions.. a GPU is a monopoly in your PC, forcing you to have this and that without giving you a choice.

This CAN loosen up with PCI Express, and I hope it will.. so in addition to your favourite NV40 or R420 chip for old-style games you can plug in, say, a raytracing acceleration chip for new-style games..

Or offline renderers can use the same NV40 chip to do their stuff, as the chip is not there for the actual desktop display, just for 3D calculation..

Plug and play is something we don't have with GPUs.. closed systems there.. I don't like that..
 