SLI Thought

Dave Baumann

With current SLI solutions we appear to be in the situation where only some of the lanes available to each graphics card are used – with current chipsets the best case is x8 PCI Express lanes going to each board when a single Northbridge is used, on the nForce4 chipset.

With the nForce4 solutions we have the little lane-switch card: turning it around from single-board to dual-board mode redistributes the routing of the PCI Express lanes on the motherboard, moving the final 8 lanes from the primary x16 connector over to the secondary x16 connector. This solution leaves a potential 8 lanes on each graphics board, and 8 lanes on each x16 connector, redundant.

What problems could you foresee in having the motherboard traces routed in such a fashion that, when the SLI lane-switching connector is placed into SLI mode, each of the x16 connectors still receives x8 lanes from the Northbridge as normal, but the final 8 lanes of each x16 connector are also re-routed to each other, with transmit and receive reversed, so that those final 8 lanes can be used to communicate directly between the two graphics chips, giving them a dedicated 4GB/s transfer between them?
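A quick sanity check on that 4GB/s figure – this is just illustrative arithmetic, assuming PCIe 1.x signalling rates:

Code:
# Illustrative only, assuming PCIe 1.x: 2.5GT/s per lane with 8b/10b encoding
# gives roughly 250MB/s of usable bandwidth per lane, per direction.
LANES = 8
MB_PER_LANE_PER_DIR = 250

per_direction = LANES * MB_PER_LANE_PER_DIR   # 2000 MB/s each way
aggregate = 2 * per_direction                 # 4000 MB/s counting both directions

print(per_direction, aggregate)               # 2000 4000 -> the "4GB/s" figure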
 
Smells like AMR.

It seems you could do just that with current SLI boards with a new continuity connector, without any board redesign.
 
Rys said:
It seems you could do just that with current SLI boards with a new continuity connector, without any board redesign.

At the moment, assuming this doesn't already happen (and I've heard nothing to suggest that it does), there would need to be the correct traces to connect the lanes, so new motherboards would be needed at the very least.

What I'm wondering, though, is what type of logic the chips would need to communicate directly with each other over the remaining PCIe lanes.
 
DaveBaumann said:
What I'm wondering, though, is what type of logic the chips would need to communicate directly with each other over the remaining PCIe lanes.

You just need a programmable PCIe interface. Being able to direct data to lane bundles directly connected to the pins on the GPU shouldn't be hard. I'd imagine that's already present on existing hardware.

If it is indeed ATI's preferred method for doing multi-participant accelerated 3D, with a remap of the back end of a PEG16X interface, I can see the majority of their existing PEG-native parts being able to take part.
 
Maybe the link training between the two cards will be a little bit tricky, but if you delay it until both cards know which one is the primary, it should work.

Maybe nVidia already uses such a solution for the SLI link. 16 lanes + 4 lanes (for SLI) makes 20 lanes – the same number the nForce4 supports. But that is surely only a coincidence.
 
Rys said:
If it is indeed ATI's preferred method for doing multi-participant accelerated 3D, with a remap of the back end of a PEG16X interface, I can see the majority of their existing PEG-native parts being able to take part.

I don't think ATI are going to do this; it was just a bit of thinking out loud. I think that whatever either company is going to do initially will have to exist without any ASIC changes, and I would presume that a model such as this would need some, else I'd have thought NVIDIA would have implemented it themselves. The only clue I have for ATI at the moment is this.
 
Well, it would be a much cooler and more efficient(?) design. I won't be surprised if that's the route ATI takes.
 
DaveBaumann said:
Rys said:
If it is indeed ATI's preferred method for doing multi-participant accelerated 3D, with a remap of the back end of a PEG16X interface, I can see the majority of their existing PEG-native parts being able to take part.

I don't think ATI are going to do this; it was just a bit of thinking out loud. I think that whatever either company is going to do initially will have to exist without any ASIC changes, and I would presume that a model such as this would need some, else I'd have thought NVIDIA would have implemented it themselves. The only clue I have for ATI at the moment is this.

These PCIe switches can be very helpful if you want to put more than one chip on a single card; I am not sure how much they help in a multi-card configuration. Anyway, AFAIK these chips are very expensive (> $100) at the moment. I am sure we will see more configurations like the Gigabyte and ASUS multi-GPU cards that do not use any additional switch. AFAIK those did not fulfil the PCIe spec (until now I have not had the time to read the whole 426 pages), but the GPU business always seems to be somewhat more generous.

One idea of mine is to use the "Hyper Memory Technology" for AMR: two or more chips can exchange information easily via a shared block in main memory.
 
Teehee, a fair while back now I was talking about ATI implementing a dual-PEG 16 chipset solution for AMR integrated into the Hypermemory architecture.

Since PCI Express is a point-to-point network, to get any number of lanes talking between one card and another in a dual-card configuration, one could simply route that data via a switch. This is entirely analogous to twisted-pair ethernet routing.
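As a sketch of that analogy (the port names and x16 widths below are assumptions for illustration, not any real product configuration):

Code:
# Toy model of the switched, point-to-point topology: each device sits on its own
# switch port, and card-to-card traffic is simply forwarded through the switch,
# much like frames through a twisted-pair Ethernet switch.
PORTS = {
    "northbridge": 16,   # upstream port towards the host / system RAM
    "peg_slot_1":  16,   # downstream port, first graphics card
    "peg_slot_2":  16,   # downstream port, second graphics card
}

def route(src, dst):
    # Every transfer is a single hop through the switch; peer-to-peer traffic
    # between the two PEG slots never has to touch the Northbridge link.
    return [src, "switch", dst]

print(route("peg_slot_1", "peg_slot_2"))   # ['peg_slot_1', 'switch', 'peg_slot_2']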

If ATI really is doing AMR, then I really can't see them using a dedicated link, a la SLI's direct-between-cards link. The best alternative for high performance, looking into the dim distant future, is to maximise PEG bandwidth by using all 16 lanes bi-directionally, with the Hypermemory architecture (integrated into the Northbridge, or working in concert with a PEG switch) controlling the allocation of lanes between the cards versus the need to move data from the "CPU/system RAM" to the cards.

PEX 8532 seems to fit the bill wonderfully as the hardware portion of this architecture, though of course it will add to the motherboard's cost to have a non-integrated (i.e. not part of the Northbridge) PCI Express switch.

This obviously requires that the motherboard design "fully populates" the two PEG 16 slots with traces (152 traces per slot?), which is more complex than providing one slot with PEG x16 traces and the other slot with PEG x8 (which is what SLI requires).

This could be useful:

http://www.gen-x-pc.com/pci_basic.htm

The diagram that shows how the PCI-Express Switch is distinct from the Northbridge chipset (even if it's integrated into the same device) looks to be right on topic. :)

There's also an interesting bit about PCI Express support for snooping, allowing a cache controller to maintain consistency between GPU memory and system memory. In other words, Hypermemory and TurboCache were designed into PCI Express...

Jawed
 
Been wondering for much of this what the hell "AMR" is, and I've just worked out what you're talking about - I've got an idea that ATI are not referring to anything as "AMR".
 
DaveBaumann said:
Been wondering for much of this what the hell "AMR" is, and I've just worked out what you're talking about - I've got an idea that ATI are not referring to anything as "AMR".

Whatever they call it, and indeed whether they use it at all, you're close to describing how to transmit the sync and off-screen data needed for some kind of 'SLI', without the inter-GPU connector, using unbundled lanes :idea:

It's a valid method. Remap the slot, handle it in your PCIe interface on the GPU, send some data.

If it's not what 'AMR' will use, then you're probably not far off IMO. And if you're talking about using a PLX switch ASIC for the remap (or something similar), where's that going? On an ATI mainboard?
 
While it'd probably be a great implementation of SLI, it seems it would take too much collaboration between motherboard and GPU manufacturers for such a niche product to actually happen.
 
DaveBaumann said:
I think that whatever either company is going to do initially will have to exist without any ASIC changes, and I would presume that a model such as this would need some, else I'd have thought NVIDIA would have implemented it themselves.

As an aside, I think NVIDIA's method of inter-GPU communication was designed that way entirely on purpose, not because they hadn't thought of using the spare lane bundles. I've long considered it's just a way for platform lock-in, using their own core logic.

It somewhat forces ATI to do something similar, too, if they don't want to interleave data into traffic already going to and from the host.
 
It's a cool idea, but I still think the variant I've proposed a couple of times would be better. You can get the exact same effect without an extra switching slot.

On the motherboard:
Route lanes 1-8 of the NB to lanes 1-8 of PCIe#1.
Route lanes 9-16 of the NB to lanes 1-8 of PCIe#2.
Route lanes 9-16 of PCIe#1 to lanes 9-16 of PCIe#2.

If you want to use just one PCIe board, put it in PCIe#1, and put a router card (supplied with motherboard) in PCIe#2. => 16 lanes in PCIe#1

If you want to do SLI, the two cards can speak directly over lanes 9-16.
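A minimal sketch of that routing, purely for illustration of the scheme above:

Code:
# Where the motherboard traces take each lane bundle in this proposal.
MOTHERBOARD_TRACES = {
    ("NB",     "1-8"):  ("PCIe#1", "1-8"),
    ("NB",     "9-16"): ("PCIe#2", "1-8"),
    ("PCIe#1", "9-16"): ("PCIe#2", "9-16"),
}

# Single-card mode: the router card in PCIe#2 loops its lanes 1-8 back out on its
# lanes 9-16, so NB lanes 9-16 end up at PCIe#1 lanes 9-16 -> full x16 to PCIe#1.
ROUTER_CARD = {("PCIe#2", "1-8"): ("PCIe#2", "9-16")}

# SLI mode: graphics cards sit in both slots, each gets x8 from the NB, and the
# PCIe#1(9-16) <-> PCIe#2(9-16) traces become a direct card-to-card link.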


Nice effects (with both your version and mine):

If the cards have some routing capabilities, you could access either card at x16 speed (as long as you're not using the buses for anything else at that time). You could also broadcast the same data to both cards, with each receiving x16 worth of bandwidth.
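To put a number on the x16 point (assuming PCIe 1.x lane rates, and that one card can forward traffic arriving on its back eight lanes – purely illustrative):

Code:
# Illustrative arithmetic only: peak bandwidth to one card while the other card's
# host link is idle and it forwards its x8 over the card-to-card lanes.
GB_PER_LANE_PER_DIR = 0.25      # PCIe 1.x, per direction
direct_lanes    = 8             # NB -> this card (its lanes 1-8)
forwarded_lanes = 8             # NB -> other card -> card-to-card lanes -> this card

peak = (direct_lanes + forwarded_lanes) * GB_PER_LANE_PER_DIR
print(peak)                     # 4.0 GB/s, i.e. x16-equivalent, while the other link is idle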

But the most important part is that they've got a fast bus for synching data generated on the GPU. (As Rys mentioned.)
 
Basic said:
It's a cool idea, but I still think the variant I've proposed a couple of times would be better. You can get the exact same effect without an extra switching slot.

On the motherboard:
Route lanes 1-8 of the NB to lanes 1-8 of PCIe#1.
Route lanes 9-16 of the NB to lanes 1-8 of PCIe#2.
Route lanes 9-16 of PCIe#1 to lanes 9-16 of PCIe#2.

If you want to use just one PCIe board, put it in PCIe#1, and put a router card (supplied with motherboard) in PCIe#2. => 16 lanes in PCIe#1

So SLI gives you this :?: :

Code:
NB(1-8)  <-> (1-8)PEG1(9-16) <->
NB                             |(bi-directional)
NB(9-16) <-> (1-8)PEG2(9-16) <->

And with the router (sounds like pass through) :?: :

Code:
NB(1-8)  <-> (1-8)PEG1(9-16) <-
NB                            |(one-way passthru?)
NB(9-16) <->   router card   ->

Saves you using a separate ASIC for routing bundles, cool. But you need a new mainboard. So Dave's suggestion gets you that channel using a different continuity connector, whereas yours needs a pass-through adaptor in the second PEG socket and a new mainboard for that.

SLI carries on working regardless, since you get both connections to the host.

You get the feeling either is going to show up at some point, from someone ;)
 
Rys:
The first is exactly what I meant. The second is right if you make all parts bi-directional. The bandwidth from the 8 extra lanes back from the GPU isn't all that necessary, but PCIe lanes are always bidirectional.


Neither my idea nor Dave's (if I understood it correctly) needs a separate routing ASIC, and both of our ideas need new motherboards. But I think mine would make the motherboard simpler.

I made an image comparing the different versions.
PCIe_SLI.PNG
 
Nice diagrams Basic; yeah you pretty much got what I was thinking down.

I'm guessing that trace lengths and signal timing will be an issue here, assuming that under normal circumstances each of the x16 lanes would need to be the same trace length. If that's the case then I'm not sure which would be easier, given the locations of the second x16 connector and the current SLI routing switch board.

Still, what's the betting that either one or both of these will turn up in a patent sooner or later? ;)
 
I think there's one flaw in your diagram, Basic. The "Current" diagram indicates that only 8 lanes pass through the SLI switching card.

I believe all 16 lanes pass through it. If there's a question of "equal length" traces, as Dave suggests, then the routing of all 16 lanes through a board that's placed equidistant between the PEGx16 slots would appear to be an intrinsic part of the design.

Further, you need to remember that this board is only needed on AMD CPU motherboards. Intel motherboards do all switching in the chipset.

And we all need to remember that motherboards have been demonstrated where the SLI link board that directly connects the two graphics cards is redundant. In other words we've already seen SLI operating via the motherboard's PCI Express infrastructure. Although it seems that NVidia has stomped on this solution by altering a BIOS to prevent it working.

Jawed
 
Jawed said:
I believe all 16 lanes pass through it. If there's a question of "equal length" traces, as Dave suggests, then the routing of all 16 lanes through a board that's placed equidistant between the PEGx16 slots would appear to be an intrinsic part of the design.

That’s what I’m thinking as well.

[Edit] - Eyeballing the connector in comparison to a PCIe graphics card indicates that there are about the same number of connections on the two, if not more on the connector.

Jawed said:
Further, you need to remember that this board is only needed on AMD CPU motherboards. Intel motherboards do all switching in the chipset.

No. This is just "lane re-routing", not switching. With a similar number of lanes available in the Northbridge, a similar implementation is required. The Intel solutions we've seen before just happened to have x16 lanes routed to one x16 connector and x4 lanes routed to a second.

Jawed said:
And we all need to remember that motherboards have been demonstrated where the SLI link board that directly connects the two graphics cards is redundant. In other words we've already seen SLI operating via the motherboard's PCI Express infrastructure. Although it seems that NVidia has stomped on this solution by altering a BIOS to prevent it working.

In NVIDIA's current solution the SLI interlink only passes the final rendered portion of the frame from the slave board to the master – in SFR, removing it will remove half the rendered frame; in AFR, you will alternate between rendered and "grey" frames.
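As a rough illustration of how little that final-image transfer actually needs (the resolution, refresh rate and colour depth below are assumed figures, not anything from NVIDIA):

Code:
# Back-of-the-envelope only: assumed 1600x1200, 32-bit colour, 60Hz output.
WIDTH, HEIGHT, BYTES_PER_PIXEL, FPS = 1600, 1200, 4, 60

sfr = (WIDTH * HEIGHT // 2) * BYTES_PER_PIXEL * FPS     # slave sends half of every frame
afr = (WIDTH * HEIGHT) * BYTES_PER_PIXEL * (FPS // 2)   # slave sends every other whole frame

print(sfr / 1e6, afr / 1e6)   # ~230 MB/s either way -- roughly one PCIe 1.x lane's worth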
 
DaveBaumann said:
I'm guessing that trace lengths and signal timing will be an issue here, assuming that under normal circumstances each of the x16 lanes would need to be the same trace length. If that's the case then I'm not sure which would be easier, given the locations of the second x16 connector and the current SLI routing switch board.

This is not an issue.

PCIe allows every pair to have a different trace length; only the two traces within a pair need to be the same length.

Take a look at a PCIe and an AGP motherboard. On the AGP board you will see many serpentines to make sure that all AGP traces have the same length; a well-designed PCIe board doesn't need these serpentines.

The only problem I see with Basic's solution is the overall trace length: a single pair should only be around 12 inches long.
 