nVIDIA's "SLI" solution

SA said:
Scaling the geometry processing using multiple GPU cards shouldn't be too difficult if the cards use screen partitioning (as opposed to scan line interleaving). If the application's 3D engine uses efficient culling of geometry to the viewport (say using hierarchical bounding volumes) the scaling happens automatically, since geometry outside a card's partition is efficiently culled before being processed.

Hmm. How would that work? Let's say the driver sends the vertices of triangles in the upper part of the screen to card A, those in the lower part to card B, and triangles that overlap the split to both cards (or cuts them in two?).

What if the VS transforms an alpha-blended vertex/triangle that was in the upper part down into the lower part? Then it would have to send the (transformed) vertices to card B, no?
 
You can only know where a triangle will end up on screen after transforming the three vertices. But if you split horizontally or vertically, you only need to get the x or y coordinate (x/z or y/z, maybe) to determine which chip has to do the processing (or both). You could also use a bounding volume to do a quick&dirty test (like NV does with display lists). A problem might be analyzing the vertex shader code and stripping the transform calculation from it.
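To make that concrete, here's a rough sketch (not from anyone in the thread, just illustrative) of the per-triangle test, assuming a plain world-view-projection transform and a horizontal split at NDC y = 0; names like `classifyTriangle` and `SplitAssignment` are made up:

```cpp
#include <array>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // row-major: m[row][col]

// Clip-space y is just the dot product of the vertex with row 1 of the
// matrix -- no more work than a single plane test.
static float clipY(const Mat4& mvp, const Vec4& v) {
    return mvp.m[1][0]*v.x + mvp.m[1][1]*v.y + mvp.m[1][2]*v.z + mvp.m[1][3]*v.w;
}
static float clipW(const Mat4& mvp, const Vec4& v) {
    return mvp.m[3][0]*v.x + mvp.m[3][1]*v.y + mvp.m[3][2]*v.z + mvp.m[3][3]*v.w;
}

enum class SplitAssignment { TopCard, BottomCard, Both };

// Decide which card(s) should process a triangle for a horizontal split
// at NDC y = 0 (upper half vs. lower half of the screen).
SplitAssignment classifyTriangle(const Mat4& mvp, const std::array<Vec4, 3>& tri) {
    bool anyTop = false, anyBottom = false;
    for (const Vec4& v : tri) {
        float y = clipY(mvp, v);
        float w = clipW(mvp, v);
        // For w > 0, the sign of y/w equals the sign of y, so no divide is
        // needed. Vertices with w <= 0 (behind the camera) are handled
        // conservatively by sending the triangle to both cards.
        if (w <= 0.0f) return SplitAssignment::Both;
        if (y >= 0.0f) anyTop = true; else anyBottom = true;
    }
    if (anyTop && anyBottom) return SplitAssignment::Both;
    return anyTop ? SplitAssignment::TopCard : SplitAssignment::BottomCard;
}
```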
 
Chalnoth said:
Isn't a bounding volume mathematically the same as calculating one coordinate in this case?
I don't get what you mean. Why would this be the same?
 
A "bounding volume" would, in this case, just be a bounding plane. Whether or not a vector is below a bounding plane is a 3-component dot product compared to a specific value. Since we don't care about the w component here, this same 3-component dot product should simply give the y component of the vertex's position in screen space.
 
I was talking about a bounding volume for the object, so that you don't need to check all vertices, but only 8 for a box. Of course the check on these vertices can be done the same way as a check for the individual vertices.


edit: Note that a bounding volume can only be used with affine transformations, otherwise vertices are not guaranteed to be inside after transformation.
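The per-object version described above could look roughly like this (reusing the helpers from the sketch further up); it only tests the eight corners of an axis-aligned bounding box, and, as the edit notes, is only safe when the transform is affine:

```cpp
// Per-object variant: test only the 8 corners of the object's bounding box
// and send the whole object to one card if all corners land in one half.
// Only valid for affine vertex transforms; with an arbitrary vertex shader
// the transformed vertices need not stay inside the transformed box.
struct Aabb { Vec4 minC, maxC; };  // w assumed to be 1

SplitAssignment classifyObject(const Mat4& mvp, const Aabb& box) {
    bool anyTop = false, anyBottom = false;
    for (int i = 0; i < 8; ++i) {
        Vec4 corner{
            (i & 1) ? box.maxC.x : box.minC.x,
            (i & 2) ? box.maxC.y : box.minC.y,
            (i & 4) ? box.maxC.z : box.minC.z,
            1.0f };
        float y = clipY(mvp, corner);
        float w = clipW(mvp, corner);
        if (w <= 0.0f) return SplitAssignment::Both;  // box reaches behind the camera
        if (y >= 0.0f) anyTop = true; else anyBottom = true;
    }
    if (anyTop && anyBottom) return SplitAssignment::Both;
    return anyTop ? SplitAssignment::TopCard : SplitAssignment::BottomCard;
}
```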
 
AlphaWolf said:
Simon F said:
Ailuros said:
The gigantic Apple monitor has a native resolution (or did I get that the wrong way?) of 2560*1600. Anything lower than the native resolution on TFT/LCD monitors usually isn't a good idea...
An integer fraction should be ok, surely?

Some LCD displays scale better than others, but even that would be a problem for the Apple, as it's a bit of an odd resolution and many games seem to lack support for odd resolutions.

What games? Even if games did support resolutions beyond 2048*1536 (which isn't all that common either), performance would be lackluster anyway.

Maybe it's just me, but I somehow have the feeling that even though 32" translates into a huge viewing area, 2560*1600 is a tad over the top. I'd most likely opt for 1920*1440 instead, in order not to have to glue my nose to it to read simple text.
 
This is why we need resolution-independent GUIs. If I were to double the resolution I currently work at (1280x960), then yes, I would want the GUI to scale gracefully to the larger resolution so I don't have to squint to read anything.
 
Chalnoth said:
This is why we need resolution-independent GUIs.

These have been around for years, but won't be "invented" until Microsoft release Longhorn.

It's kind of an amusing parallel with anti-aliased outline fonts, also "invented" by Microsoft and introduced with Win 9x (despite the fact that certain other platforms had had them for years).
 
Xmas said:
A problem might be analyzing the vertex shader code and stripping the transform calculation from it.

I don't think "analyzing" vertex shader code is a realistic solution. Games do lots of skeletal animation stuff and sometimes even physics which affect vertex positions ... hard to detect.

The algorithm must be independent of the vertex shader code. I guess some redistribution of vertex data between the two cards must be done after vertex shading.
 
Hum..

I hope nVidia was that brilliant, but for some reason I doubt they have added that level of "intelligence" to their GPU (i.e. exchanging vertex data over the SLI bus).

Many applications (like terrain mesh rendering) render the whole scene with a single draw primitive, i.e. a single large triangle mesh, with all the vertices cached in video memory. If we are to discuss how SLI can speed up geometry-bound applications, this would be a good example to analyze.
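Just to pin down the problematic case, something like the D3D9 sketch below: the whole terrain sits in one static vertex/index buffer and goes out in a single draw call, so the driver has no per-object granularity to cull against a screen-space split (buffer handles and counts here are hypothetical):

```cpp
#include <d3d9.h>

// Hypothetical terrain renderer: the whole mesh lives in one static vertex/
// index buffer in video memory and is drawn with a single call. From the
// driver's point of view there is nothing to split: both GPUs would have to
// run the vertex shader over every vertex just to find out which half of the
// screen each triangle lands in.
void DrawTerrain(IDirect3DDevice9* dev,
                 IDirect3DVertexBuffer9* vb, IDirect3DIndexBuffer9* ib,
                 UINT vertexCount, UINT triangleCount, UINT stride)
{
    dev->SetStreamSource(0, vb, 0, stride);
    dev->SetIndices(ib);
    dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                              0,              // BaseVertexIndex
                              0,              // MinIndex
                              vertexCount,
                              0,              // StartIndex
                              triangleCount);
}
```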

My guess is that nVidia supports other rendering modes besides the horizontal-split mode. An "easy" way of getting twice the geometry performance is to support an Alternate-Frame-Rendering scheme similar to ATI's AFR (see the Rage Fury MAXX). That type of scheme is much more complex to support at the driver level, and I suspect this mode is not ready yet (if ever). Note that AFR would also double the latency, making it not very "scalable" (as in the "S" in SLI).
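For reference, the dispatch side of AFR is conceptually trivial (a toy sketch, nothing NVIDIA- or ATI-specific, with an invented submit function); the complexity is all in the inter-frame dependencies and the extra latency:

```cpp
#include <cstdio>

// Toy AFR dispatcher: whole frames alternate between two GPUs, so both the
// geometry and the fill work scale. The cost is that frame N+1 starts before
// frame N is displayed, so input-to-display latency roughly doubles, and any
// frame-to-frame dependency (render-to-texture reused next frame, etc.)
// forces an inter-GPU copy -- which is where the driver complexity comes in.
static void submitFrameToGpu(unsigned gpu, unsigned frame) {
    // Stand-in for the real per-GPU command submission.
    std::printf("frame %u -> GPU %u\n", frame, gpu);
}

void renderFrames(unsigned frameCount) {
    for (unsigned f = 0; f < frameCount; ++f)
        submitFrameToGpu(f % 2, f);  // even frames to GPU A, odd to GPU B
}
```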

Ozo.
 
Mephisto said:
I don't think "analyzing" vertex shader code is a realistic solution. Games do lots of skeletal animation stuff and sometimes even physics which affect vertex positions ... hard to detect.
You can still try, and fall back to sending both cards all geometry data if the shader is too complex. Cases where the position is simply the result of a vector-matrix mul are easy to detect and very common.
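The "easy to detect" case would be the classic vertex shader prologue where oPos comes straight from four dp4s against constant registers. A crude illustration of what a driver-side pattern match might look for (a real implementation would work on parsed tokens and handle m4x4 macros, different register numbers, and so on):

```cpp
#include <string>

// The common case: the output position is a plain vertex * matrix transform,
//   dp4 oPos.x, v0, c0
//   dp4 oPos.y, v0, c1
//   dp4 oPos.z, v0, c2
//   dp4 oPos.w, v0, c3
// Crude text-level check for that pattern in the shader assembly.
bool looksLikePlainMvpTransform(const std::string& asmSource) {
    return asmSource.find("dp4 oPos.x, v0, c0") != std::string::npos
        && asmSource.find("dp4 oPos.y, v0, c1") != std::string::npos
        && asmSource.find("dp4 oPos.z, v0, c2") != std::string::npos
        && asmSource.find("dp4 oPos.w, v0, c3") != std::string::npos;
}
```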

The algorithm must be independent of the vertex shader code. I guess some redistribution of vertex data between to two card must be done after vertex shading.
I highly doubt it. The post-transform cache is quite small, and the cards don't store transformed vertices in memory. So where should this vertex data go, and how would you compensate for the latency if one GPU is waiting for vertex data from the other GPU before it can continue rendering?


Ozo, why would AFR be complex at the driver level?
 
Ailuros said:
What games? Even if games did support resolutions beyond 2048*1536 (which isn't all that common either), performance would be lackluster anyway.

Maybe it's just me, but I somehow have the feeling that even though 32" translates into a huge viewing area, 2560*1600 is a tad over the top. I'd most likely opt for 1920*1440 instead, in order not to have to glue my nose to it to read simple text.

Well, I was speaking about the fractional scaling. 1/2 of an odd res is still an odd res.
 
Mephisto said:
Xmas said:
A problem might be analyzing the vertex shader code and stripping the transform calculation from it.

I don't think "analyzing" vertex shader code is a realistic solution. Games do lots of skeletal animation stuff and sometimes even physics which affect vertex positions ... hard to detect.

The algorithm must be independent of the vertex shader code. I guess some redistribution of vertex data between to two card must be done after vertex shading.
I seriously doubt it. It'd take too much bandwidth.

I think it'd be far more likely that the Y clipping planes are adjusted on each of the chips to stop overlap and to save some fill rate.
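In API terms that would be something like giving each GPU a scissor band covering its half of the screen (purely illustrative D3D9 code; whatever the driver actually does happens internally):

```cpp
#include <d3d9.h>

// Give each GPU its own horizontal band of the screen. Both chips still
// transform every vertex, but rasterization, pixel shading and framebuffer
// traffic are restricted to their band -- that's where the fill-rate saving
// comes from.
void setScissorForGpu(IDirect3DDevice9* dev, UINT screenWidth, UINT screenHeight,
                      bool isTopGpu)
{
    RECT band;
    band.left   = 0;
    band.right  = static_cast<LONG>(screenWidth);
    band.top    = isTopGpu ? 0 : static_cast<LONG>(screenHeight / 2);
    band.bottom = isTopGpu ? static_cast<LONG>(screenHeight / 2)
                           : static_cast<LONG>(screenHeight);

    dev->SetRenderState(D3DRS_SCISSORTESTENABLE, TRUE);
    dev->SetScissorRect(&band);
}
```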
 