SLI vs. Crossfire

Does that one-frame lag really make a difference? How much does the workload really shift from one frame to the next?
Usually not much, since the goal is smooth animation. The workload can change without affecting the visual result much, though, e.g. when objects fade in or out.
 
Right, but it works increasingly poorly as the interleave granularity gets coarser. Think about dividing the screen into four sections vertically (instead of just two for NV-style SFR) and interleaving on 2 GPUs: in many games this will not be an even workload split. As you interleave at finer granularity, your load balance typically gets better, but you start to damage coherence.
This isn't really a problem for ATI's implementation, because its texture cache is based on rather small tiles anyway, and I'm pretty sure we're talking about hundreds to thousands of tiles here. In fact, I sometimes wonder why supertiling doesn't work better. On paper it should be vastly superior to SFR.
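
For concreteness, here's a minimal sketch of the granularity trade-off described above (names and numbers are hypothetical, not any vendor's actual scheme): horizontal strips are alternated between two GPUs, and the strip height sets the balance between even load and locality.

/* Hypothetical sketch: assign horizontal screen strips to one of
 * two GPUs by alternating at a chosen granularity. Coarse strips
 * can split the load unevenly; fine strips balance better on
 * average but fragment locality at every strip boundary. */
#include <stdio.h>

static int gpu_for_scanline(int y, int strip_height)
{
    return (y / strip_height) & 1;   /* alternate GPU 0 / GPU 1 */
}

int main(void)
{
    const int height = 1024;
    /* Coarse: four strips of height/4 -> two per GPU. */
    printf("y=700, coarse strips -> GPU %d\n", gpu_for_scanline(700, height / 4));
    /* Fine: 32-line strips -> many more strips, better averaged load. */
    printf("y=700, fine strips   -> GPU %d\n", gpu_for_scanline(700, 32));
    return 0;
}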
 
In fact, I sometimes wonder why supertiling doesn't work better. On paper it should be vastly superior to SFR.

Could you elaborate a bit more? I have a hard time coming up with reasons why it would even be somewhat superior, so 'vastly' comes as a bit of a surprise. (I'm sure this must have been discussed to death during the introduction of supertiling, but it's hard to mine relevant info out of old threads...)
 
Could you elaborate a bit more? I have a hard time coming up with reasons why it would even be somewhat superior, so 'vastly' comes as a bit of a surprise. (I'm sure this must have been discussed to death during the introduction of supertiling, but it's hard to mine relevant info out of old threads...)
In essence, the load balancing for supertiling should be near-perfect, and due to the texture cache structure of ATI's chips, no rendering penalty is incurred. SFR, on the other hand, will always have a fairly hard time obtaining good load balancing, since the only reasonable means of load balancing is to use the last rendered frame as a guide to move the split around. And in the situations where performance is most important, situations where there is a lot of change happening on the screen, SFR is most likely to fall flat.
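
To illustrate the lag problem, here's a minimal sketch of the kind of feedback loop described (hypothetical, not ATI's or NVIDIA's actual driver logic): the split line is adjusted from the previous frame's per-GPU render times, so it always trails the real workload by a frame.

/* Hypothetical sketch of SFR load balancing: nudge the split line
 * using the previous frame's per-GPU render times. A sudden shift
 * in workload leaves the split in the wrong place for at least one
 * frame, which is exactly the scenario where SFR is said to fall flat. */
#include <stdio.h>

static int adjust_split(int split_y, int height, float ms_gpu0, float ms_gpu1)
{
    /* Share of the total time spent by GPU 0 (it owns the top part). */
    float share = ms_gpu0 / (ms_gpu0 + ms_gpu1);
    /* If GPU 0 took longer, give it fewer scanlines next frame. */
    int target = (int)((1.0f - share) * height + 0.5f);
    split_y += (target - split_y) / 2;   /* damp oscillation */
    if (split_y < 16)          split_y = 16;
    if (split_y > height - 16) split_y = height - 16;
    return split_y;
}

int main(void)
{
    /* GPU 0 was slower last frame (20 ms vs 12 ms), so its half shrinks. */
    printf("new split: %d\n", adjust_split(512, 1024, 20.0f, 12.0f));
    return 0;
}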
 
In essence, the load balancing for supertiling should be near-perfect, and due to the texture cache structure of ATI's chips, no rendering penalty is incurred.
Hard to judge for me with zero insight into the ATI texture cache structure. :cry:

Piecing together some info from this Extreme Tech article:
Tiles are sized 32x32 pixels. Does this mean that pixel shader allocation is also done at this granularity and that texture caches are assigned accordingly?
And one step further: is there a static mapping from a pixel's position on the screen to the PS unit or group on which its shader will be executed? If all that is indeed the case, I can see why texture mapping performance can't be a major problem.
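
For reference, a minimal sketch of what such a static mapping could look like, assuming the 32x32 tile size from the article and a simple checkerboard assignment (the actual pattern isn't documented there):

/* Hypothetical sketch: checkerboard assignment of 32x32 supertiles
 * to two GPUs. At 1600x1200 this gives roughly 50x38 = 1900 tiles,
 * so per-GPU workloads should average out almost perfectly. */
#include <stdio.h>

static int gpu_for_pixel(int x, int y)
{
    int tx = x >> 5;          /* x / 32: tile column */
    int ty = y >> 5;          /* y / 32: tile row    */
    return (tx + ty) & 1;     /* checkerboard: GPU 0 or GPU 1 */
}

int main(void)
{
    printf("(100,100) -> GPU %d\n", gpu_for_pixel(100, 100));  /* tile (3,3) -> 0 */
    printf("(140,100) -> GPU %d\n", gpu_for_pixel(140, 100));  /* tile (4,3) -> 1 */
    return 0;
}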

Maybe the inefficiency is in a whole different place, then? For example, it may be that the rasterizer (and the attribute interpolator, or does that come later in the pipe?) still has to rasterize pixels that will be rendered on the other card.

Example: if you have a triangle that is larger than 32x32, maybe the rasterizer doesn't have the smarts to immediately skip a tile that doesn't belong to the current card. That is, it will step through all quads of that tile, but immediately discard those pixels. It's not unthinkable that this can become the limiting factor when the pixel shaders aren't very complex compared to the number of attributes per pixel.

Or is all this just nonsense? ;)
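
To put a rough number on the quad-stepping concern, a minimal sketch (tile pattern assumed to be the checkerboard from before): a rasterizer that can't skip foreign tiles visits every 2x2 quad in a triangle's bounding box and discards the foreign pixels, while a tile-aware one visits only the quads in its own tiles -- roughly half as many.

/* Hypothetical sketch: count the quads visited when rasterizing a
 * w x h bounding box for GPU `self`, with and without tile skipping. */
#include <stdio.h>

#define TILE 32

static int gpu_for_tile(int tx, int ty) { return (tx + ty) & 1; }

static void count_quads(int w, int h, int self)
{
    long naive = 0, skipping = 0;
    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < w; x += 2) {
            naive++;                                   /* steps through every quad */
            if (gpu_for_tile(x / TILE, y / TILE) == self)
                skipping++;                            /* own tiles only */
        }
    }
    printf("naive: %ld quads, tile-aware: %ld quads\n", naive, skipping);
}

int main(void)
{
    count_quads(128, 128, 0);   /* bounding box of a large triangle */
    return 0;                   /* prints 4096 vs 2048 */
}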

BTW, from the same article:
According to ATI, supertiling is the most compatible solution, and works with every Direct3D game. However, the performance gain using supertiling is less than AFR or split-screen mode (which ATI calls "scissor" mode, since each GPU renders half the scene).

SFR, on the other hand, will always have a fairly hard time obtaining good load balancing, since the only reasonable means of load balancing is to use the last rendered frame as a guide to move the split around. And in the situations where performance is most important, situations where there is a lot of change happening on the screen, SFR is most likely to fall flat.
Yes, I can see that. Not sure how much of a problem this is in practice though.
 
Yes, from what I understand, the tiles in supertiling mode are exactly aligned with the texture caches.

As for the performance hit, one possible reason is that triangles are culled more easily for the unrendered portion of the screen with scissor than with supertiling, potentially meaning less work for the triangle setup engine or even for the vertex shaders. Or, as you mentioned, it may have to do with the cards spending time on unrendered tiles.
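
A minimal sketch of that culling difference (tile size and checkerboard pattern assumed as before): a half-screen scissor rectangle can trivially reject a triangle's bounding box, while under supertiling almost any box spanning more than one tile touches tiles owned by both GPUs, so neither chip can reject it.

#include <stdbool.h>
#include <stdio.h>

typedef struct { int x0, y0, x1, y1; } BBox;

/* Scissor mode: GPU 0 owns y < split_y, GPU 1 owns y >= split_y. */
static bool scissor_rejects(BBox b, int split_y, int self)
{
    return self == 0 ? b.y0 >= split_y : b.y1 < split_y;
}

/* Supertiling: reject only if no 32x32 tile under the box is ours. */
static bool supertile_rejects(BBox b, int self)
{
    for (int ty = b.y0 >> 5; ty <= b.y1 >> 5; ty++)
        for (int tx = b.x0 >> 5; tx <= b.x1 >> 5; tx++)
            if (((tx + ty) & 1) == self)
                return false;
    return true;
}

int main(void)
{
    BBox tri = { 200, 700, 360, 900 };   /* entirely in the bottom half */
    printf("scissor rejects on GPU 0:   %d\n", scissor_rejects(tri, 600, 0));  /* 1 */
    printf("supertile rejects on GPU 0: %d\n", supertile_rejects(tri, 0));     /* 0 */
    return 0;
}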
 
In essence, the load balancing for supertiling should be near-perfect, and due to the texture cache structure of ATI's chips, no rendering penalty is incurred. SFR, on the other hand, will always have a fairly hard time obtaining good load balancing, since the only reasonable means of load balancing is to use the last rendered frame as a guide to move the split around. And in the situations where performance is most important, situations where there is a lot of change happening on the screen, SFR is most likely to fall flat.
You can't really have a lot of change from frame to frame if you want smooth animation. Interleaving being "vastly superior" to SFR should be a rare exception, at least if we're talking about a 2-way split.
 
You can't really have a lot of change from frame to frame if you want smooth animation. Interleaving being "vastly superior" to SFR should be a rare exception, at least if we're talking about a 2-way split.
Well, you're right. But what I'm worried about here is that in the average frame SFR is just going to be a bit inefficient at load balancing purely due to the lag time, and that in those situations where the action really gets intense, SFR is more likely to suffer even worse performance drops.

Anyway, I did misspeak about it being 'vastly superior', but it does seem that on paper it should at least be superior in all reasonable cases, provided the hardware is not vertex shader limited, and also provided that the SFR mode is not capable of culling triangles that are outside the viewing area before transform (which may not be possible).
 
With SFR each chip can cull triangles outside of its own viewport at the clipping stage, i.e. after VS but before triangle setup. That could make a difference, too.
 
Anyway, I did misspeak about it being 'vastly superior', but it does seem that on paper it should at least be superior in all reasonable cases, provided the hardware is not vertex shader limited, and also provided that the SFR mode is not capable of culling triangles that are outside the viewing area before transform (which may not be possible).

Hmmm, I still have a hard time understanding why SFR would be as inefficient as you suggest. Would there really be such instantaneous shifts in workload from frame to frame that the one-frame lag is always playing catch-up and allocating screen space sub-optimally? Based on the quote above, it seems that even ATI thinks SFR would be faster than supertiling.
 