5k 'pixel shader skinned' characters at 60FPS (ATI at GDC 06)

Well, the SDK seems to have gone online now, so the sample in question (R2VB-Animation) is available here with source and there's also a white paper on the technique:
http://www.ati.com/developer/radeonSDK.html

Basically, the idea is that you do the vertex skinning in the pixel shader instead of the vertex shader by using R2VB. This gives a performance boost of almost 3x on the X1900.
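
For anyone who wants to see the per-vertex work concretely, here is a minimal CPU sketch of linear blend skinning arranged the way an R2VB pass would run it: one independent invocation per vertex, with results written to an output buffer that is then reused as vertex data. This is not the SDK sample's code. The function names, the 3x4 matrix layout and the two-bone test data are all assumptions made for illustration.

```python
# Hypothetical CPU stand-in for "skinning in the pixel shader".
# Each call to skin_vertex plays the role of one pixel shader invocation;
# the list returned by skin_pass plays the role of the render target
# that R2VB would rebind as a vertex buffer.

def transform(bone, pos):
    """Apply a 3x4 bone matrix (each row is [r0, r1, r2, translation]) to a point."""
    x, y, z = pos
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in bone)

def skin_vertex(pos, bone_ids, weights, bones):
    """Blend the bone-transformed positions by weight (weights assumed to sum to 1)."""
    out = [0.0, 0.0, 0.0]
    for bone_id, weight in zip(bone_ids, weights):
        moved = transform(bones[bone_id], pos)
        for axis in range(3):
            out[axis] += weight * moved[axis]
    return tuple(out)

def skin_pass(positions, bone_ids, weights, bones):
    """Skin every vertex independently and collect the results in one buffer."""
    return [
        skin_vertex(pos, ids, w, bones)
        for pos, ids, w in zip(positions, bone_ids, weights)
    ]

# Two made-up bones: one leaves the vertex alone, one shifts it 2 units along x.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
SHIFT_X = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]

skinned = skin_pass(
    positions=[(1.0, 0.0, 0.0)],
    bone_ids=[(0, 1)],
    weights=[(0.5, 0.5)],
    bones=[IDENTITY, SHIFT_X],
)
print(skinned)  # [(2.0, 0.0, 0.0)]
```

Because each vertex depends only on its own inputs and the bone matrices, the work maps naturally onto pixel shader invocations, which is what lets the pixel pipes take it over.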
 
DudeMiester said:
Either they're rendering to vertex buffer, or more likely, the news source in question made a mistake.

5K at 60 fps sounds about right. It defaults to 10000 models, and I know it runs at 28 fps in that case.
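
A rough sanity check on those two figures, assuming frame time scales linearly with the number of models (an assumption for the sake of the arithmetic, not something measured here):

```python
# Back-of-the-envelope check, assuming cost per model is constant.
models_default = 10_000   # default model count in the sample
fps_default = 28          # frame rate reported at that count

# Models processed per second at the default settings.
throughput = models_default * fps_default

# Model count that the same throughput would sustain at 60 fps.
models_at_60 = throughput / 60

print(throughput)           # 280000
print(round(models_at_60))  # 4667
```

So 10000 models at 28 fps works out to a little under 5K models at 60 fps under that assumption.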
 
Humus said:
Basically, the idea is that you do the vertex skinning in the pixel shader instead of the vertex shader by using R2VB. This gives a performance boost of almost 3x on the X1900.
Clever :)

Presumably this will be unnecessary in DX10 with a unified architecture and the stream output abilities though?
 
Humus said:
Basically, the idea is that you do the vertex skinning in the pixel shader instead of the vertex shader by using R2VB. This gives a performance boost of almost 3x on the X1900.
Is this technique going to be practical in any type of future game programming?
 
AndyTX said:
Clever :)

Presumably this will be unnecessary in DX10 with a unified architecture and the stream output abilities though?

With a unified architecture you get the same amount of computational power in both the vertex and the pixel shader, so from that point of view it's not necessary. However, whether StreamOut, R2VB or VTF will be faster for something like this is harder to judge at this point.
 
micron said:
Is this technique going to be practical in any type of future game programming?

Sure. Most games probably won't need 10K characters running around at the same time, but there are probably cases where you'd want to deal with a lot of objects at once, and a technique like this can both take advantage of the power in the pixel pipes and improve batching.
 
Humus said:
However, whether StreamOut, R2VB or VTF will be faster for something like this is harder to judge at this point.
Is this down to uncertainty over memory access patterns/batching/tiling? E.g. streamout writes may not match up well with DDR access patterns (analogous to vertex data structures that ignore vertex cache line sizes), whereas R2VB writes' use of the ROPs might grant more efficient access patterns.

Jawed
 