Strange ROP performance + ROP benchmarks by format.

Got some strange results for ROP performance from a G84 (8600 GTS clocked to 730MHz).

Anyone here have any speculations as to why?

Perhaps the G92 has similar ROP quirks (don't have a G92 to test with yet). The results with blending disabled seem rather odd. Like, why would LUMINANCE32F be so slow? The ROP blending results seem correct, but I still wonder why RGBA8 is also so slow.

Results

Max possible blend rate = 8 ROP units at 730MHz = 5.84 Gpix/sec.
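For anyone checking the arithmetic, here is a minimal sketch of how that peak figure falls out. It assumes one pixel per ROP unit per clock, which is what the number above implies; the function and parameter names are made up for illustration. The 1/2, 1/4 and 1/16 rates are an inference from the lower "max" values quoted in the results (about 2.9, 1.45 and 0.36 Gpix/sec), not something confirmed by the hardware vendor.

```python
def peak_fill_rate_gpix(rop_units: int, core_mhz: float,
                        pixels_per_rop_clock: float = 1.0) -> float:
    """Theoretical peak fill rate in Gpix/sec.

    pixels_per_rop_clock = 1.0 is an assumption implied by the post's
    own arithmetic (8 ROPs * 730 MHz = 5.84 Gpix/sec).
    """
    return rop_units * pixels_per_rop_clock * core_mhz * 1e6 / 1e9


full_rate = peak_fill_rate_gpix(8, 730)

# Assumed rate divisors, inferred from the "max" values used in the results.
for divisor in (1, 2, 4, 16):
    print(f"1/{divisor} rate: {full_rate / divisor:.3f} Gpix/sec")
```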

With blending disabled (~ = approximately),

L8, L16F, LA8, LA16F, RGBA8 : ~5.1 Gpix/sec, ~88% of max (5.8 Gpix/sec).
L32F : ~3.3 Gpix/sec, ~57% of max (5.8 Gpix/sec).
RGBA16F : ~2.7 Gpix/sec, ~93% of max (2.9 Gpix/sec).
LA32F : ~2.1 Gpix/sec, ~72% of max (2.9 Gpix/sec).
RGBA32F : ~1.2 Gpix/sec, ~82% of max (1.45 Gpix/sec).
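The percentages above are just measured rate divided by the per-format max. A small sketch to recompute them from the rounded figures in this post follows; because the inputs are already rounded, a result can land a point away from the quoted value.

```python
# (format, measured Gpix/sec, max Gpix/sec), copied from the
# blending-disabled results above. All values are the rounded ones.
NO_BLEND = [
    ("L8/L16F/LA8/LA16F/RGBA8", 5.1, 5.8),
    ("L32F", 3.3, 5.8),
    ("RGBA16F", 2.7, 2.9),
    ("LA32F", 2.1, 2.9),
    ("RGBA32F", 1.2, 1.45),
]


def percent_of_max(measured: float, peak: float) -> float:
    """Measured fill rate as a percentage of the theoretical max."""
    return 100.0 * measured / peak


for name, measured, peak in NO_BLEND:
    print(f"{name:24s} {percent_of_max(measured, peak):5.1f}% of max")
```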

With glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA),

L8, RGBA8 : ~2.9 Gpix/sec, ~50% of max (5.8 Gpix/sec).
L16F : ~2.7 Gpix/sec, ~93% of max (2.9 Gpix/sec).
L32F : ~1.4 Gpix/sec, ~97% of max (1.45 Gpix/sec).
RGBA16F : ~2.7 Gpix/sec, ~94% of max (2.9 Gpix/sec).
RGBA32F : ~0.36 Gpix/sec, ~99% of max (0.36 Gpix/sec).
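One way to compare the blended numbers across formats is to turn them into a rough framebuffer traffic estimate. The sketch below is a naive model and not a measurement: it assumes the nominal storage size for each format (the hardware may pad or compress), and it assumes blending costs one destination read plus one write per pixel. The names are hypothetical.

```python
# Nominal bytes per pixel (assumption: no padding or compression).
BYTES_PER_PIXEL = {
    "L8": 1, "RGBA8": 4, "L16F": 2, "L32F": 4, "RGBA16F": 8, "RGBA32F": 16,
}

# Measured blended fill rates in Gpix/sec, from the results above.
BLENDED_GPIX = {
    "L8": 2.9, "RGBA8": 2.9, "L16F": 2.7, "L32F": 1.4,
    "RGBA16F": 2.7, "RGBA32F": 0.36,
}


def blend_traffic_gb_per_sec(gpix: float, bytes_per_pixel: int) -> float:
    """Naive traffic estimate: one read and one write per blended pixel."""
    return gpix * bytes_per_pixel * 2


for fmt, gpix in BLENDED_GPIX.items():
    traffic = blend_traffic_gb_per_sec(gpix, BYTES_PER_PIXEL[fmt])
    print(f"{fmt:8s} ~{traffic:5.1f} GB/sec (naive)")
```

Since the test writes a constant value, caching or compression could make the real memory traffic much lower than this, so treat it as a rough upper bound only.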

Info on Test Method

Using OpenGL with the latest NVIDIA drivers (Linux64). A 2048x2048 texture in each of the above formats is bound to an FBO, the fragment shader writes a constant value, and a GL_TIME_ELAPSED_EXT query is used for timing.
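For reference, a sketch of how a GL_TIME_ELAPSED_EXT result (which is reported in nanoseconds) could be turned into a Gpix/sec figure. The post does not say how many full-screen passes were drawn per query, so `passes` and the elapsed time in the usage line are made-up values for illustration, not numbers from the actual test.

```python
def fill_rate_gpix(width: int, height: int, passes: int, elapsed_ns: int) -> float:
    """Convert a timer-query result into Gpix/sec.

    Pixels per nanosecond is numerically the same as Gpix/sec,
    so no extra scaling is needed.
    """
    if elapsed_ns <= 0:
        raise ValueError("elapsed_ns must be positive")
    pixels_written = width * height * passes
    return pixels_written / elapsed_ns


# Hypothetical example: 100 full passes over a 2048x2048 target,
# with an invented elapsed time of 82 ms.
print(f"{fill_rate_gpix(2048, 2048, 100, 82_000_000):.2f} Gpix/sec")
```

Drawing many passes inside a single query keeps fixed per-draw overhead from dominating the measurement.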
 
Should we be looking for a software explanation for these differences? It almost seems there's a problem with the driver implementation for those formats, but my knowledge is limited. I would like to see the answer myself, and I would love to see how my R600 performs in those tests (probably poorly, as is the trend with it).
 
"The ROP blending results seem correct, still wonder why RGBA8 is also so slow."
That's a feature, and possibly one of the least logical trade-offs in G8x. I would not be surprised if part of the point was to make HDR cheaper than it really should be in theory; however, that doesn't make as much sense on G84 or G86. On the latter, the lack of bandwidth makes it irrelevant though.

As for L32F & friends, I'm not sure. G8x is equally weird for non-Vec4 FP32 filtering, even on the latest drivers (at least the ones I tried) while G92 behaves as expected. I don't know what might be causing this in the case of blending.
 