TimothyFarrar
Got some strange results for ROP performance from a G84 (8600 GTS clocked to 730 MHz).
Anyone here have any speculations as to why?
Perhaps the G92 has similar ROP quirks (I don't have a G92 to test with yet). The results with blending disabled seem rather odd. For example, why would LUMINANCE32F be so slow? The results with blending enabled seem correct, though I still wonder why RGBA8 is so slow there.
Results
Max possible blend rate = 8 ROP units at 730MHz = 5.84 Gpix/sec.
With blending disabled (~ = approximately),
L8, L16F, LA8, LA16F, RGBA8 : ~5.1 Gpix/sec, ~88% of max (5.8 Gpix/sec).
L32F : ~3.3 Gpix/sec, ~57% of max (5.8 Gpix/sec).
RGBA16F : ~2.7 Gpix/sec, ~93% of max (2.9 Gpix/sec).
LA32F : ~2.1 Gpix/sec, ~72% of max (2.9 Gpix/sec).
RGBA32F : ~1.2 Gpix/sec, ~82% of max (1.45 Gpix/sec).
With glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA),
L8, RGBA8 : ~2.9 Gpix/sec, ~50% of max (5.8 Gpix/sec).
L16F : ~2.7 Gpix/sec, ~93% of max (2.9 Gpix/sec).
L32F : ~1.4 Gpix/sec, ~97% of max (1.45 Gpix/sec).
RGBA16F : ~2.7 Gpix/sec, ~94% of max (2.9 Gpix/sec).
RGBA32F : ~0.36 Gpix/sec, ~99% of max (0.36 Gpix/sec).
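For anyone wanting to re-check the percentages, here is a minimal sketch of the arithmetic. The 8 ROP units and 730 MHz clock are from the post; the `divisor` parameter is an assumption inferred only from the quoted maxima (5.8, 2.9, 1.45, 0.36 Gpix/sec look like the peak divided by 1, 2, 4, and 16), not from any hardware documentation. The function names are made up for illustration.

```python
ROP_UNITS = 8        # from the post
CLOCK_HZ = 730e6     # 730 MHz, from the post


def peak_gpix(divisor=1):
    """Peak rate in Gpix/sec, assuming one pixel per ROP per clock.

    divisor is an inferred per-format slowdown (1, 2, 4 or 16), not a
    documented hardware figure.
    """
    return ROP_UNITS * CLOCK_HZ / divisor / 1e9


def percent_of_peak(measured_gpix, divisor=1):
    """Measured rate as a percentage of the (possibly scaled) peak."""
    return 100.0 * measured_gpix / peak_gpix(divisor)


if __name__ == "__main__":
    # (format, measured Gpix/sec, inferred divisor), blending disabled
    results = [
        ("RGBA8", 5.1, 1),
        ("L32F", 3.3, 1),
        ("RGBA16F", 2.7, 2),
        ("LA32F", 2.1, 2),
        ("RGBA32F", 1.2, 4),
    ]
    for name, measured, divisor in results:
        pct = percent_of_peak(measured, divisor)
        print(f"{name}: {measured} of {peak_gpix(divisor):.2f} Gpix/sec = {pct:.0f}%")
```

The percentages can differ by a point from those quoted above, since the post rounds the peak to 5.8 before dividing.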
Info on Test Method
Using OpenGL with the latest NVIDIA drivers (Linux64): a 2048x2048 texture of each of the above formats is bound to an FBO, the fragment shader writes a constant value, and a GL_TIME_ELAPSED_EXT query is used for timing.
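A rough sketch of how a timer-query result turns into a Gpix/sec figure, assuming the query reports elapsed time in nanoseconds (as GL_TIME_ELAPSED_EXT does) and that the full 2048x2048 target is written once per pass. The pass count and elapsed time in the example are hypothetical values for illustration, not measurements from the test.

```python
def fill_rate_gpix(width, height, passes, elapsed_ns):
    """Fill rate in Gpix/sec from a timer-query result in nanoseconds.

    Pixels per nanosecond is numerically the same as Gpix/sec, so no
    extra scaling is needed.
    """
    if elapsed_ns <= 0:
        raise ValueError("elapsed_ns must be positive")
    pixels_written = width * height * passes
    return pixels_written / elapsed_ns


if __name__ == "__main__":
    # Hypothetical numbers: 100 full-screen passes taking 82 ms in total.
    rate = fill_rate_gpix(2048, 2048, passes=100, elapsed_ns=82_000_000)
    print(f"{rate:.2f} Gpix/sec")
```

Timing many passes inside one query helps keep the per-query overhead small relative to the measured interval.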