> Got my 285 so could run some tests myself. Not sure if something is screwy with Nvidia's drivers but my SFU numbers are coming in twice as high as expected:
I'm afraid to say I think this test holds no value at all for transcendentals (actually, for anything to the right of MAD on that graph). Generally NVidia's driver appears to be optimising the code, so the GPU's doing less work. Depending on flavour-of-the-month optimisations, results vary wildly.
> gpubench instrissue -l 64 -a -m -c 4
I think "l" and "m" are contradictory, but I doubt it matters. As a matter of interest you might like to experiment with 63 and 65 and other non-power-of-2 and non-even values to see if any of the instructions change significantly in throughput.
> I think "l" and "m" are contradictory, but I doubt it matters.
Yeah, I noticed, but those are the parameters used here so I just mimicked them.
> As a matter of interest you might like to experiment with 63 and 65 and other non-power-of-2 and non-even values to see if any of the instructions change significantly in throughput.
No difference.
> What does a -c 1 graph look like?
Same as -c 4, except ADD/MAD/MUL/SUB are about 4x the speed, as expected.
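The sweep suggested above (63, 64, 65 and other odd lengths) could be scripted roughly as follows. This is only a sketch: it reuses the exact command line quoted in the thread and varies the `-l` value, and it assumes that `-l` accepts arbitrary integers, which nobody here has confirmed. It only prints the command lines; it does not run gpubench.

```python
import shlex

# Command line quoted in the thread; only the -l value is varied.
# Assumption: -l accepts any positive integer, not just powers of two.
PREFIX = ["gpubench", "instrissue"]
SUFFIX = ["-a", "-m", "-c", "4"]


def sweep_commands(lengths):
    """Return one gpubench command line per requested -l value."""
    return [shlex.join(PREFIX + ["-l", str(n)] + SUFFIX) for n in lengths]


for cmd in sweep_commands([63, 64, 65]):
    print(cmd)
```

Swapping the final `"4"` for `"1"` in `SUFFIX` would give the -c 1 variant asked about above.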
Are those available somewhere?
> I don't know, this same driver was showing expected results on my 8800GTS so it's weird that GTX285 is behaving differently on the same driver set.
Prior to GT200 the MUL is essentially unusable (is there any driver that shows MUL faster than MAD on any NVidia G8x/9x GPU in GPUBench?), so the compiler is behaving differently depending on which chip it's targeting.
> Are those available somewhere?
I thought they were documented/available somewhere but I've failed to find them so far. I'm not sure if they're much use with respect to the transcendental functions, anyway...
IIRC it's also quite interesting how the HD2900 and possibly newer models are doing in this test with current drivers.
                              HD 4870 (Cat. 8.12 WHQL)     HD 2900 XT (Cat. 8.12 WHQL)
                              Vertex       Pixel           Vertex       Pixel
- float MAD serial            21.182.372   117.642.201     10.623.348   47.200.346
- float4 MAD parallel         21.240.413   144.876.620     10.624.156   58.129.619
- float SQRT serial           21.244.773   117.650.268     10.625.376   47.252.525
- float 5-instruction issue   53.114.714   570.917.384     26.562.044   225.130.502
- int MAD serial              21.242.146   58.579.440      10.624.713   23.669.843
- int4 MAD parallel           21.210.878   29.480.274      10.621.545   11.475.836
> Sorry, but I've found zero verified data wrt this issue.
Are you saying that TechReport didn't run the test on HD2900XT?
> So I had to rerun the tests myself.
> This is in Vista 64-bit, SP1, X48 chipset, C2D CPU at 3.8 GHz with the recent, official Catalyst 8.12:
Clearly the current driver has not been tweaked to make the vertex shaders look good. And the HD4870 vertex shader numbers are even less tweaked.
GF 8800 U. (181.22 WHQL)
                          Vertex       Pixel
- float MAD serial        7.966.160    173.258.010
- float4 MAD parallel     7.969.755    43.733.967
- float SQRT serial       7.969.775    45.048.540
- float 5-instr. issue    19.925.096   210.127.473
- int MAD serial          7.969.392    36.321.509
- int4 MAD parallel       6.440.103    9.561.770
GF 9600 GT (181.22 WHQL)
                          Vertex       Pixel
- float MAD serial        10.620.117   101.237.745
- float4 MAD parallel     10.522.113   25.721.769
- float SQRT serial       10.620.914   25.256.123
- float 5-instr. issue    26.552.369   116.284.367
- int MAD serial          10.618.908   20.424.944
- int4 MAD parallel       4.016.976    4.947.652
GTX 285 (181.22 WHQL)
                          Vertex       Pixel
- float MAD serial        10.620.539   317.590.627
- float4 MAD parallel     10.613.723   81.331.192
- float SQRT serial       10.621.059   81.464.549
- float 5-instr. issue    26.547.464   365.579.803
- int MAD serial          10.618.403   65.521.415
- int4 MAD parallel       5.118.731    16.143.853
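For anyone wanting to compare the cards without eyeballing the columns, here is a rough sketch that divides the pixel "float MAD serial" figure by the "float4 MAD parallel" figure for each GPU. The numbers are copied from the tables above; treating the dots as thousands separators is my assumption, and the ratio is just arithmetic on the posted results, not a statement about what the hardware is doing.

```python
# Pixel-shader results copied from the tables above.
# Assumption: the dots are thousands separators (e.g. 117.642.201 = 117,642,201).
PIXEL = {
    "HD 4870":    {"float MAD serial": "117.642.201", "float4 MAD parallel": "144.876.620"},
    "HD 2900 XT": {"float MAD serial": "47.200.346",  "float4 MAD parallel": "58.129.619"},
    "GF 8800 U.": {"float MAD serial": "173.258.010", "float4 MAD parallel": "43.733.967"},
    "GF 9600 GT": {"float MAD serial": "101.237.745", "float4 MAD parallel": "25.721.769"},
    "GTX 285":    {"float MAD serial": "317.590.627", "float4 MAD parallel": "81.331.192"},
}


def to_int(text):
    """Convert a dot-grouped figure such as '21.182.372' to an integer."""
    return int(text.replace(".", ""))


def serial_to_float4_ratio(row):
    """Ratio of the scalar serial MAD result to the float4 parallel MAD result."""
    return to_int(row["float MAD serial"]) / to_int(row["float4 MAD parallel"])


ratios = {gpu: serial_to_float4_ratio(row) for gpu, row in PIXEL.items()}
for gpu, ratio in ratios.items():
    print(f"{gpu:12s} {ratio:.2f}")
```

On these numbers the three GeForce parts land a little under 4, while both Radeons come out below 1.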
I was kinda hoping to see some effect of the MUL in GT200, but I can't see it.
2  x: MULADD_e  R2.x, PV1.w, PV1.w, PV1.w
   y: MIN_DX10  R0.y, R2.y, PV1.y
   z: MUL_e     R0.z, PV1.x, PV1.x
   w: MAX_DX10  R1.w, R2.y, PV1.z
   t: SQRT_e    R1.x, PS1
--------------------------------------------------------------------------
Instruction Issue
--------------------------------------------------------------------------
512 70.9729 ADD 4 64
512 70.0217 SUB 4 64
512 94.9020 MUL 4 64
512 68.8689 MAD 4 64
[...]
512 71.4560 RSQ 4 64
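To put the MUL figure in context, the rows above can be normalised against MAD with a few lines of Python. This is a sketch under assumptions: I am guessing that the second column is the measured rate and the third is the instruction name, since the output has no header, and the elided `[...]` rows are simply skipped rather than filled in.

```python
# Rows copied from the Instruction Issue output above; "[...]" is left elided.
OUTPUT = """\
512 70.9729 ADD 4 64
512 70.0217 SUB 4 64
512 94.9020 MUL 4 64
512 68.8689 MAD 4 64
[...]
512 71.4560 RSQ 4 64
"""


def parse_rates(text):
    """Map instruction name -> rate, assuming columns: size, rate, name, -c, -l."""
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip the elided "[...]" marker and any blank lines
        rates[fields[2]] = float(fields[1])
    return rates


def relative_to(rates, baseline="MAD"):
    """Express every rate as a multiple of the baseline instruction's rate."""
    base = rates[baseline]
    return {name: rate / base for name, rate in rates.items()}


rates = parse_rates(OUTPUT)
relative = relative_to(rates)
for name, ratio in relative.items():
    print(f"{name}: {ratio:.3f}x MAD")
```

With the posted numbers MUL comes out at roughly 1.38x MAD, while ADD, SUB and RSQ sit within a few percent of it.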