SQRT FP16 performance on the NV35 - What the heck?!

Arun

Hey everyone,

I just ran a few tests and got some VERY strange results (still running the 45-series drivers here, BTW).

If I use SQRT alone, i.e. loads of SQRT instructions in a row, I get 35% faster performance in FP16 than in FP32.
But if I do a lot of SQRTs with MULs, ADDs and MADs surrounding them, I get 2% slower performance in FP16 than in FP32!

I did these tests by modifying the .fx file of RightMark3D (the "Marble" shader), so it's DX9 HLSL.

I've yet to try it with the Det50s - but it does seem strange! Also, with the Det45s, COS/SIN doesn't seem to be done in parallel to the other stuff. COS/SIN also doesn't have the strange behavior SQRT has.


Uttar
 
Are those SQRTs (actually RSQ in assembly) dependent or independent instructions? I.e., are you testing latency or throughput? A 2% difference seems to fall within measurement error. How many temp values are you using?
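
To make the dependent/independent distinction concrete, here is a minimal HLSL sketch, assuming a ps_2_0-style pixel shader. The function names, inputs and constants are hypothetical and are not taken from the RightMark3D "Marble" shader; how the compiler and driver actually schedule these instructions is not something this sketch can guarantee.

```hlsl
// Hypothetical pixel shaders for illustration only.

// Dependent chain: each sqrt consumes the previous result,
// so the instructions cannot overlap. This mostly measures latency.
half4 DependentSqrtPS(half4 v : TEXCOORD0) : COLOR
{
    half x = v.x + 1.0;
    x = sqrt(x);
    x = sqrt(x);
    x = sqrt(x);
    x = sqrt(x);
    return half4(x, x, x, 1.0);
}

// Independent operations: no sqrt reads another sqrt's result,
// so the hardware is free to overlap them. This mostly measures throughput.
half4 IndependentSqrtPS(half4 v : TEXCOORD0) : COLOR
{
    half a = sqrt(v.x + 1.0);
    half b = sqrt(v.y + 2.0);
    half c = sqrt(v.z + 3.0);
    half d = sqrt(v.w + 4.0);
    return half4(a, b, c, d);
}
```

Replacing `half` with `float` throughout would give the FP32 variant of each shader, so the same pair can be timed at both precisions.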
 
Hmm, strange stuff. I might have made some mistakes.
The Det50s do give rather different results too.

I'll try redoing some tests in the coming days.


Uttar
 