Hey everyone,
I just did a few tests, and I got some VERY strange results (I'm still running a 45 Series driver here, BTW).
If I put SQRT alone, i.e. loads of SQRT instructions in a row, I get 35% faster performance in FP16 than in FP32.
But if I do a lot of SQRTs with MULs, ADDs and MADs surrounding them, I get 2% slower performance in FP16 than in FP32.
I did these tests by modifying the .fx file of RightMark3D (the "Marble" shader), so it's DX9 HLSL.
I've yet to try it with the Det50s, but it does seem strange! Also, with the Det45s, COS/SIN doesn't seem to be executed in parallel with the other instructions. COS/SIN also doesn't show the strange behavior SQRT has.
Uttar