Hey everyone,
I've just released a small patch for nVidia cards that forces everything to run at 100% FP32 ( see the nV News forums for the link, I'm too lazy to put it here ).
I asked MikeC of nV News to benchmark it ( as well as a non-public FP16 version ) and he agreed to give me some numbers. Thanks Mike!
1024x768:
Default Shaders - 29fps
FP16 - 27fps
FP32 - 18fps
1600x1200:
Default Shaders - 27fps
FP16 - 25fps
FP32 - 17fps
The original shader files are a mix of FX12, FP16 and FP32. The small, precision-sensitive parts such as the eyes are FP32.
The bulk of the work is done in FX12 ( guesstimate: 80% ), with some FP16 ( guesstimate: 20% ).
As you can see, FP16 performance ( replace FX12 with FP16, keep FP32 where it already was ) is practically identical to default performance. This would indicate the NV35 has no FX12 hardware ( or maybe only a little, which would explain the performance hit of roughly 7% ).
FP32 performance, however, is only about 2/3 of FP16 performance. This is a big performance hit!
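For anyone who wants to check those ratios, here is a quick Python sketch that works them out from the frame rates above. The dictionary layout and the function names are just mine for illustration; only the fps figures come from MikeC's runs.

```python
# Frame rates reported by MikeC, copied from the numbers above.
results = {
    "1024x768": {"default": 29, "fp16": 27, "fp32": 18},
    "1600x1200": {"default": 27, "fp16": 25, "fp32": 17},
}


def relative_speed(fps, baseline):
    """Return fps as a fraction of the baseline frame rate."""
    return fps / baseline


def summarize(results):
    """Compute FP16-vs-default and FP32-vs-FP16 ratios per resolution."""
    summary = {}
    for resolution, fps in results.items():
        summary[resolution] = {
            "fp16_vs_default": relative_speed(fps["fp16"], fps["default"]),
            "fp32_vs_fp16": relative_speed(fps["fp32"], fps["fp16"]),
        }
    return summary


for resolution, ratios in summarize(results).items():
    print(
        f"{resolution}: FP16 runs at {ratios['fp16_vs_default']:.1%} of default, "
        f"FP32 runs at {ratios['fp32_vs_fp16']:.1%} of FP16"
    )
```

On these numbers FP16 comes out at about 93% of default at both resolutions, and FP32 at about 67% to 68% of FP16. With whole-number frame rates this low, a single fps of rounding moves the ratios by a few percent, so the small FP16 gap should be read with that in mind.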
The difference is that everything is now FP32, so the number of registers used is doubled ( 4 FP32 instead of 4 FP16, i.e. the equivalent of 2 FP32 ).
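The register doubling can be sketched with a bit of arithmetic. This is only a toy model, assuming one register slot holds either one FP32 value or two FP16 values ( which is how I read the 4 FP16 / 2 FP32 equivalence ); `SLOT_BITS`, `BITS` and `slots_needed` are hypothetical names, not anything from the driver or the hardware docs.

```python
# Toy model (assumption): a register slot is 32 bits wide, so it holds
# one FP32 value or two packed FP16 values.
SLOT_BITS = 32
BITS = {"fp16": 16, "fp32": 32}


def slots_needed(count, precision):
    """Number of 32-bit slots needed to hold `count` values of `precision`."""
    total_bits = count * BITS[precision]
    return -(-total_bits // SLOT_BITS)  # ceiling division


fp16_slots = slots_needed(4, "fp16")
fp32_slots = slots_needed(4, "fp32")
print(f"4 x FP16 -> {fp16_slots} slots, 4 x FP32 -> {fp32_slots} slots")
```

Under that assumption the same four temporaries take twice the register space once they are promoted to FP32, which is the pressure the theory below relies on.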
My theory right now is that the NV3x might be able to use a small pool of FX12 units shared with T&L ( maybe, but probably not ), which would explain the slight performance hit of FP16.
Also, the NV35 would be 100% FP32 from top to bottom, but would take very big performance hits from using more registers ( heck, maybe even more than the NV30, although these tests can't show that ).
Any other ideas about what those numbers could mean? Or any feedback?
Uttar