A quick question about R3x0 and R4x0

JaylumX

Newcomer
My question relates to ATI precision. I am aware it renders (is that the right term?) currently at 24-bit precision when it comes to SM2.0 calls. Does that mean it renders internally at 32-bit precision and outputs to 24-bit, or is it 24-bit all the way?

Cheers

JaylumX
 
JaylumX said:
My question relates to ATI precision. I am aware it renders (is that the right term?) currently at 24-bit precision when it comes to SM2.0 calls. Does that mean it renders internally at 32-bit precision and outputs to 24-bit, or is it 24-bit all the way?
Vertices and texture coordinates are computed in FP32. Pixel shaders are computed in FP24. Output precision depends on what format is selected, but if you select FP32 then you get FP32.
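To make the "output precision depends on what format is selected" part concrete, here is a minimal D3D9 sketch (not from the thread; the function name is a placeholder) that requests a four-channel FP32 render target. The format only controls what gets stored; the shader itself still runs at the hardware's internal precision.

```cpp
// Sketch only (not from the thread), assuming an existing D3D9 device: the
// output precision is whatever surface you render into, so selecting FP32
// means creating a render target with a 32-bit float channel format.
#include <d3d9.h>

HRESULT CreateFp32Target(IDirect3DDevice9* pDevice, UINT width, UINT height,
                         IDirect3DTexture9** ppTarget)
{
    // D3DFMT_A32B32G32R32F stores four FP32 channels; D3DFMT_A16B16G16R16F
    // would store FP16 instead. The pixel shader itself still runs at the
    // hardware's internal precision (FP24 on R3x0/R4x0).
    return pDevice->CreateTexture(width, height, 1, D3DUSAGE_RENDERTARGET,
                                  D3DFMT_A32B32G32R32F, D3DPOOL_DEFAULT,
                                  ppTarget, NULL);
}
```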
 
So could that be the reason why the X800 series is generally faster than the GeForce 6 series in D3D games, because ATI uses 24-bit precision while Nvidia uses 32-bit precision?

Disclaimer

Just before some of you start a flame war on ATI vs Nvidia, to put things at rest, I have an X800 series card ;)
 
OpenGL guy said:
Vertices and texture coordinates are computed in FP32. Pixel shaders are computed in FP24. Output precision depends on what format is selected, but if you select FP32 then you get FP32.
Just a note on OpenGL guy's comment: if you choose FP16/FP32 and you have alpha blending, then you're going to be doing CPU rendering.
 
JaylumX said:
So could that be the reason why the X800 series is generally faster than the GeForce 6 series in D3D games, because ATI uses 24-bit precision while Nvidia uses 32-bit precision?

Disclaimer

Just before some of you start a flame war on ATI vs Nvidia, to put things at rest, I have an X800 series card ;)
No. Precision has nothing to do with speed. A single instruction generally takes the same number of clock cycles regardless of its precision. FX9, FP16, FP24, FP32... it doesn't matter. They all execute at exactly the same speed.

The additional speed NVidia cards get from using lower precision comes from register pressure: using a large number of registers adds latency, and FP16 doubles the number of registers that may be used before hitting that threshold. The situation has vastly improved since the days of the NV30, but AFAIK it still remains to some degree. The NV4x series can also perform certain instructions in a single cycle using FP16 that would normally take many cycles. There may be more, but the one I know of is the normalize instruction.
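As a concrete illustration (a sketch, not from the thread): in D3D9 HLSL the way an application opts into these lower-precision paths is the half type, which makes the compiler emit _pp-flagged instructions; whether the hardware actually takes a faster path for them is up to the chip and driver. The shader and names below are hypothetical.

```cpp
// Sketch only (not from the thread): in D3D9 HLSL, the 'half' type is how an
// application asks for partial precision. The compiler emits _pp-flagged
// instructions for it; NV3x/NV4x can then use fewer registers per pixel and,
// on NV4x, the fast FP16 normalize. On R3x0/R4x0 the hint is ignored and
// everything runs at FP24. All names here are hypothetical.
#include <d3dx9.h>

static const char g_psSource[] =
    "float3 g_LightDir;\n"
    "float4 main(float3 normal : TEXCOORD0) : COLOR\n"
    "{\n"
    "    half3 n = normalize((half3)normal);            // may map to nrm_pp\n"
    "    half  d = saturate(dot(n, (half3)g_LightDir));  // partial-precision math\n"
    "    return half4(d, d, d, 1);\n"
    "}\n";

HRESULT CompilePartialPrecisionShader(LPD3DXBUFFER* ppByteCode)
{
    LPD3DXBUFFER pErrors = NULL;
    HRESULT hr = D3DXCompileShader(g_psSource, sizeof(g_psSource) - 1,
                                   NULL, NULL,          // no defines, no includes
                                   "main", "ps_2_0",    // SM2.0 profile
                                   0, ppByteCode, &pErrors, NULL);
    if (pErrors) pErrors->Release();
    return hr;
}
```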

Other factors are, of course, the number of pixel pipelines, the number of ALUs per pipeline, clock speed, and bandwidth.
 
bloodbob said:
OpenGL guy said:
Vertices and texture coordinates are computed in FP32. Pixel shaders are computed in FP24. Output precision depends on what format is selected, but if you select FP32 then you get FP32.
Just a note on OpenGL guy's comment: if you choose FP16/FP32 and you have alpha blending, then you're going to be doing CPU rendering.
Not in D3D you won't ;) I.e. there is no fallback path.
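A minimal sketch of what that means in practice for a D3D9 application (not from the thread): since there is no software fallback, the app is expected to ask up front whether post-pixel-shader blending is supported on a float render target format before relying on alpha blending into it. The adapter format used below is an assumption.

```cpp
// Sketch only (not from the thread): D3D9 has no software fallback, so before
// relying on alpha blending into a floating-point render target an application
// checks whether the hardware supports post-pixel-shader blending for that
// format. The X8R8G8B8 adapter format below is an assumption.
#include <d3d9.h>

bool CanBlendIntoFloatTarget(IDirect3D9* pD3D, D3DFORMAT fmt)  // e.g. D3DFMT_A16B16G16R16F
{
    return SUCCEEDED(pD3D->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                             D3DFMT_X8R8G8B8,
                                             D3DUSAGE_QUERY_POSTPIXELSHADER_BLENDING,
                                             D3DRTYPE_TEXTURE, fmt));
}
```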
 
JaylumX said:
So could that be the reason why the X800 series is generally faster than the GeForce 6 series in D3D games, because ATI uses 24-bit precision while Nvidia uses 32-bit precision?
AFAIK, FP32 can run as fast as FP24; it just requires more transistors (for the ALUs and the register space). Also AFAIK, the reason X800s are faster than 6800s in some games may simply be that they're clocked higher and the games aren't coded (or designed) to take advantage of the GF6's potentially higher "IPC."
 