Another NV3x series and multiple precisions thread

indio

Newcomer
The more I read about the NV3x series, the more I think it wasn't designed to run at any one precision in particular. I think it was designed to run all three precisions simultaneously, with the developer tuning operations in their code to differentiate precision usage - basically making developers go through their code and use only the precision that is necessary for each particular line. They created Cg to compile down to FX-optimal code, with the compiler determining which piece of code should run at what precision.
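Something like this rough Cg sketch is what I mean (a made-up shader, purely for illustration; on an NV3x profile float maps to fp32, half to fp16 and fixed to fx12):

Code:
half4 main(float2 uv : TEXCOORD0,           // 'float' = full fp32 precision
           uniform sampler2D diffuseMap,    // hypothetical texture input
           uniform half4 tint) : COLOR
{
    half4 albedo = h4tex2D(diffuseMap, uv); // 'half' = fp16, plenty for colour data
    fixed3 lit = albedo.rgb * tint.rgb;     // 'fixed' = fx12, enough for a simple modulate
    return half4(lit, albedo.a);
}

Every line of the shader gets exactly as much precision as that operation needs, and no more.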
If the above is true and I don't seriously misunderstand what's going on, it is very presumptuous for Nvidia to assume that people will code specially for them. This puts recent events in a different light for me. If the FX series doesn't sell well and developers find no need to code specially for the FX or use Cg, Nvidia's cards will run so horribly in games that few if any people will buy them. Nvidia absolutely needs a high number of FX units sold to motivate developers and publishers into doing the extra work. With the FX series, this extra work is not a choice when it comes to pixel shaders; it is absolutely required. Considering all of this together, is it any surprise Nvidia has been doing what they are doing?
They put themselves in a vicious circle where good performance is dependent on special coding, which is dependent on significant sales volumes for the FX series, which is dependent on good performance.
Why would they paint themselves into a corner? And secondly, is it FAIR to require developers to do all sorts of special work when it clearly benefits only one IHV and exists only to make one particular piece of hardware functional?
 
Indio:

Programming specifically for certain hardware is not new, especially under OpenGL (differing precisions in NV_fragment_program). Looking at recent hardware: GeForce1 introduced NV_register_combiners, GeForce2 was a speed-bumped GeForce1, GeForce3 introduced a programmable vertex pipeline and NV_texture_shader1/2, and GeForce4 gave us NV_texture_shader3. The R200 has its own special OpenGL extensions as well.

You also had the two different AGP DMA access methods, NV_vertex_array_range and ATI_vertex_array_object. We're lucky now that we have ARB_vertex_buffer_object, which negates the need to program for those other two extensions (using one of which has really been required to get any sort of throughput performance out of new video cards since GeForce1).

D3D8 has various things you need to look out for too, such as pixel shader versions (1.1-1.4).

It's not a new problem for developers. Sure it can be a pain in the ass, but it can be handled. ARB_fragment_program is nice because it simplifies things a little w.r.t. writing multiple code paths.

People are really up in arms about Doom3 having different code paths. Just remember that it's designed around GeForce1 technology and supports everything back to that chip. So you have the standard ARB path (ARB_texture_env_dot3); the NV10 path (GF1 and GF2); NV20 (GF3 and GF4); R200 (Radeon 8500); ARB2 (ARB_fragment_program and ARB_vertex_program - similar to the NV20 path, which uses NV_vertex_program), which will run on R300 and NV30; and then the NV30 path.

I hope this has explained to you the need to have multiple paths, and that it's been common in the past. Even Q3 detected different video cards, because back then there were problems with certain video cards not supporting required blending modes and such. It's just gotten a bit more complex.
 
We should have an interview with various developers about the "problem" of multiple precision choices, which model they initially target and such.
 
Multiple precisions just seem to needlessly complicate things. There seems to be no significant benefit other than gaining "efficiency" in the code because you are not using any more precision than is needed. That hasn't translated into speed at this point. Even though there is little information about the relative performance of NV30 custom coding versus coding generically, it seems at this juncture there is little if any performance gain from being more efficient. This may or may not change in the future.

Where you lose efficiency is in the actual development of the application, but then that's not Nvidia's problem. It just seems to me that having to nit-pick which operation deserves a lot of precision and which deserves less adds an entire layer of analysis to the equation. I guess we'll just have to wait and see if there is any actual benefit.
 
I think multiple precision choices is a necessary evil -- it is for the transitional period between "yesteryear"'s games and the newest/latest hardware.
 
---Begin Paste---
Internal Precision
- All hardware that supports PS2.0 needs to set D3DPTEXTURECAPS_TEXREPEATNOTSCALEDBYSIZE.
- MaxTextureRepeat is required to be at least (-128, +128).
- Implementations vary precision automatically based on the precision of inputs to a given op for optimal performance.
- For ps_2_0 compliance, the minimum level of internal precision for temporary registers (r#) is s16e7** (this was incorrectly s10e5 in the spec).
- The minimum internal precision level for constants (c#) is s10e5.
- The minimum internal precision level for input texture coordinates (t#) is s16e7.
- Diffuse and specular (v#) are only required to support [0-1] range, and high precision is not required.
---End Paste---

For ps_3_0 the requirements are the same; however, interpolated input registers are now defined by semantic names. Inputs here behave like t# registers in ps_2_0: they default to s16e7 unless _pp is specified (s10e5).

Note that specifying _pp on an input register only affects how it is read into temp registers, or what precision the ALU math might run at for an op reading the input as a parameter. However, texld* instructions that take in unmodified texture coordinates will not be affected by the _pp modifier, as the texture coordinate iterators are of fixed precision.
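As a rough illustration (a hand-written Cg-style sketch, not something from the spec), declaring an interpolated input as half is what produces the _pp behaviour described above, while a texture coordinate passed unmodified to a fetch goes through the fixed-precision iterators either way:

Code:
half4 main(float2 uv         : TEXCOORD0,  // fed unmodified to the fetch: iterator precision, _pp has no effect
           half4 vertLight   : COLOR0,     // 'half' input: read into temps at partial precision (s10e5)
           uniform sampler2D baseMap) : COLOR
{
    half4 texel = h4tex2D(baseMap, uv);    // the coordinate itself is untouched by the hint
    return texel * vertLight;              // ALU math on the _pp input may run at reduced precision
}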


amar

Unless I read it wrong or something has changed, s16e7 is FP24, is it not?
 
From the DX9 SDK:
Code:
The partial precision hint (represented as _pp in the assembly) can be used by the application to indicate to the device that the operation can be performed and the result stored at a lower precision (at least s10e5). This is a hint and many implementations might ignore it.
As Dave said, if Microsoft didn't wish for multiple precision levels to be associated with DX9, they wouldn't have included the modifier.
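For instance (a hand-written sketch, not SDK code), here is the same multiply at the two precisions; with the corrected spec, only the half version should pick up the hint in the compiled ps_2_0 output, and even then the hardware is free to ignore it:

Code:
// Full precision: no hint; temporaries must be at least s16e7.
float4 modulateFull(float4 col : COLOR0, uniform float4 k) : COLOR
{
    return col * k;   // compiles to something like: mul r0, v0, c0
}

// Partial precision: 'half' lets the compiler emit the _pp hint (at least s10e5),
// which an implementation is free to honour or ignore.
half4 modulateHalf(half4 col : COLOR0, uniform half4 k) : COLOR
{
    return col * k;   // compiles to something like: mul_pp r0, v0, c0
}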
 
I don't think the story has changed at all. The specifications say that for full DX9 compliance the pixel shader must support at least FP24 precision; there is an optional partial precision mode that can be utilised (as specified by the _pp precision hint on some DX instructions), which must be of at least FP16 quality. If MS didn't think there was a need for precision hints they could have just left out the partial precision mode, as it is optional and below their minimum requirement.
 
Is there a guideline for the use of partial precision, or can it effectively be used across the board, thereby negating or circumventing FP24 altogether?
 
DaveBaumann said:
I don't think the story has changed at all. The specifications say that for full DX9 compliance the pixel shader must support at least FP24 precision; there is an optional partial precision mode that can be utilised (as specified by the _pp precision hint on some DX instructions), which must be of at least FP16 quality. If MS didn't think there was a need for precision hints they could have just left out the partial precision mode, as it is optional and below their minimum requirement.

http://www.pocketmoon.com/Cg/Cg.html

N.B. The current Nvidia drivers comply with an older version of the DX9 shader spec, which contained a typographical error. As a result, the Cg DX9 PS profiles always opt to use Partial Precision where allowed, rather than defaulting to the full precision as indicated in the recently (Feb 03) corrected DX9 specs.

That is where I'm confused.
 
Where's the confusion? Cg's profile uses _PP _as a default_. It's in error and should not use _PP _as a default_.

It's allowed to use _PP when instructed, though.
 
indio said:
Is there a guideline for the use of partial precision, or can it effectively be used across the board, thereby negating or circumventing FP24 altogether?

The partial precision hint is a developer tool. It should be left solely to the developer to make the call as to whether they wish to use it - they can apply it to all their code if they feel that's all they need, but there shouldn't be a precedent for an IHV doing this. There are also parts of the pixel shader pipeline, as described in the spec DM pulled above, that still require a minimum of FP24 precision.

Doomtrooper said:
http://www.pocketmoon.com/Cg/Cg.html

N.B. The current Nvidia drivers comply with an older version of the DX9 shader spec, which contained a typographical error. As a result, the Cg DX9 PS profiles always opt to use Partial Precision where allowed, rather than defaulting to the full precision as indicated in the recently (Feb 03) corrected DX9 specs.

That is where I'm confused.

That's a Cg specification. How Cg interprets the DX functions it spits out is up to the developer of the Cg profile (NVIDIA in this case). This has nothing to do with the actual issue at hand - the question was whether precision is a necessary or unnecessary overhead for development - clearly MS feels that it was necessary, or they wouldn't have left in the _PP precision hint in the DX9 specs that allows developers that choice.

Again, wait for the forthcoming interview. It has input from a number of luminaries in the industry, and they will be able to answer these questions better than we can.
 
DaveBaumann said:
Again, wait for the forthcoming interview. It has input from a number of luminaries in the industry, and they will be able to answer these questions better than we can.

Bah...you're just trying to "minimize" nVidia again, by the very fact that you are looking into this. ;)
 
RussSchultz said:
Where's the confusion? Cg's profile uses _PP _as a default_. It's in error and should not use _PP _as a default_.

It's allowed to use _PP when instructed, though.


rather than defaulting to the full precision as indicated in the recently (Feb 03) corrected DX9 specs.


I read that as DX9 not allowing _pp... so I will wait.
 
I do think Microsoft included the _pp hint as the exception rather than the rule (it would seem rather silly not to let a developer use less precision where it is applicable, for purely arbitrary reasons). I agree that even if partial precision is useful in only one instance, it should be implemented. There is no logical reason to banish it absolutely. Not yet, at least ;)
However, the question remains: why would Nvidia make varying degrees of precision (<------- I need to find an abbreviation for this word, I'm sick of typing it!) the very core of its design? NV3x basically will not function adequately without the extra work. It's really not optional. Even though the hardware is more flexible in some ways, in others it is constrained. In general it would seem logical that the minimum amount of time it takes to develop PS-enabled apps on Nvidia hardware automatically exceeds the competition.
In the overall scheme of things, I think there is a general consensus at this point that software sells hardware. Making it more costly to develop on one vendor's hardware than another's will ultimately cost units sold.
 
Reverend said:
I think multiple precision choices is a necessary evil -- it is for the transitional period between "yesteryear"'s games and the newest/latest hardware.

I think multiple precisions will soon be forgotten, once the NV3x series is a generation or two old. I would be surprised if nVidia continued along this path in subsequent generations. I'm pretty confident the industry will settle on something like FP32 soon and pretty much stay there.

We will need an integer type too, though (not talking fixed point), for loops and such.
 