Variable colour precision in future benchmarks?

g__day

Regular
Preface:

Pardon me if I oversimplify (or get it wrong) from lack of detailed knowledge. My questions are based on what I understand NVidia is saying with regard to the 3DMark03 tests, and on my own personal journey towards learning more about 3D development and hardware.


Questions:

Once game developers start developing games where graphics is primarily done by shaders on today's top NVidia hardware platforms, is NVidia suggesting the following rules apply:


1. Be modest in your demands for high colour precision in any scene, regardless of whether you are using DX9 or OpenGL in any pathway.

Your code has to be mapped to physical hardware that only has so much throughput in its capability to handle FX12, FP16 or FP32 colour precision. So when you code your games, budget for how much of a frame will need high precision colour rendering (pixel shading).

Perhaps today on NV35, set a target for a demanding game with a lot of shaders and complex lighting to use 5 - 10% FP32 in any given scene, 60% FP16 and 30 - 35% FX12 colour precision on our best 3D hardware (yes, I know all games target two-generation-old hardware anyway).
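To make that concrete, here is a minimal Cg sketch of such a budget (Cg's fixed/half/float types map to FX12/FP16/FP32 on NV3x under the fp30 profile; the split between them here is purely illustrative):

    // Illustrative only: the choice of type per operation is the "budget".
    fixed4 main(float2 uv     : TEXCOORD0,
                float3 normal : TEXCOORD1,
                float3 light  : TEXCOORD2,
                uniform sampler2D baseMap) : COLOR
    {
        // Bulk of the frame: FX12 is plenty for a plain texture modulate.
        fixed3 base = (fixed3)tex2D(baseMap, uv).rgb;

        // Most lighting maths: FP16 range and precision suffice.
        half diffuse = saturate(dot((half3)normal, (half3)light));

        // Reserve FP32 for the few precision-critical terms.
        float spec = pow(saturate(dot(normalize(normal), normalize(light))), 32.0);

        return fixed4(base * diffuse + spec.xxx, 1.0);
    }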


2. When you write pixel shaders you need to specify (hard code) the exact colour precision required for certain effects on any object, knowing exactly what the colour precision is in advance.

I understand 3D GPU assembler is not a high level abstract language. You have to specify the data accuracy in the specific instructions used and the data types they address. Worst case this would mean writing three different precision shaders to do the same effect on three different colour precision objects (or is the worst case having to write nine shaders, for an FX12, FP16 or FP32 based model interacting with an FX12, FP16 or FP32 image-affecting pixel shader)?
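For what it's worth, that worst case would look something like this in Cg (hypothetical names; as Uttar explains further down the thread, per-instruction precision means whole-shader variants like these are rarely needed):

    // Hypothetical: the same modulate effect written once per precision.
    fixed4 modulate_fx12(fixed4 a, fixed4 b) { return a * b; }
    half4  modulate_fp16(half4  a, half4  b) { return a * b; }
    float4 modulate_fp32(float4 a, float4 b) { return a * b; }
    // Nine variants would follow if each input could itself be FX12/FP16/FP32.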


3. Whenever an object needs to be modified, you would need to know what colour precision it is currently at, what colour precision object or event is affecting it, and therefore what colour precision the target result should be calculated at?

So if object A is skinned to colour accuracy FX12 (say it's a person's arm) and it interacts with object B (say a can of silver paint), causing shader C (an FP16 or FP32 accuracy function) to add a reflective mirror surface to that person's arm, then you'd somehow have to up the colour accuracy of shading that object to FP32 level within that environment.
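In Cg terms that "upgrade" is just a cast at the point where the high-precision effect kicks in; a sketch (skinMap, envMap and paintMask are made-up inputs):

    float4 main(float2 uv     : TEXCOORD0,
                float3 normal : TEXCOORD1,
                float3 eyeDir : TEXCOORD2,
                uniform sampler2D   skinMap,
                uniform samplerCUBE envMap,
                uniform float       paintMask) : COLOR
    {
        fixed3 skin = (fixed3)tex2D(skinMap, uv).rgb;   // object A stays FX12
        float3 refl = reflect(-normalize(eyeDir), normalize(normal)); // FP32
        float3 paint = texCUBE(envMap, refl).rgb;       // shader C at FP32
        // The cast promotes the FX12 skin colour; only the painted
        // region pays the FP32 cost.
        return float4(lerp((float3)skin, paint, paintMask), 1.0);
    }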


* * *


This all sounds very powerful but extremely complex to manage in future games or benchmarks. Is managing that level of complexity for every object in a game and every shading function really what would have to be done in the future?
 
Personally, I don't see the point of FX12 when FP16 is available. They provide the exact same mantissa, though the latter also allows for a [relatively] high dynamic range. FX12 is faster on the NV30, sure, but as I've asked before: is it really faster than FP16 would be on an otherwise equal card that didn't support FX12? Were compromises to floating point speed made in order to support FX12? The answers to these questions could very well validate or invalidate the need for FX12.
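For reference, the two formats compare roughly like this (NVidia never published FX12's exact bit layout, so the s1.10 fixed-point split below is the commonly assumed one):

    \text{FX12 (assumed s1.10):}\quad v = \frac{n}{2^{10}},\quad n \in [-2^{11},\, 2^{11}-1] \;\Rightarrow\; v \in [-2,\, 2),\ \text{uniform step } 2^{-10}
    \text{FP16 (s10e5):}\quad v = (-1)^s \times 1.m \times 2^{e-15} \;\Rightarrow\; \text{the same 10 mantissa bits, but } |v|_{\max} \approx 65504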

There are uses for integer types, such as for counters, but even then you do not need a 4 channel data type with 12 bits per channel. All you'd need is an 8 bit scalar datatype, maximum. Also, if all it's going to be used for is counters, then all you would need is a specialized datatype meant exclusively for counters, rather than a general purpose one.
 
cellarboy said:
Chalnoth said:
As for why FX12 exists, it takes far fewer transistors than FP16.

But if you already have FP16 on the die anyway.......?

There's no such thing as infinite FP16 functionality :)
Every additional bit of speed using FP16 is gonna cost.

The NV30 has 8 FX12 units + 4 FP32 units.

The reason FP16 is faster than FP32 is only register usage.
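A small Cg illustration of that register point (my reading of it, anyway): two half4 temporaries pack into the register space of one float4, so keeping temporaries at FP16 eases the register pressure.

    half4 main(float2 uv : TEXCOORD0,
               uniform sampler2D map0,
               uniform sampler2D map1) : COLOR
    {
        // Each half4 temporary takes half the register footprint of a
        // float4 one; fewer live full-width registers, fewer stalls.
        half4 t0 = (half4)tex2D(map0, uv);
        half4 t1 = (half4)tex2D(map1, uv);
        return t0 * t1;
    }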


Uttar
 
Wonders why the only comments so far have no apparent connection to the 3 questions asked in the thread?
 
Okay, okay... :)

Well, part of what you wrote is wrong, part of it is oversimplified and much of it is correct IMO.

First, the NV3x allows *per-instruction* precision levels. That means you don't have to declare the whole shader FP16 or the like. It's just like in traditional CPU programming: you can do part of the task in one precision format such as FP32 and another part in another format such as FP16.
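A quick Cg illustration of the per-instruction point (under the fp30 profile each statement below compiles, roughly, to a precision-suffixed instruction - MULX, MULH, MULR respectively; exact instruction selection is up to the compiler):

    // Sketch: one shader, three precisions, chosen per operation.
    float3 mixedPrecision(fixed3 base, fixed3 tint, half3 shade, float exposure)
    {
        fixed3 a = base * tint;   // FX12 multiply (MULX)
        half3  b = a * shade;     // FP16 multiply (MULH)
        return b * exposure;      // FP32 multiply (MULR)
    }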

In the NV30, NV31 and NV34, this is 100% free, and it is highly recommended to always operate like that.

However, according to some early reports, the NV35 works faster when doing ONLY FP32 or ONLY FP16, not both of them.
Sadly, due to the rarity of the card right now, the details are unknown. It is still expected that a mixed FP16/FP32 program would be faster than a full FP32 one, just not as much faster as on the NV30.


Uttar
 
Many thanks Uttar, we only learn by trying and asking questions to gain knowledge.

I didn't intuit that a shader containing maybe 20 instructions to shade one object might use varying precision maths or logic across those instructions.

This must be one of the most intimidating and challenging places on the web to post technical questions - not because of any attitudes, but due to the sheer volume of knowledge across the breadth and depth of 3D software and hardware. It's almost like being a kid again and sitting at the big table for the first time, wanting to contribute but only understanding 2% of the conversation.

B3D is very rapidly becoming my favourite website for the pure quality of its coverage of what is, to me, the most interesting component of the PC. It's just that for every 200 posts I read - even after 8 months - I feel I can contribute to about 3% of them in any way whatsoever. And this from someone with 20+ years of passion for PCs, on the 16th PC he's built himself, with over 10,000 posts on technical forums around the world. It's great to find a new site opening up a whole new branch to study - I feel like a kid in a toy store :)

But life's all about learning and doing!
 
Uttar said:
cellarboy said:
Chalnoth said:
As for why FX12 exists, it takes far fewer transistors than FP16.

But if you already have FP16 on the die anyway.......?

There's no such thing as infinite FP16 functionality :)
Every additional bit of speed using FP16 is gonna cost.

The NV30 has 8 FX12 units + 4 FP32 units.

The reason FP16 is faster than FP32 is only register usage.
Indeed, but like cellarboy said (paraphrased): why bother with FX12 when you support FP16? If the GPU had as many floating point units as integer units, would FX12 still be fast enough in comparison to justify its existence? It'd probably still be faster due to the relative simplicity of integer math, but I still don't feel that it would be necessary to support.
 
It may be upsetting for NVidia, but is it fair to say NV3x cards will definitely be fast in current and old games, and fast in new games where there are proprietary API extensions that allow NVidia to drop image quality generally below R3x0 levels?

Reduction in image quality for NVidia, I guess, is very possible in OpenGL - as Doom 3 will prove with the NV30 path - but is more challenging in DirectX 9 at the moment, unless you use Cg code to bypass DX9.
 
Well, I think it's really important to differentiate the NV30/NV31/NV34 and the NV35/NV36 architectures.

From my current understanding, the NV35/NV36 (which I personally prefer to call NV3+ for simplicity) is pure FP32, but still with very big register usage performance hits, thus being much faster in FP16.
The NV30/NV31/NV34 (NV3-), however, are three times faster in FX12 than in FP16.

Of course, the NV3+ details might be incorrect, since as I said before, the boards are still rare.

Also, Cg is not capable of making DX9 PS 2.0+ run integer functionality, so it is often not sufficient to help nVidia.
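To illustrate (a sketch of how I understand Cg's profile handling, not gospel): the same source means different things per profile.

    // Compiled for the fp30 profile (NV_fragment_program), 'fixed'
    // stays FX12. Compiled for the DX9 ps_2_0 profile there is no
    // integer register type, so 'fixed' falls back to floating point
    // and the FX12 speed-up is simply unavailable.
    fixed4 modulate(fixed4 a, fixed4 b)
    {
        return a * b;
    }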

But yes, in general, it is fair to say the NV3x will be fast in old games and in new games using their own proprietary extensions. The NV3+ will run many future games alright too, but certainly not as well as old/proprietary-using ones. The NV3-, though, is truly a mess when using standards, IMO.


Uttar
 
These "standards" are almost arbitrary anyway. ARB_fragment_program working group was lead by ATI, and is very R300 centric. Is it any coincidence that MS magically chose PS 2.0 specs as they are right now? I think not. If you disagree, you are not well enough informed.
 
Uttar:

Do you happen to know how long we have known about FX12 on the NV3x architecture? I haven't looked into it yet, but I'm wondering what the probability is that the FX12 units are a hack to make the card competitive with the R300 architecture. I'm not a hardware engineer, but the design seems so convoluted that it's hard for me to imagine that the NV3x chips in their current form were the design goal when they started the project. Would it be possible to run low precision applications on only the FP32 units? I'd like to see how it compares to the GF4. Without the R300 getting in the way, and with low-k dielectric working, I'm curious if it would have been enough without FX12.

Nite_Hawk
 
Nite_Hawk said:
Uttar:

Do you happen to know how long we have known about FX12 on the NV3x architecture?

Actually, I believe we only *realized* there was integer functionality in the NV30 at launch, and we learned it was FX12, and not FX9 like on the GF4, some time *after* the reviews hit the web, IIRC.

But I think the CineFX developer documents at developer.nvidia.com have mentioned FX12 since the very first detailed documents (although some overview papers didn't mention any integer functionality whatsoever).

Now, we have to realize that the NV30 had some oddities. There was a reported tape-out in March, then another one in August, with the company denying any tape-out in March. Maybe just a false rumor, though.
And there is nVidia recently stating that the NV30 was under-resourced, resulting in "features being cut".

What's fairly obvious is that without any FX12 units, the NV30 at 400MHz would pretty much be, from a shader point of view, on par with a Radeon 9000 - never better, sometimes worse, I'd even say.

It's a given that the NV30 was supposed to have more than those FX12 units. But it's harder to say whether they replaced, say, every three FP32 units with two FX12 units and one FP32 unit (which would make the original design identical to the NV35).

Who knows...


Uttar
 
What I'd like to see is color precision levels made forceable directly through the drivers' user interface, in a fashion very similar to the way the degree of FSAA is currently implemented. After all, if I can choose levels of integer precision (8-bit, 16-bit, 24+8), why not be able to choose levels of fp precision directly as well in fp-supporting games (or benchmarks)?

It's always going to be problematic to judge comparable performance between products as long as these things are shrouded in mystery and fog. I'd rather see that than have everything buried in application code. If the application can command it from the drivers, then the user interface ought to be able to do so as well. How interesting would FSAA be if levels of it could not be forced directly through the driver? Not that I'm suggesting, for instance, that there's any worth in trying to force a 24+8 integer precision to an fp precision...;)
 