Is partial precision dead?

ZioniX said:
There is a ~4.5% loss in SM 2.0/SM 3.0 scores.
A few single-digit percentage gains here and there can easily determine whether you beat the competition or not.

A 10% difference in 3DM06 is considered a sizeable victory/loss.
 
geo said:
Huh? Why would they need two paths? ATI hardware just ignores _pp, doesn't it?
Some games supported both precisions; Far Cry, IIRC, did.
geo said:
What am I explaining again? That shaders are a basic part of modern games and HDR is still at the optional stage? How many games have HDR? What percentage of gamers have it turned on when they play those games (I do tho).
Complex shaders in the latest games can be turned off.
 
Dave Baumann said:
I get the impression that Xenos has given us a hint as to MS's thoughts about the longevity of it all.

I have no idea yet what D3D10 requires and/or allows, to be frank, but it has been my understanding so far that Microsoft doesn't take those decisions alone (rather M$ + IHVs) and that NVIDIA wasn't the only party that was in favour of split precision formats in the past.

Is SGX FP32 across the board, by the way? (Yes, that's an honest question, because the PR mentioned "up to"...)
 
Deathlike2 said:
It's probably because the NV3x series left a sour taste after its introduction... if it weren't so bad, it probably would have had a better reception...

The message shouldn't have been "full precision is going to kill performance if it's used extensively" instead of "use full FP where appropriate". If the performance differential had been acceptable, it wouldn't have mattered as much. However, it wasn't, and the NV3x will probably be remembered like that for a while.

Prejudice can be a bad advisor.
 
I'm not saying that PP is bad, by any stretch of the imagination... it's just that the NV3x did look bad in comparison to the R300.
 
Vysez said:
Some games supported both precisions; Far Cry, IIRC, did.

I'm still not following. Yes, more than just FC did. But my understanding is that it wouldn't need to be "a separate path", at least not for that reason. ATI hardware reads _pp hints and says "thanks, but no thanks."

Complex shaders in the latest games can be turned off.

Fair point. Are these "optional" shaders the only ones that _pp is used on in recent/new development? If so, then it would be a fair comparison.
 
Humus said:
Was it ever alive?

For a very high percentage of graphics units sold in the past few years, apparently yes.

----------------------------------------------------------------------------

Back to the other one since I found the PR snippet:

PowerVR SGX enables compelling image quality via Internal True Colour™ - enabling colour operations to be performed on-chip using high precision mathematics at arbitrary pixel colour precision up to 128bit/IEEE32 floating point...

http://www.imgtec.com/news/Release/index.asp?ID=259

Awaiting possible clarification from any PVR employee on whether I'm reading the above correctly.
 
Deathlike2 said:
It's probably because the NV3x series left a sour taste after its introduction... if it weren't so bad, it probably would have had a better reception...

The message shouldn't have been "full precision is going to kill performance if it's used extensively" instead of "use full FP where appropriate". If the performance differential had been acceptable, it wouldn't have mattered as much. However, it wasn't, and the NV3x will probably be remembered like that for a while.


The thing is, most of the IQ problems we had with the NV3x were not a result of partial precision, but rather of the inclusion of things like normalized cubemaps in Far Cry for the floor tiles, or the outright replacement of FP precision with integer.
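For anyone who hasn't seen the technique being referred to, here's a minimal HLSL sketch of a normalization cube map lookup, purely illustrative (the sampler name is invented). One texture fetch stands in for the ALU normalize, at whatever precision the cube map format provides, which is where the IQ complaints came from.

```hlsl
// A normalization cube map stores, at each direction, that direction's
// unit vector packed into [0,1]; sampling it replaces ALU normalization
// with a single texture fetch.
samplerCUBE normCube;   // hypothetical pre-baked normalization cube map

float3 normalizeViaCubeMap(float3 v)
{
    // Fetch the pre-normalized vector and unpack from [0,1] to [-1,1].
    return texCUBE(normCube, v).xyz * 2.0f - 1.0f;
}
```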
 
ChrisRay said:
The thing is, most of the IQ problems we had with the NV3x were not a result of partial precision, but rather of the inclusion of things like normalized cubemaps in Far Cry for the floor tiles, or the outright replacement of FP precision with integer.


Very true, and this was done in many games. At the moment, for current games, most shaders (all surface shaders) don't need the full FP24 or FP32; it's just overkill. But later on, yes, this will be necessary.
 
DeanoC said:
Partial precision (half is a lot easier to write, you know...) is a lot faster on G70 fragment shaders... Using halfs doubles the size of the register pool; the effect of that is allowing up to twice as many fragments to be in flight at once.

It's still the first piece of advice NVIDIA gives about writing fragment shaders.

In the real world, though, is this something you would use, or is it now predominantly full precision all the way through? I ask because I recall what you said here a long time ago.
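To make the trade-off concrete, here is a minimal, hypothetical HLSL fragment shader (all names and inputs invented for illustration): half where reduced precision is known to be safe, float where range matters.

```hlsl
sampler2D diffuseMap;
float4 lightColour;

float4 main(float2 uv       : TEXCOORD0,
            float3 normal   : TEXCOORD1,
            float3 lightDir : TEXCOORD2) : COLOR
{
    // half is plenty for unit vectors and colour-range maths; on NV
    // hardware these map to _pp/half registers, while R3x0/R4x0 ignore
    // the hint and run everything at FP24 anyway.
    half3 n = normalize((half3)normal);
    half3 l = normalize((half3)lightDir);
    half  nDotL = saturate(dot(n, l));

    // Texture coordinates stay full precision: range matters there.
    half4 albedo = tex2D(diffuseMap, uv);
    return albedo * (half4)lightColour * nDotL;
}
```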

DeanoC said:
It means that when all the shaders are working at a good FPS on ATI or NV40 cards but suck on GFFX, I have to go through the shader code looking for places to partial-precision it.

This is harder than it sounds because, unlike teapot renderers, many games (e.g. Valve, Crytek, Ninja Theory) are using auto-generated shaders. This makes adding partial precision much harder, as truncating the precision too early in the shader code can look fine on some shaders and rubbish on the more complex ones. Our current system has the ability to override (by material name) shaders from the auto-generated ones, but that takes work to a) find which shaders need optimising and b) optimise them.

As the majority of the cards we are targeting (ATI R3x0, R4x0 and NV40) generally don't need this work, having to do it just for NV3x is a pain.

I think the problem that's often missed in this discussion is that in games we don't really work on 'a' shader but lots. I don't actually know how many pixel shaders we have, but I know that the total number of shaders (vertex and pixel) is over 6,000.
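A hypothetical illustration of the "truncating too early" failure mode described above (names and constants invented): half carries a 10-bit mantissa, so it is fine for colour-range values but not for values spanning a large range, such as world-space texture coordinates.

```hlsl
sampler2D detailMap;

float4 main(float2 worldPos : TEXCOORD0) : COLOR
{
    // BAD: world positions can be in the hundreds; half's 10-bit mantissa
    // makes the derived uv snap between texels, so the surface shimmers:
    //   half2 uv = (half2)worldPos * 0.05;

    // OK: keep the large-range maths in float, and drop to half only for
    // the final colour arithmetic, which lives comfortably in [0,1].
    float2 uv = worldPos * 0.05f;
    half4 detail = tex2D(detailMap, uv);
    return detail;
}
```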
 
6,000! Yowza. Okay, let's say only half of those are pixel shaders. Further, say you can knock 'em off at 2 mins each to figure out whether _pp is a problem. That's still 100 person-hours (I don't know if that's reasonable or not, but I'm trying not to be "worst case" here). Wouldn't it be nice if that went away?

All I'm saying is, I'm assuming NV50 is a pretty new architecture, and I hope they removed the register pressure that makes this a "win" on their current and previous parts.
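Spelling out geo's back-of-the-envelope figure, purely as a sanity check:

$$\frac{6000}{2}\ \text{pixel shaders} \times 2\ \text{min} = 6000\ \text{min} = 100\ \text{person-hours}$$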
 
Are there two? Is this one different than the one that existed before it was G80 and then became NV50 again? :LOL:
 
nelg said:
In the real world, though, is this something you would use, or is it now predominantly full precision all the way through? I ask because I recall what you said here a long time ago.
If you're targeting good performance on NVIDIA-based machines, it makes a reasonable difference. We noticed a significant speed-up when we started using halfs where we could.

As our shaders have got more complex, the difference on NV4x has become more noticeable (I wrote that comment before we moved some of the lighting computation down to the fragment level...).
 
geo said:
6,000! Yowza. Okay, let's say only half of those are pixel shaders. Further, say you can knock 'em off at 2 mins each to figure out whether _pp is a problem. That's still 100 person-hours (I don't know if that's reasonable or not, but I'm trying not to be "worst case" here). Wouldn't it be nice if that went away?
It's not that bad... you simply look at your scene, and if you can see a noticeable difference you go back and work out why. Same with compiler bugs, though compiler bugs are much harder to fix...

We went through our code, adding half everywhere we thought it was safe. Looked at some complex scenes and our material test scenes, doing an 'eyeball' comparison to check that it was okay. Went back and fixed up a few; rinse and repeat.

Then we did a second pass, putting half in places which were more doubtful; rinse and repeat until it looks the same. Then do a profile and see how much speed you've gained.
 
There is one case where FP16 precision may continue to find value: normalization.

One aspect of shaders' increasing ALU:TEX ratio is the replacement of cube map normalization (or no normalization at all) with ALU normalization. It turns out that no more than 17 bits are needed to represent all the normals you will need in the vast majority of cases. Because normals don't need much precision, and because normals have 48-fold symmetry, the number of unique normals is only about 2,000 (blown up to ~100k via symmetry). This is a small enough number to enable hardware lookup-table implementations.

Now, one might argue that a NRM is not needed every cycle, so a "free" norm isn't that big of a deal, but I'd argue it's common enough, and it doesn't eat a lot of transistors. DP4/RSQ/MUL eats 3+ cycles on other GPUs and ties up "live" GPRs, plus eats an ALU. A free NRM_PP could operate "in place" and eat fewer GPRs and less ALU.
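To ground that in shader terms, an illustrative HLSL sketch (how the instructions actually map depends on the compiler and target):

```hlsl
// The ALU cost being described: dot product, reciprocal square root,
// multiply, with the input vector held live in registers throughout.
half3 normalizeAlu(half3 v)
{
    half lenSq  = dot(v, v);      // DP3/DP4
    half invLen = rsqrt(lenSq);   // RSQ
    return v * invLen;            // MUL
}

// On hardware with a free half-precision normalize (NV4x/G70 are the
// usual examples), the compiler can collapse this to a single
// effectively-free instruction, saving the ALU slot and the live GPRs.
half3 normalizeFast(half3 v)
{
    return normalize(v);
}
```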
 
DeanoC said:
It's not that bad... you simply look at your scene, and if you can see a noticeable difference you go back and work out why. Same with compiler bugs, though compiler bugs are much harder to fix...

We went through our code, adding half everywhere we thought it was safe. Looked at some complex scenes and our material test scenes, doing an 'eyeball' comparison to check that it was okay. Went back and fixed up a few; rinse and repeat.

Then we did a second pass, putting half in places which were more doubtful; rinse and repeat until it looks the same. Then do a profile and see how much speed you've gained.

Thanks, that's useful. Care to put a rough man-hours estimate on that process? And an estimate of the percentage that ended up _pp?
 
geo said:
Are there two? Is this one different than the one that existed before it was G80 and then became NV50 again? :LOL:

We all know that NV5x or G8x means NVIDIA's next-gen chip, regardless of what it will actually end up being codenamed.
 