FP16 and market support

Dio said:
Chalnoth said:
You are misguided as to how much of a benefit FP24 really is over FP16, as it seems most ATI supporters are.
I fear not. FP24 is sufficient to do texture addressing calculations and FP16 is not. It is not even close to having enough precision OR range to be useful for anything but colour calculations.
Exactly. FP32 is available for texture addressing, and I'm not sure that FP24 is quite enough.

Please stop spreading this FUD.
It's not FUD. It's combating people's erroneous perceptions that somehow the NV3x works at either FP32 or FP16 all the time, and going FP32 is always going to be too slow. Neither is true.
 
and I'm not sure that FP24 is quite enough.

Well I'm sure all of ATI's hardware designers and MS's DX developers are thrilled to know you've told the real truth on what's good enough. :rolleyes:

What REAL qualifications do you actually have to tell ATI and MS why they're wrong?

and going FP32 is always going to be too slow. Neither is true.

Too slow for what? Gaming? It certainly is. Pointless tech demos to thrill the fanboys? Probably not.

nVidia's FP32 is IEEE.

This has been covered in another thread you know.
 
Chalnoth said:
Simon F said:
FWIW, the FP32 in the shaders isn't IEEE either (or at least it doesn't have to be).
nVidia's FP32 is IEEE.
I see someone else has already commented but, FWIW, DX has some things in it which actually go against IEEE-754 behaviour.
 
Would anyone care to fathom how much better FP24 is than FP16, and FP32 than FP24? The way I understand it the returns diminish rapidly the higher you go. Would it be fair to say that something like FP24 is 30% better than FP16 while FP32 is 5% better than FP24? All of course in the context of the differences being visible, not a mathematical comparison.
 
Chalnoth said:
Exactly. FP32 is available for texture addressing, and I'm not sure that FP24 is quite enough.
Well, I am, and a lot of people much smarter than me are.

Chalnoth said:
Please stop spreading this FUD.
It's not FUD. It's combating people's erroneous perceptions that somehow the NV3x works at either FP32 or FP16 all the time, and going FP32 is always going to be too slow. Neither is true.
I have never made any comments on performance. Personally I don't see any reason any method should be faster or slower than any other if the hardware is correctly designed.

The benefits of FP32 over FP24 are currently only in marginal cases; the benefits of FP24 over FP16 are quite obvious. So to say that FP24 is only marginally better than FP16 is clearly FUD from where I sit.
 
nelg said:
Would anyone care to fathom how much better FP24 is than FP16, and FP32 than FP24? The way I understand it the returns diminish rapidly the higher you go. Would it be fair to say that something like FP24 is 30% better than FP16 while FP32 is 5% better than FP24? All of course in the context of the differences being visible, not a mathematical comparison.
Well, it's all a question of how useful it is for the job in hand:

FP16 is pretty much useless for anything but colour calculations.
FP24 is fine for texture calculations but starts to lose accuracy a touch on long chains of operations.
FP32 is as FP24, only pushes the breakdown point out somewhat further (about six times as many instructions).
FP64 pushes out the breakdown point so far that problems are very unlikely.
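
To put a rough number on the texture addressing point, here is a minimal sketch. It assumes the usual mantissa widths (FP16 = s10e5, FP24 = s16e7 as on R300-class parts, FP32 = s23e8) and a normalised coordinate in the upper half of a 2048-wide texture. The helper name `texel_step` and the texture size are illustrative choices, not anything from a real API.

```python
# Assumed stored mantissa bits (the implicit leading 1 is not counted):
# FP16 = s10e5, FP24 = s16e7, FP32 = s23e8.
MANTISSA_BITS = {"FP16": 10, "FP24": 16, "FP32": 23}


def texel_step(mantissa_bits: int, texture_size: int) -> float:
    """Smallest coordinate step, in texels, for a normalised coordinate in [0.5, 1)."""
    # Values in [0.5, 1) have exponent -1, so neighbours differ by 2^-(m+1).
    coord_step = 2.0 ** -(mantissa_bits + 1)
    return coord_step * texture_size


for name, bits in MANTISSA_BITS.items():
    print(f"{name}: {texel_step(bits, 2048)} texels per step")
```

Under these assumptions FP16 can only step in whole texels across the far half of the texture, which leaves nothing for bilinear filtering, while FP24 still has 1/64 of a texel (6 sub-texel bits) to work with.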

I'd make a very rough guesstimate and say that for instructions in DX9-class applications - that I've seen - about 50% of instructions calculate something for use in a texture dereference, and about 50% calculate some colour values, and I've only seen the one heavy math shader with chains long enough to really make accuracy drift matter (the Mandelbrot).

So I could argue that FP16 gets you 50%, FP24 gets you 99.9% and FP32 gets you the other 0.1% :).

Actually I do think FP32 is somewhat more useful than that, but I'd still say that FP24 is at least twice as useful as FP16, and FP32 really doesn't matter at all in this generation of hardware (in the same way it would have been a waste of the consumer's money for a Rage Pro to use FP16).
 
To me FP16 and FP32 make more sense.

FP32 since it's more orthogonal to the VS, and (if the implementation is IEEE compliant) I'm likely to encounter fewer rounding surprises. Especially since the VS/PS model is becoming more general in approach to programmability.

FP16, because I work with FP16 image formats all the time (and FP32 as well, as both are pretty common as CGI render targets and for compositing). Plus with future window servers moving towards full-blown compositors with GPUs managing it (already the case on OS X and Longhorn whenever the hell it gets here), and applications using those services as acceleration hooks (again already the case with Shake and FCP to some degree on OS X), I'd rather have hardware that works with existing formats natively (rather than converting down and back up again)...

As far as I'm concerned FP24 exists simply because of compromises during hardware design of a particular vendor's parts, and I hope it doesn't live on beyond those parts that already exist...
 
Dio said:
The benefits of FP32 over FP24 are currently only in marginal cases; the benefits of FP24 over FP16 are quite obvious. So to say that FP24 is only marginally better than FP16 is clearly FUD from where I sit.
Did I say that FP24 was only marginally better than FP16?

No. I said that FP16 has its uses. FP32 is there when those uses run out.
 
Chalnoth said:
Dio said:
The benefits of FP32 over FP24 are currently only in marginal cases; the benefits of FP24 over FP16 are quite obvious. So to say that FP24 is only marginally better than FP16 is clearly FUD from where I sit.
Did I say that FP24 was only marginally better than FP16?

No. I said that FP16 has its uses. FP32 is there when those uses run out.
What you said:
Chalnoth said:
You are misguided as to how much of a benefit FP24 really is over FP16, as it seems most ATI supporters are.
What you said:
Chalnoth said:
FP32 is available for texture addressing, and I'm not sure that FP24 is quite enough.
What you said:
Chalnoth said:
It's not FUD. It's combating people's erroneous perceptions that somehow the NV3x works at either FP32 or FP16 all the time, and going FP32 is always going to be too slow. Neither is true.
How many errors do we have in these quotations?

I'm sure I'll sleep easier knowing that Chalnoth has decreed that FP24 is not enough precision and that everyone at ATI and MS is just plain wrong.
 
nelg said:
Would anyone care to fathom how much better FP24 is than FP16, and FP32 than FP24? The way I understand it the returns diminish rapidly the higher you go. Would it be fair to say that something like FP24 is 30% better than FP16 while FP32 is 5% better than FP24? All of course in the context of the differences being visible, not a mathematical comparison.
I think it would be wrong to attach figures/percentages to the differences. It is difficult to provide "percentage of difference" in the first place. I mean, how many percent is 32-bit better than 16-bit color?

It's far better to just list a smattering of examples where the differences between the two formats exist, and how severe or negligible the differences are. At the risk of sounding like a broken record, here and here are threads with lively and informative discussions about the differences.
 
Dio said:
So I could argue that FP16 gets you 50%, FP24 gets you 99.9% and FP32 gets you the other 0.1% :).

But if 50% of the average workload could be handled fine by FP16, wouldn't the optimal architecture be one that has HW FP16 support for 50% of those workloads that needed it, and HW FP32/FP24 for the other 50%? In that way, transistors saved by implementing FP16 (instead of FP24) for some part of the pipeline can be applied to either implementing FP32 or more FP24 units for the rest.

The only downside is the increase in complexity to support mixed-precision execution in both HW and in the compiler.

Obviously, if the average workload changes, HW will have to change as well, just as it did when the majority of the workload was multitexturing.

ATI seems to have decided on an FP24/FP32 mixed precision architecture, but if color ops don't need FP24 in the majority of cases, it seems like FP16 would have allowed even better tradeoffs with the transistor budget.
 
DemoCoder said:
But if 50% of the average workload could be handled fine by FP16, wouldn't the optimal architecture be one that has HW FP16 support for 50% of those workloads that needed it, and HW FP32/FP24 for the other 50%? In that way, transistors saved by implementing FP16 (instead of FP24) for some part of the pipeline can be applied to either implementing FP32 or more FP24 units for the rest.
If I read what you're saying correctly, you're saying that ATI should have done FP16, then added support for FP24. If you have 8 FP16 units that can also do FP24, then you've gained nothing. If you mean that ATI should have done FP16 then added extra FP24 units, how is this a better use of resources? You're talking about using more transistors than were already used. Even if it gains you an extra execution unit (you can do two FP16 ops or one FP24 op), it still means a larger chip which means higher cost, lower yields, etc.
The only downside is the increase in complexity to support mixed-precision execution in both HW and in the compiler.
Sounds like a big downside to me.
 
OpenGL guy said:
What you said:
Yes, and I stand by every single thing you quoted of me above.

FP24 isn't as much better than FP16 as many here seem to think on the NV3x only because the NV3x also supports FP32.

And nVidia claims that FP24 is not enough for texture addressing. ATI claims that it is. Each company has a vested interest in posting their own view, and each is probably correct within certain tolerances for texture addressing.
 
Chalnoth said:
OpenGL guy said:
What you said:
Yes, and I stand by every single thing you quoted of me above.
Maybe you should reread some of your own quotes. Like how the NV3X does something other than FP16 or FP32.
FP24 isn't much better than FP16 on the NV3x only because the NV3x also supports FP32.
That's not what you said before.
And nVidia claims that FP24 is not enough for texture addressing. ATI claims that it is. Each company has a vested interest in posting their own view, and each is probably correct within certain tolerances for texture addressing.
MS also seems to agree that FP24 is adequate for texture addressing. However, aside from that, why would you take nvidia's view over ATI's? Does nvidia have FP24 hardware? ATI does and you can see for yourself that FP24 is plenty for now.
 
archie4oz said:
To me FP16 and FP32 make more sense.

FP32 since it's more orthogonal to the VS, and (if the implementation is IEEE compliant) I'm likely to encounter fewer rounding surprises. Especially since the VS/PS model is becoming more general in approach to programmability.

FP16, because I work with FP16 image formats all the time (and FP32 as well, as both are pretty common as CGI render targets and for compositing). Plus with future window servers moving towards full-blown compositors with GPUs managing it (already the case on OS X and Longhorn whenever the hell it gets here), and applications using those services as acceleration hooks (again already the case with Shake and FCP to some degree on OS X), I'd rather have hardware that works with existing formats natively (rather than converting down and back up again)...

As far as I'm concerned FP24 exists simply because of compromises during hardware design of a particular vendor's parts, and I hope it doesn't live on beyond those parts that already exist...

Yeah, FP16 file formats are all nice and good; that's what nvidia gave as the reason why FP16 calculations are good enough.

But what is the precision of the calculations done in your CGI rendering?
 
This might be a more interesting discussion when there is a chip that can run FP16 pixel shaders faster than ATI's can run FP24.
 
i have 1bit, 0.5bit per pixel file formats.. does that mean i have to do my calculations on 0.5bits per colour per pixel? i guess not


fp16 has about half the precision of fp24, and the error accumulates _VERY_ fast.. its about the same as listening to 8bit sampled audio compared to 16bit sampled audio (so even non-both-gpu users can get a "grip" on it..). fp24 adds precision behind the 0.0009765625 range.. thats about 1/1000

fp32 adds precision behind the 0.000030517578125 range (for numbers between 0.5 and 1, that is).. there it adds quite a bit to it.
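
The two figures quoted above are exact powers of two: 0.0009765625 is 2^-10 and 0.000030517578125 is 2^-15. As a quick check, the sketch below prints the gap between 1.0 and the next representable value for each format. The mantissa widths (10, 16 and 23 stored bits) are assumptions based on the common s10e5, s16e7 and s23e8 layouts, and `step_above_one` is just an illustrative helper name.

```python
def step_above_one(mantissa_bits: int) -> float:
    """Gap between 1.0 and the next representable value for a given mantissa width."""
    return 2.0 ** -mantissa_bits


# Assumed stored mantissa bits: FP16 = 10, FP24 = 16, FP32 = 23.
for name, bits in (("FP16", 10), ("FP24", 16), ("FP32", 23)):
    print(f"{name}: step above 1.0 = {step_above_one(bits)!r}")

# The figures from the post, checked against exact powers of two.
print(2.0 ** -10 == 0.0009765625)
print(2.0 ** -15 == 0.000030517578125)
```

With those widths the FP24 step is 64 times finer than the FP16 step, and for values between 0.5 and 1 every step is half the size printed here.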

24bit audio sounds nicer than 16bit audio, yes. definitely. but in general, you have enough quality with 16bit audio (fp24).



the main trick is, today's pixel shaders have a very restricted instruction count. if there were no pixel shader at all, even fp64 wouldn't help, as the final output is still only 32bit (a.k.a. 8bit per component) colour.
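
A small sketch of the 8bit output point, assuming IEEE half precision (s10e5) for fp16: it pushes every 8bit colour code through one fp16 rounding and checks that the same code comes back. The helper names are made up for the example, and it only covers a single rounding of a stored value, not a chain of arithmetic.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def survives_fp16(code: int) -> bool:
    """True if an 8-bit colour code is recovered after one trip through FP16."""
    return round(to_fp16(code / 255.0) * 255.0) == code


# Check all 256 possible values of one 8-bit colour component.
lossless = all(survives_fp16(code) for code in range(256))
print(lossless)
```

The worst-case fp16 rounding error in [0, 1] is about 0.12 of an 8bit step, so nothing is lost until errors start stacking up over several instructions.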

today's pixel shaders can't accumulate errors for that long, errors in general can't scale up too big. fp24 is a good balance.. it is precise enough to allow for most shaders of today, with the 96 pixel shader instructions at max, to work at good quality. fp32 is better, but in that restricted area you won't gain much anymore.

for ps3.0, with much more instructions, fp32 gets needed. errors would otherwise accumulate too much..
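
To see how rounding error piles up over a chain of instructions, here is a minimal simulation. It assumes 10, 16 and 23 stored mantissa bits for fp16, fp24 and fp32, ignores exponent range, and uses a deliberately simple chain (adding 0.1 sixty-four times, rounding after every add). Real shaders mix operations, so treat the numbers as an illustration only; `round_to_bits`, `accumulate` and `drift` are hypothetical helpers.

```python
import math

# Assumed stored mantissa bits per format.
MANTISSA_BITS = {"FP16": 10, "FP24": 16, "FP32": 23}


def round_to_bits(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with the given mantissa width (exponent range ignored)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (mantissa_bits + 1)   # implicit leading bit adds one significant bit
    return math.ldexp(round(m * scale) / scale, e)


def accumulate(step: float, count: int, mantissa_bits: int) -> float:
    """Add `step` to a running total `count` times, rounding after every add."""
    step = round_to_bits(step, mantissa_bits)
    total = 0.0
    for _ in range(count):
        total = round_to_bits(total + step, mantissa_bits)
    return total


def drift(count: int, mantissa_bits: int) -> float:
    """Absolute error of the rounded chain against double precision."""
    return abs(accumulate(0.1, count, mantissa_bits) - 0.1 * count)


for name, bits in MANTISSA_BITS.items():
    print(f"{name}: drift after 64 adds = {drift(64, bits):.3g}")
```

Making the chain longer makes the drift larger in every format, which is the argument for wanting fp32 once instruction counts go up.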

oh, and, yes, nvidia was stupid by creating fp16 because of cgi.. they don't render anything at fp16. they just store it at that.. they even do a lot in fp64!

calculations should be done at high enough precision all the time. storage doesn't need it. as i said.. 0.5bits per pixel are sometimes enough:D


oh, and, Chalnoth, you're.. rather stupid:D
 
DemoCoder said:
But if 50% of the average workload could be handled fine by FP16, wouldn't the optimal architecture be one that has HW FP16 support for 50% of those workloads that needed it, and HW FP32/FP24 for the other 50%? In that way, transistors saved by implementing FP16 (instead of FP24) for some part of the pipeline can be applied to either implementing FP32 or more FP24 units for the rest.

My personal conjecture (and I stress that it is only educated speculation) is that there were no transistors saved in the NV FP16 implementation. My guess is that it's an FP32 pipeline with additional units to expand 16->32 at the front end and to compact 32->16 at the back end. Putting in separate 32 and 16bit pipelines would be silly. Each 32bit register could thus store twice as many floating values if used in a 16bit mode.

As I said before, the NV chip seems to suffer performance drops with increasing register usage which would seem to indicate a lack of register space and/or some strange limitations on access. Having the 16bit storage would thus give a big boost in performance by halving the number of physical registers required.
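
The storage side of that speculation can be sketched in a few lines. This assumes IEEE half precision for the 16bit values and a little-endian layout with the first value in the low half of the word. It says nothing about how the NV hardware actually lays out its registers, and the function names are invented for the example.

```python
import struct


def pack_two_halves(a: float, b: float) -> int:
    """Pack two values as FP16 into one 32-bit word (a in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<ee", a, b))[0]


def unpack_two_halves(word: int) -> tuple:
    """Split a 32-bit word back into its two FP16 values."""
    return struct.unpack("<ee", struct.pack("<I", word))


word = pack_two_halves(1.0, 0.5)
print(hex(word))
print(unpack_two_halves(word))
```

With this kind of packing a register file of N 32bit slots holds 2N values in 16bit mode, which is the halving of physical register usage described above.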
 