FP16 and market support

Bouncing Zabaglione Bros. said:
The same question applies to Nvidia's use of 32-bit over 24-bit, doesn't it?

Why does Nvidia have slower 32-bit when they only "needed" 24-bit?
The question you should be asking is why nVidia's 32-bit is slower. It doesn't need to be slower just because nVidia supports 32-bit FP, or because nVidia supports 16-bit FP. There are many, many other differences between the architectures besides the precision differences. FP32 can be every bit as fast as FP24. The main advantage of going FP32 is that one can also support FP16 where it helps.

And FP32 support is the future. FP32 is required if you want to unify the vertex and pixel pipelines.
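For reference, a minimal sketch (my own, not from any spec) of the three precisions in play, assuming the commonly cited layouts: FP16 as s10e5, ATI's FP24 as s16e7, and IEEE-style FP32 as s23e8. It just prints the significant bits, the relative error near 1.0, and a rough upper bound on the exponent for each.

Code:
/* Rough comparison of the shader precisions under discussion.
 * Assumed layouts (a common description, not a quote from any vendor doc):
 * FP16 = s10e5, ATI FP24 = s16e7, IEEE-style FP32 = s23e8.
 * "Significant bits" counts the implicit leading 1. */
#include <math.h>
#include <stdio.h>

static void describe(const char *name, int mant_bits, int exp_bits)
{
    int    sig_bits  = mant_bits + 1;             /* implicit leading 1   */
    double rel_error = ldexp(1.0, -sig_bits);     /* ~half an ulp at 1.0  */
    int    max_exp   = (1 << (exp_bits - 1)) - 1; /* rough max exponent   */

    printf("%s: %2d significant bits, relative error ~%g, range ~2^%d\n",
           name, sig_bits, rel_error, max_exp);
}

int main(void)
{
    describe("FP16", 10, 5);
    describe("FP24", 16, 7);
    describe("FP32", 23, 8);
    return 0;
}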
 
Chalnoth said:
jvd said:
And what's your point? MS picked FP24 over FP16, though.
MS picked FP24 and FP16. You are misguided as to how much of a benefit FP24 really is over FP16, as it seems most ATI supporters are.

Higher precision than FP16 is required for non-color data, but high dynamic range color data will not exhaust the dynamic range of FP16. FP24 is a bare minimum for texture addressing (and may not be quite enough for proper texture addressing). Why use higher precision where it's not needed if you can gain performance from using a lower precision?

And nVidia decided to support the lower of the two. What's your point? ATI has the highest FP support that is also usable.

So why are we using 32-bit color? Why are we using trilinear filtering?

Why can ATI use FP24 and have it be playable across the board, even with its mid-range cards? You should be kissing ATI's feet for pushing the envelope while also having the power to see it used in a first-gen product. Nvidia is not pushing the envelope; they are holding it back. It can be argued that because of Nvidia we will be stuck at sub-FP24, while if they had gone the route ATI did, we would have a solid standard at FP24.


Are you so in love with Nvidia that you will defend them even when they are wrong? I had my GeForce 4, but now the Radeon is the better card to own.
 
ninelven said:
If your HW doesn't meet the specs, then it's wrong. Make sense?
Who makes the specs?
Who makes the specs? For DX9, it's MS who has the final say, of course, but MS works with IHVs to come up with the spec.
And they are automagically right? Infallible?
Of course the spec is "automagically" right: it's correct by definition! Does this mean it's perfect? Of course not. However, if you're designing a part to meet a certain spec, then it's your job to make sure your part meets or exceeds the spec's requirements, right? For example, if a spec requires you to design a screw with a certain thread, a certain resistance to corrosion, a certain tensile strength, etc., whose fault would it be if the screw you designed for the job didn't meet those specs? Who's going to listen to you if you say, "My screw is good enough; who cares about those last couple pounds of tensile strength in the spec?" Chances are no one. Why should it be any different here?
 
Chalnoth said:
FP32 can be every bit as fast as FP24. The main advantage of going FP32 is that one can also support FP16 where it helps.

But it isn't as fast on NV3x, is it? Nvidia can't do gaming speeds with FP32, which makes it kind of useless for the gaming market it is squarely aimed at.

Chalnoth said:
And FP32 support is the future. FP32 is required if you want to unify the vertex and pixel pipelines.

Agreed, but it's only "the future" if it can be done at useful speeds. Of course, this begs the question: if FP32 is the future, what is Nvidia doing with it now in an unusable state instead of providing the minimum spec, which would be faster? IMNSHO, it's only there for the marketing tickbox, just like when Nvidia introduced FSAA, 32-bit colour, large textures, etc.
 
jvd said:
And nVidia decided to support the lower of the two. What's your point? ATI has the highest FP support that is also usable.

So why are we using 32-bit color? Why are we using trilinear filtering?
nVidia supports both the lower and the higher FP precisions available. Your same arguments apply to the ability to use FP32 instead of FP24 in the GeForce FX. The point is that you have a choice. Specifically, the FX architecture is designed to sometimes use FP32, and to use FP16 at other times.

The FX architecture is not designed to always use FP32, or to always use FP16. It's designed to use both at the same time. It's designed to use whichever is best for the situation at hand.
 
Chalnoth said:
jvd said:
And nVidia decided to support the lower of the two. What's your point? ATI has the highest FP support that is also usable.

So why are we using 32-bit color? Why are we using trilinear filtering?
nVidia supports both the lower and the higher FP precisions available. Your same arguments apply to the ability to use FP32 instead of FP24 in the GeForce FX. The point is that you have a choice. Specifically, the FX architecture is designed to sometimes use FP32, and to use FP16 at other times.

The FX architecture is not designed to always use FP32, or to always use FP16. It's designed to use both at the same time. It's designed to use whichever is best for the situation at hand.

Yet it doesn't do FP32 fast enough to be playable. ATI does FP24 fast enough to be playable. Nvidia barely offers playable FP16 with some FP32 thrown in.

Look at all the problems with Half-Life. Who but a fool would buy anything but an ATI card for that game? Looking at the benchmarks of the DX9 software we have available, that comment will be an ongoing theme when it comes to Nvidia and ATI.
 
NVidia's speed problems are with floating point precision in general. They have half as many pixel pipelines with floating point units as ATI does, so what do you expect? Replace the FP24 units in the R3x0 with FP32 ones and there will be no change in speed (assuming the same clock speed). Replace the FP32 units in the NV3x with FP24 and there will also be no change in speed. FP32 or any other precision is not inherently slower -- more costly in terms of transistors, perhaps (not sure), but not slower. Same thing goes for FP16 -- it's not inherently faster.
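As a back-of-the-envelope illustration of the "more transistors, not slower" point: a simple array multiplier grows roughly with the square of the significand width, so wider FP units cost area rather than clock speed. The widths below assume the usual significant-bit counts (11/17/24), and the quadratic model is only a rule of thumb, not a die-size measurement.

Code:
/* Rule-of-thumb multiplier cost: partial-product cells scale roughly with
 * the square of the significand width.  Illustration only. */
#include <stdio.h>

int main(void)
{
    const char *name[] = { "FP16", "FP24", "FP32" };
    const int   bits[] = { 11, 17, 24 };

    for (int i = 0; i < 3; i++)
        printf("%s multiplier: ~%3d partial-product cells (%.1fx FP16)\n",
               name[i], bits[i] * bits[i],
               (double)(bits[i] * bits[i]) / (11 * 11));
    return 0;
}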
 
Chalnoth said:
I do remember saying that HLSL is designed rather poorly. Cg is designed a little bit better, but is still much more limited than GLSL's design. I wasn't talking about how the compiler optimizes: I was talking about the way it was put together.

The only language feature (apart from having access to OpenGL internal variables like matrices, lights, etc.) different between Cg and DX HLSL is the fixed data type, again something I wouldn't say is in any way forward-looking in a post-DX9 world, and which would have been completely unnecessary if NV30 hadn't had FX12 units in the first place. The rest is for all purposes the same; NVidia even advertises Cg as being syntax compatible with DX HLSL. Cg still has to go through the same procedure of compiling to PS or ARB assembler (and you can get the output out) before you can use the shaders, unless there is a secret driver hook that can completely bypass the D3D runtime, which would be a nightmare to maintain and would completely break the illusion of Cg's platform independence. So to me Cg is just the poorer compiler of the two, which also supports OpenGL (a good thing in itself, though).

I agree that GLSL is better designed in this respect (though I still hate the irrelevant changes they made compared to the already established DX HLSL and Cg, mainly float4 --> vec4), though it puts more weight on the driver department to actually have a proper compiler in place (one much more complex than a pure assembler postprocessor). At the moment the Tenebrae 2 delux mapping shader used as an example in the thread I referred to earlier, which compiles to around 30 instructions with the DX HLSL compiler (and not too much worse with the Cg compiler ;) ), hits GLSL software emulation with a 9800 Pro and Catalyst 3.10... (simpler bump mapping shaders work fine).
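For what it's worth, here is a minimal sketch of that procedure through the Cg runtime (assuming the Cg 1.x C API and the arbfp1 profile; the one-line shader string is just a made-up example). The compiled ARB assembly comes back as plain text, so nothing is bypassing the normal API path.

Code:
/* Compile a trivial Cg fragment shader to ARB_fragment_program assembly
 * and print the result.  Link against the Cg runtime (-lCg). */
#include <stdio.h>
#include <Cg/cg.h>

int main(void)
{
    const char *src =
        "float4 main(float4 c : COLOR) : COLOR { return c * 0.5; }";

    CGcontext ctx     = cgCreateContext();
    CGprofile profile = cgGetProfile("arbfp1");  /* ARB_fragment_program */
    CGprogram prog    = cgCreateProgram(ctx, CG_SOURCE, src,
                                        profile, "main", NULL);
    CGerror   err     = cgGetError();

    if (err == CG_NO_ERROR)
        puts(cgGetProgramString(prog, CG_COMPILED_PROGRAM)); /* plain text */
    else
        fprintf(stderr, "Cg compile failed: %s\n", cgGetErrorString(err));

    cgDestroyContext(ctx);
    return 0;
}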
 
radar1200gs said:
There is no single "right" way of doing anything in 3D. If nVidia is supporting HDR the OpenGL way, then I'd suggest that way has a certain credibility to it. This is simply more proof of Microsoft doing everything they possibly can to screw nVidia over. What are "normal DX specifications" anyhow, compared to OpenGL specifications that carry the support of companies such as SGI, 3DLabs, etc.? This is the exact same issue as FP16, which has been used for years and years to produce professional work. If it's good enough for professional use, it's certainly good enough for gaming.
Let's look at what is probably the most important OpenGL benchmark (albeit a beta), Doom 3. Using non-proprietary extensions, ATI hardware is still faster. So is the OpenGL ARB screwing nVidia?
 
nelg said:
radar1200gs said:
There is no single "right" way of doing anything in 3D. If nVidia is supporting HDR the OpenGL way, then I'd suggest that way has a certain credibility to it. This is simply more proof of Microsoft doing everything they possibly can to screw nVidia over. What are "normal DX specifications" anyhow, compared to OpenGL specifications that carry the support of companies such as SGI, 3DLabs, etc.? This is the exact same issue as FP16, which has been used for years and years to produce professional work. If it's good enough for professional use, it's certainly good enough for gaming.
Let's look at what is probably the most important OpenGL benchmark (albeit a beta), Doom 3. Using non-proprietary extensions, ATI hardware is still faster. So is the OpenGL ARB screwing nVidia?

Yes! :devilish:
 
I was just reading through the "Issues" section of the GLslang spec and noted the following:
33) Should precision hints be supported (e.g., using 16-bit floats or 32-bit floats)?
DISCUSSION: Standardizing on a single data type for computations greatly simplifies the specification of the language. Even if an implementation is allowed to silently promote a reduced precision value, a shader may exhibit different behavior if the writer had inadvertently relied on the clamping or wrapping semantics of the reduced operator. By defining a set of reduced precision types all we would end up doing is forcing the hardware to implement them to stay compatible. When writing general programs, programmers have long given up worrying if it is more efficient to do a calculation in bytes, shorts or longs and we do not want shader writers to believe they have to concern themselves similarly. The only short term benefit of supporting reduced precision data types is that it may allow existing hardware to run a subset of shaders more effectively.

This issue is related to Issue (30) and Issue (68).
RESOLUTION: Performance/space/precision hints and types will not be provided as a standard part of the language, but reserved words for doing so will be.
CLOSED: November 26, 2002.
The bolded text is most interesting. By that one can infer that the GLslang development group does not think that lower precisions will last in the next few generations of hardware. Of course, the resolution to this issue and issue #68 indicates that support for a lower precision has not been entirely ruled out as a future possibility. Their main concern, as is outlined in issue #68, is that when a vendor does not support an alternate precision, that hardware would have to decide on an existing precision to use (or simply not support the shader program entirely). This may lead to unexpected consequences if the bounds of the precision are unintentionally exceeded.

Now, time to say something potentially stupid:

Personally, I think that additional precisions should only be added if they contribute something significant to possible shader effects. I agree that currently the only reason to support a half precision is to benefit current hardware that performs better with it. I think that a fixed point precision is useful only if it can double as a whole number integer precision. For example, FX32 would be useful for handling 32-bit z-buffers, as FP32 does not have enough mantissa precision for such a thing (so I guess FP24 doesn't have enough for a 24-bit z-buffer -- there's an example for ya, Chalnoth), while at the same time being useful for loop counters. As the situation currently stands, though, FP16 and FX12 only produce higher performance than FP32 because of a design decision, not because they are inherently faster on those video cards.
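A quick sanity check of the z-buffer point, using ordinary C floats as a stand-in for shader FP32 (the two depth values are just hypothetical examples): FP32 keeps 24 significant bits, so distinct 32-bit integer depth values collapse together near the top of the range, and the same argument scales down to FP24 (17 significant bits) versus a 24-bit z-buffer.

Code:
/* Two 32-bit depth values that differ by one become identical once pushed
 * through a 24-significant-bit float. */
#include <stdio.h>

int main(void)
{
    unsigned int a = 0xFFFFFFF0u;
    unsigned int b = 0xFFFFFFF1u;
    float fa = (float)a;
    float fb = (float)b;

    printf("as 32-bit ints: %u vs %u (%s)\n", a, b,
           a == b ? "equal" : "distinct");
    printf("through FP32  : %.1f vs %.1f (%s)\n", fa, fb,
           fa == fb ? "collapsed" : "distinct");
    return 0;
}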
 
Bouncing Zabaglione Bros. said:
Of course, this begs the question: if FP32 is the future, what is Nvidia doing with it now in an unusable state instead of providing the minimum spec, which would be faster?
This is the entire problem with your line of thinking. There are many, many differences between the NV3x and the R3xx. You cannot claim that the difference in precision support is the reason for the performance difference, because there are so many other differences there.

In particular, the primary problem with FP32 on the NV3x is the register performance hit. FP32 registers take a very minor amount of additional die space over FP24 registers. No, if the NV3x used FP24, that alone would not have changed the performance significantly.

Other differences between the architectures include:
1. Support for high precision log/exp/sin/cos functions
2. Supposedly deeper pipelines
3. Much longer program support
4. Unlimited number of texture reads among 16 separate textures

Anyway, the performance problems are much more likely to be due to the longer program support than anything else. That architectural choice may have motivated the sharing of FP registers.
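To make the register point concrete: the commonly reported NV3x behaviour is that throughput holds up with roughly two live FP32 temporaries (or four FP16 ones) and degrades as more are kept live. The toy model below is purely illustrative; the four-slot budget and the linear falloff are my assumptions, not measured numbers.

Code:
/* Toy model of register-pressure throughput, NOT measured data: assume a
 * budget of four half-precision register slots at full speed, with one
 * FP32 temporary costing two slots and one FP16 temporary costing one. */
#include <stdio.h>

static double relative_throughput(int fp32_temps, int fp16_temps)
{
    int slots  = 2 * fp32_temps + fp16_temps;
    int budget = 4;                        /* assumed full-speed budget */
    return slots <= budget ? 1.0 : (double)budget / slots;
}

int main(void)
{
    printf("2 FP32 temps          : %.2fx\n", relative_throughput(2, 0));
    printf("4 FP32 temps          : %.2fx\n", relative_throughput(4, 0));
    printf("4 FP16 temps          : %.2fx\n", relative_throughput(0, 4));
    printf("2 FP32 + 4 FP16 temps : %.2fx\n", relative_throughput(2, 4));
    return 0;
}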
 
jpaana said:
Chalnoth said:
I do remember saying that HLSL is designed rather poorly. Cg is designed a little bit better, but is still much more limited than GLSL's design. I wasn't talking about how the compiler optimizes: I was talking about the way it was put together.
The only language feature (apart from having access to OpenGL internal variables like matrices, lights, etc.) different between Cg and DX HLSL is the fixed data type,
I wasn't even considering that, either. The support for FX12 is only available in OpenGL, anyway.

The main thing that makes Cg better is the runtime compile support. This, however, wasn't implemented quite as well as it could have been. An optimal implementation would have included extensions in DirectX and OpenGL that would both allow the driver to select the default compile target and allow driver updates to update the compiler itself.
 
This is the entire problem with your line of thinking. There are many, many differences between the NV3x and the R3xx. You cannot claim that the difference in precision support is the reason for the performance difference, because there are so many other differences there.

The reasons why FP32 is so slow on the NV3x cards are utterly irrelevant to anyone outside of Nvidia's engineering labs. Finding out why is only useful for the purposes of self-enlightenment, but that's about it.

I'd go into more depth on the issues raised in this thread, but just reading some of it is making my head hurt.
 
Chalnoth said:
And FP32 support is the future. FP32 is required if you want to unify the vertex and pixel pipelines.

FP32 is the future for hardware designs. Unifying the vertex and pixel pipelines is the future for hardware designs. Yet the FP32 hardware that is here now, which no one will change for you, doesn't run FP32 at decent speeds, isn't unifying anything, and doesn't take (much) advantage of processing PS and VS at the same precision. No future hardware design from Nvidia will magically solve NV3x's problems. In a year, when Nvidia stops replacing shaders for the NV3x line and lets them run at FP32 if requested, I will be happy to live with the eventual banding, because requiring FP32 won't suddenly make NV3x faster; it will make it a lot slower.

Yes, FP32 is the future. It allows you to unify the vertex and pixel shader pipelines. But why would you go FP32 now if the current transistor budget doesn't allow you to do that?

Hell, FP64 is the future. Or maybe FP48. Who cares?
 
Chalnoth said:
You are misguided as to how much of a benefit FP24 really is over FP16, as it seems most ATI supporters are.
I fear not. FP24 is sufficient for texture addressing calculations and FP16 is not; FP16 is not even close to having enough precision OR range to be useful for anything but colour calculations.

Please stop spreading this FUD.
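To put a rough number on that: with a normalized texture coordinate in [0,1), whatever significant bits remain after selecting a texel are all that is left for sub-texel (filtering) precision. The figures below assume the usual significant-bit counts (FP16 = 11, FP24 = 17, FP32 = 24) and a 2048-texel texture as an example.

Code:
/* Sub-texel precision left over when addressing a 2^n-texel texture with a
 * normalized coordinate.  Bit counts are the commonly cited ones. */
#include <stdio.h>

static void subtexel_bits(const char *fmt, int sig_bits, int tex_size_log2)
{
    int left = sig_bits - tex_size_log2;   /* bits below one texel */
    printf("%s, %4d-texel texture: %2d sub-texel bit(s)%s\n",
           fmt, 1 << tex_size_log2, left > 0 ? left : 0,
           left <= 0 ? " (nothing left for filtering weights)" : "");
}

int main(void)
{
    subtexel_bits("FP16", 11, 11);
    subtexel_bits("FP24", 17, 11);
    subtexel_bits("FP32", 24, 11);
    return 0;
}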
 
radar1200gs said:
There is no single "right" way of doing anything in 3D. If nVidia is supporting HDR the OpenGL way, then I'd suggest that way has a certain credibility to it. This is simply more proof of Microsoft doing everything they possibly can to screw nVidia over.

It's exposed through NVIDIA-specific extensions in OpenGL, not through core OpenGL.

What are "normal DX specifications" anyhow, compared to OpenGL specifications that carry the support of companies such as SGI, 3DLabs, etc.?

As has been explained by developers around here, DX has always expected textures to be handled in a certain fashion, for many revisions; the FX series does not handle float textures in this fashion. OpenGL probably expects the same, which is why NVIDIA is supporting this via extensions.

Did I ever say anything about how the operations are carried out? R300 had an issue with certain older games; NOLF2 was proof of that, and one of your ATi guys confirmed in this very forum that they rewrote older DX support. Truth hurts, doesn't it?

Not me - seems you have issues with it. :rolleyes:

Anyway, the point of this little side track was that you were responding to the claim that because ATI doesn't have int support, DX8 wouldn't work, which is clearly a stupid statement because DX8 did work; that's all they had when R300 was released. Of course ATI rewrote DX8 (and DX7, 6, 5, 3) because they started the drivers from scratch, but one game issue does not point to anything problematic with using a float pipeline to support integer requests.
 
radar1200gs said:
OpenGLGuy:
I'll reply to your other points if and when I feel like doing so.

I never said FP16 was an IEEE standard, or FP24 either; FP32 is the only IEEE-backed standard out of them.
FWIW, the FP32 in the shaders isn't IEEE either (or at least it doesn't have to be).
 
Chalnoth said:
In particular, the primary problem with FP32 on the NV3x is the register performance hit. FP32 registers take a very minor amount of additional die space over FP24 registers. No, if the NV3x used FP24, that alone would not have changed the performance significantly.

Other differences between the architectures include:
1. Support for high precision log/exp/sin/cos functions
2. Supposedly deeper pipelines
3. Much longer program support
4. Unlimited number of texture reads among 16 separate textures

Anyway, the performance problems are much more likely to be due to the longer program support than anything else. That architectural choice may have motivated the sharing of FP registers.

So what you're saying is that regardless of meeting or surpassing the API spec, Nvidia hardware would be slow because it is just badly designed all over for today's spec?

Sounds like Nvidia just screwed up big time with NV30, and continued to make lemons with the NV3x line. We already knew that, and regardless of allowing FP16 in the API, it would still be a lemon.
 