FP16 and market support

Nick said:
Point taken, but I don't think it's inherently bad to go beyond specifications. With OpenGL, Nvidia has had the luxury of being able to produce a lot of extensions that have proven to be very effective. But while waiting for OpenGL 2.0, it's stuck with the DirectX specifications, while the hardware can actually do a lot more (which could have been exposed in OpenGL).

"Embrace and extend" eh? Problem is that this is devisive at at time when developers want a consitent interface. Historically, extras are ignored until they make it into an API.

GFFX is a good example, because Nvidia tried (and failed spectacularly) to force this "going beyond the API" approach. It failed because (a) performance sucks when you do, and (b) they neglected to supply things that are useful, basic parts of the API.

Nick said:
Remember DirectX 8.1? Wasn't it designed specifically so that ATI could expose PS 1.4, which actually had a totally different specification from the rest? I'd say a DirectX 9.1 to answer Nvidia's demands would be fair right now...

I was being fatuous, but PS 1.4 was made part of the API, and it was little supported either then or now because Nvidia refused to support it when they were the 900 lb gorilla. It's this sort of thing that shows why API decisions should not be left up to one IHV.
 
FP16 was definitely a bad decision, as it took away from resources that could have been used to give the NV30 better FP32 instruction execution speed.

Nvidia thought (wrongly, I might add) that they would dictate to MS what DX9 should and should not contain. MS just blew them off like there was no tomorrow. MS slapped them silly again when they announced that Nvidia would have NO part in Xbox 2.

Nvidia thought that because almost everyone used their chips, they had the big balls to call the shots, and the whole industry put them in their place (Nvidia just has not realized it yet).

Chalnoth said:
YeuEmMaiMai said:
the reason Nvidia has to use FP16 is because their hardware cannot deliver FP32 at sufficient speed.
The reason that nVidia chose to expose FP16 was that they wanted to allow higher speeds. They decided that the vast majority of calculations would not need higher than FP16 precision (which is true if the vast majority of shaders are relatively short and/or aren't prone to recursion artifacts, and most calculations are on color data).

The support of FP16 was a design decision, and wasn't an inherently bad one. It did buy nVidia higher-precision FP support, after all. And FP16 is enough precision and dynamic range for any color calculations (where recursion artifacts aren't a major problem).
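To make that concrete, here's a minimal, hypothetical DX9 HLSL sketch (made-up names; 'half' is what compiles down to the _pp partial-precision hint, which NV3x can run at FP16 while R3x0 simply ignores it and runs FP24):

```hlsl
// Hypothetical ps_2_0-style pixel shader: only the color math uses 'half'.
sampler2D diffuseMap;   // made-up sampler name
half4 lightColor;       // color constant: FP16 range/precision is plenty here

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    // The result is quantized to 8 bits per channel at the framebuffer
    // anyway, so FP16 is not the limiting factor for this kind of math.
    half4 albedo = tex2D(diffuseMap, uv);
    return albedo * lightColor;
}
```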

IF Nvidia is smart, for the NV40 they will remove all FP16 crap and just go straight with FP32 and proper trilinear and FSAA....

It is pretty funny that ATI blew them out of the water and they still have not realized their mistake, following along with the same design for NV31, NV34 and NV35, not realizing that the design is inherently flawed when it comes to DX9 performance. Mind telling me where you can find out about the NV30 on Nvidia's website?
 
BRiT said:
Until they get AF and FSAA working, they can't even be considered 3D graphic cards.

LOL. I guess the Voodoo 1-3 cards, TNT 1,2, G400 etc. were all 2D graphics cards then. :p
 
John Reynolds said:
Hopefully MS has learned their lesson from that mistake and won't splinter shaders within their API like that again.

*cough* ps2.0a *cough*

The solution to that problem is to go 100% HLSL with no intermediate assembly step.
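For example (assuming the DX9 SDK's fxc compiler, where /T picks the target profile and /E the entry point, and ps_2_a is the NV3x-oriented profile), the same HLSL source would simply be compiled per target, with no hand-maintained assembly paths:

```
fxc /T ps_2_0 /E main shader.hlsl
fxc /T ps_2_a /E main shader.hlsl
```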
 
YeuEmMaiMai said:
IF Nvidia is smart, for the NV40 they will remove all FP16 crap and just go straight with FP32 and proper trilinear and FSAA....
I don't think so.

First of all, FP32 support will be necessary if the pixel/vertex pipelines are to be unified. It seems apparent that this will happen within two generations (from the rumors we've been hearing, at least ATI will not go unified this next generation, but it is unknown what nVidia will do).

But, as nVidia has shown, more computing power can be pulled out of a FP32 pipeline by using FP16. I'm not even talking about the register limitations here, as it appears that with some ops, the NV35+ can actually do more FP16 ops per clock than it can FP32 ops (in one of the units).

The main thing that nVidia needs to work on is making the register usage performance hit more manageable. FP16 can still be useful for many operations, and there's no reason to stop using it if performance can be gained.

FP24, on the other hand, is on the way out. It will not be possible to build a FP24 vertex shader that works properly, and so the unification of vertex/pixel pipelines will force FP32 throughout. There may also be issues with FP24 and dependent texture reads, though I haven't seen this confirmed.
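To put rough numbers on that (back-of-the-envelope only, assuming the usual s10e5 / s16e7 / s23e8 layouts for FP16 / FP24 / FP32):

```
FP16 (s10e5): 10 mantissa bits -> relative step ~ 2^-10 ~ 1.0e-3
FP24 (s16e7): 16 mantissa bits -> relative step ~ 2^-16 ~ 1.5e-5
FP32 (s23e8): 23 mantissa bits -> relative step ~ 2^-23 ~ 1.2e-7
```

At a world-space coordinate around 10,000 units, the smallest representable step is roughly 8 units for FP16, 0.125 for FP24 and about 0.001 for FP32 - the kind of gap that matters far more for positions than for color values that end up quantized to 8 bits anyway.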
 
Chalnoth said:
But, as nVidia has shown, more computing power can be pulled out of a FP32 pipeline by using FP16. I'm not even talking about the register limitations here, as it appears that with some ops, the NV35+ can actually do more FP16 ops per clock than it can FP32 ops (in one of the units).
You're right and wrong at the same time. With some ops the NV35+ can do more FP16 ops per clock than it can with FP32 ops. Some ops? AFAIK it's only the case with MAD. Why? Because MAD has to access one more operand. Actually it looks like the same limitation as the register limitation.
 
Humus said:
BRiT said:
Until they get AF and FSAA working, they can't even be considered 3D graphic cards.

LOL. I guess the Voodoo 1-3 cards, TNT 1,2, G400 etc. were all 2D graphics cards then. :p

The new cards might as well be V1-5/TNT1-2/G4-500 without those features. :p

Alright, additional clarification: Until they get AF and FSAA working, they can't even be considered 3D graphic cards for this day and age (ie: current generation).
 
Tridam said:
You're right and wrong at the same time. With some ops the NV35+ can do more FP16 ops per clock than it can with FP32 ops. Some ops? AFAIK it's only the case with MAD. Why? Because MAD has to access one more operand. Actually it looks like the same limitation as the register limitation.
You're right, it seems the FX architecture is capable of fewer "double FP16" ops than I thought. However, you are also wrong, according to 3dcenter's analysis:
http://www.3dcenter.de/artikel/cinefx/index3.php

According to this, only one function, rsq (which is 1/sqrt(x), a very common function in 3D) can be executed twice as fast with FP16 as it can with FP32. MAD apparently operates at the same speed (assuming the register limit isn't being hit, of course).
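(Side note on why rsq matters so much: normalizing a vector boils down to a dot product, a reciprocal square root and a multiply - roughly dp3/rsq/mul in ps_2_0 assembly - so a faster FP16 rsq helps every normalize(). A hedged HLSL sketch with a made-up name:)

```hlsl
half3 fastNormalize(half3 v)
{
    return v * rsqrt(dot(v, v));   // rsqrt() maps to the rsq instruction
}
```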

Regardless, my point was that FP16 definitely has its uses for 3D graphics. It has enough precision/range for color data, and may have other uses as well. Of course, it should only be used if performance can be gained from its use. Would it be better to save transistors and go all FP32? Or can significant performance really be gained by supporting FP16? These are questions to be answered by nVidia's engineers, and without knowing the architectural specifics, you cannot be sure which is the better course of action.

Anyway, I do have to apologize about sounding like I'm sure FP16 is the way to go. I am not. I was responding to what I feel like is an attitude that FP16 is low-precision, and should just go away. I'm saying that it has enough precision for many operations, and should not be done away with if performance can be gained from its use.
 
Well, I guess we'll just have to agree to disagree then. FP16 is not what Nvidia promised. They promised FP32 with the speed needed to beat the R3X0 (all that worthless PR crap they were spouting). Since you say that

"FP24, on the other hand, is on the way out. It will not be possible to build a FP24 vertex shader that works properly, and so the unification of vertex/pixel pipelines will force FP32 throughout. There may also be issues with FP24 and dependent texture reads, though I haven't seen this confirmed"

Obviously FP16 is going to be worse off than FP24 in this area, right? Since the R3X0 has partial FP32 precision, I would definitely expect the R4X0 to have it all the way through.

Remember, if your love child, Nvidia, had stuck with a fully functional FP32 design, they would not be in the mess they are in at this very time... INT12 units in the NV30? What for?

DaveBaumann said:
There is no chance of having 3DMark03 changed now, since that would alter the historical scores. NVIDIA had ample opportunity whilst they were on the beta program to register their interest in partial precision in the application, and considering ATI is invariant to partial precision, they wouldn't have argued. The fact that it's not part of 3DMark03 is a clear suggestion that NVIDIA didn't deem it an issue at the time.

They can release an SE version just like they did with 3DMark2001.

I'm not sure that invalidating historical scores would be all that important anyhow, since only those cards capable of PP would see a change in scores; ATI scores would remain exactly as they are, since they can't benefit from PP.
 
YeuEmMaiMai said:
Obviously FP16 is going to be worse off than FP24 in this area, right? Since the R3X0 has partial FP32 precision, I would definitely expect the R4X0 to have it all the way through.
FP16 won't be used in a vertex shader.

Remember, if your love child, Nvidia, had stuck with a fully functional FP32 design, they would not be in the mess they are in at this very time... INT12 units in the NV30? What for?
And if Microsoft had decided that they would allow integer precision formats in DirectX 9, the NV3x would do much better. If the NV30 hadn't experienced such significant delays, developers would have had them much sooner, and not all early DX9 software (including Microsoft's own HLSL compiler) would be geared for ATI hardware, which would have improved things for nVidia.

Said succinctly, it wasn't just one thing. You like to point out the precision argument as causing the problems because you want to defend ATI's use of FP24. I don't think that was necessarily the best decision that could have been made. In fact, I consider nVidia's choice of supporting FP32 and FP16 to be more forward-thinking (though the integer performance of the NV30-34 was probably overdone).
 
Umm, integer support is allowed; it is optional, and Nvidia's integers do not meet the requirements for integer support anyway, since they are not signed 32-bit ints.

Most of the texture formats are also integer formats; only a few are floats.
 
bloodbob said:
Umm, integer support is allowed; it is optional, and Nvidia's integers do not meet the requirements for integer support anyway, since they are not signed 32-bit ints.
Um, link? I've never seen this before. The only PS precision stuff I've seen from Microsoft's docs is that full precision requires at least FP24, while the "PP" hint requires at least FP16.

Regardless, it wouldn't matter all that much. Nobody supports 32-bit ints in the PS.
 
Floats and ints are different. Floating-point support is required and requires at least 24-bit precision. Like I said, int support is optional, but it is allowed; if Nvidia had had proper integer support, then developers could have used it and ATI would have been forced to just emulate it, which is what both companies currently do. DirectX Next will require integer support. Nvidia might have been better off doing FX32; then it could have really been used in PS 2.0, but since once again they chose not to use the required precision, they get no advantage. Some developers are still using integer math for shaders, e.g. John Carmack.
 
IIRC, Dave made mention of a study that Microsoft did that looked at what the optimal precision would be with respect to the DX9 generation. It would be interesting to know when this was completed. It might help shed some light on when the decision to declare FP24 as full precision was made and help end these recurring arguments.
 
Do current video cards have double-precision floats? I don't think they do, but then again they just might.
 
They don't. It's way overkill for 99.99% of the applications out there. I think we'll settle for 32-bit for a long time to come.
 
bloodbob said:
Umm, integer support is allowed; it is optional, and Nvidia's integers do not meet the requirements for integer support anyway, since they are not signed 32-bit ints.

Most of the texture formats are also integer formats; only a few are floats.

Aren't you confusing fixed point and integers?
 
radar1200gs said:
They can release an SE version just like they did with 3DMark2001.

I'm not sure that invalidating historical scores would be all that important anyhow, since only those cards capable of PP would see a change in scores; ATI scores would remain exactly as they are, since they can't benefit from PP.
The SE version didn't invalidate old scores; the advanced shader test wasn't part of the score.
 
Chalnoth said:
And if Microsoft had decided that they would allow integer precision formats in DirectX 9, the NV3x would do much better.
translation:
If nVidia hadn't bothered to waste transistors on FP16/integer units, the NV3x would be much better.

I mean, what's your point? Why do you fail to ever lay blame on Nvidia? Why must it always be someone else's fault - ATI, MS, game developers, etc.?


I also find it funny that out of one side of your mouth you proclaim FP16 as useful, and out the other side, say that FP24 is not.

FP16 is a low precision worthless format, and should be on its way out.
 