FP16 and market support

Oh, I forgot where Carmack said that there are IQ trade-offs for speed with the NV30; I guess you missed that part. JC was very careful when he coded Doom III, and he is doing his best to minimize the IQ loss, as he has pointed out many times:

"The R200 path has a slight speed advantage over the ARB2 path on the R300, but only by a small margin, so it defaults to using the ARB2 path for the quality improvements. The NV30 runs the ARB2 path MUCH slower than the NV30 path. Half the speed at the moment. This is unfortunate, because when you do an exact, apples-to-apples comparison using exactly the same API, the R300 looks twice as fast, but when you use the vendor-specific paths, the NV30 wins."

So while nVidia claims higher precision (rightfully so), they cannot really use it due to a lack of the required speed.

In terms of Performance :
NV30+NV30-path is faster than NV30+ARB2
NV30+NV30-path is faster than R300+ARB2
R300+ARB2 is faster than NV30+ARB2
R300+R200-path is faster than R300+ARB2

In terms of Quality :
NV30+ARB2 is better than NV30+NV30-path
NV30+ARB2 is better than R300+ARB2
R300+ARB2 is better than NV30+NV30-path
R300+ARB2 is better than R300+R200-path

So I concede that the NV30 is better looking under ARB2, but the problem is that NV30+ARB2 = half the performance of R300+ARB2...

Please note that JC stated that "R300+ARB2 is better than NV30+NV30-path" is correct.

Looking at NV3X performance in current games, where it has a hard time keeping up with a slower-clocked part from a competitor, makes me wonder how badly it will do once the DX9 games that make use of all of the advanced PS 2.0 features come out.

I am definitely glad I did not buy one of Nvidia's cards in my price range, as I would have been royally screwed...


Ailuros said:
YeuEmMaiMai,

A simple example where you obviously want to see only half of the picture would be here:

John Carmack states that when you force the NV30 to run the STANDARD ARB2 path, the R300 appears to be twice as fast.

Carmack's specific statement wasn't concentrated on performance alone; he also commented on image quality differences between different modes. I'd urge you to re-read the full statement.

If the differences between different accuracy depths in a game like Doom3 are minuscule, then it's senseless to torture a specific accelerator's performance for nothing. Riding over it makes as much sense to me as the R3xx's 5-bit LOD precision.

The FXs are, by the way, yielding better performance with the special NV30 path due to stencil op performance.

----------------------------------------------------------------

I think it's time that someone sits down and writes an educated article about the different floating point formats, implementations and whatnot. Ideally even with an attempt to analyze where and why each format is required and what the differences would look like.
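
As a very rough starting point, here is what the three formats being argued about look like on paper, assuming the commonly quoted layouts (s16e7 is the figure usually given for R3xx's FP24; it is not an official spec):

[code]
# Rough numbers only, derived from assumed bit layouts; FP24's exact format
# isn't documented to the same level of detail as the IEEE ones.
formats = {          # name: (exponent bits, explicit mantissa bits)
    "FP16": (5, 10),
    "FP24": (7, 16),
    "FP32": (8, 23),
}
for name, (e_bits, m_bits) in formats.items():
    bias = 2 ** (e_bits - 1) - 1
    max_normal = (2.0 - 2.0 ** -m_bits) * 2.0 ** (2 ** e_bits - 2 - bias)
    epsilon = 2.0 ** -m_bits                    # spacing just above 1.0
    digits = int((m_bits + 1) * 0.30103)        # approximate decimal digits of precision
    print("%s: ~%d decimal digits, epsilon %.1e, max normal ~%.1e"
          % (name, digits, epsilon, max_normal))
[/code]

Where each of those actually matters per shader is exactly the sort of thing such an article would need to pin down.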

I don't think that anyone with half a brain can fail to acknowledge that the R3xx family has an advantage in terms of arithmetic efficiency; yet that doesn't mean that the FXs are completely worthless either. In fact, if anti-aliasing quality didn't weigh so heavily in my own preferences, I'm not so certain I'd own an R3xx today.
 
YeuEmMaiMai said:
Oh, I forgot where Carmack said that there are IQ trade-offs for speed with the NV30; I guess you missed that part. JC was very careful when he coded Doom III, and he is doing his best to minimize the IQ loss, as he has pointed out many times:

"The R200 path has a slight speed advantage over the ARB2 path on the R300, but only by a small margin, so it defaults to using the ARB2 path for the quality improvements. The NV30 runs the ARB2 path MUCH slower than the NV30 path. Half the speed at the moment. This is unfortunate, because when you do an exact, apples-to-apples comparison using exactly the same API, the R300 looks twice as fast, but when you use the vendor-specific paths, the NV30 wins."

So while nVidia claims higher precision (rightfully so), they cannot really use it due to a lack of the required speed.

In terms of Performance :
NV30+NV30-path is faster than NV30+ARB2
NV30+NV30-path is faster than R300+ARB2
R300+ARB2 is faster than NV30+ARB2
R300+R200-path is faster than R300+ARB2

In terms of Quality :
NV30+ARB2 is better than NV30+NV30-path
NV30+ARB2 is better than R300+ARB2
R300+ARB2 is better than NV30+NV30-path
R300+ARB2 is better than R300+R200-path

So I concede that the NV30 is better looking under ARB2, but the problem is that NV30+ARB2 = half the performance of R300+ARB2...

Please note that JC stated that "R300+ARB2 is better than NV30+NV30-path" is correct.

Looking at NV3X performance in current games, where it has a hard time keeping up with a slower-clocked part from a competitor, makes me wonder how badly it will do once the DX9 games that make use of all of the advanced PS 2.0 features come out.

Your entire post is premature until we know how well the finished game performs with its various paths. Even if ARB2 is half the speed of the NV3x path, if it still gives playable frame rates then it's a moot issue. The same goes for Half-Life 2.
 
Chalnoth said:
jvd said:
It's funny. I love hearing that 16-bit with 32-bit once in a while is good enough, yet 24-bit isn't good enough.
And yet even ATI agrees with this. Their fixed function texture addressing units are FP32.
Um... same thing with the programmable pipeline. Straight out of the interpolators, it's FP32. Of course, the sampled data will be FP24 max.
 
YeuEmMaiMai said:
Oh, I forgot where Carmack said that there are IQ trade-offs for speed with the NV30; I guess you missed that part. JC was very careful when he coded Doom III, and he is doing his best to minimize the IQ loss, as he has pointed out many times:

"The R200 path has a slight speed advantage over the ARB2 path on the R300, but only by a small margin, so it defaults to using the ARB2 path for the quality improvements. The NV30 runs the ARB2 path MUCH slower than the NV30 path. Half the speed at the moment. This is unfortunate, because when you do an exact, apples-to-apples comparison using exactly the same API, the R300 looks twice as fast, but when you use the vendor-specific paths, the NV30 wins."

So while nVidia claims higher precision (rightfully so), they cannot really use it due to a lack of the required speed.

In terms of Performance :
NV30+NV30-path is faster than NV30+ARB2
NV30+NV30-path is faster than R300+ARB2
R300+ARB2 is faster than NV30+ARB2
R300+R200-path is faster than R300+ARB2

In terms of Quality :
NV30+ARB2 is better than NV30+NV30-path
NV30+ARB2 is better than R300+ARB2
R300+ARB2 is better than NV30+NV30-path
R300+ARB2 is better than R300+R200-path

So I concede that the NV30 is better looking under ARB2, but the problem is that NV30+ARB2 = half the performance of R300+ARB2...

Please note that JC stated that "R300+ARB2 is better than NV30+NV30-path" is correct.

Looking at NV3X performance in current games, where it has a hard time keeping up with a slower-clocked part from a competitor, makes me wonder how badly it will do once the DX9 games that make use of all of the advanced PS 2.0 features come out.

I am definitely glad I did not buy one of Nvidia's cards in my price range, as I would have been royally screwed...
I am sorry, but I only read something about quality in the part "it defaults to using the ARB2 path for the quality improvements", and that only says something about the speed and quality of the R200-specific path versus the ARB2 path. So I am unsure where you deduced that the NV30 path is worse quality-wise than ARB2. Computationally less correct, or visibly less correct?
 
sonix666 said:
I am sorry, but I only read something about quality in the part "it defaults to using the ARB2 path for the quality improvements", and that only says something about the speed and quality of the R200-specific path versus the ARB2 path. So I am unsure where you deduced that the NV30 path is worse quality-wise than ARB2. Computationally less correct, or visibly less correct?
Those tables came from John Carmack himself.
 
Ostsol said:
Chalnoth said:
jvd said:
It's funny. I love hearing that 16-bit with 32-bit once in a while is good enough, yet 24-bit isn't good enough.
And yet even ATI agrees with this. Their fixed function texture addressing units are FP32.
Um... same thing with the programmable pipeline. Straight out of the interpolators, it's FP32. Of course, the sampled data will be FP24 max.

I think you're all missing the point. People are claiming that 16-bit and 32-bit FP is better than 24-bit FP (with 32-bit FP earlier in the pipeline). Of course 32-bit all the way through is the best, but the one right next to it is 24-bit, then 16-bit. So if 24-bit isn't good, then 16-bit is horrible.

The logic they are using doesn't make sense.
 
For a more complete answer on the topic of quality comparisons, here are some threads to browse:

Interview about it, and a related thread. This is about FX12, or using FP16 in a limited fashion... analogous to the NV35 as the R200 path is to the R300 and above, for the basic Doom 3 featureset of the time.

More in-depth discussion, though there is a LOT of it to go through. Note the discussion of texture lookups versus calculations for specular highlights and normalizations, and the quality tradeoffs involved. For Doom 3, this seems likely to represent the still-evident image quality tradeoff present for the NV3x path, though not because the NV35 couldn't improve performance significantly over the NV30 (certainly significantly better than "half as fast as the R300"), but because it makes more sense to implement the highest quality once, in a standard path (ARB2). The ARB2 path could feasibly be the actual default "NV35" path, since the limited shader usage outlined so far should allow it to fairly easily deliver the target fps values, with its bandwidth advantage and stencil/clock parity countering its disadvantages.

Revisited with some relatively recent comments. I think the comments about the FX were more in defense of Valve and their more complex shading goals than a statement that even the NV35 wouldn't be able to use the ARB2 path's quality improvements for Doom 3 at the targeted speeds without significant issues, at least outside of speculating that new experiments are making their way into the game and choking on the NV35 (perhaps with an interest in revitalizing OpenGL's perception? I'm actually curious about a GLslang "awareness" effort being associated with Doom 3, for example, which the latest delay could conceivably allow).
 
No, the logic makes perfect sense. Some calculations don't need more than FP16, some need more (FP24), and still others require FP32 (or even higher). The people arguing FP24 isn't good enough are the people arguing that certain shaders require FP32 in some degenerate cases. The people arguing for FP16 aren't saying "FP16 is enough", they are saying that not all shaders require FP24.


If you have an architecture with multiple precisions, you can select FP16 where it suffices, and FP32 in those rarer circumstances where you need it. With a single precision, if you need more than FP24, you're screwed.
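
To make that concrete, here is a minimal sketch (nothing to do with Doom 3 or any driver code; the numbers and the 4096-texel texture are made up) emulating FP16 and FP32 with numpy. A plain colour modulation is fine at FP16, while a computed coordinate into a large texture already lands on the wrong texel:

[code]
# Minimal sketch: emulate FP16 vs FP32 rounding with numpy to show why some
# shader-style calculations tolerate low precision and others don't.
import numpy as np

# Case 1: a colour modulation in [0, 1]. The FP16 error is far below what an
# 8-bit-per-channel framebuffer can display (1/255 ~= 0.0039).
base, light = np.float32(0.7231), np.float32(0.4812)
fp32 = base * light
fp16 = np.float16(base) * np.float16(light)
print("modulate: fp32=%.6f fp16=%.6f err=%.1e" % (fp32, fp16, abs(fp32 - np.float32(fp16))))

# Case 2: a computed coordinate into a hypothetical 4096-texel texture.
# FP16 has a 10-bit mantissa, so coordinates near 1.0 step in units of 2^-11
# and the resulting texel index can land a whole texel away from the FP32 one.
coord = np.float32(0.83337)
idx32 = int(coord * 4096)
idx16 = int(float(np.float16(coord)) * 4096)
print("texel index: fp32=%d fp16=%d" % (idx32, idx16))
[/code]

The exact numbers are arbitrary; the point is only that how much precision you can get away with depends entirely on what the value is used for.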

At best, the people arguing for a single precision should at least argue that FP32 needs to be supported as the minimum. If ATI indeed did not "finalize" their HW until MS said "FP24 is the minimum", and if ATI indeed could have supported more than FP24 "easily", why didn't they move to FP32 in the beginning and lobby Microsoft to define FP32 as the minimum? At the time, NV already had FP32-capable HW (although with pathetic performance), and ATI could have delivered FP32 "easily", so there could have been consensus on FP32. Therefore, if MS had endorsed FP32 as the minimum, we wouldn't have to wait another generation for the standard to be bumped up, since both vendors could have had FP32 HW ready, and ATI still would have come out looking golden, because presumably their FP32 implementation would have "wiped the floor" with NVidia's.

If the NV40/R420 deliver FP32 as minimum, MS will be forced to update the spec again, and any late IHV followers who had been designing to the spec will be forced to revise their plans/designs.

I'm not questioning ATI's design decision to go with FP24, but if we're arguing that a single precision is best and "elegant", then a pure FP32 all-the-way-through design is better than an FP32/FP24 hybrid.

But all this is irrelevant, since there are a paltry few DX9 games, and still fewer that abuse shaders enough to need FP32 to avoid artifacting; and even if there were artifacting, most people wouldn't notice it, or would ignore it. I don't see many people complaining about inaccurate specular, or low-precision normal maps.

There may come a day when people can't stand rendering that isn't gamma-correct, HDR and artifact-free, just like nowadays people easily notice aliasing, but it's gonna be a while.
 
DemoCoder said:
I'm not questioning ATI's design decision to go with FP24, but if we're arguing that a single precision is best and "elegant", then a pure FP32 all-the-way-through design is better than an FP32/FP24 hybrid.


Is ATI's FP32/24 better (and faster) than Nvidia's FP32/16? The results would seem to indicate that ATI made the better trade-off, because their performance is significantly better, even at higher precision.

DemoCoder said:
But all this is irrelevant, since there are a paltry few DX9 games, and still fewer that abuse shaders enough to need FP32 to avoid artifacting; and even if there were artifacting, most people wouldn't notice it, or would ignore it. I don't see many people complaining about inaccurate specular, or low-precision normal maps.

And yet already we are seeing poor performance from Nvidia's design, barely at parity with ATI's 32/24 design even when using 32/16. And this is in the early days of shaders.

Sure, you can turn around and say "meh, it's the early days of shaders", but that doesn't change the fact that Nvidia are selling a part with significant issues in this area, and that they can't compete even in the first wave of DX9 titles.

Nvidia will just turn around and tell you to buy an NV40, but who's to say the same issues will not come up again? Traditionally, Nvidia do seem to carry these sorts of things forward from generation to generation. With NV3x we've actually seen some things (like AA and brilinear filtering) going backwards.

DemoCoder said:
There may come a day when people can't stand rendering that isn't gamma-correct, HDR and artifact-free, just like nowadays people easily notice aliasing, but it's gonna be a while.

True, and by then we'll all be using better cards anyway, but that's no reason not to discuss the facts as they are today. Yes, 32-bit all over would be nice, and yes, ATI made a compromise with 24-bit. Nvidia also made a compromise with 16-bit, and it looks like ATI made the better choices and built the better products this time around.

I could do lots of handwaving in this thread, as people have done, saying (to paraphrase) "Nvidia's 16/32 is better if they could make it run faster", but I can just as easily handwave on behalf of ATI and say "ATI's next gen will be 32-bit so it doesn't matter, and will be faster, and 32/24 will be faster anyway". That's kind of pointless because it's not relevant to how things are in today's parts, where ATI have proved that their design was better than Nvidia's for the gaming market of the last 18 months.

Regardless of Nvidia's design debatably being better in *theory*, in *practice* it's worse. Worse in performance, worse in quality, and worse to program for.
 
YeuEmMaiMai,

There's nothing in any of JC's comments that suggests anything but a huge performance difference between paths. The IQ differences, always according to him and his follow-up statements, never sounded huge; rather the exact contrary.

Albeit preliminary, Doom3 does not sound to me like a game that actually requires more than FP16; nonetheless it's far from qualifying as a true dx9.0-equivalent OGL game.

Thanks to demalion, here are again his link and the corresponding quote from Carmack:

What about the difference between NV30+NV30-path and R300+R200-path in terms of performance and quality?

Very close. The quality differences on the ARB2 path are really not all that significant, most people won't be able to tell the difference without having it pointed out to them.

http://www.beyond3d.com/forum/viewtopic.php?t=4202

-----------------------------------------------------------------


Nvidia will just turn around and tell you to buy an NV40, but who's to say the same issues will not come up again?

Multiple precision formats will continue to exist for quite some time, and at more than one IHV. High-precision formats, i.e. FP32, do come with bandwidth penalties, and not even ATI employees have denied that fact in public. How each IHV and their respective architectures are going to handle FP32, or maybe even higher precision formats, in the distant future I cannot know nor predict, but it seems to make sense that for a transitional period multiple precision formats will be necessary, and that irrespective of IHV or architecture.

----------------------------------------------------------

A sidenote on highly anticipated games: one side trumpets HL2 preliminary results, while the other trumpets Doom3 preliminary results. Both are facing delay after delay at the moment, and it's an almost tragic irony that each delay took the wind out of the sails of each side.

NV knows that it has to improve/increase its arithmetic efficiency, while ATI knows that it has to work on stencil op performance (especially combined with MSAA), in each of the two cases.
 
Bouncing Zabaglione Bros. said:
DemoCoder said:
I'm not questioning ATI's design decision to go with FP24, but if we're arguing that a single precision is best and "elegant", then a pure FP32 all-the-way-through design is better than an FP32/FP24 hybrid.

Is ATI's FP32/24 better (and faster) than Nvidia's FP32/16? The results would seem to indicate that ATI made the better trade-off, because their performance is significantly better, even at higher precision.
Once again, though, it's not the precision that's slowing NVidia down, but other design decisions.
 
Ostsol said:
Once again, though, it's not the precision that's slowing NVidia down, but other design decisions.

The impression I get from reading these boards, rightly or wrongly, is that the FP24 "tradeoff" was more effect than cause... with the underlying cause being the .15 vs .13 decision (with a large helping of "the mathemagicians say it won't matter for a while anyway" on top). Given how that worked out for the two parties, it is extremely hard to argue that ATI made the wrong choice there.

At least that's what I get from hearing the ATI guys talk about FP24 vs FP32 in their architecture being about transistor count and not speed.
 
Bouncing Zabaglione Bros. said:
Nvidia will just turn around and tell you to buy an NV40, but who's to say the same issues will not come up again? Traditionally, Nvidia do seem to carry these sorts of things forward from generation to generation. With NV3x we've actually seen some things (like AA and brilinear filtering) going backwards.

How are they going "backwards" with regards to AA? The gfFX has the same AA as the gf3 or gf4. I will agree that forcing brilinear is a step backwards quality-wise.
 
Bouncing Zabaglione Bros. said:
DemoCoder said:
I'm not questioning ATI's design decision to go with FP24, but if we're arguing that a single precision is best and "elegant", then a pure FP32 all-the-way-through design is better than an FP32/FP24 hybrid.
Is ATI's FP32/24 better (and faster) than Nvidia's FP32/16? The results would seem to indicate that ATI made the better trade-off, because their performance is significantly better, even at higher precision.
I hear this reasoning against nVidia's FP32/FP16 all too much. You can't deduce that going FP24 is better than going FP32/FP16 from the performance of the two cards. The only thing you can deduce is that ATi made the better-performing DX9 architecture of the two, but nothing directly points to FP24 being the reason for that.

nVidia's design seems to be hampered by register usage restrictions and the strange quads architecture, which I highly doubt are the result of their decision not to go FP24, do you?
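
For what it's worth, here is a toy model (the register-file size and latency figures are completely invented, not NV3x specs) of why per-fragment register usage can hurt an architecture that hides texture latency by keeping many fragments in flight:

[code]
# Toy model only: invented numbers, intended to show the shape of the problem,
# not actual NV3x behaviour.
REGISTER_FILE_FP32_SLOTS = 256   # hypothetical pool of FP32 temporaries
LATENCY_TO_HIDE = 100            # hypothetical texture fetch latency in cycles

def fragments_in_flight(fp32_regs_per_fragment):
    """More live registers per fragment -> fewer fragments the pool can hold."""
    return REGISTER_FILE_FP32_SLOTS // max(1, fp32_regs_per_fragment)

for regs in (1, 2, 4, 8):
    n = fragments_in_flight(regs)
    print("FP32 temps per fragment: %d -> %3d fragments in flight (latency hidden: %s)"
          % (regs, n, "yes" if n >= LATENCY_TO_HIDE else "no"))
[/code]

If anything like that model applies, FP16 temporaries helping (by packing two into the space of one FP32 slot) would be a register-file effect rather than an ALU precision effect, which fits the point about register usage restrictions.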
 
nVidia marketed Brilinear incorrectly IMO.

What I would have done is introduce Brilinear as a Bilinear replacement, and then for UT2003 state that "our bilinear filtering is so good it almost exactly matches ATi's trilinear filtering". This would have the effect of the game looking better "straight out of the box".

This way quality is actually increased (for bilinear) instead of (very marginally) decreased for Trilinear, and pressure is applied on the competition all at the same time.
 
StealthHawk said:
How are they going "backwards" with regards to AA? The gfFX has the same AA as the gf3 or gf4. I will agree that forcing brilinear is a step backwards quality-wise.

I thought the GFFX AA, with its weird "loopback blurring", was worse than on the previous generation, which, IIRC, used supersampling. If you're suggesting that the previous Nvidia generation of AA is just as bad as on the GFFX, then I stand corrected. It's still not something to be proud of, especially when compared to their primary competition.
 
I'm a little curious, radar1200gs, as to how nVidia marketed Brilinear badly. All they have done is hide trilinear filtering from the user within the recent drivers to make their cards look greater than their current capabilities, to win some benchmarks. People would really love these optimisations... if only they had the choice to run at full quality or optimised (especially on slower systems).

Maybe if they marketed this feature as bilinear++ (like their shader architecture) they could win over a few more less-informed sites. :)

Offering better performance for 99% quality is an excellent optimisation, provided 100% iq is still available to the user/developer even if it runs below par.

The current pixel shader hardware follows this trend - 32-bit pixel FP is currently optimal... in the future 64... 128... 256-bit, wow... but running offline is different from the intended use in online/gaming systems/platforms.

Forgetting OGL and its extensions, remember that DX9 has limited shader capabilities, especially within its PS2.0 format, and currently 24-bit FP (with 32-bit addressing?) offers the best cost and performance in DX9 gaming systems. Without this intermediate stepping by some companies, full 32-bit would be much further in the future. This might have looked different to many if Doom 3 had been released 12 months ago, offering 16-bit FP and performance?

Please think of the engineering/development effort/cost in developing and supporting such impressive hardware by ATI on 0.15u (and later 0.13u) before questioning the use of 24-bit pixel shaders. If ATI had not produced such technology, one would still be paying huge dollars for the NV30 and its derivatives. OK, they are not perfect, but they made the right design choices for an FP-all-the-way DX9++ chip. 24-bit is a short-term standard which should be nearly perfect for the capabilities of PS2.0 tech. All current DX9-capable chips bar nVidia's allow for 24-bit pixel shading v2. Without a standard, what chance would the smaller companies have of competing with these huge companies?

Once we move to longer, faster and more complicated shader tech, 24-bit might find itself in trouble with error/overflow problems due to its limitations, but that should not be for at least a year or two (DX10-next), which may incorporate many new tech features. By then we should have the manufacturing tech (<=0.09u) to build complicated 32-bit shaders at good performance levels and yields for the consumer.
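
As a rough illustration of that error build-up (a sketch only: FP24 isn't a numpy type, so it's faked here by rounding a float32 mantissa to 16 explicit bits, the s16e7 layout usually quoted for R3xx, and the arithmetic chain itself is made up):

[code]
# Sketch: how a fixed intermediate precision accumulates error over a long
# arithmetic chain. "FP24" is emulated by mantissa rounding; not real hardware.
import numpy as np

def round_mantissa(x, keep_bits):
    """Crudely quantize a value to `keep_bits` explicit mantissa bits."""
    m, e = np.frexp(np.float32(x))        # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (keep_bits + 1)        # frexp's mantissa hides one extra bit
    return np.float32(np.ldexp(np.round(m * scale) / scale, e))

def long_chain(steps, quantize):
    """Iterate x -> x*1.0001 + 0.0001, storing the result at `quantize` precision."""
    x = 1.0
    for _ in range(steps):
        x = quantize(x * 1.0001 + 0.0001)
    return float(x)

reference = long_chain(200, np.float64)
fp24ish   = long_chain(200, lambda v: round_mantissa(v, 16))
fp16      = long_chain(200, np.float16)
print("fp64 reference: %.7f" % reference)
print("fp24-ish      : %.7f (err %.1e)" % (fp24ish, abs(reference - fp24ish)))
print("fp16          : %.7f (err %.1e)" % (fp16, abs(reference - fp16)))
[/code]

At this made-up chain length, FP16 has already stopped registering the increments entirely while the 16-bit-mantissa version only drifts slightly; shrink the per-step increment below its spacing, or stretch the chain far enough, and the same drift shows up there too.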

Anyway, all the best to everyone in 2004, and good luck to the smaller companies and underdogs of the 3D world... Volari, Deltachrome and especially the mysterious PowerVR Series 5 tech chip(s), which can hopefully turn a few heads, win some buyers, and raise the current shading tech to a new level... coming soon... :oops:


 
Ostsol said:
Once again, though, it's not the precision that's slowing NVidia down, but other design decisions.

That's my point. All this handwaving about Nvidia's 32/16 hybrid potentially giving better speed with 16 where possible, and better IQ with 32 where necessary is all very pie in the sky, because that is not what the NV3x architecture is capable of.
On the whole, Nvidia's design decisions on the NV3x have created something too slow to use FP32, leaving low quality FP16 that still fails to give speed parity. This is why Nvidia have been cheating and dropping precision even when the developer doesn't want it.

Sure, you can turn around and say "it'll all be fixed in NV40", but that's no more valid than me handwaving and saying "well, in R420, ATI will have 32-bit, won't need 16-bit (i.e. a 32/32 architecture), and will *still* be faster and have better IQ than Nvidia".

As has been said, it's not to do with the "bitness" of the precision, but the way the architecture has been built to run it. NV3x may *potentially* have better precision modes, but it isn't capable of using them with the performance necessary for games.
 
I'm a little curious, radar1200gs, as to how nVidia marketed Brilinear badly. All they have done is hide trilinear filtering from the user within the recent drivers to make their cards look greater than their current capabilities, to win some benchmarks. People would really love these optimisations... if only they had the choice to run at full quality or optimised (especially on slower systems).

Maybe if they marketed this feature as bilinear++ (like their shader architecture) they could win over a few more less-informed sites.

That is what I already said. nVidia replaced trilinear on the quiet/sly with brilinear, when they should have replaced bilinear with brilinear and loudly advertised the fact that they did.

I think a good filtering regime would look like this:

High performance bilinear = original bilinear
Performance bilinear = the 52.10 version of brilinear, or thereabouts
Quality bilinear = brilinear as found in 53.03

High performance trilinear = based on performance bilinear
Performance trilinear = based on quality bilinear
Quality trilinear = original trilinear

High performance aniso = performance trilinear for all stages
Performance aniso = quality trilinear for the first stage, performance trilinear for later stages
Quality aniso = quality trilinear for all stages
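
For anyone unsure what actually separates these modes, here is a rough sketch of the idea behind brilinear (not nVidia's actual algorithm, and the band width is an invented parameter): trilinear blends two mip levels across the whole fractional LOD range, while brilinear only blends inside a narrow band around the transition and falls back to plain bilinear (a single mip level) elsewhere.

[code]
# Sketch only: illustrates the bilinear/brilinear/trilinear blend idea, not any
# driver's real implementation. blend_band is an invented parameter.
def mip_blend_weight(lod_fraction, blend_band=0.3):
    """Weight of the next-smaller mip level for a fractional LOD in [0, 1)."""
    lo = 0.5 - blend_band / 2.0
    hi = 0.5 + blend_band / 2.0
    if lod_fraction <= lo:
        return 0.0                              # pure bilinear from the larger mip
    if lod_fraction >= hi:
        return 1.0                              # pure bilinear from the smaller mip
    return (lod_fraction - lo) / (hi - lo)      # narrow trilinear-style blend

# Full trilinear would simply use weight = lod_fraction everywhere.
for f in (0.1, 0.3, 0.5, 0.7, 0.9):
    print("lod fraction %.1f: trilinear weight %.1f, brilinear weight %.2f"
          % (f, f, mip_blend_weight(f)))
[/code]

Widen blend_band toward 1.0 and you get ordinary trilinear back; shrink it toward 0 and you are left with plain bilinear, which is roughly the dial the high performance / performance / quality tiers above would be turning.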
 
Ostsol said:
Chalnoth said:
jvd said:
It's funny. I love hearing that 16-bit with 32-bit once in a while is good enough, yet 24-bit isn't good enough.
And yet even ATI agrees with this. Their fixed function texture addressing units are FP32.
Um... same thing with the programmable pipeline. Straight out of the interpolators, it's FP32. Of course, the sampled data will be FP24 max.
Right. It's the dependent texture reads, where the texture address has to go through an FP24 register/calculation, that may result in problems.
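
To put a rough number on where that bites (back-of-the-envelope only: it assumes the commonly quoted s10e5/s16e7/s23e8 layouts and ignores any extra bits the hardware may carry internally): in the worst binade of [0,1), a float with m explicit mantissa bits only distinguishes 2^(m+1) coordinate values, so a dependent read into a texture wider than that can no longer hit every texel.

[code]
# Back-of-the-envelope sketch, not a hardware model: how wide a texture can be
# before a [0,1) coordinate held at a given mantissa width stops being able to
# address individual texels (worst case is the top binade, [0.5, 1.0)).
def max_exact_texels(mantissa_bits):
    # values in [0.5, 1.0) are spaced 2**-(mantissa_bits + 1) apart
    return 2 ** (mantissa_bits + 1)

for name, bits in (("FP16 (s10e5)", 10), ("FP24 (s16e7)", 16), ("FP32 (s23e8)", 23)):
    print("%s: per-texel addressing up to ~%d texels" % (name, max_exact_texels(bits)))
[/code]

On that assumption FP24 has plenty of headroom for a straight lookup at today's texture sizes; the risk is a long dependent chain doing arithmetic on the coordinate, which can eat into those bits before the read happens.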
 