FP16 and market support

Chalnoth said:
Sony claims that the VU's can transform at a rate of 66 million polys/sec.

AFAIK, that's just the theoretical peak for VU1.

3. Transistor space: as an example, the PS2 vertex units support a divide function. GPU's don't. This saves lots of transistors. Once again, dedicated hardware saves transistors, which in turn improves performance.

Lots, huh? An EE Vector Unit contains 5.8M transistors. Also take note that this number includes the 32 KB of RAM and the VIF. While ATI and nVidia aren't as forthcoming in releasing such information, I'm going to go out on a limb and say you're mistaken.
 
radar1200gs said:
...more fud...

Hey, where are those numbers at? Or have you conceded that you're only capable of spreading FUD?
 
radar1200gs said:
If the rumors that R420/R423 will support a higher precision are true and _PP support comes along with that higher precision (which I believe it will, since for R300 to easily support _PP it would have had to be FP12 which is below the minimum microsoft specify) then I will LMAO. It will be hugely entertaining to see the fanboys explain how it all came about given FP24's superiority.

As I said before, you should see what nVidia's FP32/FP16 is truly capable of with NV40. Shame about how NV3x ended up, but in the end nVidia were correct to just get on with NV40 rather than waste resources and effort trying to fix the unfixable. In the end NV3x is extremely competitive in everything bar DX9 and DX9 isn't exactly setting the gaming world on fire at present.
DaveBaumann said:
(which I believe it will, since for R300 to easily support _PP it would have had to be FP12 which is below the minimum microsoft specify) then I will LMAO

Get a clue, Greg. Please paddle in the shallow end, where your understanding best suits you.
LOL~~~~ LMFAO~~~~~
:LOL: :LOL: :LOL:

I'm sorry, I don't know which is funnier, Greg's post or Dave's response, but I got coffee all over my bloody monitor again on these two posts.
 
DemoCoder said:
Ok, why don't you tell me what shaders are being used on the wood, Lara's hair, jeans, skin, door, etc. in this http://www.gamespot.com/pc/adventure/tombraidertheangelod/screens.html?page=129 screenshot? (I picked the first one, just to avoid accusations of selection bias.) I'd like to know, because I am baffled as to how bland this game looks. Can you pick a shot that doesn't include water, caustics, or mirrors that demonstrates the other (e.g. the rest of the game) shaders? Here's another one: http://www.gamespot.com/pc/adventure/tombraidertheangelod/screens.html?page=118 Is anything in this scene DX9 specific (requires PS 2.0 to do)?
You think I have time to play this game and find that exact location? I did grab some shaders from the start of the Paris benchmark.
Walls:
Code:
ps_1_1
tex t0 
tex t1 
texcoord t2 
texcoord t3 
dp3_d2 r1, t1_bx2, t3_bx2 
mov_d2 r0.rgb, v0 
mad r0.rgb, r1, c1, r0 
mul_x2 r0.rgb, r0, t0 
mov_sat r1.a, t2.b 
mul_sat r1.a, c0.a, r1.a 
lrp r0.rgb, r1.a, c0, r0 
+mov r0.a, t0 
end
Floor:
Code:
ps_2_0
def c2, -0.5, 0, 0, 0.5 
def c3, 0, 1, 0, 0 
dcl v0.rgb 
dcl t0.rg 
dcl t1 
dcl_pp t2.rgb 
dcl_pp t3.rgb 
dcl_pp t4.rgb 
dcl t5.rgb 
dcl t6.rgb 
dcl_2d s0 
dcl_2d s1 
dcl_cube s2 
texld_pp r0, t0, s0 
dp3 r7.a, t6, t6 
rsq r9.a, r7.a 
mul_pp r4.rgb, r9.a, t6 
add r0.rgb, r0, c2.r 
add_pp r0.rgb, r0, r0 
dp3_pp r6.r, r0, t2 
dp3_pp r6.g, r0, t3 
dp3_pp r6.b, r0, t4 
dp3_pp r11.a, r6, r4 
add r1.rgb, r6, r6 
mad_pp r3.rgb, r1, r11.a, -r4 
texld_pp r10, r3, s2 
texld_pp r5, t0, s1 
mul r10.a, r0.a, c2.a 
mul r7.rgb, r10, r10.a 
dp3 r7.a, t5, t5 
rsq r7.a, r7.a 
mul_pp r2.rgb, r7.a, t5 
dp2add_pp r9.r, r0, t2, c3.r 
dp2add_pp r9.g, r0, t3, c3.r 
dp2add_pp r9.b, r0, t4, c3.r 
dp3_pp r7.a, r9, r2 
mad_pp r0.rgb, r7.a, c1, v0 
mov_pp r6.a, r5.a 
mad_pp r5.rgb, r5, r0, r7 
dp4_sat r1.a, t1, c4 
mul_pp r1.a, r1.a, c0.a 
lrp_pp r6.rgb, r1.a, c0, r5 
mov_pp oC0, r6 
end
Clothes and hair:
Code:
ps_1_1
tex t0 
tex t1 
texcoord t2 
mov r0.rgb, t2 
dp3_sat r1.rgb, t1_bx2, r0_bx2 
mul r1.rgb, r1, v0 
mul r0.rgb, r1, t0 
mad r0.rgb, v1, t0, r0 
+mov r0.a, t0.a 
end
Skin:
Code:
ps_2_0
def c0, 1, 0, 0, 32 
def c2, -0.5, 0, -1, 1.01 
dcl v0.rgb 
dcl_pp t0.rg 
dcl_pp t2.rgb 
dcl_pp t3.rgb 
dcl_pp t4.rgb 
dcl_pp t5.rgb 
dcl_pp t6.rgb 
dcl_2d s0 
dcl_2d s1 
texld_pp r0, t0, s0 
texld_pp r7, t0, s1 
add r0.rgb, r0, c2.r 
mul_pp r2.rgb, r0.a, c1 
add_pp r9.rgb, r0, r0 
dp3_pp r4.r, r9, t2 
dp3_pp r4.g, r9, t3 
dp3_pp r4.b, r9, t4 
dp3 r4.a, r4, r4 
rsq r4.a, r4.a 
mul_pp r11.rgb, r4, r4.a 
mul r6.rgb, r11, c2.a 
dp3 r6.a, t5, t5 
rsq r6.a, r6.a 
mul_pp r1.rgb, r6.a, t5 
dp3 r1.a, r6, r1 
add r1.a, r1.a, r1.a 
mad_pp r10.rgb, r11, r1.a, -r1 
dp3_sat r2.a, r11, r1 
dp3 r10.a, t6, t6 
rsq r10.a, r10.a 
mul_pp r5.rgb, r10.a, t6 
dp3_sat_pp r6.a, r5, r10 
dp3_sat r0.a, r11, r5 
log_pp r1.a, r6.a 
mul r8.a, r1.a, c0.a 
exp_pp r3.a, r8.a 
min_pp r5.a, r3.a, c0.r 
mov_pp r11.a, -r0.a 
cmp_pp r9.a, r11.a, c2.g, r5.a 
mul_pp r2.rgb, r2, r9.a 
mul_pp r0.rgb, r7, c1 
add_pp r2.a, r0.a, r2.a 
rcp_pp r2.a, r2.a 
add_pp r1.a, r2.a, c2.b 
cmp_pp r2.a, r1.a, c0.r, r2.a 
mul_pp r0.a, r0.a, r2.a 
mul_sat_pp r5.rgb, r7, v0 
mov_pp r3.a, r7.a 
mad_pp r11.rgb, r0, r0.a, r5 
add_pp r3.rgb, r2, r11 
mov_pp oC0, r3 
end
That's all I have time for. If you want to see more shaders, dump them yourself.
Since OpenGL guy seems to know a lot about this game, why not explain some of the advanced material shaders they are using? What kind of amazing shaders did they design to make Lara's skin, hair, and clothes suck so bad, and the game's rocks, stone, metal, and wood look like they came out of a 1997 game?
I don't have time to deal with your attitude.
Look, I can believe that they have some shaders to do advanced water, reflections, DOF, bloom, etc. But are those in every frame? I'm not trying to start an argument with you on purpose, but the PC version of this game doesn't look that much better (in terms of lighting and effects) than the PS2 version.
Sure sounds like you're trying to start an argument to me. Why don't you ask the developer about why the game looks the way it does?
 
DemoCoder said:
Still can't let go of that fairy tale that IHVs finalize their HW only after the spec is worked out, eh? I guess ATI won't even write one line of code for beyond 3.0+ features until Microsoft hands down the DirectX Next spec and blesses it?
well, what he said was pretty much a paraphrase of what some of the ATI guys here told us - that in this case, the hardware was NOT finalized until they got word back from MS on the precision issue.
 
radar1200gs said:
Are you so thick that you need a picture drawn?

R300's pixel shader registers are 24 bits wide. 24/2 = 12

As I said above, they could attempt to get 16 bits out, but the complexity would not be worth it.

The minimum for partial precision is FP16, therefore FP12 plainly is inadequate.
Um. . . you must be under the impression that supporting _PP on the R300 would be useful, when in fact it would not be. The NV3x's _PP performance comes from being able to use more registers without a performance hit than it can with FP32. FP16 is not inherently faster, just as FP12 would not be inherently faster. Just because double the values can fit in a register does not mean that double the operations can be executed per clock. Furthermore, the R300 has -no- performance penalty for register usage. You could use all the registers the R300 has available and still receive the same performance.

Basically, FP12 not only has inadequate precision, but it would be utterly pointless to implement on the R300. In fact, it would be pointless for the R300 to have any additional floating point precision available. It's not even worth considering.
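For what it's worth, the register-pressure argument is easy to put in numbers. A toy sketch in Python (the register-file size here is made up for illustration; real per-pipeline budgets aren't public):

```python
# Toy model of the register-pressure argument: a temp register file of
# fixed size holds twice as many FP16 values as FP32 ones. That is where
# NV3x's _PP win comes from; R300, with no usage penalty, gains nothing.
REGISTER_FILE_BYTES = 32  # hypothetical per-pipeline budget, not a real spec

def temps_available(bytes_per_value):
    # how many temporaries fit before spilling/penalties kick in
    return REGISTER_FILE_BYTES // bytes_per_value

assert temps_available(4) == 8    # FP32 temps
assert temps_available(2) == 16   # FP16 temps: double the slots, same silicon
```

Note the model says nothing about per-clock throughput, which is exactly Ostsol's point: more values per register is not more operations per clock.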
 
This thread proves that once fanboys learn two technical terms, they suddenly become infinitely more dangerous.

Be careful, kids, and slap the fanboys around before you try and teach them anything. This thread has become one big pile of BS, FUD, and silliness.

Partial precision might be good in the future. But as R300 has helpfully proven, you don't need it to be successful, at least for the time being.

Just reading the first post in this thread... what, you can't simply let it go that NVIDIA loses in 3DMark legitimately (e.g., when cheats are not used)? You can't accept that 3DMark03 is a full-precision PS2.0 benchmark? You feel insecure because your [H]ard Penis isn't magically longer than everyone else's because you can't win at 3DMark? Grow up. The original post had NOTHING WHATSOEVER to do with FP16/32 versus FP24--it had to do with 3DMark.
 
Personally, I still don't believe there's any practical merit for _PP except on the current video cards that support it. For the future, all we really need are a couple of basic types: float and integer. I don't mean integer as fixed point like FX12, though, but integers as they are used in most programming languages: whole numbers. Fixed point can be useful (FX32 could be good if 32-bit z-buffers become more common), but only if the same ALUs can also be used for whole-number operations. Otherwise, it's not worth the cost of implementing it in silicon.
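To make the fixed-point vs. whole-number distinction concrete, here is a minimal Python sketch of FX12-style quantization, assuming an s1.10 layout (sign, one integer bit, ten fraction bits, range [-2, 2)), which is one common description of NV3x's FX12; the exact layout is an assumption here:

```python
FRAC_BITS = 10  # assumed s1.10 layout: sign, 1 integer bit, 10 fraction bits

def to_fx12(x):
    # quantize to the fixed-point grid, clamping to the range [-2, 2)
    raw = round(x * (1 << FRAC_BITS))
    raw = max(-2048, min(2047, raw))
    return raw / (1 << FRAC_BITS)

assert to_fx12(0.5) == 0.5              # exactly representable
assert to_fx12(3.0) == 2047 / 1024      # clamps: FX12 cannot hold values >= 2
assert to_fx12(100.0) == to_fx12(2.5)   # whole numbers out of range collapse
```

This is why fixed point is fine for colors but useless as a general-purpose integer: a loop counter or texture-size value simply does not fit.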
 
The Baron said:
This thread proves that once fanboys learn two technical terms, they suddenly become infinitely more dangerous.

Be careful, kids, and slap the fanboys around before you try and teach them anything. This thread has become one big pile of BS, FUD, and silliness.

And yet y'all continue to feed the energy monster. Tho I may copy and paste Ostsol's post about the uselessness of partial precision on the R300 architecture, as it really is a succinct statement of the heart of the 26 pages of merry-go-round in this thread.
 
Ostsol said:
FP16 is not inherently faster,
But FP16 is faster on the NV30.

If it is assumed that x% of future shader instructions can use FP16 with no loss in image quality, then it makes sense to accelerate FP16 instructions. Since for the same number of transistors you can get more performance out of FP16, it makes sense to make FP16 faster.

Of course, if an architecture assumes that all instructions are going to be FP32, then FP16 will not be faster (except as an input/output format, of course).

But we're talking about whether or not to use FP16 in the future, right? It makes sense to use FP16 if speed can be gained. It makes sense to accelerate FP16 if it can be used frequently in shaders. I claim that FP16 will be able to be used quite often in shaders with little to no quality loss, and thus FP16 processing is not going away.
 
Yes, but once again we are back to the question: did the NV3x architecture -really- have to have its register limitations? -That-, after all, is where all that FP16 speed comes from on those cards. It is faster not because more instructions can be executed per clock (the same FPUs are used for both FP32 and FP16), but because of an entirely different matter: how the NV3x handles registers. As such, does it not look like FP16 is only faster in some current technology and might not be faster in future technology?
 
Ostsol said:
Yes, but once again we are back to the question: did the NV3x architecture -really- have to have its register limitations?
It was a design decision. Only the engineers at nVidia know what was gained by opting to have register limitations.

There are also ways to increase FP16 performance by having units dedicated to FP16. You could, for example, have each FP32 unit also have a FP16 unit in parallel. In this way, when executing FP16, peak performance would double.

Depending on how many transistors it takes, it may also be an option to make that single FP32 unit operate as two parallel FP16 units.

And it is unlikely that nVidia will abandon the current architecture that we see in the NV3x. They will, instead, work on its major faults. It seems pretty certain, then, that nVidia will reduce the register usage performance hit for the NV4x (hopefully they will also improve the AA and anisotropic filtering...), but the NV4x will work on the same basic paradigm.

As such, does it not look like FP16 is only faster in some current technology and might not be faster in future technology?
Since FP16 is not faster in some current technology, of course there's the possibility that it won't be faster in some future technology. The point is that as long as the software can make use of it, there will be a gain from accelerating the format.
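The "twin FP16 unit" idea rests on the simple fact that two half-precision values occupy exactly the storage of one FP32 value, so the same datapath width can carry two lanes. A quick check with Python's standard struct module (half-precision `'e'` format):

```python
import struct

# Two FP16 values packed into the storage of one 32-bit register, as in
# the "twin FP16 unit" idea: same register width, double the lanes.
a, b = 1.5, -0.25                      # both exactly representable in FP16
word = struct.pack('<2e', a, b)        # 4 bytes: the width of one FP32 value
assert len(word) == struct.calcsize('<f')
lo, hi = struct.unpack('<2e', word)
assert (lo, hi) == (1.5, -0.25)
```

Whether the ALU itself can be split into two FP16 multiply-add lanes cheaply is a separate hardware question, but the storage side is free.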
 
Chalnoth said:
There are also ways to increase FP16 performance by having units dedicated to FP16. You could, for example, have each FP32 unit also have a FP16 unit in parallel. In this way, when executing FP16, peak performance would double.

Depending on how many transistors it takes, it may also be an option to make that single FP32 unit operate as two parallel FP16 units.
I'd say that the latter is the best solution of the two you present. After all, why spend transistors on specialized, lower-precision units when you can add more existing high-precision units? If an IHV can beef up FP16 speed by making an FP32 unit double as a twin FP16 unit, I'll certainly agree that FP16 has some merit, as this would be an example of how the FPUs themselves make one precision or the other faster.
 
Ostsol said:
Chalnoth said:
There are also ways to increase FP16 performance by having units dedicated to FP16. You could, for example, have each FP32 unit also have a FP16 unit in parallel. In this way, when executing FP16, peak performance would double.

Depending on how many transistors it takes, it may also be an option to make that single FP32 unit operate as two parallel FP16 units.
I'd say that the latter is the best solution of the two you present. After all, why spend transistors on specialized, lower-precision units when you can add more existing high-precision units? If an IHV can beef up FP16 speed by making an FP32 unit double as a twin FP16 unit, I'll certainly agree that FP16 has some merit, as this would be an example of how the FPUs themselves make one precision or the other faster.
The problem is that this is not taking into account the negative aspects of FP16, which will become more and more evident as shader use, and the complexity thereof, become more common.

After a while even FP24 will begin to show some signs of weakness. FP16, however, will start splitting at the seams as early as late next year. What if you want to sample a texture that is 2048x2048, or hit other cases outside the limitations of FP16?

It makes no sense whatsoever to support FP16 currently, and it is flat-out insanity to push it with a new product coming out in Q1 of next year. There are several reasons for this, and they are self-evident. IMO it borders on dishonesty for some of the people in this thread to be staunch supporters of FP16 when they know damn well it's not a wise course of action.

If even FP24 is going to go the way of the Dodo, why in the hell are people arguing for FP16 just because one IHV made such poor decisions over the last year?

The future, even the near future, is pure FP32, and that's all there is to it.
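The 2048 figure is concrete, for what it's worth: IEEE 754 half precision carries 11 significant bits (one implicit plus ten stored), so whole numbers above 2048 can no longer be represented exactly, which matters for addressing large textures. A quick Python check via the struct module's half-precision format:

```python
import struct

def to_fp16(x):
    # round-trip through IEEE 754 half precision ('e' format, Python >= 3.6)
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Whole numbers survive the round trip exactly only up to 2^11 = 2048...
assert to_fp16(2048.0) == 2048.0
# ...beyond that, the ten stored mantissa bits run out and values round
# to a representable neighbour:
assert to_fp16(2049.0) != 2049.0
```

So an FP16 texture coordinate cannot even name every texel of a 2048-wide texture once the math leaves the normalized [0, 1] range.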
 
DemoCoder said:
I think the most likely multiprecision path of the future will be FX16/FP32, since PS3.0 adds integer registers, and OpenGL 2.0 specs integers in GLSLang, and since it makes sense for loops and other operations.
Wouldn't something like this cause even more problems? Let's say for a minute that both ATI and NV implement these two precisions... what happens to all those cards from the past that used FP16 and not FX16? And even more importantly, what happens when an application calls for FP16 and gets FX16? I really don't see any precision changes happening, other than ATI possibly moving to FP32 with the R420, until DX10 is released. It would just screw things up further IMHO... and lead to even more pointless threads like this one (which I must admit is damn amusing) and confusing architectures.
 
Hellbinder said:
The problem is that this is not taking into account the negative aspects of FP16, which will become more and more evident as shader use, and the complexity thereof, become more common.

After a while even FP24 will begin to show some signs of weakness. FP16, however, will start splitting at the seams as early as late next year. What if you want to sample a texture that is 2048x2048, or hit other cases outside the limitations of FP16?

It makes no sense whatsoever to support FP16 currently, and it is flat-out insanity to push it with a new product coming out in Q1 of next year. There are several reasons for this, and they are self-evident. IMO it borders on dishonesty for some of the people in this thread to be staunch supporters of FP16 when they know damn well it's not a wise course of action.

If even FP24 is going to go the way of the Dodo, why in the hell are people arguing for FP16 just because one IHV made such poor decisions over the last year?

The future, even the near future, is pure FP32, and that's all there is to it.
Indeed, if FP16 is inadequate in a situation, that's where FP32 is used. My only issue with FP16 is when potential FP32 performance is excessively compromised because of FP16. Not all shaders are really long and complex, though. Not all shaders will see a build-up of inaccuracies as a result of low precision. Simple, flat texturing operations, for example, will never need floating-point precision. Of course, one could always just use the fixed-function pipeline in some instances, but what if the fixed-function pipeline is emulated using shaders? In that case, one will want the fastest emulation possible. If it can be gained using lower precision with no quality penalty, then there's no problem with that.

Now, I'm not turning around and suddenly supporting FP16 entirely. As I said: I support it so long as its existence does not excessively compromise potential FP32 performance. That is because, by itself, it does not add anything at all to graphics programming. Performance is the only thing it can possibly add. It does not allow for special functionality within shader programming, nor does it provide the possibility of certain effects -- except for providing additional performance, but that's only good when the result won't compromise quality. As such, FP16 becomes a bonus -- a potential way to get performance when it is needed.
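That build-up of inaccuracies is easy to demonstrate: round an accumulator to FP16 after every add, as a long shader chaining low-precision instructions effectively does, and the running sum stalls once the term drops below half a unit in the last place. A Python sketch using the struct module's half-precision format:

```python
import struct

def fp16(x):
    # round x to IEEE 754 half precision and back ('e' needs Python >= 3.6)
    return struct.unpack('<e', struct.pack('<e', x))[0]

STEPS, TERM = 10000, 0.001

full = 0.0
half = 0.0
t = fp16(TERM)
for _ in range(STEPS):
    full += TERM           # double-precision accumulator: ends near 10.0
    half = fp16(half + t)  # rounded to FP16 after every add: stalls at 4.0

print(full, half)
```

The same chain that is harmless at FP32 loses more than half the result at FP16, which is exactly why precision choice has to be made per shader, not globally.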
 
Razor04 said:
Wouldn't something like this cause even more problems? Let's say for a minute that both ATI and NV implement these two precisions... what happens to all those cards from the past that used FP16 and not FX16? And even more importantly, what happens when an application calls for FP16 and gets FX16?
Huh? If FP16 is missing, then the natural choice would be to use the next floating-point precision upwards; FP24 or FP32, whichever one is available. FX16 should only ever be used when the programmer explicitly asks for a fixed-point type; any other FP16->FX16 replacement is either a bug or a cheat.
 
arjan de lumens said:
Razor04 said:
Wouldn't something like this cause even more problems? Let's say for a minute that both ATI and NV implement these two precisions... what happens to all those cards from the past that used FP16 and not FX16? And even more importantly, what happens when an application calls for FP16 and gets FX16?
Huh? If FP16 is missing, then the natural choice would be to use the next floating-point precision upwards; FP24 or FP32, whichever one is available. FX16 should only ever be used when the programmer explicitly asks for a fixed-point type; any other FP16->FX16 replacement is either a bug or a cheat.
Yea... but what would happen if there are only two precisions implemented: FX16 and FP32? If the _pp hint defaults to the lower of the two precisions, as it is supposed to (at least as I understand it), then it would be using FX16 whether the developer intended FX16 or FP16.
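The fallback rule arjan describes can be written down as a small selection function. This is purely illustrative (no real driver resolves formats from strings like these); the point is that a _pp request walks up the floating-point formats and never lands on fixed point:

```python
# Illustrative sketch of the _pp fallback rule: a partial-precision
# request takes the next floating-point format upward; fixed point is
# never substituted unless the programmer explicitly asks for it.
def resolve_partial_precision(available):
    for fmt in ("FP16", "FP24", "FP32"):
        if fmt in available:
            return fmt
    raise ValueError("no floating-point format available")

assert resolve_partial_precision({"FP24"}) == "FP24"          # R300-style
assert resolve_partial_precision({"FX16", "FP32"}) == "FP32"  # skips FX16
```

Under that rule, Razor04's FX16/FP32 scenario is unambiguous: _pp resolves to FP32, and FX16 is only used where the shader asks for an integer type.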
 
DemoCoder said:
Still can't let go of that fairy tale that IHVs finalize their HW only after the spec is worked out, eh? I guess ATI won't even write one line of code for beyond 3.0+ features until Microsoft hands down the DirectX Next spec and blesses it?

Are you saying that they didn't have a speck of other work to do aside from waiting on the final determination of pixel-pipe precision and then designed the chip from ground-up right then? Or that it would have been too hard to add in the silicon to move them to 32 from 24? (Contrary to what they've already said.)

I'm sure they had plenty down and pre-planned, and I'm also pretty sure they would have had alternate designs at the ready to move on whatever precision was ultimately adopted--since they were trying to make an architecture to take the most advantage of DX9 and all. FP24 may have been their hoped-for design and the one they were pushing for (along with other IHVs and what MS themselves wanted, I suppose, since that's what was adopted), but I don't see it as being beyond them to have added what was necessary to their designs should the API have required FP32 as minimum full precision. (In which case they'd still rather have the better implementation, and the _pp hint would still have had to be added in for nVidia to get them running at functional speeds? Not to mention XGI and S3 would be crying even more at trying to enter a market getting even lower yields.)

AFTER this fact NV revealed their multiple precision setup and *lobbied* for this to be supported. DX9.0b (with some bug fixes as well) along with the PS2_0_x targets introduced the _pp hint to support this.

They also had a more complex design that they were hoping would cover all bases WITHOUT having to wait on anything (since obviously there was no chance at aligning above FP32), and if FP32 hadn't come at such a massive penalty they would have had all their bases covered AND offered the most flexible hardware for developers to take advantage of. Since it did, though, _pp hinting became a necessity.

Chalnoth said:
And it is unlikely that nVidia will abandon the current architecture that we see in the NV3x. They will, instead, work on its major faults. It seems pretty certain, then, that nVidia will reduce the register usage performance hit for the NV4x (hopefully they will also improve the AA and anisotropic filtering...), but the NV4x will work on the same basic paradigm.

Indeed, I rather don't think they'll change much for NV40 either. (Nor ATi with R420) If they can get FP32 to function at reasonable speeds, and improve their FP performance in general, they'll be able to offer the added flexibility and performance (since invariably their FX12 and FP16 will still run faster, when more is not needed) without forcing so much on the lower end. Of course the transistor counts on these babies keep skyrocketing, and I'm not sure how much more they're willing to put on without blowing a gasket. ;) For that reason I think they could perhaps have shifted away from FX12 to concentrate on pumping FP16/32 as much as possible, but think it's a fairly small chance offhand for this gen. (Much more likely with NV50.)
 