Has FP24 been a limitation yet?

sireric said:
I'm sorry, but that's not an example of FP24 limitations. Comparisons between completely different rasterizers (refrast and ATI's) will lead to tons of differences. That's why DCT/WHQL specifies tolerances. There's no way the two can be the same unless the rasterization algorithm (including setup, scan conversion, iteration, etc.) is identical in both. I hope this never comes to be, though who knows in the future.
Ok, now I am down to having no examples :). Serves me right for having taken a drunk's word for it. Sorry Rev. ;)

So hindsight has proven your decisions to be prudent. May I ask: if SM3.0 did not require FP32 in the PS, would ATI still stick with 24b (feel free to divulge any secret info)?

Edit: Answered in your response to Joe.
 
Chalnoth said:
Joe DeFuria said:
I also have not been shown ANY convincing case where FP32 pixel shaders makes a quality difference for gaming. (At least, for the near to mid-term.)
And I feel that video cards should be forward-looking. As texture sizes continue to increase, for instance, it would be rather disappointing to suddenly realize that artifacts become visible in dependent texture reads that weren't there for normal texture reads.

I tend to agree. However, the forward-looking aspect has to be tempered by costs and schedule. We do tend (both IHVs) to put in features that are forward-looking and a little risky. That's a risk that seems to have good reward potential.

The worst problem would be if the API were so strict as to make any sort of deviation from it impossible -- how would we innovate then? The cheapest part that implemented everything with the best performance would win. Regrettably, that's not really appealing to me as a designer. It would also cause stagnation in between API changes. Not very fun, either for IHVs or ISVs.
 
nelg said:
sireric said:
I'm sorry, but that's not an example of FP24 limitations. Comparisons between completely different rasterizers (refrast and ATI's) will lead to tons of differences. That's why DCT/WHQL specifies tolerances. There's no way the two can be the same unless the rasterization algorithm (including setup, scan conversion, iteration, etc.) is identical in both. I hope this never comes to be, though who knows in the future.
Ok, now I am down to having no examples :). Serves me right for having taken a drunk's word for it. Sorry Rev. ;)

So hindsight has proven your decisions to be prudent. May I ask: if SM3.0 did not require FP32 in the PS, would ATI still stick with 24b (feel free to divulge any secret info)?

Edit: Answered in your response to Joe.

Who knows? Though if max texture size increases, possibly some increase would be required. As well, marketing pressures do exist (i.e. pointless mine-is-bigger-than-yours without any sort of backing).

I'm a strong believer that if it's not broken, don't fix it. With limited schedule and resources, we have to limit the things we can change. If something isn't a bottleneck or broken, then not fixing it is good.
 
Considering that Nvidia has been getting by just fine with fp16 shader precision, I would have to agree with sireric that fp24 was good enough and fp32 is currently overkill.
 
I think that fp24 IS a limitation when you start using the VPU for general-purpose computing, however. I have read lots of articles talking about NVIDIA products being used for general-purpose computing and haven't seen any for ATI (not saying there aren't any).

I think there is lots of potential to move physics, AI, and other math-oriented operations onto the VPU. Too many games are CPU bound.
 
OpenGL guy said:
You really think that you need the full range of FP32 for vertex textures?
Depends upon what they're used for. One could conceivably use vertex textures to get around the limitation that DX9 doesn't support render to vertex buffer, for instance.
 
rwolf said:
Considering that Nvidia has been getting by just fine with fp16 shader precision
This is not true in general. FP16 is unusable for texture addressing. FP16 is the format that should be used for most calculations done on color values, however, as the final color output will, with current devices, always be 8-bit.
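
To see why FP16 fails for texture addressing, here's a minimal numeric sketch (not from the original post), using numpy's float16 as a stand-in for shader FP16 and an assumed 4096-texel texture:

```python
import numpy as np

TEX = 4096  # texels along one axis of a large texture (assumed)

# Near u = 1.0, FP16's step size (2**-11) spans two texel centers (1/4096 each),
# so distinct texels collapse together or snap onto the wrong texel.
for i in range(TEX - 4, TEX):
    u = (i + 0.5) / TEX            # exact texel-center coordinate
    u16 = float(np.float16(u))     # the same coordinate squeezed through FP16
    print(i, "->", int(u16 * TEX)) # 4092->4092, 4093->4094, 4094->4094, 4095->4096
```

Texels 4093 and 4094 become indistinguishable and 4095 lands past the edge of the texture. FP24's 16-bit mantissa pushes the same failure out to texture sizes (beyond 2^17 texels) far larger than hardware of the time supported.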
 
OpenGL guy said:
Reverend said:
I think we have to look at a wider picture (that some probably hadn't thought of) -- that of the business of making games. Games such as space-constrained first-person shooters (the perfect latest example is Doom3) atm gain very little from FP32. This is not due to a lack of 3D knowledge among such game programmers but probably to a collective lack of creativity within a single development house. The trend appears to be that gamers want bigger "worlds" to play in/explore, and this applies to first-person shooters as well. Bigger worlds have much bigger demands, and higher precision is one example of such demands.
What does any of this have to do with FP24 vs. FP32 in the pixel shader? All R3xx/R4xx products use FP32 in the vertex shader and it's vertices that define a large world, right? Please explain how FP24 pixel shaders somehow limit your "bigger world".
What's so special about pixels that makes their precision requirements lower than in all other areas of computing, where 32-bit and 64-bit floating point are the basic building blocks for numerical computation?

But you asked for an explanation/example: We run into precision problems with FP24 in pixel shaders when re-projecting Z-buffer values into worldspace for deferred shading computations. That won't happen with at least FP32 in pixel shaders. As world sizes increase, we'll eventually run into precision limitations even with FP32 for many algorithms requiring positional computations, and will need to move those to (shock, horror) FP64. Those will probably comprise less than 2% of the total FLOPS in our shader code, but it's an area where the results with FP32 eventually won't be satisfactory. Key word being "eventually".
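
To make that re-projection example concrete, here is a minimal numeric sketch (not from the original posts). It models FP24 by truncating a float32 mantissa to 16 bits and ignores FP24's narrower exponent range; the near/far planes and the standard perspective depth formula d = A(1 - n/z), A = f/(f-n), are assumptions for illustration:

```python
import struct

def to_fp24(x: float) -> float:
    # Model FP24 (s1e7m16) by truncating float32's 23-bit mantissa to 16 bits;
    # FP24's narrower exponent range is ignored, which is fine for depth in [0, 1].
    bits = struct.unpack('<I', struct.pack('<f', x))[0] & 0xFFFFFF80
    return struct.unpack('<f', struct.pack('<I', bits))[0]

n, f = 0.1, 10000.0       # assumed near/far planes for a "big world"
A = f / (f - n)           # standard perspective projection: d = A * (1 - n/z)

def depth(z):
    return A * (1.0 - n / z)

def unproject(d):
    return n / (1.0 - d / A)   # invert the projection to recover view-space z

for z in (10.0, 100.0, 1000.0, 5000.0):
    z_rec = unproject(to_fp24(depth(z)))
    print(f"true z = {z:7.1f}, rebuilt from FP24 depth = {z_rec:10.1f}")
```

The reconstructed distance drifts by a fraction of a unit around z = 100 but by hundreds of units near the far plane, which is exactly the kind of error that surfaces when world-space positions are rebuilt from stored depth.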

Look, let's not go over all of this again. How big a problem is the example I gave for game development? I don't know. How high up the priority list (of IHVs) is the problem I gave when it comes to hardware 3D features and specs that aim to advance game graphics? I don't know. I have absolutely no idea if the example problem I gave will rear its head in a game. I have absolutely no idea if some of the things I personally find annoying have the same degree of annoyance for game developers. I'm just stating some obvious things/facts. Whether these problems really matter to game developers as they go about pushing the game development envelope (an evolution of gameplay development that makes ever more demands of 3D graphics) is something that probably deserves a 20-page article.

Remember, I never said FP24 is bad. And remember what andypski said in that thread -- he thinks I brought up that topic because I was very pleased with the R300 (he's right) and I was just simply wondering what would (have) happen(ed) if the extra 8bits made it into the R300.

I'm not a game developer... but let's just pretend I am, say, John Carmack (hehe) and I bring this up with ATI... how would you guys respond? Will you educate John Carmack on options to solve his nit-picking? Will you tell him not to make his world that big? I'm not trying to be smart again here... I'm truly interested in what goes on between IHV DevRels and ISVs in cases where an ISV brings up hardware limitations that exist for a game/engine he/she wants to design.
 
Reverend said:
...

DX9 games may be reasonably "plentiful" right now, and not a single one of them may have demonstrated why more than 24 bits of FP precision is required... but that's due to the above paragraph, not to a lack of knowledge among programmers of what's needed for PC 3D graphics to progress.

In any case, this is a simple-looking question with the possibility of complicated answers. ISVs aren't stupid enough to make a game that requires nothing but full SM3.0 precision (32 bits), regardless of how persuasive NVIDIA may have tried to be.

I really think that nV has probably invested a lot more in convincing the general consumer market of the benefits of its present "full SM3.0 precision" than it has with game developers as a group. Too, each developer is in a decidedly different state with regard to its own in-house production tools, which have been engineered to support SM3.0 in meaningful and practically beneficial ways. So far, the best developer application of it seems strictly superficial in terms of providing something unique and worthwhile.

What made fp24 so good at the R300 introduction (and so compelling for API adoption) was not "fp24" in and of itself, but rather the quality and practical benefits exemplified by the R300 implementation of fp24. It was not only better in terms of quality than nV's various nV3x color-precision implementations up to and including fp16; nV3x's "fp32", while "supported", was also practically unworkable for developers because it was far too slow (meaning much slower than R3x0's fp24 support), and thus of no real benefit to the consumers who bought it expecting that nV3x's fp32 support would be worth buying in terms of the practical benefits they'd receive from it. I personally don't think a discussion of "fp32" is really worthwhile when it concerns anything outside of the specific implementations of fp32 currently offered by the IHVs. Confusing the abstract with the concrete doesn't seem all that helpful to me, for these reasons.

2) I think you (and perhaps many others) misread my comments in that thread -- that's not exactly the kind of discussion (i.e. FP24 good enough for games) I was trying to promote. I was trying to figure out why ATI did not, or could not, go with 32 bits in the R300, given that the NV30 has it (although we know what this meant in terms of performance for the NV30), given that the R300 really was swell with FP24, and given what we know of the progress of the DX9 specification's development and of the existing IEEE-754 standard. I am not a hardware engineer, so perhaps folks didn't realize this... at the time of the thread, I had no idea how expensive an extra 8 bits is silicon-wise! :)

Not a good idea at all to confuse cpu "bitness" with fp graphics pipeline "bitness", because there is such a big and fundamental difference (i.e., the difference between a P4 and an A64 is not the difference between fp24 and fp32 in a graphics chip pipeline, and the difference in "bitness" is merely the tip of the iceberg)...;)

As to why R300 went fp24 when nV went fp32 with nV3x, I should think that would be obvious. Both companies had access to the same general theoretical and practical knowledge as to manufacturing processes--indeed, they even used the same FABs--so what was in my view most strikingly different between nV3x and R3x0 was the professional judgment the two companies employed. nV3x wound up so far behind R3x0, imo, because of the general gpu design-decision differences between the two companies, which boil down strictly to a matter of judgment.

ATi believed, first of all, that .13 micron manufacturing capability at the time wasn't suitable for fp32-precision gpu pipelines, and that .15 would be better for yields while also allowing them to go to fp24 and 8-pixel-per-clock maximums at the same time. nV, otoh, staked everything on the increased gpu clocks it believed .13 would provide, and fp16 was expected to counter the handicap of nV30/35/38's maximum of 4 pixels per clock--the company had been overly dependent on 3rd-party FAB manufacturing-process improvements for several years prior to nV3x, and nV3x simply proved it. Hindsight is indeed 20/20 and clearly shows that ATi's judgment in the R3x0 design was far better than nV's with its nV3x design.

It sort of reminds me of the old if-a-tree-falls-in-a-forest-with-no-one-around, does-it-make-a-sound question...;) If not for R3x0, would nV4x ever have been conceived? (I rather think not.)

One thing that intrigues me is whether -- during the (perhaps table-banging) discussions that go on with MS and various IHVs in next-version-of-DirectX-shaping meetings -- each IHV's knowledge of, and capability to take advantage of (in a very, very proficient and efficient way), current and soon-to-be-available process technology is brought up to MS. Which IHV has the bigger brains, so to speak :) . I still have absolutely no idea how the specification process of each DX version starts, progresses, and ends.

I don't think it's so much a matter of rocket science as a matter of judgment--you can, for instance, design what you think is the best cpu on Earth, but if nobody can build it, or build it to run as promised, then it amounts to little or nothing. Engineering design decisions divorced from manufacturing practicalities are simply disasters in the making. Sometimes you get lucky, but mostly you don't. "99% perspiration and 1% inspiration" is a good rule of thumb to follow, I believe, as the saying goes...:D
 
Reverend said:
What's so special about pixels that makes their precision requirements lower than in all other areas of computing, where 32-bit and 64-bit floating point are the basic building blocks for numerical computation?
You've just about answered your own question. What's so special about computation tasks that they need both 32-bit and 64-bit floating point support?

One costs more (in terms of performance, storage space, whatever) and is overkill for some applications. The judgement call was that 24-bit was correct for the current timeframe, and (it seems to me) it was the right one.
 
OpenGL guy said:
Chalnoth said:
OpenGL guy said:
2. Render to vertex buffer/texture.
Render to VB is not supported by the API. I don't see how render to texture is affected any differently than normal rendering.
Here I'm talking about a FP32 texture that would be used in the vertex shader.
You really think that you need the full range of FP32 for vertex textures?
This, then, would be an argument for those stating that the FP32 requirement for PS3 is ridiculous.
Not at all. What it argues against is the need for FP32 vertex textures.
Yeah, but vertex textures are not really useful unless you do something to them via pixel shaders.

Having said that, however, I agree with you about whether we really need FP32 vertex textures. Only in very isolated circumstances will it be necessary, and we've got a long time to wait (my guess is 3 years or so) before that even has the possibility of showing up in games. Developers aren't even using vertex textures now, and you'll need a sophisticated application like a physics simulation that has delicate equilibrium conditions for you to run into FP24 problems. Water and cloth simulation, which I think will be the first major contributions from vertex textures to gaming, will be absolutely fine with FP24. Even I16 (FX16?) would be sufficient here, much better than FP16 (which isn't that bad itself).
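
Some rough numbers behind that sufficiency claim (a sketch, not from the original post; the s1.15 fixed-point layout for FX16 is an assumption):

```python
import math

def fp16_ulp(x: float) -> float:
    e = math.floor(math.log2(abs(x)))  # binary exponent of x
    return 2.0 ** (e - 10)             # FP16 stores 10 mantissa bits

fx16_step = 2.0 ** -15                 # s1.15 fixed point: uniform step in [-1, 1)
print(fx16_step)                       # 3.0517578125e-05
print(fp16_ulp(0.999))                 # 0.00048828125: 16x coarser near the extremes
```

For a heightfield normalized to [-1, 1), fixed point keeps the same resolution everywhere, while FP16 wastes its precision near zero and gets 16x coarser out at the extremes, which is why I16/FX16 can beat FP16 for this kind of data.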

I say carry on with what you're doing, ATI. NVidia dug themselves a hole by promoting FP32 and slagging FP24 so much, and now they have to bear the die cost.
 
Reverend said:
What's so special about pixels that make their precision requirements lower than all other areas of computing where 32-bit and 64-bit floating point are the basic building blocks for numerical computation?
Lots of things.

Why is 32b used everywhere else? CPUs execute one instruction at a time (okay, they're superscalar now, but it's still only a few instructions). The actual multipliers/dividers/etc. on them aren't numerous, and don't take up much space. That's why they can calculate at 80-bit FP precision without performance loss. Only with SSE & SSE2 are you sometimes limited in precision to 32- or 64-bit, because now you're going parallel, and both die costs and memory bandwidth add up. That's why there never really was a benefit to lower-precision internal calculation.

FP16 is inadequate for too many applications (less than 3 decimal places in scientific notation), but 32-bit isn't. There's a wide range in there, and 32-bit isn't the absolute minimum required. They could choose something else, but then you either have alignment problems, or need to store it in 32 bits of space (like ATI does externally) and waste space for years to come, because the standards must hold. With CPUs, reproducibility is critical, so one CPU maker can't make a compromise like FP24. Everyone must get precisely the same outcome when running programs.

Pixels produce an image, though, and 100% exact reproducibility is not required. Furthermore, you have a much more confined set of calculations. As Chalnoth has pointed out many times, FP16 is plenty for colour calculations, including HDR, so it's a great storage format. We're limited by the eye's ability to discern colours here. For graphics spanning very large ranges of distances, you might need FP32 for vertices, but they just tell you where the pixels go, and so are also "allowed" to be less than perfect, especially given the finite resolution of the screen. Texture coordinates and interpolation, OTOH, are examples of where you subtract similar numbers and use the difference. So far it seems fine, but FP24 could be lacking for dependent texture coordinates if they index into a texture that's big, repeats a lot, and is also viewed closely. A confluence of factors like that is extremely rare, though.
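
As an illustration of that big/repeating/viewed-closely case, here's a small sketch (not from the original post; the texture size and coordinate are made-up values). It models FP24 by truncating a float32 mantissa to 16 bits:

```python
import struct

def to_fp24(x: float) -> float:
    # truncate float32's 23-bit mantissa to FP24's 16 bits (exponent range ignored)
    bits = struct.unpack('<I', struct.pack('<f', x))[0] & 0xFFFFFF80
    return struct.unpack('<f', struct.pack('<I', bits))[0]

TEX = 1024   # texels across one repeat of the texture (assumed)
u = 500.3    # coordinate on a texture that has already wrapped 500 times (assumed)

print((u % 1.0) * TEX)           # ~307.2: the texel we wanted
print((to_fp24(u) % 1.0) * TEX)  # 304.0: FP24 lands ~3 texels away
```

At a coordinate of 500.3, FP24's mantissa step is already 1/256, so after the wrap the fractional part carries only about 8 bits of texel position.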

So yes, graphics are different. FP32 is used in general computation because die space issues are small, FP16 isn't enough, and you can't change your mind down the road. FP24 is just an empirically practical choice for graphics, plain and simple. GPU design takes place under much more flexible constraints.
 
To answer the original question, I treat ATI (FP24) and NVIDIA (FP32) as if they were floats (i.e. treat them exactly like I treat all floating-point calculations) and have never had a problem.

So I'd say FP24 was good enough that I don't have to think about it...

Whether that means I'm not clever enough to really need FP32, I couldn't say :)

And yes, having to go through my shaders that work at full speed on ATI, reducing things to FP16 on GFFX, is a pain.
Maybe a trained monkey can do it, but do you know the cost of a degree education in bananas...
 
DeanoC said:
So I'd say FP24 was good enough that I don't have to think about it...

And that is the real key, isn't it? Ease of use for developers. The rest of the argument is moot.
 
If we talk about precision, why use MIP-maps and detail textures? Why not store all textures in FP32? And why 32 bits? Just because it seems large enough, is an easy multiple of 8, and most CPUs use it? In that case, you could make a very convincing case for 80 bits. And why not 100 bits then, just to make sure? And require four such values to be computed at the same time, to ease the calculation of vectors and color values...

I wrote a program on an 8-bit Sinclair Spectrum long ago to calculate factorials and the value of SQR(2) as exactly as my memory could hold the bits that made up the gigantic numbers, which took minutes to scroll past on my screen. The more general-purpose the processing elements, the less it matters.

Everything is gradual and needs to fit the task it is designed for.
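
In the same spirit as that Spectrum anecdote (a sketch, not from the original post): on a general-purpose processor, precision is just memory, as Python's arbitrary-precision integers show:

```python
import math

print(math.factorial(100))                 # 158 digits, exact

digits = 50
root = math.isqrt(2 * 10 ** (2 * digits))  # floor(sqrt(2) * 10**digits)
print("1." + str(root)[1:])                # 1.41421356237... to 50 decimal places
```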
 
DiGuru said:
If we talk about precision, why use MIP-maps and detail textures? Why not store all textures in FP32? And why 32 bits? Just because it seems large enough, is an easy multiple of 8, and most CPUs use it? In that case, you could make a very convincing case for 80 bits. And why not 100 bits then, just to make sure? And require four such values to be computed at the same time, to ease the calculation of vectors and color values...

Everything is gradual and needs to fit the task it is designed for.
The problem solved by mip-mapping (aliasing/popping/sparkling because features in the texture map become smaller than 1 pixel; mip-mapping is a somewhat crude way of simulating the process of taking multiple samples per pixel) is not the same problem that FP32 would solve.
 
DiGuru said:
If we talk about precision, why use MIP-maps and detail textures? Why not store all textures in FP32? And why 32 bits? Just because it seems large enough, is an easy multiple of 8, and most CPUs use it? In that case, you could make a very convincing case for 80 bits. And why not 100 bits then, just to make sure? And require four such values to be computed at the same time, to ease the calculation of vectors and color values...
For storage, you want to store objects in power-of-2 sizes to make the memory controllers simpler.
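
A minimal sketch of why (not from the original post; the pitch value is an assumption): with power-of-2 dimensions, computing a texel's address needs only shifts and ORs rather than a general multiplier:

```python
PITCH_LOG2 = 11                       # rows of 2048 texels (assumed pitch)

def texel_address(x: int, y: int) -> int:
    # with a power-of-2 pitch, y * pitch + x needs no multiplier in hardware
    return (y << PITCH_LOG2) | x

print(texel_address(5, 3))            # 6149 == 3 * 2048 + 5
```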
 
DeanoC said:
And yes, having to go through my shaders that work at full speed on ATI, reducing things to FP16 on GFFX, is a pain.
Can you expand on this? What is your particular shader where precision can matter a great deal (or not) to you? What does "full speed" mean in this context?
 
Reverend said:
DeanoC said:
And yes, having to go through my shaders that work at full speed on ATI, reducing things to FP16 on GFFX, is a pain.
Can you expand on this? What is your particular shader where precision can matter a great deal (or not) to you? What does "full speed" mean in this context?
It means that when all the shaders are working at a good FPS on ATI or NV40 cards but suck on GFFX, I have to go through the shader code looking for places to apply partial precision.

This is harder than it sounds because, unlike teapot renderers, many games (e.g. Valve, Crytek, Ninja Theory) are using auto-generated shaders. This makes adding partial precision much harder, as truncating the precision too early in the shader code can look fine on some shaders and rubbish on the more complex ones. Our current system has the ability to override (by material name) the auto-generated shaders, but that takes work to a) find which shaders need optimising and b) optimise them.
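
A hypothetical sketch of such an override table (names and structure invented for illustration, not DeanoC's actual system):

```python
# material name -> hand-optimised (e.g. partial-precision) shader source
overrides: dict[str, str] = {}

def register_override(material: str, source: str) -> None:
    overrides[material] = source

def shader_for(material: str, autogen: str) -> str:
    # fall back to the auto-generated shader when nobody has hand-tuned this one
    return overrides.get(material, autogen)

register_override("water_surface", "// ps_2_0 with _pp modifiers for NV3x")
print(shader_for("water_surface", "// auto-generated full-precision source"))
print(shader_for("rock_wall", "// auto-generated full-precision source"))
```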

As the majority of the cards we are targeting (ATI R3x0, R4x0 and NV40) generally don't need this work, having to do it just for NV3x is a pain.

I think the problem that is often missed in this discussion is that in games we don't really work on 'a' shader but on lots of them. I don't actually know how many pixel shaders we have, but I know that the total number of shaders (vertex and pixel) is over 6000.
 