Has FP24 been a limitation yet?

Discussion in 'General 3D Technology' started by nelg, Dec 12, 2004.

  1. nelg

    Veteran

    Joined:
    Jan 26, 2003
    Messages:
    1,557
    Likes Received:
    42
    Location:
    Toronto
    Ok, now I am down to having no examples :). Serves me right for having taken a drunk's word for it. Sorry Rev. :wink:

    So hindsight has proven your decisions to be prudent. May I ask, if SM3.0 did not require FP32 in the PS, would ATI still have stuck with 24-bit (feel free to divulge any secret info)?

    Edit. Answered in your response to Joe.
     
  2. sireric

    Regular

    Joined:
    Jul 26, 2002
    Messages:
    348
    Likes Received:
    22
    Location:
    Santa Clara, CA
    I tend to agree. However, the forward-looking aspect has to be tempered by costs and schedule. We do tend (both IHVs) to put in features that are forward-looking and a little risky. That's a risk that seems to have a good reward potential.

    The worst problem would be if the API were so strict as to make any sort of deviation from it impossible -- How would we innovate then? The cheapest part that implemented everything with the best performance would win. Regretfully, that's not really appealing to me, as a designer. It would also cause stagnation in between API changes. Not very fun, either for IHVs or ISVs.
     
  3. sireric

    Regular

    Joined:
    Jul 26, 2002
    Messages:
    348
    Likes Received:
    22
    Location:
    Santa Clara, CA
    Who knows? Though if max texture size increases, possibly some increase would be required. As well, Marketing pressures do exist (i.e. pointless mine-is-bigger-than-yours without any sort of backing).

    I'm a strong believer that if it's not broken, don't fix it. With a limited schedule and resources, we have to limit the things we can change. If something isn't a bottleneck or broken, then leaving it alone is good.
     
  4. rwolf

    rwolf Rock Star
    Regular

    Joined:
    Oct 25, 2002
    Messages:
    967
    Likes Received:
    51
    Location:
    Canada
    Considering that Nvidia has been getting by just fine with FP16 shader precision, I would have to agree with sireric that FP24 was good enough and FP32 is currently overkill.
     
  5. rwolf

    rwolf Rock Star
    Regular

    Joined:
    Oct 25, 2002
    Messages:
    967
    Likes Received:
    51
    Location:
    Canada
    I think that FP24 IS a limitation when you start using the VPU for general-purpose computing, however. I have read lots of articles about NVIDIA products being used for general-purpose computing and haven't seen any for ATI (not saying there aren't any).

    I think there is lots of potential to move physics, AI, and other math oriented operations onto the VPU. Too many games are CPU bound.
     
  6. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    Depends upon what they're used for. One could conceivably use vertex textures to get around the limitation that DX9 doesn't support render to vertex buffer, for instance.
     
  7. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    This is not true in general. FP16 is unusable for texture addressing. FP16 is the format that should be used for most calculations done on color values, however, as the final color output will, with current devices, always be 8-bit.
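    A small sketch of why FP16 falls over for texture addressing (my own example, not from the thread; NumPy's float16 has the same 1-sign/5-exponent/10-mantissa layout as GPU FP16). Near u = 1.0 the spacing between representable FP16 values is 2^-11 ≈ 0.00049, coarser than the 1/4096 step between texel centres of a 4096-texel texture, so distinct texels collapse onto the same address:

```python
import numpy as np

WIDTH = 4096
# Exact texel-centre coordinates in float64: (i + 0.5) / WIDTH
centers = (np.arange(WIDTH) + 0.5) / WIDTH
as_fp16 = centers.astype(np.float16)

# How many of the 4096 addresses survive the FP16 round trip?
distinct = len(np.unique(as_fp16))
print(distinct)   # fewer than 4096: neighbouring coordinates alias

# The texel actually fetched after FP16 rounding:
fetched = np.floor(as_fp16.astype(np.float64) * WIDTH).astype(int)
collisions = WIDTH - len(np.unique(fetched))
print(collisions)  # > 0: different texels resolve to the same fetch
```

Colour maths is far more forgiving, which is why FP16 is fine there: an error of one part in two thousand is invisible in an 8-bit output but is a whole-texel error in addressing.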
     
  8. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    What's so special about pixels that make their precision requirements lower than all other areas of computing where 32-bit and 64-bit floating point are the basic building blocks for numerical computation?

    But you asked for an explanation/example : We run into precision problems with FP24 in pixel shaders when re-projecting Z-buffer values into worldspace for deferred shading computations. Won't happen with at least FP32 in pixel shaders. As world sizes increase, we'll eventually run into precision limitations even with FP32 for many algorithms requiring positional computations, and need to move those to (shock, horror) FP64. Those will probably comprise less than 2% of the total FLOPS in our shader code, but it's an area where the results with FP32 eventually won't be satisfactory. Key word being "eventually".
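    A minimal sketch of the effect described above, with assumed near/far planes and my own quantisation helper (FP24 taken as ATI's 16 explicit mantissa bits, FP32 as 23; the scene values are invented for illustration). Reconstructing eye-space depth from a stored Z-buffer value amplifies any rounding of that value, and the FP24 round trip loses far more than the FP32 one:

```python
import math

def quantise(x, mantissa_bits):
    """Round x to `mantissa_bits` explicit mantissa bits (plus the implicit bit)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (mantissa_bits + 1)   # +1 accounts for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)

near, far = 0.1, 10000.0                 # assumed projection planes
z_eye = 5000.0                           # true eye-space depth of the sample

# Post-projection Z-buffer value for z_eye (standard perspective mapping)
d = (far / (far - near)) * (1.0 - near / z_eye)

def reconstruct(d_stored):
    """Invert the perspective mapping to recover eye-space depth."""
    return near * far / (far - d_stored * (far - near))

err_fp24 = abs(reconstruct(quantise(d, 16)) - z_eye)
err_fp32 = abs(reconstruct(quantise(d, 23)) - z_eye)
print(err_fp24, err_fp32)   # the FP24 round trip is off by hundreds of units here
```

The amplification comes from the hyperbolic Z distribution: most of the buffer's values crowd near 1.0, exactly where a shorter mantissa has the least absolute resolution.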

    Look, let's not go over all of this again. How big a problem is the example I gave for game development? I don't know. How high up the priority list (of IHVs) is the problem I gave when it comes to hardware 3D features and specs that aim to advance game graphics? I don't know. I have absolutely no idea if the example problem I gave will rear its head in a game. I have absolutely no idea if some of the things I consider personally annoying have the same degree of annoyance for game developers. I'm just stating some obvious things/facts. Whether these problems really matter to game developers as they go about pushing the game development envelope (an evolution of gameplay that makes ever more demands of 3D graphics) is something that probably deserves a 20-page article.

    Remember, I never said FP24 is bad. And remember what andypski said in that thread -- he thinks I brought up that topic because I was very pleased with the R300 (he's right) and I was just simply wondering what would (have) happen(ed) if the extra 8bits made it into the R300.

    I'm not a game developer... but let's just pretend I am, say, John Carmack (hehe) and I bring this up with ATI... how would you guys respond? Will you educate John Carmack on options to solve his nit-picking? Will you tell him not to make his world that big? I'm not trying to be smart again here... I'm truly interested in what goes on between IHV DevRels and ISVs when it comes to cases where an ISV brings up hardware limitations that exist for a game/engine he/she wants to design.
     
  9. WaltC

    Veteran

    Joined:
    Jul 22, 2002
    Messages:
    2,710
    Likes Received:
    8
    Location:
    BelleVue Sanatorium, Billary, NY. Patient privile
    I really think that nV probably has invested a lot more towards convincing general consumer markets of the benefits of its present "full SM3.0 precision" than with game developers as a group. Too, each developer is in a decidedly different state with regard to its own in-house production tools which have been engineered to support SM3.0 in meaningful and practically beneficial ways. So far, the best developer application of it seems strictly superficial in terms of providing something unique and worthwhile.

    What made fp24 so good at the R300 introduction (and so compelling for API adoption) was not "fp24" in and of itself, but rather the quality and practical benefits exemplified by the R300 implementation of it. That implementation was not only better in quality than nV's various nV3x colour-precision modes up to and including fp16; nV3x's fp32, while "supported", was practically unworkable for developers because it was far too slow (meaning much slower than R3x0's fp24), and thus of no real benefit to the consumers who bought it expecting that fp32 support to be worth buying for the practical benefits they'd receive from it. I personally don't think a discussion of "fp32" is worthwhile when it concerns anything outside the specific implementations currently offered by the IHVs--confusing the abstract with the concrete doesn't seem all that helpful to me.

    Not a good idea at all to confuse cpu "bitness" with fp graphics pipeline "bitness", because there is such a big and fundamental difference (i.e., the difference between a P4 and an A64 is not the difference between fp24 and fp32 in a graphics chip pipeline, and the difference in "bitness" is merely the tip of the iceberg)...;)

    As to why R300 went fp24 when nV went fp32 with nV3x, I should think that would be obvious. Having access to the same general theoretical and practical knowledge as to manufacturing processes--indeed, using the same FABs even--what was in my view most strikingly different between nV3x and R3x0 was the difference in the professional judgment the two companies employed. nV3x wound up so far behind R3x0, imo, because of the general gpu design decision differences between the two companies, which boils down to strictly a matter of judgment.

    ATi believed that, first of all, .13 micron manufacturing capability at the time wasn't suitable for fp32-precision gpu pipelines, and that .15 would be better for yields while also allowing them to go to fp24 and 8 pixel-per-clock maximums at the same time. nV, otoh, staked everything on the increased gpu clocks that it believed .13 would provide, and fp16 was expected to counter the handicap of the nV30/5/8 maximum of 4 pixels per clock--the company had been overly dependent on 3rd-party FAB manufacturing-process improvements for several years prior to nV3x and nV3x simply proves it. Hindsight is indeed 20-20 and clearly shows that ATi's judgment for R3x0 design was far better than nV's with its nV3x design.

    It sort of reminds me of the old if-a-tree-falls-in-a-forest-with-no-one-around, does-it-make-a-sound question...;) If not for R3x0 would nV4x0 ever have been conceived? (I rather think not.)

    I don't think it a matter so much of rocket science as I think it is a matter of judgment--you can, for instance, design what you think is the best cpu on Earth but if nobody can build it, or build it to run to promise, then it amounts to nothing or little. Engineering design decisions divorced from manufacturing practicalities are simply disasters in the making. Sometimes you get lucky, but mostly you don't. "99% perspiration and 1% inspiration" is a good rule of thumb to follow I believe, as the saying goes...:D
     
  10. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    You've just about answered your own question. What's so special about computation tasks that mean you need both 32-bit and 64-bit floating point support?

    One costs more (in terms of performance, storage space, whatever) and is overkill for some applications. The judgement call was that 24-bit was correct for the current timeframe, and (it seems to me) it was the right one.
     
  11. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Yeah, but vertex textures are not really useful unless you do something to them via pixel shaders.

    Having said that, I do share your doubts about whether we really need FP32 vertex textures. Only in very isolated circumstances will it be necessary, and we've got a long time to wait (my guess is 3 years or so) before that even has the possibility of showing up in games. Developers aren't even using vertex textures now, and you'd need a sophisticated application like a physics simulation with delicate equilibrium conditions to run into FP24 problems. Water and cloth simulation, which I think will be the first major contributions from vertex textures to gaming, will be absolutely fine with FP24. Even I16 (FX16?) would be sufficient here, much better than FP16 (which isn't that bad itself).

    I say carry on with what you're doing, ATI. NVidia dug themselves a hole by promoting FP32 and slagging FP24 so much, and now they have to bear the die cost.
     
  12. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Lots of things.

    Why is 32b used everywhere else? CPUs execute one instruction at a time (okay, they're superscalar now, but it's still only a few instructions). The actual multipliers/dividers/etc. on them aren't numerous, and don't take up much space. That's why they can calculate at 80-bit FP precision without performance loss. Only with SSE & SSE2 are you sometimes limited to 32- or 64-bit precision, because now you're going parallel, and both die costs and memory bandwidth add up. That's why there never really was a benefit to lower-precision internal calculation on CPUs.

    FP16 is inadequate for too many applications (fewer than 3 decimal places in scientific notation), but 32-bit isn't. There's a wide range in between, and 32-bit isn't the absolute minimum required. They could choose something else, but then you either have alignment problems, or need to store it in 32 bits of space (like ATI does externally) and waste space for years to come, because the standards must hold. With CPUs, reproducibility is critical, so one CPU maker can't make a compromise like FP24. Everyone must get precisely the same outcome when running programs.
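    As a rough check of the digit counts behind this argument (mantissa widths are the standard ones, with FP24 taken as ATI's s1e7m16 layout; each format also carries one implicit leading bit):

```python
import math

# Explicit mantissa bits per format; +1 below for the implicit leading bit
formats = {"FP16": 10, "FP24 (ATI)": 16, "FP32": 23}
digits = {name: (bits + 1) * math.log10(2) for name, bits in formats.items()}
for name, d in digits.items():
    print(f"{name}: ~{d:.1f} significant decimal digits")
```

So FP16 carries roughly three decimal digits, FP24 about five, and FP32 about seven, which is the gap the whole thread is arguing over.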

    Pixels produce an image, though, and 100% exact reproducibility is not required. Furthermore, you have a much more confined set of calculations. As Chalnoth points out many times, FP16 is plenty for colour calculations, including HDR, so it's a great storage format. We're limited by the eye's ability to discern colours here. For graphics spanning very large ranges of distances, you might need FP32 for vertices, but they just tell you where the pixels go, and so are also "allowed" to be less than perfect, especially given the finite resolution of the screen. Texture coordinates and interpolation, OTOH, are examples of where you subtract similar numbers and use the difference. So far it seems fine, but FP24 could be lacking for dependent texture coordinates if they index into a texture that's big, repeats a lot, and is also viewed closely. A confluence of factors like that is extremely rare, though.

    So yes, graphics are different. FP32 is used in general computation because die space issues are small, FP16 isn't enough, and you can't change your mind down the road. FP24 is just an empirically practical choice for graphics, plain and simple. GPU design takes place under much more flexible constraints.
     
  13. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    To answer the original question, I treat both ATI (FP24) and NVIDIA (FP32) as if they were floats (i.e. treat them exactly like I treat all floating-point calculations) and have never had a problem.

    So I'd say FP24 was good enough that I don't have to think about it...

    Whether that means I'm not clever enough to really need FP32, I couldn't say :)

    And yes, having to go through my shaders that work at full speed on ATI, reducing things to FP16 on GFFX, is a pain.
    Maybe a trained monkey can do it, but do you know the cost of a degree education in bananas...
     
  14. rwolf

    rwolf Rock Star
    Regular

    Joined:
    Oct 25, 2002
    Messages:
    967
    Likes Received:
    51
    Location:
    Canada
    And that is the real key, isn't it? Ease of use for developers. The rest of the argument is moot.
     
  15. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    If we talk about precision, why use MIP-maps and detail textures? Why not store all textures in FP32? And why 32 bits? Just because it seems large enough, it is an easy multiple of 8 and most CPUs use it? In that case, you could make a very convincing case for 80 bits. And why not 100 bits then, just to make sure? And require four such values to be computed at the same time, to ease calculating vectors and color values...

    I wrote a program on an 8-bit Sinclair Spectrum long ago to calculate factorials and the value of SQR(2) as exactly as its memory could hold the bits that made up the gigantic numbers, which took minutes to scroll past on my screen. The more general-purpose the processing elements, the less it matters.

    Everything is gradual and needs to fit the task it is designed for.
     
  16. nelg

    Veteran

    Joined:
    Jan 26, 2003
    Messages:
    1,557
    Likes Received:
    42
    Location:
    Toronto
    Thanks Deano, that answers my question. :)
     
  17. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    The problem solved by mip-mapping (aliasing/popping/sparkling because features in the texture map become smaller than 1 pixel; mip-mapping is a somewhat crude way of simulating the process of taking multiple samples per pixel) is not the same problem that FP32 would solve.
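    A minimal sketch of the idea arjan describes, mip-mapping as per-pixel selection of a pre-filtered map (my own simplified LOD rule, not any specific hardware's): pick the level whose texels are roughly pixel-sized, so features smaller than a pixel have already been averaged away.

```python
import math

def mip_level(texels_per_pixel, num_levels):
    """Pick the mip level whose texels roughly match one screen pixel.

    texels_per_pixel: how many base-level texels one screen pixel covers.
    """
    if texels_per_pixel <= 1.0:
        return 0                          # magnification: use the base level
    level = math.log2(texels_per_pixel)   # each level halves the resolution
    return min(int(round(level)), num_levels - 1)

# A distant surface where one pixel spans 8x8 base texels picks level 3,
# the pre-filtered 1/8-resolution map, instead of sparkling/aliasing.
print(mip_level(8.0, 12))   # → 3
```

This is the "crude simulation of multiple samples per pixel": the averaging was done once at texture-build time rather than per frame. Precision of the arithmetic is a separate axis entirely, which is arjan's point.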
     
  18. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    For storage, you want to store objects in power of 2 sizes to make the memory controllers simpler.
     
  19. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    Can you expand on this? What is your particular shader where precision can matter a great deal (or not) to you? What does "full speed" mean in this context?
     
  20. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    It means that when all the shaders work at a good FPS on ATI or NV40 cards but suck on GFFX, I have to go through the shader code looking for places to apply partial precision.

    This is harder than it sounds: unlike teapot renderers, many games (e.g. Valve, Crytek, Ninja Theory) use auto-generated shaders. This makes adding partial precision much harder, as truncating the precision too early in the shader code can look fine on some shaders and rubbish on the more complex ones. Our current system has the ability to override (by material name) shaders from the auto-generated ones, but that takes work to a) find which shaders need optimising and b) optimise them.
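    An illustrative sketch (my own toy example, not any game's shader code) of why truncating precision early in a long auto-generated shader is risky: the same value pushed through a long dependent chain of operations drifts much further in FP16 than in FP32, because every intermediate result is rounded.

```python
import numpy as np

def chain(x, steps, dtype):
    """Run x through a long dependent chain of multiplies at a given precision."""
    x = dtype(x)
    k = dtype(1.001)          # per-step factor, itself rounded to `dtype`
    for _ in range(steps):
        x = dtype(x * k)      # every intermediate result is rounded to `dtype`
    return float(x)

reference = chain(1.0, 1000, np.float64)
err_half = abs(chain(1.0, 1000, np.float16) - reference) / reference
err_full = abs(chain(1.0, 1000, np.float32) - reference) / reference
print(err_half, err_full)    # FP16 drifts visibly; FP32 barely moves
```

A short shader tolerates the FP16 rounding; a complex auto-generated one compounds it, which is why a blanket partial-precision pass "looks fine on some shaders and rubbish on the more complex ones".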

    As the majority of the cards we are targeting (ATI R3x0, R4x0 and NV40) generally don't need this work, having to do it just for NV3x is a pain.

    I think the problem that is often missed in this discussion is that in games we don't really work on 'a' shader but on lots of them. I don't actually know how many pixel shaders we have, but I know the total (vertex and pixel) is over 6000.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.