FP - DX9 vs IEEE-32

Discussion in 'General 3D Technology' started by Reverend, Jun 3, 2003.

  1. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    If you need full IEEE compliance and control over evaluation order, you will have to do it in software and live with the fact that it's slower. Of course, there are much better alternatives than the reference rasterizer: swShader. :p

    Otherwise, just be content. Your eyes can't see the difference anyway. Well, I should be speaking for myself... :roll:
     
  2. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    Do you mean allowing the shader compiler to re-order the operations, e.g. assuming associativity or the distributive law? That's risky. As I said, a certain IHV appeared to be using different calculations in the 'fixed T&L' part of the drivers depending on whether shading was on or off, and it definitely caused some major rendering errors (e.g. Z values changing and making objects flicker).
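
    To make the risk concrete, here is a minimal C sketch (values chosen purely for illustration): IEEE-754 single-precision addition is not associative, so a compiler that "assumes associativity" really can change results.

        /* (a+b)+c vs a+(b+c) in IEEE-754 single precision */
        #include <stdio.h>

        int main(void)
        {
            float a = 1e20f, b = -1e20f, c = 1.0f;

            float left  = (a + b) + c;  /* (1e20 - 1e20) + 1 = 1 */
            float right = a + (b + c);  /* c is absorbed: -1e20 + 1 == -1e20,
                                           so 1e20 + (-1e20) = 0 */

            printf("%g vs %g\n", left, right);  /* prints: 1 vs 0 */
            return 0;
        }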
     
  3. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,059
    Likes Received:
    1,021
    A small comment from the scientific computing field.

    Code that critically depends on the minutiae that Reverend brings up is effectively broken. You should never, ever write anything that makes those kinds of assumptions.

    Assuming rounded rather than truncated results is pretty much as far as you can hope for. If you _need_ control, you should explicitly code for it, and never leave it to the system to take care of it for you.
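
    One hedged way to "explicitly code for it" in C (a sketch, not a portability guarantee): route every intermediate through a named accumulator, and mark it volatile so that even an aggressively optimizing compiler cannot reassociate the sum.

        #include <stdio.h>

        /* 'volatile' forces each read/write of acc to actually happen,
           pinning down strict left-to-right evaluation order even under
           flags like -ffast-math. Heavy-handed, but explicit. */
        float sum_in_order(const float *x, int n)
        {
            volatile float acc = 0.0f;
            for (int i = 0; i < n; i++)
                acc += x[i];
            return acc;
        }

        int main(void)
        {
            float x[] = { 1e20f, -1e20f, 1.0f };
            printf("%g\n", sum_in_order(x, 3));  /* 1: order is guaranteed */
            return 0;
        }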

    Now, in scientific computing, codes tend to have very long lives and get ported all over the place, and are thus probably a worst case, but generally the experience should carry over.

    Sireric explained nicely why FP24 is a good compromise for the tasks we ask of this hardware. If you do something else though and need fp32, by all means buy whatever supports it. But making the product significantly slower/costlier for some hypothetical benefit just doesn't make sense. The very same tradeoffs have been made on the CPUs you are currently running on.

    BTW, the above should in no way be construed as endorsing general sloppiness when defining computational tools. From personal experience, I do however endorse extreme suspicion on the part of programmers as far as these issues are concerned. "Just don't count on it."

    Entropy
     
  4. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    The only place where such optimizations would cause problems is on the vertex position output, since it affects fragment depth. Otherwise it should be pretty safe. In the fragment pipeline I see no reason why it should ever be a problem.
     
  5. darkblu

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,642
    Likes Received:
    22
    what's with the 'fragment pipeline' and 'vertex pipeline'? - it all comes down to 'real' data ending up discrete (actually it's discrete data getting 'grossly more' discrete, so to say, but nevermind). so, until the very final color output of the very last 'pass' of the algorithm at hand you'd want as much error-proofness as possible (in consumer terms - 'as much as money can buy'). saying that people don't scrutinize an artist's work under a microscope is not quite the right analogy - microscopes deal w/ spatial, not so much spectral, precision, and with the latter you don't know whether the artist wouldn't have liked the means to express his vision of a particular color even further than what the 'present art' allowed him. humans strive for perfection - and they wouldn't give it up if they had the means to achieve it (resources & time). in this regard, i'm perfectly fine with the dx9 ps/vs specs, but that does not mean i'm set with those for the rest of my life (any life span expectations aside).

    ps: a pretty please w/ sugar on top goes to the well-respected ati employees who spend their well-deserved but sparse spare time posting on these forums - could you (arbitrarily) improve on the aniso algo for the next parts currently in design? i believe i'm speaking for those ppl of the mindset 'aniso should rather be costlier but nicer'. thank you.

    pps: before anybody gets the wrong idea, my humble opinion is that r3xx is the best dx9 implementation by far for the time being. i just wish it could be a bit better ;)
     
  6. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    Is ATI going with FP24 more akin to 3dfx going with 16-bit only in the Voodoo3, or 3dfx going with 16-bit only in the Voodoo1?

    I think it's more akin to going 16-bit only with the Voodoo1. Sure, FP32 is nice to have, but the applications that would demand it and the technology to support it at fast speeds aren't here yet (I've yet to be convinced that the NV35 can do FP32 shaders at adequate speeds).

    Also, when it comes to color precision, there are diminishing returns. The visible difference between FP24 and FP32 would be much smaller than between FP16 and FP24.
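
    Rough numbers behind the diminishing returns, assuming the usual DX9 layouts (FP16 = s10e5, FP24 = s16e7, FP32 = s23e8). Counting the implicit leading bit, the worst-case relative rounding step of each format is about 2^-(stored mantissa bits + 1):

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            const char *name[] = { "FP16", "FP24", "FP32" };
            int mant[] = { 10, 16, 23 };  /* stored mantissa bits */

            for (int i = 0; i < 3; i++)
                printf("%s: relative step ~ 2^-%d = %g\n",
                       name[i], mant[i] + 1, pow(2.0, -(mant[i] + 1)));
            return 0;
        }

    That prints roughly 4.9e-4 for FP16, 7.6e-6 for FP24 and 6.0e-8 for FP32: the FP16 -> FP24 jump is a 64x improvement, while FP24 -> FP32 refines a step that is already far below a single 8-bit display step and only matters once error accumulates over many shader instructions.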
     
  7. Deflection

    Newcomer

    Joined:
    Jul 16, 2002
    Messages:
    66
    Likes Received:
    0
    Humus's Mandelbrot demo. It's basically a worst-case-scenario demo where precision errors can "spiral" out of control (pardon the pun :) Even there you really have to zoom in to see it. Kind of like the SS rotating floors for AF on the Radeon. The stuff we've seen so far suggests that FP24 can handle pretty much everything that's out there to a very acceptable level.
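
    For anyone who wants to see the "spiral" in miniature, here is a plain-C sketch of the same feedback loop (the constant is an arbitrary point near the set's boundary, chosen only for illustration): the iteration z = z^2 + c feeds each step's rounding error into the next, so float and double orbits drift apart within a few dozen iterations.

        #include <stdio.h>

        int main(void)
        {
            /* arbitrary point near the Mandelbrot boundary */
            double cx = -0.743643887037151, cy = 0.131825904205330;
            float  fx = 0.0f, fy = 0.0f;  /* float orbit  */
            double dx = 0.0,  dy = 0.0;   /* double orbit */

            for (int i = 0; i < 100; i++) {
                float ft = fx * fx - fy * fy + (float)cx;
                fy = 2.0f * fx * fy + (float)cy;
                fx = ft;

                double dt = dx * dx - dy * dy + cx;
                dy = 2.0 * dx * dy + cy;
                dx = dt;
            }
            /* the two orbits no longer agree */
            printf("float (%g, %g) vs double (%g, %g)\n", fx, fy, dx, dy);
            return 0;
        }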

    What I'm not sure of is whether the same can be said of FP16. ATI and MS don't seem to think so, but Humus's demo can't really be used to judge that, because it is a worst-case scenario. The 3DMark demo did show differences too under close examination. The question is, does it fall more on the side of "worst-case scenario" or "real games will see results like these"? The framerates are rather low, which implies intensive shaders that might not make it into DX9 games. Some people on this forum have said textures need the extra precision, but I don't have the knowledge to judge that.

    In any case, we're just now starting to see DX8 pixel shader games. I think it's safe to say the r300 is the best DX9 design so far, but it's tough to say by how much without the games to compare.
     
  8. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    Textures already do need extra precision.

    If you consider the concept of a 'location' in a texture - well, the biggest textures are 2048x2048. That's 11 bits. But for smooth bilinear filtering, you have to have subtexel precision (because the bilinear interpolation factor is the fractional part of the texture coordinate). That's at least four more bits to be acceptable, and might be more like six.
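
    A back-of-envelope version of that bit budget, assuming ATI's FP24 layout is s16e7 and counting each format's implicit leading mantissa bit:

        #include <stdio.h>

        int main(void)
        {
            int texel_bits    = 11;  /* 2048 = 2^11 texel addresses   */
            int subtexel_bits = 6;   /* upper end of the 4..6 estimate */
            int needed = texel_bits + subtexel_bits;  /* 17 bits */

            /* stored mantissa bits + 1 hidden bit */
            int fp16 = 10 + 1, fp24 = 16 + 1, fp32 = 23 + 1;

            printf("need %d bits: FP16 %s, FP24 %s, FP32 %s\n", needed,
                   needed <= fp16 ? "ok" : "short",
                   needed <= fp24 ? "ok" : "short",
                   needed <= fp32 ? "ok" : "short");
            return 0;
        }

    This prints "need 17 bits: FP16 short, FP24 ok, FP32 ok", which is one concrete way to see why FP16 texture coordinates are questionable while FP24 just clears the bar for the largest textures.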
     
  9. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
    That is exactly what sireric referred to when he wrote this a while back, in reference to the R3*:

     
  10. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    On a single machine, it's deterministic.
    On all machines supporting DX9, no. NVIDIA's * function and ATI's * function are not the same function because NVIDIA's a*b and ATI's a*b differ.

    Correct, the theoretical thing going on here is that floating point numbers form a "semifield" rather than a field, because certain laws fail, such as associativity (a semifield is a data type equipped with addition, negation, multiplication, inverse, zero and one; a field is a semifield where all of the operations obey all of the associative, distributive, etc. laws). But at least IEEE defines the operations deterministically across machines.

    Whoa, so true. C compilers tend to have optimization options you can turn on to let the compiler pretend that identities like (a+b)+c = a+(b+c) are true, so it can rearrange your code to make it faster. Like most compilers' "assume no aliasing" optimization flag, this isn't strictly safe, but it is usually good enough for most tasks. The difference here is that with C, the programmer can choose whether to do things precisely or quickly, whereas with DirectX 9, the hardware has already decided for you.
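
    A concrete C-side example of that trade-off: GCC's -ffast-math flag licenses exactly this kind of reassociation, so the same source may print different answers under 'gcc -O2' and 'gcc -O2 -ffast-math' (whether a given compiler actually reorders a given expression is up to it; the point is that you opted in).

        #include <stdio.h>

        /* 'volatile' keeps the inputs from being constant-folded away */
        volatile float a = 1e20f, b = -1e20f, c = 1.0f;

        int main(void)
        {
            /* strict IEEE order gives 1; a reassociating compiler is
               permitted to evaluate a + (b + c) and print 0 instead */
            printf("%g\n", (a + b) + c);
            return 0;
        }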
     
  11. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    My bad, you're right. This case actually occurs, for example, when a player is in a small room with the light source, and the room is far away from the origin of the world.

    Yup, you're right.

    The approaches are pretty similar. The problem occurs just as much in 1D as in 3D; for example, (a-b)^2 where a and b are both large numbers that are almost equal.

    Yes, you can definitely reduce the amount of error by arranging calculations as carefully as possible, and moving certain things into the vertex shader (or doing them on the CPU in double-precision and passing the final results down to a VS). This all requires more programming effort of course. It also limits the generality of what you can set up. When you are writing a single pixel shader, you can look at the overall algorithm and manage its precision carefully.
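
    A minimal C sketch of the 1D case and of the "do the subtraction on the CPU in double, pass the result down" fix (the numbers are made up purely for illustration): once a and b have each been rounded to float, their tiny difference is already gone, so the subtraction has to happen before the narrowing.

        #include <stdio.h>

        int main(void)
        {
            double a = 100000.001, b = 100000.0;  /* true difference: 0.001 */

            /* naive: round each value to float first; the 0.001 is far
               below float's ~0.008 ulp at this magnitude, so it vanishes */
            float fa = (float)a, fb = (float)b;
            float naive = (fa - fb) * (fa - fb);

            /* careful: subtract in double, then narrow the small result */
            float delta   = (float)(a - b);
            float careful = delta * delta;

            printf("naive %g, careful %g\n", naive, careful);  /* 0 vs ~1e-06 */
            return 0;
        }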

    But if, for example, you're writing a bunch of shader components that can be combined to form pixel shaders (a specular lighting module, a spherical harmonic module, attenuation modules, etc.), you can't be so sure about how much precision will be lost as data is passed between the different routines, given that they can be plugged together arbitrarily, by artists. This is the essence of what engines are meant to do: not to provide a single shader or single feature, but a bunch of shaders that the content creators can piece together to achieve the effect they want.

    I'm sure that, for all I've written thus far, the spirit of the arguments for FP24 (or other hardcoded hardware limitations in general) is always something like "for all the shaders we can think of, this isn't a problem; if you think there's a problem, send us a shader and we'll show you how to work around our limitations." The flaw in that logic is that it assumes isolated pieces of shader code are what matter, when what really matters is the set of all possible shaders an engine can generate. If you look at Max or Maya's lighting models and material systems, they're all along these lines: not a single shader with a few knobs you can twiddle, but general frameworks for combining arbitrary other shader functionality.
     
  12. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    That's the thing... the entire chicken-and-egg scenario. Sure, the apps that demand it aren't here yet, but if we have a long timeframe where FP24 hardware is the majority of the video cards out there, it will take even longer to see such apps than if FP32 hardware had debuted instead of FP24 in the first place. That's logical and makes business sense to developers who sell games.

    Obviously, it all comes down to performance when you make a piece of hardware. But the point of my starting this thread really isn't about slower FP32 performance compared to FP24 -- it was simply about instances where I think FP24 has definite disadvantages compared to FP32, and I wanted others to confirm whether my understanding and thinking about this is correct, because I have never had much faith in myself when I know there are so many folks here more knowledgeable about coding and hardware than myself :)
     
  13. Colourless

    Colourless Monochrome wench
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,274
    Likes Received:
    30
    Location:
    Somewhere in outback South Australia
    Even though I get a feeling this comment is going to bite me in the ass sometime in the future: precision be damned! The biggest limitation I'm facing with the R300 is purely instruction counts!

    More instruction slots and more registers are needed right now, not really more precision. Of course, the GFFX has all three, so NVIDIA at least did something right with it.
     
  14. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
    It seems the R3xx's latest incarnation (R350) supports an unlimited number of instructions (in the fragment shader) via the F-buffer, although I'm not sure if the functionality is currently exposed in drivers.
     
  15. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    Yes, but I prefer to look at it from a much more practical view: going from FP24 to FP32 can't really take much effort from developers when you look at how much they had to upgrade their skills to write shaders in the first place (PS 1.1-1.4) and then to work with FP (PS 2.0) the second time around.

    And then Colourless brings up the crucial point of what you want the IHVs to include in their silicon budget (gotta love that word): do you really want them to use up so much space on FP32 when we are still at the very start of cinematic rendering (when, as Colourless mentions, more instruction slots and registers are needed)?

    In other words: I sincerely doubt that the industry will stop in its tracks if we don't see all IHVs doing FP32 before DX10. :wink: Just for the record: I think ATI made the right decision with the R300 for all us non-developers, while I can see why nVidia wanted developers to have the opportunity to mess around with the future today.

    I know this isn't the point you're making - I don't care about IEEE standards in my games :p - but I just like to keep part of the discussion within the constraints of reality (the given silicon budget). IMHO.
     
  16. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    And that's the way it should be. We have never had any more determinism than this, and we should not enforce it, because it's basically useless and a heck of a burden to put on the shoulders of IHVs and, in the end, on the customers.

    In OGL2 there has been talk about providing ways to turn optimizations off, but I don't know the status of that. That should satisfy everyone. For shaders, optimizations should default to on.


    Either way, Reverend, you haven't explained why exactly 32 bits is significant. It's an arbitrary number just like every other. Assume ATI had provided FP32 already; this whole discussion would still apply, except with every number += 8. The same argument could be made: "why don't we have FP40, there are applications that could use it".
     
  17. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    There is no additional effort (FP24 -> FP32) if you know exactly what you're aiming for -- FP32 is available to me, I know what it offers and what its limitations are, and I work with that from the very start... this isn't about "upgrading". All of my postings in this thread are based on using FP32 -- I can't do this (which is important to me, for what I have in mind, which, as OpenGL guy pointed out in a hidden way, doesn't matter) with FP24. I don't know if what I want is important, nor what a game developer may want to do, of course.

    Do I want them to? Yes, I do. But I don't have or need to consider competition, and I don't work for an IHV :)

    This is rather silly -- of course the "industry" won't stop because of this.

    Perhaps all that I have written is based on the fact that the R300 is a resounding success -- and usually when I see a resounding success, I start thinking "Why didn't they do this in the first place?" Kinda like asking for a mile when I am given an inch :)
     
  18. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    The entire point of starting this thread is based on DX9 and IEEE-32, both available standards. It's not about "XX bits" nor an additional 1 bit -- it's about the two standards I know of, which I offered as the basis for this discussion. If I followed your way of thinking, this thread wouldn't exist -- nothing is ever enough.

    You appear not to know the basis of my wanting to start this discussion, which was very specific -- it's about FP24 and FP32, nothing more than that -- and you have digressed into "But what is enough for you, Rev?", which isn't what I want to talk about. I gave specific examples of why I want FP32 and not FP24; not why I always want more. I have explained why FP32 is significant to me (to me, to me, TO ME ALONE! :) ) compared to the availability of FP24. I have not explained why FP32 is enough as a distinct floating point spec (32 bits), because that would be pointless -- as I said, nothing is ever enough when you get more creative. I am simply comparing FP32 and 32 bits alone against FP24 and 24 bits. Hope this is clear.

    If you want me to stick to talking about "what's available", I would have nothing to say and would live with what's available because, well, that's all I can do, right?
     
  19. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    Then what's this talk about reproducibility all about? If someone goes to FP40, then any reproducibility is once again kicked out the window. Arguing for a particular precision is odd IMO, be it 32, 24 or anything else. It's simply: more precision => better (assuming the same performance).
     
  20. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    As is usually the case in any thread, things get sidetracked -- I didn't bring up reproducibility. Well, actually I did, but I had to, in response to sireric's first post in this thread, hehe :)

    I can tell you one thing though -- I already know why I want more than FP32... but that'll have to be in another thread. And another time where I'll be damned for wanting more than what is the "API" standard. :)
     