In my opinion, Shader Model 3.0 is a huge step forward compared with Shader Model 2.0. Shader Model 3.0 adds dynamic branching in the pixel shader, and while it's not required, I'm expecting the major IHVs to provide a complete orthogonal solution (FP16 texture filtering/blending) in their next HW iteration.
At this point (things may change), I'm expecting that Splinter Cell - X will only support SM 1.1 and SM 3.0 when it comes out:
- Very significant performance improvement because of dynamic branching in the PS unit.
- Orthogonal FP16 operations.
- Market (can't discuss that yet).
SM 3.0 is going to be good enough for some time. There is only one big step left (before GPUs start evolving just like CPUs --> performance only) that should allow classic global illumination algorithms to run efficiently on GPUs. I doubt SM 4.0 will provide that.
_________________
Dany Lepage
Lead Programmer
Splinter Cell - X
UbiSoft Montreal
Ailuros said: Is that New Year in Turkey or New Year with a turkey?

I most certainly do not tend to hang out with my dinner.
radar1200gs said: Given the insistence by some members of this forum that FP16 is not a valid part of the DX9 spec, I'd like to hear opinions on why, if this was the case, Microsoft would bother doing anything with it, and why they didn't instead apply the changes to FP24 (since, if you listen to the fanboys, FP24 is the logical format, obviously superior to anything else out there)?

Because what is being talked about here is the external floating-point formats, which have always been FP16 and FP32 in DX9 - nothing new in that. Also, people aren't arguing that FP16 isn't part of the spec (or at least they shouldn't be) - when the _pp hint is set, the spec defines FP16 as the minimum precision. What it isn't legal for a driver to do is to use FP16 under any circumstances when the _pp hint isn't set, whether it believes it will affect the output noticeably or not.

FP24 is the standard internal high-precision format of the shaders - in terms of external memory accesses it makes sense to restrict the widths of the data to powers of two, but internally you have much more freedom, and so you can be more flexible in the tradeoffs between silicon cost and overall precision.

DemoCoder said:
andypski said: What it isn't legal for a driver to do is to use FP16 under any circumstances when the _pp hint isn't set, whether it believes it will affect the output noticeably or not.
What about strength reductions or substitutions that won't affect the output at all? There are probably a few cases where the compiler can make conservative substitutions, especially on short shaders that do hardly anything, deal only with integer textures, and write to normal integer framebuffers.

Whether that is acceptable or not is really up to Microsoft to define as the owners of the API. Without guidance on this, the only thing you can say is that _pp can be run at partial precision, and non-_pp must be run at at least FP24. Perhaps Microsoft would be inclined to allow such substitutions - but I expect that situations where this is permitted would need to be very clearly defined - clearly any lower-precision substitution that could in any way affect the output value is obviously invalid.
Agreed, but it is nice when HW supports an extended FP32 precision if you are doing some scientific rendering or non-games stuff. I coded up some scientific algorithms on the GPU a few months ago as an experiment, and it kinda sucked that my accuracy turned out a lot worse than a C program's because my parameters were getting truncated.

Yes - naturally we will see support for higher-precision formats coming along as VPUs become used more frequently in applications outside of entertainment, and also just as a natural consequence of advances in technology.
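As a rough CPU-side illustration of the truncation effect being described (my sketch, not the actual GPU experiment), accumulating the same series in FP32 and FP64 shows how the lower-precision result drifts:

/* Sums the harmonic series in single and double precision.
   The FP32 result drifts because terms below the ulp of the
   running sum get rounded away. Illustration only. */
#include <stdio.h>

int main(void)
{
    float  sum32 = 0.0f;
    double sum64 = 0.0;

    for (int i = 1; i <= 1000000; ++i) {
        sum32 += 1.0f / (float)i;
        sum64 += 1.0  / (double)i;
    }

    printf("FP32: %.7f\n", (double)sum32); /* low digits off */
    printf("FP64: %.7f\n", sum64);         /* about 14.392727 */
    return 0;
}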
andypski said: What Dany is talking about here is a move towards treating external floating point formats as first class citizens - ie. having all the same capabilities as current integer formats with respect to blending and filtering, whereas in current hardware floating point formats can typically only be point-sampled and not blended. Initially Dany states that he expects to see this for FP16 formats because it is cheaper to do the filtering and blending on FP16 than FP32.

Yes, I'm really hoping that we get full FP16 framebuffer (and texture) support.
DemoCoder said: I agree. I brought up the issue of needing a "hinting" mechanism for the driver a while ago, due to the fact that the drivers now contain optimizers, and sometimes you need to switch optimizations off, especially if the optimizer is doing something bad on some pathological case.

Yeah, it would be really nice to be able to just tell the compiler, "Just use whatever precision you think won't affect the output," and not worry about it. Then just switch some optimizations off if it starts to look ugly.
Chalnoth said: Yeah, it would be really nice to be able to just tell the compiler, "Just use whatever precision you think won't affect the output," and not worry about it. Then just switch some optimizations off if it starts to look ugly.

Heh. I've used a few compilers like that. Except you have to switch off some optimizations at random times when the compiler doesn't like whatever construct you've come up with.
RussSchultz said: Heh. I've used a few compilers like that. Except you have to switch off some optimizations at random times when the compiler doesn't like whatever construct you've come up with.

It's not like it's something you'd be forced to use. Remember that this would be a substitute for paying close attention to which precisions are needed where.
From an engineering/programming standpoint, that sucks. It should work as designed, per spec, all the time, not some indeterminate output.
RussSchultz said:
Chalnoth said: Yeah, it would be really nice to be able to just tell the compiler, "Just use whatever precision you think won't affect the output," and not worry about it. Then just switch some optimizations off if it starts to look ugly.
Heh. I've used a few compilers like that. Except you have to switch off some optimizations at random times when the compiler doesn't like whatever construct you've come up with.
From an engineering/programming standpoint, that sucks. It should work as designed, per spec, all the time, not some indeterminate output.

Agreed. It can be just as hairy when a particular CPU architecture (actually, let's cut straight to the chase - it's the x86) decides it's going to use higher precision just because you've given the C compiler more opportunity to optimise the code.
Simon F said: Agreed. It can be just as hairy when a particular CPU architecture (actually, let's cut straight to the chase - it's the x86) decides it's going to use higher precision just because you've given the C compiler more opportunity to optimise the code.

The x86 will always use 80-bit FP calculations unless you're using some of the SIMD instructions. But that shouldn't be a problem unless you're using the equality operator, and not using the equality operator on floats was one of the first things I learned in programming classes. It's just not something you do.
Chalnoth said: The x86 will always use 80-bit FP calculations unless you're using some of the SIMD instructions. But that shouldn't be a problem unless you're using the equality operator, and not using the equality operator on floats was one of the first things I learned in programming classes. It's just not something you do.

I've mentioned this before, but I had no end of problems when I optimised/rewrote some floating-point-intensive code used in the texture compressor. When you are trying to determine if something converges, and sometimes it's computed at 32-bit and at other times at 80-bit, you get no end of problems. A complete nightmare.

Chalnoth, I suggest you try writing an SVD routine and see what happens.
Hyp-X said:
Chalnoth said: The x86 will always use 80-bit FP calculations unless you're using some of the SIMD instructions.
Wrong.
You can set the calculating precision of the FPU in the control word to 32, 64 or 80 bits.
For example, D3D sets 32-bit precision (for the entire program!!!).
The flags are set to 64-bit by default in MSVC.
Because of this, for FPU operations on x86 these two code sequences can produce different results in Reg0:
Program 1
Reg2 = Reg0 + Reg1
Store Reg2 to memory
Read memory to Reg2
Reg0 = Reg2 + Reg1
Program 2
Reg2 = Reg0 + Reg1
Reg0 = Reg2 + Reg1
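Here's a hedged C sketch of the same effect (my illustration, not Hyp-X's code). It assumes classic x87 code generation (e.g. gcc -m32 -mfpmath=387 without optimisation); under SSE maths the two results come out identical:

#include <stdio.h>

int main(void)
{
    float a = 16777216.0f; /* 2^24 -- the float ulp at this magnitude is 2 */
    float b = 1.0f;

    /* "Program 1": store Reg2 to memory and read it back.
       The store rounds the intermediate to 32-bit float. */
    volatile float t = a + b; /* 2^24 + 1 rounds back to 2^24 */
    float r1 = t + b;         /* still 2^24 */

    /* "Program 2": the intermediate can stay in an FPU register
       at extended precision. */
    float r2 = (a + b) + b;   /* can come out as 2^24 + 2 = 16777218 */

    printf("r1 = %.1f, r2 = %.1f\n", r1, r2);
    return 0;
}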
DemoCoder said: Well, even on sensible CPUs, you can get different results depending on the optimizer due to code motion and reordering, which can change between recompiles of your program. ANSI C/C++ "banned" associativity optimizations (e.g. (a+b)+c cannot be evaluated as a+(b+c)), but there are still compilers that offer this, other languages don't have similar bans, and even in the ANSI C case there are still some pathological optimizer issues (the compiler is still allowed to compute A, B, and C in any order and cache or move the results).

But still, most compilers enable 'aggressive' floating-point optimizations by default. The simple rule is to include error margins for any comparison when using floating point. Someone suggested that they'd better remove the equality and inequality operators from the language, to prevent 'stupid' programmers from fucking up because they have no clue that floating-point numbers don't have unlimited precision.
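For what it's worth, the usual shape of that error-margin rule in C looks something like this (the tolerances below are arbitrary illustrative picks, not universal constants):

#include <math.h>
#include <stdio.h>

/* Returns 1 if a and b are equal within a combined absolute/relative
   tolerance -- the absolute test covers values near zero, the relative
   test covers large magnitudes. */
static int nearly_equal(double a, double b, double abs_eps, double rel_eps)
{
    double diff = fabs(a - b);
    if (diff <= abs_eps)
        return 1;
    return diff <= rel_eps * fmax(fabs(a), fabs(b));
}

int main(void)
{
    double x = 0.1 + 0.2;
    printf("x == 0.3     -> %d\n", x == 0.3); /* 0 on IEEE doubles */
    printf("nearly_equal -> %d\n", nearly_equal(x, 0.3, 1e-12, 1e-9)); /* 1 */
    return 0;
}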
DeanoC said: Most compilers have a mode which will ensure that a float really is a float, by transferring to memory and reading back at every operation on floats/doubles etc.

Just for the record, for those interested: that is an option I wish to avoid, as gprof tells me that ~60% of the run time is in this function, so it would rather defeat my efforts to optimise that piece of code.
DeanoC said: And wrong again.

As SimonF says, the precision is always 80-bit for 'most' operations; the control-word precision is used mainly for divides. Long (64- or 80-bit) divides are expensive, whereas long multiplies aren't (on Intel x86). So you can tell the processor to stop division calculations at the appropriate stage (i.e. the 23rd bit for floats), but it doesn't change other operations.

This means that the truncation to the specified length occurs whenever the compiler decides to flush something from a floating-point register to memory.
IA software developer's manual said: The double precision and single precision settings reduce the size of the significand to 53 bits and 24 bits, respectively. These settings are provided to support the IEEE standard and to allow exact replication of calculations which were done using the lower precision data types. Using these settings nullifies the advantages of the extended-real format's 64-bit significand length. When reduced precision is specified, the rounding of the significand value clears the unused bits on the right to zeros.
The precision-control bits only affect the results of the following floating-point instructions:
FADD, FADDP, FSUB, FSUBP, FSUBR, FSUBRP, FMUL, FMULP, FDIV, FDIVP, FDIVR, FDIVRP, and FSQRT.
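For anyone who wants to poke at this themselves, MSVC exposes the precision-control field via _controlfp in <float.h>. A small sketch (assumes 32-bit x86 with x87 code generation; the PC bits have no effect under SSE2 or x64):

#include <float.h>
#include <stdio.h>

int main(void)
{
    volatile double x = 1.0, y = 3.0; /* volatile blocks constant folding */
    double q64, q24;

    _controlfp(_PC_64, _MCW_PC); /* full 64-bit significand */
    q64 = x / y;

    _controlfp(_PC_24, _MCW_PC); /* FDIV stops after 24 significand bits */
    q24 = x / y;

    _controlfp(_PC_53, _MCW_PC); /* restore the usual default */

    printf("PC_64: %.17g\nPC_24: %.17g\n", q64, q24); /* differ on x87 */
    return 0;
}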