D3D Double Precision

Jawed

I haven't seen any official documents talk about implementing double-precision in D3D. Or, I've forgotten that I've seen them :oops:

Anyway, I'm curious when it will happen and what it'll mean:
  • what's the motivation for implementing double-precision? is it for graphics or is it for "other things" (e.g. GPGPU)?
  • can we expect the next revision to D3D, 10.1, to require DP?
  • will DP apply to both floating point and integer?
Jawed
 
I think it would be useful mostly for GPGPU stuff, as I can't fathom where you'd actually run into precision limits under normal conditions involving 3D rendering... you'd have to have a really complex thing going on, with errors accumulating like heck.

OTOH, I don't know how near double precision is... I think the silicon real estate necessary is a little out of reach, or at least better spent elsewhere for the moment. But all of the above is IMHO, so dear nV and ATi, feel free to prove me wrong :)
 
  • can we expect the next revision to D3D, 10.1, to require DP?
Double precision floats for any shading calculation or output format are not part of the base spec for 10.1.
 
There was a scallywag telling me R600 supported DP.
:oops:

OK, stupid question time: if you build a 64-bit ALU, does that mean it can also do int32 stuff?

If that worked is it much more costly than implementing a pipeline with two separate ALUs side-by-side: fp32 + int32?

Are the G80 multifunctional (fp + int) ALUs most of the way to supporting fp64, because common support for fp32/int32 requires widening the fp portion?

Jawed
 
Int32 ALUs are generally very cheap, except for multiply/divide operations. For multiply, since DX10 does not appear to support widening multiply, there is no need to have a complete 32-bit multiplier just for Int32 support, and integer divide is not something that I would expect to need to be very fast. As such, there is not really much benefit at all to gain from having int32 support if your ultimate goal is to add FP64 support.
 
Int32 ALUs are generally very cheap, except for multiply/divide operations. For multiply, since DX10 does not appear to support widening multiply, there is no need to have a complete 32-bit multiplier just for Int32 support,
By this do you mean, D3D10 allows a loss of precision when two large (in significant digits) integers are multiplied?

If so, would this be a bar to GPGPU use of Int32? Would the GPGPU guys care?

Has anyone tested the precision of Int32 multiplies in G80? Presumably this precision issue can only be detected in a composite instruction such as MAD, since even with widening, once a MUL's final result is clamped to the int32 range there'll be an identical loss of precision.

Jawed
 
According to the NV_gpu_program4 spec, the MUL instruction can take a "HI" modifier, which makes it compute the upper half of the 64-bit result of an Int32 multiply.
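
As a concrete aside: CUDA, which targets the same G8x hardware, exposes analogous intrinsics, __mulhi() and __umulhi(), that return the upper 32 bits of the full 64-bit product. A minimal sketch (kernel and variable names are purely illustrative):

Code:
#include <cstdio>

// __umulhi() returns the upper 32 bits of the full 64-bit product of two
// uint32 operands, the same information as an integer MUL with the "HI"
// modifier; an ordinary multiply keeps only the lower 32 bits.
__global__ void mul_hi_lo(unsigned int a, unsigned int b, unsigned int *out)
{
    out[0] = __umulhi(a, b);  // upper half of the 64-bit product
    out[1] = a * b;           // lower half (normal truncating multiply)
}

int main()
{
    unsigned int host[2], *dev;
    cudaMalloc((void **)&dev, sizeof(host));

    // 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001
    mul_hi_lo<<<1, 1>>>(0xFFFFFFFFu, 0xFFFFFFFFu, dev);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);

    printf("hi = 0x%08X  lo = 0x%08X\n", host[0], host[1]);  // FFFFFFFE, 00000001
    cudaFree(dev);
    return 0;
}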
 
I haven't seen any official documents talk about implementing double-precision in D3D.
There's a good reason why this hasn't been "hyped", which could be why you haven't "seen" such documentation. Can you guess why? :)

can we expect the next revision to D3D, 10.1, to require DP?
Rys clarified this (correctly) re 10.1 pertaining to your hope. I doubt even MS and all DX-shaping-participants really want this now. Or even in DX11 (and I'll be shocked if this is in DX11 as a requirement).

It's just too soon because it's a waste of, or not the best use of, resources now or in the foreseeable future.

No one I know wants it now or in the foreseeable future unless there's some drastic improvement.
 
:oops:

OK, stupid question time: if you build a 64-bit ALU, does that mean it can also do int32 stuff?

If that worked is it much more costly than implementing a pipeline with two separate ALUs side-by-side: fp32 + int32?

Adds might be done in separate physical units, but I highly doubt multiplies are. I'd expect one big unit that could accept either fp32 (1 sign bit, 8 exponent bits, 23-bit mantissa, 24 with the implicit leading 1) or int32. Internally the unit would have a 32x32->64 multiplier array and just handle int32 as denormalized floating point numbers. So the logic complexity for the multiplier would be equal to that of a 40-bit fp unit.

Doubles in IEEE 754 format have 11 exponent bits, 1 sign bit and 52 mantissa bits (53 with the implicit leading 1). So the logic complexity would be 52x52 / 32x32, or 2.64 times as great (164% more logic). It would still be able to handle both fp32 and int32.
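
Spelling out the arithmetic behind that estimate (counting only the multiplier array, and taking the 32x32 array as the fp32/int32 baseline):

\[
\frac{52 \times 52}{32 \times 32} = \frac{2704}{1024} \approx 2.64
\quad\Rightarrow\quad \text{roughly } 164\% \text{ more multiplier logic.}
\]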

Cheers
 
By this do you mean, D3D10 allows a loss of precision when two large (in significant digits) integers are multiplied?

If so, would this be a bar to GPGPU use of Int32? Would the GPGPU guys care?

Has anyone tested the precision of Int32 multiplies in G80? Presumably this precision issue can only be detected in a composite instruction, such as MAD, since even with widening, after a MUL when clamping the final result to the int32 range, there'll be an identical loss of precision.

Jawed
When you multiply two N-bit integers, the result is in general a number with 2N bits (so multiplying two 32-bit numbers gives a single 64-bit number). The common practice with integer types is then to chop off the top N bits and return to the programmer only the bottom N bits; this is not very likely to be problematic for the GPGPU crowd, as most programming languages for ordinary CPU programming do just that. (In fact, I have yet to see any programming language at all, other than straight assembly, that does not just chop off the high-order bits.) Having access to the top N bits is in general useful only in a limited number of cases (arbitrary-precision arithmetic, as well as a special trick that compilers use to do fast integer division by a constant), and as such, having a complete widening multiplier in every ALU is IMO highly wasteful.
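
A minimal CUDA sketch of that chopping behaviour (the kernel name is hypothetical): a plain 32-bit multiply returns exactly the low 32 bits that a full-width 64-bit product would have, so nothing changes for code that never needed the high half.

Code:
// Plain 32-bit multiply versus an explicit 64-bit (widening) multiply:
// the low 32 bits are identical, the high 32 bits are simply discarded.
__global__ void low_half_only(unsigned int a, unsigned int b,
                              unsigned int *truncated, unsigned int *via_64bit)
{
    *truncated = a * b;  // high half of the product is chopped off

    unsigned long long wide = (unsigned long long)a * (unsigned long long)b;
    *via_64bit = (unsigned int)wide;  // same value as *truncated
}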
 
According to the NV_gpu_program4 spec, the MUL instruction can take a "HI" modifier, which makes it compute the upper half of the 64-bit result of an Int32 multiply.

Interestingly, according to the same spec, the MUL instruction can also have the modifiers "S24" and "U24", which enable "fast" 24-bit integer multiplications - presumably implying that the full 32-bit multiply is NOT fast.
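
CUDA exposes the same split as intrinsics: __mul24()/__umul24() multiply only the low 24 bits of each operand, and the early CUDA documentation for G8x lists them as faster than a full 32-bit integer multiply. A small sketch (the wrapper name is just illustrative):

Code:
// __umul24() multiplies the 24 least-significant bits of each operand and
// returns the low 32 bits of the (at most 48-bit) product, mirroring the
// U24/S24 MUL modifiers in NV_gpu_program4.
__device__ unsigned int fast_mul24(unsigned int a, unsigned int b)
{
    // Only meaningful when both operands fit in 24 bits; upper bits are ignored.
    return __umul24(a, b);
}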
 
(In fact, I have yet to see any programming language at all, other than straight assembly, that does not just chop off the high-order bits.)
Python?

Doubles in IEEE 754 format has 11 exponent bits, 1 sign and 52 mantissa bits (53 with the implicit leading 1). So logic complexity would be 52x52/32x32 or 2.64 times as great (164% more logic). It would still be able to handle both fp32 and int32.
It's probably worse than just M^2/N^2. To maintain clock rate with the larger multiplier, you'd have to introduce more pipeline stages/registers.
 
Internally the unit would have a 32x32->64 multiplier array and just handle int32 as denormalized floating point numbers.
I think the full multiplier array would be a bit wasteful as you never need a full 64 bit result in D3D10.

Interestingly, according to the same spec, the MUL instruction can also have the modifiers "S24" and "U24", which enable "fast" 24-bit integer multiplications - presumably implying that the full 32-bit multiply is NOT fast.
I think the most reasonable explanation for this is that the secondary MUL unit is not capable of full 32-bit integer multiplies.
 
I think the full multiplier array would be a bit wasteful as you never need a full 64 bit result in D3D10.

Good point. You'd only really need 48 bits to normalize and round fp32 results (the 24-bit mantissas, implicit leading 1 included, give a 24x24 -> 48-bit product). And as Arjan mentions, int results are usually chopped, delivering only the bottom 32 bits.

Cheers
 
arjan said:
Interestingly, according to the same spec, the MUL instruction can also have the modifiers "S24" and "U24", which enable "fast" 24-bit integer multiplications - presumably implying that the full 32-bit multiply is NOT fast.
I think the most reasonable explanation for this is that the secondary MUL unit is not capable of full 32-bit integer multiplies.
Implying that it's actually one that has been "borrowed" from an idle FP32 multiplier?
 
Double precision floats for any shading calculation or output format are not part of the base spec for 10.1.


isn't G80 DX 10.1, or almost? (I remember DX 10.1 mandates orthogonal 4x MSAA; not sure if there's something else, or it just looked less significant to me...)
 