D3D Double Precision

Jawed

I haven't seen any official documents talk about implementing double-precision in D3D. Or, I've forgotten that I've seen them :oops:

Anyway, I'm curious when it will happen and what it'll mean:
  • what's the motivation for implementing double-precision? is it for graphics or is it for "other things" (e.g. GPGPU)?
  • can we expect the next revision to D3D, 10.1, to require DP?
  • will DP apply to both floating point and integer?
Jawed
 
I think it would be useful mostly for GPGPU stuff, as I can't fathom where you'd actually run into precision limits under normal conditions involving 3D rendering... you'd have to have a really complex thing going on, with errors accumulating like heck.

OTOH, I don't know how near double precision is... I think the silicon real estate necessary is a little out of reach, or at least better spent elsewhere for the moment. But all of the above is IMHO, so dear nV and ATi, feel free to prove me wrong :)
 
  • can we expect the next revision to D3D, 10.1, to require DP?
Double precision floats for any shading calculation or output format are not part of the base spec for 10.1.
 
There was a scallywag telling me R600 supported DP.
:oops:

OK, stupid question time: if you build a 64-bit ALU, does that mean it can also do int32 stuff?

If that worked is it much more costly than implementing a pipeline with two separate ALUs side-by-side: fp32 + int32?

Are the G80 multifunctional (fp + int) ALUs most of the way to supporting fp64, because common support for fp32/int32 requires widening the fp portion?

Jawed
 
Int32 ALUs are generally very cheap, except for multiply/divide operations. For multiply, since DX10 does not appear to support widening multiply, there is no need to have a complete 32-bit multiplier just for Int32 support, and integer divide is not something that I would expect to need to be very fast. As such, there is not really much benefit at all to gain from having int32 support if your ultimate goal is to add FP64 support.
 
Int32 ALUs are generally very cheap, except for multiply/divide operations. For multiply, since DX10 does not appear to support widening multiply, there is no need to have a complete 32-bit multiplier just for Int32 support,
By this do you mean, D3D10 allows a loss of precision when two large (in significant digits) integers are multiplied?

If so, would this be a bar to GPGPU use of Int32? Would the GPGPU guys care?

Has anyone tested the precision of Int32 multiplies in G80? Presumably this precision issue can only be detected in a composite instruction such as MAD, since even with widening, once a MUL's final result is clamped to the int32 range there'll be an identical loss of precision.

Jawed
 
According to the NV_gpu_program4 spec, the MUL instruction can take a "HI" modifier, which makes it compute the upper half of the 64-bit result of an Int32 multiply.
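
As a concrete aside: CUDA, which targets the same G8x hardware, exposes analogous intrinsics, __mulhi() and __umulhi(), that return the upper 32 bits of the full 64-bit product. A minimal sketch (kernel and variable names are purely illustrative):

Code:
#include <cstdio>

// __umulhi() returns the upper 32 bits of the full 64-bit product of two
// uint32 operands, the same information as an integer MUL with the "HI"
// modifier; an ordinary multiply keeps only the lower 32 bits.
__global__ void mul_hi_lo(unsigned int a, unsigned int b, unsigned int *out)
{
    out[0] = __umulhi(a, b);  // upper half of the 64-bit product
    out[1] = a * b;           // lower half (normal truncating multiply)
}

int main()
{
    unsigned int host[2], *dev;
    cudaMalloc((void **)&dev, sizeof(host));

    // 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001
    mul_hi_lo<<<1, 1>>>(0xFFFFFFFFu, 0xFFFFFFFFu, dev);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);

    printf("hi = 0x%08X  lo = 0x%08X\n", host[0], host[1]);  // FFFFFFFE, 00000001
    cudaFree(dev);
    return 0;
}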
 
I haven't seen any official documents talk about implementing double-precision in D3D.
There's a good reason why this hasn't been "hyped", which could be why you haven't "seen" such documentation. Can you guess why? :)

can we expect the next revision to D3D, 10.1, to require DP?
Rys clarified this (correctly) re 10.1 pertaining to your hope. I doubt even MS and all DX-shaping-participants really want this now. Or even in DX11 (and I'll be shocked if this is in DX11 as a requirement).

It's just too soon because it's a waste of, or not the best use of, resources now or in the foreseeable future.

No one I know wants it now or in the foreseeable future unless there's some drastic improvement.
 
:oops:

OK, stupid question time: if you build a 64-bit ALU, does that mean it can also do int32 stuff?

If that worked is it much more costly than implementing a pipeline with two separate ALUs side-by-side: fp32 + int32?

Adds might be done in separate physical units, but I highly doubt multiplies are. I'd expect one big unit that could accept either fp32 (1 sign bit, 8 exponent bits, 23-bit mantissa, 24 with the implicit leading 1) or int32. Internally the unit would have a 32x32->64 multiplier array and just handle int32 as denormalized floating point numbers. So the logic complexity for the multiplier would be equal to that of a 40-bit fp unit.

Doubles in IEEE 754 format have 11 exponent bits, 1 sign bit and 52 mantissa bits (53 with the implicit leading 1). So the logic complexity would be 52x52 / 32x32, or 2.64 times as great (164% more logic). It would still be able to handle both fp32 and int32.
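
Spelling out the arithmetic behind that estimate (counting only the multiplier array, and taking the 32x32 array as the fp32/int32 baseline):

\[
\frac{52 \times 52}{32 \times 32} = \frac{2704}{1024} \approx 2.64
\quad\Rightarrow\quad \text{roughly } 164\% \text{ more multiplier logic.}
\]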

Cheers
 
By this do you mean, D3D10 allows a loss of precision when two large (in significant digits) integers are multiplied?

If so, would this be a bar to GPGPU use of Int32? Would the GPGPU guys care?

Has anyone tested the precision of Int32 multiplies in G80? Presumably this precision issue can only be detected in a composite instruction, such as MAD, since even with widening, after a MUL when clamping the final result to the int32 range, there'll be an identical loss of precision.

Jawed
When you multiply two N-bit integers, the result is in general a number with 2N bits (so multiplying two 32-bit numbers gives a single 64-bit number). The common practice with integer types is then to chop off the top N bits and return to the programmer only the bottom N bits; this is not very likely to be problematic for the GPGPU crowd, as most programming languages for ordinary CPU programming do just that. (In fact, I have yet to see any programming language at all, other than straight assembly, that does not just chop off the high-order bits.) Having access to the top N bits is in general useful only in a limited number of cases (arbitrary-precision arithmetic, as well as a special trick that compilers use to do fast integer division by a constant), and as such, having a complete widening multiplier in every ALU is IMO highly wasteful.
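
A minimal CUDA sketch of that chopping behaviour (the kernel name is hypothetical): a plain 32-bit multiply returns exactly the low 32 bits that a full-width 64-bit product would have, so nothing changes for code that never needed the high half.

Code:
// Plain 32-bit multiply versus an explicit 64-bit (widening) multiply:
// the low 32 bits are identical, the high 32 bits are simply discarded.
__global__ void low_half_only(unsigned int a, unsigned int b,
                              unsigned int *truncated, unsigned int *via_64bit)
{
    *truncated = a * b;  // high half of the product is chopped off

    unsigned long long wide = (unsigned long long)a * (unsigned long long)b;
    *via_64bit = (unsigned int)wide;  // same value as *truncated
}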
 
According to the NV_gpu_program4 spec, the MUL instruction can take a "HI" modifier, which makes it compute the upper half of the 64-bit result of an Int32 multiply.

Interestingly, according to the same spec, the MUL instruction can also have the modifiers "S24" and "U24", which enable "fast" 24-bit integer multiplications - presumably implying that the full 32-bit multiply is NOT fast.
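
CUDA exposes the same split as intrinsics: __mul24()/__umul24() multiply only the low 24 bits of each operand, and the early CUDA documentation for G8x lists them as faster than a full 32-bit integer multiply. A small sketch (the wrapper name is just illustrative):

Code:
// __umul24() multiplies the 24 least-significant bits of each operand and
// returns the low 32 bits of the (at most 48-bit) product, mirroring the
// U24/S24 MUL modifiers in NV_gpu_program4.
__device__ unsigned int fast_mul24(unsigned int a, unsigned int b)
{
    // Only meaningful when both operands fit in 24 bits; upper bits are ignored.
    return __umul24(a, b);
}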
 
(In fact, I have yet to see any programming language at all, other than straight assembly, that does not just chop off the high-order bits.)
Python?

Doubles in IEEE 754 format has 11 exponent bits, 1 sign and 52 mantissa bits (53 with the implicit leading 1). So logic complexity would be 52x52/32x32 or 2.64 times as great (164% more logic). It would still be able to handle both fp32 and int32.
It's probably worse than just M^2/N^2. To maintain clock rate with the larger multiplier, you'd have to introduce more pipeline stages/registers.
 
Internally the unit would have a 32x32->64 multiplier array and just handle int32 as denormalized floating point numbers.
I think the full multiplier array would be a bit wasteful as you never need a full 64 bit result in D3D10.

Interestingly, according to the same spec, the MUL instruction can also have the modifiers "S24" and "U24", which enable "fast" 24-bit integer multiplications - presumably implying that the full 32-bit multiply is NOT fast.
I think the most reasonable explanation for this is that the secondary MUL unit is not capable of full 32-bit integer multiplies.
 
I think the full multiplier array would be a bit wasteful as you never need a full 64 bit result in D3D10.

Good point. You'd only really need 48 bits to normalize and round fp32 results (the 24-bit mantissas, implicit leading 1 included, give a 24x24 -> 48-bit product). And as Arjan mentions, int results are usually chopped, delivering only the bottom 32 bits.

Cheers
 
arjan said:
Interestingly, according to the same spec, the MUL instruction can also have the modifiers "S24" and "U24", which enable "fast" 24-bit integer multiplications - presumably implying that the full 32-bit multiply is NOT fast.
I think the most reasonable explanation for this is that the secondary MUL unit is not capable of full 32-bit integer multiplies.
Implying that it's actually one that has been "borrowed" from an idle FP32 multiplier?
 
Double precision floats for any shading calculation or output format are not part of the base spec for 10.1.


isn't G80 DX 10.1, or almost? (I remember DX 10.1 mandates orthogonal 4x MSAA; not sure if there's something else, or it just looked less significant to me...)
 