Carry bits in GPUs?

I was wondering, do GPUs have a carry bit much like CPUs do? If they did, could the drivers use this bit to detect whether or not they can legitimately use partial precision?

I'd imagine this might cause problems, or be difficult, if it were done on the fly. But if Nvidia created a tool for developers that profiled their shaders, textures, etc. and either optimized them automatically, or rather showed them where they could use PP, it might be of some use. It would highlight areas that could legitimately use the lower precision (for the given source material).

The only real problem I can come up with is: what happens if a mod developer uses the game's shaders but provides new source art which, when combined, requires a higher precision? That would require the mod developer to run the profiler as well. It likely wouldn't be a big problem, but it's the only downside I could think of.

Anyone else have any thoughts on the matter?
 
Killer-Kris said:
I was wondering, do GPUs have a carry bit much like CPUs do? If they did, could the drivers use this bit to detect whether or not they can legitimately use partial precision?
CPUs have carry bits in the integer portion of the chip, but it doesn't make sense in the floating point portion (that's what INF and things are for). Similarly, since GPUs are floating point in the shaders, I don't see a use for carry bits.
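Here's a quick sketch of that distinction in Python (illustrative only, not how drivers or GPUs actually do it): a carry flag records the bit that falls off the top of a fixed-width integer add, while IEEE floats signal an out-of-range result with infinity instead.

```python
# Illustrative Python, not GPU code: a carry flag records the bit that
# falls off the top of a fixed-width integer add.

def add32(a, b):
    """Add two 32-bit unsigned ints, returning (result, carry)."""
    total = a + b
    return total & 0xFFFFFFFF, total >> 32

print(add32(0xFFFFFFFF, 1))  # (0, 1) -- the add carried out of 32 bits

# Floats have no carry flag; a result too large for the format just
# becomes infinity, which is its own kind of "flag".
print(1e308 * 10)            # inf
```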
 
OpenGL guy said:
Killer-Kris said:
I was wondering, do GPUs have a carry bit much like CPUs do? If they did, could the drivers use this bit to detect whether or not they can legitimately use partial precision?
CPUs have carry bits in the integer portion of the chip, but it doesn't make sense in the floating point portion (that's what INF and things are for). Similarly, since GPUs are floating point in the shaders, I don't see a use for carry bits.

Ahh, my bad. I only have experience writing assembly code for simple RISC chips, which are integer-only.

Though I do believe the concept is valid. Even for floating point you should be able to detect when you have a carry, or just a general overflow, and set an appropriate bit. You could then use this bit to detect when you need a higher precision. Or is there something inherent in how floats work that I'm missing which makes this impractical?
 
Killer-Kris said:
Though I do believe the concept is valid. Even for floating point you should be able to detect when you have a carry, or just a general overflow, and set an appropriate bit. You could then use this bit to detect when you need a higher precision. Or is there something inherent in how floats work that I'm missing which makes this impractical?

They do detect carry!
When there's a carry, the exponent is incremented and the mantissa is shifted down 1 bit.
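Here's a quick way to watch that happen, using Python's struct module on IEEE-754 singles (a sketch on the CPU, but GPU float units normalize the same way):

```python
import struct

def fields(x):
    """Split an IEEE-754 single into (sign, exponent field, mantissa field)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# Adding 1.5 + 1.5 carries out of the mantissa sum; the result is
# renormalized by bumping the exponent field from 127 (2^0) to 128 (2^1).
print(fields(1.5))        # (0, 127, 4194304)
print(fields(1.5 + 1.5))  # (0, 128, 4194304)
```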
 
Hyp-X said:
Killer-Kris said:
Though I do believe the concept is valid. Even for floating point you should be able to detect when you have a carry, or just a general overflow, and set an appropriate bit. You could then use this bit to detect when you need a higher precision. Or is there something inherent in how floats work that I'm missing which makes this impractical?

They do detect carry!
When there's a carry, the exponent is incremented and the mantissa is shifted down 1 bit.

So now I go back to my original question: why doesn't Nvidia create a tool to help developers automate the process of optimizing their shaders? Much like profiling CPU software: run it with your data set, then recompile using the profile data. How come things like this aren't happening for GPUs (well, at least the ones with partial-precision capabilities)?
 
Hyp-X said:
Killer-Kris said:
Though I do believe the concept is valid. Even for floating point you should be able to detect when you have a carry, or just a general overflow, and set an appropriate bit. You could then use this bit to detect when you need a higher precision. Or is there something inherent in how floats work that I'm missing which makes this impractical?

They do detect carry!
When there's a carry, the exponent is incremented and the mantissa is shifted down 1 bit.
That's not really the same at all, else 0x10 + 0x10 = 0x20 would generate a carry.
 
Killer-Kris said:
So now I go back to my original question: why doesn't Nvidia create a tool to help developers automate the process of optimizing their shaders? Much like profiling CPU software: run it with your data set, then recompile using the profile data. How come things like this aren't happening for GPUs (well, at least the ones with partial-precision capabilities)?

Ok, now I see that you don't get how FP numbers work.

They have precision relative to their range.
Integers (or fixed-point numbers) have fixed precision.

FP has a huge range (through the use of the exponent).
Integers (or fixed-point numbers) have limited range.

They behave completely differently.
What I described as a carry in my post is actually the range changing, which affects the precision.

E.g. 0.8 + 0.8 = 1.6 makes the exponent increase, which makes you lose a bit at the bottom. Does that mean one should have used higher precision? But there's no FP precision where this doesn't occur; whether you use FP16 or FP80, you still lose the last bit on this operation.
And none of the (binary) FP formats can represent 0.8 precisely...
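That last point is easy to check with ordinary double precision (a sketch; the same holds at every binary FP width):

```python
from fractions import Fraction

# The double closest to 0.8, written as an exact fraction: it is not 4/5.
print(Fraction(0.8))                     # 3602879701896397/4503599627370496
print(Fraction(0.8) == Fraction(4, 5))   # False
print(f'{0.8:.20f}')                     # 0.80000000000000004441
```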
 
It just occurred to me that by profiling you might mean running the shaders at different precisions and automatically comparing the results.

Unfortunately, once non-linear operations are used in the shaders, automatic IQ analysis is futile.
 
For the predictable non-linear operations you can still compute bounds and distributions on the errors ... the problem is dependent texture reads.
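A minimal sketch of the bounds idea, with made-up names (this is plain interval arithmetic, not any real tool's API): push [lo, hi] intervals through the shader's arithmetic and you get a worst-case range for every intermediate value. A dependent texture read breaks the chain, because an interval on the coordinate says nothing about the texel it fetches.

```python
class Interval:
    """Closed interval [lo, hi] with arithmetic that preserves bounds."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def __repr__(self):
        return f'[{self.lo}, {self.hi}]'

# A color in [0, 1] scaled and biased: the output range stays predictable.
color = Interval(0.0, 1.0)
print(color * Interval(2.0, 2.0) + Interval(-1.0, -1.0))  # [-1.0, 1.0]
```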
 
Hyp-X said:
E.g. 0.8 + 0.8 = 1.6 makes the exponent increase, which makes you lose a bit at the bottom. Does that mean one should have used higher precision? But there's no FP precision where this doesn't occur; whether you use FP16 or FP80, you still lose the last bit on this operation.
And none of the (binary) FP formats can represent 0.8 precisely...

I suppose maybe carry bit was not the correct term to use; overflow would have been a better choice.

So let me set up a hypothetical situation. Let's say we have an architecture that provides FP8 (s3e4) and FP16 (s7e8). FP8 would be adequate to perform the example you gave of 0.8 + 0.8 = 1.6, right? So now what if it were 7.5 + 3.0 = 10.5, which FP8 cannot represent? From my understanding, FP8 is no longer adequate and we will have an overflow. In this case we would have wanted to use FP16, correct?

Now I suppose the next question (if any of the above is correct) is: is this the sort of thing that's happening when we commonly talk about FP16 or FP24 not being adequate versus FP32 for 3D rendering? Or is the problem more commonly that we just need more fractional precision? If it's the second, then I suppose my idea of a profiling program would not do much good. Though if it were the first, it would seem straightforward to correct the problem by detecting the overflow, alerting the developer, and hopefully having them remove the partial-precision hint.

Hyp-X said:
It just occurred to me that by profiling you might mean running the shaders at different precisions and automatically comparing the results.

Unfortunately, once non-linear operations are used in the shaders, automatic IQ analysis is futile.

Actually, that's not quite what I was thinking. Though from our discussion it seems my idea is only applicable to the few instances where you actually overflow the range, versus where you just need more fractional precision.
 
Killer-Kris said:
So let me set up a hypothetical situation. Let's say we have an architecture that provides FP8 (s3e4) and FP16 (s7e8).

Ok.

FP8 would be adequate to perform the example you gave of 0.8 + 0.8 = 1.6, right?

I made a bad example, because it's essentially *2, and *2 is always exact.
Although you may consider that the closest representation of 0.8 is

FP8 : 0x2A [0010 1010] which is 0.8125
FP16 : 0x3E9A [0011 1110 1001 1010] which is 0.80078125

so how do you decide which is "adequate"?

So now what if it were 7.5 + 3.0 = 10.5, which FP8 cannot represent?

Why?
7.5 is 0x5E [0101 1110]
3.0 is 0x48 [0100 1000]
10.5 is 0x65 [0110 0101]

Unlike my example, these are all exact.
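Those decodings can be checked mechanically. A sketch assuming the layout the bit patterns above imply (1 sign bit, 3 exponent bits with bias 3, 4 mantissa bits, implicit leading 1; this hypothetical FP8 is not any shipping GPU format):

```python
def fp8_decode(byte):
    """Decode hypothetical FP8: sign(1) | exponent(3, bias 3) | mantissa(4)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 4) & 0x7
    mantissa = byte & 0xF
    return sign * (1 + mantissa / 16.0) * 2.0 ** (exponent - 3)

for b in (0x2A, 0x5E, 0x48, 0x65):
    print(f'0x{b:02X} -> {fp8_decode(b)}')
# 0x2A -> 0.8125, 0x5E -> 7.5, 0x48 -> 3.0, 0x65 -> 10.5
```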
 
Overflow is about lacking range:
in other words, when the number is too large to fit into the representation.
This is a problem you always have to be aware of with integers, but it's usually a non-issue with floats. Floats have a huge range (it's quite large even for FP16), and if you do manage to overflow it, the float gets an "inf" value (this works on both CPUs and GPUs). So the overflow detection is already there, but overflow isn't the problem.

All problems seen with FP16 are about precision.


Btw, FP16 is s5e10 in GPUs, not s7e8.
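Assuming NumPy's float16 (which is that s5e10 format) as a software stand-in for GPU FP16, both points are easy to demonstrate:

```python
import numpy as np  # np.float16 is the s5e10 half format

# Range: the largest finite FP16 value is 65504; going past it gives
# inf (NumPy may also print an overflow warning), so overflow is loud.
print(np.float16(65504))                  # 65504.0
print(np.float16(65504) * np.float16(2))  # inf

# Precision: with a 10-bit mantissa, the spacing between FP16 values
# near 2048 is 2.0, so adding 1 changes nothing -- the quiet failure.
a = np.float16(2048)
print(a + np.float16(1) == a)             # True
```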
 