Progression with OpenGL 2.0

We discussed frame-buffer reads before... it's interesting to see how it's panning out. From the "Issues" section:

7) Is alpha blending programmable?
Fragment shaders can read the contents of the frame buffer at the current location using the built-in
variables gl_FBColor, gl_FBDepth, gl_FBStencil, and gl_FBDatan. Using these facilities,
applications can implement custom algorithms for blending, stencil testing, and the like. However,
these frame buffer read operations may result in a significant reduction in performance, so
applications are strongly encouraged to use the fixed functionality of OpenGL for these operations if
at all possible. The hardware to implement fragment shaders (and vertex shaders) is made a lot
simpler and faster if each fragment can be processed independently both in space and in time. By
allowing read-modify-write operations such as is needed with alpha blending to be done as part of the
fragment processing we have introduced both spatial and temporal relationships. These complicate
the design because of the extremely deep pipelining, caching and memory arbitration necessary for
performance. Methods such as render to texture, copy frame buffer to texture, aux data buffers and
accumulation buffers can do most, if not all, what programmable alpha blending can do. Also the
need for multiple passes has been reduced (or at least abstracted) by the high-level shading language
and the automatic resource management.
RESOLVED on October 12, 2001: Yes, applications can do alpha blending, albeit with possible
performance penalties over using the fixed functionality blending operations.
REOPENED on July 9, 2002: This issue is related to Issue (23) which remains open, so this issue
should also remain open.
Another possibility would be to create an extension that allows more flexibility than the current alpha
blending allows, but would still be considered fixed functionality.
RESOLUTION: Issue 23) is resolved as allowing frame buffer reads, so this is once again resolved
allowing alpha blending, with the caveats listed above.
REOPENED on December 10, 2002. Issue 23 is re-resolved to disallow frame buffer reads.
RESOLUTION: No, applications cannot do alpha blending, because they cannot read alpha.
CLOSED on December 10, 2002.

My favorite part: "Issue 23 is re-resolved" ;) It looks like one of the ironies of "programmable and flexible" hardware is that it makes some things, um, less flexible. :!:
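For what it's worth, here's a minimal sketch of the copy-framebuffer-to-texture path the issue text points applications at instead, using plain GL 1.1 calls (my own example, not from the paper; it assumes a current GL context and that width/height describe the drawable):

Code:
#include <GL/gl.h>

/* Copy the colour buffer rendered so far into a texture; a second pass can
 * then sample it at the fragment's window position and do whatever blend
 * arithmetic it likes, instead of reading gl_FBColor directly. */
GLuint copy_framebuffer_to_texture(GLsizei width, GLsizei height)
{
    GLuint destCopy;
    glGenTextures(1, &destCopy);
    glBindTexture(GL_TEXTURE_2D, destCopy);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 0, 0, width, height, 0);
    return destCopy;
}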
 
Something else...

Should precision hints be supported (e.g., using 16-bit floats or 32-bit floats)?
DISCUSSION: Standardizing on a single data type for computations greatly simplifies the
specification of the language. Even if an implementation is allowed to silently promote a reduced
precision value, a shader may exhibit different behavior if the writer had inadvertently relied on the
clamping or wrapping semantics of the reduced operator. By defining a set of reduced precision
types all we would end up doing is forcing the hardware to implement them to stay compatible.
When writing general programs, programmers have long given up worrying if it is more efficient to
do a calculation in bytes, shorts or longs and we do not want shader writers to believe they have to
concern themselves similarly. The only short term benefit of supporting reduced precision data types
is that it may allow existing hardware to run a subset of shaders more effectively.
This issue is related to Issue (30) and Issue (68).
RESOLUTION: Performance/space/precision hints and types will not be provided as a standard part
of the language, but reserved words for doing so will be.
CLOSED: November 26, 2002.

I presume this means that nvidia's FP16 pipeline cannot be supported for GL? (Other than through possibly proprietary extensions?)
 
Joe DeFuria said:
I presume this means that nvidia's FP16 pipeline cannot be supported for GL? (Other than through possibly proprietary extensions?)
I'm not quite sure what these requirements say:
glslang paper said:
As an input value to one of the processing units, a floating-point variable is expected to match the IEEE
single precision floating-point definition for precision and dynamic range. It is not required that the
precision of internal processing match the IEEE floating-point specification for floating-point operations,
but the guidelines for precision established by the OpenGL 1.4 specification must be met.
glspec14 said:
The GL must perform a number of floating-point operations during the course of
its operation. We do not specify how floating-point numbers are to be represented
or how operations on them are to be performed. We require simply that numbers’
floating-point parts contain enough bits and that their exponent fields are large
enough so that individual results of floating-point operations are accurate to about
1 part in 10^5. The maximum representable magnitude of a floating-point number
used to represent positional or normal coordinates must be at least 2^32; the maximum
representable magnitude for colors or texture coordinates must be at least 2^10.
The maximum representable magnitude for all other floating-point values must be
at least 2^32. x·0 = 0·x = 0 for any non-infinite and non-NaN x. 1·x = x·1 = x.
x + 0 = 0 + x = x. 0^0 = 1. (Occasionally further requirements will be specified.)
Most single-precision floating-point formats meet these requirements.
An s10e5 fp16 number should certainly suffice to represent 2^10, but I don't know what 'about 1 part in 10^5' is supposed to mean exactly.
 
gokickrocks said:
1 part in 10^5 = .00001

the more 0s between the decimal and 1, the more accurate the calculation will be
Yes, that's obvious, but how does this apply to float numbers? Does this mean the next greater number to any given number must be less than 1.00001 times that number? That would mean you need a 17-bit mantissa, right?
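Working the arithmetic through (my own sketch, not from the spec): with n stored mantissa bits, adjacent representable numbers differ by at most one part in 2^n, so "1 part in 10^5" needs 2^-n < 10^-5, i.e. n >= 17. A quick check in C:

Code:
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* fp16 (s10e5), the 17-bit figure from the discussion, IEEE single */
    const int bits[] = { 10, 17, 23 };
    int i;
    for (i = 0; i < 3; ++i) {
        double spacing = ldexp(1.0, -bits[i]);  /* 2^-n: worst-case relative step */
        printf("%2d mantissa bits: spacing %.2e -> %s 1 part in 10^5\n",
               bits[i], spacing, spacing < 1e-5 ? "meets" : "misses");
    }
    return 0;
}

By this reading an s10e5 half only gives about 1 part in 10^3, which is why the question matters for FP16 pipelines.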
 
I note that the quote says "about 10^5" without qualifiers such as "minimum", etc., which to me speaks to error-accumulation guidelines rather than an absolute criterion.

But perhaps I should read the pdf for myself in more detail.
 
Xmas said:
gokickrocks said:
1 part in 10^5 = .00001

the more 0s between the decimal and 1, the more accurate the calculation will be
Yes, that's obvious, but how does this apply to float numbers? Does this mean the next greater number to any given number must be less than 1.00001 times that number? That would mean you need a 17-bit mantissa, right?

Yup.
Speaking for the math only, not GL 2.0, where I'm not qualified.

Basically they seem to say "at least 24-bit fp per colour component". Which would make sense, since if you have 10 bits per colour at the RAMDAC, a 10-bit mantissa (16-bit fp) ensures that calculational errors will end up on screen, whereas a 17-bit mantissa gives a healthy margin, particularly with proper rounding.

Of course, how disturbing the artifacts from 16-bit fp would be is situational. After all, we lived with far worse, but then again no one ran shader programs using the 8-bit-per-component output precision in the calculations either.

I feel that not having mechanisms for dealing with precision explicitly is a bit myopic. Contrary to what they are writing, even if the calculations were equally fast, using suitable precision can give substantial savings in memory traffic over always using the largest data type. I can see what they are saying in terms of data type support, but the language doesn't have to be set up so that support of all types is mandatory. Perhaps they feel that when the programmability of GPUs takes directions where explicit data type control would be useful, it's time for GL3.

Entropy
 
On the precision issue, I should think it would be simple enough to implement in the language all data types supported by various hardware (INT8, INT12, INT16, FP16, FP32), and to state that when a specific data type is selected for a variable, that is the minimum precision to be used (with float and int values related by the float's mantissa, floats always being considered higher-precision at the same mantissa size), or the highest supported precision if no analog is available.

For example, if a variable is defined at FP16, but a Radeon 9700 is the render target, then FP32 will be used for that storage.

Obviously this could cause some problems with memory management on some video cards, but I think it's a whole lot better than just leaving the interface open for vendor-specific extensions.
 
Chalnoth said:
For example, if a variable is defined at FP16, but a Radeon 9700 is the render target, then FP32 will be used for that storage.

The paper explained the problem with that:

DISCUSSION: Standardizing on a single data type for computations greatly simplifies the
specification of the language. Even if an implementation is allowed to silently promote a reduced
precision value, a shader may exhibit different behavior if the writer had inadvertently relied on the
clamping or wrapping semantics of the reduced operator. By defining a set of reduced precision
types all we would end up doing is forcing the hardware to implement them to stay compatible.
When writing general programs, programmers have long given up worrying if it is more efficient to
do a calculation in bytes, shorts or longs and we do not want shader writers to believe they have to
concern themselves similarly. The only short term benefit of supporting reduced precision data types
is that it may allow existing hardware to run a subset of shaders more effectively.

So, they decided that yes, the obvious benefit is to allow existing hardware to possibly run a set of shaders more "effectively", but the drawbacks of having inadvertent behavior occur because of actually running at different precisions are not worth it.
 
Joe DeFuria said:
So, they decided that yes, the obvious benefit is to allow existing hardware to possibly run a set of shaders more "effectively", but the drawbacks of having inadvertent behavior occur because of actually running at different precisions are not worth it.
I must say that I totally disagree with their position, especially on the point of not supporting 'half floats'. There is simply no 'clamping or wrapping semantics' a programmer could rely on when using floats (because the GL1.4 spec only contains minimum requirements), so why should that be different for half floats?

And even for integers, there's no defined wrapping or clamping behavior, because the spec allows implementations to use floats instead.

Why can't they just say: "Do not rely on clamping or wrapping behavior of datatypes. If you need it, do it yourself (modulo or built-in clamp function)" ?
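To make the "clamping or wrapping semantics" worry concrete, here's the same situation in plain C terms (my own illustration, not shader code): code that leans on a narrow type wrapping silently changes its answer once an implementation carries the value in a wider type, which is exactly what making the behaviour explicit avoids.

Code:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t narrow = 200;
    int     wide   = 200;

    /* The writer "knows" 200 + 100 wraps to 44 in the narrow type... */
    narrow = (uint8_t)(narrow + 100);
    /* ...but an implementation that silently promotes quietly gets 300. */
    wide = wide + 100;
    printf("narrow: %d  wide: %d\n", narrow, wide);

    /* Making the wrap (or a clamp) explicit removes the dependence on the
     * storage type, which is the point above. */
    printf("explicit wrap: %d\n", wide % 256);
    return 0;
}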
 
Joe DeFuria said:
So, they decided that yes, the obvious benefit is to allow existing hardware to possibly run a set of shaders more "effectively", but the drawbacks of having inadvertent behavior occur because of actually running at different precisions are not worth it.

Their justification seemed weak to me. They do a million things in OGL for the benefit of hardware implementations, and this one time they decide to (potentially) penalize hardware implementations for a questionable benefit to developers?
 
Oh, on rendering errors for FP16:

Two things:
First, it looks like the GeForce FX, even if FP16 is used in the calculations, will usually use a normal 32-bit framebuffer for the output. I'm not aware of any ability of the FX to output a floating-point buffer.

Secondly, even if the output was 10-bit, the errors are only certain (assuming the last bit is always in error...which is an erroneous assumption since the internal calculation is certainly at higher precision) for the brighter half of the spectrum. Dimmer color values will not show as much error, which is where it counts (our eyes can see banding much more easily at lower brightness levels).
 
Chalnoth said:
Oh, one other thing on rendering errors for FP16:

Two things:
First, it looks like the GeForce FX, even if FP16 is used in the calculations, will usually use a normal 32-bit framebuffer for the output. I'm not aware of any ability of the FX to output a floating-point buffer.

Secondly, even if the output was 10-bit, the errors are only certain (assuming the last bit is always in error...which is an erroneous assumption since the internal calculation is certainly at higher precision) for the brighter half of the spectrum. Dimmer color values will not show as much error, which is where it counts (our eyes can see banding much more easily at lower brightness levels).

I wouldn't be so sanguine. There are a million ways limitations in precision could show up visibly, other than losing precision in the least significant color bits.

I can think of two examples off the bat: the Mandelbrot shader shows less detail at lower precision, and Dot3 lighting can look blocky. But I am sure there are other problems that might arise in shaders not yet conceived.
 
The question is rather whether data type size should be visible to the programmer, and under programmer control to the extent of the abilities of the hardware.

It's not necessarily a good idea to make certain data formats mandatory (not necessarily a bad idea either, IEEE FP has arguably been a boon), but I fail to see that removing this from programmer control is a giant leap for mankind. If your hardware and problem can use lower precision and improve both calculational speed and bandwidth requirements, that would seem useful. As Chalnoth also pointed out, it's not as if 16-bit FP would always yield unacceptable results.

Entropy
 
Remember that these intermediate values can be used as the texture coordinate for a dereferenced texture lookup...
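To put rough numbers on that (my own sketch): an s10e5 half spaces coordinates in [0.5, 1.0) about 2^-11 apart, which on a large texture is a sizeable fraction of a texel, so an intermediate value fed back in as a texture coordinate can land on the wrong texel outright.

Code:
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double half_step = ldexp(1.0, -11);   /* fp16 spacing for coords in [0.5, 1.0) */
    const int sizes[] = { 256, 1024, 2048, 4096 };
    int i;
    for (i = 0; i < 4; ++i) {
        double texel = 1.0 / sizes[i];
        printf("%4d-texel texture: one fp16 coord step = %.2f texels\n",
               sizes[i], half_step / texel);
    }
    return 0;
}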
 
Yea like Dio said, dereferenced texture lookups...put that in ya pipe(line) and smoke it baby!

It's at times like these I wish I knew what you guys were talking about! :oops:

Edit: ;)
 