Gamma-corrected FSAA - what is it?

Dark Helmet

Newcomer
Could someone point me to a white-paper or beyond3d thread that explains what gamma-corrected FSAA is (algorithmically, what's going on)?

I realize that gamma-space textures, gamma-to-linear texture reads, linear space frag shader math, linear frame buffer, and linear-to-gamma translation in the DAC are the in-vogue ways of doing things. However, at gammas of ~3 (typical for us), this is unworkable due to the 8-bit color quantization on frame buffer write which occurs before gamma correction. The color banding is hideous.

Just curious if gamma-correct FSAA is the solution; that is (I hope), gamma-space frame buffer samples, and the card does read, gamma->linear, blend, linear->gamma, write when FSAA blending (in floating point), and 8-bit-source DAC gamma is tossed in the garbage where it belongs.
 
"gamma-corrected" isn't quite the right word. Currently, most games render in a non-linear color space, or some mish-mash with incorrect linear blending, or just don't care at all. That's because of quantization: 8 bits per channel for textures and framebuffers is just not enough for a linear color space.

What "gamma-corrected" AA (more precisely, the AA of the R300) does is take each AA sample color, linearize it, blend all samples of a pixel linearly, and de-linearize the result. The R300 does this with a fixed 2.2 gamma value.

After the blending, the DAC gamma ramp is still used.
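The resolve step described above can be sketched in a few lines of Python (an illustration only, not actual hardware behavior; a pure power-2.2 curve is assumed):

```python
# Sketch of an R300-style "gamma-corrected" AA resolve with a fixed
# 2.2 gamma, as described above. Sample values are normalized to [0, 1].
GAMMA = 2.2

def resolve_pixel(samples):
    """Linearize each AA sample, blend linearly, re-apply gamma."""
    linear = [s ** GAMMA for s in samples]       # gamma -> linear
    blended = sum(linear) / len(linear)          # linear average
    return blended ** (1.0 / GAMMA)              # linear -> gamma

# A half-covered black/white edge resolves to ~0.73 in gamma space,
# not the 0.5 a naive average of the stored values would give.
edge = resolve_pixel([0.0, 1.0, 0.0, 1.0])
```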


The "right" way however, would be to do all color calculations in linear color space, and only on final output use the DAC gamma ramp to adjust the values to the non-linear response curve of the display device used. When you do this, "gamma-corrected" AA is actually The Wrong Thing(tm).
 
And once again, I agree with Xmas on a topic that isn't often spoken about here.

One thing though. If you want to keep it in linear space you need a floating point frame buffer, and preferably floating point textures. I think RGBE would be good enough to be an improvement over current solutions; otherwise go for FP16.

I think R9G9B9e5 would be better than R8G8B8e8, even though the latter seems to be more of a standard. AFAIK, neither of them is available in any gfx cards though.

It might be hard to make a highly compressed HDR texture, so textures in non-linear light can still be useful, but they should be converted to linear space before filtering.
 
Basic said:
I think R9G9B9e5 would be better than R8G8B8e8, even though the latter seems to be more of a standard. AFAIK, neither of them is available in any gfx cards though.

Given that the eye/brain can detect differences in intensity of about 1%, I would think that even more fractional bits would still be useful <shrug>
 
Xmas said:
What ... the AA of R300 ... does, is take each AA sample color, linearize it, blend all samples of a pixel linearly, de-linearize the result. R300 does this with a fixed 2.2 gamma value.
Thanks for clarifying this. That's just what we need except we want a different gamma value. Hopefully we'll see this in upcoming hardware.

Basic said:
If you want to keep it in linear space you need a floating point frame buffer, and preferably floating point textures. I think RGBe would be good enough to be better than current solutions, otherwise go for FP16.
The problem is that multisampling is a requirement and AFAIK, no vendors support multi-sampled floating-point buffers yet. Is this correct?

In fact, for now 8-8-8-8 is the limit for multisample frame buffers, isn't it?
 
I agree that more bits can be useful, especially with multiple layers blended. But I tried to stay within 32 bits. So one bit better per component than current standard (R8G8B8X8) plus a better range for HDR seemed reasonable.

Or did you mean that you'd prefer something like R9G10B9e4 or R10G10B10e2?

Heh, maybe we should go berserk with the blocking, and make quad-blocks? 4*(R10G10B10)+e8 But I'd think that'd be pushing it too far. :D


Dark Helmet:
Yes, I think that's correct on both counts.
 
The basic idea behind gamma-correct FSAA is that x^n + y^n is not equal to (x+y)^n. One way you can see this effect, and it's used by color calibration utilities, is that a dithered image of 50% black pixels and 50% full-red pixels should appear exactly as bright as a solid 50%-red color. This is often not the case with default settings.

The problems with non-gamma-corrected FSAA crop up mostly when viewing lines or wireframe images. A rendered line will appear "dashed" without proper gamma correction. With proper gamma correction it should appear like a solid line.
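The dither example above can be checked numerically. A small sketch, assuming an idealized display with a pure 2.2 power response:

```python
# Why x^n + y^n != (x+y)^n matters: average emitted light of a 50%
# black / 50% full-red dither vs. a flat "50%" framebuffer value,
# assuming an idealized display with a pure 2.2 power response.
GAMMA = 2.2

def displayed(code):
    """Light emitted for a framebuffer value in [0, 1]."""
    return code ** GAMMA

dithered = (displayed(0.0) + displayed(1.0)) / 2   # 0.5 of full intensity
flat = displayed(0.5)                              # only ~0.218

# To match the dither's brightness, the flat value must be stored as
# 0.5 ** (1 / 2.2), roughly 0.73.
match = 0.5 ** (1.0 / GAMMA)
```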
 
Basic said:
I agree that more bits can be useful, especially with multiple layers blended. ... Or did you mean that you'd prefer something like R9G10B9e4 or R10G10B10e2?
No, just more bits. No exponent needed. But it has to multisample and alpha blend correctly.
Dark Helmet: Yes, I think that's correct on both counts.
Ok, thanks.
 
Dark Helmet:
What I said above was a reply to what you wrote to me.

Now, about what you wrote to Xmas:
Yes. If you're forced to use a R8G8B8X8 frame buffer, it's best to have it in nonlinear format. And then you need a gamma corrected AA.

BUT, to make it correct, you also need a linear->nonlinear conversion at the end of the pixel shader (which I doubt anyone does). You also need a nonlinear->linear conversion on textures before filtering, which no current hardware supports.


And your last post:
I just think it would be better use of the bits to have a linear framebuffer with block exponent, instead of a nonlinear buffer without exponent. You can skip a lot of gamma conversions with a linear framebuffer (btw, AFAIK no current hardware does gamma correct alpha blend).
And as a bonus, the block exponent will make it HDR capable.
 
The R3xx performs a linear -> gamma space conversion after all shading operations and before writing to the 8888 (or whatever limited-precision frame buffer format the application specifies). The linear-to-gamma conversion is, in fact, a compression for all practical purposes, since the internal linear format is FP24 -- it allows for an "equivalent" of ~12b precision in linear format.

What the R3xx also does is, when AA needs to be resolved (i.e. the fragments combined back into pixels for display), it reads the 8b/component values, removes the gamma correction while expanding back up to the internal linear precision, does the combine, then applies the gamma correction compression again before final pixel storage.

There is a final gamma adjustment available in the display, which is what is controlled by the app/user.

This is the best that can be done with an 8888 frame buffer, and is as "mathematically" correct as can be done. This can be seen best with lines, but shows up in all AA edges. Naively merging the fragments leads to the wrong values being stored in the FB (yes, it's the a^n + b^n != (a+b)^n issue).

Using a large FB format, such as FP24 or FP32, would allow us to store linear pixels and avoid the gamma altogether. For FP16, some amount of gamma is still needed, since the mantissa precision is still not enough.
 
Basic said:
And your last post:
I just think it would be better use of the bits to have a linear framebuffer with block exponent, instead of a nonlinear buffer without exponent. You can skip a lot of gamma conversions with a linear framebuffer (btw, AFAIK no current hardware does gamma correct alpha blend).
And as a bonus, the block exponent will make it HDR capable.
Except wouldn't block exponent remove alpha in the framebuffer?

Why not just go FP16 for the framebuffer, request that games do all calculations in linear space (perhaps even with canned routines to adjust the gamma of textures at loadtime), and use the gamma adjust that every video card supports at scanout?
 
sireric said:
Using a large FB format, such as FP24 or FP32, would allow us to store linear pixels and avoid the gamma altogether. For FP16, some amount of gamma is still needed, since the mantissa precision is still not enough.
Why is the mantissa precision in FP16 still not enough (11 bits, effectively)?
 
Last time I checked, the required conversion of 8b after de-gamma is on the order of 12 to 13b, so 11b mantissa, while "better", probably still needs some amount of compression to match the quality now in all cases.
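The 12-to-13-bit figure can be sanity-checked with a short sketch. Using the standard sRGB transfer function here is my assumption (the thread doesn't pin down the exact curve; a pure power curve, lacking sRGB's linear toe, would give a higher number):

```python
import math

# Estimate how many linear integer bits are needed so that the smallest
# step of an 8-bit sRGB-encoded value (the step from code 0 to code 1,
# which falls in the curve's linear toe) is still resolvable after
# de-gamma.
def srgb_to_linear(c):
    """Standard sRGB decode for c in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

smallest_step = srgb_to_linear(1 / 255) - srgb_to_linear(0.0)
bits_needed = math.ceil(math.log2(1.0 / smallest_step))   # 12
```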
 
sireric said:
Last time I checked, the required conversion of 8b after de-gamma is on the order of 12 to 13b, so 11b mantissa, while "better", probably still needs some amount of compression to match the quality now in all cases.
But if that's integer precision, FP16 should be enough. From what I remember, the reason you need more precision is for higher precision in the darker range. A floating-point format should have equal accuracy over all brightness ranges, and so there shouldn't be a problem.
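The point about floating point keeping roughly constant relative accuracy can be illustrated directly with IEEE half precision, which Python's struct module can pack and unpack (the 'e' format):

```python
import struct

# FP16 keeps roughly constant *relative* precision across brightness,
# unlike an integer buffer whose absolute step is fixed. Round-trip
# through IEEE half precision ('e' format) and measure the local step.
def to_fp16(x):
    return struct.unpack('<e', struct.pack('<e', x))[0]

def step_near(x):
    """Smallest increment that changes the fp16 value near x."""
    h = to_fp16(x)
    step = 2.0 ** -30
    while to_fp16(h + step) == h:
        step *= 2.0
    return to_fp16(h + step) - h

# The relative step is on the order of 2^-10 both near 1.0 and near
# 0.01, whereas an 8-bit integer buffer's fixed 1/255 step is far too
# coarse near black.
rel_bright = step_near(1.0) / 1.0
rel_dark = step_near(0.01) / 0.01
```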
 
That's a good point (need more precision at <0.5). I could be convinced that FP16 is just about enough. Certainly much, much better than 8b linear.
 
Chalnoth:
Yes, the destination alpha is lost. How often is it used? I know it can be used, but I've got the impression that it's rarely put to work.


sireric:
That's interesting info. I thought I remembered a discussion here a long time ago where someone said the only gamma correction done was in the downsampling, the gamma ramp from the driver panel, or any conversions you explicitly made in the PS. And nobody objected. I'm glad you corrected that misbelief.

Does this mean that R300 can do gamma correct alpha blend too? And gamma correction on the right side of texture filtering?


But I don't agree with what you say about how many "equivalent" bits you get from sRGB.
Yes, it's equivalent to ~11.1 bits for integers (if it's done like this; more without the linear part). But it's only equivalent to a ~7.4 bit mantissa in floating point.

I'd say R9G9B9e5 would be much better (when you can skip alpha). FP16 even more so, but at double the cost.
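To make the shared-exponent idea concrete, here is a toy codec in the spirit of the R9G9B9e5 layout under discussion: 9-bit mantissas with one 5-bit exponent shared by all three channels. The bias and clamping details are invented for illustration and don't match any shipping format:

```python
import math

# Toy shared-exponent codec: the exponent is chosen for the brightest
# channel, so that channel keeps full 9-bit precision; dimmer channels
# of the same pixel share its exponent and lose precision relative to it.
MANT_BITS = 9
MANT_MAX = 2 ** MANT_BITS - 1

def encode(r, g, b):
    largest = max(r, g, b, 1e-6)                   # avoid log2(0)
    exp = math.floor(math.log2(largest)) + 1       # shared exponent
    scale = 2.0 ** (MANT_BITS - exp)
    mants = tuple(min(int(c * scale), MANT_MAX) for c in (r, g, b))
    return mants, exp

def decode(mants, exp):
    scale = 2.0 ** (exp - MANT_BITS)
    return tuple(m * scale for m in mants)
```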
 
R3xx can do sRGB gamma or linear before the FB, and it can do another gamma pass in the display.

Yes, R3xx can perform de-gamma on the texture, but it's post-filtering, regrettably (though it saved the 4x HW cost).

On blending, R3xx only exposes gamma space blending, or linear blending, not linear followed by gamma. Not sure if it's possible in DX9 to have it any other way.

I haven't calculated the FP bits. I was just thinking of the linear integer bits required (which I remember was at least 12b). You may very well be correct that it's as low as 8b in FP (since the precision is required nearer to 0).
 
Thanks again for the info.

Regarding blending, I read the DX doc like this:
Let's say we want to just accumulate the PS value (p) to the frame buffer value (f).
Both p and f are the "raw" values stored.

You may disable all gamma stuff to get the operation:
f = f + p

Or if the HW supports it, you can enable sRGB write to ideally get:
f = linearToGamma(gammaToLinear(f) + p)

But they also allow the approximate/incorrect sRGB write calculation:
f = f + linearToGamma(p)

Which of these would you call "gamma space blending", "linear blending", and "linear followed by gamma"?

Or did you mean something else with "linear followed by gamma"?
 
a+b : Linear : R300 exposes this
g(ig(a)+b) : Linear - gamma : R300 does not expose this
g(a) + b: Gamma : R300 exposes this
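Those three variants, written out in runnable form (g = linear-to-gamma encode, ig = its inverse; a pure 2.2 power curve stands in for the real transfer function):

```python
# The three blend behaviors listed above. 'a' is the stored framebuffer
# value, 'b' the incoming fragment; g() encodes linear -> gamma and
# ig() decodes gamma -> linear (pure 2.2 power curve for illustration).
GAMMA = 2.2

def g(x):
    return x ** (1.0 / GAMMA)   # linear -> gamma

def ig(x):
    return x ** GAMMA           # gamma -> linear

def blend_linear(a, b):        # a + b          : exposed on R300
    return a + b

def blend_linear_gamma(a, b):  # g(ig(a) + b)   : not exposed
    return g(ig(a) + b)

def blend_gamma(a, b):         # g(a) + b       : exposed on R300
    return g(a) + b
```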
 
Basic said:
I'd say R9G9B9e5 would be much better (when you can skip alpha). FP16 even more so, but at double the cost.
Since we're talking about some ethereal future game that would implement this, I claim that games will become less bandwidth-limited as time goes on, due to use of longer shaders with more math and less texture accessing (less per clock: more total), so a FP16 framebuffer shouldn't be much of a performance hit going into the future (provided all of the same compression routines are put into use as with integer framebuffers).
 