HDR comparison, I16 vs FP16

IMHO, if the R520 doesn't support HW FP16 filtering, then it will lose the next round.
 
cho said:
IMHO, if the R520 doesn't support HW FP16 filtering, then it will lose the next round.

There's no indication so far that it won't; rather the contrary.
 
It won't filter FP textures, but then again neither does Xenos, which will have a large influence on next-gen titles. From the devs we've asked so far, they aren't using it to any great degree at the moment, as FP textures are more often used as lookups.
 
Dave Baumann said:
It won't filter FP textures, but then again neither does Xenos, which will have a large influence on next-gen titles. From the devs we've asked so far, they aren't using it to any great degree at the moment, as FP textures are more often used as lookups.

So what's FP filtering good for then?
 
FP filtering can be used as a performance optimization for the tonemapping pass, and also allows the use of HDR textures (though I don't know how often HDR textures will be used).
 
Chalnoth said:
FP filtering can be used as a performance optimization for the tonemapping pass, and also allows the use of HDR textures (though I don't know how often HDR textures will be used).

So FP filtering wasn't critical to the support of HDR on NV40 then?
 
Blending is the critical component. I'm not sure just how important filtering is, actually. It might be important for, say, projected lighting, but we'll have to see. Having support for FP filtering is definitely better than not... I'm just not certain how important it is.
 
Humus said:
Current games don't use selective supersampling either. And if games aren't going to spend spare GPU cycles to eliminate aliasing, how is it "damn useful" then?

I was speaking of the driver utilizing selective supersampling on older games, which can be done via TSAA-style heuristics, or using profiles. Damn useful to owners of current titles if you ask me.

BTW, I like your proposed integer HDR technique. Your RGBE format is similar to the technique I proposed in the Console Forums as a superior format to ATI's new "FP10" format in Xenos: share one exponent amongst all 3 components, so instead of 7e3:7e3:7e3:2 we have 8:8:8:e6:2 or 9:9:9:e3:2. Still, I would argue that having full-blown FP buffers with full precision not only makes development much easier, but enables some techniques that may not be efficiently realizable with integers.
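Just to illustrate the shared-exponent idea, here's a minimal C sketch of a 9:9:9:e3:2 packing; the bit layout, exponent range and function names are assumptions of mine for illustration, not an actual spec:

Code:
#include <math.h>
#include <stdint.h>

/* Hypothetical 9:9:9:e3:2 layout: three 9-bit mantissas, one shared
   3-bit exponent and 2 bits of alpha in a single 32-bit word. */
uint32_t pack_rgb9e3a2(float r, float g, float b, uint32_t a2)
{
    /* Shared exponent comes from the largest component, RGBE-style. */
    float maxc = fmaxf(r, fmaxf(g, b));
    int e = 0;
    while (maxc > 1.0f && e < 7) { maxc *= 0.5f; e++; }
    float scale = 511.0f / (float)(1 << e);   /* 9-bit mantissas */
    uint32_t ri = (uint32_t)fminf(r * scale + 0.5f, 511.0f);
    uint32_t gi = (uint32_t)fminf(g * scale + 0.5f, 511.0f);
    uint32_t bi = (uint32_t)fminf(b * scale + 0.5f, 511.0f);
    return (ri << 23) | (gi << 14) | (bi << 5) | ((uint32_t)e << 2) | (a2 & 3);
}

void unpack_rgb9e3a2(uint32_t v, float *r, float *g, float *b)
{
    float scale = (float)(1 << ((v >> 2) & 7)) / 511.0f;
    *r = (float)((v >> 23) & 511) * scale;
    *g = (float)((v >> 14) & 511) * scale;
    *b = (float)((v >>  5) & 511) * scale;
}

The single exponent buys whole-number range without spending bits per channel; the trade-off is relative precision in whichever channels are much dimmer than the brightest one.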
 
Chalnoth said:
FP filtering can be used as a performance optimization for the tonemapping pass, and also allows the use of HDR textures (though I don't know how often HDR textures will be used).
Well, what's the point of having HDR if you can't even have HDR lighting for all those lightmapped games like HL2 or Far Cry etc.? I guess we'll stick with LDR sky maps too.
 
Well, you don't actually have to use lightmaps for lighting, though. But yes, if you want to use them and HDR, FP filtering would be a great boon.
 
Okay, here is a worst-case scenario for the Humus texture storage thingy. I'm using LDR images, which makes it even more nasty :p (and since trying to deal with more than 0.0-1.0 is painful and I didn't want to tonemap etc.), so my max = 1.0. This is a 1024x1024 checkerboard image where instead of black we have R1G0B0, and white is R255G255B255. I have artificially set the mipmap bias 5 mip levels back, using linear mipmapping and linear magnification. Essentially, below it should look black and white.

worest.jpg
 
trinibwoy said:
So FP filtering wasn't critical to the support of HDR on NV40 then?

Nope. As Chalnoth said, blending is the critical component. Filtering can easily be worked around, as has been shown in this thread, and cheaply so. For render targets it's a bit more expensive, but still very manageable. Working around blending, on the other hand, is a real pain in the butt and tends to come at a high cost.
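For anyone wondering what the workaround looks like: point-sample the four neighbouring texels, decode each one, and do the bilinear weighting yourself. A sketch in plain C over a single-channel float texture (the names and the wrap addressing are my assumptions; the shader version is the same math with four fetches and three lerps):

Code:
#include <math.h>

/* Manual bilinear filtering of a point-sampled texture: fetch the four
   neighbours and weight them by the fractional texel position. In the
   HDR case you'd decode each sample before the lerps. */
static float lerpf(float a, float b, float t) { return a + (b - a) * t; }

float sample_bilinear(const float *tex, int w, int h, float u, float v)
{
    float x = u * (float)w - 0.5f;                     /* texel space */
    float y = v * (float)h - 0.5f;
    int x0 = (int)floorf(x), y0 = (int)floorf(y);
    float fx = x - (float)x0, fy = y - (float)y0;      /* blend weights */
    x0 = ((x0 % w) + w) % w;  y0 = ((y0 % h) + h) % h; /* wrap */
    int x1 = (x0 + 1) % w,    y1 = (y0 + 1) % h;
    float t00 = tex[y0 * w + x0], t10 = tex[y0 * w + x1];
    float t01 = tex[y1 * w + x0], t11 = tex[y1 * w + x1];
    return lerpf(lerpf(t00, t10, fx), lerpf(t01, t11, fx), fy);
}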
 
bloodbob said:
Well, what's the point of having HDR if you can't even have HDR lighting for all those lightmapped games like HL2 or Far Cry etc.? I guess we'll stick with LDR sky maps too.

Nothing prevents you from having HDR lightmaps. In fact, the HDR sample in the ATI SDK is using an HDR lightmap. The current version has FP16 and manual filtering in the shader. The next SDK revision will have this sample updated to use something like what's presented in this thread. It's still using method 2, but I'll update the code to use #3.
 
Humus said:
Nothing prevents you from having HDR lightmaps.
Yes, I know that, but I was replying in reference to why we will have HDR assets in games.

Anyone who has instructions and speed to burn can always do this:

Code:
// encode: map [2^-16, 2^16] into [0, 1] in log space
r = (log2(r) + 16) / 32;
g = (log2(g) + 16) / 32;
b = (log2(b) + 16) / 32;
a = (log2(a) + 16) / 32;

// decode
r = exp2(r * 32 - 16);
g = exp2(g * 32 - 16);
b = exp2(b * 32 - 16);
a = exp2(a * 32 - 16);

If you need to keep your alpha channel and speed isn't a real problem, this is an option, though you end up with blending errors, particularly if you have pure black pixels (log2(0) is undefined), so it's not a good option for drawn graphics. It's also a good option to use with the 32-bit A2R10G10B10 texture format, and you could limit the maximum value to something lower (say 4096) rather than 65536.
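If you do shrink the range, the constants generalize; a tiny C sketch with the exponent range as a parameter (naming is mine):

Code:
#include <math.h>

/* Log-space encode/decode over a symmetric range [2^-E, 2^E].
   E = 16 gives the 65536 case above; E = 12 gives the suggested 4096. */
float log_encode(float v, float E) { return (log2f(v) + E) / (2.0f * E); }
float log_decode(float s, float E) { return exp2f(s * 2.0f * E - E); }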
 
It was pointed out to me that the FP16 picture didn't look as smooth as the others. This is because it was point sampled, so when zoomed the FP16 picture had a semi-noisy appearance. I've updated the file in my original post to use linear filtering for this case as well.
 
bloodbob said:
Okay, here is a worst-case scenario for the Humus texture storage thingy. I'm using LDR images, which makes it even more nasty :p (and since trying to deal with more than 0.0-1.0 is painful and I didn't want to tonemap etc.), so my max = 1.0. This is a 1024x1024 checkerboard image where instead of black we have R1G0B0, and white is R255G255B255. I have artificially set the mipmap bias 5 mip levels back, using linear mipmapping and linear magnification. Essentially, below it should look black and white.

worest.jpg

Actually, this is just an encoding problem. Properly encoded, this is what I get.

worstcase.jpg


I changed the algorithm to optimize for the largest rather than the smallest component in RGB, which both fixed this and improved quality overall.
 
Humus, wanna post the PS code so I can try it out myself? Because essentially you're saying that there are several impossible outputs of the encoder (i.e. the encoder can't output a checkerboard of red and white squares with a checkerboard alpha of 1/256 [or smaller if you like; I did use INT16 textures] and 1.0), which means something's probably going wrong somewhere. I know my example works perfectly well with the method 2 encoder, and it should work exactly the same with the method 3 encoder, since the lowest non-zero value is 1/256 and the changes in method 3 shouldn't kick in till below 1/65536, if I'm reading it right.

Code:
float maxposs = 65536.0;
float a = max(rgba.r, max(rgba.g, rgba.b)) / maxposs;
float r, g, b;
if (a > 1.0 / 65536.0) // needs some tweaking probably
{
    r = rgba.r / a;
    g = rgba.g / a;
    b = rgba.b / a;
}
else // this bit gains you an extra 2^48 unique
{    // possible outputs in low-level lighting
    float div = a * 65536.0;
    r = rgba.r * div;
    g = rgba.g * div;
    b = rgba.b * div;
    a = 1.0 / 65536.0; // may require tweaking
}
return float4(r, g, b, a);

Encoder 3 should look something like that, right?
 
The PS code? It's the same as before; just multiply by alpha and maxValue.

Code:
uniform samplerCube SkyBox;
uniform float maxValue;
varying vec3 cubeCoord;

void main(){
   // Fetch the encoded texel: rgb is the normalized colour,
   // alpha carries the range factor.
   vec4 sky = textureCube(SkyBox, cubeCoord);
   sky.rgb *= sky.a * maxValue; // decode back to HDR

   // Simple exponential tonemap down to displayable range.
   float exposure = 16.0;
   gl_FragColor.rgb = 1.0 - exp(-exposure * sky.rgb);
}

The interesting part is the encoding:
Code:
float maxChannel = max(max(r, g), b);

// Normalize so the largest component becomes 1.
r /= maxChannel;
g /= maxChannel;
b /= maxChannel;
float range = maxChannel / maxVal;

// Split the range factor evenly (in log space) between the
// colour channels and alpha, spreading the quantization error.
float f = sqrtf(1.0f / range);

range *= f; // this goes into the alpha channel
r /= f;
g /= f;
b /= f;

This is with the change I mentioned, optimizing for maxChannel instead of minChannel as I did before. This improves quality in other cases too, such as the max-65536 case, where I previously saw some minor banding.
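To make the roundtrip explicit (my framing, not Humus' code): the texel ends up holding (r/f, g/f, b/f) with range*f in alpha, and the shader's rgb * a * maxValue undoes both factors at once. A quick C check:

Code:
#include <assert.h>
#include <math.h>

int main(void)
{
    float maxVal = 65536.0f;
    float v = 37.5f;               /* arbitrary grayscale HDR value */
    float range = v / maxVal;      /* single channel: maxChannel == v */
    float f = sqrtf(1.0f / range);
    float storedRGB = 1.0f / f;    /* normalized channel divided by f */
    float storedA   = range * f;
    /* Decode the way the shader does: rgb * a * maxVal. */
    float decoded = storedRGB * storedA * maxVal;
    assert(fabsf(decoded - v) < 1e-3f * v);
    return 0;
}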
 
Nope; yeah, it all works pretty damn well, Humus. Your description of the algorithms is what confused me. Especially your description of method three really sounds like there should be a conditional.

Okay, the best accuracy you can achieve on the non-maximum channels is 1/2^(log2(maxChannel)/2 + 8) of the main channel:
So for 65536 you're accurate to 1/65536 of the main channel
So for 256 you're accurate to 1/4096 of the main channel
So for 1 you're accurate to 1/256 of the main channel
So for 1/256 you're accurate to 1/16 of the main channel

Assuming, of course, you're using a maximum value of 65536. If you're working with images where the vast majority of pixels have values over 1/256 then this shouldn't be a problem, assuming you're rendering to an 8-bit output in the end. The catch is filtering: a 50/50 linear interp of 1/256 and 1.0 shouldn't come out as 0.2822.

Okay, I can't really be bothered making an example, so I'll just chuck down some figures.
2x contrast
2^-16 blended with 2^-15
2^-15 blended with 2^-14
......
2^15 blended with 2^16
Accuracy 97.15%

4x contrast
2^-16 blended with 2^-14
2^-14 blended with 2^-12
......
2^14 blended with 2^16
Accuracy 90.01%

16x contrast
2^-16 blended with 2^-12
2^-12 blended with 2^-8
......
2^12 blended with 2^16
Accuracy 73.53%

256x contrast
2^-16 blended with 2^-8
2^-8 blended with 2^0
2^0 blended with 2^8
2^8 blended with 2^16
Accuracy 56.23%
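For what it's worth, these figures drop straight out of the encoding: for a grayscale value v, both the stored channel and the alpha work out to sqrt(v/maxVal), so a filtered 50/50 blend decodes to ((sqrt(v1)+sqrt(v2))/2)^2 instead of (v1+v2)/2. maxVal cancels in the ratio, which is why only the contrast matters and every row of a given contrast has the same accuracy. A quick C check under that reading of the encoder reproduces the table (to rounding):

Code:
#include <math.h>
#include <stdio.h>

/* Accuracy of a filtered 50/50 blend of two grayscale values when both
   the colour channel and alpha store sqrt(v/maxVal): the hardware lerp
   effectively averages square roots and then squares. */
double blend_accuracy(double v1, double v2)
{
    double decoded = pow((sqrt(v1) + sqrt(v2)) * 0.5, 2.0);
    double exact   = (v1 + v2) * 0.5;
    return decoded / exact;
}

int main(void)
{
    printf("2x:   %.2f%%\n", 100.0 * blend_accuracy(1.0, 2.0));   /* ~97.1 */
    printf("4x:   %.2f%%\n", 100.0 * blend_accuracy(1.0, 4.0));   /* ~90.0 */
    printf("16x:  %.2f%%\n", 100.0 * blend_accuracy(1.0, 16.0));  /* ~73.5 */
    printf("256x: %.2f%%\n", 100.0 * blend_accuracy(1.0, 256.0)); /* ~56.2 */
    return 0;
}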
 
Okay, since I'm not liking the filtering in Humus' method 3, instead of simply bagging it I've decided to come up with another method of encoding, which could be used in some situations to replace Humus'. This method also allows for better contrast between the primary colour channel and the secondary ones: for values above 1.0 the contrast is 1:65536; for values under 1.0 it's 1:Value*65536. As far as accuracy goes, this method might lose some precision for the primary channel under 1.0 compared to Humus', but not much.

Method 4 encoder (very similar to what I thought Humus' method 3 was):
Code:
float maxvalue = 65536.0;
// Round the largest component up to a whole number and use it as the scale.
float maxChannel = ceil(max(max(rgba.r, rgba.g), rgba.b));
// Handle pure black so there is no divide by zero.
if (maxChannel < 0.5) maxChannel = 1.0;
float r = rgba.r / maxChannel;
float g = rgba.g / maxChannel;
float b = rgba.b / maxChannel;
return float4(r, g, b, maxChannel / maxvalue);

Edit: Unfortunately, if you implement this code straight in a PS shader you get problems, because you're emulating INT maths, so it's not great for on-the-fly calculations.

Now, this method of filtering has two sets of errors: the 1st is blending X+X/65536 with Y, where X is a whole number (worst case), and the 2nd is crossing the 1.0 boundary.

For X+X/65536 with Y, the blending is slightly better than Humus' filtering. The figures below are for X=1; the error drops as you increase X.

2x contrast
2+2/65536 with 4 97.22% accuracy // not exact; this is the closest I can get
4x contrast
1+1/65536 with 4 90.01% accuracy
16x contrast
1+1/65536 with 16 79.14% accuracy
256x
1+1/65536 with 256 75.29% accuracy
65536x
1+1/65536 with 65536 ~50% accuracy

Worst case where X=4:
16384x contrast
4+4/65536 blended with 65536 has ~90% accuracy.

Okay, now for the average case, X+1/2 blended with Y. Again the figures are for X=1; the error drops as you increase X.
2x contrast
2+1/2 with 4 98.71% accuracy // not exact; this is the closest I can get
4x contrast
1+1/2 with 4 95.45% accuracy
16x contrast
1+1/2 with 16 90.0% accuracy
256x
1+1/2 with 256 87.66% accuracy
65536x
1+1/2 with 65536 87.5% accuracy


For blending one value between 2^-16 and 2^0 with one between 2^0 and 2^16 there is an error, and it should be exactly the same as the error in Humus' method 3 (at least when working with 2^x blended with 2^-x).

2^16 blended with 2^-16 50.0015258789059% accuracy
2^8 blended with 2^-8 50.3906190396265% accuracy
2^4 blended with 2^-4 56.2256809338521% accuracy
2^2 blended with 2^-2 73.5294117647059% accuracy
2^1 blended with 2^-1 90% accuracy

For images with the vast majority of values over 1.0 I'd highly recommend method 4, as method 4 actually has much more precision than FP16 for the primary channel for values over 1.0, and because as long as the blends are between values above 1.0 there will be fewer filtering errors. For the occasional blend across the 1.0 boundary the errors are no worse than Humus' method 3. Edit: some extremely incorrect comments here; will fix tomorrow.

For textures with values below 1.0, this method's colour storage accuracy is exactly the same as a normal clamped INT16 texture's.

And of course this work was based on Humus' work :p and thanks to ATI for creating RenderMonkey.
 