HDR comparison, I16 vs FP16

Humus

Crazy coder
Veteran
So I played a bit with HDR this morning and found a way to vastly improve the quality of Int16 HDR, to the point where it has every bit of the quality of FP16, even with extreme ranges. I had to use maximum light values of 65536 and high exposure to see the first signs of banding.

So basically, there are three ways to use I16 for HDR. The first is to simply store scaled-down values and scale them back up in the shader. For instance, set the maximum value to 32, and you get 5 bits of overbrightness and 3 extra bits of precision in the low range compared to RGBA8. This costs one extra instruction in the shader to decode, so it's cheap.
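A minimal Python sketch of this first method (none of this code is from the post; the quantization to 16-bit integers is omitted for clarity, and the maxValue of 32 follows the example above):

```python
# Method 1: plain scaled I16 storage. Encoding divides by a chosen
# per-texture maximum so values fit in the normalized [0, 1] range of an
# I16 texture; decoding is one multiply in the shader.

MAX_VALUE = 32.0  # assumed per-texture maximum (5 bits of overbrightness)

def encode_m1(rgb):
    # clamp to [0, MAX_VALUE] and scale into [0, 1];
    # real storage would then quantize each channel to 16 bits
    return tuple(min(max(c, 0.0), MAX_VALUE) / MAX_VALUE for c in rgb)

def decode_m1(rgb_tex):
    # the one extra shader instruction: rgb * maxValue
    return tuple(c * MAX_VALUE for c in rgb_tex)
```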

The second method is to use alpha to store a range, which gives additional precision. To encode, you first find the largest of the red, green and blue values, normalize rgb so the largest value is 1.0, then store maxChannel / maxValue in alpha, where maxValue is the maximum value for the entire texture. Decoding in the shader is done as rgb * a * maxValue, which costs two instructions, so it's cheap too.
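The same encode/decode steps, sketched in Python (again not code from the post; the maxValue of 512 is an assumption matching the comparison images below, and 16-bit quantization is omitted):

```python
# Method 2: normalize rgb so the largest channel is 1.0 and store the
# range (maxChannel / maxValue) in alpha.

MAX_VALUE = 512.0  # assumed per-texture maximum

def encode_m2(rgb):
    m = max(rgb)
    if m == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    # largest channel becomes 1.0; alpha carries the range
    return tuple(c / m for c in rgb) + (m / MAX_VALUE,)

def decode_m2(rgba):
    # the two shader instructions: rgb * a * maxValue
    r, g, b, a = rgba
    s = a * MAX_VALUE
    return (r * s, g * s, b * s)
```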

The second method has good quality in pretty much all normal lighting conditions. The weakness I have found is that quality can become poor in dark areas under high exposure when the maximum value for the texture is large. The problem is that with small values in a texture with a large maximum, the value in alpha gets very small, meaning you're probably only using a couple of bits in the dark regions, which ruins the result even with high precision in rgb.

After playing with this a bit this morning I came up with an improvement to the encoding that fixes this and vastly improves quality by distributing the bits better across the components. I check whether alpha would end up smaller than the minimum rgb value. If so, I scale rgb down and alpha up so that the minimum rgb value and alpha are equal, which essentially shifts some bits over from rgb to alpha and gives much better precision after decoding. This is only a change to the encoding phase; decoding is the same as before, so it's still cheap at two instructions.
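One way the rebalancing step could look in Python (this is a guess at the encoder, since the post gives no code; the square-root split is one way to make the minimum rgb channel equal to alpha while keeping their product, and hence the decoded value, unchanged):

```python
import math

# Method 3: like method 2, but if alpha would end up smaller than the
# smallest rgb channel, rebalance so that min(rgb) == alpha.

MAX_VALUE = 512.0  # assumed per-texture maximum

def encode_m3(rgb):
    m = max(rgb)
    if m == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    norm = [c / m for c in rgb]  # largest channel becomes 1.0
    a = m / MAX_VALUE            # range stored in alpha
    lo = min(norm)
    if 0.0 < a < lo:
        # Shift bits from rgb to alpha: scale rgb down and alpha up by the
        # same factor k, chosen so lo * k == a / k. The decoder multiplies
        # rgb by alpha, so the product (the decoded value) is unchanged.
        k = math.sqrt(a / lo)
        norm = [c * k for c in norm]
        a /= k
    return tuple(norm) + (a,)

def decode(rgba):
    # decoding is unchanged from method 2: rgb * a * maxValue
    r, g, b, a = rgba
    s = a * MAX_VALUE
    return (r * s, g * s, b * s)
```

Note that the rebalance moves the stored components toward the middle of the representable range, so after 16-bit quantization both rgb and alpha keep more significant bits for dark texels.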

Ok, so here's how it performs. I used exposure = 16 for all images. To give a sense of how much exposure that is, here's how it looks with exposure = 1.

Reference FP16 image:
fp16.jpg


Using a maximum value of 512, this is what I got with the three methods:

plain_512.jpg
regular_512.jpg
improved_512.jpg


Method 1, max = 64
Method 2, max = 64
Method 3, max = 64

Method 1, max = 512
Method 2, max = 512
Method 3, max = 512

Method 1, max = 4096
Method 2, max = 4096
Method 3, max = 4096

Method 1, max = 65536
Method 2, max = 65536
Method 3, max = 65536

All full-size PNG images
 
Very interesting Humus! Any drawbacks you can think of, off the top of your head, for using Method #3?

Also, for giggles, do you have the reference pic w/o HDR as well? It would be interesting (at least to me!) to compare the reference pic with HDR and the picture without the HDR effect.

Again, nice work :D
 
Umm, I'll ask the thicky question tonight. Did Humus just solve the most pressing performance issue in current graphics, taking one day to do it. . . .and the solution is compatible with all current cards, and at a performance level that doesn't make it necessary to have much faster cards to do it with at playable resolutions?

Is that what just happened here? That can't be right, can it? Umm, can it?
 
Acert93 said:
Very interesting Humus! Any drawbacks you can think of, off the top of your head, for using Method #3?

For art assets and everything that's precomputed, method 3 versus method 2: nope. Method 3/2 versus 1 costs one instruction more for decoding, but that's certainly worth it most of the time. For a render target conversion pass from FP16 to I16 (in order to get filtering), there will be an extra cost for encoding with method 3 versus 2. In that case it may even be better to use method 1, since it's much cheaper to encode (1 instruction). That depends, though, on whether you can get a reasonable estimate of your range and find a good scaling value. Otherwise method 2 should probably do.

Acert93 said:
Also, for giggles, do you have the reference pic w/o HDR as well? It would be interesting (at least to me!) to compare the reference pic with HDR and the picture without the HDR effect.

Again, nice work :D

As you wish
I added the high resolution image to the rar file above.
 
So how does this render method handle transparency?
Edit: answered below, it's used for art assets/texture storage more than rendering.
 
geo said:
Umm, I'll ask the thicky question tonight. Did Humus just solve the most pressing performance issue in current graphics, taking one day to do it. . . .and the solution is compatible with all current cards, and at a performance level that doesn't make it necessary to have much faster cards to do it with at playable resolutions?

Is that what just happened here? That can't be right, can it? Umm, can it?

Not really. Actually, it's not so much related to performance (unless of course this is the thing that made I16 quality go over the threshold to be useful instead of FP16 so that you now can skip manual filtering in the shader). It's mostly a quality improvement, showing that you can get FP16 level of quality using I16. Why would you want to do that anyway? Well, I16 has the advantage of supporting filtering on the R300, whereas FP16 does not. So for all the HDR art assets, it makes a lot of sense to use I16 instead of FP16. For render targets it's not as obvious a solution. There are plenty of methods to use with their pros and cons, as I mentioned in another thread earlier. This does add another option for the conversion pass, in case you need that extra precision. The cost may make it less desirable in practice, whereas for art assets there's not going to be any extra runtime cost. But it depends on the situation as well. If the overall scene shading is complex, then the relative cost of the extra math in the conversion pass may not make that much of a difference on the final FPS.
 
Are there any major filtering artifacts?

Take the case max = 100, doing a linear interp between 2 pixels (save me working out bilinear):
P1
R=0.5, A=2/100. R decoded = 0.5 * 2/100 * 100 = 1 (assume the value of the blue channel is, say, 1.0)
P2
R=1.0, A=100/100. R decoded = 1.0 * 100/100 * 100 = 100

(P1+P2)/2 for halfway in between:
R=0.75, A=0.51. R decoded = 0.75 * 0.51 * 100 = 38.25 instead of (1 + 100)/2 = 50.5
That's an error of about 24%
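The averaging error can be checked numerically (a minimal Python sketch of the arithmetic above with the same assumed max of 100; not code from the thread — note the exact midpoint works out to 50.5 and the error to about 24%):

```python
# A linear filter averages the *encoded* (R, A) pairs, but the encoding
# rgb * a * max is nonlinear in (rgb, a), so the result differs from
# averaging the decoded values.

MAX = 100.0  # assumed per-texture maximum from the example

def decode(r, a):
    return r * a * MAX

p1 = (0.5, 2.0 / MAX)    # decodes to 1
p2 = (1.0, 100.0 / MAX)  # decodes to 100

# what the hardware filter computes: average the stored components first
filtered = decode((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

# what a correct interpolation would give: average the decoded values
correct = (decode(*p1) + decode(*p2)) / 2

error = (correct - filtered) / correct  # relative filtering error
```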

The worst case scenario is when a component has the value 0 in one pixel and the maximum value in the adjacent pixel; then you get a 50% error, since this method only interpolates exactly when the alpha value is the same in both pixels. As far as I can tell, the only way to really fix this is to do the linear interp inside the shader, and at that point it becomes cheaper to use an FP16 texture over INT16, and you also gain an alpha channel.

You would get similar errors with minification too.
Edit: Sorry if I'm bursting your bubble here :cry: though this would be cool on an integer platform
Edit edit: Great to see something to frigging talk about in this section of the forum
 
geo said:
Umm, I'll ask the thicky question tonight. Did Humus just solve the most pressing performance issue in current graphics, taking one day to do it. . . .and the solution is compatible with all current cards, and at a performance level that doesn't make it necessary to have much faster cards to do it with at playable resolutions?

Is that what just happened here? That can't be right, can it? Umm, can it?

Dude, Humus cracked nv's "transparency AA" for all you radeon owners as well.. 4xmsaa fences look sweet :)

oh yeah.. humus is king...
 
neliz said:
Dude, Humus cracked nv's "transparency AA" for all you radeon owners as well.. 4xmsaa fences look sweet :)

oh yeah.. humus is king...
No, he didn't. He just implemented alpha masks, and in a different way than nVidia does. There's no way to enable via software the selective supersampling that nVidia offers.
 
So I played a bit with HDR this morning and found a way to vastly improve the quality of Int16 HDR, to the point where it has every bit of the quality of FP16, even with extreme ranges.

Ok, but now if we could see this working in Farcry vs FP HDR, or in SCCT (quality vs FP16 HDR is not on par now), then I'm convinced. :)
 
bloodbob said:
Are there any major filtering artificates?

Not really. It's not going to be 100% equivalent to what a bilinear filter would return on FP16, but that's not the point either. A bilinear filter isn't an optimal way to reconstruct the underlying signal from the samples, so comparing error against it isn't that useful. The reason we're using filtering is mostly to make things smooth, which a linear filter on this does, despite not being 100% mathematically equivalent. There are supposedly cases with steep variation across nearby samples where slight haloing can be seen, but I haven't been able to spot anything like that myself.
 
Razor1 said:
ya going to be releasing this on your site?

Sure, when it's done. I'll be travelling the next week though, so don't sit up waiting for it.

Razor1 said:
Whats the speed differential from fp 16 to int16?

When applied directly: None.
When I16 is used with a linear filter and FP16 uses manual filtering in the shader: Huge. :) Changing an HDR sample I did at work from filtering in the shader to using I16, I saw a 45% performance increase, and that was without even filtering the environment map before.
 
Humus said:
When applied directly: None.
When I16 is used with a linear filter and FP16 uses manual filtering in the shader: Huge. :) Changing an HDR sample I did at work from filtering in the shader to using I16, I saw a 45% performance increase, and that was without even filtering the environment map before.
For Radeons, I assume?

When you do the final release, it might be nice to support pairs of 2-channel textures for better performance on NV4x hardware, for the comparison.
 
Humus said:
Sure, when it's done. I'll be travelling the next week though, so don't sit up waiting for it.



When applied directly: None.
When I16 is used with a linear filter and FP16 uses manual filtering in the shader: Huge. :) Changing an HDR sample I did at work from filtering in the shader to using I16, I saw a 45% performance increase, and that was without even filtering the environment map before.

HEHE cool, would be nice to see.

Yeah, we used a similar type of trick but only got about a 20% increase in performance. I can't talk about the trick too much, since I don't own the rights to it. But it's quite interesting what old tech still has in it :)
 