To tell you the truth, I didn't thoroughly examine that particular algorithm, but the essential idea is centering errors about zero.
Update: I went ahead and thought about it. Since the chance of ending up with an error of -1 for an 8-bit output is 1/4 in the case with truncated intermediate results (assuming you need two odd intermediates, each truncated after its division by two, to get an error at the end), the proper centering would be to add one to one of the four numbers 50% of the time, giving a -1 error 1/8 of the time and a +1 error 1/8 of the time. This doesn't account for errors due to truncation of the final value, so it may need some adjustment.
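Here's a rough sketch (in Python, not the original program) of that four-value case. The function names, the choice of which input gets the +1, and the baseline are all mine: I'm measuring error against the "final truncation only" result, since that's the part I set aside above.

```python
import random

def avg4_truncated(a, b, c, d):
    # Pairwise average of four values, truncating after each division by two.
    return (((a + b) >> 1) + ((c + d) >> 1)) >> 1

def avg4_centered(a, b, c, d):
    # Centering trick described above: 50% of the time, add 1 to one of the
    # four inputs before averaging. Which input gets the +1 is my own choice;
    # any single input flips the parity of exactly one intermediate sum.
    if random.random() < 0.5:
        a += 1
    return avg4_truncated(a, b, c, d)

def error_histogram(avg_fn, trials=1_000_000):
    # Error is measured against (a+b+c+d) >> 2, i.e. only the final
    # truncation, since the final-truncation error is set aside here.
    hist = {}
    for _ in range(trials):
        # randrange(255) keeps the values 8-bit even after a possible +1.
        a, b, c, d = (random.randrange(255) for _ in range(4))
        err = avg_fn(a, b, c, d) - ((a + b + c + d) >> 2)
        hist[err] = hist.get(err, 0) + 1
    return {k: v / trials for k, v in sorted(hist.items())}

if __name__ == "__main__":
    random.seed(0)
    print("truncated:", error_histogram(avg4_truncated))
    print("centered: ", error_histogram(avg4_centered))
```

Running it prints the empirical error distributions for the plain truncated version and the centered version, so the probabilities above can be checked directly.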
By centering errors about zero, you can significantly reduce error accumulation. Centering the errors about zero means that half of the errors encountered are additive, while half are subtractive. With the most basic truncation, all errors are additive, meaning that every error compounds every other error.
Anyway, yes, I was wrong about reducing the accuracy of intermediate results. However, the essential idea is that the number of bits required for certain calculations doesn't have to increase as quickly as you might think.
Quick example: I produced a program some time ago that did the following:
Situation one (not centered):
1. 50% chance of adding 1 to the result (represents when one binary value is truncated)
2. 25% chance of adding 2 to the result (represents when both binary values are truncated)
3. 25% chance of doing nothing (simulating no error).
Situation two (centered):
1. 25% chance of adding 1 to the result.
2. 25% chance of subtracting 1 from the result.
3. 50% chance of doing nothing.
Basically, situation two is just situation one shifted down by 1 each pass, which centers it about zero. Right off the bat, it looks like it should produce less error. Each loop was run until the accumulated result reached a certain constant number, which we'll call T. T simulates the amount of "acceptable error," which would usually consist of bits that are truncated before being used. N is the number of loops executed:
For the non-centered case:
N = T (in the real world, this means error is proportional to the number of passes, which occurs whenever errors are always additive)
For the centered case:
N = (T^2)/2 (which, in the real world, means that error is proportional to the square root of the number of passes, which occurs whenever errors are centered about zero).
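For anyone who wants to reproduce this, here's a quick Python sketch of the sort of loops described above (again, not the original program). I'm assuming the stopping condition is the accumulated error reaching T in magnitude; the exact constants you get depend on that choice, but the linear-versus-quadratic growth of N with T is the point.

```python
import random

def passes_until_error(step_fn, limit, trials=1_000):
    """Average number of passes before |accumulated error| reaches `limit`."""
    total_passes = 0
    for _ in range(trials):
        err = 0
        n = 0
        while abs(err) < limit:
            err += step_fn()
            n += 1
        total_passes += n
    return total_passes / trials

def step_not_centered():
    # Situation one: +1 (50%), +2 (25%), 0 (25%); every error is additive.
    r = random.random()
    if r < 0.50:
        return 1
    if r < 0.75:
        return 2
    return 0

def step_centered():
    # Situation two: +1 (25%), -1 (25%), 0 (50%); errors centered about zero.
    r = random.random()
    if r < 0.25:
        return 1
    if r < 0.50:
        return -1
    return 0

if __name__ == "__main__":
    random.seed(0)
    for T in (8, 16, 32, 64):
        n1 = passes_until_error(step_not_centered, T)
        n2 = passes_until_error(step_centered, T)
        print(f"T={T:3d}  not centered: N ~ {n1:8.1f}  centered: N ~ {n2:10.1f}")
```

Doubling T should roughly double N in the non-centered case and roughly quadruple it in the centered case.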
Anyway, I'd have to examine precisely how this applies to real-world situations, but the general idea should be clear: it's not absolutely necessary to have N bits of extra accuracy to average 2^(N-1) values, because centered errors grow with the square root of the number of operations rather than linearly.
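To make that last claim a bit more concrete, here's a small experiment along those lines (my own construction, not the program above): it averages 2^D samples pairwise with no guard bits at all, once with plain truncation and once with a random 0-or-1 added before each shift as a stand-in for the centering idea. The truncated result drifts away from the exact average roughly in proportion to the depth, while the dithered one should stay within about a unit.

```python
import random

def tree_average(samples, combine):
    """Average len(samples) == 2**D values by repeated pairwise combining."""
    level = list(samples)
    while len(level) > 1:
        level = [combine(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def avg_truncate(x, y):
    # Plain truncation: when x + y is odd, the error is always -0.5.
    return (x + y) >> 1

def avg_dithered(x, y):
    # Stand-in for centering: add a random 0 or 1 before the shift, so when
    # the sum is odd the error is +0.5 or -0.5 with equal probability
    # instead of always -0.5.
    return (x + y + random.getrandbits(1)) >> 1

if __name__ == "__main__":
    random.seed(0)
    for depth in (4, 8, 12, 16):
        samples = [random.randrange(256) for _ in range(1 << depth)]
        exact = sum(samples) / len(samples)
        err_trunc = tree_average(samples, avg_truncate) - exact
        err_dith = tree_average(samples, avg_dithered) - exact
        print(f"depth {depth:2d} ({1 << depth:5d} samples): "
              f"truncated error {err_trunc:+.2f}, dithered error {err_dith:+.2f}")
```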