Conditional texturing

Xmas

With all the texture filtering optimizations and techniques like detail texturing, I wonder why no one ever bothered to implement conditional texture ops, i.e. texture ops that do no sampling and return a set value if a certain condition is not met. Like sampling from a detail map only when the LOD is lower than a certain value. Of course, as checking for the condition and writing a default value does take time, this would only pay off if the texture sampling takes multiple cycles. But just comparing the LOD and having a register for a default value seems to be a very cheap way of saving a few cycles.
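(For illustration, a minimal ps_3_0 HLSL sketch of what such a conditional detail fetch has to look like today, with the LOD computed by hand and a dynamic branch standing in for the proposed hardware check; detailTexSize, detailCutoffLod and the neutral grey default are illustrative assumptions.)

Code:
// Emulation of the proposed conditional detail fetch in ps_3_0 HLSL.
// The hypothetical hardware op would do the LOD compare inside the texture
// unit; here the LOD has to be computed manually from the texcoord gradients.
sampler2D detailMap;
float detailTexSize;      // e.g. 256.0 for a 256x256 detail map (assumption)
float detailCutoffLod;    // stop sampling once the LOD exceeds this (assumption)

float4 SampleDetailConditional(float2 uv)
{
    // Gradients are taken outside the branch, since gradient instructions
    // are not allowed inside dynamic flow control.
    float2 duvdx = ddx(uv);
    float2 duvdy = ddy(uv);

    // Standard LOD estimate: log2 of the longer texel-space gradient.
    float2 dx = duvdx * detailTexSize;
    float2 dy = duvdy * detailTexSize;
    float lod = 0.5 * log2(max(dot(dx, dx), dot(dy, dy)));

    if (lod > detailCutoffLod)
        return float4(0.5, 0.5, 0.5, 1.0);   // the "default value" register

    return tex2Dgrad(detailMap, uv, duvdx, duvdy);
}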
 
Yep, I do wonder the same - such a feature would make detail maps an order of magnitude more elegant to implement. heh. I actually wonder how appropriate X1800's PS branching would be to simulate such a thing.
Although in the case of a detail map, you'd want a LERP anyway so that it doesn't just "pop" in (although that depends on how you'd implement it), so it wouldn't be quite free anyway.

Uttar
 
Uttar said:
I actually wonder how appropriate X1800's PS branching would be to simulate such a thing.
For detail maps, this would require calculating the LOD in the PS, so that's not a good idea. For other purposes, it would be predicated texture ops that really do no sampling when the condition is not met.

Although in the case of a detail map, you'd want a LERP anyway so that it doesn't just "pop" in (although that depends on how you'd implement it), so it wouldn't be quite free anyway.
You would need trilinear filtering, but only between the largest mip levels.
 
Xmas said:
For detail maps, this would require calculating the LOD in the PS, so that's not a good idea.

You could make a good estimation by looking at the gradients, without computing the full LOD. For detail textures though I'm not sure how much of a gain this kind of stuff would be. You could easily use small compressed textures for that, so the cost would be low anyway.
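(A sketch of that kind of estimate; detailTexSize and maxFootprint are made-up names, with maxFootprint standing for a precomputed exp2 of the cutoff LOD. A conservative compare against the raw gradients avoids the log2 of a full LOD calculation.)

Code:
// Conservative skip test in the spirit of the gradient estimate above:
// compare the texel footprint directly instead of computing a full LOD.
sampler2D detailMap;
float detailTexSize;      // texels per UV unit (assumption)
float maxFootprint;       // exp2(cutoffLod), precomputed on the CPU (assumption)

float4 SampleDetailEstimated(float2 uv)
{
    float2 duvdx = ddx(uv);
    float2 duvdy = ddy(uv);

    // Cheap footprint estimate: largest per-axis gradient, in texels.
    float2 dx = abs(duvdx) * detailTexSize;
    float2 dy = abs(duvdy) * detailTexSize;
    float footprint = max(max(dx.x, dx.y), max(dy.x, dy.y));

    // footprint < 2^cutoffLod is (roughly) the same test as LOD < cutoffLod.
    if (footprint < maxFootprint)
        return tex2Dgrad(detailMap, uv, duvdx, duvdy);

    return float4(0.5, 0.5, 0.5, 1.0);   // neutral detail color (assumption)
}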
 
Humus said:
You could make a good estimation by looking at the gradients, without computing the full LOD. For detail textures though I'm not sure how much of a gain this kind of stuff would be. You could easily use small compressed textures for that, so the cost would be low anyway.
Bandwidth cost, but not necessarily sampling cost. Though I'm not sure how much difference AF would make for detail textures. Still, it would be a rather cheap way of saving a few cycles, since the number of sampling cycles is variable anyway. And the TMU might already be able to output a constant color (border) when sampling outside of some bounds.
btw, does R520 finally support border address mode?
 
Are you referring to border mode under D3D?

I believe we have supported the clamp to border functionality under OpenGL for at least R300 forward. We do have a minor non-orthogonality that might cause us to leave it off under D3D. We only support a border color for textures up to 32 bits. This might have changed with the X1xxx series though.

As for the conditional functionality, I am not so sure that you will ever see this. Adding lots of special purpose fixed function operations to the texture operation eventually has diminishing returns over increasing the generic ALU resources available.

-Evan
 
ehart said:
Are you referring to border mode under D3D?

I believe we have supported the clamp to border functionality under OpenGL for at least R300 forward. We do have a minor non-orthogonality that might cause us to leave it off under D3D. We only support a border color for textures up to 32 bits. This might have changed with the X1xxx series though.
That's unfortunate. I found it to be especially useful for a fast 2D or 3D range check, instead of using compare ops.

As for the conditional functionality, I am not so sure that you will ever see this. Adding lots of special purpose fixed function operations to the texture operation eventually has diminishing returns over increasing the generic ALU resources available.
Agreed. But things like fixed function LOD calculation and texture addressing/filtering aren't going away any time soon, and a LOD-based conditional sampling probably would be only a minor extension to border address mode for the W-Axis.
 
ehart said:
I believe we have supported the clamp to border functionality under OpenGL for at least R300 forward. We do have a minor non-orthogonality that might cause us to leave it off under D3D. We only support a border color for textures up to 32 bits. This might have changed with the X1xxx series though.

It is my understanding that we now fully support border mode with the X1xxx series.
 
Humus said:
It is my understanding that we now fully support border mode with the X1xxx series.
There are some slight differences between X1000 series and ref rast with border colors. On X1000, the border is the same format as the texture, like OpenGL, but ref rast doesn't behave this way. For example, if you have an R8G8 texture and the border color is 0xffffff, then ref rast will give a border color of 0xffffff and X1000 will give 0xffff. Seems clear that our behavior makes sense.
 
Xmas said:
With all the texture filtering optimizations and techniques like detail texturing, I wonder why no one ever bothered to implement conditional texture ops, i.e. texture ops that do no sampling and return a set value if a certain condition is not met. Like sampling from a detail map only when the LOD is lower than a certain value.
I think you could implement this quite easily with the texldb instruction.

For a detail map implementation, you could easily use the vertex shader to output a value based on distance from the viewer that becomes positive at the point you want the detail map to fade out. Scaled and clamped appropriately, this could be an input to the texldb parameter. When the bias is high enough, you'll just be sampling the 1x1 mipmap, which costs no bandwidth since it'll be in the texture cache. AF should be single cycle too if the bias is high.

For the transition zone, just make your detail map mipmaps lighter at each level.
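(A rough DX9 HLSL sketch of this texldb route, with made-up fade constants; the bias travels in the .w component of the texcoord, and tex2Dbias maps to texldb.)

Code:
// Vertex shader side: output a LOD bias that ramps up with distance, so the
// pixel shader ends up sampling only the smallest mip far away.
float fadeStart;   // distance where the detail map starts to fade (assumption)
float fadeEnd;     // distance where only the smallest mip should remain (assumption)
float maxBias;     // large enough to reach the 1x1 mip, e.g. 8 (assumption)

float4 DetailCoordVS(float3 worldPos, float3 eyePos, float2 detailUV)
{
    float dist = distance(worldPos, eyePos);
    float bias = saturate((dist - fadeStart) / (fadeEnd - fadeStart)) * maxBias;
    return float4(detailUV, 0.0, bias);
}

// Pixel shader side: tex2Dbias (texldb) picks the bias up from .w.
sampler2D detailMap;

float4 DetailPS(float4 detailCoord : TEXCOORD0) : COLOR
{
    return tex2Dbias(detailMap, detailCoord);
}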
 
Mintmaster said:
I think you could implement this quite easily with the texldb instruction.

For a detail map implementation, you could easily use the vertex shader to output a value based on distance from the viewer that becomes positive at the point you want the detail map to fade out. Scaled and clamped appropriately, this could be an input to the texldb parameter. When the bias is high enough, you'll just be sampling the 1x1 mipmap, which costs no bandwidth since it'll be in the texture cache. AF should be single cycle too if the bias is high.

For the transition zone, just make your detail map mipmaps lighter at each level.
I don't see how this would help, since it's not simply "distance from the viewer" that decides where the detail map should be visible, but LOD. So you'd have to calculate the LOD and either apply no bias if it's below 1, or a high bias if it's not.
Besides, one problem with texldb (texldl, texldd) is that it's per pixel, not per quad. That makes it potentially expensive on some implementations. And the PS2.0 spec only requires support for a [-8, 8] bias range. (And both ATI and NVidia implemented it at the wrong point, but that's just an unrelated rant.)
 
Well, I've been saying it for a long time now, and here's another reason why it'd be nice to have some exposure of the texturing pipeline to the shader writer. It might be nice, for instance, to have the ability to have LOD (and perhaps anisotropy, in some packed vendor-dependent format) information returned in a register. I mean, the hardware to produce the LOD is already there, why not make use of it instead of writing your own shader for the task?

Other things, like the ability to tell the hardware that user-supplied LOD values are coherent across quads would be highly useful (and, better yet, to enforce quad-level coherency for a subset of a pixel shader).

It'd also be nice to have access to the texture filtering hardware for user-supplied data. A quick example of where this all would come in handy would be bump map anti-aliasing: one might be able to, with a programmable texture unit, send all of the sample values resulting from a texture access to the bump map straight to the pixel shader. This would "split" the pixel shader into N threads, each one receiving one value of the bump map (and also set to be executed in serial in one pixel pipeline: this idea of a "thread" is distinctly different from the idea of multi-threaded execution, but I can't think of a more descriptive term at the moment, sorry). The split portion of the shader would then calculate all of the lighting information from the bump map, supplying a final color value to the filtering hardware to be averaged together in concert with the texture coordinates, recombining the N threads into one.

Now, if you think about this, such a situation would require two separate accesses to the texture hardware for a situation where the hardware usually just makes one access. So it would be rather inefficient compared to normal fixed-function texturing, but it seems to me that it'd be vastly more efficient than doing the LOD calculation in the shader.
 
Chalnoth said:
Other things, like the ability to tell the hardware that user-supplied LOD values are coherent across quads would be highly useful (and, better yet, to enforce quad-level coherency for a subset of a pixel shader).
For this, I'd like to have a sampling function in SM4 that takes virtual texture coordinates and calculates the LOD from these while doing the sampling with other coordinates (like tex#Dvirt(sampler, coords, virtualcoords)). And, of course, a lod#D(sampler, coords) function.
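(On hardware with texldd support, something close to the first function can already be spelled out in HLSL by feeding tex2Dgrad the gradients of the virtual coordinates while addressing with the real ones; a sketch, with the name borrowed from the proposal above.)

Code:
// Approximation of the proposed tex2Dvirt(sampler, coords, virtualcoords):
// the LOD (and anisotropy axis) is derived from the gradients of the virtual
// coordinates, while the actual addressing uses coords.
float4 tex2Dvirt(sampler2D s, float2 coords, float2 virtualCoords)
{
    return tex2Dgrad(s, coords, ddx(virtualCoords), ddy(virtualCoords));
}

As noted above for texldd, this runs per pixel rather than per quad, so it isn't free either.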
 
Chalnoth said:
This would "split" the pixel shader into N threads, each one receiving one value of the bump map (and also set to be executed in serial in one pixel pipeline: this idea of a "thread" is distinctly different from the idea of multi-threaded execution, but I can't think of a more descriptive term at the moment, sorry).

How about the term (sampling) stage?
 
Xmas said:
With all the texture filtering optimizations and techniques like detail texturing, I wonder why no one ever bothered to implement conditional texture ops, i.e. texture ops that do no sampling and return a set value if a certain condition is not met. Like sampling from a detail map only when the LOD is lower than a certain value. Of course, as checking for the condition and writing a default value does take time, this would only pay off if the texture sampling takes multiple cycles. But just comparing the LOD and having a register for a default value seems to be a very cheap way of saving a few cycles.
Alternatively, a game can wrap the texture sampling operation in an HLSL conditional statement.
 
Xmas said:
I don't see how this would help, since it's not simply "distance from the viewer" that decides where the detail map should be visible, but LOD.
Well, I'm pretty sure that's how games do it, like Serious Sam. If you want to be more rigorous, you could approximate LOD in the vertex shader using N dot Eye and a constant for the texture resolution / polygon area. I was just trying to give you an example of how you could do it fairly easily today.

As Humus said, detail textures are quite small anyway. You'll reach the low mipmaps quite easily, and once it fits entirely in a fraction of the texture cache, you're not costing any bandwidth anyway. Any optimization like the one you described would have minimal impact on performance.
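(For what it's worth, a very rough sketch of that per-vertex estimate; the scale constant and the angle clamp are illustrative guesses, and the result could be clamped and fed into the tex2Dbias path sketched earlier.)

Code:
// Per-vertex LOD guess from distance and view angle, scaled by a constant
// that folds in texture resolution over surface size and the projection.
float detailLodScale;   // tuned per material (assumption)

float EstimateDetailLod(float3 worldPos, float3 worldNormal, float3 eyePos)
{
    float3 toEye = eyePos - worldPos;
    float dist = length(toEye);
    float nDotE = max(dot(normalize(worldNormal), toEye / dist), 0.05);
    // The texel footprint grows with distance and at grazing view angles.
    return log2(detailLodScale * dist / nDotE);
}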
 
Humus said:
You could make a good estimation by looking at the gradients, without computing the full LOD. For detail textures though I'm not sure how much of a gain this kind of stuff would be. You could easily use small compressed textures for that, so the cost would be low anyway.
I think that's the main issue: whether it's worth it.

As per some of the replies here, what Xmas was thinking about basically already happens with clamp-to-border texture repeat modes -- if the texcoord is more than a half texel off the edge, it just returns a register value. It probably wouldn't be difficult to extend that to some form of LOD clamp, but I don't think it would have much benefit, which means (a game developer's) resources would be better spent elsewhere.

Besides, if what John said to me quite a while back:
John Carmack said:
I am loathe to advocate twiddly little extensions.
is representative of the game development industry as a whole, then this (like many other good ideas) is a matter of priority, and therefore cost, in an increasingly expensive industry.

It is nice though :)
 
Reverend said:
Alternatively, a game can wrap the texture sampling operation in an HLSL conditional statement.
Dynamic branching is less efficient for this purpose, and it would require calculating the LOD in the shader, as I already stated in response to Uttar.

Mintmaster said:
Well, I'm pretty sure that's how games do it, like Serious Sam. If you want to be more rigorous, you could approximate LOD in the vertex shader using N dot Eye and a constant for the texture resolution / polygon area. I was just trying to give you an example of how you could do it fairly easily today.
I'm pretty sure games only do a very coarse distance-based disabling of detail maps per object, if at all, and let trilinear filtering take care of fading out the detail map. But even if you had an approximate LOD, that would mean an additional cost, and using texldb is not trouble-free.

As Humus said, detail textures are quite small anyway. You'll reach the low mipmaps quite easily, and once it fits entirely in a fraction of the texture cache, you're not costing any bandwidth anyway. Any optimization like the one you described would have minimal impact on performance.
And as I wrote before, I'm not worried about bandwidth cost, because with compressed textures, bandwidth requirements are relatively low anyway. Reducing the sampling cost is much more important IMO.
 