Triangle setup

If you do not want to do geometric near-plane clipping at all, the alternative is to build into your rasterizer explicit support for rendering of "external triangles" (as described by the page nAo linked to near the beginning of this thread).
Is this considered an important feature? Some shadow volume implementations rely on it, I believe.
 
Not necessarily. Arjan gave an example in an earlier post. You do need to pick the right rasterization rules for matching up edges, but that's not strictly sufficient.
How can the very same vertex be rounded to different fixed-point coordinates? Don't all vertices use the same rounding, and therefore it's independent of vertex order?
You need to be vertex order invariant (Triangle ABC renders exactly the same as BCA, for example), and you do need to render the same pixels if you translate triangles on screen (without changing anything else). Because floats have more precision around 0 than around large numbers, you need to be careful that you compute things correctly and that you have enough precision to make this work both around the origin of the screen and around the opposite corner.

With really large screen sizes, this can become a problem.
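
To put numbers on that, here is one ULP of an fp32 value at a few screen positions (a tiny standalone check, nothing vendor-specific):

Code:
#include <cmath>
#include <cstdio>

// Print the spacing between adjacent fp32 values (one ULP) at
// different screen-space magnitudes.
int main() {
    const float xs[] = {0.5f, 100.0f, 2047.5f};
    for (float x : xs)
        std::printf("one ulp near %7.1f = %g\n", x, std::nextafter(x, 1e30f) - x);
    // ~6e-8 near 0.5 versus ~1.2e-4 near 2047.5: the same subpixel math
    // has roughly 11 bits less precision at the far corner of a
    // 2048-wide screen than near the origin.
}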
Ok, I understand that a translation can give a different rasterization. But why is this even considered a problem? I doubt this is ever visible for mortals.
 
I see. But as I noted above there's only a maximum of 2 interpolants required per shader instruction, so two adders could do the job (or even one with some code restructuring) instead of full multiply-add units?
Each interpolant in this case is a 4 component vector, so you are saying 8 components per cycle, which is quite a lot.

I'd have to double check, but because of optional perspective correctness, I believe you need a multiplier as well.
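
Roughly what I have in mind (textbook perspective-correct interpolation, not any particular chip's datapath):

Code:
struct Vertex { float x, y, w, u; };

// u/w and 1/w interpolate linearly in screen space; u itself does not.
// Recovering u per pixel needs a reciprocal and a multiply, which is
// why adders alone don't suffice once perspective correction is on.
float interpolate_u(const Vertex& a, const Vertex& b, float t) {
    float u_over_w   = (a.u / a.w) * (1 - t) + (b.u / b.w) * t;
    float one_over_w = (1 / a.w)   * (1 - t) + (1 / b.w)   * t;
    return u_over_w / one_over_w;  // per-pixel divide (rcp + mul)
}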

I can't see how the amount of math is the same. I do understand that this is good for tiny triangles (around the size of a quad) where setup time is crucial.

Thanks again for the insight.

I meant the total number of interpolants required is the same, regardless of how you compute them.
 
How can the very same vertex be rounded to different fixed-point coordinates? Don't all vertices use the same rounding, and therefore it's independent of vertex order?
It's not the vertex XY coordinates themselves, but the actual edge equation that is computed from them, that is subject to the potential unpleasant roundoff errors.
Ok, I understand that a translation can give a different rasterization. But why is this even considered a problem? I doubt this is ever visible for mortals.
A 1-pixel glitch in a 1-million-polygon mesh (which is what you get if you are not uber-careful) can be quite visible, if whatever is behind the mesh has a sufficiently different color.
 
Is this considered an important feature? Some shadow volume implementations rely on it, I believe.
If you're running OpenGL or Direct3D, whether the hardware uses external polygons or traditional clipping should not make much of a visible difference; the external-polygon approach will result in a potentially large number of pixels with Z outside the usual [0,1] range, and these pixels will be discarded. These discarded pixels occupy exactly the same region that is otherwise "missing" with ordinary near-plane clipping, so the region that is actually drawn is the same under both approaches (modulo roundoff errors). As such, this is more of an implementation choice than a user-visible feature.
 
Each interpolant in this case is a 4 component vector, so you are saying 8 components per cycle, which is quite a lot.

I'd have to double check, but because of optional perspective correctness, I believe you need a multiplier as well.
Yeah, but with barycentric coordinates you need additional multipliers?

I'm not fully understanding the details about the barycentric coordinates approach though. Does anyone have a reference?
 
seeing the collective brainpower gathered, i'll take the chance to bring up a question that i hope Nick would not mind me asking here:

what are your tricks for obtaining the highest-precision interpolants possible? use of fat fixed point? use of fatter fp formats? intelligent rounding modes?

i'm asking this since during my tinkering i've noticed one thing: if you naively go straightforward with, say, plane-equation-based fp32 interpolant computation and eventual nearest-rounding, then you may end up with precision problems. the fact that it's an off-by-one error is not consoling either, as there are many scenarios where this is unacceptable - say, addresses into a high-contrast point-sampled map.

on a more optimistic and loosely-related note, isn't it about time we threw a bit more raw power at those problems? like comfortably fat fx precision for all interpolants?
 
It's not the vertex XY coordinates themselves, but the actual edge equation that is computed from them, that is subject to the potential unpleasant roundoff errors.
Once we're working with fixed-point coordinates there is no roundoff error. Ok, this requires more than 32 bits fairly rapidly, but by avoiding floating-point we're not losing any precision. Or are the edge equations unsuited for fixed-point for some reason?
A 1-pixel glitch in a 1-million-polygon mesh (which is what you get if you are not uber-careful) can be quite visible, if whatever is behind the mesh has a sufficiently different color.
As long as the mesh uses shared vertices and forms a perfectly closed surface I can't see how there could ever be a crack. Only T-joints can show cracks, but every self-respecting developer knows not to use them and how to avoid them.
 
Once we're working with fixed-point coordinates there is no roundoff error. Ok, this requires more than 32 bits fairly rapidly, but by avoiding floating-point we're not losing any precision. Or are the edge equations unsuited for fixed-point for some reason?

As long as the mesh uses shared vertices and forms a perfectly closed surface I can't see how there could ever be a crack. Only T-joints can show cracks, but every self-respecting developer knows not to use them and how to avoid them.

If you are
  • converting your vertices to fixed-point before you start to compute the edge equation
  • and never throw away any low-order bits anywhere in the edge equation calculation
  • and do not perform any divisions when computing your edge equation (so that your equation ends up in the form a*x+b*y+c=0 and not x=a*y+b)
then you should be in the clear, with a 100% robust method (however, you will then need to clip against the XY edges of the screen or a guard band).
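
A minimal sketch of that recipe, assuming 4 bits of subpixel precision (the helper names are mine, just for illustration):

Code:
#include <cmath>
#include <cstdint>

struct Vec2 { float x, y; };
struct Edge { int64_t a, b, c; };  // a*x + b*y + c = 0, exact in fixed point

// Snap both endpoints to a 1/16-pixel grid first, then build the edge
// equation with integer multiplies only: no division, no dropped bits,
// so two triangles sharing the edge evaluate the identical equation.
Edge setup_edge(Vec2 v0, Vec2 v1) {
    const int64_t S = 16;  // 4 subpixel bits
    int64_t x0 = std::llround(v0.x * S), y0 = std::llround(v0.y * S);
    int64_t x1 = std::llround(v1.x * S), y1 = std::llround(v1.y * S);
    return { y0 - y1, x1 - x0, x0 * y1 - x1 * y0 };
}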
 
If you are
  • converting your vertices to fixed-point before you start to compute the edge equation
  • and never throw away any low-order bits anywhere in the edge equation calculation
  • and do not perform any divisions when computing your edge equation (so that your equation ends up in the form a*x+b*y+c=0 and not x=a*y+b)
then you should be in the clear, with a 100% robust method (however, you will then need to clip against the XY edges of the screen or a guard band).
Exactly.

Here's a small tutorial I wrote two years ago about a robust software rasterizer using (half-)edge functions: Advanced Rasterization. It's suited for resolutions up to 2048x2048 with 4 bits of sub-pixel precision, but can be extended to much higher resolutions (including a guard band) and sub-pixel precision using 64-bit arithmetic. A hardware implementation could obviously use arbitrary precision.
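
The core of it, heavily condensed (the tutorial steps the edge values incrementally instead of re-evaluating them, and refines the all-positive test with a top-left fill convention so pixels exactly on a shared edge are drawn exactly once; plot() is a hypothetical framebuffer write):

Code:
#include <cstdint>

struct Edge { int64_t a, b, c; };  // a*x + b*y + c, from fixed-point setup
void plot(int x, int y);           // hypothetical framebuffer write

void rasterize(const Edge e[3], int minx, int miny, int maxx, int maxy) {
    const int64_t S = 16;  // 4 subpixel bits, matching the setup
    for (int y = miny; y <= maxy; ++y)
        for (int x = minx; x <= maxx; ++x) {
            // Evaluate all three edge equations at the pixel center.
            int64_t px = x * S + S / 2, py = y * S + S / 2;
            bool inside = true;
            for (int i = 0; i < 3; ++i)
                inside = inside && (e[i].a * px + e[i].b * py + e[i].c > 0);
            if (inside) plot(x, y);
        }
}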
 
According to the OpenGL standard, clipping is to be done before the final per-vertex division-by-W; this way, a W=0 vertex doesn't cause clipping to blow up.

If you do not want to do geometric near-plane clipping at all, the alternative is to build into your rasterizer explicit support for rendering of "external triangles" (as described by the page nAo linked to near the beginning of this thread).

I know what the OpenGL specification says, but the output of an NV2x vertex shader (the hardware version, not what the D3D interface looks like) is actually:
X/W, Y/W, (optionally) Z/W, and W.
If W is 0 or close to it and you're using a standard divide, all information about X and Y is lost, since infinity doesn't tell you much. You might be able to get away with clamping based on sign to some large number for rasterization purposes, and assuming any infinity is basically at the near plane. But I'd have thought this would lead to noticeable texture crawl as triangles approach the near plane.
 
I actually 'hacked' a triangle setup algorithm running on the simulator's unified shader (ARB instruction set) for a paper last year about mobile GPUs. With 32-bit fp it seemed to work relatively well for the Unreal Tournament trace I was using at resolutions of 320x240 and 640x480 ... until one explosion started to produce quite large glitches. It may have been a precision problem, but I never found out what the real problem was. You can check the simulator if you are interested; the hack is still there, even if it hasn't been used or tested since then.

The algorithm used is the one based on Olano's paper. The inverse matrix is computed and from that matrix four linear equations are derived for the three edges and for Z. All other interpolators are computed in a much later stage using the barycentric coordinates (the results of the three edge equations for the fragment). No point in carrying all that info for fragments that are outside the triangle or fail Z before shading. All the emulation code related with triangle setup and fragment generation uses 64-bit fp just to be 'safe' (I didn't want to bother with fixed point arithmetic and speed is not a problem for the simulator).
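
Roughly, the setup stage looks like this (my sketch of the idea from Olano & Greer's "Triangle Scan Conversion using 2D Homogeneous Coordinates"; the sign conventions may differ from the paper and from what the simulator actually does):

Code:
struct Tri { float x[3], y[3], w[3]; };  // clip-space (x, y, w) per vertex

// M has the three (x, y, w) vertices as rows; row i of adj(M) is the
// cross product of the other two rows. Each e_i(x, y) = a_i*x + b_i*y + c_i
// vanishes at the two other vertices, and no division by w is needed.
void setup(const Tri& t, float edge[3][3] /* {a, b, c} per edge */) {
    for (int i = 0; i < 3; ++i) {
        int j = (i + 1) % 3, k = (i + 2) % 3;
        edge[i][0] = t.y[j] * t.w[k] - t.y[k] * t.w[j];  // a_i
        edge[i][1] = t.x[k] * t.w[j] - t.x[j] * t.w[k];  // b_i
        edge[i][2] = t.x[j] * t.y[k] - t.x[k] * t.y[j];  // c_i
    }
}
// A parameter p is then interpolated perspective-correctly per fragment
// as (p0*e0 + p1*e1 + p2*e2) / (e0 + e1 + e2).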
 
I actually 'hacked' a triangle setup algorithm running on the simulator's unified shader (ARB instruction set) for a paper last year about mobile GPUs. With 32-bit fp it seemed to work relatively well for the Unreal Tournament trace I was using at resolutions of 320x240 and 640x480 ... until one explosion started to produce quite large glitches. It may have been a precision problem, but I never found out what the real problem was. You can check the simulator if you are interested; the hack is still there, even if it hasn't been used or tested since then.
Was it a rasterization glitch (pixels showing up in the wrong places) or an interpolation glitch (funky colors)? I've seen a fair bit of both in my projects. :D If it's a rasterization glitch, beware of division by zero or denormals. This can happen fairly easily if you allow edge-on triangles to be rasterized. Snap the coordinates to fixed-point and you can test for exact zero.
The algorithm used is the one based on Olano's paper. The inverse matrix is computed and from that matrix four linear equations are derived for the three edges and for Z. All other interpolators are computed in a much later stage using the barycentric coordinates (the results of the three edge equations for the fragment). No point in carrying all that info for fragments that are outside the triangle or fail Z before shading. All the emulation code related with triangle setup and fragment generation uses 64-bit fp just to be 'safe' (I didn't want to bother with fixed point arithmetic and speed is not a problem for the simulator).
Interesting point about interpolating data for pixels outside the triangle or failing the z-test.

Using 64-bit floating point should indeed be fairly safe. It's pretty much equivalent to having a 53-bit fixed-point format with, well, a floating point. ;) You can still do the 'snapping' though. It gets rid of nasty denormals and you can do an exact compare to zero.
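
Something like this hypothetical helper, with 4 sub-pixel bits again:

Code:
#include <cmath>

// Snap to a 1/16-pixel grid. Snapped values, their differences and
// their pairwise products are all exact in fp64 at screen magnitudes,
// so the degeneracy test below is an exact compare against zero
// rather than against some denormal-sized residue.
double snap(double v) { return std::nearbyint(v * 16.0) / 16.0; }

bool degenerate(double x0, double y0, double x1, double y1,
                double x2, double y2) {
    double area2 = (snap(x1) - snap(x0)) * (snap(y2) - snap(y0))
                 - (snap(y1) - snap(y0)) * (snap(x2) - snap(x0));
    return area2 == 0.0;
}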
 
Was it a rasterization glitch (pixels showing up in the wrong places) or an interpolation glitch (funky colors)? I've seen a fair bit of both in my projects. :D If it's a rasterization glitch, beware of division by zero or denormals. This can happen fairly easily if you allow edge-on triangles to be rasterized. Snap the coordinates to fixed-point and you can test for exact zero.

Actually it was quite a bit more visible than a few pixels. It was a large triangle, or part of a triangle, orange and untextured, that showed up in the middle of the explosion.
 
Actually it was quite a bit more visible than a few pixels. It was a large triangle, or part of a triangle, orange and untextured, that showed up in the middle of the explosion.
It probably was textured, but due to failing texcoord interpolation a single texel got applied to the whole triangle.
 
They only ever did this for 3D clip planes; scissoring and near-plane clipping were never done this way.

Actually I've never really understood how nvidia does their near-plane clipping, since they don't have the pre-divide-by-W values in the clipping unit. My assumption is that they must fudge the divide in some way to allow them to back out the value even in the case of a degeneracy. I do know that they lose precision in the case of a near clip.
Wouldn't "if w<1 then discard" work?

edited: No, it wouldn't, don't bother replying ;)

edited again: a "regular" near plane is defined by znear, and because post-transform w would only depend on post-transform z (if it depends on anything at all), there will certainly be a way to figure out the z/w at the near plane.

It should be possible to replace near-plane clipping with a comparison of z/w against a precomputed z/w-near.

w=0 shouldn't (TM) happen too often. In real world terms that's at the center of the eye/camera, not at the near plane.
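
In divide-free form that would be something like this (cross-multiplied, so it assumes w > 0; w <= 0, i.e. at or behind the W=0 plane, still has to be rejected separately):

Code:
// For w > 0, z/w < k is equivalent to z < k*w, where k is the
// precomputed z/w value at the near plane. No divide, so w near 0
// cannot blow anything up.
bool outside_near(float z_clip, float w_clip, float k) {
    return w_clip <= 0.0f || z_clip < k * w_clip;
}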
 
w=0 shouldn't (TM) happen too often. In real world terms that's at the center of the eye/camera, not at the near plane.
W=0 actually corresponds to a plane in the 3d space, not a point. This plane goes straight through the center-of-eye/camera and is parallel to the near-plane. These two planes are normally fairly close to each other, but they are in fact not the same plane. The usual formula for computing per-pixel Z results in 0.0 at the near-plane and minus infinity at the W=0 plane.

Getting exactly W = 0 for a vertex generally requires a 100% exact floating-point cancellation to appear somewhere in the modelview+projection transform. It is as such a rather unlikely case, but one that does need to be handled or avoided. It can be avoided with ordinary geometric near-plane clipping; handling it is a bit non-trivial, but not impossible.
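
A quick numeric illustration with a D3D-style projection (n = 1 and f = 100 are just example values):

Code:
#include <cstdio>

// z_clip = z_eye * f/(f-n) - f*n/(f-n),  w_clip = z_eye.
// z/w is exactly 0.0 at the near plane (z_eye = n) and runs off toward
// minus infinity as z_eye approaches 0, i.e. the W=0 plane through the
// camera.
int main() {
    const float n = 1.0f, f = 100.0f;
    const float a = f / (f - n), b = -f * n / (f - n);
    const float depths[] = {1.0f, 0.5f, 0.1f, 0.01f};
    for (float z_eye : depths)
        std::printf("z_eye=%5.2f  z/w=%g\n", z_eye, (z_eye * a + b) / z_eye);
}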
 
Looking at the algorithm for doing setup with a matrix inverse, and how it depends solely on the vertex positions, I started wondering whether triangles that share an edge could have similar coefficients. Say we have a triangle strip: do we have to recompute the entire matrix inverse for each triangle, or can we reuse part of the previous one? This corresponds to changing one column of the matrix that needs to be inverted, in a cyclic manner.

After half a day of exercising my linear algebra skills, I found that it takes 20 multiplications to compute the new inverse from the previous one, versus 30 multiplications for a full matrix inverse. It requires 16 additions versus 11 though, and still needs one reciprocal. Either way, it's a nice tradeoff, especially when already using triangle strips exclusively.
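
For reference, one standard identity that enables this kind of reuse is the Sherman-Morrison rank-one update: replacing column k of M by v is the rank-one change M' = M + (v - m_k)*e_k^T. A generic sketch (not necessarily the exact 20-multiplication variant):

Code:
// Update inv(M) in place after column k of M changes by d = v_new - v_old.
// Sherman-Morrison: inv(M + d*e_k^T) =
//   inv(M) - (inv(M)*d) * (row k of inv(M)) / (1 + (inv(M)*d)[k])
void update_inverse(float inv[3][3], const float d[3], int k) {
    float u[3];  // u = inv(M) * d
    for (int r = 0; r < 3; ++r)
        u[r] = inv[r][0] * d[0] + inv[r][1] * d[1] + inv[r][2] * d[2];
    const float rowk[3] = { inv[k][0], inv[k][1], inv[k][2] };
    const float s = 1.0f / (1.0f + u[k]);  // the one reciprocal
    for (int r = 0; r < 3; ++r) {
        const float t = s * u[r];
        for (int c = 0; c < 3; ++c)
            inv[r][c] -= t * rowk[c];
    }
}

Counted out, this generic form costs 21 multiplications and 16 additions plus the one reciprocal, so the 20-multiplication figure above presumably shaves one more with a triangle-specific simplification.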

Any chance this is also used by any hardware?
 
After half a day of exercising my linear algebra skills, I found that it takes 20 multiplications to compute the new inverse from the previous one, versus 30 multiplications for a full matrix inverse. It requires 16 additions versus 11 though, and still needs one reciprocal. Either way, it's a nice tradeoff, especially when already using triangle strips exclusively.

Well, it seems like most games aren't using triangle strips. In a study we made of OGL and D3D games, only Oblivion seemed to make massive use of triangle strips (75% of the benchmarked geometry).
 