Triangle setup

Nick · Sep 7, 2006

Hi all,

I was wondering what exactly the triangle setup stage comprises, and how it's implemented on a modern GPU.

If I understand correctly, its main task is to compute gradients of interpolants. When graphics cards specify a setup rate of one triangle per clock cycle, does this include all interpolants (including up to 8x4 texture coordinates)? Or is the triangle rate dependent on the number of interpolants?

Do they derive the gradients from the 'plane equation approach' or is there a faster method? By 'plane equation approach' I mean that, for example, for the z interpolant (it could be any other interpolant) we compute the plane equation Ax+By+Cz+D=0, so the gradients are dz/dx=-A/C, etc. Since (A, B, C) is the plane normal, it can be computed with a cross product. C can be reused for all interpolants, but it still looks like a lot of work. And since every interpolant requires perspective correction, this adds three divisions and a whole lot of multiplications.

Any thoughts?
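
For reference, the plane equation approach described above might be sketched like this. It is only an illustration for one interpolant with no perspective correction; the function names `cross` and `plane_gradients` are made up for this example and do not come from any driver or hardware documentation.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_gradients(p0, p1, p2):
    """Return (dz/dx, dz/dy) for the plane through three (x, y, z) points.

    Returns None when C is zero, i.e. the triangle has no area in (x, y).
    """
    # Two edge vectors of the triangle.
    e1 = tuple(b - a for a, b in zip(p0, p1))
    e2 = tuple(b - a for a, b in zip(p0, p2))
    # (A, B, C) is the plane normal; C is twice the signed screen-space area.
    A, B, C = cross(e1, e2)
    if C == 0:
        return None
    return (-A / C, -B / C)


if __name__ == "__main__":
    # z = 1 + 0.5*x + 0.5*y sampled at three points.
    print(plane_gradients((0, 0, 1), (4, 0, 3), (0, 2, 2)))
```

Only A and B depend on the interpolant; C depends on the (x, y) positions alone, which is why it can be shared across interpolants.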

nAo · Sep 7, 2006

This is a very good starting point with a lot of answers to your questions:
http://www.cs.unc.edu/~olano/papers/2dh-tri/

Nick · Sep 7, 2006

Thanks! I just managed to get the same result with the plane equation approach:

dz/dx = -A/C = ((y2/w2 - y0/w0)*(z1/w1 - z0/w0)-(y1/w1 - y0/w0)*(z2/w2 - z0/w0)) / ((x1/w1 - x0/w0)*(y2/w2 - y0/w0)-(x2/w2 - x0/w0)*(y1/w1 - y0/w0))

= z0 * (y1*w2-y2*w1) / (x1*w0*y2-x1*y0*w2-x0*w1*y2-x2*w0*y1+x2*y0*w1+x0*w2*y1)
+ z1 * (y2*w0-y0*w2) / (x1*w0*y2-x1*y0*w2-x0*w1*y2-x2*w0*y1+x2*y0*w1+x0*w2*y1)
+ z2 * (y0*w1-y1*w0) / (x1*w0*y2-x1*y0*w2-x0*w1*y2-x2*w0*y1+x2*y0*w1+x0*w2*y1)

Which is exactly the product of (z0, z1, z2) with (the first column of) the inverse of the matrix formed by the (x, y, w) coordinates, as shown in the paper you refer to.

So is the setup engine's task limited to computing this matrix inverse as efficiently as possible?
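
A rough sketch of the inverse-matrix form discussed above, assuming the matrix rows are the (x, y, w) coordinates of the three vertices before the perspective divide. The names `setup_matrix` and `interpolant_coeffs` are hypothetical, and nothing here claims to match what actual setup hardware does.

```python
def setup_matrix(v0, v1, v2):
    """Invert the 3x3 matrix whose rows are the (x, y, w) of each vertex.

    Returns the inverse as a list of rows, or None if the determinant is zero.
    """
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    # Adjugate of the matrix (transposed cofactors), written out by hand.
    adj = [
        [y1 * w2 - y2 * w1, y2 * w0 - y0 * w2, y0 * w1 - y1 * w0],
        [w1 * x2 - w2 * x1, w2 * x0 - w0 * x2, w0 * x1 - w1 * x0],
        [x1 * y2 - x2 * y1, x2 * y0 - x0 * y2, x0 * y1 - x1 * y0],
    ]
    det = x0 * adj[0][0] + x1 * adj[0][1] + x2 * adj[0][2]
    if det == 0:
        return None
    return [[e / det for e in row] for row in adj]


def interpolant_coeffs(inv, u):
    """Return (a, b, c) such that u = a*x + b*y + c*w at all three vertices.

    Dividing by w gives u/w = a*(x/w) + b*(y/w) + c, so a and b are the
    screen-space gradients of u/w.
    """
    return tuple(sum(r * uj for r, uj in zip(row, u)) for row in inv)


if __name__ == "__main__":
    inv = setup_matrix((0, 0, 1), (2, 0, 2), (0, 1, 1))
    # Vertex values generated from u = 3*x - 1*y + 2*w.
    print(interpolant_coeffs(inv, (2, 10, 1)))
```

The inverse is computed once per triangle; each additional interpolant then costs one 3x3 matrix-vector product.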

Jawed · Sep 7, 2006

This is useful, too:

http://www.ati.com/products/radeonx800/RadeonX800ArchitectureWhitePaper.pdf

Some work, e.g. perspective, is done before the triangle is assembled, in the vertex engine.

Jawed

nAo · Sep 7, 2006

Nick said:
So is the setup engine's task limited to computing this matrix inverse as efficiently as possible?

well..that's a clever implementation but dunno if real hw does use this system to set up triangles. It's also nice to note that if the inverse matrix does not exist..then you automatically know that your triangle has to be rejected
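
The rejection test mentioned here could look roughly like the sketch below. The function name `classify_triangle` and the convention that a positive determinant means front-facing are assumptions made for this example; the actual sign depends on winding order and coordinate conventions.

```python
def classify_triangle(v0, v1, v2):
    """Classify a triangle from the determinant of its (x, y, w) matrix.

    A zero determinant means the matrix has no inverse, so the triangle is
    degenerate or seen edge-on and can be rejected. The front/back mapping
    of the sign is an assumed convention, not a fixed rule.
    """
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    det = (x0 * (y1 * w2 - y2 * w1)
           + x1 * (y2 * w0 - y0 * w2)
           + x2 * (y0 * w1 - y1 * w0))
    # Exact comparison keeps the sketch simple; with floating-point input a
    # real implementation would probably need a tolerance.
    if det == 0:
        return "reject"
    return "front" if det > 0 else "back"


if __name__ == "__main__":
    print(classify_triangle((0, 0, 1), (1, 0, 1), (0, 1, 1)))
    print(classify_triangle((0, 0, 1), (1, 1, 1), (2, 2, 1)))
```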

It seems next generation GPUs might use shader ALUs to set up triangles..

darkblu · Sep 7, 2006

the cleverest way i've seen for computing triangle gradients is through the ratio of triangle areas. i saw that eons ago in the glide drivers. have been using it ever since.

so you want to compute the gradient of interpolant I:

consider the area of the triangle in two different coordinate spaces:

(a) the original screen-space (X,Y), and
(b) a space composed of one of the screen space base vectors (X or Y) and the interpolant of interest, I, i.e. that'd be either (X, I), or (I, Y)

then

dI/dX = tri_area(I, Y) / tri_area(X, Y), and
dI/dY = tri_area(X, I) / tri_area(X, Y)

needless to say the code for this is minimalistic and elegant, but if you want to see a sample check thurp's tri plotter (rend/rendPrim.cpp)
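
A minimal sketch of the area-ratio formulas above. It is not the Glide or thurp code, just an illustration; `tri_area2` and `gradients_by_area` are made-up names, and the areas are signed and doubled (the factor of two cancels in the ratio).

```python
def tri_area2(p0, p1, p2):
    """Twice the signed area of a triangle given three 2D points."""
    return ((p1[0] - p0[0]) * (p2[1] - p0[1])
            - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def gradients_by_area(xs, ys, vals):
    """Return (dI/dX, dI/dY) for interpolant values vals at the three vertices."""
    area_xy = tri_area2(*zip(xs, ys))
    if area_xy == 0:
        return None  # zero-area triangle, nothing to rasterize
    area_iy = tri_area2(*zip(vals, ys))  # triangle in (I, Y) space
    area_xi = tri_area2(*zip(xs, vals))  # triangle in (X, I) space
    return (area_iy / area_xy, area_xi / area_xy)


if __name__ == "__main__":
    # I = 1 + 0.5*X + 0.5*Y sampled at the three vertices.
    print(gradients_by_area((0, 4, 0), (0, 0, 2), (1, 3, 2)))
```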

Nick · Sep 7, 2006

Jawed said:
Some work, e.g. perspective, is done before the triangle is assembled, in the vertex engine.

How is this possible exactly? Couldn't it create division by zero? Thanks!

nAo · Sep 7, 2006

darkblu said:
dI/dX = tri_area(I, Y) / tri_area(X, Y), and
dI/dY = tri_area(X, I) / tri_area(X, Y)

isn't this exactly the same computation?

nAo · Sep 7, 2006

Nick said:
How is this possible exactly? Couldn't it create division by zero? Thanks!

no, cause if your triangle has a zero area you don't want to rasterize it

Nick · Sep 7, 2006

darkblu, that's exactly the plane equation approach.

The C component is exactly the area of the triangle (actually, twice the area). The other components also correspond to areas.

The method with the inverse matrix appears to be faster when computing multiple gradients.

Nick · Sep 7, 2006

nAo said:
no, cause if your triangle has a zero area you don't want to rasterize it

Sorry, I'm not with you here. Perspective requires division by w, so when it's zero things go wrong. Triangle area can only be computed after assembly (not in the vertex engine like Jawed said). Or am I confusing a couple things here?

nAo · Sep 7, 2006

Nick said:
Sorry, I'm not with you here. Perspective requires division by w, so when it's zero things go wrong. Triangle area can only be computed after assembly (not in the vertex engine like Jawed said). Or am I confusing a couple things here?

Sorry, I misread your post, I thought you were replying to darkblu and not to Jawed

Nick · Sep 7, 2006

nAo said:
Sorry, I misread your post, I thought you were replying to darkblu and not to Jawed

No problem. I still wonder what part of the triangle setup or perspective can be done in the vertex pipeline though. Obviously doing things per triangle is triple the work of doing it per vertex. So anything that can be done in the vertex pipeline should be well worth it.

darkblu · Sep 7, 2006

nAo said:
isn't this exactly the same computation?

well, you know, sometimes people are in a 'write-before-read' mode. and it usually happens in the morning of busy days ; )

don't mind me, everybody, carry on.

nAo · Sep 7, 2006

Nick said:
No problem. I still wonder what part of the triangle setup or perspective can be done in the vertex pipeline though. Obviously doing things per triangle is triple the work of doing it per vertex. So anything that can be done in the vertex pipeline should be well worth it.

I think Jawed is simply referring to the perspective division; ATI drivers patch shaders and append some instructions to perform projection to screen space

Jawed · Sep 7, 2006

I was referring to the diagram on page 6 of the PDF, which shows that before setup you have backface culling, clipping, perspective divide and viewport transform.

But this could be nothing more than specific to ATI's GPUs.

I'm still trying to understand what the PDF is saying about interpolation :???:

http://www.beyond3d.com/forum/showthread.php?t=5642

Jawed

Nick · Sep 7, 2006

Ah, I see. The slide shows that backface culling and clipping are done after the vertex pipelines, then perspective divide, which makes sense. It doesn't seem like the vertex pipelines are doing any actual perspective work, although it would make sense to compute 1/w there already for later use (ignoring division by zero). For triangles crossing the near clip plane 1/w would have to be recomputed for the new vertices. After that the perspective divide is safe and efficient.
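
To make the 1/w idea concrete, here is a small sketch of computing 1/w once per vertex and reusing it for perspective-correct interpolation. It assumes clipping has already guaranteed w > 0, and the names `prepare_vertex` and `interpolate` are hypothetical; it says nothing about where ATI or NVIDIA hardware actually performs these steps.

```python
def prepare_vertex(x, y, w, attr):
    """Per-vertex work: compute 1/w once, then project position and attribute.

    Assumes the vertex has already been clipped so that w > 0.
    """
    if w <= 0:
        raise ValueError("vertex must be clipped against the near plane first")
    inv_w = 1.0 / w
    return (x * inv_w, y * inv_w, inv_w, attr * inv_w)


def interpolate(verts, bary):
    """Perspective-correct attribute at screen-space barycentric weights.

    Both attr/w and 1/w vary linearly in screen space, so they can be
    interpolated linearly and divided at the end.
    """
    inv_w = sum(b * v[2] for b, v in zip(bary, verts))
    attr_over_w = sum(b * v[3] for b, v in zip(bary, verts))
    return attr_over_w / inv_w


if __name__ == "__main__":
    verts = [prepare_vertex(0.0, 0.0, 1.0, 0.0),
             prepare_vertex(3.0, 0.0, 3.0, 1.0),
             prepare_vertex(0.0, 1.0, 1.0, 0.0)]
    # Screen-space midpoint of the edge between the first two vertices.
    print(interpolate(verts, (0.5, 0.5, 0.0)))
```

At that midpoint the result is about 0.25 rather than the 0.5 that plain linear interpolation would give, because the second vertex is three times farther away.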

Jawed · Sep 7, 2006

Actually that's a seriously groovy thread, largely over my head. Something to come back to...

Still a bit confused about when/where interpolation happens in ATI and NVidia...

Jawed

nAo · Sep 7, 2006

Nick said:
For triangles crossing the near clip plane 1/w would have to be recomputed for the new vertices. After that the perspective divide is safe and efficient.

AFAIK nvidia hw does not perform any geometric clipping, so this step is not required

Simon F · Sep 7, 2006

nAo said:
AFAIK nvidia hw does not perform any geometric clipping, so this step is not required

Are they still using texkill to do user plane clipping?
