> Are they still using texkill to do user plane clipping?

No idea!
> AFAIK nvidia hw does not perform any geometric clipping, so this step is not required

You always need 1/w for perspective-correct interpolation, no? Since divisions are expensive, it's better to compute 1/w once per vertex than three times per triangle.
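To make the 1/w point concrete, here's a minimal sketch of perspective-correct interpolation along a span (textbook math, the names are illustrative, not any particular hardware's):

```c
/* Minimal sketch of perspective-correct interpolation, assuming each
 * vertex carries an attribute u and its clip-space w. */
float lerp(float a, float b, float t) { return a + t * (b - a); }

float interp_perspective(float u0, float w0, float u1, float w1, float t)
{
    /* 1/w is computed once per vertex (the expensive divide)... */
    float inv_w0 = 1.0f / w0;
    float inv_w1 = 1.0f / w1;
    /* ...then u/w and 1/w interpolate linearly in screen space. */
    float u_over_w = lerp(u0 * inv_w0, u1 * inv_w1, t);
    float inv_w    = lerp(inv_w0, inv_w1, t);
    /* One divide (or reciprocal) per pixel recovers the attribute. */
    return u_over_w / inv_w;
}
```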
> It seems next generation GPUs might use shader ALUs to set up triangles...

The problem with doing these kinds of computations in the shader ALUs is the precision needed. For large displays / large amounts of super-sampling and/or multi-sampling, the precision needed for the setup computations to maintain invariance or prevent cracks generally exceeds fp32, regardless of the approach you take to make those computations.
> ...does this also mean the 'plane equation approach' is used instead of the 'inverse matrix approach'?

You can also use barycentrics. There's more than one way of computing things.
Thanks for the info, sireric!
Why do you mention the z gradients separately from the other interpolants? Does this have something to do with z-cull?
Since not all interpolants are needed immediately (every shader instruction uses at most 1 texture coordinate and 1 interpolated color), is some of the gradient computation spread over multiple clock cycles?
I didn't know ATI hardware did wide lines entirely in hardware! Since it requires screen space operations like you say, does this also mean the 'plane equation approach' is used instead of the 'inverse matrix approach'?
> So you're left with several options then: Don't support high resolutions, allow cracks/non-invariance/other artifacts, or increase the precision of some or all your shader ALUs.

Obviously the first option is a no-go; the second one might be viable if the area cost of upping ALU precision is still less than the area required to have dedicated ALUs in the setup engine.
Are they still using texkill to do user plane clipping?
> You can also use barycentrics. There's more than one way of computing things.

Could you elaborate on this or point me to some papers? If I remember correctly, barycentric coordinates assign (0, 0), (1, 0) and (0, 1) to the vertex positions, so interpolants would just become a linear interpolation of these barycentric coordinates. But doesn't this require some transformation from (x, y) coordinates to barycentric coordinates, and then a lot of multiplies and additions to get the interpolants? It saves the setup cost but adds a lot of per-pixel work, at first thought...
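To make my question concrete, the transformation I have in mind is the textbook edge-function one (nothing vendor-specific, a minimal sketch):

```c
/* Minimal sketch: barycentrics from edge functions. Each edge function
 * is affine in (x, y), so it can be set up once per triangle and then
 * evaluated (or iterated) per pixel. */
float edge(float ax, float ay, float bx, float by, float px, float py)
{
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax);
}

void barycentrics(const float v0[2], const float v1[2], const float v2[2],
                  float px, float py, float bary[3])
{
    /* Signed (doubled) triangle area is the common denominator. */
    float area = edge(v0[0], v0[1], v1[0], v1[1], v2[0], v2[1]);
    bary[0] = edge(v1[0], v1[1], v2[0], v2[1], px, py) / area; /* weight of v0 */
    bary[1] = edge(v2[0], v2[1], v0[0], v0[1], px, py) / area; /* weight of v1 */
    bary[2] = edge(v0[0], v0[1], v1[0], v1[1], px, py) / area; /* weight of v2 */
}
```

Each interpolant then still needs u = b0*u0 + b1*u1 + b2*u2 per pixel, which is the multiply/add work I mean.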
> The problem with doing these kinds of computations in the shader ALUs is the precision needed. For large displays / large amounts of super-sampling and/or multi-sampling, the precision needed for the setup computations to maintain invariance or prevent cracks generally exceeds fp32, regardless of the approach you take to make those computations.

I fail to see why you would get cracks (between triangles, I assume), or anything like that. Doesn't the rasterizer use fixed-point computations?
> There are multiple ways, but all equivalent, to compute the slopes. Also, different scan converters/rasterizers might operate differently and want different setup data.

There are even more approaches? Where can I learn about these?
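For reference, my understanding of the 'plane equation approach' mentioned earlier is something like this minimal sketch (one of the equivalent ways to compute the slopes):

```c
/* Minimal sketch of the plane equation approach: for each interpolant
 * u, solve for (dudx, dudy, u0) so that u(x, y) = dudx*x + dudy*y + u0
 * passes through the three vertices. */
typedef struct { float dudx, dudy, u0; } Plane;

Plane setup_plane(const float x[3], const float y[3], const float u[3])
{
    Plane p;
    /* Signed doubled area of the triangle (the common denominator). */
    float det = (x[1] - x[0]) * (y[2] - y[0]) - (x[2] - x[0]) * (y[1] - y[0]);
    p.dudx = ((u[1] - u[0]) * (y[2] - y[0]) - (u[2] - u[0]) * (y[1] - y[0])) / det;
    p.dudy = ((u[2] - u[0]) * (x[1] - x[0]) - (u[1] - u[0]) * (x[2] - x[0])) / det;
    p.u0   = u[0] - p.dudx * x[0] - p.dudy * y[0];
    return p;
}
```

Stepping one pixel in x then costs only an add per interpolant (u += dudx), so the cost is all in the setup.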
> I fail to see why you would get cracks (between triangles, I assume), or anything like that. Doesn't the rasterizer use fixed-point computations?

You do need to convert from floating-point to fixed-point (if that's what you're using), and if your source values lost some bits, you can't recover them by moving to fixed-point after the fact.
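Schematically, the conversion is just a snap to the subpixel grid; a minimal sketch, assuming a 28.4 fixed-point format (the exact format is illustrative):

```c
/* Snap a post-transform screen coordinate to a 1/16-pixel grid.
 * Any precision lost earlier in the float pipeline is already gone
 * by this point and cannot be recovered here. */
#include <math.h>

typedef long fixed28_4;

fixed28_4 to_fixed(float screen_coord)
{
    return (fixed28_4)lroundf(screen_coord * 16.0f);
}
```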
> You could iterate barycentrics as well as X,Y,Z per pixel, then use the barycentrics to compute the other interpolants such as color and texture coordinates. The operations involved are pretty simple, per pixel. You would only need to iterate a few fixed elements then, regardless of the number of colors or texture coordinates you had.

The operations per pixel are simple, but still more than just an addition for regular interpolation, right? Although I guess it fits nicely with the multiply required for perspective correction. It's a tradeoff between higher setup cost with lower per-pixel cost and lower setup cost with higher per-pixel cost, right? And the latter is preferred nowadays to maximize parallelism?
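In other words, something like this sketch of what I understand the scheme to be (the names and setup values are made up for illustration):

```c
/* Iterate only the barycentrics across a span, then reconstruct each
 * interpolant from them with MADs. The per-pixel steps come from the
 * edge-function gradients computed at setup. */
typedef struct {
    float b0, b1, b2;          /* barycentrics at the current pixel  */
    float db0dx, db1dx, db2dx; /* per-pixel steps along the scanline */
} BaryIter;

void shade_span(BaryIter it, int count, float (*out)[3],
                const float r[3], const float g[3], const float b[3])
{
    for (int i = 0; i < count; ++i) {
        /* One MAD per vertex per interpolant; only the barycentrics
         * themselves are iterated, however many interpolants exist. */
        out[i][0] = it.b0 * r[0] + it.b1 * r[1] + it.b2 * r[2];
        out[i][1] = it.b0 * g[0] + it.b1 * g[1] + it.b2 * g[2];
        out[i][2] = it.b0 * b[0] + it.b1 * b[1] + it.b2 * b[2];
        it.b0 += it.db0dx; it.b1 += it.db1dx; it.b2 += it.db2dx;
    }
}
```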
> You do need to convert from floating-point to fixed-point (if that's what you're using), and if your source values lost some bits, you can't recover them by moving to fixed-point after the fact.

If two triangles share an edge, they share the exact same vertices and they are rounded to the same fixed-point coordinates. So how could this create cracks? Or are we talking about 'T-joints' between polygons?
> Plus, if you round wrong, you also get to extrapolate outside the triangle, which doesn't always produce pleasing results.

Isn't that solvable with some careful rounding?
They only ever did this for 3D clip planes; scissoring and near-plane clipping were never done this way.
Actually, I've never really understood how nvidia does their near-plane clipping, since they don't have the pre-divide-by-W values in the clipping unit. My assumption is that they must fudge the divide in some way to allow them to back out the value, even in the case of a degeneracy. I do know that they lose precision in the case of a near clip.
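For context, conventional geometric clipping works on exactly those pre-divide clip-space values; a minimal sketch, assuming the OpenGL-style z >= -w near plane (the convention is an assumption here, not a claim about nvidia's hardware):

```c
/* Intersect one triangle edge with the near plane z = -w in clip
 * space, before the perspective divide. */
typedef struct { float x, y, z, w; } ClipVtx;

ClipVtx clip_near(ClipVtx a, ClipVtx b)
{
    /* Signed distances to the near plane; t is where the edge crosses. */
    float da = a.z + a.w;
    float db = b.z + b.w;
    float t  = da / (da - db);
    ClipVtx v;
    v.x = a.x + t * (b.x - a.x);
    v.y = a.y + t * (b.y - a.y);
    v.z = a.z + t * (b.z - a.z);
    v.w = a.w + t * (b.w - a.w);
    return v;
}
```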
> If two triangles share an edge, they share the exact same vertices and they are rounded to the same fixed-point coordinates.

Not necessarily. Arjan gave an example in an earlier post. You do need to pick the right rasterization rules for matching up edges, but that's not strictly sufficient.
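The usual choice of rules for matching up edges is a top-left fill rule; a minimal sketch (the winding and sign conventions here are illustrative, not vendor-specific):

```c
/* Top-left fill rule: a pixel lying exactly on a shared edge is owned
 * by at most one of the two triangles. Assumes clockwise winding with
 * y pointing down and edge function E(p) = dx*(py - ay) - dy*(px - ax),
 * positive inside. */
#include <stdbool.h>

bool covers_edge(long E, long dx, long dy)
{
    if (E != 0)
        return E > 0;                /* strictly inside / outside       */
    /* Tie-break for pixels exactly on the edge: */
    bool top  = (dy == 0 && dx > 0); /* horizontal edge at the top      */
    bool left = (dy < 0);            /* edge going upward = left edge   */
    return top || left;
}
```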
> It requires MAD operations. It's easier to iterate a fixed set of items and then compute a bunch of values based on them than to iterate a variable number of components.

I see. But as I noted above, there's only a maximum of 2 interpolants required per shader instruction, so two adders could do the job (or even one with some code restructuring) instead of full multiply-add units?
> The amount of math is the same, but centralizing the rasterization is simpler.

I can't see how the amount of math is the same. I do understand that this is good for tiny triangles (around the size of a quad) where setup time is crucial.