Kyro's parallel z-check units

nAo

Nutella Nutellae
Veteran
I read many times on this board that Kyro can perform 32 z-checks per clock per polygon (or primitive). How can this be possible? We know that z(x,y), where x and y are screen space coordinates, needs a division per pixel in order to be evaluated.

Having 32 hw dividers (or reciprocal units) on chip seems a bit unrealistic to me. Even evaluating z(x,y) via interpolation would be a hard task, because it would need a hyperbolic interpolation. A solution could be to evaluate 1/z(x,y): it needs only a linear interpolation and can be done with just an addition. So Kyro could sort all the primitives using an internal w-buffer instead. It could be converted to a 'real' z-buffer only when it's needed.

Another question: how can all those 32 pixel checks be 'extracted' from the area covered by a primitive in a tile? Does it check a scanline per clock? If this is true, Kyro would need N clocks (N<=16) to check a primitive against a tile, giving a 32x16 tile size. Do all these words make any sense? :smile:

ciao,
Marco

[ This Message was edited by: nAo on 2002-02-21 15:45 ]
 
Another question: how can all those 32 pixel checks be 'extracted' from the area covered by a primitive in a tile? Does it check a scanline per clock? If this is true, Kyro would need N clocks (N<=16) to check a primitive against a tile, giving a 32x16 tile size. Do all these words make any sense? :smile:
Yes, it checks one scanline of a tile per clock. I don't know exactly how it works, but maybe you should look for information on "infinite planes".
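
To make that concrete, here is a minimal C sketch of the idea. The 32x16 tile, the depth plane form d(x,y) = A*x + B*y + C, and the omission of edge/coverage tests are all simplifying assumptions for illustration, not the actual Kyro design; the point is that one row of 32 depth tests needs only adds and compares, so 32 comparators can run in parallel and a tile takes at most 16 clocks per primitive.

/* Per-tile depth checks against a primitive whose screen-space depth is
   the plane d(x,y) = A*x + B*y + C (a sketch; smaller d = closer). In
   hardware the 32 compares of a row would happen in parallel in one clock. */
#define TILE_W 32
#define TILE_H 16

void check_tile(float depth_buf[TILE_H][TILE_W],
                unsigned char visible[TILE_H][TILE_W],
                float A, float B, float C)    /* depth plane of the primitive */
{
    for (int y = 0; y < TILE_H; y++) {        /* one row per "clock" */
        float d = B * y + C;                  /* depth at (0, y) */
        for (int x = 0; x < TILE_W; x++) {    /* 32 parallel tests in hw */
            if (d < depth_buf[y][x]) {        /* closer than what's stored */
                depth_buf[y][x] = d;
                visible[y][x] = 1;
            }
            d += A;                           /* one add per pixel step */
        }
    }
}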
 
Which is what every card does...
Sure, but every card has to do perspective-corrected interpolation of many quantities, so it has to calc Z for each pixel in order to perform such interpolations and to update the z-buffer. I believe it's not a big issue if the hw has to do a reciprocal once per pixel and multiply it by each parameter, like texture coords and so on...

ciao,
Marco

[ This Message was edited by: nAo on 2002-02-21 16:09 ]
 
nAo
We know that z(x,y), where x and y are screen space coordinates, needs a division per pixel in order to be evaluated.

actually we don't know that ;) the perspective transform preserves linearity, so we don't need to divide z for perspective correction. IOW, z in screen space is linearly interpolated (OTOH, w is not)
 
The perspective transform preserves linearity, so we don't need to divide z for perspective correction. IOW, z in screen space is linearly interpolated (OTOH, w is not)

Obviously there is a misunderstanding here. In my notation z(x,y) isn't screen-space z, but the 3D non-homogeneous world space z coordinate. To make it short, that's the value that is written to the z-buffer :smile:

ciao,
Marco
 
Let's make this clear.

There are world-space coordinates (say ws_x, ws_y, ws_z).
The ws vector is transformed to camera space, using the view matrix. The camera space coordinates are (cs_x, cs_y, cs_z).

Then you do the projective transformation which looks somewhat like this:
p_x = P(0,0) * cs_x / cs_z
p_y = P(1,1) * cs_y / cs_z
p_z = (P(2,2) * cs_z + P(3,2)) / cs_z
p_w = 1 / cs_z
Where the P(i,j) are constants. (This can be more complicated, but it's enough.)

Note that p_z can be linear or reciprocal in cs_z depending on how you set up the constants. p_z is _linearly_ interpolated throughout the primitives and is the value used for the Z-buffer.

So the constants are usually set up to make p_z reciprocal in camera space z. The usual setup is
P(2,2) = far_plane/(far_plane - near_plane)
P(3,2) = -P(2,2)*near_plane
This results in p_z == 0 when cs_z == near_plane and p_z == 1 when cs_z == far_plane.
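
For what it's worth, here is a quick numeric check of those constants in C (the near/far values are arbitrary; only the endpoints matter):

/* Numeric check of the projection constants above (near/far chosen
   arbitrarily for illustration). */
#include <stdio.h>

int main(void)
{
    double near_plane = 1.0, far_plane = 100.0;
    double P22 = far_plane / (far_plane - near_plane);
    double P32 = -P22 * near_plane;

    for (double cs_z = near_plane; cs_z <= far_plane; cs_z *= 10.0) {
        /* p_z = P22 + P32 * (1/cs_z): linear in 1/cs_z, not in cs_z */
        double p_z = (P22 * cs_z + P32) / cs_z;
        printf("cs_z = %6.1f  ->  p_z = %f\n", cs_z, p_z);
    }
    return 0;   /* prints p_z = 0 at the near plane, p_z = 1 at the far plane */
}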

The p_w is _not_ linearly interpolated and is the value used to perspective correct the textures. You can use p_w in the depth buffer (W-buffering), but this is deprecated on newer cards.

Edit: The last paragraph is not true :smile: See following posts.

[ This Message was edited by: Hyp-X on 2002-02-22 11:34 ]
 
mind you, as a rule w is cs_z (after the projection matrix multiplication, that is) and after the perspective divide at the end of the perspective transform (i.e. <x, y, z, w> / w) it becomes 1 (<- clipping is usually performed at this stage). it's another matter that the hw takes the vertex with the reciprocal w from after the projection matrix multiplication - that way it can linearly interpolate from 1/w0 to 1/w1 in a span, and then use multiplication instead of division for the perspective correction of the tex. coords.
 
The p_w is _not_ linearly interpolated and is the value used to perspective correct the textures. You can use p_w in the depth-buffer (W-buffering) but this is deprecated with newer cards.

You're right about p_w being used to perspective correct parameters; imho, you're wrong about p_w not being linearly interpolated. p_z is just a + b*p_w, so if p_z can be linearly interpolated (in the 1/z variable of course) obviously p_w can too, but that is not clear in your formulas.
Let's make this stuff clear once and for all (I should have done it in my first post, my mistake):

Take a parameter G you want to interpolate along a triangle. In camera space (as you call it) you have:

G = a*cs_X + b*cs_Y + c*cs_Z

In this way G is a linear function of the camera space coordinates (cs_X, cs_Y, cs_Z).
To find a, b, c you just assign to each triangle vertex a value for the parameter G.
So you have (G0,G1,G2) = (a,b,c)*T, where T is a matrix built with the vertex coordinates as columns (please don't mind if I don't make any notational difference between covariant and contravariant coordinates :smile:)
Invert that relation and you obtain: (a,b,c) = (G0,G1,G2)*T^-1
Now we know how to find the interpolation parameters.
Back to our 3D parameter in camera space, we can move it from 3D space to 2D homogeneous space:
G = a*x + b*y + c*w (x = cs_X, and so on..)

Now project it:
G/w = a*X + b*Y + c

And G/w is a linear function of the 2D screen coordinates, and that 1/w is just 1/cs_Z, no matter what you call it :smile:
This is what I was talking about when I said one can interpolate a parameter on a scanline with just an addition (so doing 32 checks per clock in this way should be possible in hw :smile: )
Obviously, if you want to evaluate cs_Z, you have to think of G as a constant parameter along the triangle. So (G0,G1,G2) can be just (k,k,k), and calculations are simpler taking it as (1,1,1). You find a, b, c and use them to calc 1/w in a linear fashion:

1/w = a*X + b*Y + c

so you have to divide by this to obtain the right perspective corrected value for a parameter: G = (G/w)/(1/w)
I made a mistake in my first post, because I had messed up the z and w-buffer definitions; obviously the z-buffer is a function of 1/z, and so Kyro, imho, can find visible pixels without having to calc the Z coordinate for each (X,Y) screen projected coordinate, because the division by 1/w would be needed only in the texture mapping phase.
Sorry if I have not put in the near or far clip planes... but that's just a matter of additive or multiplicative constants here and there, so it's not interesting when one has to look at derivatives to find whether there is linearity in some parameter :smile:
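
Here is a small C program that checks the derivation numerically (the triangle and parameter values are made up for illustration). It solves a, b, c from the three vertex equations G = a*x + b*y + c*z, then verifies at each vertex that G/w and 1/w are linear in the screen coordinates (X = x/z, Y = y/z, with w = z) and that dividing them recovers G exactly:

/* Numeric check of the derivation above (illustrative values only). */
#include <stdio.h>

/* Solve the 3x3 system M * v = r by Cramer's rule. */
static void solve3(const double M[3][3], const double r[3], double v[3])
{
    double det = M[0][0]*(M[1][1]*M[2][2] - M[1][2]*M[2][1])
               - M[0][1]*(M[1][0]*M[2][2] - M[1][2]*M[2][0])
               + M[0][2]*(M[1][0]*M[2][1] - M[1][1]*M[2][0]);
    for (int k = 0; k < 3; k++) {
        double A[3][3];                    /* M with column k replaced by r */
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                A[i][j] = (j == k) ? r[i] : M[i][j];
        double dk = A[0][0]*(A[1][1]*A[2][2] - A[1][2]*A[2][1])
                  - A[0][1]*(A[1][0]*A[2][2] - A[1][2]*A[2][0])
                  + A[0][2]*(A[1][0]*A[2][1] - A[1][1]*A[2][0]);
        v[k] = dk / det;
    }
}

int main(void)
{
    /* camera-space vertices (x, y, z) and a parameter G at each vertex */
    double V[3][3] = { {1, 0, 2}, {0, 1, 3}, {1, 1, 4} };
    double G[3]    = { 10, 20, 30 };
    double one[3]  = { 1, 1, 1 };

    double abc[3], abc1[3];
    solve3(V, G,   abc);     /* G = a*x + b*y + c*z           */
    solve3(V, one, abc1);    /* 1 = a'*x + b'*y + c'*z -> 1/w */

    for (int i = 0; i < 3; i++) {
        double X = V[i][0] / V[i][2], Y = V[i][1] / V[i][2];
        double G_over_w   = abc[0]*X  + abc[1]*Y  + abc[2];   /* linear in X, Y */
        double one_over_w = abc1[0]*X + abc1[1]*Y + abc1[2];  /* linear in X, Y */
        printf("vertex %d: recovered G = %g (expected %g)\n",
               i, G_over_w / one_over_w, G[i]);
    }
    return 0;
}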

ciao,
Marco

[ This Message was edited by: nAo on 2002-02-22 09:49 ]
 
Thanks for the clarification.

Yes, I made a mistake with 1/w. So 1/w is linearly interpolated and a per pixel division is needed for perspective correction. And I guess (might be wrong) that w = 1/(1/w) is used for w-buffering. That's why it's not HSR-friendly to use w-buffering (too slow to calculate in large amounts).
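
The cost difference is easy to see in a sketch (an illustration, not any specific hardware): a depth value that is linear in screen space steps across a span with one add per pixel, while recovering w itself needs a full reciprocal at every pixel.

/* Sketch of the per-pixel cost difference (illustration only). */
void z_span(float *zbuf, int n, float z, float dz)
{
    for (int x = 0; x < n; x++) {
        zbuf[x] = z;          /* screen-space depth: already linear */
        z += dz;              /* one add per pixel */
    }
}

void w_span(float *wbuf, int n, float ow, float dow)
{
    for (int x = 0; x < n; x++) {
        wbuf[x] = 1.0f / ow;  /* w = 1/(1/w): a reciprocal per pixel */
        ow += dow;            /* only 1/w is linear in screen space */
    }
}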
 
doh, i don't believe i actually wrote this:
[snip] that way it can linearly interpolate from 1/w0 to 1/w1 in a span, and then use multiplication instead of division for the perspective correction of the tex. coords.

of course, multiplication cannot be used instead of division in perspective correction, as the whole point is to inverse-map the u/w and v/w tex. coords (linearly interpolated in screen space) back to their nominal u, v values. so, at each step of the tex. map interpolation, u_i/w_i, v_i/w_i and 1/w_i get linearly interpolated, and then a division by 1/w_i is performed to get the nominal u_i and v_i. no multiplication optimization is possible on this account.
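
In code the scheme looks something like this (a sketch; the span setup and the sample_texture hook are hypothetical): u/w, v/w and 1/w step linearly across the span, and each pixel pays a division by 1/w to recover the nominal u, v.

/* Perspective-correct texture stepping across one span (sketch). */
void sample_texture(float u, float v);   /* hypothetical sampling hook */

void tex_span(int n,
              float u_w, float du_w,     /* u/w and its per-pixel step */
              float v_w, float dv_w,     /* v/w and its per-pixel step */
              float o_w, float do_w)     /* 1/w and its per-pixel step */
{
    for (int x = 0; x < n; x++) {
        float u = u_w / o_w;             /* the unavoidable per-pixel divide */
        float v = v_w / o_w;
        sample_texture(u, v);
        u_w += du_w;
        v_w += dv_w;
        o_w += do_w;
    }
}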

Hyp-X:
I guess (might be wrong) that w = 1/(1/w) is used for w-buffering.

actually i believe 3dfx's voodoos use 1/w.
 
To nAo,

In a nutshell, screen space depth - be it 1/w (== d/camera_z for some +ve scalar "d") or some other linear function, e.g. a/z + b - is linearly interpolated across pixels on all the leading systems. It's this "screen space" value, not the world/camera value, that is stored in the depth buffer.
 
That was already made clear, Simon. I corrected myself later in the thread.
 
nAo,
Sorry, I'd originally intended to put in some maths to explain why linearly interpolating 1/Z worked, but then changed my mind.
 