Welcome, Unregistered.

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Reply
Old 07-Sep-2006, 23:16   #51
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,878
Default

Quote:
Originally Posted by arjan de lumens View Post
If you are
  • converting your vertices to fixed-point before you start to compute the edge equation
  • and never throw away any low-order bits anywhere in the edge equation calculation
  • and do not perform any divisions when computing your edge equation (so that your equation ends up on the form a*x+b*y+c=0 and not x=a*y+b)
then you should be in the clear, with a 100% robust method (however you will then need to do clipping against the XY edges of the screen or a guard band).
Exactly.

Here's a small tutorial I wrote two years ago about a robust software rasterizer using (half-)edge functions: Advanced Rasterization. It's suited for resolutions up to 2048x2048 and 4-bit sub-pixel precision but can be extended to much higher resolutions (including a guard band) and sub-pixel precision with 64-bit. A hardware implementation could obviously use an arbitrary precision.
Nick is offline   Reply With Quote
Old 08-Sep-2006, 04:19   #52
ERP
Moderator
 
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,669
Default

Quote:
Originally Posted by arjan de lumens View Post
According to the OpenGL standard, clipping is to be done before the final per-vertex division-by-W; this way, a W=0 vertex doesn't cause clipping to blow up.

If you do not want to do geometric near-plane clipping at all, the alternative is to build into your rasterizer explicit support for rendering of "external triangles" (as described by the page nAo linked to near the beginning of this thread).
I know what the open GL specification is but the output of an NV2x vertex shader (the hardware version not what the D3D interface looks like) is actually
X/W Y/W (optionally)Z/W and W
If W is 0 or close to it and your using a standard divide all information about X and Y are lost since infinity doesn't tell you much. Although you might be able to get away with clamping based on sign to some large number for rasterization purposes and assuming any infinity is basically at the nearplane. But I'd have thought this would lead to noticeable texture crawl as triangles approach the nearplane.
ERP is offline   Reply With Quote
Old 08-Sep-2006, 16:27   #53
RoOoBo
Member
 
Join Date: Jun 2002
Posts: 305
Default

I actually 'hacked' a triangle setup algorithm running on the simulator unified shader (ARB instruction set) for a paper last year about mobile GPUs. With 32-bit fp it seemed to work relatively well for the Unreal Tournament trace I was using at resolutions of 320x240 and 640x480 ... until one explosion started to produce quite large glitches. It may had been a precission problem but I never got to find what was the real problem. You can check the simulator if you are interested, the hack is still there even if it hasn't been used or tested since then.

The algorithm used is the one based on Olano's paper. The inverse matrix is computed and from that matrix four linear equations are derived for the three edges and for Z. All other interpolators are computed in a much later stage using the barycentric coordinates (the results of the three edge equations for the fragment). No point in carrying all that info for fragments that are outside the triangle or fail Z before shading. All the emulation code related with triangle setup and fragment generation uses 64-bit fp just to be 'safe' (I didn't want to bother with fixed point arithmetic and speed is not a problem for the simulator).
RoOoBo is offline   Reply With Quote
Old 08-Sep-2006, 17:51   #54
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,878
Default

Quote:
Originally Posted by RoOoBo View Post
I actually 'hacked' a triangle setup algorithm running on the simulator unified shader (ARB instruction set) for a paper last year about mobile GPUs. With 32-bit fp it seemed to work relatively well for the Unreal Tournament trace I was using at resolutions of 320x240 and 640x480 ... until one explosion started to produce quite large glitches. It may had been a precission problem but I never got to find what was the real problem. You can check the simulator if you are interested, the hack is still there even if it hasn't been used or tested since then.
Was it a rasterization glitch (pixels showing up in the wrong places) or an interpolation glitch (funky colors)? I've seen a fair bit of both in my projects. If it's a rasterization glitch, beware of division by zero or denormals. This can happen fairly easily if you allow edge on triangles to be rasterized. Snap the coordinates to fixed-point and you can test for exact zero.
Quote:
The algorithm used is the one based on Olano's paper. The inverse matrix is computed and from that matrix four linear equations are derived for the three edges and for Z. All other interpolators are computed in a much later stage using the barycentric coordinates (the results of the three edge equations for the fragment). No point in carrying all that info for fragments that are outside the triangle or fail Z before shading. All the emulation code related with triangle setup and fragment generation uses 64-bit fp just to be 'safe' (I didn't want to bother with fixed point arithmetic and speed is not a problem for the simulator).
Interesting point about interpolating data for pixels outside the triangle or failing the z-test.

Using 64-bit floating point should indeed be fairly safe. It's pretty much equivalent to having a 53-bit fixed-point format with, well, a floating point. You can still do the 'snapping' though. It gets rid of nasty denormals and you can do an exact compare to zero.
Nick is offline   Reply With Quote
Old 08-Sep-2006, 20:38   #55
RoOoBo
Member
 
Join Date: Jun 2002
Posts: 305
Default

Quote:
Originally Posted by Nick View Post
Was it a rasterization glitch (pixels showing up in the wrong places) or an interpolation glitch (funky colors)? I've seen a fair bit of both in my projects. If it's a rasterization glitch, beware of division by zero or denormals. This can happen fairly easily if you allow edge on triangles to be rasterized. Snap the coordinates to fixed-point and you can test for exact zero.
Actually it was quite more visible than a few pixels. It was a large triangle or part of a triangle in orange color and untextured that showed in the middle of the explosion.
RoOoBo is offline   Reply With Quote
Old 08-Sep-2006, 22:53   #56
Rolf N
Recurring Membmare
 
Join Date: Aug 2003
Location: yes
Posts: 2,494
Default

Quote:
Originally Posted by RoOoBo View Post
Actually it was quite more visible than a few pixels. It was a large triangle or part of a triangle in orange color and untextured that showed in the middle of the explosion.
It probably was textured but due to failing texcoord interpolation a single texel got applied for the whole triangle.
Rolf N is offline   Reply With Quote
Old 08-Sep-2006, 22:56   #57
Rolf N
Recurring Membmare
 
Join Date: Aug 2003
Location: yes
Posts: 2,494
Default

Quote:
Originally Posted by ERP View Post
They only ever did this for 3D clip planes scissoring and nearplane clipping were never done this way.

Actually I've never really understood how nvidia does their near plane clipping, since they don't have the pre divide by W values in the clipping unit. My assumption is that they must fudge the divide in someway to allow them to backout the value even in the case of a degeneracy. I do know that they do loose precision in the case of a near clip.
Wouldn't "if w<1 then discard" work?

edited: No, it wouldn't, don't bother replying

edited again: a "regular" near plane is defined by znear, and because pst-transform w would only depend on post-transform z (if it depends on anything at all), there will certainly be a way to figure out the z/w at the near plane.

It should be possible to replace near-plane clipping by the comparison z/w to a precomputed z/w-near.

w=0 shouldn't (TM) happen too often. In real world terms that's at the center of the eye/camera, not at the near plane.

Last edited by Rolf N; 08-Sep-2006 at 23:05. Reason: Sudden, unexpected occurrence of certain thought processes
Rolf N is offline   Reply With Quote
Old 09-Sep-2006, 00:01   #58
arjan de lumens
Senior Member
 
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,259
Default

Quote:
Originally Posted by zeckensack View Post
w=0 shouldn't (TM) happen too often. In real world terms that's at the center of the eye/camera, not at the near plane.
W=0 actually corresponds to a plane in the 3d space, not a point. This plane goes straight through the center-of-eye/camera and is parallel to the near-plane. These two planes are normally fairly close to each other, but they are in fact not the same plane. The usual formula for computing per-pixel Z results in 0.0 at the near-plane and minus infinity at the W=0 plane.

Getting W= exactly 0 for a vertex generally requires a 100% exact floating-point cancellation to appear somewhere in the modelview+projection transform. It is as such a rather unlikely case, but one that does need to be handled or avoided. It can be avoided with ordinary geometric near-plane clipping; handling it is a bit non-trivial, but not impossible.
arjan de lumens is offline   Reply With Quote
Old 09-Sep-2006, 21:48   #59
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,878
Icon Idea

Looking at the algorithm for doing setup with a matrix inverse, and how it depends solely on the vertex positions, I started wondering whether triangles that share an edge could have similar coefficients. Say we have a triangle strip, do we have to recompute the entire matrix inverse for each triangle, or can we use part of the previous one? This corresponds with changing one column in the matrix that needs to be inverted, in a cyclic manner.

After half a day of exercising my linear algebra skills, I found that it takes 20 multiplications to compute the new inverse from the previous one, versus 30 multiplications to do a full matrix inverse. It requires 16 additions versus 11 though, and still needs one reciproke. Either way, it's a nice tradeoff, especially when already using triangle strips exclusively.

Any chance this is also used by any hardware?
Nick is offline   Reply With Quote
Old 10-Sep-2006, 21:01   #60
RoOoBo
Member
 
Join Date: Jun 2002
Posts: 305
Default

Quote:
Originally Posted by Nick View Post
After half a day of exercising my linear algebra skills, I found that it takes 20 multiplications to compute the new inverse from the previous one, versus 30 multiplications to do a full matrix inverse. It requires 16 additions versus 11 though, and still needs one reciproke. Either way, it's a nice tradeoff, especially when already using triangle strips exclusively.
Well, it seems like most games aren't using triangle strips. In a study we made of OGL and D3D games only Oblivion seemed to masively use triangle strips (75% of the benchmarked geometry).
RoOoBo is offline   Reply With Quote
Old 10-Sep-2006, 22:38   #61
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,878
Default

Quote:
Originally Posted by RoOoBo View Post
Well, it seems like most games aren't using triangle strips. In a study we made of OGL and D3D games only Oblivion seemed to masively use triangle strips (75% of the benchmarked geometry).
Yeah, an alternative that would also work for indexed geometry is to keep a small cache of cross product coefficients from vertex pairs forming edges. These can then be used to compute the matrix inverse the classical way. In the best case, when the cross products for two edges are already in the cache, it would be even faster than the approach above.

But that's just theoretical...
Nick is offline   Reply With Quote
Old 11-Sep-2006, 09:32   #62
Simon F
Tea maker
 
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,447
Default

Quote:
Originally Posted by Nick View Post
Yeah, an alternative that would also work for indexed geometry is to keep a small cache of cross product coefficients from vertex pairs forming edges. These can then be used to compute the matrix inverse the classical way. In the best case, when the cross products for two edges are already in the cache, it would be even faster than the approach above.
You also get to the situation where the storage costs for the computed edge values are greater than the hardware needed to recompute those values.
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson

"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay
Simon F is offline   Reply With Quote
Old 11-Sep-2006, 14:09   #63
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,878
Icon Idea

Quote:
Originally Posted by Simon F View Post
You also get to the situation where the storage costs for the computed edge values are greater than the hardware needed to recompute those values.
That's exactly why I added that it's just theoretical.

I think I did find a more practical result though: To compute the classical matrix inverse, a division by the deteminant is required. This takes 1 reciproke, 12 multiplications and 2 addtions. I believe this can be eliminated altogether. Without dividing by the determinant d, instead of computing u/w and 1/w we'd be computing d*u/w and d/w. Per pixel we're computing (d*u/w)/(d/w) = u.

Looks like a win-win to me...
Nick is offline   Reply With Quote
Old 11-Sep-2006, 14:11   #64
Simon F
Tea maker
 
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,447
Default

Quote:
Originally Posted by Nick View Post
That's exactly why I added that it's just theoretical.

I think I did find a more practical result though: To compute the classical matrix inverse, a division by the deteminant is required. This takes 1 reciproke, 12 multiplications and 2 addtions. I believe this can be eliminated altogether. Without dividing by the determinant d, instead of computing u/w and 1/w we'd be computing d*u/w and d/w. Per pixel we're computing (d*u/w)/(d/w) = u.

Looks like a win-win to me...
What about back-face culling? You still have to compute most of it.
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson

"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay
Simon F is offline   Reply With Quote
Old 11-Sep-2006, 14:46   #65
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,878
Default

Quote:
Originally Posted by Simon F View Post
What about back-face culling? You still have to compute most of it.
Not "most", it only takes three multiplications and two additions, but yes I forgot to include it. So in total we can save 1 reciproke and 9 multiplications. Still not bad in my opinion, and it's beneficial for precision as well.
Nick is offline   Reply With Quote
Old 11-Sep-2006, 15:45   #66
Simon F
Tea maker
 
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,447
Default

Quote:
Originally Posted by Nick View Post
Not "most", .
Sorry, I meant to add "of the determinant".
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson

"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay
Simon F is offline   Reply With Quote

Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 00:35.


Powered by vBulletin® Version 3.8.6
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.