Tri Setup, Tessellation, GS, and the Future

I'm bringing up a few topics here on which I really have not seen any good information on the B3D forums. With DX11, one would think that tessellation (and the geometry shader) would put a lot of pressure on triangle setup, whose performance is currently tied directly to GPU clock rate (i.e. it is a serial process).

Can we expect triangle setup to go parallel in GPUs in the DX11 time frame?

And if triangle setup does increase in performance, will this performance increase be tied to a mandatory use of tessellation or could it also apply to standard triangle rendering from vertex and index buffers?

With NVidia's 200 series cards, the geometry shader has become fast enough to be useful in various situations (i.e. you can see a performance increase from GS usage on both ATI and now NVidia hardware). Many great examples of this can be found in ATI's March of the Froblins paper, such as using stream out as a filtering operation, or using geometry shader + stream out + DrawAuto calls to do efficient GPU-side binning (later used for spatial queries). I'd like to keep any discussion of opinions on the usefulness of GPGPU, GS, or stream out for some other thread, and instead get into more of the details of possible hardware implementations of GS and stream out.

GS and stream out are ultimately tied to triangle setup performance (you need to set up triangles or points to make use of that output). Fixed-size data expansion now seems about as fast as can be expected. But what about variable-sized data expansion, or data reduction?

When a geometry shader does not output the maximum number of primitives it was set up for, do the missing primitives end up costing the same as degenerate triangles (i.e. a clock of triangle setup each)? Or does the hardware do some parallel stream compaction for you prior to triangle setup?
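To make the compaction question concrete, here is a rough CPU-side sketch in Python of what "stream compaction" means here: every GS invocation reserves a fixed number of output slots, leaves the unused ones empty, and an exclusive prefix sum over the "slot is used" flags gives each surviving primitive its packed position. The function name `compact` and the use of `None` for an empty slot are my own assumptions for illustration; this is not a claim about how any actual GPU does it.

```python
from itertools import accumulate

def compact(slots):
    """Pack the non-empty slots to the front, preserving order.

    `slots` models the fixed-size GS output buffer; None marks a slot
    the shader reserved but never wrote (hypothetical convention).
    """
    if not slots:
        return []
    # 1 for a slot that holds a primitive, 0 for an unused slot.
    flags = [0 if s is None else 1 for s in slots]
    # Exclusive prefix sum: offsets[i] = number of used slots before i.
    inclusive = list(accumulate(flags))
    offsets = [0] + inclusive[:-1]
    total = inclusive[-1]
    out = [None] * total
    # Scatter step: each used slot writes to its packed position.
    # On parallel hardware every iteration here would be independent.
    for i, s in enumerate(slots):
        if flags[i]:
            out[offsets[i]] = s
    return out

# Two invocations, max 3 primitives each; the first emits 1, the second 2.
packed = compact(["t0", None, None, "t1", "t2", None])
```

With compaction, triangle setup would only see the three packed primitives. Without it, the three empty slots would presumably each cost something, which is exactly the question above.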
 
1) Triangle setup going parallel will also require bigger FIFOs and perhaps parallel rasterization (not mandatory, but probably useful; when you have gazillions of small triangles you really don't care about fillrate).

2) I also don't see why faster triangle setup should be tied only to tessellation.

3) In some GPUs degenerate triangles are not handled by the setup unit (i.e. they are culled before that stage).
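As a rough illustration of point 3, a cull ahead of setup could be as simple as an index comparison: a triangle that repeats an index has zero area, so it can be dropped without looking at vertex positions. The Python sketch below assumes an indexed triangle list, and the names `is_degenerate` and `cull_before_setup` are made up for this example. It does not catch zero-area triangles whose indices are all distinct, and I'm not claiming any particular GPU works this way.

```python
def is_degenerate(i0, i1, i2):
    """True if the triangle repeats an index (so it has zero area)."""
    return i0 == i1 or i1 == i2 or i0 == i2

def cull_before_setup(index_buffer):
    """Split a triangle-list index buffer into triangles and drop
    the index-degenerate ones, so 'setup' never sees them."""
    tris = [
        tuple(index_buffer[i:i + 3])
        # Step by 3; stop early so a trailing partial triangle is ignored.
        for i in range(0, len(index_buffer) - 2, 3)
    ]
    return [t for t in tris if not is_degenerate(*t)]

# The middle triangle (2, 2, 3) repeats index 2 and is culled.
survivors = cull_before_setup([0, 1, 2, 2, 2, 3, 3, 4, 5])
```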
 