NVIDIA Fermi: Architecture discussion

Couldn't you do the same with a more powerful triangle setup engine and a little bit of caching? Should be reasonably efficient for tessellated triangles, at least, as neighboring triangles should be near one another in screen space.

Making one stage of the pipeline (the setup engine) more powerful will not eliminate the wasted work in another part of the pipeline (the fragment shader).

In non-tessellated scenes, this wastage is already large enough to make deferred shading a win on many occasions.

When you start tessellating, the geometry count increases by an order of magnitude. Triangles become smaller, and because the geometry is more natural/curved, occlusion increases.

In the Heaven benchmark, you may have noticed how almost all the pebbles on the road are now curved. This means many fragments generated by the stones in the back are now occluded. Without tessellation, the road was just a flat quad, texture-mapped to fake it into looking real.

In other tessellation demos too, you will notice that pixel-sized triangles are often being generated (in the dragon example, for instance). And pixel-sized triangles carry roughly 4x the pixel-shader cost, because the rasterizer still shades a full 2x2 quad for each one.

Due to the increased wastage from these factors, deferred rendering makes more sense for tessellated scenes. But the crossover point (i.e. the point where it becomes the de facto standard) may not have arrived yet.
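To make the quad-shading cost concrete, here is a tiny, purely illustrative C sketch (the triangle sizes and the simple 2x2-quad cost model are my own assumptions, not measurements of any particular GPU):

```c
#include <stdio.h>

/* Illustrative only: rasterizers shade fragments in 2x2 quads, so a triangle
 * is charged for every quad it touches, even if it only covers one pixel of
 * that quad. */
static unsigned shader_invocations(unsigned quads_touched)
{
    return quads_touched * 4; /* each touched quad is shaded in full */
}

int main(void)
{
    /* A pixel-sized triangle typically touches 1 quad but produces only
     * 1 useful pixel: roughly 4x pixel-shader cost. */
    printf("pixel-sized triangle: %u invocations for 1 useful pixel\n",
           shader_invocations(1));

    /* A large triangle covering ~10,000 pixels touches ~2,500 mostly-full
     * quads, so the overhead is close to 1x. */
    printf("large triangle:       %u invocations for ~10000 useful pixels\n",
           shader_invocations(2500));
    return 0;
}
```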
 
In real DX11 games you will have options to limit tessellation and pixel-sized triangles on different hardware. Also, if you play at lower resolutions you will find more pixel-sized triangles than at higher ones.
In the AVP tech demo http://www.youtube.com/watch?v=HFp9RP49qr8 they show tessellation off and then tessellation high, which turns the alien's wireframe almost completely white (it's just a 480p video). If they have off and high, then they will surely have low and medium tessellation settings too, so the dynamic-LOD tessellation will step up through smaller tessellation levels (and maybe use a lower-resolution height map) ;).
AVP is using deferred rendering too btw.
 
Well, I suppose I'd have to look more into the difficulties here, but I would tend to expect that with the limitations placed by tessellation in the first place, there's at least a possibility that this could be extended to tessellated triangles.
Having the domain shader makes that tough... they could do anything there, even for neighbouring vertices, and there are no guarantees without comparing the outputs.

However, as a small addendum, let me add that tessellation might actually reduce the problem of having too many very small triangles, because it also allows for lower level of detail for far-away objects. You'd only have more of a problem with small triangles if you use a very high level of detail.
Yeah that's definitely true, hence my comments in the Larrabee thread about that being the most important use of tessellation (LOD/reducing detail at distance). The numerous other advantages of deferred rendering still apply though, so I suspect we'll see it continue to rise in usage, albeit in various forms. FWIW, almost all of the AAA games that I know of in development use at least semi-deferred rendering.
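As a rough sketch of that LOD idea (a toy example I'm making up, not code from any engine or from the D3D11 hull-shader API), the per-patch tessellation factor can simply be faded out with distance so distant geometry never gets subdivided down to pixel-sized triangles:

```c
#include <math.h>

/* Hypothetical helper: blend a patch's tessellation factor from max_factor
 * (near the camera) down to min_factor (far away). Real engines often use
 * projected edge length instead of raw distance, but the idea is the same. */
float distance_based_tess_factor(float dist, float near_dist, float far_dist,
                                 float max_factor, float min_factor)
{
    float t = (dist - near_dist) / (far_dist - near_dist);
    t = fminf(fmaxf(t, 0.0f), 1.0f);                   /* clamp to [0, 1] */
    return max_factor + t * (min_factor - max_factor); /* lerp toward min */
}
```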
 
Having the domain shader makes that tough... they could do anything there, even for neighbouring vertices, and there are no guarantees without comparing the outputs.
Hmmm, okay, that makes some sense.

Yeah that's definitely true, hence my comments in the Larrabee thread about that being the most important use of tessellation (LOD/reducing detail at distance). The numerous other advantages of deferred rendering still apply though, so I suspect we'll see it continue to rise in usage, albeit in various forms. FWIW, almost all of the AAA games that I know of in development use at least semi-deferred rendering.
Fair enough.
 
[Attachment: csaa.png]

Can someone please shed some light on exactly how alpha-to-coverage determines which fragments are included/excluded during the write to the framebuffer? For example, in the basic case of 4xAA, if alpha comes out to 0.5 how does the GPU determine which 2 samples to ignore? It seems to me that the problem would only get worse with Nvidia's proposed CSAA extension to alpha-to-coverage. With 32 coverage samples and a single alpha value how do they know which samples to discard?
 
Edit: looked it up. Basically you just perform the alpha test on a per-subsample level, and write the pixel samples if and only if both the alpha and z tests pass.
 
Edit: looked it up. Basically you just perform the alpha test on a per-subsample level, and write the pixel samples if and only if both the alpha and z tests pass.

Do you have a link? I don't think that's accurate. The shader (and texture lookup) runs only once per fragment and the single output alpha value is shared by all subsamples of that fragment. To get per-subsample alpha values you would have to essentially supersample the polygon (i.e. TRSSAA). And that still wouldn't completely solve the problem of determining the coverage mask for the CSAA samples.
 
Do you have a link? I don't think that's accurate. The shader (and texture lookup) runs only once per fragment and the single output alpha value is shared by all subsamples of that fragment. To get per-subsample alpha values you would have to essentially supersample the polygon (i.e. TRSSAA). And that still wouldn't completely solve the problem of determining the coverage mask for the CSAA samples.
Right, so this is why it requires software support.
 
Right, so this is why it requires software support.
Alpha-to-coverage doesn't need software support; it's a standard OpenGL feature. The HW likely discards the same samples for each different coverage mask, unless some sort of dithering takes place. I can't recall off the top of my head if AMD's chips use a dithering pattern to reduce banding, but I think it's likely.
 
Alpha-to-coverage doesn't need software support; it's a standard OpenGL feature. The HW likely discards the same samples for each different coverage mask, unless some sort of dithering takes place. I can't recall off the top of my head if AMD's chips use a dithering pattern to reduce banding, but I think it's likely.

So within a given pixel it just randomly selects samples? Lame. :) Guess there's no practical way to do it though since you don't have geometric coverage info.
 
Alpha-to-coverage doesn't need software support; it's a standard OpenGL feature. The HW likely discards the same samples for each different coverage mask, unless some sort of dithering takes place. I can't recall off the top of my head if AMD's chips use a dithering pattern to reduce banding, but I think it's likely.
But if it only uses one alpha value from the texture, it isn't going to be able to do anything but have either 0% or 100% coverage.
 
But if it only uses one alpha value from the texture, it isn't going to be able to do anything but have either 0% or 100% coverage.
That's not true. Alpha-to-coverage means exactly what it says: when alpha = 0.5, you have half coverage, i.e. half your samples would be on and half would be off. Normally you would turn off alpha test when using alpha-to-coverage, but the two can be used together.

trinibwoy said:
So within a given pixel it just randomly selects samples? Lame. Guess there's no practical way to do it though since you don't have geometric coverage info.
How do you figure random? Dithering is not random at all, and without dithering you'd see a pretty annoying banding pattern corresponding to whether 1 sample, 2 samples, 3 samples, etc. were covered. Since you're disabling samples, any pixel with less than full coverage is partially transparent, because some of its samples will be disabled. That can be an annoying artifact.
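For concreteness, here is a hedged sketch of the mapping being described: one alpha value picks how many of the 4 samples to enable, and a small per-pixel dither offset (a detail I'm assuming for illustration; actual hardware patterns vary by vendor) shifts the rounding so the five possible coverage levels don't band as visibly.

```c
#include <stdint.h>

/* Sketch only: map a single per-fragment alpha to a 4-sample coverage mask.
 * Which samples are chosen, and whether/how dithering is applied, is up to
 * the hardware; this just shows the alpha -> "N samples on" idea. */
static uint8_t alpha_to_coverage_4x(float alpha, float dither)
{
    /* dither: small per-pixel offset, e.g. from a 2x2 Bayer pattern in
     * [-0.125, +0.125], so neighbouring pixels round differently. */
    int n = (int)((alpha + dither) * 4.0f + 0.5f);   /* samples to enable */
    if (n < 0) n = 0;
    if (n > 4) n = 4;

    static const uint8_t sample_bit[4] = { 0x1, 0x2, 0x4, 0x8 };
    uint8_t mask = 0;
    for (int i = 0; i < n; i++)
        mask |= sample_bit[i];   /* fixed, arbitrary enable order */
    return mask;                 /* alpha = 0.5, dither = 0 -> 2 of 4 on */
}
```

The resulting mask would then be combined with the triangle's geometric coverage and the per-sample Z test before anything is written.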
 
The dithering isn't random, but the approach to determining which samples are disabled along the alpha spectrum isn't based on any sort of edge-detection algorithm or intelligent process. I guess random isn't the right word; arbitrary is more like it.
 
That's not true. Alpha-to-coverage means exactly what it says: when alpha = 0.5, you have half coverage, i.e. half your samples would be on and half would be off. Normally you would turn off alpha test when using alpha-to-coverage, but the two can be used together.
Ahh, right, I remember now.
 
The dithering isn't random, but the approach to determining which samples are disabled along the alpha spectrum isn't based on any sort of edge-detection algorithm or intelligent process. I guess random isn't the right word; arbitrary is more like it.
Edge detection isn't necessarily possible as lots of pixels could be transparent, i.e. a whole region.
 
That article makes a distinction between deferred shading and deferred lighting, not deferred rendering and deferred shading :)

Andrew touched on it earlier but I take deferred rendering to mean the TBDR style, where geometry is binned/sorted before any sort of per-pixel work is done. I assumed this is what mczak was referring to because that's the only reason that memory usage would become a concern with tessellation enabled. Deferred shading on the other hand involves writing per-pixel inputs to the G-buffer, the size of which is dependent on resolution and not geometric complexity.
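To illustrate that last distinction (the layout below is made up purely for the example, not from any particular engine): a deferred-shading G-buffer is just a set of screen-sized render targets, so its footprint is pinned to the resolution and doesn't grow when tessellation multiplies the triangle count, whereas a TBDR-style binner has to park the post-tessellation geometry somewhere.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical 12-byte-per-pixel G-buffer texel, for illustration only. */
struct GBufferTexel {
    uint16_t normal_xy[2]; /* packed view-space normal    */
    uint8_t  albedo[4];    /* albedo RGB + specular power */
    float    depth;        /* linear depth                */
};

int main(void)
{
    const unsigned w = 1920, h = 1080;
    const double mbytes = (double)w * h * sizeof(struct GBufferTexel)
                          / (1024.0 * 1024.0);
    /* ~23.7 MB at 1080p, and identical whether the scene contains 100k
     * triangles or 10M post-tessellation triangles. */
    printf("G-buffer at %ux%u: %.1f MB\n", w, h, mbytes);
    return 0;
}
```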
 