Pixel or Vertex more important, looking forward?

Vertex or Pixel Shading Prowess of greater importance?

  • Vertex Shading
  • Pixel Shading
  • Balance between the two
  • or, like me, have no clue

  Total voters: 232

OICAspork

Newcomer
Looking through the reviews of the X700, I've found that one of the key architectural differences between it and the 6600 line lies in their respective pixel and vertex shading abilities. The Tech Report, in particular, discussed this.

The Nvidia line dominates the pixel shading tests:
[Image: 3dm-ps20.gif]


The X700, meanwhile, owns the vertex shading tests, with six vertex shaders to the 6600's three:
[Image: 3dm-vertex.gif]


I'm curious which discipline will be more important to overall performance as we move into the future. From the benchmarks released so far it appears that for the current generation, NVidia made the right decision. Any thoughts from more enlightened enthusiasts?

BTW, I'm really looking forward to Beyond3D's X700 article, as I suspect it will probably shed some light on this issue due to their incredible thoroughness.
 
I have voted vertex shaders, for these reasons:

1) Polycount can never be high enough. Normal mapping is a nice trick to fake high detail, but the silhouettes are still low-poly, which means that geometry-based things like shadows remain relatively low-quality and show weird 'bugs' compared to the high detail implied by the lighting.

2) Vertex shaders are invaluable for highly realistic animation of skinned characters, water, hair, etc. I think there is still plenty of room for improvement in this area.

3) A lot of techniques require many render passes (e.g. shadow maps, dynamic envmaps, deferred shading), where most passes are not fillrate-limited, because either the resolution of the target is relatively small or the per-pixel operations are very simple. The geometry needs to be processed fully for every pass, of course, so more/faster vertex shaders could greatly improve performance (see the sketch below).
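A quick illustration of point 3, with completely made-up numbers (the pass list and costs below are assumptions, not measurements):

```cpp
#include <cstdio>
#include <vector>

int main() {
    // Made-up cost model (arbitrary units) for point 3: every pass re-runs
    // vertex processing over the whole mesh, while the per-pixel cost of the
    // extra passes is comparatively small.
    const long long vertexCount = 500000;   // assumed vertices in the scene

    struct Pass { const char* name; long long pixelsShaded; double costPerPixel; };
    const std::vector<Pass> passes = {
        {"shadow map (depth only)", 1024LL * 1024,    0.1},   // cheap pixels
        {"dynamic env map",          256LL * 256 * 6, 0.5},   // small target
        {"main colour pass",        1280LL * 1024,    1.0},   // full-cost pixels
    };

    for (const Pass& p : passes) {
        const double vertexWork = double(vertexCount);               // full geometry, every pass
        const double pixelWork  = double(p.pixelsShaded) * p.costPerPixel;
        std::printf("%-26s vertex work %9.0f | pixel work %9.0f\n",
                    p.name, vertexWork, pixelWork);
    }
}
```

In this toy model the extra passes add full vertex work each time but relatively little pixel work, which is the point.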

I agree, however, that for current software the six vertex units are massive overkill, as I mentioned elsewhere. But it wouldn't be the first time that NV made the right hardware for the current generation of software and it completely backfired. Just think of the FX series: wonderful cards if you only run ps1.x software... the problem is that many games moved to ps2.0 because of the Radeon 9700.
 
It depends on the application, but for PC applications in the short term, the pixel shaders are probably the most important because of the high resolutions and relatively low polygon counts, assuming the shaders are complex enough to avoid fill limits (which is probably not true in a lot of older games).
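A rough back-of-the-envelope, with assumed numbers, of why the pixel side tends to dominate on the PC:

```cpp
#include <cstdio>

int main() {
    // All figures are assumptions for illustration, not measurements.
    const double pixels   = 1280.0 * 1024.0;  // ~1.3M pixels on screen
    const double overdraw = 2.0;              // assumed average overdraw
    const double vertices = 300000.0;         // assumed vertices per frame

    const double shadedPixels = pixels * overdraw;
    std::printf("shaded pixels per frame: %.0f\n", shadedPixels);
    std::printf("vertices per frame     : %.0f\n", vertices);
    std::printf("pixel:vertex ratio     : %.1f : 1\n", shadedPixels / vertices);
}
```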

At the end of the day, you have to have the right balance (which is what I voted for) and different applications will stress different parts of the pipeline, which is one of the reasons a unified architecture is appealing.
 
From what I've read over time, it seems it'd depend on the target market. To paraphrase someone whose name I don't remember: computer games are all about making pixels look better (pixel shaders), and workstation graphics is all about more polygons and more precise polygons (vertex shaders).

Anyway, I wouldn't say Nvidia dominates pixel shading based on one test from one application. I'd be interested in seeing ShaderMark 2.1 benches of X700s and 6600s. Hexus used 2.0, but the Nvidia cards don't work in about a third of the tests, so it's kinda hard to get an accurate picture.
 
Food for thought:

- unified shaders will free the hw vendors from the decision :)

- PRMan does per-vertex shading, but every kind of geometry (even flat polygons) will be tessellated down to a given sub-pixel limit (shader supersampling is performed by using more vertices). Thus no interpolation is performed between the individual vertices, and shading is performed before rasterization.

- until we have the ability to tessellate to micropolygons and displace them, we will still be better off with normal mapping. Just look at the UE3 tech demo to see how far you can get with polycounts that remain relatively low (6,000-10,000 is nothing).
Note that there is a very real limit on how much geometry detail is practical: skinning a 100,000-poly model will take ages, both for the character rigging TD and for the GPU calculating it. The solution is to build a control cage for a HOS (higher-order surface, for example a subdivision surface), skin it, and perform the tessellation after the skinning (or cloth/soft-body dynamics simulation). But tessellation won't insert additional shading detail, it'll only make the surface smooth, so you either have to put on a normal map or displace the vertices. But displacement looks ugly if your polygons are larger than a pixel...
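Roughly the ordering I have in mind, as a schematic sketch; every type and function below is a made-up placeholder, not any real engine or DCC API:

```cpp
#include <vector>

// Schematic only: all names here are placeholders for the ordering described
// above (skin the low-poly cage, tessellate afterwards, then add detail).
struct Vertex { float pos[3]; float normal[3]; float uv[2]; };
using Mesh = std::vector<Vertex>;

Mesh skinControlCage(const Mesh& cage /*, bone matrices, weights... */) {
    // Skinning runs on the *low-poly* control cage: manageable for the rigger
    // and cheap for the GPU.
    return cage;
}

Mesh tessellate(const Mesh& skinned, int level) {
    // Subdivision-surface tessellation after skinning (or cloth/soft-body
    // simulation) smooths the surface but adds no shading detail by itself.
    (void)level;
    return skinned;
}

void addSurfaceDetail(Mesh& dense) {
    // Detail comes from a normal map (cheap, only affects lighting) or from
    // displacing the tessellated vertices, which needs near pixel-sized
    // polygons to avoid artefacts.
    (void)dense;
}

int main() {
    Mesh cage(6000);                        // a few thousand control vertices
    Mesh skinned = skinControlCage(cage);   // 1. skin the cage
    Mesh dense   = tessellate(skinned, 3);  // 2. tessellate after skinning
    addSurfaceDetail(dense);                // 3. normal map or displacement
}
```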
 
Laa-Yosh said:
- PRMan does per-vertex shading, but every kind of geometry (even flat polygons) will be tessellated down to a given sub-pixel limit (shader supersampling is performed by using more vertices). Thus no interpolation is performed between the individual vertices, and shading is performed before rasterization.

That doesn't sound like how I understood PRMan's REYES renderer.
The way I understood it, the subdivision is done first, then when a triangle is < 0.25 pixels large (a so-called micropolygon), it is rendered as a single pixel with the shader (using the triangle normal, so essentially it performs flat-shading). This means that the vertices aren't used in the shading process at all, they are just temporary data during subdivision (which is a form of interpolation).
I am not even sure if you can call this process rasterization.

But displacement looks ugly if your polygons are larger than a pixel...

I'm not sure if I understand what you mean. Displacing vertices won't change anything drastic, right? A continuous mesh will still be continuous; it would just have a more detailed surface. Or perhaps you mean sampling problems? But that would depend more on the vertex-to-displacement-map-texel ratio?
 
The original Reyes algorithm does subdivision in screen space prior to displacement, so final micropolygons can be bigger than a quarter of a pixel, or even a whole pixel. They added subdivision control to solve this problem (you explicitly subdivide something more).

Micropolygons are rendered as bilinear quads.
 
Laa-Yosh said:
- unified shaders will free the hw vendors from the decision :)
That's what I was thinking. Looking forward, does it matter? ATi and nVidia will just update their drivers to use different shader allocation schemes for new games.
 
Inane_Dork said:
That's what I was thinking. Looking forward, does it matter? ATi and nVidia will just update their drivers to use different shader allocation schemes for new games.

I think it will be more like CPUs, where the allocation is done in real time. This way, the best approach is probably to allocate any idle units to whatever is required next.
In the most extreme case, you'd use all units as vertex-shader units, which means you'd effectively have more vertex processing power than today. Whether developers will actually go in this direction remains to be seen. My three points above assume that they will, because I think it makes sense to go there.
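A toy model of what I mean by allocating idle units, purely for illustration (the unit count and workload below are made up, and no real hardware works like this):

```cpp
#include <cstdio>
#include <queue>

// Toy model of dynamic allocation in a unified pool: each idle unit simply
// takes the next piece of work, vertex or pixel, from a shared queue.
enum class WorkKind { Vertex, Pixel };

int main() {
    std::queue<WorkKind> work;
    for (int i = 0; i < 6; ++i) work.push(WorkKind::Vertex);  // vertex-heavy burst
    for (int i = 0; i < 4; ++i) work.push(WorkKind::Pixel);   // then some pixel work

    const int unitCount = 4;   // assumed number of unified units
    int cycle = 0;
    while (!work.empty()) {
        std::printf("cycle %d:", cycle++);
        for (int u = 0; u < unitCount && !work.empty(); ++u) {
            const WorkKind k = work.front();
            work.pop();
            std::printf(" unit%d=%s", u, k == WorkKind::Vertex ? "VS" : "PS");
        }
        std::printf("\n");
    }
}
```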
 
Scali said:
The way I understood it, the subdivision is done first, then when a triangle is < 0.25 pixels large (a so-called micropolygon), it is rendered as a single pixel with the shader (using the triangle normal, so essentially it performs flat-shading). This means that the vertices aren't used in the shading process at all, they are just temporary data during subdivision (which is a form of interpolation).

Let me quote Tony Apodaca from the Advanced RenderMan book:

"Dicing converts the small primitive into a common data format called a grid. A grid is a tesselation of the primitive into a rectangular array of quadrilateral facets known as micropolygons (because of the geometry of the grid, each facet is actually a tiny bilinear patch, but we call it a micropolygon nonetheless). The vertices of these facets are the points that will be shaded later... generally, the facets will be in the order of one pixel in area.

...the ShadingRate of an object refers to the frequency with which the primitive must be shaded (actually measured by sample area in pixels). For example, a shadingrate of 1.0 specifies one shading sample per pixel. (BTW a shadingrate of 0.5 means not 2, but 4 samples! And it's not adaptive, but it's stochastic sampling. -LY) In the Reyes algorythm, this constraint translates into micropolygon size. During the dicing phase, an estimate of the raster space size of the primitive is made, and this number is divided by the shadingrate to determine the number of micropolygons that must make up a grid. "


In short, PRMan first tests the bounding box of each object; if it's too large, the object gets split into smaller parts, usually along parametric edges, and if it's unseen, it's culled.
Once the splitting loop is done, each primitive is processed independently. They're diced up into grids, and then PRMan shades one grid at a time, using SIMD rendering to conserve memory. Grids are first displaced, then the surface shader gets evaluated and surface color and opacity are assigned to each vertex. Hidden-surface removal follows, by tearing the grid apart into individual micropolygons; these then get bounded, visibility-tested and perhaps culled. Each micropolygon vertex is individually tested against the predetermined sample points of every pixel that might contain it, and the color for the samples may be interpolated or not. Sample points gather visible point data and combine them into the final pixel using a reconstruction filter.
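The same flow as a skeletal outline; every function below is just a stand-in for the description above, not PRMan source:

```cpp
#include <vector>

// Skeletal outline of the REYES flow summarised above. All types and
// functions are placeholders for the description, not real renderer code.
struct Primitive {};
struct Grid {};           // rectangular array of micropolygon facets
struct Micropolygon {};

bool offScreen(const Primitive&)      { return false; }
bool tooLargeToDice(const Primitive&) { return false; }
std::vector<Primitive> split(const Primitive& p) { return {p, p}; }  // e.g. along parametric edges

Grid dice(const Primitive&, float /*shadingRate*/) { return {}; }
void displace(Grid&) {}               // displacement shader runs first
void shadeSurface(Grid&) {}           // surface shader, SIMD over all grid vertices
std::vector<Micropolygon> bust(const Grid&) { return {}; }
bool visible(const Micropolygon&)     { return true; }
void sampleIntoPixels(const Micropolygon&) {}   // test against each pixel's sample points

void renderReyes(std::vector<Primitive> input, float shadingRate) {
    // Splitting loop: cull unseen primitives, split the rest until small enough to dice.
    std::vector<Primitive> ready;
    while (!input.empty()) {
        Primitive p = input.back();
        input.pop_back();
        if (offScreen(p)) continue;
        if (tooLargeToDice(p))
            for (const Primitive& part : split(p)) input.push_back(part);
        else
            ready.push_back(p);
    }
    // Then each primitive is processed independently, one grid at a time.
    for (const Primitive& p : ready) {
        Grid g = dice(p, shadingRate);   // grid size from raster size / shading rate
        displace(g);
        shadeSurface(g);                 // color and opacity assigned per grid vertex
        for (const Micropolygon& mp : bust(g))
            if (visible(mp))
                sampleIntoPixels(mp);    // samples later filtered into final pixels
    }
}

int main() { renderReyes({Primitive{}}, 1.0f); }
```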

But displacement looks ugly if your polygons are larger than a pixel...

I'm not sure if I understand what you mean. Displacing vertices won't change anything drastic, right? A continuous mesh will still be continuous; it would just have a more detailed surface. Or perhaps you mean sampling problems? But that would depend more on the vertex-to-displacement-map-texel ratio?

First, think about how you're displacing: with a texture map. It's obvious that if you have fewer vertices than used texels in the texture, then you lose data. So for the 2K textures commonly used in UE3, you'd require 2048*2048*0.9 (UV-space efficiency) ≈ 3.77 million vertices to display all the data. And this might not be an ideal distribution of the geometry; several parts of the model might also be using the same UV space, thus requiring even more vertices to display the displacement properly.
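Spelled out, with the same assumptions as above:

```cpp
#include <cstdio>

int main() {
    // Same assumptions as in the text: a 2K displacement map with ~90% of the
    // UV space actually used.
    const double texels         = 2048.0 * 2048.0;
    const double uvEfficiency   = 0.9;
    const double verticesNeeded = texels * uvEfficiency;
    std::printf("vertices needed to express every texel: %.2f million\n",
                verticesNeeded / 1e6);   // ~3.77 million
}
```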

But even such a monstrous amount of geometry might not be enough for close-up views when you have high-frequency data in the displacement map. We're talking here about wrinkles, pores, scales etc., which can already be displayed with normal maps. For such detail you have to subdivide to micropolygons, and for proper antialiasing you need more than one sample per pixel.

Of course there are uses for displacement mapping that do not require detailed geometry, like simulating an earthquake on a ground plane or such. But to add detail to models, you're pretty much required to tessellate to at least one vertex per texel.

(Another solution might be to combine displacement and normal mapping, by using a low-frequency 256 or 512 displacement map and a high-frequency 2K normal map - but there are surely some complications, at least during the content creation phase. And even this solution requires at least a few hundred thousand polygons...)
 
Scali said:
Inane_Dork said:
That's what I was thinking. Looking forward, does it matter? ATi and nVidia will just update their drivers to use different shader allocation schemes for new games.

I think it will be more like CPUs, where the allocation is done in real time. This way, the best approach is probably to allocate any idle units to whatever is required next.
I would hope the allocation is real time as well. But I'm also guessing that the allocation scheme will be updated frequently in drivers. And I'm also guessing they might tweak the schemes based on prior knowledge of a game's demands.

Ideally, yes, we wouldn't need that. But the ideal graphics driver has not been made yet.
 
David Kirk told our very own B3D that a unified architecture isn't on the cards for NVIDIA in the short term.

As we can see, despite the closeness of the Shader Model 3.0 Pixel and Vertex Shader sets, NVIDIA have opted to stick with distinct Vertex and Pixel Shader engines, as opposed to a unified Vertex and Pixel Shader ALU (Arithmetic Logic Unit) structure. In fact, when we asked David Kirk about the potential use of a unified structure, he suggested that, as far as NVIDIA are concerned, this wasn't a route they are pursuing as it has performance implications such as thrashing caches - which raises the question of whether they are looking at a unified pipeline structure even beyond NV4x.

ATI's presentation pretty much proves they are going there, though. I think NVIDIA is betting on the wrong horse here; a unified architecture doesn't present insurmountable problems IMO... just challenges. Efficiency on an ideal workload for a non-unified architecture is nice and all, but no real workload is ideal... the utilization on real workloads leaves a lot of wiggle room for caches and other extra hardware in a unified architecture.
 
Inane_Dork said:
I would hope the allocation is real time as well. But I'm also guessing that the allocation scheme will be updated frequently in drivers.
I'm assuming you're talking about dynamic allocation of a "unified" pipeline to process either vertex or pixel data? Well, if this is the case, then there's really not much need to make the allocation choice anything complex, and the queueing/switching structure could be implemented in hardware.

One could, for instance, merely have one quad of pipelines act upon one stream of data at a time. For example, one triangle goes in, all vertex calculations and then all pixel calculations on that triangle are done, and you output the results.

Or, alternatively, you could simply have an input queue, a loopback device (a way for the output of the pipeline to be re-inserted into the queue), and a way to quickly and easily switch between vertex and pixel processing (some sort of two-state pipeline system). This might be more amenable to cache coherency and, therefore, memory bandwidth usage, as all pipelines could possibly share the same caches more easily.

But, regardless, I guess my point is that the unification could be done in such a way that it's simple enough to not really require any driver modification. You may want to modify the caching routines in the driver, but that's about it....
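A toy version of that queue/loopback idea, purely illustrative (no real hardware works exactly like this):

```cpp
#include <cstdio>
#include <queue>

// Toy version of the "input queue + loopback" idea: a unified unit pulls work
// from one queue; finished vertex work re-enters the same queue as pixel work.
struct Work { bool isVertex; int id; };

int main() {
    std::queue<Work> q;
    for (int tri = 0; tri < 3; ++tri) q.push({true, tri});   // incoming triangles

    while (!q.empty()) {
        const Work w = q.front();
        q.pop();
        if (w.isVertex) {
            std::printf("vertex-process triangle %d\n", w.id);
            q.push({false, w.id});    // loopback: its pixel work re-enters the queue
        } else {
            std::printf("pixel-process the quads of triangle %d\n", w.id);
        }
    }
}
```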
 
I like Chalnoth's thinking, and was a bit surprised by MfA's note on Nvidia's thinking - how recent is that?

One more question: could you profitably move any more of the 3D pipeline onto dedicated hardware, or are we at the limit?
 
Laa-Yosh said:
Let me quote Tony Apodaca from the Advanced RenderMan book: [...]

In short, PRMan first tests the bounding box of each object [...] Sample points gather visible point data and combine them into the final pixel using a reconstruction filter.

So it depends on how you name things. The vertices that it performs shading on aren't the vertices from the model itself. Then again, the vertices aren't exactly pixels either.
So their shader is not the vertex shader as we know it in Direct3D, but it is not a pixel shader either.
In fact, if I understood correctly, the only per-pixel operations, if any, are to lerp the colour between the shaded 'vertices' of the micropolygons.

It's hard to translate this back to Direct3D hardware at the moment, since it works so differently.
But if we look at the general idea, then we see that a REYES renderer is very much geometry-based, and doesn't try to do a lot of per-pixel hacks.
So I suppose that if we want to get closer to the realism of a REYES renderer with current hardware, we should just use more polygons. However, in our case this does not mean we should do all shading per vertex; we should still use per-pixel shading, since that will give better quality with the amount of polygons we can handle in the near future.
Which is more or less my first point in my first post.

For the rest, as I suspected, you meant the sampling problems.
 
g__day said:
One more question: could you profitably move any more of the 3D pipeline onto dedicated hardware, or are we at the limit?

Well, I'd like to see occlusion culling move to the graphics card. It could be done relatively painlessly in OpenGL using linked display lists (just add a way to do optional rendering of a display list based on a bounding-box test... you would translate the scene graph to a tree of display lists, since display lists can call each other, and tell the graphics card to start at the root). Display lists have gone totally out of fashion, though, so I don't see it happening.
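Roughly what I mean, using a made-up entry point; nothing below is a real OpenGL call:

```cpp
#include <cstdio>

// Sketch of the idea with a *made-up* entry point. Each scene-graph node
// becomes a pair of display lists: a tiny bounding box used for the
// visibility test, and the real geometry, which in turn calls the children's
// lists so the card can walk the tree on its own.
using DisplayList = unsigned int;

struct SceneNode {
    DisplayList bboxList;   // bounding-box geometry for the visibility test
    DisplayList geomList;   // actual geometry plus calls to child display lists
};

// Hypothetical extension (does not exist): execute geomList only if bboxList
// survives an on-card depth-buffer visibility test.
void glCallListIfVisible_HYPOTHETICAL(DisplayList bboxList, DisplayList geomList) {
    std::printf("would test list %u, then maybe render list %u\n", bboxList, geomList);
}

int main() {
    const SceneNode root{1, 2};
    // The application only kicks off the root; the card handles the rest.
    glCallListIfVisible_HYPOTHETICAL(root.bboxList, root.geomList);
}
```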
 
MfA said:
David Kirk told our very own B3D that a unified architecture isn't on the cards for NVIDIA in the short term. [...] I think NVIDIA is betting on the wrong horse here; a unified architecture doesn't present insurmountable problems IMO... just challenges.

Entirely OT: I recall someone here (when predicting the unification of units) actually stating that some vendors will unify units and some won't. I doubt it was a coincidence either (and no, not all estimates/predictions came to fruition, to be honest, yet I still find it interesting enough).
 
Not perfectly following this, but basically, a unified setup would be adding in all the work between vertex and pixel shaders today, making one unit? Wouldn't you need to work with three points at a time in your unified shader code to generate texture coordinates per pixel? My newbie understanding is that you only get to work with one point at a time in the vertex shaders, and the fixed part of the hardware does stuff like interpolating between the points using the same kind of functionality that existed before shaders; all the mipmapping and trilinear, etc., is done by setting texture stage states. Pixel shaders then grab the texture coordinate registers and sample textures or use the interpolated values as normals, etc.
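My mental model, as a tiny CPU-side sketch (all the names below are made up for illustration):

```cpp
#include <cstdio>

// CPU-side sketch of the split I'm describing: the vertex "shader" works on
// one vertex at a time, fixed-function hardware interpolates its outputs
// across the triangle, and the pixel "shader" only ever sees the already
// interpolated values.
struct VSOut { float u, v; };   // a per-vertex texture-coordinate output

VSOut vertexShader(float u, float v) { return {u, v}; }

// The fixed-function step: barycentric interpolation of the three vertex outputs.
VSOut interpolate(const VSOut& a, const VSOut& b, const VSOut& c,
                  float wa, float wb, float wc) {
    return { a.u * wa + b.u * wb + c.u * wc,
             a.v * wa + b.v * wb + c.v * wc };
}

void pixelShader(const VSOut& in) {   // sees only the interpolated values
    std::printf("sample texture at (%.2f, %.2f)\n", in.u, in.v);
}

int main() {
    const VSOut a = vertexShader(0, 0), b = vertexShader(1, 0), c = vertexShader(0, 1);
    pixelShader(interpolate(a, b, c, 0.2f, 0.3f, 0.5f));  // one pixel inside the triangle
}
```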

Just curious what is meant by unifying them... beyond some API level of virtualization or something.

As for the poll, I don't think there is much point in one without the other. It seems to me the current setup is efficient enough; trying to do all detail using geometry would need hardware that is a lot faster than what exists today. (Progressive meshes are OK, but it would be nice to have something in hardware for that, maybe something along the lines of mipmapping for textures.)

Probably coming off retarded, just asking.. :)
 
g__day said:
I like Chalnoth's thinking, and was a bit surprised by MfA's note on Nvidia's thinking - how recent is that?
I was surprised, too, when I first read it. I'm really hoping that he's just talking about the NV4x when he said, "short term," but I really don't know. I'm being optimistic here, but it's always possible that he said short term for the simple reason that it's standard practice not to talk about unannounced products.
 
Himself said:
Not perfectly following this, but basically, a unified setup would be adding in all the work between vertex and pixel shaders today, making one unit? Wouldn't you need to work with three points at a time in your unified shader code to generate texture coordinates per pixel? [...]
Well, if you imagine a queue system, you'd just process vertex data to prepare pixel data, which would end up in the queue. When pixel data reaches the point where it's about to be processed, the pipeline would do some kind of state switch and process the pixel data. You would still, of course, align pixel data in quads, and thus would have to do something similar with vertex data, but it shouldn't be too hard to manage.

The real benefit of doing this is that, if you think about it, in any given scene there are many polygons that are very large (covering hundreds or thousands of pixels), and there are many polygons that are very small (covering only a couple of pixels or, if the object is really far away, perhaps even less than a pixel). With static pipelines, in each situation you'll only be executing as fast as the limiting factor for that situation: for the big polygons you're limited by the pixel shader, and for the small ones by the vertex shader.

Unifying the pipelines removes this limitation, giving you more "effective" vertex pipelines when processing vertex-intensive data, and vice versa for pixel-shading-intensive data (see the rough numbers below).
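A back-of-the-envelope with made-up numbers to show the effect; the capacities and workloads below are pure assumptions:

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    // Made-up workloads (arbitrary units): one batch of big, pixel-heavy
    // polygons and one batch of tiny, vertex-heavy polygons.
    struct Batch { const char* name; double vertexWork, pixelWork; };
    const Batch batches[] = {
        {"large nearby polygons", 1.0, 30.0},   // pixel-shader bound
        {"tiny distant polygons", 10.0, 1.0},   // vertex-shader bound
    };

    const double fixedVS = 3.0, fixedPS = 8.0;  // assumed fixed-split capacities
    const double unified = fixedVS + fixedPS;   // same total capacity, but shared

    for (const Batch& b : batches) {
        // Fixed split: the slower of the two unit types sets the pace.
        const double fixedTime   = std::max(b.vertexWork / fixedVS, b.pixelWork / fixedPS);
        // Idealised unified pool: total capacity goes wherever it is needed.
        const double unifiedTime = (b.vertexWork + b.pixelWork) / unified;
        std::printf("%-22s fixed split %.2f | unified %.2f (arbitrary time units)\n",
                    b.name, fixedTime, unifiedTime);
    }
}
```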
 