Vertex shading on CPUs & accommodating vertex-biased work

Titanio

Legend
(Sorry about length in advance)

I'm sorta looking at this from a fairly high level, and I'm not aware of the very intimate details of GPUs and such, so forgive me if some of my theorising seems woefully ignorant. It's sorta late and maybe the heat's getting to me ;)

Basically, upfront, a question that will be relevant later: there's been a lot of talk about how the CPUs in particular can handle vertex processing, and while I'm sure there are some constraints on how this works, can I reasonably assume it can be done?

In X360 we then have a GPU that can, roughly speaking, handle arbitrary ratios of pixel to vertex shader workloads. In PS3 we have a GPU whose pixel and vertex shader capability is locked at a certain ratio.

At this point, it might be handy to ask about general pixel:vertex ratios. Is it safe to assume that typically the weighting is quite significantly toward pixel shading? Presumably the high pixel:vertex shader ratios in "fixed" hardware are a consequence of this?

X360's GPU doesn't care about this, however. It'll adapt to workloads strongly weighted toward pixels or vertices. This should become an advantage from a utilisation point of view over a dedicated architecture (RSX) in situations where vertices require more attention than pixels, or the ratio moves away from that enshrined in RSX so to speak.

Now here, where I'm about to indulge in some comparative pondering, I should add some caveats - again I'm obviously looking at this from a high level, and there's a certain amount of "all else being equal" going on, but with that out of the way - Where the bias is toward pixels, RSX seems to have the advantage. How much of Xenos would have to be dedicated to pixel shading in order for it to match RSX in absolute terms? (Not looking for an answer here necessarily, just thinking out loud).

With vertices, the roles are reversed. In terms of shader numbers, RSX reserves 25% of its shaders for vertex shading, while Xenos could dedicate an arbitrary portion of its resources to vertices. So with proportionately heavier vertex loads, Xenos should perform better.

But then, and coming to a conclusion, I have two questions:

1) if "typically" the bias is so strongly toward pixels (apparently), how far is the weighting likely to be biased toward vertices even in what could be considered more extreme non-typical cases? 50:50? 60:40? 75:25? (How likely is it really that many would focus so extremely on vertices vs pixels?)

2) And I guess this is the key question - does it not then seem quite fortunate that the one graphics task Cell could perhaps particularly help its pixel-biased companion with, is further vertex processing? Combined, the vertex shaders and some undefined "portion" of Cell - how far can it go to "bridge the gap" in absolute terms of dealing with larger vertex loads than RSX could handle on its own relative to Xenos?

Basically, it seems to me that RSX's "weakness" relative to Xenos is going to be situations with a higher proportion of vertex work vs the norm. But moving away from proportions, and looking at it in absolute terms, in a lot of situations could the CPU and VS combined help close the gap with how much Xenos handles when more biased toward vertices, all while keeping much more pixel shading power on tap? The opposite does not seem to be true - if the relative weakness were in pixel shading, it does not appear the CPU could be as useful at all?
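To put toy numbers on that utilisation argument, here is a minimal sketch. All figures (the 25:75 split, the idealised scheduling) are invented for illustration, not real RSX/Xenos specs:

```python
# Toy model of fixed-split vs unified shader hardware under varying
# vertex:pixel workload mixes. All figures (the 25:75 split, the ideal
# scheduling) are invented for illustration, not real RSX/Xenos specs.

def fixed_gpu_throughput(vertex_frac, vs_share=0.25):
    """Fixed split: vs_share of the units do vertices, the rest pixels.
    Relative frame throughput is capped by whichever pool is the
    bottleneck for the given workload mix."""
    if vertex_frac <= 0.0:
        return 1.0 - vs_share              # pixel pool limits everything
    if vertex_frac >= 1.0:
        return vs_share                    # vertex pool limits everything
    vs_rate = vs_share / vertex_frac
    ps_rate = (1.0 - vs_share) / (1.0 - vertex_frac)
    return min(vs_rate, ps_rate)

def unified_gpu_throughput(vertex_frac):
    """Unified: every unit can do either kind of work, so (ignoring
    scheduling overhead) throughput is independent of the mix."""
    return 1.0

for vf in (0.15, 0.25, 0.50):
    print(f"vertex fraction {vf:.2f}: "
          f"fixed={fixed_gpu_throughput(vf):.2f}, "
          f"unified={unified_gpu_throughput(vf):.2f}")
```

In this toy model the fixed design only matches the unified one at its design ratio (here 25:75) and falls off as the mix moves away in either direction.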
 
Re: Vertex shading on CPUs & accommodating vertex-biased

Titanio said:
1) if "typically" the bias is so strongly toward pixels (apparently), how far is the weighting likely to be biased toward vertices even in what could be considered more extreme non-typical cases? 50:50? 60:40? 75:25? (How likely is it really that many would focus so extremely on vertices vs pixels?)

The Xenos seems geared towards a two pass rendering scheme, first fill the Z-buffer, then redraw to fill in pixels. The first pass is exclusively vertex shading (and Z-fill of course) so should do very well here. Second pass is the usual mix. The shading of only visible fragments could again shift the load-mix towards vertex shading.
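In mock code, that two-pass scheme looks something like this (the GPU object here just records calls, and the method names are hypothetical stand-ins for whatever the real graphics API provides):

```python
# Mock sketch of the two-pass Z-prepass scheme described above. The
# MockGPU just records calls; the method names are hypothetical
# stand-ins for a real graphics API.

class MockGPU:
    def __init__(self):
        self.log = []
    def set_color_writes(self, on):
        self.log.append(("color_writes", on))
    def set_depth_test(self, func, write):
        self.log.append(("depth_test", func, write))
    def draw_depth_only(self, obj):
        self.log.append(("z_fill", obj))   # vertex shading + Z-fill only
    def draw_shaded(self, obj):
        self.log.append(("shade", obj))    # full vertex + pixel work

def render_frame(objects, gpu):
    # Pass 1: fill the Z-buffer - exclusively vertex work, no pixel shading.
    gpu.set_color_writes(False)
    gpu.set_depth_test("less", write=True)
    for obj in objects:
        gpu.draw_depth_only(obj)
    # Pass 2: redraw - early-Z now rejects hidden fragments, so the
    # pixel shaders only run on visible pixels.
    gpu.set_color_writes(True)
    gpu.set_depth_test("equal", write=False)
    for obj in objects:
        gpu.draw_shaded(obj)

gpu = MockGPU()
render_frame(["terrain", "trees"], gpu)
```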

Cheers
Gubbi
 
Re: Vertex shading on CPUs & accommodating vertex-biased

Gubbi said:
The Xenos seems geared towards a two pass rendering scheme, first fill the Z-buffer, then redraw to fill in pixels. The first pass is exclusively vertex shading (and Z-fill of course) so should do very well here. Second pass is the usual mix. The shading of only visible fragments could again shift the load-mix towards vertex shading.

Cheers
Gubbi

Thanks for the reply!

How far do you think the load might shift in situations where overdraw is not a concern?

It seems that the bias toward pixel shading is a result of more than just overdraw though (?) Aside from a still possibly higher number of pixel elements to be dealt with vs vertices (?), there just seems to have been a general trend toward ramping up per-pixel work compared to per-vertex work in the last few years (?)

Again, this is a very high-level view (and perhaps simplistic), but when you think in absolute terms, given the same amount of vertex work on PS3 vs X360, the proportions don't necessarily need to work out the same either. Say Xenos works 50:50 on vertices vs pixels - which is quite a shift as it is from 25:75 or 15:85 - if you took that same amount of vertex work and placed it on VS + some of Cell, there's still a higher proportion of power left that could be used in the pixel shaders. In other words, up to some point/some weighting, Cell + VS could deal with the same amount of vertex work as Xenos but still have a lot more pixel shader power left on top of that, simply because that power can't be diverted to vertex work anyway - and I'm sure devs could find a use for it.

I guess I'm wondering where the breaking point is, and how often it might be reached.
 
This will (of course) depend on the game, and more specifically on the amount of overdraw, I guess.

When rendering huge open areas (landscapes) you easily get a lot of polygons and a lot of overdraw. Just imagine standing on a hill and looking into the valley - lots of trees, grass and butterflies included.

Pixel shader costs depend only on resolution and shader complexity (if you ignore refraction).

Vertex shader costs depend on (unculled) polygons in the frustum and vertex shader complexity.
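Plugging invented numbers into those two cost formulas (these are assumptions for illustration, not measured figures for either console):

```python
# Back-of-envelope comparison of the two cost formulas above, with
# invented numbers - not measured figures for either console.

width, height = 1280, 720
overdraw = 2.5                    # average times each pixel is drawn (assumed)
px_shader_ops = 40                # ALU ops per fragment (assumed)

polys_in_frustum = 500_000        # assumed scene complexity
verts_per_poly = 1.0              # roughly, with good vertex caching
vx_shader_ops = 20                # ALU ops per vertex (assumed)

pixel_cost = width * height * overdraw * px_shader_ops
vertex_cost = polys_in_frustum * verts_per_poly * vx_shader_ops

print(f"pixel ops/frame:  {pixel_cost:,.0f}")
print(f"vertex ops/frame: {vertex_cost:,.0f}")
print(f"pixel:vertex ratio ~ {pixel_cost / vertex_cost:.0f}:1")
```

Even with fairly generous geometry, per-frame pixel work dominates by around an order of magnitude in this example, which is one way to motivate pixel-heavy hardware splits.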

For a game like Morrowind the number of polygons/vertices could easily exceed the number of pixels. But I am unsure whether the power of XBox360/PS3 really makes such a scenario feasible.

Afaik current games are an order of magnitude away from 1m polygons on screen/in frustum.

Edit: There probably never will be any "pixel shader power" left on the PS3, because pixel shader costs are constant with respect to scene complexity.
If you have the resources to shade 70% of all pixels you will try to balance the engine and the levels to approximate those 70% shaded pixels on every screen.
For some games this is easy - for other games you might have to reduce image quality or allow slowdown.
On XBox360 you have the choice. If your scene complexity is low you might use the CPU for vertex processing (at least for the 2nd pass), maximizing pixel shading power. If scene complexity/overdraw is high you may go the other way round.
 
Re: Vertex shading on CPUs & accommodating vertex-biased

Titanio said:
In terms of shader numbers, RSX reserves 25% of its shaders for vertex shading, while Xenos could dedicate an arbitrary portion of its resources to vertices. So with proportionately heavier vertex loads, Xenos should perform better.
Umh..on RSX about 15% of programmable flops are devoted to vertex shading, imho.

1) if "typically" the bias is so strongly toward pixels (apparently), how far is the weighting likely to be biased toward vertices even in what could be considered more extreme non-typical cases? 50:50? 60:40? 75:25? (How likely is it really that many would focus so extremely on vertices vs pixels?)
100:0

2) And I guess this is the key question - does it not then seem quite fortunate that the one graphics task Cell could perhaps particularly help its pixel-biased companion with is further vertex processing? Combined, the vertex shaders and some undefined "portion" of Cell - how far can it go to "bridge the gap" in absolute terms of dealing with larger vertex loads than RSX could handle on its own relative to Xenos?
A couple of SPEs should be enough to do as much work as 8 VS on RSX can do..
So CELL can bridge any gap here, if you have enough SPEs to spare ;)

Basically, it seems to me that RSX's "weakness" relative to Xenos is going to be situations with a higher proportion of vertex work vs the norm. But moving away from proportions, and looking at it in absolute terms, in a lot of situations could the CPU and VS combined help close the gap with how much Xenos handles when more biased toward vertices, all while keeping much more pixel shading power on tap?
Even if I don't think what you're pointing out is an RSX weakness, I believe CELL could close any gap.
The opposite does not seem to be true - if the relative weakness were in pixel shading, it does not appear the CPU could be as useful at all?
In some cases part of the shading work can be moved from pixels to vertices in order to alleviate the pixel shaders' workload.

ciao,
marco
 
DotProduct said:
Edit: There probably never will be any "pixelshader power" left on the PS3, because pixelshader costs are constant regarding scene complexity.

Sorry if I wasn't clear - I didn't mean to suggest there was an excess, but that relative to Xenos, in situations where the vertex load was high, there'd be more. As the bias shifts toward vertices, pixel performance on Xenos decreases, while on RSX it remains constant - a fixed, very high proportion of its power. Meanwhile, up to some point (the "breaking point"), Cell + VS can keep up on the vertex side.

nAo said:
Titanio said:
In terms of shader numbers, RSX reserves 25% of its shaders for vertex shading, while Xenos could dedicate an arbitrary portion of its resources to vertices. So with proportionately heavier vertex loads, Xenos should perform better.
Umh..on RSX about 15% of programmable flops are devoted to vertex shading, imho.

Right, sorry - I mentioned this in my second post. I was looking at the proportions in two ways, perhaps simplistically - number of shaders (75:25) and Gflops (85:15).

Thanks for the rest of your answers!
 
Re: Vertex shading on CPUs & accommodating vertex-biased

Titanio said:
How far do you think the load might shift in situations where overdraw is not a concern?
No clue...

Titanio said:
It seems that the bias toward pixel shading is a result of more than just overdraw though (?) I mean in the last few years there seems to have been a general trend toward a ramping up of pixel work compared to vertex work.

I think this is a chicken-and-egg thing. Modern GPUs have larger pixel shading resources than vertex shading resources (since they have to accommodate a multitude of resolutions). Applications/games expand to fill the performance envelope. Xenos is purpose-built for 1280x720 with 4xMSAA and as such looks well thought out.

I'm sure you could use the SPEs for vertex shading to boost vertex performance. You'd need to load balance the vertex work though. And it's not clear how well suited the SPEs are to running VS programs, being SoA- vs AoS-oriented, lacking free swizzles, etc.
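To illustrate the SoA vs AoS point in plain code (Python lists standing in for SIMD registers; real SPE code would use 4-wide vector intrinsics):

```python
# Illustration of AoS vs SoA vertex layouts, the issue raised above.
# Plain Python stands in for SIMD code: in SoA form each operation
# naturally touches the same component of several vertices at once,
# which suits SPE-style vector hardware; AoS data needs swizzling first.

# AoS: one record per vertex, components interleaved.
aos = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0), (10.0, 11.0, 12.0)]

# SoA: one stream per component.
soa_x = [v[0] for v in aos]
soa_y = [v[1] for v in aos]
soa_z = [v[2] for v in aos]

def translate_soa(xs, ys, zs, dx, dy, dz):
    # Each line below maps to one vector op across many vertices
    # on SIMD hardware - no per-vertex swizzling required.
    return ([x + dx for x in xs],
            [y + dy for y in ys],
            [z + dz for z in zs])

tx, ty, tz = translate_soa(soa_x, soa_y, soa_z, 1.0, 0.0, 0.0)
print(tx)  # x components of all four vertices, shifted by 1
```

This only shows the data layout; the load-balancing question is untouched here.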

Cheers
Gubbi
 
Even if I don't think what you're pointing out is an RSX weakness, I believe CELL could close any gap.

Absolutely. I am pretty sure that on both consoles you will never run out of vertex processing power. Afaik you will run out of texture bandwidth a long time before you reach the 500m polygons/s of Xenos. As far as I understand, the main purpose of vertex shaders was to reduce AGP bandwidth costs. CPU<->GPU bandwidth seems high enough for me to turn vertex shaders into "PC architecture legacy". Still nice to have though. ;)
 
One little thought to consider, which has sort of been brought up, is that the demand of the graphics pipeline shifts many times per frame. You do Z, shadow, reflection, particles, tone mapping, bloom, DOF, HUD overlay, etc. and they all have their own demand of the pipeline that shifts the bottleneck to different stages. I'm not sure that allocating SPEs to handle vertex work would be quick enough in the reaction time, though maybe you could pre-plan it.

And if you went with something more exotic like deferred shading, maybe you could allocate SPEs to handle some of the lights.
 
Inane_Dork said:
One little thought to consider, which has sort of been brought up, is that the demand of the graphics pipeline shifts many times per frame. You do Z, shadow, reflection, particles, tone mapping, bloom, DOF, HUD overlay, etc. and they all have their own demand of the pipeline that shifts the bottleneck to different stages. I'm not sure that allocating SPEs to handle vertex work would be quick enough in the reaction time, though maybe you could pre-plan it.

Not just that, but the workload can shift dramatically just in a single render batch within a single frame.

I think it would be a monumentally difficult task to load balance something like that in software alone.
 
aaaaa00 said:
Inane_Dork said:
One little thought to consider, which has sort of been brought up, is that the demand of the graphics pipeline shifts many times per frame. You do Z, shadow, reflection, particles, tone mapping, bloom, DOF, HUD overlay, etc. and they all have their own demand of the pipeline that shifts the bottleneck to different stages. I'm not sure that allocating SPEs to handle vertex work would be quick enough in the reaction time, though maybe you could pre-plan it.

Not just that, but the workload can shift dramatically just in a single render batch within a single frame.

I think it would be a monumentally difficult task to load balance something like that in software alone.

Is that normally the case? If so, then devs are going to have a bit of a harder time with Cell helping out RSX.
 
Inane_Dork said:
One little thought to consider, which has sort of been brought up, is that the demand of the graphics pipeline shifts many times per frame. You do Z, shadow, reflection, particles, tone mapping, bloom, DOF, HUD overlay, etc. and they all have their own demand of the pipeline that shifts the bottleneck to different stages. I'm not sure that allocating SPEs to handle vertex work would be quick enough in the reaction time, though maybe you could pre-plan it.

I was thinking the same thing but did not want to hijack Titanio's thread.

The balance is different game to game, level to level, from one aspect of a level to the next, and even on each frame. e.g. The Xenos daughter die logic seems to handle the Z pass extremely fast, and with the unified shaders, should chew through this stage in the frame.

With a traditional design, even one aided by a CPU, you run into the problem of some hardware sitting idle at stages where one task or the other is more dominant. e.g. In Far Cry you look up at the trees and leaves above you and the screen is filled with thousands upon thousands of leaves then turn around and look at the open ocean with nice water shader effects.

I think it goes back to system design. While both are powerful and both are elegant, I think it is pretty clear that each system is using a different emphasis to attain the same goal.

One thing I am hoping unified shaders bring to the table is more stable framerates. In GPU-limited games it is annoying to see big dips. If some of these framerate issues are related to changes in the gaming environment and the shading load shifting back and forth between vertex and pixel shading (and parts of the pipeline sitting idle waiting for the other stuff to get done), this could bring a new metric that is relevant for discussion: minimum and mean framerate and standard deviation.

Likewise, an RSX<>CELL setup could really alleviate a lot of vertex-bound situations and also could increase the minimum framerate in certain situations.

Either way gamers win and I am sure developers appreciate the flexibility.

So CELL can bridge any gap here, if you have enough SPEs to spare

You know, there are things like AI, Sound, Physics, Particle System, Game Logic, Procedural Textures, and so forth that CELL needs to do too in many games!

Just teasing ;)

I think you are right, if I understood you right. 7 SPEs is a lot, especially when multithreading (9 total HW threads on CELL). You obviously don't want any of that sitting idle, and it may be difficult at first for developers to find enough non-graphics tasks to have all 7 SPEs at 100% utilization all the time. So why not put them forth on graphics tasks? It could only make the game look better... and good looking games 'cell' well ;)

And on the scale of tasks SPEs should be best at, I would guess vertex shading would be at, or near, the top. Probably not as efficient as hardware designed to output vertex data natively (like a vertex shader), but meh, beggars cannot be choosers!

Hopefully this means we may see more deformable, destructible, and interactive worlds.
 
dukmahsik said:
aaaaa00 said:
Inane_Dork said:
One little thought to consider, which has sort of been brought up, is that the demand of the graphics pipeline shifts many times per frame. You do Z, shadow, reflection, particles, tone mapping, bloom, DOF, HUD overlay, etc. and they all have their own demand of the pipeline that shifts the bottleneck to different stages. I'm not sure that allocating SPEs to handle vertex work would be quick enough in the reaction time, though maybe you could pre-plan it.

Not just that, but the workload can shift dramatically just in a single render batch within a single frame.

I think it would be a monumentally difficult task to load balance something like that in software alone.

Is that normally the case? If so, then devs are going to have a bit of a harder time with Cell helping out RSX.

Trivial example: take a sphere. Assuming you're doing significant pixel shading work on it, half the tris in the batch will be pixel limited; the other half will be backface culled and do no pixel work.

Even ignoring things like shadow rendering and Z pre-passes, during rendering the load swings backwards and forwards continually.

If the FIFO in the chip between the vertex work and the pixel work is large enough it doesn't matter, but they are generally very small (10-20 entries).

It's extremely difficult to estimate how much time parts of the chip spend waiting for each other; it's probably something the IHVs can measure pretty easily.
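The sphere claim is easy to check with a quick script (a crude UV sphere and an orthographic backface test; the tessellation density is an arbitrary choice):

```python
# Quick check of the half-the-sphere claim above: build a crude UV
# sphere and count triangles facing an orthographic camera on the +Z
# axis. Tessellation density (16 rings x 32 segments) is arbitrary.

import math

def sphere_triangles(rings=16, segs=32):
    """Yield the triangles (vertex triples) of a unit UV sphere."""
    def vert(r, s):
        theta = math.pi * r / rings        # latitude, 0..pi
        phi = 2 * math.pi * s / segs       # longitude, 0..2pi
        return (math.sin(theta) * math.cos(phi),
                math.sin(theta) * math.sin(phi),
                math.cos(theta))
    for r in range(rings):
        for s in range(segs):
            a, b = vert(r, s), vert(r + 1, s)
            c, d = vert(r + 1, s + 1), vert(r, s + 1)
            yield (a, b, c)
            yield (a, c, d)

def front_facing(tri):
    """Backface test for a camera looking down -Z: the triangle is kept
    when the Z component of its face normal is positive."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = tri
    ux, uy = bx - ax, by - ay
    vx, vy = cx - ax, cy - ay
    return ux * vy - uy * vx > 0           # Z of the cross product

tris = list(sphere_triangles())
front = sum(front_facing(t) for t in tris)
print(f"{front}/{len(tris)} triangles front-facing "
      f"({100 * front / len(tris):.0f}%)")
```

Roughly half the triangles survive the cull, so the other half contribute vertex work but zero pixel work, just as described.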
 
Couple of thoughts

1) Can there ever be a time when pixel work is evenly divided on Xenos? It seems that pixel versus vertex work is allocated in thirds at any given moment, so the P:V ratios can only be:

00:100
33:66
66:33
100:0

2) What is the purpose of the tessellator on Xenos? Does it add to X360's ability to produce geometry? Do its capabilities balance out some of what Cell's SPEs can do?

3) I think the answer to Titanio's questions will always sit with the developers, in that it depends upon which system they are programming to.

Let's say for example the PS3's 85% of flops dedicated to pixel shading (which may be a max) is greater than Xenos' 66% output... then MS may have a problem.
 
blakjedi said:
Couple of thoughts

1) Can there ever be a time when pixel work is evenly divided on Xenos? It seems that pixel versus vertex work is allocated in thirds at any given moment, so the P:V ratios can only be:

00:100
33:66
66:33
100:0

It's because the pipes are distributed/set up into three units - for lack of a better quick descriptor - and within those units, all pipes must be performing the same sort of shading operation at any given moment. So that's why you see the ratios in terms of 0/3, 1/3, 2/3, 3/3 with Xenos.
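Enumerated directly (percentages rounded to whole numbers):

```python
# The thirds constraint described above, enumerated: each of the three
# shader arrays does either pixel or vertex work as a whole at any
# instant, so only four P:V splits exist (percentages rounded).

ARRAYS = 3

splits = []
for pixel_arrays in range(ARRAYS + 1):
    vertex_arrays = ARRAYS - pixel_arrays
    splits.append((round(100 * pixel_arrays / ARRAYS),
                   round(100 * vertex_arrays / ARRAYS)))

for p, v in splits:
    print(f"P:V = {p}:{v}")
```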
 
xbdestroya said:
blakjedi said:
Couple of thoughts

1) Can there ever be a time when pixel work is evenly divided on Xenos? It seems that pixel versus vertex work is allocated in thirds at any given moment, so the P:V ratios can only be:

00:100
33:66
66:33
100:0

It's because the pipes are distributed/set up into three units - for lack of a better quick descriptor - and within those units, all pipes must be performing the same sort of shading operation at any given moment. So that's why you see the ratios in terms of 0/3, 1/3, 2/3, 3/3 with Xenos.

I disagree. Xenos does not have a "pipeline" so to speak. In Dave's Xenos article he even went so far as to cross out the word "pipeline" and replace it with the word "array." So Xenos's unified shaders are placed into 3 "arrays"... not "pipelines."

A quote from the Xenos article:

"The arrows on ATI's diagram above indicates that there is some dependency from one of the shader arrays to another, almost as though they are pipelined; this is in fact not the case and each ALU array is working independently of the other and the data is not pipelined between them. This being the case there is no dependency between what programs, or types of programs, are being executed on each of the three ALU arrays"
 
LOL, take it easy - I could have referenced that if I wanted as well. :p It's true they're not pipes per se, but I was just trying to give a quick and dirty description. Can't be referencing material for proper terms every time I post. ;)

Anyway the premise remains the same, the shading power is divided into thirds.
 
I think there are 2 pieces of information we are still missing, and when we have them we will be able to answer many of these questions.

1. EXACTLY how powerful is one unified shader at pixel processing compared to one standard pixel shader on the RSX?

2. And EXACTLY how powerful is one unified shader at vertex processing compared to one standard vertex shader on the RSX?

I have a hunch that one standard pixel shader will be more powerful than one unified shader working on pixel shading, but that one unified shader working on vertex shading will be considerably more powerful than one standard vertex shader.

But even after we know the answers to 1 and 2, we still have to take into account that Xenos simply has more shaders in total. So even if one standard pixel shader can outperform one unified shader at pixel shading, Xenos may still have the advantage due to having more total shaders (ALUs).

And then after we take that into account, we must consider efficiency. In theory unified shaders are far more efficient than standard shaders.

When we look at the specs for standard shaders, they are based on the assumption that all of the shaders are in constant operation... but we know this is hardly the case with standard shaders... they are inefficient, with each stage constantly bottlenecking the other.

How exactly are we to calculate overall efficiency between standard and unified shaders?

It would be really nice to have solid numbers, such as "standard shaders are 70% efficient, while unified shaders are 95% efficient"... but I don't think it's quite that simple.
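Even without solid numbers, the shape of the comparison is simple. A sketch with placeholder figures - the unit counts, peak rates and utilisation values below are purely invented, since those are exactly the unknowns being discussed:

```python
# Sketch of the efficiency comparison above. Every number here is a
# placeholder: per-unit peak rates and utilisation factors for the
# real chips are exactly the unknowns under discussion.

def effective_rate(num_units, per_unit_peak, utilisation):
    """Effective throughput = units x per-unit peak x average utilisation."""
    return num_units * per_unit_peak * utilisation

# Hypothetical: dedicated shaders peak higher per unit but idle more;
# unified shaders peak a little lower but nearly always have work.
dedicated = effective_rate(num_units=24, per_unit_peak=1.0, utilisation=0.70)
unified = effective_rate(num_units=24, per_unit_peak=0.9, utilisation=0.95)

print(f"dedicated: {dedicated:.2f}  unified: {unified:.2f}")
```

The point of the sketch is only that a lower per-unit peak can be more than repaid by higher average utilisation; the actual percentages are the missing data.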
 
xbdestroya said:
LOL, take it easy - I could have referenced that if I wanted as well. :p It's true they're not pipes per se, but I was just trying to give a quick and dirty. Can't be referencing material for proper terms everytime I post. ;)

Anyway the premise remains the same, the shading power is divided into thirds.

I don't see how you can claim that shading power has to be divided up into multiples of 3 simply because Xenos has 3 shader arrays, as all the ALUs in each of those arrays function independently of the others.
 
BenQ said:
I don't see how you can claim that shading power has to be divided up into multiples of 3 simply because Xenos has 3 shader arrays, as all the ALUs in each of those arrays function independently of the others.

Isn't it the case that all ALUs inside a given array must be working on either vertex or pixel shading at any given time? Just read over my original post, replacing 'array' for 'unit' and 'ALU' for 'pipe.'
 