Vertex shading on CPUs & accommodating vertex-biased work

Inane_Dork said:
I think you're more likely to gain useful insight in thinking of reasonable scenarios and what usage is like in them. For instance, I wouldn't expect more than one thread on the XeCPU to be generating or shading vertices simply due to graphics state synchronization. Sure, you could work it otherwise, but I would expect no more in the general case.

It was a "crazy academic extreme" after all ;) In "reasonable" situations, or pretty much any other as far as I can see, the outcome favours PS3 on both fronts, but to different degrees on those fronts. The reasonable situations are actually the most interesting ones, I think.

Inane_Dork said:
True, but you always gain performance on Xenos from that, whereas you only gain a speedup on RSX when you're pixel shader bound. It's the mixed blessing/curse of dedicated hardware.

EDIT: You don't *always* get a speedup on Xenos, but you're much more likely to get more work done in its case.

Can you elaborate a little?
 
One SPE can do about 200-300 million polygons/s with subdivision surfaces, Bézier patches, NURBS or displacement mapping.

If the scene manager runs on Cell and reads the Z-pyramid from RSX, it can do occlusion culling on objects using each object's own bounding box or bounding sphere.
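Per-object culling against a Z-pyramid (hierarchical Z) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: depth values where smaller means closer and the buffer is cleared to 1.0, each object's bounding box or sphere already projected to a screen rectangle plus its nearest depth, and coarse levels that keep the farthest depth of the texels beneath them so the test stays conservative. The function names are hypothetical, not any Cell or RSX API.

```python
# Hypothetical sketch of Z-pyramid (hierarchical Z) occlusion culling.
# Assumed depth convention: smaller = closer, buffer cleared to 1.0 (far).

def build_z_pyramid(depth):
    """Build a max-depth mip chain from a full-res depth buffer given as a
    list of rows. Each coarser texel keeps the FARTHEST depth of the 2x2
    block beneath it, so tests against it stay conservative."""
    levels = [depth]
    h, w = len(depth), len(depth[0])
    while w > 1 or h > 1:
        prev = levels[-1]
        nw, nh = (w + 1) // 2, (h + 1) // 2
        level = [
            [
                # Clamp at the edges so odd sizes still read valid texels.
                max(prev[min(2 * y + dy, h - 1)][min(2 * x + dx, w - 1)]
                    for dy in (0, 1) for dx in (0, 1))
                for x in range(nw)
            ]
            for y in range(nh)
        ]
        levels.append(level)
        w, h = nw, nh
    return levels


def can_cull(pyramid, x0, y0, x1, y1, nearest_z):
    """Conservative test for an object whose projected bounding box covers
    pixels [x0..x1] x [y0..y1] (inclusive) and whose closest point has depth
    nearest_z. Returns True only if the object is certainly not visible."""
    h, w = len(pyramid[0]), len(pyramid[0][0])
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, w - 1), min(y1, h - 1)
    if x0 > x1 or y0 > y1:
        return True  # entirely off-screen: treat as culled
    # Pick a coarse level so the box only spans a handful of texels.
    size = max(x1 - x0 + 1, y1 - y0 + 1)
    level = min(max(size.bit_length() - 2, 0), len(pyramid) - 1)
    texels = pyramid[level]
    for ty in range(y0 >> level, (y1 >> level) + 1):
        for tx in range(x0 >> level, (x1 >> level) + 1):
            if texels[ty][tx] >= nearest_z:
                # The farthest occluder here is not in front of the object,
                # so part of it may be visible.
                return False
    return True


# Usage: a 4x4 screen whose left half is covered by a near wall at depth 0.2.
depth = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]
pyr = build_z_pyramid(depth)
print(can_cull(pyr, 0, 0, 1, 3, 0.5))  # object behind the wall
print(can_cull(pyr, 2, 0, 3, 1, 0.5))  # object in the open half
```

Only objects that fail this test would be handed on to the SPEs for tessellation and shading; the per-object test is cheap compared with processing the object's polygons.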

With the SPEs rendering "only" visible objects, if you use 2 SPEs you can compute about 500 million visible polygons/s. That's roughly 8 million polys/frame at 60 fps, but you only have about 1 million pixels on screen :LOL
So even if the scene is very complex and has 10 billion polygons, the PS3 can do it at 60 fps, because only the visible ones get processed.
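For a sense of scale, the arithmetic in the post can be checked with a few lines of Python. The 500M visible polys/s for 2 SPEs and the 10-billion-polygon scene are the post's own figures; the 1280x720 resolution is an assumption standing in for "about 1 million pixels".

```python
# Back-of-the-envelope numbers for the post above.

def polys_per_frame(polys_per_second, fps):
    # Polygon budget available in a single frame.
    return polys_per_second / fps

two_spe_rate = 500e6      # visible polygons per second (post's estimate)
fps = 60
pixels = 1280 * 720       # assumed 720p: 921,600, i.e. "about 1 million pixels"

budget = polys_per_frame(two_spe_rate, fps)
polys_per_pixel = budget / pixels

scene_polys = 10e9        # the post's "very complex" scene
# Share of the scene that culling must reject every frame to fit the budget.
cull_fraction = 1 - budget / scene_polys

print(f"{budget / 1e6:.2f}M polys/frame")
print(f"{polys_per_pixel:.1f} polys per pixel")
print(f"must cull {cull_fraction:.3%} of the scene")
```

That works out to roughly 8.33M polys/frame, about 9 polygons per pixel, and a culling rate above 99.9% for the 10-billion-polygon case, so the claim rests entirely on the occlusion culling rejecting almost everything cheaply at the object level.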
 
Titanio said:
In "reasonable" situations, or pretty much any other as far as I can see, the outcome favours PS3 on both fronts, but to different degrees on those fronts.
Oh, I dunno about that. With the PS3, it's really tempting to just say, "Well, do it on SPE." But reasonable or normal usage may not include using SPEs for vertex shading. And reasonable or normal usage may include having D3D use a core for vertex shading on the 360. I'm not sure we have enough info to tell what these scenarios actually will be.

Though I do agree that the PS3 appears to have the edge overall.

Can you elaborate a little?
Saving pixel work by moving things to the vertex shader helps mostly in the pixel shader bound case. It's a common case.

That's for the PC and PS3. On a unified shader system, spending N more ops per vertex to save N ops per pixel (totally bogus ratio) is a win because both kinds of work share the same pool of ALUs and the number of pixels outweighs the number of vertices. That, and the X360 is less likely to be fill-rate or framebuffer-bandwidth bound, which means the bottleneck is more likely to be in the shader processing.

Going back to a fixed system, say you have the optimal ratio of vertex to pixel ops for the system and nothing else is slowing the GPU down (totally academic, but yada yada). If you do fewer ops in total by switching pixel work to vertex work, you actually decrease throughput, because the vertex stage now has more work to do and takes longer, while the pixel units sit partly idle waiting on it.


As a total side note, I'd be really interested in seeing what kind of benefits nVidia gets out of dedicating pipes to pixel work instead of going with a more general system. From what I've seen from Xenos, I can't really see anything lacking.
 
Inane_Dork said:
Oh, I dunno about that. With the PS3, it's really tempting to just say, "Well, do it on SPE." But reasonable or normal usage may not include using SPEs for vertex shading. And reasonable or normal usage may include having D3D use a core for vertex shading on the 360. I'm not sure we have enough info to tell what these scenarios actually will be.

Oh, I meant in terms of more reasonable mixes of pixel/vertex work rather than, for example, reasonable dev situations. Though it may well be quite reasonable to use the CPUs like this, or at least that's the impression I've been getting in discussions here and elsewhere, and I suppose that's a lot of what this thread is about. Whether it'd be typical or not is another question, but really the motivation for this scenario was to cover corner cases relative to Xenos; it just happens to introduce a lot more shading power as a consequence.

Inane_Dork said:
Saving pixel work by moving things to the vertex shader helps mostly in the pixel shader bound case. It's a common case.

That's for the PC and PS3. On a unified shader system, spending N more ops per vertex to save N ops per pixel (totally bogus ratio) is a win because both kinds of work share the same pool of ALUs and the number of pixels outweighs the number of vertices. That, and the X360 is less likely to be fill-rate or framebuffer-bandwidth bound, which means the bottleneck is more likely to be in the shader processing.

Going back to a fixed system, say you have the optimal ratio of vertex to pixel ops for the system and nothing else is slowing the GPU down (totally academic, but yada yada). If you do fewer ops in total by switching pixel work to vertex work, you actually decrease throughput, because the vertex stage now has more work to do and takes longer, while the pixel units sit partly idle waiting on it.

Cheers, I think I see what you're saying. But I'm not sure it really matters overall when you compare the absolute amounts of work each system gets done.

Inane_Dork said:
As a total side note, I'd be really interested in seeing what kind of benefits nVidia gets out of dedicating pipes to pixel work instead of going with a more general system. From what I've seen from Xenos, I can't really see anything lacking.

Well, I think it's difficult to draw conclusions about actual performance unless the hardware's in front of you, but intuitively I'd expect dedicated hardware to outperform more general hardware in its specific task by some margin (whatever that is). If the raw amounts of power are roughly equal and you know you want to use it for just one task, dedicated makes sense.
 
Titanio said:
Cheers, I think I see what you're saying. But I'm not sure it really matters overall when you compare the absolute amounts of work each system gets done.
Yeah, well, neither am I. Armchair programming isn't specific by nature. :p

...but intuitively I'd expect dedicated hardware to outperform more general hardware in its specific task by some margin (whatever that is).
Yeah, I know the intuitive argument and I agree with it. I'd just like to know what they're really getting with dedicated that general lacks. I really can't see the benefit from the info we have.
 
Inane_Dork said:
Yeah, well, neither am I. Armchair programming isn't specific by nature. :p

...but intuitively I'd expect dedicated hardware to outperform more general hardware in its specific task by some margin (whatever that is).
Yeah, I know the intuitive argument and I agree with it. I'd just like to know what they're really getting with dedicated that general lacks. I really can't see the benefit from the info we have.

Likewise, what are they really getting with general hardware?
 
Inane_Dork said:
...but intuitively I'd expect dedicated hardware to outperform more general hardware in its specific task by some margin (whatever that is).
Yeah, I know the intuitive argument and I agree with it. I'd just like to know what they're really getting with dedicated that general lacks. I really can't see the benefit from the info we have.

I expect the detail would be rather low level ;)

Likewise, what are they really getting with general hardware?

Looking at it in isolation, presumably a tradeoff between optimisation for one task and flexibility.
 