Titanio said:
What proportion of a frame's processing do you think they'd require then? The smaller the proportion, the more just RSX's vertex shaders could keep up, with processing distributed across the entire length of the frame.
This is about maximum geometry per pass and skirting bottlenecks. If you have 5% of the frame's render time to perform geometry pre-processing, then Xenos will deliver 6x the shading power in the same time.
So that gives you the ability to perform tessellation (which, incidentally, isn't fixed function) and, say, more lighting-shadow passes, e.g. 12 lights instead of 6.
More geometry creates less of a bottleneck than in RSX.
If you need results at a certain time, get your SPUs going.
Oh, I agree. But this topic is about Xenos and RSX, one being refined (both as a console-specific GPU and generally) and the other being brutish.
For example, getting an array of SPUs to crunch through an early z-pass might make a lot of sense in many cases.
Oh dear, where are you going to get the fill-rate for Cell to do that? Why do you think GPUs have fixed-function hardware, including hierarchical-Z and z-test in the ROPs, to accelerate those tasks? Whoops.
A SPU can create and destroy geometry fine.
Of course it can, that's what CPUs have been doing for years now.
In fact if I wanted to mimic, or exceed even, "Geometry Shaders" - a feature of DX10, the API you're so keen to align Xenos with - I'd be much happier using a SPU than Xenos's very fixed function tessellator.
So, DX10 geometry shaders are a waste of time then, hmm?... Oh dear. You need a better argument than that.
Your "GPU flops are more effective than CPU flops" point rings rather hollow depending on what you're doing.
Er, actually we're talking about geometry and vertex shading, Xenos's home turf.
The comparison of a CPU FLOP and a GPU FLOP is more than valid. In a GPU
ADD r1.xy, r1.xy, r2.zw
runs in one clock cycle. VMX/SPE takes longer because the swizzle needs to be performed separately (at least one extra clock, maybe 2 - permute takes 4 clocks on SPE, but it could be co-issued with another vector operation on previous clock cycles, provided that the previous instruction didn't set r2). Sure, that's a silly example, but the point stands. GPUs are built for vector maths in a way that SPEs and VMX don't quite get - that's why Fafalada keeps moaning about VMX. SPEs are "more general purpose", to put it bluntly.
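To make the swizzle point concrete, here is a rough Python sketch rather than real shader or SPE code. The function names are made up for illustration, and the "steps" value is only a tally of separate vector operations in this model, not a cycle count. It also assumes the permute can zero the unused lanes, which keeps the write mask out of the picture.

```python
# Illustrative model of: ADD r1.xy, r1.xy, r2.zw
# All names are hypothetical. "steps" counts separate vector operations
# issued in this model; it is not a measurement of real clock cycles.

def gpu_add_xy_zw(r1, r2):
    """GPU-style: the source swizzle and the .xy write mask are part of the ADD."""
    result = [r1[0] + r2[2], r1[1] + r2[3], r1[2], r1[3]]
    return result, 1  # one operation issued

def simd_add_xy_zw(r1, r2):
    """CPU SIMD-style: an explicit permute, then a full-width add."""
    steps = 0
    # Assumption: the permute moves z, w into lanes 0, 1 and zeroes the rest.
    shuffled = [r2[2], r2[3], 0.0, 0.0]
    steps += 1
    # Full-width add; lanes 2 and 3 add zero, so r1.z and r1.w are unchanged.
    result = [a + b for a, b in zip(r1, shuffled)]
    steps += 1
    return result, steps
```

Both functions return the same vector; the only difference in the model is that the SIMD path issues the permute as its own operation.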
And don't try and mould a SPU to execute exactly as a Vertex Shader - take advantage of its own strengths (which are considerable indeed).
True, data re-ordering, packing and a variety of techniques can recover some of the efficiency that's lost in translating GPU shader programs into CPU shader programs. Cell starts with an awfully big disadvantage, though.
Functionally there is little contest, you could do things with an SPU that would be impossible to do with a Xenos (or RSX) shader (or could only be done in a rather rigid fashion with the tessellator).
Bear in mind that doing geometry/vertex-only passes, Xenos has roughly the same programmable FLOPs (for what that's worth, not much, since they're not even the same kind of FLOPs) as the whole of Cell, plus it has the extra capabilities of fixed-function hardware (e.g. culling and clipping).
And still have a lot of pixel shading power left over. Taking Xenos to a point that would require extensive use of Cell would leave it with... nothing, for pixel shading.
No because a lot of geometry/vertex shading work in advanced engines is done independently of pixel shading. e.g. the workload during stencil shadow calculation doesn't invoke any pixel shading - it's purely vertex work and z/stencil fill-rate.
D3 is a great example of this, with its extraordinarily low-poly environments/characters based on the fact that the CPU has to perform a lot of the shadowing. Even though shadowing is only a small proportion of the overall frame render time, it creates a huge bottleneck. Therefore the only solution is to keep the poly count really low.
I'm not disputing that Cell can help - all I'm saying is that geometry can create its own bottlenecks that are independent of pixel shading. Xenos has the flexibility to assign its computing power to whatever the current bottleneck is - RSX has none of that flexibility, it consists of stages that have fixed peaks. At certain times while rendering a frame, the pixel shader pipelines will be entirely idle.
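A toy model of that bottleneck argument, sketched in Python. The unit counts and workloads are invented round numbers, not figures for Xenos or RSX, and the model assumes perfect load balancing and that vertex and pixel work cost the same per unit, which real hardware won't give you. It just shows how a vertex-only pass leaves a fixed split's pixel units idle.

```python
# Toy model: time to finish a list of render passes on a GPU with a fixed
# vertex/pixel split versus one that can assign every unit to either job.
# All unit counts and workloads below are made-up illustrative numbers.

def fixed_split_time(passes, vertex_units, pixel_units):
    """Each pass takes as long as its slower stage; idle units cannot help."""
    total = 0.0
    for vertex_work, pixel_work in passes:
        total += max(vertex_work / vertex_units, pixel_work / pixel_units)
    return total

def unified_time(passes, total_units):
    """Any unit can take vertex or pixel work, so each pass is spread over all units."""
    total = 0.0
    for vertex_work, pixel_work in passes:
        total += (vertex_work + pixel_work) / total_units
    return total

# (vertex_work, pixel_work) per pass, in arbitrary work units.
passes = [
    (240.0, 0.0),   # stencil-shadow style pass: vertex work only
    (80.0, 960.0),  # lighting pass: mostly pixel work
]

fixed = fixed_split_time(passes, vertex_units=8, pixel_units=24)
unified = unified_time(passes, total_units=32)
```

With these made-up numbers, `fixed` comes out at 70.0 and `unified` at 40.0 for the same 32 units in total; most of the gap comes from the vertex-only pass, where the 24 pixel units in the fixed split sit idle.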
You may consider it compensatory for RSX, but at least it can compensate - the same can't be said for situations where X360 would be in a bind vs PS3.
Eh? You've got vastly more efficient shading power in Xenos allied to a more graphics-oriented instruction set in Xenon (DP3/4 plus freely interchangeable AoS/SoA formatting for vectors) than that of SPEs (though Xenon's VMX units are still not up there with GPUs).
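For anyone not familiar with the AoS/SoA distinction, a small Python sketch follows. It is plain list code, not VMX or SPE intrinsics, and the function names are made up for illustration. In AoS form a dot product needs a sum across the lanes of one vector, which is what a DP4-style instruction folds into a single operation. In SoA form many dot products are computed at once using only lane-wise multiplies and adds.

```python
# AoS: one [x, y, z, w] list per vector.
# SoA: one list per component, holding that component for several vectors.
# Function names are hypothetical; this is plain Python, not SIMD intrinsics.

def dot4_aos(a, b):
    """Dot product of two AoS vectors: lane-wise multiply, then a horizontal sum."""
    products = [x * y for x, y in zip(a, b)]
    return sum(products)  # the across-lane reduction a DP4-style op would absorb

def aos_to_soa(vectors):
    """Transpose a list of [x, y, z, w] vectors into four component lists."""
    return [list(component) for component in zip(*vectors)]

def dot4_soa(a_soa, b_soa):
    """Dot products of many vector pairs at once, using only lane-wise operations."""
    result = [0.0] * len(a_soa[0])
    for comp_a, comp_b in zip(a_soa, b_soa):
        result = [acc + p * q for acc, p, q in zip(result, comp_a, comp_b)]
    return result
```

Both layouts give the same answers; the difference is whether the reduction runs across lanes (AoS) or down them (SoA), and how cheaply the hardware lets you switch between the two.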
I don't know... maybe that's the mark of an architecture that's flexible, and that has legs...
No, it's the mark of an architecture that gets away with a retrograde GPU design by falling back on the CPU. If Xenos wasn't around I'm sure we'd all think PS3 was lovely, but with DX10 knocking on the door, RSX looks distinctly old-fashioned. Brutish, definitely, but old-fashioned.
Jawed