It's really a shame that everyone thinks RSX is around the same speed or faster than Xenos. NVidia has really done a good marketing job.
Just look at the G70. They get to 136 ops/cycle by assuming the PS can do, what, 5 operations per cycle? Compare G70 to R300 clock for clock and the performance advantage is maybe ~30%. Even if the Xenos pipelines are only as fast as a two-generation-old ATI architecture (note that R300 is vec3+scalar while Xenos is vec4+scalar, so this is quite conservative), Xenos will still have a ~50% advantage over RSX.
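To show where a figure like that comes from, here's the back-of-envelope math in "R300-pipe equivalents". Every number below is an assumption for illustration (pipe counts, the ~30% per-pipe edge, and the clock speeds), not a datasheet value:

```python
# Rough shader-throughput comparison; all figures are assumptions, not specs.
G70_PIPES = 24          # assumed pixel pipelines on G70/RSX
G70_PER_PIPE = 1.3      # assumed ~30% per-pipe advantage over an R300 pipe
XENOS_ALUS = 48         # assumed unified ALUs on Xenos
XENOS_PER_ALU = 1.0     # conservatively only R300-class per ALU
RSX_CLOCK = 550e6       # assumed clock in Hz
XENOS_CLOCK = 500e6     # assumed clock in Hz

rsx = G70_PIPES * G70_PER_PIPE * RSX_CLOCK
xenos = XENOS_ALUS * XENOS_PER_ALU * XENOS_CLOCK
print(f"Xenos / RSX ~= {xenos / rsx:.2f}")  # ~1.40
```

Even with the deliberately conservative per-ALU assumption, the ratio lands around 1.4x, i.e. in the same ballpark as the ~50% figure above; crediting Xenos' vec4+scalar ALUs over R300's vec3+scalar pushes it higher.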
Load sharing with vertex processing is a non-issue, because you rarely have both the pixel shaders and the vertex shaders under heavy load simultaneously. I know because I've worked at ATI and studied performance using data unavailable to end users. The main reason you want good vertex performance is for blasting through triangles that produce no pixels, i.e. backfaces, off-screen triangles, etc.
The only advantages RSX has are fillrate without AA and filtered texturing. The former doesn't apply to alpha blending (fog, smoke, particles), because RSX gets bandwidth-bottlenecked there. The latter won't apply to shadow-mapped games (e.g. Unreal Engine 3 titles), because Xenos can use its 16 point-sampling texture units for shadow-map fetches.
Then look at HDR: Xenos has a huge advantage there, with free AA (versus no AA at all) as well as a faster FP10 format.
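The FP10 point is easy to quantify. The per-pixel sizes below are just the packed format widths (4x16-bit channels for FP16 versus a 10-10-10-2 layout for FP10); the one-read-plus-one-write blending model is a simplifying assumption:

```python
# Framebuffer traffic per blended pixel for two HDR formats.
FP16_BPP = 8       # bytes per pixel: four 16-bit float channels
FP10_BPP = 4       # bytes per pixel: 10-10-10-2 packed format
READ_WRITE = 2     # assumed one read + one write per blended pixel

fp16_traffic = FP16_BPP * READ_WRITE
fp10_traffic = FP10_BPP * READ_WRITE
print(fp16_traffic / fp10_traffic)  # 2.0
```

So FP16 blending moves twice the data of FP10, and on Xenos even that traffic stays inside the eDRAM instead of competing for main-memory bandwidth.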
Bobbler said:
Simple logic dictates that -- what we've seen from the Xenos hasn't been 2x the capabilities of R520 (the "devs haven't had the time!" card doesn't really work -- if Xenos was truly 2x, or anywhere near, the power it would be doing a lot more than 720p at 30fps with 2x AA)
Framerate is limited by fillrate, not shader rate, and Xenos' fillrate is about equal to the current generation's with 4xAA, and half of it with 2x or no AA. I'm not sure why only 2xAA is being used, but maybe it has to do with getting used to the tiled rendering. Right now, games still rely very heavily on plain texturing performance. Furthermore, we don't know that it's actually the graphics card limiting the framerate, as we've all heard about the horrors of in-order execution on the CPU. The developer learning curve is definitely there.
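Here's the fillrate arithmetic behind that claim. The ROP counts and clocks are assumptions, as is the simplification that AA halves RSX's effective fillrate while the eDRAM lets Xenos keep its full rate with 4xAA:

```python
# Illustrative fillrate comparison (assumed figures, not datasheet values).
XENOS_ROPS, XENOS_CLOCK = 8, 500e6     # eDRAM absorbs the AA sample traffic
RSX_ROPS, RSX_CLOCK = 16, 550e6

xenos_fill = XENOS_ROPS * XENOS_CLOCK   # assumed same with or without 4xAA
rsx_no_aa = RSX_ROPS * RSX_CLOCK
rsx_4x_aa = rsx_no_aa / 2               # assume AA halves effective fillrate

print(f"no AA: Xenos/RSX = {xenos_fill / rsx_no_aa:.2f}")   # ~0.45
print(f"4xAA : Xenos/RSX = {xenos_fill / rsx_4x_aa:.2f}")   # ~0.91
```

Under those assumptions Xenos has about half the fillrate without AA and roughly parity with 4xAA, which is exactly the pattern described above.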
Bobbler said:
, and "2x" the power from 2/3 the transistors is a bit absurd (and amazing if true on some planet). Even with ~60% efficiency vs 100%, that would only account for the transistor budget being reduced, not a 2x power gain. It just seems transistor for transistor the theoretical power is going to be about the same -- there is no magic wand to get 2x the capabilities out of the same transistor budget (especially when you have some of the best engineers working on it). It just seems absurd that anyone would think Xenos would be substantially more powerful than stuff in the same generation (or available in the same 6 month window -- R520, G70, RSX) -- I'll grant the efficiency card making up for the transistor difference (and maybe a bit extra even), but I cannot see where you get the colossal power difference outside of that. Logic dictates that 48 "pipes" in 232M transistors (with ~15% redundancy by your calculations) shouldn't beat a ~320M transistor monster (at a higher clockspeed as well)... especially when it's from the same company and engineering talent.
There are many reasons for this.
First of all, these graphics firms have multiple teams working on different projects in parallel, so you can't assume it's the same talent on both chips.
Second, just look at NVidia's jump from NV30 to NV40. NV30 was often less than half the speed of R300 in pixel shading unless NVidia hand-tuned your shader. NV40 was faster than ATI's next generation. We're talking about a good 4x speed increase with less than 2x the transistors.
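Put numbers on that jump and the "no magic wand" argument falls apart. The transistor counts below are the commonly cited ballpark figures and the 4x speedup is the claim above, so treat the result as a rough sanity check, not a measurement:

```python
# Perf-per-transistor sanity check for the NV30 -> NV40 jump.
NV30_TRANSISTORS = 125e6   # assumed ballpark figure
NV40_TRANSISTORS = 222e6   # assumed ballpark figure
SPEEDUP = 4.0              # ~4x pixel-shading speedup claimed above

transistor_ratio = NV40_TRANSISTORS / NV30_TRANSISTORS   # under 2x
perf_per_transistor = SPEEDUP / transistor_ratio
print(f"~{perf_per_transistor:.2f}x perf per transistor")  # ~2.25x
```

So within one vendor, one generation apart, pixel-shading performance per transistor more than doubled. Architecture matters at least as much as transistor budget.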
Third, the numbers you're quoting aren't comparable. The 232M transistor figure doesn't include the daughter die, whose eDRAM saves Xenos from needing z-compression, colour compression for AA, an ultra-efficient memory controller, large write caches, etc. The logic on the daughter die also handles blending, z-test, stencil test, and more. G70 and R520 are designed for the PC, to run DirectX and OpenGL, so they don't have the freedom to go for a radical architecture. Plus, Xenos doesn't need a 2D core or advanced video-processing capabilities, and I don't think it even has to worry about image output.
Fourth, Xenos has a unified architecture, so it doesn't need dedicated vertex shaders. That frees up die space for more general-purpose shader processors.
For all these reasons and more, it's more than probable that Xenos really is 2x as fast as current GPUs in pixel shading and 5x faster under heavy vertex shading, even though it 'only has 232M transistors'. Just remember that this doesn't necessarily translate into 2x performance in all scenarios.