Originally Posted by mokmok
1. The Xenos Vertex Performance is up to 6x greater than the RSX's.
So I've talked about this one a few times before, but it's a flat-out absurdity in the sense that it ignores all the other limitations of the respective hardware. The assumption is that the smallest possible vertex shader is 4 dot products (basically, transform the vertex). RSX has 8 vertex shader pipes, so 8 pipes / 4 ops per vertex = 2 vertices per clock, times 500 MHz = 1 billion verts/sec.
On Xenos, you've got 48 ALUs, and if you assume they're all dedicated to vertex processing (this is actually impossible, but for the sake of theory we'll ignore that), you get 48 ALUs / 4 ops per vertex = 12 vertices per clock, times 500 MHz = 6 billion verts/sec.
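Just to make the napkin math explicit, here's the whole thing as a throwaway C snippet (every number in it is the assumption from the two paragraphs above, nothing more):

```c
#include <stdio.h>

int main(void)
{
    const double clock_hz     = 500e6; /* both GPUs, per the assumption above */
    const double ops_per_vert = 4.0;   /* minimal vertex shader: 4 dot products */

    /* RSX: 8 dedicated vertex pipes, each retiring one op per clock. */
    double rsx_verts   = 8.0  / ops_per_vert * clock_hz; /* 1 billion/sec */
    /* Xenos: all 48 unified ALUs on vertex work (impossible in practice). */
    double xenos_verts = 48.0 / ops_per_vert * clock_hz; /* 6 billion/sec */

    printf("RSX peak:   %.1f billion verts/sec\n", rsx_verts   / 1e9);
    printf("Xenos peak: %.1f billion verts/sec\n", xenos_verts / 1e9);
    return 0;
}
```

That's the entire basis of the "6x" claim: divide ALU count by a minimal instruction count and multiply by clock. Note what it never mentions: getting the vertices there.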
Sounds that way, but unfortunately, it's completely untrue. The thing is, vertices do not get moved in at unlimited speed. You can only move vertex attributes at a fixed number of attributes per clock cycle, which means that in 99% of all *major* render passes, a single vertex takes more than one clock cycle to get in. So no matter what, it doesn't matter how much you can theoretically process, because the data doesn't move through the system fast enough. The real theoretical advantage is still there for Xenos, but it is by no means 6:1. In reality, they both suck pretty bad. RSX simply sucks a little worse.
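To illustrate how the fetch rate caps you, here's a sketch; the attributes-per-clock and attribute-count figures below are illustrative assumptions on my part, not published specs for either chip:

```c
#include <stdio.h>

int main(void)
{
    const double clock_hz        = 500e6;
    const double attrs_per_clock = 2.0; /* illustrative fetch rate, NOT a real spec */
    const double attrs_per_vert  = 5.0; /* e.g. position, normal, tangent, UV, color */

    /* A vertex can't be shaded faster than its attributes arrive. */
    double cycles_per_vert = attrs_per_vert / attrs_per_clock; /* 2.5 cycles */
    double fetch_bound     = clock_hz / cycles_per_vert;       /* 200M verts/sec */

    printf("Fetch-bound rate: %.0f million verts/sec\n", fetch_bound / 1e6);
    /* ...versus the 1-6 BILLION/sec ALU-side "peaks" computed earlier. */
    return 0;
}
```

With numbers anywhere in that ballpark, the fetch side caps you an order of magnitude below the ALU paper peak, on either machine.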
In the end, Xenos can only set up one triangle per cycle, while RSX can set up one every two cycles. It should be noted, though, that because of things like the post-transform cache, if you're smart, you can actually exceed the theoretical limits. And since RSX's post-transform cache is about 8x larger than Xenos', it has more potential for gain. To be fair, though, RSX needs it far more badly than Xenos does. The vertex attribute read rate on RSX is incredibly god-awful, but it's not an insurmountable wall. Xenos simply hits fewer internal limits.
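Rough numbers on what the cache buys you, as a sketch; the setup rates are from the paragraph above, while the reuse figure and shade rate are illustrative assumptions:

```c
#include <stdio.h>

int main(void)
{
    const double clock_hz = 500e6;

    /* Setup limits from above: 1 tri/clock (Xenos), 1 tri per 2 clocks (RSX). */
    printf("Xenos setup-bound: %.0fM tris/sec\n", 1.0 * clock_hz / 1e6);
    printf("RSX setup-bound:   %.0fM tris/sec\n", 0.5 * clock_hz / 1e6);

    /* Post-transform cache: with indexed meshes, a shaded vertex gets reused
     * by neighboring triangles instead of being re-shaded.  ~0.7 vertices
     * shaded per triangle is an illustrative figure for a well-ordered mesh;
     * a naive, unindexed triangle list would shade 3.0 per triangle. */
    const double verts_shaded_per_tri = 0.7;  /* illustrative assumption */
    const double shade_rate = 200e6;          /* fetch-bound rate, sketch above */
    printf("Cache-assisted tri rate: %.0fM tris/sec\n",
           shade_rate / verts_shaded_per_tri / 1e6);
    return 0;
}
```

That's how you "exceed" a vertex-rate limit: you stop paying for most of your vertices. A bigger cache just means the reuse figure stays low on bigger, messier meshes.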
BTW, about the 6 billion verts figure... that kinda ignores a little detail. This may come as a shock to a lot of people, but vertices consist of this thing called DATA. If you take a pretty average-sized vertex, 6 billion vertices per second requires more than double the bandwidth that the entire Xbox360 has... and that's including the totally internal buses which don't actually connect any two separate devices (you know how people like to pretend that the 256 GB/sec on the eDRAM die can be treated like a point-to-point link). If you want to move that much data over a main memory bus (which is the real bus of concern for this purpose), that's not going to happen within the next 3 or 4 console generations. Memory architectures simply don't grow that quickly. Currently you can't move 6 billion of even the smallest possible vert (per second) over the main memory bus, and I don't see that happening on Xbox720 or PS4 either.
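Putting rough numbers on it (the vertex sizes here are illustrative assumptions of mine; the 22.4 GB/sec main bus and 256 GB/sec eDRAM figures are the 360's well-known ones):

```c
#include <stdio.h>

int main(void)
{
    const double verts_per_sec = 6e9;   /* the claimed Xenos peak */

    const double avg_vert_bytes = 64.0; /* illustrative "average" vertex */
    const double min_vert_bytes = 4.0;  /* e.g. one packed position attribute */

    double avg_bw = verts_per_sec * avg_vert_bytes; /* 384 GB/sec */
    double min_bw = verts_per_sec * min_vert_bytes; /*  24 GB/sec */

    printf("6B average verts/sec needs: %.0f GB/sec\n", avg_bw / 1e9);
    printf("6B minimal verts/sec needs: %.0f GB/sec\n", min_bw / 1e9);
    printf("360 main memory bus:         22.4 GB/sec\n");
    printf("360 eDRAM internal bus:     256.0 GB/sec (not point-to-point)\n");
    return 0;
}
```

Even the degenerate 4-byte vert already outruns the entire main memory bus before the GPU does anything else with that bus at all, which it obviously has to.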
2. The use of the Edge Tools and SPEs brings the Vertex Performance of the PS3 on par with the 360 but prevents the SPEs from being used for Physics, AI etc.
They're kind of assuming a lot of things, because the demos, which were meant for a technical audience, used all the available SPEs in order to demonstrate the concept and showcase techniques that can keep all the SPEs busy. If you actually did it like that in a real game, yeah, you'd certainly tie up all the SPEs for that period of time within the frame. Something that I think nobody outside the industry actually realizes is that the CPU side of rendering does NOT take up a huge portion of the time between frames. Physics, AI, etc. take up much more time than rendering. It's a little hard to see that with the PC as a reference point, of course, because Windows and the API layer rob you of so much.
That aside, the point of Edge is not to fill up all the SPEs. You certainly don't NEED to use more than 1 or 2 SPEs in order to get a huge gain out of it. More importantly, while Edge was specific to graphics, a lot of the same principles can be applied to physics (Havok's tech talks demonstrated that quite handily and nobody talks of Havok precluding the use of Edge) and AI and so on.
Just looking at the raw specs of the Xenos GPU it does seem to have a Vertex processing advantage.
For all you might say about the dynamic allocation of vertex pipes, you end up limited by a lot more external things than anything internal to the GPU. Also, no matter what, on major passes, you're going to end up spending more effort on pixels anyway, and RSX has a moderate advantage over Xenos in that area. All the same, getting a billion triangles per second to the GPU in the first place is basically impossible. It doesn't matter how much power the GPU has to work with them, because the data can't get to it that fast. In general, getting 100-150 million tris per second through the pipe is hard enough whether you're on PS3 or 360, and it's not the GPU itself that's the problem.
I look back at how things looked when the 360 was still a little while shy of release, and back then, the notion of even drawing a scene of up to 750,000 polygons per frame at 30 fps was looking pretty much impossible to almost every developer out there. Nowadays we talk of nearly double that pretty freely. It's certainly not because the GPU suddenly got more powerful or we learned how to dedicate more ALUs to vertex processing. It's because we're doing better on the *CPU* side that we're able to keep that push buffer pushing more often.
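For perspective, here are the figures from the last two paragraphs converted to a common per-second unit (the "nearly double" number is just read straight off the paragraph above):

```c
#include <stdio.h>

int main(void)
{
    const double fps = 30.0;

    double launch_era  = 750e3 * fps; /* 750K polys/frame -> 22.5M/sec */
    double current     = 1.5e6 * fps; /* "nearly double that" -> ~45M/sec */
    double pipe_limit  = 150e6;       /* hard-won real-world ceiling */
    double paper_peak  = 1e9;         /* the napkin-math "billion tris/sec" */

    printf("Launch-era scenes:  %.1fM tris/sec\n", launch_era / 1e6);
    printf("Current scenes:    ~%.0fM tris/sec\n", current / 1e6);
    printf("Practical ceiling:  %.0fM tris/sec\n", pipe_limit / 1e6);
    printf("Paper peak:         %.0fM tris/sec\n", paper_peak / 1e6);
    return 0;
}
```

The whole generation's real progress lives in the gap between the first two lines, and none of it came from the GPU column of a spec sheet.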
My understanding is that pixel shaders are mainly used for bump mapping etc., whereas vertex shaders are used to render complex lighting effects etc.
Vertex shaders are simply for things you would do that operate at the level of a single vertex (transformation, positioning, and more often than not, setting up data for the pixel shaders to use). Pixel shaders are for things you would do at the level of a single pixel (all texturing, all lighting, etc.). On hardware prior to programmable shaders, of course, you'd do just about everything at the vertex level, because that's all you had available to manipulate at both the hardware and software level.
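To make that split concrete, here's a toy C sketch of the division of labor; the types and the identity transform() stub are mine, purely illustrative, not any real shader API:

```c
#include <stdio.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { float r, g, b; } Color;
typedef struct { Vec3 pos; Vec3 normal; } Vertex;
/* What the vertex stage hands down to the pixel stage. */
typedef struct { Vec3 clip_pos; Vec3 world_normal; } Varyings;

/* Stand-in for the usual world/view/projection matrix multiply
 * (identity here, purely for the sketch). */
static Vec3 transform(Vec3 v) { return v; }

/* Vertex-level work: position the vertex and set up data for the
 * per-pixel stage. */
static Varyings vertex_stage(Vertex in)
{
    Varyings out;
    out.clip_pos     = transform(in.pos);    /* the 4-dot-product minimum */
    out.world_normal = transform(in.normal); /* handed down for lighting */
    return out;
}

/* Pixel-level work: lighting (texturing omitted) evaluated per pixel
 * from the interpolated vertex outputs. */
static Color pixel_stage(Varyings in, Vec3 light_dir)
{
    float ndotl = in.world_normal.x * light_dir.x
                + in.world_normal.y * light_dir.y
                + in.world_normal.z * light_dir.z;
    float d = ndotl > 0.0f ? ndotl : 0.0f; /* simple diffuse term */
    Color c = { d, d, d };                 /* a texture fetch would modulate this */
    return c;
}

int main(void)
{
    Vertex v = { { 0, 0, 0 }, { 0, 0, 1 } };
    Varyings vg = vertex_stage(v);
    Color c = pixel_stage(vg, (Vec3){ 0, 0, 1 });
    printf("diffuse: %.2f\n", c.r);
    return 0;
}
```

So "complex lighting" lands on the pixel side these days; the vertex side mostly just feeds it. That's exactly backwards from the quoted understanding.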