A primer on the X360 shader ALUs as I understand it

Mintmaster said:
It seems we're getting a bit OT here, but I'll continue nonetheless :D

When I made my comment I was talking about c0_re's comment about POWER. Just because you have more "power" doesn't mean your games will be better. I think PS3 developers will likely be more competent, on average, than XB360 devs, so it still may have better graphics overall, even though IMO RSX has less power.

Regarding my comments about Xenos: in graphics, framebuffer/Z traffic is far and away the biggest consumer of bandwidth, especially when you move to HDR and optimize textures for consoles. If RSX is churning out a puny 2GPix/s without alpha blending, it'll use over half its bandwidth once you include Z traffic. Throw in AA, HDR, and/or alpha blending and the situation's even worse. So this eats into texture bandwidth as well.

BTW, if anyone doubts the bandwidth issue, check out the B3D 7600GT review. 22.4GB/s all consumed in an ideal simple fillrate test with colour and Z @ 2.9GPix/s. It has a core clocked 31% faster than the 6800GS, 2.6 times the MADD rate, and numerous other improvements. Unfortunately, it only checks in around 15% faster in most games because it has 30% less bandwidth. RSX will pretty much be a 7600GT times two, but with exactly the same bandwidth.
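
To put rough numbers on this, here's a back-of-the-envelope sketch of the framebuffer arithmetic (my own illustration, assuming 32-bit colour and Z and ignoring compression, caches, and texture traffic):

```python
def fb_bandwidth_gbs(fill_gpix, color_bytes=4, z_bytes=4, z_read=False, blend=False):
    """Rough framebuffer traffic in GB/s for a given fill rate in GPix/s."""
    per_pixel = color_bytes + z_bytes       # colour write + Z write
    if z_read:
        per_pixel += z_bytes                # depth test reads Z first
    if blend:
        per_pixel += color_bytes            # alpha blending reads the destination colour
    return fill_gpix * per_pixel

# 7600GT sanity check: 2.9 GPix/s of colour+Z writes is ~23 GB/s,
# right at the card's 22.4 GB/s of total memory bandwidth.
print(fb_bandwidth_gbs(2.9))   # 23.2
# RSX at "a puny 2 GPix/s": 16 GB/s, well over half of 22.4 GB/s.
print(fb_bandwidth_gbs(2.0))   # 16.0
```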

Let me reiterate: If RSX was halved, it would still be significantly hampered by lack of bandwidth!
The RSX only has to deal with 720p, which, in terms of bandwidth per pixel, puts it in a situation comparable to the top PC GPUs expected to handle 1600x1200 (and in a much better light if you count the Cell/XDR RAM and any likely RAM and bandwidth upgrades). On top of that, given how clean some 360 games that don't use AA look, minimal or no AA might be viable, along with alternative bandwidth-saving HDR techniques like nAo's. IMHO, high levels of anisotropic filtering should be easily implementable under those circumstances. So while bandwidth is still something that has to be watched, I don't think it'll cripple performance.
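
To put rough numbers on that ratio, a quick sketch (the bandwidth figures are assumptions based on commonly cited specs: ~22.4 GB/s for RSX's local GDDR3, ~42.2 GB/s for a 7900GT):

```python
def bytes_per_pixel_per_frame(bw_gbs, width, height, fps=60):
    """Memory bandwidth available per output pixel per frame, in bytes."""
    return bw_gbs * 1e9 / (width * height * fps)

print(bytes_per_pixel_per_frame(22.4, 1280, 720))    # ~405: RSX at 720p
print(bytes_per_pixel_per_frame(42.2, 1600, 1200))   # ~366: 7900GT at 1600x1200
```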
 
Asher said:
Nothing, provided the RSX can access the SPE's LS and the SPE knows enough to lock off the section of LS for the buffer.

We can infer this is going to happen. Everything about the supposedly shared architecture of CELL/RSX points to this happening. Some are going as far as saying that SPEs and shader processors (or groups of shader processors) might be able to run in lockstep. For reference, EA in a recent interview about MOH: AA said they were utilizing several SPEs for graphics.
 
Vertex load lesson

ROG27 said:
Factor SPEs in (sharing in vertex work) and Xenos may actually be at a disadvantage in this area--I mean this would technically only be for vertex burst situations, which are much rarer than the demand for pixel shaders. For graphics, I think you need to look at the system collectively (CPU/GPU) for PS3...at least much more so than for X360. In this way PS3 could sustain 48 pixel shader ALUs and 8 vertex shader ALUs + SPE vertex assistance at all times, while Xenos constantly needs to load-balance dynamically between vertex and pixel work in a pool of 48 ALUs total.
I'll give you guys a few notes about vertex processing, since I've done performance analysis on video card workloads.

Vertex work is very often quite bursty in nature. Clipped and culled polygons come in bunches, and you have no pixel work to do at all. If the pixel to vertex ratio was constant across a scene, you'd have an abrupt cap on framerate at a particular resolution in any game that wasn't CPU limited: below this resolution, you're vertex limited, and above you're fillrate limited.

Instead, there isn't a single vertex heavy game that behaves this way. What you see is a slower falloff with resolution instead of the factor of 0.6 per resolution step in pixel limited games. What this means is sometimes you're vertex limited, sometimes you're pixel limited. When you do see a sudden cap in framerate, it's generally a CPU limitation (you need to test differently clocked CPUs or GPUs to definitively distinguish between CPU and vertex limitations). Note: when I say pixel limited, it may be limited by fillrate, bandwidth, shader rate, texturing rate, etc., since all these quantities scale with pixel count.

Here's an example: 3DMark05, Game 2, which is very indifferent to CPU power. The concept is a tad complicated, but bear with me.
FPS decrease when changing resolution (I used the 6800GT, but any set of data will do):
1024 to 1280: 14.8%
1280 to 1600: 14.5%

AA/AF, 1024 to 1280: 22.7%
AA/AF, 1280 to 1600: 18.8%

A purely pixel limited scenario would show drops of 40% and 32% respectively. Now the absolute framerate at 1024x768 is 21.1 fps, and with AA/AF at 1600x1200 it is 10.8 fps. We've increased the pixel count by a factor of 2.44 and also added AA/AF, enough to divide the net framerate by two, all the while keeping the vertex load constant.

We're still not close to a purely fillrate limited scenario! It goes from ~40% pixel limited to ~60% pixel limited, even though we've probably only tripled the pixel workload. Now granted there may be fixed resolution shadow maps or other per-frame factors to consider, but the majority of the per-frame cost is tied up in the vertex load. What this means is that there are lots of tiny triangles and lots of big triangles, but not many in between. Only a few triangles transition from vertex limited to pixel limited. Consider that the pixel to vertex ratios for various triangles in a scene can span many orders of magnitude. For any given VS and PS for a triangle, the chance that the ratio is within even 50% of any GPU's ideal pixel:vertex ratio is quite marginal.
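
For anyone who wants to reproduce those percentages, here is a minimal sketch of the two-component frame-time model implied by this analysis (my formulation; the exact fractions depend on what you assume the extra AA/AF per-pixel cost to be):

```python
def pixel_limited_fraction(fps_a, fps_b, pixel_scale):
    """Model frame time as T = t_other + t_pix, where only t_pix scales
    with per-pixel work. Given the fps at two settings and the factor by
    which the per-pixel work grew, solve for t_pix / T at setting A."""
    frame_time_ratio = fps_a / fps_b            # T_b / T_a
    return (frame_time_ratio - 1.0) / (pixel_scale - 1.0)

# Purely pixel limited drops: 1 - (1024*768)/(1280*1024) = 40% and
# 1 - (1280*1024)/(1600*1200) = 32%, the figures quoted above.
# Whole span: 21.1 fps -> 10.8 fps with per-pixel work assumed tripled:
print(pixel_limited_fraction(21.1, 10.8, 3.0))   # ~0.48 of frame time
```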

This holds true for any game that pushes vertices. You'll very rarely be pushing both your pixel pipe and vertex pipe simultaneously. It's generally one or the other at any given instant.
 
ROG27 said:
We can infer this is going to happen. Everything about the supposedly shared architecture of CELL/RSX points to this happening. Some are going as far as saying that SPEs and shader processors (or groups of shader processors) might be able to run in lockstep. For reference, EA in a recent interview about MOH: AA said they were utilizing several SPEs for graphics.
If developers are talking about using SPEs for graphics for multiplatform games launching at or near launch, like MOH: AA, I'd be pretty concerned about the vertex power of RSX on its own.

I really want to know more about RSX...
 
Mintmaster said:
I'll give you guys a few notes about vertex processing, since I've done performance analysis on video card workloads. [...] You'll very rarely be pushing both your pixel pipe and vertex pipe simultaneously. It's generally one or the other at any given instant.

Well, perhaps that is why the PS3 system as a whole was designed the way it was. Sustained pixel shading capability on the GPU heavily outweighs its vertex shading capability...because of RSX's traditional shader pipeline set-up. The SPEs are good at doing what vertex shader ALUs do and perhaps were meant to assist in that process if need be.
 
Asher said:
If developers are talking about using SPEs for graphics for multiplatform games launching at or near launch, like MOH: AA, I'd be pretty concerned about the vertex power of RSX on its own.

I really want to know more about RSX...

If you are concerned for that reason, then why are you not likewise concerned about Xenos's pixel shading power?

IMO, it all comes down to how the hardware architects decided to implement flexibility into the systems as a whole. In the PS3, I see the flexibility on the CPU. In the X360, I see the flexibility on the GPU.
 
ROG27 said:
If you are concerned for that reason, then why are you not likewise concerned about Xenos's pixel shading power?
Because I'm not aware of any recent Xbox 360 titles that are using Xenon for pixel shading.

I just find it a bit odd that they somehow determined RSX's vertex shading was not up to snuff if they're already resorting to using Cell to aid it. It's a bit confusing to me, is all.

I've heard some rumours that RSX may have fewer vertex shaders, instead leaning on Cell for that kind of work, but I still don't think that makes much sense.
 
zidane1strife said:
The RSX only has to deal with 720p, which, in terms of bandwidth per pixel, puts it in a situation comparable to the top PC GPUs expected to handle 1600x1200
So assuming that bandwidth per pixel was the defining metric, you're saying that current games which run well at 1600x1200 on a 7900GT will run equally well at 720p on RSX. Are you satisfied with the graphics quality of today's games that run well on a 7900GT at 16x12? Here are some 4xAA numbers for a 7900GT at 16x12:
COD2: 26.9 fps, FEAR: 32 fps, HL2 Lost Coast: 42.3 fps, SS2: 34.5 fps...

Anyway, a more important fact is that the bandwidth per pixel output for any GPU is almost independent of resolution, so it doesn't matter what the resolution is. It will be held back by bandwidth.

PS3 will still give you good graphics, just like a 7600GT can put out good graphics at 1024x768. Especially in the hands of a good development team.
 
Titanio said:
As for vertex shading, for bursty vertex work it has an advantage, but if you were generally expending more power on vertex shading over the entire frametime than RSX is capable of, you'd be leaving yourself with quite a lot less power for pixel shading.
You only need 1/6th of the total shading ability to match today's fastest GPUs, so pixel shading resources are hardly affected (remember that vertex shaders rarely need the TMU, and pixel shaders are rarely ALU limited on Xenos). There's huge power in reserve if you want to go beyond today's vertex shader loads. The real advantage, IMO, will be in vertex texturing. There are lots of cool things you can do with that (still being researched), and Xenos should be an order of magnitude faster. On the other hand, render to vertex buffer might negate some of that advantage, assuming it's implemented in PS3.

"Tens of percent" could be pretty significant! Well, if you were bound by these things, it'd be all you need to go from playable to non-playable, or 60fps to 30fps. Though I doubt you often would be..
True, but the point is that if RSX is any faster, say with stencil-only rendering (note: 16 ROPs are useful in this situation even with a 128-bit bus) or lots of low res texture accesses, it won't outdo Xenos by that much. Play into Xenos' strengths with parallax occlusion mapping, dynamically sampled soft shadow maps, alpha blending, AA, etc. and there's no contest.

All of this, of course, hinges on whether Xenos performs as expected. Both ATI and NVidia generally produce products that don't underperform given their specs, so I think that's reasonable. I'm not saying it's a blowout, but I'd be very surprised if RSX came out on top if you had the same talent coding for both.

RSX gives you gobs of paper stats with a low cost (look at G71!), and as scooby and Asher have pointed out, that's more than half the battle. This isn't the open PC market, so hype can go unchallenged and untested.
 
Asher said:
1. Xenon cores are individually faster than SPEs are for such processing.
I disagree - but I am also quite sure scenarios can be constructed where each core comes out faster than the other.

3. Xenon cores natively understand D3D formats
Yea, the new marketing speak for having scalar quantization pack/unpack instructions. I genuinely wonder why, until these were added to an MS CPU, no other company felt it necessary to flaunt them as a checkbox feature and a 'new data format'.
But I digress.

I'm using the term loosely -- you don't want to trash it is all I'm saying.
What's there to trash? You don't want to trash DMA buffers either and you didn't talk about locking sections when you mentioned those.

I just find it a bit odd that they somehow determined RSX's vertex shading was not up to snuff if they're already resorting to using Cell to aid it. It's a bit confusing to me, is all.
There are a lot of things a DX9 VS can't do (or does poorly) that an SPE can do better. It's also one of the most obvious uses for cores with local store (not saying it's the best or the most efficient use, just a very obvious one), so I don't see why it would be surprising for people to use them this way. Some of us have been using them in this manner for the past 5-6 years, so it's only natural progression.
Speaking of which, going by this logic, if some title happened to, say, use SPEs for pixel work, would you automatically assume RSX pixel shaders aren't up to snuff either?
 
Fafalada said:
Yea, the new marketing speak for having scalar quantization pack/unpack instructions. I genuinely wonder why, until these were added to an MS CPU, no other company felt it necessary to flaunt them as a checkbox feature and a 'new data format'.
But I digress.
It's still an advantage to have when using the CPU to aid a GPU.

Fafalada said:
There are a lot of things a DX9 VS can't do (or does poorly) that an SPE can do better. It's also one of the most obvious uses for cores with local store (not saying it's the best or the most efficient use, just a very obvious one), so I don't see why it would be surprising for people to use them this way. Some of us have been using them in this manner for the past 5-6 years, so it's only natural progression.
Interesting; I'm not very familiar with DX9 shaders. What kind of things can the SPE do better in terms of vertex work compared to a VS?

Fafalada said:
Speaking of which, going by this logic, if some title happened to, say, use SPEs for pixel work, would you automatically assume RSX pixel shaders aren't up to snuff either?
If a multi-platform launch title is using SPEs for pixel work, that would tend to say to me that there may be some issues with RSX's pixel shaders versus Xenos'.

Early multiplatform games tend to be developed on one console then ported to the other. In MOH's case, Xbox 360 is the lead platform (IIRC), which somewhat sets the standard for how the game is built. I think, especially on the Xbox 360, it is unlikely that early titles will be using the CPU for vertex work unless it's procedural synthesis of some kind. If they ported this over to Cell+RSX and needed to use the SPEs for vertex shading in addition to RSX, that implies to me that the vertex shading power may not be at parity. I also don't see Xenos' vertex shaders as being so inherently more flexible than RSX's that RSX would need to use Cell to keep up.

Or maybe I'm just misunderstanding what EA says. Are they using the SPEs for procedural synthesis/generation, or truly as vertex shaders?
 
Asher said:
Early multiplatform games tend to be developed on one console then ported to the other. In MOH's case, Xbox 360 is the lead platform (IIRC)

Actually this hasn't been confirmed either way, but FWIW all the demonstrations to the media up until now (as well as the closed-door conference you're referencing in this post) were running on PS3 devkits, according to the developers.

Asher said:
Or maybe I'm just misunderstanding what EA says. Are they using the SPEs for procedural synthesis/generation, or truly as vertex shaders?

It was pretty vague as to what exactly was involved in their SPU utilization for graphics, beyond noting that they became fillrate bound past a certain number of SPUs. This presentation was around 6-8 months ago, IIRC.
 
It wasn't entirely clear, IMO, that the MoH team meant they were using a couple of SPEs for graphics.

Reading all the articles about MoH, though, it seems heavily suggested that PS3 is the lead platform. It was originally presented in the context of PS3 development at that Japanese developer conference, it has since been presented to the press on PS3, and they've spoken about how they're running "experiments" on PS3 re. animation, etc.

On a more general note, I don't see how using SPEs for vertex work, or any specific rendering work for that matter, says anything about RSX's capability. Using SPEs in such a fashion is far more likely to be born of desire than of necessity.
 
Mintmaster said:
Vertex work is very often quite bursty in nature. Clipped and culled polygons come in bunches, and you have no pixel work to do at all.
This is so true. Part of a good developer's job is also to make sure these bursts are eliminated, or at least reduced.
People should not think of SPEs as stupidly fast vertex transformers; we already have VS for that...
 
ROG27 said:
We can infer this is going to happen. Everything about the supposedly shared architecture of CELL/RSX points to this happening. Some are going as far as saying that SPEs and shader processors (or groups of shader processors) might be able to run in lockstep.
Some people think the earth is hollow; that doesn't make it true.
 
Asher said:
It's still an advantage to have when using the CPU to aid a GPU.
Yes it is, but it would be so much better to have some fast local store than a conversion instruction for people who can't unpack a half float or a short using simple integer operations :)
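
For illustration, a minimal sketch of the kind of integer-only half-float unpack being alluded to (a hypothetical helper, written for clarity rather than speed):

```python
import struct

def half_to_float(h):
    """Expand a 16-bit half float (given as an int) to a 32-bit float
    using only integer operations plus a final bit reinterpretation."""
    sign = (h >> 15) & 0x1
    exp = (h >> 10) & 0x1f
    mant = h & 0x3ff
    if exp == 0x1f:                                  # inf / NaN
        bits = (sign << 31) | (0xff << 23) | (mant << 13)
    elif exp == 0:
        if mant == 0:                                # +/- zero
            bits = sign << 31
        else:                                        # subnormal: renormalise
            while not (mant & 0x400):
                mant <<= 1
                exp -= 1
            mant &= 0x3ff
            bits = (sign << 31) | ((exp + 113) << 23) | (mant << 13)
    else:                                            # normal number
        bits = (sign << 31) | ((exp + 112) << 23) | (mant << 13)
    return struct.unpack('<f', struct.pack('<I', bits))[0]

print(half_to_float(0x3C00))   # 1.0
print(half_to_float(0xC000))   # -2.0
```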
Interesting; I'm not very familiar with DX9 shaders. What kind of things can the SPE do better in terms of vertex work compared to a VS?
What about destroying and creating vertices, doing any kind of operation that involves topology information, culling chunks of vertices in a row, decompressing geometry encoded with exotic bit formats, and bla bla bla... :)


If a multi-platform launch title is using SPE for pixel work, that would tend to say to me that there may be some issues with RSX's pixel shaders versus Xenos'.
Not all pixel work is pixel shader intensive.
If they ported this over to Cell+RSX and needed to use the SPEs for vertex shading in addition to RSX, that implies to me that the vertex shading power may not be at parity
I doubt people will use CELL just to do some vertex shading; it would not be that helpful.
I also don't see Xenos' vertex shaders as being so inherently more flexible than RSX's that RSX would need to use Cell to keep up.
On some things they're more flexible; on others... well, not that flexible :) (dynamic branching...)
 
Asher said:
It's still an advantage to have when using the CPU to aid a GPU.
They are only really a big deal when you're not really computation bound though - most scalar quantizations are trivially short even without specialized instructions. I'd rather see something done about GPUs understanding SoA, or free conversion between AoS and SoA (there's one console CPU that does this, but it's not in a desktop machine).
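
A tiny sketch of what AoS-to-SoA conversion means in practice (a generic illustration, not tied to any particular CPU):

```python
import numpy as np

# AoS: one struct per vertex -> [x0,y0,z0,w0, x1,y1,z1,w1, ...]
# SoA: one stream per field  -> xs=[x0,x1,...], ys=[y0,y1,...], ...
# SIMD units want SoA: one vector op then touches four xs at once.
aos = np.arange(16, dtype=np.float32).reshape(4, 4)  # 4 vertices, xyzw
xs, ys, zs, ws = aos.T   # the "conversion" is just a transpose, but on
                         # most CPUs it costs a real shuffle per register
```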

Interesting; I'm not very familiar with DX9 shaders. What kind of things can the SPE do better in terms of vertex work compared to a VS?
Well, nAo listed a lot of it - basically, being able to access the entire topology of your primitives opens up a whole new world that a VS can't reach, or can only reach with excessive performance and memory overhead.
Deletion of primitives is an interesting area for optimization as well - from avoiding sending entire batches over to the GPU, to culling stuff at the primitive level that only wastes GPU vertex setup (backfacing, degenerate, etc. primitives).
And that's still just a few of the interesting things.
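
As a concrete illustration of that primitive-level culling, a minimal sketch assuming an indexed triangle list and a perspective camera (my example, not from the thread):

```python
import numpy as np

def cull_triangles(positions, indices, eye):
    """Drop backfacing and degenerate triangles before they ever reach
    the GPU's vertex setup. positions: (N,3) floats; indices: (M,3) ints."""
    tris = positions[indices]                   # (M, 3, 3) triangle corners
    e1 = tris[:, 1] - tris[:, 0]
    e2 = tris[:, 2] - tris[:, 0]
    normals = np.cross(e1, e2)                  # unnormalised face normals
    degenerate = (normals == 0.0).all(axis=1)   # zero-area triangles
    facing = (normals * (eye - tris[:, 0])).sum(axis=1) > 0.0
    return indices[facing & ~degenerate]
```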

Anyway I see your point about lead platform in multiplat. titles, but we don't really know what the case is for MoH, maybe ERP could let us in on that secret :devilish:

If a multi-platform launch title is using SPEs for pixel work, that would tend to say to me that there may be some issues with RSX's pixel shaders versus Xenos'.
What nAo said about pixel work - and besides, what if the lead platform was PS3?
 