onanie said:
I don't think I need to remind you that the bandwidth between the parent and daughter die is 32 GB/s. Compressed data doesn't magically uncompress after crossing that bridge.
I seriously doubt the cost of the hardware for unpacking on the daughter die comes anywhere close to outweighing the benefit of having standard Z/color compression sitting in the pipeline. But, eh, email ATI and ask 'em. Heh.
onanie said:
After all that, I must still ask - does Xenos' eDRAM have redundancy?
No. Ultimately ATI decided it wasn't worth a few extra transistors to increase their yields on the memory; they'd just absorb whatever defects they got. Honestly, if that's not a satisfactory answer, then email ATI. I don't know if their answer will be any clearer than the one they generally give to someone asking exactly how much texture cache they have, or how many transistors each part of the chip consists of.
onanie said:
Getting rather philosophical, no? My question originally was "What kind of visual effects does dynamic branching allow exclusively?"
Er, I'll explicitly state it then: don't expect there to be a multitude of things NOW; ask for a list in a few years. Nevertheless, there's speeding up shadows by branching on how many samples to take - fewer in complete shadow and far more on edges - to get better soft shadows without the pain of uselessly oversampling the entire shadow. There's selectively supersampling in the shader to reduce aliasing, even just in certain sections rather than the entire shader for every pixel. Or parallax occlusion mapping. Exclusive effects? No. But I'd call it BS to say it's not worth quite a lot.
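To make the shadow one concrete, something like this - Python standing in for the shader logic, since I'm not going to pretend it's anyone's actual shader code; the in_penumbra/take_sample names and the 1-vs-16 sample counts are just made up for illustration:

[code]
# Toy model of adaptive shadow sampling via a dynamic branch.
# in_penumbra() and take_sample() are stand-ins for a cheap edge test
# and a shadow-map tap; the 1-vs-16 counts are arbitrary.

def shadow_term(in_penumbra, take_sample):
    if not in_penumbra():
        # Fully lit or fully shadowed: one tap is enough.
        return take_sample(0)
    # Near a shadow edge: spend the extra taps only here, instead of
    # uselessly oversampling every pixel in the whole shadow.
    return sum(take_sample(i) for i in range(16)) / 16.0

# A pixel deep in shadow takes the cheap path entirely.
print(shadow_term(lambda: False, lambda i: 0.0))
[/code]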
onanie said:
You would need to demonstrate the "numerous" situations where the VS load is "very high". While you might accuse one of saying (and I did not) that "we haven't seen it in games, thus there must be no use for it", is it not technically right to respond that in the months that the Xenos hardware has been available to developers, "it" has not been used where it is available? While you would say that "it's because it hasn't been an option that we don't see it, not that there aren't uses", I might just ask again what I have asked before - "show me".
No, it's not right to respond that way. It can be said it hasn't been seen yet because games out now very likely didn't allot time for people to create novel shaders. It can be said we haven't heard anything from developers, many of whom are reluctant to speak, or NDA'd from speaking, about techniques in games currently in development or already out - or who simply don't find it worth discussing with anyone besides other developers. And maybe we haven't seen it simply because it's not something you can see. If something is behind the scenes, would you therefore consider it a wasted effort?
You also mention developers having had hardware for "months" as if that were a long time. If you think I'm arguing it's a godsend, you've formed the wrong impression. I don't expect new techniques to create such giant visual leaps that people will be able to point them out when they see them. But that's not what we're here for: we're talking about the technical merits of the hardware, not how good games look. Developers matter far too much, especially this generation, for the hardware alone to make that big a difference. Perhaps you disagree with that. I don't know.
But in the end, if you can dedicate all your hardware to the vertex load when it's highest, then you've just shortened the rendering time for that section by up to a factor of 5. Meanwhile, you gain a small benefit everywhere else.
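Back-of-envelope, with unit counts I'm pulling out of thin air purely to show where an "up to a factor of 5" could come from (they're not meant to be the real Xenos or RSX numbers):

[code]
# Invented numbers, just to illustrate the scaling.
vertex_work = 1_000_000            # arbitrary units of vertex work in the section
dedicated_vertex_alus = 8          # a fixed vertex array on a split design
unified_alus = 40                  # the whole array on a unified design

split_time = vertex_work / dedicated_vertex_alus
unified_time = vertex_work / unified_alus
print(split_time / unified_time)   # -> 5.0
[/code]

Obviously the exact ratio depends entirely on how a split design divides its ALUs; the point is just that the ceiling is set by that split.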
onanie said:
With any particular method, its advantage might be real if other methods for achieving the same result prove to be less effective. To produce the same effect, a different kind of code might run just as well on different hardware.
Hence the inclusion of "the option available on the other card" in my post. With a wider range of possibilities, based on a more diverse and flexible feature set, it's more likely that you'll find the technique that is 10x faster than on the other card and produces Y visual effect at such-and-such quality. On the other card, you use the technique that's 8x faster on that hardware relative to the first one. But maybe it's not quite as good, or in the end it's still a little more resource-intensive than the technique on the first card. A number of such things add up, and it's not so insignificant anymore. It's not like anyone's trying to say it'll double RSX's performance. But anyone who thinks these things are only going to add up to a 2% gain has, IMO, fallen off their cookie. If such people are right, then the top minds at ATI are utter failures who wasted significant portions of all their latest chips, from R5xx to C1 and soon the R6xx.
onanie said:
Commenting on the same graph, which depicts the work required in one particular frame (chosen by ATI, no less): the ratio of pixel to vertex workload is still maintained overall (AUC). The proportion of this particular frame where vertex workload completely replaces pixel workload is perhaps about 10%. Even so, I might just echo Phil's sentiment in wondering why, in a parallelized pipeline, the pixel pipelines should stop at all while the vertex pipelines are busy.
Overall is a nearly useless metric. Somewhere in there, resources are being wasted. Those unused vertex shaders could go towards giving you an additional 10%, 20%, however much brute force on a regular basis. On the other hand, it could be that the vertex shaders aren't wasted that much, and instead you spend a non-negligible amount of time totally bound by the vertex shaders, because you can't process the data fast enough and can't move on to pixel shading until more of it gets done (for whatever you may be using it all for). Dedicate all your resources to it, and that amount of time becomes insignificant, or at least less significant. Don't we have such high fillrates not because we actually need to write that many pixels all the time, but because we want to be able to write at that rate when it comes time to do the job? Unify your hardware and suddenly doing this wastes less of the hardware you have available.
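As a toy illustration of the point, here's a frame model with invented numbers - a 10% vertex-bound stretch like the one in that graph, plus assumed 5x/1.1x gains for how a unified array helps each part. It only shows the shape of the argument, not a measurement:

[code]
# Toy frame model with invented numbers: one short stretch is purely
# vertex-bound, the rest is pixel-bound with vertex hardware idling.
frame = [
    ("geometry/shadow pass", 1.0, "vertex"),   # (name, ms, what it's bound on)
    ("main shading",         9.0, "pixel"),
]

def unified_speedup(bound_on):
    # Assumed gains: the vertex-bound stretch collapses ~5x when the whole
    # array attacks it; pixel-bound work only picks up a small assist.
    return 5.0 if bound_on == "vertex" else 1.1

split_total   = sum(ms for _, ms, _ in frame)
unified_total = sum(ms / unified_speedup(bound) for _, ms, bound in frame)
print(split_total, round(unified_total, 2))    # 10.0 vs ~8.38 on these numbers
[/code]

Grow or shrink that vertex-bound stretch and the overall win moves accordingly; that's exactly what an "overall AUC" ratio hides.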
And it's simply going to be the case that you'll hit stretches where you can't shade any more pixels until you're done with the current task; you're probably going to finish rendering your shadows before you decide to start pixel shading. And why not cut out the shader instructions that don't matter when there's a shadow on that pixel? Another use for branching.
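A minimal sketch of that early-out, again with Python standing in for shader logic and made-up function names:

[code]
# Rough model of branching past lighting work for fully shadowed pixels.
# ambient() and full_lighting() are hypothetical stand-ins.

def shade(in_full_shadow, ambient, full_lighting):
    if in_full_shadow:
        # Dynamic branch: no specular, no extra texture reads.
        return ambient()
    return ambient() + full_lighting()

print(shade(True, lambda: 0.1, lambda: 0.7))   # shadowed pixel takes the cheap path
[/code]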
Or hey, maybe I'm completely talking out my ass (instead of only partially): RSX has more shader power, Xenos' feature set isn't going to make up for a deficiency nor take a "slight advantage" in, say, shading and turn it into a better-looking gain, MS spent their money on the GPU only to fall short of Sony's machine with the GPU, and Sony's focus on the CPU means the PS3 totally spanks the 360 and developers will really have their work cut out for them to get 360 titles to match PS3 titles.
I honestly don't know.