any news about RSX?

I'll probably regret poking my head into this thread, but I think I know what nAo was talking about, and if he means what I think he means, I think he's right.

Right, good, glad that's all cleared up! No need to thank me!
 
I trust nAo on this. He has a very good reputation on this board, and is also someone who is involved with PS3 graphics development at the highest level possible (outside of designing the hardware itself of course).
 
Yeah, but unless you're working for Id, Epic, or maybe Crytek, good luck getting a chance to take serious advantage of it on an actual project...

Amen. Frankly even if you do work for one of the big boys, you'll still have a slightly thicker API between you and the hardware, you'll still be making compromises for the variety of hardware actually out in the wild, and you're not Carmack or Sweeney so you'll probably just be making the tea anyway.
 
Mr. Wibble,

Would you be willing to tell us what you are thinking? I would sure like to find out if those numbers for the RSX were legitimate or not. For the record, I want to say that I have NO reason to doubt Barbarian unless some other information is provided that refutes his claims. But I'm willing to listen to what everyone has to say on the matter.
 
The interrelationship between CELL and RSX makes composite frame rendering techniques viable, that is, CELL and RSX alternately adding elements to a frame before it is rendered. No other current system will really be doing this to the extent the PS3 will.
I'm curious, has this ability been confirmed in any way besides offloading some vertex processing to the Cell? I've always assumed that, despite any such offloading, the RSX will still be just as bandwidth limited when it comes to actually rendering the final image (especially at 1080p) as if it were being done on a PC, as the main bandwidth hogs are still completely dependent on the RSX.

If not, I'm curious which parts are being offloaded to the Cell.
 
Please MrWibble and nAo, say no more. I'll write off Barbarian's comments. Thank you for your clarifications. Hate to get you guys into trouble over our petty questions. In fact, if you send a note to the mod to delete your posts and our responses, I'm ok too.
 
I'm curious, has this ability been confirmed in any way besides offloading some vertex processing to the Cell? I've always assumed that, despite any such offloading, the RSX will still be just as bandwidth limited when it comes to actually rendering the final image (especially at 1080p) as if it were being done on a PC, as the main bandwidth hogs are still completely dependent on the RSX.

If not, I'm curious which parts are being offloaded to the Cell.

Ok, I've just looked at the latest 1up episode, which has an interview with the Lair guy from Factor 5 (sorry, I really should learn his name - Julian Eggebrecht, there you go).

When asked why Lair managed 1080p with so many effects while other developers are still on 720p, he basically confirmed that fillrate is really the major bottleneck - the RSX is plenty fast in terms of shader and vertex power to manage 1080p, but fillrate is more of a challenge.

Their solution is (probably rather obvious in theory, but I'm guessing less so in practice ;) ) to implement some very clever mathematics which determine which items need to be drawn and which do not. Since fillrate is the only bottleneck and he thinks it can be overcome for most cases by improving such algorithms, he expects that more and more games will support 1080p in the future.
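Just to illustrate the general idea (this is not Factor 5's actual method, and every name below is made up), the simplest form of "deciding what needs to be drawn" is testing each object's bounding sphere against the view frustum on the CPU and only submitting the survivors:

#include <vector>
#include <cstddef>

struct Plane  { float nx, ny, nz, d; };   // plane equation n.p + d = 0, normal pointing into the frustum
struct Sphere { float x, y, z, radius; }; // bounding sphere of a scene object

struct Frustum {
    Plane planes[6]; // left, right, top, bottom, near, far

    // Conservative test: reject only if the sphere is completely behind some plane.
    bool mightBeVisible(const Sphere& s) const {
        for (const Plane& p : planes) {
            float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
            if (dist < -s.radius)
                return false; // entirely outside this plane -> cull
        }
        return true;          // inside or straddling every plane -> submit it
    }
};

// Build the list of objects worth submitting to the GPU this frame.
std::vector<std::size_t> cullObjects(const Frustum& frustum, const std::vector<Sphere>& bounds)
{
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < bounds.size(); ++i)
        if (frustum.mightBeVisible(bounds[i]))
            visible.push_back(i);
    return visible;
}

Real visibility systems layer occlusion tests, portals and the like on top of something like this, which is presumably where the "very clever mathematics" comes in.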

So personally, I don't think that the Cell/XDR/RSX connection is going to solve any fillrate issues directly. At best I could imagine that the Cell processor actually does some of the calculations on what needs to be drawn when, but that might be done on the RSX just as well, I'm not sure.

Instead, I think the Cell/XDR/RSX connection is important to offload the RSX/GDDR3 memory by holding and streaming in textures, with the added advantage that the Cell can modify (think darken/lighten or even maybe add shadow) or even generate textures.

Assuming (and I don't know if that is correct) that read and write bandwidth usage don't affect each other, there's still a cost if a texture has to reside in GDDR3 memory prior to use: you spend GDDR3 bandwidth writing the texture into that memory, and then spend it again writing to the framebuffer in that same GDDR3 when drawing the textured poly. If instead the texture can be streamed in from XDR memory directly to the draw, that saves a little bandwidth, too.

But you can easily see how the most important bandwidth saver is minimising the amount of data written to the framebuffer.
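To put some very rough numbers on that (back-of-the-envelope assumptions only: 32-bit colour plus 32-bit Z per shaded fragment, no AA, no blending, no texture reads), framebuffer write traffic scales linearly with overdraw:

#include <cstdio>

int main() {
    const double pixels       = 1920.0 * 1080.0;            // one 1080p frame
    const double fps          = 60.0;
    const double bytesPerFrag = 4.0 /*colour*/ + 4.0 /*Z*/;  // assumed per shaded fragment

    for (int overdraw = 1; overdraw <= 4; ++overdraw) {
        double gbPerSec = pixels * bytesPerFrag * overdraw * fps / 1e9;
        std::printf("overdraw %dx: ~%.1f GB/s of framebuffer writes\n", overdraw, gbPerSec);
    }
    return 0;
}

Under those assumptions that's roughly 1 GB/s of writes per layer of overdraw at 1080p60, before AA, blending, Z reads or texture fetches are counted, so every layer you avoid writing is bandwidth handed straight back.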

Note that this post is a testament (good or bad :LOL: ) to what someone can learn from hanging out on a forum, as everything I've written here comes from listening to you guys discuss this and the odd interview with a developer. I have virtually no experience with 3D graphics otherwise, besides the very low tech project in my sig.
 
Amen. Frankly even if you do work for one of the big boys, you'll still have a slightly thicker API between you and the hardware, you'll still be making compromises for the variety of hardware actually out in the wild, and you're not Carmack or Sweeney so you'll probably just be making the tea anyway.

Only slightly thicker? I thought the difference was quite significant. How much do you think DX10 will change that?
 
Please MrWibble and nAo, say no more. I'll write off Barbarian's comments. Thank you for your clarifications. Hate to get you guys into trouble over our petty questions. In fact, if you send a note to the mod to delete your posts and our responses, I'm ok too.

Umm.. Mr. Wibble and nAo did not reveal anything to us and obviously did not break any NDAs. Barbarian on the other hand did indeed violate an NDA and gave us some important information. It was his choice and I am thankful for it. He didn't have to give us anything, but he made the choice to give us the information.

I don't think anyone's posts or comments need to be deleted.

Now to move onto something related, but hopefully something people can comment about.

When it comes to optimizing a GPU in a closed system, how much more (relatively speaking) power can you eke out than from a very similar chip in a typical PC environment? It's obvious that the Cell has a lot of power that can be tapped, because it's just so amazingly powerful in the first place. But if someone put an off-the-shelf GPU into a closed box themselves, is there really a lot they could do to get more power out of it?
 
But you can easily see how the most important bandwidth saver is minimising the amount of data written to the framebuffer.
And the best way of doing that is to have zero overdraw.

In theory you could use an SPE to perform (tiled) occlusion queries on all triangles, before they're submitted to RSX. This is a sort of similar technique to the tiled predication that Xenos uses. This way you get a humungous reduction in overdraw, vastly reducing the fillrate used. With the added benefit that the pixel shaders can spend more time doing funky stuff to the pixels you do end up seeing, because they're not lumbered with shading pixels that you end up never seeing.
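Purely as a hypothetical sketch of what that pass might look like (the tile granularity, the names and the simplification that occluders fully cover whole tiles are all my own assumptions, not anything confirmed about the PS3 pipeline): rasterise the big occluders into a coarse per-tile depth grid, then test each object's screen-space bounds against it before its draw call is submitted to RSX.

#include <vector>
#include <algorithm>
#include <cfloat>

// coveredDepth[tile] = farthest depth at which that tile is known to be fully
// covered by an occluder (FLT_MAX means "not covered by anything yet").
struct TileDepthGrid {
    int tilesX, tilesY;
    std::vector<float> coveredDepth;

    TileDepthGrid(int tx, int ty)
        : tilesX(tx), tilesY(ty), coveredDepth(tx * ty, FLT_MAX) {}

    // Record an occluder that fully covers tiles [x0,x1]x[y0,y1] and is no
    // farther than 'farDepth' anywhere over that rectangle.
    void addOccluder(int x0, int y0, int x1, int y1, float farDepth) {
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                coveredDepth[y * tilesX + x] =
                    std::min(coveredDepth[y * tilesX + x], farDepth);
    }

    // True if an object touching tiles [x0,x1]x[y0,y1], whose nearest point is
    // at 'nearDepth', is hidden behind occluders in every tile it touches.
    bool isOccluded(int x0, int y0, int x1, int y1, float nearDepth) const {
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                if (nearDepth <= coveredDepth[y * tilesX + x])
                    return false; // possibly visible in this tile -> must draw
        return true;              // occluded everywhere -> skip the draw call
    }
};

A real implementation would rasterise occluder triangles at finer granularity and run on an SPE ahead of command-buffer generation; whether the saving pays for the extra work is exactly the open question.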

Jawed
 
When it comes to optimizing a GPU in a closed system, how much more (relatively speaking) power can you eke out than from a very similar chip in a typical PC environment? But if someone put an off-the-shelf GPU into a closed box themselves, is there really a lot they could do to get more power out of it?
Considerably more, potentially. I doubt you'll find anyone willing to put a figure to that, but knowing exactly what your GPU is, and writing to it more directly, you can structure your graphics engine to be a better fit. E.g. in the PC space you have separate vertex and pixel workloads, and no idea what performance a given PC will have in either of those areas. You might have a 2:1 ratio of pixel shaders to vertex shaders, or 4:1, or 6:1. If your engine is heavy on the vertex workload, pixel shaders can be sitting idle a lot of the time if you have them in abundance. You might also have a set of pixel shaders that run poorly on one GPU and don't tax another, but you can't target the top end without alienating your bottom end. Even with top-end GPUs, ATi and nVidia can have quite different characteristics.

In a closed box, you can balance your design to accommodate exactly the hardware you have. You can aim for a mix of vertex and pixel workloads that matches the PS:VS ratio, and keep careful tabs on BW use to balance it out and maximise data access. In the PC space, generally you have a target for BW and any more BW just allows for higher refresh or texture res.

I think the best comparison is to look at a last-gen console like XB or PS2 and compare what it's producing now to what a top-end PC of the time can produce now.
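As a toy illustration of the sort of budgeting a fixed target allows (the cycle estimates, thresholds and names below are all invented), with a known pixel:vertex unit ratio you can at least check whether a frame's estimated workload actually matches the hardware:

#include <cstdio>
#include <algorithm>

struct FrameEstimate {
    double vertexCycles; // estimated vertex-shader cycles for the frame
    double pixelCycles;  // estimated pixel-shader cycles for the frame
};

// With a fixed pixel:vertex unit ratio (say 3:1), the frame is balanced when
// pixel work is roughly 3x the vertex work; otherwise one set of units idles.
int chooseGeometryLod(const FrameEstimate& est, double pixelToVertexUnits)
{
    double workRatio = est.pixelCycles / std::max(est.vertexCycles, 1.0);
    if (workRatio < pixelToVertexUnits * 0.5)
        return -1; // vertex-bound: pixel units idle, so coarsen geometry (or add shading)
    if (workRatio > pixelToVertexUnits * 2.0)
        return +1; // pixel-bound: vertex units idle, so denser geometry is effectively free
    return 0;      // workload roughly matches the hardware's unit ratio
}

int main() {
    FrameEstimate est{ 2.0e6, 2.0e6 };                       // a vertex-heavy frame
    std::printf("LOD adjustment: %d\n", chooseGeometryLod(est, 3.0));
    return 0;
}

On a PC you simply don't know what ratio to balance against, so this kind of tuning isn't really available to you.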
 
Considerably more, potentially. I doubt you'll find anyone willing to put a figure to that, but knowing exactly what your GPU is, and writing to it more directly, you can structure your graphics engine to be a better fit. E.g. in the PC space you have separate vertex and pixel workloads, and no idea what performance a given PC will have in either of those areas. You might have a 2:1 ratio of pixel shaders to vertex shaders, or 4:1, or 6:1. If your engine is heavy on the vertex workload, pixel shaders can be sitting idle a lot of the time if you have them in abundance. You might also have a set of pixel shaders that run poorly on one GPU and don't tax another, but you can't target the top end without alienating your bottom end. Even with top-end GPUs, ATi and nVidia can have quite different characteristics.

In a closed box, you can balance your design to accommodate exactly the hardware you have. You can aim for a mix of vertex and pixel workloads that matches the PS:VS ratio, and keep careful tabs on BW use to balance it out and maximise data access. In the PC space, generally you have a target for BW and any more BW just allows for higher refresh or texture res.

I think the best comparison is to look at a last-gen console like XB or PS2 and compare what it's producing now to what a top-end PC of the time can produce now.

I don't think there is really anything on Xbox which can outdo what a Ti4200-powered PC could demonstrate at low resolution. Obviously the Ti4200 is more powerful than NV2A, but not hugely so.
 
And the best way of doing that is to have zero overdraw.

In theory you could use an SPE to perform (tiled) occlusion queries on all triangles, before they're submitted to RSX. This is a sort of similar technique to the tiled predication that Xenos uses. This way you get a humungous reduction in overdraw, vastly reducing the fillrate used. With the added benefit that the pixel shaders can spend more time doing funky stuff to the pixels you do end up seeing, because they're not lumbered with shading pixels that you end up never seeing.

Jawed

That is a clever use of an SPE that I had not thought of.
 
And the best way of doing that is to have zero overdraw.

In theory you could use an SPE to perform (tiled) occlusion queries on all triangles, before they're submitted to RSX. This is a sort of similar technique to the tiled predication that Xenos uses. This way you get a humungous reduction in overdraw, vastly reducing the fillrate used. With the added benefit that the pixel shaders can spend more time doing funky stuff to the pixels you do end up seeing, because they're not lumbered with shading pixels that you end up never seeing.

I don't think that'd be worth the effort, given that the same effect can be achieved with an early-out z-test over a front-to-back sorted scene. Bar transparencies, of course, which would be your heaviest overdraw contributor anyway.
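For completeness, a minimal sketch of that alternative (struct and field names invented): sort the opaque draw calls front-to-back so early Z rejects most hidden fragments, and leave the transparencies for last, back-to-front.

#include <vector>
#include <algorithm>

struct DrawCall {
    unsigned id;        // handle to mesh/material state
    float    viewDepth; // distance of the object's centre from the camera
    bool     transparent;
};

void sortForEarlyZ(std::vector<DrawCall>& calls)
{
    // Opaque geometry front-to-back (maximise early-Z rejection); transparencies
    // last and back-to-front (needed for correct blending, and, as noted above,
    // they get no overdraw help from early Z anyway).
    std::stable_sort(calls.begin(), calls.end(),
        [](const DrawCall& a, const DrawCall& b) {
            if (a.transparent != b.transparent)
                return !a.transparent;                        // opaque first
            return a.transparent ? a.viewDepth > b.viewDepth  // back-to-front
                                 : a.viewDepth < b.viewDepth; // front-to-back
        });
}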
 
Rangers said:
You think that, for their time, the Gamecube and Xbox CPUs sucked?
Well, they offered general performance significantly under the PC baseline, even if you looked at theoretical numbers. I'd argue that situation is a lot different this generation.

It's funny, I think the Wii CPU may be the most powerful at a glance out of all this generation's CPUs... 729 MHz of beefy OOOE.
The 750 series has very minimal OOOE, so if the Wii CPU is indeed a derivative of it, I most certainly wouldn't call that beefy.

darkblu said:
I don't think that'd be worth the effort, given that the same effect can be achieved with an early-out z-test over a front-to-back sorted scene.
Early Z won't save you geometry bus traffic or VS work. Exactly at what point that makes a significant performance difference will depend on the application, though.
But you're right: as a fillrate-saving optimization, it'd be pretty pointless.
 
I don't think that'd be worth the effort, given that the same effect can be achieved with an early-out z-test over a front-to-back sorted scene. Bar transparencies, of course, which would be your heaviest overdraw contributor anyway.
From ATI's recent "improved hierarchical-Z" patent it seems that the Z-buffer precision in existing ATI GPUs isn't great when testing Z before pixel shading. Full-precision Z-testing only occurs in the ROPs, which is way too late if it turns out the fragment shouldn't be seen. The patent implies that enough fragments "fall through the cracks" that a more advanced technique is a performance gain.

Obviously that's ATI, and this is NVidia we're talking about. But the nature of early Z-testing in the pixel pipeline is such that overdraw is inevitable (you'd need a vast amount of on-die memory for a full-precision Z-buffer).

---

That wasn't what I was thinking of earlier, to be honest.

I'm in a quandary over this stuff. I don't know where (or if) the costs of occlusion culling result in a net gain. I'm far too far out of the loop, and I'm hoping that some devs will talk about this subject in more detail some time.

E.g. there's support for occlusion queries in Xenos. I don't know whether they're worth using, or whether they're different (more useful, faster, more precise, more timely?) from those in earlier GPUs.

I dunno...

Jawed
 
You are taking for granted that what Barbarian wrote is correct.

How did I miss all that love? Anyway, to the best of my knowledge, what I wrote was correct.
nAo might know things better than me, since he has been working on PS3 for much longer, I bet.
So take all that with a grain of salt, of course.
 