Yeah but unless you're working for Id, Epic, or maybe Crytek, good luck getting a chance to actually take serious advantage of it on an actual project...
"Would you be willing to tell us what you are thinking?"

No, he won't. He's under NDA. The conclusion is that Barbarian provided figures that probably aren't the full story, somehow or other.
"The interrelationship between CELL and RSX affords the viability of composite frame rendering techniques, that is, CELL and RSX alternating in adding elements to a frame before it is rendered. No other current system will really be doing this to the extent the PS3 will."
I'm curious, has this ability been confirmed in any way besides offloading some vertex processing to the Cell? I've always assumed that, despite any such offloading, the RSX will still be just as bandwidth limited when it comes to actually rendering the final image (especially at 1080p) as if it were being done on a PC, as the main bandwidth hogs are still completely dependent on the RSX.
If not, I'm curious which parts are being offloaded to the Cell.
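For what it's worth, here's a rough sketch of what "CELL and RSX alternating adding elements to a frame" could look like at the frame-loop level. It's plain C++ with placeholder functions (spu_skin_and_cull, rsx_render_opaque, etc.) standing in for SPU jobs and RSX command-buffer submissions - none of this is real PS3 API, just the shape of the idea:

```cpp
// Hypothetical sketch only: the spu_* / rsx_* functions are stand-ins, not a
// real libgcm or SPU job API. The point is the interleaving of CELL and RSX
// contributions within a single frame.
#include <cstdio>

struct Frame { int id; };

// Stand-ins for SPU jobs the PPU would kick off.
void spu_skin_and_cull(Frame& f) { std::printf("frame %d: SPUs skin/cull geometry\n", f.id); }
void spu_post_process(Frame& f)  { std::printf("frame %d: SPUs post-process the colour buffer\n", f.id); }

// Stand-ins for RSX command-buffer submissions.
void rsx_render_opaque(Frame& f)      { std::printf("frame %d: RSX renders opaque pass\n", f.id); }
void rsx_composite_and_flip(Frame& f) { std::printf("frame %d: RSX composites and flips\n", f.id); }

int main() {
    for (int i = 0; i < 2; ++i) {
        Frame f{i};
        spu_skin_and_cull(f);        // CELL prepares/reduces the geometry first
        rsx_render_opaque(f);        // RSX consumes the pre-processed data
        spu_post_process(f);         // CELL works on the partly rendered frame
        rsx_composite_and_flip(f);   // RSX adds the final elements and presents
    }
}
```

In a real engine these stages would be pipelined across frames rather than run back-to-back like this, but it shows why the question of "which parts get offloaded" matters: anything the SPUs handle is work (and data) RSX no longer has to deal with itself.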
Amen. Frankly even if you do work for one of the big boys, you'll still have a slightly thicker API between you and the hardware, you'll still be making compromises for the variety of hardware actually out in the wild, and you're not Carmack or Sweeney so you'll probably just be making the tea anyway.
Please, MrWibble and nAo, say no more. I'll write off Barbarian's comments. Thank you for your clarifications. I'd hate to get you guys into trouble over our petty questions. In fact, if you send a note to the mod to delete your posts and our responses, that's OK with me too.
"When it comes to optimizing a GPU in a closed system, how much more power (relatively speaking) can you eke out of it than from a very similar chip in a typical PC environment? But if someone put an off-the-shelf GPU into a closed box themselves, is there really a lot they could do to get more power out of it?"
Considerably more, potentially. I doubt you'll find anyone willing to put a figure to that, but knowing exactly what your GPU is, and writing to it more directly, you can structure your graphics engine to be a better fit.

e.g. In the PC space you have separate vertex and pixel workloads, and no idea what performance a given PC will have in either of those areas. You might have a 2:1 ratio of pixel shaders to vertex shaders, or 4:1, or 6:1. If your engine is heavy on the vertex workload, pixel shaders can sit idle a lot of the time if you have them in abundance. You might also have a set of pixel shaders that run poorly on one GPU and don't tax another, but you can't target the top end without alienating your bottom end. Even among top-end GPUs, ATi and nVidia can have quite different characteristics.

In a closed box, you can balance your design to accommodate exactly the hardware you have. You can aim for a mix of vertex and pixel workloads that matches the PS:VS ratio, and keep careful tabs on BW use to balance it out and maximise data access. In the PC space, generally you have a target for BW, and any more BW just allows for a higher refresh rate or texture res.
I think the best comparison is to look at a last-gen console like the XB or PS2, and compare what it's producing now to what a top-end PC of the same era can produce now.
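To put toy numbers on the balancing argument (made-up unit counts and cycle costs, not real RSX or PC GPU figures), here's a quick back-of-the-envelope calculation showing how the slower of the two workloads sets the frame time and leaves the other set of units idle:

```cpp
// Illustration only: all figures below are invented for the example.
#include <algorithm>
#include <cstdio>

int main() {
    const double vs_units = 8.0,   ps_units = 24.0;      // assumed 1:3 VS:PS split
    const double verts    = 1.0e6, pixels   = 2.0e6;     // per-frame workload
    const double vs_cost  = 10.0,  ps_cost  = 40.0;      // cycles per vertex / per pixel

    const double vs_time = verts  * vs_cost / vs_units;  // time the vertex units need
    const double ps_time = pixels * ps_cost / ps_units;  // time the pixel units need
    const double frame   = std::max(vs_time, ps_time);   // the slower side sets the frame time

    std::printf("VS-limited time: %.2e cycles, PS-limited time: %.2e cycles\n", vs_time, ps_time);
    std::printf("idle fraction on the faster side: %.0f%%\n",
                100.0 * (1.0 - std::min(vs_time, ps_time) / frame));
}
```

In a closed box you know the unit ratio up front, so you can shift work (LOD, shader length, what runs on the CPU vs the GPU) until the two times roughly match; on PC you have to accept that one side or the other will idle on a lot of configurations.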
"But you can easily see how the most important bandwidth saver is minimising the amount of data written to the framebuffer."

And the best way of doing that is to have zero overdraw.
In theory you could use an SPE to perform (tiled) occlusion queries on all triangles before they're submitted to RSX. This is somewhat similar to the tiled predication that Xenos uses. This way you get a humungous reduction in overdraw, vastly reducing the fillrate used, with the added benefit that the pixel shaders can spend more time doing funky stuff to the pixels you do end up seeing, because they're not lumbered with shading pixels that never make it to the screen.
Jawed
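A very stripped-down sketch of that tiled occlusion idea, in plain C++ rather than SPU code or any real RSX submission path: the screen is split into tiles, each tile remembers the farthest depth of occluders that fully cover it, and an object is only skipped if its nearest depth is behind every tile it touches. The tile counts and the "fully covers" assumption are simplifications for illustration.

```cpp
// CPU-side tiled occlusion culling sketch (smaller depth = nearer, 1.0 = far plane).
#include <algorithm>
#include <cstdio>
#include <vector>

constexpr int kTilesX = 80, kTilesY = 45;      // e.g. 1280x720 with 16x16-pixel tiles

struct Rect   { int x0, y0, x1, y1; };         // inclusive tile-space bounding rectangle
struct Object { Rect tiles; float zNear, zFar; };

struct TileGrid {
    std::vector<float> maxZ = std::vector<float>(kTilesX * kTilesY, 1.0f);

    // Record an occluder; we assume the caller has checked it fully covers these
    // tiles, so nothing in them can be farther than occ.zFar and remain visible.
    void addOccluder(const Object& occ) {
        for (int y = occ.tiles.y0; y <= occ.tiles.y1; ++y)
            for (int x = occ.tiles.x0; x <= occ.tiles.x1; ++x)
                maxZ[y * kTilesX + x] = std::min(maxZ[y * kTilesX + x], occ.zFar);
    }

    // Cull only if, in every tile the object touches, something nearer already
    // covers the whole tile.
    bool isOccluded(const Object& obj) const {
        for (int y = obj.tiles.y0; y <= obj.tiles.y1; ++y)
            for (int x = obj.tiles.x0; x <= obj.tiles.x1; ++x)
                if (obj.zNear <= maxZ[y * kTilesX + x])
                    return false;              // potentially visible in this tile
        return true;
    }
};

int main() {
    TileGrid grid;
    Object wall { {10, 10, 30, 20}, 0.20f, 0.25f };  // large nearby occluder
    Object crate{ {12, 12, 14, 14}, 0.60f, 0.70f };  // small object behind it

    grid.addOccluder(wall);
    std::printf("crate: %s\n", grid.isOccluded(crate) ? "culled before submission to RSX"
                                                      : "submitted to RSX");
}
```

Everything culled this way is geometry RSX never sees, so the fill rate and pixel-shader time go to the visible pixels instead.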
Rangers said:
You think the Gamecube and Xbox CPUs sucked for their time?

Well, they offered general performance significantly under the PC baseline - even if you looked at theoretical numbers. I'd argue that situation is a lot different this generation.
"It's funny, I think the Wii CPU may be the most powerful at a glance out of all this generation's CPUs... 729 MHz, beefy OOOE."

The 750 series has very minimal OOOe - so if the Wii CPU is indeed a derivative of it, I most certainly wouldn't call that beefy.
darkblu said:
i don't think that'd be worth the effort given that the same effect can be achieved with an early-out z-test done over a front-to-back sorted scene.

Early Z won't save you geometry bus traffic or VS work, though. Exactly at what point that would make any significant performance difference will depend on each application.
darkblu said:
i don't think that'd be worth the effort given that the same effect can be achieved with an early-out z-test done over a front-to-back sorted scene. bar transparencies, of course, which would be your heaviest overdraw contributor anyway.

From ATI's recent "improved hierarchical-Z" patent, it seems that the Z-buffer precision in existing ATI GPUs isn't great when testing Z before pixel shading. Full-precision Z-testing only occurs in the ROPs, which is way too late if it turns out the fragment shouldn't be seen. The patent implies that enough fragments "fall through the cracks" that a more advanced technique is a performance gain.
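A toy illustration of that precision point: a hierarchical-Z buffer stores a per-tile maximum depth at reduced precision and has to round it toward the far plane to stay conservative, so fragments that land in the rounding gap can't be rejected early and only die at the full-precision test in the ROPs, after they've already been shaded. The 8-bit storage below is an arbitrary choice for the example, not anything taken from the patent.

```cpp
// Toy numbers showing a fragment "falling through the cracks" of a coarse Z test.
#include <cmath>
#include <cstdio>

// Round a depth up to the next representable step at 'bits' of precision, so the
// stored per-tile max depth is never nearer than the true one (stays conservative).
float round_toward_far(float z, int bits) {
    const float steps = float((1 << bits) - 1);
    return std::ceil(z * steps) / steps;
}

int main() {
    const float tileMaxZ   = 0.5001f;                 // true farthest depth already in the tile
    const float storedMaxZ = round_toward_far(tileMaxZ, 8);
    const float fragmentZ  = 0.5010f;                 // behind everything in the tile

    const bool earlyReject = fragmentZ > storedMaxZ;  // coarse hierarchical-Z test
    const bool ropReject   = fragmentZ > tileMaxZ;    // full-precision test in the ROPs

    std::printf("stored tile maxZ = %f\n", storedMaxZ);
    std::printf("hierarchical-Z reject: %s, ROP reject: %s\n",
                earlyReject ? "yes" : "no (shaded for nothing)",
                ropReject   ? "yes" : "no");
}
```

The coarser the stored depth, the wider that gap, and the more pixels get shaded only to be thrown away at the end of the pipe.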
You are taking for granted that what Barbarian wrote is correct.