Can anyone expand on these slides, please? GPU clipping twice as fast as before? AutoZ? A 3.2 GB/s (84 times) increase for the shader compiler? (Hope these slides are not too big.)
MasterDisaster said:
http://www.larabie.net/xbox360/360PerformanceUpdate.jpg
http://www.larabie.net/xbox360/PredicatedTilingPerf.jpg
http://www.larabie.net/xbox360/ShaderPerfFix.jpg
Here's the text from *some* of the slides:
--------------------------------------
AutoZ Vertex Shaders
Compiles vertex shaders into 2 versions bundled together:
One version outputs position only
Other version does all outputs
D3D automatically chooses the appropriate version when loading shader
Results are guaranteed identical (no Z fighting)
2 versions are smaller than can be done manually since constants are shared
AutoZ & BeginZPass/EndZPass
This combination gets you:
An optimized GPU Z pre-pass
Double-speed fill
Optimized Z pre-pass vertex shaders
With no CPU cost
And free bounds determination for predicated tiling
Predicated Tiling Perf
D3D’s tiling perf has steadily improved:
Compiler makes more efficient vertex shaders
E.g., too many vfetches often made vertex shaders vfetch-bound, hurting tiling disproportionately
GPU’s clipping configured to be twice as fast
Support for D3DTILING_ONE_PASS_ZPASS
One Z pre-pass works for all tiles
AutoZ means BeginZPass/EndZPass is faster
Good because tile patching now done at end of first Z pre-pass
Predicated Tiling Perf
Before evaluating tiling perf, don’t forget to:
Use D3DCREATE_BUFFER_2_FRAMES
...and so also double size of secondary ring buffer
Check for bad CPU/GPU synchronization via
% Frame GPU Wasted counter in PIX
Note that Tiling can save memory:
E.g., 1280×720×32bpp at 4× multisampling
28 MB on traditional architecture
10 MB EDRAM + 3.5 MB extra front buffer + ≈2.5 MB extra secondary buffer = 16 MB with tiling
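The arithmetic behind the memory comparison above can be checked with a short sketch. It is plain Python rather than Xbox 360 code. It assumes the 28 MB figure is a 32-bit color buffer plus a 32-bit depth/stencil buffer, each stored at 4 samples per pixel; the slide does not spell out that breakdown. The tile count further assumes that both multisampled buffers must fit in the 10 MB of EDRAM.

```python
import math

MIB = 1024 * 1024


def surface_bytes(width, height, bytes_per_pixel, samples=1):
    """Size of one render surface in bytes."""
    return width * height * bytes_per_pixel * samples


# Assumption: 32-bit color plus 32-bit depth/stencil, both multisampled 4x.
color = surface_bytes(1280, 720, 4, samples=4)
depth = surface_bytes(1280, 720, 4, samples=4)
traditional_mib = (color + depth) / MIB

# Figures quoted on the slide: EDRAM, extra front buffer, extra secondary buffer.
tiled_mib = 10 + 3.5 + 2.5

# One resolved (non-multisampled) 32-bit buffer, for comparison with the 3.5 MB figure.
front_buffer_mib = surface_bytes(1280, 720, 4) / MIB

# Assumption: color and depth must both fit in 10 MB of EDRAM per tile.
tiles = math.ceil(traditional_mib / 10)

print(f"traditional:  {traditional_mib:.1f} MiB")
print(f"tiled:        {tiled_mib:.1f} MiB")
print(f"front buffer: {front_buffer_mib:.2f} MiB")
print(f"tiles needed: {tiles}")
```

Under these assumptions the traditional layout comes to about 28.1 MiB, which matches the slide's 28 MB, and the frame needs three tiles.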
GpuLoadShaders
Allows shader loads to be predicated
Useful with BeginZPass/EndZPass:
GpuLoadShaders allows simpler shader substitution during Z pass
Useful with command buffers:
A single command buffer can potentially encode multiple different passes
Keep using SetVertex/PixelShader for normal rendering
Have to GpuOwn literal constants
Not faster than regular APIs
Memory Performance Moral
Write-combining tips:
__storewordupdate, __storefloatupdate ensure proper ordering
volatile does, too, but can introduce suboptimal code gen
Always look at generated assembly
Beware of memcpy
Consider switching to cacheable
WC was 41 MB/s before, 3.4 GB/s after
Cacheable was 310 MB/s before, 1.0 GB/s after
Cacheable is more forgiving
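The before/after figures above reduce to two speedup factors. A minimal sketch, assuming decimal units (1 GB/s = 1000 MB/s), which the slide does not specify:

```python
def speedup(before_mb_s, after_mb_s):
    """Ratio of the new throughput to the old one."""
    return after_mb_s / before_mb_s


MB_PER_GB = 1000  # assumption: decimal units, not stated on the slide

wc = speedup(41, 3.4 * MB_PER_GB)          # write-combined memory
cacheable = speedup(310, 1.0 * MB_PER_GB)  # cacheable memory

print(f"write-combined: {wc:.0f}x")
print(f"cacheable:      {cacheable:.1f}x")
```

With these numbers the write-combined path improves roughly 83-fold and the cacheable path roughly 3.2-fold.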
Saving Memory
Read white paper “Xbox 360 Texture Storage”
Then use new APIs when you bundle your textures:
XGAddress2D/3DTiledExtent to get true allocation size
XGGetTextureLayout to enumerate unused portions that can be used for other stuff
XGSet[Cube|Array]TextureHeaderPair to pack two 128×128 (or smaller) textures in the same space required for one
Good Swap Synchronization
Use D3DCREATE_BUFFER_2_FRAMES to allow CPU 1 frame ahead of GPU
And resize ring buffer appropriately via D3DPRESENT_PARAMETERS::RingBufferParameters
Let D3D do swap throttling
Double buffer all dynamic per-frame resources
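A toy model of why the last point follows from the first: if the CPU may run one frame ahead of the GPU, each dynamic per-frame resource needs two copies, or the CPU overwrites data the GPU is still reading. This is a plain Python simulation, not D3D code, and `FrameResource`, `write`, `read`, and `run` are hypothetical names made up for the illustration.

```python
class FrameResource:
    """Hypothetical per-frame dynamic resource with a fixed number of copies."""

    def __init__(self, copies=2):
        self.slots = [None] * copies

    def write(self, frame, data):
        # The CPU fills the copy that belongs to this frame.
        self.slots[frame % len(self.slots)] = (frame, data)

    def read(self, frame):
        # The GPU consumes the copy for the frame it is drawing.
        stored_frame, data = self.slots[frame % len(self.slots)]
        if stored_frame != frame:
            raise RuntimeError(
                f"frame {frame} was overwritten by frame {stored_frame}"
            )
        return data


def run(copies, frames=4):
    """CPU builds frame n while the GPU is still drawing frame n - 1."""
    resource = FrameResource(copies)
    drawn = []
    for n in range(frames):
        resource.write(n, f"constants for frame {n}")  # CPU, one frame ahead
        if n > 0:
            drawn.append(resource.read(n - 1))  # GPU, one frame behind
    return drawn


print(run(copies=2))  # every frame is read back intact
try:
    run(copies=1)
except RuntimeError as err:
    print("single buffer:", err)
```

With two copies the simulated GPU always finds the data for the frame it is drawing. With one copy the first read fails, because frame 1 has already replaced frame 0.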
--------------------------------------
I know this is going to hurt some people's feelings, but one of these days it needs to be said at least once.
If there needs to be a software update to tweak the load-balancing characteristics of Xenos, it can only mean that the thing is fundamentally broken.
Unified shading's key benefit is automatic load balancing. Shuffling around of compute resources on the fly, at any time, which specifically has to include mid-batch. Software can't do this. The hardware must manage itself. If it doesn't do that, the effort is wasted. If Xenos doesn't do that, well, that's ... bad.
The hardware... does do that. And from day 1 it was noted that developers can get their hands dirty with algorithms tailored to their specific criteria (wow, control is now bad?). But as a dev here said, he was guesstimating less than a 5% gain from doing so.
You want to discuss or just be a venting jerk? If you want to discuss stuff, please don't pretend I've written stuff I haven't written. Then you'll get a proper reply.
Wow, so GPUs are always fundamentally broken.
Wtf! Back off now. Whatever you think I've written, you have got the wrong man. Seriously. My posting history is open for searches.
Acert93 said: The fact you constantly drive at these same fundamental points, with no evidence, really clears things up. Thanks.
To be precise, they improved the performance of some elements of the D3D API and the HLSL compiler from the XeDK.
Now we're getting somewhere. We finally have official confirmation from MS themselves of unlocking the full power of the Xbox 360 GPU instead of some random forum post. Interesting....
Well, it was an advantage, if not the advantage, that load balancing is done automatically to always get 100% utilization. If you must constantly tweak it to get the best possible performance, then there really is no advantage over not having it at all, IMO, since you also have to tweak loads on non-unified-shader hardware... but maybe I misunderstand that a bit here.
Because, on the one hand, I want to be aware of what's going on to some extent, and on the other hand, it's cool to extend my knowledge... To be clear, not to the point of getting saturated with info, because I like to mind my own business and don't like poking my nose into other folks' business.
Right - because the primary/only objective of anything we implement in a game is to advertise it in PR.
X360 is something to take seriously; it isn't the old and crappy PS2. Yeah, PS2 games have improved since the debut of the console, but the PS2 was a dictatorial console. Xboxers got tired of PS2-based ports; we were suffering because of PS2 limitations (playability matters, but developers don't tend to take advantage of specific hardware when they create a multisystem game).
That doesn't mean there is anything that hasn't been 'yet revealed', as the topic subject line says. We've known about these features for a long time now, before the 360 was launched in fact. They're not new, and programmers have had access to them before now as well. Just because they're getting (better) integrated into Microsoft's software libraries doesn't mean it is suddenly new stuff. Quite the opposite.