http://www.conceptics.ch/knowledgeb...=1&PHPSESSID=398f4ff4c7006d50d6e0b8da0f1cf251
Hardtuning, PS2, performance, maximize performance, GDC Europe, Performance Analyzer
Ok, so just to finish up I’ll show you again the scan from the very first slide. This is a title which achieves a lot of the stuff I’ve talked about today. Notably, for most of the time it’s drawing stuff it’s churning out 10 to 20 million polys. That’s not a random benchmark, that’s physically what the GS starts drawing and, like I said during the PA description, it doesn’t count zero-area stuff.
Also, the CPU usage is up at about 50% or more, which is more than double what an average title might see, and better than most good titles by a decent margin. It’s also getting dual-issue for about 80% of the time during the critical processing phases (about half a frame’s worth).
So good performance is possible, and not only that, but this title was written without the aid of the PA, using only the built-in performance counters and a lot of skill. It’s also not perfect and I fully expect to see this developer’s next title beating this performance on every level. Hopefully you guys can do the same.
.
.
http://www.gamasutra.com/gdce/2001/green/green_03.shtml
Speed is all about the Bus.
This has been said many times before, but it bears repeating. The theoretical speed limits of the GS are pretty much attainable, but only by paying attention to the bus speed. The GS can kick one triangle every clock tick (using tri-strips) at 150MHz. This gives us a theoretical upper limit of:
150 million verts per second = 2.5 million verts / frame at 60Hz
Given that each of these polygons will be flat shaded the result isn’t very interesting. We will need to factor in a perspective transform, clipping and lighting which are done on the VUs, which run at 300MHz. The PS2 FAQ says these operations can take 15 – 20 cycles per vertex typically, giving us a throughput of:
300MHz / 60Hz = 5 million cycles per frame
5 million cycles / 20 cycles per vertex
= 250,000 verts per frame
= 15 million verts per second
5 million cycles / 15 cycles per vertex
= 333,000 verts per frame
= 20 million verts per second
Notice the difference here. Just by removing five cycles per vertex we get a huge increase in output. This is the reason we need different renderers for every situation – each renderer can shave off precious cycles-per-vertex by doing only the work necessary.
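The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function and constant names are made up for illustration, and it assumes a 60Hz frame rate, one vertex kicked per GS clock, and a single VU spending a fixed number of cycles on every vertex with no stalls.

```python
# Clock rates quoted in the text; names are illustrative, not from any SDK.
GS_CLOCK_HZ = 150_000_000   # GS kicks one tri-strip vertex per clock
VU_CLOCK_HZ = 300_000_000   # VUs run at twice the GS clock
FRAME_RATE_HZ = 60          # assumed display rate


def gs_verts_per_frame(gs_clock_hz=GS_CLOCK_HZ, frame_rate_hz=FRAME_RATE_HZ):
    """Theoretical GS limit: one vertex per GS clock tick."""
    return gs_clock_hz // frame_rate_hz


def vu_verts_per_frame(cycles_per_vertex, vu_clock_hz=VU_CLOCK_HZ,
                       frame_rate_hz=FRAME_RATE_HZ):
    """Vertices one VU can process per frame at a fixed per-vertex cost."""
    cycles_per_frame = vu_clock_hz // frame_rate_hz
    return cycles_per_frame // cycles_per_vertex


def vu_verts_per_second(cycles_per_vertex, vu_clock_hz=VU_CLOCK_HZ):
    """Vertices one VU can process per second at a fixed per-vertex cost."""
    return vu_clock_hz // cycles_per_vertex


if __name__ == "__main__":
    print("GS limit per frame:", gs_verts_per_frame())
    for cost in (20, 15):
        print(cost, "cycles/vertex:",
              vu_verts_per_frame(cost), "verts/frame,",
              vu_verts_per_second(cost), "verts/second")
```

Trying other per-vertex costs in `vu_verts_per_frame` shows how quickly the budget moves: every cycle shaved off a 20-cycle loop is worth roughly 5% more vertices.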
This is also the reason we have two VUs – VU1 is often described as the “rendering” VU and VU0 as the “everything else” VU, but this is not necessarily so. Both can be transforming vertices but only one can be feeding the GIF, and this explains the Memory FIFO you can set up: one VU is feeding the GS while the other is filling the FIFO. It also explains why we have two rendering contexts in the GS, one for each of the two input streams.