Titanio said:
DeanoC might be able to enlighten us more accurately and precisely, since it is obviously a technique also being used in HS.
Actually, for the E3 demo instancing is turned off in HS...
OK, let's start at the beginning.
For the transform engine, there is some constant state (i.e. every vertex sees the same data) and some per-vertex variable state (vertex data etc.). What instancing allows you to do is loop the vertex streams. So you have 1 vertex rendered N times; by itself this is useless, as it would produce EXACTLY the same vertex N times. However, instancing allows different streams to be accessed at different rates. So while the main vertex data (position, normal etc.) is repeated N times, some other data (instance data) isn't; it advances once per instance. This allows you to render an object N times in N different places.
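A minimal sketch of the "streams at different rates" idea, assuming a toy CPU model rather than any real graphics API (on Direct3D 9 hardware the rates are what `IDirect3DDevice9::SetStreamSourceFreq` configures). The function `transform_instanced` is hypothetical, and the per-instance data is reduced to a plain translation offset.

```python
# Toy CPU model of stream-frequency instancing (not a real graphics API).
Vec3 = tuple[float, float, float]

def transform_instanced(geometry_stream: list[Vec3],
                        instance_stream: list[Vec3]) -> list[list[Vec3]]:
    """Process every (instance, vertex) pair with a trivial 'vertex shader'."""
    out = []
    for offset in instance_stream:        # instance stream: advances once per instance
        copy = []
        for pos in geometry_stream:       # geometry stream: replayed for every instance
            # Hypothetical shader: translate the shared vertex by this instance's offset.
            copy.append(tuple(p + o for p, o in zip(pos, offset)))
        out.append(copy)
    return out

triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
offsets = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
copies = transform_instanced(triangle, offsets)
```

The same three vertices are read twice, but each pass picks up a different offset, so the triangle lands in two places.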
However, it is much more complicated with skinned animation. In a skinned animation, the matrices that determine the pose you see are encoded in the constant state, which isn't affected by instancing at all. In other words, instancing a skinned character would produce N EXACT copies, just in N different places.
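To illustrate why, here is a minimal linear-blend skinning sketch, assuming a toy model where the bone palette is a Python list standing in for vertex shader constants and the only per-instance data is an offset. All names (`skin_vertex`, `render_skinned_instances`, `translation`) are hypothetical.

```python
# Toy linear-blend skinning where the bone palette is shared "constant" state.
Mat4 = list[list[float]]

def transform_point(m: Mat4, p: tuple[float, float, float]) -> tuple[float, float, float]:
    """Apply a 4x4 row-major affine matrix to a point (w = 1)."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))

def skin_vertex(bone_palette, pos, weights):
    """weights: list of (bone_index, weight) pairs that sum to 1."""
    acc = [0.0, 0.0, 0.0]
    for bone, w in weights:
        q = transform_point(bone_palette[bone], pos)
        for i in range(3):
            acc[i] += w * q[i]
    return tuple(acc)

def render_skinned_instances(bone_palette, vertices, instance_offsets):
    # The palette is constant state: every instance sees the same matrices,
    # so only the per-instance offset can differ between copies.
    return [[tuple(s + o for s, o in zip(skin_vertex(bone_palette, pos, wts), off))
             for pos, wts in vertices]
            for off in instance_offsets]

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

palette = [translation(0.0, 0.0, 0.0), translation(0.0, 2.0, 0.0)]
mesh = [((0.0, 1.0, 0.0), [(0, 0.5), (1, 0.5)])]
crowd = render_skinned_instances(palette, mesh, [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)])
```

Both instances end up at the same height (the same pose); they differ only by the instance offset.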
For properly instanced characters, you need some way of varying the pose per instance. That proves to be a much harder problem. It's sometimes possible (IIRC, as the ATI demo does) to store M complete poses in constant space and then have each instance choose one of the M poses. However, this is much harder if each character has blended animation state (i.e. you need a pose per character), which is what HS has.
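A minimal sketch of the "M poses in constant space" scheme, assuming a toy model where each instance's stream entry carries a pose index plus an offset, and each vertex is bound rigidly to a single bone for brevity. This is not ATI's actual demo code; `render_pose_palette` and `rot_z_90` are hypothetical.

```python
# Sketch: M complete poses held in shared constant space; each instance
# carries a pose index (plus an offset) in its per-instance stream.
Vec3 = tuple[float, float, float]
Mat4 = list[list[float]]

def transform_point(m: Mat4, p: Vec3) -> Vec3:
    """Apply a 4x4 row-major affine matrix to a point (w = 1)."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))

def render_pose_palette(poses: list[list[Mat4]], vertices, instances):
    """poses[pose][bone] is a 4x4 matrix; vertices are (position, bone_index);
    instances are (pose_index, offset). One bone per vertex, for brevity."""
    out = []
    for pose_index, offset in instances:
        bones = poses[pose_index]          # the instance selects one of the M poses
        out.append([tuple(c + o for c, o in zip(transform_point(bones[b], pos), offset))
                    for pos, b in vertices])
    return out

def rot_z_90() -> Mat4:
    # Rotates (x, y, z) to (-y, x, z).
    return [[0.0, -1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
poses = [[identity], [rot_z_90()]]          # M = 2 poses, one bone each
arm = [((1.0, 0.0, 0.0), 0)]
crowd = render_pose_palette(poses, arm, [(0, (0.0, 0.0, 0.0)), (1, (3.0, 0.0, 0.0))])
```

The catch is that all M poses must sit in constant space at once. With per-character blended animation, M would have to grow to N, one pose per character.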
At this point the PC starts running out of steam. The only real method is to store each character's pose in a vertex texture and look it up in the vertex shader. However, NV40 vertex texture fetches are so slow that you lose any benefit instancing gives you.
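A minimal sketch of the vertex-texture approach, assuming a hypothetical float4 "pose texture" where texture row i holds instance i's bone matrices, four texels (one per matrix row) per bone. The class `PoseTexture` and the helpers are illustrative only. The point is that the vertex shader rebuilds each bone matrix from texel fetches, and those fetches sit on the per-vertex critical path, which is where a slow fetch unit hurts.

```python
# Sketch of pose lookup from a vertex texture (toy model, not a real API).
Vec4 = tuple[float, float, float, float]

class PoseTexture:
    """Hypothetical float4 texture that counts fetches to show where the cost goes."""
    def __init__(self, rows: list[list[Vec4]]):
        self.rows = rows
        self.fetches = 0

    def fetch(self, x: int, y: int) -> Vec4:
        self.fetches += 1
        return self.rows[y][x]

def upload_poses(per_instance_poses) -> PoseTexture:
    """per_instance_poses[i][bone] is a 4x4 row-major matrix; each matrix
    becomes 4 consecutive texels (one per row) in texture row i."""
    return PoseTexture([[tuple(row) for m in bones for row in m]
                        for bones in per_instance_poses])

def skin_from_texture(tex: PoseTexture, instance_id: int, pos, bone: int):
    """Fetch the 3 affine rows of the bone matrix (the 4th texel, (0, 0, 0, 1),
    is stored but not needed), then transform pos with w = 1."""
    m = [tex.fetch(bone * 4 + r, instance_id) for r in range(3)]
    x, y, z = pos
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

# Two characters, each with its own one-bone pose.
tex = upload_poses([[translation(0.0, 1.0, 0.0)], [translation(0.0, 4.0, 0.0)]])
a = skin_from_texture(tex, 0, (1.0, 0.0, 0.0), 0)
b = skin_from_texture(tex, 1, (1.0, 0.0, 0.0), 0)
```

Each character now gets its own pose, but every skinned vertex pays three fetches per bone influence.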
On X360 you can go out to main RAM fairly arbitrarily, so the main problem is likely to be bandwidth.
A Cell SPE could do it fairly easily; arbitrary DMA requests could gather the data at will. Bandwidth and latency hiding will be the main bottlenecks (you have to do enough work to cover 16 quadwords, i.e. four 4x4 matrices, per vertex).
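The 16-quadword figure follows from simple arithmetic, assuming 4 bone influences per vertex and 32-bit floats: 4 matrices × 16 floats × 4 bytes = 256 bytes, and 256 / 16 bytes per quadword = 16 quadwords. A small helper (hypothetical name) makes the parameters explicit:

```python
QUADWORD_BYTES = 16  # one 128-bit SPE register's worth of data

def skinning_fetch_per_vertex(influences: int, rows: int = 4, cols: int = 4,
                              bytes_per_float: int = 4) -> tuple[int, int]:
    """Return (bytes, quadwords) of bone-matrix data one vertex needs if every
    influence's matrix has to be gathered for it."""
    total = influences * rows * cols * bytes_per_float
    return total, total // QUADWORD_BYTES

full = skinning_fetch_per_vertex(4)             # four 4x4 float matrices
compact = skinning_fetch_per_vertex(4, rows=3)  # four 4x3 matrices (affine rows only)
```

Dropping the constant last row of each matrix would cut the gather from 16 to 12 quadwords per vertex.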
With no public data on RSX, there is nothing to say.
Of course, once people have had longer to play, there will likely be better ways of doing this stuff (e.g. compressed poses, or evaluating the animations on the GPU).