New dev tools - PlayStation 3 Edge

While it is very impressive for Cell, the technique eats almost all of Cell's computational power to get a 19% increase on RSX. I expect them to come up with approximate culling algorithms that put less stress on Cell while still giving reasonable performance improvements on RSX.

And the fact that they haven't included the input triangle count in the presentation is a little suspicious.

I don't know where you got that 19% number from. But good luck getting a 3.75 Mil poly scene rendered at 60fps on RSX with 758 human characters and 50+ cars on screen - all of the characters running fully independent animations without the Cell helping.
 
That's 1.5 Mil on screen. If 60% were culled, the scene was originally 3.75 Mil triangles. That is exactly (almost too exactly) in line with the 750k triangles per SPU over 5 SPUs he mentioned earlier.
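Spelling out the arithmetic (assuming the 60% cull rate applies to the whole scene):

$$N_{\text{in}} \times (1 - 0.6) = 1.5\,\text{M} \;\Rightarrow\; N_{\text{in}} = \frac{1.5\,\text{M}}{0.4} = 3.75\,\text{M} = 5 \times 750\,\text{k}$$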
OK, that explains it. It's 750 million in your previous post ;)
 
I don't know where you got that 19% number from. But good luck getting a 3.75 Mil poly scene rendered at 60fps on RSX with 758 human characters and 50+ cars on screen - all of the characters running fully independent animations without the Cell helping.

It was mentioned in the GCM Replay presentation, regarding culling in the demo.
 
While it is very impressive for Cell, the technique eats almost all of Cell's computational power to get a 19% increase on RSX.
Not sure on that. The demo had animation systems running as well; 5 SPUs were used for the entire demo, not just for triangle culling to speed up rendering. You can easily see how that would scale to a real game application: 30 fps halves SPU use; decrease poly counts; reduce the number of people. Say 200 people and lots of cars for a suitable scene in a GTA-type game, and you'll be down to 3 SPUs used, leaving 3 for other functions. Not too shabby...
 
It was mentioned in the GCM Replay presentation, regarding culling in the demo.

Ok, I just re-listened to the part you are talking about.

He is talking about doing simulations to predict performance improvements using GCM Replay's "What If" feature. You can just tell the program "what if I had compressed all textures", or "what if I had pre-culled all back-facing tris". And in this case the simulation said he would gain a 19% perf increase from triangle culling alone.

And of course those are just simulations. Playing around with the GUI tool isn't going to magically make the game better. Coders have to go in and do the hard work of actually implementing those specific features that the profiling showed would yield good results.
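For a feel of what implementing that particular "what if" involves, here's a minimal CPU-side sketch of back-face pre-culling. This is not Edge's actual SPU code (which presumably also rejects off-screen and zero-area triangles and streams data over DMA); the names are made up for illustration:

```cpp
// Minimal CPU-side sketch of pre-culling back-facing triangles before an
// index buffer is handed to the GPU. Illustrative only: a real SPU job would
// work in batches over DMA and also reject off-screen and zero-area
// triangles. All names here are made up.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3  sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns a new index list containing only the triangles that face the camera.
std::vector<uint16_t> cullBackfaces(const std::vector<Vec3>& positions,
                                    const std::vector<uint16_t>& indices,
                                    const Vec3& cameraPos)
{
    std::vector<uint16_t> visible;
    visible.reserve(indices.size());
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        const Vec3& a = positions[indices[i]];
        const Vec3& b = positions[indices[i + 1]];
        const Vec3& c = positions[indices[i + 2]];
        Vec3 normal     = cross(sub(b, a), sub(c, a)); // CCW winding assumed front-facing
        Vec3 toTriangle = sub(a, cameraPos);           // camera -> triangle
        if (dot(normal, toTriangle) < 0.0f) {          // facing the camera: keep it
            visible.push_back(indices[i]);
            visible.push_back(indices[i + 1]);
            visible.push_back(indices[i + 2]);
        }
    }
    return visible;
}
```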
 
Not sure on that. The demo had animation systems running as well; 5 SPUs were used for the entire demo, not just for triangle culling to speed up rendering. You can easily see how that would scale to a real game application: 30 fps halves SPU use; decrease poly counts; reduce the number of people. Say 200 people and lots of cars for a suitable scene in a GTA-type game, and you'll be down to 3 SPUs used, leaving 3 for other functions. Not too shabby...

While the amount of animation overhead is a mystery to me, adding random variations doesn't seem comparable to culling millions of triangles.

Anyway, I don't agree with your math. Framerate affects RSX as much as the SPUs.
At 30 fps, halving the SPUs is not enough, because RSX can roughly double its triangle count anyway. Why keep it the same (or reduce it, as you say)? That would be a waste of RSX's cycles. More importantly, you wouldn't need culling anyway, as you are not pushing RSX.
If you double RSX's input as well, then you need the same SPU setup.

That is the math for me, assuming everything scales pretty much linearly (RSX, SPU culling, etc.), which is an oversimplification.
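To spell out that linear-scaling assumption: going from 60 fps to 30 fps doubles the per-frame budget on both sides, so

$$\frac{\text{triangles the SPUs can cull per frame}}{\text{triangles RSX can draw per frame}} \;\xrightarrow{\;60 \to 30\ \text{fps}\;}\; \frac{2\times}{2\times} = \text{unchanged}$$

i.e. if you let RSX's input grow with the longer frame, you still need the same SPU setup for culling.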
 
Your maths is logical... as long as you don't change RSX's workload. Having the SPUs handle half the vertex workload gives RSX half as much work to do, and you could then spend the extra available RSX power per pixel: add AA, longer shaders, etc. Taking the demo, with 750 peeps and 50 cars at 60fps, reduce that to 200 peeps on 3 SPUs, shift the skinning onto RSX, and add 2x AA on RSX, or improve the skin shaders of the people drawn. If RSX's resources are sitting idle, it's not because the SPUs aren't being used to feed it more geometry, but because the devs aren't using it fully on the geometry they are drawing!
 
Double?

While the amount of animation overhead is a mystery to me, adding random variations doesn't seem comparable to culling millions of triangles.

Anyway, I don't agree with your math. Framerate affects RSX as much as the SPUs.
At 30 fps, halving the SPUs is not enough, because RSX can roughly double its triangle count anyway. Why keep it the same (or reduce it, as you say)? That would be a waste of RSX's cycles. More importantly, you wouldn't need culling anyway, as you are not pushing RSX.
If you double RSX's input as well, then you need the same SPU setup.

That is the math for me, assuming everything scales pretty much linearly (RSX, SPU culling, etc.), which is an oversimplification.

Since 30 fps gives 2x the clock cycles per frame, could this demo at 30 fps have a 7.5 Mil triangle count, 1500 people and 100 cars with independent animations?
 
Since 30 fps gives 2x the clock cycles per frame, could this demo at 30 fps have a 7.5 Mil triangle count, 1500 people and 100 cars with independent animations?

If you assume everything scales linearly and the SPU input in the demo is 3.75 mega triangles, yes.
You may even do better, because there will be more occlusion and RSX's input triangle count will scale sublinearly. The input is probably less than 3.75 mega though.

Way too many assumptions of course.
I think devs should come out and clarify the numbers and assumptions.

Because considering the 19% improvement claim, RSX should still be able to make dreams come true without culling, which is suspicious.

So to inefficient, I think 19% may not be the case for the full demo scene, as the presenter said it was for an incomplete version. The final demo has a lot of occlusion, and even with MSAA the improvement may be well above 20%. That would be in line with my expectations of RSX as well.

Again, the absence of the input triangle count in the presentation is suspicious.
 
Ugh, blendshapes don't sound good for facial animation if an SPE can only hold 1000-1500 vertices. That may be enough for a head and a single additional shape. During animation it's easy to have 10-15 shapes activated at once, and a good setup would require at least 25-30 shapes. It just seems too memory intensive for realtime stuff...
 
I'm not familiar with blendshapes at all. Is it doable in "chunks of 1000-1500 vertices"? Also, the SPU can access main memory where it makes sense.

EDIT: Are there alternative approaches?
 
Ugh, blendshapes don't sound good for facial animation if an SPE can only hold 1000-1500 vertices.
That may be enough for a head and a single additional shape. During animation it's easy to have 10-15 shapes activated at once, and a good setup would require at least 25-30 shapes. It just seems too memory intensive for realtime stuff...
Not really a big deal: you don't need to blend a full head in one go. You can split your per-head data into smaller packets and blend a packet at a time; that's easily parallelizable over multiple SPUs, and you can also completely hide latencies through double buffering. This is the kind of job one can easily run close to theoretical peak performance.
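A rough sketch of what that packetized blend could look like (plain C++ standing in for the SPU job; the packet size, data layout and names are made up, and a real job would double-buffer the DMA of packet n+1 while blending packet n):

```cpp
// Conceptual sketch: blend morph-target (blendshape) deltas a packet at a
// time, the way an SPU job would stream them through local store. Plain C++
// for illustration; sizes and layout are assumptions, not Edge or SDK API.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

constexpr std::size_t kPacketVerts = 1024;  // arbitrary packet size for the sketch;
                                            // a real job sizes this to fit local store

// Blend one packet: out = base + sum_i( weight_i * delta_i )
void blendPacket(const Vec3* base, const Vec3* const* shapeDeltas,
                 const float* weights, std::size_t shapeCount,
                 std::size_t vertCount, Vec3* out)
{
    for (std::size_t v = 0; v < vertCount; ++v) {
        Vec3 r = base[v];
        for (std::size_t s = 0; s < shapeCount; ++s) {
            r.x += weights[s] * shapeDeltas[s][v].x;
            r.y += weights[s] * shapeDeltas[s][v].y;
            r.z += weights[s] * shapeDeltas[s][v].z;
        }
        out[v] = r;
    }
}

// Whole head, packet by packet. On an SPU the loop body is where the double
// buffering happens: DMA packet n+1 in while packet n is blending.
void blendHead(const std::vector<Vec3>& base,
               const std::vector<const Vec3*>& shapeDeltas,  // one delta array per shape
               const std::vector<float>& weights,
               std::vector<Vec3>& out)
{
    out.resize(base.size());
    std::vector<const Vec3*> packetDeltas(shapeDeltas.size());
    for (std::size_t start = 0; start < base.size(); start += kPacketVerts) {
        const std::size_t count = std::min(kPacketVerts, base.size() - start);
        for (std::size_t s = 0; s < shapeDeltas.size(); ++s)
            packetDeltas[s] = shapeDeltas[s] + start;  // this packet's slice of each shape
        blendPacket(&base[start], packetDeltas.data(), weights.data(),
                    weights.size(), count, &out[start]);
    }
}
```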
 
Not really a big deal: you don't need to blend a full head in one go. You can split your per-head data into smaller packets and blend a packet at a time; that's easily parallelizable over multiple SPUs, and you can also completely hide latencies through double buffering. This is the kind of job one can easily run close to theoretical peak performance.

I agree. It's a whole new parallel world now. We need to think in parallel.
 
Not really a big deal: you don't need to blend a full head in one go. You can split your per-head data into smaller packets and blend a packet at a time; that's easily parallelizable over multiple SPUs, and you can also completely hide latencies through double buffering. This is the kind of job one can easily run close to theoretical peak performance.

Well, that's better then. You can probably also compress blendshape data somehow; usually no more than 20-30% of vertices change for a single shape.
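For example, a hypothetical sparse layout that stores only the (index, delta) pairs a shape actually moves, so the untouched 70-80% of the head costs neither memory nor blending work:

```cpp
// Sparse blendshape storage: keep only the vertices a shape displaces.
// Hypothetical layout, purely to illustrate the compression idea.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

struct SparseShape {
    std::vector<uint32_t> indices;  // vertices this shape moves
    std::vector<Vec3>     deltas;   // same length as indices
};

// Accumulate weight * delta into a mesh already initialised to the base pose.
void applySparseShape(const SparseShape& shape, float weight, std::vector<Vec3>& verts)
{
    for (std::size_t i = 0; i < shape.indices.size(); ++i) {
        Vec3& v = verts[shape.indices[i]];
        v.x += weight * shape.deltas[i].x;
        v.y += weight * shape.deltas[i].y;
        v.z += weight * shape.deltas[i].z;
    }
}
```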
 
I'm not familiar with blendshapes at all. Is it doable in "chunks of 1000-1500 vertices"? Also, the SPU can access main memory where it makes sense.

EDIT: Are there alternative approaches?

A blendshape is basically just different poses of, say, a face. From these different poses, one can generate a lot of animation by interpolating between them. A bit from this pose, a bit from that, a bit from another will create a facial expression animation.
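In formula form (the standard morph-target blend, where the $w_i$ are the per-pose weights):

$$v_{\text{out}} = v_{\text{base}} + \sum_i w_i \,\bigl(v_{\text{pose},i} - v_{\text{base}}\bigr)$$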

What Laa-Yosh was concerned about is that it'll take a lot of memory to store the different poses, since, you know, an SPE only has 256K. But really it's no biggie if you know how it works in code.
 
V3 said:
But really it's no biggie if you know how it works in code.
Yep, it's just a fancy name for morphing. It's been done as hardware streaming implementations for ages, and many people have made it work in only 16KB using the SPE's predecessors.
 
This is probably the patent application for SPURS:
http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US2007074207&F=0
SPU task manager for cell processor

Publication number: US2007074207
Publication date: 2007-03-29
Inventor: BATES JOHN P (US); WHITE PAYTON R (US); STENSON RICHARD B (US); BERKEY HOWARD (US); VASS ATILLA (US); CERNY MARK (US); MORGAN JOHN (US)
Applicant: SONY COMP ENTERTAINMENT INC (JP)
Classification:
- international: G06F9/455
- European:
Application number: US20050238087 20050927
Priority number(s): US20050238087 20050927

Abstract of US2007074207
Cell processor task management in a cell processor having a main memory, one or more power processor units (PPU) and one or more synergistic processing units (SPU), each SPU having a processor and a local memory is described. An SPU task manager (STM) running on one or more of the SPUs reads one or more task definitions stored in the main memory into the local memory of a selected SPU. Based on information contained in the task definitions the SPU loads code and/or data related to the task definitions from the main memory into the local memory associated with the selected SPU. The selected SPU then performs one or more tasks using the code and/or data.
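Reading the abstract, the core loop each SPU-side manager runs would be something like the sketch below. The struct layout and names are invented for illustration (not the SPURS or SDK format), with memcpy standing in for the DMA into local store:

```cpp
// Host-side sketch of the idea in the abstract: task definitions live in main
// memory, a small manager pulls one in, then pulls in the data it points at
// and runs the task. Invented layout; not the actual SPURS/SDK interface.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct TaskDefinition {
    void (*entryPoint)(void* data, std::size_t size);  // stand-in for the address of SPU task code
    const void* dataAddr;                               // effective address of the task's data
    std::size_t dataSize;                               // bytes to bring into local memory
};

constexpr std::size_t kLocalStoreBytes = 64 * 1024;     // data budget for this sketch

// The "SPU task manager" loop for one worker: fetch a definition, stage its
// data in local memory, run the task, move on to the next one.
void runTaskManager(const std::vector<TaskDefinition>& taskList)
{
    std::vector<std::uint8_t> localStore(kLocalStoreBytes);
    for (const TaskDefinition& def : taskList) {
        const std::size_t bytes = def.dataSize < kLocalStoreBytes ? def.dataSize
                                                                  : kLocalStoreBytes;
        std::memcpy(localStore.data(), def.dataAddr, bytes);  // "DMA" data into local memory
        def.entryPoint(localStore.data(), bytes);             // execute the task on the local copy
    }
}
```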
 
Have any devs begun working with Edge yet? If so, what is the overall impression? Are there areas or features missing that Sony will need to focus on in order to bring performance levels on par with (or better than) the 360? (Particularly w/r/t areas of concern on PS3 such as lighting, texture memory, bandwidth limitations, etc.; the vertex stuff seems to be well addressed.)
 