PS3 and Havok (Famitsu Article)

It is interesting that the VMX128 has a dot-product instruction.

I wonder what the fastest way to do a dot product on an SPU would be. The straightforward way would be 5 instructions by my estimation. But there must be a faster way.

mul
rotate quad by 2 words (8 bytes)
add
rotate quad by 1 word (4 bytes)
add
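
To make the five steps concrete, here is a minimal sketch in portable C rather than real SPU code. The `vec4` type and the `vec4_*` helpers are hypothetical stand-ins for a 128-bit register and its instructions; the rotates are taken to move whole 32-bit lanes (two lanes, then one), which is what a 4-float sum needs.

```c
#include <assert.h>

/* Hypothetical 4-float "register"; stands in for a 128-bit quadword. */
typedef struct { float f[4]; } vec4;

/* Element-wise multiply (the "mul" step). */
static vec4 vec4_mul(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] * b.f[i];
    return r;
}

/* Element-wise add (the "add" steps). */
static vec4 vec4_add(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] + b.f[i];
    return r;
}

/* Rotate left by `words` 32-bit lanes (the "rotate quad" steps). */
static vec4 vec4_rotate_words(vec4 v, int words) {
    assert(words >= 0 && words < 4);
    vec4 r;
    for (int i = 0; i < 4; ++i) r.f[i] = v.f[(i + words) % 4];
    return r;
}

/* mul, rotate 2, add, rotate 1, add: every lane ends up holding the dot product. */
static vec4 vec4_dot_splat(vec4 a, vec4 b) {
    vec4 p = vec4_mul(a, b);                        /* p0, p1, p2, p3 */
    vec4 s = vec4_add(p, vec4_rotate_words(p, 2));  /* p0+p2, p1+p3, p2+p0, p3+p1 */
    return vec4_add(s, vec4_rotate_words(s, 1));    /* p0+p1+p2+p3 in all four lanes */
}
```

Each step depends on the previous one, so on real hardware the chain is bound by latency rather than by instruction count unless several dot products are interleaved.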
 
R1 = x1x1x1x1 * x2x3x4x5
R2 = y1y1y1y1 * y2y3y4y5
R4 = z1z1z1z1 * z2z3z4z5
R3 = R2 + R1
R6 = x6x6x6x6 * x7x8x9x10
R5 = R3 + R4
R1 = y6y6y6y6 * y7y8y9y10

etc... etc...

This is surely not the fastest SoA dot product you can imagine, although the stalls would shrink as the number of vectors to be dotted grows. The obvious "splatting" I have done in that code does take some cycles, but we could hide or avoid it: perhaps the cost can be hidden behind the vector loads, or the splatting is not needed at all because we are not dotting a single vector against multiple others, or we have, as on the PlayStation 2's VUs, a broadcast operator that lets us multiply or add a single field of one vector into another vector. Either way, I suspect the main benefit of SoA form can be seen right away: we are processing 4 dot products in parallel.
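
As a rough illustration of the register sequence above, here is a scalar C sketch of the same idea. The `vec3x4` type and `dot3_soa` function are hypothetical names, and the four-iteration loop stands in for the four SIMD lanes; it is not tuned SPU or VMX code.

```c
#include <assert.h>

/* Hypothetical SoA batch: four 3D vectors stored component by component. */
typedef struct {
    float x[4];
    float y[4];
    float z[4];
} vec3x4;

/* Dot one vector (vx, vy, vz) against four vectors at once.
   Passing the scalars directly plays the role of the "splat". */
static void dot3_soa(float vx, float vy, float vz,
                     const vec3x4 *b, float out[4]) {
    assert(b != 0 && out != 0);
    for (int i = 0; i < 4; ++i) {    /* one iteration per SIMD lane */
        float r = vx * b->x[i];      /* R1 = x1x1x1x1 * x2x3x4x5 */
        r += vy * b->y[i];           /* R3 = R2 + R1 */
        r += vz * b->z[i];           /* R5 = R3 + R4 */
        out[i] = r;
    }
}
```

No lane ever needs a value from another lane, so there is no horizontal sum and no rotate: three multiplies and two adds yield four results.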
 

Interesting... when the verts are arranged like that (SoA), not having the HW ability to do a sum across a vector doesn't look too bad at all.

I suppose the SoA form will also be used in some Xenon apps, since it's more cache friendly. If so, the usefulness of the dot-product instruction is going to be limited.
 
Panajev2001a said:
the stalls would reduce
Having data to process in loops is kind of a prerequisite if you are talking about optimization at the level of cycle counting.
Xenon's DP latency is 14 cycles, so it's not going to be terribly fast if you don't have lots of DPs or something else to schedule around it to fill the latency gaps.
 
The way we are trained in school, AoS form is more intuitive: being able to do horizontal operations across fields makes it easier for us to think about 3D math and vectors.

SoA might be a bit counter-intuitive at first, and it requires reorganizing your data from beginning to end to get the most benefit out of it (although you could also spend some cycles moving data into and out of SoA form before and after some critical math loops).
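
Converting in and out of SoA form around a hot loop could look something like the following C sketch. The `vec3` and `vec3_soa4` types and both function names are made up for illustration, and a real implementation would use shuffle or permute instructions rather than scalar copies.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;                /* AoS element */
typedef struct { float x[4], y[4], z[4]; } vec3_soa4;  /* SoA batch of four */

/* Gather four AoS vectors into one SoA batch (a 4x3 transpose). */
static void aos_to_soa4(const vec3 in[4], vec3_soa4 *out) {
    assert(in != 0 && out != 0);
    for (int i = 0; i < 4; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}

/* Scatter an SoA batch back into four AoS vectors. */
static void soa4_to_aos(const vec3_soa4 *in, vec3 out[4]) {
    assert(in != 0 && out != 0);
    for (int i = 0; i < 4; ++i) {
        out[i].x = in->x[i];
        out[i].y = in->y[i];
        out[i].z = in->z[i];
    }
}
```

The conversion is pure data movement, which is why keeping the data in SoA form end to end is the cheaper option when the rest of the pipeline allows it.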

Still, when the driver and the hardware do a good chunk of the work for you, SoA form is not so bad... see G80's "scalar" processors ;).
 
h-103_59279_phy0011.jpg.jpg


Is "3 PC core" a veiled term for the X360 CPU? :p
 
Edit: [strike]Why are you so happy to dismiss the Xenon perf...[/strike] I'm under the impression I misunderstood your post ;)
Anyway, it doesn't make sense: the PPU thread is 33 ms, and one Xenon core running two threads, even with a better AltiVec unit, wouldn't be three times faster...

Thanks, Fafalada, for your response ;)
 
h-103_59279_phy0008.jpg.jpg
h-103_59279_phy0009.jpg.jpg

Picture6:
This is a ragdoll demo called "Beating and Playing" {Comment: I don't know what the real title might be}. The blowing-off (blasting-off) of 200 soldiers' bodies is simulated by physics operations. In the end, up to 800 bodies will be possible.

This Heavenly Sword demo was shown at last year's GDC...
 
There are already several threads about it on other sites too. Please use those instead. :p

Seriously now, this thread (the Famitsu article) had the interesting bit about Nvidia's cooperation with Havok. I wonder if the new situation means or leads to any increasing coziness between Nvidia and Intel (what with their mutual chipset and IGP war on the one hand, and DAAMIT on the other). Maybe it's just coincidence and has no consequences.

Okay, lockdown! ;-)
 