Just to give you an established example (these are Sony marketing numbers, so this is a real best-case scenario for T&L operations; others, such as vector normalisation, are a lot more costly and in a way inefficient), take the PS2's VU1. Using software pipelining, a perspective transformation can be achieved with a throughput of 7 cycles. This includes 4 MULs (1 flop each), 4 MULAs (1 flop each), 8 MADDAs (2 flops each), 4 MADDs (2 flops each) and 1 DIV (1 flop). That adds up to 33 flops per 7 cycles (or 1.414 GFLOPS), compared to a theoretical maximum of 3.2 GFLOPS for VU1. This assumes you can hide all loop control and LS operations under the DIV's 7-cycle latency, and that your whole environment (VIF1, memory, CPU/VU0, bus) is infinitely fast, so it also represents a kind of upper limit to what is possible with VU1.

To do something meaningful, you'd also have to perform other operations on your geometry data, like vector normalisation (throughput of 13 cycles on VU1, best case; this includes 4 MULs, 2 MADDs and 1 RSQRT (1 flop), representing a "sustained" performance of 9 flops per 13 cycles, or 0.2 GFLOPS out of VU1's theoretical maximum of 3.2 GFLOPS). So, to give a realistic theoretical best case while still regarding VU1's surroundings as infinitely powerful, you'll end up with perhaps about 0.7 GFLOPS peak. In a real-world scenario you would probably be quite satisfied with sustaining 50-70% of that rate over prolonged periods (>500 ms).

Just so you don't misunderstand me: similar calculations apply to nearly all architecture/code cases.
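
If you want to redo the arithmetic yourself, here is a rough Python sketch. The instruction mixes and flop counts are the ones quoted above. The 300 MHz clock is my assumption, inferred from 33 flops / 7 cycles coming out at 1.414 GFLOPS. The "combined" figure also assumes exactly one transform plus one normalisation per vertex, run back to back, which is only one possible workload. The names (`FLOPS_PER_OP`, `gflops`, and so on) are made up for this illustration.

```python
# Flops per instruction, counted the same way as in the post.
FLOPS_PER_OP = {"MUL": 1, "MULA": 1, "MADDA": 2, "MADD": 2, "DIV": 1, "RSQRT": 1}

# Assumption: 300 MHz VU1 clock (inferred from 33 flops / 7 cycles = 1.414 GFLOPS).
CLOCK_HZ = 300e6

# Instruction mixes and best-case throughputs (in cycles) quoted in the post.
TRANSFORM = {"MUL": 4, "MULA": 4, "MADDA": 8, "MADD": 4, "DIV": 1}
TRANSFORM_CYCLES = 7
NORMALISE = {"MUL": 4, "MADD": 2, "RSQRT": 1}
NORMALISE_CYCLES = 13


def flops(mix):
    """Total flops for one pass through an instruction mix."""
    return sum(FLOPS_PER_OP[op] * count for op, count in mix.items())


def gflops(total_flops, cycles, clock_hz=CLOCK_HZ):
    """Sustained GFLOPS if total_flops are retired every `cycles` cycles."""
    return total_flops / cycles * clock_hz / 1e9


transform_rate = gflops(flops(TRANSFORM), TRANSFORM_CYCLES)
normalise_rate = gflops(flops(NORMALISE), NORMALISE_CYCLES)

# Assumption: one transform and one normalisation per vertex, back to back.
combined_rate = gflops(
    flops(TRANSFORM) + flops(NORMALISE),
    TRANSFORM_CYCLES + NORMALISE_CYCLES,
)

print(f"transform: {flops(TRANSFORM)} flops, {transform_rate:.3f} GFLOPS")
print(f"normalise: {flops(NORMALISE)} flops, {normalise_rate:.3f} GFLOPS")
print(f"combined:  {combined_rate:.3f} GFLOPS")
print(f"50-70% of combined: {0.5 * combined_rate:.2f} to {0.7 * combined_rate:.2f} GFLOPS")
```

Under those assumptions the combined rate lands at roughly 0.63 GFLOPS, the same ballpark as the "about 0.7" above. A different ratio of transforms to normalisations shifts it up or down.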