PIII FLOPS and 128-bit SIMD ops

patroclus02

Hi,

I don't think I understand why the Intel PIII CPU is considered to be a 2 GFLOPS part (at 500 MHz).
Maybe you guys can tell me if my reasoning is OK:

The PIII core has 2 SIMD FP units, which I suppose can each output one single-precision operation per clock (FMUL or FADD), keeping in mind that the FMUL unit is not fully pipelined.
I also suppose these operations are 64 bits wide, that is, a single-precision operation on two 32-bit operands (as 128-bit SIMD operations have to be split into two 64-bit chunks, except on the new Core Duo architecture). Am I getting this right?? :oops:

So, suppose you do an FMUL and an FADD per clock (I don't think this is possible because the FMUL unit is not fully pipelined). That would be 2 FP operations per clock, and so 1 GFLOPS at 500 MHz... :oops:

Where am I wrong?? :?:
Thank you! :D
 
Sorry, if you do an FMUL and an FADD per clock, that would be 2+2 FP operations per clock (each unit operating on a 64-bit chunk per clock), and so would be 2 GFLOPS at 500 MHz...
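
The arithmetic behind the two estimates above can be written out as a small sketch. The function name `peak_gflops` is made up for illustration, and the per-clock operation counts are just the ones assumed in the posts (one FMUL plus one FADD per clock, each on either one value or one 64-bit chunk of two 32-bit values), not measured figures.

```python
def peak_gflops(clock_mhz: float, flops_per_cycle: float) -> float:
    """Theoretical peak in GFLOPS: clock cycles per second times FP ops per cycle."""
    return clock_mhz * 1e6 * flops_per_cycle / 1e9


# First estimate: one FMUL result plus one FADD result per clock.
scalar_estimate = peak_gflops(500, 2)

# Corrected estimate: each unit handles a 64-bit chunk (two 32-bit values),
# so 2 + 2 single-precision operations per clock.
packed_estimate = peak_gflops(500, 2 + 2)

print(scalar_estimate, packed_estimate)  # 1.0 2.0
```

Both numbers are upper bounds only: they assume both units receive useful work on every clock.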

I suppose this is it, BUT it is very optimistic, as the FMUL unit is not fully pipelined.
Maybe you could do some operations at that rate, but interesting transformations that use products and sums won't be able to reach that peak performance.
 
FADD and FMUL both have a 2-cycle throughput. So in theory, if you alternate between FADD and FMUL, you can get 4 SP operations per cycle. Of course, in reality you have to do other things besides these operations (such as loads, stores, branches, etc.), so it's impossible to reach peak performance.

I once wrote a "theoretical" DOT3 program that achieves over 2 GFLOPS on a 1 GHz P3 (all in cache, as the memory bandwidth is not high enough to sustain such a rate).
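
As a rough sketch of the reasoning in this reply: a 128-bit packed instruction carries four single-precision values, and if each of the two units (FADD and FMUL) can start one such instruction every 2 cycles, the total is 4 SP operations per cycle. The helper name `sp_ops_per_cycle` is hypothetical, and the 2 GFLOPS figure is the lower bound quoted above for the DOT3 program, so the resulting fraction is also only a lower bound.

```python
def sp_ops_per_cycle(lanes: int, cycles_per_instruction: int, units: int) -> float:
    """SP operations per cycle when every unit issues at its throughput limit."""
    return units * lanes / cycles_per_instruction


# 4 lanes per 128-bit packed instruction, one instruction every 2 cycles,
# two units (FADD and FMUL) kept busy by alternating between them.
ops = sp_ops_per_cycle(lanes=4, cycles_per_instruction=2, units=2)

# Theoretical peak at 1 GHz, in GFLOPS.
peak_1ghz = ops * 1000e6 / 1e9

# The DOT3 program reached "over 2 GFLOPS", so this is a lower bound
# on the fraction of peak it achieved.
dot3_fraction = 2.0 / peak_1ghz

print(ops, peak_1ghz, dot3_fraction)  # 4.0 4.0 0.5
```

Under these assumptions, the DOT3 result corresponds to at least half of the theoretical peak, which is consistent with the point that loads, stores and branches take up the rest.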