Are all FLOPS created equal?

Gubbi said:
Right, but even if you do implement a software cache, it'll have an order of magnitude worse latency than hardware level 1 cache access (a mask, two loads, a compare and a branch), and that is for a one-way associative cache. And you end up saving a copy of the same data in each SPE like I mentioned.
I agree; a software-managed cache would be much slower.

But in terms of latency, L2 cache is not exactly stunning either. We don't have cache latency figures for Xenon, but going by the PPC970, L2 cache has an access latency of 11 cycles. By comparison, the SPE's LS has a 6-cycle read and a 4-cycle write latency.
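
To make the lookup sequence from the quote concrete (a mask, two loads, a compare and a branch), here is a rough C sketch of a one-way associative (direct-mapped) software cache. It is only an illustration, not anyone's actual SPE code: the line size, line count, memory size and all the `swc_` names are assumptions made up for the example, and a plain `memcpy` stands in for the DMA transfer an SPE would need on a miss.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All sizes are assumptions chosen for illustration. */
#define SWC_LINE_BYTES 128u           /* bytes per cache line (power of two) */
#define SWC_NUM_LINES  64u            /* number of lines (power of two)      */
#define SWC_MAIN_BYTES (64u * 1024u)  /* size of the simulated main memory   */

uint8_t  swc_main_memory[SWC_MAIN_BYTES];           /* stand-in for memory behind DMA   */
uint32_t swc_tags[SWC_NUM_LINES];                   /* line-aligned address in each slot */
uint8_t  swc_lines[SWC_NUM_LINES][SWC_LINE_BYTES];  /* cached data (would sit in LS)     */
unsigned swc_miss_count;                            /* number of refills so far          */

/* Mark every slot empty. Valid tags are line-aligned, so the odd value 1
 * can never match a real line address. */
void swc_init(void)
{
    for (unsigned i = 0; i < SWC_NUM_LINES; i++)
        swc_tags[i] = 1u;
    swc_miss_count = 0;
}

/* Read one byte through the cache. */
uint8_t swc_read(uint32_t addr)
{
    assert(addr < SWC_MAIN_BYTES);

    uint32_t line_addr = addr & ~(SWC_LINE_BYTES - 1u);                /* mask       */
    uint32_t slot = (addr / SWC_LINE_BYTES) & (SWC_NUM_LINES - 1u);    /* slot index */

    if (swc_tags[slot] != line_addr) {   /* load tag, compare, branch */
        /* Miss: refill the whole line. On an SPE this would be a DMA
         * transfer; memcpy is only a stand-in here. */
        memcpy(swc_lines[slot], &swc_main_memory[line_addr], SWC_LINE_BYTES);
        swc_tags[slot] = line_addr;
        swc_miss_count++;
    }

    return swc_lines[slot][addr & (SWC_LINE_BYTES - 1u)];   /* load data */
}
```

Even on a hit, every access pays for the address arithmetic, the tag load, the compare and the branch before the data load can be issued, which is the overhead being compared against a hardware L1 above. Two addresses that map to the same slot simply evict each other, as you would expect from a one-way design.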
 
london-boy said:
I think this is a big misunderstanding.

Tim was saying that ultimately, the output of a chip largely depends on the software.

So one chip might have higher theoretical "peak" number mumbo jumbo, but if the application is not optimised for it, it will perform worse than a chip with smaller PR numbers.

Nothing new really, this has been discussed before. The G5 example was a bit unclear, but it makes sense.

Even with optimization, it is not unusual for a chip with half the peak performance to deliver 2x the real-world performance. Peak generally stands for the P in POS. In general, the more unbelievable a peak number is, the less realistic it is as an indicator of real-world performance.

Aaron Spink
speaking for myself inc.
 