ShootMyMonkey said:
They're comparing assembly code performance, but not against code generated by a compiler; they're comparing it against code using a general FFT library. There's no direct comparison there that gives us any insight into assembly vs. compiler-generated code (though obviously I'd expect assembly to be better, I'd hope compiler code is within a decent range of it, as is usually the case).
The second bar in that graph, IIRC, is unoptimised code using a library, not compiler-generated SPE code without a library.
Did you... look at the chart and not the text above it or something? It says right there: an optimized compile of a general FFT library using the xlc compiler gets 9 GFLOPS vs. 19 for straight assembly. And the second and third bars show exactly that.
Apologies again; I was reading it as if they took a general FFT library, put it on Cell, and compared that to their own optimised code. I see where you're coming from now, though...
ShootMyMonkey said:
Well, like I was saying, it pretty much shows that there's a better than 2:1 improvement using direct ASM code over the compiled code, and that is not an insignificant fraction.
But on this point, I think it may depend on what you're doing, no? I'm not sure this represents a general case that applies to all code. The compiler may come closer on some kinds of work than on others. And of course there's always room for improvement as the compilers mature.
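For reference, here's a quick sketch of the arithmetic behind the "better than 2:1" figure, using only the 9 and 19 GFLOPS numbers quoted earlier in the thread. The variable names are just mine for illustration, and this says nothing about workloads other than that one FFT benchmark.

```python
# Figures quoted from the chart earlier in the thread (GFLOPS).
compiled_gflops = 9.0   # optimized xlc compile of the general FFT library
asm_gflops = 19.0       # hand-written assembly

# How many times faster the assembly version is than the compiled one.
speedup = asm_gflops / compiled_gflops

# Fraction of the assembly throughput that the compiled code reaches.
compiled_share = compiled_gflops / asm_gflops

print(f"assembly is {speedup:.2f}x the compiled code")
print(f"compiled code reaches {compiled_share:.0%} of assembly throughput")
```

So the compiled library lands at a bit under half of the hand-tuned throughput for this one case.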