RoOoBo said:
But compilers are not magical beings that produce the perfect sequence of code. That is why in some cases you still program in assembly (even on RISC machines). You may not want to optimize all your programs (or shaders, in this case), but for the ones that are critical for performance you certainly should, even across different architectures.
No, compilers cannot compile perfectly optimally, particularly in a runtime environment.
However, there are significant advantages. It is for this reason that I fully support 3D Labs' approach to high-level shading languages: standardize the HLSL, and let each hardware developer design the machine and its compiler.
The main reason is simply this: video hardware is changing at a breakneck rate. If we get bogged down in a standardized instruction set, then that instruction set will hold progress back, just as has happened with the x86 architecture. While it is true that you can, for example, squeeze a little bit more performance out of x86 by going straight to the assembly, the truth is that our processors would be running one heck of a lot faster if the HLLs had been standardized instead of the processor instruction set.
One other example: what would you rather have in three years, a 1GHz GPU running on an equivalent of the x86 instruction set, or a 1GHz GPU running on an equivalent of a RISC instruction set? Which would be faster? Obviously the more advanced one would be.
I'm really hoping that DX10 takes a "hands off" approach to assembly programming, and goes all HLSL. I also hope that 3DLabs' proposal to standardize the HLSL, not the assembly, goes through for OpenGL 2.0.
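The argument above can be sketched in miniature. In this toy example (my own illustration, not any real shader compiler), the same high-level multiply-add is lowered by two hypothetical vendor backends; because only the source language is standardized, each vendor is free to pick, and later change, its own instruction set:

```python
# Toy sketch of "standardize the language, not the ISA".
# Two hypothetical vendor backends lower the same high-level
# operation dst = a * b + c to completely different machine code.

def compile_madd(dst, a, b, c, target):
    """Lower dst = a * b + c for a given toy target ISA."""
    if target == "vendor_A":
        # This vendor's hardware has a fused multiply-add.
        return [f"FMA {dst}, {a}, {b}, {c}"]
    elif target == "vendor_B":
        # This vendor only has separate multiply and add units.
        return [f"MUL tmp, {a}, {b}",
                f"ADD {dst}, tmp, {c}"]
    raise ValueError(f"unknown target: {target}")

# Same source expression, two different machine programs:
print(compile_madd("r0", "r1", "r2", "r3", "vendor_A"))
print(compile_madd("r0", "r1", "r2", "r3", "vendor_B"))
```

If the assembly had been standardized instead, vendor A's fused unit would be invisible to programs written against vendor B's two-instruction sequence.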
The process of translating x86 instructions into the internal micro-ops that both Intel and AMD have used since the Pentium Pro era is not a problem at all.
Yes, it is a problem. Particularly for a GPU: having to decode a legacy instruction set would cost far more precious transistors, and on a GPU those transistors could be put to much better use than the same transistors in a CPU. And, just as you stated before, a compiler can't be quite as optimal as programming right at the assembly level. Don't you think that the internal translator in those CPUs reduces performance?
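To make the point concrete, here is a deliberately simplified toy decoder (nothing like the real Pentium Pro front end): a single CISC-style instruction with a memory operand gets split into several RISC-like micro-ops, and that translation is extra work the front end must perform on every instruction:

```python
# Toy illustration of CISC-to-micro-op translation.
# A memory-operand instruction like "ADD eax [ebx]" becomes a
# load micro-op followed by a register-only ALU micro-op.

def decode(insn):
    """Split a toy 'OP dst src' instruction into micro-ops."""
    op, dst, src = insn.split()
    if src.startswith("["):
        # Memory operand: must load the value first.
        return [f"LOAD tmp, {src}",
                f"{op} {dst}, tmp"]
    # Register-only form passes through as a single micro-op.
    return [insn]

print(decode("ADD eax [ebx]"))  # expands to two micro-ops
print(decode("ADD eax ecx"))    # stays a single micro-op
```

The transistors that implement this kind of translation layer are exactly the ones the post above argues a GPU could spend on execution units instead.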