Anyway, trying to be objective: could ATI's internal instruction set be very close to ps 2.0, so that the optimizing compiler only has to correct the programmer's 'stupidities'? For example, it could reorder instructions to break dependencies, detect when fewer registers can be used, or notice when a copy is not required. In that case I can see the need for it, but it would still be an unfair comparison against the advanced optimizing compiler Nvidia had to write, which will never be truly optimal.
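
To make the "copy is not required" case concrete, here is a rough sketch of the kind of cleanup I mean. It is a toy model in Python, not anything ATI or Nvidia actually ship: the tuple-based instruction format, the `eliminate_copies` and `temps_used` helpers, and the sample shader are all made up for illustration, and it assumes straight-line code in which every register is written at most once.

```python
def eliminate_copies(program, outputs=("oC0",)):
    """Drop `mov dst, src` and rewrite later reads of dst to read src.

    Hypothetical helper. Assumes straight-line code where every register
    is written at most once, so an alias never becomes stale.
    """
    alias = {}   # copied register -> register it was copied from
    result = []
    for op, dst, *srcs in program:
        # Read through any copies recorded so far.
        srcs = [alias.get(s, s) for s in srcs]
        if op == "mov" and dst not in outputs:
            alias[dst] = srcs[0]   # remember the copy, emit nothing
            continue
        result.append((op, dst, *srcs))
    return result


def temps_used(program):
    """Set of temporary registers (r0, r1, ...) written by the program."""
    return {dst for _, dst, *_ in program if dst.startswith("r")}


# A deliberately sloppy ps 2.0-style shader, written as (op, dst, srcs...).
program = [
    ("texld", "r0", "t0", "s0"),
    ("mov", "r1", "r0"),          # needless copy
    ("mul", "r2", "r1", "c0"),
    ("mov", "r3", "r2"),          # needless copy
    ("add", "r4", "r3", "r1"),
    ("mov", "oC0", "r4"),         # write to the output register is kept
]

optimized = eliminate_copies(program)
for instr in optimized:
    print(instr)
print(len(temps_used(program)), "->", len(temps_used(optimized)), "temporaries")
```

On this sample the two needless `mov` instructions disappear and the shader writes three temporaries instead of five. A pass like this is cheap when the hardware instruction set already resembles the source; it says nothing about the much harder job of mapping ps 2.0 onto a very different architecture.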