OpenGL guy: No problem, I appreciate your response. My point is that the HD4890, despite having almost 3 times the ALU power, is only 1.6 times faster on average. Maybe slightly more if we skip HL2 as a CPU-limited game, or maybe slightly less if we skip COD5, which could be affected by an R6xx-related driver bug (at least, in an interview at the R600 launch, Eric Demers mentioned that any game which runs slower on R600 than on R580, even with MSAA enabled, is more likely affected by a driver issue than by a hardware limit).
Anyway, if a GPU with 3 times the ALU power performs only about 1.6 times faster in real-world situations, something has to be causing it. The first reason could be the number of ROPs, which is the same for both R600 and RV770. (I chose the non-MSAA results for comparison to avoid the impact of the broken resolve hardware.) The second reason could be the different TMUs, which are more capable on R600. I'd like to know which feature has more impact in these games: the better FP16 performance or the additional point sampling units.
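To put a rough number on that gap, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simple Amdahl-style model that is my own simplification, not anything measured: frame time splits into one part that scales perfectly with ALU throughput and one part (ROPs, TMUs, CPU, everything else) that does not scale at all. The function names are made up for illustration, and the model ignores clock-speed differences between the two chips.

```python
def alu_bound_fraction(alu_speedup: float, observed_speedup: float) -> float:
    """Fraction of the original frame time that must have been ALU-bound.

    Assumed model (a simplification): new_time = (1 - f) + f / alu_speedup,
    with the old frame time normalised to 1. Solving for f gives the
    expression below.
    """
    if alu_speedup <= 1.0:
        raise ValueError("alu_speedup must be greater than 1")
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / alu_speedup)


def speedup_ceiling(fraction_alu_bound: float) -> float:
    """Best possible speedup under the same model if ALU work became free."""
    return 1.0 / (1.0 - fraction_alu_bound)


# Figures quoted in the post: about 3x the ALU power, about 1.6x the performance.
f = alu_bound_fraction(alu_speedup=3.0, observed_speedup=1.6)
print(f"ALU-bound share of the old frame time: {f:.4f}")
print(f"Ceiling from adding ALU power alone:   {speedup_ceiling(f):.2f}x")
```

Under these assumptions only a little over half of the frame time would have been ALU-bound, so adding ALU power alone tops out at a bit over 2x. It is only a sanity check on the argument, since real workloads overlap ALU, texture, and ROP work.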
I believe that more capable TMUs and/or a higher number of ROPs could boost R7xx performance significantly. Because of that, I think R8xx will bring at least one of these changes.
Davros: 8-bit per component?