With the amount of spatial coherence typical of graphics workloads, SIMD4 seems like a strange choice. Imagine the amount of control logic required in a 32-core GPU! I don't think packing quads into a wavefront is a problem - only the last wavefront would be partially empty.
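To put a rough number on the "only the last wavefront is partially empty" point, here is a minimal sketch. It assumes a 64-lane wavefront and that quads from one draw can be packed back to back with no other constraints; `pack_quads` and its parameters are made-up names for illustration, not any real driver or hardware interface.

```python
import math

QUAD_SIZE = 4  # a quad is a 2x2 block of pixels, so it fills 4 lanes


def pack_quads(num_quads, wavefront_size=64):
    """Return (wavefronts_needed, idle_lanes) when quads are packed back to back.

    Assumption: wavefront_size is a multiple of QUAD_SIZE, so a quad never
    straddles two wavefronts and only the final wavefront can have idle lanes.
    """
    lanes_needed = num_quads * QUAD_SIZE
    wavefronts = math.ceil(lanes_needed / wavefront_size)
    idle_lanes = wavefronts * wavefront_size - lanes_needed
    return wavefronts, idle_lanes


# 100 quads need 400 lanes: 6 full wavefronts plus one with 16 lanes used
wavefronts, idle = pack_quads(100)
print(wavefronts, idle)
```

Under these assumptions the waste is bounded by one wavefront minus one quad, no matter how many quads are in flight.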
Of course it will be stripped down to make sure all IHVs (including mobile) are able to support it. Nonetheless, AMD seems to be the biggest contributor to Vulkan. Having a common, efficient API across all platforms, with suitable vendor-specific extensions, would be great for gaming in general.
Wow, such a big difference between the peak ALU and fillrate tests (and also memory bandwidth), but very comparable perf overall in Manhattan and T-Rex. From what I know, the peak ALU test doesn't utilize the vec4 dot product unit, which decreases its number by 7 flops per cycle per ALU pipe. Still it...
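For anyone wondering where the 7 comes from: a 4-component dot product is 4 multiplies plus 3 adds. A tiny sketch of that count follows; `dot_product_flops` is just an illustrative helper, and it counts multiplies and adds separately rather than as fused operations.

```python
def dot_product_flops(components):
    """Flops in an n-component dot product: n multiplies and n - 1 adds."""
    multiplies = components
    adds = components - 1
    return multiplies + adds


# vec4 dot product: a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w
print(dot_product_flops(4))
```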
Sorry, no games here except CS1.6.
Btw, this card has a peak bandwidth of 76GB/s compared to the 4870's 115GB/s, but higher peak flops (1.36 Tflops). But this is the card to get, as it supports OpenCL and CS 5.0 too. :smile: