Looking at that picture of the CU, and reading the leak info, it seems there is some mangling of the details, probably from English not being the writer's first language.
Who is sweetvar, and does any of the sweetvar info match up with this VGLeaks info? The "Scalar ALU's 320" doesn't make a lot of sense either (looking at the CU diagram), and is either a case of misunderstanding specs, or another confusion with English.
Edit:
Reading this link on Anandtech that describes what the Scalar ALU's do, it seems like adding an extra one to the CU could be pretty beneficial if the CU is used for compute tasks.
Who is sweetvar, and does any of the sweetvar info match up with this VGLeaks info? The "Scalar ALU's 320" doesn't make a lot of sense either (looking at the CU diagram), and is either a case of misunderstanding specs, or another confusion with English.
Edit:
Reading this link on Anandtech that describes what the Scalar ALU's do, it seems like adding an extra one to the CU could be pretty beneficial if the CU is used for compute tasks.
- http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute/4So what does a scalar unit do? First and foremost it executes “one-off” mathematical operations. Whole groups of pixels/values go through the vector units together, but independent operations go to the scalar unit as to not waste valuable SIMD time. This includes everything from simple integer operations to control flow operations like conditional branches (if/else) and jumps, and in certain cases read-only memory operations from a dedicated scalar L1 cache. Overall the scalar unit can execute one instruction per cycle, which means it can complete 4 instructions over the period of time it takes for one wavefront to be completed on a SIMD.
Conceptually this blurs a bit more of the remaining line between a scalar GPU and a vector GPU, but by having both types of units it means that each unit type can work on the operations best suited for it. Besides avoiding feeding SIMDs non-vectorized datasets, this will also improve the latency for control flow operations, where Cayman had a rather nasty 44 cycle latency.