I don't know how you came up with these numbers. First of all, the unit counts of G80 and R600 aren't even directly comparable due to differences in clocking, layout, etc.
How does 64*4 = 320? The chip doesn't work the way you seem to think.
G80's ALUs are scalar while R600's are grouped five wide, but R600 still has plenty of math power for most tasks.
The problem is how ATI is advertising the HD 2900 XT. There are not actually 320 independent stream processors on that chip; there are only 64 real processors, but each is capable of 5 operations per shader clock. The 320 individual stream processing units in R600 are arranged in 4 SIMD arrays of 80 units each, and each functional unit is a 5-way superscalar shader processor. First, most of the stream processors are simpler and aren't capable of special function operations. For every block of five stream processors, only one can handle a special function operation in addition to regular floating point operations. The special function stream processor is also the only one able to handle integer multiply, while the others can perform simpler integer operations. Second, all five stream processors in a block must run instructions from the same thread.
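To make the arithmetic concrete, here is a rough sketch in Python. The 16 units per SIMD array is just 64 divided by 4, and `slot_utilization` is a hypothetical helper I made up to show what happens when a single thread can't supply five independent operations; it is a simplification, not a model of ATI's actual scheduler.

```python
# Unit counts taken from the description above.
SIMD_ARRAYS = 4
UNITS_PER_ARRAY = 16   # derived: 64 shader units / 4 SIMD arrays
ALUS_PER_UNIT = 5      # each shader unit is 5-wide


def r600_stream_processors():
    """Return (real shader units, advertised stream processors)."""
    units = SIMD_ARRAYS * UNITS_PER_ARRAY
    return units, units * ALUS_PER_UNIT


def slot_utilization(independent_ops):
    """Fraction of the 5 ALU slots filled in one issue.

    independent_ops: how many independent scalar operations one thread
    can offer at that point (hypothetical input, for illustration only).
    """
    filled = min(max(independent_ops, 0), ALUS_PER_UNIT)
    return filled / ALUS_PER_UNIT


if __name__ == "__main__":
    print(r600_stream_processors())
    for n in (1, 3, 4, 5):
        print(n, slot_utilization(n))
```

So the 320 figure only holds when every unit finds five independent operations in one thread. A shader that offers 3 at a time leaves 2 of the 5 slots idle.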
Although the unified shader concept is similar between the two cores, the way they go about presenting this functionality is a bit different. Whereas the G80 has 128 aptly-named Unified Shaders, the R600 has 320 Stream Processors. Clearly 320 is a bigger number than 128, but as we know in the hardware world, bigger numbers don't always mean something is better. The fact of the matter is that Stream Processors are different from Unified Shaders. ATI's Stream Processors are an integral part of the superscalar architecture implemented on the R600. There are 320 of them on the R600, but some are standard ALUs and some are special-function ALUs.
In contrast, NVIDIA's G80 has up to 8 groups of 16 (128 total) fully generalized, fully decoupled, scalar stream processors, and keep in mind that the SPs in G80 run in a separate clock domain and can be clocked as high as 1.5GHz. In ATI's R600, each functional SP unit can handle 5 scalar floating point MAD instructions per clock, and one of the five ALUs can also handle transcendentals. Each shader processor also has a branch execution unit that handles flow control and conditional operations, and a number of general purpose registers to store input data, temporary values, and output data.
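As a back-of-the-envelope comparison, peak MAD throughput is just ALU count times shader clock times 2 (a MAD counts as a multiply plus an add). This is only a sketch: the 1.5GHz G80 clock is the upper figure mentioned above, while the 0.74GHz R600 clock and the `filled_slots` parameter are my own assumptions for illustration, not numbers from this thread.

```python
FLOPS_PER_MAD = 2  # one multiply + one add


def peak_mad_gflops(scalar_alus, shader_clock_ghz):
    """Theoretical peak, assuming every ALU issues a MAD every clock."""
    return scalar_alus * shader_clock_ghz * FLOPS_PER_MAD


def r600_effective_gflops(shader_clock_ghz, filled_slots):
    """R600 throughput if only `filled_slots` of each unit's 5 ALUs are busy."""
    filled = min(max(filled_slots, 0), 5)
    return peak_mad_gflops(320, shader_clock_ghz) * filled / 5


G80_CLOCK_GHZ = 1.5    # "as high as 1.5GHz" per the post
R600_CLOCK_GHZ = 0.74  # assumed value, not stated in the post

if __name__ == "__main__":
    print("G80 peak:", peak_mad_gflops(128, G80_CLOCK_GHZ))
    print("R600 peak:", peak_mad_gflops(320, R600_CLOCK_GHZ))
    for slots in (2, 3, 4, 5):
        print("R600 with", slots, "slots filled:",
              r600_effective_gflops(R600_CLOCK_GHZ, slots))
```

Under these assumptions R600 wins on paper with all five slots busy, but it can fall behind G80 once a couple of slots per unit sit idle. That is the whole point about 320 versus 128 not being directly comparable.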
The TMUs and ROPs hold R600 back because there wasn't much space left for them on the 80nm process. The chip uses a lot of transistors, which increases its size and complexity. Wafers are fixed in size, so a chip with lots of transistors takes up more space and you can't make as many of them from one wafer.
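To put rough numbers on the wafer point, here is a sketch using a common dies-per-wafer approximation (wafer area divided by die area, minus a term for partial dies lost at the edge). The 300mm wafer and the 200 and 400 mm² die areas are made-up illustrative values, not actual R600 or G80 figures, and the formula ignores defects and yield.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate count of whole dies that fit on a round wafer.

    Uses the usual estimate: pi*(d/2)^2 / A  -  pi*d / sqrt(2*A).
    Ignores defects, scribe lines, and yield.
    """
    d = wafer_diameter_mm
    a = die_area_mm2
    gross = math.pi * (d / 2) ** 2 / a
    edge_loss = math.pi * d / math.sqrt(2 * a)
    return max(0, math.floor(gross - edge_loss))


if __name__ == "__main__":
    # Hypothetical die sizes, for illustration only.
    for area in (200, 400):
        print(area, "mm^2 ->", dies_per_wafer(300, area), "dies")
```

Doubling the die area cuts the number of candidate dies by more than half, because a bigger die also wastes more of the wafer edge.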