>>But if you implemented two 128bit buses and a memory controller that could interleave the flow, wouldn't that give you a theoretical 32Gb\sec bandwidth?<<
I think what you're talking about is sorta what DDR does. DDR is, more accurately, twice the bus width multiplexed onto half the pins... so a 300 MHz, 128 bit DDR bus would be essentially the same as a 300 MHz, 256 bit SDR bus. The memory presents the lower 128 bits of each 256 bit chunk on the first half of the clock cycle, and the upper 128 bits on the second half. That's why DDR is somewhat less efficient than SDR: the smallest chunk you can read doubles, so you run into the same granularity problems you get with a wider bus. That's why NVIDIA split their 128 bit bus into 4 32 bit buses... that way you can read 4 independent 64 bit chunks every cycle rather than 1 256 bit chunk. Since all the bits in a chunk have to be linearly adjacent in memory, and not all the data you need in a given clock cycle *is* adjacent, this can help a lot.
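To put a rough number on the granularity point, here's a minimal sketch. It's a toy model, not how any real controller works: the function name `bytes_fetched` and the request list are made up for illustration, and it just assumes every read gets rounded out to whole aligned chunks of the access size (32 bytes for one 256 bit chunk, 8 bytes for a 64 bit chunk).

```python
def bytes_fetched(requests, granule_bytes):
    """Total bytes pulled from memory when every request is rounded
    out to whole, aligned chunks of granule_bytes (toy model)."""
    total = 0
    for addr, size in requests:
        first = addr // granule_bytes               # first chunk touched
        last = (addr + size - 1) // granule_bytes   # last chunk touched
        total += (last - first + 1) * granule_bytes
    return total


# Hypothetical scattered reads: (byte address, size in bytes)
requests = [(0, 8), (100, 4), (4096, 16)]
useful = sum(size for _, size in requests)

wide = bytes_fetched(requests, 32)   # one 256 bit chunk per access
narrow = bytes_fetched(requests, 8)  # 64 bit chunks per access

print(f"useful bytes: {useful}")
print(f"256 bit chunks: {wide} bytes fetched, efficiency {useful / wide:.2f}")
print(f"64 bit chunks:  {narrow} bytes fetched, efficiency {useful / narrow:.2f}")
```

With scattered reads like these, the wide chunks drag in a lot of data nobody asked for, while the narrow chunks waste much less. If the reads were all long and sequential, the two would come out about the same.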
DDR2 doubles the data rate once more, and, I'm assuming, doubles the effective bus width once more as well... at the aforementioned 300 MHz clock rate, SDR would have a transfer rate of 300 megabits per second per pin, DDR would have 600 Mbps/pin, and DDR2, if that assumption holds, 1200.
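The arithmetic behind those figures can be sketched like this. The function names are just for illustration, and the 4 transfers per clock for DDR2 is the assumption made above, not a confirmed spec figure.

```python
def per_pin_mbps(clock_mhz, transfers_per_clock):
    """Megabits per second carried by a single data pin."""
    return clock_mhz * transfers_per_clock


def peak_bandwidth_gb_s(clock_mhz, bus_width_bits, transfers_per_clock):
    """Theoretical peak bandwidth in gigabytes per second (10^9 bytes)."""
    bits_per_sec = clock_mhz * 1e6 * transfers_per_clock * bus_width_bits
    return bits_per_sec / 8 / 1e9


# Transfers per clock per pin. The DDR2 value is an ASSUMPTION
# taken from the post, not a verified number.
SDR, DDR, DDR2 = 1, 2, 4

for name, rate in (("SDR", SDR), ("DDR", DDR), ("DDR2", DDR2)):
    print(f"{name}: {per_pin_mbps(300, rate)} Mbps/pin, "
          f"{peak_bandwidth_gb_s(300, 128, rate):.1f} GB/s on a 128 bit bus")
```

Under those assumptions a 300 MHz, 128 bit DDR bus and a 300 MHz, 256 bit SDR bus both come out to 9.6 GB/s peak, and a 128 bit DDR2 bus matches a 256 bit DDR bus at 19.2 GB/s. These are raw peak numbers only; they say nothing about how much of that you actually get.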
So, at equal clock speed, DDR2 on a 128 bit bus is just as fast as DDR on a 256 bit bus, except that you can lose some efficiency to the coarser granularity. If NVIDIA sticks to their 4x 32 bit bus controller and ATI has their 4x 64 bit bus, and everything else is equal, they could have about the same raw throughput. However, DDR2 will be difficult to clock as high as current DDR modules because timing constraints are much, much tighter (multiplexing 4 signals across a pin per clock cycle instead of just 2).
If ATI's memory controller is capable of using DDR2 on a 256 bit bus (it's possible it's the same sort of thing as the GeForce2 MX's, where the halved memory controller was capable of either a 128 bit SDR or a 64 bit DDR bus), and they use that capability in the near future, the NV30 can't come close in raw memory throughput.
Actual effective throughput is dependent on a lot more than just the speed of the memory, however... if NVIDIA's memory architecture has a bit less raw bandwidth, it could make up for it in efficiency... or the other way around, if ATI is more efficient. NVIDIA is probably extending data compression to the entire framebuffer on the NV30, though, much like the NV20 through NV25 compress the Z-buffer. Since there's a lot of redundant data stored with MSAA, this will likely be *very* effective in increasing antialiasing performance.