If NVidia has any sense then they'll release a 24 pipe version of the 6800Ultra, at roughly the same clockspeeds.
The 6800GT with its 16 pipes seems to have outsold the X800Pro with its 12 pipes (3:1? in retail, ignoring OEMs/systems integrators) for six reasons:
1. Doom 3
2. SM3
3. 16 pipes
4. slight performance lead in some DX9 titles (before HL-2 hit the streets)
5. overclockability, without the need to mod
6. slight availability advantage?
Of these reasons, I dare say that 3 is the dominant factor since it implies 1 and 4. The average enthusiast sees that it's a battle between 16 pipes and 12 pipes and picks the former. It's obvious, innit.
It's so obvious that it's the only one of these six reasons that's never even argued down by fanatics, even if the 12 pipes in an X800 Pro are faster in the toughest parts of HL-2 than the 16 pipes of a 6800Ultra.
Sadly, NVidia will prolly think that SM3 was the real reason. But even if the PR department thinks that, they should recognise that the number of pixel shader pipelines is the next statistic that most enthusiasts look at.
The big question seems to be: which process will NVidia be using? If they can get 90nm then they're home and dry with the increased transistor count of a 24-pipe part.
It'll be NVidia's 24 pipes plus 6 vertex shaders versus ATI's R520 hybrid of 16 pipes plus 8 vertex shaders (the vertex shaders are unified, so they can also perform as 8 pixel shaders). In the past I've suggested that the 8 unified pipes will be arranged in quads, so that R520 can operate as 24-0 or 20-4 or 16-8 (pixel shader - vertex shader count). In other words, the unification is quite coarse-grained.
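To make the coarse-grained idea concrete, here's a minimal sketch that enumerates the splits if the 8 unified pipes can only switch roles a quad at a time. The function name and parameters are made up for illustration, and the whole thing rests on the speculated 16 dedicated + 8 unified layout, not on anything ATI has confirmed.

```python
# Hypothetical model of the speculated R520 layout: 16 dedicated pixel
# pipes plus 8 unified pipes that switch role in groups of 4 (quads).
def quad_configs(dedicated_pixel=16, unified=8, quad=4):
    """Return (pixel, vertex) pipe counts for every quad-granular split."""
    configs = []
    # Step through the number of unified pipes doing vertex work,
    # one quad at a time: 0, 4, 8.
    for vertex in range(0, unified + 1, quad):
        pixel = dedicated_pixel + (unified - vertex)
        configs.append((pixel, vertex))
    return configs

for pixel, vertex in quad_configs():
    print(f"{pixel}-{vertex}")
```

With quad=4 that gives only three operating points, which is what I mean by coarse-grained. Setting quad=1 in the same sketch would give nine, which is closer to what a fully-unified part would offer.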
NVidia will rubbish the 8 unified pipelines, saying that R520 is really just a 16-pipe part. They'll be saying this till the cows come home, i.e. until they're forced to go with a fully-unified architecture some time in mid-2007, as a catch-up to the fully-unified WGF-busting R600.
Personally I wonder how much a 16+8-unified pipe card will actually benefit from the 8 unified pipes. For practical purposes I wouldn't be surprised if R520 performs on average like an 18 pixel shader card with 6 vertex shaders. It'll only be competitive with a 24-pipe NVidia card due to a 20%-ish faster clock. Oh, and the new AA algorithm that's been rumoured. How much will that ease bandwidth pressures? Will it be a part of R520?
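As a back-of-envelope check on that, here's a crude sketch that treats pixel throughput as nothing more than pipes times relative clock. That's a big assumption: it ignores bandwidth, per-pipe efficiency, the rumoured AA algorithm and everything else, and the 18-pipe and 20% figures are just my guesses from above. The function names are made up for illustration.

```python
# Crude model: relative pixel throughput = effective pipes * relative clock.
# All inputs are guesses; nothing here is measured.
def relative_throughput(effective_pipes, relative_clock):
    return effective_pipes * relative_clock

def clock_for_parity(own_pipes, rival_pipes, rival_clock=1.0):
    """Relative clock needed to match the rival under the same crude model."""
    return rival_pipes * rival_clock / own_pipes

nv = relative_throughput(24, 1.0)    # 24-pipe NVidia part at baseline clock
ati = relative_throughput(18, 1.2)   # R520 as "18 effective pipes", 20% faster

print(f"R520 vs 24-pipe part: {ati / nv:.0%}")
print(f"Clock needed for parity: {clock_for_parity(18, 24):.2f}x")
```

On those numbers the 20% clock advantage gets R520 to roughly nine-tenths of the 24-pipe part, and it would need about a third more clock to draw level on pipes alone. Close enough to call competitive, but it's the clock doing the work.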
If NVidia is going with 110nm then I suppose they can eke out an extra 100MHz. 6600GTs are hitting 550MHz core on overclock as far as I can tell, so 520MHz for a die with nearly twice as many transistors could be doable. I suppose that's the least risky approach for NVidia. Bit boring though.
Jawed