Yes, G92b will not be replaced before 40nm; the only GPUs to be replaced between now and 2009 by new GT2xx are GT200 (via GT200b) and G98/G86 (via GT206).
I figured as much. Like I said before, aside from RV770, G92 is the greatest perf/$ GPU ever made given today's game workloads, once PCB and RAM cost are factored in. I don't think it's possible for NVidia to do much better at that die size with their current 8-wide SIMD architecture, and thus it makes no sense to create a GT200 derivative to replace it.
That RV770 beats it on perf/$ by ~20% is unreal when you look back at the R600 tech it's based on.
BTW, do you know of any good die shots of G92/G92b that let us see areas like the RV770 and GT200 shots do?
Heh, I probably shouldn't comment on this, but I said 2 quarters and I meant it.
I don't doubt this at all, but I don't think it hurt NVidia much. Given the prices that the 9800 was selling at, what would they have done with GT200 two quarters earlier? Price it at $800 to a few morons? They already owned the $200+ market, and G94 was as fast as RV670 while being cheaper to make.
As for the precise chips in the 40nm generation, I'm not sure, but it's important to keep in mind that the greatest difficulty might be not to get pad-limited. Just look at RV770... At this rate, eDRAM might just turn out to be used in mainstream chips to prevent being pad-limited! (I can dream, can't I?)
This is why I doubt we'll see 256-bit in the console space next gen, and eDRAM is here to stay. Sony and MS will want their chips to be <100 mm² by the end of the generation.
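To put rough numbers on the pad-limit point, here's a back-of-envelope sketch in Python. It assumes a simple peripheral pad ring on a square die, which is a simplification (flip-chip parts spread bumps over the die area), and the pad count and pitch in the usage line are made-up illustrative figures, not data for any real chip. `min_die_area_mm2` is a hypothetical helper name.

```python
def min_die_area_mm2(pad_count, pad_pitch_um, rows=1):
    """Smallest square die (in mm^2) whose perimeter fits all the pads.

    Assumes pads sit in `rows` concentric rings along the die edge and
    ignores corner keep-outs, so this is a lower bound at best.
    """
    # Total edge length needed for one ring of pads, in mm.
    perimeter_mm = pad_count * pad_pitch_um / 1000.0 / rows
    side_mm = perimeter_mm / 4.0
    return side_mm ** 2


# Illustrative only: 400 pads at 100 um pitch already need a 10 mm x 10 mm die.
print(min_die_area_mm2(400, 100))
```

The point is the scaling: logic shrinks with each node, but the area floor set by the pads grows with the square of the pad count, so a wide external bus puts a hard lower limit on how small a shrink can get.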
I think the next step for GPU makers is to increase setup speed, particularly with cascaded and omnidirectional shadow maps. BW is hitting a wall and limiting ROP speed, and math/texturing can only do so much. There are geometry reduction techniques, but it'll take a while before they really chop down polygon count.
The special-function change was probably more about a hardware engineer nearly passing out from laughter after seeing that the spec supported fixed-function hardware for FP64-level SIN/COS/... - there's honestly not much point in even theoretically supporting that IMO!
Wouldn't log2 and exp2 still be quite useful?
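For what it's worth, a quick Python sketch of why those two carry so much weight: pow, exp and log can all be expressed through them. This is just the math identity, not a claim about how any particular GPU implements it. The function names are made up for illustration, and the pow version only handles x > 0.

```python
import math

LOG2_E = math.log2(math.e)  # log2(e), used to turn exp() into exp2()
LN_2 = math.log(2.0)        # ln(2), used to turn log2() into log()


def exp2(t):
    # Stand-in for a hardware exp2 unit.
    return 2.0 ** t


def pow_via_log2_exp2(x, y):
    # x^y = 2^(y * log2(x)), valid for x > 0 only.
    return exp2(y * math.log2(x))


def exp_via_exp2(x):
    # e^x = 2^(x * log2(e))
    return exp2(x * LOG2_E)


def log_via_log2(x):
    # ln(x) = log2(x) * ln(2)
    return math.log2(x) * LN_2


print(pow_via_log2_exp2(3.0, 2.5), 3.0 ** 2.5)
```

So even if SIN/COS at FP64 precision get dropped, keeping log2 and exp2 covers a whole family of functions with one multiply on top.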