Beyond3D's GT200 GPU and Architecture Analysis

If ATI releases an RV770 X2 (note: I don't expect 4870 clocks), I don't think Nvidia will let that go without releasing something to compete. Is that at all feasible on 65nm GT200? If not, how far behind is a 55nm GT200? Could they clock a single 55nm GT200 high enough, or would they go dual-GPU with that part?
 
If ATI releases an RV770 X2 (note: I don't expect 4870 clocks), I don't think Nvidia will let that go without releasing something to compete. Is that at all feasible on 65nm GT200? If not, how far behind is a 55nm GT200? Could they clock a single 55nm GT200 high enough, or would they go dual-GPU with that part?
The problem is that even with a 55nm shrink, they'd have a hard time fitting a dual-chip card into the PCI Express specs.
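Presumably the binding constraint is board power. A rough back-of-envelope using the PCI Express board power limits and the GTX 280's published TDP; the 55nm power savings figure is purely an assumption:

```python
# Sketch of the dual-chip power problem (55nm savings are assumed, not known).
# PCI Express allows 75 W from the slot, 75 W from a 6-pin and 150 W from an
# 8-pin connector, so a single board tops out at 300 W.
PCIE_BOARD_LIMIT_W = 75 + 75 + 150

GTX280_TDP_W = 236                      # single 65nm GT200 board
dual_65nm = 2 * GTX280_TDP_W            # naive doubling: 472 W, hopeless
dual_55nm = 2 * GTX280_TDP_W * 0.75     # assume an optimistic 25% cut from 55nm + lower volts
print(dual_65nm, dual_55nm, "vs limit", PCIE_BOARD_LIMIT_W)   # 472 354.0 vs limit 300
```

Even with generous assumptions, a straight dual-GT200 board lands well over the 300 W ceiling, which is why a cut-down or down-clocked configuration would be needed.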
 
Well, Nvidia mentioned shader clocks as being one of their disappointments with the GTX 280.

I can't imagine the card being primarily bound by anything other than shader throughput, so they will want to scale the shader clock as high as possible. They'd probably start to run into heat issues at about 1.8 GHz, though, I would think.

If they could get the shader clock up to ~1.7+ GHz and the price down to $499-549, they would probably have a competitive product (close to 4870 CF performance, single-chip solution, cheaper).
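For a sense of what those shader clocks would mean in raw numbers, a quick sketch (counting the theoretical co-issued MUL, which is rarely achieved in practice):

```python
# GT200 peak shader throughput vs hot clock: 240 SPs, MAD + MUL = 3 FLOPs/clock.
def gt200_gflops(shader_clock_ghz, sps=240, flops_per_clock=3):
    return sps * flops_per_clock * shader_clock_ghz

print(gt200_gflops(1.296))   # ~933 GFLOPS -- stock GTX 280
print(gt200_gflops(1.7))     # ~1224 GFLOPS -- the hoped-for 55nm clocks
print(gt200_gflops(1.8))     # ~1296 GFLOPS -- around the suggested thermal ceiling
```

A ~1.7 GHz part would be roughly 30% up on a stock GTX 280 in shader-bound cases, which is presumably why that figure gets mentioned as the threshold for approaching 4870 CF.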
 
If I had to put money on it, I'd say NV will release GTX 270 and 290 models based on GT200b and discontinue GTX 260 and 280 ASAP.
 
If I had to put money on it, I'd say NV will release GTX 270 and 290 models based on GT200b and discontinue GTX 260 and 280 ASAP.
No reason for them to confuse the retail channel with GT200b refreshes.
I doubt that GT200b will be so much faster or better than GT200 as to deserve its own place in the line-up.
They'll probably just switch GT200s for GT200bs in the 260/280, lower their prices a bit and present something like a GTX 290 as their top-end solution -- the latter might actually be introduced on GT200 and then switched to GT200b as well.
 
No reason for them to confuse the retail channel with GT200b refreshes.
I doubt that GT200b will be so much faster or better than GT200 as to deserve its own place in the line-up.
They'll probably just switch GT200s for GT200bs in the 260/280, lower their prices a bit and present something like a GTX 290 as their top-end solution -- the latter might actually be introduced on GT200 and then switched to GT200b as well.

Well, it can't be ruled out entirely but if there's performance to be had by increasing clockspeeds then why not do it? Short of affecting yields, of course.
 
Well, it can't be ruled out entirely but if there's performance to be had by increasing clockspeeds then why not do it? Short of affecting yields, of course.

Yields, TDP and thus board cost could all be affected by raising the clockspeed, so you can aim to be more competitive on either price or performance; they would obviously aim for some sweet spot based on yields.

Right now there doesn't seem to be much reason to bump up performance on the 280, but an R700 could change that. The 260, however, either needs to be much cheaper or to perform better, because it's in no man's land at the moment.
 
Well, it can't be ruled out entirely but if there's performance to be had by increasing clockspeeds then why not do it? Short of affecting yields, of course.
They'll do it, but not by upgrading the 260/280.
The 260 is suffering from a poor price/performance ratio, and GT200b would be a nice fit for it.
The 280 will have to come down in price too because of the new top-end card that they'll make to counter R700.
So it seems reasonable to me for them to just swap GT200s for GT200bs in the current GTX line, lower their prices and introduce a new "Ultra" based on GT200b with maxed-out clocks.
 
They'll do it, but not by upgrading the 260/280.
The 260 is suffering from a poor price/performance ratio, and GT200b would be a nice fit for it.
The 280 will have to come down in price too because of the new top-end card that they'll make to counter R700.
So it seems reasonable to me for them to just swap GT200s for GT200bs in the current GTX line, lower their prices and introduce a new "Ultra" based on GT200b with maxed-out clocks.

What you're saying certainly has merit, and I won't disagree that it is entirely likely. I just think they'll go a slightly different route is all. Guess we'll know in a few months :)
 
I wonder if the next evolution of this GPU will be more than just a die shrink. Or maybe we'll see the die shrink first and then a further evolution along the lines of G80 -> G92. Just like with G80 -> G92, I think we'll see a re-evaluation of the allocation of transistors. Since the DP hardware looks to be idle most of the time during gaming (unless I misunderstand its use), it may be the first to go, along with a reduction in memory bandwidth. I don't know if they'll be able to get close to GTX 280 performance with 256-bit GDDR3, so maybe it will be back to 384-bit GDDR3.

Whatever you think about the relative merits of the GT200 vs RV770 architectures, it seems like ATI was able to build more features into a smaller transistor budget. I would like to see Nvidia squeeze a little more out of this architecture with the next revision, although I'm sure cost cutting is bound to be a major focus of the next version. Some of that will come 'free' since the new die will be smaller at 55nm, but I'm sure they'll go further than just a straight die shrink.
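To put rough numbers on those bus-width options (clocks assumed to stay at GTX 280-like GDDR3 speeds):

```python
# Peak memory bandwidth for the bus widths discussed above.
def bandwidth_gb_s(bus_bits, data_rate_gbps):
    return bus_bits / 8 * data_rate_gbps

print(bandwidth_gb_s(512, 2.214))   # ~141.7 GB/s -- GTX 280 as shipped
print(bandwidth_gb_s(384, 2.214))   # ~106.3 GB/s -- hypothetical 384-bit GDDR3
print(bandwidth_gb_s(256, 2.214))   # ~70.8 GB/s  -- 256-bit GDDR3, half the flagship
```

Cutting to 256-bit GDDR3 halves the bandwidth, which is why a 384-bit board (or GDDR5 on a narrower bus) looks like the more plausible way to stay near GTX 280 performance.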
 
Removing DP from GT200b makes a lot of sense, actually, because there will be plenty of GT200 chips to satisfy the GPGPU market. I don't expect much difference in die size, though, as all the SMs in GT200 only take up 26% of the die space, and per-SM size didn't increase much from G92 to GT200.
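As a back-of-envelope for the die-size point (the DP ALU's share of an SM is purely an assumed figure):

```python
# Rough estimate of how little die area the fp64 units represent.
gt200_die_mm2  = 576     # approximate 65nm GT200 die size
sm_share       = 0.26    # "all the SMs ... only take up 26% of the die space"
dp_share_of_sm = 0.12    # assumption: the fp64 ALU as a slice of each SM

sm_area = gt200_die_mm2 * sm_share                 # ~150 mm2 across 30 SMs, ~5 mm2 each
dp_area = sm_area * dp_share_of_sm                 # ~18 mm2 in total
print(round(dp_area, 1), round(100 * dp_area / gt200_die_mm2, 1), "% of the die")
```

So even under a generous assumption, dropping DP frees only a few percent of the die, consistent with not expecting much difference in die size.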

We're not even sure that 55nm is much cheaper per transistor than 65nm at this point, though it will be in the future (hence G92b). The main point in making the translation is an apples-to-apples comparison, i.e. GT200 should have less than twice the cost of G92b/RV770 despite being over twice the size, and we should evaluate the technology in this way.
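A minimal sketch of that cost comparison (die sizes approximate, wafer prices and the early-node premium entirely assumed, and yields/salvage SKUs ignored):

```python
# Candidate dies per wafer vs die area, plus an assumed wafer-price gap between
# a mature 65nm process and an early 55nm one. Yield and harvested SKUs
# (GTX 260-style salvage) are deliberately left out.
import math

def dies_per_wafer(area_mm2, wafer_diameter_mm=300):
    r = wafer_diameter_mm / 2
    return math.pi * r**2 / area_mm2 - math.pi * wafer_diameter_mm / math.sqrt(2 * area_mm2)

def cost_per_die(area_mm2, wafer_cost):
    return wafer_cost / dies_per_wafer(area_mm2)

big   = cost_per_die(576, wafer_cost=3800)   # GT200-sized die, assumed mature 65nm wafer price
small = cost_per_die(256, wafer_cost=5000)   # RV770-sized die, assumed early 55nm premium
print(round(big / small, 2))                 # ~1.9x with these assumptions
print(round(cost_per_die(576, 4000) / cost_per_die(256, 4000), 2))   # ~2.5x at equal wafer prices
```

With equal wafer prices the big die costs roughly 2.5x as much per candidate; only an assumed early-node wafer premium pulls it under 2x, so this shows the shape of the apples-to-apples argument rather than a verdict on it.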
 
Yeah, given that GT206/iGT206/iGT209 are all 55nm too and presumably don't have DP, not keeping DP for GT200b would save some physical verification time & risk (plus it'd obviously reduce die size very slightly). That's certainly very believable. I guess the most coherent line-up would be:
GTX 290 1024MiB: 10C/8R
GTX 270 896MiB: 9C/7R
GTX 270 448MiB: 9C/7R
GTX 250 768MiB: 8C/6R (OEM only)

Of course, if GT200b really is GDDR5, then that would complicate matters. Either way, the most important thing by far for NVIDIA right now is to get VRAM allocation to suck less. Neither 512MiB of GDDR5 for the whole family nor 512-448MiB of GDDR3 for some models would make any kind of sense unless the card could still sustain reasonably high resolutions. If they still haven't managed to fix that, then they really deserve to get in trouble...

IMO, GDDR3 for GT200 would have been an excellent decision had it not been delayed by nearly 2 quarters. However I think it's pretty hard to argue against GDDR5 in Q3 for an ultra-high-end SKU if you're not limited by VRAM footprint. Well, we'll see what happens...
 
not keeping DP for GT200b would save some physical verification time & risk (plus it'd obviously reduce die size very slightly).
So GT200b wouldn't be an optical shrink?

Is G92b an optical shrink?

IMO, GDDR3 for GT200 would have been an excellent decision had it not been delayed by nearly 2 quarters.
Hmm, looking more like nearly 3 quarters - 7-8 months.

Is it reasonable to assume that the fp64 ALU (double-precision with fast 32-bit integer multiply) is what's caused the delay?

However I think it's pretty hard to argue against GDDR5 in Q3 for an ultra-high-end SKU if you're not limited by VRAM footprint. Well, we'll see what happens...
Since NVidia was there for the development of GDDR5, it seems reasonable to assume they'll have GPUs ready to use it fairly soon - the bandwidth advantage over GDDR3 is impossible to ignore, which was not the case with GDDR4.

Since NVidia appeared to refer to the lifetime of G92b as being up to 9 months, that implies that a replacement in the same performance category is needed in Q1 2009. Could this be their first 40nm GPU? If so, presumably it'd have GDDR5 on a 128-bit bus (~70GB/s) and be ~150mm2?
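A quick sanity check of that guess (the bus width, target bandwidth and data rates here are the post's own or assumed, not confirmed):

```python
# What per-pin data rate does ~70GB/s on a 128-bit bus require?
target_gb_s = 70
bus_bits    = 128
print(target_gb_s * 8 / bus_bits)   # ~4.4 Gbps per pin
# Launch GDDR5 on the HD 4870 runs at 3.6 Gbps, with faster bins on memory-vendor
# roadmaps, so ~70GB/s over 128 bits looks attainable in that timeframe.
```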

Jawed
 
Is it reasonable to assume that the fp64 ALU (double-precision with fast 32-bit integer multiply) is what's caused the delay?
No. Or at least, if it was the cause, it's not a very good excuse, since it should not have been that hard for them :) My guess ... a simple freak accumulation of errors. Shit happens.
 
No. Or at least, if it was the cause, it's not a very good excuse, since it should not have been that hard for them :) My guess ... a simple freak accumulation of errors. Shit happens.
I'm thinking back to when it was discovered (admittedly late) that CUDA documentation took a left turn in "pre-announcing" GT200 capabilities:

http://forum.beyond3d.com/showpost.php?p=1141136&postcount=264

http://forum.beyond3d.com/showpost.php?p=1141411&postcount=276

Stuff relating to fp64 and int32 changed in CUDA documentation, even though at the time of the prior version(s) of the documentation GT200's "November 2007" specification should have been known.

Anyway, I will agree that there are a number of other improvements/additions, any of which could have contributed to the delay.

Jawed
 
So GT200b wouldn't be an optical shrink?
Is G92b an optical shrink?
You can't get away from physical verification even for an optical shrink nowadays, AFAIK, but anyway, no, I don't think either G92b or GT200b is a straightforward optical shrink. There is no inherent cost advantage to the latter; it's just a bit less R&D, and there are always small bugs (hidden by the driver) you want to fix and things you want to improve yield-wise...
Hmm, looking more like nearly 3 quarters - 7-8 months.
Heh, I probably shouldn't comment on this, but I said 2 quarters and I meant it.
Since NVidia appeared to refer to the lifetime of G92b as being up to 9 months, that implies that a replacement in the same performance category is needed in Q1 2009. Could this be their first 40nm GPU? If so, presumably it'd have GDDR5 on a 128-bit bus (~70GB/s) and be ~150mm2?
Yes, G92b will not be replaced before 40nm; the only GPUs to be replaced between now and 2009 by new GT2xx are GT200 (via GT200b) and G98/G86 (via GT206). MCP78/MCP7A/MCP7C will also be replaced, it seems, but I'll admit to not being perfectly familiar with NVIDIA's current chipset roadmap (and honestly, I'm not sure I care given just how pitiful their delivery has been in that area lately).

As for the precise chips in the 40nm generation, I'm not sure, but it's important to keep in mind that the greatest difficulty might be avoiding becoming pad-limited. Just look at RV770... At this rate, eDRAM might just turn out to be used in mainstream chips to prevent being pad-limited! (I can dream, can't I?)
Stuff relating to fp64 and int32 changed in CUDA documentation, even though at the time of the prior version(s) of the documentation GT200's "November 2007" specification should have been known.
The special-function change was probably more about a hardware engineer nearly passing out from laughter after seeing that the spec supported fixed-function hardware for FP64-level SIN/COS/... - there's honestly not much point in even theoretically supporting that, IMO!
 
Yes, G92b will not be replaced before 40nm; the only GPUs to be replaced between now and 2009 by new GT2xx are GT200 (via GT200b) and G98/G86 (via GT206).
I figured as much. Like I said before, aside from RV770, G92 is the greatest perf/$ GPU ever made given today's game workloads along with PCB and RAM cost. I don't think it's possible for NVidia to do much better at that die size with their current 8-wide SIMD architecture, and thus it makes no sense to create a GT200 derivative to replace it.

That RV770 beats it in that perf/$ by ~20% is unreal when looking back at the R600 tech that it's based on.

BTW, do you know of any good die shots of G92/G92b that let us see areas like the RV770 and GT200 shots do?
Heh, I probably shouldn't comment on this, but I said 2 quarters and I meant it.
I don't doubt this at all, but I don't think it hurt NVidia much. Given the prices that the 9800 was selling at, what would they have done with GT200 two quarters earlier? Price it at $800 to a few morons? They already owned the $200+ market, and G94 was as fast as RV670 while being cheaper to make.

As for the precise chips in the 40nm generation, I'm not sure, but it's important to keep in mind that the greatest difficulty might be avoiding becoming pad-limited. Just look at RV770... At this rate, eDRAM might just turn out to be used in mainstream chips to prevent being pad-limited! (I can dream, can't I?)
This is why I doubt we'll see 256-bit in the console space next gen, and eDRAM is here to stay. Sony and MS will want their chips to be <100 mm2 by the end of the generation.

I think the next step for GPU makers is to increase setup speed, particularly with cascaded and omnidirectional shadow maps. BW is hitting a wall and limiting ROP speed, and math/texturing can only do so much. There are geometry reduction techniques, but it'll take a while before they really chop down polygon count.
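As a rough illustration of that setup pressure (scene size, pass count and setup rate are all assumed round numbers):

```python
# Shadow passes multiply triangle setup work: each cascade or cube face
# re-submits the scene's geometry. GPUs of this era set up roughly one
# triangle per core clock.
core_clock_hz = 600e6          # assumed ~600 MHz core -> ~600 Mtri/s peak setup
tris_per_frame = 1.0e6         # assumed visible scene complexity
passes = 1 + 4 + 6             # main view + 4 shadow cascades + 1 point-light cubemap
fps = 60

demand = tris_per_frame * passes * fps
print(demand / 1e6, "Mtri/s needed vs", core_clock_hz / 1e6, "Mtri/s theoretical peak")
```

With those deliberately modest numbers the demand already exceeds the theoretical peak, never mind that real setup rates fall well short of one triangle per clock.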
The special-function change was probably more about a hardware engineer nearly passing out from laughter after seeing that the spec supported fixed-function hardware for FP64-level SIN/COS/... - there's honestly not much point in even theoretically supporting that, IMO!
Wouldn't log2 and exp2 still be quite useful?
 