NVIDIA Fermi: Architecture discussion

If they were to skip anything on that list of yours, it would be the 16SP part.

That's what I first thought, but those low-end discrete cards usually sell mostly on price, which explains the GeForce 205 (8SPs) and 310 (16SPs). Both are made on 40nm, so Nvidia probably needs a Fermi card with ~16SPs to replace them; unless of course they decide to compete against Cedar with DX10.1 products only and just keep selling those models, focusing on the mid-to-high-end for DX11 cards.
 
Considering they are just introducing a 3x0 line of products, I guess you answered your own question. What would they call lower-end DX11 products in such a case? GT4x0? :LOL:
 
Frankly, I can't see Nvidia making a 384-SP chip, because that would mean too many distinct DX11 dies:

GF100 with 512SPs, one with 384SPs, one with ~256SPs, one with ~128SPs, one with ~64, one with ~32, and probably one with ~16 to replace the GeForce 310.

I guess they could just skip the 32-SP part, but still...

You can add one with 448 SPs too. That's the only one I can see being a full chip with units disabled. The 384 SP one would have a total of 4 TPCs disabled. I don't think that makes sense in terms of costs...

I agree on everything else, except skipping the 32 SP part, since each of Fermi's TPCs has 32 SPs, so the ultra-low end should have 32 SPs.
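
If it helps, here's that arithmetic spelled out; a trivial C++ sketch (the SP counts are just the configs speculated in this thread, nothing confirmed):

[code]
#include <cstdio>

int main() {
    // Speculated Fermi-derived SP counts from this thread (not confirmed specs).
    const int sp_counts[] = {512, 448, 384, 256, 128, 64, 32};
    const int sps_per_tpc = 32;  // each Fermi "TPC" is assumed to hold 32 SPs

    for (int sps : sp_counts) {
        int tpcs = sps / sps_per_tpc;
        // Disabled TPCs relative to a full 16-TPC (512 SP) GF100 die:
        printf("%3d SPs = %2d TPCs (%2d disabled vs. full GF100)\n",
               sps, tpcs, 16 - tpcs);
    }
    // A ~16 SP part would be only half a TPC, hence it's left out here.
    return 0;
}
[/code]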
 
Considering they are just introducing a 3x0 line of products, I guess you answered your own question. What would they call lower-end DX11 products in such a case? GT4x0? :LOL:

I think those are being released because, despite the delays, they already spent the resources on R&D, so they might as well release them.
As soon as the Fermi based replacements are ready, they should release them to replace the DX10.1 parts that were just introduced.
 
You can add one with 448 SPs too. That's the only one I can see being a full chip with units disabled. The 384 SP one would have a total of 4 TPCs disabled. I don't think that makes sense in terms of costs...
Disabling 1/4 of the units wouldn't be unprecedented for nvidia; they did that back with g80 (and some g92-based 9600gso cards) and tried disabling 1/5 with gt200 (though they were forced to disable less by the competition). I can only see a 384SP part, though, if it would still be at least about as fast as an HD5870.

I agree on everything else, except skipping the 32 SP part, since each of Fermi's TPCs has 32 SPs, so the ultra-low end should have 32 SPs.
Except GT2xx parts have 24 SPs per TPC, but still GT218 apparently only has 16 SPs, so such things seem doable. I don't know, though, if it's easily possible to remove one of the two 16-wide subunits in Fermi. And I certainly wouldn't mind low-end having 32 SPs...
 
Disabling 1/4 of the units wouldn't be unprecedented for nvidia; they did that back with g80 (and some g92-based 9600gso cards) and tried disabling 1/5 with gt200 (though they were forced to disable less by the competition).

Yes, but my point was that it wouldn't be cost-effective. It makes more sense to disable just a couple of TPCs instead, and I believe that's reserved for the GeForce 370 or GeForce 375, if it's ever needed. IMO, the GeForce 360 either has more than 384 SPs (maybe 448) or is a totally new chip based on Fermi, with 384 SPs.

mczak said:
I can only see a 384SP part, though, if it would still be at least about as fast as an HD5870.

Agreed, but that certainly seems doable, given HD 5870's performance delta over past generations.

mczak said:
Except GT2xx parts have 24 SPs per TPC, but still GT218 apparently only has 16 SPs, so such things seem doable. I don't know, though, if it's easily possible to remove one of the two 16-wide subunits in Fermi.

Was it ever confirmed that GT218 was a "true" GT200 derivative? I always thought it was a trusty G92-based part.
I too have no idea if it's possible to disable SPs in each TPC; maybe someone else can enlighten us.

mczak said:
And I certainly wouldn't mind low-end having 32 SPs...

Me neither :)
 
Was it ever confirmed that GT218 was a "true" GT200 derivative? I always thought it was a trusty G92-based part.
It supports DX10.1, hence it can't really be G9x based. Unless you want to take the fact that it has the same number of SMs per TPC as G9x, and not as the other GT2xx members, to mean it's really more like an improved g98 rather than a "true" gt2xx chip.
I too have no idea if it's possible to disable SPs in each TPC; maybe someone else can enlighten us.
I think it would need some rework of instruction dispatch (since there's dual issue per tpc), but we don't really know yet how that stuff works for other potential DP-deprived parts. I could see 16 SPs working, though.
That said, the current 16 SP chip (gt218) isn't competitive with AMD's current low-end chip (rv710, which has 80 shader units), and since I'll have to assume cedar is going to be faster, I could certainly see 32 SPs making sense for the lowest-end nvidia part.
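
Rough numbers, using the usual theoretical-peak formula (SPs × flops per SP per clock × shader clock); the clocks here are my assumptions, and I'm counting GT2xx SPs at 3 flops/clock (MAD+MUL) vs. Fermi's 2 (FMA):

[code]
#include <cstdio>

// Theoretical peak single-precision GFLOPS = SPs * flops/SP/clock * clock (GHz).
// Clock speeds are rough assumptions for illustration, not confirmed specs.
double peak_gflops(int sps, int flops_per_clock, double clock_ghz) {
    return sps * flops_per_clock * clock_ghz;
}

int main() {
    printf("gt218, 16 SPs @ ~1.4 GHz:             %6.1f GFLOPS\n", peak_gflops(16, 3, 1.4));
    printf("rv710, 80 units @ ~0.6 GHz:           %6.1f GFLOPS\n", peak_gflops(80, 2, 0.6));
    printf("hypothetical 32 SP Fermi @ ~1.4 GHz:  %6.1f GFLOPS\n", peak_gflops(32, 2, 1.4));
    return 0;
}
[/code]

Even with generous clocks, 16 SPs lands well behind rv710 on paper, while 32 SPs is at least in the same ballpark.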
 
It supports DX10.1, hence it can't really be G9x based. Unless you want to take the fact that it has the same number of SMs per TPC as G9x, and not as the other GT2xx members, to mean it's really more like an improved g98 rather than a "true" gt2xx chip.

Yes, of course. It would have DX10.1 added to it, but still be G9x based.
 
Yes, of course. It would have DX10.1 added to it, but still be G9x based.
I think the decision whether a chip is really gt2xx-derived or g9x-derived could be made by its cuda support, since this has some implications for how the TPCs look on a low level (the number of registers, for instance, is different for cuda 1.1 vs 1.2). I thought gt218 supported cuda 1.2 (hence gt2xx based), but I'm unable to find a (trustworthy) source to confirm that's actually the case...
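
If anyone with a gt218 card wants to check, the CUDA runtime reports both things via cudaGetDeviceProperties; a minimal sketch:

[code]
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Compute capability 1.1 parts report 8192 registers per block,
        // 1.2 parts report 16384, which should settle the g9x-vs-gt2xx question.
        printf("%s: compute capability %d.%d, %d registers/block\n",
               prop.name, prop.major, prop.minor, prop.regsPerBlock);
    }
    return 0;
}
[/code]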
 
Why wouldn't G92 support OpenGL 3.1?

An easier explanation is that nvidia wouldn't go through the trouble of making a mix of G9x and GT21x; they'd just make GT218 a GT21x chip like the other ones, with the full 16K register files.
The one difference is two SMs per cluster instead of three, but G98 and the GeForce 8200 have that feature too (one SM instead of two).
 
Ah no, even in software design, design processes aren't that flexible, even in a client-driven market, specifically because the consequences of not following design rules are cost overruns and poor products. Just as an example, when they aren't followed, unless you're very, very lucky, you can expect a twofold increase in cost. In an engineering environment those costs would be substantially higher. TSMC and nV wouldn't be stupid enough to do something like that.

You are right, TSMC and NV would not be that stupid; only one of them would. Especially if management was being told something else.

-Charlie
 
No, not at all. And it's not just nV's problem; it would be TSMC's as well. You would have to have stupidity on both sides of the fence for something like what Charlie is suggesting. There is absolutely no logical explanation for skipping design steps unless it was carefully looked into and would give a time advantage vs. a small increase in cost in the short term, with substantial long-term benefits.

Charlie, you are painting a picture where they never looked into risk management and mitigation for the steps not being followed, or into what the repercussions of those risks are. If this was a deliberate choice, it's just stupidity; no one in their right mind would actually do something like this without looking into what I have just said. Because we aren't talking about a hundred grand or something like that here; it's more like hundreds of millions, even billions, of dollars on both sides.

No, I am painting a picture of smart engineers and myopic management convinced of their own invulnerability. Bad and/or retaliatory management gets told what they want to hear.

So, if I am so wrong, why does ATI have simply fabulous yields on their parts? Should you answer something like "They are not, and it is TSMC's fault", can I ask how anyone with a shred of sanity would launch a chip that is almost 2x as large on the same process?
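
Back-of-the-envelope, with a simple Poisson yield model and a made-up defect density (real 40nm numbers aren't public), just to show what die size alone does on the same process:

[code]
#include <cmath>
#include <cstdio>

// Crude Poisson yield model: yield = exp(-defect_density * die_area).
// The defect density is hypothetical; the die sizes are the commonly
// rumored figures, not official ones.
double poisson_yield(double defects_per_cm2, double area_cm2) {
    return std::exp(-defects_per_cm2 * area_cm2);
}

int main() {
    const double d0 = 0.4;  // assumed defects per cm^2 (made up)
    printf("~334 mm^2 (Cypress-ish): %.0f%% yield\n",
           100.0 * poisson_yield(d0, 3.34));
    printf("~550 mm^2 (Fermi-ish):   %.0f%% yield\n",
           100.0 * poisson_yield(d0, 5.50));
    return 0;
}
[/code]

Whatever the real defect density is, the bigger die loses exponentially, on the exact same process.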

-Charlie
 