NVIDIA Fermi: Architecture discussion

Unless your sources are people who made the chip, no one knows the size right now ;). That info was never released, at least not yet.



Kinda confused there - if it's a different architecture, why not? You aren't thinking about the increased cache sizes - the L2 cache and so on take up transistors too, and are packed much more densely. The only way I can see how you came up with the die size is if you calculated it from the GT200b die size and transistor count, and crazily enough it comes out to the size you are pointing at. I don't think that is accurate.

*Cough* 104 die candidates. I wonder how that info came about? You would have to have seen a........ Nice weather here, how about where you are?

-Charlie
 
I don't know if A3 will do the trick, and given the problems with A1 and A2, I don't know if NV knows either. We are ~2 weeks from A3 silicon, so that is when we will know. :)

Because of your history with this kind of prediction: there is no problem. Tape-out was late, but since then they have been on schedule, and a launch with A3 in February would mean they were very fast.
 
I don't know if A3 will do the trick, and given the problems with A1 and A2, I don't know if NV knows either. We are ~2 weeks from A3 silicon, so that is when we will know. :)

I read what SG said, and I do agree with it. That said, if whatever NV is trying to change in the A2 -> A3 stepping was not possible, then we would be seeing either production A2s at low bin splits, or a B1.

So what about your theory about a screwed-up memory controller? See, your information just doesn't make sense, since it's in a hundred different places.


B1s would take MUCH more time than A3s, if for no other reason than they couldn't use partially re-done A1/2 silicon to reduce fabbing time. If you notice what happened, A1 took about 6-7 weeks, and A2 was about 4. That tells me that hot lots of all the metal layers take about 4 weeks to force through the system.

A1 wasn't a respin, so how do you expect 6-7 weeks?

If it was a full silicon respin, you would be looking at mid-January at the earliest, quite possibly later. I haven't heard that a Bx line will be needed, but then again, wait for A3.... If the clocks don't go up significantly for A3, things are looking mighty uncomfortable in NV land.

-Charlie

If there were any silicon changes, it's a minimum of 2 months to get chips back, and that is being very optimistic. Wasn't this already gone over in this very thread?
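
For what it's worth, here's a rough back-of-the-envelope Python sketch of that turnaround arithmetic. The ~4-week metal-only hot lot and the roughly two-month full respin are the figures quoted in this thread; the tape-out date is purely illustrative, not a known date.

```python
from datetime import date, timedelta

# Figures quoted in the thread: a metal-only respin (A3) hot lot takes about
# 4 weeks, while a full-layer respin (B1) means ~2 months minimum to get
# chips back. The start date below is a made-up example, not a real date.
a3_tapeout = date(2009, 11, 15)
metal_only_spin = timedelta(weeks=4)    # re-done metal layers only, hot lot
full_respin = timedelta(weeks=9)        # all layers from scratch, ~2 months+

print("A3 (metal-only) silicon back:", a3_tapeout + metal_only_spin)  # mid-December
print("B1 (full respin) silicon back:", a3_tapeout + full_respin)     # mid-January at best
```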
 
*Cough* 104 die candidates. I wonder how that info came about? You would have to have seen a........ Nice weather here, how about where you are?

-Charlie

Maybe you are guessing?

Next time you get a chance, can you ask about the info in your sig? You do have 11 or 12 days left......

-Charlie


Oh no, I was talking about Win 7 parts; that's why I only selected portions of your post. And since you had no idea when NV pushed their launch out at the time of your post, I don't believe you knew anything.
 
May I ask you guys what your guess is on the percentage advantage (or disadvantage) of the GTX 380 over the 5870?

Of course, a little explanation of why would be good.

I expect nothing more than about a 15% performance advantage, based on history, SP FLOP performance and the memory bus. I can't really see the advantage of a dual warp scheduler, large caches and the new CUDA cores for gaming applications, so 15% is my guess.

Sorry about my English.
 
I think this has been linked before:

http://www.semiconductor.net/article/438968-Nvidia_s_Chen_Calls_for_Zero_Via_Defects-full.php

So, what if via defects are the key problem with GF100? Re-doing metals is, effectively, re-doing vias, I guess. And presumably vias are pretty fragile compared to metals (though I presume vias and in-layer interconnect both count under the heading of "metal"), since vias have an interface where two layers meet - which I presume is the bit that's failing.

If vias are the problem then metal spins are presumably being used to reduce sensitivity to via defects. What degree of redundancy in vias is there? It seems to me that power vias should be highly redundant. Presuming via redundancy isn't an option for data, the options I guess are either to beef them up (dunno how much play there is there), or to change the 3D stacking of interconnects so that the most troublesome data routes traverse minimal vias and/or layers. Though I'd expect that's a primary design goal from day one - so not sure how much that can be improved, either.

Jawed
 
so, what if via defects are the key problem with GF100? Re-doing metals is, effectively, re-doing vias I guess. ...
The problem that's being talked about in that article doesn't seem to be something that can be fixed with a metal spin, it's a process issue, something that TSMC needs to fix.

For non-power vias, there's no redundancy at all.

Changing via sizes on an existing layout would result in almost starting from scratch for all routing. It would change the complete timing of the chip. Not possible.

For new chips, reducing the number of vias isn't the correct way to go either.
- For one, you'd only reduce the problem; it wouldn't go away, so your zero-defects target isn't met. (Note: he's asking for zero defects for long-term reliability, not zero defects for initial yield. A big difference, and not an unreasonable request.)
- But more importantly: current design rules explicitly limit the maximum length of a single continuous stretch of metal wire and enforce a weaving pattern between metal layers instead (and thus use vias). This is to avoid antenna violations: during silicon production, metal wires may temporarily not be connected to anything (e.g. if your signal goes from M4 to M3 and back to M4, M3 will be unconnected while M4 hasn't been added yet). Those metal wires function as antennas and will pick up charges that have nowhere to go. If this charge gets too large, the wire will zap and you end up with a broken chip.
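
To make the antenna-rule idea a bit more concrete, here's a toy Python sketch of the kind of check involved. The numbers (ratio limit, wire width, gate area) are invented for illustration; real limits come from the foundry's design rules.

```python
def antenna_ratio(wire_length_um: float, wire_width_um: float,
                  gate_area_um2: float) -> float:
    """Ratio of exposed metal area to the area of the gate it is tied to.

    During fabrication, a metal segment connected to a gate but not yet to
    anything that can drain charge acts as an antenna; the larger this
    ratio, the more charge can build up and damage the gate.
    """
    return (wire_length_um * wire_width_um) / gate_area_um2

MAX_ANTENNA_RATIO = 400.0   # hypothetical design-rule limit

# A long, uninterrupted metal run tied to a small gate:
ratio = antenna_ratio(wire_length_um=500.0, wire_width_um=0.1,
                      gate_area_um2=0.05)
print(ratio, ratio <= MAX_ANTENNA_RATIO)     # 1000.0 False -> rule violation

# The fix described in the post above: break the run and weave it between
# layers through vias, so no single exposed stretch exceeds the limit.
segment = antenna_ratio(wire_length_um=150.0, wire_width_um=0.1,
                        gate_area_um2=0.05)
print(segment, segment <= MAX_ANTENNA_RATIO)  # 300.0 True
```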
 
May I ask you guys what your guess is on the percentage advantage (or disadvantage) of the GTX 380 over the 5870?

Of course, a little explanation of why would be good.

I expect nothing more than about a 15% performance advantage, based on history, SP FLOP performance and the memory bus. I can't really see the advantage of a dual warp scheduler, large caches and the new CUDA cores for gaming applications, so 15% is my guess.

Sorry about my English.

Rys talks about it here, based on what's known, insider info and some guesstimation:

http://techreport.com/articles.x/17815

Basically twice as fast as a GTX 285, which puts the single chip Fermi based GeForce on the heels of the HD 5970, although it most likely won't beat it across the board.
 
The problem that's being talked about in that article doesn't seem to be something that can be fixed with a metal spin, it's a process issue, something that TSMC needs to fix.

<snip>

Sorry for going off-topic, but since I'm kind of new here I have to ask: do you work, or did you work, in the semiconductor industry?

And thanks again for sharing your knowledge on this :)
 
Basically twice as fast as a GTX 285, which puts the single chip Fermi based GeForce on the heels of the HD 5970, although it most likely won't beat it across the board.

That info is a bit old, and the performance guesstimation conclusion is IMHO not justified by the text. Aside from the shader performance (which is far from all-important for game performance; look at 4890 vs 285), it's more like +50% in theory, and then all the system limits etc. come into play (as can be seen in the 4890->5870 transition).
 
That info is a bit old, and the performance guesstimation conclusion is IMHO not justified by the text. Aside from the shader performance (which is far from all-important for game performance; look at 4890 vs 285), it's more like +50% in theory, and then all the system limits etc. come into play (as can be seen in the 4890->5870 transition).

Old? It's not even one month...

And how can guesstimation be justified? Until actual benchmarks are revealed, guesstimation and rumor will continue to be guesstimation and rumor, even though Rys's article isn't just guesstimation and there's lots of good and factual info.
 
That info is a bit old, and the performance guesstimation conclusion is IMHO not justified by the text. Aside from the shader performance (which is far from all-important for game performance; look at 4890 vs 285)

You're comparing two completely different architectures there, though. GF100 and GT200 have similar shader setups, however, so the 1.7TF vs 0.7TF is fairly comparable.

Suffice it to say that in shader-limited situations - which I believe covers a lot of games these days - Fermi should wipe the floor with GT200.
 
The problem that's being talked about in that article doesn't seem to be something that can be fixed with a metal spin, it's a process issue, something that TSMC needs to fix.
AMD and NVIDIA are already producing 40nm chips - so what we seem to be left with is that the sheer scale/power/clocks of GF100 are going beyond the somewhat frayed capabilities of 40nm.

Electromigration issues are ascribed to the quality of the via manufacturing, which sounds like it's purely a process problem. Presumably this problem mostly affects the smallest vias, those that are closest to the silicon face (the top-most layers?).

An update to the article:

(TSMC later said any problems with vias on the ATI chip were "teething problems" that were quickly resolved. James said that the ATI graphics chips also came from early production runs. "Our part was quite early in ATI's production and any report we do is inevitably only a snapshot of that production," James said after IEDM concluded.)

seems to imply that vias shouldn't be suffering that problem, though.

So if it's not metal layout, per se, and via manufacturing is "good enough for AMD", we're left with power, leakage and clock problems :?:

Those metal wires function as antennas and will pick up charges that have nowhere to go. If this charge gets too large, the wire will zap and you end up with a broken chip.
So what you're saying is that the stages of manufacturing pose electrical risks to the exposed wires, as metal layers are applied, so that a chip can fail during manufacturing? Ouch. Is that a question of timeliness, e.g. the risk grows if the manufacturing stages are separated by more than a day or two?

Does this effectively constrain the "parking" of wafers? e.g. M1 can only be parked for 1 week, while M2 could be parked for a month?

Jawed
 
Old? It's not even one month...
And how can guesstimation be justified? Until actual benchmarks are revealed, guesstimation and rumor will continue to be guesstimation and rumor, even though Rys's article isn't just guesstimation and there's lots of good and factual info.

(It was discussed here when it came out.)
Sure, the article contains good factual and rumoured info; it just seems wrong to conclude an actual 2x performance increase based on that very info. That's why I call the *performance guesstimate* a bit unjustified given the rest of the text.

You're comparing two completely different architectures there, though. GF100 and GT200 have similar shader setups, however, so the 1.7TF vs 0.7TF is fairly comparable.
Suffice it to say that in shader-limited situations - which I believe covers a lot of games these days - Fermi should wipe the floor with GT200.

Of course the 1.7 (@1700MHz, but that's another story) vs 0.7 is more comparable than the (4890) 1.36 vs 0.7, but still, the graphics workload efficiency of R7x0 is pretty good and it's considerably faster in real shader-limited scenarios (as e.g. Carsten's/Jawed's studies show), yet it still loses by some 15% overall.

And again, 4890->5870 is +100%/100%/100%/23% (ALU/tex/ROP/BW) and becomes like +50% IRL.
Meanwhile 285->380 is (from the article) +140%/60%/50%/21% and should be +100% IRL? That would require some SERIOUS graphics-relevant architectural improvements (especially bandwidth saving) that the article doesn't mention. (Granted, Rys' claim is a bit more theoretical and therefore not as bold as Silus'.)
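
For reference, here's a quick Python sketch of where those percentages come from. The HD 4890/5870 figures are the published specs; on the GT200/GF100 side only the ALU figure (1.7 TFLOPS rumoured vs ~0.71 TFLOPS MAD) is given in this thread, so the remaining tex/ROP/bandwidth deltas are just the rumoured numbers quoted above, taken at face value.

```python
def pct(new, old):
    """Percentage increase of new over old."""
    return 100.0 * (new / old - 1.0)

# HD 4890 -> HD 5870 (published specs): ALU GFLOPS, TMUs, ROPs, GB/s
hd4890 = {"alu": 1360.0, "tex": 40, "rop": 16, "bw": 124.8}
hd5870 = {"alu": 2720.0, "tex": 80, "rop": 32, "bw": 153.6}
print({k: round(pct(hd5870[k], hd4890[k])) for k in hd4890})
# -> {'alu': 100, 'tex': 100, 'rop': 100, 'bw': 23}, yet only ~+50% in games

# GTX 285 -> "GTX 380": only the ALU figure can be derived from the thread
# (1.7 TFLOPS rumoured vs ~0.708 TFLOPS MAD on the GTX 285); the +60%/+50%/+21%
# tex/ROP/bandwidth deltas are the rumours quoted in the post above.
print(round(pct(1700.0, 708.0)))   # ~ +140%
```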
 
(It was discussed here when it came out.)
Sure, the article contains good factual and rumoured info; it just seems wrong to conclude an actual 2x performance increase based on that very info. That's why I call the *performance guesstimate* a bit unjustified given the rest of the text.

We'll see :)

Psycho said:
Of course the 1.7 (@1700MHz, but that's another story) vs 0.7 is more comparable than the (4890) 1.36 vs 0.7, but still, the graphics workload efficiency of R7x0 is pretty good and it's considerably faster in real shader-limited scenarios (as e.g. Carsten's/Jawed's studies show), yet it still loses by some 15% overall.

And again, 4890->5870 is +100%/100%/100%/23% (ALU/tex/ROP/BW) and becomes like +50% IRL.
Meanwhile 285->380 is (from the article) +140%/60%/50%/21% and should be +100% IRL? That would require some SERIOUS graphics-relevant architectural improvements (especially bandwidth saving) that the article doesn't mention.

Again, you are comparing the same architecture across generations (HD 4890 to HD 5870) versus a different architecture across generations (GTX 285 to GTX 380), so indeed those "SERIOUS graphics improvements" may exist. And the article can't mention them, because the graphics bits are mostly still unknown.
 
(Granted, Rys' claim is a bit more theoretical and therefore not as bold as Silus'.)

Since you added this after I replied to your post, I'll reply to it as well :)

I've made no "bold claim". I did, however, say that if Rys' guesstimation/speculation is accurate, twice as fast as a GTX 285 puts the GTX 380 on the heels of the HD 5970, and this is borne out in many benchmarks around the web, where two GTX 285s in SLI (not twice as fast as a single GTX 285 on most occasions) run neck and neck with the HD 5970, beating it sometimes.
 
That info is a bit old, and the performance guesstimation conclusion is IMHO not justified by the text. Aside from the shader performance (which is far from all-important for game performance; look at 4890 vs 285), it's more like +50% in theory, and then all the system limits etc. come into play (as can be seen in the 4890->5870 transition).

The only problem with that theory, though, is that 2x 9600GTs are almost as fast as a GTX 280 in shader-intensive games: Crysis, Stalker and Oblivion. 2x 9600GTs are within 10-15% of a GTX 280 at several resolutions. Something was borked in the SPs of GT200 chips, as there's no way in hell 2x 9600GTs or one 9800GTX should be within 50% of one GTX 280, but they are in every single shader-intensive game bench I've seen. I'll be doing some testing next week and I'll share my results here.
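
A rough sanity check on the raw numbers, using commonly cited stock shader clocks (treat them as approximations): on paper, two 9600GTs give roughly two-thirds of a GTX 280's MAD throughput, so landing within 10-15% in those games would suggest GT200's ALUs aren't earning their keep there.

```python
def mad_gflops(sp_count, shader_clock_mhz):
    """Peak MAD throughput: 2 flops per SP per shader clock."""
    return sp_count * 2 * shader_clock_mhz / 1000.0

gtx280      = mad_gflops(240, 1296)   # ~622 GFLOPS
geforce9600 = mad_gflops(64, 1625)    # ~208 GFLOPS per card
geforce9800 = mad_gflops(128, 1688)   # ~432 GFLOPS

print(2 * geforce9600 / gtx280)   # ~0.67: two 9600GTs vs one GTX 280 on paper
print(geforce9800 / gtx280)       # ~0.69: one 9800GTX vs one GTX 280 on paper
```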
 