NVIDIA Fermi: Architecture discussion

Well I've heard this talk of Nvidia not following TSMC's guidelines on 40nm design from a few different sources now. If there's any truth to that it could be coming back to bite them in the ass, so who knows what they're scrambling to fix at this point.

No, those 'rumors' were started by me, and they were about the G200 65->55nm shrink that took 3 or so steppings to pull off. It should have been a no-brainer, but somehow, it took multiple steppings.

I have never heard it for 40nm; that can be summed up as incompetence on NV's part. There is NO WAY a chip smaller than RV740 should have been so delayed by process problems. I know what the 740 yields were, and they were economically viable, so smaller parts should have been too.

Something else is at play here.

Yep, it does matter because if Juniper was originally slated to debut a while after Cypress then it means AMD also gets their "big" chip perfected first before scaling down. Rumours place Cypress in Q2 so it could have simply been TSMC's delays that caused them to arrive together.

Simply put, they were not slated for the same time. Doing so makes no sense. It was scheduled for a small gap between them though. The idea is to get one chip bug-free(ish), and then half and quarter it once you have updated it with the fixes. It is the same thing that NV is doing for the Fermi variants; they hadn't taped out either last time I looked.

That's a fair assumption, but it doesn't mean that leading with a performance part makes it easier to get the high-end stuff out faster. For example, if GF100 was only half of what it is - say just 8 cores - would it have been here by now, or is there something else going on other than the fact that it's a big chip? GT2xx derivatives are tiny, aren't they, and that didn't seem to help much at all.

If you do a small part, and optimize the architecture for that, it is much harder to make it larger than to do the hard work up front and cut things out. See G212 for more there..... :)

-Charlie
 
The one place where DRC violations are acceptable is for memories: the fab provides a basic 1-bit memory cell that can be used by a design house to build their own RAMs. Those cells violate the standard DRC deck, but are pre-approved by the fab.
Really? I did not know that, thanks. I think you just cleared up a bunch of stuff for me and I'm just amazed that I understand it now. :oops:

Sincere appreciation; it did help give me a good clue. :)
 
Something else is at play here.

Well that's the big question, isn't it? If, as silent-guy says, there are all these measures in place to catch problems early on, is it really possible that they screwed up that badly?

Simply put, they were not slated for the same time. Doing so makes no sense. It was scheduled for a small gap between them though. The idea is to get one chip bug-free(ish), and then half and quarter it once you have updated it with the fixes.

If you do a small part, and optimize the architecture for that, it is much harder to make it larger than to do the hard work up front and cut things out. See G212 for more there..... :)

Yep, that makes sense to me.
 
Well that's the big question, isn't it? If, as silent-guy says, there are all these measures in place to catch problems early on, is it really possible that they screwed up that badly?

The way I heard it on the 65->55nm transition is that TSMC did do what Silent-Guy said, and NV overrode them. His version of things is "the way it happens", but a sufficiently loud, angry, or threatening client can get them to bend, at their discretion, should the client be willing to shoulder the results.

NV thought they knew better, and JHH could scream at the laws of physics until they were intimidated. Physics won.

-Charlie
 
If you do a small part, and optimize the architecture for that, it is much harder to make it larger than to do the hard work up front and cut things out. See G212 for more there..... :)

-Charlie

That's been Ati's approach for some time now, often using a newer (smaller) process on value-oriented parts (mobile, chipset, rvX30/rvX10), the latest being the 4770 (rv740 - 40nm) prior to the HD5000 (RV8xx - 40nm). I think this philosophy came into play as far back as the X700 XT, but execution is always a factor.
 
No, those 'rumors' were started by me, and they were about the G200 65->55nm shrink that took 3 or so steppings to pull off. It should have been a no-brainer, but somehow, it took multiple steppings.
This is why you should stick to reporting tape-out dates: all they require is a cosy relationship with a back-room fab worker in Taiwan. When you fantasize about technical issues that are above your level of understanding, you have this strange tendency to run with the most unlikely (most sensational?) story line.

You obviously can't come up with a single example of how bullying your fab partner into violating a particular process rule when doing a shrink could result in some kind of advantage (at the cost of more risk.) Here's the funny part: neither can I.
 
Well yeah, TSMC's own problems make it difficult to allocate blame. Who knows, maybe GT2xx was so late because Nvidia was unwilling to launch a high-volume, low-cost part until there was sufficient supply from TSMC.

There are many details that are not adding up. GT21x chips are highly underwhelming compared to predecessors and I can't come up with a reasonable explanation why frequencies on those are so modest. Eventually we'll find out what is going on, but at the moment all we can deal with is random guesswork.

There are a lot of potential variables. And any problems that AMD saw are going to be magnified significantly for Nvidia given how much more ambitious Fermi is than Cypress.
Chip complexity is an "indirect" variable for yields IMO, and that's exactly the reason why I am asking myself if it would have been wiser for NV to pull a smaller part ahead of GF100 in terms of production.

In terms of the A3 spin, do you really have any confidence in any Fermi rumours on either manufacturing status or timeline at this point? It seems to be shifting on a weekly basis.
By the time I saw the date on the A1 chip shown at GTC, I personally didn't have much hope that a real hard launch within 2009 was possible, and AMD's supply problems with Cypress chips also support that. Now if there's a respin, however modest it may be, it obviously pushes any release projections further into 2010.

The simple fact is that we have 4 products ramped/ramping on a (relatively) new, capacity-constrained process in a market where demand is high for everything, and our 40nm products are the only parts with the latest feature set.

No doubt here. While I know that no IHV will ever expose yield data on specific chips, I'd say that it's pretty safe to assume that yields are considerably more reasonable for Juniper than for Cypress chips. I haven't seen shortages for 5700 SKUs anywhere near as severe as for the 5800s so far.

In the end it all bounces back to TSMC's 40G problems, and based on the above I suppose that even if GF100 doesn't have any architectural hw problems, yields should be dramatically more troublesome than with Cypress.
 
There are many details that are not adding up. GT21x chips are highly underwhelming compared to predecessors and I can't come up with a reasonable explanation why frequencies on those are so modest. Eventually we'll find out what is going on, but at the moment all we can deal with is random guesswork.

The cards are meant for OEMs, so they chose low power over everything else.
Even in retail, you would buy one if you want both low power and Nvidia. If they upped voltages and clocks, the cards would just get ridiculed even more by the G9x :LOL:

Anyway, that recalls the GeForce 8600 / Radeon 2600 days. Awkward scaling-down of huge chips.
 
The cards are meant for OEMs, so they chose low power over everything else.
Even in retail, you would buy one if you want both low power and Nvidia. If they upped voltages and clocks, the cards would just get ridiculed even more by the G9x :LOL:

Fair enough; but doesn't it still sound strange that a 40nm chip (despite its slightly higher chip complexity) has a lower frequency tolerance than 55nm chips? It doesn't just sound like there's no gain at all, but even more like 40G is putting those chips at a disadvantage.

Albeit not directly comparable at all, the 5870@40nm reached the same 850MHz as the 4890@55nm with something less than twice the chip complexity.
 
double precision hopes

Uhh... I hope I don't get caught in the fighting and the bloodshed going on :rolleyes:
I am hoping, but not expecting, that the mid-range Nvidia parts will have double precision. Thoughts?
 
I am hoping, but not expecting, that the mid-range Nvidia parts will have double precision. Thoughts?

What would NV benefit from having full DP support on mid-range parts? Besides adding chip complexity it will eat into sales of parts that actually have good DP performance.
 
What would NV benefit from having full DP support on mid-range parts? Besides adding chip complexity it will eat into sales of parts that actually have good DP performance.

I guess the benefit will not be large, but it will offer developers a consistent feature set, which may be beneficial to things like engineering software developers. Let's imagine you are a computational fluid dynamics software house: there is very little benefit for you in investing in GPGPU if your software (which requires DP) runs only on the highest-end cards, since most of your potential audience will be using mid-range cards (maybe Quadro-branded mid-range cards) in, let's say, mobile workstations or mid-range desktops, which most engineers actually use. There is more to GPGPU than just big supercomputers or servers. Without DP in the mid-range, serious GPGPU applications won't really proliferate (see the sketch at the end of this post).

But I dunno if that reasoning is sound or not from a business perspective. AMD certainly has reasons to think that providing DP on midrange doesn't make sense, but then again GPGPU doesn't seem to be a big part of AMD's strategy either.

I think I am kind of alone in this reasoning, so I am not really expecting them to add DP (but I am hoping they do). To me, not adding DP to a chip they market as capable of doing compute is like Intel or AMD making CPUs without DP on the basis that consumers don't really need DP to browse Facebook or watch Hulu.
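
To make the "consistent feature set" point concrete, here's a minimal sketch in CUDA of the fallback burden when DP isn't guaranteed across the lineup. The axpy kernel and the device_has_dp helper are hypothetical names of mine; the CC 1.3 threshold is just GT200's published compute capability:

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, templated so one body serves both float and
// double. Without DP across the whole lineup, the host code has to
// keep the float instantiation alive as a fallback path.
template <typename T>
__global__ void axpy(int n, T a, const T* x, T* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// DP first appeared with compute capability 1.3 (GT200); the GT21x
// parts are CC 1.2, so this returns false on them.
static bool device_has_dp(int dev)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    return prop.major > 1 || (prop.major == 1 && prop.minor >= 3);
}

int main()
{
    if (device_has_dp(0)) {
        printf("DP capable: would launch axpy<double>\n");
        // axpy<double><<<4096, 256>>>(n, 2.0, x_d, y_d);
    } else {
        printf("no DP: would fall back to axpy<float> (or the CPU)\n");
        // axpy<float><<<4096, 256>>>(n, 2.0f, x_f, y_f);
    }
    return 0;
}

And if memory serves, compiling the double instantiation for a pre-1.3 target makes nvcc demote doubles to float with just a warning, which is exactly the kind of silent surprise a consistent feature set would avoid.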
 
With one DP unit for every eight SP units, you're looking at 12 DP units in a GT215 (is that the number of DP units on an Athlon II X4?) and 6 DP units on a GT216 (yuck!).
At that point you'd better ssh into a machine which has a GTX 260 or better, or compile CUDA to x86.
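
For reference, here's the back-of-envelope math behind those numbers, assuming GT200's 1:8 DP:SP ratio carried over (it didn't; the GT21x chips shipped without DP hardware at all) and ballpark retail shader clocks of my own choosing:

#include <cstdio>

int main()
{
    // Published SP counts; shader clocks are rough retail values and
    // only illustrative. The GT215/GT216 DP counts are hypothetical.
    struct Part { const char* name; int sp; double shader_ghz; };
    const Part parts[] = {
        { "GT215", 96,  1.34 },  // 96 / 8 = 12 hypothetical DP units
        { "GT216", 48,  1.36 },  // 48 / 8 =  6 hypothetical DP units
        { "GT200", 240, 1.30 },  // 30 real DP units (one per SM)
    };
    for (int i = 0; i < 3; ++i) {
        int dp_units = parts[i].sp / 8;
        // One DP FMA per unit per clock counts as 2 flops.
        double gflops = dp_units * 2.0 * parts[i].shader_ghz;
        printf("%-6s %2d DP units -> ~%3.0f DP GFLOPS peak\n",
               parts[i].name, dp_units, gflops);
    }
    return 0;
}

Even under that charitable assumption a GT215 would top out around 32 DP GFLOPS peak, which a quad-core CPU with SSE2 can match, so the "ssh into a GTX 260" advice stands.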

The real DP game will be Fermi; GT200 provided a working implementation for the HPC people to get on board.
 
Fair enough; but doesn't it still sound strange that a 40nm chip (despite its slightly higher chip complexity) has a lower frequency tolerance than 55nm chips?

IMHLO it makes a difference whether you put mobile GPUs on desktop SKUs or design desktop GPUs from the ground up. Especially wrt clocks.
 
GT21X is anything but a sterling design. It scales badly and the "enhancements" are not making it better compared to GT200.

Going with high clock speeds would do little for performance, but would remove the only advantage those cards have over the G9X generation. From an end user's point of view, that is.
 
I wonder what state these cards they're taking pictures of are in, in terms of working silicon. We know they have sent some to GPGPU people as well. Are they running at low clock speeds or something? Otherwise, why wait to let anyone review them, unless they're trying to avoid being accused of a paper launch? I guess that might be possible, but given that they're already accused of exactly that, and given past history, I'd imagine they would let them be reviewed anyway unless they're running at abysmally low frequencies.
 
GT21X is anything but a sterling design. It scales badly and the "enhancements" are not making it better compared to GT200.
How did it scale badly? By the same token you could say that RV530 scaled badly from R580 compared to G73 scaling down from G71.

For the most part they scaled as expected. The starting points were just too disparate. Similarly, GT200 was huge compared to RV770 with minimal performance gain, so similarly sized derivatives based on these architectures were not expected to be close in performance.
 