NVIDIA Fermi: Architecture discussion

Discussion in 'Architecture and Products' started by Rys, Sep 30, 2009.

  1. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
    Using your logic, ATi could have marketed RV770 as a 900 SP part, because they never said that any real product would offer the full configuration.

    If a GPU has a set of SPs that is always reserved for redundancy (as is the case with the Tesla parts), there's no reason to advertise the full number of SPs... unless you want to spoil your competitor's launch...
     
  2. Silus

    Banned

    Joined:
    Nov 17, 2009
    Messages:
    375
    Likes Received:
    0
    Location:
    Portugal
    That would be quite different, wouldn't it? ATI would be advertising that their architecture has 900 SPs when in fact it only has 800. That's not what's happening here. What we have here is a product based on Fermi that has some units disabled. Is that a big deal?

    Fermi has 512 ALUs. Products based on Fermi will have as many ALUs as NVIDIA wants to or has to (up to 512). Since they never revealed how many ALUs Fermi based Teslas would have (until now that is), I really don't understand why the big deal over this...

    Most people in this thread are always going on about how NVIDIA is focusing so much on the HPC market (with Tesla) and that it doesn't matter for gamers, yet make a big deal about what a Tesla product is, now that full specifications are known.

    As a saying around here goes (which I will roughly translate): "Punished for having a dog and punished for not having one"
     
  3. A.L.M.

    Newcomer

    Joined:
    Jun 2, 2008
    Messages:
    144
    Likes Received:
    0
    Location:
    Looking for a place to call home
    Oh, come on...
    Everyone, Rys included, was fooled by JHH's words into calculating the theoretical math power of Tesla with 512 cores active. That is because they wanted to be unclear in their statements...
    They were simply boasting about something they couldn't deliver, just to keep up the hype.
    The whole press meeting and presentation of Fermi was about the Tesla version, not the GeForce one. Thus I expect (as everyone else out there does) that if someone tells me, "I am presenting an HPC card based on a chip that can have up to 512 CUDA cores", I will find 512 CUDA cores active in the high-end version of that line of products.

    All the theoretical comparisons with RV870 were wrong, actually, and they didn't clarify anything, just in order to fool people...
     
  4. Sontin

    Banned

    Joined:
    Dec 9, 2009
    Messages:
    399
    Likes Received:
    0
    Do you know the specifications of the GeForce cards?
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Debate a claim or a data point.
    Nobody with a decent point is going to concern themselves with what Charlie can/will/won't/can't/has the itch/hankerin' for/desire to do.

    If you want to debate his motivations or whatever level of good/bad qualities, take it up on the SemiAccurate forum.
     
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    The complete "english" translation is "Damned if you do and damned if you don't" :D

    Whether or not Nvidia broke some implicit promise by disabling a few cores isn't really an issue as long as they achieve promised performance. AMD had to disable a few SIMDs to produce their second tier part and they're working with a much smaller die. It's an ominous sign for Geforce parts if the cut was made for yield reasons though. Tesla should be a whole lot more tolerant to low yields given the much lower volumes and higher ASP.
     
  7. Silus

    Banned

    Joined:
    Nov 17, 2009
    Messages:
    375
    Likes Received:
    0
    Location:
    Portugal
    Yeah, that's basically it :)

    Exactly. But I doubt it will affect the 512-ALU part that much anyway. Those parts are always low volume too.
     
  8. Silus

    Banned

    Joined:
    Nov 17, 2009
    Messages:
    375
    Likes Received:
    0
    Location:
    Portugal
    I'm not disputing that it was what they wanted us to believe. What you are disputing, however, is that they never said in so many words that Tesla would use Fermi's full chip. And the fact is that they didn't.

    Also, they never boasted about something that they couldn't.
    Fermi was pitched as 8x DP over GT200, and this was at GTC, which as you said was about the Tesla version. Last time I checked, 8 * 78 GFLOPs (624) is roughly equal to the ~627 GFLOPs that the now-known Fermi-based Tesla will have. So what they claimed, thus far, is what they seem to be delivering.
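
    As a quick sanity check of the arithmetic in this post, here is a minimal sketch that uses only the three figures quoted above (78 GFLOPs for GT200, the 8x pitch, and ~627 GFLOPs for the Fermi-based Tesla). The function and constant names are made up for illustration, and nothing here is a measurement.

```python
# Figures as quoted in the post; none of these are measured values.
GT200_DP_GFLOPS = 78     # GT200 double-precision peak, as quoted
CLAIMED_SPEEDUP = 8      # the "8x DP over GT200" pitch from GTC
TESLA_DP_GFLOPS = 627    # approximate figure quoted for the Fermi-based Tesla


def implied_dp_gflops(baseline, speedup):
    """Peak DP throughput implied by a 'speedup x baseline' marketing claim."""
    return baseline * speedup


def relative_gap(actual, expected):
    """Signed difference between actual and expected, as a fraction of expected."""
    return (actual - expected) / expected


implied = implied_dp_gflops(GT200_DP_GFLOPS, CLAIMED_SPEEDUP)
gap = relative_gap(TESLA_DP_GFLOPS, implied)
print(f"implied: {implied} GFLOPs, quoted: {TESLA_DP_GFLOPS} GFLOPs, gap: {gap:.2%}")
```

    On these numbers the quoted Tesla figure lands within about half a percent of the 8x claim.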
     
  9. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    So you are saying that it is more power efficient to have fewer higher clocked shaders than more lower clocked ones? Interesting view of physics your company has. I wonder if that is the explanation for bumpgate?

    -Charlie
     
  10. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    I disagree. I think they expected to be at 512. If you recall, they only had A1 silicon for ~2-3 weeks before the conference. I don't think they realized how screwed they were at the time. Now they do.

    Short story, they may have been deluded by their own theoretical silicon prowess, but I think this may have been a rare case of corporate honesty on their part, which is why it seems so odd and unfamiliar to hear.

    Once again, reality intruded and ruined their master plan. Le-sigh. I am pretty sure ORNL wasn't as pleased with the silicon they got either.

    -Charlie
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I would have thought that downclocking more units could have been the way to go for the top low-volume bin. Random defects shouldn't have forced a cluster inactivation, even with the historically coarser methods Nvidia has used for redundancy.

    Just wondering:
    Maybe they aren't getting much of a power win in downclocking, and can't push the voltage any lower without incurring stability or subthreshold leakage problems?
    edit: maybe it's not subthreshold leakage, but one of the static leakage components?

    The supply voltage on what I believe was one of the official Tesla slides looked to be decently low and close to 1.0 V, and voltage scaling has been getting more challenging the lower it gets.
     
  12. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Now I'm biased here, because the number one thing on my wishlist for Fermi architecturally was per-cluster clock speeds and it looks like it didn't happen, but I suspect this is more likely to be power-related; parametric yields, if you wish.

    Intra-chip variability is a big deal on these nodes and for such a massive chip. You need to choose one voltage for the entire chip, and some parts are going to have quite a bit of headroom left whereas others will just barely deliver. If you want to minimize voltage for a given clock frequency (i.e. optimize performance/mm²), then it helps to get rid of the lowest-clocking clusters. If you want to minimize leakage, you can disable the highest-clocking clusters which are usually the most leaky. So if you want to optimize overall power consumption, you pragmatically do a little bit of both.

    Obviously it'd be ridiculous to claim they couldn't deliver a low-volume SKU with 512 cores if they really wanted to, and I see neliz in this thread hinting they very well might still deliver a GeForce one. But given that power efficiency is super-important here and that these boards are already described as "<=225W" despite that, it probably wouldn't be a good idea to do so because variability would indirectly kill them. As I said, I might be overemphasising this point because flexible clock domains were at the top of my wishlist, but heh.
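
    The cluster-selection trade-off described in this post can be sketched as a toy model. Everything below is an assumption made for illustration: the `Cluster` fields, the four made-up clusters, and the idea that each cluster has a single minimum voltage for the target clock. It is not how NVIDIA actually bins parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    vmin: float       # hypothetical minimum voltage needed to hit the target clock
    leakage_w: float  # hypothetical static leakage in watts


def chip_voltage(clusters):
    """One supply rail for the whole chip: the weakest enabled cluster sets it."""
    return max(c.vmin for c in clusters)


def drop_slowest(clusters, n):
    """Disable the n clusters that need the highest voltage."""
    return sorted(clusters, key=lambda c: c.vmin)[:len(clusters) - n]


def drop_leakiest(clusters, n):
    """Disable the n clusters with the highest leakage."""
    return sorted(clusters, key=lambda c: c.leakage_w)[:len(clusters) - n]


# Made-up die: the fastest cluster (lowest vmin) is also the leakiest.
clusters = [
    Cluster("c0", 0.95, 4.0),
    Cluster("c1", 1.00, 3.0),
    Cluster("c2", 1.02, 2.8),
    Cluster("c3", 1.10, 2.0),
]

after_slow = drop_slowest(clusters, 1)     # removes c3, the voltage-limiting cluster
after_both = drop_leakiest(after_slow, 1)  # then removes c0, the leakiest one
total_leakage = sum(c.leakage_w for c in after_both)
print(chip_voltage(clusters), chip_voltage(after_slow), total_leakage)
```

    Dropping the slowest cluster lowers the voltage the whole chip needs, and dropping the leakiest one trims static power, which is the "a little bit of both" idea. Note that the leakage saving only materialises if the disabled cluster can actually be cut off from power.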
     
  13. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    To make removing clusters power efficient, the chip would have to have full and fairly fine-grained power gating, which gets expensive. Perf/W usually goes in favour of lower speed (hence lower voltage) rather than disabling clusters, as you are still paying for the leakage element of the disabled parts - this is why you see GTX 295 and Hemlock in the configurations you do.
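
    A rough way to see the perf/W argument above is a toy power model, assuming dynamic power scales with active clusters x V^2 x f and that, without power gating, every physical cluster leaks whether it is enabled or not. The 0.2 leakage coefficient, the linear leakage-versus-voltage term and both configurations are invented for illustration. They are not Fermi figures.

```python
def throughput(active, freq):
    """Relative throughput: active clusters times relative clock."""
    return active * freq


def power(active, physical, freq, volt, leak_per_cluster=0.2, gated=False):
    """Relative power in arbitrary units (all coefficients are assumptions)."""
    dynamic = active * volt ** 2 * freq
    # Without power gating, disabled clusters still sit on the rail and leak.
    leaking = active if gated else physical
    static = leaking * leak_per_cluster * volt
    return dynamic + static


# Two hypothetical ways to reach the same throughput on a 16-cluster die.
wide_slow = dict(active=16, physical=16, freq=0.875, volt=0.95)
narrow_fast = dict(active=14, physical=16, freq=1.0, volt=1.0)

t_wide = throughput(wide_slow["active"], wide_slow["freq"])
t_narrow = throughput(narrow_fast["active"], narrow_fast["freq"])
p_wide = power(**wide_slow)
p_narrow = power(**narrow_fast)
print(f"throughput {t_wide} vs {t_narrow}, power {p_wide:.2f} vs {p_narrow:.2f}")
```

    Under these assumptions both configurations deliver the same throughput, but the wide, slow one draws less power because it runs at a lower voltage and the narrow one still pays leakage on its disabled clusters.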
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Well, the chip is very big, so what's a few tens of millimeters square between friends. ;)
    Upon review, the Tesla specs have voltage at 1.05 V.
    Could a chip that big be pushed lower?
     
  15. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,533
    Location:
    Winfield, IN USA
    You mean you still pay for the leakage on the disabled clusters? WTF? That makes no sense to me. :oops:
     
  16. sethk

    Newcomer

    Joined:
    May 1, 2004
    Messages:
    93
    Likes Received:
    1
    Arun's post makes a lot of sense regarding the targeted selection of voltage sensitive portions of the chip to disable in order to hit a desired clockrate@voltage number, as opposed to just lowering the voltage below 1.05 and seeing if it still runs. I have to believe that at a certain point a 'brute force' (i.e. across the board) lowering of voltage wouldn't really be possible even at lowered frequencies, unless the 'weakest link' is disabled before trying to lower the voltage.
     
  17. Mize

    Mize 3dfx Fan
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,079
    Likes Received:
    1,149
    Location:
    Cincinnati, Ohio USA
    They're disabled logically, but it's not easy to disable power to them.
     
  18. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    Who said that? You might want to read my post again :wink:. And do you have any basic understanding of thermal output vs. leakage vs. voltage at a given frequency in silicon chips?

    This is exactly why I stated you can't look at Tesla's FLOPs and really work backwards to find out much about the GeForce line. Also, A2 silicon was in the range of 550 MHz to 650 MHz for base clocks, if we use a multiplier of 2.2 to 2.4 to get the GFLOPs range they are going for. So come again? If you listened to the web conference call, they did state that the FLOPs are lower for Tesla because of the end requirements of the systems they are going to be in.
     
  19. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,533
    Location:
    Winfield, IN USA
    Ah, thanks. Makes sense now. :oops:
     
  20. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    I wonder how much effort this would be. Of course Core i7/i5 is doing this, though it only has 4 cores, so it isn't that fine-grained. NVIDIA would need to have 16 power-gated sections (or at least 8, if we assume shader clusters can only be disabled in pairs). OTOH, maybe this is easier if it only needs to be done statically? Obviously Core i7 does this fully dynamically too.
     