NVIDIA Fermi: Architecture discussion

Discussion in 'Architecture and Products' started by Rys, Sep 30, 2009.

  1. Silus

    Banned

    Joined:
    Nov 17, 2009
    Messages:
    375
    Likes Received:
    0
    Location:
    Portugal
    That certainly isn't what I took from Bill Dally's words. Can ECC simply be switched off without removing the transistors behind it?
    DP capability can obviously be trimmed down by removing some of the Stream Processors, but why would Bill Dally mention this if the full-fledged Fermi chip was indeed powering the high-end GeForce too?

    Still, I don't disregard that possibility, so it's wait and see I guess.
     
  2. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,135
    Likes Received:
    573
    So? It can't get full throughput through the fp32 and int32 pipelines when not using DP ... you could just as easily say they did it like that because they needed the extra multiplier hardware for DP anyway. Seeing how little of the die is taken up by ALUs, though, I doubt it matters in the grand scheme of things. They'd have to get the area ratio of ALUs way up first.
    IMO, if patents don't get in the way, ATI will simply follow them next gen. ECC on caches is only a ~10% area overhead, and per-block ECC codes stored in DRAM itself (which I'm pretty sure is what they are doing now) are pretty much gratis except for the reduced bandwidth.
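
    For a rough back-of-the-envelope check, here is a minimal sketch; the word, block and code sizes are my assumptions for illustration, not confirmed Fermi figures:

```python
# Rough ECC overhead estimates -- illustrative only; the word, block and
# code sizes below are assumptions, not confirmed Fermi figures.

def secded_check_bits(data_bits):
    """Check bits for single-error-correct/double-error-detect (SECDED):
    smallest r with 2**r >= data_bits + r + 1, plus one extra parity bit."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1

# On-chip SRAM protected per 64-bit word -> the familiar (72,64) code.
data_bits = 64
check = secded_check_bits(data_bits)                     # 8 check bits
print(f"SRAM ECC: {check} check bits per {data_bits} data bits "
      f"-> ~{check / data_bits:.1%} storage overhead")   # ~12.5%

# In-band DRAM ECC: codes stored in ordinary memory next to the data,
# e.g. 8 bytes of code per 64-byte block (assumed ratio).
block_bytes, code_bytes = 64, 8
frac = code_bytes / (block_bytes + code_bytes)
print(f"DRAM ECC: ~{frac:.1%} of capacity/bandwidth goes to codes")  # ~11%
```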
     
  3. Silus

    Banned

    Joined:
    Nov 17, 2009
    Messages:
    375
    Likes Received:
    0
    Location:
    Portugal
    Here, some of the points being discussed above are mentioned:

    http://www.brightsideofnews.com/new...mi-is-less-powerful-than-geforce-gtx-285.aspx

    In update #2, Theo seems to have talked with Mr. Andy Keane, General Manager of Tesla Business, and Mr. Andrew Humber, Senior PR Manager for Tesla products, and these points came up:


    • The memory vendor is providing a specific ECC version of GDDR5 memory: ECC GDDR5 SDRAM
    • ECC is enabled from both the GPU side and the memory side; there are significant performance penalties, hence the GFLOPS number is significantly lower than on Quadro / GeForce cards.
    • ECC will be disabled on GeForce cards and most likely on Quadro cards
    • The capacitors used are of the highest quality
    • Power regulation is completely different and optimized for use in rack systems - you can use either a single 8-pin or dual 6-pin connectors
    • Multiple fault protection
    • DVI was brought in on demand from customers to reduce costs
    • Larger thermal exhaust than Quadro/GeForce to reduce the thermal load
    • Tesla cGPUs differ from GeForce in having activated transistors that significantly increase sustained performance, rather than burst-mode performance.
    The third and last points certainly give the same hint that Bill Dally gave before: that GPUs used for Tesla and GeForce will differ in terms of features enabled. So the question is if this "enabling" can be done without removing actual transistors.
     
  4. compres

    Regular

    Joined:
    Jun 16, 2003
    Messages:
    553
    Likes Received:
    3
    Location:
    Germany
    Why do we have to wait so long to see this? :(
     
  5. Silus

    Banned

    Joined:
    Nov 17, 2009
    Messages:
    375
    Likes Received:
    0
    Location:
    Portugal
    Er...In my last post, I obviously meant "So the question is if "disabling" can be done without removing actual transistors."

    Seems I can't edit yet. Must be because I'm new here :)
     
  6. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,505
    Likes Received:
    424
    Location:
    Varna, Bulgaria
    Probably the INT32 ALUs will be throttled/disabled by -- let's say -- a factor of four, for the GF/Quadro SKUs?
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,875
    Likes Received:
    767
    Location:
    London
    No - typically for NVidia the register file bandwidth just isn't there.

    I partly agree - it's a question of the balance required for INT multiplication. INT multiplication seems to have been a troublesome bottleneck in earlier GPUs. If INT multiplication stayed in the special function unit, then it would be "too slow". NVidia's only choice then is to put it into the main pipe.

    So, in the end, NVidia's gained INT and DP capability through the addition of the INT32 unit and probably a super-wide adder for subnormals. The latter is, arguably, the only bit that's DP-specific. It seems to me about as costly as the DP-specific overhead in RV870.

    Because compute is part of graphics now, I think increased INT capability is justifiable for graphics, particularly as bytewise addressing is part of DirectCompute - 24-bit arithmetic isn't enough to address the largest resources that D3D11 supports, bytewise.
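
    A quick sanity check on that claim; the ~2 GiB maximum resource size is my recollection of the D3D11 cap, so treat it as an assumption:

```python
# Why 24-bit byte addressing falls short of the largest D3D11 resources.
# The ~2 GiB resource-size cap below is an assumption/recollection, not a
# figure taken from this thread.

reach_24 = 2 ** 24                          # bytes reachable with 24-bit addresses
print(f"24-bit byte addressing reaches {reach_24 // 2**20} MiB")   # 16 MiB

max_resource = 2 * 2 ** 30                  # assumed ~2 GiB D3D11 resource limit
bits_needed = (max_resource - 1).bit_length()   # width needed for the top byte address
print(f"A {max_resource // 2**30} GiB resource needs {bits_needed}-bit byte addresses")  # 31
```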

    The irony being that this architecture is meant to scale over the next 3-4 years. And the one thing that will definitely go up is the proportion of die taken by ALUs, since memory scaling hasn't got much breathing room. I can imagine a 512-bit variant, but not more.

    I expect a right marketing battle over ECC will ensue. I strongly believe it's a white elephant as there is still no public evidence that soft errors aren't the result of faulty hardware in GPU based systems.

    Jawed
     
  8. SlmDnk

    Regular

    Joined:
    Feb 9, 2002
    Messages:
    588
    Likes Received:
    206
  9. Richard

    Richard Mord's imaginary friend
    Veteran

    Joined:
    Jan 22, 2004
    Messages:
    3,508
    Likes Received:
    40
    Location:
    PT, EU
    This shows two things:

    a) It will run DX11 benchmarks.
    b) It's big.

    So unless nVidia thought the community had any doubts about the above, why release this while conveniently leaving out the fps? Seems a bit... desperate is too strong a word, but definitely awkward. I'd expect a similar stunt from PowerVR/SiS back from the dead, or even LRB, which do have something to prove.

    Does nVidia believe it has something to prove?
     
  10. Sxotty

    Legend Veteran

    Joined:
    Dec 11, 2002
    Messages:
    5,087
    Likes Received:
    448
    Location:
    PA USA
    While that is true, there are still people who say it isn't working, doesn't exist, or cannot run code yet. Thus it addresses part a) as mentioned in your post. An elephant is big, but cannot run DX11 benchmarks. :)
     
  11. w0mbat

    Newcomer

    Joined:
    Nov 18, 2006
    Messages:
    234
    Likes Received:
    5
    Well, since we know how NV copes with this stuff (e.g. the fake Fermi board), I wouldn't take this as proof. Maybe they've got an HD 5870 under the table =D
     
  12. Vincent

    Newcomer

    Joined:
    May 28, 2007
    Messages:
    235
    Likes Received:
    0
    Location:
    London

    Another possibility:

    NVIDIA may have two distinct ASICs, with and without DP support (Fermi and GeForce).
     
  13. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,505
    Likes Received:
    424
    Location:
    Varna, Bulgaria
    Wow!
    Using DP arithmetic to compare GT200 vs. Fermi -- that's more a case of showing how much GT200 lacks in doubles throughput.
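
    For scale, a minimal peak-DP sketch; the unit counts and clocks are my guesses/recollections, not official numbers:

```python
# Back-of-the-envelope peak double-precision throughput.
# Unit counts and clocks are assumptions/recollections, not official specs.

def peak_dp_gflops(dp_fma_units, shader_clock_ghz):
    """Peak DP GFLOPS, assuming one fused multiply-add (2 flops) per unit per clock."""
    return dp_fma_units * 2 * shader_clock_ghz

gt200 = peak_dp_gflops(30, 1.3)        # assumed 1 DP unit per SM, 30 SMs, ~1.3 GHz -> ~78
fermi = peak_dp_gflops(512 // 2, 1.5)  # assumed DP at half the SP rate on 512 cores, ~1.5 GHz -> ~768
print(f"GT200 ~{gt200:.0f} vs. Fermi ~{fermi:.0f} DP GFLOPS, ~{fermi / gt200:.0f}x")
```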
     
  14. Tchock

    Regular

    Joined:
    Mar 4, 2008
    Messages:
    849
    Likes Received:
    2
    Location:
    PVG
    It's as simple as laser cutting or eFuse blowing. They did it for Quadro vs. GeForce; this should be similar.

    ECC should be possible to disable even on the Tesla, otherwise you wouldn't need to advertise separate ECC-on/off available-memory figures.

    Was talking to Farhan that day and he gives nV the benefit of the doubt that the DP FLOPS won't magically vanish. I'm more skeptical. :lol:

    If nVidia had something to show the gamer community other than the same GF100, why is it not being shown? Knowing them, it would have come first. Instead they're showing corner-case advantages vs. GT200. I know it's partly Sun Tzu, playing the cards you play best and leaving the rest to the imagination, but ATI plays every card they have in hand. One exudes confidence; the other doesn't.


    P.S.: This is kinda like Barcelona vs. Harpertown the more I look at it. At least GPU product cycles are faster. Hmm.
     
  15. Vincent

    Newcomer

    Joined:
    May 28, 2007
    Messages:
    235
    Likes Received:
    0
    Location:
    London


    The customer who bought the Tesla C10XX :shock:


    :razz:
     
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,580
    Likes Received:
    622
    Location:
    New York
    What's wrong with that, given the target market? There are people using Tesla's weak DP throughput today who would be interested in the comparison.
     
  17. Silus

    Banned

    Joined:
    Nov 17, 2009
    Messages:
    375
    Likes Received:
    0
    Location:
    Portugal
    Don't really see it as a problem. You also saw almost nothing about G80 until about 2-3 weeks before the actual launch, and look how that turned out. They obviously have A2 chips, but don't want to show the gaming bits yet, since A3 will be the chip that will...er...ship :)

    Showing HPC-specific tasks running on Fermi makes sense, since that's the new market that Fermi is trying to get into in full force, and it's highly profitable.

    All the fuss (most of it fueled by "articles" written by you-know-who) that NVIDIA was leaving the high-end market - because of supply constraints that affected AMD too, though NVIDIA was of course the only one in trouble in those "articles" - and that NVIDIA showing only Fermi's "HPC bits" was further indication of this, was absurd. Fermi was designed with much more than gaming in mind, but it's definitely a gaming chip as well.
     
  18. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,641
    Likes Received:
    2,105
    Location:
    Winfield, IN USA
    Y'all really believe that's a working card just because nVidia says it is? :shock:

    Oh man! Unless they show the card, monitor, and the wire connecting them clearly in the shot I'm gonna be extremely skeptical; and even if they do that I'll check carefully to make sure it's not a photoshopped screencap placed on a set shot.
     
  19. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    Umm, the people I talk to say they test about once every hour or four (it depends on wafer rates on the tool more than anything else). Now even if TSMC cut metrology to 1/10th of what it was, they should have caught it in a day at most.

    On top of that, you can't ramp a process without massive metrology input and feedback. If you are trying to up yields from crap to less crap, you NEED that feedback. Even if management tells the engineers to save a very small bit of time by skipping that step, all they will do is make sure process improvement goes from science to guesswork.

    To miss it for multiple months is not plausible. To not test is not plausible. To lessen tests to a degree that this would go undetected is also not plausible. If you have a good explanation for how you ramp a process and new equipment without feedback until the chip is done, let me know, we can make a lot of money on it.

    [conspiracy hat on] One scenario could be that someone will lose less money by paying TSMC to spike yields on the whole process than they would by their competitor eating them alive in the market.[conspiracy hat off] I am not saying this is happening, nor am I saying it is only affecting ATI, I am just saying that something is really really wrong. The explanations don't add up, or even come close.

    Now if they had said, "we are ramping new lines, and during that, XYZ", that would explain why output is not going up, but not why it went DOWN from what it was. Please note I am not talking about yield as a percentage of die candidates, but the overall number of dies coming off the line. That should not go down at all, ever, or at least not by a lot. It did.

    I was just referring to the graphics portion. I agree with what you say on the overall picture, for now. It will be a different game in ~6 months though, but I can't say why yet.

    Let's see how they do that. It is going to be funny to watch them spin that one. "It is _THE_ most important thing since the invention of knee pads," said one NV spinner when asked about Fermi, "but it is only important in chips measuring over 500mm^2 because of technical reasons that are 'beyond our scientific understanding'*". Spin till ya puke.

    -Charlie

    * They actually used that on me when they were trying to convince me that the bad bumps were not catchable at an earlier stage. Really. The other five process/packaging people I talked to all gave me an answer that was within the understanding of then-current science, and all five had the same answer too.
     
  20. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Can anybody give a quarter-sensible reason how putting a DVI port on a Tesla will help reduce costs for supercomputers, when these babies cost ~$3K a pop? And it's not like they undercut Quadros on price either.
     