NVIDIA GF100 & Friends speculation

Discussion in 'Architecture and Products' started by Arty, Oct 1, 2009.

  1. Sxotty

    Legend

    Joined:
    Dec 11, 2002
    Messages:
    5,496
    Likes Received:
    866
    Location:
    PA USA
    Likewise. Is it PAX East?
     
  2. Sontin

    Banned

    Joined:
    Dec 9, 2009
    Messages:
    399
    Likes Received:
    0
    yes: http://www.geforcelan.com/

    And Rollo said it will be a hard launch of the cards.
     
  3. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    As an update, I am hearing, totally unconfirmed so far, that there may be one other tapeout either done or pending. I am far from 100% on this one though.

    -Charlie
     
  4. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
  5. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    948
    Likes Received:
    417
    I always thought FP logic carries suffix bits to accommodate rounding errors, so I assumed that in this case 24 bits means not 23 + 1 implicit but a real 24-bit mantissa plus 1 implicit bit.
    Otherwise none of the rounding modes in IEEE would make any sense, as they would all be identical if the underflow bits can't surface.

    I think a reference is the x87's FP32 treatment in single-precision mode. AFAIK it uses more bits while calculating and chopping products.
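Editorial sketch of the point about extra low-order bits: the rounding modes can only disagree when the discarded bits are visible to the rounding step. The helper below is hypothetical (its name, the enum, and the plain-integer significand are assumptions for illustration, not any vendor's datapath). It drops the low `drop` bits of an integer significand under three modes and does not handle the renormalisation needed when rounding up carries out of the top bit.

```c
#include <stdint.h>

/* Hypothetical illustration only: names and layout are assumptions. */
typedef enum {
    ROUND_NEAREST_EVEN,
    ROUND_TOWARD_ZERO,
    ROUND_UP
} round_mode_t;

/* Discard the low `drop` bits of `value` (1 <= drop <= 63) using `mode`.
   The discarded bits play the role of guard/round/sticky information. */
uint64_t round_off_low_bits(uint64_t value, unsigned drop, round_mode_t mode) {
    uint64_t kept = value >> drop;                          /* bits that survive    */
    uint64_t lost = value & ((UINT64_C(1) << drop) - 1);    /* bits being discarded */
    uint64_t half = UINT64_C(1) << (drop - 1);              /* exactly half an ulp  */

    switch (mode) {
    case ROUND_TOWARD_ZERO:
        return kept;                    /* plain truncation */
    case ROUND_UP:
        return kept + (lost != 0);      /* any discarded bit bumps the result */
    case ROUND_NEAREST_EVEN:
    default:
        if (lost > half || (lost == half && (kept & 1)))
            return kept + 1;            /* above half, or a tie with odd kept bits */
        return kept;
    }
}
```

With `value = 22` (binary 10110) and `drop = 2`, truncation gives 5 while round-to-nearest-even and round-up both give 6. With `value = 16` the discarded bits are zero, so every mode returns 4, which is the situation where the modes become indistinguishable.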
     
  6. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    Where did you get that number from? Got a link?

    -Charlie
     
  7. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    It probably was done by NV's PR team. They tend to shop stories like that around to various sites, starting out at the high end, and moving down. They use it to point to as an 'independent source' to 'corroborate' their view.

    Normally, the sites run by people with a brain know better than to touch those pieces, and so things get shopped to progressively more sketchy sites until someone bites. ATI used to do this back in the day, but I haven't seen it in a while. AMD and Intel never did that, as far as I am aware, but that isn't definitive.

    -Charlie
     
    #1807 Groo The Wanderer, Feb 22, 2010
    Last edited by a moderator: Feb 22, 2010
  8. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    I have never seen anyone ask for editing rights to an article. Some have asked nicely, and a few I have offered it to, but those were deep architecture articles where some of the bits were nuanced and complex. It was more of a fact check than editorializing, and the articles were like this:

    http://www.semiaccurate.com/2009/10/29/look-100-core-tilera-gx/

    For simple pieces or reviews, I have never seen it, and have never been asked either. I am pretty sure that any PR person knows better than to ask that. If a site does let PR run roughshod over their articles, it is open season on that site, and they become co-opted very fast, and die off quickly.

    I have heard rumors of some Taiwanese vendors asking about such, but nothing concrete.

    Also, this is very different from getting a letter after something goes up saying, "You got that wrong, and here is why. Can you correct it?".

    -Charlie
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    GPU fp32 has historically not been particularly accurate, but it has been gradually improving. This has been tightened up with FFMA in CUDA 2.0 devices (Fermi onwards), which holds on to the full result from the MUL so that there is only one rounding, after the addition. It's all IEEE-754 compliant precision in 2.0.
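Editorial sketch of the single-rounding point, in plain C on the host rather than on a GPU. The function names are hypothetical. The second function imitates "hold on to the full product" by computing in `double`, which can represent any fp32 x fp32 product exactly (48 significand bits fit in 53). That is an assumption-laden stand-in, not a true fused multiply-add: the `double` addition can itself round, so rare inputs may differ from a real FMA such as C99's `fmaf`.

```c
/* Hypothetical illustration only; not how FFMA is implemented in hardware. */

/* Two roundings: the product is rounded to fp32 before the addition. */
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;   /* volatile forces the fp32 rounding here */
    return product + c;               /* second rounding */
}

/* Keep the exact product in a wider type, round to fp32 once at the end. */
float mul_add_single_round(float a, float b, float c) {
    double exact_product = (double)a * (double)b;   /* exact for fp32 inputs */
    return (float)(exact_product + (double)c);      /* final rounding to fp32 */
}
```

Taking `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact product is `1 + 2^-11 + 2^-24`. Rounding it to fp32 first loses the `2^-24` term (a tie that rounds to even), so `mul_then_add` returns 0, while the single-rounding path returns `2^-24`.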

    The point I was raising was that an fp32 MUL will normalise its output. e.g. if you multiply two integers that are encoded as subnormals, the ALU will return the most significant digits and normalise the result (if possible).

    I've just checked it (sigh, should have done that earlier), and CUDA 1.x's mul24 returns the low 32 bits of the result. Emulation in 2.0 devices should be nothing more than a bit of masking before doing the multiplication.

    So, regardless of my normalisation point, this technique can't work on Fermi to perform the old function.
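Editorial sketch of the distinction drawn above, written as host-side C with hypothetical function names. The first function follows the "masking before doing the multiplication" description for an unsigned 24-bit multiply that returns the low 32 bits. The second pushes the same operands through an fp32 multiply, which keeps only the most significant 24 bits of the product, so the low bits that mul24 is supposed to return are gone.

```c
#include <stdint.h>

/* Hypothetical illustration only: unsigned variant, names are assumptions. */

/* Mask both operands to 24 bits, then rely on a 32-bit integer multiply.
   Unsigned arithmetic wraps modulo 2^32, i.e. it keeps the low 32 bits
   of the 48-bit product. */
uint32_t umul24_emulated(uint32_t a, uint32_t b) {
    uint32_t a24 = a & 0x00FFFFFFu;
    uint32_t b24 = b & 0x00FFFFFFu;
    return a24 * b24;
}

/* Same operands through an fp32 multiply: the result is normalised and
   rounded to a 24-bit significand, so the low bits can be lost. */
uint32_t float_mul_low32(uint32_t a, uint32_t b) {
    float fa = (float)(a & 0x00FFFFFFu);   /* 24-bit integers are exact in fp32 */
    float fb = (float)(b & 0x00FFFFFFu);
    volatile float product = fa * fb;      /* rounded to 24 significant bits */
    return (uint32_t)((uint64_t)product & 0xFFFFFFFFu);
}
```

For `a = b = 0xFFFFFF` the exact product is `2^48 - 2^25 + 1`. The integer path returns `0xFE000001`, whereas the fp32 path rounds the product to `2^48 - 2^25` and yields `0xFE000000`.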

    ---

    Oh and I've just noticed that floating point exceptions are quiet in CUDA 2.0.

    Jawed
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    But that's not enough bits for DP. Whereas fp32 and int32 bridged does the job. The latency is the same as an fp32 MUL, but the ALUs effectively become 8-lane instead of 16 (G.4.1 in CUDA Programming Guide 3.0).

    Jawed
     
  11. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Wouldn't Fermi just use its DP arithmetic units for int32 mul?
     
  12. John021

    Newcomer

    Joined:
    Jan 1, 2010
    Messages:
    29
    Likes Received:
    0
  13. ap_

    ap_
    Newcomer

    Joined:
    Feb 17, 2010
    Messages:
    9
    Likes Received:
    0
    I think so too. Also explains why DP and int32 rates are similar in Jawed's reference.
     
  14. Mize

    Mize 3dfx Fan
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,079
    Likes Received:
    1,149
    Location:
    Cincinnati, Ohio USA
  15. CouldntResist

    Regular

    Joined:
    Aug 16, 2004
    Messages:
    264
    Likes Received:
    7
    Meh. Hitler rants are only good when he sounds like a disappointed supporter. The message of the meme is supposed to be "I was a loyal fan, and they shafted me", not "LOL you failed, losers".
     
  16. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,727
  17. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Nothing I've seen suggests that Fermi has DP units per se. Everything available so far points to a bridge-and-extend functionality integrated into the SP units themselves. Basically, each SP unit contains a ~24x53b multiplier and two of these are bridged together to generate the DP mantissa. In contrast, it appears that RV870 bridges 4 ~24x24 multipliers from 4 SP units for its DP math.

    At 1/4 or 1/2 rate FP there is little point in having separate units from an area perspective.
     
  18. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    My contention was they've already extended the SP multiplier such that 2 SP multipliers can be bridged to handle a 54b mantissa. The other option is to have a separate multiplier but that doesn't explain the int32 multiplier performance.
     
  19. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    /nitpick on

    Shouldn't it be 27bx27b multipliers?
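Editorial sketch of the arithmetic behind the nitpick, under stated assumptions: four 24x24 multipliers only cover a 48x48-bit product, whereas a 53-bit DP mantissa fits in two 27-bit halves, so four 27x27 partial products are enough. The code below is a schoolbook decomposition with hypothetical names. It says nothing about how either GPU actually wires its multipliers.

```c
#include <stdint.h>

#define HALF_BITS 27u
#define HALF_MASK ((UINT64_C(1) << HALF_BITS) - 1)
#define FULL_BITS 54u
#define FULL_MASK ((UINT64_C(1) << FULL_BITS) - 1)

/* Up-to-108-bit product held as two 54-bit digits: value = hi * 2^54 + lo. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} product108_t;

/* Stand-in for one narrow multiplier: 27 x 27 -> 54 bits. */
static uint64_t mul27x27(uint64_t x, uint64_t y) {
    return (x & HALF_MASK) * (y & HALF_MASK);
}

/* Multiply two operands of up to 54 bits using four 27x27 partial products. */
product108_t mul54_bridged(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & HALF_MASK, a_hi = (a >> HALF_BITS) & HALF_MASK;
    uint64_t b_lo = b & HALF_MASK, b_hi = (b >> HALF_BITS) & HALF_MASK;

    uint64_t ll = mul27x27(a_lo, b_lo);   /* weight 2^0  */
    uint64_t lh = mul27x27(a_lo, b_hi);   /* weight 2^27 */
    uint64_t hl = mul27x27(a_hi, b_lo);   /* weight 2^27 */
    uint64_t hh = mul27x27(a_hi, b_hi);   /* weight 2^54 */

    uint64_t mid = lh + hl;                                      /* below 2^55 */
    uint64_t low_sum = ll + ((mid & HALF_MASK) << HALF_BITS);    /* below 2^55 */

    product108_t result;
    result.lo = low_sum & FULL_MASK;
    result.hi = hh + (mid >> HALF_BITS) + (low_sum >> FULL_BITS); /* add carries */
    return result;
}
```

As a check, squaring `2^54 - 1` should give `2^108 - 2^55 + 1`, which is `hi = 2^54 - 2` and `lo = 1` in this representation.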
     
  20. Mindfury

    Newcomer

    Joined:
    Oct 6, 2009
    Messages:
    232
    Likes Received:
    0