NVIDIA Fermi: Architecture discussion

Discussion in 'Architecture and Products' started by Rys, Sep 30, 2009.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    http://www.realworldtech.com/page.cfm?ArticleID=RWT093009110932&p=7

    This implies that all "32 FPU cores" work together to produce 32 DP-FMAs in two cycles. It seems to imply that INT is not used for DP, and that each pipeline produces 16 FMAs in two cycles. But the instruction is despatched to both pipelines, from a single warp.

    There's also the question of the need to implement subnormals, which in GT200 seemingly carries the cost of a 168-bit adder. Does GF100 have an adder like that?

    Except SP and INT can't be dual-issued within a pipeline.

    Jawed
     
  2. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    Well, Fermi isn't available now. If you want to do DX11 now, the HD 5870 is the only option. From a DirectX 11 point of view, I don't see why Fermi would be more interesting.

    Last year I got the HD 4870 first too, but later switched to the GTX 280 on my main development machine, mainly because it had twice the amount of memory. I might be tempted to switch to Fermi this time too, but from what I know about it, I doubt it.


    Personally, I have to make a distinction between my professional work and my 'hobby' work. At work we develop for a non-gaming market and use the professional cards; these lag behind the consumer cards for quite some time, and we don't get them for free. For development they can be bought at reduced prices, at least with Nvidia. We also use ATI cards for another product, so we use both brands. My development machine has both an ATI and an Nvidia card; both are usable under XP, but not under Vista. So yes, we develop and test for all kinds of cards.

    For my hobby work I stick to one card, as under Vista two brands can't be used in the same machine; Windows 7 will apparently fix that. And yes, key game developers probably get free preliminary hardware. I remember the days when I got a couple of ATI 9700 Pros and 8500s for free.
     
  3. chavvdarrr

    Veteran

    Joined:
    Feb 25, 2003
    Messages:
    1,165
    Likes Received:
    34
    Location:
    Sofia, BG
    Fermi is on sale?! Where? :D
    Also, if you write for RV8xx, there is a good chance the software will run on every DX11 card.
    Writing for Fermi means it will run only on future NV hardware.
    With NO date set for GF100 availability and NO date set for low/mid video cards based on the GF100 architecture... what makes you think all developers shouldn't buy AMD hardware and should instead wait for NV?!
     
  4. Enforcer

    Newcomer

    Joined:
    Apr 17, 2008
    Messages:
    32
    Likes Received:
    0
  5. Creig

    Newcomer

    Joined:
    Nov 20, 2006
    Messages:
    57
    Likes Received:
    1
    I have received video cards from my wife as Christmas presents in the past. If I feel the need to upgrade and it's around that time of year, I expect it will be on my 'wish list' once again.
     
  6. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Are those separate units in GF100 for such an estimate to make sense?
     
  7. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Which raises the question: why bother with so much int mul? It's not like they are going to have terabytes of memory in Fermi, so address calculation doesn't need that much precision right now.
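    To put numbers on the address-width point: a byte-addressed space of N bytes needs enough bits to hold addresses 0 through N-1. The sketch below computes that for a few example capacities; the capacities are illustrative picks, not Fermi board specs.

```python
GIB = 1 << 30  # 2^30 bytes
TIB = 1 << 40  # 2^40 bytes


def address_bits(capacity_bytes: int) -> int:
    """Smallest number of bits that can address every byte in capacity_bytes."""
    if capacity_bytes < 1:
        raise ValueError("capacity must be positive")
    # Byte addresses run from 0 to capacity_bytes - 1.
    return max(1, (capacity_bytes - 1).bit_length())


# Example capacities chosen for illustration only.
for label, size in [("4 GiB", 4 * GIB), ("6 GiB", 6 * GIB), ("1 TiB", TIB)]:
    print(f"{label}: {address_bits(size)} bits")
```

    Anything up to 4 GiB fits in 32 bits, and even a terabyte only needs 40, which is the sense in which a full 32x32 integer multiply is more than address arithmetic strictly demands today.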
     
  8. Enforcer

    Newcomer

    Joined:
    Apr 17, 2008
    Messages:
    32
    Likes Received:
    0
    DP multipliers are 53x53 and take up much more area than two single-precision 24x24 multipliers, so even if some parts of the logic can be shared, the estimate is quite fair.
    There is an additional cost for reuse as well, for example:
    http://www.lirmm.fr/arith18/papers/libo-multipleprecisionmaf.pdf
    P.S. My point is that gamers/consumers can easily tolerate a 5-10% increase in cost and power,
    and get DP and INT32 MUL functionality as a bonus.
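    A rough way to see why 53x53 is so much bigger than two 24x24 multipliers: in a plain array multiplier, an a-bit by b-bit multiply forms a*b partial-product bits, so area scales roughly with that product. This is a first-order model only (it ignores Booth recoding, compressor trees, and the normalisation/rounding logic around the multiplier).

```python
def partial_product_bits(a_bits: int, b_bits: int) -> int:
    """First-order size model: an a x b array multiplier forms a*b partial-product bits."""
    return a_bits * b_bits


dp = partial_product_bits(53, 53)            # double-precision significand multiplier
sp_pair = 2 * partial_product_bits(24, 24)   # two single-precision significand multipliers
int32 = partial_product_bits(32, 32)         # full 32-bit integer multiplier

print(f"DP 53x53:    {dp} partial-product bits")
print(f"2x SP 24x24: {sp_pair} partial-product bits")
print(f"INT32 32x32: {int32} partial-product bits")
print(f"DP / (2x SP) ratio: {dp / sp_pair:.2f}")
```

    By this crude measure a DP multiplier is about 2.4x the size of two SP ones, while a 32x32 integer multiplier would fit inside the 53x53 array.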
     
    #468 Enforcer, Oct 5, 2009
    Last edited by a moderator: Oct 5, 2009
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Maybe they are using the same 32-bit multipliers for both SP and INT.
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Aside from the Int units themselves, there is already much of the infrastructure in place.

    Whatever the hardware cost of a full 32-bit multiplier, it still sits behind the already established register file, read ports, operand collectors, and two rather complex schedulers. If it shares hardware with DP floating point units that would be inactive anyway, so much the better.

    The integer multiply is half-speed and DP FMA is half-speed, coincidence?
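    One hypothetical way a half-rate multiply can fall out of shared hardware (not a documented Fermi detail): if the datapath only had a 32x16 multiplier, a full 32x32 product would need two passes, halving throughput. The sketch below models that; `mul32x16` and `mul32_two_pass` are made-up names for illustration.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32x16(a: int, b16: int) -> int:
    """Hypothetical narrow multiplier: 32-bit x 16-bit -> up to 48-bit product."""
    assert 0 <= a <= MASK32 and 0 <= b16 <= MASK16
    return a * b16


def mul32_two_pass(a: int, b: int) -> tuple[int, int]:
    """Unsigned 32x32 -> 64-bit multiply using two passes through mul32x16.

    Returns (product, passes) so the throughput cost is visible.
    """
    b_lo = b & MASK16
    b_hi = (b >> 16) & MASK16
    low = mul32x16(a, b_lo)    # pass 1: low 16 bits of b
    high = mul32x16(a, b_hi)   # pass 2: high 16 bits of b
    # a*b = a*b_lo + (a*b_hi << 16)
    return low + (high << 16), 2


product, passes = mul32_two_pass(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(product), passes)   # full 64-bit product and the number of passes
print(hex(product & MASK32))  # low 32 bits, what a 32-bit IMUL would keep
```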
     
  11. Sampsa

    Newcomer

    Joined:
    May 23, 2007
    Messages:
    43
    Likes Received:
    1
    Location:
    Finland
    Seven years ago NVIDIA tested packaged GPUs fresh from the fab like this:

    [image]

    [image]

    We know Fermi is working silicon:

    [image]

    So I wonder what today's Fermi test board looks like. Does it really have lots of wires sticking out, plus test modules, and look like a character from Terminator, as Fudo describes, or is it something similar to the 2002 test board? :)
     
  12. Tim

    Tim
    Regular

    Joined:
    Mar 28, 2003
    Messages:
    875
    Likes Received:
    5
    Location:
    Denmark
    If they are launching in November like Fudo claims (or even December), they should have working chips on production-level PCBs by now; it would be pretty unusual/dangerous to start mass production without producing prototype boards.

    Edit: No reason to quote all the pictures.
     
    #472 Tim, Oct 5, 2009
    Last edited by a moderator: Oct 5, 2009
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Ooh, that's very useful, will have a proper read of that at some point.

    It's worth noting that a multi-precision unit is an 83% overhead on two single-precision FMA units: 708590 µm² versus 386834 µm² (2*193417) (180nm library).
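    The overhead figure follows directly from the two areas quoted above; a quick check:

```python
mpfma_um2 = 708_590   # multiple-precision FMA unit, 180 nm library (figure quoted above)
sp_fma_um2 = 193_417  # single-precision FMA unit, same library (figure quoted above)

two_sp = 2 * sp_fma_um2
overhead = mpfma_um2 / two_sp - 1  # extra area relative to two SP FMA units

print(f"2x SP = {two_sp} um^2, overhead = {overhead:.1%}")
```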

    But this is swamped by the non-math portion of the cores. I'm guessing that the DP/SP ALUs amount to ~15% of each core. Assuming the die is 480mm², with about 46% being cores (judging from the die shot), that's 33mm² (your earlier estimate on 45nm) out of 220mm² of cores.
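    The area estimate above chains three assumptions (480mm² die, ~46% cores, ~15% of each core being DP/SP ALUs). Written out, it also gives the ALU share of the whole die, which is the number to compare with the 5-10% cost figure:

```python
die_mm2 = 480.0       # assumed die area
core_fraction = 0.46  # share of the die that is cores, judged from the die shot
alu_fraction = 0.15   # guessed DP/SP ALU share of each core

cores_mm2 = die_mm2 * core_fraction
alu_mm2 = cores_mm2 * alu_fraction

print(f"cores: {cores_mm2:.1f} mm^2, DP/SP ALUs: {alu_mm2:.1f} mm^2, "
      f"{alu_mm2 / die_mm2:.1%} of the die")
```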

    Jawed
     
  14. apoppin

    Regular

    Joined:
    Feb 12, 2006
    Messages:
    255
    Likes Received:
    0
    Location:
    Hi Desert SoCal
    They said they designed it to be modular to keep all of their options open - just as in the past.
     
  15. rjc

    rjc
    Regular

    Joined:
    Oct 27, 2008
    Messages:
    270
    Likes Received:
    0
    Hotboy at it again:

    So, depending (a fair bit) on the system used, it is maybe a little above GTX 295 performance at the moment. It will likely get better if they can get the frequency up, and with a more mature driver.

    Translation note: 老 is more of an affectionate term, like "venerable", "experienced" or "well known", rather than "old", which sounds kind of harsh.
     
    #475 rjc, Oct 6, 2009
    Last edited by a moderator: Oct 6, 2009
  16. Tchock

    Regular

    Joined:
    Mar 4, 2008
    Messages:
    849
    Likes Received:
    2
    Location:
    PVG
    Nah, everyone on Chiphell is used to calling JHH by that nick :lol:
     
  17. Florin

    Florin Merrily dodgy
    Veteran Subscriber

    Joined:
    Aug 27, 2003
    Messages:
    1,707
    Likes Received:
    345
    Location:
    The colonies
    But what does that mean? Specifically, the 'it is said can run X10000, still frequency not high in these circumstances'?
     
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    Presumably Vantage extreme score. Haven't heard anything about target clocks though so it's hard to say what's low.
     
  19. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    IMUL in Fermi is 32-bit now, but the latency is 4 cycles (vs. 2 for IADD).

    Also, ALUs and AGUs are different animals.

    David
     
  20. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I should have used an online translator, heh... so unless I've picked the wrong Chinese translation, it could be:

    http://babelfish.yahoo.com/translat...ead.php?tid=56185&lp=zh_en&btnTrUrl=Translate
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.