Compare & Contrast Architectures: Nvidia Fermi and AMD GCN

Discussion in 'Architecture and Products' started by Acert93, Dec 24, 2011.

  1. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Judging from the 1/4 DP rate, most likely there isn't enough mantissa width for full-speed INT32 multiplication.
     
  2. DarthShader

    Regular

    Joined:
    Jul 18, 2010
    Messages:
    350
    Likes Received:
    0
    Location:
    Land of Mu
    Couldn't a similar argument be made for the hardware schedulers not really having the opportunity to show their mojo in simplistic code? (unless the test addresses them specifically, ofc)
     
  3. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    It is usually pretty easy to get peak rates out of directed tests, so the situation you describe is unlikely. The real problem is to create complicated test cases that trigger secondary effects at the system level.
    Typical examples are cache thrashing (the more caches, the harder it becomes) and SDRAM transaction scheduling.
    When AMD moved from VLIW5 to VLIW4, they said that the former was a bit more architecturally efficient than the latter (for graphics workloads), but that the smaller size of the latter made it possible to compensate for the efficiency loss by putting more instances on the die. On average, that's the right decision, but it makes you vulnerable to cases where this doesn't work.

    E.g. adding more parallel resources can make the performance dramatically worse if it tilts the cache into a thrashing mode.

    (This is just a general observation, I'm not saying that this is specifically the case for GCN.)
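
    A toy model of that tipping point, as a rough sketch only: several workers share one LRU cache, each cycling over its own small working set. The function name, the cache capacity and the working-set sizes are made up for illustration and do not describe any real GPU cache.

```python
from collections import OrderedDict

def lru_hit_rate(workers, lines_per_worker, capacity, rounds=10):
    """Hit rate of a shared LRU cache when `workers` interleave cyclic
    accesses over `lines_per_worker` private lines each (toy model)."""
    cache = OrderedDict()  # keys kept in least-recently-used-first order
    hits = accesses = 0
    for _ in range(rounds):
        for line in range(lines_per_worker):
            for w in range(workers):
                key = (w, line)
                accesses += 1
                if key in cache:
                    hits += 1
                    cache.move_to_end(key)         # mark as most recently used
                else:
                    cache[key] = True
                    if len(cache) > capacity:
                        cache.popitem(last=False)  # evict least recently used
    return hits / accesses

# 4 workers x 16 lines fit exactly in a 64-line cache;
# a 5th worker pushes the combined working set past capacity.
print(lru_hit_rate(4, 16, 64))
print(lru_hit_rate(5, 16, 64))
```

    With four workers only the first pass misses; with five, the cyclic access pattern makes LRU evict every line just before it is reused, so the hit rate in this model drops to zero rather than degrading gradually.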
     
  4. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    AMD talked about it during AFDS back in June:

    24 BIT INT MUL/MULADD/LOGICAL/SPECIAL @ full SP rates
    32-bit Integer MUL/MULADD @ DPFP Mul/FMA rate
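
    To illustrate why a multiplier sized for 24-bit operands cannot deliver a 32-bit multiply in a single pass, here is a minimal sketch that assembles the low 32 bits of a 32x32 product from narrower partial products. It is not a description of GCN's actual datapath; `mul24` and `mul32_lo` are hypothetical helper names, and the 16-bit split is just one possible decomposition.

```python
MASK32 = 0xFFFFFFFF
MASK24 = 0xFFFFFF

def mul24(a, b):
    """Model of a narrow multiplier: operands are truncated to 24 bits
    and the low 32 bits of the product are returned."""
    return ((a & MASK24) * (b & MASK24)) & MASK32

def mul32_lo(a, b):
    """Low 32 bits of a * b, built only from mul24, adds and shifts."""
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    low = mul24(a_lo, b_lo)                                   # 16x16 fits in 32 bits
    cross = (mul24(a_hi, b_lo) + mul24(a_lo, b_hi)) & MASK32  # two more passes
    # a_hi * b_hi would land at bit 32 and above, so it is not needed here.
    return (low + (cross << 16)) & MASK32

a, b = 0xDEADBEEF, 0x12345678
print(hex(mul32_lo(a, b)), hex((a * b) & MASK32))
```

    In this sketch one 32-bit multiply costs three narrow multiplies plus shifts and adds, which is the general reason a reduced rate for 32-bit integer multiplication is plausible.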
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Fermi's subdivision of the geometry pipeline may also be different from GCN.

    Fermi has the polymorph engine and raster engine blocks, and devotes a fabric to keeping the polymorph engines in each SM in sync with one another. Outside of cases where there is an ordering constraint, it allows for more parallel setup work.

    AMD has kept the geometry engine confined outside of the CU block, which may mean that it is more conservative about how it sets up primitives and geometry.
    The division is also different because the pixel pipe contains both the scan conversion and render backend, while the primitive pipe contains the tessellation and geometry.
    Nvidia pairs edge setup, rasterization, and culling in one block, with the other functions placed in the polymorph block.

    I'm curious now as to the specialized bus in GCN for the ROPs and GDS.
    Is it to save bandwidth? Is it also because the GDS and ROPs are part of a pipeline with rather strict ordering, and the arrays of CUs and their R/W subsystem are not consistent enough to maintain it?
     
  6. denev2004

    Newcomer

    Joined:
    Apr 28, 2010
    Messages:
    143
    Likes Received:
    0
    Location:
    China
    That's so bad. Sounds like GCN's int is inferior to Fermi's
     
  7. cal_guy

    Newcomer

    Joined:
    Jun 27, 2008
    Messages:
    217
    Likes Received:
    3
    For 32-bit integer, Fermi is double the rate of GCN. However, the HD 7970 still has a 20% advantage over the GTX 580 in 32-bit operations because of its shader count.
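
    A back-of-the-envelope check of that figure, under stated assumptions: the ALU counts and clocks are the publicly listed reference specs (GTX 580: 512 ALUs at a 1544 MHz shader clock; HD 7970: 2048 ALUs at 925 MHz), and the per-ALU 32-bit multiply rates are the ones quoted in this thread (1/2 for GF110, 1/4 for GCN). The function name is made up for illustration.

```python
def peak_gops(alus, clock_ghz, rate_per_alu):
    """Peak operations per second in billions (Gops/s)."""
    return alus * clock_ghz * rate_per_alu

# Assumed reference specs and the rates claimed in the thread.
gtx580_int32_mul = peak_gops(512, 1.544, 1 / 2)
hd7970_int32_mul = peak_gops(2048, 0.925, 1 / 4)

# Single-precision rate (one op per ALU per clock) for comparison.
gtx580_sp = peak_gops(512, 1.544, 1.0)
hd7970_sp = peak_gops(2048, 0.925, 1.0)

int32_ratio = hd7970_int32_mul / gtx580_int32_mul
sp_ratio = hd7970_sp / gtx580_sp

print(f"32-bit int mul: {hd7970_int32_mul:.1f} vs {gtx580_int32_mul:.1f} Gops/s, ratio {int32_ratio:.2f}")
print(f"single precision ratio: {sp_ratio:.2f}")
```

    Under these assumptions the HD 7970 comes out roughly 20% ahead for 32-bit integer multiplies, versus roughly 2.4x ahead at the single-precision rate.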
     
  8. denev2004

    Newcomer

    Joined:
    Apr 28, 2010
    Messages:
    143
    Likes Received:
    0
    Location:
    China
    Sounds like it's based more on its frequency... Anyway, the difference is smaller than that in SP performance.
     
  9. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Are you sure the smaller Fermis (GF104/114 and smaller) do 32bit integer multiplication at half rate as GF100/110 does (or is it the same as GCN's "@ DPFP rate")?
    And all other (simpler) 32bit integer operations are full rate starting with the HD4000 series anyway (and traditionally quite a bit faster than on nV GPUs). That's where some of the advantage for cryptographic stuff comes from (like bitcoin; the fast bit-manipulation instructions of AMD GPUs also help, of course).
     
  10. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Both 32-bit addition and bitwise ops are full-rate on AMD since Cayman.
     
  11. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Since Wekiva aka Spartan aka Troy aka Makedon aka RV770 ;)

    AMD actually presented RV770 as having 12.5 times the bitshift performance of RV670. And I tested it, the HD4000 series has indeed full rate bitwise ops (and additions were already full rate even with R600 iirc).

    Edit:
    Cypress basically added full-rate 64bit bitshifts (only 32bit of the result can be written, but with the 3 source operands of the bitalign instruction, one can supply a 64bit source) or full-rate 32bit rotates.
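
    A small software model of that behaviour, as a sketch: it follows the semantics of `amd_bitalign` from the OpenCL `cl_amd_media_ops` extension as I understand them (two 32-bit sources form a 64-bit value, which is shifted right and truncated to 32 bits). The function names are illustrative and `rotr32` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF

def bitalign(hi, lo, shift):
    """Treat hi:lo as one 64-bit value, shift it right by (shift mod 32)
    and return only the low 32 bits of the result."""
    combined = ((hi & MASK32) << 32) | (lo & MASK32)
    return (combined >> (shift & 31)) & MASK32

def rotr32(x, n):
    """32-bit rotate right: feed the same word in as both halves."""
    return bitalign(x, x, n)

print(hex(bitalign(0x12345678, 0x9ABCDEF0, 8)))  # 64-bit funnel shift
print(hex(rotr32(0x80000001, 1)))                # 32-bit rotate
```

    Passing the same word as both halves is what turns the 64-bit funnel shift into a 32-bit rotate, which is why the two capabilities come from the same instruction.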
     
    #31 Gipsel, Dec 28, 2011
    Last edited by a moderator: Dec 28, 2011
  12. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Thanks for the feedback folks. Much appreciated.
     