Performance evolution between GCN versions - Tahiti vs. Tonga vs. Polaris 10 at same clocks and CUs

Discussion in 'Architecture and Products' started by Alessio1989, Sep 18, 2016.

  1. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    With no patch control point shader and no vertex shader, GCN 1.1 tessellation performs quite well (HS body only + DS that takes SV_PrimitiveId and SV_Barycentrics as input). Primitive rate becomes the bottleneck. Tiny triangles also cause various bottlenecks in the rasterization pipeline (Polaris improved this).
    It is recording and reordering triangles to improve screen locality. The synergy with tile based compression (DCC & lossless depth compression) is clearly there. Screen locality also (trivially) improves render target cache hit rate. I am not expecting it to behave like PowerVR TBDR (= no overdraw), but it should be easily able to save 20%+ of render target bandwidth in common cases. Nvidia could also use slightly more complex DCC algorithms, as tiling should hide the DCC latency better and invoke DCC hardware less often. This gives further bandwidth gains.

    One case where the Nvidia tiling really helps is particle rendering (rgba16f output). Particles are most often 2 triangle quads. Nvidia can bin thousands of particles to tiles before rasterizing them. Particles close to each other spatially (from the same emitter) are likely also close in the triangle list, meaning that they get binned together. Particle effects (big explosion) close to the camera are the number one reason for big frame dips in games. One explosion is < 1000 particles = gets binned at once. So instead of hammering the memory bandwidth (read + write) with 100x full screen rgba16f overdraw (of the nearby explosion smoke particles), we get a single read + a single write. This is a huge saving.

    Good example of potential gains (this technique blends particles in LDS):
    http://www.slideshare.net/DevCentra...ndering-using-direct-compute-by-gareth-thomas
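
To put rough numbers on the particle case above, here is a back-of-the-envelope sketch. It assumes a 1920x1080 render target, no DCC and no cache hits, and that every blended layer costs one read plus one write of the destination pixel. The resolution and the helper name `blend_traffic_bytes` are illustrative assumptions, not figures from the post.

```python
# rgba16f = four channels, 16 bits (2 bytes) each
BYTES_PER_PIXEL_RGBA16F = 4 * 2


def blend_traffic_bytes(width, height, overdraw_layers, binned):
    """Estimate render target memory traffic for alpha-blended particles.

    Not binned: every overdraw layer reads and writes the destination pixel.
    Binned: layers are blended on chip, so memory sees one read and one write.
    """
    pixels = width * height
    passes = 1 if binned else overdraw_layers
    # Factor 2 = one read + one write per pass
    return pixels * BYTES_PER_PIXEL_RGBA16F * 2 * passes


# Assumed scenario: 100x full screen overdraw at 1080p
immediate = blend_traffic_bytes(1920, 1080, 100, binned=False)
tiled = blend_traffic_bytes(1920, 1080, 100, binned=True)

print(f"not binned: {immediate / 2**20:.0f} MiB per frame")
print(f"binned:     {tiled / 2**20:.0f} MiB per frame")
print(f"reduction:  {immediate / tiled:.0f}x")
```

This is an upper bound on the saving: real hardware has caches and compression, and not every particle lands in the same bin.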
     
    #41 sebbbi, Sep 21, 2016
    Last edited: Sep 21, 2016
  2. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    No wonder I still have my 7970s as my main compute GPUs for raytracing; they are solid and strike a good balance between power consumption and compute performance.

    Note that those are the original reference 7970s (not the GHz ones, but flashed with the GHz BIOS), with over-engineered and solid PWM.


    I agree completely with that. Something to keep in mind is that Pascal GPUs run around the 1.8 GHz mark, while Polaris runs way lower... The performance differences between the architectures are not so big; I'm pretty sure a 480 would not pale against a 1080 if it were running at the same speed.

    With Pascal, Nvidia has nearly the same number of SPs as in the previous generation. The SM configuration is different, but at nearly the same shader count you end up with a performance difference that is not far from the clock speed difference between a 980 and a 1070. The same goes for the Titan X.

    I can compare Firestrike scores (DX11):

    980 (2048 SP) = 11,686
    1070 (1920 SP) = 16,229

    score difference = 38.9%

    980 = 1216 MHz boost
    1070 = 1683 MHz boost

    38.4% more core speed.
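
The arithmetic above can be checked with a short sketch. The scores, SP counts and boost clocks are the ones quoted in the post. The per-SP, per-GHz normalization is an extra step added here for illustration, and it assumes the cards actually sustain their rated boost clocks, which may not hold in practice.

```python
# Figures quoted in the post
cards = {
    "980": {"sp": 2048, "score": 11686, "boost_mhz": 1216},
    "1070": {"sp": 1920, "score": 16229, "boost_mhz": 1683},
}


def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


def score_per_sp_ghz(card):
    """Score normalized by shader count and clock (illustrative metric)."""
    return card["score"] / (card["sp"] * card["boost_mhz"] / 1000.0)


score_gain = percent_gain(cards["1070"]["score"], cards["980"]["score"])
clock_gain = percent_gain(cards["1070"]["boost_mhz"], cards["980"]["boost_mhz"])

print(f"score gain: {score_gain:.1f}%")
print(f"clock gain: {clock_gain:.1f}%")
for name, card in cards.items():
    print(f"{name}: {score_per_sp_ghz(card):.2f} points per SP per GHz")
```

Since the 1070 has fewer SPs than the 980, the normalized figure comes out somewhat higher for the 1070, so clock speed explains most of the gap but not all of it.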
     
    #42 lanek, Sep 21, 2016
    Last edited: Sep 21, 2016
  3. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    But higher clocks are also a result of the design of the chip. Maybe NV is trading some mm² of die size for higher clocks; maybe AMD would need to remove some SPs to achieve higher clocks within the same die size. In the end, both are making those decisions. Maybe GF is worse than TSMC, but AMD decided to move its production to GF. Maybe AMD's architecture really owns NV under DX12 and Vulkan, but then they bet on people moving to Win10 quickly and on software developers scrapping all their tried, tested and optimized engines, middleware and tools to move to DX12. At least the last part was always highly unlikely, because software developers also have to meet release dates, and anything coming out before 2018 will have to run well on DX11 and was most likely started on DX11.

    I am growing tired of AMD always being the victim.
     
  4. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,928
    Likes Received:
    1,626
  5. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    There was an improvement with GCN 1.1 that helps in this situation, primarily by reducing the latency of the HS.
     
    Alexko, I.S.T., BRiT and 2 others like this.
  6. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Every time I hear some tidbits about issues that were fixed with GCN 1.1, I feel glad that the consoles are not GCN 1.0.
     
    I.S.T. likes this.
  7. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    859
    Likes Received:
    262
    Alessio1989, pharma and Razor1 like this.
  8. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    same as performance per watt, just broken out in a different way lol.
     
    ieldra likes this.
  9. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    5,446
    Likes Received:
    3,945
    It should be joules per frame
     
    Alexko and ieldra like this.
  10. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    594
    Likes Received:
    298
    calories per frame
     
    ieldra likes this.
  11. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    hmm bacon! wait now I'm hungry lol.
     
    ieldra likes this.
  12. sonen

    Newcomer

    Joined:
    Jul 13, 2012
    Messages:
    53
    Likes Received:
    33
    Or frames per joule, which would be equivalent to performance per watt. Sounds good to me - why call it broken?
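
The unit argument can be spelled out in a few lines: frames per second divided by watts (joules per second) leaves frames per joule, and joules per frame is simply its reciprocal. The 60 fps and 150 W figures below are made-up example values, not measurements from this thread.

```python
def frames_per_joule(fps, watts):
    # (frames / second) / (joules / second) = frames / joule,
    # which is the same number as performance per watt
    return fps / watts


def joules_per_frame(fps, watts):
    # Reciprocal view: energy spent on each frame
    return watts / fps


# Hypothetical card: 60 fps at 150 W board power
fps, watts = 60.0, 150.0
print(f"frames per joule: {frames_per_joule(fps, watts):.2f}")
print(f"joules per frame: {joules_per_frame(fps, watts):.2f}")
```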
     
    ieldra likes this.
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.