Performance evolution between GCN versions - Tahiti vs. Tonga vs. Polaris 10 at same clocks and CUs

Discussion in 'Architecture and Products' started by Alessio1989, Sep 18, 2016.

  1. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    582
    Likes Received:
    285
  2. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    Very interesting comparison - good to see it finally going live. I'm a bit sad though because from this set of tests, it seems like the overall improvements over Tonga are rather small judging from the gains achieved there.

    Maybe more future-looking workloads can help Polaris shine.
     
  3. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    How often have we said this when AMD launched a new GPU? Maybe too often already.
     
    milk and I.S.T. like this.
  4. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,486
    Likes Received:
    397
    Location:
    Varna, Bulgaria
    Even the quadrupled L2 size with double the throughput still barely hints at any tangible performance benefit. And yes, I know, GCN still relies on its proprietary global data share for syncing on top of the dedicated color/depth caches. The faster primitive occlusion and tessellation also don't show much contribution, despite the roomier L2 holding the spillovers to the global memory.
     
  5. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,853
    Likes Received:
    4,463
    Did the Tahiti vs. Tonga vs. Polaris comparison from computerbase.de go unnoticed in this thread? Their comparison shows impressive performance boosts between architectures, especially in gameworks titles where tessellation is heavily (ab)used:

    [Image: benchmark chart from the computerbase.de comparison]

    Between Tahiti and Polaris we see a full 35% performance boost out of the exact same theoretical compute throughput / fillrate and lower bandwidth.



    And how many times have said predictions failed to come true?
    Recent comparisons of Tahiti vs. GK104, Hawaii vs. GK110 and Hawaii vs. GM204 are pretty much self explanatory right now.
     
    no-X likes this.
  6. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,853
    Likes Received:
    4,463
    I figured it's time to start diverging from the other mess of a thread, and this is an interesting subject.
    Computerbase.de has made a comparison between Tahiti, Tonga and Polaris 10 GPUs, all with the same number of CUs enabled and the same clocks.

    https://www.computerbase.de/2016-08...ormance/#abschnitt_gcn_1_3_und_4_im_vergleich


    This is basically GCN1 vs. GCN3 vs. GCN4, in the form of an R9 280X, an R9 380X and an RX 470.

    In some games the difference is pretty negligible (e.g. Dirt Rally, Talos Principle) but in others, the performance boost is pretty huge:

    [Image: per-game benchmark chart from the computerbase.de comparison]


    In general, the games that have gotten the largest boosts from the architectural improvements are the gameworks games, which tend to push geometry as far as Maxwell cards can go.
    But there are games like Ashes of the Singularity that have sizable performance boosts too. Maybe the support for a larger number of compute queues in GCN3 and the HWS units in GCN4 are making a difference when async compute is being used.
     
    RootKit and iroboto like this.
  7. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,788
    Likes Received:
    6,079
    Did these all arrive at the same price points when entering the market?
    Also, in today's market, are they competitive in price points -- err I mean if you were to buy them new?
     
  8. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    555
    Likes Received:
    93
    I wonder what their source is for Tonga having a 512kB L2 cache. AMD never published this information at the time of Tonga's release.
    Fiji had 2MB, and was in most respects double Tonga (apart from shader engines: it kept the same 4 and doubled the CUs within them instead).
     
  9. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,486
    Likes Received:
    397
    Location:
    Varna, Bulgaria
    It's a good guess, I think. Hawaii packs 1MB (8 memory controllers, 128kB partition), Tahiti came with 768kB (6 controllers), so it's logical that Tonga keeps the same amount of L2 SRAM per partition.
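
    As a rough sketch of that reasoning: the Hawaii and Tahiti figures below are the ones quoted above, while the Tonga entry assumes 4 memory controllers (inferred from its 256-bit bus, not something AMD has confirmed for the L2), so treat the result as a guess rather than a spec.

```python
# 128 kB of L2 per memory-controller partition, as quoted in the post above.
L2_PER_PARTITION_KB = 128

# Memory controller counts. Tahiti and Hawaii come from the post above;
# the Tonga count is an ASSUMPTION based on its 256-bit memory bus.
MEMORY_CONTROLLERS = {"Tahiti": 6, "Hawaii": 8, "Tonga (assumed)": 4}


def l2_size_kb(controllers, per_partition_kb=L2_PER_PARTITION_KB):
    """Total L2 in kB if each controller has one equally sized partition."""
    return controllers * per_partition_kb


for chip, count in MEMORY_CONTROLLERS.items():
    print(f"{chip}: {count} x {L2_PER_PARTITION_KB} kB = {l2_size_kb(count)} kB")
```

    With those inputs the same rule that gives 768kB and 1MB for Tahiti and Hawaii lands on 512kB for Tonga, which is only as reliable as the controller count fed into it.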
     
  10. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,853
    Likes Received:
    4,463
    Both Tahiti and Tonga went through different PCBs, core clocks, memory clocks, etc. during their lifetimes.
    Moreover, I think none of the cards in the article are running the clocks they were shipped with, so such a comparison wouldn't make much sense.
    The reviewers clocked all cards the same in order to get the same compute and fillrate throughputs, in an effort to compare the architectures and not their position in the market.
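
    To illustrate why equal CU counts and clocks give equal paper numbers, here is a minimal sketch. The 64 lanes per CU and 2 FLOPs per FMA are standard GCN figures; the CU count, ROP count and clock in the example call are placeholder assumptions, not the settings used in the article.

```python
LANES_PER_CU = 64            # a GCN compute unit has 4 SIMD16 units = 64 lanes
FLOPS_PER_LANE_PER_CLK = 2   # one fused multiply-add counts as 2 FLOPs


def fp32_gflops(cus, clock_mhz):
    """Theoretical peak FP32 throughput in GFLOPS."""
    return cus * LANES_PER_CU * FLOPS_PER_LANE_PER_CLK * clock_mhz / 1000.0


def pixel_fill_gpix(rops, clock_mhz):
    """Theoretical peak pixel fillrate in Gpixel/s (one pixel per ROP per clock)."""
    return rops * clock_mhz / 1000.0


# Placeholder configuration (ASSUMED): 32 CUs, 32 ROPs, 1000 MHz.
# The architecture generation is not an input to either formula, so the
# paper figures are identical for any GCN chip with the same configuration.
print(fp32_gflops(32, 1000), "GFLOPS")
print(pixel_fill_gpix(32, 1000), "Gpixel/s")
```

    Since the GCN generation never enters either formula, any measured difference at matched settings has to come from how well each chip uses that throughput, which is the point of the comparison.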



    That said, I was hoping for this thread to be about discussing what architectural improvements between GCN1-4 have resulted in substantial differences such as the ones we're seeing with Witcher 3 and Ashes of the Singularity.
     
  11. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,788
    Likes Received:
    6,079
    Lol no worries. I do appreciate the OT. I was just wondering how they were priced relative to each other. This is a good showcase of architecture.


     
  12. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha Subscriber

    Joined:
    May 14, 2005
    Messages:
    1,373
    Likes Received:
    242
    Location:
    NY
    Well, ISA-wise nothing changed between tonga and polaris, just the front end. So in games where there's a large difference between the two, a reasonable guess might be "tahiti/tonga/etc. were severely bottlenecked by triangle processing in this part of the game whereas polaris was not". GCN in general was (is?) weak in this area compared to the competition (both nvidia and intel).

    But graphics is a complicated business, hard to say with any level of certainty without some profiling. This would be my offhand guess.
     
    I.S.T. likes this.
  13. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,853
    Likes Received:
    4,463
    Here's another comparison that popped up today (credit to @Alessio1989 )

    http://www.bitsandchips.it/9-hardware/7334-tonga-vs-polaris-sfida-clock-to-clock


    So you're suggesting the average 7% better performance-per-clock between Tonga and Polaris is coming from the increased L2 cache alone?
    Witcher 3 and Metal Gear Solid V are showing a whopping 15% difference, though those are both gameworks titles.


    Regardless, it really does seem that most of the architectural improvements happened during the 28nm generations and Polaris' changes may just be all about the new node and specific power improvements.
    The OpenCL driver does call Tahiti a "SI" (below Bonaire+Hawaii GFX7), Tonga a GFX8 and Polaris a GFX8.1.



    Interesting to see how each architecture step brought its own geometry performance improvement, up until Polaris 10, which seems to be roughly level with GP106.
    Though I don't know whether that's good or depressing (or both). On one hand, the newer cards aren't hurting as much with gameworks titles. On the other, AMD is spending R&D resources to counter gameworks in their own hardware. This has to be frustrating, as all multiplatform games are console ports coming from a more compute-centric GCN1/2 architecture in the first place.
     
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    The "good guess" isn't good enough. 1MB would be just as good a guess (why not say it's a doubled Bonaire instead of a Hawaii derivative?). 512kB, 1MB, 768kB, 1.5MB (though supposedly for the latter two options part of it would be deactivated) are all options I've seen mentioned somewhere for Tonga - all are good guesses but only one is right...
    I don't think the performance difference would be all that big though in any case - surely if there were a 3% performance difference AMD would have increased it earlier (the die area the L2 uses should still be tiny).
    If you really wanted to know, I suppose some directed compute tests could reveal it.
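
    One way such a test could be evaluated, as a sketch only: sweep a pointer-chasing kernel over growing working sets, record the average access latency for each, and look for the first size after which latency jumps. The helper name `estimate_cache_kb` and the 1.5x threshold are assumptions for illustration; the measurements themselves would have to come from a real compute kernel, and the sweep should start above the L1 size so the first jump seen is the L2 one.

```python
def estimate_cache_kb(samples, jump_ratio=1.5):
    """Guess a cache size from latency measurements.

    samples: iterable of (working_set_kb, latency_ns) pairs, in any order.
    Returns the largest working set measured before latency first rises by
    more than jump_ratio relative to the previous point, or None if no such
    jump is seen in the data.
    """
    ordered = sorted(samples)
    for (prev_kb, prev_ns), (_next_kb, next_ns) in zip(ordered, ordered[1:]):
        if next_ns > prev_ns * jump_ratio:
            return prev_kb
    return None
```

    The answer is only as fine-grained as the sweep: if the steps are 256kB apart, the result is a lower bound on the cache size with up to 256kB of slack.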
     
  15. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha Subscriber

    Joined:
    May 14, 2005
    Messages:
    1,373
    Likes Received:
    242
    Location:
    NY
    I'm suggesting the isa between tonga and polaris is the same, nothing more! I believe this can be verified with codexl (although it seems like you figured this out on your own). It's nearly impossible to connect performance differences (especially when they are in the single digit % range) to a single block of hardware. I wasn't kidding when I said graphics is complicated! You're right though, in general not a whole lot changed with polaris. Perhaps there are some interesting chips on the horizon...

    Off-topic but I think you should consider the possibility that developers (even those working on gameworks titles) don't create rendering pipelines that purposely cater to one ihv to the detriment of another ihv. I promise there's no hidden agenda among developers (at least none that I've encountered). The notion that amd is "wasting" r&d money on geometry processing is kind of crazy. There's nothing wrong with processing triangles faster! I don't think it's unreasonable to say amd overshot compute a bit on gcn just like nvidia overshot compute a tad with fermi. I view gcn's "rebalancing" as a reflection of reality and not a reaction to gameworks (let's be real, it's doubtful amd's engineers even knew of gameworks' existence when designing these revisions).
     
    Exophase, Kej and Razor1 like this.
  16. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,135
    Likes Received:
    2,248
    Location:
    Wrong thread
    Pretty sure GCN experienced developers like sebbbi have said that AMD's geometry engines needed improving. It seems very unlikely that they've done this to counter Gameworks, and more that it's a side of the pipeline that was bottlenecking them, and so they improved it.

    Up next are expected to be the ROPs, where nvidia are now ahead of them. As well as becoming more BW and power efficient, they would also appear to need more of them. They have been limited to a maximum of 16 per shader engine.
     
    Razor1 likes this.
  17. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    Outside the things AMD once said nobody needed (more geometry and tessellation power), I think much of the improvement comes from more cache and better colour compression, so more effective bandwidth. It is quite interesting that the biggest gain comes from improved geometry power in the CB tests. I am more and more inclined to think that GCN was not a good architecture for DX11.
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    AMD's performance per ROP (and per unit of fillrate) is far ahead of NVidia, in games.

    AMD's real problem is not doing tile-binned rasterisation. Doing that would have the side effect of "making the ROPs more efficient", but in truth wouldn't make any difference. Not doing work on a triangle you know will be overwritten is a win.

    The irony is that with clustered geometry/occlusion algorithms, the need to do tile-binned rasterisation disappears. AMD could re-architect for this just in time for all the advanced engines to do a better job themselves.
     
    Lightman likes this.
  19. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,135
    Likes Received:
    2,248
    Location:
    Wrong thread
    Perhaps, but their absolute performance is frequently behind and having more performance per unit of fillrate is scant consolation if it's bottlenecking other areas of your system. Which, it appears, might be happening with Polaris and certainly was something that happened with Fiji.

    One thing I've realised about the PC gaming market is that it's never too late to add an optimisation that you need.

    The exciting new way of doing things is always three years later in gaining mass adoption than you hoped it would be ...
     
    I.S.T. likes this.
  20. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,135
    Likes Received:
    2,248
    Location:
    Wrong thread
    It's been an absolute belter in consoles though, where games really lean on compute / async compute, and "Vulkan Doom" style results will be far more common.

    Probably better for DX11 than Kepler too, tbh.
     
    RootKit likes this.