DX12 Performance Discussion And Analysis Thread

Discussion in 'Rendering Technology and APIs' started by A1xLLcqAgt0qc2RyMz0y, Jul 29, 2015.

  1. I.S.T.

    Veteran

    Joined:
    Feb 21, 2004
    Messages:
    3,174
    Likes Received:
    389
    :lol:
     
  2. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    I'm glad you picked that up because those results just confused me.
     
  3. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
Looks like Nvidia got heavily CPU-limited in the Fable benchmark this time. Even at 4K.

And no, the game doesn't really make proper use of async compute at all. Only about 5% (time-wise) of the workload has been offloaded to a dedicated compute queue. I've seen the GPUView dumps of Nvidia and AMD runs: no draw-call overload, backpressure only in the graphics queue, no more than a single compute command every few graphics batches, and only copy commands were ever issued asynchronously.

So it looks essentially the same as it would have with DX11: a perfectly safe, well-optimized tech demo, where the only DX12 benefit left is the reduced driver overhead. And even that isn't true for Nvidia.
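For anyone not familiar with what that looks like at the API level, here's a minimal sketch (not the game's actual code; the helper name, the device and the pre-recorded command list are just placeholders) of submitting work to a dedicated compute queue in D3D12. This is what shows up as the extra queue row in a GPUView capture:

Code:
// Minimal, illustrative D3D12 sketch: a second queue of type COMPUTE
// running alongside the main DIRECT (graphics) queue. Error handling
// omitted; the compute command list is assumed to be recorded elsewhere
// on a COMPUTE-type allocator.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitAsyncCompute(ID3D12Device* device,
                        ID3D12CommandList* computeCmdList,
                        ID3D12Fence* fence,
                        UINT64 fenceValue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    // Kick off the compute work and signal a fence; the graphics queue
    // would call ID3D12CommandQueue::Wait on the same fence before it
    // consumes the results.
    ID3D12CommandList* lists[] = { computeCmdList };
    computeQueue->ExecuteCommandLists(1, lists);
    computeQueue->Signal(fence, fenceValue);
}

Whether the two queues actually overlap on the ALUs is then up to the hardware and the driver; the API only expresses the opportunity.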
     
    drSeehas likes this.
  4. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,853
    Likes Received:
    4,463
    A great start would be this post from Ext3h.


    It's running asynchronously where it's supported. "Async Compute" isn't a mandatory DX12 "flag".


    "Async Compute" is the ability to start rendering and compute tasks at the same time, throughout the ALUs. If it's not running concurrently, there's no "Async Compute" happening.


    What the hell does this even mean?! I was just plain and simple called "obtuse" a couple of posts ago and I'm the one needing to pay attention to my words?! Is dogpiling a thing now on B3D?!


    You could almost say it's a DX12 implementation tailored for nVidia GPUs, then...
    Not that I expected any less from Tim Sweeney, though. :(
     
    digitalwanderer likes this.
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Which reviews show Nvidia being CPU-limited at 4K? There seems to be evidence to the contrary, since Anandtech's results generally show no sensitivity to CPU choice until 720p, and Techreport's factory-overclocked 980 Ti is demonstrably faster relative to Fury than other reviews with stock cards.
     
  6. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
Extremetech accidentally managed to throttle the CPU to 1.7 GHz by choosing the wrong power profile, and that resulted in the Fury X outranking the 980 Ti even at 4K and 1080p. At 720p, the Fury X took only a 2% performance hit from the reduced clock speed; the 980 Ti lost about 30%.

    Not fair, I know. And not intended either. But still surprising.

Bear in mind that Extremetech was also using a Haswell-E CPU with 20MB of L3 cache, so that thing is a beast when it comes to hiding CPU-related latencies, as it suffers from virtually no L3 cache misses at all. That was probably also the reason why, once they raised the clock speed again, they had the only 720p run where the 980 Ti could actually beat the Fury X.
     
  7. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,911
    Likes Received:
    1,608
I wouldn't put too much into the ExtremeTech review ....
     
  8. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
Me neither. I know it's faulty.

But it yielded some nice evidence on the 720p run. They got the 980 Ti to perform both worse than everyone else and better than everyone else. The almost 160 FPS at 720p with a stock-clocked 980 Ti is just as surprising.


But also bear in mind that MS demanded Fable only be tested in three profiles: 1080p and 4K at full detail, and 720p at minimum detail. So the 720p run may not be representative at all; nobody knows what got changed in that run.
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    So the 980 Ti is CPU-limited when the CPU is massively downclocked and the resolution is at 720p.

    Where should I be looking for the rankings changing at 4K between the 980 Ti and Fury X?
     
  10. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
I must say I am not a fan of comparing stock NVIDIA to stock AMD, as their sales models/channels seem to be a bit different: NVIDIA gives its partners greater flexibility to differentiate from the reference design in terms of noise, heat design and, importantly, clocking headroom. ExtremeTech used a stock reference 980/980 Ti, and let's be honest, only very early adopters should have these, as they are not as good as the slightly later AIB cards.
Maybe they should use one or two manufacturer brands that design for both AMD and NVIDIA, say ASUS and MSI. That is still not ideal, but at least it is meant to be closer to an optimum design for both without going to the very extreme.

I am shocked they reported performance from AMD PR directly for the 390 and 380; ironically, that performance would put the 390 around the Nano at pcgameshardware.de: http://www.pcgameshardware.de/DirectX-12-Software-255525/Specials/Spiele-Benchmark-1172196/
Still, the 390X is looking good in all tests so far from various sites.

    Cheers
     
  11. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
I don't know. They are not online any more; maybe they were just a fluke. Now both graphs show them ranked evenly, and oddly enough, both seem to have received a performance boost at 1080p, which indicates some common CPU limit. Perhaps particle physics.


But the CPU limit isn't only there when downclocked; it only became obvious then. It's even there at regular clocks on an i7-4960X (Anandtech). Still only 720p, sure, but it is there. Only an entirely oversized i7-5960X (costing twice as much as the GPU) could push the CPU limit far enough out to let the 980 Ti outperform the Fury X.

AMD, for once, did not have a CPU limit at all at that resolution.

    Draw your own conclusions.
     
    drSeehas likes this.
  12. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,853
    Likes Received:
    4,463
Agreed. Especially for the GTX 980, stock-clocked models are incredibly hard to find nowadays.
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    If it's not obvious, there's little justification in saying Nvidia is limited by it. There's no reason to state that an item whose influence is a second-order effect compared to a more dominant bottleneck cannot have some impact.
    If an artificial case of a downclock to a specific and non-representative speed is sufficient to indict one vendor, I have bad news for both when I require a downclock to 1 MHz.

    That makes it applicable to a claim of being CPU-limited at that resolution, although given the vast gulf in capability between an i7 and an i3, saying it is CPU-limited may not be fully accurate without more elaboration.

    AMD's performance was sensitive to changes in CPU choice, just not in a manner that was intuitive.

    One vendor has a higher CPU dependency, although in absolute terms it requires a significant drop in CPU performance to make it clear.
     
  14. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,911
    Likes Received:
    1,608
    Thought this was relevant to the thread topic:
[image]
     
  15. Genotypical

    Newcomer

    Joined:
    Sep 25, 2015
    Messages:
    38
    Likes Received:
    11
Is this true? That's bad if so. It would mean they left the real benefits for the Xbox One and took it down a notch for PC.

    Nvidia....

What are the chances AMD can have their driver force compute shaders to run asynchronously and concurrently...
     
  16. dogen

    Regular Newcomer

    Joined:
    Oct 27, 2014
    Messages:
    335
    Likes Received:
    259
    Don't jump to conclusions...
     
  17. Genotypical

    Newcomer

    Joined:
    Sep 25, 2015
    Messages:
    38
    Likes Received:
    11
Someone mentioned that the ExtremeTech results were provided by AMD. I want to provide the full context of the quote that person made, for clarity. It doesn't make as much sense for a 390 to beat a stock 980 without good usage of async compute. Not what we would expect, but it seems to be the case based on the below.

It highlights that benchmark results need more information than just the name of the GPU; clock frequencies should be mentioned at the very least. This is one of those really annoying things about getting data from benchmarks.
     
    drSeehas likes this.
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    I don't recall seeing a DX11 vs DX12 comparison, so saying that there is no reduction in driver overhead for Nvidia is a dubious assertion.
    There's no requirement that implementations become magically equal with DX12.

If, and this is an if, the explicitly listed compute category is asynchronous compute, then we see the overall contribution it makes to frame time. It could go to zero ms and the overall picture would only change a little; it's only a small slice of the ~33 ms frame time.
Even Ashes of the Singularity was noted, after the kerfuffle started, to not seriously push the envelope there either.
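As a rough back-of-envelope illustration, taking the ~5% time-wise figure mentioned earlier in the thread (an assumption, not a measurement of this particular category):

0.05 × 33 ms ≈ 1.65 ms
33 ms − 1.65 ms ≈ 31.35 ms, i.e. roughly 30 fps → 32 fps at best

So even a perfect win there would move the needle by only a couple of fps.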


    There are possibly hundreds of reasons why things could go one way or the other.
For one thing, numbers provided by AMD purporting a lead for a card, a lead that is not reflected by reviews, actually do make sense in view of what has already happened.
     
  19. Genotypical

    Newcomer

    Joined:
    Sep 25, 2015
    Messages:
    38
    Likes Received:
    11
I think the ET article explains it well. The figures provided by AMD are supported by a review (actually, apparently just one review looked at the 390). AMD's result was actually more favorable than ET's own result for the 980. Taking clock differences into account, you get your explanation. If the question is which version of the 980 should be used... who knows. Use the reference? Use the fastest? OC it to 2 GHz?

From what I have seen, it doesn't seem like the game is making much use of asynchronous compute, but I'll have to read more. I am seeing claims that Lionhead has not ported it over from the Xbox One yet. They never did demonstrate it on PC, even though they had a demonstration on a 980 in the past that showed other DX12 features.

    This benchmark might not belong here.
     
  20. huebie

    Newcomer

    Joined:
    Apr 10, 2012
    Messages:
    29
    Likes Received:
    5
The Work Distributor in Kepler can communicate in both directions, but I don't know at which protocol level (in other words: there may be only very limited backward communication). Furthermore, I'm pretty sure it's not an ARM core.

Edit: Fermi's was not bidirectional. So there has been a change since Kepler, for all the series above (e.g. GTX 680, 750, 960 and so on). Maybe described via CC 3.0.
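If you just want to tell those generations apart programmatically, the compute capability is queryable from the CUDA runtime. A minimal host-side check, purely illustrative and saying nothing about the work distributor internals themselves:

Code:
// Host-side query of the CUDA compute capability: Fermi reports 2.x,
// Kepler and later report 3.0+ (GTX 680 = 3.0, GTX 750 = 5.0,
// GTX 960 = 5.2). Illustrative only.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        return 1;

    std::printf("%s: CC %d.%d (%s)\n", prop.name, prop.major, prop.minor,
                prop.major >= 3 ? "Kepler or newer" : "Fermi or older");
    return 0;
}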
     
    Ext3h likes this.