DX12 Performance Discussion And Analysis Thread

Discussion in 'Rendering Technology and APIs' started by A1xLLcqAgt0qc2RyMz0y, Jul 29, 2015.

  1. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
    I suspect that Skylake would get a similar performance boost in that benchmark; I'm not so sure about Gen 8 and especially Gen 7.5 iGPUs.
     
    cal_guy likes this.
  2. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Why waste 16 CPU cores to push draw calls (and determine visibility)? You can instead do the culling on GPU and perform a few ExecuteIndirects to draw the whole scene, saving you 15.9 CPU cores for tasks that are better suited for CPU :)
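
    A rough CPU-side sketch of that idea, assuming bounding-sphere culling against inward-facing planes. On real hardware the loop would be a compute shader and the resulting list would be consumed by ExecuteIndirect. The `Cluster` type, the helper names and the sample scene are made up for illustration; only the argument tuple order follows D3D12_DRAW_INDEXED_ARGUMENTS.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical per-cluster record; the field names are illustrative only.
    center: tuple      # bounding-sphere center (x, y, z)
    radius: float      # bounding-sphere radius
    index_count: int   # number of indices to draw
    first_index: int   # offset into the shared index buffer


def sphere_visible(center, radius, planes):
    """True unless the sphere lies fully behind any plane.

    planes: list of (nx, ny, nz, d) with normals pointing into the volume.
    """
    for nx, ny, nz, d in planes:
        dist = nx * center[0] + ny * center[1] + nz * center[2] + d
        if dist < -radius:
            return False
    return True


def build_indirect_args(clusters, planes):
    """Emit one draw-argument tuple per visible cluster.

    Tuple order mirrors D3D12_DRAW_INDEXED_ARGUMENTS:
    (IndexCountPerInstance, InstanceCount, StartIndexLocation,
     BaseVertexLocation, StartInstanceLocation)
    """
    args = []
    for c in clusters:
        if sphere_visible(c.center, c.radius, planes):
            args.append((c.index_count, 1, c.first_index, 0, 0))
    return args


# Assumed toy scene: a slab -1 <= x <= 1 and three clusters.
planes = [(1.0, 0.0, 0.0, 1.0), (-1.0, 0.0, 0.0, 1.0)]
clusters = [
    Cluster((0.0, 0.0, 0.0), 0.5, 192, 0),    # inside
    Cluster((5.0, 0.0, 0.0), 0.5, 192, 192),  # far outside, culled
    Cluster((1.2, 0.0, 0.0), 0.5, 192, 384),  # straddles the boundary, kept
]
indirect_args = build_indirect_args(clusters, planes)
```

    The CPU never touches the per-cluster result here; it would only record the few indirect calls that consume `indirect_args`.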
     
    Alessio1989, chris1515 and liolio like this.
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    How much throttling due to power/thermals are we seeing in these tests? CPU or GPU?
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Isn't the game just filling the CPU with AI/physics calculations, leaving a minimal amount of CPU time for graphics?
     
  5. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    From the article:

     
  6. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
    Sebbbi, quick question regarding D3D12 and explicit multi-adapter... can you use one GPU to do culling and indirect execution buffer generation for another GPU?
     
  7. oscarbg

    Newcomer

    Joined:
    Sep 2, 2009
    Messages:
    35
    Likes Received:
    13
    +1 to testing Intel GPUs with Ashes, especially from someone with a Skylake HD 530.
    I'm also interested in seeing the 3DMark API overhead D3D12 test on the new Gen9 Intel HD 530, but I can't find any review with those results. Anyone?
     
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    In our case the visible cluster buffer is just a regular append buffer. The culler appends a single 32-bit integer to the buffer for each visible cluster (cluster = 64 vertices). You can copy this append buffer from one GPU to the other just like any resource. So you could do the culling on the integrated GPU and the rendering on the discrete one.
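
    To make the data layout concrete, here is a minimal CPU-side sketch, assuming the buffer is nothing more than tightly packed little-endian 32-bit cluster indices. The function names are hypothetical, and `copy_between_devices` is only a stand-in for a real cross-adapter resource copy.

```python
import struct

VERTICES_PER_CLUSTER = 64  # cluster size stated in the post


def cull_to_append_buffer(visible_flags):
    """Append one 32-bit cluster index for every visible cluster."""
    buf = bytearray()
    for cluster_id, visible in enumerate(visible_flags):
        if visible:
            buf += struct.pack("<I", cluster_id)
    return bytes(buf)


def copy_between_devices(buf):
    """Stand-in for a GPU-to-GPU copy: the payload is plain bytes."""
    return bytes(buf)


def read_append_buffer(buf):
    """Decode the packed 32-bit indices on the consuming side."""
    count = len(buf) // 4
    return list(struct.unpack(f"<{count}I", buf))


# Assumed visibility result for five clusters (illustrative data only).
flags = [True, False, True, True, False]
culled_on_igpu = cull_to_append_buffer(flags)
received_on_dgpu = copy_between_devices(culled_on_igpu)
visible_ids = read_append_buffer(received_on_dgpu)
vertices_to_draw = len(visible_ids) * VERTICES_PER_CLUSTER
```

    Since each entry is a plain integer with no pointers into device memory, nothing in the buffer ties it to the GPU that produced it.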
     
  9. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    I read somewhere that Ashes doesn't work yet with HD 530. It would be interesting to see the results, especially on a GT4e laptop with limited TDP. Current desktop Skylakes with low-end GT2 graphics should be 100% GPU bound. DX12 shouldn't improve things much, unless Ashes uses async compute or some other new DX12 features that improve GPU utilization.
     
  10. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    It was in the Ars Technica review.

    In the Ars Technica review the scores are pretty much identical across the board whether they use a 4-core Haswell without HT or a 6-core Haswell with HT. That suggests that even on a 980 Ti and 290X, whether they are using DX12 or DX11, the game is GPU bound, unless it can't take advantage of more than 4 cores, that is.

    But the crazy thing is, the 290X still gets a huge leap when using DX12 as opposed to DX11. So that suggests a GPU limitation is being freed up by DX12. As you say, maybe async compute? I find it hard to imagine that GCN would get such a huge boost from async compute compared to Maxwell, though. Perhaps it's a bug in Maxwell's implementation (driver or hardware)?
     
  11. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,535
    Likes Received:
    144
    Have we considered more trivial explanations ahead of the fancier ones? I.e., if we compare pound for pound the 290x and the 980Ti (i.e. spec-per-spec), it appears that moving to DX12, in Ashes, allows the former to perform more in line with (some of) its theoreticals (e.g. slightly less ALU, slightly more BW etc.).
     
  12. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    I don't know anything about Maxwell's async compute implementation, but I know that GCN gets huge benefits from it. It is too early to speculate, since we don't even know whether Ashes of the Singularity uses async compute or not. If they use async compute, it might be that AMD is the only vendor that has implemented it in the drivers currently.

    If I had time, I would write a DX12 microbenchmark at home (to see how well all the DX12 GPUs perform async compute, ExecuteIndirect and other new features)... but we have a newborn baby at home, taking all my free time :)
     
  13. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    That would certainly be one hell of a result for AMD and would probably benefit PC gaming as a whole, given the increased competition it would bring. I can't say I'm particularly optimistic about that option though, especially as we didn't see a similar "unleashing of potential" with Mantle, which you would assume would be even more likely to achieve that result.
     
  14. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,535
    Likes Received:
    144
    I would not necessarily take the Mantle experiments to mean much (now that we've gotten over the "it's going to change the world" phase). Granted, I'd like to underline that drawing many conclusions from Ashes is, IMHO, unwise, as it's still rather early days. Having said that, it does not appear to me that simply performing somewhat closer to what hardware specifications would suggest is such an otherworldly win. I also would not necessarily take it as a strong indication of the future, as there's room in DX12 for unmatchable investments in one's driver and developer outreach to act as the key differentiator.
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    PC Perspective's results show little difference for Intel when going from 4 to 8 cores, although it does so by mixing CPU architectures.
    It takes an i3 or an AMD chip to tank the throughput. Within AMD's CPU range, 6 to 8 cores is not a major performance change.

    As far as GPU bound?
    A= 980
    B= 980 Ti
    C= 290
    D= 390
    E = Fury X

    B to E
    http://www.extremetech.com/gaming/2...-singularity-amd-and-nvidia-go-head-to-head/2
    A to D
    http://www.pcper.com/reviews/Graphi...ted-Ashes-Singularity-Benchmark/Results-Avera
    C to B
    http://arstechnica.com/gaming/2015/...ly-win-for-amd-and-disappointment-for-nvidia/

    There are two notable performance tiers for each IHV.

    So, given the lovely way the tech press, the IHVs, and Oxide have handled a very immature platform:
    A is on par with D
    B is on par with E
    C is on par with B

    I don't have a direct comparison that can fully close the loop. There are frame numbers given that give rough equivalences, although that is risky given how noisy the set is.
    However, even without the numbers, we can assume that the 290 <= 390 and the 980 <= 980 Ti.
    The 290 is bracketed as being less than or equal to the 390, yet on par with the 980 Ti, while the 980 (which is <= the 980 Ti) is on par with the 390. That means there's no room for the < and everything is coming out the same. Either these chips are all the same or we have plenty of room for crap in these results.
    It is not showing a clear sign we're getting what we should out of these GPUs or the preview methods.
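
    The bracketing argument can be checked mechanically. This is only a sketch: it treats each "on par" claim as <= in both directions, adds the two assumed orderings, and takes the transitive closure. The function name and the encoding are made up for the example; the card names and relations are the ones listed above.

```python
from itertools import product


def everything_collapses(cards, on_par_pairs, le_pairs):
    """True if the relations force every card to be <= every other card."""
    le = set(le_pairs)                 # (a, b) means a <= b
    for a, b in on_par_pairs:          # "on par" is <= in both directions
        le.add((a, b))
        le.add((b, a))
    changed = True
    while changed:                     # naive transitive closure
        changed = False
        for (a, b), (c, d) in product(list(le), repeat=2):
            if b == c and (a, d) not in le:
                le.add((a, d))
                changed = True
    return all((a, b) in le for a in cards for b in cards if a != b)


cards = ["980", "980 Ti", "290", "390", "Fury X"]
assumed = [("290", "390"), ("980", "980 Ti")]  # 290 <= 390, 980 <= 980 Ti
previews = [
    ("980", "390"),       # A is on par with D
    ("980 Ti", "Fury X"), # B is on par with E
    ("290", "980 Ti"),    # C is on par with B
]

all_same = everything_collapses(cards, previews, assumed)
# Dropping the "C is on par with B" result breaks the cycle.
without_c_b = everything_collapses(cards, previews[:2], assumed)
```

    With all three preview results the chain 980 <= 980 Ti ~ 290 <= 390 ~ 980 closes into a loop, so `all_same` comes out true; leaving out the third result leaves an ordinary ordering.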

    The clearest constant among the previews is that AMD's DX11 implementation is inferior.
    To me, this looks like one of the least crazy things about how the performance testing has been handled across all these sites.
     
  16. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    There's going to be an element of baseline coding/targeting that's favourable to the architectures that are featured in consoles, given consoles already employ a software model akin to this and one will pretty much get exactly this.
     
  17. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Razor1 likes this.
  18. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    That's 100% sure. But at the same time, I tend to believe that, outside some rare cases, this should not be much of an issue.
     
    #78 lanek, Aug 24, 2015
    Last edited: Aug 24, 2015
    Razor1 likes this.
  19. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    Yes, this was and probably always will be the case, even in the future: different architectures have affinities for how code is written. This is why GameWorks and TressFX will always work better on the respective IHV's hardware, unless the developer helps with the opposite IHV's paths. DX12 doesn't solve this; no API will, really.
     
    pharma likes this.
  20. gamervivek

    Regular

    Joined:
    Sep 13, 2008
    Messages:
    805
    Likes Received:
    320
    Location:
    india
    The first mover thinks that ROPs control front-end and tessellation performance, and that Fiji not having improved on Hawaii there is the reason why it is not doing that much better.

    http://www.overclock.net/t/1569897/...singularity-dx12-benchmarks/400#post_24321843

    And his 'analysis' of the hardware that has only come into prominence now is supposedly all the rage right now.

    The antagonist who has been posted above doesn't know that CUDA miner was there before OpenCL for AMD.

    The whole thing was a bit funny, the way all OCN threads turn out, before it got plastered everywhere. :-|
     