The discrepancy seems too significant to not have been noticed in some other context. The PS4 at least uses asynchronous compute, and if this scaling behavior were universal I think it would have been remarked upon. The latest version actually cut the compute load per kernel, and I am pretty sure they can generate more dispatches in a frame than what is being done here.

I'm struggling to see how NVidia is failing by any sensible metric when Graphics + compute completes in 92ms on the GTX 980 Ti and 444ms on the Fury X. Or compute only, which is 76ms versus 468ms. AMD, whatever it's doing, is just broken.
Or maybe Fiji is just spending 25.9ms sleeping, then waking up momentarily to execute a kernel that should take about 8 microseconds.
It does seem like the trend line for Nvidia really flattened in the one case where we know it was off.

3dilettante: wouldn't it be interesting if active TDR is slowing down these tests...
Would someone please explain what the differences between the first and the second test are?

Asynchronous compute versus forced-synchronous.
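To make that concrete, here is a minimal D3D12 sketch of what I assume the two modes boil down to. This is my own illustration, not the benchmark's code: the function and variable names are made up, and device/queue/command-list creation and error handling are omitted.

[code]
// Sketch only: assumes a DIRECT (graphics) queue, a COMPUTE queue, and
// already-recorded command lists exist.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// "Async": both queues are fed with no dependency between them, so the GPU is
// free to overlap the compute kernels with the graphics work.
void SubmitAsync(ID3D12CommandQueue* gfxQueue, ID3D12CommandQueue* computeQueue,
                 ID3D12CommandList* gfxList, ID3D12CommandList* computeList)
{
    gfxQueue->ExecuteCommandLists(1, &gfxList);
    computeQueue->ExecuteCommandLists(1, &computeList);
}

// "Forced-synchronous": same command lists, but the compute queue waits on a
// fence that the graphics queue signals only when it has finished, so the two
// workloads run back to back instead of overlapping.
void SubmitForcedSync(ID3D12Device* device,
                      ID3D12CommandQueue* gfxQueue, ID3D12CommandQueue* computeQueue,
                      ID3D12CommandList* gfxList, ID3D12CommandList* computeList)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    gfxQueue->ExecuteCommandLists(1, &gfxList);
    gfxQueue->Signal(fence.Get(), 1);

    computeQueue->Wait(fence.Get(), 1);   // GPU-side wait, no CPU stall
    computeQueue->ExecuteCommandLists(1, &computeList);
}
[/code]

If that is roughly what the second test does, any gap between the two timings should come purely from how much of the compute work the hardware manages to overlap with graphics.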
It's the same workload, and not one that is trying something too exotic.

So far it's been discovered that different workload types perform differently on different GPU architectures.
"Usage" is hard to define, because it's likely to be a number that measures multiple parts of the GPU to determine load. It's also likely to be measuring multiple points across some parts, e.g. the load on each shader engine (there's 4).GPU usage on Fury-X is odd.
Compute : ~10% all the time
Graphics only: 40%
Graphics + compute: ~10% all the time
Graphics, compute single commandlist: 80-90% (usage seems stable in Afterburner but does spike, like in many pictures; highest usage recorded by Afterburner: 91%)
It's the same workload, and not one that is trying something too exotic.

The first test and the latest test have different parameters; they are two different scheduler/simultaneous-command benchmarks. The point isn't that they are technically pushing the same type of work through, it's that they differ in how they handle that work.
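For what it's worth, the "Graphics, compute single commandlist" case from the usage numbers above presumably looks something like the following. This is my own rough sketch, not the benchmark's code: root signatures, resources, and PSO creation are omitted and all names are made up.

[code]
// One DIRECT command list with draws and dispatches interleaved: the driver
// sees a single serial stream, so no cross-queue scheduling is involved.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12GraphicsCommandList> RecordMixedList(ID3D12Device* device,
                                                  ID3D12CommandAllocator* alloc,
                                                  ID3D12PipelineState* gfxPso,
                                                  ID3D12PipelineState* computePso)
{
    ComPtr<ID3D12GraphicsCommandList> cl;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, alloc, gfxPso,
                              IID_PPV_ARGS(&cl));

    for (int i = 0; i < 128; ++i)          // arbitrary count, for illustration
    {
        cl->SetPipelineState(gfxPso);
        cl->DrawInstanced(3, 1, 0, 0);     // placeholder draw

        cl->SetPipelineState(computePso);
        cl->Dispatch(1, 1, 1);             // tiny kernel, like the ones discussed
    }

    cl->Close();
    return cl;
}
[/code]

Submitting that one list to the graphics queue is a very different proposition for the scheduler than feeding the same dispatches to a separate compute queue, which may be where the architectures diverge.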
Thanks, did this. Was then able to finish without driver crash.
Multiple threads issuing work to the GPU?

The discrepancy seems too significant to not have been noticed in some other context. The PS4 at least uses asynchronous compute, and if this scaling behavior were universal I think it would have been remarked upon. The latest version actually cut the compute load per kernel, and I am pretty sure they can generate more dispatches in a frame than what is being done here.
I wonder what is being missed here.
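In case it helps: "multiple threads issuing work" in D3D12 would look roughly like the sketch below. This is purely illustrative on my part, with made-up names; the point is that recording objects must be per-thread (they are not thread-safe), while ExecuteCommandLists on a shared queue is free-threaded.

[code]
// Each thread records its own compute command list and submits it to a shared
// compute queue. A per-thread fence keeps the allocator and list alive until
// the GPU has consumed them. Error handling omitted.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
using Microsoft::WRL::ComPtr;

void SubmitFromThreads(ID3D12Device* device, ID3D12CommandQueue* computeQueue,
                       ID3D12PipelineState* computePso, unsigned threadCount)
{
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([=]
        {
            // Per-thread recording objects.
            ComPtr<ID3D12CommandAllocator> alloc;
            device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COMPUTE,
                                           IID_PPV_ARGS(&alloc));
            ComPtr<ID3D12GraphicsCommandList> cl;
            device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE,
                                      alloc.Get(), computePso, IID_PPV_ARGS(&cl));
            cl->Dispatch(1, 1, 1);
            cl->Close();

            ID3D12CommandList* lists[] = { cl.Get() };
            computeQueue->ExecuteCommandLists(1, lists);   // thread-safe call

            // Wait for completion before the per-thread objects are released.
            ComPtr<ID3D12Fence> fence;
            device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
            computeQueue->Signal(fence.Get(), 1);
            HANDLE evt = CreateEventW(nullptr, FALSE, FALSE, nullptr);
            fence->SetEventOnCompletion(1, evt);
            WaitForSingleObject(evt, INFINITE);
            CloseHandle(evt);
        });
    }
    for (auto& w : workers) w.join();
}
[/code]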
It may well do, but I don't see how this test is illuminating in that regard. The numbers we're seeing are orders of magnitude wrong.

Some of the patches for HSA include discussion of the generation of run lists that the schedulers use to pick kernels to launch; does this need an intermediate layer of analysis for DX12 for the ACEs?
If there are 8 ACEs and all the work in this test is going to a single ACE, maybe there's a problem getting work from a single ACE to the entire GPU? It doesn't seem logical to me, though, to have an architecture with this limitation, since surely this is the most common case.

The AMD GPU with the simplest front end seems to have the least difficulty with the increasing kernel count; is this because it takes less work to add to queues if there's less of a problem space to analyze?
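As an experiment, spreading the dispatches over several COMPUTE queues instead of one would at least probe the single-ACE theory. Whether D3D12 queues actually map one-to-one onto ACE hardware queues is a driver decision, so this proves nothing by itself; rough sketch with made-up names:

[code]
// Create N compute queues and round-robin already-recorded compute command
// lists across them, instead of feeding everything to a single queue.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

std::vector<ComPtr<ID3D12CommandQueue>> CreateComputeQueues(ID3D12Device* device,
                                                            unsigned count /* e.g. 8 */)
{
    std::vector<ComPtr<ID3D12CommandQueue>> queues(count);
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    for (unsigned i = 0; i < count; ++i)
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queues[i]));
    return queues;
}

void SubmitRoundRobin(const std::vector<ComPtr<ID3D12CommandQueue>>& queues,
                      const std::vector<ID3D12CommandList*>& lists)
{
    for (size_t i = 0; i < lists.size(); ++i)
        queues[i % queues.size()]->ExecuteCommandLists(1, &lists[i]);
}
[/code]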
http://tools69.com/benchmarked-ashes-of-the-singularity/

Right now, we can see that DX12 definitely makes a difference in performance, giving the game developers a lot more power. But with great power comes great responsibility, and some developers may not be able to handle DX12, at least not without more time and effort.
The next fight is shaping up to be Lionhead’s Fable Legends, and that will perhaps be a more neutral battleground as it’s neither an AMD nor an Nvidia title. In fact, it appears Microsoft (who owns Lionhead) is determined to put forth a message that DX12 is unified. Microsoft doesn’t want DX12 to appear as a fractured landscape, one where AMD or Nvidia rules, a place where processor graphics gets left in the dust. In that sense, Fable should be the most likely vendor-agnostic approach to DX12 we’re going to see in the near term. We’re certainly looking forward to testing it, though it may be a few months.
Ultimately, no matter what AMD, Microsoft, or Nvidia might say, there’s another important fact to consider. DX11 (and DX10/DX9) are not going away; the big developers have the resources to do low-level programming with DX12 to improve performance. Independent developers and smaller outfits are not going to be as enamored with putting in more work on the engine if it just takes time away from making a great game.
Benchmarked: Ashes of the Singularity
Sept. 1, 2015
http://tools69.com/benchmarked-ashes-of-the-singularity/
Big developers have the capability to make their own engines; small developers use engines that are already made. Since most of those engines are moving to DX12 support, it's a matter of how optimized they are for either architecture and which effects they support, no?

Yes, the article says DX12 may require manual optimization to get the highest GPU performance possible, and that only big developers will have the resources for that. Smaller developers won't have the resources to spend a lot of time optimizing; they may focus more on getting the product to market and spend less time writing GPU optimizations. Maybe an engine like Unreal would be ideal for a small developer, and there are a few others.
Maxwell cards are now also crashing out of the benchmark as they spend >3000ms trying to compute one of the workloads.
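That is well past Windows' default 2-second TDR limit, so the OS is resetting the driver rather than the benchmark crashing as such. For anyone who wants runs that long to complete on a test machine (which may be what the earlier "finished without driver crash" post refers to), the usual workaround is to raise the TdrDelay value under HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers, e.g.:

[code]
// Raises the GPU timeout-detection delay above the 2-second default. Needs
// administrator rights, linking against Advapi32, and a reboot to take effect.
// Test-machine use only; it also masks genuine hangs.
#include <windows.h>

bool RaiseTdrDelay(DWORD seconds /* e.g. 10 */)
{
    HKEY key = nullptr;
    if (RegCreateKeyExW(HKEY_LOCAL_MACHINE,
                        L"SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
                        0, nullptr, 0, KEY_SET_VALUE, nullptr, &key, nullptr)
        != ERROR_SUCCESS)
        return false;

    const LSTATUS status = RegSetValueExW(key, L"TdrDelay", 0, REG_DWORD,
                                          reinterpret_cast<const BYTE*>(&seconds),
                                          sizeof(seconds));
    RegCloseKey(key);
    return status == ERROR_SUCCESS;
}
[/code]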