No DX12 Software is Suitable for Benchmarking *spawn*

Why can Vega reach 350 fps at 720p while the Titan Xp only manages 240 fps?
Because the Vega system uses an i7-6850 while the Titan Xp system uses a Ryzen 1700; clearly the superior single-threaded performance of the 6850 is helping here, since 720p is a CPU exercise.
And also, why does Vega fall behind the Titan Xp at 4K? Workload distribution?
Nope, because it's the slower card when the GPU is the focus of the test.
 
You may want to read the review and contextualise it all. The XB1 is running on 6 Jaguar cores at 1.75 GHz.

Minor detail, but it's actually 7 cores.

The Computerbase review is running a 6-core / 12-thread Intel CPU at 4.3 GHz. Not only does this CPU beat Jaguar clock for clock, it has 6 more available threads and is clocked more than 2x higher. Not achieving even 2x the CPU performance in a CPU-limited scenario on NV hardware (or on AMD hardware, for that matter) is frankly embarrassing for a DX12 engine. A DX12 engine from an MS first-party dev.
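As a crude back-of-envelope (naively counting SMT threads as full threads and ignoring IPC differences entirely, so treat it as an upper bound rather than a measurement):

i7, 6c/12t @ 4.3 GHz: 12 x 4.3 ≈ 51.6 thread-GHz
XB1, 7 Jaguar cores @ 1.75 GHz: 7 x 1.75 ≈ 12.25 core-GHz

That's roughly a 4x raw throughput advantage before IPC even enters the picture, which is why failing to reach even 2x looks so poor.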

X1 has the advantages of a more highly optimised driver, lower level optimisation of software and additional hardware.

The X1 command processor can save a significant amount of CPU work if it's actually being used (and if anyone is actually using it, I'd think it's Turn 10).
 
Because the Vega system uses an i7-6850 while the Titan Xp system uses a Ryzen 1700; clearly the superior single-threaded performance of the 6850 is helping here, since 720p is a CPU exercise

6850 single thread performance is nearly 50% higher?

Isn't it more likely that some parts of Vega are faster, and this shows at lower resolutions?
 
6850 single thread performance is nearly 50% higher?
Isn't it more likely that some parts of Vega are faster, and this shows at lower resolutions?
No, probably not. But neither is Vega 42% faster, as evidenced by the UHD results that have appeared since yesterday.
 
No, probably not. But neither is Vega 42% faster, as evidenced by the UHD results that have appeared since yesterday.

Bottlenecks can move around as you change resolution - at low resolutions you might be limited by how fast you can process geometry, at much higher resolutions you might be limited by pixel shading.

It's typically been AMD that gained relative performance as resolution increased, but perhaps Vega has brought some changes to the way AMD handles some parts of the pipeline that factor into this?
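For a rough sense of how the pixel-side load scales (assuming pixel work grows roughly in proportion to resolution, which is a simplification): 3840 x 2160 ≈ 8.3 million pixels versus 1280 x 720 ≈ 0.92 million, i.e. about 9x the pixel work at 4K, while the per-frame geometry work stays roughly constant. A card that leads at 720p on front-end/geometry throughput can therefore easily fall behind at 4K if its shading side is comparatively weaker.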
 
Yet the i7-6800K also has 50% more threads at its disposal, since my colleague seems to have tested one machine with SMT and one without (if that's not a typo).

So, what's more likely: 12 BDW-E threads being ~50% faster than 8 Ryzen threads in an engine that's known for innovative programming methods and for exploiting multiple CPU threads, or yet-unheard-of geometry power for Vega that falters at higher resolutions due to pixel-shading limitations in an engine that abundantly uses compute for many effects?
 
Yet the i7-6800K also has 50% more threads at its disposal, since my colleague seems to have tested one machine with SMT and one without (if that's not a typo).

So, what's more likely: 12 BDW-E threads being ~50% faster than 8 Ryzen threads in an engine that's known for innovative programming methods and for exploiting multiple CPU threads, or yet-unheard-of geometry power for Vega that falters at higher resolutions due to pixel-shading limitations in an engine that abundantly uses compute for many effects?

DavidGraham said it was clearly the higher single-threaded performance factoring into the result, which I wasn't so sure about. But neither am I sure that the polar opposite - having 50% more threads - explains being nearly 50% faster.

The Intel chip has 50% more threads, but the AMD one has 33% more cores.

You're talking about roughly twice the performance per core from the Intel chip compared to the AMD one (and for a game seemingly designed around two four-core, four-thread clusters, for that matter).

That would seem a far more outrageously divergent performance result than a graphics card or graphics driver related issue.
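Taking the ~350 vs ~240 fps figures from earlier in the thread at face value (a big assumption, since we don't know how evenly the engine actually spreads its work): per thread that's roughly 350/12 ≈ 29 fps against 240/8 = 30 fps, i.e. about even, but per core it's 350/6 ≈ 58 fps against 240/8 = 30 fps, which is where the "roughly twice the performance per core" figure comes from.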
 
That would seem a far more outrageously divergent performance result than a graphics card or graphics driver related issue.
It's not unprecedented; several games favor Intel's superior single-threaded performance heavily. For example, here is the 7700K getting 50% more fps than the Ryzen 1700 in Destiny 2 at 1080p, and the margin will be even bigger at 720p. This could be a similar situation; add disabled SMT on top of that, and you have a far superior i7-6800K that accounts for the difference.

destiny2-cpu-bench-1080p-highest.png

https://www.gamersnexus.net/game-bench/3038-destiny-2-beta-cpu-benchmarks-testing-research

Also, GameGPU numbers are up: the 1080 Ti is 40% faster than Vega 64 at 1080p in SWBF 2.
http://gamegpu.com/action-/-fps-/-tps/star-wars-battlefront-ii-beta-test-gpu-cpu

It might also be that Vega can sustain higher clock speeds at lower resolutions; I've seen 1.5 GHz drop to 1.3 GHz in GTA going from 1080p to 4K.
Sustaining 200 MHz higher clocks won't account for a 45% difference; a CPU bottleneck will.
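For reference, 1.5 GHz vs 1.3 GHz is only about a 15% clock advantage (1.5 / 1.3 ≈ 1.15), while the gap in question is around 45% (350 / 240 ≈ 1.46), so clocks alone can't come close to explaining it.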
 
You're talking about roughly twice the performance per core from the Intel chip compared to the AMD one (and for a game seemingly designed around two four-core, four-thread clusters, for that matter).
CPU usage wasn't recorded, so it's not safe to say that the game actually used that many cores consistently, or that the multi-threading even applies to the render loop and not only to a decoupled game loop.

However, there is a good chance that the code in the render loop was simply tuned towards the Broadwell/Haswell family. We saw this a couple of times when Ryzen had just been released: tuning attempts for better performance on Sandy Bridge through Broadwell actually had a negative impact on Ryzen performance. Most prominently with non-temporal stores, but it's enough if the compiler has merely misjudged some architecture-dependent instruction latencies.

And such misguided optimization can easily account for an apparent 50% single-thread performance advantage.
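To make the non-temporal store point concrete, here is a minimal, hypothetical C++ sketch (not code from Frostbite or the Forza engine, just an illustration of the pattern being described):

#include <immintrin.h>
#include <cstddef>

// Ordinary cached stores: the written cache lines stay resident, which is
// fine when the data is consumed again shortly afterwards.
void fill_cached(float* dst, std::size_t n, float value) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = value;
}

// Non-temporal (streaming) stores: bypass the caches via write combining.
// A win when the destination is large and not re-read soon, but the benefit
// depends heavily on the microarchitecture - a choice tuned for Sandy Bridge
// through Broadwell does not automatically carry over to Ryzen and can hurt there.
void fill_streaming(float* dst, std::size_t n, float value) {
    const __m128 v = _mm_set1_ps(value);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_stream_ps(dst + i, v);   // dst must be 16-byte aligned
    for (; i < n; ++i)               // scalar tail for any remainder
        dst[i] = value;
    _mm_sfence();                    // order streaming stores before subsequent reads
}

Which of the two variants is faster is exactly the kind of architecture-dependent call that can quietly penalise a CPU the code wasn't tuned on.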
 
Further analysis of GameGPU numbers, since they include DX12 scores at various resolutions.

@1080p DX12, all cards lose fps.
@1440p DX12, NVIDIA cards lose fps, but AMD cards gain 2 or 3 fps, and the Fury X gains 10 fps!
@2160p DX12, NVIDIA cards lose 2 fps and AMD cards gain 3 fps, but the Fury X's fps collapse (DX11 is roughly 3x faster) and it becomes unplayable, likely a VRAM limitation that isn't present in DX11.

All in all, an awful DX12 implementation: it demands more VRAM, hurts fps, and introduces awful frame pacing that makes the experience far worse than DX11. The situation is mirrored exactly in FIFA 18 and Battlefield 1; it appears the Frostbite engine suffers from a broken DX12 path.

http://gamegpu.com/action-/-fps-/-tps/star-wars-battlefront-ii-beta-test-gpu-cpu
 
DSOGaming benched Forza 7, and their RX 580 scored better than their GTX 980 Ti. Reason? GPU utilization is horrendous on NVIDIA cards; their 980 Ti barely pushed 70% utilization, something I can confirm with my GTX 1070.

Our Radeon RX580 was able to surpass the NVIDIA GTX980Ti, something that really amazed us. While the GTX980Ti was able to push an average of 80fps on Ultra settings at 1080p, the Radeon RX580 offered an average of 103fps. It’s also worth noting that GPU usage was higher on AMD’s hardware. We don’t know what is causing the underwhelming performance on NVIDIA’s hardware, or whether a new driver update will be able to fix it.

http://www.dsogaming.com/pc-performance-analyses/forza-motorsport-7-pc-performance-analysis/2/
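A naive sanity check on those numbers (assuming fps scales roughly linearly with reported GPU utilization, which is only an approximation): 80 fps at ~70% utilization would suggest something like 80 / 0.7 ≈ 114 fps if the 980 Ti were actually being kept busy, comfortably ahead of the RX 580's 103 fps. That points to a feeding problem (driver or CPU side) rather than the 980 Ti simply being the slower card.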
 
Could this be a case of async compute allowing for far better utilization of the AMD GPUs in this particular case? Without a DX11 version to compare to, it's hard to say whether NVIDIA's well-optimized DX11 driver would fill execution bubbles where they may not be getting filled adequately in this title. I.e., would it have DX11 performance similar to or higher than AMD's DX12 performance?

Regards,
SB
 
Could this be a case of async compute allowing for far better utilization of the AMD GPUs in this particular case?
If it were a Maxwell GPU on Nvidia's side: very likely.

But with Pascal? If the GPU went idle, then it was most likely actually idle, not just stalled internally.

Regardless, this is still just speculation. If we actually wanted interpretable results, someone should just take an ETW trace and post the flame chart, the unblock chart, plus a GPUView screenshot for a randomly selected single frame. Given how low the utilization actually was, we should quickly see what stalled. Even without symbols you can still distinguish which modules blocked, and below the Nvidia driver you have full symbols for DXGI and the kernel / Visual C++ runtime.
 
DSOGaming benched Forza 7, and their RX 580 scored better than their GTX 980 Ti. Reason? GPU utilization is horrendous on NVIDIA cards; their 980 Ti barely pushed 70% utilization, something I can confirm with my GTX 1070.



http://www.dsogaming.com/pc-performance-analyses/forza-motorsport-7-pc-performance-analysis/2/
Could this be a case of async compute allowing for far better utilization of the AMD GPUs in this particular case? Without a DX11 version to compare to, it's hard to say whether NVIDIA's well-optimized DX11 driver would fill execution bubbles where they may not be getting filled adequately in this title. I.e., would it have DX11 performance similar to or higher than AMD's DX12 performance?

Regards,
SB
If it were a Maxwell GPU on Nvidia's side: very likely.

But with Pascal? If the GPU went idle, then it was most likely actually idle, not just stalled internally.

Regardless, this is still just speculation. If we actually wanted interpretable results, someone should just take an ETW trace and post the flame chart, the unblock chart, plus a GPUView screenshot for a randomly selected single frame. Given how low the utilization actually was, we should quickly see what stalled. Even without symbols you can still distinguish which modules blocked, and below the Nvidia driver you have full symbols for DXGI and the kernel / Visual C++ runtime.
I really do not think async compute is the problem on NV. You still get a nominal 99% GPU utilisation in DX11 games on Pascal or Maxwell, which cannot possibly be using asynchronous compute; in fact, it happens rather often with uncapped framerates or just by pumping the resolution and settings high enough. As the benchmark above shows, Forza 7 has trouble even getting anywhere near 99% nominal readouts.
Forza 7, in my informed opinion, seems to be heavily CPU-limited in preparing frames for the GPU, thus limiting its utilisation.

I do not agree with the speculation put forth in this thread that the lower-than-DX12-level API on the XB1 is the reason for the disparate performance scaling we see vis-à-vis PC (command processor?). There are plenty of cross-platform games on PC that do in fact scale within the expected range with GPU and CPU power (offering 5-8 times the framerate you see on console, whether CPU-bound or GPU-bound), and those games are primarily DX11. Or heck, you have the example of Gears of War 4 (IMO a good DX12 port), whose SP benchmark on high-end CPUs constantly caps out at the 200 fps line (compared to the 30 fps cap forced on the XB1 for performance-consistency reasons). I honestly just think there is a problem, either driver-side or game-side, with Forza 7 on PC, regardless of GPU vendor. It is just worse on NV.
 