No DX12 Software is Suitable for Benchmarking *spawn*

However, no love for Nvidia, as it saw no improvement in either DX11 or DX12 above 720p - makes one wonder if this patch was specific to AMD, even though they did go on about improving for the CPU, which shows at 720p :)
I'd assume Nvidia isn't actually scheduling anything concurrently, so the difference between the two is minimal. DX12, I'd imagine, is just emulating the DX11 path with hopefully a little less driver overhead. Case in point: Doom, where the game stuttered really badly, but a compute queue finally showed up in the recent version and fixed everything. Intrinsics aside, the FPS was likely similar, with the queue only addressing timing issues with Id's optimizations. I'm not sure it's worth adding a Pascal-specific path to games for what are likely marginal gains at best. The only reason I could see that path happening is because people are benchmarking the game; I'm not sure the effort involved is warranted for what might be a gain of a few percentage points. Maybe with Volta included eventually, but that could again be an entirely different path.
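For context (and purely as my own illustration, not anything from the game or the driver), the distinction being argued about is whether work is submitted only to a single direct/graphics queue or also to a separate compute queue that the hardware may or may not actually run concurrently. A minimal D3D12 sketch of creating both queue types:

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

// Minimal illustration: a title that wants "async compute" creates a second
// command queue of type COMPUTE alongside the usual DIRECT queue. Whether the
// GPU actually executes the two queues concurrently is up to the hardware and
// driver scheduler, which is the crux of the discussion above.
int main()
{
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // Graphics/direct queue: accepts draw, compute and copy work.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // Separate compute queue: compute/copy only. Submitting here *allows*
    // concurrent execution with graphics work, but does not guarantee it.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    return 0;
}
```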
 
Err, Doom on Vulkan with Pascal gets some decent improvements in the later reviews, anywhere from 20% up, going from OGL to Vulkan with async on. So I'm not sure what the differential is for async on versus off, but I'm pretty sure it's decent enough on Pascal now.

The 1060 is ~10% behind the RX 480, but it can play all the resolutions up to 4K at the same settings as the RX 480. And this was just through driver updates from nV, so there seems to be more work to be done on Vulkan drivers on nV's end.

http://www.guru3d.com/articles_pages/gigabyte_radeon_rx_480_g1_gaming_review,16.html
 
A quick look shows that the 480 improved by a fair margin with the update in DX11 at 1080p (around 10%), and there was a big improvement in DX12 for the 480 as well.
End result: the 1060 ties the 480, and slightly edges it under the FX 3870 and at 720p regardless of the processor! And DX12 still lags behind DX11 on both cards.
 
I'd assume Nvidia isn't actually scheduling anything concurrently, so the difference between the two is minimal. DX12, I'd imagine, is just emulating the DX11 path with hopefully a little less driver overhead. Case in point: Doom, where the game stuttered really badly, but a compute queue finally showed up in the recent version and fixed everything. Intrinsics aside, the FPS was likely similar, with the queue only addressing timing issues with Id's optimizations. I'm not sure it's worth adding a Pascal-specific path to games for what are likely marginal gains at best. The only reason I could see that path happening is because people are benchmarking the game; I'm not sure the effort involved is warranted for what might be a gain of a few percentage points. Maybe with Volta included eventually, but that could again be an entirely different path.
The point being, it showed no improvement in either DX11 or DX12 for NVIDIA, whereas AMD saw improvements in both. The context being 1080p rather than 720p, with both APIs benefiting, suggests other improvements also happened, as the game seems to be more GPU than CPU bound above 720p.
And as Razor mentions, in later reviews there is now an improvement for Nvidia Pascal in Vulkan Doom, but that is digressing from the point of AMD getting both DX11 and DX12 boosts in this game after the patch.

Worth noting, though, this is the same Glacier 2 engine (albeit modified) that had relatively poor DX11 performance for Nvidia in the recent Hitman, for now anyway, and I agree more information is needed to understand why; maybe a driver/DX11 issue or a similar reason to Hitman (albeit one of the chapters/scenes did have good performance on Nvidia).
Cheers
 
Doom on Vulkan with Pascal gets some decent improvements in the later reviews, anywhere from 20% up, going from OGL to Vulkan with async on. So I'm not sure what the differential is for async on versus off, but I'm pretty sure it's decent enough on Pascal now.
Never said they didn't get an increase. Just that previously they had been running with async off (no compute queues) and bad stuttering. I'll admit I'm speculating here, but getting the cross-lane intrinsics working and removing some compute tasks for the presentation seems likely. Unless Nvidia was really behind on their Vulkan drivers, they should have had the concurrency portion with a second queue available for the initial benches if it made sense. I'm guessing the cross-lane intrinsics improved the performance and the async queue fixed the timing issues.
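On the "second queue available" point: whether a Vulkan driver even exposes a compute-capable queue family separate from the graphics family is something an engine can query up front. A rough, self-contained sketch (my own illustration, not Id's code):

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

// Illustration only: enumerate the queue families a Vulkan driver exposes and
// flag any compute-capable family that is separate from graphics, which is the
// prerequisite for the "async compute" path discussed above.
int main()
{
    VkApplicationInfo app = {VK_STRUCTURE_TYPE_APPLICATION_INFO};
    app.apiVersion = VK_API_VERSION_1_0;
    VkInstanceCreateInfo ici = {VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    ici.pApplicationInfo = &app;

    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t gpuCount = 0;
    vkEnumeratePhysicalDevices(instance, &gpuCount, nullptr);
    std::vector<VkPhysicalDevice> gpus(gpuCount);
    vkEnumeratePhysicalDevices(instance, &gpuCount, gpus.data());

    for (VkPhysicalDevice gpu : gpus) {
        uint32_t famCount = 0;
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &famCount, nullptr);
        std::vector<VkQueueFamilyProperties> fams(famCount);
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &famCount, fams.data());

        for (uint32_t i = 0; i < famCount; ++i) {
            bool graphics = fams[i].queueFlags & VK_QUEUE_GRAPHICS_BIT;
            bool compute  = fams[i].queueFlags & VK_QUEUE_COMPUTE_BIT;
            if (compute && !graphics)
                std::printf("family %u: compute-only (async compute candidate)\n", i);
            else if (compute && graphics)
                std::printf("family %u: graphics + compute\n", i);
        }
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```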

The point being, it showed no improvement in either DX11 or DX12 for NVIDIA, whereas AMD saw improvements in both. The context being 1080p rather than 720p, with both APIs benefiting, suggests other improvements also happened, as the game seems to be more GPU than CPU bound above 720p.
That's sort of what I was getting at, with AMD possibly running an async/concurrent path and Nvidia routing most work through the graphics queue, save the TSSAA portion, which would likely be bandwidth intensive and not a great concurrent task. As an optimization, AMD could in theory extract the compute shaders for OGL. It's also possible Id rewrote one of the shaders and Nvidia replaced it in the driver, resulting in no change. On the other hand, AMD could have optimized a shader and provided it to Id, or something Nvidia had already optimized got translated to AMD's path. The result being an AMD increase and Nvidia staying the same, as they already had the optimization.
 
That's sort of what I was getting at, with AMD possibly running an async/concurrent path and Nvidia routing most work through the graphics queue...
How does that work with DX11 improvements in Deus Ex patch for AMD?
TBH this seems to follow a similar trend to Hitman, but the performance gap has increased even further. Did Nvidia ever manage to overcome their performance deficit in drivers for Hitman 2016?
I thought it was the Glacier 2 game engine/post-processing that influenced the performance, which sort of showed in the one episode where Nvidia performed better than AMD.
Cheers
 
Worth noting, though, this is the same Glacier 2 engine (albeit modified) that had relatively poor DX11 performance for Nvidia in the recent Hitman, for now anyway, and I agree more information is needed to understand why; maybe a driver/DX11 issue or a similar reason to Hitman (albeit one of the chapters/scenes did have good performance on Nvidia).
It seems the situation is mirrored in Hitman 2016 as well: using the internal benchmark, the 980Ti delivers the same fps as the Fury X under DX12, despite delivering largely inferior performance during normal gameplay. So this is really not uncommon.

http://www.bit-tech.net/hardware/graphics/2016/09/16/gigabyte-gtx-1060-g1-gaming-review/5
 
It seems the situation is mirrored in Hitman 2016 as well: using the internal benchmark, the 980Ti delivers the same fps as the Fury X under DX12, despite delivering largely inferior performance during normal gameplay. So this is really not uncommon.

http://www.bit-tech.net/hardware/graphics/2016/09/16/gigabyte-gtx-1060-g1-gaming-review/5
Although with the Glacier 2 I am talking about both DX11 and DX12 together.
That shows how much of an issue there is with the Glacier 2 engine IMO, because the 480 and 390X are better than the Fury X.

The 980ti is a pretty good Maxwell 2 GPU that still sort of holds up; in that game, when measured with tools designed for DX12 and with a custom model, the 980ti can be faster than a Fury X (PCGameshardware.de). But it is academic, because the bit-tech review unfortunately does not have both models anyway.

Key point is it needs to be a custom AIB model and not the reference blower one, where the gap increases.
But even a reference 980ti should then be faster than a 390X (many of those did not have a good OC out of the box).
Key point is that this is also influenced by the episode, which fits in with my previous post's comments.
Hardware Canucks use a different episode and the results are very different: the 980ti is vastly superior to the Fury X, even though they use a reference 980ti (and the gap would still be notable with a custom card), with both publications using the same tools to measure DX11 and DX12.

Cheers

Edit:
Edited to change the top comment as I eventually found the Fury X in the chart, and it seems to have an issue with this episode of the game, as it is also behind the 390X and 480.
 
Look again, both reference FuryX and 980Ti are used. (Fury X is in the middle of the graph)
I ninja-edited while you were replying as I did notice that, doh :)
But the reason I missed it is because the Fury X is actually worse than the 480 and 390X.
So that is a bad comparison to use anyway IMO, and shows something is wrong.
At 1080p even the 470 4GB is better in terms of minimum fps!!
Which impacts heavily on the Fury X average, as it must have a pretty moderate % of frames down there.
Anyway, as I mentioned, depending upon the Hitman episode the 980ti has been benched faster than the Fury X, and in the episode where the 980ti is faster, the Fury X has also improved and does not suffer low minimum fps (they were measured higher than the 390X's).
TBH this engine seems a bit of a nightmare, especially with Hitman 2016 and the variation between episodes, but generally Nvidia does have issues achieving optimised performance (980ti aside).

Edit:
And this is another example of why there should be full transparency for internal benchmarks: not just the settings, but the methodology, the frames presented, and the captured/reported fps from the internal measurement tools.

Cheers
 
And this is another example of why there should be full transparency for internal benchmarks: not just the settings, but the methodology, the frames presented, and the captured/reported fps from the internal measurement tools.
Agreed. Take Ashes for example: the internal test tends to have several measurement variations, and thus could prove unreliable. HardwareCanucks uses the test but logs fps through PresentMon, and the results are interesting.

In DX11, the FuryX is slightly ahead of the 980Ti, but in DX12 the 980Ti is slightly ahead!
http://www.hardwarecanucks.com/foru...asus-gtx-1080-gtx-1070-strix-oc-review-3.html
http://www.hardwarecanucks.com/foru...asus-gtx-1080-gtx-1070-strix-oc-review-9.html
 
Never said they didn't get an increase. Just that previously they had been running with async off (no compute queues) and bad stuttering. I'll admit I'm speculating here, but getting the cross-lane intrinsics working and removing some compute tasks for the presentation seems likely. Unless Nvidia was really behind on their Vulkan drivers, they should have had the concurrency portion with a second queue available for the initial benches if it made sense. I'm guessing the cross-lane intrinsics improved the performance and the async queue fixed the timing issues.

It was all driver related; depending on the map, nV actually has the lead at times too.
 
Agreed. Take Ashes for example: the internal test tends to have several measurement variations, and thus could prove unreliable. HardwareCanucks uses the test but logs fps through PresentMon, and the results are interesting.

In DX11, the FuryX is slightly ahead of the 980Ti, but in DX12 the 980Ti is slightly ahead!
http://www.hardwarecanucks.com/foru...asus-gtx-1080-gtx-1070-strix-oc-review-3.html
http://www.hardwarecanucks.com/foru...asus-gtx-1080-gtx-1070-strix-oc-review-9.html


Well, for the most part I think most reviewers are using PresentMon for AOTS; it's just odd that the swing is so great when using the in-game benchmarks on so many of these DX12 titles.
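As a rough sketch of what logging through PresentMon rather than the in-game counter amounts to: read the per-present frame times from its CSV and derive the averages yourself. This assumes the usual MsBetweenPresents column; the file name is just a placeholder:

```cpp
#include <algorithm>
#include <fstream>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Compute average fps and a "1% low" figure from a PresentMon CSV log,
// independently of whatever the game's internal benchmark reports.
int main()
{
    std::ifstream csv("presentmon_log.csv");  // placeholder file name
    std::string header;
    if (!std::getline(csv, header)) return 1;

    // Locate the MsBetweenPresents column in the header row.
    int col = -1, idx = 0;
    std::stringstream hs(header);
    for (std::string name; std::getline(hs, name, ','); ++idx)
        if (name == "MsBetweenPresents") col = idx;
    if (col < 0) return 1;

    std::vector<double> frameTimes;
    for (std::string line; std::getline(csv, line); ) {
        std::stringstream ls(line);
        std::string field;
        for (int i = 0; std::getline(ls, field, ','); ++i)
            if (i == col) frameTimes.push_back(std::stod(field));
    }
    if (frameTimes.empty()) return 1;

    double totalMs = 0;
    for (double t : frameTimes) totalMs += t;
    double avgFps = 1000.0 * frameTimes.size() / totalMs;

    // 1% low: average fps over the slowest 1% of frames.
    std::sort(frameTimes.begin(), frameTimes.end(), std::greater<double>());
    size_t n = std::max<size_t>(1, frameTimes.size() / 100);
    double slowMs = 0;
    for (size_t i = 0; i < n; ++i) slowMs += frameTimes[i];
    double onePercentLow = 1000.0 * n / slowMs;

    std::cout << "Average fps: " << avgFps
              << "  1% low fps: " << onePercentLow << "\n";
    return 0;
}
```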
 
How does that work with DX11 improvements in Deus Ex patch for AMD?
Even in DX11 I'd think the drivers could route shaders to a compute queue, as opposed to the graphics queue, as an optimization. It shouldn't be that different from shader replacement. That's just one possibility here, and I have no idea what they are actually doing. The other possibility I mentioned was taking an optimization Nvidia may have implemented on their specific path and porting it to the AMD path.
 
Forza Horizon 3 DX12 test:

[Image: f3_1920.png (Forza Horizon 3 GPU benchmark chart at 1920x1080)]


http://gamegpu.com/racing-simulators-/-гонки/forza-horizon-3-test-gpu
 
Going to go out on a limb here and suggest memory capacity is a large factor in that game: the 470 just above the Fury X, the 290x equal to the 380x, a 960 besting a 780ti. Interestingly there is no 8GB 480 tested. Not sure why they even presented this benchmark considering the results; all it shows is you need >6GB to attempt playing the game.
 
Hmm, then why is the 1060 above the 470? And as resolution increases, the Fury X takes the lead over the 1060 and the 470... It's not due to memory limits at all.

And they even tested the memory usage at the different resolutions

[Image: f3_vram.png (Forza Horizon 3 VRAM usage at different resolutions)]


If anything, the opposite should have happened if the game was hitting the memory that hard on cards with less than 6GB... We should have seen the 1060 take the lead more as the res went up.
 
Going to go out on a limb here and suggest memory capacity is a large factor in that game: the 470 just above the Fury X, the 290x equal to the 380x, a 960 besting a 780ti. Interestingly there is no 8GB 480 tested. Not sure why they even presented this benchmark considering the results; all it shows is you need >6GB to attempt playing the game.
Take another look. If it were limited by memory, you bet I would post it in the 4GB thread. The Fury X takes the lead at 4K, and even at 1080p it never consumes more than 3.5GB (see Razor's post). Also, the 470 tested is only 4GB, so the same as the FuryX.
 
Hmm, then why is the 1060 above the 470? And as resolution increases, the Fury X takes the lead over the 1060 and the 470... It's not due to memory limits at all.
Because 6GB is greater than 4GB? As resolution increases, other bottlenecks kick in: the framebuffer takes more space, and the raw power of the cards helps more at higher resolution even if textures are being streamed in. The memory measurements won't be entirely accurate either. Quadrupling the resolution only costs about 30% performance.

Fury X:
1080p -> 42fps
1440p -> 38fps
4k -> 29fps
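Quick back-of-the-envelope check of that scaling, using only the Fury X figures quoted above:

```cpp
#include <cstdio>

// Relative fps cost of increasing resolution, computed from the Fury X
// figures quoted above (42 fps at 1080p, 38 at 1440p, 29 at 4K).
int main()
{
    struct { const char* res; double mpixels; double fps; } r[] = {
        {"1080p", 1920.0 * 1080 / 1e6, 42.0},
        {"1440p", 2560.0 * 1440 / 1e6, 38.0},
        {"4K",    3840.0 * 2160 / 1e6, 29.0},
    };

    for (int i = 1; i < 3; ++i) {
        double pixelScale = r[i].mpixels / r[0].mpixels;
        double fpsDrop = 100.0 * (r[0].fps - r[i].fps) / r[0].fps;
        std::printf("%s vs 1080p: %.1fx the pixels, %.1f%% fps drop\n",
                    r[i].res, pixelScale, fpsDrop);
    }
    // Prints roughly: 1.8x pixels -> ~10% drop, 4.0x pixels -> ~31% drop,
    // i.e. far shallower scaling than pure pixel throughput would predict.
    return 0;
}
```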

Take another look. If it were limited by memory, you bet I would post it in the 4GB thread. The Fury X takes the lead at 4K, and even at 1080p it never consumes more than 3.5GB (see Razor's post).
So the fact that Razor's memory usage graph shows 4827MB being used at 1080p indicates all the 3/4GB cards aren't bottlenecked by memory? Sure, some resources could stay resident, but in that case I'm sure they'd use more than 4.8GB on a 6GB card. If it's not memory, they're struggling to load resources in advance.
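Worth keeping in mind what those "usage" numbers typically represent. One way such figures can be obtained on Windows is DXGI's QueryVideoMemoryInfo (shown below purely as an illustration; I don't know which tool gamegpu actually used), and it reports what has been allocated or made resident within the budget, not the minimum the game strictly needs:

```cpp
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <cstdio>

#pragma comment(lib, "dxgi.lib")

using Microsoft::WRL::ComPtr;

// Query each adapter's local VRAM budget and current usage. "Usage" here is
// what has been allocated/resident, which is why usage graphs can read high
// on large-VRAM cards without smaller cards necessarily being starved.
int main()
{
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory)))) return 1;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        ComPtr<IDXGIAdapter3> adapter3;
        if (FAILED(adapter.As(&adapter3))) continue;

        DXGI_QUERY_VIDEO_MEMORY_INFO info = {};
        if (SUCCEEDED(adapter3->QueryVideoMemoryInfo(
                0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &info))) {
            std::printf("Adapter %u: budget %.0f MB, current usage %.0f MB\n",
                        i, info.Budget / (1024.0 * 1024.0),
                        info.CurrentUsage / (1024.0 * 1024.0));
        }
    }
    return 0;
}
```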
 