No DX12 Software is Suitable for Benchmarking *spawn*

So Fury X has different bottlenecks, maybe due to lacking the primitive discard accelerator? We've seen plenty of cases where the Fury X can barely scrape past 10% faster than the 390X.

Wow, what? Neither of the cards you just mentioned has a primitive discard accelerator lol, so you think only Fiji is hampered by it? Where is the logic in that? Why don't ya pick something both cards have similar amounts of and then go from there *sarcasm*.... On top of that, AMD drivers limit tessellation factors, so even if that game was run with high tessellation, something you will have to tell us, it wouldn't hurt AMD cards. So we can throw that out the window.

Concentrate on HUB's and DF's results with the Titan X, where they can't get above 80fps yet my Crossfire 480s hit 90fps, or HUB's 295X2 wasting the Titan X.

So if it's the lack of a primitive discard accelerator that caused the problem on Fiji, what about the 295X2, which doesn't have one either? Err, ok. That gives two distinct showings that it's NOT because of the lack of a primitive discard accelerator. But you just didn't understand what you were saying, so you said it anyway.

So now, why are the 1070 and Fiji each where they should be? Now we can see it could be something else, right? Maybe nV's path in the game code? It could be there; see, we didn't really solve or specify anything, other than that something is going wrong.....

Look, either ya bone up, or statements like this just don't fly.


A bit more comprehensive testing.

DX11 and DX12 on the 1070 have similar CPU bottlenecks in that specific game, so we can't even put this squarely on the drivers. We need a lot more testing, but for now nV's CPU bottleneck is noticeable on higher-performance cards.

This is against Fiji, Adorned, a card you complained you didn't have to do proper testing with, which led you down the path of using dual RX 480s and undoubtedly screwed up your theories in the process.

So in your theory, Adorned, you think there is an nV driver bug, right? If they fix that, they will absolutely annihilate anything AMD has in DX12, given the 1070 is currently only around Fiji performance with a 20% overhead. So is nV leaving 20% of its performance on the table in this game? If so, and it gets fixed, that means AMD's Vega would need to reach GTX 1080 performance levels in other games just to keep up with the 1070 in this game. Does your theory still sound likely?

Now you see why you don't speculate based on BAD testing? Because I just took what you stated to its hyperbolic conclusion! You stated that Vega will beat the pants off of Pascal wherever it may land if the bug is still there, but it looks to me like they will end up around the same performance levels in this game even with the bug, going by the Fiji and 1070 benchmarks.
 
And what have you been saying?
Arrgh don't make me go that far back...
But in retrospect you're right, I should've said what I was talking about:
https://forum.beyond3d.com/posts/1956409/
https://forum.beyond3d.com/posts/1947045/ (the pictures are gone in that last one, unfortunately...)

Regardless, that's what I've been saying: Crossfire scaling in DX12, using AFR, gets me ridiculously high gains, well over 90% with the second card.


I think personal attacks are not what this forum is about; if we wanted that, there are plenty of other forums out there where that's the norm.
I think it isn't about fanboy crap (or paid shills, because those have been proven to exist in forums.. who knows?) from the nvidia patrol either, who constantly tag-team on flamebait and strawmen and always like each other's posts, yet here we are...
Many threads in this sub-forum are corrupted beyond any hope by the same users, over and over again. As it seems, these people tend to coordinate their narrative through PM or out-of-forum communication, so it's a lot harder (and time-consuming) to organize and present the proof of intent. Moderators also have better things to do than to analyze tens of posts to get wind of these coordinated "attacks", so it's pretty much the perfect crime in a forum like this.

And IMHO that's a lot worse than a single post with a personal attack that can very transparently be detected and moderated. It causes a lot more damage.



The RX 480 is a 5.8 TFLOP card, the GTX 1060 is 4.4 TFLOPs.
The GTX 1060 FE hovers at 1800MHz most of the time, so that's 4.6 TFLOPs. Third-party solutions go higher than this, easily over 1900MHz on many of them, and that's 4.9 TFLOPs.

The reference RX 480 usually clocks lower than its 1266MHz boost, typically at ~1210MHz, so its throughput is closer to 5.6 TFLOPs.
So now we're down to a 14-21% difference in throughput.
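For anyone who wants to redo the arithmetic, a trivial sketch of my own using the numbers above (peak FP32 = ALU lanes × 2 × clock; the clocks are the observed ones discussed here, not spec-sheet boost):
Code:
// Back-of-the-envelope check of the peak FP32 numbers above.
// Peak FP32 = ALU lanes * 2 FLOPs (one FMA) * clock.
#include <cstdio>

static double tflops(int alu_lanes, double clock_ghz)
{
    return alu_lanes * 2.0 * clock_ghz / 1000.0;    // GFLOPs -> TFLOPs
}

int main()
{
    const double gtx1060_fe  = tflops(1280, 1.80);  // ~4.6 TFLOPs at observed FE clock
    const double gtx1060_aib = tflops(1280, 1.90);  // ~4.9 TFLOPs on faster AIB cards
    const double rx480_ref   = tflops(2304, 1.21);  // ~5.6 TFLOPs at typical reference clock
    std::printf("1060 FE %.1f | 1060 AIB %.1f | 480 ref %.1f TFLOPs\n",
                gtx1060_fe, gtx1060_aib, rx480_ref);
    // Roughly the 14-21% gap quoted above, depending on which clocks you assume.
    std::printf("RX 480 advantage: %.0f%% vs AIB, %.0f%% vs FE\n",
                (rx480_ref / gtx1060_aib - 1.0) * 100.0,
                (rx480_ref / gtx1060_fe  - 1.0) * 100.0);
    return 0;
}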
 
Interesting, his findings with 3600MHz RAM + Ryzen + Nvidia are much better than the comparison posted some days ago. AMD is also pulling ahead in several CPU-limited scenarios (Ryzen or Intel, doesn't matter) in DX12, so it might just be a limitation with Nvidia drivers, or maybe he is still GPU bound at 720p :LOL:

Yeah this must be it.
Ryzen had some "issues" with memory performance, e.g.:
http://www.eteknix.com/memory-speed-large-impact-ryzen-performance/

The AotS fix was also basically related to memory access/caching improvements:
https://twitter.com/FioraAeterna/status/847472836344033280
https://twitter.com/FioraAeterna/status/847835875127959554

Ryzen reviewers also found various issues with memory speed/latency. Just a couple of examples:
http://www.guru3d.com/articles-pages/amd-ryzen-7-1800x-processor-review,13.html
http://www.eurogamer.net/articles/digitalfoundry-2017-amd-ryzen-7-1800x-review

Ryzen motherboard BIOS updates apparently are mostly fixes for these issues.

And for some reason the NV driver is more affected than the AMD graphics driver in games that -- I guess -- are also memory intensive. Which is hardly surprising, as the NV driver is always trying to be smart. Sometimes too smart for its own good.
 
And for some reason the NV driver is more affected than the AMD graphics driver in games that -- I guess -- are also memory intensive. Which is hardly surprising, as the NV driver is always trying to be smart. Sometimes too smart for its own good.
TechPowerUp did some testing on Ryzen at low/high RAM speeds. In some games, like Hitman for example, the difference amounted to 12%. In other games, however, the difference amounted to almost nil. Among them is Rise of the Tomb Raider.

hitman_1920_1080.png
rottr_1920_1080.png


Total War: Warhammer exhibited a 14% increase, Civilization 6 11%, Anno 2205 7%, Dishonored 2 7%, Battlefield 1 5%. Other games didn't show any improvements. Keep in mind this is a comparison between 2133MT/s and 3200MT/s; if we started from 2666MT/s the difference would get smaller.

https://www.techpowerup.com/reviews/AMD/Ryzen_Memory_Analysis/9.html

The takeaway here is that RAM speed can definitely affect Ryzen in some titles, but that is not the case in Tomb Raider.
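Just to illustrate the earlier point about starting from 2666MT/s, a crude sketch of my own. It assumes the gain scales roughly linearly with transfer rate, which real games don't do exactly, so treat it as a rough upper bound rather than a prediction:
Code:
// Crude sketch: if a game gains X% going 2133 -> 3200 MT/s, roughly how much
// would be left going 2666 -> 3200, assuming gains scale ~linearly with
// transfer rate (a simplification; real scaling is game-dependent).
#include <cstdio>

int main()
{
    const double lo = 2133.0, mid = 2666.0, hi = 3200.0;
    const double fraction_left = (hi - mid) / (hi - lo);   // ~0.50
    const double gains[] = {14.0, 11.0, 7.0, 7.0, 5.0};    // % gains from the TPU numbers above
    for (double g : gains)
        std::printf("%4.1f%% (2133->3200)  ->  ~%.1f%% (2666->3200)\n",
                    g, g * fraction_left);
    return 0;
}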
 
How anyone can watch HUB's video and not realise that when a Titan X ties with a 1070, yet Crossfire 480s thrash both, it's a clear driver issue is beyond me though. This "DX12 Ryzen Crossfire" theory is probably the most laughable thing I've read yet.
A few professional reviewers have, though, identified issues or at least an unpredictability with RoTR and Hitman, while other games are fine.
Your conclusion is sort of similar to someone (not saying you, just a hypothetical) blaming Nvidia's involvement in games for AMD's driver behaviour in DX11 :)
It makes sense to try other games that have a good DX12 implementation and various sections/scenes, and to check whether something else is untoward on the platform, or maybe try higher voltages.
Another weird result on YouTube is from RedTech Gaming, who saw notable increases in both RoTR and Hitman when they used the latest Ryzen BIOS for their platform and possibly tweaked the voltages. That last point about voltage tweaking is not very clear in their video, but if it was applied it may matter more than the BIOS *shrug*; shame he is not clear on this point.
Other games only had small/marginal improvements, and he was using a GTX1080.
While I would not usually put weight on YouTube tech video sites, it is relevant if we also talk about yours, and something to mull over when deciding what further tests to do, if any. It is also interesting that the improvements were really only in those two games, RoTR and Hitman.
Around 3min 11secs.


Too early to blame all of this on Nvidia IMO if it ends up coming down primarily to RoTR behaving poorly with Nvidia cards on Ryzen. Especially if we also consider HardwareUnboxed's test using a GTX 1070 FE with various games in this very specific context: the only game that seemed unusual was RoTR, and even then the DX12 performance was pretty good compared to your test for some reason, apart from when he briefly tested the Titan Pascal, which did show major issues in RoTR on Ryzen and lower fps than his GTX 1070 FE.
The Fury X was used as a baseline GPU alongside the GTX 1070 FE to highlight performance anomalies.
GP102 possibly does need a driver update, though, to help it with whatever the cause is in RoTR (and possibly Hitman) on Ryzen, especially as its performance ranges from only 10% faster to sometimes 5% slower than the GTX 1070 FE in RoTR on this platform, as seen by Hardware Unboxed.

Cheers
 
No, it means the driver allocates no threads of its own, but it can execute multi-threaded when you do call it that way. The core functions are thread-safe (you can thread in any way you want), and command lists are affinity-locked (each can only be used from one and the same thread, but you can record many command lists concurrently). Even if there are opportunities for a threading "AI", it is not allowed.
Ah, I think I see what you mean. I was not talking about command lists and such, but the stuff the drivers do in the background. Since Kepler, Nvidia uses software-based scheduling for functions with known latencies, right? This scheduling, and probably compiling, and maybe even on-the-fly code optimization (shader replacement?) can probably be done with different levels of intensity, just like you can compress files with different compression levels. It was this background stuff I was talking about, which I can imagine could get in the way - maybe even especially with an Nvidia-sponsored title like RotTR, since they will have a lot of inside knowledge of what exactly to tune there.

Now, when you've got lots of free CPU cycles, either by virtue of high IPC or by being GPU bottlenecked anyway, this might be a good thing. If you're doing too much in the background, or the cores themselves do not have many free resources to begin with, your additional work in the driver could get in the way.

Nvidia's driver probably has inherited some lines of thought they applied to their mini-OS that is the DX11 driver, while AMD might be taking the lightweight route?
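To make the threading model described in the quote at the top of this post concrete, here's a minimal sketch of my own (not anyone's production code): each worker thread gets its own command allocator and command list, records into it, and a single thread submits the lot. Error handling and the fence you'd normally wait on before teardown are omitted for brevity.
Code:
// Minimal sketch of DX12's threading model as described above: the driver
// spawns no threads of its own; each command list is recorded by exactly one
// thread, but many lists can be recorded in parallel and submitted together.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    D3D12_COMMAND_QUEUE_DESC queueDesc = {};            // direct queue, defaults
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&queue));

    const int threadCount = 4;
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(threadCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(threadCount);

    for (int i = 0; i < threadCount; ++i)
    {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));   // created in recording state
    }

    // One list per thread, no sharing: this is the "affinity locked" part.
    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; ++i)
        workers.emplace_back([&, i] {
            // ... record draws/dispatches into lists[i] here ...
            lists[i]->Close();
        });
    for (auto& t : workers) t.join();

    // Submission itself happens from a single thread, in one batch.
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
    return 0;
}
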

Is this the same guy who's been benching StarCraft II for years, even with its 26% performance gap between the 7700K and 7600K, or showing KL twice as fast as Broadwell-E?

Anyone can run macros on a couple of PC's and chart the end results. Understanding what is being seen is another matter altogether.
No, that's me. And for SC2 LotV I have done clock scaling runs with more than 4 different µarchs, which clearly show one thing: at a certain amount of MHz (IPS, so to say), which differs by architecture, it reaches a point where it scales better than linearly with clock speed.


FYI, I already benched TR at low settings and I know what is going on here.
What is it then that's going on? Could you be so kind as to elaborate on that?
 
Ah, I think I see what you mean. I was not talking about command lists and such, but the stuff the drivers do in the background.

That's what I was hinting at. There is no "background" in the DX12(-driver). When you compile a shader-pipeline, you get the full hit. Same for all functionality. Nvidia likes to do that in the DX11(-driver), progressively optimizing shaders and such, but that's not allowed under the DX12(-rules).
 
That's what I was hinting at. There is no "background" in the DX12(-driver). When you compile a shader-pipeline, you get the full hit. Same for all functionality. Nvidia likes to do that in the DX11(-driver), progressively optimizing shaders and such, but that's not allowed under the DX12(-rules).

They surely will have to have some kind of translation from DX12 shader language into their machine code, won't they? And even then... prohibited things have been done in the past.
 
Since Kepler, Nvidia uses software-based scheduling for functions with known latencies, right?
The scheduling in this case is the compiler determining the latency of dependent arithmetic instructions, and arranging the stream so that instructions that consume prior results either have enough independent work between them and the producer, or the appropriate stall count has been noted for the producer.
That's pretty much the only hardware to software scheduling transition with Kepler, and it's a rather low-level kind independent of the other forms. Determining ALU latency that is hard-wired is more straightforward than optimizing buffers and work allocation.
DX12 has more emphasis on controlling when compilation happens, so that it happens outside of the critical loop.

I ran across some discussion of various optimizations or custom paths added for workarounds, or to enable some of the underutilized multithreading functions for DX11 for games like Civ5. I think I remember some observations of Nvidia's driver trying to spin up threads in unexpected ways in order to tweak the responsiveness of thread scheduling in the past. Perhaps if some form of that remains, it becomes a non-optimization with Ryzen.
 
No, that's me. And for SC2 LotV I have done clock scaling runs with more than 4 different µarchs, which clearly show one thing: at a certain amount of MHz (IPS, so to say), which differs by architecture, it reaches a point where it scales better than linearly with clock speed.
Nothing wrong with using SC2 though; it's a popular title with a huge fan base and big tournaments all over the world. Massive multiplayer matches with 8 players are still CPU hogs to this day. It could use some Ryzen optimizations though. Years ago, some users found that tricking the OS into recognizing an AMD CPU as an Intel one (CPU spoofing) would generally improve performance on the AMD CPU by a noticeable margin.
 
Why are FLOPS relevant in this particular scenario?

I just can't see anything wrong with nVidia drivers in this particular scenario. The card with more TFLOPs has beaten the card with fewer TFLOPs at the Low detail game setting (when geometry processing may not be a bottleneck anymore). It is rather strange that this is not always the case (I mean at the High detail game setting).
 
The card with more TFLOPs has beaten the card with fewer TFLOPs at the Low detail game setting (when geometry processing may not be a bottleneck anymore). It is rather strange that this is not always the case (I mean at the High detail game setting).

The GTX 1060 has well over twice the theoretical pixel fillrate of the RX 480 (50% more ROPs at ~50% higher frequency), and it gets ~50% higher fillrate in synthetic tests.


Source: AnandTech's GTX 1060 review.
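For reference, the napkin math behind that claim, using the typical operating clocks quoted earlier in the thread rather than spec-sheet boost (my own sketch):
Code:
// Theoretical pixel fillrate = ROPs * clock, i.e. one 32-bit pixel per ROP
// per clock with no blending. Clocks are the typical operating clocks
// discussed earlier in the thread.
#include <cstdio>

int main()
{
    const double gtx1060 = 48 * 1.80;   // ~86 Gpixels/s
    const double rx480   = 32 * 1.21;   // ~39 Gpixels/s
    std::printf("GTX 1060 ~%.0f Gpix/s, RX 480 ~%.0f Gpix/s (%.1fx)\n",
                gtx1060, rx480, gtx1060 / rx480);
    return 0;
}
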


Once you dumb down shader complexity (ALU bound), texture resolution (bandwidth and I guess TMU bound) and geometry, then the graphics card is probably just pushing as many pixels as the ROPs allow, while the rest of the pipeline gets to stay idle for more cycles.
(This is probably why the GTX 1060 gets a considerably higher SteamVR score than the RX480, for example. That and/or the single-pass stereo feature of Pascal, though I don't think there has ever been any benchmark to determine its influence.)


That said, it should be expected for the GTX 1060 to soundly beat the RX 480 when graphics settings are set to "low".
The fact that it doesn't in DX12 RotTR at low settings is really the odd duck here.
 
Once you dumb down shader complexity (ALU bound), texture resolution (bandwidth and I guess TMU bound) and geometry, then the graphics card is probably just pushing as many pixels as the ROPs allow, while the rest of the pipeline gets to stay idle for more cycles.

The question is whether shaders are really that dumb at the Low setting (in TR). For the RX 480: 2304/32 = 72; for the GTX 1060: 1280/48 ≈ 27. So shaders must run (with all internal HW overhead) in fewer than 72 cycles (RX 480) or ~27 cycles (GTX 1060) to become ROP (or theoretical pixel fillrate) bound. I can't say for sure whether shaders in TR are or are not that simple, but to me these numbers look like the early DX9 era.
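The same arithmetic, tied back to the throughput and fillrate numbers from earlier in the thread (a rough sketch of mine that ignores export/blend limits, occupancy and so on):
Code:
// FLOPs available per pixel at peak fillrate = (peak GFLOPs) / (peak Gpixels/s);
// divide by 2 to get FMA-type instructions per pixel. The clock cancels out,
// which is why the simple ALU/ROP division above gives the same answer.
#include <cstdio>

int main()
{
    const double rx480_flops_per_pixel   = (2304 * 2 * 1.21) / (32 * 1.21);  // = 144
    const double gtx1060_flops_per_pixel = (1280 * 2 * 1.80) / (48 * 1.80);  // ~ 53
    std::printf("RX 480 : %.0f FLOPs (%.0f FMAs) per pixel\n",
                rx480_flops_per_pixel, rx480_flops_per_pixel / 2.0);
    std::printf("GTX1060: %.0f FLOPs (%.1f FMAs) per pixel\n",
                gtx1060_flops_per_pixel, gtx1060_flops_per_pixel / 2.0);
    return 0;
}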
 
You can't just wave around theoretical pixel fillrate or FLOPs anymore. Yes, GeForce has a much higher pixel fillrate... in the case of a 32-bit render target. It is 1/2 rate for 64-bit and 1/4 for 128-bit, while Radeon keeps full rate for 64-bit render targets. Then, when you come to the texturing side, the reverse is true (full-rate filtering for 64-bit textures on GeForce and half rate on Radeon). And then there's a super fast rate for pixels that only go to depth buffers.
Without knowing exactly what a specific game is doing, you can't go around explaining performance differences with theoretical numbers alone. This used to work quite well in the past, but not anymore with the new approaches to rendering (deferred etc.) and the various other tricks in the graphics pipeline (for example NV's tiling).
 
You can't just wave around theoretical pixel fillrate or FLOPs anymore. Yes, GeForce has a much higher pixel fillrate... in the case of a 32-bit render target. It is 1/2 rate for 64-bit and 1/4 for 128-bit, while Radeon keeps full rate for 64-bit render targets. Then, when you come to the texturing side, the reverse is true (full-rate filtering for 64-bit textures on GeForce and half rate on Radeon). And then there's a super fast rate for pixels that only go to depth buffers.
Without knowing exactly what a specific game is doing, you can't go around explaining performance differences with theoretical numbers alone. This used to work quite well in the past, but not anymore with the new approaches to rendering (deferred etc.) and the various other tricks in the graphics pipeline (for example NV's tiling).
Exactly. And these are the "easy" bottlenecks. There's also a whole bunch of harder to analyze geometry bottlenecks. When you decrease the rendering resolution, the geometry density (per pixel) increases. Comparing high end GPU performance at 720p is going to result in surprises. Extremely low resolutions reveal bottlenecks that are not visible at 1080p or 4K. Some of the low-resolution bottlenecks are the same ones that a high tessellation level causes (but not all of them).

And the bottlenecks in reading & writing various kinds of buffers have even bigger perf differences. I wrote a buffer perf tester some time ago: https://github.com/sebbbi/perftest
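To put rough numbers on the "geometry density per pixel" point: with a fixed triangle count per frame (the 2M below is purely a made-up example of mine), triangles per pixel scale inversely with pixel count, so 720p sees 2.25x the density of 1080p and 9x that of 4K.
Code:
// Per-pixel geometry density for a fixed triangle count at different
// resolutions. The 2,000,000 triangles/frame figure is purely illustrative.
#include <cstdio>

int main()
{
    const double triangles = 2000000.0;
    const struct { const char* name; double w, h; } res[] = {
        {"720p ", 1280, 720}, {"1080p", 1920, 1080}, {"4K   ", 3840, 2160}};
    for (const auto& r : res)
        std::printf("%s: %.2f triangles per pixel\n",
                    r.name, triangles / (r.w * r.h));
    return 0;
}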
 
You can't just wave around theoretical pixel fillrate or FLOPs anymore
Exactly. And these are the "easy" bottlenecks. There's also a whole bunch of harder to analyze geometry bottlenecks. When you decrease the rendering resolution, the geometry density (per pixel) increases. Comparing high end GPU performance at 720p is going to result in surprises.
Thanks guys for eloquently pointing out what should be obvious from the start. Careful though, paranoia runs rampant here, some will have irrational ideas and accusations might start flying quickly. Better duck for cover.

On a more serious note, TechSpot revisited their piece on this; they now have several games tested at 720p and low details. In general, in DX11 Ryzen performs worse on AMD's 480, while in DX12 Ryzen is worse on NV's 1060.

RotT_DX11.png

RotT_DX12.png




So, are Nvidia GPUs limiting Ryzen's gaming performance? Well, we didn't find any evidence of that. In some DX12 scenarios, the 1800X performs better than the 7700K when paired with a RX 480 over the GTX 1060, but that doesn't prove Nvidia is handicapping Ryzen.
http://www.techspot.com/article/1374-amd-ryzen-with-amd-gpu/
 
Thanks guys for eloquently pointing out what should be obvious from the start. Careful though, paranoia runs rampant here, some will have irrational ideas and accusations might start flying quickly. Better duck for cover.

On a more serious note, TechSpot revisited their piece on this; they now have several games tested at 720p and low details. In general, in DX11 Ryzen performs worse on AMD's 480, while in DX12 Ryzen is worse on NV's 1060.

http://www.techspot.com/article/1374-amd-ryzen-with-amd-gpu/

Any idea what drivers were used for both Nvidia and AMD?
I could not see it in the article.
Shame he chose some disappointing titles such as Warhammer DX12, as that one is skewed in terms of dev optimisation (kinda broken) and I'm not sure it would provide a usable result/trend.
I would think Sniper Elite 4 and Gears of War 4 would make more sense to see Nvidia behaviour than Warhammer.
Thanks
 
Any idea what drivers were used for both Nvidia and AMD?
Latest ones. The i7 7700K is downclocked to the 1800X's frequency though, at 4GHz.

Curious to see if there is any truth to the story, we grabbed a test system and installed a GeForce GTX 1060 and Radeon RX 480. Both were equipped with the latest display drivers and were only put up against low quality settings at 720p to help eliminate GPU bottlenecks.
http://www.techspot.com/article/1374-amd-ryzen-with-amd-gpu/
 