still under "theory", nV hardware can do async in hardware up to a certain point with less latency than AMD hardware, but when stressed after that point its kinda like this
still under "theory", nV hardware can do async in hardware up to a certain point with less latency than AMD hardware, but when stressed after that point its kinda like this
Those results are AMD's PR-provided results, not results compiled by ExtremeTech.
Quote: "Anyone feel like generously giving me an update on what's going on in this thread, in baby language? So I can understand it and all?"
A great start would be this post from Ext3h.
Quote: "It is absolutely running asynchronously."
It's running asynchronously where it's supported. "Async Compute" isn't a mandatory DX12 "flag".
"Async Compute" is the ability to start rendering and compute tasks at the same time, throughout the ALUs. If it's not running concurrently, there's no "Async Compute" happening.The question is whether it is running CONCURRENTLY.
Please, Please, PLEASE pay attention to the words you use in technical discussion.
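Since the whole argument hinges on that distinction, here's a minimal D3D12 sketch of what "asynchronous" means at the API level: command lists get submitted to separate DIRECT and COMPUTE queues, and whether the GPU actually runs them concurrently is left to the hardware and driver. This is illustrative only, not code from the benchmark or any driver.

```cpp
// Minimal sketch: "async compute" at the API level is just submitting work to a
// dedicated compute queue alongside the graphics queue. The API only expresses
// the opportunity for overlap; concurrency is up to the hardware scheduler.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    // Create a device on the default adapter.
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // Graphics ("direct") queue: accepts draw, compute and copy commands.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // Dedicated compute queue: work submitted here is independent of the
    // graphics queue until the app inserts fences between them. This is the
    // "async compute" path; running it at the same time as graphics work
    // (i.e. concurrently) is not guaranteed by the API.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // ExecuteCommandLists() on the two queues would now be independent
    // submissions; ordering only exists where fences are signaled and waited.
    return 0;
}
```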
Quote: "No draw call overload, backpressure only in the graphics queue and no more than a single compute command every few graphics batches, only copy commands were ever issued asynchronously."
You could almost say it's a DX12 implementation tailored for nVidia GPUs, then...
Quote: "Looks like Nvidia got heavily CPU limited in the Fable benchmark this time. Even at 4k."
Which reviews show Nvidia being CPU-limited at 4K? There seems to be evidence to the contrary, since Anandtech's results generally show no sensitivity to CPU choice until 720p, and Techreport's factory-overclocked 980 Ti is demonstrably faster relative to Fury than other reviews with stock cards.
Quote: "Which reviews show Nvidia being CPU-limited at 4K? There seems to be evidence to the contrary, since Anandtech's results generally show no sensitivity to CPU choice until 720p, and Techreport's factory-overclocked 980 Ti is demonstrably faster relative to Fury than other reviews with stock cards."
ExtremeTech accidentally managed to throttle the CPU to 1.7GHz by choosing the wrong power profile, and that resulted in the Fury X outranking the 980 Ti even at 4K and 1080p. At 720p, the Fury X took only a 2% performance hit from the reduced clock speed; the 980 Ti lost about 30%.
Quote: "ExtremeTech accidentally managed to throttle the CPU to 1.7GHz by choosing the wrong power profile, and that resulted in the Fury X outranking the 980 Ti even at 4K and 1080p. At 720p, the Fury X took only a 2% performance hit from the reduced clock speed; the 980 Ti lost about 30%."
So the 980 Ti is CPU-limited when the CPU is massively downclocked and the resolution is at 720p.
Quote: "Those results are AMD's PR-provided results, not results compiled by ExtremeTech."
I must say I am not a fan of comparing stock NVIDIA to stock AMD, as their sales models/channels seem to be a bit different: NVIDIA gives its partners greater flexibility to differentiate from the reference design in terms of noise, heat design and, importantly, clocking capability. ExtremeTech used a stock reference 980/980 Ti, and let's be honest, only very early technology adopters should have these, as they are not as good as the slightly later AIB cards.
Quote: "So the 980 Ti is CPU-limited when the CPU is massively downclocked and the resolution is at 720p."
I don't know. They are not online any more. Maybe they were just a fluke. Now both graphs show them ranked evenly, and oddly enough, both seemed to have received a performance boost on 1080p, which indicates some common CPU limits. Perhaps particle physics.
Where should I be looking for the rankings changing at 4K between the 980 Ti and Fury X?
Quote: "But the CPU limit isn't only there when downclocked. It only became obvious."
If it's not obvious, there's little justification in saying Nvidia is limited by it. There's no reason to state that an item whose influence is a second-order effect compared to a more dominant bottleneck cannot have some impact.
Quote: "It's even there at regular clock on an i7-4960X (Anandtech). Still, only 720p, sure, but it is there."
That makes it applicable to a claim of being CPU-limited at that resolution, although given the vast gulf in capability between an i7 and an i3, saying it is CPU-limited may not be fully accurate without more elaboration.
Quote: "Only an entirely oversized i7-5960X (costing twice as much as the GPU) could leverage the CPU limit far enough to let the 980 Ti outperform the Fury X. While AMD for once did not have a CPU limit at all, at that resolution."
AMD's performance was sensitive to changes in CPU choice, just not in a manner that was intuitive.
Quote: "Draw your own conclusions."
One vendor has a higher CPU dependency, although in absolute terms it requires a significant drop in CPU performance to make it clear.
Looks like Nvidia got heavily CPU limited in the Fable benchmark this time. Even at 4k.
And no, the game doesn't really make proper use of async compute at all. Only about 5% (time wise) of the workload has been offloaded to a dedicated compute queue. I've seen the GPUView dumps of Nvidia and AMD runs. No draw call overload, backpressure only in the graphics queue and no more than a single compute command every few graphics batches, only copy commands were ever issued asynchronously.
So it looks essentially the same as it would have with DX11: a perfectly safe, well-optimized tech demo, where the only DX12 benefit left is the reduced driver overhead. And even that isn't true for Nvidia.
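For anyone who hasn't seen what "offloading to a dedicated compute queue" looks like in API terms, here's a rough D3D12 sketch of the pattern Ext3h is describing: one compute batch submitted on its own queue, fenced against the graphics queue. The function and variable names are invented for illustration; none of this is the benchmark's actual code.

```cpp
// Generic sketch of cross-queue async compute in D3D12 (names are placeholders).
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void SubmitAsyncCompute(ID3D12Device* device,
                        ID3D12CommandQueue* computeQueue,
                        ID3D12CommandQueue* gfxQueue,
                        ID3D12CommandList* computeList,  // already recorded & closed
                        UINT64 fenceValue)
{
    // Fence shared by both queues; a real renderer would create this once and
    // reuse it every frame rather than per submission.
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // 1. Kick off the compute batch on its own queue. Nothing here blocks the
    //    graphics queue yet, so the hardware is free to overlap the two.
    ID3D12CommandList* lists[] = { computeList };
    computeQueue->ExecuteCommandLists(1, lists);
    computeQueue->Signal(fence.Get(), fenceValue);

    // 2. Only the point in the frame that actually consumes the compute output
    //    needs to wait. If the compute batch is tiny (as described above), the
    //    overlap window is small and the dedicated queue buys very little.
    gfxQueue->Wait(fence.Get(), fenceValue);
}
```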
Is this true? That's bad if so. It would mean they left the real benefits for the Xbox One and took it down a notch for PC.
Quote (from the ExtremeTech article):
Why include AMD results?
In our initial coverage for this article, we included a set of AMD-provided test results. This was mostly done for practical reasons — I don’t actually have an R9 390X, 390, or R9 380, and therefore couldn’t compare performance in the midrange graphics stack. Our decision to include this information “shocked” Nvidia’s PR team, which pointed out that no other reviewer had found the R9 390 winning past the GTX 980.
Implications of impropriety deserve to be taken seriously, as do charges that test results have misrepresented performance. So what’s the situation here? While we may have shown you chart data before, AMD’s reviewer guide contains the raw data values themselves. According to AMD, the GTX 980 scored 65.36 FPS in the 1080p Ultra benchmark using Nvidia’s 355.98 driver (the same driver we tested). Our own results actually point to the GTX 980 being slightly slower — when we put the card through its paces for this section of our coverage, it landed at 63.51 FPS. Still, that’s just a 3% difference.
It’s absolutely true that Tech Report’s excellent coverage shows the GTX 980 beating past the R9 390 (TR was the only website to test an R9 390 in the first place). But that doesn’t mean AMD’s data is non-representative. Tech Report notes that it used a Gigabyte GTX 980, with a base clock of 1228MHz and a boost clock of 1329MHz. That’s 9% faster than the clocks on my own reference GTX 980 (1127MHz and 1216MHz respectively).
Multiply our 63.51 FPS by 1.09x, and you end up with 69 FPS — exactly what Tech Report reported for the GTX 980. And if you have an NV GTX 980 clocked at this speed, yes, you will outperform a stock-clocked R9 390. That, however, doesn’t mean that AMD lied in its test results. A quick trip to Newegg reveals that GTX 980s ship in a variety of clocks, from a low of 1126MHz to a high of 1304MHz. That, in turn, means that the highest-end GTX 980 is as much as 15% faster than the stock model. Buyers who tend to buy on price are much more likely to end up with cards at the base frequency: the cheapest EVGA GTX 980 is $459, compared to $484 for the 1266MHz version.
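The clock-scaling arithmetic above is easy to verify; here's a quick back-of-the-envelope sketch using the figures quoted in the article, under the simplifying assumption that frame rate scales linearly with core clock (a rough first-order approximation only):

```cpp
// Sanity check of the clock-scaling arithmetic quoted above.
#include <cstdio>

int main()
{
    const double referenceFps   = 63.51;  // stock GTX 980, 1080p Ultra (article's number)
    const double referenceClock = 1127.0; // reference base clock, MHz
    const double gigabyteClock  = 1228.0; // Gigabyte card's base clock, MHz
    const double cheapestClock  = 1126.0; // lowest retail base clock, MHz
    const double fastestClock   = 1304.0; // highest retail base clock, MHz

    // ~9% higher clock -> roughly 69 FPS, matching Tech Report's result.
    printf("Estimated factory-OC FPS: %.1f\n",
           referenceFps * (gigabyteClock / referenceClock));

    // Spread between the cheapest and fastest retail GTX 980s: ~15%.
    printf("Retail clock spread: %.1f%%\n",
           (fastestClock / cheapestClock - 1.0) * 100.0);
    return 0;
}
```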
Quote: "Is this true? That's bad if so. It would mean they left the real benefits for the Xbox One and took it down a notch for PC."
I don't recall seeing a DX11 vs DX12 comparison, so saying that there is no reduction in driver overhead for Nvidia is a dubious assertion.
Quote: "What are the chances AMD can have their driver force compute shaders to be run asynchronously concurrently..."
If, and this is an if, the explicitly listed compute category is asynchronous compute, we see the overall contribution it makes to frame time. It could go to zero ms and the overall picture would only change a little.
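To put a rough upper bound on that: if the compute slice really is only about 5% of the frame (Ext3h's figure), then even perfectly overlapping it with graphics work can't gain more than about 5%. The 20 ms frame time below is an assumption for illustration, not a measured value:

```cpp
// Back-of-the-envelope bound on what async compute can buy here.
#include <cstdio>

int main()
{
    const double frameTimeMs   = 20.0;  // assumed frame time (~50 FPS), illustrative only
    const double computeShare  = 0.05;  // ~5% of the frame spent in the compute queue
    const double computeTimeMs = frameTimeMs * computeShare;

    const double fpsBefore = 1000.0 / frameTimeMs;
    // Best case: the entire compute slice is hidden behind graphics work.
    const double fpsAfter  = 1000.0 / (frameTimeMs - computeTimeMs);

    printf("Compute slice: %.1f ms\n", computeTimeMs);
    printf("FPS %.1f -> %.1f (upper bound on async-compute gain)\n",
           fpsBefore, fpsAfter);
    return 0;
}
```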
Quote: "Someone mentioned that extremetech results were provided by AMD. I want to provide the full context of the quote the person made for clarity. It doesn't make as much sense for a 390 to beat a stock 980 without good usage of async."
There are possibly hundreds of reasons why things could go one way or the other.
For one thing, numbers provided by AMD showing a lead that is not reflected in reviews actually do make sense, in view of what has already happened.