No DX12 Software is Suitable for Benchmarking *spawn*

Wouldn't you be aiming to keep vram usage below 4GB on a 4GB card?

Whether through drivers or through developer choices based on detected video memory (this is a DX12 game, right?), I would have thought you'd be trying to do exactly that. And it wouldn't necessarily mean it wasn't costing you performance.

Store what you need to in main memory, and transfer only as much into video memory as you can fit and stream as appropriate, even if there's overhead to that.
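That residency idea can be sketched as a toy budgeter (hypothetical names and sizes; real engines work per-mip with residency feedback from the GPU): keep the largest mip chain per texture that fits the VRAM budget, and stream everything else from system memory.

```python
# Hypothetical texture-streaming budgeter: keep the largest mip level per
# texture that still fits the VRAM budget; everything else stays in system
# RAM and is streamed on demand (at some PCIe-transfer cost).

def fit_to_vram(textures, budget_mb):
    """textures: {name: [mip0_mb, mip1_mb, ...]}, largest mip first."""
    resident = {}
    used = 0.0
    for name, mips in textures.items():
        for level, size in enumerate(mips):
            if used + size <= budget_mb:
                resident[name] = level   # 0 = full-res mip resident
                used += size
                break
        else:
            resident[name] = len(mips)   # nothing resident; fully streamed
    return resident, used

textures = {"terrain": [512, 128, 32], "car": [256, 64, 16], "sky": [128, 32, 8]}
print(fit_to_vram(textures, budget_mb=700))
# ({'terrain': 0, 'car': 1, 'sky': 1}, 608.0)
```

On a 700MB budget the terrain keeps its full-res mip while the car and sky textures drop a level, which is exactly the "fit what you can, stream the rest" behavior described above.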
 
Because 6GB is greater than 4GB? As resolution increases, other bottlenecks kick in. The framebuffer takes more space, and the raw power of the cards helps more at higher resolutions even when streaming in textures. The memory measurements won't be entirely accurate either. Quadrupling the resolution only costs about 30% performance:

Fury X:
1080p -> 42fps
1440p -> 38fps
4k -> 29fps
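Working out the relative drops from those numbers (a quick sanity check; 1080p to 4K quadruples the pixel count):

```python
# Fury X figures quoted above
fps = {"1080p": 42, "1440p": 38, "4k": 29}

for res, f in fps.items():
    drop = 1 - f / fps["1080p"]
    print(f"{res}: {f} fps ({drop:.0%} below 1080p)")
# 4K lands ~31% below 1080p despite pushing 4x the pixels
```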


So the fact that razor's memory usage graph shows 4827MB being used at 1080p indicates all the 3/4GB cards aren't bottlenecked by memory? Sure, some resources could stay resident, but in that case I'm sure they'd use more than 4.8GB on a 6GB card. If it's not memory, they're struggling to load resources in advance.


No, we have seen the same type of memory streaming in many games up till now; the memory subsystem isn't what is causing the frame rate variance.

Do you know what the performance loss from streaming textures is? I can tell you it's in the neighborhood of 10%, provided the developers were careful not to overload the bandwidth of the PCIe bus, which the engine developers are familiar with.

Not only that, you have a 980 4GB doing better than the Fury X at 1080p and losing out to the Fury X at 4K. If it were truly a memory-amount issue, that wouldn't happen either; the 980 would have had the same constraints as the Fury X.
 
Not only that, you have a 980 4GB doing better than the Fury X at 1080p and losing out to the Fury X at 4K. If it were truly a memory-amount issue, that wouldn't happen either; the 980 would have had the same constraints as the Fury X.

It doesn't necessarily work this way. Bottlenecks potentially shift around many thousands of times during the creation of a frame, and different parts of the GPU may even be bottlenecked by different things at the same time. So while both may be suffering from limited vram, there will undoubtedly be other factors that continue to affect performance and there is absolutely no requirement that both scale the same with resolution.
 
They don't need to scale the same, but I would expect them to have a similar drop in performance relative to each other.

Otherwise we could just as easily say this game simply runs better on Nvidia hardware.

You have a GTX 970 in there that exhibits the opposite of a memory bottleneck too, if we consider the game to be hitting that limit to begin with.
 
There is only one other benchmark where I have seen R9 280X (Tahiti) < R9 390 (Hawaii) < R9 380X (Tonga) < Polaris 10. Or rather, GCN1 < GCN2 < GCN3 < GCN4, lining up this linearly across cards with very different compute and fillrate performance:

B36QdJ5.png


Looks like Forza Horizon 3 is measuring geometry bottlenecks. Not compute performance, not memory amount, not fillrate, etc. Just geometry, at least at 1080p.
Which is why even the cut-down and lower-clocked Polaris 10 in the RX 470 seems to be keeping up with the GTX 1060 6GB.

Maybe they have a tessellated ocean mesh being rendered below the tracks all the time, or something.
Whatever it is, it seems to be killing performance all around, while image quality doesn't seem to be all that good, compared to older racing games available on the PC.



EDIT: Ocean is definitely tessellated in the PC version, whereas the Xbox One version seems to be just a moving texture. The console version doesn't even seem to show many differences other than the rendering resolution (or MSAA) and the ocean.


Regardless, the Xbone version seems to be rendering at 1080p 30FPS. If the RX 460, which is >50% faster than the Xbone's GPU, isn't even hitting 20 FPS average, then something is very different with the PC version.
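The ">50%" figure checks out on paper, using the commonly quoted peak-FP32 numbers (a crude proxy for real game performance, but enough for this comparison):

```python
# Rough peak-FP32 comparison from commonly quoted specs.
# Peak FLOPS is only a crude proxy for real game performance.
xbone_tflops = 768 * 2 * 0.853 / 1000   # 768 shaders @ 853 MHz -> ~1.31 TFLOPS
rx460_tflops = 896 * 2 * 1.200 / 1000   # 896 shaders @ ~1200 MHz boost -> ~2.15 TFLOPS

advantage = rx460_tflops / xbone_tflops - 1
print(f"RX 460 peak advantage: {advantage:.0%}")
# ~64% on paper, comfortably over the 50% mentioned above
```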
 
Looks like Forza Horizon 3 is measuring geometry bottlenecks. Not compute performance, not memory amount, not fillrate, etc. Just geometry, at least at 1080p.
Which is why even the cut-down and lower-clocked Polaris 10 in the RX 470 seems to be keeping up with the GTX 1060 6GB.
So the 970 that's scoring 67% higher FPS than 290x/380x is only ~20% ahead in TessMark? Higher clocked and narrower chips with better cache efficiency seem to be performing better here. So I'm still saying memory is significant.
 
Going to go out on a limb here and suggest memory capacity is a large factor in that game: 470 just above Fury X, 290X equal to 380X, 960 besting a 780 Ti. Interestingly, there is no 8GB 480 tested. Not sure why they even presented this benchmark considering the results. All it shows is you need >6GB to attempt playing the game.

Most likely. It would have also been interesting to see an R9 390(X) with 8 GB in there; that would have made for an interesting comparison to the R9 290(X). Then again, considering how badly the 290(X) does in that site's benches, it might not have illustrated much. But the 8 GB RX 480 would certainly have been an interesting data point.

It's almost like they deliberately chose not to use any AMD card with more than 4 GB of memory.

Back to the R9 290(x), however. It's shocking how badly it performs relative to the R9 380(x) and Rx 470.

There might be some truth to ToTTenTranz's position that Geometry performance has a large impact on overall performance in the title.

Regards,
SB
 
So the 970 that's scoring 67% higher FPS than 290x/380x is only ~20% ahead in TessMark? Higher clocked and narrower chips with better cache efficiency seem to be performing better here. So I'm still saying memory is significant.
Tessmark at x64 doesn't completely define geometry performance. For example, different tessellation factors get different results between GCN1, 2 and 3:

SquEJXk.png


And performance even changed a lot through driver optimization (AFAIK, it boiled down to figuring out how to optimize vertex reuse in the driver itself):

zluoHzC.png


I find it very hard to believe that cache efficiency would be single-handedly responsible for the RX 470 going above a Fury X.
If it is, it would be unprecedented in how extreme the effect is.

Regardless, the only things the RX 470 and the Fury X have in common are the L2 cache amount and the number of geometry processors. The Fury X has twice the amount of compute and rasterizer resources of the RX 470, as well as over twice the bandwidth.
 
That makes sense, as the res increases, the impact of tessellation would decrease. Need more tests though to confirm.
 
Adding another data point for judging Tessellation performance of GCN:
http://imgur.com/WwNE39i as per earlier request not a full size image, so you have to click it ;)

Note, though, that performance with higher tessellation factors can vary dramatically from driver to driver, especially on older GCN generations like the Tahiti chip. From Hawaii onward, those variations shrink more and more, as the drivers either seem settled or no longer play such an important role.
 
I find it very hard to believe that cache efficiency would be single-handedly responsible for the RX 470 going above a Fury X.
If it is, it would be unprecedented in how extreme the effect is.
I doubt it's the only factor, but seems to be a rather significant one. I checked some of the youtube comparisons and a 3GB 1060 was getting utterly crushed compared to a 4GB 470. Minimum framerate was ~15fps@1080p and ~7fps at 4k for the 1060. The min/max delta was also huge. An 8GB 480 was also noticeably ahead of a 6GB 1060. So memory followed up by tess factor seems likely here.

The posted benchmarks are arguably ranked by memory capacity and then geometry performance. If it were just tessellation, I'd imagine some 8GB models (390X more than 480) would have been included in that benchmark to bias the results a little more.
 
It seems they have published "HQ" (not VHQ) benchmarks, and things are a lot more in line with other games:

DdZcBS.png



Though there are still some discrepancies that point to geometry bottlenecks, like Tonga getting awfully close to Hawaii cards (again, Tonga has 4* GCN 3 geometry processors while Hawaii has 4* GCN2).
And the test is especially punishing to Tahiti cards, which doesn't make much sense with Pitcairn being 50% faster. Memory amount shouldn't be all that important, because 3GB GK110 cards are in the same ballpark.

Regardless, what's the deal with same scores getting bars with different sizes, like the 980 Ti vs. Fury X?




That may make sense, but a game could also increase the tessellation level along with resolution to maintain a certain triangle size.
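A hypothetical version of that rule (names and numbers made up for illustration): scale the tessellation factor with vertical resolution so a patch edge keeps a roughly constant on-screen triangle size in pixels.

```python
import math

# Hypothetical resolution-aware tessellation: an edge's on-screen length in
# pixels scales linearly with vertical resolution, so the subdivision count
# must scale with it to keep triangles at a constant pixel size.
def tess_factor(edge_px_at_1080p, target_px_per_tri, height):
    edge_px = edge_px_at_1080p * height / 1080
    return max(1, math.ceil(edge_px / target_px_per_tri))

for h in (1080, 1440, 2160):
    print(h, tess_factor(edge_px_at_1080p=256, target_px_per_tri=8, height=h))
# 1080 -> 32, 1440 -> 43, 2160 -> 64
```

Under a scheme like this the geometry load grows with resolution, so the tessellation bottleneck wouldn't fade at 4K the way a fixed-factor scheme would suggest.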

It seems there are two types of geometry quality in the games' settings, static and dynamic:

keDdaQe.png
 
Maybe they define 103 and 35 differently.
 
Maybe they define 103 and 35 differently.
Higher bars representing the same average fps correspond to higher minimum fps: the 980 Ti has 88 min fps while the Fury X has 84, so the 980 Ti gets drawn a notch higher.
I checked some of the youtube comparisons and a 3GB 1060 was getting utterly crushed compared to a 4GB 470. Minimum framerate was ~15fps@1080p and ~7fps at 4k for the 1060. The min/max delta was also huge. An 8GB 480 was also noticeably ahead of a 6GB 1060. So memory followed up by tess factor seems likely here.
Some of these comparisons use Dynamic presets, which makes head to head comparisons impossible as the game modifies them on the fly.
 
[strike]Hanlon's & Occam's shaving instruments to the rescue: A typo? These bars do not look like Excel generated them automagically from the numbers in them.[/strike]

Scratch that. They obviously stack the bars, yes.
 
Some of these comparisons use Dynamic presets, which makes head to head comparisons impossible as the game modifies them on the fly.
That still depends on the definition of "Dynamic". If it's tessellating by distance and resolution it should be reasonably consistent and the preferred method. If it's reducing tessellation based on framerate then yeah it won't work.
 