AMD Mantle API [updating]

Why get hung up on that? Other sites went and played the game for a good number of minutes and showed their findings. I thought we were all trying to get away from pre-canned repeatable benchmarks which are known to be optimized by IHVs. What I want to see is a 30 minute session in a 40+ player populated map focusing on frametimes. Is that unreasonable?

There's a desire to have benchmarks that aren't being specifically optimized for.
Programs that don't give repeatable results are random number generators, not benchmarks. Finding methodology errors or unforeseen circumstances becomes an intractable problem because so much of the experimentation is no longer controlled.
It becomes uncertain whether you are benchmarking the card, the drivers, the netcode, player choices, or the ISP.
Can it be done to give a sort-of okay answer? To a certain degree, maybe. The percentages in question aren't always big enough to show very well, but that is a data point as well.

With sufficiently large data sets (multiple long runs, statistical analysis) and some kind of standard player script, it might give something more quantitative, but does that sound like something that is going to be practical for most sites?
The in-house tools that simulate server load eliminate a bunch of the unknowns, but it doesn't sound like those are coming out.
It doesn't seem likely that the online service would appreciate players faking traffic for their own benchmarking purposes, either.
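To put a number on "practical for most sites": basic statistics lets you estimate how many runs it would take before the mean fps is pinned down, since the standard error shrinks with the square root of the run count. A rough Python sketch, with purely hypothetical fps numbers:

```python
import math

def runs_needed(stddev_fps, target_half_width, z=1.96):
    """Runs needed so the 95% confidence interval on mean fps is
    +/- target_half_width, given the run-to-run standard deviation:
    n >= (z * stddev / half_width)^2."""
    return math.ceil((z * stddev_fps / target_half_width) ** 2)

# Hypothetical: multiplayer averages vary by ~8 fps between sessions,
# and we want the mean pinned down to +/- 2 fps.
print(runs_needed(8.0, 2.0))  # -> 62 runs
```

Sixty-odd long sessions per card, per API, per patch is exactly the kind of workload most review sites can't afford.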
 
Just to point out a couple of interesting numbers here: both PCLab.pl and PCGamesHardware happen to have a test with the same multiplayer map (Siege of Shanghai), the same hardware (i7-4770K and R9 290X), and the same settings (Ultra @1080P).

In DX, both have similar frame rates that form a pretty good baseline:

PCGamesHardware: 63
PCLab.pl: 64.5


In Mantle, PCGamesHardware's nearly doubled, while PCLab.pl's is even below the GTX 780's (83.8):

PCGamesHardware: 121
PCLab.pl: 81.5


Unless there is more data, I would take all these MP numbers with a grain of salt, which brings up a question... not to look down on the hard work by AMD, Frostbite's repi@company, and DICE, but perhaps BF4 isn't a great title to showcase Mantle, at least for a first impression. I am thinking of an open-world, single-player game like GTA 5 or Watch Dogs that can show separate (or combined) CPU- and GPU-limited scenarios, with the kind of built-in timedemos that became popular in the Quake days as a consistent test base. If IHVs can optimize for these in-game timedemos, that also means they have to optimize the whole game, which isn't all that bad, I think.
 
There's a desire to have benchmarks that aren't being specifically optimized for.
Programs that don't give repeatable results are random number generators, not benchmarks. Finding methodology errors or unforeseen circumstances becomes an intractable problem because so much of the experimentation is no longer controlled.
It becomes uncertain whether you are benchmarking the card, the drivers, the netcode, player choices, or the ISP.
Can it be done to give a sort-of okay answer? To a certain degree, maybe. The percentages in question aren't always big enough to show very well, but that is a data point as well.

With sufficiently large data sets (multiple long runs, statistical analysis) and some kind of standard player script, it might give something more quantitative, but does that sound like something that is going to be practical for most sites?
The in-house tools that simulate server load eliminate a bunch of the unknowns, but it doesn't sound like those are coming out.
It doesn't seem likely that the online service would appreciate players faking traffic for their own benchmarking purposes, either.


Developers themselves talk in terms of frametimes, not bullshit FPS numbers. So I must wonder: what happened to the holy crusade to educate and steer the world in that direction? Mantle is pretty much the poster case where you'd want to see testing done that way. Or do we have to wait until Nvidia urges some websites to do it because they have an advantage in it? :LOL:
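For anyone unconvinced that averages hide what frame-time analysis reveals, a minimal sketch (the traces are made up): two captures with nearly identical average fps can have wildly different 99th-percentile frame times.

```python
def avg_fps(frame_times_ms):
    # Average fps over a capture: frames rendered / total time.
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)

def percentile(values, pct):
    # Simple nearest-rank percentile; no external libraries needed.
    s = sorted(values)
    idx = max(0, min(len(s) - 1, round(pct / 100.0 * len(s)) - 1))
    return s[idx]

# Hypothetical traces: similar average fps, very different feel.
smooth = [16.7] * 100                 # steady ~60 fps
stutter = [12.0] * 90 + [60.0] * 10   # fast frames plus 60 ms hitches

print(round(avg_fps(smooth), 1), round(avg_fps(stutter), 1))   # 59.9 59.5
print(percentile(smooth, 99), percentile(stutter, 99))         # 16.7 60.0
```

Both traces report "about 60 fps", but the second one hitches to 60 ms every tenth frame, which is exactly what a frame-time percentile catches and an average doesn't.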
 
Frame times when the player is staring at the wall and half the other players are picking their noses, or when 10 players manage to fire an RPG into the benchmarking player's face while the engine's netcode starts lag compensating?
 
Developers themselves talk in terms of frametimes, not bullshit FPS numbers. So I must wonder: what happened to the holy crusade to educate and steer the world in that direction? Mantle is pretty much the poster case where you'd want to see testing done that way. Or do we have to wait until Nvidia urges some websites to do it because they have an advantage in it? :LOL:

Here too: [H] played games repeatedly instead of just benchmarking them ad nauseam, and were pioneering this, since, after all, in BF4's case the SP portion does not matter.
 
Developers themselves talk in terms of frametimes, not bullshit FPS numbers. So I must wonder: what happened to the holy crusade to educate and steer the world in that direction? Mantle is pretty much the poster case where you'd want to see testing done that way. Or do we have to wait until Nvidia urges some websites to do it because they have an advantage in it? :LOL:
You sound pretty confused here gkar... not sure where to start.

Do you guys even play Battlefield 4 at all? Anything done in multiplayer even over long play sessions would just be completely unreliable as a way of measuring repeatable performance.

Programs that don't give repeatable results are random number generators, not benchmarks.
Haha, well said. Regardless of how much good or bad Mantle does in BF4 multiplayer, it is not a usable case for analysis by the tech press without better (engine) tools. Single player and other games will have to fill that need.
 
I have the feeling that gkar1 would be ranting against multi-player benchmarks if the results didn't match his convictions. ;)
 
Unless there is more data, I would take all these MP numbers with a grain of salt, which brings up a question... not to look down on the hard work by AMD, Frostbite's repi@company, and DICE, but perhaps BF4 isn't a great title to showcase Mantle,

You may be right.

There's a bit of a chicken-and-egg problem with Mantle. Games written for the more widely used D3D11 are going to avoid over-taxing the CPU, so adding Mantle support only provides a modest improvement.

On the other hand, anything written for Mantle that takes advantage of the extra CPU time is going to challenge a D3D11 system, so there's an inclination to avoid writing such a game in the first place, as the market for the product might be smaller.

As a result, we only have games out there written to use less CPU, and this doesn't fully show what Mantle can do.

StarSwarm might be the only good example, but even then Oxide mistakenly disabled some multithreading in the released build. A more recent build has re-enabled it, though I haven't seen a recent benchmark.

It's possible we'll end up with two different sets of CPU requirements, one for Nvidia cards and one for GCN cards -- e.g. min Core i7 + GTX or Core i3 + Radeon.
 
Last edited by a moderator:
Just to point out a couple of interesting numbers here: both PCLab.pl and PCGamesHardware happen to have a test with the same multiplayer map (Siege of Shanghai), the same hardware (i7-4770K and R9 290X), and the same settings (Ultra @1080P).

In DX, both have similar frame rates that form a pretty good baseline:

PCGamesHardware: 63
PCLab.pl: 64.5


In Mantle, PCGamesHardware's nearly doubled, while PCLab.pl's is even below the GTX 780's (83.8):

PCGamesHardware: 121
PCLab.pl: 81.5


Unless there is more data, I would take all these MP numbers with a grain of salt
There is also data of golem.de, i7-3770K, 290X, Win8.1 (instead of Win7 as PCGH used), Siege of Shanghai, Ultra setting with FOV 90°:
DX11.1: 63.8 fps
Mantle: 112.2 fps

Appears to be pretty consistent with the PCGH numbers, considering that the slightly slower CPU (lower fps with Mantle) and the slightly faster DX11.1 path of Win8.1 compensate for each other under DX. They also ran different resolutions, so one can see that it is pretty much CPU-limited up to 1080p (without MSAA).

[Chart: golem.de's Battlefield 4 "Siege of Shanghai" results across several resolutions]

By the way, they also described their procedure. They always take the same route and average five runs excluding outliers.
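Their "average five runs excluding outliers" could be read, for example, as dropping the single best and worst run; golem.de doesn't state the exact exclusion rule, so this sketch is just one plausible interpretation:

```python
def trimmed_mean(runs_fps):
    """Average a set of runs after dropping the single best and worst
    result -- one plausible reading of "excluding outliers"."""
    if len(runs_fps) < 3:
        raise ValueError("need at least 3 runs to trim both ends")
    inner = sorted(runs_fps)[1:-1]
    return sum(inner) / len(inner)

# Hypothetical five-run set with one bad run (say, a firefight broke out):
print(round(trimmed_mean([63.1, 64.0, 62.8, 55.2, 63.5]), 1))  # -> 63.1
```

The one disrupted run (55.2) gets discarded instead of dragging the average down by more than a frame per second.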
 
From my experience with BF4 MP, simply taking the same route is not enough: a simple heli or jet explosion/crash half a mile away up above could mess with your fps, and the same goes for boat, grenade and rocket explosions... vehicles passing by, getting suppressed, squad mates spawning on you, getting shot at with bullets hitting stuff around you (bullet impacts produce smoke and shadows now)... and so many other things.

I would advise testing MP in an empty map at first: taking the same route in a vehicle first, then on foot, while firing at some fixed targets in both cases.

Then I would advise getting into spectator mode on any busy Team Deathmatch map and getting to a nice high position over the map; as you get a good look at the small sandbox beneath you, with many people gunning each other down, you start recording frames. Get as many recordings as you can until you establish a general statistical trend (i.e., Mantle is 20% faster than DX in 70% of the runs), then you call it a day.
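That "Mantle is 20% faster than DX in 70% of the runs" kind of trend could be tallied with a few lines of Python; the paired run averages below are invented for illustration:

```python
def trend_fraction(mantle_runs, dx_runs, speedup=1.20):
    """Fraction of paired runs where Mantle's average fps beats the
    matching DX run by at least `speedup` (1.20 = at least +20%)."""
    assert len(mantle_runs) == len(dx_runs)
    wins = sum(m >= d * speedup for m, d in zip(mantle_runs, dx_runs))
    return wins / len(mantle_runs)

# Hypothetical spectator-mode averages from the same map and position:
mantle = [82, 95, 88, 79, 91, 77, 90, 85, 93, 80]
dx     = [64, 66, 70, 68, 65, 71, 63, 69, 67, 70]
print(trend_fraction(mantle, dx))  # -> 0.7
```

Reporting a fraction of wins like this sidesteps the run-to-run chaos better than comparing two single averages ever could.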
 
I don't get why people are so hung up on this; we don't need to reinvent the wheel. Use standard statistical models with a sufficiently large sample size (you can work this out as you go based on the normalisation of variances). Hell, a bell curve of frame times would give a good level of accuracy.

I don't see why you would want to remove normal variance from your "real-world benchmark"; you just want to remove the outliers while keeping a consistent base.


If someone wants to send me the hardware I'll happily do it :p:LOL:
 
There is also data of golem.de, i7-3770K, 290X, Win8.1 (instead of Win7 as PCGH used), Siege of Shanghai, Ultra setting with FOV 90°:
DX11.1: 63.8 fps
Mantle: 112.2 fps

Appears to be pretty consistent with the PCGH numbers, considering that the slightly slower CPU (lower fps with Mantle) and the slightly faster DX11.1 path of Win8.1 compensate for each other under DX. They also ran different resolutions, so one can see that it is pretty much CPU-limited up to 1080p (without MSAA).

[Chart: golem.de's Battlefield 4 "Siege of Shanghai" results across several resolutions]

By the way, they also described their procedure. They always take the same route and average five runs excluding outliers.

Thanks, Jawed also posted golem.de way back, I missed it.

Self-correcting the numbers I quoted from PCGH: 63 and 121 are 720p, not 1080p. The actual settings are on the left, above the labels, not in the title of the chart.
 
Use standard statistical models with a sufficiently large sample size (you can work this out as you go based on the normalisation of variances). Hell, a bell curve of frame times would give a good level of accuracy.
You have clearly never played or benchmarked BF4 multiplayer... it's completely impractical to expect any sort of convergence over practical time periods or player behaviour. Hell the game is probably patched faster than you could get any sort of reasonable result.

Performance just depends a hell of a lot more on what you and the other 63 people are doing than what graphics API is in use. You'll never successfully de-tangle the two without better control. Even if you did to your satisfaction, you're not going to convince me that you have :)
 
The large number of players may actually help as their actions average out to some extent, in some sense it's inherently a sample size of 63 for a single run. ;)
 
The large number of players may actually help as their actions average out to some extent, in some sense it's inherently a sample size of 63 for a single run. ;)
Definitely not the case in practice :). There's a huge variety in what happens depending on the players and actions in the game, from simple stuff like if and when the Levolution stuff gets triggered (which someone could maybe control for...) to much dicier stuff like how much people feel like camping areas, relying on certain vehicles and weapons, etc. It doesn't all average out at all... you have games at both ends of the scale, from crazy non-stop action to sniper-duel stalemates and everything in between.

You're just not going to convince me that any amount of "random sampling" (as best as possible) is going to control for the vast array of variables. If someone is serious about this you'd need a few layers of network recording/playback/simulation I think.
 
You have clearly never played or benchmarked BF4 multiplayer... it's completely impractical to expect any sort of convergence over practical time periods or player behaviour. Hell the game is probably patched faster than you could get any sort of reasonable result.

Performance just depends a hell of a lot more on what you and the other 63 people are doing than what graphics API is in use. You'll never successfully de-tangle the two without better control. Even if you did to your satisfaction, you're not going to convince me that you have :)
Reviewers could get 63 of their friends to do the exact same thing in each run. ;)
 
You have clearly never played or benchmarked BF4 multiplayer... it's completely impractical to expect any sort of convergence over practical time periods or player behaviour. Hell the game is probably patched faster than you could get any sort of reasonable result.
clearly
http://bf4stats.com/pc/itsmydamnation
http://bf3stats.com/stats_pc/itsmydamnation
http://bfbcs.com/stats_pc/itsmydamnation
http://www.bf2stats.net/player/59385128/

never played BF in my life

Performance just depends a hell of a lot more on what you and the other 63 people are doing than what graphics API is in use. You'll never successfully de-tangle the two without better control. Even if you did to your satisfaction, you're not going to convince me that you have :)

You don't have to; stats doesn't try to. You're using the fact that over a sufficient amount of play time (obviously you need to set constraints), "events" that caused spikes will become normalised within the distribution. You could then use some simple measurements of the collective frame times: mean, mode, median, range, standard deviation, log distribution, etc.

If you really want to keep your data as comparable as possible, just get two or more people to constantly stay next to each other (sniper squad on a rooftop... lol). Sure, it's hardly perfect, but it doesn't need to be: take enough data, which should be easy now that frame times can be logged in BF4, and which honestly isn't that hard on maps like Zavod (so long as the rounds aren't long enough to destroy all the trees). Hell, if you are concerned about that, you could just use data from 5 minutes in to 15-20 or something like that, to minimise the change in environmental conditions and give everyone time to load and start playing.
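The measurements listed above, restricted to a 5-20 minute window, could be summarised like so (the sample data and tuple layout are hypothetical; a real BF4 frame-time log would hold thousands of samples):

```python
import statistics

def window_stats(samples, start_s=300, end_s=1200):
    """Summarise (timestamp_s, frame_time_ms) samples, keeping only the
    5-20 minute window to skip load-in and end-of-round chaos."""
    ft = [ms for t, ms in samples if start_s <= t < end_s]
    return {
        "mean": statistics.mean(ft),
        "median": statistics.median(ft),
        "stdev": statistics.stdev(ft),
        "range": max(ft) - min(ft),
    }

# Tiny hypothetical capture; the first and last samples fall outside
# the window and are dropped.
capture = [(100, 14.0), (400, 16.0), (700, 18.0), (1000, 20.0), (1300, 40.0)]
print(window_stats(capture))  # mean 18.0, median 18.0, stdev 2.0, range 4.0
```

Note how the 40 ms end-of-round sample never touches the summary because the window excludes it.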


edit:

It's actually funny, currently my play time matches my order of favourite BF games: BF2, then BC2, then BF4, then BF3...
 
The only way to get a semblance of convergence on BF4 in a live multiplayer environment is by playing for hours with each configuration, and even then with similar gameplay. That may be enjoyable for the reviewer if he's into that kind of game, but it will somewhat kill his writing productivity.
 