AMD Mantle API [updating]

Well we don't have much choice unless you know of any games being built from the ground up with Mantle?
No, we don't, and that's why we can't draw any conclusions either.
Drawing conclusions would be like saying road car X is as good as racing car X because doing specific things with each is just as fast, while we know the racing car could leave the other in the dust on a racing track.
 
This is what happens in titles where the developers optimize equally for both AMD and Nvidia.

[benchmark image]

Yeah I can see Nvidia really got a huge amount of optimisation in that benchmark where a GTX 770 beats a GTX 970 :rolleyes:
 
It's some engine that was used to tout the high draw call count enabled by Mantle.
The developers want Absolute Freedom for game devs, which means avoiding all the batching and state re-use tricks normally needed to work around the cripplingly low number of draw calls that can be squeezed through standard DX11, since those tricks constrain flexibility in adding materials, properties, or effects to objects.
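To make that concrete, here is a rough sketch of the kind of DX11 batching trick being avoided (hypothetical setup: the device context, buffers and per-object transforms are assumed to already exist; this is not Oxide's code):

```cpp
// DX11 batching sketch. Assumed: mesh, constant buffer and instance buffer already created.
#include <d3d11.h>
#include <DirectXMath.h>
#include <vector>

// Naive path: one constant-buffer update plus one draw call per object.
// Thousands of objects means thousands of API calls and driver validation passes.
void DrawObjectsNaive(ID3D11DeviceContext* ctx, ID3D11Buffer* perObjectCB,
                      const std::vector<DirectX::XMFLOAT4X4>& worldMatrices,
                      UINT indexCount)
{
    for (const DirectX::XMFLOAT4X4& world : worldMatrices)
    {
        ctx->UpdateSubresource(perObjectCB, 0, nullptr, &world, 0, 0);
        ctx->VSSetConstantBuffers(0, 1, &perObjectCB);
        ctx->DrawIndexed(indexCount, 0, 0);
    }
}

// Batched path: per-instance transforms sit in a second vertex buffer (bound to an
// input-layout slot marked D3D11_INPUT_PER_INSTANCE_DATA), so the whole set is
// submitted with a single instanced draw call.
void DrawObjectsInstanced(ID3D11DeviceContext* ctx, ID3D11Buffer* instanceVB,
                          UINT instanceStride, UINT instanceCount, UINT indexCount)
{
    UINT offset = 0;
    ctx->IASetVertexBuffers(1, 1, &instanceVB, &instanceStride, &offset);
    ctx->DrawIndexedInstanced(indexCount, instanceCount, 0, 0, 0);
}
```

The catch, and the point above, is that everything in the batched path has to share the same mesh, shaders and material state, which is exactly the flexibility the developers did not want to give up.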

One possibility is that this leads to thousands and thousands of very similar or identical calls, and Nvidia was able to optimize some of the most obvious ones.
Nvidia's retort to Mantle was to provide the DX11 driver set that beat Mantle, and its PR included a set of functions they optimized, as well as a graph that showed how successive versions of Nvidia's driver chipped away at Mantle's lead.

Oxide spoke to the reasonable costs of implementing Mantle. They did not revise significantly beyond the first implementation.
Regardless of how good a tool is, it's tough for a first draft to survive three or four go-arounds by one of the leading optimization teams in graphics.

I would rather the developers get the game right on day one than have to wait for three or four driver optimizations to get a similar experience.

Man-hours are finite and are better spent on new games (developers) and new features (driver team).
 
I would rather the developers get the game right on day one than have to wait for three or four driver optimizations to get a similar experience.
It helps that the engine in question isn't a shipping product, and all they released was a tech demo.

We've seen what the sausage looks like coming out on day one from publishers, and it's been noted that traditional driver updates carry a lot of the quick fixes.
It's not the situation I would prefer either, but I do recognize that the teams that do this are good at what they do. As fragile and arbitrary as the bespoke driver optimization paradigm is, I cannot at this time say that the ostensibly superior low-level methods have made things a net positive.

Nvidia, and I think AMD, have too much on the line over this, so they have an interest in trying again and again, and as we can see from the Oxide example, they can get it right in a decently small number of tries.
Some of that is motivation, and some of it may come from their central position, which accumulates information about the sorts of problems and mistakes that fragmented and churning development teams cannot learn from one another or necessarily avoid repeating.

Publishers have apparently found a line far short of this that they are happy with, so I'm somewhat pessimistic as to where the crossover point actually is between the devil we know and the new shiny one.

Man-hours are finite and are better spent on new games (developers) and new features (driver team).
I'm sure there is someone in that mix responsible for quality implementation and useful features; the various sides have been tossing that ball back and forth for a while.
 
I would rather the developers get the game right on day one than have to wait for three or four driver optimizations to get a similar experience.

Man-hours are finite and are better spent on new games (developers) and new features (driver team).
There is a bit of a perverse incentive for the IHV with the largest amount of resources to have games that are not completely optimized, because they have more manpower to fix it behind the scenes, while the other does not.
 
It helps that the engine in question isn't a shipping product, and all they released was a tech demo.

Yes, but that is the example you gave: an IHV that iterated on drivers until they matched what the developer had on day one.

Most big improvements in driver performance nowadays come from bug squashing. There will always be bugs (we aren't naive, ofc), but a huge amount of work for a relative 10% improvement is wasteful.

We've seen what the sausage looks like coming out on day one from publishers, and it's been noted that traditional driver updates carry a lot of the quick fixes.

... when we see this on every recent game from a given publisher, something is horribly wrong.
 
Yes, but that is the example you gave: an IHV that iterated on drivers until they matched what the developer had on day one.
They didn't stop until they exceeded it, which shows the different levels of motivation. Why should an end user care if it took more effort?
Has AMD been able to nag them into revisiting it, just one more time? If not, then that is still an illustration of what happens when motivation is not the same between stakeholders.

That's not to say there wasn't an upside to making AMD look bad, since a number of the optimizations applied to API functions that are also used by other games, which benefits things more broadly.

Most big improvements in driver performance nowadays come from bug squashing. There will always be bugs (we aren't naive, ofc), but a huge amount of work for a relative 10% improvement is wasteful.
What is the higher cost? Huge work on the part of a driver team or two that leads to good results, or cavalier semi-effort across dozens to hundreds of development teams?
I do not disagree with the utility long-term, but a deeply immature paradigm that will someday be good doesn't automatically win against something that has been tested under fire for many years.

... when we see this on every recent game from a given publisher, something is horribly wrong.
That is the point of what I wrote.
Changing the API does not modify the rest of the ecosystem. It does tie the hands of some of the few entities directly motivated to compensate for it, however imperfectly.
 
They didn't stop until they exceeded it, which shows the different levels of motivation. Why should an end user care if it took more effort?

Because human-hours are finite and some believe they could be better spent on different priorities? Like other games that don't use low-level APIs?

Has AMD been able to nag them into revisiting it, just one more time? If not, then that is still an illustration of what happens when motivation is not the same between stakeholders.

That's not to say there wasn't an upside to making AMD look bad, since a number of the optimizations applied to API functions that are also used by other games, which benefits things more broadly.

You should agree that Nvidia's effort (motivation) on that tech demo was nothing like their effort (motivation) on your regular DX11 game. So if we are arguing that we should be wary of developers' efforts (motivation) on this one tech demo, shouldn't we hold IHVs' driver teams to the same standard? It's just one tech demo after all; what about when we have DX12/Metal? Should all IHVs demand that developers make a DX11 version so they can spend countless human-hours to get a slightly higher bar on a graph?

There are very talented developers; we could trust them a little more. That's just my wish, though; I understand that if they don't have any responsibility, you can't complain about it later.

What is the higher cost? Huge work on the part of a driver team or two that leads to good results, or cavalier semi-effort across dozens to hundreds of development teams?
I do not disagree with the utility long-term, but a deeply immature paradigm that will someday be good doesn't automatically win against something that has been tested under fire for many years.

Right now there are only a handful of games that are actively targeted by driver updates, mostly in the early months (bug squashing); after that we see very small improvements. There's only so much a team can work on, so I'm not sure your comparison holds.

That is the point of what I wrote.
Changing the API does not modify the rest of the ecosystem. It does tie the hands of some of the few entities directly motivated to compensate for it, however imperfectly.

I know.
 
There are very talented developers; we could trust them a little more.
If you look at BF4: it works very well on Nvidia GPUs with DX11. And it works well with Mantle. Yet it performs badly with DX11 on AMD.

I don't think anybody is accusing DICE of not being talented or of running different (low-performance) DX11 code on AMD. This shows that even for talented programmers, a lot of optimization is still required at the driver level. This will never go away, not even with DX12, which reduces driver overhead on the CPU but will still require the same low-level GPU optimizations as before.
 
Because human-hours are finite and some believe they could be better spent on different priorities? Like other games that don't use low-level APIs?
Nvidia already does spend time on other games. Its DX11 performance is considered top-notch.

You should agree that Nvidia's effort (motivation) on that tech demo was nothing like their effort (motivation) on your regular DX11 game.
There's a similar motivation for every Mantle game that comes out. Perhaps things cross over once the number of Mantle game releases per month exceeds Nvidia's capacity for optimization targets.

So if we are arguing that we should be wary of developers' efforts (motivation) on this one tech demo, shouldn't we hold IHVs' driver teams to the same standard? It's just one tech demo after all; what about when we have DX12/Metal?
That was one obvious example; this thread already covers other examples of games whose DX11 performance on Nvidia keeps pace very well with Mantle.
I gave the example of multithreaded dispatch in Civ 5 (roughly the pattern sketched below). It was not a universal fix, but it benefited the gamers with the targeted hardware.
There is going to be a low-level DX12, but also a high-level portion of the API which AMD can ill-afford to ignore.
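For reference, "multithreaded dispatch" in DX11 terms generally means deferred contexts. Below is a generic sketch of the pattern (assumed: an existing device and immediate context, plus a placeholder recording function; this is not Firaxis' actual code):

```cpp
// Generic DX11 multithreaded command recording sketch.
#include <d3d11.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

// Placeholder for real per-thread state setup and draw calls (hypothetical).
void RecordScenePortion(ID3D11DeviceContext* deferredCtx, int /*portion*/)
{
    deferredCtx->ClearState();
}

void RenderFrameMultithreaded(ID3D11Device* device, ID3D11DeviceContext* immediateCtx,
                              int workerCount)
{
    std::vector<ComPtr<ID3D11DeviceContext>> deferred(workerCount);
    std::vector<ComPtr<ID3D11CommandList>> commandLists(workerCount);
    std::vector<std::thread> workers;

    for (int i = 0; i < workerCount; ++i)
        device->CreateDeferredContext(0, deferred[i].GetAddressOf());

    // Each worker records its slice of the frame into its own deferred context.
    for (int i = 0; i < workerCount; ++i)
    {
        workers.emplace_back([&, i]
        {
            RecordScenePortion(deferred[i].Get(), i);
            deferred[i]->FinishCommandList(FALSE, commandLists[i].GetAddressOf());
        });
    }
    for (std::thread& t : workers)
        t.join();

    // Only the immediate context actually submits to the GPU, in a fixed order.
    for (int i = 0; i < workerCount; ++i)
        immediateCtx->ExecuteCommandList(commandLists[i].Get(), FALSE);
}
```

How much this helps depends heavily on how well a given driver handles command lists, which is why it was "not a universal fix".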
Should all IHVs demand that developers make a DX11 version so they can spend countless human-hours to get a slightly higher bar on a graph?
AMD will, so that everything below an R280 doesn't regress by 10-15%, and it will probably need a fallback in case there's another Thief+Tonga+BF4 kind of regression.
Or there needs to be another Mantle driver update, which happens now and again when a new title comes out.

Right now there are only a handful of games that are actively targeted by driver updates, mostly in the early months (bug squashing); after that we see very small improvements. There's only so much a team can work on, so I'm not sure your comparison holds.
Nvidia's optimization of a subset of DX11 functions that were hit hard by Oxide's Star Swarm demo has applicability elsewhere. Given the quality of game software development, the choice is either bugs squashed in the early months or praying the Ubisofts and EAs of the world get around to salvaging their products.
 
I believe I got my message across.

I would rather the developers get the game right on day one than have to wait for three or four driver optimizations to get a similar experience.
 
I would rather the developers get the game right on day one than have to wait for three or four driver optimizations to get a similar experience.
It's not straightforward to get things right on DX11. DX11 is a black box. Each driver/hardware combination behaves differently. You need to program your code differently to get the same end result on different hardware, as the abstraction level doesn't allow you to directly state what you want. Doing things right for one driver/hardware combination might be the worst thing for another. This is not a good situation for either the developers or the driver teams. Both need to do lots of extra work.

Performance of the low-level console APIs is much easier for the developer to understand. I am glad that the modern PC APIs (DX12 and Mantle) have manual resource management and a resource binding model close to the actual GPU binding model. With these APIs you can actually get the results you want. It will be MUCH nicer to work with these APIs. Assuming the debugging and profiling tools are good, I'd expect to see better PC game launch quality.
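To illustrate what that manual resource management looks like in practice, here is a minimal D3D12 sketch (assumed: the device, command queue, fence and an upload-heap buffer already exist; this is not any particular engine's code). In DX11 the driver tracks this hazard or renames the buffer behind your back; in DX12 the application decides when the memory is safe to reuse.

```cpp
// D3D12 sketch of app-managed buffer reuse and synchronization.
#include <d3d12.h>
#include <windows.h>
#include <cstring>

// Block the CPU until the GPU has passed fenceValue (i.e. finished using the buffer).
void WaitForGpu(ID3D12Fence* fence, UINT64 fenceValue, HANDLE fenceEvent)
{
    if (fence->GetCompletedValue() < fenceValue)
    {
        fence->SetEventOnCompletion(fenceValue, fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }
}

// Overwrite per-frame constants in an upload-heap buffer that the app owns outright.
// No driver hazard tracking or hidden renaming happens here; if we skip the wait,
// we simply corrupt data the GPU may still be reading.
void UpdateFrameConstants(ID3D12Resource* uploadBuffer, const void* data, size_t size,
                          ID3D12Fence* fence, UINT64 lastSubmittedFenceValue,
                          HANDLE fenceEvent)
{
    WaitForGpu(fence, lastSubmittedFenceValue, fenceEvent);

    void* mapped = nullptr;
    D3D12_RANGE noRead = {0, 0};   // CPU will not read from this buffer
    uploadBuffer->Map(0, &noRead, &mapped);
    std::memcpy(mapped, data, size);
    uploadBuffer->Unmap(0, nullptr);
}
```

Multiplying this by every buffer, texture and descriptor is the extra work the developer takes on, in exchange for knowing exactly what the driver will and will not do.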
 
It's not straightforward to get things right on DX11. DX11 is a black box. Each driver/hardware combination behaves differently. You need to program your code differently to get the same end result on different hardware, as the abstraction level doesn't allow you to directly state what you want.

Granted it will be easier to work with more modern APIs, but aren't the issues you mention also true with Mantle or DX12? I can't say Mantle benchmarks show consistency from one performance review to the next, and perhaps the same will be true with DX12?
 
It's not straightforward to get things right on DX11. DX11 is a black box. Each driver/hardware combination behaves differently. You need to program your code differently to get the same end result on different hardware, as the abstraction level doesn't allow you to directly state what you want. Doing things right for one driver/hardware combination might be the worst thing for another. This is not a good situation for either the developers or the driver teams. Both need to do lots of extra work.

Performance of the low-level console APIs is much easier for the developer to understand. I am glad that the modern PC APIs (DX12 and Mantle) have manual resource management and a resource binding model close to the actual GPU binding model. With these APIs you can actually get the results you want. It will be MUCH nicer to work with these APIs. Assuming the debugging and profiling tools are good, I'd expect to see better PC game launch quality.

Even if a game isn't at its top theoretical performance months later (due to continued patches/driver updates), getting what the developers wanted "at launch" is preferable.

If there's a bug that will yield a 50% performance increase, both developers and IHVs will work towards a patch/driver either way; it's that "race to the final 10%" that I contest: IHVs want that so they can get that slightly higher bar, but they only have that power at the expense of developer freedom, blindfolding them and potentially hindering games.

What I mean is: we could start praising developers that deliver consistent performance a little more, and that slightly higher bar a little less.

I hope we can reach a nice common ground and that everybody works towards it (developers, Microsoft, Khronos, Apple, AMD, Intel, Nvidia); low-level APIs are coming.
 
If there's a bug that will yield a 50% performance increase, both developers and IHVs will work towards a patch/driver either way; it's that "race to the final 10%" that I contest: IHVs want that so they can get that slightly higher bar, but they only have that power at the expense of developer freedom, blindfolding them and potentially hindering games.
This is mainly true for the big AAA games that both interest the press and will be played by the big masses. IHVs don't have limitless resources to optimize every single game (there are several thousand games released every year). Unfortunately a higher-level API (such as DX11 or OpenGL 4.3) doesn't allow the small developer to fully optimize their game, as some of the optimization tricks are not available to them (the API doesn't give them as low-level access as the driver team has). With Mantle, Metal and DX12 the developer has lower-level access, meaning that they can do most of the same optimizations as driver teams can without needing to coordinate with them. This is much more efficient for both parties.
Granted it will be easier to work with more modern APIs, but aren't the issues you mention also true with Mantle or DX12? I can't say Mantle benchmarks show consistency from one performance review to the next, and perhaps the same will be true with DX12?
Driver data management should have minimal impact on performance on lower level graphics APIs, as the game program itself does all the memory and data management. The driver doesn't actually even know anything about the memory and data layout anymore (it is basically just a big raw data blob). Shaders and state are also combined/compiled at start up (removing runtime optimization possibilities on state changes).
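The "shaders and state combined/compiled at start up" point corresponds to D3D12's pipeline state objects; a trimmed sketch is below (assumed: an existing root signature and precompiled VS/PS bytecode, with most fields left at simple defaults for brevity; not any particular engine's code):

```cpp
// D3D12 pipeline state object sketch: shaders plus fixed-function state are validated
// and compiled together once, up front, rather than patched by the driver at draw time.
#include <d3d12.h>
#include <windows.h>

ID3D12PipelineState* CreateOpaquePipeline(ID3D12Device* device,
                                          ID3D12RootSignature* rootSignature,
                                          const void* vsBytecode, SIZE_T vsSize,
                                          const void* psBytecode, SIZE_T psSize)
{
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = {};
    desc.pRootSignature = rootSignature;
    desc.VS = { vsBytecode, vsSize };
    desc.PS = { psBytecode, psSize };
    desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
    desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;

    // Default (disabled) blending for one render target.
    D3D12_RENDER_TARGET_BLEND_DESC rtBlend = {};
    rtBlend.SrcBlend = D3D12_BLEND_ONE;
    rtBlend.DestBlend = D3D12_BLEND_ZERO;
    rtBlend.BlendOp = D3D12_BLEND_OP_ADD;
    rtBlend.SrcBlendAlpha = D3D12_BLEND_ONE;
    rtBlend.DestBlendAlpha = D3D12_BLEND_ZERO;
    rtBlend.BlendOpAlpha = D3D12_BLEND_OP_ADD;
    rtBlend.LogicOp = D3D12_LOGIC_OP_NOOP;
    rtBlend.RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;
    desc.BlendState.RenderTarget[0] = rtBlend;

    desc.DepthStencilState.DepthEnable = FALSE;
    desc.DepthStencilState.StencilEnable = FALSE;
    desc.SampleMask = 0xFFFFFFFFu;
    desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets = 1;
    desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;

    ID3D12PipelineState* pso = nullptr;
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
    return pso;   // caller owns the PSO; nullptr if creation failed
}
```

Because the whole combination is baked once at startup, there is little left for the driver to re-optimize on the fly at state-change time.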

The low level drivers could still detect shaders and replace them with IHV authored low level optimized versions. Unfortunately shader optimizations are still necessary until all the important modern GPU instructions are exposed by HLSL.

I hope that HLSL will soon expose all the most important instructions in modern GPUs (such as warp vote, ballot and cross-lane operations). OpenCL 2.0 has a very nice platform-independent feature set to utilize these GPU instructions, including work group broadcast, voting, reduction and prefix sum (example: https://software.intel.com/en-us/ar...ted-parallelism-and-work-group-scan-functions). Both AMD and Intel support these features in OpenCL 2.0. Nvidia supports similar features in CUDA. This proves that the hardware support is definitely there. We just need HLSL support to fully access all the advanced GPU features that are already available in OpenCL.

There isn't enough public information about Mantle to know whether it exposes some of the GCN specific operations (GCN OpenGL extensions: https://www.opengl.org/registry/specs/AMD/gcn_shader.txt). If all the GCN instructions are exposed, there is no need for IHV shader replacements (assuming the developer did their job properly).
 
If all the GCN instructions are exposed, there is no need for IHV shader replacements (assuming the developer did their job properly).
Heh. I love you guys and all, but even the "best" developers who spend lots of time on a specific architecture are not going to get things ideal for every SKU, and there are obviously still a myriad of architecture-specific settings that are never going to be exposed directly (buffer sizes, etc). And that doesn't even cover the fact that developers are absolutely not going to optimize for a suitably wide range of architectures, especially ones who are particularly focused on consoles.

Most of the time this machinery all works "well enough" with some reasonable defaults, but benchmarks and other high profile applications are always going to get some special treatment. I like it even less than you do, but it's naive to think otherwise.

Still, low-level APIs are obviously a step in the right direction. I just don't expect everyone to give up on app-specific optimization even with the new APIs.
 
I think the best graphics-engine developers are radically more effective with "metal" APIs. In the console space it takes years for a console generation to "bed in" (SDKs getting major revisions every quarter, fixing swathes of bugs and adding gobs of features that have been coming real soon now for too long). So it'll be a while until D3D12 is fully matured.

At least HLSL compilation is "mature". Well, ahem...

So, I think with a long-term stable API in D3D12, which is strongly mirrored by at least one console, we should see a big step-up in graphics-engine quality - both in terms of the sophistication of the rendering algorithms and in terms of game robustness in consumer systems.

I used to hope that 12 would be the final iteration of D3D (been saying this for years). In some ways it really looks like it could be. But I fear it's not ambitious enough, because I'm not convinced it's going to allow developers to write their own rendering pipelines out of pure compute (proper producer-consumer buffering, cached map/reduce, so that it stays on-chip in general). Which is what Larrabee promised (OK, excepting texturing hardware).

I still hate you guys, Andrew, for chickening out on Larrabee. We would be further along toward getting rid of stupid painted triangles.
 
And that doesn't even cover the fact that developers are absolutely not going to optimize for a suitably wide range of architectures, especially ones who are particularly focused on consoles.
Exactly. As the Tonga situation shows, BF4 and Thief ran worse with Mantle than with DX, even though Tonga is just another GCN iteration without many radical changes.
http://www.anandtech.com/show/8460/amd-radeon-r9-285-review/6

The situation extends to Civ:Beyond Earth too:
http://pclab.pl/art59998-10.html

This is the major gripe I have with the concept of low-level APIs: the potential that older games could suffer on newer hardware, or that newer games would suffer greatly on relatively old hardware. That is a serious issue that takes away some of the advantages the PC platform enjoys over others.
 
Heh. I love you guys and all, but even the "best" developers who spend lots of time on a specific architecture are not going to get things ideal for every SKU, and there are obviously still a myriad of architecture-specific settings that are never going to be exposed directly (buffer sizes, etc). And that doesn't even cover the fact that developers are absolutely not going to optimize for a suitably wide range of architectures, especially ones who are particularly focused on consoles.
NVIDIA and AMD hardware development has stabilized. Both current designs resemble each other quite closely, being similar in-order scalar SIMD designs with similar latency hiding techniques, similar general purpose caches, similar total width (thread count and occupancy characteristics) and similar operating clocks. There haven't been any radical changes lately, allowing developers to optimize for one target to cater to both. The ALU:TEX ratio has also stabilized (no big changes there anymore). Float16 ALU support will of course stir things up a little in the future, but both should benefit from the same optimizations.

Historically developers haven't specifically optimized for Intel GPUs, but that is certainly changing. Intel is very important nowadays if you want to sell your game to wide audiences. Some developers still don't optimize their games for Intel GPUs or laptops in general, but that's just stupid. Obviously it takes developers some time to understand how Intel hardware operates to get 100% out of it. But it seems that Intel's hardware design is already quite stable (Broadwell didn't bring radical changes). Obviously bandwidth optimizations are more important for integrated GPUs, meaning that tiled (compute) techniques (lighting, particles, etc.) will benefit Intel GPUs more than AMD and Nvidia. But this is already where many developers are heading, meaning that no special code paths are likely needed.
 