AMD Mantle API [updating]

Discussion in 'Rendering Technology and APIs' started by MarkoIt, Sep 26, 2013.

  1. pMax

    Regular

    Joined:
    May 14, 2013
    Messages:
    327
    Likes Received:
    22
    Location:
    out of the games
    A company that produces* CHIPS as complex as x86 should have bad QA?
    That'd be very strange.

    I suppose they are like ALL other companies. Testsuites, regression tests, CI and the like.

    Fact is, no matter if you write 800 tests with dynamic parameters for just one module, your code will always end up with bugs. Sometimes (or often) you even have to deal with 3rd-party bugs/side effects that cause YOU to have bugs... bugs have a nice cascading effect sometimes.

    Especially if it is targeted for a big, competitive market and the stuff you do (aka drivers for AMD) is complex and has to deal with an unpredictable thing like an OS and DX/OGL middle layer with a game on top of it.

    *well, WAS producing until it sold its fabs :p
     
  2. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22
    I don't understand what exactly you want!

    There will still be a choice for you, either running slower nvidia hardware with DX or faster AMD with Mantle...
     
  3. Svensk Viking

    Regular

    Joined:
    Oct 11, 2009
    Messages:
    627
    Likes Received:
    208
    I don't know tech, is there an inherent advantage in running GPGPU on the integrated GPU? Would it work if you had two different discrete GPUs as well?

    I'm mostly thinking that it would be awesome if one upgrades to AMD's new series two or three years from now, but still can reuse one's current HD 7950 for dedicated GPGPU alongside the new card.
    That's one thing I do like about Nvidia's PhysX: an old GPU doesn't suddenly become a paperweight the moment you upgrade.
     
  4. pMax

    Regular

    Joined:
    May 14, 2013
    Messages:
    327
    Likes Received:
    22
    Location:
    out of the games
    ...the advantage would be latency, I think. The iGPU shares memory with the CPU, so you can take advantage of HSA and share pointers to data for moderately complex GPGPU work with a great latency bonus (e.g. processing an octree or whatever), where the CPU may also take part somehow.

    A dGPU would require transfers over PCIe in those cases.
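A rough sense of that latency gap can be sketched with back-of-the-envelope numbers. Everything below is an illustrative assumption (the bandwidth and setup figures are not measurements):

```python
# Back-of-the-envelope latency comparison: handing a data structure to an
# iGPU via shared memory (HSA-style pointer passing) vs copying it to a
# dGPU over PCIe first. All figures are illustrative assumptions.

PCIE3_X16_BYTES_PER_S = 15.75e9  # ~PCIe 3.0 x16 peak bandwidth (assumed)
TRANSFER_SETUP_US = 10.0         # assumed per-transfer driver/DMA setup cost

def dgpu_transfer_us(num_bytes):
    """Time (in microseconds) to copy num_bytes to a dGPU: setup + wire time."""
    return TRANSFER_SETUP_US + num_bytes / PCIE3_X16_BYTES_PER_S * 1e6

def igpu_transfer_us(num_bytes):
    """Shared memory: only a pointer crosses over, effectively free."""
    return 0.0

octree_bytes = 4 * 1024 * 1024  # a 4 MiB octree, picked for illustration
print(f"dGPU copy:  {dgpu_transfer_us(octree_bytes):.1f} us")
print(f"iGPU share: {igpu_transfer_us(octree_bytes):.1f} us")
```

For small, latency-sensitive round trips the fixed setup cost alone can dominate, which is why chatty CPU/GPU algorithms favor shared memory.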
     
  5. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    We're completely aligned on this. Hence why I was surprised to see that Johan seemed to have other plans...

    Let's be clear... these presentations largely confirmed that the big gains come in CPU overhead, which is unsurprising as GPUs already run with fairly high efficiency in most cases (and would require some big changes to shading languages to improve on that much). On a high end machine with a typical GPU-bound situation, there's not going to be a huge difference unless a developer intentionally sabotages the DX path. iGPUs could of course see larger benefits from reduction in CPU cycles due to the shared power budget.

    Overlapping compute and shadow map rendering is probably the biggest potential gain in GPU performance, and that might show up in benchmarks. That said, it's sort of a one-time trick, as there aren't that many phases in rendering where the shader array is completely unused... you basically have to be rendering depth/stencil only for that to be the case. On power-constrained GPUs it's likely not to be as much of a win either, since during those portions of the frame the shader array can be shut off and the thermal budget gained from that applied to raise the GPU frequency. Whether GCN can power gate the shaders at that granularity is unclear, though, so perhaps they just sit there wasting power regardless, in which case it would still be a win :)
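The overlap win described here can be illustrated with a toy timing model. All millisecond figures are invented for illustration:

```python
# Toy timing model of async compute: a shadow-map pass (ROP/geometry bound,
# shader array mostly idle) overlapped with an ALU-heavy compute pass.
# Millisecond figures below are invented for illustration.

def frame_time_serial(shadow_ms, compute_ms, rest_ms):
    """Passes run back to back: total is the plain sum."""
    return shadow_ms + compute_ms + rest_ms

def frame_time_overlapped(shadow_ms, compute_ms, rest_ms):
    """Shadow and compute share the GPU; the longer pass hides the shorter."""
    return max(shadow_ms, compute_ms) + rest_ms

serial = frame_time_serial(3.0, 2.5, 10.0)       # 15.5 ms
overlap = frame_time_overlapped(3.0, 2.5, 10.0)  # 13.0 ms
print(f"saved {serial - overlap:.1f} ms per frame")  # prints: saved 2.5 ms per frame
```

The "one-time trick" caveat shows up directly in the model: the saving is capped by the shorter of the two overlapped passes, and there is only one shadow/depth-only phase per frame to hide work in.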

    Note that taking advantage of overlapping compute/graphics does not necessarily require Mantle. A clever driver could notice the relevant disjoint dependencies and do the same thing, but it's not clear that going forward you want yet more cleverness in drivers...

    One other thing from Johan's slides that might make a difference on the GPU side is a bullet that hinted at exposing the compressed depth/MSAA representations. Since I believe AMD has to resolve multisampled depth buffers before sampling them, this could make them more competitive when using deferred MSAA (such as Frostbite supports). That said, NVIDIA already doesn't take as much of a hit from this as AMD, so I expect it to level that out somewhat if anything.

    Moving to pure bindless enables new algorithms and techniques too, but that has been available in GL on NVIDIA for a little while already.

    Stuff like the 100k batches tech demo are cool as they potentially enable some new possibilities (although I'd like to see the same attempted with bindless and multidrawindirect first), but it's not clear that you can ship a game that requires that until DX/GL offer similar capabilities. Which, again, should be the end goal here.
     
    #645 Andrew Lauritzen, Nov 14, 2013
    Last edited by a moderator: Nov 14, 2013
  6. Malo

    Malo Yak Mechanicum
    Legend Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    8,929
    Likes Received:
    5,529
    Location:
    Pennsylvania
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I'm starting to wonder from the statements of "X has been pushing for this for years" that maybe someone isn't moving to integrate, and Mantle is an attempt to force or bypass them.

    Granted, I don't think all the PR about APIs evolving too slowly is entirely fair. Up until recently, I'm not sure GPUs could be trusted with a ball of string, much less broad access to global memory.

    Maybe that assumption breaks down when #GPU lines < #platforms.
    Like I noted earlier, a hardware-linked API isn't quite as bad if the platform holder is breaking the API anyway.
    First, there is getting the changes put into the standards, which may involve hashing things out with Microsoft, the IHVs, CAD, mobile, whatever stakeholders.

    Then, even if added to the APIs, a game dev has to consider the API support of the platform or instantiation thereof.
    Including mobile, laptops, consoles, it's probably a tuple.

    Renderer.works
    {platform, ODM, OEM, OS, OS revision/fork, API, API revision?, telecom?, user needs to update the above?, user can't update the above?, OEM drivers?, deprecated device?, not deprecated but typical shoddy device support?}

    OR
    {AMD?,Intel?,Nvidia?,IMG?,Mali???}
     
  8. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,386
    Likes Received:
    299
    Location:
    NY
    This isn't the place to talk about the quality of EA's QA department.
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    So, can GCN (or recent version thereof) create work for itself?
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    If we go by AMD's description of HSA queuing and Kaveri's role as the initial product, at least for compute it should be possible.
    I'm not sure it's as applicable to the graphics domain.
     
  11. kukreknecmi

    Newcomer

    Joined:
    Nov 14, 2013
    Messages:
    7
    Likes Received:
    0
    Since the Core API is based on AMD_IL, I find it hard to see Nvidia adopting it. I'm not sure if it's about the "Mantle driver" or something else. If it's about the so-called "Mantle driver", they should first adapt it to pre-GCN architectures to reach a larger user base. Before adapting it to pre-GCN architectures, I don't find it reasonable to let Nvidia access it. I'm not sure whether pre-GCN architectures prevent a "Mantle driver" somehow, or whether it's just marketing for GCN cards.
     
  12. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    I guess we have to uphold a certain minimum standard of performance improvement to call Mantle a success. IMO, anything less than 15% is a waste, a trivial upgrade option at best. 15% and up, and it becomes a more significant improvement. Right now none of the developers talking about it have much confidence it can reach more than 20%, which doesn't bode well for Mantle's case.

    On another note, most of the things announced so far are optimizations that facilitate programming and control over code, maybe some reduction in memory footprint; others concern CPU improvements, which only help CPU-limited games and the underutilized Bulldozer CPUs and APUs. Battlefield 4, likely the best case for Mantle for months to come, is not even remotely CPU limited. GPU features seem few and far between, and none of those announced so far seem to greatly impact performance.

    Please feel free to correct me when necessary.
     
    #652 DavidGraham, Nov 14, 2013
    Last edited by a moderator: Nov 14, 2013
  13. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
    Don't look at single player benchies. It IS quite CPU limited, especially if you prefer enough fps over max details (this is even in single player: http://gamegpu.ru/action-/-fps-/-tps/battlefield-4-test-gpu.html ).

    I would say 20% general performance would be quite a lot. For instance, it would generally make the $299 280X trade blows with the $499 GTX 780. And I would certainly not expect that much in more GPU-limited scenarios.
    Again, it will benefit the fps junkies more than the detail whores :cool:


    Part of the point is obviously to raise the minimum feature level, and pre-GCN cards would certainly miss some required features (and the API is likely also exposing features that Kepler is missing). For instance, the amount of buffer type aliasing you can't do in DX11 (which lets lesser hardware require different layouts/implementations for different buffers).
     
    #653 Psycho, Nov 14, 2013
    Last edited by a moderator: Nov 14, 2013
  14. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,386
    Likes Received:
    299
    Location:
    NY
    Access to the compressed MSAA data could help BF4 out.
     
  15. snarfbot

    Regular

    Joined:
    Apr 23, 2007
    Messages:
    652
    Likes Received:
    225
    Wouldn't Mantle benefit a forward renderer more than a deferred one? You'd neatly avoid all the issues with MSAA and transparencies and still get to use a ton of lights.

    Anyway, I think going forward (lol) there will be better than 20% improvements in overall performance.
     
  16. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    Mantle doesn't solve any of the issues with (pure) forward renderers... you still don't want to render the same geometry more than once.
     
  17. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    The common misconception seems to be that Mantle brings only CPU gains, and only helps with low end CPUs. This is not true.

    Here are some examples of potential GPU gains:
    • Bindless textures and hardware virtual memory (*) allow rendering in larger batches, increasing GPU utilization (the GPU partially idles at the start/end of draw/dispatch calls).
    • Application-controlled memory management means the GPU needs to shuffle fewer resources around (this isn't only a CPU hit; many current games have frame rate spikes because of this issue). The developer can also pack resources more tightly (multiple resources in the same page/line, increasing memory/cache utilization).
    • With Mantle you can run multiple kernels in parallel (or kernel + graphics in parallel) in a controlled way, and thus reduce GPU bottlenecks. For example, render a shadow map (mostly ROP and geometry setup) and an ALU-heavy compute pass (for example lighting for the previous light source) at the same time. This results in much higher GPU utilization.
    • Better predicates and storing GPU query results to GPU buffers (without CPU intervention) allow GPU optimization techniques that are not possible with PC DirectX.
    • AMD also claims improvements to indirect draw/dispatch mechanisms, but does not spill the details in the Mantle slides (these improvements potentially bring big GPU gains for certain advanced use cases).
    • Direct access to MSAA data could also make deferred rendering AA much faster (more about that in the reply below).

    (*) DirectX 11.2 also has partial support for hardware virtual memory (in the form of tiled resources). However, it has limitations, and the Windows 8.1 requirement basically makes the API useless right now (Mantle has a much bigger user base at the moment). Hopefully Microsoft will solve this issue and bring some of Mantle's other features to 11.3 (and/or 12.0).

    AMD announced that with Mantle we finally have full manual access to both GPUs in Crossfire. This is excellent news. I was quite worried that SLI/Crossfire would die soon, as many new graphics engines will start doing scene management and rendering decisions on the GPU side. Alternate frame rendering (with automatically synchronized memory between cards) is just not a good fit for a scenario where the data set is mutated slightly every frame (by compute shader passes that are pretty much impossible to analyze with automatic logic). AFR works best when everything is freshly generated during a single frame and there are no dependencies on existing data. However, this kind of processing is a huge waste of GPU (and CPU) time, and frankly we can do much better (and I believe that forthcoming "pure" DX11+ engines that have no legacy baggage surely will). With Mantle, supporting Crossfire is possible even in these kinds of advanced GPU-driven rendering engines. Hopefully Nvidia releases something similar as well, or they will see very bad SLI scaling in some future games/engines.
    Deferred antialiasing will be much more efficient, assuming the "Advanced MSAA features" in the Mantle slides mean that you have direct access to GPU color/depth blocks and MSAA/layer data (including coverage sample index data). With all that data available, tiled (and clustered) deferred renderers can separate pixels (different sample counts) more efficiently and recover geometry edge information (using coverage samples) in a much more precise and efficient way.
    That would definitely give a big GPU boost to a deferred renderer with MSAA (especially with coverage-based EQAA/CSAA). Of course, estimating the gains is not possible right now, since AMD hasn't yet released the full Mantle API specification, so we don't know exactly how low-level the access to the MSAA & depth/color compression data is.
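The per-pixel classification that this kind of MSAA access enables can be sketched as a toy model (Python, not actual shader code; plain integers stand in for real G-buffer samples):

```python
# Toy model of MSAA pixel classification for tiled deferred shading: with
# direct access to per-sample data, interior pixels (all samples equal) can
# be shaded once while edge pixels are shaded per sample.

def shading_invocations(pixels):
    """pixels: list of per-pixel sample tuples (one entry per MSAA sample).
    Returns how many shading invocations the classified path launches."""
    work = 0
    for samples in pixels:
        if len(set(samples)) == 1:
            work += 1             # interior pixel: shade once
        else:
            work += len(samples)  # edge pixel: shade every sample
    return work

# 4x MSAA, four pixels: three interior, one geometry edge
pixels = [(7, 7, 7, 7), (3, 3, 3, 3), (9, 9, 9, 9), (1, 2, 1, 1)]
print(shading_invocations(pixels))  # prints 7, vs 16 for naive per-sample shading
```

Since geometry edges typically cover a small fraction of the screen, most pixels take the cheap path, which is where the deferred MSAA saving comes from.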
     
  18. NThibieroz

    Newcomer

    Joined:
    Jun 8, 2013
    Messages:
    37
    Likes Received:
    78
    One of the major features of Mantle is the drastic reduction of draw call overhead. Today a lot of developers are having to make compromises on how many batches they render to achieve their target performance; this impacts their technical vision and can therefore be a factor in the decision of which type of renderer to support.
    With Mantle, rendering a depth pass to prime the depth buffer in a Forward(+) renderer becomes a completely viable option without running into CPU bottleneck situations.
    Another advantage would be to process Forward+ tile culling (or other compute shaders operating on the scene) with asynchronous compute to get better GPU utilization.
    I am looking forward to seeing how Mantle adopters will be using the power available to them to optimize their engine once they've had more time to play with the API.
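The draw-call-overhead arithmetic behind the depth pre-pass point can be sketched like this (the per-draw overhead and draw-count figures are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope CPU cost of a depth pre-pass: it roughly doubles the
# number of draw submissions, so its viability hinges on per-draw overhead.
# The microsecond figures here are illustrative assumptions.

def cpu_submit_ms(num_draws, overhead_us_per_draw, prepass=True):
    """CPU time (ms) spent submitting a frame's draw calls."""
    total_draws = num_draws * (2 if prepass else 1)
    return total_draws * overhead_us_per_draw / 1000.0

SCENE_DRAWS = 5000
print(cpu_submit_ms(SCENE_DRAWS, 5.0))  # high-overhead API path: blows a 16.6 ms budget
print(cpu_submit_ms(SCENE_DRAWS, 0.5))  # low-overhead path: the pre-pass stays affordable
```

The point is not the specific numbers but the scaling: halving submission cost per draw is what turns "double the draws" from a deal-breaker into a rounding error.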
     
  19. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Reduced draw call overhead definitely helps in cases where you must submit draw calls multiple times. The Mantle slides also hint at another possibility. With low-level access to hardware command buffers, you could record the draw calls just once, add some predicates for pixel shader disabling (letting you disable pixel shaders on the depth-only pass), and instruct the GPU to run the same command buffer twice with different predicates active (reusing the same command buffer twice is not possible using standard PC DirectX). The end result is that you pay zero CPU overhead for the depth-only pass.
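A sketch of that record-once, replay-twice idea (the class and method names below are invented for illustration; Mantle's real entry points differ):

```python
# Sketch of record-once / replay-twice command buffers. CommandBuffer and
# its methods are hypothetical stand-ins, not the actual Mantle API.

class CommandBuffer:
    def __init__(self):
        self.draws = []

    def record_draw(self, mesh):
        self.draws.append(mesh)  # CPU-side recording cost is paid once

    def execute(self, pixel_shading_enabled, log):
        # Replay the prerecorded draws; a predicate-style flag switches the
        # same buffer between a depth-only pass and a full color pass.
        pass_name = "color" if pixel_shading_enabled else "depth-only"
        for mesh in self.draws:
            log.append((pass_name, mesh))

cb = CommandBuffer()
for mesh in ("terrain", "buildings", "characters"):
    cb.record_draw(mesh)

log = []
cb.execute(pixel_shading_enabled=False, log=log)  # depth pre-pass, no re-recording
cb.execute(pixel_shading_enabled=True, log=log)   # main color pass, same buffer
print(log[0], log[3])
```

The two `execute` calls replay identical GPU commands, so the second pass costs the CPU essentially nothing beyond the submission itself.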

    However the depth only pass still costs quite a few extra GPU cycles, especially if the game uses complex geometry and tessellation (with displacement maps, etc) and/or heavy skinning (some next gen games will use hundreds of bones for human faces).

    Lighting cost is also higher with Forward+, since the lighting is done in the pixel shader that runs per triangle (not during a full screen pass). Quad efficiency of modern high-polygon games (high quality models + tessellation) can be as low as 60%, meaning that you basically lose 40% of your GPU cycles. In comparison, deferred lighting in a full screen pass later has no quad efficiency problems (it's a single full screen quad, or a compute shader pass with ~8x8 pixel granularity).
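The quad efficiency loss follows from the 2x2-quad shading granularity, which a small model can demonstrate (a simplification: real rasterizers launch those extra lanes as helper invocations for derivative calculations):

```python
# Toy quad-efficiency model: pixel shaders execute on 2x2 quads, so a quad
# that a triangle only partially covers still runs all four lanes.

def quad_efficiency(covered_pixels):
    """covered_pixels: set of (x, y) pixels one triangle covers.
    Efficiency = useful lanes / total lanes launched."""
    quads = {(x // 2, y // 2) for x, y in covered_pixels}
    return len(covered_pixels) / (4 * len(quads))

# A thin diagonal sliver: 6 pixels touching 3 quads -> 6 / 12 = 50%
sliver = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)}
print(quad_efficiency(sliver))  # prints 0.5

# An axis-aligned 2x2 block fills one quad exactly -> 100%
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(quad_efficiency(block))   # prints 1.0
```

Smaller triangles mean more partially covered quads, which is why high-polygon and tessellated content drives quad efficiency down toward the 60% figure quoted above.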

    Forward+ is a good solution if your triangle counts are not that high and you don't use heavy vertex animation or tessellation. In low polygon games (current gen ports for example) the depth only pass is dirt cheap (for GPU), and quad efficiency is often more than 80% (so you lose less than 20% of your lighting performance). But I just don't see it as a viable technique in the future, especially now as Mantle allows low level access to the GPU MSAA data on PC as well (this is a huge gain for the deferred renderers).
    This is a good idea, and should provide similar GPU performance gains as asynchronous compute use during shadow map rendering.
     
  20. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    I understand you're using a generalization here, but many games are bandwidth limited for a significant portion of the time so you're not really losing 40% of the performance. It could still be a significant hit though.
     