AMD Mantle API [updating]

Hmm, repi didn't mention how they are circumventing WDDM. He also mentioned it's not tied to GCN (interesting...). Other than that, no big surprises.
Yeah... the bit about resources being global and not tied to a context is sort of a red flag for me. Hopefully he can come and clarify what's going on.

BF4's Mantle API deployment is aimed at late December. Implementation time was about 2 months. It's a larger effort to get the rendering engine to a working state, but optimisation to the final engine state is then much easier. According to repi, total manpower is overall smaller for Mantle than for other PC APIs.
Pretty sure 2 months was the delta of taking a fully complete DX11 engine and writing a Mantle version, not any claims about being able to write one from scratch more quickly.

Rest is mostly what I expected and spoke about earlier in the thread. Not entirely sure why we need binding sets if you have pure bindless but maybe repi can clarify.
 
If Mantle is a thin abstraction layer, then its impact on performance may not be as profound as we thought it would be. DICE is also developing an OpenGL path for Frostbite, a move possibly supported by Valve and NVIDIA, that should bring the recent OGL draw-call improvements to all GPUs.

Right now it seems Mantle is going to be of big benefit to APUs, less so to high-end GPUs. It seems AMD wants Mantle to include NVIDIA as well, which is a bit odd.

In fact Johan stated that most Mantle functionality can be supported on today’s modern GPUs from other vendors, and although Mantle is built with GCN-architecture cards in mind, it does not require them.
 
I am not sure if this is the right thread for some questions I have regarding the Mantle presentation today. Hopefully repi can answer my doubts.

[slide from the Mantle presentation]


Didn’t know you couldn’t do something like this in current APIs. So we have to call SetShaderResourceView() every frame even if we are not changing the memory references? Current drivers should be able to detect this though and prevent binding the same resources again and again.


[slide from the Mantle presentation]


Similar to above, if you could bind all resources at once then using a material ID to reference the resource (into a resource array?) for the current triangle should be sufficient.


[slide from the Mantle presentation]


I couldn’t get the idea of ‘Linear frame allocators’.


[slide from the Mantle presentation]


Maybe I didn’t understand this correctly, but with shader resource views it should be possible to reuse a render target between multiple shaders. What am I missing?


[slide from the Mantle presentation]


What is runtime compilation? Is it needed in cases where you have a super shader with host supplied dynamic variables?


[slide from the Mantle presentation]


This sounds cool, but in what case would a GPU need to skip over already-submitted commands? How does this help with the occlusion query optimization mentioned?
 
Right now it seems Mantle is going to be of big benefit to APUs, less so to high-end GPUs. It seems AMD wants Mantle to include NVIDIA as well, which is a bit odd.

In fact Johan stated that most Mantle functionality can be supported on today’s modern GPUs from other vendors, and although Mantle is built with GCN-architecture cards in mind, it does not require them.

AMD is looking for nearly 100% developer uptake of Mantle. Making it so easy for Nvidia to get on board, and inviting them (with a bit of a delay), makes that near-100% developer support doable. The onus will now be on Nvidia, not AMD, to support Mantle.

Wonder if JHH will return the favor and open Gsync to AMD.
 
AMD is looking for nearly 100% developer uptake of Mantle. Making it so easy for Nvidia to get on board, and inviting them (with a bit of a delay), makes that near-100% developer support doable. The onus will now be on Nvidia, not AMD, to support Mantle.
I'd like to hear Johan comment more on this point, but I doubt he'll be allowed...

Regardless, you don't go off and design an API in isolation, then come back to everyone and say "it's on you to support it now!" What if NVIDIA (for example) doesn't like the way it's designed and wants changes? What if AMD releases Mantle 2.0 and puts in stuff that NVIDIA can't/doesn't want to support? This is exactly the same as CUDA... i.e. what motivation could anyone else possibly have to try and support an API that is designed for and controlled by AMD?

If the answer is that they'll open up design to the wider community, then why not have done that in the first place? We'll just form a nice little community of IHVs and developers to design a graphics API... except wait a second, we already have *two* such groups (Khronos and Microsoft). Why not just go through the two groups we already have?

If the answer to that is that for some reason DICE/AMD have lost faith in those standards processes, they should come out and say it. But of course, they won't do that since that would be politically unwise, so we have to connect the dots ourselves. It's also a little arrogant to think that somehow they could design and manage such an API and the relevant stakeholders better than DX/GL have done... quite frankly, no matter how impressive Mantle ends up being, it's a little bit easier to make an API for one piece of hardware on one operating system than it is to drive a standard.

I know this is coming off as fairly bitchy but I'm trying to make a point. AMD or DICE needs to speak bluntly to why they are not going through standard APIs and stop tip-toeing around the issue.

To be clear, I think most of the industry is on-board with the changes they are trying to make in driving lower-level APIs. I agree with most of the Mantle design and have pushed for similar directions myself. I have a lot of respect for Johan and AMD for making this happen and forcing the industry to take notice. My divergence in opinion is in terms of the future plan; if the intention is really to get this standardized, then we need the OS vendors and other IHVs involved at a minimum, at which point why don't we just call it DirectX ## or GL ## and go through the groups that we already have established to do these things? If there's a valid reason not to do that, I have yet to hear it.
 
Didn’t know you couldn’t do something like this in current APIs. So we have to call SetShaderResourceView() every frame even if we are not changing the memory references? Current drivers should be able to detect this though and prevent binding the same resources again and again.

I think you're misunderstanding what's being shown in that slide. D3D11 works by giving you a bunch of slots that correspond to registers in its virtual shader ISA, and when you bind to a slot the driver does a bunch of magic to put all of your texture data in a data structure that the actual hardware can understand. With a bindless setup, instead of working with slots, you instead directly provide the GPU with pointers that it can follow to find the texture info. The slide is showing a potential data structure that you could set up yourself, where one piece of memory has a pointer to another piece of memory filled with info for more resources.
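To make that concrete, here's a rough sketch of the difference (the struct layouts and names below are invented for illustration; this is not the actual Mantle or GCN descriptor format):

```cpp
#include <cstdint>

// D3D11-style slot binding: every change goes through the driver, which rebuilds
// its internal tables behind the scenes.
//   context->PSSetShaderResources(0, 1, &albedoSRV);   // slot 0
//   context->PSSetShaderResources(1, 1, &normalSRV);   // slot 1

// Bindless-style setup (illustrative only): the application owns GPU-visible memory
// containing resource descriptors, and shaders follow pointers/indices into it.
struct TextureDescriptor {          // what the hardware reads to sample a texture
    uint64_t gpuAddress;            // where the texel data lives
    uint32_t width, height;
    uint32_t format;
};

struct MaterialTable {              // one level of indirection the app lays out itself
    TextureDescriptor* textures;    // pointer to an array of descriptors in GPU memory
    uint32_t           textureCount;
};
// The shader gets a single pointer (or base index) to a MaterialTable and indexes into
// it freely, so nothing has to be re-bound per draw unless the memory itself changes.
```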

I couldn’t get the idea of ‘Linear frame allocators’.

I think they're probably referring to a simple allocation scheme where during a frame you keep 'allocating' from a large buffer by advancing a pointer, then at the end of the frame you 'free' all of it once. We do this a lot on consoles for temporary data being written by the CPU for the GPU to consume. In D3D you have no direct access to memory, thus you have no direct control over allocation strategies.
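Something along these lines, as a minimal sketch (assuming you've already mapped one big persistent buffer up front; the class is made up for illustration):

```cpp
#include <cstdint>
#include <cstddef>

// Linear ("bump") frame allocator: an allocation is just a pointer increment,
// and the whole buffer is reclaimed in one shot at the end of the frame.
class LinearFrameAllocator {
public:
    LinearFrameAllocator(void* base, size_t capacity)
        : m_base(static_cast<uint8_t*>(base)), m_capacity(capacity), m_offset(0) {}

    void* Allocate(size_t size, size_t alignment = 256) {   // e.g. constant buffer alignment
        size_t aligned = (m_offset + alignment - 1) & ~(alignment - 1);
        if (aligned + size > m_capacity)
            return nullptr;                                  // out of per-frame memory
        m_offset = aligned + size;
        return m_base + aligned;
    }

    void Reset() { m_offset = 0; }   // "free" everything once the GPU is done with the frame

private:
    uint8_t* m_base;
    size_t   m_capacity;
    size_t   m_offset;
};
```

In practice you'd keep two or three of these in flight so the CPU never overwrites memory the GPU is still reading from a previous frame.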

Maybe I didn’t understand this correctly, but with shader resource views it should be possible to reuse a render target between multiple shaders. What am I missing?

What he means is that you could allocate a piece of memory and re-use it for many different purposes. In D3D11 the memory for a resource is tied to the ID3D11Texture2D. When you create that texture, you specify certain immutable properties like the format, the size, number of mip levels, etc. and the driver allocates the appropriate amount of memory. Now let's say at the beginning of a frame you render to a render target, but then you're done with it for the rest of the frame. Then immediately after, you want to render to a depth buffer. With full memory control, you could say 'use this block of memory for the render target and then afterwards use it as a depth buffer'. In D3D11 however you can't do this, you must create both the render target texture and the depth buffer as separate resources. This is also something that's very common on consoles, where you have direct memory access.
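In pseudocode the difference looks roughly like this (the lower-level calls are invented placeholders to show the idea, not actual Mantle entry points):

```cpp
// D3D11: two separate resources, two separate allocations, even though their
// lifetimes within the frame never overlap.
//   device->CreateTexture2D(&rtDesc,    nullptr, &renderTargetTex);
//   device->CreateTexture2D(&depthDesc, nullptr, &depthBufferTex);

// With explicit memory control (hypothetical API, for illustration only):
GpuMemory block = AllocateGpuMemory(Max(rtSizeBytes, depthSizeBytes));

Image rt    = CreateImageDescription(rtDesc);      // just metadata, no memory yet
Image depth = CreateImageDescription(depthDesc);

BindImageMemory(rt,    block, /*offset*/ 0);       // first part of the frame
// ... render to rt, finish reading from it ...
BindImageMemory(depth, block, /*offset*/ 0);       // re-use the same bytes later in the frame
// ... render to depth ...
```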

What is runtime compilation? Is it needed in cases where you have a super shader with host supplied dynamic variables?

The way that it currently works with D3D11 is that you compile your shaders to D3D assembly, which is basically a hardware-agnostic "virtual" ISA. In order to run these shaders on a GPU, the driver needs to compile the D3D assembly into its native ISA. Since developers can't do this conversion ahead of time, the driver has to do a JIT compile when the game loads its shaders. This makes the game take longer to load, and the driver doesn't have a lot of time to try aggressive optimizations. With a hardware-specific API you can instead compile your shaders directly into the hardware's ISA, and avoid the JIT compile entirely.
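For reference, the D3D11 flow looks roughly like this; the JIT to the native ISA happens inside the CreatePixelShader call, where the application has no control over it (minimal sketch, assuming device is an existing ID3D11Device*):

```cpp
#include <d3d11.h>
#include <d3dcompiler.h>

// Step 1 (can be done offline): HLSL -> hardware-agnostic D3D bytecode (DXBC).
ID3DBlob* bytecode = nullptr;
ID3DBlob* errors   = nullptr;
D3DCompileFromFile(L"shader.hlsl", nullptr, nullptr,
                   "PSMain", "ps_5_0", 0, 0, &bytecode, &errors);

// Step 2 (always at runtime): the driver JIT-compiles the DXBC into the GPU's native ISA.
// This is the part a hardware-specific API can move offline, since it can accept
// pre-built native binaries instead of a virtual ISA.
ID3D11PixelShader* ps = nullptr;
device->CreatePixelShader(bytecode->GetBufferPointer(),
                          bytecode->GetBufferSize(), nullptr, &ps);
```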

As for patching, the driver may need to patch shaders in order to support certain functionality available in D3D. As an example, let's say that a hypothetical GPU actually performs its depth test in the pixel shader instead of having extra hardware to do it. This would mean that the driver would have to look at which depth state is currently bound to the context when a draw call is issued, and patch the shader to use the correct depth-testing code. With a hardware-specific shader compiler you can instead just provide the ability to perform the depth test in the pixel shader, and totally remove the concept of depth states.

This sounds cool, but in what case would a GPU need to skip over already-submitted commands? How does this help with the occlusion query optimization mentioned?

The obvious use is the one they mentioned: culling and occlusion testing. Imagine that the CPU says 'draw all of this stuff', and then the GPU goes through that list and for each one performs frustum and occlusion culling. The GPU can then alter the command buffer to skip over non-visible meshes, and then when the GPU gets around to executing that part of the command buffer it will only draw on-screen geometry.
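As a toy illustration of the idea (this is invented pseudocode, not the actual Mantle mechanism): the CPU records every potentially visible draw, each guarded by a predicate slot, and a GPU pass flips those predicates before the command processor reaches the draws.

```cpp
// CPU side: record all draws unconditionally, each one guarded by a predicate.
for (uint32_t i = 0; i < meshCount; ++i) {
    CmdSetPredicate(cmdBuf, predicateBuffer, i);   // hypothetical call
    CmdDrawIndexed(cmdBuf, meshes[i].indexCount, meshes[i].firstIndex);
}

// GPU side (conceptually a compute pass that runs before the draws are consumed):
//   predicateBuffer[i] = PassesFrustumAndOcclusionTest(meshes[i]);
// The command processor skips any draw whose predicate is 0, so the CPU never
// has to read results back or re-record the command buffer.
```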

As for occlusion queries, the main problem with them in D3D/GL is that the data can only be read by the CPU but the data is actually generated by the GPU. The GPU typically lags behind the CPU by a frame or more so that the CPU has enough time to generate commands for the GPU to consume, which means if the CPU wants to read back GPU results they won't be ready until quite a bit of time after it issued the commands. In practice this generally requires having the CPU wait at least a frame for query results. This means you can't really effectively use it for something like occlusion culling, since by the time the data is usable it's too late.
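The D3D11 pattern makes the latency problem visible (a minimal sketch using the real D3D11 query API; assume device and context already exist):

```cpp
#include <d3d11.h>

// Issue the query around the draws whose visibility we want to test.
D3D11_QUERY_DESC qd = { D3D11_QUERY_OCCLUSION, 0 };
ID3D11Query* query = nullptr;
device->CreateQuery(&qd, &query);

context->Begin(query);
// ... draw the occludee (e.g. its bounding box) ...
context->End(query);

// The result is produced by the GPU, typically a frame or more later. GetData()
// keeps returning S_FALSE until it's ready, so by the time the CPU can act on the
// sample count, the frame that issued the query has long since been submitted.
UINT64 samplesPassed = 0;
while (context->GetData(query, &samplesPassed, sizeof(samplesPassed), 0) == S_FALSE) {
    // Spinning here stalls the CPU; not waiting means acting on frame-old data.
}
```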
 
If the answer is that they'll open up design to the wider community, then why not have done that in the first place? We'll just form a nice little community of IHVs and developers to design a graphics API... except wait a second, we already have *two* such groups (Khronos and Microsoft). Why not just go through the two groups we already have?

If the answer to that is that for some reason DICE/AMD have lost faith in those standards processes, they should come out and say it. But of course, they won't do that since that would be politically unwise, so we have to connect the dots ourselves. It's also a little arrogant to think that somehow they could design and manage such an API and the relevant stakeholders better than DX/GL have done... quite frankly, no matter how impressive Mantle ends up being, it's a little bit easier to make an API for one piece of hardware on one operating system than it is to drive a standard.
It's not arrogant to think you can do something better than someone else. Especially when the existing options were designed for older hardware.

My personal opinion is most standards should be first developed by one entity and then opened to a committee. The creator gets first mover advantage for their effort and the committee can make any modifications necessary to support a variety of hardware. Design by committee from the start isn't efficient.

IMO Nvidia should have done this with Cg and CUDA, but they elected to retain control and thus no other company would support it.

If Mantle proves to be successful I expect it to influence DX and OpenGL.
 
It's not arrogant to think you can do something better than someone else. Especially when the existing options were designed for older hardware.
I don't mean in terms of the API design itself - like I said, I think pretty much everyone will agree that most of the things in Mantle are a long time coming. What I mean is the thought that AMD or DICE can be better stewards of managing the API in the future is somewhat arrogant. If that's not the intention, then we move on to the other situations...

My personal opinion is most standards should be first developed by one entity and then opened to a committee.
Sure, but there's a big difference between coming up with a prototype/proof of concept to support a radical proposal and actually shipping products that target the API before "opening it up". Basically the issue of whether other IHVs can propose modifications to Mantle 1.0 is already closed before it started. That's not the path towards something portable.

If Mantle proves to be successful I expect it to influence DX and OpenGL.
I fully agree, but then the question remains - why do we need Mantle if we can just adopt similar improvements into DX/GL? I still have yet to get an answer to what is wrong with the current standards setup.

To be clear again, I have no issues with AMD shipping something proprietary here to prove a point and influence the standards. I'm all for that. But why on earth should we maintain a third API/set of drivers when we could just integrate the good parts and call it DX ##? DX in particular has no issues with dropping compatibility with older hardware or making major changes to the API (see DX10), so that argument simply doesn't apply.

Ultimately you can't just ignore the OS vendors as well. If you want games to play nice with other applications (and we've come a long way from XP... uses and expectations are far higher), they need to be involved. Bypassing Microsoft and WDDM is not the path to a successful future API.
 
If the answer is that they'll open up design to the wider community, then why not have done that in the first place?

The simplest answer IMO: it's partially a business decision to synchronize the API with the launch of the consoles.

A committee couldn't do it in a timely fashion, and both AMD and the developers can profit from this. Keeping it hush-hush for so long is also because of this business decision.

I personally expected Mantle to be much more proprietary than it is, but AMD does have a history of working on standards (GDDR and HSA come to mind) so they probably thought they could pull it off in-house with careful expandability design (which was stated to be possible in one of the sessions).

Besides, as you stated twice, most developers are probably on board with Mantle's design and should be vocal about it. Nonetheless, Khronos and Microsoft were late; they didn't provide what developers wanted, and AMD grasped the business opportunity.

As 3dcgi pointed out, there are also technical reasons for doing it closed and then opening it up.
 
I fully agree, but then the question remains - why do we need Mantle if we can just adopt similar improvements into DX/GL? I still have yet to get an answer to what is wrong with the current standards setup.
Sometimes you must prove something to people before they take action. Johan has been asking for API improvements and the timing and business case worked out for AMD to accept the challenge. Maybe Microsoft and Khronos needed proof as they ignored the requests for API changes. Not to mention DX and OpenGL both cater to everyone while Mantle does not.

Some of the benefits of Mantle could have been accomplished with OpenGL extensions, but I suspect they wouldn't have caught on. A new API generates excitement in the opportunity to start from a clean slate and the psychology of this seems to be important.
 
A very simple example of an advantage of Mantle as mentioned in the presentation is that they can release Mantle for Win 7 and Win 8, where MS has tied DirectX to new OS releases (same with 8.1 / 11.2 now again).
 
Sometimes you must prove something to people before they take action. Johan has been asking for API improvements and the timing and business case worked out for AMD to accept the challenge. Maybe Microsoft and Khronos needed proof as they ignored the requests for API changes. Not to mention DX and OpenGL both cater to everyone while Mantle does not.
Like I said, I don't disagree with any of that. Note that I made exactly the same argument about the Haswell DirectX extensions. The difference is that Intel's stated goal is to standardize them through DX/GL, not to permanently diverge the API. What I'm questioning is the slide and claim that in the future we should continue to develop some portable version of Mantle alongside DirectX/GL - that simply doesn't make sense in my opinion unless there's an absolute failure to integrate the relevant improvements into those.

Let me put this another way... what's AMD/DICE's move if NVIDIA (again, for example) says "sure, let's standardize the Mantle concepts but we'd like Microsoft onboard" and the result becomes a new DX version (or similar for some OpenGL reboot)? Does AMD continue to develop new versions of Mantle alongside that effort? If truly the goal is to get to a portable, standard API, that should be considered the ideal end-goal for everyone, right?

Maybe I'm reading too much into what Johan said but it didn't sound like that was the goal to me. It sounded like the intention - regardless of what the rest of the industry does - is to continue on a separate path with Mantle and develop it as they see fit. Maintaining veto rights (i.e. "well you can change that in DX but we're going to do it our way in Mantle") is no different than a proprietary API. If AMD intends to give up control of Mantle and the rest of the industry wants to accomplish the same result through DX/GL, are they going to happily go along with that?

I'm probably wasting my time typing this since I think it's clear we're not going to get good answers (or even if AMD has thought this through that far). :) Still, I'd love to hear Johan's and/or AMD's position on some of these questions beyond the PR "oh it'll be great and everyone will just adopt it and live in harmony" fairy-land talk. I don't think it's unfair to ask if they would go along with the rest of the industry if the consensus was to drive the change through DX/GL.

Maybe we can at least get some answers to the WDDM questions though. All of this is sort of academic if they didn't make good choices there to be honest :)
 
Well, I was hoping to see an apples-to-apples comparison in BF4 between the DX11 and Mantle versions.
I guess that's still a couple of weeks away.
 
This sounds cool, but in what case would a GPU need to skip over already-submitted commands?

Just imagine you have a queue of commands with a conditional 'jump' command that depends on previous results. When the front-end parses it, it could e.g. skip one possible rendering path in favour of another.
This is unfeasible with current batched DrawIndirect et al. commands, since everything is heavily batched up.

BUT, on the other hand, if you could submit small slices for rendering and put conditionals before them, this would heavily increase the front-end's job, but would allow more fine-grained execution (and better resource usage).
The problem is that your draw calls would explode from a few thousand to hundreds of thousands... but at that point only the drawn data path remains, with the skipped ones just adding more gruntwork for the GPU front-end...

imho, with Mantle AMD will lead and Intel/NVIDIA will follow.
There is no point in forming a committee and getting another DX.
What really matters is that AMD shapes the API so that it can mostly be supported by other vendors.
Committees are like having 20 architects agreeing on a single design inside a room. You don't want to see whatever could result... nor implement it :p

What looks terribly interesting is the possibility, with Mantle, of having a desktop Kaveri and a 2xx card and seeing your game benefit from the iGPU, e.g. by having GPGPU work executed on the iGPU...
Something that would REALLY make gamers prefer AMD CPUs over any Intel one (as long as the CPU is 'fast enough', but with more gruntwork/sound moved to the front end, it might be).
On a gaming notebook, that would *really* make a difference.
 