I'm really not familiar with this engine or the way Nvidia optimized for it.
It's some engine that was used to tout the high draw call count enabled by Mantle.
The developers want Absolute Freedom for game devs. That means avoiding all the batching and state re-use tricks normally needed to get around the cripplingly low number of draw calls that can be squeezed through standard DX11, since those tricks constrain flexibility in adding materials, properties, or effects to objects.
One possibility is that this leads to thousands and thousands of very similar or identical calls, and Nvidia was able to optimize some of the most obvious ones.
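For a concrete picture of what those tricks look like, here's a minimal sketch in D3D11 terms. It's illustrative only: the function names, buffers, and counts are hypothetical, and all resource and shader setup is omitted.

```cpp
#include <d3d11.h>
#include <vector>

struct Object {
    ID3D11Buffer* constants;  // hypothetical per-object material/transform data
};

// Flexible but expensive: one state change plus one draw call per object.
// At tens of thousands of objects, DX11 submission overhead dominates.
void DrawIndividually(ID3D11DeviceContext* ctx,
                      const std::vector<Object>& objects,
                      UINT indexCount)
{
    for (const Object& obj : objects) {
        ctx->VSSetConstantBuffers(0, 1, &obj.constants);
        ctx->DrawIndexed(indexCount, 0, 0);
    }
}

// The classic workaround: fold identical meshes into one instanced call.
// Per-object variation has to be squeezed into the instance buffer,
// which is exactly the constraint on materials/properties/effects above.
void DrawBatched(ID3D11DeviceContext* ctx,
                 ID3D11Buffer* instanceData, UINT stride,
                 UINT indexCount, UINT instanceCount)
{
    UINT offset = 0;
    ctx->IASetVertexBuffers(1, 1, &instanceData, &stride, &offset);
    ctx->DrawIndexedInstanced(indexCount, instanceCount, 0, 0, 0);
}
```

If the engine takes the first path for flexibility's sake, the driver sees long runs of near-identical calls, which is plausibly where Nvidia found its easy wins.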
Nvidia's retort to Mantle was a DX11 driver set that beat it, and its PR included a list of the functions they had optimized, along with a graph showing how successive versions of Nvidia's driver chipped away at Mantle's lead.
Oxide spoke to the reasonable costs of implementing Mantle, but they did not revise it significantly beyond the first implementation.
Regardless of how good a tool is, it's tough for a first draft to survive three or four go-arounds by one of the leading optimization teams in graphics.
Did you miss a word in 1)?
Several, I suppose.
1) There is more choice and control, and "let's not and say we did" is one choice.
Another point: I suspect that these lower-level interfaces will give developers way more rope to hang themselves.
Or more landmines to blame on the victim when they step on them.
We've already seen that Thief doesn't do Mantle well on Tonga, and that AMD is blaming the developers. I have no reason to believe AMD is lying, but I would love to know exactly what makes it so chip-specific: every other GCN GPU gets better performance in Thief with Mantle.
Should we blame the devs? If they got it right with every other GCN chip, it looks like they were doing what should be expected of them. It seems like Mantle is exposing the reality that removing the barriers to low-level features means exposure to low-level flaws.
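To give one plausible shape of such a flaw: Mantle's own code isn't reproduced here, but Vulkan, which grew out of Mantle, hands the application the same explicit responsibilities, so it illustrates the class of hazard. In both, the app rather than the driver declares when a write must become visible to a later read; the sketch below is illustrative and the names are hypothetical.

```cpp
#include <vulkan/vulkan.h>

// In an explicit API, the application must insert the barrier that makes a
// render-target write visible to a later shader read. Getting it wrong is
// undefined behavior: on one GPU the caches may happen to flush anyway,
// while a chip with different cache or compression behavior (a Tonga-like
// case) reads stale data or falls off a fast path. Nothing in the API
// tells you which outcome you'll get.
void TransitionForSampling(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
    barrier.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image = image;
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
    barrier.subresourceRange.levelCount = 1;
    barrier.subresourceRange.layerCount = 1;

    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        0, 0, nullptr, 0, nullptr, 1, &barrier);
}
```

A barrier that's subtly wrong can pass on every chip a developer tested and still break on the next one, which would square with "right everywhere except Tonga" without anyone lying.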