It's quite simple really: AMD aimed at Mantle & co since the dawn of GCN, and hoped it would be good enough for DX11 & co.
NVIDIA meanwhile did everything they could to get DX11 & co perfect, and in the process forgot to think ahead about what Mantle & co could bring to the table.
The process for putting together an industry standard API can be a protracted one, with differing logjams and compromises that generally do not get commented upon by the stakeholders after the fact.
One possible interpretation of the Mantle situation is that whatever efforts toward successor APIs were under way had reached an impasse, and Mantle served to break it by putting an actual lower-level API into the market and drawing in developers.
Nvidia may have seen that a lower-level API would be useful in the future; however, it was coming from a different position, one where it held notable advantages with the APIs already in place. Nvidia led in driver resources and developer relations, so it could extract more of those benefits without ripping out its existing investments.
It's also possible that what Nvidia wanted, if it wanted a lower-level API at all, differed more from what the other major stakeholders wanted. Mantle or something resembling it would not have been the only way of going about things.
At least some of AMD's performance gains with Vulkan versus OpenGL look to come from cases where AMD was notably underperforming relative to similarly positioned Nvidia GPUs. Vulkan's benefit is therefore at least partly that it bypasses a decent chunk of the traditional AMD driver performance tax, or at least inflicts some of the software-immaturity and weak-optimization penalties on competing silicon that had gained an insurmountable lead on the old APIs.
AMD's situation is such that even if Nvidia does eventually rectify these problems (there's enough money, talent, and inertia to figure something out), it could still be a win if the cost of keeping up with the thick APIs was becoming too high for the weaker competitor. Even if AMD ends up second-best at Vulkan and the like, the reduced load on AMD might make that position affordable.
Changing to (partially) software scheduling brought great power savings for NVIDIA, but to my limited understanding it also lost some flexibility compared to Fermi & GCN, and there are probably other elements stacking on top of that, too.
The primary scheduling change that comes to mind for Nvidia is the shift in dependence checking for ALU instructions when going from Fermi to Kepler, which took out a layer of hardware monitoring and encoded that information in the instruction stream instead.
That's below the level of the APIs, and not relevant for asynchronous compute or special operations exposed with intrinsics.
Instruction streams that are poorly optimized for this can increase the number of stall cycles caused by dependences, leading to less effective use of ALU resources.
However, the general case should not present an insurmountable challenge, and there is little evidence that Nvidia is suffering in terms of performance per hardware FLOP. Fermi isn't even showing up for this particular fight, and the progression from Kepler to Maxwell to Pascal shows that Nvidia has stayed with this approach without hitting a plateau attributable to this particular architectural feature.
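To make the trade-off concrete, here is a minimal sketch of compile-time dependence checking under a toy model: in-order issue of one instruction per cycle and a single fixed ALU latency. The latency value and the `Instr`/`encode_stalls` names are hypothetical; this is not Nvidia's actual control-code format. The compiler computes how long each instruction must wait for its operands, and a statically scheduled design would embed that count in the instruction stream so hardware no longer has to track it.

```python
from dataclasses import dataclass

# Hypothetical fixed ALU result latency in cycles; real hardware values differ.
ALU_LATENCY = 6

@dataclass
class Instr:
    dst: str      # register written
    srcs: tuple   # registers read

def encode_stalls(program, latency=ALU_LATENCY):
    """Return (per-instruction stall counts, total cycles) for in-order issue.

    The stall counts are what a compiler would have to encode so that each
    instruction issues only after its source registers are ready.
    """
    ready_at = {}  # register -> first cycle its value can be read
    cycle = 0      # cycle at which the next instruction may issue
    stalls = []
    for ins in program:
        earliest = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        wait = max(0, earliest - cycle)
        stalls.append(wait)
        cycle += wait
        ready_at[ins.dst] = cycle + latency
        cycle += 1  # one instruction issued per cycle
    return stalls, cycle

# A dependent pair placed back to back, followed by independent work.
naive = [Instr("r1", ("r0",)), Instr("r2", ("r1",)),
         Instr("r3", ("r5",)), Instr("r4", ("r6",))]
# Same instructions, with the independent work moved between producer and consumer.
reordered = [naive[0], naive[2], naive[3], naive[1]]

print(encode_stalls(naive))      # ([0, 5, 0, 0], 9)
print(encode_stalls(reordered))  # ([0, 0, 0, 3], 7)
```

The reordered stream does the same work in fewer cycles; in this model, a poorly scheduled stream simply carries larger stall counts, which is the kind of lost ALU utilization described above.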
It's not as if AMD's instruction scheduling is actually flexible beyond constraining its execution loop so that dependences must resolve before the next issue cycle. What that constraint costs anything else that must fit into that loop, or what changes it rules out, may be evidenced by what just happened to the RX 480's power ceiling. Hardware interlocks or decent static dependence checking are not the biggest problems out there.
AMD's choice is also not a panacea, as this forum is rife with examples of and commentary on how poor AMD's instruction generation is, particularly in the PC space, and on its absence from much of the compute space. If Nvidia's only problem is that its GPUs need hints about which instructions depend on others within a handful of cycles, other pitfalls apparently can make up for it on the other side.
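As a rough illustration of the constraint on AMD's side, the sketch below models a fixed issue cadence. It assumes a wavefront issues at most once every four cycles (the cadence commonly described for GCN's SIMDs) and that a dependent instruction cannot issue until its producer's ALU latency has elapsed. The function names and latency values are hypothetical. It shows that keeping ALU latency within the cadence removes stalls with no interlock or encoded hints, while a deeper ALU pipeline would stall on every dependent instruction.

```python
def dependent_chain_issue_times(n_ops, alu_latency, cadence=4):
    """Issue cycle of each op in a chain where every op consumes the previous result.

    An op may issue no sooner than `cadence` cycles after the previous issue
    from the same wavefront, and no sooner than `alu_latency` cycles after
    its producer issued.
    """
    times = []
    for i in range(n_ops):
        if i == 0:
            times.append(0)
        else:
            prev = times[-1]
            times.append(max(prev + cadence, prev + alu_latency))
    return times

def stall_cycles(times, cadence=4):
    """Cycles lost beyond the normal cadence between consecutive issues."""
    return [b - a - cadence for a, b in zip(times, times[1:])]

fits = dependent_chain_issue_times(4, alu_latency=4)    # loop fits the cadence
deeper = dependent_chain_issue_times(4, alu_latency=6)  # hypothetical deeper pipeline
print(fits, stall_cycles(fits))      # [0, 4, 8, 12] [0, 0, 0]
print(deeper, stall_cycles(deeper))  # [0, 6, 12, 18] [2, 2, 2]
```

In this toy model the zero-stall property holds only while the ALU loop fits inside the cadence, which limits how the pipeline can be lengthened or retimed.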