It's a weird concept initially, but you get used to it
The reality is that a driver can either spend a lot of time trying to "optimize" game API usage on the fly, which guarantees that *all* games see that overhead regardless of how "nice" their use of the API is, or it can trust the game to be efficient. Ultimately the only right answer is to make a thin driver and let games that make efficient use of the API go as quickly as possible, even if it means dropping the "safety net" for games that do stupid stuff.
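Just to illustrate what I mean by "nice" use of the API, here's a rough sketch (my own illustration, with made-up names, not anything from a shipping game or driver) of the kind of app-side filtering that skips redundant state changes entirely, rather than relying on the driver to catch them:

```cpp
// Rough illustration only: an app-side cache that skips redundant pixel
// shader binds, so the driver never has to spend time filtering them out.
// Class and method names are made up for the example.
#include <d3d11.h>

class ShaderBinder {
    ID3D11PixelShader* m_lastPS = nullptr;  // last shader actually bound
public:
    void SetPixelShader(ID3D11DeviceContext* ctx, ID3D11PixelShader* ps) {
        if (ps == m_lastPS)
            return;                      // redundant bind: no API call, no driver work at all
        ctx->PSSetShader(ps, nullptr, 0);
        m_lastPS = ps;
    }
};
```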
The new APIs really just take that same concept to the next level... you absolutely can screw yourself over pretty badly if you don't know what you're doing in D3D12/Mantle, but the experts finally have a path with the training wheels off
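As one concrete example of the training wheels coming off (just a sketch using the D3D12 API as it eventually shipped, not anything driver-specific): the application itself has to insert resource state transitions that a D3D11 driver would have tracked for you, and outside the debug layer nothing warns you if you forget one.

```cpp
// Sketch only: in D3D12 the app must transition the back buffer from
// render-target state to present state before presenting; a D3D11 driver
// would have tracked this hazard automatically.
#include <d3d12.h>

void TransitionToPresent(ID3D12GraphicsCommandList* cmdList,
                         ID3D12Resource* backBuffer)
{
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type  = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
    barrier.Transition.pResource   = backBuffer;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PRESENT;
    cmdList->ResourceBarrier(1, &barrier);
    // Skipping this barrier doesn't fail the call - the runtime trusts you,
    // and you get undefined results instead of a safety net.
}
```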
I'm not sure. IIRC it was the 15.31 driver that shifted to the new D3D11 UMD, so perhaps someone benchmarked that? Power-constrained chips like the ones in ultrabooks - especially the MacBook Air chip with HD5000 - will show the largest differences, but going forward it's really all chips. I don't think the "old" driver was ever shipped on Haswell though, so you'd have to test an HD4000 ultrabook or something if you wanted to see the delta.
Beyond that I'm not really sure what there is publicly. A lot of the improvements were just mixed in with other game improvements as well, so it's not necessarily possible to tease them out without a directed test. Anyways it was mentioned here (http://techreport.com/news/24600/intel-gets-serious-about-graphics-for-gaming) and I think here (http://techreport.com/news/24604/performance-boosting-intel-igp-drivers-are-out), but like I said I'm not sure how to tease apart the quoted performance improvements to get the parts that were due to CPU overhead reduction vs. something else.