In a very odd position. What they lack in raw performance and dedicated memory bandwidth, they usually make up for in entirely different domains: low CPU-GPU latency and zero-copy HSA features.
Which makes your second question (the one you edited away) not as trivial as you might think. It still depends on how soon the various HSA initiatives pick up, cross-platform that is. Currently, working with such a platform is still rather unintuitive from a developer's point of view. It's getting better, but we are not quite there yet, as you still need to compile your application with at least 3 different compilers if you want to cover all 3 vendors. Cross-vendor setups are messy, to say the least, and require nasty abstraction layers which effectively undo the recent improvements.
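To make the zero-copy point concrete: OpenCL 2.0 shared virtual memory is probably the closest thing to a cross-vendor expression of it today. A minimal sketch, assuming a GPU that supports coarse-grained SVM (error handling stripped):

```cpp
// Zero-copy sharing between CPU and GPU via OpenCL 2.0 coarse-grained SVM.
// Assumes the first GPU supports CL_DEVICE_SVM_COARSE_GRAIN_BUFFER.
#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue = clCreateCommandQueueWithProperties(ctx, device, nullptr, nullptr);

    // One allocation visible to both CPU and GPU -- no staging buffer, no copy.
    float* shared = static_cast<float*>(
        clSVMAlloc(ctx, CL_MEM_READ_WRITE, 1024 * sizeof(float), 0));

    // CPU writes through the pointer directly; for coarse-grained SVM the
    // map/unmap makes the writes visible to the device (fine-grained SVM
    // would not even need that).
    clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, shared, 1024 * sizeof(float),
                    0, nullptr, nullptr);
    for (int i = 0; i < 1024; ++i) shared[i] = static_cast<float>(i);
    clEnqueueSVMUnmap(queue, shared, 0, nullptr, nullptr);

    // A kernel would then take the same pointer directly:
    //   clSetKernelArgSVMPointer(kernel, 0, shared);

    clSVMFree(ctx, shared);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```

The whole point is the single allocation: the CPU writes through the same pointer the kernel reads, with no explicit transfer in between. That's exactly where the IGP's latency advantage comes from.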
Using a heterogeneous architecture via DX12 would be possible, but probably not as efficient as you might think, or at least not in the way you might expect. Take CR (conservative rasterization) as an example: it enables a number of smart tricks, but your IGP simply lacks the raw throughput to make it worth it. Where you do profit, though, is when you offload portions which require frequent synchronization with the application onto the IGP, as that synchronization becomes a lot cheaper in terms of latency compared to a dedicated GPU; see the sketch below.
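The DX12 building block for this is simply one device per adapter. A minimal sketch of the enumeration, assuming the Windows 10 SDK; deciding which device gets the latency-sensitive work is left out:

```cpp
// Enumerate hardware adapters and create one D3D12 device per GPU, so
// latency-sensitive work can run on the IGP while the dGPU does the heavy
// lifting. Error handling omitted for brevity.
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

std::vector<ComPtr<ID3D12Device>> CreateDevicePerAdapter() {
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    std::vector<ComPtr<ID3D12Device>> devices;
    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
            continue; // skip WARP / software adapters

        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device))))
            devices.push_back(device);
    }
    return devices;
}
```

Passing resources between the two devices then goes through cross-adapter heaps and shared handles, and that's exactly where the synchronization cost you're trying to dodge shows up again.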
The real question, though, is:
Is it worth optimizing for heterogeneous architectures? IMHO, it's not, at least not unless you are developing a major engine which reaches a sufficient number of systems that actually have such a configuration. And even then you have to evaluate whether there are any tasks you can safely offload without running into other limitations. Effectively, you would probably go for a horizontal cut of your render pipeline at a few predetermined breaking points, chosen based on raw performance, but not based on differences in capabilities or timing characteristics, simply because you can't count on ANY device in an average system fulfilling the latter.
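For illustration, such a cut could be as dumb as a fixed split ratio measured once at startup. Everything here (the function name, the ratio source) is hypothetical, not any real engine API:

```cpp
// Hypothetical sketch of a horizontal cut: split N work items between the
// dGPU and the IGP by a throughput ratio measured once, e.g. in a warm-up
// pass. Capabilities and timing are deliberately NOT part of the decision.
#include <cstddef>

struct Split { std::size_t dgpuItems, igpItems; };

Split CutWorkload(std::size_t totalItems, double dgpuPerIgpThroughput) {
    // dGPU gets r/(r+1) of the work, the IGP gets the remainder.
    const double r = dgpuPerIgpThroughput;
    const auto dgpu = static_cast<std::size_t>(totalItems * r / (r + 1.0));
    return { dgpu, totalItems - dgpu };
}
```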
Sorry for the editing, I was thinking it was a bit too early for this type of question, hehe.