I think it really isn't all that bad. Yes, GCN loses out to some of the more efficient VLIW parts in perf/transistor, especially in the low-to-medium performance classes (up to Barts), though not too badly; against Cayman, for instance, there's no such disadvantage at all (using Pitcairn for the comparison of course, not the much less efficient Tahiti). Of course, VLIW had to go, but I think mainly because it was poorly suited for getting into markets besides games.
GCN in an APU will be interesting because of the higher transistor count per performance level compared to VLIW. In the mobile market with phones and such, NV is even still using non-unified architectures because of transistor count, AFAIK.
Also, even at the lower end, the Mars vs. Whistler benchmarks (HD 8790M vs. HD 7670M) were pretty convincing - sure, the former has 30% more transistors, but it's also a lot faster than that. Granted, if you scaled the clocks back to the same levels it probably wouldn't spank it anymore, but still, if there's any disadvantage in perf/transistor it has to be pretty small.
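To make the perf/transistor argument concrete, here's a back-of-envelope sketch. Only the "30% more transistors" figure comes from the benchmarks above; the performance delta is a hypothetical placeholder standing in for "a lot faster than that":

```python
# Normalized perf/transistor comparison (Whistler = 1.0 baseline).
transistors_whistler = 1.0
transistors_mars = 1.3          # ~30% more transistors (from the post)
perf_whistler = 1.0
perf_mars = 1.5                 # placeholder: assume 50% faster, i.e. more than 30%

ppt_whistler = perf_whistler / transistors_whistler
ppt_mars = perf_mars / transistors_mars

# If the performance gain exceeds the transistor-count gain,
# perf/transistor actually improves despite the bigger chip.
print(ppt_mars / ppt_whistler)  # > 1.0 means Mars wins on perf/transistor
```

The point being: as long as the speedup exceeds the transistor overhead, perf/transistor favors GCN even at this tier.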
So GCN should rock in APUs, especially since you can leverage it for compute - and at least some of what you can do there isn't much limited by memory bandwidth.
(And yes, it's true nvidia doesn't use unified designs for tegra - they're dead last to switch there - but I guess the biggest advantage it gets them is from using less accurate pixel shaders: fp20 saves a lot on datapaths, makes for way smaller multipliers, etc.)
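The multiplier savings from fp20 can be sketched with a rough rule of thumb: an array multiplier's area grows roughly with the square of the significand width. The bit split for fp20 below is an assumption (the post doesn't specify it), so treat this as illustrative only:

```python
# Back-of-envelope multiplier area model: area ~ significand_bits^2.
# This ignores exponent logic, rounding hardware, wiring, etc.
def mult_area(significand_bits):
    return significand_bits ** 2

fp32_area = mult_area(24)  # IEEE fp32: 23 mantissa bits + implicit leading 1
fp20_area = mult_area(14)  # assumed fp20 split: 13 mantissa bits + implicit 1

print(fp20_area / fp32_area)  # roughly a third of the fp32 multiplier area
```

Even with this crude model, the fp20 multiplier comes out at about a third of the fp32 one, which is consistent with the "saves a lot on datapaths" claim.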