Two notes of caution here:
It's well known in the industry that ARM has a track record of estimating its core sizes conservatively; the PowerVR guys can be a little more, errr, "creative", shall we say.
Similarly, don't just take it as read that the performance numbers are correct. I worked with one of the chips implementing the original MBX and it was nowhere near the performance envelope stated in their material. Remember that SGX is a unified shader architecture, so ask yourself: are they quoting SGX's peak fill rate with the core 100% dedicated to fragment processing? The same question goes for vertex processing...
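To put the fill-rate question in numbers, here's a minimal sketch. It assumes, purely for illustration, that a unified shader's fragment throughput scales linearly with whatever share of its cycles is left over after vertex work; the peak figure below is made up and is not an SGX spec.

```python
def effective_fill_rate(peak_fill_rate, vertex_share):
    """Fragment throughput left over when a unified shader spends
    `vertex_share` (0..1) of its cycles on vertex work.

    Simplifying assumption: throughput scales linearly with cycle share.
    """
    if not 0.0 <= vertex_share <= 1.0:
        raise ValueError("vertex_share must be between 0 and 1")
    return peak_fill_rate * (1.0 - vertex_share)


# Hypothetical peak, quoted with the core 100% on fragment work.
PEAK = 100e6  # pixels per second

for share in (0.0, 0.25, 0.5):
    rate = effective_fill_rate(PEAK, share)
    print(f"{share:.0%} of cycles on vertices -> {rate / 1e6:.0f} Mpixel/s")
```

Real scenes always need some vertex work, so the headline number is the ceiling, not what you'll see.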
Lies, lies, damn lies and GPU marketing material and all that.
On the power consumption front there are a number of variables to take into account.
Total power efficiency for the GPU core will depend on the number of gates in the core, the number and area of the RAMs in the core, how many of those are active at any one time and (this is the key bit you've missed so far) the amount of external bandwidth (BW) consumed by the GPU core.
Not to trivialise it, though: gate/RAM area is a big issue without power gating. Below 65nm, static power consumption through leakage is a big deal, so the SGX, with its smaller area, would seem to have the edge over Mali there. However, if the core is 100% utilised during a rendering phase, there is little or no opportunity to power gate (you need to keep the gates powered up to do the work), and this is where the SGX gets let down.
Because SGX is a unified shader architecture, its compute core is shared between vertex and fragment processing (which, incidentally, is probably why it's smaller). It attempts to load balance using some hoopy hyper-threading system, which will likely mean the core is active a lot more of the time, so aggressive power gating may not buy you that much. Mali has the advantage that MaliGP can be completely power gated once it has finished processing (that's about 30% of the architecture powered off). That's gotta be worth something!
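A rough way to see why gating a separate vertex block can matter: the sketch below sums per-frame leakage energy for blocks that each stay powered for some fraction of the frame. All the wattages, the 20% vertex-phase fraction and the frame time are hypothetical; the only figure borrowed from above is the roughly 30% share for the vertex block. Dynamic power, and the real behaviour of either core, are ignored.

```python
def leakage_energy(blocks, frame_time_s):
    """Leakage energy (joules) over one frame.

    blocks: list of (leakage_watts, powered_fraction) pairs, where
    powered_fraction is the share of the frame the block stays powered
    (1.0 means the block is never power gated).
    """
    return sum(watts * fraction * frame_time_s for watts, fraction in blocks)


FRAME_TIME_S = 0.02  # hypothetical frame time

# Hypothetical unified core: smaller, but busy (and powered) all frame.
unified = [(0.08, 1.0)]

# Hypothetical split design: fragment block powered all frame, vertex
# block (30% of total leakage) gated off after 20% of the frame.
split = [(0.07, 1.0), (0.03, 0.2)]

for name, blocks in (("unified", unified), ("split", split)):
    energy_mj = leakage_energy(blocks, FRAME_TIME_S) * 1e3
    print(f"{name}: {energy_mj:.2f} mJ of leakage per frame")
```

With these made-up numbers the split design leaks less per frame even though its total leakage is 25% higher. Change the powered fractions and the result can flip, which is rather the point: area alone doesn't tell you who wins.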
Another factor that plays in here is the number and size of the RAM instances in the design. I don't know the ins and outs of the SGX implementation (I haven't seen any die shots I can analyse), but to keep a hyper-threaded unified shader architecture fed, they probably have a big-ass cache RAM to context switch in and out of to keep the thing ticking over. That's gonna cost big on the power consumption front: as long as the core is active, that RAM needs to stay powered up.
Mali has some neat (and patent-protected) tricks up its sleeve in that regard. It has no context-switch overhead, thanks to a nifty trick of carrying the context with each thread. This means there is little or no pipeline-flush overhead and no need for a munging great cache to store the context.
The last thing you need to take into account is the memory bandwidth consumed by the two cores. External memory bandwidth to DRAM consumes stupid amounts of power, and nothing hammers the crap out of memory quite like a GPU running 3D graphics.
I attended a tech seminar (come to think of it, I think it was an ARM one) where they talked about external DRAM accesses costing 10x as much as internal computation in some cases. While I'm not sure I buy 10x, even 2x would be a significant effect, so reducing the bandwidth used by a GPU would make a real dent in overall power consumption.
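Here's a back-of-the-envelope sketch of why that ratio matters. It counts energy in units of one internal operation and treats each DRAM access as costing `dram_cost_ratio` of those units. The workload (1000 ops, 200 accesses) and the 30% access reduction are invented for illustration; only the 2x and 10x ratios come from the discussion above.

```python
def frame_energy(compute_ops, dram_accesses, dram_cost_ratio):
    """Relative energy for one frame, in units of one internal operation.

    dram_cost_ratio: energy of one external DRAM access divided by the
    energy of one internal operation (e.g. 2 or 10).
    """
    return compute_ops + dram_accesses * dram_cost_ratio


def bw_saving(ops, accesses_before, accesses_after, dram_cost_ratio):
    """Fractional energy saving from cutting DRAM accesses."""
    before = frame_energy(ops, accesses_before, dram_cost_ratio)
    after = frame_energy(ops, accesses_after, dram_cost_ratio)
    return 1.0 - after / before


# Hypothetical workload: 1000 internal ops, 200 DRAM accesses, and a
# bandwidth-saving scheme that removes 30% of those accesses.
for ratio in (2, 10):
    saving = bw_saving(1000, 200, 140, ratio)
    print(f"DRAM cost {ratio}x: {saving:.1%} less energy per frame")
```

With these made-up numbers the same 30% cut in accesses saves about 8.6% at 2x and 20% at 10x, so the more expensive DRAM traffic really is, the further a bandwidth-frugal GPU pulls ahead.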
I've heard ARM make some pretty bold claims about Mali's BW reduction techniques. Whilst I don't have any first-hand experience to confirm or deny those claims, I am told by trusted sources that they are on the level and that Mali does have an advantage over SGX with real-world workloads. Enough to offset the size difference? I can't say, but it's interesting to note.
So what's my point? The above are just a few obvious things I've observed which tell me it's impossible to make an apples-to-apples comparison of the two based on publicly available data. We are only getting a tiny glimpse of the whole picture.
Going on experience, I'd say PowerVR will over-promise and under-deliver on the SGX, but they'll sell a bundle of them anyway, so we'll suffer another generation of mediocre graphics experiences on handsets. However, ARM are winning designs away from PowerVR, so there must be something in this that's making sense to some big names.
As for the Mali400 MP, I think it's a very poorly thought-out product. If you are going to introduce a multi-core scalable product, why the hell not scale both the fragment and vertex shader cores? This smacks of something nailed together in a hurry to meet some spurious customer request, if you ask me (I wonder if that's anything to do with them losing one of their key strategic technical people earlier in the year...).
Anyway, let's hope they get more of a clue with the next one and give PowerVR a real run for their money.