Managing top-end performance across the board against devices well beyond its size class, and doing it without chasing the diminishing returns of high clock speeds and core counts or needing a major foundry process advantage, shows Apple's head is still in the right place with the A7's design. Once I readjusted my performance expectations to account for the limited thermal/power profile of the 5s's casing, I was really able to appreciate their accomplishment here.
None of the details uncovered so far in the analysis of their new hardware helps me fully account for the oddities in its graphics benchmark performance: by my estimation it has the pixel fill rate of a 433 MHz G64xx, but outside of fill rate it shows nothing resembling twice the performance of MediaTek's G62xx solution. To me, it almost feels like some kind of in-between variant of the two, like an SGX535-type situation, with the ALU count of the lower configuration and the TMU count of the higher one.
Maybe other implementation details of the reference designs for those cores, like clock rate, buffer size, or memory interfaces, account for the discrepancies. Or maybe, as was suggested to me in a bit of intriguing speculation, Apple didn't strictly implement one of the PowerVR reference designs in this case. I have no doubt that the design is Rogue-based (GPU design takes an exceedingly high degree of specialization to produce competitive results... Apple won't be going solo on design for GPUs), but maybe they customized some things or used some kind of unannounced variant, like, say, a G6235. Really, there's just a lack of information to go on right now.