Apologies for the redundancy, but I feel like my earlier questions/comments might have gotten buried in the quote I placed them in.
I still don't understand how the sources in the article support the claims David is making:
1) I can't find anything that states SIMD FP16 capability for Series 6, 6XT or 7. IMG repeatedly refers to the design as scalar (https://imgtec.com/blog/graphics-cores-trying-compare-apples-apples/), and the optimization guide states that this allows unencumbered swizzling.
2) The Series 6 instruction set manual, while pretty blatantly incomplete and with some inconsistencies, does specify that 16-bit sub-register addressing can be performed for source and destination (along with broadcast to a full register destination) for "supported" operations. There's no list of what these operations are, but rcp is given as an example. These modifiers are specified as free, and are given separately from the .lp modifier that David references in the article (which zeroes out the non-FP16 LSBs of the source and dest, but doesn't say anything about the exponent range). There are also pack and unpack instructions that appear to be able to convert two FP32 values into two FP16 values (in a single register) and vice versa. Nothing about this manual suggests it fully applies to Series 6XT (which is attributed to Apple's A8), much less 7 and beyond.
3) Nothing that I can find in Apple's Metal optimization slide deck suggests that FP16 to FP32 conversions are free for anything other than texture sampling; in particular, nothing suggests they are free between ALU ops.
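To make the distinction in point 2 concrete, here is how I read the difference between packing two FP32 values into FP16 halves of one register and merely zeroing the mantissa bits that FP16 doesn't have. This is only a sketch in Python under my own assumptions: the function names are hypothetical, putting the first value in the low 16 bits is a guess, and the rounding is whatever Python's struct module does. The manual specifies none of these details.

```python
import struct

def pack_half2(a, b):
    """Convert two floats to FP16 and pack them into one 32-bit word.

    Assumption: `a` lands in the low 16 bits, `b` in the high 16 bits.
    """
    lo, hi = struct.unpack("<HH", struct.pack("<ee", a, b))
    return (hi << 16) | lo

def unpack_half2(word):
    """Split a 32-bit word into two FP16 values and widen them to floats."""
    return struct.unpack("<ee", struct.pack("<HH", word & 0xFFFF, word >> 16))

def truncate_to_fp16_mantissa(x):
    """Zero the 13 FP32 mantissa LSBs that FP16 has no room for.

    The sign bit and the full 8-bit FP32 exponent are left untouched,
    so the result keeps FP32 range with only FP16-like precision.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFE000))[0]

word = pack_half2(1.0, 2.0)
print(hex(word), unpack_half2(word))
print(truncate_to_fp16_mantissa(100000.0))
```

Under this reading, truncating 100000.0 gives 99968.0, which is above FP16's largest finite value of 65504. So a result with only its low mantissa bits cleared is not necessarily a valid FP16 number, whereas a real pack has to fit each value into the FP16 exponent range.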
So, based on all of this, how is it apparent that Apple's shader core is doing anything fundamentally different from IMG's, much less to the extent that it must be a completely different design?
While Apple's GPU hires have to have been going somewhere, they'd need a lot for driver development (which many of the job titles do suggest), and higher level integration into the rest of the SoC as well as validation (which many of the other job titles suggest). There could also be more work in the pipeline that simply hasn't shown up in their GPUs yet, much less starting with A8.
I get the feeling that either there's some level of insider knowledge that everyone else knows or accepts, or I'm just not seeing something bigger.