Analysis Concludes Apple Now Using Custom GPU Design

http://www.realworldtech.com/apple-custom-gpu/

David Kanter has compared the optimization information provided by PowerVR and Apple and found they are quite different. He concludes that while Apple is still using PowerVR IP for the fixed-function hardware, Apple has been using its own custom-designed shader cores since the A8. These shader cores are optimized for 16-bit processing, with 16-bit register files and data paths. 32-bit data consumes two 16-bit registers, and conversion between 16-bit and 32-bit data types is free. Since 16-bit is sufficient for many graphical tasks as well as image processing and neural networks, a 16-bit-optimized design should be more power efficient and offer more performance than a 32-bit-focused design.
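As a rough illustration of what that implies for shader code (a hypothetical Metal kernel, not anything from Apple's or Kanter's material), halving the width of the working values halves the register-unit footprint per thread:

```cpp
#include <metal_stdlib>
using namespace metal;

// Hypothetical compute kernel: on a 16-bit register file, each 'half'
// occupies one register unit while each 'float' occupies two, so
// lower register pressure per thread means more threads in flight.
kernel void brighten(texture2d<half, access::read>  src [[texture(0)]],
                     texture2d<half, access::write> dst [[texture(1)]],
                     uint2 gid [[thread_position_in_grid]])
{
    half4 c = src.read(gid);   // half4: four 16-bit register units
    c.rgb = c.rgb * 1.2h;      // arithmetic stays in 16-bit
    dst.write(c, gid);
}
```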
 
Ladies and Gentlemen,

I have a new article up at RWT!

Previously, Apple's iPhones and iPads used PowerVR GPUs from Imagination Technologies for graphics. Based on our analysis, Apple has created a custom GPU that powers the A8, A9, and A10 processors, shipping in the iPhone 6 and later models, and some iPads. Using public documents, we demonstrate that the programmable shader cores inside Apple's GPU are different from Imagination Technologies' PowerVR and offer superior 16-bit floating-point performance and data conversion functions. We further believe that Apple has also developed a custom shader compiler and graphics driver. The proprietary design enables Apple to deliver best-in-class performance for graphics and other tasks that use the GPU, such as image processing and machine learning.

http://www.realworldtech.com/apple-custom-gpu/

I look forward to the discussion.

David
 
He did a nice job researching some of those details, but his conclusion is flawed. Some of the differences pointed out, like the additional optimizations for 16-bit calculations, are exactly the kind of customizations that Imagination would make for a core customer like Apple.

The reality is that Apple hasn't cut Imagination out at all. They signed a uniquely comprehensive and elaborate licensing agreement a while back that allowed them to be strategic partners in design.

Apple obviously needs to have a strong team of GPU engineers on their side to be able to work toward the customizations they want at the pace of implementation they've had. And, with the turmoil that's been happening at Imagination and the migration of many engineers from the Imgtec side of the project to the Apple side, Apple may someday truly acquire Imagination's GPU and video processor assets in whole. For now, though, Imagination is still collecting the licensing fees and royalties from their GPUs that go into iOS products, like they always have.
 
Have there been any tests/benchmarks run between "iOS" & "Android" GPUs based on PowerVR GPU IP, or are we merely reading a pack of suspicions grasped out of thin air?
 
Have there been any tests/benchmarks run between "iOS" & "Android" GPUs based on PowerVR GPU IP, or are we merely reading a pack of suspicions grasped out of thin air?
You can't compare GPU IP between Android and iOS with benchmarks, because iPhones run graphics through the Metal API.
 
Crucially, the author of the article can have no idea where the split in design is. Has IMG produced ad-hoc special shaders at Apple's request? Has Apple done the work in complete isolation from IMG? It is impossible to know unless you are involved.
 
You have insider stealth knowledge. The internet demands to know your sources! ;)

Apple has been hiring a lot of GPU hardware engineers, based on what I have seen on LinkedIn over the past couple of years. Even Apple recruiters openly search for candidates for GPU development and verification. I'm pretty sure somebody motivated enough could dig significant information out of LinkedIn.
 
I think there would be more to the shader cores than the register layout and conversion capabilities, if thread management and instruction issue count as components in the shader core category, so perhaps not all of that portion is custom?
On the fixed-function side, I think there are some patents Apple has made for texturing that might point to customizations outside of the programmable portion.

Would the implementation for this register file change from the standard Nx32 file to a 2Nx16 file?
I'm curious whether this could be seen if sufficiently clear die shots of the A8-and-later shader cores were compared against a standard implementation.

As a licensee of the GPU architecture, does this extend to Apple being able to modify instruction semantics?
The instruction compiler reference has some selection and broadcast behaviors handled on a per-register basis, so would that still apply if register N is now N and N+1 (FP16) registers, or is there some kind of renaming of the IDs?
Presumably, that would be where Apple's custom software would fit. Are there other points that make that distinguishable from the compiler using selects and broadcasts to keep 32-bit register halves separate, plus some tweaks to how the values are handled?

One possible test might be seeing if wavefront occupancy changes for different allocations of 16-bit registers versus 32-bit.
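Something like the following kernel pair could serve as that probe; this is only a sketch (hypothetical kernel names, and a real test would sweep the array size, since the compiler may repack or spill these values):

```cpp
#include <metal_stdlib>
using namespace metal;

// Two kernels with the same live-value count but different widths.
// If registers are allocated in 16-bit units, the half version should
// reach higher occupancy, observable e.g. as a higher
// maxTotalThreadsPerThreadgroup on the compiled pipeline state, or as
// a smaller latency-hiding penalty in a memory-bound loop.
kernel void pressure_f32(device float* out [[buffer(0)]],
                         uint tid [[thread_position_in_grid]])
{
    float acc[16];                        // 16 x 32-bit live values
    for (int i = 0; i < 16; ++i) acc[i] = float(tid + i);
    float s = 0.0f;
    for (int i = 0; i < 16; ++i) s += acc[i] * acc[i];
    out[tid] = s;
}

kernel void pressure_f16(device half* out [[buffer(0)]],
                         uint tid [[thread_position_in_grid]])
{
    half acc[16];                         // 16 x 16-bit live values
    for (int i = 0; i < 16; ++i) acc[i] = half(tid + i);
    half s = 0.0h;
    for (int i = 0; i < 16; ++i) s += acc[i] * acc[i];
    out[tid] = s;
}
```

If allocation really is in 16-bit units, the half version should hit the occupancy knee at roughly twice the array size of the float version.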

edit:
One comment about the implications of Apple's more vertical integration is that from the standpoint of architectural disclosure, Apple can more readily silo its architectural changes and information since it's not trying to sell those to anybody else. That paints a picture of a future that leaves even less to speculate about in threads like these, unfortunately.
 
I'm not sure how closely David Kanter paid attention to Imagination's announcement of Series 7XT Plus:
https://imgtec.com/blog/powervr-series7xt-plus-gpus-advanced-graphics-computer-vision/
... but I think he's confusing implementation customizations for architectural design direction.

Now, Apple tends to be very deliberate in their choice of words when it comes to marketing and presentations, and they made a point at the unveiling conference of claiming that the graphics part in the A10 was "a new GPU", a subtle nod to the fact that it's been customized beyond the current generation of PowerVR cores available to the general market. And while that doesn't stop it from still being a PowerVR GPU, it does represent a milestone this time versus previous years in terms of aggressiveness of implementation. It's not so unlike when Apple started taking credit for the design of their iPhone/iPad SoCs, naming them with their "A" Series monikers: that happened once they stopped using the fairly standard ARM11-generation CPUs and worked with Intrinsity on a customized implementation of a Cortex-A8 that could achieve higher clock rates (like Samsung's Hummingbird).
 
Posted some questions on RWT, didn't get any takers yet (maybe didn't help that it was in response to the wrong person...) - going to try here in case someone wants to address it:

Interesting read, thanks David.

I'm probably out of my depth, but honestly I feel like I could use more convincing on this. Maybe I'm missing some of the evidence? I'll indicate what seems evident from Apple.

In the datatypes section of the Advanced Metal Shader Optimization presentation:

"A8 and later GPUs use 16-bit register units
Use the smallest possible data type
• Fewer registers used → better occupancy
• Faster arithmetic → better ALU usage"

I could see how this description, at least in a broad sense, could apply to 2xFP16 SIMD. Particularly since this is meant to be a set of guidelines for optimization and not an architecture description. SIMD lanes could, under liberal interpretation, possibly be considered "register units." Stating that A8 and later use 16-bit units doesn't mean that they do so exclusively. Packed 16-bit SIMD will result in fewer registers used and faster arithmetic/better ALU usage when vectorization is successful, and on average it will be at least some of the time.
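To make the two readings concrete, here's a trivial MSL kernel (hypothetical, for illustration): under the 2xFP16-SIMD reading, each half2 below maps onto one packed 32-bit lane; under Kanter's scalar-16-bit reading, each component gets its own 16-bit register unit. Both readings are consistent with the quoted slide.

```cpp
#include <metal_stdlib>
using namespace metal;

kernel void fma_h2(device const half2* a [[buffer(0)]],
                   device const half2* b [[buffer(1)]],
                   device half2*       c [[buffer(2)]],
                   uint tid [[thread_position_in_grid]])
{
    // One fused multiply-add over a half2: a packed-FP16 ALU could
    // retire both components in one op; a scalar 16-bit ALU would
    // issue two cheap ops. Either way it beats float2 on registers.
    c[tid] = fma(a[tid], b[tid], c[tid]);
}
```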

If it does have bona fide scalar 16-bit register addressing for operations, it may be aliased on top of a 32-bit register file, similar to how NEON works in AArch32. In this case the register file would be extended to allow isolating the top/bottom half of a register for operation inputs, and updating only one half on writeback. Would this really drive a redesign of the entire shader core?
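For what that aliasing would look like at the bit level (illustration only, using MSL's as_type<> reinterpret cast; this is not a claim about Apple's hardware mechanism):

```cpp
#include <metal_stdlib>
using namespace metal;

// The low/high halves of one 32-bit word addressed as two 16-bit
// values, analogous to NEON's S-over-D register aliasing in AArch32.
inline half low_half(float f)  { return as_type<half2>(f).x; }
inline half high_half(float f) { return as_type<half2>(f).y; }
```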

The part about free conversion is here:

"For texture reads, interpolates, and math, use half when possible
• Not the texture format, the value returned from sample()
• Conversions are typically free, even between float and half"

In this particular context this seems to mean that conversion specifically from the texture sampler is free in most cases, and not just from float to half (hence why that case is called out as even being free). This suggests that at least some set of integer-to-float conversions are free, and I doubt this would be the case for ALU ops. And I don't see why they would use this section to talk about anything more than texture sampling.
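Concretely, the guideline is about declarations like this (a minimal hypothetical fragment shader): the texture can be stored as RGBA8 or any other format, but declaring it as texture2d<half> means sample() hands the shader half4 values, with the conversion done in the sampler path:

```cpp
#include <metal_stdlib>
using namespace metal;

struct VertexOut {
    float4 position [[position]];
    float2 uv;
};

fragment half4 shade(VertexOut in [[stage_in]],
                     texture2d<half> tex [[texture(0)]],
                     sampler s [[sampler(0)]])
{
    // sample() returns half4 here regardless of the storage format;
    // per the slides, this is the conversion that's "typically free".
    half4 c = tex.sample(s, in.uv);
    return c * 0.5h;
}
```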

IMG's optimization guide makes it clear that there's a conversion cost when changing precision between shader operations. As far as I can find, it does not specify a cost for sampling conversion or make any format recommendations on that basis. Including a variety of conversion capabilities as part of the fixed-function sampler hardware seems like fairly standard functionality, given that using floating-point sampler variables with integer-format textures is pretty desirable.

If there's more information in the video can someone please quote it? I can't seem to view it except in Safari or through some app...

EDIT: On further reading of IMG's Series 6 instruction set manual, it's apparent that it can perform 16-bit sub-addressing of 32-bit registers through source and destination element selectors. I couldn't find anything indicating SIMD, and in several places Series 6 and onwards are referred to as scalar (including in an explanation for why swizzling is inexpensive).

There are also pack and unpack instructions that should support converting from FP16 to FP32 and vice-versa. This doesn't mean that such conversions are free, but as I originally contended I don't see anything that suggests they are with Apple either.
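For reference, this is the kind of source-level conversion that would lower to those pack/unpack instructions (trivial MSL; whether each line costs an op or gets folded away is exactly the open question):

```cpp
half  h = 1.5h;
float f = float(h);  // unpack/convert up: FP16 -> FP32
half  g = half(f);   // pack/convert down: FP32 -> FP16
```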

So I'm not seeing any evidence that Apple's doing something differently from IMG.
 
It would be difficult not to believe Apple is doing part or all of the GPU themselves, based on the last few years of openly advertised jobs on LinkedIn. Here is a current snapshot:

https://www.linkedin.com/jobs/apple-gpu-jobs

Of course the design cycles are rather long, and it will take Apple quite some time to take full control of the GPU.
 
Hey guys just letting you know I merged the two threads together.

Also, while David makes some simplifications, he's right in the general sense that Apple GPUs are more custom than the public currently believes. Good to see the information finally getting out!
 
Crucially, the author of the article can have no idea where the split in design is. Has IMG produced ad-hoc special shaders at Apple's request? Has Apple done the work in complete isolation from IMG? It is impossible to know unless you are involved.
Apple isn't doing work in complete isolation from IMG, since Apple engineers talk to IMG, but Apple doesn't want their customizations provided to other licensees. The best way to ensure that is to make the changes yourself. Apple has hired enough GPU designers over the last 4-5 years that they are surely doing more than customizing the shader core.
 
Hey guys just letting you know I merged the two threads together.

Also, while David makes some simplifications, he's right in the general sense that Apple GPUs are more custom than the public currently believes. Good to see the information finally getting out!

Anandtech pointed at the possibility of a custom GPU design from Apple before he did, so with all due respect to David, it's not that he "found" something no one else did, or mentioned it "first". If there are any custom cores developed for Apple, it didn't just start with the A8; it started two generations before that with the SGX554, which, not coincidentally, wasn't licensed or implemented by any other of IMG's partners. I even recall hearing in the background back then that IMG didn't even have its own drivers developed for it.

http://www.anandtech.com/show/10685/the-iphone-7-and-iphone-7-plus-review/4

However, there's a wide gulf between IMG answering "how high" whenever a big and valuable partner like Apple tells them to jump, and labeling everything one's gut feeling points to as "Apple (mostly) custom cores". I don't know what's going on exactly either, and I won't exclude the possibility of Apple eventually developing a custom GPU core based on IMG GPU IP (if they haven't already), but the lack of real evidence, or even indications, for the claim that everything since the A8 is a custom core developed by Apple makes it sound like a stretch to me.

From David's writeup, last paragraph:

Going forward, Apple has three options. The status quo is licensing some or all of the fixed-function hardware from Imagination Technologies to complement internally designed components such as the shader core. In this scenario, Apple might eventually upgrade to a newer version of PowerVR, but presumably while negotiating a better deal for licensing and royalties. A second option is simply buying Imagination Technologies, although that would come with considerable extra baggage (e.g., the MIPS processor line) and Apple already passed on this opportunity earlier in 2016. On the other hand, Apple could continue to customize more and more of their GPU – eventually designing out Imagination Technologies. Ultimately Apple will have to decide whether they can do a better job on their own, but so far the company seems to be excellent at developing world-class expertise in new areas.

So what would it mean if IMG's GPU IP royalties per unit sold don't shrink in any justifiable fashion for the foreseeable future?

What's interesting is that Apple is now recruiting GPU engineers in London. It seems to be a brand new team.

Interesting; thank you.
 
I don't have any insider information about the mobile GPU industry. I am wondering why you believe this is a custom Apple GPU instead of simply PowerVR Series 8? Series 8 was already announced a while ago. Wouldn't it be reasonable to believe Apple is the first to deploy a full sized Series 8 GPU?
 
I don't have any insider information about the mobile GPU industry. I am wondering why you believe this is a custom Apple GPU instead of simply PowerVR Series 8? Series 8 was already announced a while ago. Wouldn't it be reasonable to believe Apple is the first to deploy a full sized Series 8 GPU?

The problem being that IMG has announced only Series 8XE (which is low-end GPU IP) so far: https://imgtec.com/powervr/graphics/series8xe/ There hasn't been any announcement for Series 8XT, although they have mentioned it in public presentations from what I recall. We don't even have a clue what 8XT would stand for (even in rough outline).

How come no one has bothered to compare A9 and A10 die shots so far? As a layman I might be bad with die shots, but from what I can tell I haven't seen any major differences in their respective GPU blocks (it's 6 clusters in both cases).
 