Apple A8 and A8X

It doesn't take too much thinking to understand that a lot of the rendering in a smartphone device suits reduced precision. It's a completely different problem space to desktop GPUs, which are very wasteful in terms of power for a lot of what a smartphone would be asked to do. Being able to throw tens or hundreds of watts at the problem affords you design decisions like FP32 everywhere, but I assure you that is not the case when pJ/pixel is the metric you care about.

Maxwell's design choices fundamentally will not play into the hands of absolute lowest power in certain, incredibly common workloads, no matter how power efficient the core is. You can choose to believe the "FP32 is all that's needed" trope, but the very basics of rendering and how arithmetic logic works all argue otherwise.
 
Remember that I am talking specifically about pixel rendering precision in games. The fact remains that reduced FP16 pixel rendering precision is arguably a significant tradeoff in visual fidelity, and trying to pick and choose when and when not to use this reduced precision for pixel rendering in games is no trivial task either.
 
That's my point. FP16 native computation is not a tradeoff in quality for the kinds of rendering I'm talking about (RGB or sRGB UI composition, mainly). For most of the work a smartphone will ever do, ~10bpp is enough.
 
There is a significant difference in pixel rendering quality between FP20 and FP16 precision (and even FP20 is nothing to be proud of in the first place), so it is debatable that FP16 is "perfectly good enough" for pixel rendering precision. Anyway, modern day ultra mobile GPU architectures with increasingly higher performance and increasingly higher power efficiency should not have to resort to heavily reduced pixel rendering precision by default (Maxwell sure as hell doesn't). That is just my opinion on the subject.

Here's a snip out of the developer performance recommendations for PowerVR:

6.4. Demystifying Precision
PowerVR hardware is designed with support for the multiple precision features of graphics APIs such as OpenGL ES 2.0/3.0. Three precision modifiers are included in the API spec for OpenGL ES 2.0 onwards, ‘mediump’, ‘highp’, and ‘lowp’; lower precision calculations can be performed faster, but need to be used carefully to avoid trouble with visible artefacts being introduced. The best method of arriving at the right precision for a given value is to begin with ‘lowp’ or ‘mediump’ for everything (except samplers) then increase the precision of specific variables until the visual output is as desired.

6.4.1. Highp
Float variables with the ‘highp’ precision modifier will be represented as 32 bit floating point values; this precision should be used for all vertex position calculations, including world, view, and projection matrices, as well as any bone matrices used for skinning where the precision, or range, of ‘mediump’ is not sufficient. It should also be used for any scalar calculations that use complex built-in functions such as ‘sin’, ‘cos’, ‘pow’, ‘log’, etc.

6.4.2. Mediump
Variables declared with the ‘mediump’ modifier are represented as 16 bit floating point values covering the range [-65520, 65520]. This precision level typically offers a performance improvement over ‘highp’, and should be considered wherever ‘highp’ would normally be used (providing the precision is sufficient and maximum and minimum values will not be overflowed).

6.4.3. Lowp (Series 5/5XT Only)
A variable declared with the ‘lowp’ modifier will use a 10 bit fixed point format, allowing values in the range [-2, 2] to be represented to a precision of 1 / 256. This precision is useful for representing colours and any data read from low precision textures, such as normals from a normal map. Care must be taken not to overflow the maximum or minimum value of ‘lowp’ precision, especially with intermediate results.
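The ‘mediump’ and ‘lowp’ limits quoted above can be checked numerically. The sketch below uses Python's `struct` binary16 packing as a stand-in for ‘mediump’ FP16, and a hypothetical quantiser (my own model, not IMG's) for the 10-bit ‘lowp’ fixed-point format. It also shows why the guide can quote 65520 as the boundary even though the largest finite half-precision value is 65504: 65520 is where rounding first overflows.

```python
import struct

def to_half(x: float) -> float:
    """Round-trip x through IEEE-754 binary16 (GLSL ES 'mediump')."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_lowp(x: float) -> float:
    """Hypothetical model of 'lowp': 10-bit signed fixed point with a
    step of 1/256, clamped to the two's-complement range [-512/256, 511/256]."""
    q = max(-512, min(511, round(x * 256)))
    return q / 256

# 65504 is the largest finite binary16 value; 65519 still rounds down
# to it, while 65520 is the first value that overflows -- hence the
# [-65520, 65520] boundary quoted in the IMG guide.
print(to_half(65504.0))      # 65504.0
print(to_half(65519.0))      # 65504.0
try:
    to_half(65520.0)
except OverflowError:
    print("65520 overflows half precision")

# 'lowp' represents multiples of 1/256 in roughly [-2, 2); anything
# outside is clamped here, which is the overflow hazard the guide warns about.
print(to_lowp(0.5))          # 0.5 (exactly representable)
print(to_lowp(3.0))          # clamps to 511/256 ~= 1.996
print(to_lowp(-3.0))         # clamps to -2.0
```

Note that real hardware may saturate or wrap differently on ‘lowp’ overflow; the clamp above is just one plausible behaviour.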

I was under the impression that mediump doesn't deliver any performance benefit, but in any case these are the cases IMG recommends FP32, FP16 or INT10 for.

You can read the entire thing here: https://github.com/burningsun/pecke.../参考资料/PowerVR.Performance Recommendations.pdf to avoid creating an account at IMG's homesite.
 
As said in the recommendations for developers: "lower precision calculations can be performed faster, but need to be used carefully to avoid trouble with visible artefacts being introduced"
 
And as I'm loath to repeat, you don't have to be careful with UI composition, because you know FP16 is more than enough for that. You have to understand that it's the most common workload in the world on these devices (and I'm talking 99%+ of all the pixels that all the smartphones and tablets in the world will ever render using current display technologies).

That the gates are then also usable for power and performance advantages in games is a nice side benefit.
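The "FP16 is more than enough" claim can be made concrete for 8-bit-per-channel UI composition: storing a normalised channel value in half precision and converting back recovers the original byte exactly. A quick check, again using `struct`'s binary16 packing as a stand-in for FP16 shader arithmetic:

```python
import struct

def to_half(x: float) -> float:
    """Round-trip x through IEEE-754 binary16 (roughly GLSL ES 'mediump')."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# For every 8-bit sRGB channel value c, quantise c/255 to half precision
# and convert back: the worst-case error stays far below half an 8-bit
# step (1/510), so the original byte is always recovered exactly.
worst_error = max(abs(to_half(c / 255) - c / 255) for c in range(256))
recovered = all(round(to_half(c / 255) * 255) == c for c in range(256))
print(worst_error)   # well under 1/510
print(recovered)     # True
```

This only covers simple store-and-composite paths; long chains of arithmetic can still accumulate error, which is exactly the "use with care" caveat in the IMG guide.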
 
Most modern day GPU's built "mobile first" are using some form of lossless delta color compression to dramatically reduce power consumption with UI elements.
 
Most modern day GPU's built "mobile first" are using some form of lossless delta color compression to dramatically reduce power consumption with UI elements.
I don't think that's related to ALU precision.

Conservatively applying mediump in those cases where you perform simple calculations on colour data is easy. Depending on the kind of application that could be just 5% of your shader code, or it could be 90%.
 
As said in the recommendations for developers: "lower precision calculations can be performed faster, but need to be used carefully to avoid trouble with visible artefacts being introduced"


Do you mistake 16-bit color for FP16?

but between "65536" and "1.844674407 × 10^19" there is a "rather wide" gap
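For reference, the two numbers in the post are exactly the sizes of the 16-bit and 64-bit integer ranges:

```python
# 65536 is 2**16; the second figure is 2**64 written in scientific notation.
print(2 ** 16)           # 65536
print(float(2 ** 64))    # 1.8446744073709552e+19
```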
 
I'm surprised that no one has posted this yet:

[AnandTech charts: GFXBench performance and battery-life results, ending with the iPhone 6 GFXBench rundown]


Battery life isn't exactly good, but there's no performance degradation at all, such that the iPhone 6 finishes the test with higher performance than anything else (albeit onscreen at a relatively low resolution), Shield Tablet included, and with longer battery life than the latter.

Not bad for a phone, and a relatively small one at that (compared to, say, the Galaxy S5).
 
I've read the review but I only now noticed the LG G3 scores; if there aren't any mistakes in the relevant measurements, it shows roughly 48% performance degradation, which is completely unacceptable.
 
I'm surprised that no one has posted this yet:



Battery life isn't exactly good, but there's no performance degradation at all, such that the iPhone 6 finishes the test with higher performance than anything else (albeit onscreen at a relatively low resolution), Shield Tablet included, and with longer battery life than the latter.

Not bad for a phone, and a relatively small one at that (compared to, say, the Galaxy S5).

My take on the performance of both A7 & A8 is that when you take into account each device and its device-specific resolution (i.e. the Onscreen tests), they are very fast and (minus physics) perform as well as, if not better than, the competition on the market at the moment. That is some mean feat considering the A7 is now 13 months old and still up there at the top, beating the standard Galaxy S5 by some distance.
 
I was under the impression that mediump doesn't deliver any performance benefit,
From my testing on iOS hardware it's about 20% quicker than highp, so it's definitely noticeable
 
I've read the review but I only now noticed the LG G3 scores; if there aren't any mistakes in the relevant measurements it has a rough 48% performance degradation which is completely unacceptable.
There's no mistake. The LG G3 has an inconsistent, sinusoidal performance pattern. In this case it was at around the bottom of its curve at the end of its battery life on this test, so we more or less captured it at its worst.

Josh tells me that the average is probably closer to 15fps, but again that's an average over a very long period of time since the G3 takes a while to transition from high to low.
 
From my testing on iOS hardware it's about 20% quicker than highp, so it's definitely noticeable


Thanks zed; very useful tidbit :)

There's no mistake. The LG G3 has an inconsistent, sinusoidal performance pattern. In this case it was at around the bottom of its curve at the end of its battery life on this test, so we more or less captured it at its worst.

Josh tells me that the average is probably closer to 15fps, but again that's an average over a very long period of time since the G3 takes a while to transition from high to low.

Thanks Ryan; it's enough just to read about such inconsistent performance patterns.
 
Battery life isn't exactly good, but there's no performance degradation at all, such that the iPhone 6 finishes the test with higher performance than anything else (albeit onscreen at a relatively low resolution), Shield Tablet included, and with longer battery life than the latter.

No doubt a great "performance degradation" result, but not really comparable to Shield tablet either in terms of performance and platform power consumption. Shield tablet renders Onscreen at 1920x1200 [1200p] resolution, with significantly higher render precision quality, while maintaining a very stable average fps of ~ 56fps for > 90% of the looped T-Rex Onscreen test run (> 100 continuously looped test runs until the battery % capacity is extremely low). So if you compare that fps result to the iPhone 6+ fps result which is rendered at a more similar (albeit slightly higher) resolution, Shield tablet still has 1.75x higher delivered performance. Of course, this comparison is academic more than anything for obvious reasons.
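As a sanity check on the 1.75x figure: the iPhone 6 Plus frame rate is implied by the post rather than quoted, so the 32 fps below is an inference from the stated numbers, not a measurement.

```python
shield_fps = 56     # sustained T-Rex Onscreen average quoted above
ratio = 1.75        # claimed Shield Tablet advantage
print(shield_fps / ratio)   # 32.0 -- implied iPhone 6 Plus onscreen fps
```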

I think if you look at S805 Adreno 420 performance degradation results, there will be significant performance throttling compared to A8 GX6450. That said, A8 has the advantage of using a more advanced 20nm fab. process node too compared to 28nm HPM for S805.
 
No doubt a good "performance degradation" result, but not really comparable to Shield tablet either in terms of performance and power consumption. Shield tablet renders Onscreen at 1920x1200 [1200p] resolution, with significantly higher render precision quality, while maintaining a very stable average fps of ~ 56fps for > 90% of the looped test run (> 100 continuously looped test runs until the battery % capacity is extremely low). So if you compare that fps result to the iPhone 6+ fps result which is rendered at a more similar (albeit slightly higher) resolution, Shield tablet still has 1.75x higher delivered performance. Of course, this comparison is academic more than anything for obvious reasons.

Now try the same exercise but this time with an apples-to-apples comparison, i.e. a K1 in a hypothetical smartphone, and THEN we'll see how high GPU performance could be there, let alone how much of it is sustained. Not that there's much chance we'll see any, but hey, you never know. Tablets and smartphones aren't comparable for obvious reasons.

WTF has the precision test to do with T-Rex exactly?

I think if you look at S805 Adreno 420 performance degradation results, there will be significant performance throttling compared to A8 GX6450. That said, A8 has the advantage of using a more advanced 20nm fab. process node too compared to 28nm HPM for S805.
Do you have any comparable data from any other device carrying a S805? If not, I'd say it's a wee bit daring to generalise from a possible LG G3 implementation to a multitude of devices.

Further to that, the process used is probably last in line for any comparisons or overgeneralisations; Apple doesn't use large batteries, and that has hardly changed for iPhones if you consider how much larger the displays are compared to the iPhone 5S.
 
There are some results for Adreno 420 here: http://www.anandtech.com/show/8314/galaxy-s5-ltea-battery-life-performance

The Galaxy S5 LTE-A does render at an extremely high Onscreen resolution, so the Onscreen fps is relatively low. And the GPU performance degradation (relative to peak short term performance) is quite severe on all Galaxy S5 variants, whether using Adreno 330 or Adreno 420. In fact, the HTC One M8 (with Adreno 330) throttles quite a bit less than the Samsung Galaxy S5, but it still does have some throttling nevertheless.
 