According to Geekbench, the iPhone 8 gets 2800 single-thread and 3800 multi-thread in Low Power Mode. It seems that the A11 uses one Monsoon and one Mistral core in Low Power Mode. Would anyone like to run vfpbench in Low Power Mode?
 
Another app, CPU DasherX, can be used to check basic CPU information.
 

Checked iPhone 7 Plus CPU information with CPU DasherX; the results are close to those from Vitaly Vidmirov's app.
[CPU DasherX screenshots, normal mode]

Low Power Mode:
[CPU DasherX screenshots, Low Power Mode]
 
A11 makes many significant performance improvements to the GPU. It has up to 2x the math performance for tasks such as computer vision, image processing, and machine learning. But that is not the only area of performance improvement. Let us review the improved performance and capabilities of the A11 GPU: we doubled the F16 math and texture filtering rate per clock cycle compared to the A10 GPU. For further details about these features, please check the Metal 2 documentation.
Please note: on A11, using F16 data types in your shaders whenever possible makes a much larger performance difference.
Source: link (transcript section)
If the A10 GPU's 6-cluster FP16 rate equals that of a 6-cluster PowerVR 7XT(+), i.e. 768 FP16 FLOPs/clock and 12 texels/clock, then the A11 GPU would have 1536 FP16 FLOPs/clock and 24 texels/clock, the same as the A10X GPU but at a lower clock, since the texture fill rates are not the same based on the GFXBench Texturing offscreen maximum values (17002 vs. 21261 MTexels/s).
There is no information regarding the FP32:FP16 ratio. My guess is the A11 GPU clock is around 800 MHz.
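As a rough sanity check on that clock guess, scaling by the fill-rate ratio above works out to about 800 MHz if one assumes the A10X GPU runs near 1 GHz (that A10X clock is an assumption, not a published figure):

Code:
#include <stdio.h>

/* Scale an assumed A10X GPU clock by the ratio of the GFXBench texturing
   offscreen results quoted above (both GPUs taken as 24 texels/clock). */
int main(void)
{
    double a11_fill  = 17002.0;   /* MTexels/s */
    double a10x_fill = 21261.0;   /* MTexels/s */
    double a10x_clk  = 1000.0;    /* MHz, assumed */

    printf("A11 GPU clock estimate: ~%.0f MHz\n", a10x_clk * a11_fill / a10x_fill);
    return 0;
}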
 
Finally I've bought iPhone8, so benchmark iteration time will be shorter.
I'm still struggling with stable autodetect. Threads are jumping between cores like crazy.

Repo is updated with WIP commit, just in case.
~6 seconds measurement time should be enough even with high thread count (otherwise zeroes are displayed)
"id" is a core signature based on issue width.

Updated frequency numbers in MHz for 1-6 threads:
2376*
2304 2304
2304 2304 1680
2304 2304 1572 1572
2304 2304 1572 1572 1572
2304 2304 1572 1572 1572 1572
(measured on iPhone8, iOS 11.1)

* Single thread frequency result is significantly affected by measurement loop iteration count (CALIB_REPEAT)
2376 ±0 variability with a ~1/2 ms loop
2385 ±5 variability with a ~1/16 ms loop
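For illustration, a minimal sketch of this kind of clock estimation loop (a reconstruction in C with inline AArch64 asm, not the actual repo code; only CALIB_REPEAT mirrors a name mentioned above):

Code:
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Iterations per timed loop; 65536 * 16 adds is ~0.4-0.5 ms at ~2.4 GHz,
   i.e. roughly the "~1/2 msec loop" mentioned above. */
#define CALIB_REPEAT (1 << 16)

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Time a chain of dependent 1-cycle adds: frequency ~= adds executed / elapsed. */
static double estimate_mhz(void)
{
    uint64_t x = 0;
    double t0 = now_sec();
    for (long i = 0; i < CALIB_REPEAT; i++) {
        __asm__ volatile(           /* 16 dependent adds the compiler can't fold away */
            "add %0, %0, #1\n\t" "add %0, %0, #1\n\t"
            "add %0, %0, #1\n\t" "add %0, %0, #1\n\t"
            "add %0, %0, #1\n\t" "add %0, %0, #1\n\t"
            "add %0, %0, #1\n\t" "add %0, %0, #1\n\t"
            "add %0, %0, #1\n\t" "add %0, %0, #1\n\t"
            "add %0, %0, #1\n\t" "add %0, %0, #1\n\t"
            "add %0, %0, #1\n\t" "add %0, %0, #1\n\t"
            "add %0, %0, #1\n\t" "add %0, %0, #1\n\t"
            : "+r"(x));
    }
    double t1 = now_sec();
    (void)x;
    return 16.0 * CALIB_REPEAT / (t1 - t0) / 1e6;
}

int main(void)
{
    printf("~%.0f MHz\n", estimate_mhz());
    return 0;
}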
 
Could you run some GFXBench low-level and high-level offscreen tests with Low Power Mode on? Since the iPhone 6s, Apple SoCs throttle a lot on graphics: for example, the iPhone 7 Plus stabilizes at only 67% of its initial score on the Manhattan 3.1 long-term benchmark. With Low Power Mode on, the iPhone 7 Plus produces a stable result from the beginning.
 
Is there any information about the A11 GPU? ALU count, texture unit count, ROP count, frequency, memory bandwidth, and supported shader model?
 
Yeah, it should be eDP or DP. See the A10X teardown at iFixit. In an Apple TV they just hook the SoC up to an external DP-to-HDMI TX IC.

A bit surprised there is a USB PHY on it as well, with its 5V I/O.

And from the iFixit teardown, there might be a PCIe lane somewhere as well...

Is that even sensible in a 10nm process? Of course you could do it, just because you can, but it's not very area efficient. Just ULPI with an off-the-shelf PHY is more efficient.
 
Geekbench battery test: iPhone 8 Plus vs. iPhone 7 Plus

                  Battery Runtime    Battery Score    Battery Level
iPhone 8 Plus     16:12:20           9723             100% → 1%
iPhone 7 Plus     11:11:40           6716             100% → 0%


Any idea how to interpret that score?
Source:
iPhone 7 Plus
iPhone 8 Plus
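Judging purely from these two runs, the score seems to track runtime: 16:12:20 is about 972.3 minutes and 11:11:40 is about 671.7 minutes, i.e. roughly runtime in minutes × 10. That is only an observation fitted to these two numbers, not a documented formula:

Code:
#include <stdio.h>

/* Quick cross-check: convert the two runtimes to minutes and compare
   "minutes x 10" against the reported battery scores. */
int main(void)
{
    double mins_8plus = 16 * 60 + 12 + 20 / 60.0;   /* 16:12:20 -> 972.33 min */
    double mins_7plus = 11 * 60 + 11 + 40 / 60.0;   /* 11:11:40 -> 671.67 min */

    printf("iPhone 8 Plus: %.1f min -> ~%d (reported 9723)\n", mins_8plus, (int)(mins_8plus * 10));
    printf("iPhone 7 Plus: %.1f min -> ~%d (reported 6716)\n", mins_7plus, (int)(mins_7plus * 10));
    return 0;
}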
 
Well, just from looking at the numbers it seems to have about 5 hours of additional battery life... which would be impressive, considering the (much) faster SoC and, allegedly, a smaller-capacity battery to begin with.
 
I finally uploaded my iOS Spectre attack proof-of-concept to GitHub, after all the hype died down.
Actually it was done a couple of weeks ago, but I was too lazy to prepare and submit it :-/
https://github.com/vvid/ios-spectre-poc

There is also disabled code to check for meltdown, but it doesn't work (at least with Spectre-V1-like speculation).
I.e. lines 835/849
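For context, the canonical Spectre-V1 bounds-check-bypass gadget from the original paper looks roughly like this; the PoC in the repo is its own implementation and may differ in detail:

Code:
#include <stddef.h>
#include <stdint.h>

uint8_t array1[160];
uint8_t array2[256 * 512];
unsigned int array1_size = 16;
volatile uint8_t temp;            /* keeps the probe load from being optimized away */

void victim_function(size_t x)
{
    if (x < array1_size) {
        /* Trained to be taken; under misspeculation x can index past array1,
           and the dependent load below leaves a secret-dependent cache line. */
        temp &= array2[array1[x] * 512];
    }
}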
 
In the latest CPU DasherX benchmark update, the iPhone X A11 GPU's FP32 result looks weaker than the A9 GPU's, but close to the A9X GPU's FP16 result.
Seems like this is a bullshit benchmark.

For example, they show 297 GFLOPS for the A11.
With NEON, each A11 Monsoon core can theoretically achieve 57 GFLOPS (presumably 3 × 128-bit FMA pipes × 8 FP32 FLOPs per pipe per clock × 2.376 GHz ≈ 57).
I think they just multiplied the single-core measured result by 6.

The only adequate peak-FLOPS benchmarking utility in the App Store is vfpbench.
http://dench.flatlib.jp/app/vfpbench

Measured numbers are a bit lower: 51 GFLOPS single-core and 197 GFLOPS for six cores, with code like

Code:
op q0, q12, q13
op q1, q12, q13
op q2, q12, q13
op q3, q12, q13
op q4, q12, q13
op q5, q12, q13
op q6, q12, q13
op q7, q12, q13
op q8, q12, q13
op q9, q12, q13
op q10, q12, q13
op q11, q12, q13
where op is fmla.4s
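For comparison, roughly the same measurement loop written with NEON intrinsics in C (a sketch only; it assumes the compiler keeps all twelve accumulators in q registers and emits one fmla.4s per statement):

Code:
#include <arm_neon.h>

/* 12 independent accumulators -> 12 independent fmla.4s per iteration (96 FP32 FLOPs) */
float peak_fp32_kernel(float32x4_t s0, float32x4_t s1, long iters)
{
    float32x4_t acc[12];
    for (int i = 0; i < 12; i++)
        acc[i] = vdupq_n_f32(0.0f);

    for (long n = 0; n < iters; n++)
        for (int i = 0; i < 12; i++)
            acc[i] = vfmaq_f32(acc[i], s0, s1);   /* acc[i] += s0 * s1 */

    /* fold the accumulators so the work can't be optimized away */
    float32x4_t sum = acc[0];
    for (int i = 1; i < 12; i++)
        sum = vaddq_f32(sum, acc[i]);
    return vaddvq_f32(sum);
}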

With the 6-core layout, the Monsoon clock is ~3% lower (2304 vs. 2376 MHz), so each Monsoon contributes ~49.5 GFLOPS instead of the measured 51.
That leaves 197 − 2 × 49.5 = 98 GFLOPS for all 4 Mistral cores.
98000 (MFLOPS) / 4 (cores) / 1572 (MHz) = 15.58 FLOPs per cycle.
That means each Mistral has two 128-bit FMA pipelines. Too good, given its size.

So, I don't believe A11 GPU FP32 is slower.
 

Maybe the code is still under development, lol, but at least the A10 and older SoCs give GFLOPS numbers similar to vfpbench.

I asked the developer about the lower score of the iPhone X GPU, and he responded:
[screenshot of the developer's reply]
 
For GPU GFLOPS, I think the GFXBench Metal 3.0.3 ALU test can be an alternative way to estimate the A11 GPU's FP32 GFLOPS, because my iPhone 7 Plus ALU test score is near the DasherX score (around 300). But you need the Charles Proxy trick to download that old version of GFXBench Metal.
 
Any test suite which has a single "ALU" test doesn't seem very reliable to me. It's frankly complete nonsense to evaluate shader core performance with a *single* microbenchmark as if that was some kind of end-all-be-all of ALU performance. Just the fact someone is doing that indicates to me they probably don't understand how to test ALU performance in a useful way, and therefore I'd be very reluctant to believe any results.

The original GFXBench 3.0 ALU1 test, for example, is basically testing nothing but trigonometric performance... GPUs with slower sin/cos hardware (slower because those ops are bloody useless in most real workloads) score significantly worse on it. The GFXBench 3.1 ALU2 test is slightly better, but still just one biased datapoint amongst others (e.g. branch and divergence efficiency matters quite a bit more than in typical workloads).

I honestly don't remember anything about any analysis of the A11, I've completely erased that from my brain for lack of interest... which tells me that it's probably *not* 2x slower than A10 for FP32 FMAs, because I'd hopefully have remembered that :p

A few random thoughts on what might be going wrong: maybe they're using a very low resolution with lots of blended layers, and it's not enough tiles/pixels to fill all the shader cores? or maybe it's just one *massive* shader and the A11 either has lots of instruction cache misses, or it's a tiny loop with many iterations and the A11 compiler doesn't unroll it properly while the A10 did?

There's literally a billion things that could go wrong with any microbenchmark, which is why if a result seems anomalous, you really want to iterate and modify the test to understand what's going on - or at least have a lot of knowledge of the trade-offs so you intuitively create a test which is unlikely to hit that kind of problem in the first place, and not a lot of people have that knowledge unfortunately. Kishonti for example didn't do that very well - they often relied on all the HW vendors telling them everything they did wrong in the beta versions until it kinda sorta worked in the final release...
 
Why is the A11 worse on 3DMark but better on GFXBench?

Sorry, I forgot to say: worse than the SD835.
 
The A10 seems to have some real advantages over the A11 in performance, as shown in numerous speed tests, like the intensive video rendering/export match-up featured at 10:35 in the following video:

I believe iMovie on iOS makes good use of the GPU to accelerate this render out process, so I’m not surprised to see Apple’s first attempt at a GPU distinct from PowerVR DNA (as much as it is considering they still follow a solid TBDR path) not match the refinement and effectiveness of the PowerVR recipe overall.

Apple should’ve just bought the company at whatever high price was being asked. Having all the top designers they could get, along with the refined IP already designed, is invaluable.
 