Kishonti GFXbench

The A5's marginal struggle in triangle performance against the newer competition appears to be tied to the higher image resolutions; I'm guessing it's a consequence of USC balancing.
 
http://www.anandtech.com/show/6121/glbenchmark-25-performance

Anandtech has results up for a variety of Android SoCs.

EDIT:
http://www.anandtech.com/show/6126/glbenchmark-25-performance-on-ios-and-android-devices

iOS results now up too. Looks like the SGX543MP2 struggles in the triangle tests against the Tegra 3 and Adreno 225.

So they compared Tegra3 to Tegra3....wow :(
I didn't even bother to look at the second page.

The comparison of the A5 with the A5X (second link) was more interesting. The A5X is nearly always roughly twice as fast as the A5. Therefore PowerVR seems to have a really scalable Multi-GPU implementation. Nice!
 
The A5's marginal struggle in triangle performance against the newer competition appears to be tied to the higher image resolutions; I'm guessing it's a consequence of USC balancing.

Could be one of the factors; keep in mind that the ULP GF in T30 is clocked at 520MHz, which also applies to its Vec4 VS unit in a specific, singled-out synthetic test case.

The whole 2.5 affair (also accounting for Arun's comments above) smells like quite an ALU-intensive benchmark, and that's probably the reason why the 543MP2 ranks so close to the Adreno 225. Oversimplifying, both essentially have 8 Vec4 ALUs; the iPad2's MP2 comes out at a good advantage considering it's clocked at 250MHz, which is significantly lower than what the 225 is clocked at and, of course, the Mali400MP4 in the 32nm Exynos.
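To put rough numbers on that, a minimal back-of-the-envelope sketch. The 8 Vec4 ALU counts come from the comparison above; the assumption that each ALU retires one 4-wide MADD per clock and the ~400MHz figure for the Adreno 225 are mine and may be off.

```python
# Peak-flops sketch for the "8 Vec4 ALUs" comparison above.
# Assumptions (mine): each Vec4 ALU issues one 4-wide MADD per clock
# (2 flops per lane), and the Adreno 225 runs at the commonly quoted ~400MHz.

def vec4_madd_gflops(num_alus: int, clock_mhz: float, lanes: int = 4, flops_per_lane: int = 2) -> float:
    """Peak GFLOPS assuming every ALU retires a full-width MADD each clock."""
    return num_alus * lanes * flops_per_lane * clock_mhz / 1000.0

print(vec4_madd_gflops(8, 250))  # SGX543MP2 @ 250MHz -> 16.0 GFLOPS
print(vec4_madd_gflops(8, 400))  # Adreno 225 @ ~400MHz -> 25.6 GFLOPS
```

If those peaks are anywhere near right, landing in the same ranking bracket despite the lower clock is indeed a decent showing for the MP2.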

Execution is still a question mark, but Intel's and TI's 544MP2s clocked at ~532MHz should fare a tad better than the MP4 in the iPad3. I don't even want to imagine what kind of performance a "simple" Rogue GC6200 could deliver in that one.

I'm now very curious about two solutions that haven't appeared in the 2.5 results yet: the Adreno 320 and the SGX544. My gut feeling is that the first will, give or take, break even with the iPad3, and the latter will probably break even with T30.

The comparison of the A5 with the A5X (second link) was more interesting. The A5X is nearly always roughly twice as fast as the A5. Therefore PowerVR seems to have a really scalable Multi-GPU implementation. Nice!

Performance also scales as expected according to clockspeed between iPad2 (GPU@250MHz) and iPhone4S (GPU@200MHz).
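For what it's worth, a trivial sketch of the expected delta, assuming both devices carry the same SGX543MP2 and performance scales linearly with clock (my simplification):

```python
# Expected iPad 2 vs. iPhone 4S delta if the SGX543MP2 scales linearly with clock.
ipad2_gpu_mhz = 250
iphone4s_gpu_mhz = 200
print(ipad2_gpu_mhz / iphone4s_gpu_mhz)  # 1.25 -> ~25% advantage expected for the iPad 2
```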
 
Therefore PowerVR seems to have a really scalable Multi-GPU implementation. Nice!
PowerVR's scalable core count is as much multi-GPU as a Radeon HD 7970 is a 7770 in tri-CF. Which is to say it's 1 GPU with multiple functional units, a far different beast than discrete GPUs.
 
PowerVR's scalable core count is as much multi-GPU as a Radeon HD 7970 is a 7770 in tri-CF. Which is to say it's 1 GPU with multiple functional units, a far different beast than discrete GPUs.
That's not completely true - there are several fundamental differences, including separate triangle setup/rasterisation units (only introduced in Fermi - you could argue that the GTX480 had 4 cores), but also all the complexity inherent in multiple cores each having their own binning unit which can write triangles to memory, even though these triangles must ultimately still be rasterised/rendered in order.

If you wanted to make a multi-GPU comparison, it's probably closest to SFR, but without having to process the geometry multiple times and with each GPU able to do as much geometry processing as it wants. This obviously relies on sharing the same memory, so it's impossible with multiple chips today, but in the future who knows, with technologies like TSVs etc...
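A toy sketch of that distinction, purely illustrative and not IMG's actual implementation: classic SFR re-runs the geometry on every GPU, whereas a shared binning step processes it once and then hands finished tiles to whichever core is free.

```python
# Toy model (not IMG's implementation): geometry cost of classic SFR vs. a
# shared binning step that writes tiles once into memory visible to all cores.

def classic_sfr_geometry_work(num_triangles: int, num_gpus: int) -> int:
    # Each GPU transforms/sets up the full triangle list and then discards
    # whatever falls outside its own static screen region.
    return num_triangles * num_gpus

def shared_binning_geometry_work(num_triangles: int, num_cores: int) -> int:
    # Geometry is processed once (split across cores however the scheduler
    # likes) and binned into a shared parameter buffer; rasterisation work is
    # then handed out tile by tile.
    return num_triangles

triangles, cores = 100_000, 4
print(classic_sfr_geometry_work(triangles, cores))     # 400000 triangle setups
print(shared_binning_geometry_work(triangles, cores))  # 100000 triangle setups
```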
 
Am I right in thinking that Adreno seems to excel at heavy shader workloads? According to NenaMark and GLBenchmark 2.5, the Adreno 225 performs much, much better than it does in GLBenchmark 2.1...

Whereas the PowerVR SGX Series5 seems to be a good all-rounder... but doesn't seem to be as quick per unit of execution resources as Adreno (with decent drivers) on more advanced, shader-oriented workloads.

So, looking at the disparity in Adreno 225 scores between GLBenchmark 2.1 and 2.5, could we also expect a similar trend with the Adreno 320, where its performance is far better than in the Egypt 720p tests?

If so, we could be in for a shock when the 2.5 scores debut... it has a good chance to overtake the A5X, despite the A5X having quad-channel memory and being a tablet chip, and despite Adreno having no access to LPDDR3 as yet and likely running on immature drivers...
http://www.anandtech.com/show/6126/glbenchmark-25-performance-on-ios-and-android-devices/2
 
If so, we could be in for a shock when the 2.5 scores debut... it has a good chance to overtake the A5X...
I'd personally be in for a shock if the Adreno 320 wasn't at least in the same ballpark :p Remember that the Adreno 320 is supposedly 4 TMUs @ 400MHz, or 1600MPix/s peak, while the A5X is 2000MPix/s according to GLBenchmark. At the same time, the Adreno 225 already had 2x the raw GFLOPS per TMU compared to SGX, and you'd expect efficiency to have improved further in Adreno 320. So 0.8x the fillrate and >1.6x the GFLOPS in a flops-heavy benchmark that does very good front-to-back sorting (so TBDR helps, but not as much as in typical workloads) - yeah, I'd be disappointed if the Adreno 320 wasn't fairly competitive, speaking strictly for myself.
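Spelling out those ratios (the 4 TMUs @ 400MHz figure and the 2x-flops-per-TMU figure are the assumptions stated above, not confirmed specs):

```python
# Fillrate and flops ratios from the post above, made explicit.
adreno320_fill_mpix = 4 * 400             # assumed 4 TMUs @ 400MHz -> 1600 MPix/s
a5x_fill_mpix = 2000                      # A5X peak as reported by GLBenchmark
fill_ratio = adreno320_fill_mpix / a5x_fill_mpix
print(fill_ratio)                         # 0.8 -> ~0.8x the fillrate

flops_per_tmu_ratio = 2.0                 # Adreno 225 vs. SGX, per the post
flops_ratio = flops_per_tmu_ratio * fill_ratio
print(flops_ratio)                        # 1.6 -> ">1.6x" once Adreno 320's efficiency gains are added
```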

despite the A5X having quad-channel memory and being a tablet chip...
And being on 45nm versus 28nm for the Adreno 320 - today's tablet chips are tomorrow's smartphone chips... :)
 
Ha, yeah, true... BUT even on 32nm HKMG I doubt an iPhone 5 A5X would have the same clocks (if we get an A5X at all?).

Something to consider: if Qualcomm got serious with their commitment to development and pulled an NVIDIA, utilising those four Kraits, OpenCL and the Halti APIs... then games could look astounding, and would be on another level completely compared to the iPad 3. :)
 
I think he was trying to say that PVR scalable core count is not the same thing as Crossfire/SLI multi-GPU, which is true, but the analogy between 7970 vs. 7770 Trifire may be a bit off.
 
The issue becomes the degree of redundancy.

A Series5XT MP core by itself works as an independent GPU.
 
If it's on the same piece of silicon it's not the same as Crossfire or SLI.

Yes, and you're all right in a way. Because 5XT MPs scale entire cores (on the same piece of silicon), there's an amount of redundancy involved, which is probably the primary reason why Rogue scales clusters instead of entire cores this time. The most important difference between 5XT MPs and desktop mGPUs, however, is that the latter rely on AFR with no added hw, while the former rely on SFR with hw scheduling logic (oversimplified).
 
.....while the former rely on SFR with hw scheduling logic (oversimplified).

and:
not only does the fillrate increase, but also the polygon throughput.
If you look at a few of the offscreen results from Anandtech, the triangle rate increases nearly 2x between the MP2 and the MP4.
 
and:
not only does the fillrate increase, but also the polygon throughput.
If you look at a few of the offscreen results from Anandtech, the triangle rate increases nearly 2x between the MP2 and the MP4.

IMG heavily marketed that geometry scaling for MPs is at 95%.
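One way to read that figure (my interpretation, assuming the 95% means each doubling of cores retains 95% of the ideal 2x gain):

```python
# If each doubling of cores keeps ~95% of the ideal 2x geometry gain,
# the MP4 should deliver roughly 1.9x the MP2's triangle rate.
mp2_triangle_rate = 1.0                           # normalised MP2 throughput
mp4_triangle_rate = mp2_triangle_rate * 2 * 0.95
print(mp4_triangle_rate)                          # 1.9 -> consistent with the "nearly 2x" observation
```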
 
I assumed the increased detail of GLBenchmark 2.5 over 2.1 would let the new iPad's A5X stretch its legs for a more accurate representation of its performance, so I wasn't interpreting the iPad 2 A5's proportionately lower score in the new test as underperforming. Since Kishonti only had one uploaded result (1504 frames) for the iPad 2, though, I wanted to verify that result and check for any variability.

With GLBenchmark 2.5 now finally approved in the App Store, I was able to give it a run on my iPad 2 with iOS 5.1.1, and I received scores in line with the previous results (1507 frames). I noticed that, as expected for a proper TBDR, increasing the precision of the depth buffer to 24-bit and also adding 4xMSAA doesn't impact performance much.
 
I noticed that, as expected for a proper TBDR, increasing the precision of the depth buffer to 24-bit and also adding 4xMSAA doesn't impact performance much.

Excuse the hairsplitting, but the MP2, being 2 full cores, has 32 z/stencil units. I have my doubts that you'd see a similarly low performance drop on a single-core TBDR with 16 or even 8 z/stencil units.
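For a sense of scale, a rough peak sketch, assuming each z/stencil unit can retire one test per clock (my assumption, ignoring any other bottlenecks):

```python
# Peak depth/stencil test rate, assuming one test per unit per clock.
def peak_ztest_gsamples(units: int, clock_mhz: float) -> float:
    return units * clock_mhz / 1000.0

print(peak_ztest_gsamples(32, 250))  # SGX543MP2 @ 250MHz -> 8.0 Gsamples/s
print(peak_ztest_gsamples(16, 250))  # hypothetical single core, 16 units -> 4.0
print(peak_ztest_gsamples(8, 250))   # hypothetical single core, 8 units  -> 2.0
```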
 
And the results are in for Qualcomm's Adreno 320 GPU:

Qualcomm's APQ8064 and GLBenchmark 2.5 - MDP/T Results
http://www.anandtech.com/show/6185/qualcomms-apq8064-and-glbenchmark-25-mdpt-results

Egypt HD (Offscreen 1080p):
APQ8064: 28.6
A5X: 24

Egypt Classic (Offscreen 1080p):
APQ8064: 79.2
A5X: 87

Not a generational leap but probably good enough to share the top spot with the SGX543MP4 for the next 6 months or so (APQ8064 is not yet available in a retail device, but should be in the next 1-2 months).
 
Yeah, very impressive numbers to be honest... power consumption should be much lower (lithography/redundancy?)... and it's way more balanced as an overall SoC.

Don't forget it's also a generation ahead in APIs...
 
And the results are in for Qualcomm's Adreno 320 GPU:

Qualcomm's APQ8064 and GLBenchmark 2.5 - MDP/T Results
http://www.anandtech.com/show/6185/qualcomms-apq8064-and-glbenchmark-25-mdpt-results

Egypt HD (Offscreen 1080p):
APQ8064: 28.6
A5X: 24

Egypt Classic (Offscreen 1080p):
APQ8064: 79.2
A5X: 87

Not a generational leap but probably good enough to share the top spot with the SGX543MP4 for the next 6 months or so (APQ8064 is not yet available in a retail device, but should be in the next 1-2 months).

The A5X is manufactured on 45nm and the MP4 is clocked at 250MHz. Go down to 28nm and clock it at a frequency comparable to the Adreno 320's (=/>400MHz?) and the picture could change radically.
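A purely hypothetical sketch of that point, assuming (unrealistically) that Egypt HD performance scales linearly with GPU clock and nothing else, such as bandwidth, gets in the way:

```python
# Hypothetical clock-normalised MP4 vs. Adreno 320 comparison (linear scaling assumed).
a5x_egypt_hd_fps = 24.0              # MP4 @ 250MHz, from the Anandtech numbers above
scaled_fps = a5x_egypt_hd_fps * (400 / 250)
print(scaled_fps)                    # ~38.4 fps vs. 28.6 for the Adreno 320
```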

It doesn't change what you're saying, of course; still, yes, it'll take some time until the 8064 shows up in devices, while the iPad3 launched in Q1 this year.
 