[Performance] : ARM Mali T628 MP6

Plamensito

Newcomer



As already mentioned, Samsung has announced the Exynos 5420 Octa. Its graphics are handled by the Mali T628 with 6 clusters.

The Mali T604, T624 and T628 are all based on 16 FP and 2 Vec4 per cluster.


The Mali T628 in the Exynos 5420 contains 6 clusters. Peak throughput is calculated as follows:

16 FP x 2 Vec4 x 6 clusters x 0.533 GHz = 102.336 GFLOPS

As announced by Samsung, the Mali T628 MP6 is twice as fast as the PowerVR SGX544MP3 (Exynos 5410 Octa):

4 USSE2 x 4 MADs x 2 ALU x 3 MP x 0.533 GHz = 51.168 GFLOPS
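
The two products above can be checked with a few lines of Python. This is only a sketch of the arithmetic as posted: the function name `peak_gflops` and the way the factors are grouped into "FLOPs per clock per unit" are my own choice for illustration, not anything from ARM, Imagination or Samsung.

```python
def peak_gflops(flops_per_clock_per_unit, units, clock_ghz):
    """Theoretical peak = FLOPs per clock per unit x number of units x clock (GHz)."""
    return flops_per_clock_per_unit * units * clock_ghz

# Mali T628 MP6: 16 FP x 2 Vec4 per cluster, 6 clusters, 0.533 GHz
mali_t628_mp6 = peak_gflops(16 * 2, 6, 0.533)

# PowerVR SGX544MP3: 4 USSE2 x 4 MADs x 2 per core, 3 cores, 0.533 GHz
sgx544_mp3 = peak_gflops(4 * 4 * 2, 3, 0.533)

print(f"Mali T628 MP6: {mali_t628_mp6:.3f} GFLOPS")
print(f"SGX544MP3:     {sgx544_mp3:.3f} GFLOPS")
print(f"Ratio:         {mali_t628_mp6 / sgx544_mp3:.2f}x")
```

Both parts work out to 32 FLOPs per clock per cluster/core, so at the same clock the x2 comes purely from 6 clusters versus 3 cores.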

Now look at the table for offscreen performance:

Note: the PowerVR SGX554MP4 has an additional scalar (x 1.125).

Note 2: more on the Adreno 330:
http://www.359gsm.com/forum/viewtopic.php?f=127&t=13152

8597b9890751.jpg
 
Hmm, interesting twist, if true, on the ALU lane count for the S600 Adreno 320 and the Adreno 330.

As for the rest, the math results are correct; you just have quite a complicated way of calculating things.

I'm not sure whether Mali T6xx has vector ALUs rather than SIMDs; I'd like to think it's the latter for all newer-generation GPUs. In that case it makes my life easier to think of a T628MP6 @ 533MHz as:

6 * SIMD16 * 2 FLOPs * 0.533GHz = 102.34 GFLOPs

Oh, and by the way, for accuracy's sake: if you're going to count SFU FLOPs for the SGX554, you should also count them for something like the T604 (1 SFU/SIMD16 afaik). In that case the T604 is actually at a theoretical peak of ~72 GFLOPs.
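
A rough sketch of the SIMD16 way of counting, with the optional SFU lane added. The helper name `simd_gflops` is made up for illustration, and the T604 line assumes an MP4 part at the same 533 MHz, which is what the ~72 GFLOPs figure implies.

```python
CLOCK_GHZ = 0.533

def simd_gflops(cores, lanes, flops_per_lane, clock_ghz, sfu_lanes=0):
    """Peak GFLOPS for `cores` SIMD units, optionally counting SFU lanes too."""
    return cores * (lanes + sfu_lanes) * flops_per_lane * clock_ghz

# T628MP6 as 6 * SIMD16 * 2 FLOPs, no SFU counted
t628_mp6 = simd_gflops(6, 16, 2, CLOCK_GHZ)

# T604 assumed to be MP4 at 533 MHz, counting 1 SFU per SIMD16
t604_mp4_sfu = simd_gflops(4, 16, 2, CLOCK_GHZ, sfu_lanes=1)

print(f"T628MP6 (no SFU):   {t628_mp6:.2f} GFLOPs")
print(f"T604MP4 (with SFU): {t604_mp4_sfu:.2f} GFLOPs")
```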
 
Yes, the Mali T604 with +1 TMU added = 72.4 GFLOPS.

The table does not include the additional 1 TMU (8x2+1).
 
T628MP6 = 150% of T604MP4 at the same frequency. So the T628 has no extra compute power, core for core, over the T604?

The graph in the link shows a somewhat higher figure for the T604 than the graph in this thread?
 
Probably not; but I'd expect the 628 (see the diagram on ARM's site, compared to a 624 for example) to have quite a few aspects doubled compared to the 604.
 
Yes, adding +1 TMU (8x2+1) puts the Mali T604 at 72.4 GFLOPS.
The table leaves out the additional 1 TMU for the Mali GPUs so that it shows the exact x2 difference between the T628 MP6 (102.336) and the SGX544MP3 (51.168).

If you do add the 1 TMU, the T628 MP6 looks like this:
17 x 2 x 6 x 0.533 = 108.732 GFLOPS
 
You mean SFU instead of TMU, I guess.

Anyway, if you want to count the SFU FLOPs for the SGX544MP3 in the Exynos 5410, it should be:

Each ALU = Vec4 + 1, in other words 9 FLOPs/ALU

3 cores * 4 ALUs * 9 FLOPs * 0.533GHz = 57.56 GFLOPs
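
To spell out where the 9 comes from, here is a small sketch. It assumes each Vec4 lane does one MAD (2 FLOPs) per clock, which is how the 8 FLOPs/ALU in the earlier 51.168 figure falls out; the variable names are just for illustration.

```python
CORES = 3
ALUS_PER_CORE = 4
CLOCK_GHZ = 0.533

VEC4_FLOPS = 4 * 2   # 4 lanes, one MAD (multiply + add) each: assumed 2 FLOPs per lane
SFU_FLOPS = 1        # the extra "+1" per ALU

sgx544_mp3_plain = CORES * ALUS_PER_CORE * VEC4_FLOPS * CLOCK_GHZ
sgx544_mp3_sfu = CORES * ALUS_PER_CORE * (VEC4_FLOPS + SFU_FLOPS) * CLOCK_GHZ

print(f"SGX544MP3 without SFU: {sgx544_mp3_plain:.3f} GFLOPs")
print(f"SGX544MP3 with SFU:    {sgx544_mp3_sfu:.3f} GFLOPs")
```

The "with SFU" figure is the plain one times 9/8, which is the same x 1.125 factor noted for the SGX554MP4 in the first post.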

That said, I'd personally prefer that IHVs or their respective marketing departments not count things like SFU FLOPs into the arithmetic throughput.

As for Rogue, I'd say it doesn't hurt to say that each cluster = SIMD16 + 2 TMUs:

G6130 = 1*SIMD16
G62x0 = 2*SIMD16
G64x0 = 4*SIMD16
G6630 = 6*SIMD16
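
The same list as a lookup, just to total up lanes and TMUs per part. This only restates the "cluster = SIMD16 + 2 TMUs" rule of thumb above; the names are hypothetical and no clock or FLOPs-per-lane figure is assumed for Rogue.

```python
LANES_PER_CLUSTER = 16   # SIMD16
TMUS_PER_CLUSTER = 2

# Cluster counts exactly as listed above
ROGUE_CLUSTERS = {"G6130": 1, "G62x0": 2, "G64x0": 4, "G6630": 6}

def rogue_units(clusters):
    """Return (SIMD lanes, TMUs) for a given number of clusters."""
    return clusters * LANES_PER_CLUSTER, clusters * TMUS_PER_CLUSTER

for name, clusters in ROGUE_CLUSTERS.items():
    lanes, tmus = rogue_units(clusters)
    print(f"{name}: {clusters} cluster(s), {lanes} SIMD lanes, {tmus} TMUs")
```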
 
Yes, as I said in the header post, the additional scalar for the SGX544 MP3 and the additional SFU for the Mali GPU are not taken into account ;)

With or without them, the gap stays roughly the same: about x2 in favor of the Mali T628MP6 over the SGX544MP3.

Without additional scalar/SFU:

Mali T628 MP6 = 102.336 GFLOPS
SGX544 MP3 = 51.168 GFLOPS

With additional scalar/SFU:

Mali T628 MP6 = 108.732 GFLOPS
SGX544 MP3 = 57.564 GFLOPS

Regards ;)
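
For anyone who wants to check the gap both ways, a quick sketch using the same factors as in the posts above (all at 0.533 GHz; the variable names are only for illustration):

```python
CLOCK_GHZ = 0.533

# (Mali T628 MP6, SGX544 MP3) in GFLOPS
without_extra = (16 * 2 * 6 * CLOCK_GHZ, 3 * 4 * 8 * CLOCK_GHZ)
with_extra = (17 * 2 * 6 * CLOCK_GHZ, 3 * 4 * 9 * CLOCK_GHZ)

ratio_without = without_extra[0] / without_extra[1]
ratio_with = with_extra[0] / with_extra[1]

print(f"Without scalar/SFU: {without_extra[0]:.3f} vs {without_extra[1]:.3f} -> {ratio_without:.2f}x")
print(f"With scalar/SFU:    {with_extra[0]:.3f} vs {with_extra[1]:.3f} -> {ratio_with:.2f}x")
```

Counting the extras narrows the gap a little (204 vs 108 FLOPs per clock instead of 192 vs 96), so it comes out just under x2 rather than exactly x2.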
 