The quad-core Mali-T760 inside Rockchip's quick-to-market Cortex-A12 SoC hits the benchmarks:
http://gfxbench.com/result.jsp?benchmark=gfx30&test=545&order=score&base=gpu
For Mali T604 and T628, peak performance is 17 FP32 FLOPS per ALU per cycle. http://malideveloper.arm.com/downloads/OpenCL_FAQ.pdf shows this is composed of:
- 7: dot product (4 muls, 3 adds)
- 1: scalar add
- 4: vec4 add
- 4: vec4 multiply
- 1: scalar multiply
So the formula is:
17 FP32 FLOPS per ALU per cycle * ALUs per core * core count * frequency (GHz)
T604 MP4 : 17 * 2 * 4 * 0.533 = 72.488 FP32 GFLOPS
T628 MP6 : 17 * 2 * 6 * 0.533 = 108.732 FP32 GFLOPS
This assumes FP32, but as the ALU's vector units are quite flexible, you can actually do more work in the vector units using FP16, or less using FP64. You can achieve 5 FP64 FLOPS per ALU per cycle, so that gives us:
T604 MP4 : 5 * 2 * 4 * 0.533 = 21.32 FP64 GFLOPS
T628 MP6 : 5 * 2 * 6 * 0.533 = 31.98 FP64 GFLOPS
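For anyone who wants to plug in other configs, here's a rough Python sketch of the arithmetic above. The function and variable names are made up for illustration. The 2 ALUs per core and the 0.533 GHz clock are simply the figures used in the lines above, not something checked against any particular shipping part.

```python
# Per-ALU FP32 operations per cycle, as listed in the breakdown above.
FP32_OPS_PER_ALU = {
    "dot product (4 muls, 3 adds)": 7,
    "scalar add": 1,
    "vec4 add": 4,
    "vec4 multiply": 4,
    "scalar multiply": 1,
}

# FP64 figure quoted above (per ALU per cycle).
FP64_FLOPS_PER_ALU = 5


def peak_gflops(flops_per_alu_per_cycle, alus_per_core, cores, freq_ghz):
    """Theoretical peak in GFLOPS: FLOPS/ALU/cycle * ALUs/core * cores * GHz."""
    return flops_per_alu_per_cycle * alus_per_core * cores * freq_ghz


fp32_flops_per_alu = sum(FP32_OPS_PER_ALU.values())  # the 17 from the list

# (name, core count); ALUs per core and clock assumed as in the posts above.
for name, cores in (("T604 MP4", 4), ("T628 MP6", 6)):
    fp32 = peak_gflops(fp32_flops_per_alu, 2, cores, 0.533)
    fp64 = peak_gflops(FP64_FLOPS_PER_ALU, 2, cores, 0.533)
    print(f"{name}: {fp32:.3f} FP32 GFLOPS, {fp64:.2f} FP64 GFLOPS")
```

These are theoretical peaks only. Real workloads won't keep every unit in the ALU busy each cycle.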
This might be the most ghetto chip breakdown ever, but it's also the first time I've ever seen a die shot of a new Mali GPU: http://www.antutu.com/view.shtml?id=7879
Includes IP block size breakdowns: http://news.mydrivers.com/picture/309044/309044_36.html
Yep, that area annotation is wrong (excludes GMEM and isn't boundary accurate for the blocks it does enclose). It's a bit bigger.
I think prioritizing performance per square millimeter over performance per milliwatt for mobile (whether by intention or by simply not having the architectural efficiencies to do otherwise) can result in a product along the lines of a K1 where the primary target market becomes a niche like tablets versus mainstream or high-end smartphones.
?
I know Apple has designed for lower thermals/power by using more die area, which I'm saying is the correct priority for a mobile design. That's what I mean by prioritizing higher performance per milliwatt ahead of even performance per square millimeter.
I wonder if that focus on larger silicon layouts started with the Fast14-type technology they got from Intrinsity for, as mentioned, the Apple A4. Apple has managed to shrink silicon usage surprisingly well with the A7, though, and their priorities still seem to be in the right place.
Yes, I too have read up on how nVidia's latest development hardware compares to the actual end products from last year's competition. I also observe that the OEMs who build smartphones at the highest performance end, where selecting an app processor without an integrated modem/baseband is a completely acceptable design decision, and who've used nVidia in this space before, are not selecting Tegra K1. Nor are the MediaTeks, Rockchips, Broadcoms, Samsungs, TIs, etc. of the world licensing K1's GPU IP for their SoCs.