Freescale iMX6 Discussion

Exophase

Veteran
As for Vivante, Freescale claims some *extremely* impressive benchmark numbers for the GC2000 in i.MX61, but I'd like to see numbers on a platform with VSync on before passing judgement... Still much more impressive than anyone expected I think.

I went looking around for this and if anyone is interested see page 28 here:

http://www.freescale.com/files/32bit/doc/fact_sheet/IMX6SRSFS.pdf

I'd love to know more about the architecture of Vivante's higher end GPUs. All I could really glean is that they're unified and that the highest end ones are "multicore", although from the block diagram I don't think this is in the full-on sense it is for SGX543MP.

BTW, the GC2000 in i.MX6 dual/quad runs at 533MHz:

http://www.engineeringtv.com/video/Freescale-i-MX-6-ARM-Cortex-A9
 
What page 28? The pdf has only 2 pages.
He meant this presentation: http://www.freescale.com.cn/cstory/ftf/2011/pdf/1166.pdf
And yes, that's mostly where I got that data from :) Didn't want to link it because of what happened when I linked to the Qualcomm roadmap one and I didn't feel the GPU data was *that* valuable since it's clearly without VSync unlike the iPad, and I doubt the die size estimate for GC2000 (half iPad's 32nm) is even fair in any way (e.g. 40nm vs 45nm and excluding GC350 for OpenVG). The overall presentation definitely has truckloads of data on i.MX6x though, which is nice. Sounds like a very strong product except for the fairly low (1.2GHz) clock speed on the A9s. One likely explanation is that Page 38 states "core voltage: 1.1v", which implies it's probably pure 40LP rather than 40LPG like NVIDIA.

The tables on Page 20 and Page 21 are very interesting when it comes to tablet TDP and 2xA15 vs 4xA9. They claim A15 has 25% less DMIPS/W than A9 - I don't think that's enough to justify 4xA9 given the subpar quad-core scaling, and the data on Page 22 seem even less reliable. Oh and is that A15 DMIPS number even right? 12K@2x2GHz implies only 3 DMIPS/MHz rather than the previously rumoured >3.5DMIPS/MHz. That seems awfully low. Is that right or is Freescale just making up all these numbers as they go?
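
The DMIPS/MHz figure questioned above is plain division. A minimal sketch, assuming the 12K DMIPS number is the aggregate for both cores at 2GHz (the function name is only illustrative):

```python
def dmips_per_mhz(total_dmips: float, cores: int, clock_mhz: float) -> float:
    """Per-core DMIPS/MHz derived from an aggregate DMIPS figure."""
    return total_dmips / (cores * clock_mhz)


# Freescale's slide figure as quoted in the post: 12K DMIPS for 2 cores at 2GHz.
a15 = dmips_per_mhz(12_000, cores=2, clock_mhz=2_000)
print(f"{a15:.2f} DMIPS/MHz")  # 3.00 DMIPS/MHz

# Aggregate DMIPS that the rumoured 3.5 DMIPS/MHz would give for the same setup.
rumoured_total = 3.5 * 2 * 2_000
print(rumoured_total)  # 14000.0
```

So the slide would have to read at least 14K for the rumoured figure to hold.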
 
As for Vivante, Freescale claims some *extremely* impressive benchmark numbers for the GC2000 in i.MX61, but I'd like to see numbers on a platform with VSync on before passing judgement... Still much more impressive than anyone expected I think.

Why would you want to see numbers with vsync on?? That would mean that like iPad2 you don't actually see the performance of the device.
 
Exactly. But GLBenchmark 2.1 wasn't available when they ran those numbers probably, and they can't disable VSync on the iPad 2. So they ran the benchmark on the iPad 2 with VSync like everyone else, then ran it on their own development platform without VSync because they can disable it there. That's fine but then they compared it. Ugh...
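
A rough way to see why a VSync'd score can't be compared against an unlocked one: with VSync on, a frame can only be shown on a refresh boundary. The sketch below is a simplified model, assuming a 60Hz panel, a constant per-frame render time and no triple buffering (none of which is stated for either platform); real benchmark averages mix fast and slow frames, so they land between these steps.

```python
import math


def vsync_fps(raw_fps: float, refresh_hz: float = 60.0) -> float:
    """Displayed frame rate when every frame waits for the next refresh."""
    # Number of refresh intervals a single frame occupies, rounded up.
    intervals = math.ceil(refresh_hz / raw_fps)
    return refresh_hz / intervals


for raw in (45.0, 76.6, 120.0):
    print(raw, "->", vsync_fps(raw))
# 45.0 -> 30.0
# 76.6 -> 60.0
# 120.0 -> 60.0
```

Anything faster than the refresh rate is capped, which is exactly the headroom the unlocked platform gets to show and the locked one doesn't.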

Anyway I linked the presentation in the post above, these are still very impressive numbers, clearly superior to Exynos and at the very least in the same ballpark as the iPad 2. Apparently GC2000 has 2 TMUs and Freescale clocks it at 533MHz, while Vivante clocks it at 625MHz here: http://www.vivantecorp.com/Product_Brief.pdf

So we're looking at 1066MPixel/s, 20.5GFLOPS (Vec4 SIMD FP32), and 85MTri/s without the added efficiency of TBDR. Pretty damn good performance for those specs if that presentation is to be believed; their driver team should be proud. However if Freescale's claim of "half iPad2 die size" is correct (and excludes GC320/GC350), then their 6.9mm²-on-40nm silicon claim is hilariously unrealistic (even *slightly* more so than Imagination's on SGX543 iirc).
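
The fill-rate number is just TMU count times clock, which comes out in MPixel/s when the clock is in MHz. A minimal sketch, assuming one pixel per TMU per clock (that rate is an assumption, not something the presentation spells out):

```python
def fill_rate_mpix(tmus: int, clock_mhz: int) -> int:
    """Theoretical fill rate in MPixel/s, assuming 1 pixel per TMU per clock."""
    return tmus * clock_mhz


print(fill_rate_mpix(2, 533))  # 1066 (Freescale's 533MHz clock)
print(fill_rate_mpix(2, 625))  # 1250 (Vivante's 625MHz clock)
```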
 
Unfortunately we see the vsync locked vs unlocked all the time with competitors, obviously the comparison doesn't fly with anyone technical, but not everyone is technical...

Agree the numbers do look good, I'm guessing that it's 60-70% the performance of iPad2.

Who knows on the die size!
 
Only one of the benchmarks that Freescale ran (Pro) is anywhere close to vsync limit for the average. Egypt at 42.24FPS may still increase a few points w/vsync off but I doubt the hit is especially dire. There's no way it'd approach the 76.6FPS score Freescale claims, much less 42-67% higher like you're suggesting.

This is all assuming the benchmark was anywhere on the level, and isn't taking into consideration driver improvements since then.
 
You're ignoring the latest GLBench2.1 results for iPad2, which give 85.7fps for Egypt and 148fps for Pro. These are at 1280x720; scaling to 1024x768 gives about 100fps and 170fps respectively, which puts the GC2000 results in the 56-76% range. Basically GC2000 is not faster than iPad2, irrespective of what Vivante's marketing may say.
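
For anyone who wants to check the scaling, here is a minimal sketch. It assumes frame rate scales perfectly inversely with pixel count, which is a simplification and may not hold for this benchmark; the helper name is made up for illustration.

```python
def scale_fps(fps: float, src: tuple[int, int], dst: tuple[int, int]) -> float:
    """Rescale a frame rate between resolutions, assuming cost per pixel is constant."""
    src_pixels = src[0] * src[1]
    dst_pixels = dst[0] * dst[1]
    return fps * src_pixels / dst_pixels


egypt = scale_fps(85.7, (1280, 720), (1024, 768))
pro = scale_fps(148, (1280, 720), (1024, 768))
print(round(egypt, 1), round(pro, 1))  # 100.4 173.4

# Freescale's claimed 76.6FPS Egypt score as a share of the scaled iPad2 result.
print(round(76.6 / egypt * 100))  # 76
```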
 
I'll wait for a more apples to apples comparison from a third party then. If the results differ as much as you say it's not because of vsync and I'm sure driver enhancements haven't had that kind of alarming impact either - it's either Vivante did it wrong or the benchmark is not really the same. That, and can you really assert that this benchmark delivers perfect scaling with resolution for iPad2?

If it is because of vsync then the performance of Egypt is way too erratic. Makes me wish we had some frame-period vs time plots. But noting the results for "high" in 2.1 are a lot different (and more clearly vsync limited) than the results for 2.0 I'm going to go with it being unfair to compare the two.
 
Frankly I personally had hopes from the early performance claims to see something far more efficient from the iMX6. What I'm seeing here in that pdf doesn't convince me of anything that will be that much better than Tegra3 overall (where of course both T3 and iMX6 will have their own advantages and disadvantages).

Besides the vsync on vs. vsync off trick, what exactly are they comparing here considering die area? The A5 as an SoC has 2.5x the die area of a Tegra2 (whereby the GPU block in the former is just a tad below the entire T2 SoC estate) and 1.5x the die area of a Tegra3.

Before someone comes to such nonsensical comparisons (starting from the fact that they won't compare within iOS but rather in an Android or Win8 environment, and thus against completely different SoCs) he should have the decency to also see that the CPU block on the A5 is roughly 40mm2 (about 1/4 of the A5), with Tegra2 again being at 49mm2 with a dual-core A9 at 1.0GHz just like the A5. Whatever you add for the A5 CPU block, it's bleedingly obvious that Apple didn't care one bit about spending die area.

If their GC2000 GPU block with all the bells and whistles (power islands, cache etc.) is also roughly 1/4th of the entire iMX6 die area estate and the entire SoC is quite a bit smaller than A5, I fail to see any supposed advantage.

I'm willing to bet that if GLBenchmark2.1 had a 1024*768 offscreen run, the Tegra3 would be quite close to that result (considering it already gets 53.0 fps in 1280*720 offscreen), and that's from a GPU that is being claimed at 200MTris/s. Here I doubt even the 85MTris/s you folks mention above.
 
While that Freescale presentation has made me reconsider my perspective some, the reconsideration is about the competitiveness of quad A9 versus dual A15, not the proficiency of their graphics. They're comparing a graphics part of theirs not nearly early enough to release against 543MP2 and using all sorts of unfounded and non-comparable measures.

Considering the typical smartphone CPU workloads and the distribution potential among the A9s, I think a conservative quad could deliver competitive CPU efficiency. It definitely wouldn't be competitive in performance, but the significance of CPU performance is sooooooo overrated. When a CPU is trying to be more than a system's orchestrator and housekeeper, it'll likely be doing another core's job less proficiently than that core would do it itself. SoCs are forward thinking that way, covering all of the main usage scenarios with a specialized core... die area is cheaper than power consumption.

Though the CPU is a general purpose processor, it's not the best processor for all general purpose work anyway. The GPU can step up these days for at least a small subset of important jobs, especially in future apps where live video and/or audio feeds will be taken and analyzed by the phone.
 
Apparently GC2000 has 2 TMUs and Freescale clocks it at 533MHz, while Vivante clocks it at 625MHz here: http://www.vivantecorp.com/Product_Brief.pdf

So we're looking at 1066MPixel/s, 20.5GFLOPS (Vec4 SIMD FP32)

In this (German-language) article from May:
http://www.elektroniknet.de/bauelem...les_iMX6_setzt_auf_Triple-Core-Grafik-Engine/

they claim 667MHz and 24GFLOPS and give some more details.
There is an older version of that Vivante spec sheet I found somewhere (not online anymore...) which only goes up to GC2000, where they list the GC2000 as 108MTri/s, 1300MPix/s and 32GFLOPS, Vec4 FP32 SIMD.

21.344 GFLOPS @667MHz ?


Maybe Freescale wanted to clock it at 750MHz first, but that did not work so they reduced it to 667 but Marketing still had the 24GFLOPS number in their Powerpoint slides.

I guess the 200MTri/s were also just a mistake (whoops wrong table column...? GC4000 instead of 2000?). Or they just use some creative math... Marketing PDFs :rolleyes:


As for the benchmark results:
Freescale only had first revision silicon for some weeks (see that YouTube video from charbax). I guess the drivers could still be improved until final silicon next year.
 
Addition to my last post:
Sorry forgot to take the mentioned 9 FLOPS / Clock / Shader into account.
So it really is:
4*9*667MHz = 24012 MFLOPS

I guess this could be Vec4 + 1 Scalar like in the Adreno.
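
The arithmetic from these two posts can be sketched as follows. The count of 4 shader units and the 8 FLOPS per clock per shader (Vec4 multiply-add) are assumptions inferred from the 21.344 GFLOPS figure; only the 9 FLOPS per clock per shader comes from the article.

```python
def peak_mflops(shaders: int, flops_per_clock: int, clock_mhz: int) -> int:
    """Theoretical peak MFLOPS = shader units x FLOPS per clock x clock in MHz."""
    return shaders * flops_per_clock * clock_mhz


print(peak_mflops(4, 8, 667))  # 21344 (Vec4 MAD only, assumed)
print(peak_mflops(4, 9, 667))  # 24012 (9 FLOPS/clock/shader as per the article)
print(peak_mflops(4, 8, 750))  # 24000 (the 750MHz guess from the earlier post)
```

Both the 9-FLOPS reading at 667MHz and the 8-FLOPS reading at 750MHz land on roughly 24GFLOPS, so the marketing number alone can't tell them apart.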
 
Since your post count is at 1, what was your last post?
 
Sorry - actually this is my third try to post something but it gets blocked :)
Forum Bug ;-)
I posted in advanced mode and got a message that my post needs to be approved by a moderator, so "my last post" did not get through. But for the addition above I used the Quick Reply and it got through directly... ;)
@ Moderator please fix this mess.

For everyone else, here is the link where I got the numbers from:
http://www.elektroniknet.de/bauelem...les_iMX6_setzt_auf_Triple-Core-Grafik-Engine/
 
Addition to my last post:
Sorry forgot to take the mentioned 9 FLOPS / Clock / Shader into account.
So it really is:
4*9*667MHz = 24012 MFLOPS

I guess this could be Vec4 + 1 Scalar like in the Adreno.
GC2000's shader units run at a different clock compared to the rest of the pipeline.
 
I've been kind of eagerly checking for i.MX6 availability. Unlike most other SoC manufacturers Freescale tends to make their i.MX series available in low quantity orders through distributors or directly, so it's a viable choice for low volume or even hobbyist hardware.

One thing I really like about i.MX6, outside of the quite competitive GPU and quad-core option, is just how much it has integrated. For instance, it has a lot of embedded regulators, meaning you only need to supply it two DC/DC regulated voltages, forgoing the need for complex PMICs. And it can put its I/Os on 3.3V or 1.8V, allowing easier peripheral interface with level shifters (especially for say, LCDs). It's also one of the few ARM SoCs this generation with DDR3 support.

Hope it's going to be available soon..
 