OMAP4 & SGX540

Ailuros · Aug 24, 2012

Thanks Rys.

-----------------------------------------------------------

http://www.anandtech.com/show/6158/the-archos-101-xs-review

....if it weren't for such short battery life

Lazy8s · Aug 24, 2012

OMAP is being shown off quite well there, at the top of the charts! The video decode test for battery life, as the article astutely noted, does imply that the deficiency has more to do with power management than the requirements of any of the processors.

Rys · Aug 24, 2012

Definitely looks like DVFS is disabled to me.

Ailuros · Aug 25, 2012

Rys said:
Definitely looks like DVFS is disabled to me.

Dumb layman question: if true and DVFS is disabled, wouldn't enabling it affect peak performance to some degree?

french toast · Aug 25, 2012

I've got an even dumber question...what is DVFS?

Ailuros · Aug 25, 2012

Dynamic Voltage (and) Frequency Scaling.

french toast · Aug 25, 2012

That explains it then...

Ailuros · Sep 7, 2012

http://www.amazon.com/gp/product/B008GFRBBW/

OMAP4460 & OMAP4470 for that rather high volume deal.

***edit: http://www.fudzilla.com/home/item/28659-nvidia-responds-to-amazons-kindle-fire-performance-claims

Now that was quick

tangey · Sep 7, 2012

Ailuros said:
http://www.amazon.com/gp/product/B008GFRBBW/

OMAP4460 & OMAP4470 for that rather high volume deal.

***edit: http://www.fudzilla.com/home/item/28659-nvidia-responds-to-amazons-kindle-fire-performance-claims

Now that was quick

I note that the new Kindle is using a 1.2GHz Omap4430, and the 7" Kindle HD is using a 1.2GHz Omap4460.

Aren't those processors effectively identical in terms of performance and functionality?

AlphaWolf · Sep 7, 2012

Faster sgx540 in 4460 isn't it?

ltcommander.data · Sep 7, 2012

AlphaWolf said:
Faster sgx540 in 4460 isn't it?

Are CPU and GPU clocks tied on OMAP4? The Galaxy Nexus's 4460 clocked in at 1.2GHz CPU and 307MHz GPU both 80% of rated 1.5GHz CPU and 384MHz GPU. Maybe that was just a design choice rather than a design limitation.

AlphaWolf · Sep 7, 2012

The impression I get from the wiki on OMAP is no, but maybe that's just max operating frequencies.

Ailuros · Sep 7, 2012

AlphaWolf said:
Faster sgx540 in 4460 isn't it?

No idea; if the 540 in the 4460 is clocked at 384MHz, that would be at least one difference compared to the 4430 in the Kindle Fire.

ams · Sep 7, 2012

Ailuros said:
http://www.amazon.com/gp/product/B008GFRBBW/

OMAP4460 & OMAP4470 for that rather high volume deal.

***edit: http://www.fudzilla.com/home/item/28659-nvidia-responds-to-amazons-kindle-fire-performance-claims

Now that was quick

I am shocked that no one (including NVIDIA) has called out Amazon on their claim that Omap 4470 has "50% higher [GPU] floating point operations per second" vs. Tegra 3. The lowest performance version of Tegra 3 (ie. T30L, found in devices such as Google Nexus 7) has a Geforce ULP GPU with ~ 10 GFLOPS (ie. 10 billion floating point operations per second) theoretical throughput, while the regular version of Tegra 3 (ie. T30, found in devices such as Asus Transformer Prime) has a Geforce ULP GPU with ~ 12 GFLOPS theoretical throughput. Omap 4470 (found in devices such as Archos 101 XS and the upcoming 8.9" Kindle Fire HD) has a PowerVR SGX 544[MP1] GPU with ~ 12 GFLOPS theoretical throughput.

Amazon claimed that the GPU in Tegra 3 has 8 GFLOPS (ie. 8 billion floating point operations per second) theoretical throughput, which seems totally false based on all the available information on Tegra 3. Since Tegra 3 appears to be between ~ 10-12 GFLOPS, Amazon's "50% higher [GPU] floating point operations per second" claim turns out to be "0-20% higher [GPU] floating point operations per second" at best in reality.

On top of that, many publications reporting on the event were completely confused regarding the SoC details of the 8.9" Kindle Fire HD vs. the 7" Kindle Fire HD. The GPU performance in the 8.9" Kindle Fire HD variant is far superior to the GPU performance in the 7" Kindle Fire HD variant (and the CPU performance is significantly improved too). Omap 4460 (found in devices such as Archos G9) has a PowerVR SGX 540[MP1] GPU with ~ 6 GFLOPS (ie. 6 billion floating point operations per second) theoretical throughput.

Ailuros · Sep 8, 2012

ams said:
I am shocked that no one (including NVIDIA) has called out Amazon on their claim that Omap 4470 has "50% higher [GPU] floating point operations per second" vs. Tegra 3. The lowest performance version of Tegra 3 (ie. T30L, found in devices such as Google Nexus 7) has a Geforce ULP GPU with ~ 10 GFLOPS (ie. 10 billion floating point operations per second) theoretical throughput, while the regular version of Tegra 3 (ie. T30, found in devices such as Asus Transformer Prime) has a Geforce ULP GPU with ~ 12 GFLOPS theoretical throughput. Omap 4470 (found in devices such as Archos 101 XS and the upcoming 8.9" Kindle Fire HD) has a PowerVR SGX 544[MP1] GPU with ~ 12 GFLOPS theoretical throughput.

Amazon claimed that the GPU in Tegra 3 has 8 GFLOPS (ie. 8 billion floating point operations per second) theoretical throughput, which seems totally false based on all the available information on Tegra 3. Since Tegra 3 appears to be between ~ 10-12 GFLOPS, Amazon's "50% higher [GPU] floating point operations per second" claim turns out to be "0-20% higher [GPU] floating point operations per second" at best in reality.

I'm sure if you poked him he'd do a 180-degree turn and tell you that meant pixel shader FLOPs only. What's with the FLOP craze anyway, especially for the ULP GFs in Tegras? It's not like you can use the PS ALUs for anything GPGPU either.

I'm not shocked one bit when NVIDIA once in a while gets paid back with marketing stunts similar to or worse than the ones they pull themselves from time to time. In that light the ULP GF in Tegra3 has "12 cores"; in other words, let's count each ALU lane as a core, because each of them for sure can act independently.

On top of that, many publications reporting on the event were completely confused regarding the SoC details of the 8.9" Kindle Fire HD vs. the 7" Kindle Fire HD. The GPU performance in the 8.9" Kindle Fire HD variant is far superior to the GPU performance in the 7" Kindle Fire HD variant (and the CPU performance is significantly improved too). Omap 4460 (found in devices such as Archos G9) has a PowerVR SGX 540[MP1] GPU with ~ 6 GFLOPS (ie. 6 billion floating point operations per second) theoretical throughput.

In real time and at the same frequencies a SGX544 is between 40 and nearly 100% faster than a SGX540 (and no, there's no such thing as a 540MP1, since Series5 isn't multicore capable); in a severely fillrate-limited case, chances are high that the difference is closer to zero.

In any case NVIDIA entered the small form factor market with its typically aggressive marketing, and if they once in a while get paid back from different sides in a similar manner, it's entertaining at best. Or more simply: monkey see, monkey do. As for the rest, it's just typical marketing, and no, of course I don't expect the average consumer to know what the 4460 vs. the 4470 contains.

ams · Sep 8, 2012

Marketing antics aside, you know and I know that comparing maximum theoretical pixel shader FLOPS between a unified shader GPU architecture and non-unified shader GPU architecture is utter nonsense, and is very much a misrepresentation of the GPU performance on a non-unified architecture. At the end of the day, Amazon has tried to pull the wool over people's eyes in suggesting that Omap 4 GPU performance vastly exceeds that of Tegra 3, and in giving people the impression that the 7" Kindle Fire HD has similar performance and resolution/PPI as the 8.9" version. The saddest thing about it is that their one tablet with full HD resolution and higher performance Omap 4 SoC is still approximately 2.5 months away from shipping to customers. And when websites start benchmarking the 7" Kindle Fire HD in the very near future, it will become all too clear that the GPU performance of this variant is simply far, far behind that of Tegra 3. The 7" Kindle Fire HD will be competitive in SunSpider and BrowserMark (which do not directly measure GPU performance), but won't come close in most of the GLBenchmark [2.1/2.5] tests, and certainly will not offer the same quality of gaming experience as, say, a Tegra 3 equipped Nexus 7 would. On a side note, the WiFi features on the Kindle Fire HD are a very welcome addition, and so is the additional hard drive storage capacity.

Arun · Sep 8, 2012

ams said:
Amazon claimed that the GPU in Tegra 3 has 8 GFLOPS (ie. 8 billion floating point operations per second) theoretical throughput, which seems totally false based on all the available information on Tegra 3. Since Tegra 3 appears to be between ~ 10-12 GFLOPS, Amazon's "50% higher [GPU] floating point operations per second" claim turns out to be "0-20% higher [GPU] floating point operations per second" at best in reality.

Well you can also get fairly close to these numbers in another way: T30L is ((1VS + 2PS) * 8 FLOPS * 400MHz) = 9.6 GFLOPS. OMAP4470 is (4 USSEs * 9 FLOPS * 384MHz) = 13.8 GFLOPS. And yes, you can definitely get 9 real flops in a single cycle, although it's a bit of an extreme case. So that's 44% higher peak which isn't so far from the claimed 50% higher peak.
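The arithmetic in the post above can be reproduced with a minimal Python sketch. The unit counts, FLOPs per clock, and clock speeds are the figures quoted in the post; the function and variable names are illustrative assumptions, not anything from a vendor tool.

```python
def peak_gflops(units: int, flops_per_unit_per_clock: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOPS: units * FLOPs per unit per clock * clock."""
    return units * flops_per_unit_per_clock * clock_mhz / 1000.0


# T30L as quoted: 1 vertex shader + 2 pixel shader units, 8 FLOPs each, 400MHz.
t30l = peak_gflops(units=1 + 2, flops_per_unit_per_clock=8, clock_mhz=400)

# OMAP4470 (SGX544) as quoted: 4 USSEs, 9 FLOPs each (the extreme case), 384MHz.
omap4470 = peak_gflops(units=4, flops_per_unit_per_clock=9, clock_mhz=384)

# Relative advantage of the OMAP4470 peak over the T30L peak, in percent.
advantage_pct = (omap4470 / t30l - 1) * 100

print(f"T30L:     {t30l:.1f} GFLOPS")
print(f"OMAP4470: {omap4470:.1f} GFLOPS")
print(f"OMAP4470 peak advantage: {advantage_pct:.0f}%")
```

These are peak figures only; as the rest of the post argues, they say little about how a unified and a non-unified architecture compare in a real workload.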

On top of that, many publications reporting on the event were completely confused regarding the SoC details of the 8.9" Kindle Fire HD vs. the 7" Kindle Fire HD. The GPU performance in the 8.9" Kindle Fire HD variant is far superior to the GPU performance in the 7" Kindle Fire HD variant

True, reading the initial press articles I was slightly confused about it myself.

Marketing antics aside, you know and I know that comparing maximum theoretical pixel shader FLOPS between a unified shader GPU architecture and non-unified shader GPU architecture is utter nonsense, and is very much a misrepresentation of the GPU performance on a non-unified architecture.

The vertex shader is idling most of the time in GLBenchmark 2.1 and Taiji - in that case it makes decent sense to just forget about those flops. And in GLBenchmark 2.5 the Vertex Shader is clearly a limitation for Tegra, so should we just remove some of those Pixel Shaders flops then? You're never going to get a perfect balance so there's no fair way to compare GFLOPS between unified and non-unified architectures, period.

Ailuros said:
In real time and at the same frequencies a SGX544 is between 40 and nearly 100% faster than a SGX540

In practice it's usually closer to 100% though, which does mean the Nexus 7 will have significantly faster GPU performance than Tegra 3 (but on the other hand SGX544 will be faster than Tegra 3, especially T30L).

ams · Sep 8, 2012

Arun said:
Well you can also get fairly close to these numbers in another way: T30L is ((1VS + 2PS) * 8 FLOPS * 400MHz) = 9.6 GFLOPS. OMAP4470 is (4 USSEs * 9 FLOPS * 384MHz) = 13.8 GFLOPS. And yes, you can definitely get 9 real flops in a single cycle, although it's a bit of an extreme case. So that's 44% higher peak which isn't so far from the claimed 50% higher peak.

The GPU in T30L reportedly operates at 416MHz (not 400MHz), so even in that extreme case vs. Omap 4470, the difference would be 38% when comparing the lowest performance Tegra 3 variant to Omap 4470 (so still a far cry from 50%, and in a less extreme scenario with the standard T30, the total GFLOPS difference vs. Omap 4470 would be essentially nil). That said, I believe that Ailuros is correct in suggesting that Amazon was comparing maximum theoretical pixel shader flops. Note that there was a slide that specifically mentioned 8 billion floating point ops per sec vs. 12 billion floating point ops per sec (http://1.androidauthority.com/wp-content/uploads/2012/09/omap-4470-vs-tegra-3.jpg). Also note that Amazon stated "Tegra 3" and did not say T30L. Since the GPU in T30L reportedly operates at 416MHz, and the GPU in T30 reportedly operates at 520MHz, that would give Tegra 3 between 10.0-12.5 GFLOPS throughput overall (and 6.7-8.3 GFLOPS counting pixel shader flops only). At the end of the day, all things considered, Amazon's slide was confusing and even misleading with respect to GPU performance differences between these two SoCs.
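As a rough check on the ranges above, here is a small Python sketch that tabulates total and pixel-shader-only peak throughput for both Tegra 3 variants. The clocks are the reported values from this post, the 9-FLOP-per-USSE Omap 4470 figure is Arun's extreme case, and all names are illustrative assumptions.

```python
# Figures quoted in this thread (reported, not measured).
FLOPS_PER_UNIT_PER_CLOCK = 8
VS_UNITS, PS_UNITS = 1, 2
TEGRA3_CLOCKS_MHZ = {"T30L": 416, "T30": 520}
OMAP4470_EXTREME_GFLOPS = 4 * 9 * 384 / 1000  # 4 USSEs * 9 FLOPs * 384MHz


def gflops(units: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOPS for a given number of shader units."""
    return units * FLOPS_PER_UNIT_PER_CLOCK * clock_mhz / 1000


table = {}
for variant, mhz in TEGRA3_CLOCKS_MHZ.items():
    total = gflops(VS_UNITS + PS_UNITS, mhz)   # vertex + pixel shader units
    ps_only = gflops(PS_UNITS, mhz)            # pixel shader units only
    omap_lead_pct = (OMAP4470_EXTREME_GFLOPS / total - 1) * 100
    table[variant] = (total, ps_only, omap_lead_pct)
    print(f"{variant}: total {total:.1f}, pixel shader only {ps_only:.1f} GFLOPS; "
          f"OMAP4470 extreme-case lead {omap_lead_pct:.0f}%")
```

The pixel-shader-only column is the one that lines up with the "8 billion" figure on Amazon's slide, which is consistent with the suggestion that the vertex shader was left out of the comparison.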

You're never going to get a perfect balance so there's no fair way to compare GFLOPS between unified and non-unified architectures, period.

Yes, but to completely ignore vertex shader flops when comparing a non-unified architecture to a unified architecture is still highly misleading in my opinion.

In practice it's usually closer to 100% though, which does mean the Nexus 7 will have significantly faster GPU performance than Tegra 3 (but on the other hand SGX544 will be faster than Tegra 3, especially T30L).

I think you meant to say that Tegra 3 (with ULP Geforce, all variants) will have much faster GPU performance than Omap 4460 (with SGX 540), but Omap 4470 (with SGX 544) will have somewhat faster GPU performance than Tegra 3 (with ULP Geforce T30L or T30 variants).

Ailuros · Sep 9, 2012

ams said:
Marketing antics aside, you know and I know that comparing maximum theoretical pixel shader FLOPS between a unified shader GPU architecture and non-unified shader GPU architecture is utter nonsense, and is very much a misrepresentation of the GPU performance on a non-unified architecture.

Whichever way you turn it, since it's a marketing stunt it doesn't have to reflect average realistic performance. I'm not going to argue about the silliness of those kinds of marketing claims, but I recall specifically NV claiming that the ULP GF in T3 is 3x faster than the ULP GF in T2. Let me pull that equally nasty marketing trick here and see how it can backfire if you claim bullshit:

ULP GF T2@333MHz=
1Vec4 PS = 8 * 0.333GHz = 2.664 GFLOPs
ULP GF T3@520MHz=
2Vec4 PS = 16 * 0.52GHz = 8.32 GFLOPs
------------------------------------------------------
8.32 / 2.664 = 3.12x difference what a coincidence

At the end of the day, Amazon has tried to pull the wool over people's eyes in suggesting that Omap 4 GPU performance vastly exceeds that of Tegra 3, and in giving people the impression that the 7" Kindle Fire HD has similar performance and resolution/PPI as the 8.9" version. The saddest thing about it is that their one tablet with full HD resolution and higher performance Omap 4 SoC is still approximately 2.5 months away from shipping to customers. And when websites start benchmarking the 7" Kindle Fire HD in the very near future, it will become all too clear that the GPU performance of this variant is simply far, far behind that of Tegra 3. The 7" Kindle Fire HD will be competitive in SunSpider and BrowserMark (which do not directly measure GPU performance), but won't come close in most of the GLBenchmark [2.1/2.5] tests, and certainly will not offer the same quality of gaming experience as, say, a Tegra 3 equipped Nexus 7 would. On a side note, the WiFi features on the Kindle Fire HD are a very welcome addition, and so is the additional hard drive storage capacity.

Again, if you poke the Amazon CEO he'll take another 180-degree turn and tell you "sorry, I meant the highest-end variant only". I think we should be familiar with marketing crap these days.

-------------------------------------------------------------------------------------------------------------------
Arun,

Help me out here: isn't there a programmable blending unit in the ULP GF PS ALUs that, when no blending is used, could contribute another theoretical FLOP?

In other words, if the story is about 4+1 ALUs, you'd probably have to count the +1 for theoretical peak arithmetic throughput as well on Tegras, Adrenos and possibly others too.

ams · Sep 10, 2012

Ailuros said:
but I recall specifically NV claiming that the ULP GF in T3 is 3x faster than the ULP GF in T2.

NVIDIA does rate 3D performance of Tegra 3 relative to Tegra 2 as "Up to 3x". And while 3x performance improvement vs. Tegra 2 is not typical, there are some examples to back that up. GLBenchmark 2.5 has some tests that show ~ 2.6-3.2x performance improvement in fps (http://images.anandtech.com/graphs/graph6121/48839.png). Lost Planet 2, Da Vinci, and Glowball show ~ 2.1-2.7x performance improvement in fps (shown in one of the Tegra whitepapers).
