OMAP4 & SGX540

Discussion in 'Mobile Devices and SoCs' started by roninja, Feb 17, 2009.

  1. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

  2. Lazy8s

    Lazy8s Veteran

    OMAP is being shown off quite well there, at the top of the charts! The video decode test for battery life, as the article astutely noted, does imply that the deficiency has more to do with power management than the requirements of any of the processors.
     
  3. Rys

    Rys Graphics @ AMD Moderator Veteran Alpha

    Definitely looks like DVFS is disabled to me.
     
  4. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    Dumb layman question: if that's true and DVFS is disabled, wouldn't enabling it affect peak performance to some degree?
     
  5. french toast

    french toast Veteran

    I've got an even dumber question...what is DVFS?
     
  6. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    Dynamic Voltage (and) Frequency Scaling.
     
  7. french toast

    french toast Veteran

    That explains it then...
     
  8. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

  9. tangey

    tangey Veteran

  10. AlphaWolf

    AlphaWolf Specious Misanthrope Legend

    Faster sgx540 in 4460 isn't it?
     
  11. Are CPU and GPU clocks tied on OMAP4? The Galaxy Nexus's 4460 clocked in at 1.2GHz CPU and 307MHz GPU, both 80% of the rated 1.5GHz CPU and 384MHz GPU clocks. Maybe that was just a design choice rather than a design limitation.
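    A quick check of the two ratios quoted above, as a minimal sketch. The variable names are only for illustration, and the clock figures are the ones given in the post, not independently verified.

```python
# Clock figures as quoted in the post (MHz); names are illustrative.
cpu_actual_mhz, cpu_rated_mhz = 1200, 1500
gpu_actual_mhz, gpu_rated_mhz = 307, 384

cpu_ratio = cpu_actual_mhz / cpu_rated_mhz
gpu_ratio = gpu_actual_mhz / gpu_rated_mhz

# Both land at roughly 80% of the rated clock.
print(f"CPU: {cpu_ratio:.1%} of rated, GPU: {gpu_ratio:.1%} of rated")
```

    Matching ratios fit the idea that the two clocks were scaled together on this device, but they don't show whether that is a hardware tie or just a configuration choice.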
     
  12. AlphaWolf

    AlphaWolf Specious Misanthrope Legend

    The impression I get from the wiki on OMAP is no, but maybe that's just max operating frequencies.
     
  13. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    No idea; at least if the 540 in the 4460 is clocked at 384MHz it would be at least one difference compared to the 4430 in the Kindle Fire.
     
  14. ams

    ams Regular

    I am shocked that no one (including NVIDIA) has called out Amazon on their claim that Omap 4470 has "50% higher [GPU] floating point operations per second" vs. Tegra 3. The lowest performance version of Tegra 3 (i.e. T30L, found in devices such as Google Nexus 7) has a Geforce ULP GPU with ~ 10 GFLOPS (i.e. 10 billion floating point operations per second) theoretical throughput, while the regular version of Tegra 3 (i.e. T30, found in devices such as Asus Transformer Prime) has a Geforce ULP GPU with ~ 12 GFLOPS (i.e. 12 billion floating point operations per second) theoretical throughput. Omap 4470 (found in devices such as Archos 101 XS and the upcoming 8.9" Kindle Fire HD) has a PowerVR SGX 544[MP1] GPU with ~ 12 GFLOPS (i.e. 12 billion floating point operations per second) theoretical throughput.

    Amazon claimed that the GPU in Tegra 3 has 8 GFLOPS (ie. 8 billion floating point operations per second) theoretical throughput, which seems totally false based on all the available information on Tegra 3. Since Tegra 3 appears to be between ~ 10-12 GFLOPS, Amazon's "50% higher [GPU] floating point operations per second" claim turns out to be "0-20% higher [GPU] floating point operations per second" at best in reality.
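    To make the percentages above concrete, here is a small sketch that uses only the approximate GFLOPS figures quoted in this post. The helper name `percent_higher` and the constant names are hypothetical, and the inputs are the post's round numbers rather than measured values.

```python
def percent_higher(a_gflops, b_gflops):
    """Return how much higher a is than b, in percent."""
    return (a_gflops / b_gflops - 1.0) * 100.0

# Approximate theoretical figures as quoted in the post (GFLOPS).
OMAP4470 = 12.0
TEGRA3_AMAZON = 8.0   # the figure Amazon reportedly used for Tegra 3
TEGRA3_T30L = 10.0
TEGRA3_T30 = 12.0

comparisons = {
    "Amazon's Tegra 3 figure": TEGRA3_AMAZON,
    "T30L": TEGRA3_T30L,
    "T30": TEGRA3_T30,
}
for label, tegra_gflops in comparisons.items():
    pct = percent_higher(OMAP4470, tegra_gflops)
    print(f"Omap 4470 vs {label}: {pct:.0f}% higher")
```

    With these inputs the 50% claim only appears when the 8 GFLOPS figure is used for Tegra 3; the other two rows give the 0-20% range.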

    On top of that, many publications reporting on the event were completely confused regarding the SoC details of the 8.9" Kindle Fire HD vs. the 7" Kindle Fire HD. The GPU performance in the 8.9" Kindle Fire HD variant is far superior to the GPU performance in the 7" Kindle Fire HD variant (and the CPU performance is significantly improved too). Omap 4460 (found in devices such as Archos G9) has a PowerVR SGX 540[MP1] GPU with ~ 6 GFLOPS (ie. 6 billion floating point operations per second) theoretical throughput.
     
  15. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    I'm sure if you'd poke him he'd take a 180 degree turn and tell you that he meant pixel shader FLOPs only. What's with the FLOP craze anyway, especially for ULP GFs in Tegras; it's not like you can use the PS ALUs for anything GPGPU either.

    I'm not shocked one bit when NVIDIA once in a while gets paid back with similar or worse marketing stunts that they are pulling themselves from time to time. Under that light the ULP GF in Tegra3 has "12 cores" or else let's count each ALU lane as a core because each of them for sure can act independently :roll:

    In real time and at the same frequencies an SGX544 is between 40 and nearly 100% faster than an SGX540 (and no, there's no such thing as a 540MP1 since Series5 isn't multicore capable); if you'd have a severely fillrate limited case chances are high that the difference is closer to zero.

    In any case NVIDIA entered the small form factor market with its typical aggressive marketing, and if they get paid back once in a while from different sides in a similar manner, it's entertaining at best. Or more simply: monkey see, monkey do. As for the rest, it's just typical marketing, and no, of course I don't expect the average consumer to know what the 4460 vs. the 4470 contains.
     
  16. ams

    ams Regular

    Marketing antics aside, you know and I know that comparing maximum theoretical pixel shader FLOPS between a unified shader GPU architecture and non-unified shader GPU architecture is utter nonsense, and is very much a misrepresentation of the GPU performance on a non-unified architecture. At the end of the day, Amazon has tried to pull wool over people's eyes in suggesting that Omap 4 GPU performance vastly exceeds that of Tegra 3, and in giving people the impression that the 7" Kindle Fire HD has similar performance and resolution/PPI as the 8.9" version. The saddest thing about it is that their one tablet with full HD resolution and higher performance Omap 4 SoC is still approximately 2.5 months away from shipping to customers. And when websites start benchmarking the 7" Kindle Fire HD in the very near future, it will become all too clear that the GPU performance of this variant is simply far far behind that of Tegra 3. The 7" Kindle Fire HD will be competitive in SunSpider and BrowserMark (which do not directly measure GPU performance), but won't come close in most of the GLBenchmark [2.1/2.5] tests, and certainly will not offer the same quality of gaming experience as, say, a Tegra 3 equipped Nexus 7 would. On a side note, the WiFi features on the Kindle Fire HD are a very welcome addition, and so is the additional hard drive storage capacity.
     
    Last edited by a moderator: Sep 8, 2012
  17. Arun

    Arun Unknown. Legend

    Well you can also get fairly close to these numbers in another way: T30L is ((1VS + 2PS) * 8 FLOPS * 400MHz) = 9.6 GFLOPS. OMAP4470 is (4 USSEs * 9 FLOPS * 384MHz) = 13.8 GFLOPS. And yes, you can definitely get 9 real flops in a single cycle, although it's a bit of an extreme case. So that's 44% higher peak which isn't so far from the claimed 50% higher peak.
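    The arithmetic above can be sketched as follows. The function name `peak_gflops` is hypothetical; the unit counts, per-clock FLOPS and clocks are the ones stated in this post, so this is a theoretical-peak estimate under those assumptions and not a measurement.

```python
def peak_gflops(units, flops_per_unit_per_clock, clock_mhz):
    """Theoretical peak: units * FLOPS per unit per clock * clock, in GFLOPS."""
    return units * flops_per_unit_per_clock * clock_mhz / 1000.0

# T30L as described in the post: 1 vertex shader + 2 pixel shader units,
# 8 FLOPS each per clock, at 400MHz.
t30l = peak_gflops(units=1 + 2, flops_per_unit_per_clock=8, clock_mhz=400)

# OMAP4470 as described in the post: 4 USSEs, 9 FLOPS each per clock
# (the extreme case), at 384MHz.
omap4470 = peak_gflops(units=4, flops_per_unit_per_clock=9, clock_mhz=384)

advantage_pct = (omap4470 / t30l - 1.0) * 100.0
print(f"T30L: {t30l:.1f} GFLOPS, OMAP4470: {omap4470:.1f} GFLOPS, "
      f"OMAP4470 peak is {advantage_pct:.0f}% higher")
```

    Swapping in a different clock or a lower per-clock FLOPS count for the USSEs moves the result a lot, which is part of why the headline percentage is so sensitive to the assumptions behind it.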

    True, reading the initial press articles I was slightly confused about it myself.

    The vertex shader is idling most of the time in GLBenchmark 2.1 and Taiji - in that case it makes decent sense to just forget about those flops. And in GLBenchmark 2.5 the Vertex Shader is clearly a limitation for Tegra, so should we just remove some of those Pixel Shaders flops then? You're never going to get a perfect balance so there's no fair way to compare GFLOPS between unified and non-unified architectures, period.

    In practice it's usually closer to 100% though, which does mean the Nexus 7 will have significantly faster GPU performance than Tegra 3 (but on the other hand SGX544 will be faster than Tegra 3, especially T30L).
     
  18. ams

    ams Regular

    The GPU in T30L reportedly operates at 416MHz (not 400MHz), so even in that extreme case vs. Omap 4470, the difference would be 38% when comparing the lowest performance Tegra 3 variant to Omap 4470 (so still a far cry from 50%, and in a less extreme scenario with the standard T30, the total GFLOPS difference vs. Omap 4470 would be essentially nil). That said, I believe that Ailuros is correct in suggesting that Amazon was comparing maximum theoretical pixel shader flops. Note that there was a slide that specifically mentioned 8 billion floating point ops per sec vs. 12 billion floating point ops per sec (http://1.androidauthority.com/wp-content/uploads/2012/09/omap-4470-vs-tegra-3.jpg). Also note that Amazon stated "Tegra 3" and did not say T30L. Since the GPU in T30L reportedly operates at 416MHz, and the GPU in T30 reportedly operates at 520MHz, that would give Tegra 3 between 10.0-12.5 GFLOPS throughput overall (and 6.7-8.3 GFLOPS pixel shader flops). At the end of the day, all things considered, Amazon's slide was confusing and even misleading with respect to GPU graphics performance differences between these two SoC's.
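    A minimal sketch of the numbers in the paragraph above, assuming the same 8 FLOPS per unit per clock and 1 vertex + 2 pixel shader layout used earlier in the thread, with the reported 416MHz and 520MHz clocks. The function and constant names are hypothetical.

```python
# Extreme-case Omap 4470 estimate from earlier in the thread:
# 4 USSEs * 9 FLOPS * 384MHz.
OMAP4470_GFLOPS = 4 * 9 * 384 / 1000.0

def tegra3_gflops(clock_mhz, include_vertex_shader=True):
    """Peak GFLOPS for the ULP Geforce, with or without the vertex shader unit."""
    units = 3 if include_vertex_shader else 2   # 1 VS + 2 PS, or 2 PS only
    return units * 8 * clock_mhz / 1000.0

for name, mhz in (("T30L", 416), ("T30", 520)):
    total = tegra3_gflops(mhz)
    ps_only = tegra3_gflops(mhz, include_vertex_shader=False)
    gap_pct = (OMAP4470_GFLOPS / total - 1.0) * 100.0
    print(f"{name} @ {mhz}MHz: total {total:.1f}, pixel shader only {ps_only:.1f} "
          f"GFLOPS; Omap 4470 extreme case is {gap_pct:.0f}% higher than total")
```

    Which Omap 4470 figure is used (about 12 GFLOPS or the 13.8 extreme case) changes the T30 comparison noticeably, so the ratios should be read as rough.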

    Yes, but to completely ignore vertex shader flops when comparing a non-unified architecture to a unified architecture is still highly misleading in my opinion.

    I think you meant to say that Tegra 3 (with ULP Geforce, all variants) will have much faster GPU performance than Omap 4460 (with SGX 540), but Omap 4470 (with SGX 544) will have somewhat faster GPU performance than Tegra 3 (with ULP Geforce T30L or T30 variants).
     
  19. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    Either way you turn it, since it's a marketing stunt it doesn't have to reflect average realistic performance. I'm not going to argue about the silliness of these types of marketing claims, but I recall specifically NV claiming that the ULP GF in T3 is 3x faster than the ULP GF in T2. Let me pull that equally nasty marketing trick here and see how it can backfire if you claim bullshit:

    ULP GF T2@333MHz=
    1Vec4 PS = 8 * 0.333GHz = 2.664 GFLOPs
    ULP GF T3@520MHz=
    2Vec4 PS = 16 * 0.52GHz = 8.32 GFLOPs
    ------------------------------------------------------
    8.32 / 2.664 = 3.12x difference what a coincidence :roll::lol:

    Again, if you poke the Amazon CEO he'll take another 180 degree turn and tell you sorry, I meant the highest end variant only. I think we should be familiar with marketing crap these days.

    -------------------------------------------------------------------------------------------------------------------
    Arun,

    Help me out here: isn't there a programmable blending unit in the ULP GF PS ALUs that, when no blending is used, could contribute another theoretical FLOP?

    In other words, if the story goes about 4+1 ALUs, you'd probably have to count the +1 for theoretical peak arithmetic throughput as well on Tegras, Adrenos and possibly others too.
     
  20. ams

    ams Regular

    NVIDIA does rate 3D performance of Tegra 3 relative to Tegra 2 as "Up to 3x". And while 3x performance improvement vs. Tegra 2 is not typical, there are some examples to back that up. GLBenchmark 2.5 has some tests that show ~ 2.6-3.2x performance improvement in fps (http://images.anandtech.com/graphs/graph6121/48839.png). Lost Planet 2, Da Vinci, and Glowball show ~ 2.1-2.7x performance improvement in fps (shown in one of the Tegra whitepapers).
     