Samsung Orion SoC - dual-core A9 + "5 times the 3D graphics performance"

Arun · Apr 7, 2012

From \kernel\drivers\media\video\samsung\mali\platform\pegasus-m400\mali_platform_dvfs.c
So the 4412 is running at 440MHz at minimum, if they haven't upped it even more since the source drop, which would certainly explain the benchmarks.

Very good catch!

ltcommander.data said:
Tegra 2 PS are actually FP20 (bottom of page 7 in the above white paper). No idea about Tegra 3, but seeing it is mainly an expansion of Tegra 2 rather than a redesign, it's probably still at FP20. Which is why I've been curious how they meet the DX9 compliance necessary for the Windows 8 support they've been demoing.

I'm pretty sure they must have improved their ALU precision to FP24 and their depth buffer support from 16-bit to 24-bit. They don't expose 24-bit depth in OpenGL ES, though, probably because it would have a noticeable performance hit (a full +50% depth bandwidth, since they don't support framebuffer compression AFAICT). Every time I do performance analysis on Tegra at work my eyes bleed at all the depth fighting artifacts... (it's not as bad in games/benchmarks that set their zmin/zmax intelligently, but it's still fairly bad).
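For anyone wanting to put rough numbers on the 16-bit vs 24-bit depth point, here is a minimal sketch. It assumes a standard perspective projection with fixed-point window depth d(z) = far/(far-near) * (1 - near/z); the clip-plane values are made up for illustration, and nothing here is taken from Tegra documentation.

```python
def depth_step(z, near, far, bits):
    """Approximate eye-space distance covered by one depth-buffer step at distance z.

    Assumption: window depth d(z) = far/(far-near) * (1 - near/z), stored as an
    unsigned fixed-point value with `bits` bits.
    """
    levels = 2 ** bits - 1
    # dd/dz = far*near / ((far-near) * z^2); invert it for one step of size 1/levels.
    return (z * z) * (far - near) / (far * near * levels)


# Hypothetical clip planes: a deliberately loose pair and a tighter pair.
LOOSE = (0.1, 1000.0)
TIGHT = (1.0, 1000.0)

for bits in (16, 24):
    for near, far in (LOOSE, TIGHT):
        step = depth_step(100.0, near, far, bits)
        print(f"{bits}-bit, near={near}, far={far}: one step at z=100 covers {step:.5f} units")

# Raw bits per depth sample, as in the post: 24 vs 16 bits is 50% more traffic.
extra_depth_bandwidth = 24 / 16 - 1
print(f"extra depth bandwidth: {extra_depth_bandwidth:.0%}")
```

Under these assumptions the 24-bit buffer resolves steps roughly 256 times finer at the same distance, and pulling the near plane in from 0.1 to 1.0 helps the 16-bit case by about an order of magnitude, which is the "set zmin/zmax intelligently" effect.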

Obviously these are all trade-offs and I understand some of the reasons why they made them, but I think at a basic level NVIDIA designed the original Tegra GPU in an era where they thought handheld GPU performance wouldn't increase anywhere near as fast as it has, and more importantly they thought they'd be more limited by area than they actually could be at this point (leading to things like no framebuffer compression). It will be interesting to see how aggressive they are with handheld Kepler (and how similar it is to PC Kepler) once that comes to market although it remains to be seen when that actually is and what the competition will be at that point...

french toast · Apr 7, 2012

It will be interesting to see how aggressive they are with handheld Kepler (and how similar it is to PC Kepler) once that comes to market although it remains to be seen when that actually is and what the competition will be at that point...

Ha, nice

Ailuros · Apr 8, 2012

ltcommander.data said:
http://www.nvidia.com/content/PDF/t...ing_High-End_Graphics_to_Handheld_Devices.pdf

Tegra 2 PS are actually FP20 (bottom of page 7 in the above white paper). No idea about Tegra 3, but seeing it is mainly an expansion of Tegra 2 rather than a redesign, it's probably still at FP20. Which is why I've been curious how they meet the DX9 compliance necessary for the Windows 8 support they've been demoing.

That's what I thought too; but it should be there in T3 at least (FP24 PS and 24bit Z) for whatever windows it may run on.

Exophase · Apr 11, 2012

french toast said:
That suggests that they used to be equal, however wiki is not always accurate.

2004 standards are ancient history, even in the mobile world, but I think the TMU to ROP ratio was non-one long before that. Look at the first dual-texturing GPUs, they could handle two texels for every pixel output. That right there indicates a 2:1 ratio. At this point fragment shading wasn't programmable (and I wouldn't consider it fully programmable until DX9 level) so it's difficult to identify how many "ALUs" these units had. But the combiners often had several stages for each pixel output too, so if you count those as ALUs that number is higher. You need at least one combiner stage for every texel, but you could have more for other inputs.

french toast said:
So he does seem to suggest FP16 for both PS/VS. This plays out in his projected Mali-400 @ 400MHz in his table: http://www.anandtech.com/show/4686/samsung-galaxy-s-2-international-review-the-best-redefined/16

He's saying that all of the SIMDs are capable of FP16 or better per lane, but that doesn't mean that they're limited to that.

Ailuros said:
The majority of those GPUs don't have dedicated blending units, they're capable of programmable blending in the ALUs.

I bet it's more that the shaders do half the blending and the ROP does the other half, something like this:

Code:

shader: color.rgb *= color.a
shader: color.a = 1 - color.a
ROP: render_target.rgb = (render_target.rgb * color.a) + color.rgb

Because normally the shader can't read the render target directly, you really want to keep that decoupled.
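As a sanity check on the split above, here is a small sketch showing that the shader half plus the ROP half add up to a conventional source-alpha blend. The function names and sample colours are made up for illustration; this is not how any particular GPU implements it.

```python
def shader_stage(src_rgb, src_a):
    # shader: color.rgb *= color.a ; color.a = 1 - color.a
    premultiplied = tuple(c * src_a for c in src_rgb)
    return premultiplied, 1.0 - src_a


def rop_stage(dst_rgb, shader_rgb, shader_a):
    # ROP: render_target.rgb = (render_target.rgb * color.a) + color.rgb
    return tuple(d * shader_a + s for d, s in zip(dst_rgb, shader_rgb))


def reference_blend(dst_rgb, src_rgb, src_a):
    # Conventional blend: src * a + dst * (1 - a)
    return tuple(s * src_a + d * (1.0 - src_a) for s, d in zip(src_rgb, dst_rgb))


# Hypothetical sample values.
src_rgb, src_a = (1.0, 0.5, 0.25), 0.25
dst_rgb = (0.2, 0.4, 0.8)

shader_rgb, shader_a = shader_stage(src_rgb, src_a)
split_result = rop_stage(dst_rgb, shader_rgb, shader_a)
reference = reference_blend(dst_rgb, src_rgb, src_a)

print("split:    ", split_result)
print("reference:", reference)
```

The ROP only ever needs a multiply-add against the render target here, so the shader never has to read the destination colour itself.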

Arun · Apr 11, 2012

Exophase said:
Because normally the shader can't read the render target directly, you really want to keep that decoupled.

Both SGX and Tegra have full access to the previously rendered color in the pixel shader.

So with the right extensions/low-level access you can do HDR in non-RGB colorspaces very efficiently for example...

Ailuros · Apr 12, 2012

Not the right topic anyway, but I've been wondering for quite some time now where the 9th FLOP per ALU comes from in Series5XT GPU IP.

wco81 · Apr 12, 2012

Well, Samsung announced the Galaxy Tab 2 in 7-inch and 10.1-inch screen sizes.

They have ICS and are described as 1GHz dual-core. They don't try to compete with the high-resolution screen of the iPad, but they are lower in price, especially the 7-inch model at $250.

Ailuros · May 10, 2012

These results for the 32nm Exynos should be final:

http://www.glbenchmark.com/phonedetails.jsp?D=Samsung+GT-N8000&benchmark=glpro21

Nebuchadnezzar · May 12, 2012

Ailuros said:
These results for the 32nm Exynos should be final:

http://www.glbenchmark.com/phonedetails.jsp?D=Samsung+GT-N8000&benchmark=glpro21

Galaxy Note 2 model number, smashing.

Nebuchadnezzar · Jun 1, 2012

So the kernel sources for the S3 were released and the final clock on the Mali is the same as I posted several months ago, 440MHz.

While I'm pretty sure there's no driver magic going on here, the only reasonable explanation would be that the memory bandwidth is vastly improved. Bandwidth tests put the S3 at roughly 30% higher speeds in real-world metrics over the S2. I don't see any other explanation for a 95% performance increase for only a 65% clock increase on the GPU. Reports have been posted that it has an "internal 128-bit bus" versus 64-bit in the 4210, but I do not understand how exactly this helps memory bandwidth, as the memory itself remained (apparently) unchanged.

french toast · Jun 1, 2012

Nebuchadnezzar said:
So the kernel sources for the S3 were released and the final clock on the Mali is the same as I posted several months ago, 440MHz.

While I'm pretty sure there's no driver magic going on here, the only reasonable explanation would be that the memory bandwidth is vastly improved. Bandwidth tests put the S3 at roughly 30% higher speeds in real-world metrics over the S2. I don't see any other explanation for a 95% performance increase for only a 65% clock increase on the GPU. Reports have been posted that it has an "internal 128-bit bus" versus 64-bit in the 4210, but I do not understand how exactly this helps memory bandwidth, as the memory itself remained (apparently) unchanged.

I get 797MB/s on that test using a 70MB test size - 45%. I would have thought that they would have stuck in some LPDDR2 1066?? Anyway, I'm not running out of bandwidth anytime soon, I know that much; this thing chews through anything. The only gripe is that Samsung didn't stick in 2GB of RAM. I've only got 780MB to play with for some reason; I expect that's ICS + TouchWiz taking the rest. I then regularly find around 400MB constantly in use while doing nothing on the home screen, and even when I close every app from the task manager and clear the RAM I barely get it under 400MB at best. I've installed a RAM booster app to clear RAM when I get down to only 300MB free.

I've noticed I sometimes get kicked out of apps like Opera when I'm loading it heavily; I wonder if that's ICS/TouchWiz cutting apps back? You would have thought it would have started with some annoying background processes first??

Anyway, once I've put that auto RAM booster in I don't run out of RAM for my main apps. Even with everything running, some 25 apps, I've experienced only a slight stutter when the RAM gets filled and ICS starts closing or tombstoning apps. Bandwidth, it seems, is more than fine. :)

Let me know if you want me to run any benchmarks; I have quite a few installed already. Cheers.

Ailuros · Jun 1, 2012

Nebuchadnezzar said:
So the kernel sources for the S3 were released and the final clock on the Mali is the same as I posted several months ago, 440MHz.

While I'm pretty sure there's no driver magic going on here, the only reasonable explanation would be that the memory bandwidth is vastly improved. Bandwidth tests put the S3 at roughly 30% higher speeds in real-world metrics over the S2. I don't see any other explanation for a 95% performance increase for only a 65% clock increase on the GPU. Reports have been posted that it has an "internal 128-bit bus" versus 64-bit in the 4210, but I do not understand how exactly this helps memory bandwidth, as the memory itself remained (apparently) unchanged.

What am I missing, where's the 95% performance difference? http://www.glbenchmark.com/compare....00 Galaxy S III&D2=Samsung GT-i9100 Galaxy S2 ....or are you saying that the S2 results are at 440MHz in the latter?

If the Egypt offscreen results above are at default GPU frequencies (266 and 440MHz respectively), the results look quite reasonable.

Nebuchadnezzar · Jun 1, 2012

Ailuros said:
What am I missing, where's the 95% performance difference? http://www.glbenchmark.com/compare....00 Galaxy S III&D2=Samsung GT-i9100 Galaxy S2 ....or are you saying that the S2 results are at 440MHz in the latter?

If the Egypt offscreen results above are at default GPU frequencies (266 and 440MHz respectively), the results look quite reasonable.

65fps?! Wait a moment... Okay, I just ran it again in 2.1.4 and now it gives me 61fps. A few weeks ago I was consistently getting 53fps. This is silly. Either they changed something between 2.1.3 and 2.1.4, or drivers did indeed improve performance and I didn't notice it in the meantime. Bollocks. French toast, can you run an Egypt test on the latest GLBenchmark? I'm still waiting on my blue S3.

french toast · Jun 1, 2012

Egypt offscreen 720p - 11064 frames, 98fps (98/65 = 1.51, so about 51% faster)

Pro offscreen 720p - 6167 frames, 123fps

Note: on the latest run I've edged a bit higher, at 99 and 125, however I've only conducted 3 tests.

EDIT: As I'm quite new to this app it seems I've made a mistake, or at least I'm confused, as the app says it's running in 1080p offscreen on version 2.1.4?? I thought 1080p was version 2.5?
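To keep the percentages in this exchange straight, here is a quick sketch of the arithmetic using only figures quoted in the thread (266 and 440MHz default GPU clocks; 53, 65 and 98fps in Egypt offscreen). Pairing the numbers up like this is my own assumption about which runs are comparable.

```python
def pct_increase(new, old):
    """Percentage increase of `new` over `old`."""
    return (new / old - 1.0) * 100.0


# GPU clocks quoted in the thread (MHz): S2 default vs S3 default.
clock_gain = pct_increase(440.0, 266.0)

# Egypt offscreen fps quoted in the thread: S3 run vs the current S2 listing.
fps_gain = pct_increase(98.0, 65.0)

# Same S3 run against the older 53fps S2 figure from a few weeks earlier.
fps_gain_vs_old_s2 = pct_increase(98.0, 53.0)

print(f"clock increase:            {clock_gain:.1f}%")
print(f"fps increase vs 65fps S2:  {fps_gain:.1f}%")
print(f"fps increase vs 53fps S2:  {fps_gain_vs_old_s2:.1f}%")
```

Against the current S2 score, the fps gain comes out below the clock gain, so no extra bandwidth is needed to explain it; only the older S2 baseline makes the S3 look like it scales beyond its clock.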

Nebuchadnezzar · Jun 1, 2012

Well, well. This pretty much leaves driver improvements as the only viable explanation, if Kishonti didn't change much in the benchmark.

So basically scores on the S2 improved 30-40% over the last year or so..

Ailuros · Jun 4, 2012

Nebuchadnezzar said:
Well, well. This pretty much leaves driver improvements as the only viable explanation, if Kishonti didn't change much in the benchmark.

So basically scores on the S2 improved 30-40% over the last year or so..

Doesn't surprise me one bit; ARM isn't the only IHV with GPU IP where some driver and/or compiler tweaking brought significant performance increases over time.

almighty · Jun 4, 2012

Nebuchadnezzar said:
Well, well. This pretty much leaves driver improvements as the only viable explanation, if Kishonti didn't change much in the benchmark.

So basically scores on the S2 improved 30-40% over the last year or so..

A new driver for the Adreno 220 GPU was released in December and I saw a 2-3x performance jump in loads of apps.

I can get 53fps in Nenamark 2, which is much higher than even the Galaxy S2.

french toast · Jun 7, 2012

Right, quick update about Exynos 4412 and quad cores in general on Android.

Loaded System Tuner Pro (an amazing piece of software) to track all four threads and see whether they actually are being used, and if so how much.
It is quite clear that they do get used very frequently, sometimes with all 4 clocked at 800MHz, sometimes thrashing all four at 1.4GHz (more often than you would think). The optimisations Samsung mentioned are evident too: 3 cores shut off, or 2, or even 1, with different cores able to clock at different frequencies for extra efficiency.

The minimum speed is 200MHz, and you can cap it at 1GHz using power saving mode, which very slightly impacts performance even with all 4 cores available. That, along with the speed at which you run out of RAM, tells me Android could easily use another GB of RAM and some higher single-thread performance. Amazing, considering my phone is now considerably faster than my netbook.

So all those naysayers saying that 4 cores were a waste of battery (battery life is very good) and would be a waste of resources as they would be redundant can pipe down. I have seen for myself that all 4 threads are indeed used at differing frequencies, with max frequency used more often than you would think. Android is silky smooth as a result.

Cheers.

Ailuros · Jun 7, 2012

While there will always be naysayers for pretty much everything, it remains a fact that efficiency per core (or per thread) is way more important than the sheer number of cores. Besides, personally I wouldn't care how often N hw is really needed, but rather how badly it is needed when it is.

french toast · Jun 7, 2012

Yes, I get your point, but most of what you have said is in software. Exynos 4412 is nearly at the apex of what can be done on a mobile device. I suspect Snapdragon S4 Pro, built on 28nm HKMG, would be the ultimate summit of both performance and battery life, but for the next 2 years, apart from RAM, I can't say I'm going to want or need any extra power in my pocket. We have got to the stage of a PC in your pocket, ridiculous.

Can I ask you, how does the Adreno 320 compare to the S3's Mali-400 MP4?

EDIT: One more thing, in head-to-head gaming, e.g. Nova 3, Tegra 3 gets trounced by Exynos 4412.
