Samsung Exynos 5250 - production starting in Q2 2012

Oh hell Samsung, shame on you!

I'm currently doing GPU overclocking and voltage control in the kernel for the 5410/i9500 and was screwing around with what was supposed to be a generic max limit only to be surprised by what it actually represents.

This GPU does not run at 532MHz; that frequency level is reserved solely for Antutu and GLBenchmark*, among other things. On non-whitelisted applications the GPU is limited to 480MHz. The old GLBenchmark apps, for example, run at 532MHz, while the new GFXBench app, which is not whitelisted, runs at 480MHz. /facepalm
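For illustration, this is roughly what such a whitelist amounts to. A hypothetical sketch only; the real logic sits somewhere in Samsung's proprietary stack, and the package names here are assumptions:

[code]
/* Hypothetical sketch of a benchmark whitelist gating the GPU's DVFS
 * ceiling -- NOT Samsung's actual code. Package names are assumed. */
#include <string.h>

static const char *boost_whitelist[] = {
    "com.antutu.ABenchMark",          /* assumed package name */
    "com.glbenchmark.glbenchmark25",  /* assumed package name */
};

#define NORMAL_MAX_KHZ 480000  /* what everything else gets  */
#define BOOST_MAX_KHZ  532000  /* benchmark-only upper level */

static int gpu_max_freq_khz(const char *pkg)
{
    size_t i;

    for (i = 0; i < sizeof(boost_whitelist) / sizeof(boost_whitelist[0]); i++)
        if (!strcmp(pkg, boost_whitelist[i]))
            return BOOST_MAX_KHZ;  /* recognised benchmark: unlock 532MHz */

    return NORMAL_MAX_KHZ;         /* everyone else is capped at 480MHz   */
}
[/code]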

For anybody interested, here are some scores at 640MHz, for comparison's sake, to show what the 544MP3 could do. I tried 700MHz but that wasn't stable within the prescribed upper voltage limit (1150mV).

GFXBench 2.7.2 (offscreen):
2.7 T-Rex: 14fps
2.5 Egypt: 48fps

Antutu 3DRating (onscreen): 8372 / 31.4fps
Antutu 3.3.1 3D benchmark: 8584

Basemark Taiji: 46.54

3DMark:
Ice Storm standard: 11357 overall, 11486 graphics, 58.1fps GT1, 43.8fps GT2
Ice Storm extreme: 7314 overall, 6680 graphics, 39.1fps GT1, 23.1fps GT2

With GFXBench, the fill rate scores are drastically different when compared to the iPhone 5, despite it having a similar GPU to the S4's. Is that limitation due to the platform or to some other factors?
 
Clock speeds. The GPU in the iPhone 5 is lower clocked.
 
325 (A6) vs. 480MHz (E5410)

By the way the lowest fillrate efficiency out of the crop of Series5XT/6 GPUs they used for that graph should be the 544MP3 in the Exynos5410:

[Image: PowerVR Series5XT/Series6 vs. competing GPUs fillrate efficiency graph]


http://withimagination.imgtec.com/i...vrs-market-leading-fillrate-efficiency-part-8
 
In GLBenchmark, the Samsung 5410 has a 13% better off-screen fill rate than the iPhone 5, but a 60%+ higher clock (assuming the 5410 is clocking at 532MHz for the test).

So either Samsung's graphics datapath is inferior to the one in the A6, or it's a driver issue.
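For a rough sanity check of that gap (assuming 6 TMUs on each GPU, i.e. 2 per core across 3 cores, one bilinear texel per TMU per clock, and the clocks discussed here):

[code]
#include <stdio.h>

int main(void)
{
    double a6   = 6 * 0.325;  /* SGX543MP3 @ 325MHz -> 1.95 GTex/s */
    double e480 = 6 * 0.480;  /* SGX544MP3 @ 480MHz -> 2.88 GTex/s */
    double e532 = 6 * 0.532;  /* SGX544MP3 @ 532MHz -> 3.19 GTex/s */

    printf("theoretical 5410 advantage @480MHz: %.0f%%\n", (e480 / a6 - 1) * 100); /* ~48% */
    printf("theoretical 5410 advantage @532MHz: %.0f%%\n", (e532 / a6 - 1) * 100); /* ~64% */
    return 0;
}
[/code]

A measured lead of only ~13% against a 48-64% theoretical advantage does suggest the 5410 is leaving a lot of its throughput on the table.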
 
By the way the lowest fillrate efficiency out of the crop of Series5XT/6 GPUs they used for that graph should be the 544MP3 in the Exynos5410:

Yes, and even that is probably being generous, as the graph might assume a 480MHz clock to work out the theoretical fill rate.
 
My understanding so far is that it actually clocks at 480MHz in the vast majority of cases and only at 532MHz in a couple of benchmarks. I might have understood Nebu wrong, but I think it clocks at 480MHz in GLB.

Besides that I'd still love to know which the nearly 100% efficiency variant is.
 
480 in GFXBench and 532 in the old GLB apps.

I'm curious about the wording in that IMG blog, as if it wants to say that the inefficiency is because of the high clocks. I'm grasping at straws here.

I did a quick bench of 350 vs. 480MHz; both of those GPU clocks force a memory lock to 800MHz, so bandwidth shouldn't be an issue:

350:
2.5 Egypt offscreen: 3534 frames
[strike]Fill-rate offscreen: 987526ktex/s[/strike]

480:
2.5 Egypt offscreen: 4517 frames
[strike]Fill-rate offscreen: 1323662ktex/s[/strike]

A 37.14% higher frequency for a 27.81% improvement in Egypt [strike]and a 34.03% improvement in fill-rate.[/strike]

I can do some more synthetic benches while locking all of the phone's frequencies and several runs if somebody would like to see that.
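To put a number on how far below linear that scaling is, a quick check using the frame counts above:

[code]
#include <stdio.h>

int main(void)
{
    double f_ratio = 480.0 / 350.0;    /* 1.3714 -> +37.14% clock  */
    double egypt   = 4517.0 / 3534.0;  /* 1.2781 -> +27.81% frames */

    /* fraction of the clock gain that actually shows up in the score */
    printf("Egypt scaling: %.1f%% of linear\n",
           (egypt - 1.0) / (f_ratio - 1.0) * 100.0);  /* ~74.9% */
    return 0;
}
[/code]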

PS: Does that ImgTech blog even take into account Exynos's cheating?
To illustrate this, the graph below shows fillrate efficiency calculated based on independent measured fillrate data from Kishonti’s GFXBench suite.

Would be funny if the efficiency is calculated based on a 532MHz clock but 480MHz results :D


PS2: I found the fill-rate results to be very bogus, so I reran them:

480:
Run1: 1902600 ktex/s
Run2: 1415437 ktex/s
Run3: 1911574 ktex/s
Run4: 1936990 ktex/s

350:
Run1: 1630870 ktex/s
Run2: 1644457 ktex/s
Run3: 1674204 ktex/s

Given the above reruns, it's even worse: only a 15.69% improvement between the best 350 and 480 scores. That's a bandwidth limitation, right?

I'll have to investigate GPU thermal throttling...
 
Would be funny if the efficiency is calculated based on a 532MHz clock but 480MHz results :D

6 TMUs * 480MHz = 2.88 GTexels/s
Kishonti results onscreen = 1.97 GTexels/s = 68%
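Running the same arithmetic against the rerun bests posted above (again assuming 6 TMUs and one texel per TMU per clock):

[code]
#include <stdio.h>

static void eff(const char *label, double mhz, double ktex)
{
    double peak = 6.0 * mhz * 1000.0;  /* theoretical peak in ktex/s */
    printf("%s: %.0f / %.0f ktex/s = %.1f%%\n",
           label, ktex, peak, ktex / peak * 100.0);
}

int main(void)
{
    eff("480MHz best", 480.0, 1936990.0);  /* ~67.3% of theoretical */
    eff("350MHz best", 350.0, 1674204.0);  /* ~79.7% of theoretical */
    return 0;
}
[/code]

So measured efficiency climbs from roughly 67% at 480MHz to roughly 80% at 350MHz.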


Alas, if it's already throttling in a simple fillrate test... Either the driver needs some serious work, or there's something else wrong, with bandwidth being one of probably many candidates.
 
And if you lower the frequency, the efficiency goes up.

I made an across-the-board sweep of some possible scenarios:

[Images: result tables from the GPU/memory frequency sweep]


I also tested lowering the internal bus but that didn't have any effect at all on the scores.

What are the actual bandwidth requirements per TMU per cycle?
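I don't have an official figure, but as a worst-case envelope (assuming uncompressed 32bpp texels, zero texture cache hits, and the 2x32-bit LPDDR3 interface at the 800MHz lock mentioned earlier):

[code]
#include <stdio.h>

int main(void)
{
    /* Assumptions, not confirmed figures. */
    double tex_demand = 6 * 4.0 * 480e6;  /* 6 TMUs * 4B/texel * clock = 11.52 GB/s */
    double peak_bw    = 800e6 * 2 * 8;    /* 1600MT/s * 8B bus width   = 12.8  GB/s */

    printf("worst-case texture traffic: %.2f of %.1f GB/s peak (%.0f%%)\n",
           tex_demand / 1e9, peak_bw / 1e9, tex_demand / peak_bw * 100);
    return 0;
}
[/code]

In that uncached worst case, texturing alone would eat ~90% of peak bandwidth before anything else touches memory; in practice a simple fillrate pattern should hit the texture cache most of the time, so this doesn't prove a bandwidth limit by itself.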

Alas if its already throttling in a simple fillrate test. Either the driver needs some serious work, or there's something else wrong with bandwidth being one of probably many candidates.
I'm still not aware of any GPU throttling mechanism, but memory has throttling in place.
 

Besides that I'd still love to know which the nearly 100% efficiency variant is.

iPhone 5 isn't far away.

Assuming it's 325MHz, that's 650 MTex/s per core; ×3 = 1950 MTex/s.
You have to allow a small reduction, as IMG have said multi-core performance scales about 95% linearly, which would work out to about 1850.

Offscreen fillrate in GFXBench is 1835.
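Putting that in the same terms as the Exynos arithmetic above (the 2 TMUs per core and 95% scaling figures are taken from the post, not measured):

[code]
#include <stdio.h>

int main(void)
{
    double peak      = 2 * 3 * 325e6;  /* 2 TMUs/core * 3 cores * 325MHz = 1.95 GTex/s */
    double mp_scaled = peak * 0.95;    /* ~95% multi-core scaling       -> ~1.85 GTex/s */
    double measured  = 1.835e9;        /* GFXBench offscreen fillrate                  */

    printf("efficiency vs raw peak:        %.0f%%\n", measured / peak * 100);      /* ~94% */
    printf("efficiency vs scaled estimate: %.0f%%\n", measured / mp_scaled * 100); /* ~99% */
    return 0;
}
[/code]

Against the scaling-adjusted estimate that's ~99%, which would make the 543MP3 a plausible candidate for the nearly-100% variant in that graph.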
 
And if you lower the frequency, the efficiency goes up.

I made an across-the-board sweep of some possible scenarios:

Interesting.


I also tested lowering the internal bus but that didn't have any effect at all on the scores.

What are the actual bandwidth requirements per TMU per cycle?

No idea to be honest.

I'm still not aware of any GPU throttling mechanism, but memory has throttling in place.

Or Series5XT cores simply aren't meant for very high frequencies, unlike Series6, of course, as far as I currently understand.
 
Can anybody theorise about the difference in Stream scores in the above results? The higher one is from the A15 at 800MHz and the other one is the A7's at 1500MHz.

This is by no means a solid analysis, but maybe the A7 is latency bound by the FPU operations while the A15 isn't (technically even the copy operation should be going through the FPU). Stream consists of very tight loops; if the compiler isn't unrolling it then you could end up with loading and storing to the same registers causing stalls due to WAW hazards (and hitting RAW with load-use latency, to some degree). The A15 would hide this due to its register renaming.

But that doesn't explain why Cortex-A9s get much better scores in Geekbench, since they should be subject to the same problem.
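To make the hazard concrete, here's the shape of such a loop; a minimal sketch rather than the actual STREAM source:

[code]
/* STREAM-style copy kernel. Compiled naively (no unrolling), each
 * iteration loads into the same register it just stored from: the
 * store waits on the load (load-use latency), and the next load's
 * writeback collides with the previous use of that register (the
 * WAW/WAR case described above). An in-order core like the A7 eats
 * those stalls; the A15's register renaming hides the register
 * reuse entirely.                                                */
void stream_copy(double *restrict c, const double *restrict a, long n)
{
    long j;

    for (j = 0; j < n; j++)
        c[j] = a[j];

    /* Unrolled by hand, the compiler can rotate through independent
     * registers and overlap the loads' latency instead:
     *   c[j] = a[j]; c[j+1] = a[j+1]; c[j+2] = a[j+2]; c[j+3] = a[j+3];
     */
}
[/code]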
 
And if you lower the frequency, the efficiency goes up.
Very interesting data, thanks! :)
I'm still not aware of any GPU throttling mechanism, but memory has throttling in place.
Is there any way to change memory CAS like on the desktop? That could be very interesting to test the impact of latency vs bandwidth (although I don't know if the memory controller itself is clocked based on memory frequency and whether it plays a noticeable part in total latency or not).

It's worth pointing out that LPDDR3-1600 has worse latency than LPDDR2-1066 (I'm not sure exactly how it compares to LPDDR2-800), so you might have a double whammy of higher latency than some competing systems and a higher GPU frequency as well. So I suspect the memory latency in cycles might be higher than on any other SGX device (except perhaps for very badly designed ones, I don't really know).
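A hedged illustration of the latency-in-cycles point; the nanosecond figures below are placeholders, not measured values:

[code]
#include <stdio.h>

int main(void)
{
    /* Assumed round-trip DRAM latencies, purely for illustration. */
    double lat_lpddr3_ns = 120.0;  /* hypothetical LPDDR3-1600 */
    double lat_lpddr2_ns = 100.0;  /* hypothetical LPDDR2      */

    /* The same wall-clock latency costs more GPU cycles at a higher clock. */
    printf("5410 @ 480MHz: ~%.0f cycles\n", lat_lpddr3_ns * 480e6 / 1e9);  /* ~58 */
    printf("A6   @ 325MHz: ~%.0f cycles\n", lat_lpddr2_ns * 325e6 / 1e9);  /* ~33 */
    return 0;
}
[/code]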
 
Is there any way to change memory CAS like on the desktop? That could be very interesting to test the impact of latency vs bandwidth (although I don't know if the memory controller itself is clocked based on memory frequency and whether it plays a noticeable part in total latency or not).
Yes, but the value fields are undocumented, so I would be changing them blindly. Line 192: https://github.com/AndreiLux/Perseus-UNIVERSAL5410/blob/perseus/drivers/devfreq/exynos5410_bus_mif.c
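To illustrate why that's blind guessing, here's the general shape such per-frequency tables take; the field layout and values below are invented for illustration, not the actual contents of that file:

[code]
/* Hypothetical sketch -- NOT the real exynos5410_bus_mif.c. The real
 * table holds raw, undocumented register words for the DRAM controller. */
struct mif_timing {
    unsigned long freq_khz;   /* MIF/DRAM operating point         */
    unsigned int  timingrow;  /* packed, undocumented timing bits */
};

static struct mif_timing mif_table[] = {
    { 800000, 0x345A96D3 },   /* made-up packed value */
    { 400000, 0x1A35538A },   /* made-up packed value */
};

/* Tightening a latency-related parameter would mean guessing which bit
 * range encodes e.g. tRP, decrementing it, and hoping the memory still
 * runs stable -- with no datasheet it's pure trial and error.          */
[/code]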
 