Samsung Exynos 5250 - production starting in Q2 2012

  • Thread starter Deleted member 13524
T624 and T604 have very recently both been submitted and accepted for OpenGL ES 3.0 conformance. I don't see them submitting the same graphics IP twice under different names.
http://www.khronos.org/conformance/adopters/conformant-products/#opengles

Why not? The third entry on that page shows a list from nVidia with GeForce GT430 and GT530 graphics cards, both of which share the exact same GPU (GF108), core clock, shader clock and memory clock. They're essentially the same graphics card, just with a different name.


T624, T628 and T678 were all defined as 2nd-generation T600-series parts at launch.
http://armdevices.net/2012/08/07/arm-launches-mali-t624-mali-t628-and-mali-t678-gpu/

"Mali-T624 a performance-enhanced version of the Mali-T604"
http://www.armtechforum.com.cn/2012/8_Next_Generation_Visual_Computing.pdf (see page 13)
The T624 is also cited as having "architectural enhancements", along with the T628 and T678.
"Each of the second generation Mali-T600 Series GPUs features a 50% performance increase compared to first generation Mali-T600 products"

T658 seems to have disappeared as well, but at least it didn't get into product before they had a memory wipe.

After seeing all kinds of shady naming schemes in GPUs for the last 8 years, who knows what 2nd generation really means? It could be just T604 with a driver update and/or somewhat higher clocks.
 
No, they are different variants, and variants that have been added fairly recently. The last document tangey linked to mentions on page 13 that the Mali-T628 scales up to 5 GPixels/s and the Mali-T678 up to 378 GFLOPs, most likely at 625MHz for both. I'd say it would take one HELL of a driver update to squeeze out that many GFLOPs all of a sudden, or rather a pixie-dust driver.

T604@5250 is afaik: 4*SIMD16, 1 TMU/SIMD, 2 SFUs/SIMD clocked at 533MHz
64 SIMD lanes * 2 FLOPs * 0.533GHz = 68.22 GFLOPs
8 SFUs * 1 FLOP * 0.533GHz = 4.26 GFLOPs
68.22 + 4.26 GFLOPs = 72.48 GFLOPs theoretical peak
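A minimal Python sketch of the same arithmetic, assuming the T604 layout given above (64 SIMD lanes doing one FMA, i.e. 2 FLOPs, per clock, plus 8 SFUs at 1 FLOP each, at 533MHz). The function name `peak_gflops` is only for illustration; the point is that a theoretical peak is just (ALU FLOPs + SFU FLOPs) per clock times the clock.

```python
def peak_gflops(simd_lanes, flops_per_lane, sfus, flops_per_sfu, clock_ghz):
    """Theoretical peak: (lanes * FLOPs/lane + SFUs * FLOPs/SFU) * clock in GHz."""
    alu = simd_lanes * flops_per_lane * clock_ghz
    sfu = sfus * flops_per_sfu * clock_ghz
    return alu, sfu, alu + sfu


# T604 in the Exynos 5250, as described above (assumed, not confirmed by ARM):
# 4 clusters x SIMD16 = 64 lanes, 2 SFUs per SIMD = 8 SFUs, 533 MHz.
alu, sfu, total = peak_gflops(64, 2, 8, 1, 0.533)
print(f"ALU {alu:.2f} + SFU {sfu:.2f} = {total:.2f} GFLOPs")
```

Summing the unrounded terms gives roughly 72.49 instead of 72.48; the difference is only rounding.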

Now an 8 cluster T678 without any changes compared to the above at 625MHz would have a theoretical peak:

128 SIMD lanes * 2 FLOPs * 0.625GHz = 160 GFLOPs
16 SFUs * 1 FLOP * 0.625GHz = 10 GFLOPs
160 + 10 = 170 GFLOPs theoretical peak, or 208 GFLOPs short of the theoretical peak ARM itself is listing. So what could be amiss here that "accidentally" also promises 50% more performance? More FMACs per SIMD lane and most likely more SFUs per cluster. I don't think I need to do any speculative math on exactly how the T678 is laid out.
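The T678 "what if" can be checked the same way. This sketch assumes, as above, 8 T604-style clusters (SIMD16 and 2 SFUs each) at a guessed 625MHz, and also works backwards from ARM's 378 GFLOPs to the FLOPs per lane per clock the ALUs would need on their own. All constant names are hypothetical.

```python
CLOCK_GHZ = 0.625          # guessed T678 clock, not an ARM figure
ARM_QUOTED_GFLOPS = 378.0  # ARM's own listed peak for the T678
LANES = 8 * 16             # 8 clusters of SIMD16, i.e. unchanged T604-style clusters
SFUS = 8 * 2               # 2 SFUs per cluster, as assumed for the T604

naive_alu = LANES * 2 * CLOCK_GHZ  # one FMA (2 FLOPs) per lane per clock
naive_sfu = SFUS * 1 * CLOCK_GHZ   # one FLOP per SFU per clock
naive_total = naive_alu + naive_sfu
shortfall = ARM_QUOTED_GFLOPS - naive_total

# FLOPs per lane per clock the ALUs alone would need to hit ARM's figure.
needed_flops_per_lane = ARM_QUOTED_GFLOPS / (LANES * CLOCK_GHZ)

print(f"naive peak {naive_total:.0f} GFLOPs, short by {shortfall:.0f}")
print(f"needed: {needed_flops_per_lane:.3f} FLOPs per lane per clock")
```

The last figure comes out at 4.725, i.e. more than two FMAs' worth per lane per clock, which fits the guess of more FMACs per SIMD lane.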

What I wouldn't want to guesstimate is its total die area and power consumption.
 
We were comparing the T604 to the T624, so I don't really understand why all the calculations are about the higher-end models.
 
And where exactly do you suggest the T624 gets its supposed +50% performance from?

[Image: ARM graphics and GPU Compute roadmap]


The T658 has most likely been canned, since it's no longer listed anywhere; the T678 is to the T624 what the T658 used to be to the T604.

The Mali-T678 delivers a 50% performance improvement compared to the Mali-T658.

http://www.arm.com/products/multimedia/mali-graphics-plus-gpu-compute/mali-t678.php
 
I know that in the absence of real architectural disclosure from IHVs we have to use bogoflops, but why must we count transcendentals as "flops"? :rolleyes:

160gflops in 2014 seems late. Rogue will be here this year.
 
Mmm, I don't understand what has happened with Midgard; it started off very promising on paper... The fact that a version didn't arrive with the Exynos 5410 was odd indeed. The SGX544MP3 doesn't set the world on fire, yet it was obviously the better solution for Samsung.

Rogue arrives soon, and you would expect it to put a beatdown on current leader Adreno.

Speaking of which, does anyone have any info on the Adreno 420?
 
ARM is rating the T678 at 378 GFLOPs, probably at 625MHz, if you re-read my post. My calculation was merely an example: if the T62x and T678 were identical in hardware to the T604, the T678 would fall well short in terms of GFLOPs.

ARM was the first to include transcendentals in arithmetic throughput, and if you think the FLOP counts quoted for Rogue so far mostly leave out SFU FLOPs, you're wrong. If there's a reasonable way to use those (usually single) FLOPs for GPGPU, for example, it's not really that absurd for marketing to quote them; and even if you can't use them outside some weird corner cases, it wouldn't be the first time. How many theoretical FLOPs did the G80 have with and without the infamous missing MUL?
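For reference, the G80 question can be worked through with the same kind of arithmetic. This sketch takes the GeForce 8800 GTX (128 scalar SPs at a 1.35GHz shader clock) as the reference G80 part and assumes one MAD (2 FLOPs) plus the co-issued MUL (1 FLOP) per SP per clock when the MUL is counted.

```python
def peak_gflops(units, flops_per_clock, clock_ghz):
    """Theoretical peak = units * FLOPs per unit per clock * clock in GHz."""
    return units * flops_per_clock * clock_ghz


SPS = 128               # scalar stream processors on a GeForce 8800 GTX
SHADER_CLOCK_GHZ = 1.35

with_mul = peak_gflops(SPS, 3, SHADER_CLOCK_GHZ)     # MAD (2) + extra MUL (1)
without_mul = peak_gflops(SPS, 2, SHADER_CLOCK_GHZ)  # MAD only

print(f"{with_mul:.1f} GFLOPs with the MUL, {without_mul:.1f} without")
```

The gap is a full third of the headline number, which is the same kind of question as counting SFU FLOPs.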

Anyway, 378 GFLOPs, however they're counted, is a whole damn LOT for an SFF mobile GPU for 2014. As I said, though, since the T604 isn't exactly "small" or humble when it comes to power consumption, I wouldn't want to know what a T678 with twice the clusters, twice the MMUs and more than twice as wide ALUs will look like in terms of die area and/or power consumption.
 
Marketing will be marketing, but why do WE have to consider a transcendental as one flop?
 
If, besides the typical ALU FMACs, you can use another ADD or MUL from the SFU, for example, without having to walk upside down on your hands while coding, would you care?

There will be cases where we might see clusters in upcoming GPUs with as many SFUs as SIMD lanes, which might also be the case for the T678. If for GPU compute you can truly use another 16 FLOPs from the SFUs on top of the 32 FLOPs from a SIMD16, you'd sure as hell mention them in theoretical peaks, whether you're a marketeer or an engineer.
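A quick sketch of the per-cluster arithmetic behind that, assuming a hypothetical SIMD16 cluster with one SFU per lane and one usable ADD or MUL per SFU per clock (none of this is a confirmed T678 layout).

```python
SIMD_WIDTH = 16  # lanes per cluster
FMA_FLOPS = 2    # one fused multiply-add per lane per clock
SFU_COUNT = 16   # hypothetical: as many SFUs as SIMD lanes
SFU_FLOPS = 1    # one extra ADD or MUL per SFU per clock

alu_flops = SIMD_WIDTH * FMA_FLOPS  # FLOPs per clock from the SIMD alone
sfu_flops = SFU_COUNT * SFU_FLOPS   # extra FLOPs per clock if the SFUs are usable
uplift = sfu_flops / alu_flops

print(alu_flops, alu_flops + sfu_flops, f"+{uplift:.0%}")
```

Under those assumptions the SFUs add 50% on top of the FMA throughput, which is why it matters so much whether those FLOPs are actually reachable in real code.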

If it's a rare case where you'd need to sacrifice virgins or walk upside down to reach those extra FLOPs, then yes, of course it's absolute nonsense to even mention them.
 
I see someone on the Mali developer forums asked what the difference is between T-604 and T-624.

"Both are "Midgard" architecture GPU, so provide almost the same functionality to the graphics API. Mali-T62x is a second generation Midgard core, so provides higher performance and energy efficiency at the same frequency vs Mali-T604. "

Higher performance and energy efficiency... that doesn't sound like a driver enhancement or renaming to me. So is switching to the T624 as the baseline (which appears to be what has happened, based on the PDF I posted earlier) ARM's answer to the T604 power issues highlighted by AnandTech and others?

Also, the T624 supports ARM's own texture compression algorithm; I'm surprised to learn that the T604 doesn't.

http://forums.arm.com/index.php?/topic/16588-mail-t604-and-t624-difference/

It increasingly looks to me like the T604 has a very limited lifespan; perhaps the 5250 will be its only significant showing?
 
So if I'm reading the diagrams at http://www.arm.com/products/multimedia/mali-graphics-plus-gpu-compute/index.php correctly, are the Mali-T6xx GPUs scaling up to 4 clusters per core (in a relative sense)?

tangey,

If you consider that a theoretical peak of 72 GFLOPs (FP32) for the T604@533MHz isn't particularly high for a new-generation GPU (without the SFU FLOPs it's actually 68 GFLOPs), I'm not surprised that ARM went for a refresh within their architecture. As for power savings, I obviously don't have a clue. I don't even know, nor have I ever read anything about, their FP64 throughput and how exactly it's realised in hw. Are they merging FLOPs within their ALUs to reach FP64, or do they have dedicated FP64 units? How many FP64 FLOPs at 533MHz exactly? Any changes regarding FP64 from the T604 to later variants?
 
The fillrate test is a bit weird so I never bothered figuring out why exactly SGX's Android drivers were slower on it. It's fair to say it's unlike what you'll ever see in either a game or an UI (that doesn't mean it's worthless - just don't read too much into any architecture having problems with that specific workload).

The triangle test, however, tests a corner case but it is a valid corner case and I have absolutely no idea why it'd be any slower on Android or on the Exynos - in fact, I'm not even sure how driver or hardware revision could possibly influence it! Rys? ;)
 
I assume you are certain that the iPhone 5 GPU block doesn't, by any chance, have something more?
 
I'm not confirming or denying anything (it's certainly not my place to comment on any of IMG's customers) - however my understanding of the test implies the numbers should be significantly higher for the Exynos at least. So to me, it seems unlikely to be a hardware difference.
 
Poof goes my theory (and it might have been a good one too...) :cry:

EDIT: for accuracy's sake I should also admit that I vastly underestimated the 5410 in GLB2.7, where I didn't expect more than ~550 frames. There's obviously some good driver optimisation behind it (compared to the iPhone 5 score), but it would be nice if Rys or someone else could explain the vast difference in triangle rates.
 
I can't really explain that. At a big guess, it looks like something's gone wrong either in the app's timing code (unlikely) or possibly the GPU is operating at a low performance level via DVFS for some reason (much more likely).

I'll try and figure it out at some point, although I'm not exactly flush with free 5410s right now.
 
Kernel source has been released for the 9500.

I'll be updating information as I'm going through the sources.

The 5410 now has proper independent power-gating in its CPUIdle driver.

The CPUFreq driver's table is populated from 200MHz up to 1300MHz for the A7 cores and up to 2000MHz for the A15 cores, in 100MHz steps.

The shipping CPU is already at its second revision, REV_2_0.

The shipping frequencies are 500MHz to 1300MHz for the A7 cores, 800MHz to 1600MHz for the A15 cores.
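A sketch of the frequency tables described above, assuming plain 100MHz steps between the stated endpoints. The names are hypothetical and are not the kernel's own data structures; the second function just restricts each table to the shipping range.

```python
def freq_table(lo_mhz: int, hi_mhz: int, step_mhz: int = 100) -> list[int]:
    """All frequency steps from lo_mhz to hi_mhz inclusive."""
    return list(range(lo_mhz, hi_mhz + 1, step_mhz))


def shipping_steps(table: list[int], min_mhz: int, max_mhz: int) -> list[int]:
    """Keep only the steps inside the shipping range."""
    return [f for f in table if min_mhz <= f <= max_mhz]


# Full tables as populated in the driver source (per the post above).
A7_TABLE = freq_table(200, 1300)
A15_TABLE = freq_table(200, 2000)

# Steps actually used on the shipping part.
A7_SHIPPING = shipping_steps(A7_TABLE, 500, 1300)
A15_SHIPPING = shipping_steps(A15_TABLE, 800, 1600)

print(len(A7_TABLE), len(A15_TABLE), A7_SHIPPING[0], A15_SHIPPING[-1])
```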

EDIT: When the IKS is active, which means at all times, the CPU is set up to run in "turbo"-like configurations: if 1-2 A15s are active, the max frequency is 1800MHz; with 3 active it is 1700MHz; and with all 4 big CPUs on, 1600MHz is the maximum. I'm still reading through the max index they set up there, so I can't confirm this behaviour yet, but it's there in the code.
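The IKS "turbo" caps can be expressed as a small lookup. This is a hypothetical model of the behaviour read from the code (not the driver's actual implementation, and, as noted, not yet confirmed on hardware): the cap depends only on how many A15 cores are active.

```python
# Hypothetical model: max A15 frequency (MHz) keyed by number of active A15 cores.
A15_CAP_MHZ = {1: 1800, 2: 1800, 3: 1700, 4: 1600}


def max_a15_freq(active_big_cores: int) -> int:
    """Return the frequency cap for the given number of active A15 cores."""
    if active_big_cores not in A15_CAP_MHZ:
        raise ValueError("expected 1-4 active A15 cores")
    return A15_CAP_MHZ[active_big_cores]


def clamp_request(requested_mhz: int, active_big_cores: int) -> int:
    """Clamp a governor's requested frequency to the current cap."""
    return min(requested_mhz, max_a15_freq(active_big_cores))


print(clamp_request(2000, 4), clamp_request(2000, 1))
```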

Hotplugging seems to be DEAD! Finally!

The GPU _is_ running at 532MHz. There is a define in the driver to omit the last frequency step and limit it to 480MHz; however, it is not used in the released source, and the frequency is 532MHz.
 
Good to hear they're power-gating the cores now... an inability to do that on the Octa would be pretty devastating.

1.3GHz for A7s is good news. Ideally you'd have a fairly smooth performance curve between the A7s and A15s; I don't think this will quite give that (800MHz A15 will probably tend to beat 1.3GHz A7) but at least the gap is closer than it'd have been with a 1.2GHz limit. I can't wait to see some power consumption numbers, I hope someone (probably Anandtech) does a very deep dive on this.

Are you yet able to ascertain anything on how the cores are currently scheduled?
 