That high parallelism comes at a cost in both die space and power consumption, however, and it will affect mobile GPUs far more than desktop GPUs. Mobile SoCs don't have the option, IMO, of going over 400 mm^2, or even 200 mm^2, just for the GPU. Likewise, they aren't likely to ever exceed even 20 watts of power consumption, whereas desktop GPUs can hit well over 200 W, and even budget ones exceed 50 W.
As well, desktop GPUs already hit their power wall a few years ago, which is why progress has slowed quite significantly. And if the rumors of delays at both Nvidia and AMD are true, the situation has gotten considerably worse.
Mobile GPUs still have some room, but not all that much, IMO.
There's a new generation of mobile GPUs still ahead to unfold (for NV it might take one or more Tegras, depending on their roadmap). Expect, for the time being, a new process roughly every 2-3 years per full node and a new hw generation every 4-5 years. We're on the verge of 28nm for those and of a new GPU generation at the same time, so there's one major step ahead, then progress escalating at a slower pace until the next process node, and then in about 5 years or so another generation.
No, it's fairly impossible that SFF SoCs will scale beyond 200mm2, and that 400mm2 figure sounds exaggerated even for upcoming console SoCs. Let's say those are in the 300-350mm2@28nm league; with two shrinks down the road, console manufacturers will be able to get down to 200mm2 or south of it, and no, those SoCs weren't designed from the get-go for low power envelopes either.
Assume that under 28nm some SoC designers again go as high as 160mm2, and that hypothetically "just" 10mm2 of that gets dedicated to GPU ALUs. Based on the data I've gathered, synthesis alone needs only 0.01mm2 per FP32 unit at 1GHz on 28HP. Granted, synthesis obviously isn't the entire story, and you neither necessarily need 1GHz frequencies nor would you use HP for an SFF SoC. Under 28HP at 1GHz, however, the theoretical peak would be 2 TFLOPs, and you can scale down from there. Note that I'm not confident those GPUs will reach or come close to the TFLOP range under 28nm, but I'd be VERY surprised if they don't under 20LP, and no, obviously not right at its start.
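For anyone who wants to follow the napkin math, here's the calculation spelled out. The only assumption beyond the figures above is the usual convention of 2 FLOPs per FP32 unit per cycle (one fused multiply-add):

```python
# Back-of-the-envelope peak-FLOPs estimate from synthesis area.
# Figures from the post above: 0.01 mm^2 per FP32 unit at 1 GHz on 28HP,
# with a hypothetical 10 mm^2 ALU budget out of a ~160 mm^2 SoC.
alu_budget_mm2 = 10.0          # hypothetical area dedicated to GPU ALUs
area_per_fp32_mm2 = 0.01       # synthesis-only figure, 1 GHz / 28HP
clock_hz = 1.0e9               # 1 GHz
flops_per_unit_per_cycle = 2   # one FMA = 2 FLOPs (common convention)

units = round(alu_budget_mm2 / area_per_fp32_mm2)   # 1000 FP32 units
peak_tflops = units * flops_per_unit_per_cycle * clock_hz / 1e12

print(units, peak_tflops)   # 1000 2.0
```

Again, synthesis area is a lower bound and 1GHz on HP is optimistic for SFF, so real designs would land well under that 2 TFLOPs ceiling.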
NV and AMD are slowing things down, but it's a strategic move: it's better to milk the remaining crop of gaming enthusiasts at high margins than to go for volume. However, GK110 arrives within this month, and it'll land somewhere north of 4.5 TFLOPs FP32 (3x the GF110 peak value), and the Maxwell top dog in late 2014 or early 2015 isn't going to scale by any less if projections are met, and that's not even with all clusters enabled.
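The "3x GF110" figure is easy to sanity-check from the public GTX 580 specs (512 FP32 ALUs at a 1544 MHz hot clock, 2 FLOPs per ALU per cycle via FMA):

```python
# Sanity check of the "north of 4.5 TFLOPs = ~3x GF110" claim,
# using the public GTX 580 (GF110) specs.
gf110_alus = 512
gf110_hot_clock_hz = 1.544e9        # 1544 MHz shader ("hot") clock
flops_per_alu_per_cycle = 2         # FMA = 2 FLOPs

gf110_tflops = gf110_alus * flops_per_alu_per_cycle * gf110_hot_clock_hz / 1e12
gk110_est_tflops = 3 * gf110_tflops

print(round(gf110_tflops, 2), round(gk110_est_tflops, 2))   # 1.58 4.74
```

So 3x the GF110 peak lands at roughly 4.7 TFLOPs, consistent with "somewhere north of 4.5".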
That's why I used the Exynos 5 as an example: at least there's a fairly thorough study of the power characteristics of that chip, covering both the CPU and the GPU.
What is its peak power envelope, 8W with roughly half going to the CPU and half to the GPU? Another nice uber-minority example to judge from what, exactly? Not only is T604 market penetration uber-ridiculous for the moment, but we'd have to ask why on earth ARM was so eager to integrate FP64 units into the GPU that early. I'll leave it to anyone's speculation exactly how much that affects die area and power consumption, but in extension to the former synthesis figure of 0.01mm2@28HP for FP32, an FP64 unit needs 0.025mm2 at the same process and frequency. That's a factor of 2.5x, and while T604 obviously has only a limited number of FP64 units, it's no particular wonder that it's stuck at "just" 72 GFLOPs peak theoretical, while an upcoming G6400 Rogue would be, on estimate, at over 170 GFLOPs at the same frequency.
***edit: note that I don't have a single clue how ARM integrated FP64, but whether they used FP32 units with loops or dedicated FP64 units, it's going to affect die area either way.
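Putting the area factor and the throughput gap side by side (same synthesis-only caveats as before; the G6400 number is my own same-frequency projection, not a vendor figure):

```python
# FP32 vs FP64 synthesis area at 28HP / 1 GHz, per the figures above.
fp32_mm2 = 0.010
fp64_mm2 = 0.025
area_factor = fp64_mm2 / fp32_mm2      # ~2.5x area per FP64 unit

# Peak-throughput gap quoted above (same-frequency estimate;
# the G6400 Rogue figure is a projection, not an official spec).
t604_gflops = 72.0
g6400_gflops = 170.0
throughput_gap = g6400_gflops / t604_gflops   # ~2.4x

print(round(area_factor, 2), round(throughput_gap, 2))
```

In other words, every FP64 unit T604 carries costs it roughly 2.5 FP32 units' worth of ALU area, which goes some way toward explaining that ~2.4x peak-throughput gap.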
The GPU in that can already exceed 4 watts. That's getting to the point where it will make lightweight mobile devices (tablets) more difficult to design. It's basically at a wall for mobile designs unless battery energy density increases to match the increased power consumption. Otherwise we'll see devices getting heavier and heavier, or battery life getting shorter.
See above. It yields around 4100 frames in GLBenchmark 2.5, while the 554MP4@280MHz is at ~5900. Still wondering why Samsung picked a 544MP3 at high frequencies for the octacore? Even more ridiculous, the latter will be somewhat faster than the Nexus 10 GPU. An even dumber question from my side would be why Samsung didn't choose a Mali-4xxMP8 instead for the octacore; both die area and power consumption would have been quite attractive.
It's impossible to guess, since we don't have even the smallest clue as to how large the Durango or Orbis SoCs are.
Regards,
SB
See the first paragraph above; as a layman I have the luxury of not minding ridiculing myself. I'm merely waiting to stand corrected on my chain of thought. Mark that I NEVER supported the original poster's 5-year notion for Tegras. I said more than once that if you stretch that timespan, it's NOT impossible.