In your opinion, can a Tegra 4 max out its real performance today?
Sure. It'll easily be able to utilize a single core at 1.9GHz for short periods while launching apps, loading web pages, etc., where there's enough work to be done that burning the power to cut perceived wait time is worth it. Only in some specialized scenarios like bulk compression will it make sense to use all four cores at whatever peak clock is allowed, but such scenarios could exist. It will probably be able to max the GPU in some games if allowed to render at a really high resolution, particularly if the game has optional features that increase GPU load, or turns them on specifically for Tegra 4 devices.
This of course doesn't cover the pathological case of running the CPUs and GPU at full capacity simultaneously, but that applies to anything on the market (and it's why I think it's disingenuous when review sites start doing this).
That roadmap isn't new; they just re-adjusted it slightly recently. I've been scratching my head for years over why T4 is placed so close to T3 while the distance between T3 and T2 is bigger. If you tie the placement in that diagram (yes, I know, marketing yadda yadda...) to fairly simple tasks with a very specific perf/W ratio, then of course power consumption between T4 and T3 for those use cases isn't going to change significantly, but neither will performance, despite the T4 carrying a GPU several times faster than the T3's.
Not really sure what you're saying: that CPU performance won't change significantly for Tegra 4 vs Tegra 3? Of course that isn't true. Maybe perf/W doesn't improve a lot, but what I wanted to get across is that you need some significant mixture of improved perf/W and perf. Not necessarily a strong amount of both; it's enough to increase peak perf a lot while not decreasing perf/W by much, provided you have the (short-term) power budget to utilize it.
I haven't dug too much into A5x to be honest, but my impression so far was that besides 64-bit, one of the important changes was much higher perf/W.
So you think Cortex-A57 substantially increases peak perf and perf/W on the same node while not using so much more area that it becomes impractical to put four of them on an SoC? That doesn't sound realistic.
Of course perf/W isn't an absolute comparison; you can have a different curve shape and therefore be better at some parts and worse at others. But from what we know of A9 vs A15, I'm going to say that by and large A15 has notably worse perf/W than A9 given a similar node and implementation. nVidia says its power-optimized 845MHz core uses 40% less power than the 1.6GHz performance-optimized A9 in Tegra 3 (although that A9 perhaps had the benefit of better dynamic power consumption vs poorer static). They say they achieve the same performance, but I expect that in most cases this won't actually be true. 40% is about what you'd get from a shrink. This huge difference in clock speed represents what would probably be a best case for a perf/W comparison.
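To put that 40% claim in perspective, here's a rough back-of-envelope sketch (a sanity check on their claim, not a measurement), taking nVidia's figures at face value; the implied gain is just my arithmetic on their numbers:

```python
# Back-of-envelope sketch, taking nVidia's claim at face value:
# the 845MHz power-optimized core matches the 1.6GHz A9's performance
# while drawing 40% less power.
a9_power = 1.0                      # normalize Tegra 3's 1.6GHz A9
a15_power = a9_power * (1 - 0.40)   # claimed 40% power reduction
perf_ratio = 1.0                    # "same performance" per the claim

perf_per_watt_gain = perf_ratio * a9_power / a15_power
print(f"Implied perf/W gain: {perf_per_watt_gain:.2f}x")   # ~1.67x

# ~1.67x is roughly what the 40nm -> 28nm shrink alone could account for,
# so even this best-case comparison doesn't show A15 itself improving perf/W.
```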
Point is, ARM clearly sacrificed power efficiency in favor of peak perf, and they had no intention of making A15 supersede A9, but rather offered another point optimized for different devices/usage scenarios.
A57 will likely only get more complex/aggressive (although I could see it more outright replacing A15). It is possible that A15 had some glaring problems or poor balance power-wise and that A57 fixed them; you would expect at least some level of optimization, and ARM is still pretty new at some of these wider/heavier CPU structures. A7 probably achieves substantially better perf/W than A8 (at the same node) while offering competitive peak perf. So improvements do happen, but you can't really make any assumptions about them, and ARM hasn't really said anything about big perf/W breakthroughs for A57.
Cortex-A53 isn't really on the table atm; I doubt nVidia is even interested if they didn't go for A7, but who knows - maybe they were too far into Tegra 4's design by the time it would have even been up for consideration.
I'm confident that from the outside the result will closely resemble a reduced Kepler cluster; in reality, IMHO, it won't be anything other than an SoC GPU block fine-tuned for SFF markets, with all the lessons they learned from Kepler included.
Perhaps, but I'm mainly referring to DX11 features (or OpenGL 4.3 at least, which they did specify) and unified precision in shaders, which you yourself have often said come at a big area cost.
Trick question then: is it likelier that the first desktop Maxwell chips will arrive on 28nm or on 20HP, and why?
Couldn't guess because I have no idea when those are supposed to be released.
Well, that's one of those typical marketing oxymorons you hit on all slides of that kind, wherever they come from. Someone in another forum tried an even funkier explanation and claimed the scale is GFLOPs, with the Parker GPU ending up at a glorified 1 TFLOP. The unfortunate thing is that it's nonsense to compare FP20 with FP32 ALUs in the first place, and on top of that the ULP GeForce in T2 delivered just 5.33 GFLOPs, so no, that scale won't work just to suit some folks' convenience. Despite it being a marketing slide, there is a reasoning behind it, however twisted it might be due to its marketing nature.
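For reference, that 5.33 GFLOPs figure falls out of simple arithmetic; the sketch below assumes the commonly cited ULP GeForce configuration (8 ALUs, one MADD per clock, ~333MHz; exact clocks varied by device), which is my assumption rather than something stated in the slide:

```python
# Sketch: where the commonly cited 5.33 GFLOPs for Tegra 2's ULP GeForce
# comes from (assumption: 4 FP20 pixel + 4 FP32 vertex ALUs, 1 MADD = 2
# FLOPs per ALU per clock, ~333MHz).
alus = 8
flops_per_clock = 2
clock_ghz = 0.333

t2_gflops = alus * flops_per_clock * clock_ghz
print(f"Tegra 2 ULP GeForce: ~{t2_gflops:.2f} GFLOPs")   # ~5.33

# Most of those are FP20 pixel-shader FLOPs, which is exactly why reading
# the roadmap's axis as a single GFLOPs scale up to a ~1 TFLOP FP32 Parker
# GPU doesn't hold up.
```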
Frankly, I barely pay attention to nVidia's marketing; it's pretty consistently ridiculous.