Different technologies, I see... well, if they are keeping the same 4+1 design and a Cortex-A7 can't be used (I still don't see why it couldn't... just give the A7 its own cache?), then they certainly won't get the full benefit of using an A15... so they would probably reuse the A9?
I'm saying that if they do the same thing they did in Tegra 3 they'd need to use an additional Cortex-A15, not that the A7 doesn't work but the A9 somehow does. The A9 would in fact be more problematic, because it doesn't have all of the instructions the A15 and A7 have.
The point isn't whether or not they can pair a single Cortex-A7 with a four-core Cortex-A15 cluster; obviously they can, since the coherency protocol allows it and ARM themselves use it for big.LITTLE (no idea if ARM will let you configure something like 4 A15s with 1 A7, though).
What I'm saying is they can't do the SAME thing they did in Tegra 3. It doesn't actually matter if they use a Cortex-A15 as the companion core; they can't mux one away from the L2 cache. So when I see "4 + 1" I think of one of two possibilities:
1) nVidia is capitalizing on keeping the terminology consistent even though the approach isn't (and it may well not be limited to 4 cores vs 1 instead of 5 concurrent cores - there's no reason for that restriction here!)
2) The leak, like so many before it, is made up and the author thinks that the same approach as in Tegra 3 can be used because they don't understand the technology
Note the key point here for "the way nVidia does it" is that it's handled completely transparently in hardware. big.LITTLE requires OS support. To handle it the same transparent way as Tegra 3, they'd have to duplicate the same-sized L2 cache for that single Cortex-A15 as for the four-core cluster, which would be a massive waste.
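To make the "OS support" part concrete, here's a rough pseudo-C sketch of the kind of policy decision the OS-side switcher has to make for big.LITTLE; the function name, thresholds and numbers are all invented for illustration, not taken from any real kernel. The point is that with the Tegra 3 companion core none of this is visible to software at all.

/* Illustrative pseudo-C only: the names and thresholds here are made up
   to show the sort of policy decision the OS has to make for big.LITTLE
   cluster switching. With the Tegra 3 style companion core the switch is
   done transparently in hardware and the OS never makes this call. */
#include <stdio.h>

enum cluster { LITTLE_CLUSTER, BIG_CLUSTER };

#define UP_THRESHOLD   85  /* % load at which we'd migrate to the big cluster */
#define DOWN_THRESHOLD 20  /* % load at which we'd drop back to the LITTLE one */

/* Decide which cluster should be running; the OS then has to migrate
   tasks, transfer state and power the other cluster up/down itself. */
static enum cluster pick_cluster(int load_pct, enum cluster current)
{
    if (current == LITTLE_CLUSTER && load_pct > UP_THRESHOLD)
        return BIG_CLUSTER;
    if (current == BIG_CLUSTER && load_pct < DOWN_THRESHOLD)
        return LITTLE_CLUSTER;
    return current;
}

int main(void)
{
    enum cluster c = LITTLE_CLUSTER;
    int samples[] = { 10, 40, 90, 95, 60, 15, 5 };
    for (int i = 0; i < 7; i++) {
        c = pick_cluster(samples[i], c);
        printf("load %2d%% -> %s cluster\n", samples[i],
               c == BIG_CLUSTER ? "big" : "LITTLE");
    }
    return 0;
}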
I think I'm getting mixed up with the power-gated L2 from Krait/Saltwell/A9... decoupled L2 is a different method entirely?
I'm not talking about separate power rails.
When you buy a Cortex-A9 from ARM it doesn't come with an L2 cache. It comes with the usual AXI bus interface, and you have to put a separate L2 cache (including controller) on the other side of this bus. ARM added a couple of things to the bus as optional cache optimizations, but they warn that these break the standard bus operation.
Doing it this way gives the L2 higher latency and, quite likely, lower bandwidth, depending on clocking. Usually the core includes the L2, and the L2's design is closely integrated into the critical path/pipeline of the processor.
Decoupled L2 means you can interface things besides CPU cores to the same L2. It also means you can do stuff like what nVidia did with the companion core. With tightly integrated L2 you can't do these things the same way.
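If you want to actually see that latency difference, the usual approach is a dependent-load (pointer-chasing) microbenchmark: build a random cycle through a buffer sized to sit in L2 and time how long each load takes. This is just a rough sketch; the buffer size, iteration count and timing method are arbitrary choices, not tuned for any particular SoC.

/* Rough pointer-chasing sketch. The ~256 KB working set is an arbitrary
   choice meant to miss L1 and (mostly) hit L2 on a typical SoC of this
   class; adjust to taste. Compile with something like gcc -O2 chase.c. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (256 * 1024 / sizeof(void *))
#define ITERS 20000000L

int main(void)
{
    void **buf = malloc(N * sizeof(void *));
    size_t i;

    /* Sattolo shuffle: one big random cycle, so the prefetcher can't help
       and every load depends on the previous one. */
    for (i = 0; i < N; i++)
        buf[i] = &buf[i];
    for (i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        void *tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }

    void **p = (void **)buf[0];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long k = 0; k < ITERS; k++)
        p = (void **)*p;            /* serialized, latency-bound loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.2f ns per load (%p)\n", ns / ITERS, (void *)p);
    free(buf);
    return 0;
}

Run it once with a working set that fits in L1 and once with one that spills into L2, and the difference in ns per load is the L2 hit latency you'd be comparing between an integrated and an AXI-attached cache.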
Either way, Tegra 4 is going to be the leader if it uses a Kepler-derived GPU... as they will provide software that utilises it.
What makes you so sure about that? I'll entertain the notion that GPU dominance = SoC dominance (I don't believe it at all). But just because Kepler is doing very well in the high-end 200+ W TDP segment, and perhaps down to the 20+ W or so segments, it doesn't mean it'll easily beat other mobile solutions, assuming it can really scale down to < 2 W in the first place. It still doesn't have the bandwidth enhancements that, say, TBDR and PVRTC 2bpp give IMG's Series 5, never mind the framebuffer compression and who knows what else Rogue brings...
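Just to put rough numbers on the bandwidth point - these are arbitrary example figures, not measurements from any particular GPU or SoC:

/* Back-of-envelope numbers behind the bandwidth argument. The texture size
   and display resolution are arbitrary examples, not figures for any
   specific SoC. */
#include <stdio.h>

int main(void)
{
    /* One 1024x1024 texture: 32-bit RGBA vs PVRTC at 2 bits per pixel. */
    double tex_rgba_mb  = 1024.0 * 1024 * 4 / (1024 * 1024);
    double tex_pvrtc_mb = 1024.0 * 1024 * 2 / 8 / (1024 * 1024);

    /* One full-screen 32-bit colour pass at 1280x800, 60 fps. An IMR pays
       roughly this in DRAM traffic per overdraw layer; a TBDR resolves each
       tile once from on-chip memory instead. */
    double fb_mb_per_s = 1280.0 * 800 * 4 * 60 / (1024 * 1024);

    printf("texture: %.2f MB (RGBA8888) vs %.2f MB (PVRTC 2bpp)\n",
           tex_rgba_mb, tex_pvrtc_mb);
    printf("full-screen colour pass: ~%.0f MB/s\n", fb_mb_per_s);
    return 0;
}

A 1K x 1K texture drops from 4 MB to 256 KB at 2 bpp, and every avoided full-screen pass at 1280x800/60 Hz is on the order of 230 MB/s of DRAM traffic, which is the kind of saving a TBDR gets by resolving each tile once from on-chip memory.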