Qualcomm Krait & MSM8960 @ AnandTech

I thought A7s and A15s could share an L2. But I was under the impression it was true software-managed heterogeneous MP as well.
 
The big.LITTLE thing can't be completely OS-agnostic: the CPUs are not the same (though they should still be compatible), but more importantly your L2 caches don't have the same size and so their software maintenance will be different.

I don't see why it can't be OS agnostic. Nvidia says the low power companion core in Tegra 3 is OS agnostic. The choice of core is decided by the CPU state requested by the OS.

So for big.LITTLE, even though the CPUs are different, they run the same instruction set (which is why big.LITTLE is even possible), and AFAIK the caches should be shared. So I think it should be OS agnostic as well.


Edit: Metafor beat me to it.

Btw when are we going to see the first MSM8960 product actually ship? Would they be able to ship in Q2?
 
I don't see why it can't be OS agnostic. Nvidia says the low power companion core in Tegra 3 is OS agnostic. The choice of core is decided by the CPU state requested by the OS.

So for big.LITTLE, even though the CPUs are different, they run the same instruction set (which is why big.LITTLE is even possible), and AFAIK the caches should be shared. So I think it should be OS agnostic as well.

If the caches aren't identical with regard to maintenance, then hardware switching wouldn't be as easy. Tegra 3 uses identical cores and identical cache logic (albeit smaller in size) for its companion core, so the switch is an easy hardware mux.

But I was under the impression the L2 caches for an A7 and A15 were identical.
 
But I was under the impression the L2 caches for an A7 and A15 were identical.
Nope, they definitely don't *have* to be, even for the initial SW releases. And yeah, this does add some complexity compared to NVIDIA's shadow core...
 
Tegra 3 has a bit of lead time, but yeah... this looks like a winner.

Judging from AnandTech's testing, battery life won't get any better just from the 28nm generation, not that I expected it to. Bigger battery capacities should become more common to facilitate that.

From a CPU standpoint, this promises to be exactly what Qualcomm was going for: sizable lead time over A15 while arguably still being a better pairing of performance/power efficiency. A great step forward.

The GPU improves Qualcomm's standing a little, but they wouldn't be taking the crown there.
 
Thanks. That's not bad at all. Kind of makes you wonder why anyone would go with Tegra 3 at this point, by the way.

Maybe, but the Tegra 3 is a 40nm SoC, while the Krait is 28nm. Was there not a refresh of the Tegra 3 planned at 28nm in Q2?

Sure as hell did not stop several smartphone makers from making room for a Tegra 3 in their smartphones.
 
Maybe, but the Tegra 3 is a 40nm SoC, while the Krait is 28nm. Was there not a refresh of the Tegra 3 planned at 28nm in Q2?

No idea what Grey (mainstream smartphones) will look like, but it could very well be some sort of 28nm shrink of an earlier Tegra. Wayne (Tegra 4), however, doesn't sound like a Tegra 3 shrunk to 28nm, and yes, I still regard their former roadmaps, which showed T4 as only twice as fast as Tegra 3 at the SoC level, as bollocks. If that turns out to be real, then Houston, they have a problem.

Sure as hell did not stop several smartphone makers from making room for a Tegra 3 in their smartphones.

You realize, though, that at this stage NV could only dream of having even a quarter of Qualcomm's smartphone SoC design wins?
 
Not impressed at all. Mediocre integer performance on single-threaded workloads like SunSpider is a complete disappointment; it's on par with A9 on a clock-for-clock basis. SIMD performance is good on a per-core basis, but this Java-based Linpack from the Market is completely dependent on VM performance, so possibly some optimisations to Dalvik have been added here. C Linpack on a 1.2GHz Exynos shows performance of around 1 Gflop (10 times more than Java Linpack), so you can imagine the room for optimisation in Java Linpack. I think this dual-core Krait will hardly lose to 1.6GHz T33 almost everywhere except Vellamo.
 
Maybe, but the Tegra 3 is a 40nm SoC, while the Krait is 28nm. Was there not a refresh of the Tegra 3 planned at 28nm in Q2?

Sure as hell did not stop several smartphone makers from making room for a Tegra 3 in their smartphones.

Several? I've heard of the HTC One X. Have I missed others?

But those choices could be explained by scheduling concerns, as Tegra 3 enjoyed a nice head start. What about designs scheduled for the next few quarters?

I don't mean to derail the thread too much (I just thought T3 was relevant as having the biggest/fastest ARM CPU in mobile devices right now, PS Vita excluded) but I don't think the 28nm shrink is coming any time soon.

Plus, the MSM8960 benched by AnandTech was in a smartphone; perhaps tablets will feature faster-clocked chips.
 
I am curious about how SunSpider and BrowserMark scale with the design changes. It does seem like it's scaling only with frequency. Wonder where the bottleneck is.
 
I am curious about how SunSpider and BrowserMark scale with the design changes. It does seem like it's scaling only with frequency. Wonder where the bottleneck is.
I wonder the same. And I also wonder whether that 750 mW per core is correct; that's quite high.
 
But I was under the impression the L2 caches for an A7 and A15 were identical.

A7 has its own L2 cache, as does A15, and the two have different topologies (size and associativity). The two L2 caches are connected via a fully coherent interconnect (CCI-400).

Therefore, for the switching phase of big.LITTLE Task Migration you do not need to flush any of the caches (and that includes the L1 on either A7 or A15) as the processing clusters are fully coherent and can snoop one another.

Only once you want to power a processing cluster down do you need to be aware of the differences in the L2 cache topologies. This does require either OS awareness or hypervisor awareness, if that route is followed. However, with any processor the OS always needs to look up the size of the L2 cache in order to execute the correct number of clean by set-way operations, so big.LITTLE only adds a little extra work. Also, there is no performance impact in an A7-A15 big.LITTLE system, because cache maintenance does not need to be performed until after the switch, when the cluster is ready to be powered down.
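To put rough numbers on the "correct number of clean by set-way operations" point, here is a minimal sketch. The cache geometries are made-up examples (512 KiB 8-way vs. 1 MiB 16-way, 64-byte lines), not figures for any particular SoC, and `set_way_ops` is a hypothetical helper, not a real OS routine. On actual hardware the OS would read the geometry from the cache ID registers rather than hard-code it.

```python
def set_way_ops(size_bytes, ways, line_bytes):
    """Return (sets, ways, total_ops) for cleaning a whole cache by set/way.

    One maintenance operation is issued per (set, way) pair, so the
    total is sets * ways, which equals size_bytes / line_bytes.
    """
    if size_bytes % (ways * line_bytes) != 0:
        raise ValueError("size must be a multiple of ways * line size")
    sets = size_bytes // (ways * line_bytes)
    return sets, ways, sets * ways


# Assumed example geometries, for illustration only.
little_l2 = set_way_ops(512 * 1024, ways=8, line_bytes=64)
big_l2 = set_way_ops(1024 * 1024, ways=16, line_bytes=64)

print("LITTLE-cluster L2 (sets, ways, ops):", little_l2)
print("big-cluster L2    (sets, ways, ops):", big_l2)
```

The loop bounds differ per cluster, which is the "little extra work": whatever code powers a cluster down has to use that cluster's own geometry.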

On a Tegra 3 switch, the shared L2 requires less OS awareness, but you do need to flush the L1 caches of the outbound cluster prior to the switch, because Cortex-A9 doesn't have the AMBA 4 ACE coherency extensions, so it is not possible for two clusters to be coherently connected.
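The ordering difference between the two schemes can be sketched as a toy step list. This is only an illustration of the sequence described above: `migration_steps` and the step strings are hypothetical, and the final power-down step in the non-coherent case is my assumption rather than something stated for Tegra 3.

```python
SWITCH = "switch execution to inbound cluster"


def migration_steps(coherent_clusters):
    """Return the ordered steps for moving work off the outbound cluster."""
    if coherent_clusters:
        # Coherent interconnect (the A7/A15 + CCI-400 case): the inbound
        # cluster can snoop the outbound caches, so maintenance is
        # deferred until after the switch.
        return [
            SWITCH,
            "clean outbound caches by set/way",
            "power down outbound cluster",
        ]
    # No hardware coherency between clusters (the Cortex-A9 case):
    # the outbound L1 contents must be flushed before the switch.
    return [
        "flush outbound L1 caches",
        SWITCH,
        "power down outbound cluster",  # assumed final step
    ]


for coherent in (True, False):
    print("coherent" if coherent else "non-coherent", migration_steps(coherent))
```

In the coherent case nothing sits in front of the switch, which is why the cache maintenance adds no switch latency there.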
 
I wonder the same. And I also wonder whether that 750 mW per core is correct, that's quite high.

It's not unfathomable at 1.5GHz with a power-virus type workload and measuring from the root of the power rail. You have to take into account losses due to the power grid and power gates. But yes, it does seem rather high; too bad the rail voltage isn't reported.
 