Qualcomm Krait & MSM8960 @ AnandTech

Arun

Here's the article: http://www.anandtech.com/show/4940/qualcomm-new-snapdragon-s4-msm8960-krait-architecture
It seems based primarily (but not exclusively) on this whitepaper: http://www.scribd.com/doc/67918290/EWoyke-S4-White-Paper-Rd2-LR

Krait Summary:
- 3.3 DMIPS/MHz (Scorpion: 2.1, A9: 2.5, A15: ~3.5)
- 11+ stages integer pipeline (A9: 8+, A15: 14+)
- 3 decode ports (vs 2 for A9/Scorpion and 3 for A15)
- 7 execution ports (A9: 4 IIRC, A15: 8)
- 4 issue ports (A9: 3+Branch, A15: 8)
- 4KB+4KB L0 (1c latency)

Krait Analysis: When comparing DMIPS/MHz numbers, always remember that Dhrystone is a ridiculously outdated benchmark, and also that (like Coremark) it runs completely out of L1 cache (although far from completely out of L0 cache on Krait; the L0 is all about power, not performance). That makes it impossible to judge the quality/complexity of the OoOE hardware (it's basically just there wasting power most of the time), and it doesn't say much about the load/store pipeline either. For example, the A15 can issue a load ahead of a store as soon as the load's address is available; it doesn't have to wait on the data, which is a neat trick. No idea if Krait can do something similar.

Also, like the A9 (& Intel P6) but unlike the A15 (& AMD K8/K10), Krait seems to have a single large issue queue rather than many small ones. It's hard to say how true ARM's arguments are about potential power savings here, but this does give a slight performance advantage to the A15. That advantage will be most noticeable after a critical L2 cache miss, as the decode hardware will be able to fill the queues and the execution hardware will be able to churn through them faster afterwards. Once again, this won't help Dhrystone because it never misses L1 (let alone L2), and with 4 issue ports it's almost certainly not a big advantage anyway. I'm listing all these examples to highlight how little the DMIPS/MHz score means.

If I had to guess about the real world, Krait will be a lot faster than the Cortex-A9, but the Cortex-A15 will be significantly faster than Krait (at least on integer workloads). Part of that advantage is simply clock speeds; Qualcomm can say all they want about Krait being competitive there, but the reality is it'll launch at 1.5GHz on TSMC 28LP (supposedly scaling up to 1.7-2GHz later though) whereas OMAP5 will run the A15 at 2GHz on UMC 28LP. ST-E A9600 will run at up to 2.5GHz on GF 28SLP (High-K without SiGe) whereas the APQ8064 quad-core will also run at up to 2.5GHz but on a TSMC 28HPM (High-K *with* SiGe) process. However, I suspect Krait will also be significantly more power efficient than A15, and I suspect this will also affect performance: while the A9600 can run at up to 2.5GHz, it won't reach that high with both cores on for power reasons, and will be subject to thermal throttling (especially when running at the same time as the 200GFlops+ Rogue GPU!) much more often than the MSM8960. Therefore the increased power efficiency could result in slightly closer performance than you would expect when limited to a smartphone TDP. On a tablet, I don't think the power advantage will help as much for performance.
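
As a back-of-the-envelope illustration of the clock speed point, here is a small Python sketch that multiplies the DMIPS/MHz figures from the summary by the launch clocks quoted above. The function name and the table layout are made up for this example, the A15 figure is only the approximate ~3.5, and given everything said above about Dhrystone the result is a paper comparison, not a real-world prediction.

```python
# Rough per-core Dhrystone throughput: DMIPS = (DMIPS/MHz) * clock in MHz.
# All figures are the ones quoted in this post; the A15 value is approximate.
CORES = {
    "Krait @ 1.5GHz (MSM8960)": (3.3, 1500),
    "Cortex-A15 @ 2.0GHz (OMAP5)": (3.5, 2000),
}


def dmips(dmips_per_mhz, clock_mhz):
    """Hypothetical helper: per-core DMIPS from DMIPS/MHz and clock."""
    return dmips_per_mhz * clock_mhz


scores = {name: dmips(*spec) for name, spec in CORES.items()}
for name, score in scores.items():
    print(f"{name}: {score:.0f} DMIPS per core")

# Split the paper advantage into its per-clock and clock-speed parts.
per_clock_ratio = 3.5 / 3.3
clock_ratio = 2000 / 1500
total_ratio = (scores["Cortex-A15 @ 2.0GHz (OMAP5)"]
               / scores["Krait @ 1.5GHz (MSM8960)"])
print(f"A15/Krait: {total_ratio:.2f}x "
      f"(per-clock {per_clock_ratio:.2f}x, clock {clock_ratio:.2f}x)")
```

On these numbers the A15's paper lead is roughly 1.41x, of which about 1.33x comes from clock speed and only about 1.06x from DMIPS/MHz.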

Krait is still clearly a very impressive core though, and I look forward to learning even more about it if Qualcomm ever gives more details, or when it's publicly available otherwise.

---

GPU: Exact same core clocked 50% higher (400MHz vs 266MHz - ala Exynos 45->32nm but without the High-K!) but with feature (e.g. MRT and sRGB) improvements to support DX9_3 and supposedly significant driver improvements.

Cellular: It's the same core as the 28nm MDM9615 and supports LTE Release 9 features (just like MDM6600 supported Release 7 HSPA+ features without being 21Mbps). Very interestingly it still supports simultaneous 1x Voice and LTE data for CDMA networks. This is presumably without a secondary RF chip so it's fair to say there's a separate 1.25MHz RF chain for 1x in addition to the 10+10MHz RF chains for LTE and/or DC-HSPA (all downlink, obviously it's FDD so extra MHz on uplink).

Connectivity: This is a biggie: the WiFi & Bluetooth basebands are integrated just as I thought! In the past Qualcomm integrated the GPS baseband and added a GPS RF circuit to their 3G RF circuit (they still do). Now the WiFi/BT baseband is also integrated, but I'm absolutely certain the RF is not integrated in the RF chip. It's in a separate wireless combo chip, the 65nm WCN3660, which therefore needs a lot less digital logic than competitive combo chips (maybe part of the PHY is still in there so the I/O link with the baseband isn't too busy, just like DigRF 3G moved a bit of the processing to the RF chip?) and gets away with a much smaller die size (Qualcomm claims <15mm² including FM Rx/Tx whereas the BCM4330 takes 25.5mm²).

This is a VERY smart integration strategy (much smarter than the retarded "let's put both the BB & RF for BT/FM on our 3G RF chip") and leads the way towards integrating all RF into the same chip. Given how aggressive Qualcomm has always been with process technology, I don't think a true single-chip solution ever made sense for them, but a smart integration approach like this makes a lot of sense. And a 72Mbps WiFi and Bluetooth baseband must take very little die size on 28nm so there's not much wasted if customers prefer using a competitor's wireless combo chip. It will be interesting to see what they do now that they've acquired Atheros - Qualcomm's BT/WiFi wasn't very impressive last generation, but it's supposedly improved, and combined with Atheros (which still has the world's smallest mobile WiFi chip on 65nm!) they are in a very strong position here.
 
The L0 cache seems pretty clever, but isn't 85% hit rate for a direct mapped cache rather high/implausible?

Also, L0 has to be inclusive in L1 for it to make sense.
 
The L0 cache seems pretty clever, but isn't 85% hit rate for a direct mapped cache rather high/implausible?

Also, L0 has to be inclusive in L1 for it to make sense.

The numbers come from a very very limited (and frankly outdated) set of benchmarks. Arun already mentioned one.
 
Qualcomm can say all they want about Krait being competitive there, but the reality is it'll launch at 1.5GHz on TSMC 28LP (supposedly scaling up to 1.7-2GHz later though) whereas OMAP5 will run the A15 at 2GHz on UMC 28LP. ST-E A9600 will run at up to 2.5GHz on GF 28SLP (High-K without SiGe) whereas the APQ8064 quad-core will also run at up to 2.5GHz but on a TSMC 28HPM (High-K *with* SiGe) process.

1.5GHz would make sense if the products arrive in mid-2012 for Qualcomm while the competitors' higher-clocked chips arrive in late 2012/early 2013.
 
Yeah; getting good processor platforms to market so far ahead of the next generation from their competitors sets Qualcomm up for another period of very high market share.

I think TI will still do well, and Qualcomm's claims about graphics aren't entirely compelling.
 
Anand's details about the Adrenos reveal just how robust they are. Some of the low level GLBenchmark tests have always suggested it.

The potential for huge performance gains with improved drivers/software is there, yet needing another generation with a 50% boost in clock to catch an older 543MP2 isn't an argument for dismissing the possibility of partnering with IMG for PowerVR in the future.
 
Lazy8s said:
Anand's details about the Adrenos reveal just how robust they are. Some of the low level GLBenchmark tests have always suggested it.

The potential for huge performance gains with improved drivers/software is there, yet needing another generation with a 50% boost in clock to catch an older 543MP2 isn't an argument for dismissing the possibility of partnering with IMG for PowerVR in the future.


In fairness, no other SOC solution is competitive with Apple's A5 in graphics either. The issue is whether or not they're lagging the rest of the market, which for the time being, is clearly not the case.
 
I hope these driver improvements make their way back to Adreno 220 devices.

I was wondering the same thing.
Maybe we'll see a substantial 3D performance increase in Snapdragon S3 devices when Ice Cream Sandwich is released?

BTW, is there a list saying which Snapdragon S3 devices are using 32bit memory and which are using 2*32bit?
 
The L0 cache seems pretty clever, but isn't 85% hit rate for a direct mapped cache rather high/implausible?
Assuming 64-byte cache lines: if the test applications do mainly 32-bit reads (integer & single-precision float operations) and access nearly 100% of each fetched cache line (pretty much all highly cache-line-optimized bucketed structures and linear memory accesses do this), approx. 16 reads will come from the same cache line, and we will have a 15/16 (93.75%) hit ratio. But with that small a cache, especially a direct-mapped one, I agree that 85% seems pretty high for a general average statistic.
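
To sanity-check the 15/16 figure, here is a minimal Python sketch of a direct-mapped cache. The 64-byte line size is only the assumption made above (the L0 line size isn't given in this thread), the 4KB capacity is from the Krait summary, and the function name and both access patterns are invented for illustration.

```python
def direct_mapped_hit_ratio(addresses, cache_bytes=4096, line_bytes=64):
    """Simulate a direct-mapped cache and return hits / total accesses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines      # one stored line address per cache slot
    hits = 0
    total = 0
    for addr in addresses:
        line_addr = addr // line_bytes
        index = line_addr % num_lines
        total += 1
        if tags[index] == line_addr:
            hits += 1
        else:
            tags[index] = line_addr  # miss: fill the slot, evicting the old line
    return hits / total if total else 0.0


# Pattern 1: linear walk over 64KB with 32-bit (4-byte) reads.
linear = range(0, 64 * 1024, 4)
linear_ratio = direct_mapped_hit_ratio(linear)

# Pattern 2: two streams exactly 4KB apart, read alternately.
# Both streams map to the same cache slots and keep evicting each other.
conflict = [base + 4 * i for i in range(1024) for base in (0, 4096)]
conflict_ratio = direct_mapped_hit_ratio(conflict)

print(f"linear walk:         {linear_ratio:.4f}")
print(f"conflicting streams: {conflict_ratio:.4f}")
```

The linear walk gives exactly 15/16, while the two streams that sit 4KB apart miss on every access, which is why 85% as a general average looks optimistic for a direct-mapped cache this small.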
 
In fairness, no other SOC solution is competitive with Apple's A5 in graphics either. The issue is whether or not they're lagging the rest of the market, which for the time being, is clearly not the case.

For the time being is absolutely correct; however, anyone having doubts about the foreseeable future isn't that unjustified either. You don't need an SGX54xMP2 at any price for any upcoming device; it's about enough to have a single 543/544 core clocked at 500MHz and you already have roughly A5 graphics performance.

Apple is a chapter of its own in any case; by the time others reach A5's graphics performance they'll be rolling out their next SoC.

Exophase,

I have severe doubts that any IHV codes separate drivers for each and every GPU variant of the same generation. IMHO (but would like to stand corrected if wrong) it should be one shader compiler and one driver per generation. All it comes down to then would be if an OEM integrates the newer driver or not.
 
I have severe doubts that any IHV codes separate drivers for each and every GPU variant of the same generation. IMHO (but would like to stand corrected if wrong) it should be one shader compiler and one driver per generation. All it comes down to then would be if an OEM integrates the newer driver or not.
Though drivers for an architecture lineup might normally come from one codebase you can be sure that that is littered with cases and conditional compiles. Sometimes drivers don't work on older models not because they really couldn't but because the IHV is not willing to commit the QA resources for the older hw.
 
Though drivers for an architecture lineup might normally come from one codebase you can be sure that that is littered with cases and conditional compiles. Sometimes drivers don't work on older models not because they really couldn't but because the IHV is not willing to commit the QA resources for the older hw.

If it doesn't cause any problems (which I can't figure why it would) I don't see why an IHV wouldn't want to commit for older hw.
 
Can someone explain Qualcomm's design philosophy to me? They do these highly custom cores to bundle with their cellular radios. However, they are a very popular radio manufacturer, so they could bolt on stock Cortex cores and people would buy their chips. From what I've seen, their modifications do not make them far-and-away performance winners, nor do they trounce the competition in battery life. What gives?
 
Can someone explain Qualcomm's design philosophy to me? They do these highly custom cores to bundle with their cellular radios. However, they are a very popular radio manufacturer, so they could bolt on stock Cortex cores and people would buy their chips. From what I've seen, their modifications do not make them far-and-away performance winners, nor do they trounce the competition in battery life. What gives?

For starters, it's probably a lot cheaper to license an instruction set from ARM than an entire IP core.

Furthermore, the 1st-gen Snapdragons were clear CPU performance winners compared to Cortex A8, and Krait has the advantage of being available about half a year sooner than Cortex A15, making it clearly better than Cortex A9 solutions. Launch dates that are out of sync with the other SoC manufacturers are usually a good thing (look at Tegra 2).
 