Here's the article: http://www.anandtech.com/show/4940/qualcomm-new-snapdragon-s4-msm8960-krait-architecture
It seems based primarily (but not exclusively) on this whitepaper: http://www.scribd.com/doc/67918290/EWoyke-S4-White-Paper-Rd2-LR
Krait Summary:
- 3.3 DMIPS/MHz (Scorpion: 2.1, A9: 2.5, A15: ~3.5)
- 11+ stage integer pipeline (A9: 8+, A15: 14+)
- 3 decode ports (vs 2 for A9/Scorpion and 3 for A15)
- 7 execution ports (A9: 4 IIRC, A15: 8)
- 4 issue ports (A9: 3+Branch, A15: 8)
- 4KB+4KB L0 (1c latency)
Krait Analysis: When comparing DMIPS/MHz numbers, always remember that Dhrystone is a ridiculously outdated benchmark, and also that (like CoreMark) it runs completely out of L1 cache (although far from completely out of L0 cache on Krait - the L0 is all about power, not performance, though). That makes it impossible to judge the quality/complexity of the OoOE hardware from the score (on Dhrystone it's basically just there wasting power most of the time), and the score doesn't say much about the load/store pipeline either (e.g. the A15 can issue a load ahead of a store as soon as the load's address is available - it doesn't have to wait on the data, which is a neat trick; no idea if Krait can do something similar). And like the A9 (& Intel P6) but unlike the A15 (& AMD K8/K10), Krait seems to have a single large issue queue rather than many small ones. It's hard to say how true ARM's arguments are about potential power savings here, but this does give a slight performance advantage to the A15. That advantage should be most noticeable after a critical L2 cache miss, as the decode hardware will be able to fill the queues and the execution hardware will be able to churn through them faster afterwards. Once again, this won't help Dhrystone because it never misses L1 (let alone L2), and with 4 issue ports it's almost certainly not a big advantage anyway. I'm listing all these examples to highlight how little the DMIPS/MHz score really means.
If I had to guess about the real world, Krait will be a lot faster than the Cortex-A9, but the Cortex-A15 will be significantly faster than Krait (at least on integer workloads). Part of that advantage is simply clock speed; Qualcomm can say all they want about Krait being competitive there, but the reality is it'll launch at 1.5GHz on TSMC 28LP (supposedly scaling up to 1.7-2GHz later though) whereas OMAP5 will run the A15 at 2GHz on UMC 28LP. ST-E A9600 will run at up to 2.5GHz on GF 28SLP (High-K without SiGe) whereas the APQ8064 quad-core will also run at up to 2.5GHz but on a TSMC 28HPM (High-K *with* SiGe) process. However, I suspect Krait will also be significantly more power efficient than the A15, and I suspect this will also affect performance: while the A9600 can run at up to 2.5GHz, it won't reach that high with both cores on for power reasons, and will be subject to thermal throttling (especially when running at the same time as the 200GFlops+ Rogue GPU!) much more often than the MSM8960. Therefore the increased power efficiency could result in slightly closer performance than you would expect when limited to a smartphone TDP - on a tablet, I don't think the power advantage will help performance as much.
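For what it's worth, here is a quick back-of-the-envelope sketch that multiplies the DMIPS/MHz figures from the summary by the clock speeds quoted above. The numbers are only the ones from this post (the A15 figure is the approximate "~3.5"), the function and table names are made up for illustration, and per the analysis above the result is a Dhrystone-only, single-core, no-throttling figure, so treat it as a rough upper-level comparison rather than a real-world prediction.

```python
# Per-core Dhrystone throughput estimate: DMIPS = (DMIPS/MHz) * clock in MHz.
# All figures are the ones quoted in this post; the A15 value is approximate.
DMIPS_PER_MHZ = {
    "Scorpion": 2.1,
    "Cortex-A9": 2.5,
    "Krait": 3.3,
    "Cortex-A15": 3.5,  # quoted as "~3.5"
}

# (chip, core, peak clock in MHz) as quoted above; names are labels only.
CHIPS = [
    ("MSM8960 (launch)", "Krait", 1500),
    ("OMAP5", "Cortex-A15", 2000),
    ("A9600", "Cortex-A15", 2500),
    ("APQ8064", "Krait", 2500),
]


def peak_dmips(core: str, mhz: float) -> float:
    """Single-core Dhrystone estimate at a given clock, ignoring throttling."""
    return DMIPS_PER_MHZ[core] * mhz


def ratio(chip_a: str, chip_b: str) -> float:
    """How many times faster chip_a is than chip_b on this (crude) metric."""
    table = {name: peak_dmips(core, mhz) for name, core, mhz in CHIPS}
    return table[chip_a] / table[chip_b]


if __name__ == "__main__":
    for name, core, mhz in CHIPS:
        print(f"{name:18s} {core:11s} {mhz:5d} MHz  {peak_dmips(core, mhz):7.0f} DMIPS")
    print(f"OMAP5 vs MSM8960 (launch): {ratio('OMAP5', 'MSM8960 (launch)'):.2f}x")
```

With the figures quoted in this post, that works out to roughly 4950 DMIPS per core for Krait at 1.5GHz versus 7000 for the A15 at 2GHz (about 1.4x), and most of that gap comes from the clock rather than the DMIPS/MHz difference. At an equal 2.5GHz the gap shrinks to about 6%.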
Krait is still clearly a very impressive core though, and I look forward to learning even more about it if Qualcomm ever gives more details, or when it's publicly available otherwise.
---
GPU: Exact same core clocked 50% higher (400MHz vs 266MHz - à la Exynos 45->32nm but without the High-K!) but with feature improvements (e.g. MRT and sRGB) to support DX9_3, and supposedly significant driver improvements.
Cellular: It's the same core as the 28nm MDM9615 and supports LTE Release 9 features (just like MDM6600 supported Release 7 HSPA+ features without being 21Mbps). Very interestingly, it still supports simultaneous 1x Voice and LTE data for CDMA networks. This is presumably without a secondary RF chip, so it's fair to say there's a separate 1.25MHz RF chain for 1x in addition to the 10+10MHz RF chains for LTE and/or DC-HSPA (all downlink; obviously it's FDD, so extra MHz on uplink).
Connectivity: This is a biggie: the WiFi & Bluetooth basebands are integrated just as I thought! In the past Qualcomm integrated the GPS baseband and added a GPS RF circuit to their 3G RF circuit (they still do). Now the WiFi/BT baseband is also integrated, but I'm absolutely certain the WiFi/BT RF is not integrated in the 3G RF chip - it's in a separate wireless combo chip, the 65nm WCN3660, which therefore needs a lot less digital logic than competitive combo chips (maybe part of the PHY is still in there so the I/O link with the baseband isn't too busy, just like DigRF 3G moved a bit of the processing to the RF chip?) and gets away with a much smaller die size (Qualcomm claims <15mm² including FM Rx/Tx whereas the BCM4330 takes 25.5mm²).
This is a VERY smart integration strategy (much smarter than the misguided "let's put both the BB & RF for BT/FM on our 3G RF chip") and leads the way towards integrating all RF into the same chip. Given how aggressive Qualcomm has always been with process technology, I don't think a true single-chip solution ever made sense for them, but a smart integration approach like this makes a lot of sense. And a 72Mbps WiFi and Bluetooth baseband must take very little die area on 28nm, so there's not much wasted if customers prefer using a competitor's wireless combo chip. It will be interesting to see what they do now that they've acquired Atheros - Qualcomm's BT/WiFi wasn't very impressive last generation, but it's supposedly improved, and combined with Atheros (which still has the world's smallest mobile WiFi chip on 65nm!) they are in a very strong position here.