An overview of Qualcomm's Snapdragon Roadmap

Lots of good info in there. The quad version of their next GPU could be interesting depending on when it reaches product.

Some of the claims in there (especially on the graphics side) are highly questionable, though.
 
Hah @ them using NEON-based stuff for most of their current generation CPU performance comparisons, and also things like V8, which is very very relevant but which they were actually responsible for porting to ARM (they told me point blank they did their best to make it more tolerant of Snapdragon's long pipeline - a very good thing and a great investment on their part, it just makes it less representative of other workloads).

Another very surprising tidbit is this presentation seems to imply the 2.5GHz Snapdragon might be done on SiON (28 LP) and not High-K (although it's far from explicit on this point). I was quite shocked to discover that OMAP5 will be done on a 28nm SiON process at UMC and GF, not High-K - so that's possibly 2GHz versus 2.5GHz on the same process - although I still suspect Qualcomm is probably using High-K despite that slide, and there's also the question of whether they both use Triple Gate Oxide if they're SiON. Either way I'm skeptical about their performance claims; if they're both SiON, then 23% more headroom would therefore imply identical DMIPS/MHz as A15, which seems extremely unlikely given all of their claims so far. I think they must just be underestimating A15 clocks and/or DMIPS/MHz.

Either way, Qualcomm's roadmap is solid as always, there's no denying that.
 
Another very surprising tidbit is this presentation seems to imply the 2.5GHz Snapdragon might be done on SiON (28 LP) and not High-K (although it's far from explicit on this point).

2.5 won't be on LP. But the slide is fairly market-speak and doesn't distinguish that :/

I was quite shocked to discover that OMAP5 will be done on a 28nm SiON process at UMC and GF, not High-K - so that's possibly 2GHz versus 2.5GHz on the same process

2GHz on LP? I suppose it's plausible but difficult to believe even with the A15's pipeline.

- although I still suspect Qualcomm is probably using High-K despite that slide, and there's also the question of whether they both use Triple Gate Oxide if they're SiON.

2.5GHz is HK, 1.4-1.7GHz is LP. 8960 won't be 2.5.
 
2GHz on LP? I suppose it's plausible but difficult to believe even with the A15's pipeline.
I was surprised as well, but TI apparently said so explicitly: http://www.eetimes.com/electronics-news/4214774/Upset-TI-slams-Samsung-s-foundry-efforts (EETimes reporting isn't always right these days, but this is a very good article overall and written by Mark LaPedus to boot, so I'd tend to trust it). Keep in mind it might use Triple Gate Oxide at least (TSMC certainly supports it at 28LP, presumably UMC/GF do too but I don't know for certain).

ARM's A15 presentation says "Feasibility work showed critical loops balancing at about 15-16 gates/clk" - if that means ~16 FO4 on A15, then it's an absolute speed demon and 2GHz on SiON might not be that surprising, however I read that as meaning 'relatively simple gates' rather than necessarily FO4. Also being 'feasible' doesn't mean that's necessarily what they did I suppose. I don't know the terminology enough to know what is the most likely meaning, any ideas?
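
For a rough sense of scale, the per-gate delay budget implied by a given clock and logic depth is easy to work out. A minimal sketch, assuming the whole cycle is available to logic (no flop or clock-skew overhead, which a real design does pay); `gate_delay_ps` is just a hypothetical helper name, and it says nothing about whether "gates" means FO4 or something simpler:

```python
def gate_delay_ps(freq_ghz: float, gates_per_clk: float) -> float:
    """Per-gate delay budget in picoseconds for a given clock and logic depth.

    Assumption: the full cycle is spent in logic (flop setup/clk-to-q and
    skew are ignored), so real gates would need to be somewhat faster.
    """
    cycle_ps = 1000.0 / freq_ghz  # 1 GHz corresponds to a 1000 ps cycle
    return cycle_ps / gates_per_clk


# 2 GHz at the 15-16 gates/clk figure quoted from ARM's presentation
for depth in (15, 16):
    print(f"{depth} gates/clk at 2.0 GHz: {gate_delay_ps(2.0, depth):.2f} ps per gate")
```

So 2GHz at 15-16 gates/clk leaves roughly 31-33 ps per gate before any clocking overhead; whether 28nm SiON can deliver that depends entirely on which kind of gate is being counted.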
2.5GHz is HK, 1.4-1.7GHz is LP. 8960 won't be 2.5.
Oh, so 2.5GHz is only APQ8064. That makes a lot of sense, the PR certainly wasn't very clear though :???:

EDIT: BTW, it's nice that Snapdragon has a tightly coupled L2 unlike A9 (but like A15), however I think it's worth pointing out that an OoOE core like the A9 can hide L2 latency better than an in-order one like Snapdragon.
 
I was surprised as well, but TI apparently said so explicitly: http://www.eetimes.com/electronics-news/4214774/Upset-TI-slams-Samsung-s-foundry-efforts (EETimes reporting isn't always right these days, but this is a very good article overall and written by Mark LaPedus to boot, so I'd tend to trust it)

UMC 28nm. Probably still SiON but could potentially be faster than TSMC's 28LP. Plus TI's never been a slouch at pushing frequency from the back-end side.

ARM's A15 presentation says "Feasibility work showed critical loops balancing at about 15-16 gates/clk" - if that means ~16 FO4 on A15, then it's an absolute speed demon and 2GHz on SiON might not be that surprising, however I read that as meaning 'relatively simple gates' rather than necessarily FO4. I don't know the terminology enough to know what is the most likely meaning though, any ideas?

*Shrug*. Based on the pipeline, I'd say it's based on NAND-equivalent delay but who knows. I really really really doubt they're able to pull off 12-cycle NEON VMLA with 16 FO4.

Oh, so 2.5GHz is APQ8064. That makes a lot of sense, the PR certainly wasn't very clear though :???:

I forget which model name it is, but only HK variants will go above 1.4-2.0GHz. I'm not sure what frequency it is of the A15 they're comparing to, but I'm reasonably sure it isn't 2.0GHz. PR is never clear, unfortunately :/
 
UMC 28nm. Probably still SiON but could potentially be faster than TSMC's 28LP. Plus TI's never been a slouch at pushing frequency from the back-end side.
Yeah, although presumably they will dual-source with GF again, using the same 28LP SiON process there as Qualcomm. Either way ST-Ericsson's 2.5GHz peak on High-K is suddenly looking less impressive.

*Shrug*. Based on the pipeline, I'd say it's based on NAND-equivalent delay but who knows. I really really really doubt they're able to pull off 12-cycle NEON VMLA with 16 FO4.
NAND-equivalent delay makes sense, thanks. On the A8/A9, MAC was done as separate MUL then ADD with a dedicated MAC FIFO afaik (so it had twice the latency), however A15 supports a fused FMAC which presumably must be done in 12 cycles (including Issue & Writeback). That might indeed be a frequency bottleneck.
 
Yeah, although presumably they will dual-source with GF again, using the same 28LP SiON process there as Qualcomm.

Krait is on TSMC.....

NAND-equivalent delay makes sense, thanks. On the A8/A9, MAC was done as separate MUL then ADD with a dedicated MAC FIFO afaik (so it had twice the latency), however A15 supports a fused FMAC which presumably must be done in 12 cycles (including Issue & Writeback). That might indeed be a frequency bottleneck.

12 cycles is a lot for VFMA. IIRC, A15's VMLA throughput is 1 quad per cycle so they aren't double-pumping. The 12 cycles is likely only for VMLA, but we'll have to wait until instruction latencies are released to be sure.

I would suspect ARM would use NAND-equivalents more than they would FO4 as they're more front-end oriented and their modeling is likely based on gate-delay rather than wire delay. But 2.0GHz on 28 SiON is indeed impressive.
 
Krait is on TSMC.....
My understanding is that TSMC is the lead supplier with GF as a likely second source down the line, based on this article: http://semimd.com/blog/2011/02/07/qualcomm-shies-away-from-high-k-at-28nm/ - either way, we should probably leave that discussion at this, right or wrong :)

12 cycles is a lot for VFMA. IIRC, A15's VMLA throughput is 1 quad per cycle so they aren't double-pumping.
I think that's right, yes.

The 12 cycles is likely only for VMLA, but we'll have to wait until instruction latencies are released to be sure. I would suspect ARM would use NAND-equivalents more than they would FO4 as they're more front-end oriented and their modeling is likely based on gate-delay rather than wire delay. But 2.0GHz on 28 SiON is indeed impressive.
Indeed. BTW, I just remembered Bulldozer has a latency of 6 cycles for fused FMA and it's also quite a speed demon. David Kanter from RealWorldTech mentioned a (presumably NAND-equivalent) ~17 gate delay rumour on comp.arch in his article, not sure if it's true but either way it should have a lower gate delay than the vast majority of CPUs out there. If they can do 6 cycles on ~17 NAND-equivalent, then 10+ on 16 FO4 doesn't seem so impossible anymore. However even then I'd be skeptical ARM would be willing to trade-off area/power to achieve that latency, and your reasoning on why ARM would talk in NAND-equivalent makes sense to me.
 
Indeed. BTW, I just remembered Bulldozer has a latency of 6 cycles for fused FMA and it's also quite a speed demon. David Kanter from RealWorldTech mentioned a (presumably NAND-equivalent) ~17 gate delay rumour on comp.arch in his article, not sure if it's true but either way it should have a lower gate delay than the vast majority of CPUs out there. If they can do 6 cycles on ~17 NAND-equivalent, then 10+ on 16 FO4 doesn't seem so impossible anymore. However even then I'd be skeptical ARM would be willing to trade-off area/power to achieve that latency, and your reasoning on why ARM would talk in NAND-equivalent makes sense to me.

A 12-cycle VFMA would be possible with 16 FO4's but it'd be a colossal waste of gates and flops. Plus it'd also mean VMLA would be ~20 cycles, which I don't believe it is. The long pole really is VMLA, not VFMA if we're talking a throughput of 1 quad/cycle.
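
To put the "waste of gates and flops" point in numbers, here is a back-of-the-envelope sketch that multiplies latency in cycles by gate delays per cycle to get a total depth for the operation. The inputs are the figures quoted in this thread (6 cycles at a rumoured ~17 gates for Bulldozer's FMA, 12 cycles at 16 gates for A15); the helper name `total_gate_depth` is hypothetical, the two gate units may not be the same (NAND-equivalent versus FO4), and the 12 cycles reportedly include Issue & Writeback, so treat it as an order-of-magnitude comparison only:

```python
def total_gate_depth(latency_cycles: int, gates_per_clk: float) -> float:
    """Total gate delays spanned by an operation: cycles x gates per cycle."""
    return latency_cycles * gates_per_clk


# Figures as quoted in the thread; the Bulldozer one is a rumour.
bulldozer_fma = total_gate_depth(6, 17)   # 102 gate delays
a15_at_16 = total_gate_depth(12, 16)      # 192 gate delays

print(f"Bulldozer FMA: {bulldozer_fma:.0f} gate delays")
print(f"A15 at 16 gates/clk over 12 cycles: {a15_at_16:.0f} gate delays")
print(f"Ratio: {a15_at_16 / bulldozer_fma:.2f}x")
```

On those numbers the 12-cycle pipeline would span close to twice the gate depth of the 6-cycle one, which is the sense in which it would be achievable but wasteful.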
 