Another very surprising tidbit: this presentation seems to imply the 2.5GHz Snapdragon might be done on SiON (28LP) and not High-K (although it's far from explicit on this point).
I was quite shocked to discover that OMAP5 will be done on a 28nm SiON process at UMC and GF, not High-K - so that's possibly 2GHz versus 2.5GHz on the same process
- although I still suspect Qualcomm is probably using High-K despite that slide, and there's also the question of whether they both use Triple Gate Oxide if they're SiON.
I was surprised as well, but TI apparently said so explicitly: http://www.eetimes.com/electronics-news/4214774/Upset-TI-slams-Samsung-s-foundry-efforts (EETimes reporting isn't always right these days, but this is a very good article overall and written by Mark LaPedus to boot, so I'd tend to trust it). Keep in mind it might use Triple Gate Oxide at least (TSMC certainly supports it at 28LP, presumably UMC/GF do too but I don't know for certain).
2GHz on LP? I suppose it's plausible but difficult to believe even with the A15's pipeline.
Oh, so 2.5GHz is only APQ8064. That makes a lot of sense, the PR certainly wasn't very clear though.
2.5GHz is HK, 1.4-1.7GHz is LP. 8960 won't be 2.5.
ARM's A15 presentation says "Feasibility work showed critical loops balancing at about 15-16 gates/clk". If that means ~16 FO4 on A15, then it's an absolute speed demon and 2GHz on SiON might not be that surprising. However, I read that as meaning 'relatively simple gates' rather than necessarily FO4. I don't know the terminology well enough to know which meaning is most likely though, any ideas?
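For a rough sense of scale, here is a back-of-the-envelope sketch of what per-gate delay the quoted 15-16 gates/clk would imply at 2GHz and 2.5GHz. It assumes the whole cycle is spent in logic (no latch, clock skew or wire overhead), which is optimistic, and the function name is just made up for illustration. It says nothing about whether "gates" means FO4 or NAND-equivalent.

```python
def implied_gate_delay_ps(freq_ghz: float, gates_per_clk: float) -> float:
    """Per-gate delay budget in picoseconds, assuming the entire
    cycle is consumed by gates_per_clk equal gate delays
    (no latch/skew/wire overhead: an optimistic simplification)."""
    cycle_ps = 1000.0 / freq_ghz  # one cycle at f GHz lasts 1000/f ps
    return cycle_ps / gates_per_clk


if __name__ == "__main__":
    # Clock targets and gates/clk figures discussed in the thread.
    for freq in (2.0, 2.5):
        for gates in (15, 16):
            delay = implied_gate_delay_ps(freq, gates)
            print(f"{freq} GHz at {gates} gates/clk -> {delay:.2f} ps per gate")
```

So 2GHz at 16 gates/clk leaves about 31 ps per gate, and 2.5GHz leaves 25 ps. Whether that is realistic on SiON depends on which kind of gate is being counted.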
Yeah, although presumably they will dual-source with GF again, using the same 28LP SiON process there as Qualcomm. Either way ST-Ericsson's 2.5GHz peak on High-K is suddenly looking less impressive.
UMC 28nm. Probably still SiON but could potentially be faster than TSMC's 28LP. Plus TI's never been a slouch at pushing frequency from the back-end side.
NAND-equivalent delay makes sense, thanks. On the A8/A9, MAC was done as separate MUL then ADD with a dedicated MAC FIFO afaik (so it had twice the latency), however A15 supports a fused FMAC which must indeed presumably be done in 12 cycles (including Issue & Writeback). That might indeed be a frequency bottleneck.
*Shrug*. Based on the pipeline, I'd say it's based on NAND-equivalent delay but who knows. I really really really doubt they're able to pull off 12-cycle NEON VMLA with 16 FO4.
My understanding is that TSMC is the lead supplier with GF as a likely second source down the line, based on this article: http://semimd.com/blog/2011/02/07/qualcomm-shies-away-from-high-k-at-28nm/ - either way, we should probably leave that discussion at this, right or wrong.
Krait is on TSMC.....
I think that's right, yes.
12 cycles is a lot for VFMA. IIRC, A15's VMLA throughput is 1 quad per cycle so they aren't double-pumping.
Indeed. BTW, I just remembered Bulldozer has a latency of 6 cycles for fused FMA and it's also quite a speed demon. David Kanter from RealWorldTech mentioned a (presumably NAND-equivalent) ~17 gate delay rumour on comp.arch in his article, not sure if it's true but either way it should have a lower gate delay than the vast majority of CPUs out there. If they can do 6 cycles on ~17 NAND-equivalent, then 10+ on 16 FO4 doesn't seem so impossible anymore. However even then I'd be skeptical ARM would be willing to trade off area/power to achieve that latency, and your reasoning on why ARM would talk in NAND-equivalent makes sense to me.
The 12 cycles is likely only for VMLA, but we'll have to wait until instruction latencies are released to be sure. I would suspect ARM would use NAND-equivalents more than they would FO4 as they're more front-end oriented and their modeling is likely based on gate-delay rather than wire delay. But 2.0GHz on 28 SiON is indeed impressive.
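To put the Bulldozer comparison in numbers, here is a quick sketch that converts pipeline latency into total gate delays (cycles times gates per cycle). The inputs are just the figures floated in this thread (6 cycles at a rumoured ~17 gates for Bulldozer, 10 to 12 cycles at 16 gates for A15), not confirmed specs, and the two designs may not be counting the same kind of gate, so treat the comparison as loose. The function name is made up for illustration.

```python
def latency_in_gate_delays(cycles: int, gates_per_clk: int) -> int:
    """Total latency expressed in gate delays: cycles * gates per cycle.
    The 'gate' unit (FO4 vs NAND-equivalent) is whatever the input uses."""
    return cycles * gates_per_clk


# Figures quoted in the thread; rumours and guesses, not confirmed specs.
bulldozer_fma = latency_in_gate_delays(6, 17)   # ~17 gate delays (rumoured)
a15_fma_10 = latency_in_gate_delays(10, 16)     # "10+" cycles at 16 gates/clk
a15_fma_12 = latency_in_gate_delays(12, 16)     # 12 cycles incl. Issue & Writeback

if __name__ == "__main__":
    print("Bulldozer FMA:", bulldozer_fma, "gate delays")
    print("A15 FMA (10 cycles):", a15_fma_10, "gate delays")
    print("A15 FMA (12 cycles):", a15_fma_12, "gate delays")
```

On those numbers A15 would have roughly 160 to 192 gate delays to get through a fused multiply-add versus about 102 for Bulldozer, which is why 10+ cycles at 16 gates doesn't look impossible, whatever ARM chooses to spend on area and power.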
That link doesn't work for me, or the document was removed. Did someone save a copy?
Here is a presentation by Qualcomm giving some numbers on Snapdragon including some comparisons between their next-gen architecture and the A15.
http://www.kandroid.org/board/data/board/conference/file_in_body/1/4.session.제7회_KANDROID_세미나_퀄컴.pdf