An overview of Qualcomm's Snapdragon Roadmap

convergedw

Newcomer
This is a report from the Linley group that Qualcomm posted on their site. It gives a pretty good overview/analysis of Qualcomm's roadmap over the next year or so and is a nice source to keep track of all of their various Snapdragon iterations.

http://www.qualcomm.com/documents/linley-report-dual-core-snapdragon

It sounds like they might be aiming to get a revamp of the Scorpion core to market in 2012. If so, I would imagine that we'll be hearing about it soon, either at their analyst day in November or at 3GSM.
 
Didn't know Scorpion is an OoOE CPU :p
Very nice technical document (not too complicated, so good for those who aren't into this kind of thing but are still curious). Shame they didn't include Samsung with its Hummingbird when they compared Scorpion to what others have to offer.
After reading that and seeing results from second-generation Scorpion devices (HTC Desire HD running msm7230), I can't wait to see how the dual-core Snapdragon will do (especially that 512 KB L2 'faster than in A9'). Shame about the lack of H.264 HP :cry:
 
Didn't know Scorpion is an OoOE CPU :p
Very nice technical document (not too complicated, so good for those who aren't into this kind of thing but are still curious). Shame they didn't include Samsung with its Hummingbird when they compared Scorpion to what others have to offer.
After reading that and seeing results from second-generation Scorpion devices (HTC Desire HD running msm7230), I can't wait to see how the dual-core Snapdragon will do (especially that 512 KB L2 'faster than in A9'). Shame about the lack of H.264 HP :cry:

Since Hummingbird is functionally a Cortex A8, you can use everything (except the max clock and power consumption) of the A8 in that paper as a comparison.
 
Since Hummingbird is functionally a Cortex A8, you can use everything (except the max clock and power consumption) of the A8 in that paper as a comparison.

I should have been more precise. I meant not just the CPU but the whole 'platform' as such: the SGX540 combined with the Hummingbird CPU and a multimedia chip capable of playing back video streams up to 1080p. Either Linley didn't see the necessity of including it in this comparison or didn't think Samsung should be considered a key player in the market.
 
I should have been more precise. I meant not just the CPU but the whole 'platform' as such: the SGX540 combined with the Hummingbird CPU and a multimedia chip capable of playing back video streams up to 1080p. Either Linley didn't see the necessity of including it in this comparison or didn't think Samsung should be considered a key player in the market.

Most of the article was focused on CPU performance and power as well as microarchitecture and some comments on RF integration. I agree a detailed comparison of various GPU architectures would've been nice but there's relatively less information available on Adreno/Yamato/z430 than Scorpion.
 
The article makes it sound like Scorpion and Cortex-A9 are using more or less the same OoO technologies, but I doubt this. The word from Anand was that Scorpion could do "some" things OoO, but is "not A9 class."

The way I figure it works is that there are separate pipelines for integer and load/store (and maybe more, like multiply) that each take a different number of stages. These pipelines can complete out of order relative to each other in terms of writeback, so integer instructions could keep being issued ahead of a stalled load/store pipe so long as there are no dependencies. This would be like ARM11/XScale (but dual issue, of course). But it would mean there are no reorder queues in front of the pipelines, so you couldn't, for instance, run one ALU operation ahead of another that was dependency stalled. I also figure there's no register renaming.
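To make that guess concrete, here is a toy model of in-order issue with out-of-order completion. It is only a sketch of the behaviour described above, not a description of Scorpion itself: the `Instr` type, the `simulate` function, the single-issue simplification and every latency are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str        # destination register
    srcs: tuple      # source registers
    latency: int     # cycles from issue to writeback (hypothetical numbers)


def simulate(program):
    """In-order, single-issue toy model. Returns {name: (issue, writeback)}."""
    ready_at = {}    # register -> cycle at which its pending write lands
    schedule = {}
    cycle = 0
    for ins in program:
        # Stall until every source has been written back (RAW hazard) and,
        # since there is no renaming, until any pending write to the same
        # destination has landed (WAW hazard).
        regs = list(ins.srcs) + [ins.dest]
        issue = max([cycle] + [ready_at.get(r, 0) for r in regs])
        done = issue + ins.latency
        ready_at[ins.dest] = done
        schedule[ins.name] = (issue, done)
        # No reorder queue: nothing younger may issue before this one.
        cycle = issue + 1
    return schedule


program = [
    Instr("load",  "r1", ("r0",),      10),  # slow load, e.g. a cache miss
    Instr("add_a", "r2", ("r3", "r4"), 1),   # independent of the load
    Instr("add_b", "r5", ("r1",),      1),   # needs the load result
    Instr("add_c", "r6", ("r3",),      1),   # independent, but behind add_b
]

for name, (issue, done) in simulate(program).items():
    print(f"{name}: issued at cycle {issue}, written back at cycle {done}")
```

In this model add_a writes back long before the load does (out-of-order completion), while add_c has to wait behind the dependency-stalled add_b even though it depends on nothing in flight. A core with reorder queues could have run add_c early.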

metafor said:
Most of the article was focused on CPU performance and power as well as microarchitecture and some comments on RF integration. I agree a detailed comparison of various GPU architectures would've been nice but there's relatively less information available on Adreno/Yamato/z430 than Scorpion.

Quite the contrary, until this article I've seen very little on Scorpion. This one doesn't have an awful lot either, but at least it discloses pipeline lengths and cache sizes. I've seen a lot of information on z430 in the i.MX51 user guide and AMD's slides, I'd say we know a lot about it... what we don't really know is how Adreno 205 and 220 improved on it. I blame Qualcomm for just being secretive in general.
 
Quite the contrary, until this article I've seen very little on Scorpion. This one doesn't have an awful lot either, but at least it discloses pipeline lengths and cache sizes. I've seen a lot of information on z430 in the i.MX51 user guide and AMD's slides, I'd say we know a lot about it... what we don't really know is how Adreno 205 and 220 improved on it. I blame Qualcomm for just being secretive in general.

Well said. The only other detailed article about Scorpion can be found on insidedsp.com, but this one shows what kind of upgrades were made to the first generation and what we can expect from the dual-core Scorpion. Still, I wonder how that 512 KB L2 will compare to the standard one used in A9.
True, we don't know anything about Adreno 205 or 220 apart from the raw numbers.
 
Still, I wonder how that 512 KB L2 will compare to the standard one used in A9.

This article actually highlighted a pretty significant concern: that the L2 in Cortex-A9 runs over the external AXI bus instead of an internal one. It'd be good to know what speed the AXI bus runs at in, e.g., OMAP4 and Tegra 2. In OMAP4 at least it goes to a higher-level interconnect (L3), so they might be able to ramp it higher if only the L2 cache is hanging off of it. Will have to look at the OMAP4430 TRM again sometime... I'd check it now if it weren't so huge -_-
 
The article makes it sound like Scorpion and Cortex-A9 are using more or less the same OoO technologies, but I doubt this. The word from Anand was that Scorpion could do "some" things OoO, but is "not A9 class."

The article is written quite fondly and probably puts Scorpion in a better light than it should. But given the difference in pipeline lengths and a plethora of other factors, I wouldn't discount how "OoO" Scorpion is based solely on the few performance numbers we have for the two.

Quite the contrary, until this article I've seen very little on Scorpion. This one doesn't have an awful lot either, but at least it discloses pipeline lengths and cache sizes. I've seen a lot of information on z430 in the i.MX51 user guide and AMD's slides, I'd say we know a lot about it... what we don't really know is how Adreno 205 and 220 improved on it. I blame Qualcomm for just being secretive in general.

Perhaps. But my point stands. This is a microprocessor article comparing various ARM CPUs.
 
I'm not discounting OoO based on performance differences. It's more based on what Qualcomm hasn't been saying about it, and the comment on Anandtech.

I agree that the article doesn't come off as 100% NPOV, which is a little alarming for an analyst group. I also found the citation of Tegra 2 being the "graphics leader" a little suspect, almost as if it has to be just because it's made by nVidia. I fully expect SGX 540 to be at least competitive, if not clearly leading itself. But there isn't really a lot to go on.
 
I agree that the article doesn't come off as 100% NPOV, which is a little alarming for an analyst group. I also found the citation of Tegra 2 being the "graphics leader" a little suspect, almost as if it has to be just because it's made by nVidia. I fully expect SGX 540 to be at least competitive, if not clearly leading itself. But there isn't really a lot to go on.
I think you'll find OMAP4 and Adreno 220 to be leading Tegra 2 quite clearly, yes (barring any miraculous driver improvements which seem unlikely given the Tegra 3 focus at this point).

Of course Linley has no independent capacity to verify NVIDIA's claims and others are unlikely to counter them directly (or be able to prove otherwise publicly). This is a general problem with analysts in this business; heck, they practically never even have any access to anything resembling a die size estimate! How you can evaluate Qualcomm's competitive position in the standalone baseband market (for example) without roughly knowing their die size and that of their competitors is completely beyond me. MDM8200 was over 100mm², but they obviously didn't go around letting everyone know about it.

So these analysts find themselves to be in a position where they need to be 'slightly too optimistic about everybody', which is not a bad compromise, but far from ideal.
 
I'm not discounting OoO based on performance differences. It's more based on what Qualcomm hasn't been saying about it, and the comment on Anandtech.

No offense to Anand, but his speculation is just that and hardly anything to do with actual info. As for what Qualcomm doesn't say, well, they don't say much about anything....

I agree that the article doesn't come off as 100% NPOV, which is a little alarming for an analyst group. I also found the citation of Tegra 2 being the "graphics leader" a little suspect, almost as if it has to be just because it's made by nVidia. I fully expect SGX 540 to be at least competitive, if not clearly leading itself. But there isn't really a lot to go on.

I read the article as pretty much reiterating what separate companies say in press releases, but with more technical information involved. Based on at least GLBench, Tegra 2 is slightly ahead of the 540 and Adreno 205 is somewhere around 20% slower than the 540.

I expect 220 to be a significant leap as that is meant for true console-level graphics.
 
I'm pretty sure we've had this discussion before, but it wasn't mentioned as speculation so much as a direct comment:

"Qualcomm claims the ability to do some things out of order, but by and large the pipeline is in order which ultimately keeps it out of the A9 classification."

Suggesting that this isn't him guessing but actually knowing. Why would this necessarily have nothing to do with actual info?

On the other hand, I've questioned some of the conclusions Linley has drawn in the past. For instance, a claim was once made that perf/MHz of Cortex-A9 and Atom were the same, and that both of them were only 25% the perf/MHz of Nehalem.
 
I'm pretty sure we've had this discussion before, but it wasn't mentioned as speculation so much as a direct comment:

"Qualcomm claims the ability to do some things out of order, but by and large the pipeline is in order which ultimately keeps it out of the A9 classification."

See, it's hard to tell where the comment from QCOM begins and where Anand's conclusions begin. I somehow doubt a QPerson actually said that it's "out of the A9 classification".

On the other hand, I've questioned some of the conclusions Linley has drawn in the past. For instance, a claim was once made that perf/MHz of Cortex-A9 and Atom were the same, and that both of them were only 25% the perf/MHz of Nehalem.

From a theoretical perspective, those aren't really unreasonable claims. Keep in mind this is solely from the point of view of CPU performance. The difference in memory subsystem and system bus performance dramatically changes the end result of course.
 
See, it's hard to tell where the comment from QCOM begins and where Anand's conclusions begin. I somehow doubt a QPerson actually said that it's "out of the A9 classification".

I'm not claiming a comment from Qualcomm here, but I am claiming that Anand isn't pulling this out of thin air. It appears to be a statement based on him knowing things about the architecture the rest of us don't, things he's not at liberty to divulge. He may have already said too much. Nonetheless, I imagine he has a good reason for saying what he did. Furthermore, Linley's comment (that the core has some manner of speculative execution) doesn't contradict this; I just feel it isn't enough to draw much of a conclusion from.

From a theoretical perspective, those aren't really unreasonable claims. Keep in mind this is solely from the point of view of CPU performance. The difference in memory subsystem and system bus performance dramatically changes the end result of course.

Solely from a CPU point of view it's even less reasonable, IMO. Just the same, the claim was made in the context of real world performance, that you would need 4x more A9 cores per MHz to keep up.
 
It could be doing out of order execution but without any speculation.

\Shrugs

EDIT: It doesn't do that. :oops:
 
I'm not claiming a comment from Qualcomm here, but I am claiming that Anand isn't pulling this out of thin air. It appears to be a statement based on him knowing things about the architecture the rest of us don't, things he's not at liberty to divulge. He may have already said too much. Nonetheless, I imagine he has a good reason for saying what he did. Furthermore, Linley's comment (that the core has some manner of speculative execution) doesn't contradict this; I just feel it isn't enough to draw much of a conclusion from.

It's not. And without a description from Qualcomm, I don't think any journalist out there can reliably claim information. I like Anand a lot but I'm not going to take his word for it.

Solely from a CPU point of view it's even less reasonable, IMO. Just the same, the claim was made in the context of real world performance, that you would need 4x more A9 cores per MHz to keep up.

Compared to Nehalem? Dhrystone (I know, I know, not indicative of real world performance, but you'd be surprised how often it's used as a metric in CPU design) puts Nehalem at roughly 22 DMIPS/MHz from the benchmarks I've seen. The A9 pulls ~2.5 according to ARM.
 
There are lots of things it could or couldn't be doing.. probably about all we have any confidence in is that it's doing at least something OoO under some circumstance. It could be doing everything Cortex-A9 is and more, but I doubt it.

I remember Intel actually referred to Atom as having OoO capabilities because it can execute integer instructions ahead of floating point ones. That's kinda like calling something OoO because stores go off asynchronously on a write buffer.

Anything with branch prediction performs speculative execution. Prefetching can be considered speculative, and technically so can predicated instructions (although it's explicitly instrumented by the program). I was never really sure what else speculative execution referred to that would be specific to OoOE.

My expectation is still that it's in-order execution and out-of-order completion.

Compared to Nehalem? Dhrystone (I know, I know, not indicative of real world performance, but you'd be surprised how often it's used as a metric in CPU design) puts Nehalem at roughly 22 DMIPS/MHz from the benchmarks I've seen. The A9 pulls ~2.5 according to ARM.

I was talking perf/MHz per core. 22 DMIPS/MHz is for 4 cores, or 5.5 DMIPS/MHz per core. That's only 2.2x more than Cortex-A9, which is closer to what you'd realistically expect. Note that the Cortex-A9 number is also per core, and Cortex-A9 can also be implemented as quad core.
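Spelling the arithmetic out, as a quick sketch using only the figures quoted in this thread (the variable names are mine, and the assumption that the 22 DMIPS/MHz figure covers all four cores is the one made above):

```python
# Figures as quoted in the thread; none of them are independently verified.
nehalem_total = 22.0   # DMIPS/MHz, assumed to be a 4-core aggregate
nehalem_cores = 4
a9_per_core = 2.5      # DMIPS/MHz per core, ARM's number as quoted

nehalem_per_core = nehalem_total / nehalem_cores
ratio = nehalem_per_core / a9_per_core

print(f"Nehalem per core: {nehalem_per_core} DMIPS/MHz")
print(f"Nehalem vs Cortex-A9, per core per MHz: {ratio:.1f}x")
```

That gives 5.5 DMIPS/MHz per Nehalem core and a 2.2x per-core gap, well short of 4x.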
 
Something I've just found http://developer.qualcomm.com/sites/default/files/IQ-Tech-Track-AdrenoGPUandPerformanceTools.pdf
The only interesting part is about the next generation of Adreno graphics, Adreno 3xx. According to this presentation it'll be GPGPU-capable with OpenCL support, running the new OpenGL ES 'Halti' core (codename for OpenGL ES 3.0?), and if the presentation is to be believed it's going to be used in the 28nm Snapdragon next year.
So if we add what Linley said about the improved Scorpion architecture, which should come around 2012, and this new GPU, it gives us one hell of an interesting SoC! At least on paper :D
Love this never-ending performance race! Soon every smartphone will be outdated a month after launch, just like it is now with PCs.
 