Qualcomm's lower-end chips with OpenGL ES 2.0 and Scorpion CPU

Arun, I've got a question for you.
How hard can it be to make the Scorpion CPU have OoOE just like the Cortex-A9?

It would probably be easier to just use the A9 and not try to change Scorpion, but maybe it would benefit from OoOE and gain some advantage over the A9...
 
Pretty much impossible. OoOE determines the whole architecture of a CPU. It's not something that you can just bolt on.
 
As silent_guy said, it'd pretty much be an entirely new CPU generation; OoOE is one of the most fundamental changes you can make to a CPU design... I have no idea whether they plan to invest in a new generation; obviously the back-end guys were focusing on 45nm, but I have no idea what the architecture guys might be working on for the 28nm timeframe (if anything).

BTW, it feels pretty good being spot on (in the original news piece) - now if only someone else didn't have the scoop straight from Qualcomm even before I speculated about it, bah... ;) http://www.linleygroup.com/Newsletters/LinleyMobile/lm090129.html

Talking of the A8/A9/Scorpion, I stumbled upon this case of a pretty epic fail with NEON: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344b/ch16s05s02.html - while not a big deal for some things, it seems to me that it'd make it harder to benefit much from NEON in a game engine. Apparently it will be improved in the A9; I wonder how bad it is on Scorpion...
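For context on why the linked behaviour hurts: the page describes how reading a NEON result back into an ARM register stalls the Cortex-A8's ARM pipeline for on the order of 20 cycles. A toy cost model (all cycle counts here are my own illustrative assumptions, not figures from the TRM) shows why code that branches on SIMD results per-vector, as a game engine's culling or collision loops might, loses most of NEON's benefit:

```python
# Toy cost model of the Cortex-A8's NEON-to-ARM readback stall.
# All cycle counts are illustrative assumptions, not TRM figures.

READBACK_STALL = 20  # assumed ARM-pipeline stall per NEON->ARM transfer
VEC_OP = 1           # assumed cost of one 4-wide NEON operation
SCALAR_OP = 1        # assumed cost of the ARM-side compare/branch

def cycles_branchy(n_vectors):
    """Read each vector result back to ARM and branch on it immediately."""
    return n_vectors * (VEC_OP + READBACK_STALL + SCALAR_OP)

def cycles_batched(n_vectors):
    """Stay in the NEON domain, transfer one combined flag at the end."""
    return n_vectors * VEC_OP + READBACK_STALL + SCALAR_OP
```

Under these assumptions the per-vector readback version spends ~22 modeled cycles per vector against ~1 for the batched one, which is the kind of gap being alluded to above.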

And finally, one quick off-topic note: it looks like I was wrong in one of my posts about OMAP3 - the U380 SoC from Ericsson Mobile Platforms (which may or may not have been canned following the merger with ST-NXP) was on 65nm, not 45nm, so what I heard there was clearly wrong. That sadly gives me a hunch I might have been way too optimistic about the Motorola SoC, which I had speculated was therefore also 45nm; who knows though. BTW, the STn8820 was also 100% canned to focus on the U8500, and the U500 also seems to have been canceled.
 
BTW, it feels pretty good being spot on (in the original news piece) - now if only someone else didn't have the scoop straight from Qualcomm even before I speculated about it, bah... ;) http://www.linleygroup.com/Newsletters/LinleyMobile/lm090129.html
So I was right after all that Qualcomm will try to put Scorpion almost everywhere, so that they can benefit from the hard work they've done designing it ;)

Talking of the A8/A9/Scorpion, I stumbled upon this case of a pretty epic fail with NEON: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344b/ch16s05s02.html - while not a big deal for some things, it seems to me that it'd make it harder to benefit much from NEON in a game engine. Apparently it will be improved in the A9; I wonder how bad it is on Scorpion...
Maybe when they were designing Scorpion they figured out some way to overcome this 'flaw', but even if they didn't, it will still be better than the current A8 implementation.
 
Indeed, definitely looks like they want to amortize Scorpion in a lot of designs. They're also being a lot more aggressive with video & 3D hardware on 45nm. All good news for the industry... :)

But well, you know, I'm a tad subjective when it comes to NEON, because I 300% agree with the point of view of a certain third party I won't name that thinks it's as much of a joke as I think I am, and will stick to VFP. These companies have three zillion redundant units capable of doing the same thing, and then they act all surprised when their die size is higher than the competition's. Oh gosh, hoocoodanode?

Just look at Snapdragon: you could do video decoding on NEON. Or you could do it on the advanced DSP. Or you could do it in the dedicated accelerators! Errr, what exactly is the point, guys? And how many application developers do you really think are going to use SIMD intrinsics on an open platform where 90%+ of the user base doesn't have it available? The same thing is true to differing extents for other architectures, including OMAP4. It's as if all these guys were living in a world of their own where there's no competition and higher die size is merely a benefit because it allows you to justify a higher selling price... Oh well, I really should stop being so bitter! ;)
 
Well, these are generic designs. Someone might need the DSPs, someone else NEON, etc. If you want something specific, you have to build your own design like Apple :)
 

I think that NEON will be good for 3rd party software developers.
When someone tries to write a multimedia player with hardware acceleration, he is doomed, because most chip manufacturers and ODMs don't release any kind of SDK or documentation he could use; but thanks to NEON, he could write a program that would work well on many hardware platforms that have NEON.
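One way that kind of portable NEON path usually works in practice (on Linux, at least) is runtime detection: check the Features line of /proc/cpuinfo and fall back to scalar code when "neon" isn't advertised. A hypothetical sketch of just the dispatch logic, with made-up sample cpuinfo excerpts:

```python
# Sketch of runtime NEON detection for picking a code path on Linux/ARM.
# Parsing logic only; the cpuinfo samples below are hypothetical excerpts.

def has_neon(cpuinfo_text):
    """Return True if any 'Features' line advertises the neon extension."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "features" and "neon" in value.split():
            return True
    return False

def pick_decoder(cpuinfo_text):
    """Hypothetical dispatch: NEON-optimised path vs. portable scalar path."""
    return "neon_path" if has_neon(cpuinfo_text) else "scalar_path"

SAMPLE_WITH = "Processor : ARMv7\nFeatures : swp half thumb vfp neon\n"
SAMPLE_WITHOUT = "Processor : ARMv6\nFeatures : swp half thumb vfp\n"
```

The same binary can then ship both paths and never crash on the 90%+ of devices without NEON.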

Besides, it can be used for accelerating audio, because most DSPs are used only for video decode.
 
Well, these are generic designs. Someone might need the DSPs, someone else NEON, etc. If you want something specific, you have to build your own design like Apple :)
Why would someone *need* that? I have yet to see a single smartphone manufacturer do anything really fancy with this stuff. They just use it for multimedia, often with partner-supplied libraries. Maybe slow-motion video encode, but that can be done perfectly well in fixed-function hardware if thought of in advance.

The only potential killer app I've ever found for it is what ARM summarizes as "Voice and handwriting recognition" - but as far as I can tell, the processing requirements for that with conventional methods really aren't as high as people sometimes make them out to be. Certainly, if there were an approach that gave much better results for both that and speech synthesis but also required much more processing power, that'd be appealing - but I'm not really aware of any, not that I've looked incredibly in-depth into the matter.

I think that NEON will be good for 3rd party software developers.
When someone tries to write a multimedia player with hardware acceleration, he is doomed, because most chip manufacturers and ODMs don't release any kind of SDK or documentation he could use; but thanks to NEON, he could write a program that would work well on many hardware platforms that have NEON.
But who cares about that stuff? The entire point of those chips is the media acceleration. Those third party apps are great for older processors, but I don't really see the point now. Surely every phone with those chips will have a hardware accelerated media player, and if that thing's GUI is so bad you want to use another one then you probably shouldn't have bought the phone in the first place. Plus, even discounting that, I think there's a real willingness from some chip designers to expose the APIs to successful third party video software developers.

Besides, it can be used for accelerating audio, because most DSPs are used only for video decode.
Whadda? :) The emerging industry standard is to put a small audio processor on its own power island so you can shut down everything else while playing music. That's how you get numbers like 100+ hours for audio on OMAP4/Tegra. On NEON, I don't even see how you could get to 20 hours... Even with the OMAP3 DSP, Archos' MID has pretty pitiful battery life for audio, doesn't it?
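To put rough numbers on that (the battery capacity and power draws below are illustrative assumptions, not measurements): a ~5 Wh phone battery gives 100+ hours only if the entire audio path averages around 50 mW, which a dedicated low-power island can plausibly hit but an awake application core running NEON code cannot:

```python
# Back-of-the-envelope audio playback battery life.
# Capacity and power figures are illustrative assumptions, not measurements.

BATTERY_MWH = 5000  # ~5 Wh, a plausible smartphone battery of the era

def playback_hours(avg_power_mw, battery_mwh=BATTERY_MWH):
    """Hours of playback = stored energy / average power draw."""
    return battery_mwh / avg_power_mw

audio_island_mw = 50  # assumed: small audio processor, everything else off
cpu_neon_mw = 250     # assumed: application core + NEON kept awake
```

With these numbers the dedicated island gives 100 hours and the CPU+NEON path only 20, which is the shape of the gap described above.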
 
That's all great really, but I wouldn't call those DSPs advanced if they can't process 720p H.264 High Profile video at an average of 5 Mbps :p
I know I'm being picky, but IMO supporting only Baseline profile is useless. Especially on something that wants to be a small computer (MID, netbook, nettop, etc.) and an HD PMP.
It would mean that every 720p BDrip would have to be re-encoded. Even YouTube HD would be too much :devilish:
 
That's all great really, but I wouldn't call those DSPs advanced if they can't process 720p H.264 High Profile video at an average of 5 Mbps :p
Wow, poor DSP designers, I think you just insulted every single one of them that has ever lived... :) I'm certainly not aware of 720p H.264 High Profile having shipped in a commercial product using a classic DSP and no accelerator, even for the tight loops... But then again, I guess you probably meant a DSP+accelerator approach.

I know I'm being picky, but IMO supporting only Baseline profile is useless. Especially on something that wants to be a small computer (MID, netbook, nettop, etc.) and an HD PMP.
I can hear people at NVIDIA crying right now! j/k (and quite a few other companies too; I'm not sure anyone but TI & Apple is doing High Profile in their next-gen SoCs). Anyway, to address your reasoning:

It would mean that every 720p BDrip would have to be re-encoded.
Wow, a 720p library not being usable until the end of time... shock horror :) How many people already have a library of BDrips anyway? Seems like a niche of a niche to me. And this generation nobody supports High Profile; by next generation everyone will be at 1080p, so you'd have to re-encode anyway. Plus 5 Mbps is not very high, so you'd probably want to encode it again for a SoC that supports 10-20 Mbps. Finally, consider that even OMAP4 doesn't support bitrates anywhere near high enough to play any Blu-ray stream without transcoding.
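For scale, average bitrate maps directly to stream size (size = bitrate × duration / 8), so it's easy to see what the bitrates being argued about here mean for a 2-hour movie:

```python
# Stream size from average bitrate: size in bytes = bitrate * seconds / 8.

def stream_size_gib(bitrate_mbps, hours):
    """Size in GiB of a stream at the given average bitrate and duration."""
    bits = bitrate_mbps * 1_000_000 * hours * 3600
    return bits / 8 / 2**30

# A 2-hour movie:
#   ~5 Mbps re-encode -> about 4.2 GiB
#   ~40 Mbps Blu-ray  -> about 33.5 GiB
```

So the 5 Mbps re-encode being discussed is roughly an eighth the size of a full-bitrate Blu-ray stream, which is also why the storage on these devices favours re-encoding anyway.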

BTW, when it comes to image quality, it's worth pointing out that many more SoCs will support VC-1 Advanced Profile than H.264 High Profile/CABAC, and the former's image quality is obviously better than H.264 Baseline. So if you can find a good VC-1 encoder, that's probably the way to go.

Even YouTube HD would be too much :devilish:
Yeah, that's definitely very unfortunate, and is probably the best use-case I can think of where the user would be pissed off because he wouldn't understand why it doesn't work on an ARM netbook that is claimed to be so awesome at HD playback.

Another point, I think, is that practically all HD content that doesn't require any illegal process to obtain (zomg, bypassing HDCP! the horror!) won't be High Profile - in part because the devices don't support it, of course. But YouTube is certainly a big exception there, and does make it look pretty dumb, heh...

Don't get me wrong, I'd love every SoC in the world to have a PowerVR VXD 380 running at 300MHz so it could handle two simultaneous streams of 1080p H.264 High Profile at 40Mbps+... But I don't think that's realistic at this point :) It would be extremely amusing if Apple was the first to come out with that - I wouldn't be overly optimistic at this point though. Ah well, we'll see what happens.
 
Finally, consider that even OMAP4 doesn't support bitrates anywhere near high enough to play any Bluray stream without transcoding.
OMAP4 should be capable of playing any Blu-ray stream. Maybe not with its DSP alone, but the SGX540 (and the dual-core Cortex-A9) should do it. The handset/netbook/video player developer just has to work a little bit harder ;)
 
OMAP4 should be capable of playing any Blu-ray stream. Maybe not with its DSP alone, but the SGX540 (and the dual-core Cortex-A9) should do it. The handset/netbook/video player developer just has to work a little bit harder ;)
OMAP4 uses dedicated/specialized hardware to play back 1080p video; the DSP/CPU/GPU are shut off *completely* in those cases. And I very much doubt this dedicated hardware is programmable; it's "configurable" at best IMO. The DSP alone is not much more powerful than what you have in OMAP3, and without accelerators it barely does 720p H.264 Baseline.

Now, if you mean you could bypass that hardware and use the DSP, the two A9s (with NEON), and the GPU all at the same time in an incredibly complex and interwoven program, and that with this you could do 1080p H.264 High Profile @ 40Mbps... Then yeah, there might be just enough processing power to do that, but it'd be incredibly power inefficient and, more importantly, would be mind-blowingly hard to program. The chances of anyone ever bothering to do this are, IMO, literally zero.
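Some rough arithmetic backs up how demanding that is (16×16 macroblocks; the one-bin-per-coded-bit figure for CABAC is a rough rule of thumb, not an exact count):

```python
# Rough 1080p H.264 decode throughput arithmetic.

def macroblocks_per_second(width, height, fps, mb=16):
    """Macroblock rate for a frame size coded in mb x mb blocks."""
    return (width // mb) * (height // mb) * fps

mb_rate = macroblocks_per_second(1920, 1088, 30)  # 1080p is coded as 1088 rows
# -> 244800 macroblocks per second, before entropy decoding is even considered

# CABAC is decoded bit-serially: a 40 Mbps stream means on the order of
# 40 million bins per second through one inherently sequential loop
# (roughly one bin per coded bit is a rule-of-thumb assumption).
cabac_bins_per_second = 40_000_000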

Consider that at MWC08, about 18 months post-tapeout, the best they could demo was a partner using the DSP for 720p H.264 decode at not incredibly high bitrates - and they didn't even yet manage to use the hardware accelerators at all, AFAIK! Which is why OMAP3's demos have become increasingly more appealing over the years: the software curve. And that's with something that is at least an order of magnitude easier than what you're describing, so really please don't get your hopes up ;)
 
Pretty much impossible. OoOE determines the whole architecture of a CPU. It's not something that you can just bolt on.

That's not quite true. OOO processors are really in-order processors with an OOO part in the middle. If you do a non-aggressive OOO implementation - and you're not going to do anything else on a mobile chip - it's not a complete redesign.
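The idea of "an in-order machine with an OOO part in the middle" can be sketched with a toy scheduler: fetch and decode stay in program order, and a small window in the middle issues whatever is ready. Everything here (window size, latencies, unlimited issue width, SSA-style unique destinations) is invented purely for illustration:

```python
# Toy model: in-order fetch/decode with an out-of-order issue window.
# Window size, latencies and unlimited issue width are illustration choices.

def simulate(program, window=4):
    """program: list of (dest, [srcs], latency) with unique dests (SSA-style).
    Returns the order in which instructions issue."""
    pending = {dest for dest, _, _ in program}  # regs still awaiting a writer
    ready_at = {}                               # reg -> cycle value is ready
    window_insts, issued = [], []
    fetched, cycle = 0, 0
    while len(issued) < len(program):
        # in-order fetch/decode into a small instruction window
        while fetched < len(program) and len(window_insts) < window:
            window_insts.append(fetched)
            fetched += 1
        # out-of-order issue: any window entry whose operands are ready
        for i in list(window_insts):
            dest, srcs, lat = program[i]
            if all(s not in pending and ready_at.get(s, 0) <= cycle
                   for s in srcs):
                ready_at[dest] = cycle + lat  # result ready after latency
                pending.discard(dest)
                issued.append(i)
                window_insts.remove(i)
        cycle += 1
    return issued

# r1 = long-latency load; r2 depends on r1; r3 is independent.
prog = [("r1", [], 10), ("r2", ["r1"], 1), ("r3", [], 1)]
```

With this program the independent instruction issues past the stalled dependent one (issue order [0, 2, 1]), which an in-order pipeline could not do; the surrounding fetch logic is untouched.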
 
Now, if you mean you could bypass that hardware and use the DSP, the two A9s (with NEON), and the GPU all at the same time in an incredibly complex and interwoven program, and that with this you could do 1080p H.264 High Profile @ 40Mbps... Then yeah, there might be just enough processing power to do that, but it'd be incredibly power inefficient and, more importantly, would be mind-blowingly hard to program. The chances of anyone ever bothering to do this are, IMO, literally zero.

I've heard of one company that pretty much does things like that to get 1080p out of 720p hardware, so I would put the chances of that rather higher.
 
Hm ok, thanks for explaining. I just thought that if an SGX535 with a single-core Atom can play a Blu-ray stream plus Quake 3 at the same time, then an SGX540 with two Cortex-A9s must be able to do it too (for example in a netbook). But I guess that was just very wishful thinking and I'll have to wait for OMAP5 in 2013 ;)
 
OMAP4 uses dedicated/specialized hardware to play back 1080p video; the DSP/CPU/GPU are shut off *completely* in those cases. And I very much doubt this dedicated hardware is programmable; it's "configurable" at best IMO. The DSP alone is not much more powerful than what you have in OMAP3, and without accelerators it barely does 720p H.264 Baseline.
That's interesting. So it's very different from OMAP3, since IIUC the video IP blocks in IVA2+ are controlled by the DSP. According to the high-level information provided by TI, I would assume IVA3 uses a very similar architecture: the IVA3 block is made of IP + DSP.
And I don't think there exists any video codec hardware that doesn't have some kind of CPU (broadly speaking) to handle at least the various different file formats.
Do you have some link for further information?

Consider that at MWC08, about 18 months post-tapeout, the best they could demo was a partner using the DSP for 720p H.264 decode at not incredibly high bitrates - and they didn't even yet manage to use the hardware accelerators at all, AFAIK! Which is why OMAP3's demos have become increasingly more appealing over the years: the software curve. And that's with something that is at least an order of magnitude easier than what you're describing, so really please don't get your hopes up ;)
And what if they couldn't use the hardware accelerators because those were too specific to some other codec's algorithms to be of any use for H.264?
 
That's not quite true. OOO processors are really in-order processors with an OOO part in the middle. If you do a non-aggressive OOO implementation - and you're not going to do anything else on a mobile chip - it's not a complete redesign.
I guess that's theoretically possible, I'm just not sure anyone has ever done that in the history of the industry? Of course, that doesn't prove anything...
I've heard of one company that pretty much does things like that to get 1080p out of 720p hardware, so I would put the chances of that rather higher.
Unless you're thinking of a software company, the only SoC I can think of that was upped from 720p to 1080p is Tegra. It is certainly not my understanding that they're doing something as fancy as what I described (although I'm far from sure of the details), but I see your point.
 