Qualcomm's lower-end chips with OpenGL ES 2.0 and Scorpion CPU

Laurent06: The answer to both of your questions lies in the IVA 3 section of this PDF... It has tons of nice info on other things too though :) http://focus.ti.com/en/pdfs/wtbu/omap_4_pb_swpt034.pdf?DCMP=wtbu_omap&HQS=ProductBulletin+PR+omap4pb
Thanks for the link! But this kind of confirms what I said: IVA3 contains the DSP and the video IP, so my guess is that your claim that the DSP isn't used when decoding isn't true ;)
Of course, I might have missed something when I read the PDF :)
 
That's not quite true. OoO processors are really in-order processors with an OoO part in the middle. If you do a non-aggressive OoO implementation - and you're not going to do anything else on a mobile chip - it's not a complete redesign.
I see your point and I suppose it's possible, but I wonder if there aren't second-order considerations that make it harder than it seems? I can't think of any practical case where it has been implemented.

One negative that comes to mind with incremental surgery is that it would add a few stages (any idea how many?) to an already deep (in case of an A8) pipeline.

Or is the absence of any practical example more because IP vendors prefer to go all in when doing OOO and add other features like speculative execution as well?
 
Thanks for the link! But this kind of confirms what I said: IVA3 contains the DSP and the video IP, so my guess is that your claim that the DSP isn't used when decoding isn't true ;)
Of course, I might have missed something when I read the PDF :)
Well, here's what it says:
PDF said:
The third generation IVA on the OMAP 4 applications processor is divided into two sections: a power-optimized, multi-format hardware accelerator for mainstream codecs and a programmable digital signal processor (DSP) based portion for emerging codecs and audio.
This kind of phrasing is completely different from how they described OMAP3. You could argue maybe the technical writer made a mistake, but assuming it is correct, it says pretty damn clearly that the hardware accelerator portion is not based on a DSP, since only the second portion is described as such. So the DSP is still there, and it's what does the job for 'emerging standards' like, say, the new Chinese video format... But in other cases, it's not even taking any power, as it's on a separate power island that is shut off completely.

TBH, I don't see how they could possibly get to their power numbers with a classical DSP-centric approach anyway... The difference between OMAP3 and OMAP4 is too great for this being nothing more than an incremental improvement in DSP accelerators.
 
An OoO processor is in-order in the front end and in-order past the reorder stage.
The logic of the ALUs themselves would also change only slightly.

The amount of work outside of those is hefty, though.

The various schemes for a modern OoO would involve:
Rehashing most of the issue network and logic, particularly for a Tomasulo-type OoO.

Revamping the internal register addressing as a consequence of the change in issue hardware, to provide all or some of the following: reservation stations, a future file and committed state (or the allocation tables to track those registers in a common pool), and new tags on the bypass network.

Flag registers, processor state registers, and exception data would have to be properly tracked, which involves per-instruction tracking, duplication, or renaming.

The hardware responsible for committing instruction progress must change, pursuant to the prior alterations.
Failure to fully implement such changes can allow improper or undefined system states to become visible off-chip.
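The register-addressing point above is the heart of it. As a rough illustration (a toy sketch, not any real core's rename hardware), renaming maps each architectural register to a fresh physical register on every write, which is what removes WAW/WAR hazards so instructions can issue out of order while results still commit in order:

```python
# Minimal register-renaming sketch (illustrative only, not any real core):
# each architectural register write gets a fresh physical register, so an
# older reader of the same architectural register is unaffected.

class RenameTable:
    def __init__(self, num_arch, num_phys):
        self.alias = {f"r{i}": f"p{i}" for i in range(num_arch)}  # arch -> phys
        self.free = [f"p{i}" for i in range(num_arch, num_phys)]  # free pool

    def rename(self, dst, srcs):
        # Sources read the *current* mapping; the destination gets a new
        # physical register from the free pool.
        read = [self.alias[s] for s in srcs]
        self.alias[dst] = self.free.pop(0)
        return self.alias[dst], read

rt = RenameTable(num_arch=4, num_phys=8)
# r1 = r2 + r3 ; r2 = r1 * r1 -- the second instruction reads the *new* r1
d1, s1 = rt.rename("r1", ["r2", "r3"])
d2, s2 = rt.rename("r2", ["r1", "r1"])
print(d1, s1)  # p4 ['p2', 'p3']
print(d2, s2)  # p5 ['p4', 'p4']
```

A real design also needs the commit-side machinery described above (freeing physical registers at retirement, recovering the table on a mispredict or exception), which is exactly where the hefty work lies.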

One negative that comes to mind with incremental surgery is that it would add a few stages (any idea how many?) to an already deep (in case of an A8) pipeline.
OoOE is a measurable increase, even if a modest one. The big explosion in complexity usually happens when wide superscalar issue, deeper pipelining, and deeper speculation are added on top.

The difficulty is that there are many data paths, interlocks, and registers throughout the pipeline that simply do not exist prior to going OoO. The amount of state being tracked does go up with pipeline length.
While the size of the unglamorous parts of the chip may not appear to amount to much, getting them wrong renders the processor useless.

At that point, what design effort is actually being saved? The chip needs to be thoroughly verified anyway.

Or is the absence of any practical example more because IP vendors prefer to go all in when doing OOO and add other features like speculative execution as well?
If there's a pipeline and the processor doesn't stall on a branch, speculative execution is present by default, OoO or not.
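A toy sketch of that point (the pipeline depth and resolve stage are made-up numbers, not any real core's): with one fetch per cycle and the branch resolving several stages later, every fall-through instruction fetched in the meantime is speculative, even on a strictly in-order machine.

```python
# Hypothetical in-order pipeline that predicts "not taken" and keeps
# fetching: the instructions fetched before the branch resolves are
# speculative, and get squashed iff the branch was actually taken.

def wrong_path_fetches(resolve_stage, taken):
    """Instructions fetched past a branch before it resolves.

    With one fetch per cycle, (resolve_stage - 1) fall-through
    instructions enter the pipe behind the branch."""
    speculative = resolve_stage - 1
    return speculative if taken else 0

# Assumed pipeline resolving branches at stage 4:
print(wrong_path_fetches(4, taken=True))   # 3 instructions flushed
print(wrong_path_fetches(4, taken=False))  # 0, the speculation was correct
```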
 
Thanks for the link! But this kind of confirms what I said: IVA3 contains the DSP and the video IP, so my guess is that your claim that the DSP isn't used when decoding isn't true ;)
Of course, I might have missed something when I read the PDF :)
I stumbled upon this, it's an article written by someone who has spoken with TI, it's about as explicit as you can get: http://www.insidedsp.com/tabid/64/articleType/ArticleView/articleId/300/Default.aspx
Article said:
The chips’ video performance is enabled by the IVA3 video engine, which includes a programmable ‘C64x DSP core plus video codec accelerators. According to TI, the ‘C64x DSP is not needed to achieve this performance; it’s included for backwards compatibility and to enable customers to support future video codecs.
The justification for NEON is also pretty damn pitiful:
Article said:
It’s somewhat surprising that the Cortex cores in the OMAP 4 chips include the NEON multimedia extensions, given the number of other on-board multimedia engines. TI believes that the silicon area consumed by the NEON extensions is justified by the need for software compatibility (presumably with OMAP 3). In addition, TI says that in some cases customers may choose to split demanding multimedia tasks among multiple processing engines (for example, a Cortex-A9 core and the ‘C64x core) for higher performance.
So what's ST's excuse for including it in U8500? Stupidity? Okay, ignore me, once again I'm just bitter... :)
 
A few tidbits about Scorpion from the same webpage, dated 2007.
Don't know if you've read it, but there are a few interesting details about Snapdragon. You can find it here.
Article said:
Although Scorpion and Cortex-A8 have many similarities, based on the information released by Qualcomm, the two cores differ in a number of interesting ways. For example, while the Scorpion and Cortex-A8 NEON implementations execute the same SIMD-style instructions, Scorpion’s implementation can process 128 bits of data in parallel, compared to 64 bits on Cortex-A8. Half of Scorpion’s SIMD data path can be shut down to conserve power. Scorpion’s pipeline is deeper: It has a 13-stage load/store pipeline and two integer pipelines—one of which is 10 stages and can perform simple arithmetic operations (such as adds and subtracts) while the other is 12 stages and can perform both simple and more complex arithmetic, like MACs. Scorpion also has a 23-stage floating-point/SIMD pipeline, and unlike on Cortex-A8, VFPv3 operations are pipelined. Scorpion uses a number of other microarchitectural tweaks that are intended to either boost speed or reduce power consumption. (Scorpion’s architects previously designed low-power, high-performance processors for IBM.) The core supports multiple clock and voltage domains to enable additional power savings.
It seems they really tweaked (changed) the Cortex-A8 architecture. Now I understand why Arun says that Scorpion is more expensive and more complex than the Cortex-A8.
Article said:
At first glance, it doesn’t look like much—as noted earlier, Scorpion is expected to run at 1 GHz in a 65 nm process, which is slightly lower than the 1.1 GHz top speed that ARM currently quotes for the Cortex-A8 in 65 nm. Scorpion is quoted as providing 2100 DMIPS at 1 GHz; Cortex-A8 is quoted at 2000 DMIPS at the same speed. However, a notable difference is that the Cortex-A8 top speed is for a TSMC GP (general-purpose) process, while the Scorpion speed is for the LP (low-power) process.
Higher performance at supposedly lower power consumption is quite an achievement.
The updated Scorpion at 1.2 GHz on 45 nm should consume roughly the same, if not even less.
Article said:
Qualcomm claims that Scorpion will have power consumption of roughly 200 mW at 600 MHz (this figure includes leakage current, though its contribution is typically minimal in low-power processes). In comparison, ARM reports on its website that a Cortex-A8 in a 65 nm LP process consumes .59 mW/MHz (excluding leakage), which translates into about 350 mW at 600 MHz.
According to this, it would seem that a 1 GHz Scorpion should consume approx. 340 mW, which is less than a Cortex-A8 at 600 MHz. Impressive, and hopefully true.
 
Yeah, I read that way back in the day... ;) One quick point though: don't take mW/MHz too seriously. That's only true at constant voltage, which is basically never the case. In Snapdragon's case, power consumption at 1GHz is sure to be much higher than that, because voltages must also be much higher; why do you think they never quote any power numbers above 600MHz? :) That's not necessarily a bad thing, mind you. It's good for a design to have performance headroom in certain markets; just don't delude yourself about power.
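To make the point concrete: dynamic power scales roughly as P ~ C·V²·f, so mW/MHz only extrapolates linearly if voltage stays constant. A back-of-the-envelope sketch, where the 200 mW at 600 MHz is Qualcomm's quoted figure but the voltages are made-up round numbers, not Snapdragon's real operating points:

```python
# Scale a reference dynamic-power figure to a new frequency and voltage
# using P ~ C * V^2 * f (capacitance cancels out in the ratio).

def dynamic_power(p_ref, f_ref, f, v_ref, v):
    """Reference power p_ref at (f_ref, v_ref), scaled to (f, v)."""
    return p_ref * (f / f_ref) * (v / v_ref) ** 2

p600 = 200.0  # mW at 600 MHz (Qualcomm's quoted figure)

# Naive linear extrapolation to 1 GHz (constant voltage):
print(round(dynamic_power(p600, 600, 1000, 1.0, 1.0)))  # 333 mW

# With a hypothetical voltage bump from 1.0 V to 1.2 V for 1 GHz:
print(round(dynamic_power(p600, 600, 1000, 1.0, 1.2)))  # 480 mW
```

Even a modest voltage bump swamps the frequency scaling, which is why the 600 MHz figure tells you little about power at 1 GHz.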
 
Yeah, I read that way back in the day... ;) One quick point though: don't take mW/MHz too seriously. That's only true at constant voltage, which is basically never the case. In Snapdragon's case, power consumption at 1GHz is sure to be much higher than that, because voltages must also be much higher; why do you think they never quote any power numbers above 600MHz? :) That's not necessarily a bad thing, mind you. It's good for a design to have performance headroom in certain markets; just don't delude yourself about power.

I know. I try to take those things with a grain of salt ;)
But it still seems that, clock for clock, Snapdragon may be more power-efficient than competing solutions, at least below 1 GHz.
 
Yeah, I read that way back in the day... ;) One quick point though: don't take mW/MHz too seriously. That's only true at constant voltage, which is basically never the case. In Snapdragon's case, power consumption at 1GHz is sure to be much higher than that, because voltages must also be much higher; why do you think they never quote any power numbers above 600MHz? :) That's not necessarily a bad thing, mind you. It's good for a design to have performance headroom in certain markets; just don't delude yourself about power.

Qualcomm has stated that the power consumption number for Snapdragon at 1GHz is 500mW. That is supposed to move down to 300mW at 45nm :D.

Of course, I have been reading about these numbers for about 3 years now so I am looking forward to finally seeing some real world implementations this summer.
 
Qualcomm has stated that the power consumption number for Snapdragon at 1GHz is 500mW. That is supposed to move down to 300mW at 45nm :D.

Of course, I have been reading about these numbers for about 3 years now so I am looking forward to finally seeing some real world implementations this summer.
Do you mean Snapdragon as a whole SoC, or only the Scorpion CPU?
 
According to this PDF document we can assume that:

1) The ATI graphics assets that Qualcomm bought won't go to waste

2) The Imageon series will be renamed Adreno graphics and given model numbers to indicate OpenGL version and performance

3) Going even further, it might indicate that the SoC package can have a different GPU according to what the buyer wants

4) We will finally get satisfying 3D graphics performance from their chips

But I wonder whether they will finish what ATI started with the Z460... They need to develop improved GPUs to become more competitive with SGX and GeForce ULV.

If it works, maybe they will try to sell their Adreno GPUs just like TI does. One more player on the market would be a good thing :)
 
According to this PDF document we can assume that:

1) The ATI graphics assets that Qualcomm bought won't go to waste

2) The Imageon series will be renamed Adreno graphics and given model numbers to indicate OpenGL version and performance

3) Going even further, it might indicate that the SoC package can have a different GPU according to what the buyer wants

4) We will finally get satisfying 3D graphics performance from their chips

But I wonder whether they will finish what ATI started with the Z460... They need to develop improved GPUs to become more competitive with SGX and GeForce ULV.

If it works, maybe they will try to sell their Adreno GPUs just like TI does. One more player on the market would be a good thing :)

Hey that's the pdf link I posted on xda!!
http://forum.xda-developers.com/showpost.php?p=3515800&postcount=666
damn you Wishmaster stop following me! :p /jk
 
But I wonder whether they will finish what ATI started with the Z460... They need to develop improved GPUs to become more competitive with SGX and GeForce ULV.

If they've started that kind of development only fairly recently, or at worst are starting now, I'm afraid the result would end up competing with the next generation of those cores instead.
 
If they've started that kind of development only fairly recently, or at worst are starting now, I'm afraid the result would end up competing with the next generation of those cores instead.

I know. I'm not talking about the SGX530 that comes with OMAP3, or about the GPU in Tegra 1. I was thinking about the SGX540 and Tegra 2.

They should start it now to finish on time.

Hey that's the pdf link I posted on xda!!
http://forum.xda-developers.com/showpost.php?p=3515800&postcount=666
damn you Wishmaster stop following me! :p /jk

Sorry :)
Just sharing what I found and what I think about it.
Didn't know you posted it on xda ;)
 