ST-Ericsson Nova A9600: dual-core ARM A15, PowerVR Series 6

rektide

Describing the 28nm A9600, which the press release says will sample in 2011:

It features the industry’s best and most efficient low-power implementation known today of a dual ARM® Cortex-A15 MPCore™ with each core running up to 2.5GHz thanks to very innovative power saving techniques to be disclosed later this year.

POWERVR Rogue GPU that delivers in excess of 210 GFLOPS. The graphics performance of the A9600 will exceed 350 million ‘real’ polygons per second and more than 5 gigapixels per second visible fill rate (which given POWERVR’s deferred rendering architecture results in more than 13 gigapixels per second effective fill rate).

Here's the press release. They appear to be switching from Mali to PowerVR.
 
They're a lot like Samsung in that regard, with the handset unit choosing suppliers of their processors independent of SE's semiconductor platform side.

While they won't get a better time than next year to choose their own tech, they've made some odd choices in the recent past.
 
With the GPU core running above 400 MHz and the ALUs clocked considerably higher, some of the performance figures might start to make sense.
 
You'd still need 12 TMUs at >400 MHz to exceed the 5.0 GPixels/s mark. TMUs are anything but cheap in hardware, especially if their texturing capabilities exceed the DX10.1 level.
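
As a rough sanity check on the TMU count versus clock trade-off, here is a minimal sketch. It assumes one pixel per TMU per clock and perfect scaling, which is a simplification for illustration and not something ST-Ericsson or IMG have stated; the function names are made up.

```python
def fill_rate_gpixels(tmus: int, clock_mhz: float) -> float:
    """Peak fill rate in GPixels/s, assuming 1 pixel per TMU per clock."""
    return tmus * clock_mhz / 1000.0


def min_clock_mhz(tmus: int, target_gpixels: float) -> float:
    """Clock in MHz at which `tmus` units reach `target_gpixels`."""
    return target_gpixels * 1000.0 / tmus


if __name__ == "__main__":
    # How fast would 4, 8 or 12 TMUs have to run for 5.0 GPixels/s?
    for tmus in (4, 8, 12):
        print(f"{tmus} TMUs need {min_clock_mhz(tmus, 5.0):.1f} MHz")
```

Under these assumptions 12 TMUs at exactly 400 MHz fall just short of 5.0 GPixels/s, which is why the clock has to sit somewhat above 400 MHz.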
 
Yeah, I'm off. I didn't factor in the SoC in a larger context nor properly account for the rising demand for resolution in these devices.

The Nova A9600 should carry 8 TMUs clocked at 625 MHz, with other parts clocked higher.
 
This endless guessing game will lead to a dead end. For your scenario to work you'd end up with 64 ALUs (in order to sustain a 4:1 ALU:TMU ratio) with either 4 FLOPs/ALU at a higher frequency than 625 MHz, as you say, or 8 FLOPs/ALU at a lower frequency than 625 MHz.

Not that I know anything but how about a triple core each with 4 TMUs and a >400MHz frequency across the core?
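
To put numbers on the two ALU options, a small sketch follows. It only divides the press release's 210 GFLOPS by the 64 ALUs guessed above; the 64 ALUs and the FLOPs-per-ALU values are speculation from this thread, and the function name is hypothetical.

```python
def required_clock_mhz(target_gflops: float, alus: int, flops_per_alu: int) -> float:
    """ALU clock in MHz needed to reach `target_gflops` at peak throughput."""
    return target_gflops * 1000.0 / (alus * flops_per_alu)


if __name__ == "__main__":
    for flops in (4, 8):
        mhz = required_clock_mhz(210.0, 64, flops)
        print(f"64 ALUs x {flops} FLOPs/clock -> {mhz:.0f} MHz")
```

With 4 FLOPs per ALU the clock lands above 625 MHz, and with 8 FLOPs per ALU it lands below, which matches the two cases described.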
 
If we start with the metrics of the 543/544 core that we do know and look at pure graphics performance:
@200 MHz: 35M polys/s, 400M pixels/s (no overdraw).

Is it too simplistic to say that the 554, with its 2x pipes, will do twice this, i.e. 70M polys/s + 800M pixels/s? If so, at an imaginary 600 MHz a dual-core 554 would be hitting 420M polys/s + 4.8G pixels/s
(ignoring bandwidth issues, which are probably a concern at that stage!).

So is it not conceivable that a next-gen Rogue core could achieve somewhat less than those figures (350M polys/s and 5.2G pixels/s) using a single core @ 500 MHz? That is, less than 2x the performance of a single-core 554, but designed for much higher frequencies, noting that it's likely that ST's fill rate includes an overdraw allowance.
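
The scaling in this post can be written out as a short sketch. It assumes throughput scales linearly with clock, pipe count and core count, and that the 554 really doubles both figures; both are this post's hypotheses rather than known facts, and the helper name is made up.

```python
def scale_linear(base_rate, base_clock_mhz, clock_mhz, pipe_factor=1, cores=1):
    """Scale a throughput figure linearly with clock, pipes and core count."""
    return base_rate * pipe_factor * cores * clock_mhz / base_clock_mhz


# SGX543/544 figures quoted in the post, at 200 MHz.
BASE_CLOCK_MHZ = 200
BASE_MPOLYS = 35     # million polygons/s
BASE_MPIXELS = 400   # million pixels/s, no overdraw

if __name__ == "__main__":
    # Hypothetical dual-core 554 at 600 MHz, with 2x the pipes of a 543.
    polys = scale_linear(BASE_MPOLYS, BASE_CLOCK_MHZ, 600, pipe_factor=2, cores=2)
    pixels = scale_linear(BASE_MPIXELS, BASE_CLOCK_MHZ, 600, pipe_factor=2, cores=2)
    print(f"{polys:.0f}M polys/s, {pixels / 1000:.1f}G pixels/s")
```

Bandwidth limits are ignored here, exactly as in the estimate itself.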
 
If we start with the metrics of the 543/544 core that we do know and look at pure graphics performance:
@200 MHz: 35M polys/s, 400M pixels/s (no overdraw).

Is it too simplistic to say that the 554, with its 2x pipes, will do twice this, i.e. 70M polys/s + 800M pixels/s? If so, at an imaginary 600 MHz a dual-core 554 would be hitting 420M polys/s + 4.8G pixels/s
(ignoring bandwidth issues, which are probably a concern at that stage!).

Are you sure you wouldn't want to redo that math?

So is it not conceivable that a next-gen Rogue core could achieve somewhat less than those figures (350M polys/s and 5.2G pixels/s) using a single core @ 500 MHz? That is, less than 2x the performance of a single-core 554, but designed for much higher frequencies, noting that it's likely that ST's fill rate includes an overdraw allowance.

Errr nope:

The graphics performance of the A9600 will exceed 350 million ‘real’ polygons per second and more than 5 gigapixels per second visible fill rate (which given POWERVR’s deferred rendering architecture results in more than 13 gigapixels per second effective fill rate).

http://www.stericsson.com/press_releases/NovaThor.jsp

That's >5 GPixels/s without overdraw and >13 GPixels/s with overdraw, always according to ST.
 
Think of it as an HD6470 @ 650MHz.

IMHO the 5.2 GPixels/s (13 GPixels/s with overdraw) points to 650 MHz or 1300 MHz. Otherwise the Series6 would be too big at 28nm; or ST is using a 2MP config without mentioning it in the press release.


According to Gipsel, one SP block with 40 SPs measures 1.85 mm² (based on the latest die shot of a Bobcat APU, which you can find here: http://www.hardware.fr/art/imprimer/819/ ). The 8 TMUs add a little bit of area. Result: 3 mm² (as far as I have understood it: 40 SPs + 4 TMUs)

So at 40nm the 160 SPs + 8 TMUs would need 1.85 x 4 + (3 - 1.85) x 2 = 9.7 mm²

At 28nm this could be reduced to 4.8 mm²... unfortunately GPUs need more than just SPs and TMUs. ;)
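
The area estimate can be reproduced with a few lines. This is only a sketch: the 1.85 mm² and 3 mm² inputs are the figures quoted in this post, and the ideal shrink (area scaling with the square of the node ratio) is an assumption made here to arrive at roughly 4.8 mm²; real shrinks rarely scale that cleanly. All names are made up for illustration.

```python
SP_BLOCK_MM2 = 1.85       # one block of 40 SPs at 40nm, as quoted
SP_PLUS_TMU_MM2 = 3.0     # 40 SPs + 4 TMUs at 40nm, as quoted


def area_40nm(sp_blocks: int, tmu_groups: int) -> float:
    """Area of SP blocks plus groups of 4 TMUs at 40nm, in mm²."""
    tmu_group_mm2 = SP_PLUS_TMU_MM2 - SP_BLOCK_MM2
    return sp_blocks * SP_BLOCK_MM2 + tmu_groups * tmu_group_mm2


def ideal_shrink(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Area after a shrink, assuming it scales with the node ratio squared."""
    return area_mm2 * (new_nm / old_nm) ** 2


if __name__ == "__main__":
    a40 = area_40nm(sp_blocks=4, tmu_groups=2)   # 160 SPs + 8 TMUs
    a28 = ideal_shrink(a40, 40, 28)
    print(f"40nm: {a40:.2f} mm², 28nm (ideal): {a28:.2f} mm²")
```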
 
Unfortunately Fusion hasn't been designed for smartphone heights either. I wouldn't expect, by the way, that all of the Rogue variants will be 100% DX11; there are probably quite a few DX11 aspects that could be cut back because they are redundant in the embedded space, with the transistors better spent on additional performance.

I'll let out a bit of the speculation that I currently have, and the IMG folks can laugh their hearts out as much as they want since I'm sure it'll be wrong, but it's at least more feasible than the senseless math based on something entirely different.

Rogue/A9600 = "triple core"

Per core:

16 VecX ALUs
4 TMUs
120M Tris/s@450MHz

16 * 10 * 0.45 = 72 GFLOPS, * 3 = 216 GFLOPS
4 * 450 MHz = 1.8 GPixels/s, * 3 = 5.4 GPixels/s
120 * 3 * 95% = 342, i.e. roughly 350 MTris/s
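
The same back-of-the-envelope numbers as a small sketch. Every value in it is the speculation from this post (triple core, 16 ALUs at 10 FLOPs each, 4 TMUs, 120M tris/s per core at 450 MHz, 95% triangle scaling across cores); nothing here is confirmed by IMG or ST-Ericsson, and the class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CoreGuess:
    """One speculated Rogue core; all defaults are guesses from the post."""
    alus: int = 16
    flops_per_alu: int = 10
    tmus: int = 4
    mtris: float = 120.0      # million triangles/s per core
    clock_ghz: float = 0.45

    def gflops(self) -> float:
        return self.alus * self.flops_per_alu * self.clock_ghz

    def gpixels(self) -> float:
        # Assumes 1 pixel per TMU per clock.
        return self.tmus * self.clock_ghz


def chip_totals(core: CoreGuess, cores: int = 3, tri_scaling: float = 0.95) -> dict:
    """Totals for `cores` identical cores, with imperfect triangle scaling."""
    return {
        "gflops": core.gflops() * cores,
        "gpixels": core.gpixels() * cores,
        "mtris": core.mtris * cores * tri_scaling,
    }


if __name__ == "__main__":
    for name, value in chip_totals(CoreGuess()).items():
        print(f"{name}: {value:.1f}")
```

With 95% scaling the triangle rate works out to 342M tris/s, a little under the 350M in the press release, so the scaling factor would need to be slightly higher to match exactly.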
 
10 FLOPs per ALU... is that 4x FMADD + 2 for something else?
 
It's a scenario from the layman here. I just used reverse math starting from the TMUs and went from there to the ALUs.

I'd like to stand corrected, but I'd be very surprised if they've gone down the Vec5 (or VLIW5) road after all; seeing that AMD just recently went from VLIW5 to VLIW4, it would sound strange. 5XT ALUs are already 4+1, which probably gives you a 9th FLOP in non-3D applications. Since 16*10 fit very well with the other numbers, my guess is that there's a second unit this time around that could be used for GP stuff and adds another FLOP.

As I said, I'm sure it'll be 100% wrong and the IMG folks are laughing up their sleeves, heh :(
 
I'm curious as to when this would be released. Given that TI is the lead licensee and OMAP 5 is slated to be available in devices in H2 2012, when would the other licensees release?

If Cortex-A15-based SoCs are not going to be available before H2, that would mean the next-gen SoCs (slated for release in early 2011) would go with either a dual or quad Cortex-A9 on 28nm.
 
To offer that 2.5 GHz configuration, ST-Ericsson must be able not only to avoid ramping up the main CPUs much, but also to control consumption well even when they do ramp.

With the amazing 5G+ tex/sec real world fill rate, the GPU is also apparently keeping to the higher standard of clock rate.

The differentiation in approaches and implementation in SoC design is really accelerating now. When ST-Ericsson and IMG reveal their innovations for achieving this later this year, we'll have a better idea about the evolution of app processors in this space and the competitive landscape.
 
Well, Arun's speculations might be closer to reality than mine after all; it could very well be that GPU frequencies end up significantly higher at 28nm than I had imagined so far.

That's a weird revival Lazy8s by the way ;)
 