ST-Ericsson Nova A9600: dual-core ARM A15, PowerVR Series 6

Just a question: what exactly are ST-E's plans for this chip? AFAIK (but I could be totally wrong on this) they are not a mobile SoC powerhouse.
 

Who knows... Given that last month they revealed that samples of their 5XT chip, the L8540, and its SOI variant are "expected" to be available this quarter, which means they won't appear in handsets until H2 2013, I do not expect a Rogue chip from ST until well into 2014.

It's been suggested that Nokia and/or Sony might be clients for ST Nova chips with IMG graphics in them.


"During the quarter both the NovaThor L8540 LTE ModAp platform and the FD-SOI (Fully Depleted Silicon On Insulator) variant of this product were taped out and sample wafer fabrication started. Samples of both products are expected to be available during Q4."

http://www.stericsson.com/press_releases/Q32012.jsp
 
The ability to shut down USC cluster pairs sounds interesting. I wonder if the G6400 or G6430 has an X2 power-saving mode as well, or if this is a new feature exclusive to the G6630?

No idea, to be honest; however, since it's being mentioned for the first time, I'd tend to believe that it's for the G6630 only for the time being. It's definitely a plus to have, but assume you integrate such a GPU cluster into a high-end tablet: it might save quite a bit of power in 2D or at idle, yet the device will need a battery large enough to support full GPU utilization under 3D either way.

When I first saw the NovaThor A9600 announcement stating that it was expected to sample within 2011, I couldn't help but laugh. Whether it's ST-Ericsson or ST Micro or whoever stands behind such bold statements, they really should learn to be far more realistic with such claims. If you're uncertain, pre-announce that early only if you really have to, and leave anything concerning sampling or availability in the TBA realm until you're absolutely certain of the timeframe.

At least now that we know that ST Micro is manufacturing at Samsung, there's some hope that things will be on track from now on. In that regard I'd agree with you that late '13 would be an unexpected surprise for the A9600 and early 2014 the most reasonable scenario as it stands.

--------------------------------------------------------------------------------------------------
As a sidenote: I'm merely using this thread for some generic Rogue stuff because I didn't want to open another thread exclusively for it.
 
Some background info on Rogue and the newest core, the G6630
http://withimagination.imgtec.com/index.php/powervr/powervr-g6630-go-fast-or-go-home

Some personal observations:

From the graphic, one might approximate that the "all out" versions of the G6200 and G6400 cores are no more than 25% quicker.

None of the announced cores are DX11 compliant.

From the narrative, one might conclude that the G6200 core delivers 100 Gflops, and therefore the G6630 might be hitting 350 Gflops, but of course it all depends on clock speed.

Given that the missing-in-action ST Nova A9600 is quoted as having 210 Gflops of graphics performance, it looks very much like it uses the G6400 or G6430.
 
From the narrative, one might conclude that the G6200 core delivers 100 Gflops, and therefore the G6630 might be hitting 350 Gflops, but of course it all depends on clock speed.
It does very much depend on clock speed. Frankly, 100 GFlops on the G6200 is not something that you'll see in typical customer chips based on that design. It's mostly marketing to say that the family starts at that level of performance, and in practice this is probably a good thing as it means there is no gap in our line-up.

Realistically the difference between "achievable" and "typical" clock speeds will continue to widen over time as mobile platforms become more and more power limited; this encourages SoC suppliers to use higher-end GPUs but clock them lower. It's fundamentally more efficient to undervolt a big GPU than run a small GPU at the nominal process voltage. So while the achievable clock speed for Rogue is very significantly higher than SGX on the same process, the typical increase likely won't be as high.
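
To put rough numbers on the undervolting argument, here is a minimal sketch assuming the usual first-order model in which dynamic power scales with C * V^2 * f and capacitance scales with the number of clusters. The cluster counts, voltages and clocks are made up purely for illustration and are not figures for any real SGX or Rogue part; leakage and die area cost are ignored.

```python
def dynamic_power(units, voltage, freq_mhz):
    """Relative dynamic power, P ~ C * V^2 * f, with C taken as the unit count."""
    return units * voltage ** 2 * freq_mhz


def throughput(units, freq_mhz):
    """Relative throughput, assuming perfect scaling with units and clock."""
    return units * freq_mhz


# Hypothetical small GPU: 2 clusters at nominal voltage and a high clock.
small = {"units": 2, "voltage": 1.0, "freq_mhz": 600}
# Hypothetical big GPU: 4 clusters, undervolted, at half the clock.
big = {"units": 4, "voltage": 0.8, "freq_mhz": 300}

small_power = dynamic_power(**small)
big_power = dynamic_power(**big)

# Both configurations deliver the same relative throughput.
print(throughput(small["units"], small["freq_mhz"]),
      throughput(big["units"], big["freq_mhz"]))
# Power ratio of the big undervolted GPU versus the small one (about 0.64 here).
print(big_power / small_power)
```

With these assumed numbers the wider GPU does the same work for roughly two thirds of the dynamic power, which is the effect described above; real parts will differ depending on how far the voltage can actually be lowered at the reduced clock.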

Given the missing in action ST nova9600 is quoted as having 210 Gflops
That number of flops per core was not necessarily based on the same architectural revision as the final shipping one. Any reduction in the number of flops per cluster would not necessarily have been made with the intention of reducing the effective ALU:TMU ratio, but more likely to improve both the performance and efficiency of the existing flops. Let's just say NVIDIA's Missing MUL on G80 made for a good story, but it was still a dubious architectural decision :) The focus should be on efficiency, not peak flops.
 
When they say OpenCL support I wonder if they mean full profile or just embedded profile. And apparently, the SGX544 and SGX554 don't just add DX9 support over the SGX543, they also add OpenCL 1.1 support vs OpenCL 1.0.
 
ST-E always gave me the impression they were targeting over 600 MHz on a G6400 for their FD-SOI A9600 to reach over 200 GFLOPS. If Apple has always been targeting mid-2013 for a Series6 SoC iPad introduction, they might be looking to use a large-sized Rogue variant (maybe a G6430 or some similar custom core) and balance it with a slightly lower clock rate target.
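
As a rough check of what "over 600 MHz to reach over 200 GFLOPS" implies, here is a small sketch. It takes the 210 Gflops figure quoted earlier in the thread and assumes a G6400 has four clusters; the function names are hypothetical and nothing here reflects the real unit layout.

```python
def flops_per_clock(gflops, freq_mhz):
    """Flops the whole GPU must retire per clock to hit a peak GFLOPS rating."""
    return gflops * 1e9 / (freq_mhz * 1e6)


def flops_per_cluster_per_clock(gflops, freq_mhz, clusters):
    """Same figure divided evenly across the clusters."""
    return flops_per_clock(gflops, freq_mhz) / clusters


# 210 Gflops is the quoted A9600 figure; 4 clusters for a G6400 is an assumption.
for mhz in (600, 650, 700):
    per_cluster = flops_per_cluster_per_clock(210, mhz, clusters=4)
    print(f"{mhz} MHz -> {per_cluster:.1f} flops per cluster per clock")
```

The higher the clock target, the fewer flops per cluster per clock are needed to reach the same rating, which is why a lower-clocked design would have to be wider to match it.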

And, of course, the focus on FLOPS is just for the sake of providing a rating. The design work done to keep all units and aspects of the core doing as much useful work as efficiently as possible, all directed and controlled by the software environment like drivers and compilers and everything else, is really what separates the real world performance of one GPU from another.
 
Some background info on Rogue and the newest core, the G6630
http://withimagination.imgtec.com/index.php/powervr/powervr-g6630-go-fast-or-go-home

Some personal observations:

From the graphic, one might approximate that the "all out" versions of the G6200 and G6400 cores are no more than 25% quicker.

None of the announced cores are DX11 compliant.

I thought "all out" stands for DX11.1 and especially the G6x30 core variants?

From the narrative, one might conclude that the G6200 core delivers 100 Gflops, and therefore the G6630 might be hitting 350 Gflops, but of course it all depends on clock speed.

Given that the missing-in-action ST Nova A9600 is quoted as having 210 Gflops of graphics performance, it looks very much like it uses the G6400 or G6430.
If a platform targets win8 or any succeeding OS it's likelier that the core "goes all out" :p

If "all out" stands for something completely different, all hail the marketing department for levels of secrecy and confusion that are close to ridiculous.

By the way, regarding frequencies in Rogue:
http://withimagination.imgtec.com/index.php/powervr/the-rise-of-gpu-compute

[Image: 3_cpu_vs_gpu_GFLOPS_bars.png]
 
The small variant, G62x0, has ~30% more flops than the SGX 543MP4 at the same frequency, and the bigger variant, G64x0, has even ~160% more flops than the 543MP4 (also at the same frequency).

edit:
The 543MP4 has a peak FLOP rating of 32 GFlops @ 250 MHz with 64 MADs / 128 flops per clock, according to AnandTech.

Therefore a G6xx0 should have ~41.5 MADs / ~83 flops per clock per cluster. This is a strange number, isn't it?
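
The per-cluster estimate can be reproduced with a few lines. This is only a sketch of the arithmetic in the post: the 128 flops per clock baseline is the AnandTech figure quoted there, the ~30% and ~160% uplifts are read off a marketing graph, and the cluster counts (2 for G62x0, 4 for G64x0) are assumptions.

```python
# 64 MADs x 2 flops each, per clock, for the SGX 543MP4 (figure quoted in the post).
SGX543MP4_FLOPS_PER_CLOCK = 128


def flops_per_cluster(uplift, clusters, baseline=SGX543MP4_FLOPS_PER_CLOCK):
    """Flops per clock per cluster for a GPU with `uplift` more flops than the baseline."""
    return baseline * (1 + uplift) / clusters


# Sanity check of the baseline: 128 flops/clock at 250 MHz is 32 GFlops.
baseline_gflops = SGX543MP4_FLOPS_PER_CLOCK * 250e6 / 1e9

g62x0 = flops_per_cluster(0.30, clusters=2)  # assumed 2 clusters
g64x0 = flops_per_cluster(1.60, clusters=4)  # assumed 4 clusters

print(baseline_gflops)
print(round(g62x0, 1), round(g62x0 / 2, 1))  # flops and MADs per cluster per clock
print(round(g64x0, 1), round(g64x0 / 2, 1))
```

Both variants land on roughly the same per-cluster value, which is what makes the non-integer MAD count look odd; it may simply mean the graph percentages are approximate or that not all flops come from MADs.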
 
It does very much depend on clock speed. Frankly, 100 GFlops on the G6200 is not something that you'll see in typical customer chips based on that design. It's mostly marketing to say that the family starts at that level of performance, and in practice this is probably a good thing as it means there is no gap in our line-up.

I'm assuming I can take from the above that G6200 implementations will typically deliver less than 100 Gflops.

I also extrapolate from the graph that Ailuros reminded us about (assuming it's accurate) that the G6600 @ 600 MHz will be about 10x the performance of the 543MP3 in the iPhone 5 (assuming my estimate of its clock at 325 MHz is correct). Furthermore, one could work back and see that the G6600 @ 600 MHz is about 40x an SGX540 @ 200 MHz, and so 60x is probably not out of the question compared to some SGX535 implementations. However, how Gflops performance translates into graphics performance is an entirely different matter.

Of course whether it is useful to be comparing the highest announced series 6 with one of the very first implementations of series 5, some 5.5 years ago, is another matter.
 
From a garbled Google translation it seems:

-each USC has 16 scalar ALUs
-some speculation about OpenCL 1.2 compatibility with the ability of double-precision floating-point by combining 2 ALUs or operating over 2 cycles
-each shared texture pipeline is capable of processing 4 texels/clock
-the "3" in G6x30 is the addition of "frame buffer compression logic"
-there's something about initially only a DX10 WHQL driver will be available, but depending on adoption a driver with full DX11 support will be released
 
So 16 scalar ALUs per USC:

100 GFlops / 600 MHz / 2 / 16 = 5.2 flops per ALU per clock

IMHO, this is again a strange number. Don't they use a MAD architecture? 2.6 MADs per ALU should not be possible. So they seem to use a VLIW5 architecture with no MAD functionality, right?

600 MHz x (16 x 5) x 2 = 96 SP-GFlops and 48 DP-GFlops for the G62x0
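
For anyone who wants to replay that arithmetic, here is a short sketch. It only restates the post's own guesses: 2 clusters (USCs) for the G62x0, 16 ALUs per USC from the translated article, a 600 MHz clock, 5 flops per ALU per clock, and double precision at half rate. None of these are confirmed specifications, and the function names are hypothetical.

```python
def flops_per_alu_per_clock(gflops, freq_mhz, clusters, alus_per_cluster):
    """Flops each ALU would need per clock to reach the quoted peak rating."""
    return gflops * 1e9 / (freq_mhz * 1e6) / clusters / alus_per_cluster


def peak_gflops(freq_mhz, clusters, alus_per_cluster, flops_per_alu):
    """Peak GFLOPS for a given layout and per-ALU flops per clock."""
    return freq_mhz * 1e6 * clusters * alus_per_cluster * flops_per_alu / 1e9


# 100 GFlops at 600 MHz with 2 clusters of 16 ALUs (all guesses from the thread).
per_alu = flops_per_alu_per_clock(100, 600, clusters=2, alus_per_cluster=16)
print(round(per_alu, 2))      # roughly 5.2 flops, i.e. about 2.6 MADs per ALU

# The post's alternative reading: 5 flops per ALU per clock.
sp = peak_gflops(600, clusters=2, alus_per_cluster=16, flops_per_alu=5)
dp = sp / 2                   # speculative half-rate double precision
print(sp, dp)
```

The mismatch between about 5.2 and a whole number of flops per ALU is the oddity the post points at; it could just as well come from a different clock or unit count behind the marketing figure.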
 
It's a marketing-specific graph and isn't necessarily intended for you to reverse engineer the architectural highlights.
 
Please stop using that graph to work out flops per ALU :D

I actually re-posted it only to support Arun's post about higher frequencies. I don't even see a scale in that graph so I'm obviously missing what others can see.

I gave up a LONG time ago trying to figure out how the unit layout may be in any Rogue variant. When ST announced its A9600 a long time ago, the only other thing I could make out from the data with speculative reverse math is that the numbers had most likely been calculated with a target frequency of 667 MHz in mind for 2 TMUs/cluster and hence 8 TMUs for a quad cluster. At least if I'm not misinterpreting the machine-translated mumbo jumbo from that Japanese site for "4 texels per shared texture pipeline", it seems to be in line with what I've figured out so far.

All my attempts to understand where the FLOPs come from most likely fail because some of them come from an odd number of SFUs (or something else?) that can contribute another ADD or MUL along with the MADDs per ALU lane, and if you carefully read into Arun's post above, that might be the case. Under those circumstances you'll never find out that easily how the ALUs are exactly laid out. Nonetheless, ARM Mali T604 is supposed to yield 72 GFLOPs FP32; now you tell me how you get there when there are most likely only 4*SIMD16 blocks.
 
When they say OpenCL support I wonder if they mean full profile or just embedded profile. And apparently, the SGX544 and SGX554 don't just add DX9 support over the SGX543, they also add OpenCL 1.1 support vs OpenCL 1.0.
Not quite. I've mentioned elsewhere that 1.1 Embedded is coming to all USSE revisions, not just USSEv2. So 543 is capable (as is 540 and friends).
 
Not quite. I've mentioned elsewhere that 1.1 Embedded is coming to all USSE revisions, not just USSEv2. So 543 is capable (as is 540 and friends).
It's good to know that's still the case. I guess that IP roadmap figure is another pretty chart that's better for marketing than technical correctness. :eek:
 