ST-Ericsson Nova A9600: dual-core ARM A15, PowerVR Series 6

May I casually point those interested in small bits about the Rogue architecture towards sections 6.3 and 6.4.3?

Also a specific bit value in 7.1 may be interesting although too much should not be read into it (and obviously MRTs are also supported on SGX, although whether they're exposed is another issue, and if that maximum number of bits was lower then that may theoretically cause certain issues).

I think the document also makes quite clear that maybe (just maybe!) dependent texture reads are a performance issue on SGX and this is no longer the case on Rogue.
 
The difference in granularity of branching might indicate a larger change in the configuration of the design than I expected. Or I'm reading too much into it.
 
Nope, you're not reading too much into it; Rogue's quite a bit different at the scheduling and execution level than SGX (and by that I mean it's brand new and completely different :runaway: )
 

· (Series 5/5XT Only) The branching granularity on PowerVR SGX hardware is one fragment or one vertex, meaning that area of fragments don’t have to be spatially coherent in terms of branching
· (Series 6 Only) Unlike the previous PowerVR SGX hardware, for branching to be effective on ‘Rogue’, fragments that branch should be spatially coherent.

* A branch granularity of one is pitiful compared to what today's desktop GPUs are capable of. However, considering power constraints, many of the shortcuts mobile architectures have taken so far should be perfectly understandable.

* Lowp, or rather INT10/8, is gone just as I suspected, since I would have been very surprised if Rogue didn't contain "scalar" SIMDs.

* Mathematical lookup tables are now redundant as a way to avoid bottlenecks.

* Dependent texture reads are no longer a headache (amongst others I assume).


http://www.imgtec.com/powervr/sgx_series6.asp

..... an enhanced scheduling architecture; dedicated housekeeping processors; and a next generation Tile Based Deferred Rendering architecture. These features combine to produce a highly latency tolerant architecture......

Dedicated housekeeping processor based on Imagination’s Meta technology

A "super-threaded" housekeeper then.
 

In what way would allowing arbitrary divergence without the SIMD penalty seen on modern desktop GPUs be pitiful?

Note that there are penalties associated with flow control on SGX but they have nothing to do with granularity.
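To make the divergence point concrete, here is a rough toy model (not a description of either SGX or Rogue internals). It assumes a lockstep group of fragments has to run both sides of a branch whenever its fragments disagree, and it counts "lane-cycles" spent. The group width of 4 and the per-path costs are made-up illustration values; the thread does not state Rogue's actual width, and the model ignores the separate SGX flow control penalties mentioned above.

```python
def lane_cycles(takes_branch, group_size, cost_if=4, cost_else=6):
    """Lane-cycles spent on a branch under a toy lockstep model.

    takes_branch: list of bools, one per fragment, in spatial order.
    group_size:   hypothetical branch granularity (1 = per fragment).
    cost_if / cost_else: hypothetical cycle costs of the two paths.
    """
    total = 0
    for start in range(0, len(takes_branch), group_size):
        group = takes_branch[start:start + group_size]
        cycles = 0
        if any(group):       # at least one fragment takes the "if" side
            cycles += cost_if
        if not all(group):   # at least one fragment takes the "else" side
            cycles += cost_else
        # Every lane in the group is occupied for the whole duration.
        total += len(group) * cycles
    return total


coherent = [True] * 4 + [False] * 4   # neighbouring fragments agree
scattered = [True, False] * 4         # neighbouring fragments disagree

for name, mask in (("coherent", coherent), ("scattered", scattered)):
    print(name, lane_cycles(mask, 1), lane_cycles(mask, 4))
```

With a granularity of one, the scattered and coherent masks cost the same. With groups of four, only the spatially coherent mask avoids paying for both paths, which is the behaviour the Series 6 note about spatial coherence describes.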
 

The layman here thought that flow control and granularity are associated. Point taken, another lesson learned.
 

I don't think he meant pitiful the way you think he did (kind of a strange word choice IMO). The improved flexibility is good. He must have been referring to the efficiency loss implied, hence why Series 6 seems to have reduced granularity.

Given that Series 5 designs started with just a couple of USSEs and were big on explicit SIMD instead of thread-implicit SIMD (which has the obvious benefit of being able to get more work done per cycle with smaller data types), it's not surprising that they'd also start with full branch granularity, and then move away from it when the ALU requirements scaled a lot higher.

I wasn't actually expecting the 8/10-bit explicit SIMD to be dropped. I assumed the 4x10-bit SIMD capability was necessitating 40-bit registers and carried some overhead due to that. Maybe limiting things to 3x10-bit wasn't worth it (I'm going to assume the 4x8-bit format didn't work that way natively), or maybe there's a much bigger cost in handling both an 8/10-bit fixed point format along with two floating point formats than there is in the traditional integer SIMD divisions.
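To spell out the register-width arithmetic behind that guess: four 10-bit lanes need 40 bits, while three lanes would fit in 30. The sketch below packs unsigned 10-bit values into one integer. The lane order and the unsigned encoding are assumptions made purely for illustration; the thread does not describe the actual hardware format.

```python
LANE_BITS = 10
LANE_MASK = (1 << LANE_BITS) - 1  # 0x3FF


def pack_lanes(values):
    """Pack unsigned 10-bit values into one integer, lane 0 in the low bits.

    The layout is hypothetical; it only illustrates the bit budget.
    """
    word = 0
    for lane, value in enumerate(values):
        if not 0 <= value <= LANE_MASK:
            raise ValueError(f"lane {lane} value {value} does not fit in 10 bits")
        word |= value << (LANE_BITS * lane)
    return word


def unpack_lanes(word, lanes):
    """Inverse of pack_lanes for the given number of lanes."""
    return [(word >> (LANE_BITS * lane)) & LANE_MASK for lane in range(lanes)]


full4 = pack_lanes([LANE_MASK] * 4)
full3 = pack_lanes([LANE_MASK] * 3)
print(full4.bit_length(), full3.bit_length())
```

Under these assumptions a 4x10-bit vector overflows a 32-bit register while a 3x10-bit vector does not, which is the trade-off the post speculates about.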

AFAIK IMG was the only company offering a faster 10-bit lowp option, kind of a shame to see it go.
 
She's a regular in these PowerVR texture compression articles (and TC articles in general, of course). Same with the parakeet(?).
 
See Simon F's personal 3D pages. It's one of the reference images he used during the development of PVRTC.
 

Not sure how we'd reduce Rogue branch granularity to be less than '1'... ;-)
 
Err... I think you have that all back to front.

Not too bad, since back to front is one of the TBDR strengths :LOL: I should have such brainfarts more often, since they seem to get most of you boys into one thread ;)
 