Warning: everything below is based on a few sparse details found here and there; the rest is pure layman's speculation:
I'm interested in the architecture of PowerVR's unified scalable shader engine. Does it have vector or scalar units?
I'd place my bet on the latter, yet no one can be really sure at this point. Even if someone answered that question, it would still come down to what exactly each of us means nowadays by "scalar", "superscalar" or whatever other term is being (ab)used.
Does the thread management derive from their Metagence stuff, and does that imply anything interesting?
If memory serves, Metagence was a division that started a lot later than PowerVR itself. Simple reasoning tells me that the GPP/DSP core was more likely "born" out of existing GPU IP than the other way around.
According to IMG's own claims, Metagence sports "superthreading"; a rough explanation can be found here (written with the aid of IMG employees):
http://www.audiodesignline.com/showArticle.jhtml?articleID=183701195
I might be completely wrong, but the whole superthreading thingy sounds to me more like an optimisation that a general-purpose core needs, unlike a typical GPU, which was designed from the start to handle a large number of threads in parallel.
How scalable is it, can you combine much more than 8 of them for a mainstream class PC product? (This is fantasy, I know... too bad Intel had too much NIH attitude to create killer integrated graphics around Series 5, let alone a discrete GPU. I suppose it's also one of the best archs for multichip or multicard solutions...)
If you held a gun to my head, I'd say that the SGX555 (which, according to IMG's roadmap, is due to launch (?) in 2009) is that 8x scaled "something" compared to the lower-end cores. Here are a few details from the latest SGX whitepaper:
PowerVR SGX core architecture currently comprises seven variants, with sizes ranging from less than 1.5mm² to 20.3mm² in a 65nm process:
- SGX 510, 520, 530: mobile, wireless
- SGX 535, 540: high-end mobile, automotive
- SGX 545, 555: PC, games consoles
20.3mm² at 65nm sounds tiny, but then it also comes down to what exactly one means by "PC". An IGP is a PC part too in that sense.
Maximum effective pixel fillrate performance ranges from 100Mpix/sec to 4000Mpix/sec @ 200MHz. Polygon throughput ranges from 2Mpoly/sec to 100Mpoly/sec @ 200MHz.
Let me sum up the highest-end thingy:
4000 Mpix/s @ 200MHz
100 Mpoly/s
20.3mm² @ 65nm
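To put the whitepaper's figures on a per-clock basis, here is a quick Python sketch. It only divides the quoted rates by the quoted 200MHz clock, so nothing in it goes beyond the whitepaper numbers; the variable and function names are my own.

```python
# SGX whitepaper figures, all quoted at a 200MHz core clock.
CLOCK_MHZ = 200

# (lowest-end core, highest-end core) as quoted in the whitepaper
FILLRATE_MPIX = (100, 4000)  # Mpix/s
POLY_MPOLY = (2, 100)        # Mpoly/s


def per_clock(rate_millions_per_s: float, clock_mhz: float = CLOCK_MHZ) -> float:
    """Convert a rate in millions/s into units per clock (millions cancel with MHz)."""
    return rate_millions_per_s / clock_mhz


for label, (low, high) in (("pixels", FILLRATE_MPIX), ("polygons", POLY_MPOLY)):
    print(f"{label}: {per_clock(low)} to {per_clock(high)} per clock "
          f"({high // low}x spread between smallest and largest core)")
```

That works out to 0.5 to 20 pixels per clock (a 40x spread) and 0.01 to 0.5 polygons per clock (a 50x spread) across the range.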
I could swear I saw a footnote in another document stating that the above rates are at less than 50% shader load (100Mpoly/s at 200MHz is 0.5 tri/clock, so that might imply 1 tri/clock). Anyway, my guess would be 8 TMUs * 200MHz * 2.5 overdraw = 4000 Mpix/s, and that might be the mythical "8" you're looking for.
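The guess above can be written out as a small Python sketch. Everything in it except the 200MHz clock and the 4000Mpix/s and 100Mpoly/s figures is an assumption: the 8 TMUs, one pixel per TMU per clock, the 2.5x overdraw factor behind an "effective" fillrate, and the idea that the triangle rate simply doubles at full shader load are my speculation, not IMG data.

```python
# Speculative decomposition of the top SGX core's quoted figures.
# ASSUMPTIONS (guesses, not IMG data): 8 TMUs, one pixel per TMU per clock,
# and an "effective" fillrate that counts 2.5x overdraw removed by the
# deferred renderer as if it had been drawn.
CLOCK_MHZ = 200
ASSUMED_TMUS = 8
ASSUMED_OVERDRAW = 2.5


def effective_fillrate(units: int, clock_mhz: float, overdraw: float) -> float:
    """Raw units * clock (Mpix/s), scaled up by the assumed overdraw factor."""
    return units * clock_mhz * overdraw


print(effective_fillrate(ASSUMED_TMUS, CLOCK_MHZ, ASSUMED_OVERDRAW))  # 4000.0

# Triangle rate: 100 Mpoly/s at 200MHz is 0.5 tri/clock. If that was measured
# at under 50% shader load, doubling it gives the speculative 1 tri/clock
# (ASSUMPTION: setup rate scales linearly with available shader capacity).
measured_tris_per_clock = 100 / CLOCK_MHZ
print(measured_tris_per_clock, measured_tris_per_clock * 2)  # 0.5 1.0
```

Running it the other way round, 4000 / (200 * 2.5) gives back exactly 8 units, which is where the "8" comes from; a different overdraw assumption would give a different unit count.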
Given a partner requesting a higher-end part and IMG being able to divert the needed resources, I don't see why they couldn't scale such a tiny core a lot further, even more so on a smaller manufacturing process. The catch is that the previous sentence has too many conditionals for my taste.
How much is texture filtering decoupled from the shader core? Et cetera...
No idea; for one, they tout deferred texturing, and geometry processing decoupled from pixel processing.
Just don't expect Simon to comment on the above *snicker*