PowerVR SGX543

Arun · Feb 24, 2009

Hey, nobody said a TMU was defined as something which outputs one bilinear result per cycle

It could need two clocks to output one. Alternatively it could refer to the number of shader cores, of course... While this might seem to be marketing BS, there's nothing that prevents your TMUs being half-pixel at a HW level too.

Anyway this thing is too confusing given the die size claims, I give up. At the very least, the numbers in the Series5 paper must be wrong wrt fillrate (i.e. 12.5mm² doesn't refer to the same chip as 1000MP/s with 2.5x overdraw) otherwise 32mm² for 8 TMUs vs 12.5mm² for 2 TMUs represents a 56% perf/mm² increase in terms of fillrate, while the PR claims most of the perf gains are ALU-related so the perf/mm² increase would have to be even larger! Surely that can't be... Even DX10.1 support is unlikely to change things to such a degree.
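
As a quick check of the 56% figure above, here is a minimal sketch that treats TMUs per mm² as a crude proxy for fillrate per mm² at equal clocks. The function name is made up for illustration; the TMU counts and die sizes are the ones quoted in the post.

```python
def tmus_per_mm2(tmus: int, die_area_mm2: float) -> float:
    """TMU density, used as a rough stand-in for fillrate per mm² at equal clocks."""
    return tmus / die_area_mm2


old_density = tmus_per_mm2(2, 12.5)   # 2 TMUs in 12.5mm²
new_density = tmus_per_mm2(8, 32.0)   # 8 TMUs in 32mm²

# Relative increase in TMUs (and hence fillrate) per mm².
gain = new_density / old_density - 1.0
print(f"perf/mm² increase: {gain:.0%}")  # roughly 56%
```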

Ailuros · Feb 24, 2009

If, and I mean IF, SGX543 has 4 TMUs, then it's not my fault that its very own product announcement contradicts itself. If it has 4 TMUs, I'd love to know how someone gets to 1 GPixel/s fillrate with a 2.5x depth complexity.

***edit:

Arun,

I admit you've convinced me. That SGX543MP thing can eventually mean a quad core config; now what I'll have a very hard time getting convinced of is "linear scaling" with 4 cores...

CarstenS · Feb 24, 2009

Is it known how the internal organisation of PowerVR's USSE (2) works?

I am asking because of this passage in the whitepaper wrt USSE2:
"An extended instruction set with comprehensive vector operations and co-issue capabilities enables advanced geometry and pixel processing as well as GP-GPU tasks."
This looks to be in contrast to what was done with USSE; otherwise there'd be no need to waste two lines of text in the whitepaper IMO.

Let's see, if copy and paste works:

Code:

                           SGX520  SGX545  SGX543 SP  SGX543 DP  SGX543 QP
Die size (mm²)             2.6     12.5    8          16         32
Mtri/sec                   7       40      35         70         140
Mpix/sec (OD: 2.5)         250     1000    1000       2000       4000
Mpix/sec (OD: 1.0)         100     400     400        800        1600
Z-Reject/clk               1x      1x      2x         2x         2x
TriSetup/clk/core          1x      1x      1.5x       1.5x       1.5x
"Perf Shader-Heavy"/core   1x      1x      1.4x       1.4x       1.4x
FP-ALU/clk/core            1x      1x      2x         2x         2x
Pipelines/Core             ?       ?       4          4          4
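
A small consistency check on the table above, as a sketch only: it confirms that each "OD: 2.5" fillrate is the raw (OD: 1.0) figure multiplied by the overdraw factor, and that the SGX543 DP and QP columns are plain 2x and 4x multiples of the single-core column. The variable names are illustrative; the numbers are copied from the table.

```python
# Fillrate figures in Mpix/s, in column order:
# SGX520, SGX545, SGX543 SP, SGX543 DP, SGX543 QP
mpix_od25 = [250, 1000, 1000, 2000, 4000]
mpix_od10 = [100, 400, 400, 800, 1600]

# Ratio between the "effective" (overdraw-adjusted) and raw figures per column.
overdraw_ratios = [eff / raw for eff, raw in zip(mpix_od25, mpix_od10)]

# Multi-core scaling of the SGX543 columns relative to the single-core (SP) part.
sp, dp, qp = mpix_od10[2:5]
scaling = {"DP": dp / sp, "QP": qp / sp}

print(overdraw_ratios)
print(scaling)
```

In other words, the table itself assumes perfectly linear fillrate scaling with core count.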

Ailuros · Feb 24, 2009

CarstenS said:
Is it known how the internal organisation of PowerVR's USSE (2) works?

I am asking because of this passage in the whitepaper wrt USSE2:
"An extended instruction set with comprehensive vector operations and co-issue capabilities enables advanced geometry and pixel processing as well as GP-GPU tasks."
This looks to be in contrast to what was done with USSE; otherwise there'd be no need to waste two lines of text in the whitepaper IMO.

The Series5 (USSE1) whitepaper has been slightly revamped too. Re-download it and you'll see GPGPU being mentioned several times in that one too. What you'll miss in that one is "co-issue".

CarstenS · Feb 24, 2009

Yeah, that's what I wanted to refer to.

If USSE1 was Vec2-based and USSE2 can now co-issue two scalars (or any other combination which breaks down to a 1:2 ratio), then there's the doubled FP throughput.

darkblu · Feb 25, 2009

CarstenS said:
Yeah, that's what I wanted to refer to.

If USSE1 was Vec2-based and USSE2 can now co-issue two scalars (or any other combination which breaks down to a 1:2 ratio), then there's the doubled FP throughput.

my knowledge of the SGX is entirely 53x-derived, so somewhere down the genealogy tree it surely deviates from the truth, and yet, let me help you out here.

USSE is scalar-based for fp32, and 2- and 4-wide SIMD for int16 and int8, respectively. register file is 2048 entries (in 16 banks x 128), 32 bits each. the thread scheduler can choose among 16 threads at zero-cost swap, i.e. up to 16 threads are in-flight, but of course only #USSE x 2 (dual-issue) run at any given time. a task scheduler tries to supply those 16 threads from a 16-deep task queue.

also, the depth/stencil test is 4 fragments/clock/USSE, and the fillrate is 1 pixel/clock/USSE.
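
The figures in this post can be turned into a few derived numbers. This is a rough sketch, not anything from the TI or Intel docs: the even split of registers across threads is an assumption made purely for illustration, and the 200 MHz clock passed in below is a hypothetical input rather than a spec.

```python
BANKS = 16
ENTRIES_PER_BANK = 128
BITS_PER_ENTRY = 32
THREADS_IN_FLIGHT = 16

# Register file size from the bank layout described in the post.
entries = BANKS * ENTRIES_PER_BANK            # 16 x 128 entries
size_bytes = entries * BITS_PER_ENTRY // 8    # 32-bit entries -> bytes

# Assumption (not from the post): registers are split evenly across
# all in-flight threads. Real allocation is likely per-task and variable.
regs_per_thread = entries // THREADS_IN_FLIGHT


def per_usse_rates(clock_mhz: int) -> dict:
    """Peak per-USSE rates: 1 pixel/clock fill, 4 fragments/clock depth/stencil."""
    return {
        "mpix_per_s": clock_mhz * 1,
        "mfrag_ztest_per_s": clock_mhz * 4,
    }


print(entries, size_bytes, regs_per_thread)
print(per_usse_rates(200))  # 200 MHz is a hypothetical clock
```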

basically, you want to check out TI's OMAP3 and intel's SCH docs (SGX 530 and 535, respectively).

OMAP35x Applications Processor Technical Reference Manual
Intel® System Controller Hub (Intel® SCH) Datasheet

CarstenS · Feb 25, 2009

Thanks a bunch!

tangey · Mar 16, 2009

IMG have just announced some more details about their multi-core SGX543 products.

It's available in 2-core, 4-core, 8-core and 16-core (!) variants. The 16-core variant is for "high-performance console and computing devices".

The 8-core variant (32 pipes) @ 400MHz provides 532M polys per sec and in excess of 16 Gpixel/sec fill rate.

http://www.imgtec.com/News/Release/index.asp?NewsID=449

Panajev2001a · Mar 16, 2009

tangey said:
IMG have just announced some more details about their multi-core SGX543 products.

It's available in 2-core, 4-core, 8-core and 16-core (!) variants. The 16-core variant is for "high-performance console and computing devices".

The 8-core variant (32 pipes) @ 400MHz provides 532M polys per sec and in excess of 16 Gpixel/sec fill rate.

http://www.imgtec.com/News/Release/index.asp?NewsID=449

I'd like to know how it compares die-size wise to RSX and what that polygon number refers to... in any case... 100 MHz less than RSX and >2x the polygon throughput (and lots of fill-rate too). I'd like to see it under some DX9/DX10 benchmarks, but I'll settle for Crysis FPS scores.

Hehe, a 16 core variant at ~500 MHz would be PS4 worthy.

argor · Mar 16, 2009

@tangey thanks for the info

I have a question: did they release any numbers for the 16-core variant? I did not see any when I was skimming over the press release.

tangey · Mar 16, 2009

argor said:
I have a question: did they release any numbers for the 16-core variant? I did not see any when I was skimming over the press release.

No, they didn't; however, their data refers to 95% linear scalability, which is borne out by a comparison between previously released data on the single-core 543 @ 200MHz (35M polys/s and a fill rate of 1 Gpixel/s) and today's data on the 8-core product @ 400MHz: 532M polys/s (95% linear) and 16 Gpixel/s (100% linear).

One can conclude that a 16-core 200MHz part would perform similarly to the above, and a 16-core 400MHz part would be virtually twice as quick.
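
The scaling percentages above can be reproduced with a few lines. This is a sketch under the assumption that "ideal" means single-core throughput multiplied by core count and by the clock ratio; the function name is invented for the example, and the 16-core projection simply assumes the 8-core efficiency carries over, which IMG has not published.

```python
def scaling_efficiency(measured: float, single_core: float,
                       cores: int, clock_ratio: float) -> float:
    """Measured throughput divided by ideal (linear) throughput."""
    ideal = single_core * cores * clock_ratio
    return measured / ideal


clock_ratio = 400 / 200  # 8-core figures are at 400MHz, single-core at 200MHz

# Polygon rate: 532M polys/s measured vs 35M polys/s for one core.
poly_eff = scaling_efficiency(532, 35, 8, clock_ratio)

# Fill rate: 16 Gpixel/s (quoted as "in excess of") vs 1 Gpixel/s for one core.
fill_eff = scaling_efficiency(16, 1, 8, clock_ratio)

# Hypothetical 16-core projection at 400MHz, assuming the same efficiency holds.
projected_16core_mpolys = 35 * 16 * clock_ratio * poly_eff

print(f"polygon scaling: {poly_eff:.0%}, fill scaling: {fill_eff:.0%}")
print(f"projected 16-core @ 400MHz: {projected_16core_mpolys:.0f}M polys/s")
```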

I find it highly significant that IMG are including performance figures at a 400MHz clock for this part; that's never happened before. Given that the multi-core parts are obviously going to be significant size-wise, and given the higher clock speeds being quoted, IMG's targets are firmly set above its current position of dominance in mobile phones, perhaps at a laptop/tablet-capable part, or indeed a non-portable console.
We know that they have already announced that a major consumer electronics company has licensed a "forthcoming member" of the SGX family. This is widely assumed to be Sony and a next-gen PSP, but the higher-end variants that IMG have announced today are far, far beyond what would be required or suitable, so either the PSP conclusion is wrong, or IMG have their sights on other things higher up the ladder as well as the next-gen PSP.

http://www.imgtec.com/News/Release/index.asp?NewsID=412

Entropy · Mar 16, 2009

tangey said:
I find it highly significant that IMG are including performance figures at a 400MHz clock for this part; that's never happened before. Given that the multi-core parts are obviously going to be significant size-wise, and given the higher clock speeds being quoted, IMG's targets are firmly set above its current position of dominance in mobile phones, perhaps at a laptop/tablet-capable part, or indeed a non-portable console.
We know that they have already announced that a major consumer electronics company has licensed a "forthcoming member" of the SGX family. This is widely assumed to be Sony and a next-gen PSP, but the higher-end variants that IMG have announced today are far, far beyond what would be required or suitable, so either the PSP conclusion is wrong, or IMG have their sights on other things higher up the ladder as well as the next-gen PSP.

http://www.imgtec.com/News/Release/index.asp?NewsID=412

Highly intriguing indeed.
However, the wording in IMG press releases has previously explicitly mentioned target markets that never materialized. Until one of their licensees actually brings a product to market, it's a good idea to be restrained in the interpretation of their press releases. They are not device manufacturers after all.

That said, it's difficult not to calculate die sizes, estimate power draw, sketch the necessary memory architecture to feed a 16-core beast, et cetera.

Interesting times.

Ailuros · Mar 17, 2009

Panajev2001a said:
I'd like to know how it compares die-size wise to RSX and what that polygon number refers to... in any case... 100 MHz less than RSX and >2x the polygon throughput (and lots of fill-rate too). I'd like to see it under some DX9/DX10 benchmarks, but I'll settle for Crysis FPS scores.

Hehe, a 16 core variant at ~500 MHz would be PS4 worthy.

I think you're vastly underestimating what SONY or any other console manufacturer has in mind in terms of performance for next generation consoles.

Besides, SGX (however scalable it may be) has been designed with the lower end markets in mind and is miles away from a real high end design like you'd find in a console or PC. We're not that far away from seeing 32SP IGPs at all; with a 16MP SGX543 they're very well equipped to reach the midrange laptop market if a partner like Intel chooses to integrate something like that in future SoCs and isn't developing something on its own.

Entropy said:
Highly intriguing indeed.
However, the wording in IMG press releases has previously explicitly mentioned target markets that never materialized. Until one of their licensees actually brings a product to market, it's a good idea to be restrained in the interpretation of their press releases. They are not device manufacturers after all.

True. IMHLO the handheld console market is a given at this point. The question remains whether they can break through to higher end markets like PCs/laptops via SoCs (and not standalone GPUs).

Entropy said:
That said, it's difficult not to calculate die sizes, estimate power draw, sketch the necessary memory architecture to feed a 16-core beast, et cetera. Interesting times.

Multicore cannot come with any die area redundancy in my mind. The result has to be still competitive against anything the competition might have.

Panajev2001a · Mar 17, 2009

Ailuros,

I might be underestimating things... but with those specs, assuming they are meaningful, we are talking about several times the speed of RSX.

1.33 Billion Polygons/s (RSX's Set-up engine is limited to 0.25 Billion Polygons)

32 GPixels/s (RSX hits around 4 GPixels/s... 8 ROPS, ~500 MHz)... take away the 2.5-3x overdraw factor IMG usually mentions in their figures and you are still left with 10.67-12.8 GPixels/s of fill-rate which again is more than 2x the fill-rate of RSX.
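
The overdraw arithmetic in the paragraph above works out as follows. A minimal sketch: the helper name is made up, and all inputs (32 GPixels/s, the 2.5x to 3x overdraw factor, 8 ROPs at roughly 500 MHz for RSX) are taken from the post as-is rather than from any official spec.

```python
def raw_fill(quoted_gpix: float, overdraw: float) -> float:
    """Strip the overdraw factor from an 'effective' fillrate figure."""
    return quoted_gpix / overdraw


quoted_fill = 32.0          # GPixels/s, effective figure quoted in the post
rsx_fill = 8 * 0.5          # 8 ROPs x ~0.5 GHz, in GPixels/s

for overdraw in (2.5, 3.0):
    raw = raw_fill(quoted_fill, overdraw)
    print(f"overdraw {overdraw}: {raw:.2f} GPixels/s raw, "
          f"{raw / rsx_fill:.2f}x RSX")
```

Even at the more pessimistic 3x factor, the raw figure stays above twice the RSX number used here.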

Still, I think and hope PS4's GPU will come from nVIDIA (just that it is better integrated with the system than RSX was) and its CPU will be a refinement of the current CELL CPU (higher clock-speed, maybe more LS per SPE, enhancements to the PPE core and SPE core, maybe more PPE's and SPE's)... as well as a big focus on lots and lots of fast RAM given to the system. I want PS4's SDK and OS to be ready early, to be refined and optimized, and for developers to move with the least possible pain from PS3 to PS4 while having major issues they faced with PS3 fixed.

Arun · Mar 17, 2009

I think one of the key problems with SGX543 for a console is probably that the ALU ratio isn't forward-looking enough. It's certainly fine for tomorrow's handhelds and today's games, but I'm skeptical it makes sense for a next-gen console coming out in 2012(?) with programmers writing shaders explicitly for it. If they were aiming for a console deal, I'd much rather expect them to be pitching a more customized core.

Also... 'maybe more PPE's and SPE's' - maybe?! Now I see what Ailuros means when he says you're vastly underestimating next-generation requirements

Mike11 · Mar 17, 2009

Arun said:
Also... 'maybe more PPE's and SPE's' - maybe?! Now I see what Ailuros means when he says you're vastly underestimating next-generation requirements

Maybe you're overestimating the amount of money Sony and their customers will have left for a next gen console

If they have to make console hardware that makes money from day one, and have to sell it for $249 from the beginning, and already in 2011, then I wouldn't reach for the stars performance-wise. But if the economy turns around quickly and "hardcore" consoles start selling better than the Wii (like right now with the PS3 in Japan), then maybe we will see a new hardware monster in 2012.

Ailuros · Mar 17, 2009

Panajev2001a said:
Ailuros,

I might be underestimating things... but with those specs, assuming they are meaningful, we are talking about several times the speed of RSX.

1.33 Billion Polygons/s (RSX's Set-up engine is limited to 0.25 Billion Polygons)

32 GPixels/s (RSX hits around 4 GPixels/s... 8 ROPS, ~500 MHz)... take away the 2.5-3x overdraw factor IMG usually mentions in their figures and you are still left with 10.67-12.8 GPixels/s of fill-rate which again is more than 2x the fill-rate of RSX.

I think you still didn't fully understand what I meant; SGX was built for specific markets and definitely not for anything high end. SGX-MP is IMHLO aiming for up to midrange notebooks and that's its current finish line. SGX is SoC material and not a high end GPU. If they had the resources and partners interested in something like that, they would build a high end GPU rather than hypothetically cluster an insane number of cores together (way more than just 16) to reach the performance manufacturers want for next generation consoles.

Panajev2001a said:
Still, I think and hope PS4's GPU will come from nVIDIA (just that it is better integrated with the system than RSX was) and its CPU will be a refinement of the current CELL CPU (higher clock-speed, maybe more LS per SPE, enhancements to the PPE core and SPE core, maybe more PPE's and SPE's)... as well as a big focus on lots and lots of fast RAM given to the system. I want PS4's SDK and OS to be ready early, to be refined and optimized, and for developers to move with the least possible pain from PS3 to PS4 while having major issues they faced with PS3 fixed.

So let's come back to the theoretical number crunching and not use the unknown variable GT3x0, but the currently known and existing GT2x0 design.

You've got in the latter 80 TMUs and 32 ROPs running at no less than 600MHz, and you get from the 10 clusters a theoretical maximum of over 900 GFLOPS. Assuming something like 45 or 40nm is being used for something like that, the power profile of the GPU would be anything but extreme.
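
For scale, the unit counts above translate into peak rates like this. A sketch only: it uses the 600MHz lower bound from the post, assumes one texel per TMU per clock and one pixel per ROP per clock, and reuses the ~4 GPixels/s RSX figure mentioned earlier in the thread. The helper name is made up for the example.

```python
def peak_rate_g(units: int, clock_mhz: float) -> float:
    """Peak rate in G-ops/s, assuming one result per unit per clock."""
    return units * clock_mhz / 1000


texel_rate = peak_rate_g(80, 600)   # 80 TMUs at 600MHz, in GTexels/s
pixel_rate = peak_rate_g(32, 600)   # 32 ROPs at 600MHz, in GPixels/s

rsx_pixel_rate = 4.0                # GPixels/s, figure quoted earlier in the thread
ratio_vs_rsx = pixel_rate / rsx_pixel_rate

print(f"{texel_rate:.1f} GTexels/s, {pixel_rate:.1f} GPixels/s, "
      f"{ratio_vs_rsx:.1f}x RSX pixel rate")
```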

Now scratch all that and start going into the speculative number crunching of what a GT3x0 derivative could sport exactly.

tangey · Mar 17, 2009

IMG are presenting their multi-core solutions at the Multicore Expo today and tomorrow and giving a couple of keynote speeches.

Dunno if the keynote transcripts will be made available.
http://www.multicore-expo.com/

"Scaling the User Experience: How Multicore-Based Parallelism Ensures an Ever More Inspiring user experience", by Tony King-Smith IMG.

"Achieving Near Linear Performance Scalability with Multiprocessing for Graphics", by Peter McGuiness IMG

Entropy · Mar 17, 2009

Ailuros said:
You've got in the latter 80 TMUs and 32 ROPs running at no less than 600MHz, and you get from the 10 clusters a theoretical maximum of over 900 GFLOPS. Assuming something like 45 or 40nm is being used for something like that, the power profile of the GPU would be anything but extreme.

Extreme compared to the RSX - well, much higher for sure.
Extreme compared to the Wii Hollywood GPU - YES!

Assuming that the console manufacturers, even for stationary consoles, will follow the desktop high-end in power consumption seems foolhardy. High power consumption was the underlying cause of Microsoft's RRoD problem that cost them staggering amounts of money and goodwill. And the PS3 lost a lot of initial Blu-ray related purchases due to its high level of noise when active (and still does to some extent). Not to mention that the Wii, going full blast, draws less than 20W with everything active and including power supply losses. And the Wii is the market leader by far.

If any of the console players would go for IMG technology for a stationary device, Nintendo would be it. They are probably going to have to make a significant shift in architecture no matter what and going SoC would seem to fit their modus operandi. Not terribly interested in GPGPU or HPC applications, Nintendo. Avoiding the gate cost of those kinds of features seems like a good move for a games device.

Ailuros · Mar 17, 2009

Are we talking about handheld consoles or high end consoles here? If it's the former, I'm willing to bet NINTENDO has a closed deal with NV for its Tegra IP and SONY for its PSP2 or whatever it's going to be called. And that has absolutely nothing to do IMHLO with gate count or die area, since I'd say that SGX should have at least an advantage when it comes to die area. Most likely just more attractive pricing in the first case.

I haven't the vaguest idea about high end consoles yet, because I'd say that none of those deals has been closed yet. The only IHV I'd dare to say is out of the equation is Intel, due to the far higher power consumption at the performance level manufacturers are asking for.

In retrospect though considering the above I'd say you might want to rethink a few preliminary conclusions, since at least for the handheld console market Nintendo has picked the bigger die and SONY the highest performance/Watt ratio.
