Sony's Next Generation Portable unveiling - PSP2 in disguise

I don't mind people contacting me via Skype (although I've decided to remove that contact method). It's being contacted as a moderator via Skype that I do not appreciate. I should have been clearer.

Yes, designs do improve in efficiency. There is no question about that; however, you are still limited by the number of transistors you have and by how fast they run. Once again, it's a matter of where you spend your performance. There is no getting away from that.
Both Cell and RSX are brute-force computational beasts; they will no doubt be much larger and run much faster than whatever Sony puts in the NGP. Will they make as optimal use of those transistors? Probably not. Will they obliterate it with sheer overwhelming brute force? Probably.

A quick check of Google suggests a quad-core A9 will be around 10-15mm^2. Cell is over 10x that, if memory serves. And obviously the clock speeds are wildly different. (Someone correct me, because I'm probably wrong - I'm not a hardware guy.) I haven't even touched on memory bandwidth.

In any case, it becomes abundantly clear that performance parity just isn't within the realm of reality.
 
Going by the power consumption numbers for OMAP3, the A9 quad set-up wouldn't leave much headroom on the current process to clock the 543MP4+ much beyond the levels of 543MP phones.
 
Please do not ban me or delete this post.

The SGX 543 has the following capacity at 200MHz.

http://www.imgtec.com/news/Release/index.asp?NewsID=428

"The new POWERVR SGX543 delivers real-world performance of 35 million polygons/sec and 1 Gpixels/sec fillrate at 200MHz"

The NGP is claimed to use a quad-core POWERVR SGX543. This chip can be clocked up to 400MHz on the 65nm process, and Sony may well wait and use a smaller process, which would make speeds of 400MHz and even higher possible. At 400MHz this chip would be capable of a fill rate of 8 Gpixels/s and 280 million polygons/sec.

This surpasses the specs of the PS3, which is claimed to have a fill rate of 4.4 Gpixels/s and a vertex setup limit of 250 million/sec.
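
For reference, here is a minimal sketch of the arithmetic behind the claim above, assuming perfectly linear scaling with both clock and core count (an assumption the reply below takes issue with):

```python
# Back-of-envelope check of the quoted scaling, assuming perfectly linear
# scaling with both clock and core count (the reply below disputes exactly that).
per_core_fill_200mhz = 1e9    # 1 Gpixel/s per SGX543 core at 200 MHz (IMG PR)
per_core_poly_200mhz = 35e6   # 35 Mpolygons/s per core at 200 MHz (IMG PR)
clock_scale = 400e6 / 200e6   # hypothetical 400 MHz clock
cores = 4                     # SGX543MP4

fill = per_core_fill_200mhz * clock_scale * cores   # 8e9  -> "8 Gpixels/s"
poly = per_core_poly_200mhz * clock_scale * cores   # 2.8e8 -> "280 million polygons/sec"
print(fill, poly)   # to be set against the quoted PS3 figures of 4.4 Gpixels/s and 250M setup
```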

There are several problems with these comparisons.

1) The cited fillrate for RSX is the raw number of pixels it can output per cycle, which is 8; if the clock is really 500MHz, that makes it 4 GPixels/s. But SGX543's fillrate takes the raw pixel output rate of 2/clock and multiplies it by a 2.5x overdraw factor to get 1 GPixel/s instead of 400 MPixels/s. If the scene has exactly 2.5x inefficiency due to overdraw then this is valid, but only when comparing against a platform that has no hidden surface removal of its own. In fact, I do believe RSX has early-Z and hierarchical-Z, which makes this comparison unfair. (A numerical sketch of this point follows this list.)

2) You assume that clocks will be at least 400MHz, but we don't know how scaling to an MP4 configuration affects clock scaling, and word right now is that it's on 45nm, where we haven't seen any other single-core SGX come anywhere close to 400MHz. And comparing clock speed against node size between the two is pointless; they're completely different architectures with completely different limitations with respect to frequency.

3) Even if a single SGX543 core can reach 35 MTri/s at 200MHz, it doesn't mean this number will scale perfectly with core count and clock. Chances are that at the rates you're talking about you won't even have the bandwidth to service that kind of geometry, even if NGP has dedicated VRAM (256-bit memory buses draw a lot of power).
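
To put numbers on the overdraw point in 1), here is a minimal sketch using only the figures quoted in this thread:

```python
# Raw vs. "effective" fillrate as described in point 1. The 2.5x factor is
# IMG's assumed scene overdraw, not extra hardware throughput.
rsx_raw  = 8 * 500e6            # 8 pixels/clock * 500 MHz = 4.0 Gpixels/s
sgx_raw  = 2 * 200e6            # 2 pixels/clock * 200 MHz = 0.4 Gpixels/s
overdraw = 2.5                  # assumed average overdraw in the scene
sgx_eff  = sgx_raw * overdraw   # 1.0 Gpixels/s, the marketing number

# Comparing sgx_eff against rsx_raw is only fair if RSX really shades every
# overdrawn pixel, which early-Z / hierarchical-Z largely prevents.
print(rsx_raw, sgx_raw, sgx_eff)
```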

The most important consideration here is that, despite the marketing, GPU power isn't just about geometry setup and fillrate. Your comparisons completely ignore TMU and ALU capabilities - RSX has 24 TMUs vs 8 on SGX543MP4 and a much higher theoretical FLOP peak. Also bear in mind that RSX's shaders are non-unified while SGX's are unified, which makes your best-case SGX comparisons very unfair against real-world performance on RSX. For instance, if you max out vertex shading on SGX you'll have nothing left to shade pixels, and vice versa. You should instead consider some kind of realistic vertex-to-fragment processing ratio.
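
To make the unified-shader trade-off concrete, here is a minimal sketch; the pool size and split below are made up purely for illustration, only the trade-off itself is the point:

```python
# Illustration of the unified-shader trade-off: on SGX, cycles spent on vertex
# work come straight out of the same pool that shades pixels, whereas RSX has
# separate vertex and fragment hardware. The numbers here are illustrative only.
def fragment_budget(unified_throughput, vertex_fraction):
    # vertex_fraction: share of the unified pool busy with vertex shading
    return unified_throughput * (1.0 - vertex_fraction)

print(fragment_budget(1.0, 0.0))   # the best case marketing assumes: everything shades pixels
print(fragment_budget(1.0, 0.3))   # a more realistic split leaves only ~70% for pixels
```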

Could it be the NEON and FPU parts of the A9 are horrible?

Compared to what, an array of 6 Cell SPUs? Of course they'll be less powerful. Looking at the scalar FPU is pretty redundant when considering power; NEON supersedes it (unless they aren't using NEON).

NEON is okay for a small SIMD; it's not supposed to be a powerhouse. Something like PSP's VFPU is more powerful per clock.
 

An interview with an IMG exec (Peter McGuinness) indicated that the PSP2 would be approximately 8 times faster graphically than the Galaxy phone. That phone uses an SGX540 @ 200MHz. Most here suggest a single SGX543 will be about 2x the performance of the SGX540, so 8x overall for a system using an SGX543MP4 would point to it using a similar clock to the Galaxy.
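
A rough sketch of that inference, treating the ~2x per-core figure as the speculation it is:

```python
# Inference from the "roughly 8x the Galaxy phone" quote, assuming (as the post
# above does) that one SGX543 is about 2x an SGX540 at the same clock.
galaxy_sgx540_clock = 200e6      # SGX540 in the Galaxy phone, ~200 MHz
per_core_gain = 2                # speculative SGX543 vs SGX540 factor
cores = 4                        # SGX543MP4
gain_at_equal_clock = per_core_gain * cores              # = 8x with no clock increase

implied_clock = galaxy_sgx540_clock * 8 / gain_at_equal_clock   # = 200 MHz, i.e. a similar clock
print(gain_at_equal_clock, implied_clock)
```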
 
I wonder if there's any chance Sony designed a custom implementation of the Cortex A9 with a 128-bit NEON like what Snapdragon has instead of the stock 64-bit implementation? That should help things along in the SIMD department.
 

You can't design a custom Cortex-A9 in terms of functionality; you can either use a Cortex-A9 or do your own ARM CPU. Scorpion is a custom ARM CPU, not a modified Cortex-A8. It's pretty certain that Sony is using a Cortex-A9 and not a custom CPU.

Since the MPE on Cortex-A9 is both modular and optional it might be possible for a third party to do their own version of it, but I'm skeptical of this because the interface is probably proprietary and the Cortex-A9 core could make some assumptions about how the MPE operates which could include its issue rates. At any rate, Sony (or whoever made it) would possibly still need an architecture license since it'd still implement part of ARMv7a.

Sony also could have included a separate VFPU entirely that has nothing to do with NEON, but it'd have to be implemented over MMIO or something and chances are it'd suffer for it since it'd be behind the L2 interface of the CPU core.
 
I got here a bit late but anyway...

I hope with all this tech they can play PS2 games on the hardware now, because I don't understand why Sony has to keep making new or using weird processors every generation. Why not just use the same processor as in the PS3?

They really need to start thinking about code portability and stop jumping on every newfangled piece of hardware that they think will give them a one-up. I'd prefer they make their own hardware and stick with it. I bet it's going to be a whole new world of hurt to code for this thing.
 
A smaller version of it. I mean, with all the advances they should be able to make it by now. It's been like 4 years since the PS3 was released.
 
... because a smaller version of the PS3 would still be enormous in die size and power consumption, and would run too hot. Each individual component has to be designed from day one to scale down to sub-watt power and tiny die sizes.

Otherwise AMD and Intel could just shrink the K8 and Core 2 Duo and put them in mobile phones. Need a GPU? Just recycle the Radeon 3670/GeForce 9600GT; all you need to do is shrink it.
 
And yet the most advanced version of the PS3 that's about 1 year old still uses around or over 100 watts. Now imagine the runtime of that using portable batteries.

No amount of wishing, hoping, doping, or dropping by you or anyone else will allow a portable PSP2/NGP to be as powerful or more powerful than the PS3.
 

You could perhaps get somewhere around the core GPU capabilities while still consuming many times less power, simply because GPU tasks are so parallel and you can scale up a lot of relatively slow cores within a similar area budget to the one the (originally) much larger PS3 GPU had. But you wouldn't have the power budget for the VRAM bandwidth.

On the other hand, there's no way you could approach Cell's power, because a bunch of slower cores won't give you the same operational latency as the PPE or SPUs. There's a reason these things are clocked at 3.2GHz, and you don't get to that clock speed without paying for it in power consumption, both for the frequency scaling itself and for the additional, wider pipeline stages it requires.
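
As a rough illustration of that last point (the values below are invented purely to show the shape of the scaling, not to model Cell):

```python
# Dynamic power scales roughly as C * V^2 * f, and hitting a higher clock
# usually also needs a higher voltage, so power grows much faster than linearly
# with frequency. All values here are made up for illustration.
def dynamic_power(c_eff, voltage, freq_hz):
    return c_eff * voltage**2 * freq_hz

low  = dynamic_power(1e-9, 1.0, 0.4e9)   # a modest mobile-style operating point
high = dynamic_power(1e-9, 1.2, 3.2e9)   # a 3.2 GHz point that also needs more voltage

print(high / low)   # ~11.5x the dynamic power for 8x the clock in this toy example
```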

Of course, wanting the same architecture as PS3 "shrunk" is just nuts, and not how things work - really not how things can work. PS3 had to choose smaller and faster over power-efficient when it was designed.
 
And yet the most advanced version of the PS3 that's about 1 year old still uses around or over 100 watts. Now imagine the runtime of that using portable batteries.

Actually that's 60 watts now.

No amount of wishing, hoping, doping, or dropping by you or anyone else will allow a portable PSP2/NGP to be as powerful or more powerful than the PS3.

That doesn't necessarily make sense either, though - obviously it will be possible eventually, as history shows us that all these things eventually scale down this far. The PSP at the time got plenty close to the PS2, and people were similarly skeptical then.

And obviously shrinking the PS3 isn't the same as making a low powered device that can do the same with less power.

Anyway, I was looking at the specs, and noticed the following:

133 MPolygon/s
4 GPixel/s fill rate

Do I read this correctly that a 4 GPixel/s fill rate means that at 30fps, you can draw the (960x544) screen about 255 times per frame?

Also, 960 × 544 (522240) means roughly half the pixels of the PS3 for 1280x720p (921600), right?
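
A quick check of both questions, using only the numbers above:

```python
# Sanity check of the two questions above.
fillrate = 4e9                 # 4 GPixels/s
fps = 30
ngp_pixels = 960 * 544         # 522,240
ps3_720p = 1280 * 720          # 921,600

pixels_per_frame = fillrate / fps               # ~133 M pixels per frame at 30 fps
passes = pixels_per_frame / ngp_pixels          # ~255 full-screen passes per frame
ratio = ngp_pixels / ps3_720p                   # ~0.57, so "roughly half" is about right

print(pixels_per_frame, passes, ratio)
```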

Also, I've briefly scurried over the NEON and FPU stuff for the A9, but it's hard to find any kind of FLOPS numbers.

So the NGP will never run the same workload as the PS3. It will also not have any games of 25GB or more in size, I reckon, and it may well not support surround sound either.

The question, though, shouldn't be whether the NGP is going to be more powerful than the PS3 (though assuming it has unified memory of at least 512MB, some things are going to be easier), but whether it is powerful enough to run mostly the same games using the same or almost the same assets. And that seems to be possible to a fairly large extent.

Look forward to GDC, where I'm sure we'll hear more about this.
 
A factor of 2.5 for the average number of pixels drawn at each point on the screen (scene complexity) is multiplied into that 4 Gpix/sec fillrate; PowerVR, practically/ideally, only draws visible pixels.

4 cores * 2 TMUs/core * 200 MHz (rumored) * 2.5 scene complexity = 4 Gpix/sec
 
I tried to understand your example - do you mind if I attempt to rephrase it?

I assume we have N Processing Units (PUs), each of which has an "A" unit and a "B" unit. We also have a mixture of processes, M, which have some mix of "a" work and "b" work, which use the A and B units respectively.

Basically, you are saying that there is a worst-case scenario where we have a set of processes, {Ma}, that is almost exclusively type "a" work and all gets sent to one subset, {Na}, of the N PUs, and another set of processes, {Mb}, that is almost all "b" work and gets assigned to the subset {Nb}. In the case of {Na}, the B units sit idle, and vice versa for {Nb}.

Is that an accurate summary?

In the first instance, SGX has multiple threads per USSE, so it seems to me that it would be very bad luck for one unit to exclusively get items from either {Ma} or {Mb}.

Secondly, if one only had a single core, surely it would suffer from exactly the same situation.
 
Sorry, Simon, I just noticed the part of your post that was referring to me. You understood it correctly, but regarding the first instance, the nature of the threads a unit gets is entirely locality-based, so it's not so much a matter of bad luck as of unfavorable scene content. If a core/tile works on a given locality which happens to be predominantly {Ma}, then the core will not get any {Mb} work until the tile is done.

The single-core case can be susceptible to similar hurdles or not, depending on how many threads the core can have in flight. The difference with the MP tiled case is that in the latter, having as many threads in flight is not as helpful, as those threads have rigid resource affinities - a thread cannot jump across cores for the duration of its host tile's processing.
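
A toy sketch of that utilisation argument (this only illustrates the scheduling point above, not how SGX actually dispatches work):

```python
# Toy model only: with rigid per-tile affinity, a core working on a tile made
# almost entirely of type-"a" work leaves its "B" units mostly idle until the
# tile is finished, regardless of how many threads it can keep in flight.
from collections import Counter

def unit_utilisation(tile_work):
    """tile_work: list of 'a'/'b' work items bound to one core for one tile."""
    counts = Counter(tile_work)
    total = max(len(tile_work), 1)
    return {"A_busy": counts["a"] / total, "B_busy": counts["b"] / total}

print(unit_utilisation(["a", "b"] * 50))         # mixed locality: both unit types busy
print(unit_utilisation(["a"] * 95 + ["b"] * 5))  # skewed locality: B units nearly idle
```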

Anyway, I already took John's reply for a conclusion to the subject, as I originally expected what he said to be the case.
 