PS3 vs X360: Apples to Apples high level comparison...

RSX: 8 VS + 24 PS

1 VS = 1 vec4 + 1 scalar ops per cycle
1 PS = 1 vec4 + 1 vec4 (with co-issue 2 vec2) + 2 scalar ops per cycle (from RSX presentation diagram, there are 2 SFU units)

2 * 8 + (1 + 2 + 2) * 24 = 136 ;)
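
As a quick check of the count above, here is a minimal Python sketch. The per-unit op counts are the ones listed in the post; the function and parameter names are invented for illustration and do not come from any Nvidia material.

```python
def shader_ops_per_cycle(vs_units, ps_units, vs_ops, ps_ops):
    """Peak shader ops per cycle for the given unit counts and per-unit op counts."""
    return vs_units * vs_ops + ps_units * ps_ops

# Per-unit counts as listed in the post (speculative, not official figures):
VS_OPS = 1 + 1      # 1 vec4 + 1 scalar
PS_OPS = 1 + 2 + 2  # 1 vec4 + 1 vec4 co-issued as 2 vec2 + 2 scalar (SFU)

total = shader_ops_per_cycle(8, 24, VS_OPS, PS_OPS)
print(total)  # 136
```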
 
Actually those SFUs look like 2 Vec4 units, each SFU is a stack of 4 "planes", just like each Vector ALU.

I wonder if Dave knows what SFU means (apart from "special function unit")?

Jawed
 
SFUs are special function units, that is. SFUs do simple and complex scalar ops such as reciprocal, exp, log, sin, etc.
NV40 has SFUs too, and those 'things' stacked upon each other are just pixel pipelines, imho.
 
Jawed said:
Ah, I've never seen an SFU on an NVidia diagram before. Good thinking.
Nvidia explicitly talks about SFUs in their vertex shader pipeline.
NV40 pixel pipelines can also perform a reciprocal and a normalization at the same time.

I suppose, alternatively, it could be the Fog ALU that you can see here:

http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=9

SM3 requires that Fog is done in shader code rather than as a fixed function unit in the ROP.

Jawed
Fog is not a good candidate because it's an operation one does once per fragment, so it doesn't make sense to provide a special unit for it in the shader ALUs. Moreover, fog doesn't require a special function unit to be applied, as it uses just a couple of fmadd ops that NV40 pixel pipelines already provide.
 
Xmas said:
That fog ALU is just a fixed point 4-component linear interpolation.
Yeah, Nvidia still provides a fixed-function 'fog unit' on NV40, to be used with integer render targets and/or non-SM3.0 shaders.
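
To illustrate why a fog blend needs no special function unit, here is a minimal sketch of a per-component linear interpolation written as two multiply-add steps. It is plain Python for illustration, not NV40 shader code. The names, and the convention that a fog factor of 1 means no fog, are assumptions.

```python
def madd(a, b, c):
    """a * b + c, standing in for one multiply-add op."""
    return a * b + c

def apply_fog(color, fog_color, f):
    """Blend each component as f * color + (1 - f) * fog_color."""
    out = []
    for c, fc in zip(color, fog_color):
        t = madd(-f, fc, fc)       # (1 - f) * fog component
        out.append(madd(f, c, t))  # f * color component + t
    return tuple(out)

blended = apply_fog((1.0, 0.5, 0.0, 1.0), (0.5, 0.5, 0.5, 1.0), 0.25)
print(blended)  # (0.625, 0.5, 0.375, 1.0)
```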
 
DaveBaumann said:
RSX ~ 136 Shop/cycle ~ 52 Vec4 + 52 Scalar + 32 Other units

Doubtful.

My first impressions too.

However I wanted to check a few things. Do you have official transistor counts for Xenos? IIRC, 232 + 100 mil was floating around?

If the 232 mil for the Xenos Shader module is correct and RSX has 300 mil, then it could be feasible?

The other question was that in the other thread, you seemed quite convinced RSX would have either 8 or 16 ROPs because of the 128 bit memory controller. Is that still a strong hunch? If so, 32 Pixel Pipes would *fit* those numbers?

nAo said:
RSX: 8 VS + 24 PS

1 VS = 1 vec4 + 1 scalar ops per cycle
1 PS = 1 vec4 + 1 vec4 (with co-issue 2 vec2) + 2 scalar ops per cycle (from RSX presentation diagram, there are 2 SFU units)

2 * 8 + (1 + 2 + 2) * 24 = 136 ;)

How many Dot/cycle do you count here?

I see 56 Dot/cycle which doesn't fit with the *required* 52 Dot/cycle I derived? Unless I'm missing something? :)
 
Jaws said:
The other question was that in the other thread, you seemed quite convinced RSX would have either 8 or 16 ROPs because of the 128 bit memory controller. Is that still a strong hunch? If so, 32 Pixel Pipes would *fit* those numbers?
There's no "fit" between pipelines and ROPs nowadays.

Just like Xenos seemingly only has 8 ROPs, but has "48 pixel pipelines".

http://www.beyond3d.com/forum/viewtopic.php?t=23450

Jawed
 
Jaws said:
I see 56 Dot/cycle which doesn't fit with the *required* 52 Dot/cycle I derived? Unless I'm missing something? :)
I see 56 Dot/cycle too.
I know it doesn't fit with the 51 GDot/s figure for full system performance, but even 52 dot products/cycle are too many:

If we assume the ALUs in RSX's pixel pipelines can both co-issue 2 instructions (3-1 or 2-2) like the NV40 ALUs, and we assume RSX has 8 VS and 20 PS, we have:
8*2 + 6*20 = 136 ops per clock cycle

CELL -> 1 Dot (PPE) + 7 Dot (SPE) = 8 Dot per clock cycle -> 25.6 GDot/s
RSX -> 8 Dot (VS) + 40 Dot (PS) = 48 Dot per clock cycle -> 26.4 GDot/s

Total: 52.0 GDot/s

I'm obviously having fun here, it's a divertissement so don't take this stuff too seriously.
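
The back-of-envelope totals above can be reproduced with a short sketch. The 550 MHz RSX clock is the figure used elsewhere in this thread, and the 3.2 GHz Cell clock is the one implied by 8 Dot/cycle giving 25.6 GDot/s. The unit counts are the post's speculation and the variable names are invented. At 550 MHz the RSX term comes out at roughly 26.4 GDot/s.

```python
CELL_CLOCK_GHZ = 3.2  # implied by 8 Dot/cycle -> 25.6 GDot/s
RSX_CLOCK_GHZ = 0.55  # 550 MHz, the RSX clock assumed in this thread

def gdot_per_second(dots_per_cycle, clock_ghz):
    """Peak dot products per second, in billions."""
    return dots_per_cycle * clock_ghz

cell_dots = 1 + 7        # PPE + 7 SPEs
rsx_dots = 8 + 2 * 20    # 8 VS + 20 PS, two dot-capable ALUs per PS (assumed)
ops_per_cycle = 8 * 2 + 6 * 20

cell = gdot_per_second(cell_dots, CELL_CLOCK_GHZ)
rsx = gdot_per_second(rsx_dots, RSX_CLOCK_GHZ)
print(f"ops/cycle={ops_per_cycle} cell={cell:.1f} rsx={rsx:.1f} total={cell + rsx:.1f}")
```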
 
Jaws said:
I see 56 Dot/cycle which doesn't fit with the *required* 52 Dot/cycle I derived? Unless I'm missing something? :)

Let me run through the math, just for fun.

The slide said 51 Billion Dot Products/s.

I see 8 Dot4 from the VS ALU's and 48 Dot4 from the PS ALU's: this means 30.8 GDot4/s at 550 MHz.

The CPU can do, with the 7 SPE's, 22.4 GDot4/s at 3.2 GHz (4 Dot4's every 4 cycles on each SPE).

This would mean 53.2 GDot4's/s, which is a bit higher than the number they posted. And we have not yet taken into account the VMX unit of the PPE, which can provide an additional 3.2 GDot4's/s (same peak performance as a single SPE) and would bring the total for the Broadband Engine to 25.6 GDot4/s.

The GPU should then only push, approximately, 25.4 GDot4/s (taking the PPE's VMX unit into account when finding the peak value of Dot4's/s for the CPU) or 28.6 GDot4's/s (without taking the PPE's VMX unit into account when finding the peak value of Dot4's/s for the CPU). At 550 MHz this means a Dot4's/cycle count of ~46-52 Dot4's/cycle as Jaws said. No buddy, you did not miss anything.
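
Working backwards from the slide figure can be sketched as follows. This is only a restatement of the arithmetic above, assuming one Dot4 per cycle per SPE (and per PPE VMX unit), a 3.2 GHz CPU clock and a 550 MHz GPU clock. The function name is hypothetical.

```python
SLIDE_TOTAL_GDOT = 51.0  # "51 Billion Dot Products/s" from the slide
CELL_CLOCK_GHZ = 3.2
RSX_CLOCK_GHZ = 0.55     # 550 MHz, as assumed in the post

def gpu_budget(cpu_dot_units):
    """Return (GPU GDot4/s, GPU Dot4/cycle) left over after the CPU's share."""
    cpu_gdot = cpu_dot_units * CELL_CLOCK_GHZ  # 1 Dot4/cycle per unit (assumed)
    gpu_gdot = SLIDE_TOTAL_GDOT - cpu_gdot
    return gpu_gdot, gpu_gdot / RSX_CLOCK_GHZ

for label, units in (("7 SPEs + PPE VMX", 8), ("7 SPEs only", 7)):
    gpu_gdot, per_cycle = gpu_budget(units)
    print(f"{label}: GPU {gpu_gdot:.1f} GDot4/s, {per_cycle:.1f} Dot4/cycle")
```

With these assumptions the two cases bracket the ~46-52 Dot4's/cycle range quoted above.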

So, we have to map 52 Dot4's/cycle onto a structure which at first look would provide 56 Dot4's/cycle, or in other words map 28.6 GDot4's/s onto an architecture which should push 30.8 GDot4's/s, going by what nAo posted, which I will re-quote here for the reader's viewing pleasure ;).

nAo said:
RSX: 8 VS + 24 PS

1 VS = 1 vec4 + 1 scalar ops per cycle
1 PS = 1 vec4 + 1 vec4 (with co-issue 2 vec2) + 2 scalar ops per cycle (from RSX presentation diagram, there are 2 SFU units)

2 * 8 + (1 + 2 + 2) * 24 = 136 ;)

From the PS ALU's we should count 52 Dot4's/cycle - 8 Dot4's/cycle (VS ALU's) = 44 Dot4's/cycle, but instead it seems that we should count 48 Dot4's/cycle.

Uhm... lots of thinking to be done.

The fun thing would be if Jen-Hsun made a typo there ;).
 
nAo said:
Jaws said:
if we assume RSX pixel pipelines ALUs can both co-issue 2 instructions (3-1 or 2-2)

Edit: I guess you might want to go over this 3-1 or 2-2 business again. This is related to the shader ops count, right?

Each Vec4 ALU in the PS ALU complex would do 2 shader ops peak and then you have 1 shader op from each of the two SFU's for a total of 6 * Pixel Pipelines count/cycle.
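
The per-pipeline count described above can be sketched as below. The structure (two co-issuing Vec4 ALUs plus two SFUs per pixel pipeline) is the thread's speculation, not a confirmed RSX layout, and all names are invented for illustration.

```python
VEC4_ALUS_PER_PIPE = 2
OPS_PER_VEC4_ALU = 2  # co-issue, 3-1 or 2-2 split
SFUS_PER_PIPE = 2
OPS_PER_SFU = 1

OPS_PER_PIPE = VEC4_ALUS_PER_PIPE * OPS_PER_VEC4_ALU + SFUS_PER_PIPE * OPS_PER_SFU

def total_shader_ops(ps_pipes, vs_units=8, ops_per_vs=2):
    """Peak shader ops per cycle for a given pixel pipeline count."""
    return vs_units * ops_per_vs + ps_pipes * OPS_PER_PIPE

for pipes in (20, 24):
    print(pipes, total_shader_ops(pipes))
```

Under this counting, 20 pixel pipelines give 136 ops per cycle, while 24 pipelines would give 160.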
 
Panajev2001a said:
Aren't both PS ALU's in each Pixel Pipeline capable of Vec4 operations?
Yes, 1 fmadd and 1 mul (on NV40).
On NV40 one has to be used to help texture fetching, but when you can co-issue you should be able to do 2 Dot4's/cycle, right? You wrote "1 PS = 1 vec4 + 1 vec4" after all.
No, the second ALU can't do a dot4.
I assumed Nvidia 'extended' the second ALU on RSX to handle dot products too.
 
Panajev2001a said:
...
The fun thing would be if Jen-Hsun made a typo there ;).

Maybe...that would mean by leaving out the VMX, he underestimated the power of PS3! :p

But I think counting 7 SPUs was *intentional*, as a contributor of *shader* ops, because I speculated last year that SPUs may run Cg *shaders*. If they do, then by excluding the VMX unit it's an accurate metric and a true reflection of its purpose! ;)
 
nAo said:
Panajev2001a said:
Aren't both PS ALU's in each Pixel Pipeline capable of Vec4 operations?
Yes, 1 fmadd and 1 mul (on NV40).
On NV40 one has to be used to help texture fetching, but when you can co-issue you should be able to do 2 Dot4's/cycle, right? You wrote "1 PS = 1 vec4 + 1 vec4" after all.
No, the second ALU can't do a dot4.
I assumed Nvidia 'extended' the second ALU on RSX to handle dot products too.

Ok, fair assumption :D.
 
Jaws said:
Panajev2001a said:
...
The fun thing would be if Jen-Hsun made a typo there ;).

Maybe...that would mean by leaving out the VMX, he underestimated the power of PS3! :p

But I think counting 7 SPUs was *intentional*, as a contributor of *shader* ops, because I speculated last year that SPUs may run Cg *shaders*. If they do, then by excluding the VMX unit it's an accurate metric and a true reflection of its purpose! ;)

True ;).

I do not think you would be barred from running a Cg shader on the PPE IMHO though.
 