Jawed said: Ah, I've never seen an SFU on an NVidia diagram before. Good thinking.
Nvidia explicitly talks about SFUs in their vertex shader pipeline.
Jawed said: I suppose, alternatively, it could be the Fog ALU that you can see here:
http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=9
SM3 requires that Fog is done in shader code rather than as a fixed function unit in the ROP.
Jawed
Fog is not a good candidate because it's an operation one does once per fragment, so it doesn't make sense to provide a special unit for it in the shader ALUs. Moreover, fog doesn't require a special function unit to be applied, as it uses just a couple of fmadd ops that NV40 pixel pipelines already provide.
Xmas said: That fog ALU is just a fixed point 4-component linear interpolation.
Yeah, Nvidia still provides a fixed function 'fog unit' on NV40 to be used on integer render targets and/or with non-SM3.0 shaders.
DaveBaumann said: RSX ~ 136 Shop/cycle ~ 52 Vec4 + 52 Scalar + 32 Other units
Doubtful.
nAo said: RSX: 8 VS + 24 PS
1 VS = 1 vec4 + 1 scalar ops per cycle
1 PS = 1 vec4 + 1 vec4 (with co-issue 2 vec2) + 2 scalar ops per cycle (from RSX presentation diagram, there are 2 SFU units)
2 * 8 + (1 + 2 + 2) * 24 = 136
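The tally in the quoted post can be reproduced with a minimal sketch. It assumes only the per-unit figures given there (8 VS at 1 vec4 + 1 scalar, 24 PS at 1 vec4 + a co-issued vec2 pair + 2 scalar); the names are illustrative and do not come from any Nvidia material.

```python
# Per-cycle shader op tally, using only the figures from the quoted post.
# All names here are hypothetical, chosen for illustration.
VS_UNITS = 8
PS_UNITS = 24

VS_OPS = 1 + 1       # 1 vec4 + 1 scalar per vertex shader
PS_OPS = 1 + 2 + 2   # 1 vec4 + 2 co-issued vec2 + 2 scalar (SFU) per pixel shader


def shader_ops_per_cycle(vs_units, ps_units):
    """Sum the ops issued per cycle across all vertex and pixel shaders."""
    return VS_OPS * vs_units + PS_OPS * ps_units


total = shader_ops_per_cycle(VS_UNITS, PS_UNITS)
print(total)  # 136
```

Changing `PS_OPS` shows how sensitive the 136 figure is to how the second ALU's co-issue is counted.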
Jaws said: The other question was that in the other thread, you seemed quite convinced RSX would have either 8 or 16 ROPs because of the 128 bit memory controller. Is that still a strong hunch? If so, 32 Pixel Pipes would *fit* those numbers?
There's no "fit" between pipelines and ROPs nowadays.
Jaws said: I see 56 Dot/cycle which doesn't fit with the *required* 52 Dot/cycle I derived? Unless I'm missing something?
I see 56 Dot/cycle too.
Jaws said: DaveBaumann said: RSX ~ 136 Shop/cycle ~ 52 Vec4 + 52 Scalar + 32 Other units
Doubtful.
My first impressions too.
However I wanted to check a few things. Do you have official transistor counts for Xenos? IIRC, 232 + 100 mil was floating around?
If the 232 mil for the Xenos Shader module is correct and RSX has 300 mil, then it could be feasible?
The other question was that in the other thread, you seemed quite convinced RSX would have either 8 or 16 ROPs because of the 128 bit memory controller. Is that still a strong hunch? If so, 32 Pixel Pipes would *fit* those numbers?
nAo said: RSX: 8 VS + 24 PS
1 VS = 1 vec4 + 1 scalar ops per cycle
1 PS = 1 vec4 + 1 vec4 (with co-issue 2 vec2) + 2 scalar ops per cycle (from RSX presentation diagram, there are 2 SFU units)
2 * 8 + (1 + 2 + 2) * 24 = 136
How many Dot/cycle do you count here?
I see 56 Dot/cycle which doesn't fit with the *required* 52 Dot/cycle I derived? Unless I'm missing something?
nAo said: RSX: 8 VS + 24 PS
1 VS = 1 vec4 + 1 scalar ops per cycle
1 PS = 1 vec4 + 1 vec4 (with co-issue 2 vec2) + 2 scalar ops per cycle (from RSX presentation diagram, there are 2 SFU units)
2 * 8 + (1 + 2 + 2) * 24 = 136
pc999 said: BTW, can you have fun and tell us how many dots Xenos can do?
One dot per ALU -> 48 per cycle -> 24 GDot/s
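The figures above can be checked with a short sketch. The 500 MHz clock is not stated in the post; it is simply the value implied by 48 dots per cycle giving 24 GDot/s, so treat it as an assumption.

```python
# Dot-product throughput from the numbers in the post.
ALUS = 48
DOTS_PER_ALU_PER_CYCLE = 1
CLOCK_HZ = 500e6  # assumption: implied by 48 dots/cycle -> 24 GDot/s

dots_per_cycle = ALUS * DOTS_PER_ALU_PER_CYCLE
gdots_per_second = dots_per_cycle * CLOCK_HZ / 1e9

print(dots_per_cycle, gdots_per_second)
```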
nAo said: Jaws said: if we assume RSX pixel pipeline ALUs can both co-issue 2 instructions (3-1 or 2-2)
nAo said: pc999 said: BTW, can you have fun and tell us how many dots Xenos can do?
One dot per ALU -> 48 per cycle -> 24 GDot/s
Panajev2001a said: Aren't both PS ALUs in each Pixel Pipeline capable of Vec4 operations?
Yes, 1 fmadd and 1 mul (on NV40).
Panajev2001a said: On NV40 one has to be used to help texture fetching, but when you can co-issue you should be able to do 2 Dot4's/cycle, right? You wrote "1 PS = 1 vec4 + 1 vec4" after all.
No, the second ALU can't do a dot4.
Panajev2001a said: ...
The fun thing would be if Jen-Hsun made a typo there.
nAo said: Panajev2001a said: Aren't both PS ALUs in each Pixel Pipeline capable of Vec4 operations?
Yes, 1 fmadd and 1 mul (on NV40).
Panajev2001a said: On NV40 one has to be used to help texture fetching, but when you can co-issue you should be able to do 2 Dot4's/cycle, right? You wrote "1 PS = 1 vec4 + 1 vec4" after all.
No, the second ALU can't do a dot4.
I assumed Nvidia 'extended' the second ALU on RSX to handle dot products too.
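One way to see where the 56 Dot/cycle figure could come from, as a sketch only: it assumes one dot product per cycle per vertex shader and one per dot-capable pixel shader ALU. That is a reading of the posts above, not a confirmed RSX spec, and the function name is made up for illustration.

```python
# Dot products per cycle under two assumptions about the second PS ALU.
VS_UNITS = 8
PS_UNITS = 24


def dots_per_cycle(dot_capable_ps_alus):
    """Assumes 1 dot/cycle per VS and 1 per dot-capable ALU in each PS."""
    return VS_UNITS + PS_UNITS * dot_capable_ps_alus


nv40_style = dots_per_cycle(1)  # second ALU cannot do a dot4
extended = dots_per_cycle(2)    # second ALU 'extended' to handle dot products

print(nv40_style, extended)  # 32 56
```

Under these assumptions only the 'extended' case reaches 56; neither case gives the 52 Dot/cycle mentioned earlier.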
Jaws said: Panajev2001a said: ...
The fun thing would be if Jen-Hsun made a typo there.
Maybe... that would mean that by leaving out the VMX, he underestimated the power of PS3!
But I think counting 7 SPUs was *intentional* as a contributor for *shader* ops, because I speculated last year that SPUs may run Cg *shaders*. If they do, then by excluding the VMX unit, it's an accurate metric and a true reflection of its purpose!