Xbox 360 graphics processor codenamed: XENOS. some details

DerekBaker said:
wireframe said:
tEd said:
pc999 said:
I had posted this on the other thread but once you are talking about it...

Anyone want to comment on this?

So yes, if programmed for correctly, the Xbox 360 GPU is capable of 96 billion shader operations per second.

http://www.hardocp.com/article.html?art=NzcxLDM=

...yes, it's the same shit as Nvidia's 136 shader ops/sec

If he had at least explained how they count the shader ops, but no, just a stinky number

I think HardOCP mixed this one up and the number of shader ops the C1/R500/Xenos/<insert interesting name here> can do is 48 Billion per second. Let me explain why.

48 billion ops per second is what was first reported. However, anyone who took the time to multiply the number of ALUs by the clock rate would have arrived at 48 * 500 MHz = 24 billion ALU ops per second. I think what ATI tried to clarify is that each of the 48 parallel processing units is capable of two ops per cycle. This gives us 48 * 2 * 500 MHz = 48 billion ops per second.
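The arithmetic in this post can be checked with a quick sketch. The function name `peak_ops_per_second` is made up for illustration, and the 48 units, the 500 MHz clock and the ops-per-cycle figures are simply the numbers quoted in this thread, not confirmed specs.

```python
def peak_ops_per_second(units: int, ops_per_unit_per_cycle: int, clock_hz: float) -> float:
    """Theoretical peak: units * ops per unit per cycle * clock rate."""
    return units * ops_per_unit_per_cycle * clock_hz


CLOCK_HZ = 500e6  # 500 MHz, as quoted in the thread
UNITS = 48        # parallel processing units, as quoted in the thread

one_op = peak_ops_per_second(UNITS, 1, CLOCK_HZ)   # one op per unit per cycle
two_ops = peak_ops_per_second(UNITS, 2, CLOCK_HZ)  # two ops per unit per cycle

print(f"1 op/cycle:  {one_op / 1e9:.0f} billion ops/s")   # 24
print(f"2 ops/cycle: {two_ops / 1e9:.0f} billion ops/s")  # 48
```

By the same formula, the 96 billion figure from the HardOCP article would need four ops per unit per cycle.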

Of course I could be wrong and the total number of ops just doubled from what was known before. This would be significant because it would make the C1/R500 a theoretically more capable shader processor than Nvidia's/Sony's RSX.

But I doubt they would give out the wrong number (48 billion) by mistake like that. Would be a major cock-up.

Just want to sneak in a comment that these theoretical numbers may be a very poor basis for comparing the two competitors. The same goes for their CPUs. These are only theoretical maximum performance numbers, and what really matters with something like a console is how close to the optimum you can operate. Look at the Pentium 4 as an extreme example. It has very high theoreticals and can even back them up in specialized benchmarking tests, but when multiple types of code/data need to be processed and the CPU cannot dedicate itself to one task, look what happens: performance plummets, while a design like the Athlon 64 keeps on trucking. It will be very interesting to see how close these machines are and how close the software will be (noting that software will probably look and play very similarly unless one machine offers something substantial above the others).

'On chip, the shaders are organized in three SIMD engines with 16 processors per unit, for a total of 48 shaders. Each of these shaders is comprised of four [my bold] ALUs that can execute a single operation per cycle, so that each shader unit can execute four floating-point ops per cycle.'

http://techreport.com/etc/2005q2/xbox360-gpu/index.x?pg=1
Yes, but don't FMADs count as 2 ops? Shouldn't it be something like 48 ALUs * 4 FMAD units each * 2 ops per FMAD per clock * 500 MHz = 192 billion floating-point ops per sec?

Executed instructions per second would seem to provide a different measure of performance, since an FMAD is only equivalent to 1 instruction but 2 ops. If this is the case, R500 would be capable of 48 ALUs * 4 FMAD units each * 1 instruction executed/issued per FMAD per clock * 500 MHz = 96 billion floating-point instructions per sec (executed).

The max number of issued instructions per second would seem to be equal to 48 ALUs * [1 Vec3 + 1 scalar instruction per clock] * 500 MHz = 48 billion floating-point instructions per sec (issued).
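To put the three ways of counting side by side, here is a minimal sketch using only the figures from this post and the Tech Report quote. The constant names are invented for illustration, and the per-clock assumptions (4 FMAD units per shader, an FMAD counted as 2 ops, 1 vector + 1 scalar instruction issued per clock) are this post's guesses, not confirmed specs.

```python
CLOCK_HZ = 500e6            # 500 MHz, as quoted in the thread
SHADER_UNITS = 48           # the "48 ALUs" / "48 shaders" in the thread
FMAD_UNITS_PER_SHADER = 4   # four ALUs per shader, per the Tech Report quote
OPS_PER_FMAD = 2            # assumption: a multiply-add counts as 2 floating-point ops
ISSUED_PER_SHADER = 2       # assumption: 1 vector + 1 scalar instruction per clock

# Floating-point ops: every FMAD unit does a multiply and an add each clock.
flops = SHADER_UNITS * FMAD_UNITS_PER_SHADER * OPS_PER_FMAD * CLOCK_HZ

# Executed instructions: one FMAD instruction per FMAD unit per clock.
executed = SHADER_UNITS * FMAD_UNITS_PER_SHADER * 1 * CLOCK_HZ

# Issued instructions: one vector plus one scalar instruction per shader per clock.
issued = SHADER_UNITS * ISSUED_PER_SHADER * CLOCK_HZ

for label, value in [("floating-point ops", flops),
                     ("executed instructions", executed),
                     ("issued instructions", issued)]:
    print(f"{label}: {value / 1e9:.0f} billion per second")
```

The same hardware gives 192, 96 or 48 billion depending on what you decide counts as an "op", which is why a bare number without the counting method doesn't tell you much.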
 
Jawed said:
Xmas said:
Jawed said:
The upshot of all this should be that R500 runs every resource (ALU and TMU) at close to 100% utilisation, i.e. the peak-rate is real world.
But rarely for both of them at the same time.
Eh? The multi-threading patent is all about sending one command thread to the texture unit while, at the same time, sending another command thread to the ALU unit.

Not only that but a primary objective of the design is to perform unlimited dependent texturing.

Jawed
I guess my wording wasn't particularly good. What I meant is that you can't get full utilization of both unless the ratio of texops (times the average number of cycles a texture fetch takes) to ALU ops overall (or at least averaged over 64 threads) equals the ratio of TMUs to ALUs. You obviously can't issue texture ops when there are none.
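The balance condition can be sketched with a toy throughput model. Everything here is an assumption for illustration: the function name `utilization`, perfect overlap of ALU and texture work (i.e. threading hides all latency), and the 48 ALU / 16 TMU counts in the example (the TMU count is not confirmed in this thread).

```python
def utilization(alu_ops, tex_ops, cycles_per_fetch, n_alus, n_tmus):
    """Return (ALU utilization, TMU utilization) for a workload.

    Toy model: ALU work and texture work overlap perfectly, so the
    total time is set by whichever resource is busier.
    """
    alu_time = alu_ops / n_alus                     # cycles the ALUs are busy
    tmu_time = tex_ops * cycles_per_fetch / n_tmus  # cycles the TMUs are busy
    total = max(alu_time, tmu_time)
    if total == 0:
        return (0.0, 0.0)
    return (alu_time / total, tmu_time / total)


# Balanced: texops * cycles / ALU ops == TMUs / ALUs (100/300 == 16/48)
print(utilization(300, 100, 1, n_alus=48, n_tmus=16))  # (1.0, 1.0)

# ALU-heavy shader: TMUs sit idle half the time
print(utilization(600, 100, 1, n_alus=48, n_tmus=16))  # (1.0, 0.5)

# No texture ops at all: nothing to issue to the TMUs
print(utilization(300, 0, 1, n_alus=48, n_tmus=16))    # (1.0, 0.0)
```

In this model both numbers reach 1.0 only when the workload's ratio matches the hardware's ratio, which is the point being made above.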
 
Xmas said:
I guess my wording wasn't particularly good. What I meant is that you can't get full utilization of both unless the ratio of texops (times the average number of cycles a texture fetch takes) to ALU ops overall (or at least averaged over 64 threads) equals the ratio of TMUs to ALUs. You obviously can't issue texture ops when there are none.
Agreed.

I'm presuming that more and more algorithms will be making use of either precomputed lighting stored in 3D textures or lighting that is computed into a texture which is then re-used. I can't program shaders, so I'm speaking out of turn really.

But I do think it's important to note that high-performance dependent texturing is a key point of the patent.

Right now there's confusion over the texturing capabilities of R500 (4 TMUs or 16?).

Jawed
 