Xbox 360 1 Teraflop of Performance Explained

Alpha_Spartan · Jul 29, 2005

Jaws said:
Alpha_Spartan said:

Okay, so now let's talk about how Sony attained their 2TFLOP figure.

See my link above...

I saw it, I just want to see the math.

Alpha_Spartan · Jul 29, 2005

This thread is boring, guys, because you're either burnt out or lazy, so I thought I'd add some spice. Remember this beaut?

Now do some math. I'll work on it too, but it might take me a while.

Titanio · Jul 29, 2005

Alpha_Spartan said:
Jaws said:

Alpha_Spartan said:

Okay, so now let's talk about how Sony attained their 2TFLOP figure.

See my link above...

I saw it, I just want to see the math.

There's going to be a little variation on what people might explain to you (i.e. with regard to GPU's pixel shaders). But one possible conservative stab at it (same format as with the X360 slide):

CPU:
- 12 flops / clock cycle (PPE) + 8 flops / clock cycle (per SPE)
- 12 x 3.2 GHz x 1 PPE = 38.4 Gflops, plus 8 x 3.2 GHz x 7 SPEs = 179.2 Gflops, for a total of 217.6 Gflops
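The CPU arithmetic is just flops per clock x clock rate x number of units. A minimal sketch using the figures in this post (12 flops/clock for the PPE, 8 per SPE, 7 SPEs counted, 3.2 GHz); the function name `peak_gflops` is made up for illustration:

```python
def peak_gflops(flops_per_clock: float, clock_ghz: float, units: int) -> float:
    """Theoretical peak in Gflops: flops/clock x clock (GHz) x number of units."""
    return flops_per_clock * clock_ghz * units


ppe = peak_gflops(12, 3.2, 1)   # one PPE at 12 flops/clock (figure from the post)
spes = peak_gflops(8, 3.2, 7)   # seven SPEs at 8 flops/clock each
cell = ppe + spes

print(f"PPE {ppe:.1f} + SPEs {spes:.1f} = Cell {cell:.1f} Gflops")
# PPE 38.4 + SPEs 179.2 = Cell 217.6 Gflops
```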

GPU:
Vertex Shaders: 10 flops / clock cycle
- 10 x 550 MHz x 8 Vertex Shaders = 44 Gflops

Pixel Shaders: 16 flops / clock cycle (only counting main ALUs)
- 16 x 550 MHz x 24 Pixel Shaders = 211.2 Gflops

Total programmable Gflops (RSX): 255.2

Non-programmable Gflops = 1544.8

Total = 2017.6 Gflops, i.e. roughly 2 TFlops
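The RSX lines use the same formula, with the MHz result divided by 1000 to get Gflops. A sketch that rebuilds the whole breakdown from the per-unit rates in this post; the Cell total and the non-programmable figure are simply taken as given here, not derived:

```python
CLOCK_MHZ = 550  # RSX clock used in the post


def gflops(flops_per_clock: float, clock_mhz: float, units: int) -> float:
    """flops/clock x MHz x units gives Mflops; divide by 1000 for Gflops."""
    return flops_per_clock * clock_mhz * units / 1000


vertex = gflops(10, CLOCK_MHZ, 8)    # 8 vertex shaders at 10 flops/clock
pixel = gflops(16, CLOCK_MHZ, 24)    # 24 pixel shaders, main ALUs only
programmable = vertex + pixel

CELL_GFLOPS = 217.6                  # CPU total from the post
NON_PROGRAMMABLE_GFLOPS = 1544.8     # figure from the post, taken as given
total = CELL_GFLOPS + programmable + NON_PROGRAMMABLE_GFLOPS

print(f"vertex {vertex:.1f}, pixel {pixel:.1f}, programmable {programmable:.1f}")
print(f"total {total:.1f} Gflops")
# vertex 44.0, pixel 211.2, programmable 255.2
# total 2017.6 Gflops
```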

If you were to count the mini ALUs in the Pixel shaders, that would boost the Pixel Shader flop rating to 20 flops / clock cycle (264 Gflops total for pixel shaders, 308 programmable Gflops total for RSX).

If you were to count the mini ALUs and the 16-bit normalise, that would boost the Pixel Shader flop rating to 27 flops / clock cycle (356.4 Gflops total for pixel shaders, 400.4 programmable Gflops total for RSX).
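The two "if you count..." variants only change the pixel shader flops per clock, so they can be tabulated in one go. A sketch assuming the same 550 MHz clock, 24 pixel shaders and 44 Gflops of vertex work as the main breakdown; with 27 flops/clock it works out to 356.4 + 44 = 400.4 Gflops:

```python
CLOCK_MHZ = 550
PIXEL_SHADERS = 24
VERTEX_GFLOPS = 10 * CLOCK_MHZ * 8 / 1000  # 44.0, vertex shaders as in the main breakdown


def pixel_gflops(flops_per_clock: float) -> float:
    """Pixel shader peak in Gflops for a given flops/clock count per pipeline."""
    return flops_per_clock * CLOCK_MHZ * PIXEL_SHADERS / 1000


variants = [
    ("main ALUs only", 16),
    ("+ mini ALUs", 20),
    ("+ mini ALUs + 16-bit normalise", 27),
]
for label, fpc in variants:
    ps = pixel_gflops(fpc)
    print(f"{label:31} {fpc:2d} flops/clk  pixel {ps:6.1f}  "
          f"RSX programmable {ps + VERTEX_GFLOPS:6.1f}")
```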

This is based on PC Watch Impress coverage of G70 and RSX (http://pc.watch.impress.co.jp/docs/2005/0701/kaigai195.htm)

Alpha_Spartan · Jul 29, 2005

The constant in all of this seems to be the non-programmable FLOPS. How is that derived?

Secondly, I can see how C1's ProgFLOPS rating is clear cut because theoretically all 48 shaders can EITHER be processing all pixels or all vertices on a given clock.

However, with the RSX am I right to conclude that it can do vertex AND pixel shading on the same clock?

Titanio · Jul 29, 2005

Alpha_Spartan said:
The constant in all of this seems to be the non-programmable FLOPS. How is that derived?

The total figure claimed minus programmable flops
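So the "non-programmable" number is a residual. A sketch, assuming the claimed figure being subtracted from is Sony's quoted 1.8 TFLOPS for RSX (1800 minus the 255.2 programmable Gflops reproduces the 1544.8 in the breakdown); the helper name is hypothetical:

```python
def non_programmable(claimed_gflops: float, programmable_gflops: list[float]) -> float:
    """Residual: whatever the vendor claims, minus everything we can actually count."""
    return claimed_gflops - sum(programmable_gflops)


RSX_CLAIMED = 1800.0  # assumption: Sony's quoted 1.8 TFLOPS for RSX

main_alus = non_programmable(RSX_CLAIMED, [44.0, 211.2])  # vertex + pixel main ALUs
with_mini = non_programmable(RSX_CLAIMED, [44.0, 264.0])  # counting the mini ALUs too

print(f"{main_alus:.1f} {with_mini:.1f}")  # 1544.8 1492.0
```

Counting more of the shader hardware as programmable just shrinks the residual while the claimed total stays put, which is why the figure can't be checked independently.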

Arty · Jul 29, 2005

Titanio said:
Alpha_Spartan said:

The constant in all of this seems to be the non-programmable FLOPS. How is that derived?

The total figure claimed minus programmable flops

Aye, so we are trusting Microsoft & Sony more than our math.

mckmas8808 · Jul 29, 2005

So if the PS3 has a 2-to-1 programmable flops advantage, and people still say there will be no visual difference, then why do these companies release the flops numbers?

Do they really mean nothing, or are people not giving credit to something? I don't know; that's why I'm asking.

3roxor · Jul 29, 2005

The Xbox 360 and PS3 have at least 1860 Gflops.

Alpha_Spartan · Jul 29, 2005

Titanio said:
Alpha_Spartan said:

The constant in all of this seems to be the non-programmable FLOPS. How is that derived?

The total figure claimed minus programmable flops

LOL!!! NO!!!!!!!! *head explodes*

Alpha_Spartan · Jul 29, 2005

mckmas8808 said:
So if the PS3 has a 2-to-1 programmable flops advantage, and people still say there will be no visual difference, then why do these companies release the flops numbers?

Do they really mean nothing, or are people not giving credit to something? I don't know; that's why I'm asking.

It seems as if the RSX is able to do vertex and pixel operations on the same clock.

Titanio · Jul 29, 2005

serenity said:
Aye, so we are trusting Microsoft & Sony more than our math.

Exactly. It's pretty much impossible to derive the "non-programmable flops" figures.

mckmas8808 said:
So if the PS3 has a 2-to-1 programmable flops advantage, and people still say there will be no visual difference, then why do these companies release the flops numbers?

It's extraordinary how focussed this industry is on technical specs. When a new microwave gets announced, I doubt they go into the little nuts and bolts of its performance and how it works.

It's just the way it is, really. There's always a "numbers game". Flops happen to be a standard metric, for better or worse.

mckmas8808 said:
Do they really mean nothing, or are people not giving credit to something? I don't know; that's why I'm asking.

I guess wait and see.

Alpha_Spartan said:
It seems as if the RSX is able to do vertex and pixel operations on the same clock.

Every GPU with discrete vertex and pixel units can. Vertex and pixel shaders operate independently and in parallel.

Jawed · Jul 29, 2005

Alpha_Spartan said:
Secondly, I can see how C1's ProgFLOPS rating is clear cut because theoretically all 48 shaders can EITHER be processing all pixels or all vertices on a given clock.

No, there are 64 threads concurrently running on Xenos (48 shader, 16 texture) in any combination of vertex or fragment (pixel) shading - the threads are batched as blocks of 16, vertex-or-fragment:

4-0
3-1
2-2
1-3
0-4

Jawed

dukmahsik · Jul 29, 2005

Didn't Dave say the 64-thread figure was incorrect?

Titanio · Jul 29, 2005

Jawed said:
Alpha_Spartan said:

Secondly, I can see how C1's ProgFLOPS rating is clear cut because theoretically all 48 shaders can EITHER be processing all pixels or all vertices on a given clock.

No, there are 64 threads concurrently running on Xenos (48 shader, 16 texture) in any combination of vertex or fragment (pixel) shading - the threads are batched as blocks of 16, vertex-or-fragment:

4-0
3-1
2-2
1-3
0-4

Jawed

I'm VERY confused. I thought the ALUs were allocated to vertex or pixel work, ALU by ALU? If it's done in batches of 16, some may lie idle in a clock?

Carl B · Jul 29, 2005

It's definitely in blocks of 16; there have been quite a few debates on this before. Each 'block' of ALUs, of which there are three (obviously), has to have all of its ALUs doing the same type of work at any given time.
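To make the difference concrete: allocating ALU by ALU, 48 ALUs could be split between vertex and pixel work 49 ways; in blocks of 16 there are only four possible splits. A sketch assuming three blocks of 16 ALUs, each switching between vertex and pixel work as a whole:

```python
ALUS = 48
BLOCK = 16  # ALUs that must all do the same type of work


def splits(step: int) -> list[tuple[int, int]]:
    """(vertex ALUs, pixel ALUs) allocations possible at a given granularity."""
    return [(v, ALUS - v) for v in range(0, ALUS + 1, step)]


per_alu = splits(1)        # hypothetical ALU-by-ALU allocation
per_block = splits(BLOCK)  # block-of-16 allocation
print(len(per_alu), len(per_block))  # 49 4
print(per_block)  # [(0, 48), (16, 32), (32, 16), (48, 0)]
```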

Jawed · Jul 29, 2005

dukmahsik said:
Didn't Dave say the 64-thread figure was incorrect?

There's a lack of clarity here, if I remember correctly. There's the capability to perform 32 texture operations concurrently (16 filtered and 16 point-sampled), and I think it's that that lies at the heart of the doubt.

To be honest I don't know what Dave said, just a vague memory - the search facility is terrible on this forum. It's not clear, for example, if the point-sampling operations are actually operating in a pipeline, or if it simply consists of requesting texture data - the thread is then put aside until that texture data is ready in the cache. So the actual point-sampled texture fetch never executes in a "texture pipeline".

So, 64 concurrent threads is the minimum as far as I can tell.
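The count can be tallied both ways. A trivial sketch, assuming the higher figure comes from also counting the 16 point-sampled texture operations mentioned above as threads:

```python
SHADER_ALUS = 48
FILTERED_TEXTURE = 16
POINT_SAMPLED_TEXTURE = 16

# Only units that definitely execute work in a pipeline
minimum = SHADER_ALUS + FILTERED_TEXTURE
# Also counting point-sampled fetches, which may just be requests parked until the data arrives
upper = minimum + POINT_SAMPLED_TEXTURE
print(minimum, upper)  # 64 80
```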

Jawed

Megadrive1988 · Jul 29, 2005

So that explains the "64 threads at once" thing:

48 shader ALUs plus 16 texture units.

Jawed · Jul 29, 2005

Titanio said:
I'm VERY confused. I thought the ALUs were allocated to vertex or pixel work, ALU by ALU? If it's done in batches of 16, some may lie idle in a clock?

The GPU always creates 16-sized groups of vertices or pixels to work on. That's the smallest unit for the purposes of load-balancing. If a triangle is smaller than 16 pixels then Xenos will find another triangle to make up the numbers.
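A toy model of why packing matters, using hypothetical per-triangle pixel counts (kept as multiples of 4 so they fit 2x2 quads) and assuming a group is simply 16 pixel slots:

```python
import math

GROUP = 16  # pixels per group, as described in the post


def groups_one_triangle_each(pixel_counts: list[int]) -> int:
    """Every triangle starts a fresh group, so small triangles leave slots empty."""
    return sum(math.ceil(n / GROUP) for n in pixel_counts)


def groups_packed(pixel_counts: list[int]) -> int:
    """Pixels from several triangles are allowed to share a group."""
    return math.ceil(sum(pixel_counts) / GROUP)


small_triangles = [4, 8, 4, 12]  # hypothetical pixel counts for four small triangles
print(groups_one_triangle_each(small_triangles), groups_packed(small_triangles))  # 4 2
```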

When shading triangle edges, some pixels around the edge of the triangle are effectively "empty". GPUs work on pixels in 4s, organised as a 2x2 quad. So some ALUs will indeed be doing "nothing" (they'll be running code, but the results are junked). This problem is common to both earlier GPU designs (e.g. RSX) and Xenos.

Because all GPUs shade pixels in groups of 4, the edge of triangles problem is the same whether the ALUs are grouped in 16s or 4s.
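Quad waste can be measured directly: rasterise a triangle, round every covered pixel up to its 2x2 quad, and compare covered pixels with the lanes those quads occupy. A minimal sketch using a standard edge-function inside test at pixel centres; the triangle and grid size are arbitrary examples:

```python
def covered_pixels(tri, width, height):
    """Pixels whose centres lie inside (or on an edge of) the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri

    def edge(ax, ay, bx, by, px, py):
        # Twice the signed area of (a, b, p); the sign says which side of a->b p is on
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    area = edge(x0, y0, x1, y1, x2, y2)
    pixels = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            w = (edge(x1, y1, x2, y2, px, py),
                 edge(x2, y2, x0, y0, px, py),
                 edge(x0, y0, x1, y1, px, py))
            # Inside when every edge function has the same sign as the triangle's area
            if (area > 0 and min(w) >= 0) or (area < 0 and max(w) <= 0):
                pixels.add((x, y))
    return pixels


def quad_usage(pixels):
    """(covered pixels, lanes used by the 2x2 quads that touch them)."""
    quads = {(x // 2, y // 2) for x, y in pixels}
    return len(pixels), 4 * len(quads)


covered, lanes = quad_usage(covered_pixels(((0, 0), (8, 0), (0, 8)), 8, 8))
print(covered, lanes, f"{covered / lanes:.0%}")  # 36 40 90%
```

Only the quads straddling the diagonal edge carry dead lanes, and that count is the same whether those quads are then grouped into 4s or 16s.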

It's quite hard to work out the detailed pros and cons of this 16-per-group - is that also the batch size in Xenos (no idea!)? The batch size in RSX is 1024 pixels, organised into groups of 4.

I've been rummaging for meaningful data on this subject but haven't got anything concrete so far...

Jawed

SanGreal · Jul 29, 2005

Jawed said:
To be honest I don't know what Dave said, just a vague memory - the search facility is terrible on this forum

http://www.beyond3d.com/forum/viewtopic.php?p=548680#548680
http://www.beyond3d.com/forum/viewtopic.php?p=581716#581716

For the sake of reference

Jawed · Jul 29, 2005

Well done

Sadly we don't have a definitive answer to the 64/80 question, or whether it's something else entirely (which appears to be a function of texture processing).

Jawed
