Typical GPU Efficiency

KimB · Jun 12, 2005

WaltC said:
Good point. I think he's talking about PC scenarios like running Word or a browser versus running a demanding 3d game, etc. In a PC the chips are always running at 100% peak, but sometimes the programs require no more than 50-60% of a given chip's resources--hence he says "efficiency."

Well, not really. One can see where he's coming from when you just consider a game that has both large and small triangles to render in any given scene. The large triangles will be pixel-limited, and thus the vertex units will sit idle. The small triangles will be vertex-limited, and thus the pixel units will sit idle.

The real question here is whether or not the inefficiencies inherent in moving to a unified architecture will allow the nearly-100% utilization (i.e. no vertex units waiting for pixel units and vice versa) to bring such an architecture ahead of a current one.
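To make the trade-off concrete, here is a toy model only, not a description of any real chip. It assumes each batch of triangles needs some amount of vertex work and pixel work, that a split design is paced by whichever stage is slower, and that a unified design can hand any unit either kind of work. The unit counts, the workloads, and the `unified_overhead` factor are all made-up illustration values.

```python
def split_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Time for one batch on a split design: the slower stage sets the pace."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_time(vertex_work, pixel_work, total_units, unified_overhead=0.0):
    """Time for one batch on a unified design: all units share all the work.

    unified_overhead is a hypothetical fractional cost (scheduling and so on).
    """
    return (vertex_work + pixel_work) * (1.0 + unified_overhead) / total_units


def utilization(batches, vertex_units, pixel_units):
    """Fraction of unit-time spent on useful work in the split design."""
    total_units = vertex_units + pixel_units
    busy = sum(v + p for v, p in batches)
    elapsed = sum(split_time(v, p, vertex_units, pixel_units) for v, p in batches)
    return busy / (elapsed * total_units)


# Hypothetical scene: (vertex_work, pixel_work) per batch, arbitrary units.
batches = [
    (6, 480),   # a few large triangles: pixel-limited, vertex units mostly idle
    (240, 48),  # many small triangles: vertex-limited, pixel units mostly idle
]

split_total = sum(split_time(v, p, 6, 16) for v, p in batches)
unified_total = sum(unified_time(v, p, 22, unified_overhead=0.4) for v, p in batches)
print(f"split utilization: {utilization(batches, 6, 16):.2f}")
print(f"split time: {split_total:.1f}, unified time: {unified_total:.1f}")
```

Raising `unified_overhead` or making the batches more balanced shrinks the unified design's advantage in this model, which is exactly the open question in the post.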

Bob · Jun 12, 2005

A point that hasn't been addressed is how much additional hardware is needed to keep such a unified architecture utilized at 100% (or even 90%). If you need ~40% more transistors due to more FIFOs, register file ports, thread management, scoreboarding, etc., then a 60% efficient GPU with 40% more hardware would do just as well as a 100% efficient GPU. Not only that, but it'll have a higher peak performance, which opens the door for more optimizations.

Utilization isn't the only metric that's useful here.
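As a back-of-the-envelope check on those round numbers, the sketch below assumes sustained throughput scales as (relative execution hardware) × (utilization). That scaling rule is a simplification introduced here, not something stated in the post, and the function names are made up for illustration.

```python
def sustained_throughput(relative_hardware, utilization):
    """Sustained throughput under the assumed hardware x utilization model."""
    return relative_hardware * utilization


def break_even_utilization(relative_hardware):
    """Utilization at which the larger design matches a 100%-utilized baseline."""
    return 1.0 / relative_hardware


unified = sustained_throughput(1.0, 1.0)   # baseline: fully utilized
split = sustained_throughput(1.4, 0.6)     # 40% more hardware, 60% utilized

print(f"unified sustained: {unified:.2f}, peak: 1.00")
print(f"split sustained:   {split:.2f}, peak: 1.40")
print(f"break-even utilization for 1.4x hardware: {break_even_utilization(1.4):.2f}")
```

Under this model the 60% case reaches about 0.84 of the baseline, and break-even sits near 71% utilization, so the figures in the post are best read as rough. The higher peak (1.4 versus 1.0) holds regardless.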

KimB · Jun 12, 2005

Well, that too. Of course, the Xbox 360 part does have a transistor count advantage, with some logic offloaded to the second chip.

Titanio · Jun 12, 2005

Chalnoth said:
Well, that too. Of course, the Xbox 360 part does have a transistor count advantage, with some logic offloaded to the second chip.

Non-eDram silicon on Xenos + Smart Memory is probably in the low 200m range. ~230m?

Compared to >300m for upcoming PC parts, and indeed its console competitor (RSX).

In moving to this architecture, though, they may have saved transistors elsewhere?

Xmas · Jun 12, 2005

Xenos is 232M transistors + about 100M for the smart memory IIRC. However, it seems to have less shader ALUs and TMUs than RSX.

Titanio · Jun 12, 2005

Xmas said:
Xenos is 232M transistors + about 100M for the smart memory IIRC. However, it seems to have less shader ALUs and TMUs than RSX.

I guess non-eDram transistors come in at about 250m then (?)

Xmas · Jun 12, 2005

Seems about right. Maybe you can count the external scaler/output chip, too.

ERP · Jun 12, 2005

Xmas said:
Xenos is 232M transistors + about 100M for the smart memory IIRC. However, it seems to have less shader ALUs and TMUs than RSX.

Based on?

swaaye · Jun 12, 2005

Well is there a game that saturates, say, the 4 vertex shaders in R9700?

ERP · Jun 12, 2005

swaaye said:
Well is there a game that saturates, say, the 4 vertex shaders in R9700?

I'll bet there are lots that do it for some of the frame, and in the same frame they'll be idle for extended periods while large polygons or complex ones are being filled.

Xmas · Jun 12, 2005

ERP said:
Xmas said:

Xenos is 232M transistors + about 100M for the smart memory IIRC. However, it seems to have less shader ALUs and TMUs than RSX.

Based on?

Which one, the transistor numbers or the number of units?
Transistor info from Dave Baumann.

Number of ALUs and TMUs is based on the rumored G70 specs, and G70 is in turn rumored to be somewhat similar to, if not very much like, RSX:

24 fragment pipes with 2 ALUs and 1 TMU each, plus 8 vertex pipelines
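Spelled out, the rumored configuration above gives the following unit totals. This is plain arithmetic on the rumored figures, with no claim about how many ALUs a vertex pipeline contains; the variable names are just labels for this sketch.

```python
# Rumored G70 configuration as quoted in the post (unconfirmed figures).
fragment_pipes = 24
alus_per_fragment_pipe = 2
tmus_per_fragment_pipe = 1
vertex_pipes = 8

fragment_alus = fragment_pipes * alus_per_fragment_pipe
tmus = fragment_pipes * tmus_per_fragment_pipe

print(f"fragment ALUs: {fragment_alus}")
print(f"TMUs: {tmus}")
print(f"vertex pipelines: {vertex_pipes}")
```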

Dave Baumann · Jun 12, 2005

People keep forgetting about the texture address processors.

Xmas · Jun 12, 2005

Well, no. It's separate in RSX as well.

ERP · Jun 12, 2005

Xmas said:
ERP said:

Xmas said:

Xenos is 232M transistors + about 100M for the smart memory IIRC. However, it seems to have less shader ALUs and TMUs than RSX.

Based on?

Which one, the transistor numbers or the number of units?
Transistor info from Dave Baumann.

Number of ALUs and TMUs is based on the rumored G70 specs, and G70 is in turn rumored to be somewhat similar to, if not very much like, RSX:
24 fragment pipes with 2 ALUs and 1 TMU each, plus 8 vertex pipelines

I guess it depends on the similarity of the ALUs.
If they are completely orthogonal in G70 then it should be pretty close.

Dave Baumann · Jun 12, 2005

Xmas said:
Well, no. It's separate in RSX as well.

Jawed · Jun 12, 2005

DaveBaumann said:
People keep forgetting about the texture address processors.

Hey, I don't.

Jawed

Xmas · Jun 12, 2005

DaveBaumann said:
Xmas said:

Well, no. It's separate in RSX as well.

Well, I took it you were hinting at the fact that ATI mentions a texture address processor in their pipeline diagrams and NVidia does not (though I'm still not entirely sure what exactly that unit does, and I guess NVidia considers this part of the TMU). But in the PS3 presentation, they did (as part of the TMU).

Jawed · Jun 12, 2005

Page 10:

http://www.ati.com/products/radeonx800/RADEONX800ArchitectureWhitePaper.pdf

shows that there is a dedicated texture address calculation ALU in R420, so it's reasonable to expect it appears in other more recent ATI architectures.

NV40 uses the first ALU (Vec3 + Scalar):

http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=10

when calculating texture addresses, I assume. I can imagine that NVidia has split this functionality out into a separate ALU in G70.

Jawed

Xmas · Jun 12, 2005

I'm not that convinced NVidia uses the first SU for "texture address calculations". Probably for projective textures.

nAo · Jun 12, 2005

The first SU/ALU is used to derive (s,t,z,w) from their hyperbolic (linearly interpolated) counterparts, AFAIK.
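For anyone unfamiliar with the term, the sketch below shows the textbook perspective-correct interpolation identity along a single edge: attr/w and 1/w are linear in screen space, so they are interpolated linearly and the true attribute is recovered with one reciprocal and one multiply. It illustrates the math only and is not a description of NV40's actual datapath; the function names are made up here.

```python
def lerp(a, b, t):
    """Plain linear interpolation between a and b."""
    return a + (b - a) * t


def perspective_correct(attr0, w0, attr1, w1, t):
    """Interpolate an attribute between two vertices at screen-space fraction t.

    attr0/attr1 are the per-vertex attribute values (for example s or t),
    w0/w1 are the per-vertex clip-space w values.
    """
    # These two quantities are linear in screen space, so lerp is valid.
    attr_over_w = lerp(attr0 / w0, attr1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    # Per-pixel step: recover w, then undo the division.
    w = 1.0 / one_over_w
    return attr_over_w * w


# Halfway across the screen between a near vertex (w=1) and a far one (w=3).
naive = lerp(0.0, 1.0, 0.5)
correct = perspective_correct(0.0, 1.0, 1.0, 3.0, 0.5)
print(f"naive: {naive}, perspective-correct: {correct}")
```

When both vertices share the same w, the result collapses to plain linear interpolation, which is a quick sanity check on the formula.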
