Typical GPU Efficiency

WaltC said:
Good point. I think he's talking about PC scenarios like running Word or a browser versus running a demanding 3d game, etc. In a PC the chips are always running at 100% peak, but sometimes the programs require no more than 50-60% of a given chip's resources--hence he says "efficiency."
Well, not really. One can see where he's coming from when you just consider a game that has both large and small triangles to render in any given scene. The large triangles will be pixel-limited, and thus the vertex units will sit idle. The small triangles will be vertex-limited, and thus the pixel units will sit idle.

The real question here is whether or not the inefficiencies inherent in moving to a unified architecture will allow the nearly-100% utilization (i.e. no vertex units waiting for pixel units and vice versa) to bring such an architecture ahead of a current one.
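
To put rough numbers on the large-versus-small triangle argument, here is a minimal toy model in Python. Everything in it is an assumption made for illustration: the workload figures are invented, the unit counts are hypothetical, batches are treated as running one after another with no overlap, and `overhead` is simply a made-up fraction of throughput that a unified design might lose to scheduling.

```python
def frame_time_split(batches, vertex_units, pixel_units):
    """Dedicated units: each batch takes as long as its slower stage,
    while the units of the other stage sit idle."""
    return sum(max(v / vertex_units, p / pixel_units) for v, p in batches)


def frame_time_unified(batches, total_units, overhead=0.0):
    """Unified pool: every unit works on whatever is pending.
    `overhead` is a hypothetical fraction of throughput lost to scheduling."""
    effective_units = total_units * (1.0 - overhead)
    return sum((v + p) / effective_units for v, p in batches)


# (vertex work, pixel work) per batch in arbitrary unit-cycles; invented numbers.
# First batch: a few large triangles (pixel-limited).
# Second batch: many small triangles (vertex-limited).
batches = [(10, 400), (200, 40)]

split = frame_time_split(batches, vertex_units=8, pixel_units=24)
unified = frame_time_unified(batches, total_units=32)
unified_costly = frame_time_unified(batches, total_units=32, overhead=0.6)

print(f"split: {split:.2f}  unified: {unified:.2f}  unified, 60% overhead: {unified_costly:.2f}")
```

In this toy case the unified pool wins comfortably when scheduling is free and loses once the assumed overhead gets large enough, which is the trade-off the question above is asking about. Real hardware pipelines work across batches, so the absolute numbers mean nothing.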
 
A point that hasn't been addressed is how much additional hardware is needed to keep such a unified architecture utilized at 100% (or even 90%). If you need ~40% more transistors due to more FIFOs, register file ports, thread management, scoreboarding, etc., then a 60% efficient GPU with 40% more hardware would do nearly as well as a 100% efficient GPU. Not only that, but it'll have a higher peak performance, which opens the door for more optimizations.

Utilization isn't the only metric that's useful here.
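
The utilization-versus-hardware trade-off in this post can be written out as simple arithmetic. This is only a sketch under a crude assumption that is not in the post: delivered throughput is taken to be utilization multiplied by the relative amount of execution hardware, with nothing else changing. The function names are made up for the example.

```python
def effective_throughput(utilization, relative_hardware):
    """Delivered work per clock under the crude model: utilization x hardware."""
    return utilization * relative_hardware


def break_even_hardware(utilization, target_utilization=1.0):
    """Hardware multiple a design at `utilization` needs to match a design
    with 1.0x hardware running at `target_utilization`."""
    return target_utilization / utilization


dedicated = effective_throughput(0.60, 1.40)  # 60% utilized, 40% more hardware
unified = effective_throughput(1.00, 1.00)    # fully utilized baseline
needed = break_even_hardware(0.60)            # multiple needed to draw level

print(f"dedicated: {dedicated:.2f}  unified: {unified:.2f}  break-even multiple: {needed:.2f}")
```

Under this model the design with more hardware also has the higher peak (1.4 against 1.0), which is the second half of the point: sustained throughput and peak throughput are separate numbers, and utilization alone captures neither.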
 
Well, that too. Of course, the Xbox 360 part does have a transistor count advantage, with some logic offloaded to the second chip.
 
Chalnoth said:
Well, that too. Of course, the Xbox 360 part does have a transistor count advantage, with some logic offloaded to the second chip.

Non-eDram silicon on Xenos + Smart Memory is probably in the low 200m range. ~230m?

Compared to >300m for upcoming PC parts, and indeed its console competitor (RSX).

In moving to this architecture, though, they may have saved transistors elsewhere?
 
Xenos is 232M transistors + about 100M for the smart memory IIRC. However, it seems to have less shader ALUs and TMUs than RSX.
 
Xmas said:
Xenos is 232M transistors + about 100M for the smart memory IIRC. However, it seems to have less shader ALUs and TMUs than RSX.

I guess non-eDram transistors come in at about 250m then (?)
 
swaaye said:
Well is there a game that saturates, say, the 4 vertex shaders in R9700?

I'll bet there are lots that do it for some of the frame, and in the same frame they'll be idle for extended periods while large polygons or complex ones are being filled.
 
ERP said:
Xmas said:
Xenos is 232M transistors + about 100M for the smart memory IIRC. However, it seems to have less shader ALUs and TMUs than RSX.

Based on?
Which one, the transistor numbers or the number of units?
Transistor info from Dave Baumann.

Number of ALUs and TMUs is based on the rumored G70 specs which is rumored to be somewhat similar to, if not very much like, RSX ;)
24 fragment pipes with 2 ALUs and 1 TMU each, plus 8 vertex pipelines
 
Xmas said:
ERP said:
Xmas said:
Xenos is 232M transistors + about 100M for the smart memory IIRC. However, it seems to have less shader ALUs and TMUs than RSX.

Based on?
Which one, the transistor numbers or the number of units?
Transistor info from Dave Baumann.

Number of ALUs and TMUs is based on the rumored G70 specs which is rumored to be somewhat similar to, if not very much like, RSX ;)
24 fragment pipes with 2 ALUs and 1 TMU each, plus 8 vertex pipelines

I guess it depends on the similarity of the ALUs.
If they are completely orthogonal in G70 then it should be pretty close.
 
DaveBaumann said:
Xmas said:
Well, no. It's separate in RSX as well.
:?:
Well, I took it you were hinting at the fact that ATI mentions a texture address processor in their pipeline diagrams and NVidia does not (though I'm still not entirely sure what exactly that unit does, and I guess NVidia considers this part of the TMU). But in the PS3 presentation, they did (as part of the TMU).
 
I'm not that convinced NVidia uses the first SU for "texture address calculations". Probably for projective textures.
 
The first SU/ALU is used to derive (s,t,z,w) from their hyperbolic (linear interpolated) counterparts, AFAIK.
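
For anyone unfamiliar with the term, here is a minimal Python sketch of hyperbolic (perspective-correct) interpolation for one texture coordinate along a single edge. It shows the general technique only. Whether the first shader unit in NVidia's pipeline does exactly this is the claim made above, not something the code demonstrates, and the function names are invented for the example.

```python
def lerp(a, b, t):
    """Plain linear interpolation between a and b."""
    return a + (b - a) * t


def perspective_correct(attr0, w0, attr1, w1, t):
    """Recover an attribute at screen-space parameter t between two vertices.

    attr/w and 1/w vary linearly in screen space, so interpolate those
    and divide once per pixel to get the attribute back.
    """
    inv_w = lerp(1.0 / w0, 1.0 / w1, t)
    attr_over_w = lerp(attr0 / w0, attr1 / w1, t)
    return attr_over_w / inv_w


# Edge from s=0 at w=1 (near) to s=1 at w=3 (far), sampled at the screen midpoint.
naive = lerp(0.0, 1.0, 0.5)
correct = perspective_correct(0.0, 1.0, 1.0, 3.0, 0.5)

print(f"naive: {naive}  perspective-correct: {correct}")
```

The per-pixel divide is the step that needs an ALU (or dedicated logic) before the texture lookup can happen, which is why it matters whether that work is counted as part of the shader ALUs or as part of the TMU.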
 