Got a question for graphics folks

DavidC

Regular
I have a question for people who know a fair bit about graphics. Modern graphics processors have unified shader units. The Geforce 8 series, the Radeon HD 2x00 series, and Intel's G965 GMA X3000 all have unified shader units.

Now I heard the execution units are scalar, meaning each execution unit can only process 1 instruction per cycle, rather than being vector units as in previous GPUs.

How true is this statement? http://techreport.com/articles.x/12195

"A typical pixel has four components (red, green, blue, and alpha), so the GMA X3000 can really only process two complete pixels per clock cycle."

The 2 pixels/clock figure is based on the fact that the GMA X3000 has 8 unified shader execution units.

(Please ignore the 16 ALU report on the X3000 and just answer this first. I'll get to that afterwards.)
 
That's a very simplistic way of looking at it, and because they don't define what they mean by "processing a pixel" it's sort of meaningless.

The only way that works out is if you assume that each pixel requires exactly one vec4 instruction. Even if you're only looking at pixel shading (so not including blending, depth-stencil test, etc.), pixel shaders are rarely that short. And just because a pixel contains four channels doesn't mean that all instructions in the pixel shader are vec4 -- any shader doing per-pixel lighting (for example) is likely to have a variety of scalar, vec2, and vec3 instructions as well.

But since any pixel shader is going to have at least one vec4 instruction (equivalently four scalar instructions), it is accurate to say that the theoretical peak shading throughput is 2 pixels/clock.
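To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes 8 scalar execution units that each retire one scalar operation per clock with perfect utilization (no latency, scheduling, or texture stalls). The function name and the instruction mix in the second example are made up for illustration, not taken from any real shader.

```python
from fractions import Fraction

# Figure quoted in the thread: 8 unified scalar execution units on the GMA X3000.
SCALAR_UNITS = 8


def pixels_per_clock(instruction_widths, scalar_units=SCALAR_UNITS):
    """Idealized peak pixels/clock for a pixel shader.

    instruction_widths lists the component count (1 to 4) of each
    instruction in the shader. Assumes every scalar unit retires one
    scalar operation per clock and is never idle.
    """
    if not instruction_widths:
        raise ValueError("shader needs at least one instruction")
    if any(w < 1 or w > 4 for w in instruction_widths):
        raise ValueError("instruction widths must be between 1 and 4")
    scalar_ops = sum(instruction_widths)  # a vec4 costs 4 scalar ops, etc.
    return Fraction(scalar_units, scalar_ops)


# Best case: the whole shader is a single vec4 instruction.
one_vec4 = pixels_per_clock([4])  # 8 / 4 = 2 pixels per clock

# Hypothetical lighting-style mix of vec4, vec3, vec2, and scalar instructions.
mixed = pixels_per_clock([4, 3, 3, 2, 1, 4])  # 8 / 17 pixels per clock
```

So 2 pixels/clock is an upper bound that only the shortest possible shader reaches; under the same assumptions, any longer shader scales the figure down by its total scalar operation count.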
 
AFAICT, and this is partially speculation: 8-wide ALUs, 16-wide batch size, one quad = one 'thread', 32 quads in flight in the shader core. The vertex shader works on Vec4s, which should make it easy for the 'threads' to be of the same 'size'.
 