How much time do games spend in the rasterizer?

killeroo

Newcomer
Hi all,

I wish to design an experiment to measure the proportion of time spent rasterizing triangles in a modern game. I believe it will be quite a small number, due to (a) the highly evolved, multiple parallel rasterizers in GPUs, and (b) aggressively optimized draw calls from game engines, but I wonder whether multiple deferred passes and tessellated primitives have a negative impact. I'm also curious what Beyond3D members think about this.

So, assuming access to the source of a Direct3D 11 engine, how would you isolate and measure the time spent in the rasterizer?

Are there tools (e.g. PIX) that can make this measurement easier?

Is there a way to double the rasterizer load without affecting other parts of the pipeline?
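
(For what it's worth, the only concrete measurement I know how to make so far is bracketing a pass with timestamp queries, something like the rough, untested sketch below. It only gives wall-clock GPU time for the whole pass, not rasterizer time in isolation, since the stages run overlapped.)

```cpp
#include <d3d11.h>

// Rough sketch: time one pass with D3D11 timestamp queries.
// Note this measures wall-clock GPU time for the bracketed work, not
// rasterizer time specifically -- the pipeline stages overlap.
double TimePassMs(ID3D11Device* dev, ID3D11DeviceContext* ctx)
{
    D3D11_QUERY_DESC qd = {};
    ID3D11Query *disjoint, *tsBegin, *tsEnd;

    qd.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    dev->CreateQuery(&qd, &disjoint);
    qd.Query = D3D11_QUERY_TIMESTAMP;
    dev->CreateQuery(&qd, &tsBegin);
    dev->CreateQuery(&qd, &tsEnd);

    ctx->Begin(disjoint);
    ctx->End(tsBegin);
    // ... issue the draw calls for the pass under test ...
    ctx->End(tsEnd);
    ctx->End(disjoint);

    // Spin until the results land (fine for an experiment, not for shipping).
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj;
    while (ctx->GetData(disjoint, &dj, sizeof(dj), 0) != S_OK) {}
    UINT64 t0 = 0, t1 = 0;
    ctx->GetData(tsBegin, &t0, sizeof(t0), 0);
    ctx->GetData(tsEnd, &t1, sizeof(t1), 0);

    disjoint->Release(); tsBegin->Release(); tsEnd->Release();
    return dj.Disjoint ? 0.0 : 1000.0 * double(t1 - t0) / double(dj.Frequency);
}
```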

Thanks in advance.
 
Wouldn't it be practically impossible to measure, since rasterizing must be tied to shading time (it can only buffer so much internally, I presume) and it's pipelined, so it occurs in parallel? E.g. it's like asking what % of its time a CPU spends decoding. OK, I suppose you can ask whether the rasterizer units are idle waiting for shading, and hence come up with a figure for rasterizer utilization, and it's certainly interesting to know whether a game is rasterizer or shading bound and to what degree. Also, GPU manufacturers will have balanced the amount of die dedicated to rasterizing (same as for any other component) such that the minimum amount of chip space is spent on keeping it from being a bottleneck.

Maybe another related value is what proportion of the energy burned by the GPU goes into rasterizing versus shading (i.e. what % of the effort/hardware is rasterizing). In the past a software engine might spend varying amounts of time (= energy) doing transform / clip / rasterizing, hence that way of thinking about profiling.

And sure, if the energy spent rasterizing is very low compared to everything else, it may well be such a low % of the die that they can over-provision it to ensure it is never a bottleneck?

I suspect you're right that tessellation shifts things back towards being rasterizer/tri-setup bound. Would a tessellating engine spend most of its energy on geometry shading (geometry instead of pixels)? There's the REYES idea, and Sony's attempts (unused PS3 hardware experiments?) at doing a purely geometry-based machine which tessellates everything down to shaded sub-pixel polys.
 
A simpler reply: you could certainly remove shading time with a placeholder flat-shaded fragment shader and profile that; then you could see whether the graphics engine is bound by triangle rasterizing or by shading. I expect what you say is correct: most games will be shader bound, but a tessellation-heavy game could be back to rasterizing bound at a high enough level... I've never tried it. Visual quality is mostly dependent on textures and lighting, though, so both content and hardware design will be oriented towards devoting the expended energy in that direction. (Textures are of course simply a well-established and effective way of approximating geometry and lighting.)
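
In case it helps, a minimal sketch of what the placeholder might look like in Direct3D 11 (untested; `g_device` and `g_context` are hypothetical stand-ins for the engine's existing device and immediate context):

```cpp
#include <d3d11.h>
#include <d3dcompiler.h>
#include <cstring>

// Trivial flat-shaded pixel shader: one constant colour, no textures,
// no lighting -- per-pixel cost is about as low as D3D11 allows.
static const char* kFlatPS =
    "float4 main() : SV_Target { return float4(1.0f, 0.0f, 1.0f, 1.0f); }";

extern ID3D11Device*        g_device;   // hypothetical: engine's device
extern ID3D11DeviceContext* g_context;  // hypothetical: immediate context

ID3D11PixelShader* CreateFlatPixelShader()
{
    ID3DBlob* code = nullptr;
    ID3DBlob* errors = nullptr;
    if (FAILED(D3DCompile(kFlatPS, strlen(kFlatPS), nullptr, nullptr, nullptr,
                          "main", "ps_5_0", 0, 0, &code, &errors)))
    {
        if (errors) errors->Release();
        return nullptr;
    }
    ID3D11PixelShader* ps = nullptr;
    g_device->CreatePixelShader(code->GetBufferPointer(),
                                code->GetBufferSize(), nullptr, &ps);
    code->Release();
    return ps;
}

// Bind this instead of each material's real pixel shader, then compare
// frame times against the normal path.
void BindFlatShader(ID3D11PixelShader* flatPS)
{
    g_context->PSSetShader(flatPS, nullptr, 0);
}
```

If the frame time barely moves with the flat shader bound, the scene wasn't pixel-shader bound to begin with.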
 
If by rasterization you mean triangle setup*, interpolation*, culling* and fill*, you get it all pretty much for free as long as you use complex enough vertex and pixel shaders. This is because the fixed function rasterization hardware runs in parallel with the programmable shaders.

Basically:
- Fill ("pixel setup") is free, if you do enough alu or tex instructions in your pixel shader (something like 3 tex or 10 alu should be enough with current HW). You will be tfetch/tcache/BW/alu bound instead.
- Triangle setup is free of you use complex enough vertex formats (vertex fetch) or execute enough alu/tex instruction in vertex/geometry/hull/domain shader. You will be vfetch/tfetch/tcache/BW/alu bound instead.
- Interpolation is free as long as the pixel shader is complex enough (same stuff pretty much with fill).
- Culling (viewport, backface, hi-z, etc.) is usually free, but can become a bottleneck if the majority of the geometry gets culled out (and the programmable pipeline mostly idles). A good engine, however, should never send geometry to the GPU if it's not at least partially visible.

Basically, the fixed function units are designed to be wide enough not to be bottlenecks in the most common use cases. However, one common case where you are rasterization bound is shadowmap rendering. When rendering to a depth buffer only (no pixel shader) with a simple vertex shader, the GPU will be bottlenecked solely by rasterization (triangle setup or fill, depending on the geometry complexity).
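
For illustration, a depth-only shadowmap pass looks roughly like this in D3D11 (simplified sketch; the resource names are made up):

```cpp
// Depth-only pass: no colour target, no pixel shader. The programmable
// pipeline has almost nothing to do, so the GPU is left bound by the
// fixed function work (triangle setup or fill), as described above.
void RenderShadowMap(ID3D11DeviceContext* ctx,
                     ID3D11DepthStencilView* shadowDSV,  // assumed shadowmap DSV
                     ID3D11VertexShader* simpleVS)       // position-only VS
{
    ctx->ClearDepthStencilView(shadowDSV, D3D11_CLEAR_DEPTH, 1.0f, 0);

    // Bind the depth buffer only -- zero render targets.
    ctx->OMSetRenderTargets(0, nullptr, shadowDSV);

    ctx->VSSetShader(simpleVS, nullptr, 0);
    ctx->PSSetShader(nullptr, nullptr, 0);  // null PS: depth comes straight
                                            // from the rasterizer
    // ... draw the shadow casters here ...
}
```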

You pretty much never want to be rasterization bound, as you are wasting shader cycles that way (the shader hardware is idling). Adding instructions to your shaders is free if you are bottlenecked by rasterization. Usually, with some tinkering, you can restructure your rendering so that you do some useful work in the shader cycles that would otherwise be wasted.

(*) Generalizations. For example "fill" would include all fixed function steps that need to be done to assign threads to pixel quads, and to replicate/cull the pixel shader output to subsamples/stencil/depth/mrts, etc.
 