PowerVR TBDR and Coherency Sorting
PowerVR pioneered tile-based deferred rendering (TBDR) as far back as 1996. The focus of TBDR
is efficiency, in both processing and bandwidth. Tile-based rendering achieves this by sorting all
the triangle geometry into screen-space tile regions before rendering. This differs from
immediate mode rendering (IMR), where every triangle is transformed and immediately drawn. The
benefit of sorting all geometry and then rendering per screen-space tile region (usually 16x16 or
32x32 pixels in size) is that we can complete the rendering of each tile region using only on-chip
memory for the depth/stencil buffer as well as the colour buffer. IMRs push all this bandwidth
off-chip and depend on cache hits to reduce it, but because geometry submissions are not spatially
coherent in screen space, this caching approach typically fails, leading to high bandwidth, latency
sensitivity and poor power efficiency.
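The sorting step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the PowerVR hardware algorithm: it assumes triangles are already in screen space, bins them conservatively by bounding box, and uses a hypothetical function name (bin_triangles) and a 32-pixel tile edge.

```python
import math
from collections import defaultdict


def bin_triangles(triangles, width, height, tile=32):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it.

    Assumption: bounding-box binning is conservative, so a triangle may be
    listed in a tile that its actual area does not cover.
    """
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles that lie entirely outside the screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the covered tile range to the screen.
        tx0 = max(0, math.floor(min(xs)) // tile)
        tx1 = min(max_tx, math.floor(max(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        ty1 = min(max_ty, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)
```

Once every triangle has been binned, each tile can be rendered on its own from its list, which is what allows the depth/stencil and colour data for that tile to stay in on-chip memory.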
By sorting geometry first, the cache hit rate effectively becomes 100%, because the working
buffers for a tile are held entirely in on-chip memory. Additionally, depth and stencil buffers
are often only used once and hence can be discarded. With GBuffer and MRT rendering, many of the
MRT “colour” targets are only used for intermediate scratchpad data, and only one colour buffer
needs to be written out to memory. With TBDR, all of this can be done on chip, saving memory
footprint and very significant amounts of bandwidth.
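To give a feel for the scale of the saving, the following back-of-the-envelope sketch counts the bytes written to external memory for one frame. All numbers are assumptions chosen for illustration (a 1920x1080 target, four 32-bit colour targets and a 32-bit depth/stencil buffer, with hypothetical attachment names), and it counts a single write per attachment only, ignoring reads and overdraw.

```python
def external_write_bytes(width, height, bytes_per_pixel, stored):
    """Bytes written to external memory when only the attachments in `stored` are kept."""
    pixels = width * height
    return sum(pixels * bytes_per_pixel[name] for name in stored)


# Hypothetical deferred-shading setup; names and formats are illustrative only.
attachments = {
    "albedo": 4,
    "normal": 4,
    "material": 4,
    "final": 4,
    "depth_stencil": 4,
}

# Every attachment is written out to memory once.
everything_stored = external_write_bytes(1920, 1080, attachments, attachments.keys())

# Scratchpad targets and depth/stencil stay on chip; only the final colour is stored.
final_only = external_write_bytes(1920, 1080, attachments, ["final"])

print(everything_stored, final_only)
```

Under these assumptions, keeping the intermediate targets on chip cuts the written data by a factor of five; the real ratio depends on the formats and on how many targets an application actually uses.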
TBDR also offers significant benefits in handling anti-aliasing. Because the oversampled buffers
only ever exist in on-chip memory, only the downsampled colour targets are written out, again
saving memory footprint and bandwidth.
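The downsampling step can be illustrated with a small sketch. It assumes a simple box-filter resolve over a single-channel tile with four samples per pixel; the function names are hypothetical, and real hardware may use other filters and formats.

```python
def resolve_tile(samples, samples_per_pixel=4):
    """Average each pixel's group of samples into one value (box-filter resolve)."""
    if len(samples) % samples_per_pixel != 0:
        raise ValueError("sample count must be a multiple of samples_per_pixel")
    return [
        sum(samples[i:i + samples_per_pixel]) / samples_per_pixel
        for i in range(0, len(samples), samples_per_pixel)
    ]


def footprint_bytes(width, height, bytes_per_pixel=4, samples_per_pixel=1):
    """Size of a buffer holding every sample of every pixel."""
    return width * height * bytes_per_pixel * samples_per_pixel


# Two pixels with four samples each, resolved before anything leaves the tile.
resolved = resolve_tile([0.0, 0.0, 1.0, 1.0, 0.25, 0.25, 0.25, 0.25])
```

With these assumed formats, a 1920x1080 buffer at four samples per pixel would need four times the memory of the resolved target, and that oversampled copy is the one that never has to exist off chip.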
The PowerVR Photon ray tracing architecture is in many ways analogous to the PowerVR TBDR
architecture in that a spatial sort is also done; rather than sorting in 2D screen space, we bin
rays into packets which travel along similar paths through the BVH. The benefits of this
coherency sorting are similar to those of tile sorting: significant cache efficiency and reduced
bandwidth, while processing remains SIMD/SIMT in nature, ensuring high power efficiency of the
logic and overall processing.
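The idea of binning rays into coherent packets can be sketched as follows. The text does not describe how the Photon hardware decides that two rays are similar, so this sketch swaps in a simple stand-in: rays are grouped by a coarse grid cell of their origin plus the sign octant of their direction. The names coherence_key and bin_rays, the cell size and the packet size are all assumptions made for illustration.

```python
from collections import defaultdict


def coherence_key(origin, direction, cell=4.0):
    """Stand-in similarity key: coarse origin cell plus direction sign octant."""
    cell_id = tuple(int(c // cell) for c in origin)
    octant = tuple(d >= 0.0 for d in direction)
    return cell_id, octant


def bin_rays(rays, packet_size=4, cell=4.0):
    """Group ray indices that share a key, then split each group into packets."""
    bins = defaultdict(list)
    for index, (origin, direction) in enumerate(rays):
        bins[coherence_key(origin, direction, cell)].append(index)
    packets = []
    for key in sorted(bins):
        members = bins[key]
        for start in range(0, len(members), packet_size):
            packets.append(members[start:start + packet_size])
    return packets


rays = [
    ((0, 0, 0), (1, 1, 1)),
    ((10, 0, 0), (-1, 0, 0)),
    ((1, 1, 1), (0.9, 1, 1.1)),
    ((0.5, 0, 0), (-1, -1, -1)),
]
packets = bin_rays(rays)
```

Here rays 0 and 2 start close together and point the same way, so they end up in the same packet, while the other two rays each form a packet of their own. Rays in one packet are likely to visit the same BVH nodes, which is what keeps cache use and SIMD/SIMT execution efficient.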