Even though graphics are displayed two-dimensionally as an image, the conventional approach to drawing them is three-dimensional, as a scene. Most graphics processors render the stream of polygonal scene data as it arrives, before fully determining which parts will belong to the image, even though that data is unordered. Because of the nature of 3D, polygons can be positioned behind and obscured by other polygons, and rendering them from an unordered stream produces pixels that get drawn over pixels already drawn, wasting the work that went into producing them. This overdraw reduces a conventional system's efficiency, since it transfers data for pixels that are never used, and its effectiveness, since only a fraction of the pixels it produces actually contribute to the image, cutting it to a fraction of its operational speed.
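The wasted work can be made concrete with a small simulation. This is a hypothetical sketch (all names and numbers invented for illustration): an immediate-mode renderer shades pixels in stream order with a depth test, and a back-to-front stream makes every layer pass the test, so earlier shading work is simply overwritten.

```python
# Hypothetical sketch: count overdraw when pixels are shaded in stream order.
# Primitives are opaque axis-aligned rects (x0, y0, x1, y1, depth);
# smaller depth is nearer. The stream is unordered with respect to depth.

W, H = 8, 8

def render_immediate(prims):
    zbuf = [[float("inf")] * W for _ in range(H)]
    shaded = 0  # pixels actually shaded (work done)
    for x0, y0, x1, y1, z in prims:          # stream order, not depth order
        for y in range(y0, y1):
            for x in range(x0, x1):
                if z < zbuf[y][x]:           # passes depth test: shade + write
                    zbuf[y][x] = z
                    shaded += 1
    visible = sum(1 for row in zbuf for z in row if z != float("inf"))
    return shaded, visible

# Back-to-front stream: worst case, every layer is shaded then covered.
stream = [
    (0, 0, 8, 8, 3.0),   # background, drawn first
    (1, 1, 7, 7, 2.0),   # mid layer
    (2, 2, 6, 6, 1.0),   # front object
]
shaded, visible = render_immediate(stream)
print(shaded, visible)   # 116 pixels shaded to fill 64 visible pixels
```

Here the depth test saves nothing up front: 116 pixels are shaded to produce a 64-pixel image, an overdraw factor of about 1.8 even in this tiny scene.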
The rate of producing data and the rate of transferring data are the two factors that ultimately limit processing performance, so a fundamentally suited solution must be used instead: one that draws only the frontmost, visible pixels.
Because the visible information could be anywhere within an unordered stream of data, the first approach needed is to collect all of the scene's information into a display list – display-list rendering. Also, because checking the graphics data for visibility as it streams into external memory would involve heavy data transfer over the bus and would be slow, a second approach is needed that lets the processing core handle as much of the work as possible internally. The memory that fits inside a core, however, is nowhere near large enough to hold all of the graphics data, so the job has to be handled in separate pieces: the target space, the full area of the screen, must be split into tiles small enough for their data to be processed internally – tile-based rendering.
Combining these two approaches, a tile-based display list renderer (TBDLR) first compiles the incoming stream of graphics data into display lists corresponding to the tiles of screen area each primitive touches. With the scene made manageable for the graphics core in this way, it can quickly determine just the visible pixels of the image. Finally, those results are rasterized, and only the necessary pixels are ever drawn.
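The two-phase flow described above can be sketched as follows. This is an illustrative toy (all names and sizes invented), not real hardware behavior: phase 1 bins the whole scene's primitives into per-tile display lists, and phase 2 resolves visibility inside each small tile – held in on-chip memory in real hardware – so each screen pixel is shaded exactly once.

```python
# Hypothetical sketch of a tile-based display list renderer's two phases.
# Primitives are opaque rects (x0, y0, x1, y1, depth); smaller depth = nearer.

W, H, TILE = 8, 8, 4
prims = [
    (0, 0, 8, 8, 3.0),   # background
    (1, 1, 7, 7, 2.0),   # mid layer
    (2, 2, 6, 6, 1.0),   # front object
]

# Phase 1: bin every primitive into the display list of each tile it touches.
tiles = {}
for p in prims:
    x0, y0, x1, y1, _ = p
    for ty in range(y0 // TILE, (y1 - 1) // TILE + 1):
        for tx in range(x0 // TILE, (x1 - 1) // TILE + 1):
            tiles.setdefault((tx, ty), []).append(p)

# Phase 2: per tile, find the frontmost primitive at each pixel, then shade
# that pixel once. No pixel is ever shaded and later overwritten.
shaded = 0
for (tx, ty), dlist in tiles.items():
    for y in range(ty * TILE, ty * TILE + TILE):
        for x in range(tx * TILE, tx * TILE + TILE):
            nearest = min((p for p in dlist
                           if p[0] <= x < p[2] and p[1] <= y < p[3]),
                          key=lambda p: p[4], default=None)
            if nearest is not None:
                shaded += 1          # shade once, for the visible surface only
print(shaded)  # 64: one shaded pixel per covered screen pixel, zero overdraw
```

For the same three-layer scene an immediate-mode renderer would shade with overdraw, this flow shades exactly one pixel per covered screen location: 64 shading operations for a 64-pixel image.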
The benefits of not wasting resources on overdraw are overwhelming. Even old games like Quake 3 Arena and Serious Sam averaged more than three and five layers of surfaces deep respectively (the front of an object covering its back, in front of another object, in front of background detail, and so on), so the single layer of pixels a TBDLR calculates is worth several times that amount in fillrate – the multiple corresponding to the game's number of 3D layers. Such a graphics chip could use several times fewer pipelines, bringing its size and cost down by as much as half, or it could spend a transistor budget comparable to conventional chips and become several times more powerful. Several times less texturing data is transferred, saving bandwidth, and several times less shading work is produced, saving data production.
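The pipeline-count claim is simple arithmetic. This back-of-envelope sketch uses purely illustrative numbers (none come from the text): with an average depth complexity d, an immediate-mode renderer shades roughly d pixels for every pixel that survives to the image, so matching a TBDLR's visible-pixel throughput takes roughly d times the raw fillrate.

```python
# Hypothetical back-of-envelope: pipelines needed to sustain a target rate of
# *visible* pixels, given average depth complexity (illustrative numbers).

def pipelines_needed(target_visible_mpix_s, per_pipe_mpix_s, depth_complexity):
    raw_needed = target_visible_mpix_s * depth_complexity
    return -(-raw_needed // per_pipe_mpix_s)   # ceiling division

# Same 300 Mpix/s of visible pixels; Quake 3-like depth complexity of ~3.
print(pipelines_needed(300, 100, 3))  # conventional renderer: 9 pipelines
print(pipelines_needed(300, 100, 1))  # TBDLR (no overdraw):   3 pipelines
```

Under these assumed figures the conventional chip needs three times the pipelines – and correspondingly more texture bandwidth – to put the same number of pixels on screen.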
By rendering mostly within the graphics core to make the visibility check feasible, another overwhelming set of benefits is realized. Minimized external data traffic allows the system to use less expensive memory types, making it more cost-effective, and consumes less battery or socket power. Rendering occurs at the core's high internal precision without being compromised by external framebuffer settings, raising image quality for tasks like color blending and flexibility for object depth sorting. Extra samples of the image can be taken for anti-aliasing without demanding more framebuffer memory. And overall operation becomes more effective because the data being processed keeps a high locality.