Cure For The Common Graphics Processing

Lazy8s

Veteran

Even though graphics are displayed two-dimensionally as an image, the conventional approach to drawing them is three-dimensional, as a scene. This happens because most graphics processors begin rendering the stream of polygonal scene data sent to them before fully determining which parts belong to the final image, even though that data arrives unordered. Because of the nature of 3D, polygons can be positioned behind and obscured by other polygons, and rendering them from an unordered stream produces pixels that get drawn over pixels already drawn, wasting the work previously done to produce them. This overdraw cuts a conventional system down to a fraction of its operational speed: it reduces efficiency, since bandwidth is spent transferring information for pixels that are never used, and it reduces effectiveness, since only a portion of the pixels produced actually end up in the image.
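
To make the overdraw concrete, here is a toy sketch (with made-up fragment data, not any real GPU's pipeline) of an immediate-mode renderer shading unordered fragments: every fragment that passes the depth test gets shaded, even if a closer one later overwrites it.

```python
from collections import defaultdict

# Hypothetical fragments arriving in arbitrary order: (x, y, depth, color).
fragments = [
    (0, 0, 0.9, "background"),
    (0, 0, 0.5, "wall"),
    (0, 0, 0.2, "player"),   # front-most at (0, 0); the only survivor there
    (1, 0, 0.9, "background"),
    (1, 0, 0.2, "player"),
]

framebuffer = {}
depthbuffer = defaultdict(lambda: 1.0)   # far plane
shaded = 0                               # fragments actually shaded (work done)

for x, y, z, color in fragments:
    if z < depthbuffer[(x, y)]:          # passes the depth test, so it is
        depthbuffer[(x, y)] = z          # shaded and written, even though a
        framebuffer[(x, y)] = color      # closer fragment may later overwrite
        shaded += 1                      # it: that waste is the overdraw

visible = len(framebuffer)
print(f"{shaded} fragments shaded for {visible} visible pixels "
      f"(overdraw factor {shaded / visible:.1f}x)")
```

With this ordering the renderer shades 5 fragments to produce 2 visible pixels, a 2.5x overdraw factor; sorting front-to-back would help, but an application cannot always guarantee order.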

The rate of producing data and the rate of transferring data are the two factors that ultimately limit processing performance, so a fundamentally better-suited solution is needed instead: one that draws only the front-most, visible pixels.

Because the visible information could be anywhere within an unordered stream of data, an approach is needed that first collects all of the information from the scene into a display list – display-list rendering. Also, because checking the graphics data for visibility as it streams through external memory would involve a lot of transfer over the bus and would be slow, a second approach is needed that lets the processing core handle as much work as possible internally. The amount of memory that fits inside a core, however, is nowhere near large enough to hold all of the graphics data, so the job has to be handled in separate pieces. The target space, the full area of the screen, must be split into tiles small enough for their data to be processed internally – tile-based rendering.
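
As a sketch of the tile-binning step (assuming 32x32 tiles and made-up triangle coordinates), each triangle's screen-space bounding box decides which tiles' display lists get a reference to it:

```python
TILE = 32  # assumed tile size in pixels

def bin_triangle(tri, bins):
    """Append tri to the display list of every tile its bounding box touches."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    for ty in range(min(ys) // TILE, max(ys) // TILE + 1):
        for tx in range(min(xs) // TILE, max(xs) // TILE + 1):
            bins.setdefault((tx, ty), []).append(tri)

bins = {}
bin_triangle([(5, 5), (20, 8), (10, 25)], bins)     # fits inside tile (0, 0)
bin_triangle([(10, 10), (70, 12), (40, 50)], bins)  # spans six tiles

print(len(bins), "tiles touched;", len(bins[(0, 0)]), "triangles in tile (0, 0)")
```

A real binner would test the triangle's actual edges rather than just its bounding box, but the principle of building per-tile display lists is the same.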

Combining these two approaches, a tile-based display list renderer first compiles the incoming stream of graphics data into display lists that correspond to the appropriate tiles of screen area. Once the scene has been made manageable for the graphics core in this way, it can quickly determine just the visible pixels of the image. Finally, those results are rasterized, and only the necessary pixels are ever drawn.
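
The deferred step can be sketched per tile (again a toy model with invented data, not actual hardware behavior): resolve depth for everything in the tile's display list first, then run the expensive shading once per covered pixel.

```python
def shade(color):
    # Stand-in for expensive texturing/shading work.
    return color.upper()

def render_tile(fragments):
    """Depth-resolve a tile's fragments on-chip, then shade only the winners."""
    nearest = {}                      # (x, y) -> (depth, color)
    for x, y, z, color in fragments:  # pass 1: pure depth resolve, no shading
        if (x, y) not in nearest or z < nearest[(x, y)][0]:
            nearest[(x, y)] = (z, color)
    # pass 2: shading runs exactly once per visible pixel
    return {pix: shade(color) for pix, (_, color) in nearest.items()}

tile_list = [(0, 0, 0.9, "floor"), (0, 0, 0.2, "crate"), (1, 1, 0.5, "floor")]
result = render_tile(tile_list)
print(result)
```

Because the whole tile fits on-chip, both passes happen without touching external memory until the finished pixels are written out.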

The benefits from not wasting resources on overdraw are overwhelming. Even old games like Quake 3 Arena and Serious Sam averaged more than three and five layers of surfaces deep, respectively (the front of an object covering the back of the object, in front of another object, in front of some background detail, etc.), so the single layer of pixels a TBDLR calculates is worth several times that amount in fillrate – a multiple corresponding to the game's average depth complexity. Such a graphics chip could use several times fewer pipelines, bringing its size and cost down toward 50%, or it could spend a comparable budget to conventional chips and become several times more powerful. It also transfers several times less texturing data, saving bandwidth, and performs several times less shading work, saving data production.
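
The arithmetic behind the fillrate claim, with illustrative numbers rather than measured ones: if the average depth complexity is D, a conventional renderer spends roughly D shaded fragments per visible pixel, so a TBDLR needs only about 1/D of the raw rate to match it.

```python
def useful_fillrate(raw_mpixels_per_s, avg_depth_complexity):
    # Pixels that actually land in the final image, per second.
    return raw_mpixels_per_s / avg_depth_complexity

RAW = 1000                               # assumed raw rate, Mpixels/s
conventional = useful_fillrate(RAW, 3)   # Quake 3-like depth complexity
tbdlr = useful_fillrate(RAW, 1)          # every shaded pixel is visible

print(f"conventional ~{conventional:.0f} vs TBDLR {tbdlr:.0f} useful Mpixels/s")
```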


By rendering mostly from within the graphics core to make the visibility check feasible, another overwhelming set of benefits is realized. Minimized external data traffic lets the system use less expensive memory types, making it more cost effective, and also consume less battery/socket power. Rendering occurs at the core's high internal precision without being compromised by external framebuffer settings, raising image quality for tasks like color blending and flexibility for object depth sorting. Extra samples of the image can be taken for anti-aliasing without requiring more from framebuffer memory. And overall operation becomes more effective because a high locality is kept among the data being processed.
 
The pipeline of a conventional graphics processor initiates more stages of rendering before determining fully which surfaces are actually visible:

image028.gif

http://www.pvrdev.com/pub/PC/doc/f/PowerVR Tile based rendering.htm

The pipeline of a graphics processor suited to displaying 3D on a screen without wasted work checks fully for visibility up-front in the process:

image030.gif

http://www.pvrdev.com/pub/PC/doc/f/PowerVR Tile based rendering.htm

Checking fully for visibility requires compiling the information for the whole scene into a display list since the scene isn’t necessarily sent in a particular order (visible information could be anywhere) – display-list rendering. This contrasts with conventional renderers which can work on scene data as they get it despite the data being unordered:

image004.gif

http://www.pvrdev.com/pub/PC/doc/f/Display List Rendering.htm

The full scene information is too large to be handled inside the processing core where operation is fast and where expensive external memory access is minimized. So, as the information is being compiled for display-list rendering, another approach is combined whereby the screen is split into manageably sized tiles – tile-based rendering:

image006.gif

http://www.pvrdev.com/pub/PC/doc/f/PowerVR Tile based rendering.htm

Because tile-based rendering keeps processing mostly internal rather than dependent on external memory, colors are always blended at maximum internal precision and suffer no sacrifice in external framebuffer depth when blending. With a 16-bit target framebuffer, image quality with a TBDLR remains acceptable, but the banding/dithering that compounds at each blend on a conventional renderer means extra memory usually gets spent on a 32-bit framebuffer instead:

4.jpg


6.jpg


5.jpg


7.jpg

http://www.sharkyextreme.com/hardware/articles/kyro_in-depth/4.shtml
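
The blending-precision point can be illustrated numerically (toy values, with a 5-bit channel standing in for a 16-bit R5G6B5 framebuffer): re-quantizing after every blend can destroy faint detail that full internal precision preserves.

```python
def q5(v):
    # Quantize a 0..1 channel to 5 bits, as a 16-bit framebuffer would.
    return round(v * 31) / 31

def blend(dst, src, alpha):
    return dst * (1 - alpha) + src * alpha

layers = [0.02] * 8          # eight faint smoke puffs blended over black
external = internal = 0.0
for src in layers:
    external = q5(blend(external, src, 0.5))   # conventional: quantize each pass
    internal = blend(internal, src, 0.5)       # TBDLR: full precision throughout
internal = q5(internal)                        # quantize only at scan-out

print(external, internal)
```

Here every pass of the externally blended channel rounds back to zero, so the smoke vanishes entirely, while the internally blended channel retains the accumulated contribution.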

Working within the core with the tile-based approach minimizes external memory access over the bus and expends less bandwidth compared to conventional renderers:

image006.gif


image008.gif

http://www.pvrdev.com/pub/PC/doc/f/Display List Rendering.htm

The lower bandwidth expense of a TBDLR enables it to use more cost-effective RAM than a conventional renderer:

image010.gif

http://www.pvrdev.com/pub/PC/doc/f/Display List Rendering.htm

Fewer pixels have to be produced when only the visible surfaces are drawn. Even in this relatively simple scene from Quake 3 Arena, an old game, there are over three layers of depth on average which are producing pixels on a conventional renderer, even when using early depth check techniques:

image024.jpg


image026.jpg

http://www.pvrdev.com/pub/PC/doc/f/PowerVR Tile based rendering.htm

Unlike conventional renderers which draw over and waste old pixels, all pixels from a display-list renderer are part of the final image. This makes its pixel production several times more effective on average and means that such a system wouldn’t need to expend as many pixel pipelines and texturing units for a given level of performance. This results in smaller, and therefore less costly, chips.

The savings on RAM type from internal processing, combined with the savings in chip size from visible-only pixel drawing, make a difference in the cost breakdown of a graphics system (technically, fewer memory accesses also consume less power, which dissipates less heat, which could allow for less costly cooling solutions):

syscost.jpg

http://195.157.98.220/article.php?article_id=488
 
There are two major fields for real-time graphics technology with distinct advancement curves: desktop/set-top and handheld.

Although the advance of handheld technology has some unique focuses owing to its power-restrictive environment, it shares a lot of general development with desktop/set-top technology of several years ago. In products from that era like the Dreamcast console and the Kyro I/II PC graphics cards, TBDLR proved itself the best-balanced solution for cost effectiveness, outclassing any comparable chip. That makes TBDLR a leading candidate for modern handheld graphics, especially since it also addresses the handheld sector's other major limiting factor, power consumption. Indeed, this is being demonstrated in the sector currently with MBX. Its nearest challenger, despite being released many months after MBX, is a generation behind in functionality – lacking programmable vertex shading, fractional tessellation and depth adaptation in curved surface rendering, per-pixel lighting with DOT3, FSAA at no practical cost, internal-precision color blending, internal floating-point precision depth sorting, and anisotropic filtering – and it only matches MBX's fillrates while expending four times as many pipelines, therefore costing a lot more and draining a lot more battery power.

Modern desktop/set-top graphics have high geometry counts, which pressure the extra work a TBDLR takes on in storing the full scene. With bounding-box visibility tests at T&L, triangle stripping, indexing, object culling, and various forms of display list compression and memory reclamation, it can still support quite large polygon counts. It might not keep pace with conventional renderers at an indefinite scale, but raising polygon counts won't yield as great a return as raising image quality for desktop/set-top graphics going forward. That's the next direction for graphics in the sector, and TBDLR's advantages in anti-aliasing and rendering precision are well suited to leading the way.
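
A rough sizing example of why stripping and indexing ease that scene-storage pressure (byte counts assumed purely for illustration):

```python
VERTEX_BYTES = 32  # assumed: packed position + normal + texture coordinates

def independent_bytes(tri_count):
    # Independent triangles: three vertices stored per triangle.
    return tri_count * 3 * VERTEX_BYTES

def strip_bytes(tri_count):
    # One long triangle strip: each vertex after the first two adds a triangle.
    return (tri_count + 2) * VERTEX_BYTES

tris = 100_000
print(independent_bytes(tris), strip_bytes(tris))  # stripping cuts vertex data ~3x
```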
 
ATI and NVIDIA should be out of business by now!!! I mean, PVR have totally solved graphics limitations forever!!

Informative thread, but what does it have to do with consoles? It's not like any console alive right now, or any next-gen ones, are going to be based on PVR technology...
 
If ImgTech released a high end TBDR card for the desktop it would kill comparable ati and nvidia products the same way the MBX has killed the goforce and imageon.
 
TEXAN said:
If ImgTech released a high end TBDR card for the desktop it would kill comparable ati and nvidia products the same way the MBX has killed the goforce and imageon.

If if if...

Is it gonna happen?

Does this belong to the console forum?

No and No.
 
This is the forum where all console related discussion is supposed to go. There are no provisions for market status or time regarding what can be discussed. Here is where the SNES and Genesis discussions take place, for instance. And the Dreamcast is the most unique console this generation.

Besides, this is also where arcade discussion is defaulted to since embedded systems have traditionally been grouped together away from the main PC focus at this site.
 
Lazy8s said:
This is the forum where all console related discussion is supposed to go. There are no provisions for market status or time regarding what can be discussed. Here is where the SNES and Genesis discussions take place, for instance. And the Dreamcast is the most unique console this generation.

Besides, this is also where arcade discussion is defaulted to since embedded systems have traditionally been grouped together away from the main PC focus at this site.

Oh ok, so this is a Dreamcast thread.

Great to know, i'm out of here.
 
Cut the bitchy remarks out folks. Lazy8s - it's usually polite to quote sources for such articles.
 
london-boy said:
Lazy8s said:
This is the forum where all console related discussion is supposed to go. There are no provisions for market status or time regarding what can be discussed. Here is where the SNES and Genesis discussions take place, for instance. And the Dreamcast is the most unique console this generation.

Besides, this is also where arcade discussion is defaulted to since embedded systems have traditionally been grouped together away from the main PC focus at this site.

Oh ok, so this is a Dreamcast thread.

Great to know, i'm out of here.

Dreamcast incorporated this technology.

This technology is unique, thus the DC is unique.
 
In the bandwidth figures, you have missed out the bandwidth required simply for displaying data.

for a 1024x768x16BPP you are talking about another ~200MB/sec

Just my two cents.
CC
 
TEXAN said:
If ImgTech released a high end TBDR card for the desktop it would kill comparable ati and nvidia products the same way the MBX has killed the goforce and imageon.
AFAIK it hasn't actually killed those products, and anyway, embedded stuff is one thing, PC desktop parts something completely different. None of the PVR chips were without a host of compatibility issues, big and small, which goes to show it doesn't matter how good the hardware is if the drivers don't live up to the task. Gamers require rock-solid stability as well as performance, and TBR chips work sufficiently differently from IMR chips (actually completely differently) that a great number of issues can, will, and have cropped up over the years.

I suspect driver support is the biggest reason imgtech hasn't re-entered the PC arena yet, and probably never will again either.
 
I remember Tim 'Unreal' Sweeney saying it's not possible for T&L hardware to work with tile-based rendering, then PVR saying they'd prove him wrong soon...
Something new?
 
_phil_ said:
I remember Tim 'Unreal' Sweeney saying it's not possible for T&L hardware to work with tile-based rendering, then PVR saying they'd prove him wrong soon...
Something new?

The years-old "N@omi 2" arcade board using the "Elan" T&L chip?
 
Neeyik:
it's usually polite to quote sources for such articles.
I wrote it, trying to provide a very general overview of tile-based display list rendering, since the topic is only now being introduced to many people.

All of the pictures were sourced from PowerVR's site, the only company using this solution, except the color comparison which came from PowerVR but was hosted by SharkyExtreme's site and the cost comparison which came from STM but was hosted by Eurogamer's website.

Captain Chickenpants:
In the bandwidth figures, you have missed out the bandwidth required simply for displaying data.
Good catch. Tell someone over there to get right on it!
 
randycat99 said:
Aaaaaaalllll that, just to bring up a DC topic??? I thought I'd seen everything...
It's not just about DC. It also relates to Bitboys (though that'd mean it should go in the handheld section), and to Gigapixel, which got bought by 3dfx, which got bought by nVidia. Isn't the speculation that the "G" in nVidia's next-gen G70 chip stands for Gigapixel? If so, and it's true that the PS3's "Graphics Synthesizer" is based on the G70 architecture, then we could have TBDR return to the consoles.
 
There's been lingering speculation about whether the major PC graphics designers will move even closer to a hybrid or TBDL rendering solution at some point and if that could apply to their console projects.

I believe Bitboys is a conventional renderer, actually (sounds quite promising too). It's the Mali handheld graphics architecture from Falanx that employs some of these ideas, and it's described as something of a hybrid tile-based immediate mode renderer. It also sounds very promising.
 