A SONY Patent: PS3 a Hybrid Tile-Based Deferred Renderer?

j^aws

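Fig. 1 from the patent:

[Figure: Fig. 1 of the patent, a block diagram with a CPU, a geometry processor and a rendering processor; image not reproduced]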


Abstract

To provide an image processor with which the amount of rendering can be reduced. A plurality of primitives are categorized into a first group of primitives that are to be displayed on a display and a second group of primitives that are not to be displayed thereon, by an XYZ clipping section, a Z testing section, and a stencil testing section, according to the data about the plurality of primitives in a primitive buffer. A two-dimensional image is drawn in a frame buffer using the data for the first group of primitives in the primitive buffer.

......

BACKGROUND OF THE INVENTION

[0002] The present invention relates to an image processing technique for efficient rendering of three-dimensional images on a two-dimensional screen such as a display device.

[0003] Image processing capacities of image processors, such as game consoles or personal computers, that render images on a display device have increased significantly with the recent increase in processor speed.

[0004] For example, two-dimensional images that are used to display fine three-dimensional images of high quality on a two-dimensional screen can be produced in almost real time.

[0005] Two-dimensional images are produced with a plurality of primitives such as polygons that make up an image of a virtual object included in a three-dimensional image (hereinafter, referred to as an "object") and an attribute data set describing attributes of the relevant primitive, such as the shape, size, color, and brightness.

[0006] Image processing for rendering three-dimensional images on a two-dimensional screen may generally be classified into geometry processing and rendering. The geometry processing includes coordinate transformation to transform the coordinates of a set of vertices of primitives. The rendering is used to determine, for example, the color of each pixel from the geometry data obtained as a result of the geometry processing to create a two-dimensional image.

[0007] In this specification, a device that mainly performs rendering operations (including a distributed system and a semiconductor device) is referred to as a rendering processing unit. The rendering processing unit forms a portion of an image processor.

[0008] Some rendering processing units can render graphics and images, such as complicated pictures for better visual effects by rendering the same primitives two or more times. The rendering with two or more render passes is referred to as "multipass rendering". A single process in the multipass rendering is referred to as a "pass". For multipass rendering with three render passes, a polygon may be rendered without blending any texture during the first pass, a texture may be added during the second pass, and a different texture may be added during the third pass.

[0009] Conventional rendering processing units perform rendering multiple times for all primitives during the multipass rendering. This increases the number of rendering operations and, in turn, the amount of processing, when graphics or images of a complicated picture are to be rendered, resulting in a larger processing load.

[0010] The present invention is made with respect to the above-mentioned problems and an object thereof is to provide a rendering processing unit and a rendering method with which rendering operations for three-dimensional images can be achieved with less processing.

[0011] Another object of the present invention is to provide an image processor and components thereof that produce an image for better visual effects, without any overhead.

........

SUMMARY OF THE INVENTION

[0012] A rendering processing unit according to the present invention that solves the above-mentioned problems is a rendering processing unit for rendering three-dimensional images on a two-dimensional screen, the three-dimensional images being each made up of a plurality of primitives, comprising a primitive buffer in which a plurality of attribute data sets are written in association with relevant primitives, each attribute data set representing attributes of one of the plurality of primitives; and a tester that compares the plurality of attribute data sets in said primitive buffer with each other to categorize (sort) the plurality of primitives into a first group of primitives that are to be displayed on the two-dimensional screen and a second group of primitives that are not displayed thereon; the rendering processing unit being configured to render the first group of primitives and not to render the second group of primitives that are categorized out by said tester.

[0013] The rendering may be typical single pass rendering to render a given primitive using textures with only one render pass or multipass rendering to render the same primitive multiple times with different textures. In either case, only the primitives that are actually to be displayed on the two-dimensional screen (first group of primitives) are rendered. The throughput is significantly improved as compared with conventional rendering processing units that render all primitives making up a three-dimensional image.

[0014] The "attribute data set" may be any kind of suitable data as long as the data can be used for determining whether a given primitive appears on the two-dimensional screen. In general, the attribute data sets may be numerical data, such as numerical data about vertices of a primitive (e.g., coordinates of a vertex, brightness of a vertex, or coordinates of a texture), the size, color, or transparency of a primitive. The numerical data about vertices may be, for example, geometry data that are obtained as a result of geometry processing.

[0015] The number of the primitives that are written in said primitive buffer may preferably be at least equal to the number of primitives with which three-dimensional images that are fit on one screen are built on the two-dimensional screen. This allows efficient rendering on screen basis.

[0016] In order to provide more efficient rendering on pixel basis, the rendering processing unit further comprises interpolation means that interpolates pixels according to a known attribute data set for a primitive. The tester categorizes the primitives on pixel basis by adding a new attribute data set obtained as a result of the interpolation of pixels into the attribute data sets to be compared with each other. With such a configuration, the primitive being displayed on the two-dimensional screen is categorized into the first group of primitives even at one pixel.

[0017] For images of higher resolutions, each pixel is divided into a predetermined number of subpixels. The interpolation means is configured to interpolate the subpixels according to a known attribute data set for a primitive when at least one of the subpixels is subjected to rendering. The tester is configured to categorize the primitives on subpixel basis by adding a new attribute data set obtained as a result of the interpolation of subpixels into the attribute data sets to be compared with each other.

[0018] The rendering processing unit may further comprise coverage calculation means that calculates a ratio of the number of subpixels that are covered by a given primitive being rendered to the total number of subpixels that make up a single pixel, and the attribute data set for the subject pixel may be determined based on the result of the calculation by said coverage calculation means. This provides faster anti-aliasing.
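[0018] only says coverage is the ratio of covered subpixels to the total subpixels in a pixel. A minimal Python sketch of that idea follows, assuming (none of this is from the patent) a 4x4 grid of subpixel samples at subpixel centers, triangles as the primitives, and a plain linear blend between primitive and background color as the resolved pixel value. `coverage` and `resolve` are hypothetical names:

```python
from typing import Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function: twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri: Triangle, p: Point) -> bool:
    """True if p lies inside tri (either winding; edges count as inside)."""
    a, b, c = tri
    w = (edge(b, c, p), edge(c, a, p), edge(a, b, p))
    return all(x >= 0 for x in w) or all(x <= 0 for x in w)


def coverage(tri: Triangle, px: int, py: int, n: int = 4) -> float:
    """Ratio of covered subpixels to the n*n subpixels of pixel (px, py)."""
    covered = 0
    for sy in range(n):
        for sx in range(n):
            sample = (px + (sx + 0.5) / n, py + (sy + 0.5) / n)
            if inside(tri, sample):
                covered += 1
    return covered / (n * n)


def resolve(prim_rgb, bg_rgb, cov: float):
    """Blend primitive and background colors by coverage (assumed rule)."""
    return tuple(cov * p + (1.0 - cov) * b for p, b in zip(prim_rgb, bg_rgb))


half = ((0.5, -10.0), (10.0, -10.0), (0.5, 10.0))  # left edge at x = 0.5
cov = coverage(half, 0, 0)
print(cov, resolve((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), cov))
```

A triangle whose edge cuts pixel (0, 0) exactly in half covers 8 of the 16 samples, so the resolved color is an even mix of red and blue. Real hardware uses other sample patterns; the regular 4x4 grid is only for illustration.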

[0019] In order to allow for anti-aliasing, said tester may be configured not to categorize pixels on the boundary of primitives when two or more primitives are drawn at the pixel.

[0020] Each attribute data set in said primitive buffer may include position information, such as coordinate values (X, Y, and Z), that represents the position of a relevant primitive in the three-dimensional images, and the tester may be configured to compare the position information included in the attribute data sets to categorize the primitives into first and second groups of primitives, the first group of primitives being primitives that are closest to the perspective of a viewer through the two-dimensional screen, the second group of primitives being other primitives than those categorized into the first group of primitives. This configuration eliminates rendering of the primitives that are hidden behind one or more other primitives.

[0021] Each primitive may be adapted to be overlaid on stencil data comprising allowed regions that are allowed to be displayed on the two-dimensional screen and non-allowed regions that are not allowed to be displayed thereon, the stencil data representing the transparency and the shape of an image or images to be displayed. In this case, the tester categorizes the primitives into first and second groups of primitives, the first group of primitives being primitives at least a portion of which is overlaid on the allowed region or non-allowed region of the stencil data, the second group of primitives being other remaining primitives than those categorized into the first group of primitives.

[0022] From the viewpoint of increasing a rendering speed, the tester is configured to record a flag describing whether a given primitive is in the first group of primitives or in the second group of primitives, in a predetermined visible flags table that is referred to in rendering. The flag is recorded in association with the attribute data set for the given primitive. Such a configuration allows the rendering processing unit to determine whether a given attribute data set is in the first group of primitives or in the second group of primitives only by means of checking the flag in the visible flags table. The flag may be a numerical flag which has different values for each primitive, the value of the flag being updated based on the number of pixels covered by the primitive being displayed on the two-dimensional screen.

[0023] The rendering processing unit may further comprise editing means that is adapted to refer to the flag for the relevant primitive recorded in the visible flags table and to restrict the reading of the attribute data sets out of the primitive buffer for the second group of primitives. From the viewpoint of avoiding more positively the use of the second group of primitives, said editing means is adapted to delete, from said primitive buffer, the attribute data sets for the primitives that are categorized into the second group of primitives.
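As a rough illustration of [0022] and [0023], here is a Python sketch assuming the tester has already produced an "ID buffer" that records which primitive is visible at each pixel (the patent describes the numerical flag, but not this particular data structure). The flag counts the pixels a primitive is visible at, and the hypothetical `prune_primitive_buffer` plays the role of the editing means by dropping second-group attribute sets:

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def build_visible_flags(id_buffer: Iterable[Optional[int]],
                        primitive_ids: Iterable[int]) -> Dict[int, int]:
    """Numerical visible flag per primitive: number of pixels it is visible at.

    id_buffer holds, for each pixel, the id of the visible primitive (or None
    for empty pixels). A flag of 0 means the primitive is in the second group.
    """
    counts = Counter(pid for pid in id_buffer if pid is not None)
    return {pid: counts[pid] for pid in primitive_ids}


def prune_primitive_buffer(primitive_buffer: Dict[int, dict],
                           flags: Dict[int, int]) -> Dict[int, dict]:
    """Keep only attribute sets for first-group primitives (flag > 0)."""
    return {pid: attrs for pid, attrs in primitive_buffer.items()
            if flags.get(pid, 0) > 0}


# Primitive 1 is never visible, so it is dropped before rendering.
flags = build_visible_flags([0, 0, 2, None, 2, 2], primitive_ids=[0, 1, 2])
buffer = {0: {"color": "red"}, 1: {"color": "green"}, 2: {"color": "blue"}}
first_group = prune_primitive_buffer(buffer, flags)
print(flags, sorted(first_group))
```

Keeping the count rather than a plain yes/no bit matches the patent's description of a numerical flag updated from the number of covered pixels; deleting the entries outright is the stronger option mentioned at the end of [0023].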

[0024] An image processor according to the present invention that solves the above-mentioned problems is an image processor comprising a frame buffer whose size is equal to the size of a display area in a two-dimensional screen; a first processor adapted to perform geometry processing of a plurality of primitives that describe a three-dimensional image to produce geometry data about the three-dimensional image; a second processor that renders two-dimensional images corresponding to the three-dimensional images in said frame buffer according to the produced geometry data; and a controller for use in displaying the rendered two-dimensional images in the display area.

[0025] The second processor compares a plurality of attribute data sets with each other to categorize the plurality of primitives into a first group of primitives that are to be displayed on the two-dimensional screen and a second group of primitives that are not displayed thereon and to render in said frame buffer the two-dimensional image that is made up of the first group of primitives other than the second group of primitives. Each attribute data set represents attributes of one of the plurality of primitives that are specified by the geometry data obtained from said first processor.

[0026] In a preferred embodiment, a buffer memory is provided between said first processor and said second processor and the geometry data produced by said first processor are transmitted to said second processor via the buffer memory.

[0027] The image processor may be configured by further comprising an image acceptance mechanism that accepts the three-dimensional images to be processed, from an external device, and supplies them to said first processor.

[0028] A rendering method according to the present invention that solves the above-mentioned other problems is a rendering method performed by a device for rendering three-dimensional images on a two-dimensional screen, the three-dimensional images being each made up of a plurality of primitives, the device having a primitive buffer in which the primitives are written for the formation of images. That is, this device performs a test pass and a rendering pass in this order, in which the test pass is for writing a plurality of attribute data sets in a primitive buffer in association with relevant primitives, each attribute data set representing attributes of one of the plurality of primitives that make up the three-dimensional images, and for comparing the plurality of written attribute data sets with each other to categorize the plurality of primitives into a first group of primitives that are to be displayed on the two-dimensional screen and a second group of primitives that are not displayed thereon, while the rendering pass is for reading the first group of primitives other than the second group of primitives that are categorized out in the test pass, out of the primitive buffer to render the read first group of primitives.

[0029] The rendering pass may be performed two or more times to render different textures two or more times for the same primitive.

[0030] In order to solve the above-mentioned problems, the present invention also provides a semiconductor device and a computer program.

[0031] A semiconductor device of the present invention is a semiconductor device that is mounted on a computer to which a display having a two-dimensional screen is connected, the semiconductor device being adapted to establish the following features on the computer in cooperation with other components of the computer, the features comprising a primitive buffer in which a plurality of attribute data sets are written in association with relevant primitives, each attribute data set representing attributes of one of a plurality of primitives that make up three-dimensional images; a tester that compares the plurality of attribute data sets in the primitive buffer with each other to categorize the plurality of primitives into a first group of primitives that are to be displayed on the two-dimensional screen and a second group of primitives that are not displayed thereon; and rendering process means for rendering the first group of primitives other than the second group of primitives that are categorized out by the tester to produce a two-dimensional image to be displayed on the two-dimensional screen.

[0032] A computer program of the present invention is a computer program for use in directing a computer to perform the following tasks, the computer being connected to a primitive buffer in which primitives are written for the formation of images, and a display having a two-dimensional screen, the tasks comprising writing a plurality of attribute data sets in the primitive buffer in association with relevant primitives, each attribute data set representing attributes of one of a plurality of primitives that make up three-dimensional images; comparing the plurality of attribute data sets in the primitive buffer with each other to categorize the plurality of primitives into a first group of primitives that are to be displayed on the two-dimensional screen and a second group of primitives that are not displayed thereon; and rendering the first group of primitives other than the second group of primitives that are categorized out to produce a two-dimensional image to be displayed on the two-dimensional screen. This computer program is implemented when it is recorded in a computer-readable storage medium.

........

Source: Image processor, components thereof, and rendering method

For reference,

Previous Sony Patent, B3D thread on PS3 parallel/tile/brick rendering:

EDIT:
Apparently the B3D thread on PS3 parallel/tile/brick rendering has a broken link to the patent, so here's the direct link to the parallel/tile/brick rendering patent:

and

Previous Sony Patent, B3D thread on PS3 PixelEngine:

Okay, I'm tired... :oops: But skimming through this patent, it suggests (but correct me if I'm wrong!) that the rendering processor is a hybrid deferred renderer utilizing a primitive buffer, a z-buffer, a stencil buffer and a frame buffer! 8) And referring to a previous patent suggesting parallel tile/brick rendering, the PS3 may well be a hybrid tile-based deferred renderer! :oops: 8) :D ....

Also interesting to note is that Fig. 1 shows three distinct processors: a CPU, a geometry processor and a rendering processor! :oops: 8) :D ....

Well, let me know what you guys find... :p
 
If this diagram were legit and true, it would be like a (Kutaragi Ken's) dream come true... (with the exception of the 400MHz difference in the BE.... Bring them back!! :LOL: ).

:D
 
Vysez said:
If this diagram were legit and true, it would be like a (Kutaragi Ken's) dream come true... (with the exception of the 400MHz difference in the BE.... Bring them back!! :LOL: ).

:D

why are there 2 256MB Rambus chips, but below them it says 1024MB? 2 * 256 != 1024
 
Could someone translate these diagrams and patents so an idiot can understand them? :LOL: There's been soooo much patent stuff about PS3 that it's hard to get an overview of it.
 
a688 said:
why are there 2 256MB Rambus chips, but below them it says 1024MB? 2 * 256 != 1024

Actually that's not the only "problem" with this diagram. A lot of values are "strange"... :LOL:
 
I'm no expert, but could those values just be the maximums?

Like, the most it can take is 1 GB, but most likely they will be using two pools of 256 MB or less because of cost?

Much like how the X800 XT PE docs say 512 megs, but the card only uses 256 megs.
 
We already discussed this patent... I believe Pana posted about it some months ago.
I don't think it is related to PS3... but hey, it's just a feeling.
 
Separate CPU and geometry processors... No, this is not PS3.

And a big freakin :LOL: to that diagram by the way - "wishful thinking" is probably the best way to describe it.

Anyway, I'm surprised the GPU half of it is only clocked at 800MHz. :LOL:
 
nAo said:

When the guy who made that pic was off in la-la land anyway, I thought he might as well clock the GPU at at least half CPU clock or some such, considering he gave the thing 100GB/s main memory bandwidth and 1GB RAM...

Only 800MHz... That almost sounds...realistic! :LOL:
 
version said:
this is PSP gpu, very old

Thanks... Okay, you seem pretty sure it's the PSP GPU, why? Also, if it is the PSP GPU, could the 'principles' of the primitive buffer be applied to the PS3 GPU?

And that diagram, where did you pull that out from! :oops: Are you claiming it is legit or your interpretation? :D
 
Version, thanks for the posts...seriously, I don't want to sound cynical or anything but certain aspects of that diagram just look out of place as pointed out by others...

1. Memory of 1GB @ 100 GB/s...but there are 2* 256 MB blocks, huh? There are 4*32 bit channels to the RAM? People were estimating a total of 256 MB @ 50 GB/s with the recent announcements by RAMBUS...

2. The GPU eDRAM is quad-ported, and that has already been mentioned in the brick rendering patent. But 1200 GB/s! :oops: ... 1.2 TERABYTES/sec!!! :oops: Okay, on second thought, PS2 eDRAM was 48 GB/s in 1999, so this is 'only' *25 times* more for 2005/2006?

3. The GPU suggests 30 GPolys/s... 30 BILLION polys/sec! :oops: Surely you cannot be serious! ;) PS2 GS was 75 Mpolys/sec, so this is 400 times more! Even if that poly figure were replaced by pixels for fillrate... that would still be a 'massive' figure. So what would the fillrate be for this *beast* then, 1 trillion pixels/sec? :oops: ;)

4. The CPU at 3.6 GHz and GPU at 800 MHz seem reasonable by comparison! ;)
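The ratios in points 2 and 3 are easy to check; a quick Python calculation using only the figures quoted in this post (the diagram's numbers are unverified):

```python
# Figures as quoted in the post; the diagram's values are unverified.
ps2_edram_gb_s = 48.0        # PS2 GS eDRAM bandwidth (1999), GB/s
diagram_edram_gb_s = 1200.0  # GPU eDRAM bandwidth in the diagram, GB/s
ps2_polys_s = 75e6           # PS2 GS peak, polygons/s
diagram_polys_s = 30e9       # GPU peak in the diagram, polygons/s

edram_ratio = diagram_edram_gb_s / ps2_edram_gb_s
poly_ratio = diagram_polys_s / ps2_polys_s
print(f"eDRAM bandwidth: {edram_ratio:.0f}x PS2, polygon rate: {poly_ratio:.0f}x PS2")
```

So the eDRAM jump (25x) is modest next to the polygon jump (400x), which is why the 30 GPolys/s figure stands out.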
 
Suppose that in one of your classes this fall, a term paper represents a huge chunk of your grade. Okay? Okay! Now let's fast forward to the last week of the semester. After putting a ton of energy into your paper, carefully choosing, researching and writing on a topic, would you be willing to let someone you don't even know plagiarize it for absolutely no credit or compensation? On the stranger's part, he or she has nothing to lose (and everything to gain). But as for you, such benevolence could not only compromise your grade in the class but your GPA as well ... :|

Let's say STI have been working on a term paper of sorts and have invested countless hours -- and several billion dollars -- toward that end. How likely are they to reveal their findings to a general public teeming with potential competitors? ;)
 
Pepto-Bismol said:
How likely are they to reveal their findings to a general public teeming with potential competitors? ;)

They will reveal initial specs to devs who are only human, well I use the term human *loosely*! ;)

Competing companies will have their own portfolios and intelligence of 'insider info' about their competitors, more so than the public domain... and they're not going to reveal that to the general public either! ;)

Sony's competitors aren't going to change their specs because of an alleged 'leak' on some console forum! :p
 
Well, 30 billion polys does not sound so outrageous if this is a comparable figure to PS2's / GS's 75 million, which was announced back in 1999. If PS3 has a strong vertex compression scheme, it might be possible. Of course, PS3 games will not get anywhere near 30 billion polygons. Maybe 1-2 billion at most without heavy shaders & textures (only light texturing, shading and effects), and probably down into the hundreds of millions of polys with heavy shaders and effects.
 
Megadrive1988 said:
Well, 30 billion polys does not sound so outrageous if this is a comparable figure to PS2's / GS's 75 million, which was announced back in 1999. If PS3 has a strong vertex compression scheme, it might be possible. Of course, PS3 games will not get anywhere near 30 billion polygons. Maybe 1-2 billion at most without heavy shaders & textures (only light texturing, shading and effects), and probably down into the hundreds of millions of polys with heavy shaders and effects.

Megadrive, I would sincerely 'like' to believe those figures...Drooooool...*Drowns in Drool...*

How about another take on the 30 GPolys/s, assuming 32 GFlops per APU @ 4 GHz,

CPU @ 3.6 GHz, with 32 APUs >>> 28.8 GFlops/APU * 32 >>> = 921.6 Gflops

GPU @ 0.8 GHz, 16 APUs >>> 6.4 GFlops/APU * 16 >>> = 102.4 Gflops

CPU + GPU = 921.6 + 102.4 = 1024 Gflops! Voilà, your 1 Tflops PS3! :LOL:
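The arithmetic above, reproduced in Python under the post's own assumption that each APU does 32 GFlops at 4 GHz and scales linearly with clock (8 flops per cycle); these are speculative figures, not published specs. 28.8 x 32 works out to 921.6, which together with 102.4 gives exactly 1024:

```python
# Post's assumption: 32 GFlops per APU at 4 GHz, i.e. 8 flops per cycle,
# scaling linearly with clock. Speculative figures, not published specs.
FLOPS_PER_CYCLE = 32.0 / 4.0


def peak_gflops(clock_ghz: float, n_apus: int) -> float:
    """Peak GFlops for n_apus APUs running at clock_ghz."""
    return FLOPS_PER_CYCLE * clock_ghz * n_apus


cpu = peak_gflops(3.6, 32)  # 28.8 GFlops/APU * 32 APUs
gpu = peak_gflops(0.8, 16)  # 6.4 GFlops/APU * 16 APUs
print(f"CPU {cpu:.1f} + GPU {gpu:.1f} = {cpu + gpu:.1f} GFlops")
```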

The two strangest variables I see in that diagram are the '30 GPolys/s' and the '1 GB RDRAM @ 100 GB/s'. Assuming the 30 Gpoly/s figure is the 'max' throughput with 1 GB RDRAM @ 100 GB/s (which I doubt), how about:

30 Gpoly/s with 1024 MB @ 100 GB/s...or
15 Gpoly/s with 512 MB @ 100 GB/s....or
7.5 Gpoly/s with 256 MB @ 100 GB/s...or
3.75 Gpoly/s with 256 MB @ 50 GB/s...>>> More likely! :p
3.75 Gpoly/s with 128 MB @ 100 GB/s...>>> Possibility with 256 Mbits :)

...and 3.75 Gpoly/s, it's only 50* more than (PS2 @ 75 Mpoly/s)! :p
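The list above implicitly scales the polygon rate linearly with both memory size and bandwidth, starting from the diagram's 30 Gpoly/s at 1024 MB @ 100 GB/s. That is the post's back-of-the-envelope assumption (polygon throughput doesn't really scale with memory size), but a short Python sketch makes the model explicit:

```python
BASE_GPOLYS = 30.0   # diagram's claimed peak, Gpoly/s
BASE_MB = 1024       # diagram's claimed memory size, MB
BASE_GBPS = 100.0    # diagram's claimed bandwidth, GB/s


def scaled_gpolys(mb: int, gbps: float) -> float:
    """Post's linear model: poly rate proportional to memory size and bandwidth."""
    return BASE_GPOLYS * (mb / BASE_MB) * (gbps / BASE_GBPS)


configs = [(1024, 100.0), (512, 100.0), (256, 100.0), (256, 50.0), (128, 100.0)]
for mb, gbps in configs:
    print(f"{scaled_gpolys(mb, gbps):5.2f} Gpoly/s with {mb} MB @ {gbps:.0f} GB/s")

ps2_mpolys = 75.0
vs_ps2 = scaled_gpolys(256, 50.0) * 1000 / ps2_mpolys  # Gpoly -> Mpoly, ratio
print(vs_ps2)
```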

Edit: XDR RAM, 4 x 256 Mbit chips (32-bit each) @ 6.4 GHz = 128 MB @ ~100 GB/s :) ... Distinct possibility with recent Rambus announcements! :p
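The XDR arithmetic in the edit, spelled out in Python. This assumes four 256 Mbit chips, each with a 32-bit interface, and treats the post's "6.4 GHz" as a 6.4 Gbit/s per-pin data rate; these are the post's speculative numbers, not a confirmed PS3 configuration:

```python
# Post's speculative XDR configuration, not a confirmed PS3 spec.
chips = 4
mbit_per_chip = 256      # 256 Mbit devices
pins_per_chip = 32       # 32-bit interface per chip
gbit_s_per_pin = 6.4     # "6.4 GHz" read as 6.4 Gbit/s per pin (assumption)

capacity_mb = chips * mbit_per_chip / 8                      # Mbit -> MB
bandwidth_gb_s = chips * pins_per_chip * gbit_s_per_pin / 8  # Gbit/s -> GB/s
print(f"{capacity_mb:.0f} MB @ {bandwidth_gb_s:.1f} GB/s")
```

The exact figure is 102.4 GB/s, which the post rounds to 100 GB/s.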


Vysez said:
If this diagram were legit and true, it would be like a (Kutaragi Ken's) dream come true... (with the exception of the 400MHz difference in the BE.... Bring them back!! ).

I could see diehard PS3 nuts trying to 'overclock' the CPU to 4 GHz for that authentic 1TFLOP (TM) BE! :p
 

----------------

I linked this thread to the Ars Technica forums and got a couple of good posts from a developer (I think) called Jason Watkins, who summarized the three patents posted in this thread nicely! Thanks Jason... :p

First post,

Jason Watkins said:
The first picture appears to be something other than the PS3, as it shows a system bus and cpu arrangement that looks pretty traditional. I think the guess about the PSP is more likely than this being part of PS3.

It does appear to be a deferred renderer. The term "primitives" as used in the patent appears to refer to triangles or some other form of polygons (quads, quadratic patches maybe). So nothing here suggests "tiled renderer" in the same sense as PowerVR or its ilk. In fact the presence of a zbuffer suggests it's extremely unlikely this is a sort-first tiled architecture. It looks much more like hardware built with a few tweaks to do deferred rendering a little faster: do a single quick pass to burn in the zbuffer, marking primitives that are completely invisible, then do however many subsequent passes you need to build the image.

The 2nd graphic posted by version mostly matches what I can remember from reading the Cell patent. Other than exact details of how much RAM, clock speed, a few bus widths and what peripheral blocks are hanging off it, we can be quite sure this is the configuration they'll use in PS3: two chip modules, one with 8 APUs and one with 4 APUs and 4 'GS 2' chips, which we can assume are a variation on what's currently in PS2.

We know quite a bit about the Cell architecture chips and software as a whole, but we don't know much about what the GS2 is. I strongly suspect it will turn out to be very similar to the current GS (favoring deep multi-pass over long shaders)... which would be a little unfortunate. Deep multi-pass makes sense for embedded RAM and low resolutions, so it does make some sense for a console, but it also makes ports from ati/nvidia style hardware complicated. Most PS2 developers I've heard grumbling would have preferred a design more like ati/nvidia.

A clean sheet ideal concept for the PS3 graphics chip would be irregular rasterization. *That* would be extremely useful in fact. But I doubt very much we'll see that.


Second post,

Jason Watkins said:
quote:
------------------------------------------------------------------------
principles of the 'primitive buffer' from the patent for the PS3 GPU for deferred rendering?
------------------------------------------------------------------------
Well, there's nothing unique about the 'primitive buffer' as mentioned in the patent. As it's described, it seems more or less equivalent to vertex arrays stored in local card memory on current ati or nvidia hardware, combined with the ability to write attributes back to the vertex array from the pixel processing blocks. So it's perhaps more correct to think of it as a more flexible, zero-copy variation on rendering to a render target followed by 'interpret as vertex array.' Interesting stuff, but not dramatic.

You can do deferred rendering on ati and nvidia for multiple passes. You do a first pass that burns in the zbuffer, and then all subsequent passes leverage that information. Because ati and nvidia hardware have early z rejection techniques, after you've burned in a zbuffer, following passes will save considerable pixel processing bandwidth.
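Jason's point about burning in the zbuffer can be illustrated with a toy Python model that counts shaded fragments. It assumes fragments are already rasterized as (pixel, depth) pairs, a less-than depth test when there is no prepass, and an equal depth test after a depth-only prepass; real early-z hardware works on blocks of pixels, which this ignores. The function names are hypothetical:

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Fragment = Tuple[Pixel, float]  # (pixel, depth); smaller depth = closer


def shaded_without_prepass(frags: List[Fragment]) -> int:
    """One shading pass with a less-than depth test: a fragment is shaded
    whenever it is closer than what the z-buffer holds at that moment."""
    zbuf: Dict[Pixel, float] = {}
    shaded = 0
    for pix, z in frags:
        if z < zbuf.get(pix, float("inf")):
            zbuf[pix] = z
            shaded += 1
    return shaded


def shaded_with_prepass(frags: List[Fragment]) -> int:
    """Depth-only prepass burns in the z-buffer; the shading pass then uses an
    equal depth test, so early-z rejects every hidden fragment."""
    zbuf: Dict[Pixel, float] = {}
    for pix, z in frags:  # prepass: depth writes only, no shading
        zbuf[pix] = min(z, zbuf.get(pix, float("inf")))
    return sum(1 for pix, z in frags if z == zbuf[pix])


# Three layers drawn back to front at one pixel, plus one lone fragment.
frags = [((0, 0), 0.9), ((0, 0), 0.5), ((0, 0), 0.1), ((1, 0), 0.3)]
print(shaded_without_prepass(frags), shaded_with_prepass(frags))
```

Drawn back to front, all three stacked layers get shaded without a prepass, but only the front one with it, and the same reduced count applies to every later pass. Submitted front to back, the same fragments would give 2 even without a prepass; the prepass makes the saving independent of draw order.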

quote:
------------------------------------------------------------------------
Did you read the end of the first post, which has a reference link to another related Sony patent?
------------------------------------------------------------------------
Nope, I had missed that. The link came up with a patent for text entry on a game controller... didn't realize I needed to skip forward in the search result list they linked to.

Now that, quite clearly, is a tiled/deferred renderer very much in the style of PowerVR. But it goes one step further to sort and manage primitives by a bounding box hierarchy, which overcomes one of PowerVR's real limits (building a BSP of primitives per tile). It also appears to be able to store and composite overlapping 'bricks' (tiles).
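For comparison with the PowerVR-style approach described here, a minimal Python sketch of flat tile binning: each triangle is assigned to every screen tile its bounding box overlaps. This is a conservative, single-level scheme assumed for illustration, not the bounding box hierarchy from the Sony patent; `bin_triangles` is a hypothetical name:

```python
import math
from typing import Dict, List, Tuple

Vertex = Tuple[float, float]
Tri = Tuple[Vertex, Vertex, Vertex]
TileKey = Tuple[int, int]


def bin_triangles(tris: List[Tri], tile: int, width: int,
                  height: int) -> Dict[TileKey, List[int]]:
    """Map each (tile_x, tile_y) to the indices of triangles whose
    screen-space bounding box overlaps that tile."""
    bins: Dict[TileKey, List[int]] = {}
    max_tx, max_ty = (width - 1) // tile, (height - 1) // tile
    for i, tri in enumerate(tris):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Clamp the bounding box to the screen, in tile units.
        tx0 = max(0, math.floor(min(xs)) // tile)
        tx1 = min(max_tx, math.floor(max(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        ty1 = min(max_ty, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):      # empty range if fully off-screen
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(i)
    return bins


tris = [((1, 1), (10, 1), (1, 10)),          # inside tile (0, 0)
        ((20, 20), (40, 20), (20, 40)),      # straddles four tiles
        ((-10, -10), (-5, -10), (-10, -5))]  # off-screen, binned nowhere
print(bin_triangles(tris, tile=32, width=64, height=64))
```

The bounding-box test is conservative: a thin diagonal triangle can be binned into tiles it never actually touches, while fully off-screen triangles are dropped.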

This sounds *very* much like the Talisman design.


They mention grouping by motion characteristics (i.e. velocity vector) and LOD. That's very interesting, since it opens the idea of rendering different groups of geometry at differing rates, and using the bricks/tiles stored in the image storage unit as impostors to be warped. If that's true, then this is a huge departure from what I would have expected in PS3 based on the PS2 GS. I also don't think this is necessarily good: Talisman failed for reasons, reasons that still apply: RAM is cheap, interframe coherence is complicated, and interframe coherence with shadow maps is almost intractable.

The Salc/Salp stuff is interesting... but I'm kind of skeptical. First, it's just a design for a programmable pipeline, so it doesn't tell much about what might be surrounding that programmable block. It doesn't tell us much about the larger structure of the GS2 (assuming this patent is part of the GS2 design). So I still doubt they'll be supporting irregular rasterization. Second, bit-serial ALUs have been around a long time... and I'm sure someone has thought about chaining them together before. Bit-serial stuff mostly failed in the dark days, but since graphics specifically can tolerate latencies, it makes sense. Interesting stuff.

Here's a paper describing irregular rasterization. There are a few papers out now that are all variations on the same idea using differing implementations.

quote:
------------------------------------------------------------------------
the figures in the second image posted by 'version', didn't disturb you?
------------------------------------------------------------------------
Not really. I tend to ignore meaningless peak figures like XX million triangles/sec. Also, I remembered most of the specs from the patent. It mentions a typical 64 meg per chip module implementation at 1 GHz, which seems much more reasonable and likely to me.

quote:
------------------------------------------------------------------------
BTW, there are rumours that PS3 will be OpenGL based! Which should ease ports from ATI/ Nvidia style hardware!
------------------------------------------------------------------------
Now that is interesting. OpenGL ES dumps a lot of the old baggage, in particular display lists, which would be the most straightforward feature of OpenGL to map to the PS2 GS.

So, on topic then, assuming the above rendering patents are going into PS3, we can call the PS3 a 'tile-based deferred renderer' to join the likes of DC before it then? :p

Anyone also remember 'Talisman' :?: I vaguely remember reading about it in the 90s about revolutionizing 3D rendering using a tile based system really cheaply! :p

Also the irregular rasterization sounds interesting, using an irregular z-buffer technique! :p I think the PS3 architecture would lend itself to it quite nicely! 8) What do you guys think :?:
 