SCE patent: PSP very fast multi-pass ? VERY INTERESTING HSR

What do you think about this ? ;)

http://makeashorterlink.com/?Q29E652D5

This is not exactly what the PSP ended up looking like, but it resembles it pretty well if you look at the Images section of the patent ( the patent has no HOS support, but it does not show the final evolution of the chipset ).

There are some neat ideas, such as hardware-supported stencil and a very efficient multi-pass rendering technique ( it has a primitive buffer which stores T&L'd vertices, so even if you are doing multi-pass rendering, e.g. for multi-texturing, you send geometry data to the Rasterizer/GPU only once ): you can read a bit about it in the quote I posted a few lines down in this post.

From a chat session I am having with a friend:

Panajev2002a: Look at the image sections ( your browser should be able to read TIFF files )
Panajev2002a: That described GPU
tempusknight: what is so special about this?
Panajev2002a: lacks one thing compared to PSP and that would be Hardware tessellation
Panajev2002a: but for the most part that looks VERY similar to what the PSP looks like
tempusknight: novembre 2002 =/
tempusknight: november*
Panajev2002a: The author of that patent also made another patent in which he describes a method of automatic tessellation anyways
Panajev2002a: quite recent huh ? ;)
tempusknight: yea
Panajev2002a: That chipset has some nice ideas:
Panajev2002a: 1) Hardware supported stencil buffer ( that can help for games like Silent Hill 2 and Silent Hill 3 which use Shadow Volumes )
Panajev2002a: 2) it does multi-pass rendering, but it has a Primitive Buffer which holds on to the transformed data sent by the T&L processor and keeps it there until the multi-pass rendering engine has finished its work
Panajev2002a: Normally with multi-pass rendering ( also used to do multi-texturing ) you would have to re-send geometry and this idea lets you send geometry only once
Panajev2002a: effectively this makes it a single pass all effects machine
Panajev2002a: you do not have to re-send the triangle data as it will not leave the primitive buffer until you did all the rendering passes you needed that data for
Panajev2002a: This saves quite a bit of bandwidth
tempusknight: cool
Panajev2002a: and would be ideal for a portable solution where saving bandwidth and processing power ( more waste == more power wasted ) is important
Panajev2002a: as it reduces power consumption

Edit: to reach the patent go here

http://patft.uspto.gov/netahtml/search-bool.html

Put Nobuo in the first field and put Sony Computer in the other field ( and for this one choose Assignee name ).

From here choose "Image processing device, image processing method and program distribution medium and data distribution medium for processing images"



This other patent, by the same author, seems at first glance to describe a method for hardware-based subdivision.
 
Galian beast LMAO.

I talk to him on a daily basis, he doesn't think PS3 will use blu-ray though. Because of "cost", he won't believe otherwise.
 
Pan, I'm wondering about this primitive buffer... For complex scenes it might well require QUITE a bit of memory. I wonder if it really would be enough to fit everything.

Maybe it's not meant to hold the entire scene?


*G*
 
It is meant to hold a portion of the scene, not the whole scene.

Basically you would fill it with the primitives that need to be rendered in multiple passes, and you would keep streaming new primitives in as the fully rendered ones are deleted from the buffer.
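A rough sketch of that streaming idea, to make it concrete. It assumes a fixed-capacity buffer that is refilled one batch at a time (simpler than streaming primitive by primitive) and the same number of passes for every batch. The names `stream_through_buffer`, `capacity` and `num_passes` are made up for illustration and do not come from the patent.

```python
from collections import deque

def stream_through_buffer(primitives, capacity, num_passes):
    """Hypothetical model: fill a fixed-size primitive buffer, run every
    pass over what is resident, retire it, then stream in the next batch."""
    pending = deque(primitives)
    uploads = 0   # how many primitives were sent to the rasterizer side
    draws = []    # (pass_index, primitive) in the order they are rendered
    while pending:
        # Fill the buffer with as many primitives as fit.
        batch_size = min(capacity, len(pending))
        resident = [pending.popleft() for _ in range(batch_size)]
        uploads += len(resident)
        # All passes reuse the resident data; nothing is re-sent.
        for pass_index in range(num_passes):
            for prim in resident:
                draws.append((pass_index, prim))
        # Leaving the loop body drops the fully rendered batch,
        # which frees the buffer for the next one.
    return uploads, draws

names = ["tri%d" % i for i in range(10)]
uploads, draws = stream_through_buffer(names, capacity=4, num_passes=3)
print(uploads, len(draws))
```

With 10 primitives, a 4-entry buffer and 3 passes, each primitive is uploaded once but drawn three times, so the buffer size limits the batch size, not the scene size.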
 
[0070] The primitive buffer 21 stores the geometry data for at least one group of primitives as an example of attribute data sets for the primitives. One group of primitives may be a set of primitives that form an image on a single screen of the display 41 or a single object. Each primitive is assigned with an identification number (hereinafter, referred to as a "primitive number") as the identification information to identify the primitives. The primitive number is included in the attribute data set.

It holds the primitives on a per-object basis; how many objects fit depends on the size of the buffer.

But I have discovered something very interesting...


[0122] The rendering processor 16 performs the Z buffer drawing for all geometry data that are stored in the primitive buffer 21, and writes the Z value(s) close to the point of view in the Z buffer (step S10). After the completion of the Z buffer drawing, the rendering processor 16 clears the visible flags table 62 of the visible flags control section 29 to set all visible flags to "0" (step S20). Then, the test pass is performed and the visible flag in the visible flags table 62 is changed based on the result of the test pass (step S30). During the test pass, the visible flags table 62 is supplied with the flags indicating that the primitives are those to be displayed on the display 41 or those not to be displayed thereon. The rendering processor 16 checks the visible flags table 62 and deletes the geometry data of the primitives that are not to be displayed on the display 41 from the geometry data stored in the primitive buffer 21. In other words, the relevant primitives are deleted (step S40). After the deletion of the primitives, the rendering processor 16 performs multipass rendering for the primitives that are to be displayed on the display 41 (step S50).
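Here is a small software model of steps S10 to S50 as I read them in that paragraph. It is only a sketch under assumptions: primitives are given as lists of already rasterized (x, y, z) fragments, smaller z means closer to the viewer, and the visible flag is kept per primitive. The function name `cull_and_render` and the data layout are mine, not the patent's.

```python
def cull_and_render(primitives, num_passes):
    """primitives: dict mapping primitive number -> list of (x, y, z) fragments.
    Hypothetical model of steps S10-S50; smaller z is assumed to be closer."""
    # S10: Z buffer drawing, keep the closest z for every pixel.
    zbuf = {}
    for frags in primitives.values():
        for x, y, z in frags:
            if z < zbuf.get((x, y), float("inf")):
                zbuf[(x, y)] = z
    # S20: clear the visible flags table.
    visible = {pid: 0 for pid in primitives}
    # S30: test pass, flag a primitive if any of its fragments passes the Z test.
    for pid, frags in primitives.items():
        if any(z <= zbuf[(x, y)] for x, y, z in frags):
            visible[pid] = 1
    # S40: delete the primitives whose flag is still 0.
    kept = {pid: frags for pid, frags in primitives.items() if visible[pid]}
    # S50: multipass rendering over the surviving primitives only.
    drawn = [(p, pid) for p in range(num_passes) for pid in kept]
    return visible, drawn

scene = {
    1: [(0, 0, 0.2), (1, 0, 0.2)],  # in front
    2: [(0, 0, 0.8)],               # fully hidden behind primitive 1
    3: [(1, 0, 0.9), (2, 0, 0.9)],  # partly hidden, still visible at (2, 0)
}
visible, drawn = cull_and_render(scene, num_passes=2)
print(visible)
print(drawn)
```

In this toy scene primitive 2 is dropped before the multipass stage, while the partly covered primitive 3 survives whole, which is what a per-primitive flag would imply.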

Quite a bombshell when combined with 664 MPixels/s of fill-rate.
 
The sort dependent fragment AA was a bit disappointing, as far as fast AA goes that is pretty lame ... but there is always supersampling.
 
They still have FSAA for that, but I was looking at the Hardware based HSR method ( in addition to the Z-buffer ), which basically forces the DOOM 3 rendering method ( order of rendering passes ) in HW.

This added to Stencil support in Hardware, 664 MPixels/s, HOS support in Hardware, Hardware Clipping, Morphing and Skinning ( 8 bones ) and the Primitive Buffer for fast multi-pass rendering make for an interesting GPU :)
 
MfA said:
The sort dependent fragment AA was a bit disappointing, as far as fast AA goes that is pretty lame ... but there is always supersampling.

Did you not see the OC part before? You never mentioned it now that I think about it.
 
[0094] The rendering pass may be performed on subpixel basis rather than on pixel basis to render images of higher resolution. In such a case, a Z buffer for subpixels is used. Images are then reduced in size to obtain actual pixels after the completion of rendering of the subpixels.
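To illustrate the "reduced in size" step: the patent text does not say which filter is used, so the plain box filter below (average each block of subpixels) is only my assumption, and `downsample` is a made-up name.

```python
def downsample(subpixels, factor):
    """Average each factor x factor block of a 2D list of subpixel values.
    Box filtering is an assumption; the patent only says the image is reduced."""
    height = len(subpixels) // factor
    width = len(subpixels[0]) // factor
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [subpixels[y * factor + j][x * factor + i]
                     for j in range(factor)
                     for i in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# 4x4 subpixel image rendered at twice the resolution in each direction.
hi_res = [
    [0,   0, 255, 255],
    [0, 255, 255, 255],
    [0,   0,   0,   0],
    [0,   0,   0, 255],
]
print(downsample(hi_res, 2))  # [[63.75, 255.0], [0.0, 63.75]]
```

Edge pixels end up with in-between values, which is the anti-aliasing effect, at the cost of filling four times as many subpixels.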
 
Mfa, Vince, etc... mind bringing your minds over this patent again ? I am glad you had a nice talk and all, but we all want to learn more here :p
 
It does occlusion culling by doing two passes. One where Z is calculated and stored. The second pass then tests Z and only renders visible pixels (early Z rejection). The primitive buffer is there to ensure that it doesn't have to do transformation twice.
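A toy per-pixel version of that two-pass idea, counting how many fragments actually get shaded. Everything here is an illustrative assumption: fragments arrive as (x, y, z) tuples, smaller z is closer, and `shaded_fragments` is a made-up name.

```python
def shaded_fragments(fragments, z_prepass):
    """fragments: list of (x, y, z) in submission order, smaller z = closer.
    Returns how many fragments would be fully shaded."""
    if z_prepass:
        # Pass 1: compute and store the nearest z per pixel, no shading.
        nearest = {}
        for x, y, z in fragments:
            nearest[(x, y)] = min(z, nearest.get((x, y), float("inf")))
        # Pass 2: shade only the fragments that match the stored depth.
        return sum(1 for x, y, z in fragments if z == nearest[(x, y)])
    # Plain Z buffer: shade whenever a fragment is nearer than what is stored.
    zbuf = {}
    count = 0
    for x, y, z in fragments:
        if z < zbuf.get((x, y), float("inf")):
            zbuf[(x, y)] = z
            count += 1
    return count

# Worst case for a plain Z buffer: back-to-front submission at pixel (0, 0).
frags = [(0, 0, 0.9), (0, 0, 0.5), (0, 0, 0.1), (1, 0, 0.3)]
print(shaded_fragments(frags, z_prepass=False))  # 4
print(shaded_fragments(frags, z_prepass=True))   # 2
```

The saving grows with depth complexity and with the number of rendering passes spent on each visible pixel.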

Only usable AA is done with supersampling.

Cheers
Gubbi
 
You are correct Gubbi :)

Having the Primitive buffer that avoids re-transformation for multi-pass rendering ( re-T&L we should say :) ) kills two of the main issues with multi-pass rendering:

1.) efficiency lost for the T&L processor, which has to waste cycles re-processing data.

2.) waste of bandwidth re-transmitting data to the Rasterizer unit.
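Those two costs can be put into a back-of-the-envelope model. The vertex count, pass count and bytes per vertex below are invented numbers for illustration, and `multipass_cost` is a hypothetical helper, not something from the patent.

```python
def multipass_cost(num_vertices, num_passes, bytes_per_vertex, buffered):
    """Return (vertices transformed, bytes sent to the rasterizer).
    Toy model: without a primitive buffer, every pass repeats T&L and the transfer."""
    repeats = 1 if buffered else num_passes
    transformed = num_vertices * repeats
    return transformed, transformed * bytes_per_vertex

# Invented workload: 10,000 vertices, 4 passes, 32 bytes per transformed vertex.
print(multipass_cost(10_000, 4, 32, buffered=False))  # (40000, 1280000)
print(multipass_cost(10_000, 4, 32, buffered=True))   # (10000, 320000)
```

In this simplified model both the T&L work and the geometry traffic stop scaling with the number of passes once the transformed data stays in the buffer.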

IIRC, this technique allows for more than just a few rendering passes using the visible geometry stored in the P-buffer; it seems you could do passes up to fill-rate limits.

The thing I disagree with is that this system seems to reject only fully occluded triangles and not individual pixels.

What do you think ?
 
No, I think that is just the usual patent mumbo-jumbo, where in one paragraph an element is a tri and in the very next paragraph a pixel. They do say that "the element" has an X, Y and Z value (and hence is a pixel).

They spare the retransformation of geometry but have to buffer the transformed data. So they'll be wasting a rather large amount of on-chip capacity (or bandwidth).

Overall it still seems like a brute force approach (especially the AA) with a few sprinkles of clever.

Cheers
Gubbi
 
I just lost a several-page post due to Mozilla crashing ( it was the Quicktime plugin's fault, this PC did not have Alternatiff installed yet [for patent Images, which are .TIFF files] :( ).

However, I did way too much talking, and this time I think pictures can do the talking for me :)

I wanted to clarify one thing: there is a bandwidth saving, as the Primitive buffer does not steal bandwidth from the GPU's VRAM or from main RAM, so it leaves more bandwidth for texture streaming.

What I like the most is that in any case of multi-pass rendering the T&L chip receives no extra work :) ( but I have said it so many times that it is not very new to you :LOL: ).

So, let's talk again about the P-buffer.

Here is a picture of the patent's chipset:

chipset.png


These are the diagrams for the three rendering steps:

Z-buffer.png



Test.png



Multipass.png



To me, this seems an early-Z mechanism that rejects polygons/primitives and not pixels.

This is how one of the diagrams shows a primitive stored in the P-buffer:

Primitive.png
 
chipset.png


Now ONLY if we could find a PS3 version of this picture somewhere in a Sony patent...

We do have that picture of BE and VS.. But that doesn't cut it.

Pana, Vince; to the batmobile! We must search thousands of patents for elusive ps3 diagrams.
 
Paul said:
Now ONLY if we could find a PS3 version of this picture somewhere in a Sony patent...

We do have that picture of BE and VS.. But that doesn't cut it.

Pana, Vince; to the batmobile! We must search thousands of patents for elusive ps3 diagrams.

Paul, doesn't that picture look like the PSP ? :p

Please do not post it everywhere, bandwidth is limited :(
 