PowerVR Series5

Panajev2001a · Apr 21, 2004

Then you had to pull Volume Modifiers and Hardware Translucency sorting from the next PVR cards, so no more awards for you.

(beating a dead horse take 20)

Simon F · Apr 21, 2004

Panajev2001a said:
Cough... Graphics Synthesizer... cough...

Hey, I am a fan of fat 2,560-bit data paths.

I'm curious about the design.

I know it has 16 texture units, but how many clocks does it take for each read/modify/write process?

If you have two, small sequentially adjacent translucent polygons that overlap each other, do you get full performance?

jvd · Apr 21, 2004

Simon F said:
Panajev2001a said:

Cough... Graphics Synthesizer... cough...

Hey, I am a fan of fat 2,560-bit data paths.

I'm curious about the design.

I know it has 16 texture units, but how many clocks does it take for each read/modify/write process?

If you have two, small sequentially adjacent translucent polygons that overlap each other, do you get full performance?

I'm not sure if I'm right, but doesn't every two pipes share a TMU? That is why its maximum fillrate is cut in half when it is texturing as well as filling the polygons.

Which, if I'm right about that, is a sucky design.

Simon F · Apr 21, 2004

Panajev2001a said:
Then you had to pull Volume Modifiers and Hardware Translucency sorting from the next PVR cards, so no more awards for you

As explained previously....
1) There are no Volume Modifiers in DX or OGL - just stencils. MS would not add the more efficient VMs because it was damn-near impossible to do on IMRs.
2) Yes that's a shame but PC developers were too [insert as applicable] to disable their translucency sorting code in the games so there was no point in leaving it in the hardware.

nAo · Apr 21, 2004

Simon F said:
If you have two, small sequentially adjacent translucent polygons that overlap each other, do you get full performance?

Dunno about that, but I did several tests with alpha blending enabled/disabled on complex scenes (200,000+ triangles per frame) and couldn't measure any relevant performance difference.
Alpha blending is virtually free on the PS2. In fact I keep it activated all the time, because on the GS it is much simpler to switch to a blending equation that gives you no blend than to switch alpha blending off.
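
A minimal sketch of the "blending equation that gives you no blend" trick. It assumes a blend stage of the form ((A - B) * C >> 7) + D on 8-bit channels, where a coefficient of 0x80 means 1.0; that form, and the names `blend_channel` and `blend_pixel`, are illustrative assumptions and not a register-level description of the GS.

```python
def blend_channel(a, b, c, d):
    """Assumed blend form: ((A - B) * C >> 7) + D, clamped to 8 bits."""
    return max(0, min(255, (((a - b) * c) >> 7) + d))


def blend_pixel(src, dst, src_alpha, mode):
    """Blend an RGB source pixel over a destination pixel.

    mode "alpha":       A=Cs, B=Cd, C=As, D=Cd  (ordinary source-over blend)
    mode "passthrough": A=Cs, B=Cs, C=As, D=Cs  (A - B is zero, so output is Cs)
    """
    if mode == "alpha":
        return tuple(blend_channel(s, d, src_alpha, d) for s, d in zip(src, dst))
    if mode == "passthrough":
        return tuple(blend_channel(s, s, src_alpha, s) for s in src)
    raise ValueError("unknown mode: " + mode)


src, dst = (200, 100, 50), (10, 20, 30)
print(blend_pixel(src, dst, 64, "alpha"))        # half-strength blend
print(blend_pixel(src, dst, 64, "passthrough"))  # source written unchanged
```

With blending left enabled, choosing inputs so that A equals B makes the multiply term vanish, which has the same visible effect as turning blending off.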

ciao,
Marco

JohnH · Apr 21, 2004

Ailuros said:
I understood your point perfectly well; my question was heading elsewhere, namely in the direction of the grid pattern.

I should probably ask first whether, past a certain number of samples (let's say 16x), a sparse grid really makes any significant difference anymore, yet I can theoretically see a very high edge equivalent resolution even on just an 8x sparsely sampled MSAA pattern, for example ( 8*8 ).

Clearly the more samples the better, yet I don't think the resulting grid is irrelevant after all.
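
As a rough illustration of the "8*8" figure in the quote: edge equivalent resolution can be estimated by counting the distinct horizontal and vertical sample offsets inside a pixel. The two 8-sample patterns below are made up for this sketch (sub-pixel positions on an 8x8 lattice) and are not taken from any particular chip.

```python
def edge_equivalent_resolution(samples):
    """Return (distinct x offsets, distinct y offsets) for a sample pattern."""
    xs = {x for x, _ in samples}
    ys = {y for _, y in samples}
    return len(xs), len(ys)


# Hypothetical ordered grid: 4 columns by 2 rows.
ordered_8x = [(x, y) for y in (1, 5) for x in (0, 2, 4, 6)]

# Hypothetical sparse pattern: every row and every column holds one sample.
sparse_8x = [(0, 5), (1, 2), (2, 7), (3, 0), (4, 4), (5, 1), (6, 6), (7, 3)]

print(edge_equivalent_resolution(ordered_8x))  # (4, 2)
print(edge_equivalent_resolution(sparse_8x))   # (8, 8)
```

Both patterns cost eight samples, but the sparse one resolves eight intensity steps on near-vertical and near-horizontal edges instead of four and two.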

It was just an example. However, the difference can be quite subtle but when you see it back to back you'd always go for the higher sample rate. You also need to stop thinking purely in terms of triangle edges...

John.

Panajev2001a · Apr 21, 2004

Simon F said:
Panajev2001a said:

Cough... Graphics Synthesizer... cough...

Hey, I am a fan of fat 2,560-bit data paths.

I'm curious about the design.

I know it has 16 texture units, but how many clocks does it take for each read/modify/write process?

If you have two, small sequentially adjacent translucent polygons that overlap each other, do you get full performance?

16 Texture units ?

Ahem... how do I break it to him... it has... well... like... 0.

The way I understand it is that we have 8 of the 16 Pixel Engines that do double duty when the GS is drawing textured primitives and act as TMUs.

Edit: The hardware is set up like this:

We have 16 x 2 Mbit DRAM macros and 16 Pixel Engines, each with a 64-bit interface: 32 bits for RGBA and 32 bits for the Z-buffer.

Each DRAM macro seems to be connected to the Pixel Engines through three buses ( there are buffers and caches around to play with, of course ): one 64-bit READ bus, one 64-bit WRITE bus and one 32-bit bus to access texture data.

So you can do READ and WRITE operations in parallel while also accessing texture data.

Texturing mode: the GS writes in a 4x2 pixel pattern

Non-texturing mode: the GS writes in an 8x2 pixel pattern.

Recommended triangle size is 32 pixels in area ( optimal speed ).
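
The arithmetic behind those figures can be checked in a few lines. The constants come straight from the description above; the function names are only for illustration.

```python
DRAM_MACROS = 16
READ_BUS_BITS = 64     # per macro
WRITE_BUS_BITS = 64    # per macro
TEXTURE_BUS_BITS = 32  # per macro

TEXTURED_PATTERN = (4, 2)    # width x height written when texturing
UNTEXTURED_PATTERN = (8, 2)  # width x height written without texturing


def total_datapath_bits():
    """Sum the three per-macro buses over all DRAM macros."""
    per_macro = READ_BUS_BITS + WRITE_BUS_BITS + TEXTURE_BUS_BITS
    return DRAM_MACROS * per_macro


def pixels_per_pattern(pattern):
    """Number of pixels covered by one write pattern."""
    width, height = pattern
    return width * height


print(total_datapath_bits())                   # 2560
print(pixels_per_pattern(TEXTURED_PATTERN))    # 8
print(pixels_per_pattern(UNTEXTURED_PATTERN))  # 16
```

The total matches the 2,560-bit figure quoted earlier in the thread, and the halving from 16 to 8 pixels fits the idea of half the Pixel Engines doing texture duty.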

As nAo and other licensed PlayStation 2 developers can testify, PlayStation 2's strength lies in rendering translucent surfaces and doing fill-rate heavy operations.

Look at the highest profile PlayStation 2 games, think about Z.O.E. 2 and Final Fantasy X ( during the Summons ) or MGS 2 ( some effects use tons of semi-transparent layers: think about the rain or the heat-haze effect when the Harrier is flying upwards ) for example: particle effects, alpha blended surfaces all over the place.

Panajev2001a · Apr 21, 2004

Simon F said:
Panajev2001a said:

Then you had to pull Volume Modifiers and Hardware Translucency sorting from the next PVR cards, so no more awards for you

As explained previously....
1) There are no Volume Modifiers in DX or OGL - just stencils. MS would not add the more efficient VMs because it was damn-near impossible to do on IMRs.
2) Yes that's a shame but PC developers were too [insert as applicable] to disable their translucency sorting code in the games so there was no point in leaving it in the hardware.

I am a broken record, you know me...

1.) Shame on DX and OGL.

2.) Well, to please nostalgic fans like me, no ? Ok, gun pointed to my face means arms up in the sky, turn around, be silent and maybe I can walk away without being shot.

Still, only one to go... Volume Modifiers are now nicely usable in KallistiOS ( a guy released some working source code for "cheap shadows" ).

I have a question: he said that you cannot do "cheap shadows" through Volume Modifiers and use other Volume Modifiers ( for other purposes ) in the same scene at the same time.

Is it true ?

Thanks for the patience Master Simon F.,

Panajev

MfA · Apr 21, 2004

Can the rasterizer in the graphics synthesizer even work on multiple primitives at a time?

Dave B(TotalVR) · Apr 21, 2004

What I would like to see with gen modifier volumes is a way to work out when you are close to the edge of one, so that instead of switching from value set 1 to value set 2 you blend between them.

For a while now I have been trying to work out some sort of system where a volume affects the things inside it according to where they are in the volume. I started with a sphere: define the center as opaque and the outside as clear, and if you can do that in hardware and render it properly then you have hardware smoke volumes. Taking it a step further, apply an arbitrary equation for the 'opacity', or whatever, and there is the possibility of jets of smoke and whatnot. Still haven't figured it out yet though ;p I was thinking that a display list would make this more than possible. I haven't had the time to get on RenderMonkey and try programming it on my 9600, but to me it very much sounds like a small step up from the GMVs found on the Neon250.

Dave

MfA · Apr 21, 2004

They have a patent on that (although personally I don't find it non-obvious ... but hey).

nAo · Apr 21, 2004

MfA said:
Can the rasterizer in the graphics synthesizer even work on multiple primitives at a time?

Unfortunately it can't.

MfA · Apr 21, 2004

Can modern architectures do it even?

I assume that immediate mode renderers which are able to do it will use screen space interleaving a la Bitboys to avoid dependency issues (or to give it its formal name, sort middle).

pmac · Apr 21, 2004

MfA said:
They have a patent on that (although personally I don't find it non-obvious ... but hey).

Is that the patent on depth-based blending ?

It looks like it could be quite useful given all the volumetric effects that crop up in computer games. Here's a couple of snippets from the patent:

[0018] A preferred embodiment of the present invention provides a method and apparatus, which are able to implement volumetric effects, such as forming clouds, efficiently. To do this it provides a set of depth buffer operations which allow depth values to be manipulated arithmetically. These operations allow a depth or blending value to be formed that can be representative of the distance between the front and back of the volume. After derivation, these values can be passed to a texture blending unit in which they can be used to blend other components such as iterated colours, textures, or any other source applicable to texture blending. The result from the texture blending unit can then be alpha blended with the current contents of the frame buffer.

[0059] A further embodiment allows the volumes to be processed as monolithic objects. As the volume would be presented as a "whole" it is possible to handle multiple per pixel entries and exits to/from the volume, as such concave volumes can be handled. Also, as the object would be represented as a volume, no destructive write to the depth buffer is required.
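
A minimal sketch of the idea in paragraph [0018], under stated assumptions: the blend value is taken as the front-to-back thickness of the volume at a pixel, scaled linearly and clamped to 1.0. The linear scale, the clamp against the nearest opaque surface, and every name here are hypothetical and are not taken from the patent text.

```python
def volume_blend_factor(front_depth, back_depth, scene_depth, scale):
    """Turn the per-pixel thickness of a volume into a blend factor in [0, 1].

    Assumption: an opaque surface inside the volume cuts the thickness short.
    """
    back = min(back_depth, scene_depth)
    thickness = max(0.0, back - front_depth)
    return min(1.0, thickness * scale)


def blend(volume_colour, frame_colour, factor):
    """Alpha-blend the volume colour over the frame buffer colour."""
    return tuple(v * factor + f * (1.0 - factor)
                 for v, f in zip(volume_colour, frame_colour))


factor = volume_blend_factor(front_depth=2.0, back_depth=6.0,
                             scene_depth=10.0, scale=0.125)
print(factor)                                            # 0.5
print(blend((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), factor))   # (0.5, 0.5, 0.5)
```

A thicker slice of cloud between the viewer and the background gives a larger factor, so the cloud colour dominates more.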

Dave B, is this the kind of thing you're talking about ?

Ailuros · Apr 21, 2004

JohnH said:
Ailuros said:

I understood your point perfectly well; my question was heading elsewhere, namely in the direction of the grid pattern.

I should probably ask first whether, past a certain number of samples (let's say 16x), a sparse grid really makes any significant difference anymore, yet I can theoretically see a very high edge equivalent resolution even on just an 8x sparsely sampled MSAA pattern, for example ( 8*8 ).

Clearly the more samples the better, yet I don't think the resulting grid is irrelevant after all.

It was just an example. However, the difference can be quite subtle but when you see it back to back you'd always go for the higher sample rate. You also need to stop thinking purely in terms of triangle edges...

John.

If fill-rate weren't a consideration with, let's say, super-sampling (even more so with increased sample counts), I would of course know what's better in the end. For the time being multisampling still has sizeable advantages in terms of fill-rate, regardless of whether it runs on a TBDR or an IMR.

My personal preference would be to continue to use MSAA on the majority of today's scenes and use SSAA only selectively for parts of the scene that actually need it (like alphas, render2texture etc.). That way SSAA would partially cater for all corner cases and I wouldn't be forced to lower resolutions with anti-aliasing enabled.

Ok I'm OT again....

Dave B(TotalVR) · Apr 21, 2004

Kinda. The basic idea I had, done completely and utterly correctly (and not necessarily possible on a 3D card, so simplify), would be to take a given sphere whose density, opacity, call it what you will, is governed by an equation dependent on x, y and z.

So, for example, the equation for a smoke volume might be opacity = const * (x^2 + y^2 + z^2)^-1 (or, in spherical polars, const * r^-2).

To calculate the amount of opacity traversed by any single pixel, a three-way integral must be performed. Imagine a grid drawn over a sphere. Any given pixel on that grid will have the coordinates (a,b), and the pixels are of length 2 (say), so you must integrate between a-1 and a+1 and between b-1 and b+1, plus a third integral along the entire z-axis, or until you meet a partially enveloped object.

Now that's far too complicated off the bat to be done in real time, but a lot of simplifications can be made. A lot of the integration results could be pre-calculated, for example. There are more, but I don't have time to post all of my thoughts right now as I am off to the boozer. It would be a pleasure to talk more on it, though.
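
One possible simplification, sketched under assumptions: ignore the pixel footprint (the two integrals over x and y) and integrate only along z through the pixel centre (a, b). For opacity = const / r^2 inside a sphere of radius R this has a closed form, (2 * const / rho) * atan(h / rho) with rho = sqrt(a^2 + b^2) and h = sqrt(R^2 - rho^2), which depends only on rho and so could be precalculated into a small 1D table. The function names and the midpoint-rule check are illustrative only.

```python
import math


def optical_depth_numeric(a, b, radius, const=1.0, steps=2000):
    """Midpoint-rule integral of const / (a^2 + b^2 + z^2) along z inside the sphere."""
    rho2 = a * a + b * b
    if rho2 >= radius * radius:
        return 0.0  # the ray misses the sphere
    h = math.sqrt(radius * radius - rho2)
    dz = 2.0 * h / steps
    total = 0.0
    for i in range(steps):
        z = -h + (i + 0.5) * dz
        total += const / (rho2 + z * z) * dz
    return total


def optical_depth_closed_form(a, b, radius, const=1.0):
    """Closed form of the same integral: (2 * const / rho) * atan(h / rho)."""
    rho = math.hypot(a, b)
    if rho >= radius:
        return 0.0
    if rho == 0.0:
        return math.inf  # the 1 / r^2 density diverges on a ray through the centre
    h = math.sqrt(radius * radius - rho * rho)
    return 2.0 * const / rho * math.atan(h / rho)


print(optical_depth_numeric(0.3, 0.4, 1.0))
print(optical_depth_closed_form(0.3, 0.4, 1.0))
```

The two results agree closely, and because the closed form is a function of rho alone, a lookup indexed by distance from the sphere's screen-space centre would replace the per-pixel integration.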

Dave

nAo · Apr 21, 2004

MfA said:
I assume that immediate mode renderers which are able to do it will use screen space interleaving a la Bitboys to avoid dependency issues (or to give it its formal name, sort middle).

AFAIK that's what R300 does. Screen space is subdivided into tiles, with each tile assigned to a quad engine by its index modulo N, where N is the number of quad engines.
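
A rough sketch of that kind of screen-space interleaving. The tile size, the row-major tile numbering and the function name are assumptions made for illustration; nothing here is claimed to match the actual R300 layout.

```python
def quad_engine_for_pixel(x, y, screen_width, tile_size=16, num_engines=2):
    """Map a pixel to a quad engine by its tile index modulo the engine count.

    Assumptions: square tiles numbered row-major across the screen.
    """
    tiles_per_row = (screen_width + tile_size - 1) // tile_size
    tile_index = (y // tile_size) * tiles_per_row + (x // tile_size)
    return tile_index % num_engines


# Every pixel belongs to exactly one engine, so two engines never touch
# the same frame-buffer location and need no cross-engine ordering.
print(quad_engine_for_pixel(0, 0, screen_width=64))    # 0
print(quad_engine_for_pixel(16, 0, screen_width=64))   # 1
print(quad_engine_for_pixel(17, 33, screen_width=64, num_engines=4))  # 1
```

Because ownership is fixed by position, overlapping primitives are serialized per tile by whichever engine owns it, which is how the dependency issue is avoided.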

ciao,
Marco

Simon F · Apr 22, 2004

Panajev2001a said:
Simon F said:

how many clocks does it take for each read/modify/write process?

If you have two, small sequentially adjacent translucent polygons that overlap each other, do you get full performance?

Ahem... how do I break it to him... it has... well... like... 0.

My my! That is impressive. Zero clocks equals infinite fill rate. With that, why did they bother putting more than one texture unit....?

AGHHH... the system just ate some lengthy additions to this post ("HTTP error") describing limitations that would occur due to single-ported memory and pipeline lengths, but I just noticed:

nAo said:
MfA said:

Can the rasterizer in the graphics synthesizer even work on multiple primitives at a time?

Unfortunately it can't.

This would eliminate problems with data hazards that I was trying to describe in the edited text. Unfortunately, it does it by deliberately slowing down the system.
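
To illustrate the kind of data hazard in question, here is a toy model, not a description of the GS or any real chip. It assumes a hypothetical pipeline that issues one pixel per clock and needs `pipeline_latency` clocks between reading a frame-buffer location and writing it back, so a later pixel at the same address has to wait for that write.

```python
def cycles_with_rmw_hazards(pixel_addresses, pipeline_latency):
    """Count issue cycles for a pixel stream with read-modify-write stalls.

    A pixel may not issue until the previous write to its address has landed.
    """
    write_done_at = {}  # address -> cycle at which its pending write completes
    cycle = 0
    for address in pixel_addresses:
        issue = max(cycle, write_done_at.get(address, 0))
        write_done_at[address] = issue + pipeline_latency
        cycle = issue + 1
    return cycle


# Two small translucent polygons covering the same four pixels, back to back.
overlapping = [0, 1, 2, 3, 0, 1, 2, 3]
separate = [0, 1, 2, 3, 4, 5, 6, 7]

print(cycles_with_rmw_hazards(separate, pipeline_latency=6))     # 8
print(cycles_with_rmw_hazards(overlapping, pipeline_latency=6))  # 10
print(cycles_with_rmw_hazards(overlapping, pipeline_latency=4))  # 8
```

In this model the stall only appears when the first polygon has fewer pixels than the pipeline is deep, which is why small, adjacent, overlapping translucent polygons are the awkward case.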

nAo · Apr 22, 2004

Simon F said:
This would eliminate problems with data hazards that I was trying to describe in the edited text. Unfortunately, it does it by deliberately slowing down the system.

And that's nothing! What about a triangle rasterizer that is completely unaware of memory pages?

Simon F · Apr 22, 2004

nAo said:
Simon F said:

This would eliminate problems with data hazards that I was trying to describe in the edited text. Unfortunately, it does it by deliberately slowing down the system.

And that's nothing! What about a triangle rasterizer that is completely unaware of memory pages?

Presumably that would then explain the recommendation of using smallish triangles.
