Handing out free advice for everyone making their own PlayStation 2 game engines

Discussion in 'Console Technology' started by corysama, Nov 13, 2020.

  1. corysama
    Through a private channel I have been asked for advice by someone currently working on their own, new PlayStation 2 game engine. If I'm going to take the time to write this stuff up, more than one person should read it. So, I'll post it here.

    I am very, very far from the world's greatest expert on anything I talk about here. I just know a bit and I've forgotten a lot. If anyone else has knowledge to share, please do chime in.

    I am currently very swamped at work and in general. Have been for the past year. This will be a slow-moving thread. But, this is fun to write about. So, I'll come back when I can. I've been coming back to this forum since the heyday of PS3 speculation. I'm not often around. But, I'm not ever going completely away.
     
  2. corysama
    Let's talk a bit about the GS.

    The weakness of the GS is that it can only read one texture per pass and it has a very limited blend equation. So, you can only do math linearly, in an equation like (((A op B) op C) op D). Those parens are cruel and unforgiving.
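    For reference, the fixed function the blender actually evaluates is out = ((A - B) * C >> 7) + D, where A, B, and D each select from {source color, framebuffer color, 0} and C selects from {source alpha, framebuffer alpha, a fixed constant}. Here's a minimal sketch of packing those selections into the ALPHA register; the field layout follows the GS manual, but treat it as illustrative rather than drop-in code:

    Code:
    #include <cstdint>

    // Pack the GS ALPHA register: out = ((A - B) * C >> 7) + D.
    // A/B/D: 0 = source color, 1 = framebuffer color, 2 = zero.
    // C:     0 = source alpha, 1 = framebuffer alpha, 2 = the FIX constant.
    constexpr uint64_t gs_alpha(uint64_t a, uint64_t b, uint64_t c,
                                uint64_t d, uint64_t fix = 0) {
        return (a & 3) | ((b & 3) << 2) | ((c & 3) << 4) |
               ((d & 3) << 6) | ((fix & 0xFF) << 32);
    }

    // Standard lerp:  out = (src - dst) * srcAlpha/128 + dst
    constexpr uint64_t kLerp     = gs_alpha(0, 1, 0, 1);
    // Additive:       out = (src - 0) * srcAlpha/128 + dst
    constexpr uint64_t kAdditive = gs_alpha(0, 2, 0, 1);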

    The strength of the GS is that it can switch settings in just a few cycles. That's way faster than pretty much any other hardware. Even modern "bindless" hardware is going to spend more than a few cycles if only for the cache miss. So, you can draw a dozen tris, switch blend mode and textures, push the already transformed tris again, switch textures, etc... and it's OK. It's a necessity on the GS because you are required to do multipass rendering to do any interesting shading and you don't want to transform & clip each triangle over and over for each pass.
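    In code, the pattern is something like this sketch: transform and clip once into a scratch buffer, then replay the same post-transform vertices for each pass with different GS state. All the names here are mine, and the actual register writes and kicks are platform-specific:

    Code:
    #include <vector>

    // Post-transform vertex as it would sit in scratch memory.
    struct XformedVert { float x, y, z; unsigned rgba; float s, t; };
    // Per-pass GS state: which texture and which blend equation.
    struct PassState { unsigned long long tex0, alpha; };

    void drawMultipass(const std::vector<XformedVert>& verts,
                       const std::vector<PassState>& passes) {
        for (const PassState& pass : passes) {
            // A handful of register writes; only a few GS cycles each.
            // setGsRegisters(pass.tex0, pass.alpha);  // platform-specific
            // kickVertices(verts);                    // resubmit, no re-transform
            (void)pass;
        }
        (void)verts;
    }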

    There are a few basic GS "shaders". AKA: common multipass setups.
    • Single pass - Just draw the texture. Maybe with some vertex lighting.
    • Lightmapping - 2 sets of UVs. Unfortunately, the GS can only do monochrome multiplication.
    • Blending - Environment geo needs to mix textures to hide tiling. Investigate combining vertex and texture alpha. Optionally needs a lightmap pass in addition.
    • Masked Sphere Mapping - Surprisingly effective and worth a detailed explanation.
    The artists I worked with used masked sphere mapping on way more things than I expected. Characters, environments, pretty much everything. It feels dynamic and looks way more sophisticated than it is. We had magazine reviews praising our "normal mapped characters and environments" even though the PS2 does not support normal mapping.

    The idea is simple: in the 1st pass, do spherical environment mapping. Since we can't use cube maps, just use a 2D image of a lit sphere and convert the view-space normals to 2D UVs. For the 2nd pass, draw the vertex-lit diffuse texture blended over the environment map. For the blend equation you can either do mul-add or lerp. Mul-add is a bit more flexible. Lerp is a bit easier for artists to understand and control. With the blend, the diffuse texture's alpha becomes a per-pixel specular mask. The per-pixelness of it is unexpected on the PS2 and makes it feel special.
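    The per-vertex math for the first pass is tiny. A minimal sketch, with C++ standing in for the VU code and assuming n is the unit view-space normal:

    Code:
    struct Vec3 { float x, y, z; };
    struct Vec2 { float u, v; };

    // "Lit sphere" lookup: the texture is an image of a shaded ball, so
    // a normal facing the camera lands in the center of the image and
    // the normal's view-space XY maps linearly to UV.
    Vec2 sphereMapUV(const Vec3& n) {
        // The V flip depends on your texture origin convention.
        return { n.x * 0.5f + 0.5f, -n.y * 0.5f + 0.5f };
    }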

    It is possible to do shadow mapping on the GS. The math is a bit of a mind-bend. You can render an 8-bit depth map easily enough. And, you can use classic projection math to project that map onto a mesh like everyone does for shadow mapping. The catch is that you can't do the normal "if the interpolated vertex distance from the light is > the sampled projected shadow map distance: do shadowing" math everyone uses. But, what you *can* do is: calculate the vertex distance to the light on the VU and put that in the GS vertex alpha? specular? I forget. But, you can do an alpha test vs. "vertex alpha + shadow map value". And, you can set them up such that the map contains the (8-bit) distance *from* the light *to* the shadow caster, and the vertex alpha contains the distance *from* the mesh *to* the light. So, if the two distances in their opposite directions cross each other A--<->--B, their sum will be >= 255. With that you can do an alpha test vs. 255 and that's where there is shadow.

    Unfortunately, you cannot do a multiply or lerp operation in this setup. The best you can do is subtract. But, subtracting colors looks really unnatural. Looks like burning the screen. So, usually you subtract 255 to slam the shadow to full black. :/
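    To make the arithmetic concrete, here's one way the numbers can work out. The assumption (mine) is that distances are scaled so the light's useful range is 0..255, and the map stores the caster distance measured back from the far end of that range, so the two opposing distances only sum past 255 when they overlap:

    Code:
    #include <cstdint>
    #include <cstdio>

    // On the GS this is "texture alpha + vertex alpha, alpha test vs. 255";
    // this just mirrors that sum on the CPU for illustration.
    bool inShadow(uint8_t mapValue, uint8_t vertexAlpha) {
        return mapValue + vertexAlpha >= 255;
    }

    int main() {
        uint8_t mapValue = 255 - 100;  // caster is 100 units from the light
        printf("%d\n", inShadow(mapValue, 130));  // mesh behind caster -> 1 (shadow)
        printf("%d\n", inShadow(mapValue, 80));   // mesh in front      -> 0 (lit)
    }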

    Use 8-bit palettized textures. Really. There is no excuse for 24/32-bit textures. 4-bit textures are more trouble than they're worth, though. Palettization algos are such old news now that the hard part will be finding a good one that hasn't been abandoned and deleted from the internet. Each texture can have its own palette. No need to share. I'm a huge fan of palette swaps and color cycling. But, it's rare to find an artist who understands the concept. The only good paint program I know of that supports the practice is https://www.aseprite.org/. Watch this video; it's worth it.
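    Color cycling in particular is nearly free with palettized textures: the 8-bit texel data never changes; you rotate a few CLUT entries each frame and re-upload only the 1 KB palette. A minimal sketch (the reserved entry range would come from the artist; these names are mine):

    Code:
    #include <algorithm>
    #include <array>
    #include <cstdint>

    using Clut = std::array<uint32_t, 256>;  // 256 RGBA palette entries

    // Rotate a contiguous range of palette entries one step. Calling
    // this once per frame animates e.g. a waterfall or lava flow
    // without ever touching the texel data.
    void cyclePalette(Clut& clut, int first, int count) {
        std::rotate(clut.begin() + first,
                    clut.begin() + first + 1,
                    clut.begin() + first + count);
    }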

    Do not treat GS RAM as a fixed limit on texture memory. Stream textures from EE RAM into GS RAM during the frame. This isn't free. It requires some smart scheduling and space management. But it's faster than you'd expect.
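    One simple way to organize it, as a sketch: carve a region of GS RAM into a ring of upload slots and round-robin through them, so the DMA for the next texture overlaps drawing with the previous one. All names and sizes here are hypothetical, and real code also has to fence so a slot isn't overwritten while the GS is still sampling from it:

    Code:
    #include <cstdint>

    constexpr int      kSlots    = 4;
    constexpr uint32_t kSlotBase = 0x2000;  // GS RAM address of slot 0 (made up)
    constexpr uint32_t kSlotSize = 0x100;   // one slot, in GS address units

    struct TextureStreamer {
        int next = 0;
        // Queue an upload; returns the GS RAM address to point TEX0 at.
        uint32_t queueUpload(const void* texels, uint32_t sizeBytes) {
            uint32_t addr = kSlotBase + next * kSlotSize;
            // dmaQueueImageTransfer(texels, sizeBytes, addr);  // platform call
            (void)texels; (void)sizeBytes;
            next = (next + 1) % kSlots;
            return addr;
        }
    };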

    Particles are usually either additive or lerping. I'm a big fan of "premultiplied alpha", which lets you blend between both effects per-pixel. Great for glowy fire and dark smoke in a single sprite. A bit hard for artists because they don't teach it in art school. Subtractive blending can also be used for evil, unnatural effects. You can get sub-pixel antialiasing for tiny particles by having the VU clamp the minimum screen-space size of a sprite to 1x1 screen pixel. That way it always touches 1 pixel and never gets lost between pixels. Instead of letting it get smaller, modulate (dim) the particle alpha/brightness by how much smaller than 1x1 it would have been. Boom. Manually calculated sub-pixel color contribution.
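    The sub-pixel clamp is only a few lines of per-sprite math. A sketch, with C++ standing in for the VU code:

    Code:
    struct Sprite { float w, h, alpha; };  // screen-space size in pixels

    // If the projected sprite would cover less than one pixel, pin it
    // to 1x1 and scale its alpha by the lost area, so its total
    // contribution to the frame stays what the tiny sprite would have
    // added.
    void clampSubPixel(Sprite& s) {
        float area = s.w * s.h;
        if (area < 1.0f) {
            s.alpha *= area;
            s.w = 1.0f;
            s.h = 1.0f;
        }
    }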
     
  3. corysama
    This doc and one other I can't find were the basis of a full-screen post-processing system I worked on that worked out very well. That doc explains how to use a tricky GS setup to copy 8 bits from the Z buffer to the alpha channel of the render target. That's useful for blending in fog and depth of field effects. The one I can't find explained how to set up very specifically sized (very tall) sprites that line up with the internal caches of the GS's EDRAM so that you could copy from one full-screen buffer to another with maximum bandwidth.
    My tests of this technique showed that the marketing BS numbers from the PS2 announcement were actually real, for this very specific setup only. In theory they claimed enough bandwidth that you should be able to do 60 full-screen copies per frame at 60 FPS. In practice I was able to demonstrate 50 copies per frame at 60 FPS.
    So, I had an API that would set up a Path 3 DMA command queue, then there was a simple interface for specifying a small set of commands to the GS (a hypothetical sketch of the command descriptor follows the list):
    1. The full-screen, full-speed trick. But, you could also specify a vertex color, the blend mode and a single UV offset.
    2. Same as 1, but with an arbitrary source and dest rectangle.
    3. The depth to alpha trick.
    4. A full-screen grid where you could distort the grid UVs any way you want.
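    Since the original interface isn't public, here's a hypothetical sketch of what a command descriptor covering those four operations might look like; every name below is mine:

    Code:
    #include <cstdint>

    enum class PostFxOp : uint8_t {
        FullScreenCopy,  // 1. the full-speed trick + color/blend/UV offset
        RectCopy,        // 2. arbitrary source and dest rectangles
        DepthToAlpha,    // 3. copy 8 bits of Z into the target's alpha
        DistortGrid,     // 4. full-screen grid with user-warped UVs
    };

    struct PostFxCmd {
        PostFxOp op;
        uint32_t rgba;                    // vertex color
        uint64_t blend;                   // GS ALPHA register value
        float    uOffset, vOffset;        // single UV offset
        int16_t  srcRect[4], dstRect[4];  // used by RectCopy
    };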
    I repurposed the depth buffer as the scratch space for this system. It is conveniently the same dimensions and bit depth as the back buffer. And, after the optional depth->alpha copy, there is no further need for the depth values at that point in the frame.
    Using the arbitrary-rect mode I set up a recursive, separable Gaussian blur that could do an arbitrarily wide blur kernel (even as wide as the whole screen) with a fixed cost of 7 full-screen passes (<3ms). It worked by doing 5 passes with UV offsets along a horizontal line, accumulating into a half-width destination buffer. Then it would read from there to do 5 passes with UVs offset along a vertical line into a half-height destination. By recursively writing half-width, then half-height, it would eventually reduce down to a single pixel. Along the way it would fill exactly 1 screen with the intermediate results. From there it would recurse in reverse. Starting from the final single pixel, it would stretch and blend the smaller, later results back up over the earlier, larger results in a single pass. By manually controlling the alpha value for each of these upward blends, you could pick any combination of the intermediate results to contribute to the final result.
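    A quick sanity check of the "fill exactly 1 screen" claim: alternately halving width then height gives a geometric series of intermediate buffers whose areas sum to almost exactly one screen. A throwaway verification, assuming a 640x448 frame:

    Code:
    #include <cstdio>

    int main() {
        int  w = 640, h = 448;
        long total = 0;
        bool halveWidth = true;
        while (w > 1 || h > 1) {
            if ((halveWidth && w > 1) || h == 1) w = (w + 1) / 2;
            else                                 h = (h + 1) / 2;
            halveWidth = !halveWidth;
            total += (long)w * h;  // area of this intermediate buffer
        }
        // Prints ~1.0000 screens for 640x448.
        printf("total fill: %ld pixels (%.4f screens)\n",
               total, total / (640.0 * 448.0));
    }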
    The 5 samples were offset so that the 4 non-center samples fell between source pixels. So, 7 source pixels were read per dest pixel. There was a lot of banding until I introduced a fixed [+1,-1,0,-1,+1]/128 bias to the sample weights.
    This was obviously great in combination with the depth-to-alpha trick for depth of field effects. It was fast enough that one game left it running 100% of the time. They just adjusted the contribution weights from a light blur during gameplay to a heavy blur during the pause menu.
     