Questions about PS2

Just to make sure I'm getting this right: the HiZ data was generated by a Z prepass into the daughter die, then copied out to main memory, and then read back into the mother die and stored in a dedicated HiZ buffer on the mother die?

Is hierarchical Z still used?
That article is explaining two things at once. Xbox 360 supported tiled rendering to eDRAM (as the eDRAM size was only 10 MB). The same command list was replayed for multiple tiles. The Xbox 360 GPU had some hardware features to make this slightly faster. Most games didn't use this feature.
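For illustration, here is a minimal C sketch of that tiling control flow, under the assumption that the frame is simply split into scissored strips; every function name in it is hypothetical, not the real Xbox 360 API:

```c
#include <stdio.h>

/* Hypothetical model of tiled rendering to a small eDRAM: the same command
 * list is replayed once per tile, and each finished tile is resolved out to
 * system RAM. None of these names are Microsoft's actual API. */
typedef struct { int x0, y0, x1, y1; } Tile;

static void set_scissor(const Tile *t)  { (void)t; /* restrict raster to tile */ }
static void replay_command_list(void)   { /* submit the SAME draws again */ }
static void resolve_to_main_memory(const Tile *t)
{
    printf("resolve tile (%d,%d)-(%d,%d) from eDRAM to system RAM\n",
           t->x0, t->y0, t->x1, t->y1);
}

int main(void)
{
    /* e.g. 1280x720 with 4xAA doesn't fit in 10 MB, so use three strips. */
    Tile tiles[3] = { {0,0,1280,240}, {0,240,1280,480}, {0,480,1280,720} };
    for (int i = 0; i < 3; i++) {
        set_scissor(&tiles[i]);
        replay_command_list();           /* geometry cost paid once per tile */
        resolve_to_main_memory(&tiles[i]);
    }
    return 0;
}
```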

Hierarchical Z, on the other hand, is/was a great feature. It was introduced back in DirectX 7.0 GPUs (http://www.graphicshardware.org/previous/www_2000/presentations/ATIHot3D.pdf). It is still used by all modern AMD/Nvidia/Intel GPUs. The idea is simple. You have an additional lower resolution depth buffer (for example at 8x8 lower resolution, saving 64x BW and memory). This lower resolution depth buffer stores the maximum (furthest away) depth value of each (8x8) tile. As the HiZ buffer is much smaller than the actual Z-buffer, the GPU can keep it either in dedicated fast on-chip memory or cache it efficiently (all modern GPUs have general purpose R&W caches). This cuts down the bandwidth required to read the (full resolution) depth buffer quite a bit, especially in scenes with lots of depth overdraw. It also allows the GPU to cull multiple pixels at once with a single HiZ depth test.
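As a rough illustration, here is a small software model of such a HiZ buffer, assuming 8x8 tiles, a less-than depth test, and 1.0 as the far plane; the names and layout are mine, not any GPU's actual implementation:

```c
#include <stdbool.h>

#define W    640
#define H    448
#define TILE 8                  /* one HiZ entry per 8x8 pixels -> 64x smaller */

static float zbuf[H][W];               /* full-resolution depth, 1.0 = far    */
static float hiz[H / TILE][W / TILE];  /* per-tile MAXIMUM (farthest) depth   */

/* Conservative early-out: if the nearest point of a primitive inside this
 * tile is still farther than everything already stored there, all 64 pixels
 * would fail the depth test, so they can be culled with one comparison. */
static bool hiz_tile_culled(int tx, int ty, float prim_min_z)
{
    return prim_min_z > hiz[ty][tx];
}

/* The tile maximum can only shrink as nearer pixels are written; recompute
 * it after the covered pixels have been updated. */
static void hiz_update_tile(int tx, int ty)
{
    float zmax = 0.0f;
    for (int y = 0; y < TILE; y++)
        for (int x = 0; x < TILE; x++) {
            float z = zbuf[ty * TILE + y][tx * TILE + x];
            if (z > zmax) zmax = z;
        }
    hiz[ty][tx] = zmax;
}
```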
 
Hierarchical Z, on the other hand, is/was a great feature. It was introduced back in DirectX 7.0 GPUs (http://www.graphicshardware.org/previous/www_2000/presentations/ATIHot3D.pdf). It is still used by all modern AMD/Nvidia/Intel GPUs. The idea is simple. You have an additional lower resolution depth buffer (for example at 8x8 lower resolution, saving 64x BW and memory). This lower resolution depth buffer stores the maximum (furthest away) depth value of each (8x8) tile. As the HiZ buffer is much smaller than the actual Z-buffer, the GPU can keep it either in dedicated fast on-chip memory or cache it efficiently (all modern GPUs have general purpose R&W caches). This cuts down the bandwidth required to read the (full resolution) depth buffer quite a bit, especially in scenes with lots of depth overdraw. It also allows the GPU to cull multiple pixels at once with a single HiZ depth test.

I suppose my confusion was about where on Xenos the HiZ buffer is...?

It would seem to make most sense on the mother die, so that fragment shader work could be avoided entirely for (potentially) multiple invisible fragments at once, rather than shading fragments only to have them rejected once they get to the daughter die.

But in order to populate a HiZ buffer on the mother die (which can't read from a Z-buffer in eDRAM), wouldn't you have to do a Z prepass? Or if the HiZ buffer is on the daughter die, which would allow you to update your HiZ buffer on the fly without a prepass, wouldn't that mean you lose much of the benefit of potentially avoiding unnecessary fragment shader work on the main Xenos die?
 
You can see the post-processing system in action in the worst game I ever worked on: "Charlie and the Chocolate Factory for the PS2". Notice that everything has a vaseline-on-the-lens blur to make it feel "magical". It's doing a 5-tap Gaussian downsample-and-blur recursively all the way down to 1 pixel, followed by a recursive upsample-and-blend with variable weights at each upsample stage. Effectively it does work equivalent to 6 fullscreen passes. But it uses about 5% of the GS @ 30Hz. So, we just left it on all the time and only adjusted the weights for different effects. It could do a blur kernel that effectively stretched across the whole screen, or did nothing, or any frequency distribution in between, just by adjusting the upsample weights. I'm proud to say that was my only contribution to that game ;) Getting an equivalent effect out of the Xbox version was much harder than the PS2 version. I believe corners were cut.
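A toy 1D, single-channel model of that downsample/upsample pyramid might look like the following; this is a reconstruction of the described structure (not the shipped code), and it assumes power-of-two sizes:

```c
#include <stdlib.h>

/* Each level is half the size of the previous one, blurred on the way down,
 * then blended back in on the way up with a per-level weight: weight on the
 * coarse end gives a screen-wide kernel, all-zero weights give a no-op.
 * The real effect does the same thing in 2D on the GS. */
static void downsample_blur(const float *src, float *dst, int dst_n)
{
    /* 5-tap Gaussian applied while halving the resolution. */
    static const float k[5] = { 1/16.f, 4/16.f, 6/16.f, 4/16.f, 1/16.f };
    int src_n = dst_n * 2;
    for (int i = 0; i < dst_n; i++) {
        float acc = 0;
        for (int t = -2; t <= 2; t++) {
            int s = 2 * i + t;                 /* clamp at the borders */
            if (s < 0) s = 0;
            if (s >= src_n) s = src_n - 1;
            acc += k[t + 2] * src[s];
        }
        dst[i] = acc;
    }
}

static void upsample_blend(const float *coarse, float *fine, int fine_n, float w)
{
    /* Linearly upsample the coarse level and lerp it into the finer one. */
    for (int i = 0; i < fine_n; i++) {
        float pos = i * 0.5f - 0.25f;
        if (pos < 0) pos = 0;
        int   i0 = (int)pos;
        int   i1 = (i0 + 1 < fine_n / 2) ? i0 + 1 : fine_n / 2 - 1;
        float f  = pos - i0;
        float up = coarse[i0] * (1 - f) + coarse[i1] * f;
        fine[i]  = fine[i] * (1 - w) + up * w;
    }
}

/* Blur img in place: down to 1 "pixel", then back up. w holds one blend
 * weight per level, finest first. */
static void pyramid_blur(float *img, int n, const float *w, int max_levels)
{
    float **lv = malloc((max_levels + 1) * sizeof *lv);
    int    *sz = malloc((max_levels + 1) * sizeof *sz);
    lv[0] = img; sz[0] = n;
    int top = 0;
    for (int l = 1; l <= max_levels && sz[l - 1] > 1; l++) {   /* way down */
        sz[l] = sz[l - 1] / 2;
        lv[l] = malloc(sz[l] * sizeof(float));
        downsample_blur(lv[l - 1], lv[l], sz[l]);
        top = l;
    }
    for (int l = top; l >= 1; l--)                             /* way up   */
        upsample_blend(lv[l], lv[l - 1], sz[l - 1], w[l - 1]);
    for (int l = 1; l <= top; l++) free(lv[l]);
    free(lv); free(sz);
}
```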

That's really cool. I tried doing a Gaussian filter that way on PSP once; I don't think it ended up being very viable performance-wise (and there I was already only using the GPU for upscaling).
 
Did Sony not release any optimized tools to ease PS2 development, especially with regard to making better use of the Vector Units? Despite the system's excellent sales, it seemed a lost opportunity to really push the system to its real limits, as I hear so few games even used VU0.
 
Did Sony not release any optimized tools to ease PS2 development, especially with regard to making better use of the Vector Units? Despite the system's excellent sales, it seemed a lost opportunity to really push the system to its real limits, as I hear so few games even used VU0.


From what I understand, VU0 was commonly used in macro mode.
 
The issue with the VU0 was that it could not DMA in or out of main memory on its own. That meant the CPU had to upload code and data (max 4K total IIRC), start the execution, find something else to do for a very short while, start polling the VU0 to determine when it had finished the work, and repeat. Finding work for the CPU to do between polls was such a pain that the CPU would spend pretty much all the time idle and polling. So, you'd be manually getting the same effect as macro mode.

However... Near the end of the PS2 lifecycle, some friends of mine figured out how to get the VU0 to trigger an interrupt on the CPU. The CPU could set up DMAs out of and into the VU0 in the interrupt without any effect on the main thread of CPU execution. With a tiny bit of setup, the VU0 could use that to effectively DMA whatever it wants by proxying through the CPU. With that in place, we could set up long chains of work that would progress independently of the CPU. Suddenly the VU0 became very useful.
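A very rough sketch of that interrupt-proxy control flow; all names below are invented stand-ins for the real DMAC/VIF0 programming, which isn't shown:

```c
#include <stddef.h>

/* Hypothetical primitives; none of these names exist in the PS2 SDK. The
 * point is only the control flow: VU0 raises an interrupt when a job
 * finishes, and the CPU's handler kicks the next transfer without the main
 * thread ever polling. */
extern void dma_to_vu0(const void *src, size_t n);
extern void dma_from_vu0(void *dst, size_t n);
extern void vu0_start_microprogram(void);

typedef struct WorkItem {
    void  *input;                 /* data to DMA into VU0 memory (4 KB max) */
    void  *output;                /* where results get DMAed back out       */
    size_t size;
    struct WorkItem *next;
} WorkItem;

static WorkItem *queue;           /* chain of jobs built by the main thread */

/* Handler registered for the VU0 interrupt. */
static void vu0_interrupt_handler(void)
{
    if (!queue) return;
    dma_from_vu0(queue->output, queue->size);   /* fetch finished results   */
    queue = queue->next;
    if (queue) {
        dma_to_vu0(queue->input, queue->size);  /* feed the next job        */
        vu0_start_microprogram();               /* VU0 runs; CPU continues  */
    }
}

/* Main thread: build the chain, kick the first job, then forget about it. */
static void vu0_run_chain(WorkItem *chain)
{
    queue = chain;
    dma_to_vu0(queue->input, queue->size);
    vu0_start_microprogram();
}
```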

Unfortunately, at that point we stopped getting PS2 contracts. I don't think we ever shipped that feature. I bet the higher-end studios discovered this trick long before we did though :)
 
The eDRAM doesn't have three buses. It has one bus that's 8192 bits wide. That bus is connected to two different 8KB buffers: one for pixel accesses and one for texture accesses. The 1024-bit and 512-bit buses connect to those buffers, not the eDRAM.
You mean the two 1024-bit buses and the one 512-bit bus connect to the two 8 KB buffers, and those two buffers connect to the eDRAM over the 8192-bit bus? But how can there be only two 8 KB buffers when there are 8 banks of eDRAM?

PS2 EDRAM is something I know a little bit about.
Thank you for writing here. Very interesting info! :D

What I recall was that the pre-launch marketing numbers regarding EDRAM bandwidth suggested that it should be possible to do 60 fullscreen passes at 60Hz
Are these 60 passes multipass rendering passes? If my calculations are correct: 640 × 448 × (32-bit color + 32-bit Z) × 60 passes × 60 Hz = 7875 MB/s. That is a lot less than the theoretical bandwidth.
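For what it's worth, that arithmetic checks out; a quick C check, using the commonly quoted 48 GB/s GS eDRAM figure as the reference point:

```c
#include <stdio.h>

int main(void)
{
    /* 640x448 pixels, and each pass touches 32-bit color + 32-bit Z =
     * 8 bytes per pixel (the post's assumption; real passes also re-read
     * color/Z for blending and may fetch a texture, so actual traffic is
     * a few times higher). */
    double bytes_per_pass = 640.0 * 448.0 * 8.0;          /* ~2.19 MiB */
    double mib_per_sec = bytes_per_pass * 60.0 /* passes */
                       * 60.0 /* Hz */ / (1024.0 * 1024.0);
    printf("%.0f MiB/s\n", mib_per_sec);                  /* prints 7875 */
    /* Against the marketed 48 GB/s eDRAM bandwidth, that leaves a large
     * margin, which is the point being made. */
    return 0;
}
```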

You can see the post-processing system in action in the worst game I ever worked on: "Charlie and the Chocolate Factory for the PS2". Notice that everything has a vaseline-on-the-lens blur to make it feel "magical". It's doing a 5-tap Gaussian downsample-and-blur recursively all the way down to 1 pixel, followed by a recursive upsample-and-blend with variable weights at each upsample stage. Effectively it does work equivalent to 6 fullscreen passes. But it uses about 5% of the GS @ 30Hz. So, we just left it on all the time and only adjusted the weights for different effects. It could do a blur kernel that effectively stretched across the whole screen, or did nothing, or any frequency distribution in between, just by adjusting the upsample weights.
Sounds very good! And I think that game looks good! ;)

I'm proud to say that was my only contribution to that game
And I'm proud to read some info from you! ;)

Getting an equivalent effect out of the Xbox version was much harder than the PS2 version.
Are you saying that some effects worked better on PS2 than on Xbox?

I believe corners were cut.

Which corners? :D

There was also a trick where you could have the sprites copy certain bits from the depth buffer into the alpha channel of your color buffer. Effectively you could copy specifically from the "green" channel (bits 8-16) of depth. By using the color buffer destination alpha instead of a constant as the final upsample blend factor of the blur, you could achieve a decent depth-of-field on the PS2 super cheap (again 10% of a 60Hz frame) or you could just blend in a constant color for colored depth fog. We used both of those techniques in the much better game "Hunter the Reckoning: Wayward". But, I can't find a good video to show it off...
This sounds crazy! :D But a lot of crazy stuff was done on PS2.
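A software model of the quoted trick might look like this; it's a reconstruction (treating the copied bits as bits 8-15 of a 24-bit Z value), not actual GS register code:

```c
#include <stdint.h>

/* Step 1 of the trick: copy the "green" byte of depth into the frame
 * buffer's destination alpha, so alpha becomes a per-pixel distance value. */
typedef struct { uint8_t r, g, b, a; } Pixel;

static void depth_green_to_alpha(const uint32_t *depth, Pixel *fb, int n)
{
    for (int i = 0; i < n; i++)
        fb[i].a = (uint8_t)((depth[i] >> 8) & 0xFF);  /* "green" bits of Z */
}

static uint8_t lerp_u8(uint8_t a, uint8_t b, uint8_t t)
{
    return (uint8_t)(a + ((b - a) * t) / 255);
}

/* Step 2: at the final upsample stage of the blur, blend by destination
 * alpha instead of a constant, so far pixels (large Z, large alpha) get
 * more blur than near ones -- cheap depth of field. Blending a constant
 * fog color by the same alpha gives colored depth fog instead. */
static void dof_composite(Pixel *fb, const Pixel *blurred, int n)
{
    for (int i = 0; i < n; i++) {
        fb[i].r = lerp_u8(fb[i].r, blurred[i].r, fb[i].a);
        fb[i].g = lerp_u8(fb[i].g, blurred[i].g, fb[i].a);
        fb[i].b = lerp_u8(fb[i].b, blurred[i].b, fb[i].a);
    }
}
```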
 
However... Near the end of the PS2 lifecycle, some friends of mine figured out how to get the VU0 to trigger an interrupt on the CPU. The CPU could set up DMAs out of and into the VU0 in the interrupt without any effect on the main thread of CPU execution. With a tiny bit of setup, the VU0 could use that to effectively DMA whatever it wants by proxying through the CPU. With that in place, we could set up long chains of work that would progress independently of the CPU. Suddenly the VU0 became very useful.
Never heard about that. So in the last years of the PS2 lifecycle VU0 was used a lot, and not only in macro mode.
Is the friend you mentioned from the studio where you worked/work?
 
Are these 60 passes multipass rendering passes? Are you saying that some effects worked better on PS2 than on Xbox? Which corners? :D So in the last years of the PS2 lifecycle VU0 was used a lot, and not only in macro mode. Is the friend you mentioned from the studio where you worked/work?

Thanks for the kind words. The PS2 could only read from a single texture sample per pass. So, everything was "multipass" all the time on the PS2, kinda by definition. The raw fill rate and bandwidth of the PS2 were higher than the Xbox's for this situation. The Xbox could do a whole lot more math per pixel, but that didn't help much for this technique. So, I think they had to do a plain downsample before the blur on the Xbox. I didn't need to do that on the PS2. Yep, my friend worked with me at the studio I was at at the time. I never saw the VU0 interrupt trick mentioned in the docs or the forums. We didn't talk about it publicly. I don't think anyone did. But I expect Naughty Dog and several other people discovered it long before we did.

Hey look! The doc about the green-to-alpha trick with depth is public! http://develop.scee.net/wp-content/uploads/2014/11/SpecialEffects.pdf :D

For anyone seeking more information, I'll just leave this here... http://hwdocs.webs.com/ps2 IIRC, the GS basically had a "texture function" setting for combining the interpolated vertex color with the source texture and it had an "alpha blending" setting for combining the result of that with the "destination texture" (aka the frame buffer). That was pretty much it. You could call that "The one and only PS2 fragment shader" if you were feeling generous. If you want a fun challenge: figure out how to use those two settings together to do depth-map shadows. (Hint: you only get an 8-bit depth map and you might not be very satisfied with how the shadow interacts with the lighting...)
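To make that concrete, here is a toy model of that two-stage "fragment shader", showing one texture function (modulate) and one standard alpha blend; the GS actually offered a small set of selectable modes for each stage, so this is one configuration out of a few:

```c
/* Toy model of "the one and only PS2 fragment shader": a fixed texture
 * function combining vertex color with the texel, then a fixed alpha blend
 * combining that result with the frame buffer. */
typedef struct { float r, g, b, a; } Color;

/* Stage 1, "texture function" (MODULATE shown). */
static Color tex_func_modulate(Color vertex, Color texel)
{
    Color c = { vertex.r * texel.r, vertex.g * texel.g,
                vertex.b * texel.b, vertex.a * texel.a };
    return c;
}

/* Stage 2, "alpha blending" against the destination (standard lerp shown). */
static Color alpha_blend(Color src, Color dst)
{
    Color c = { src.r * src.a + dst.r * (1 - src.a),
                src.g * src.a + dst.g * (1 - src.a),
                src.b * src.a + dst.b * (1 - src.a),
                dst.a };
    return c;
}

/* Everything a PS2 "pixel shader" could do per pass, in one line: */
static Color gs_pixel(Color vertex, Color texel, Color framebuffer)
{
    return alpha_blend(tex_func_modulate(vertex, texel), framebuffer);
}
```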
 
Been wondering this for a while, but did developers find a decent way to fight the mipmap selection problem on PS2?

In my understanding the default method selected the mipmap level purely by distance and didn't take the tilt of the polygon into account.
This led to quite a bit of aliasing in many games.
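For comparison, a sketch of the two LOD formulas being contrasted; the parameters here are illustrative stand-ins, not real GS register fields:

```c
#include <math.h>

/* PS2-style: distance only; polygon tilt never enters the formula. q is
 * the interpolated 1/w, and k/l stand in for the GS's K and L values. */
static float lod_ps2_style(float q, float k, float l)
{
    return k + l * log2f(1.0f / q);  /* farther -> smaller q -> higher lod */
}

/* Modern, tilt-aware style: measure the footprint of one screen pixel in
 * texel space via the screen-space UV derivatives, so a surface viewed
 * edge-on gets a coarser mip and aliases less. */
static float lod_gradient(float dudx, float dvdx, float dudy, float dvdy)
{
    float rho2 = fmaxf(dudx * dudx + dvdx * dvdx,
                       dudy * dudy + dvdy * dvdy);
    return 0.5f * log2f(rho2);       /* log2 of the longer footprint axis  */
}
```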
 
Been wondering this for a while, but did developers find a decent way to fight the mipmap selection problem on PS2?

In my understanding the default method selected the mipmap level purely by distance and didn't take the tilt of the polygon into account.
This led to quite a bit of aliasing in many games.

The worst examples of that by far are Dynasty Warriors 2 and 3.
 
Thanks for the kind words.
You're welcome!

For anyone seeking more information, I'll just leave this here... http://hwdocs.webs.com/ps2 IIRC, the GS basically had a "texture function" setting for combining the interpolated vertex color with the source texture and it had an "alpha blending" setting for combining the result of that with the "destination texture" (aka the frame buffer).
That link is absolutely priceless! Thanks a lot!
 
A little off topic, but I've heard about almost all consoles that something in their hardware could've been better: more L2 cache, one block instead of another, more RAM, etc. What can anyone tell me about the first Xbox? Because I've never heard that anything could have been better in the Xbox hardware.
 
I have a question about PS2: were there any games other than Matrix: Path of Neo that used normal mapping on PS2?

I can't find any others that seem to use the same effect so often, on PS2 I mean.
 
A little off topic, but I've heard about almost all consoles that something in their hardware could've been better: more L2 cache, one block instead of another, more RAM, etc. What can anyone tell me about the first Xbox? Because I've never heard that anything could have been better in the Xbox hardware.

Probably more RAM, and more memory bandwidth to boot. That still doesn't take away from how much more graphically powerful the Xbox was compared to its contemporaries.
 
Ah, very cool. It would also be interesting to know why/how they opted for these things.
A little off topic, but I've heard about almost all consoles that something in their hardware could've been better: more L2 cache, one block instead of another, more RAM, etc. What can anyone tell me about the first Xbox? Because I've never heard that anything could have been better in the Xbox hardware.
I think it could've used more bandwidth, maybe?

On almost every other front it was pretty far ahead of the other consoles in CPU power and GPU power, but some of the really crazy shader/fullscreen effects were probably not as easy to pull off on the Xbox with less GPU bandwidth. The quality of individual particle effects was higher, and multiplatform games generally performed quite a bit better on Xbox, but sometimes the same games were able to do better fullscreen or other complicated shader effects on PS2 because of its higher bandwidth.

Maybe GTA: SA is an example of this: the Xbox version had much sharper shadows, but some of the fullscreen effects and shading looked "better" on PS2, like the heat haze effects or the reflection maps on the cars.

At least that was always my impression. I also remember that Halo could have a lot of framerate problems in bigger, more chaotic scenes, especially with lots of particle effects/blood decals (sometimes it wouldn't cull this stuff IIRC lol, but I liked that ^^), but at the same time it could even do those scenes in the first place because of having so much more VRAM and such.
 
I've ordered some consoles from hardest to program to easiest. What does everyone think? Am I mistaken?
Sega Saturn > Nintendo 64 > Playstation 2 > (Gamecube, Wii, Wii U, Xbox 360) > (Playstation, Dreamcast, Xbox) > Xbox One > Playstation 4.
So the hardest was the Sega Saturn, the easiest the Playstation 4, and some consoles, as you can see, are more or less the same.
 
I'm not sure that's an easy list to do. Depends on who programs on them and so many other variables... Also they span so many generations, how can we compare?
 
I'm not sure that's an easy list to do. Depends on who programs on them and so many other variables... Also they span so many generations, how can we compare?
I've read a lot of topics here and on other sites, and that list was based on that information.
 