Brief inquiry on PS2 GS capabilities.

...

But GC and DC? Higher resolution textures with "only" 24 and 16 MB of RAM respectively?
Both have directly addressable external texture buffers whose size is limited only by external RAM size. The GS doesn't have this option.

And again, as the majority of PS2 games use mostly 4-bit CLUT, other kinds of texture compression (S3TC and 2x2 VQ) wouldn't have an advantage, if we are talking about the resolution of the textures only.
CLUT textures will never compare to real compressed textures. The fact that the PSX2 has the worst textures of all (worse than the DC) proves this.
 
But GC and DC? Higher resolution textures with "only" 24 and 16 MB of RAM respectively?
And again, as the majority of PS2 games use mostly 4-bit CLUT, other kinds of texture compression (S3TC and 2x2 VQ) wouldn't have an advantage, if we are talking about the resolution of the textures only.
GC has more than 24MB of RAM and more efficient texture compression. You cannot use 4-bit textures for everything on the PS2; you have to mix in 8-bit ones. You can't have a skydome textured with a 4-bit CLUT texture, for example, and 8-bit CLUT is not very space-efficient. Dreamcast, I wouldn't say has higher-res textures than selected PS2 games, but on average it was easier to make good textures on it. It has more than 16MB as well - it has 8MB of video RAM and textures were stored there (I am not familiar with the DC storing its textures in main RAM and accessing them on a frame basis, as Deadmeat seems to be implying, or as the GC does). It also had far fewer polygons to texture, on average.
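To put rough numbers on the 4-bit vs. 8-bit point, here is a back-of-the-envelope comparison for a single 256x256 texture. The figures are purely illustrative and assume 16-bit CLUT entries and the standard 8-bytes-per-4x4-block S3TC/DXT1 layout:

/* Illustrative texture-memory math for one 256x256 texture (not from any SDK). */
#include <stdio.h>

int main(void)
{
    const int w = 256, h = 256;
    int clut4 = w * h / 2 + 16 * 2;     /* 4-bit indices + 16-entry CLUT, 16-bit entries  */
    int clut8 = w * h + 256 * 2;        /* 8-bit indices + 256-entry CLUT, 16-bit entries */
    int dxt1  = (w / 4) * (h / 4) * 8;  /* S3TC/DXT1: 8 bytes per 4x4 block               */
    printf("4-bit CLUT: %6d bytes\n", clut4);  /* ~32 KB */
    printf("8-bit CLUT: %6d bytes\n", clut8);  /* ~65 KB */
    printf("S3TC/DXT1:  %6d bytes\n", dxt1);   /* ~32 KB */
    return 0;
}

So 8-bit CLUT really does double the footprint, while DXT1 sits around the 4-bit CLUT size without the 16-colors-per-texture limit.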

you'd expect there'd be PS2 games out there with a better balance of RAM allocation for textures that would sit up there with the best textured Xbox games (from a technical standpoint).
That is simply not possible. Even if you allocate all the memory to textures (which is obviously impossible), it would still be only half the memory found in the Xbox, which is something you can easily and realistically allocate for textures on it. That is before you take texture compression into the story. Are there games on the PS2 that have very good textures - certainly - but no matter what you do with it, you *can* make a game with even higher resolution textures on the Xbox.

True. You can get good looking cinematic-type effects with some effort from the more PC-like systems, too.
Back then, it seemed no one really could do anything comparable. PC cards had awful fillrate and seemed to throw all their power at texturing. As for the DC, maybe it had untapped potential in that regard, but now we'll never know.
 
Re: ...

Deadmeat said:
GS :
1. Must keep all three buffers on-chip (eats 3.5 MB out of 4 MB)
2. Has no texture compression.
3. Has only 512K available for texture buffer.
4. Has no external memory.
5. A developer must continually upload new textures into the texture buffer inside the GS before sending polygons.
1. 640x480x24/8/1000x3 = 2764 KB. That leaves 1332 KB for textures, and that's even a generous setup; some games would look fine at 640x240x16/8/1000x3 = 921 KB, which leaves 3174 KB for textures (quick sanity check below, after this list).
2. Then what would you call CLUT, if not a compression format?
3. See 1.
4. There is main mem, it’s just that the memory controller is on the EE die, as opposed to GC and xbox, where it resides on the GPU.
5. So it’s a manual cache, that’s called a scratchpad-RAM, right? You can find many papers around the web that will tell you that scratchpads are very well suited for multimedia applications.
I've read that synchronisation between geometry and textures can be a problem, especially when using the fast MSKPATH3 upload, but allegedly it's "only" a matter of finding the right balance and sending the texture really early on.
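A tiny sanity check of the arithmetic in point 1, reproducing the same divide-by-1000 rounding and assuming the 4 MB of eDRAM counts as 4096 KB (illustrative only):

#include <stdio.h>

/* front + back + Z buffer footprint in KB, using the same /1000 rounding as above */
static double buffers_kb(int w, int h, int bpp) {
    return (double)w * h * bpp / 8.0 / 1000.0 * 3.0;
}

int main(void)
{
    const double vram_kb = 4096.0;          /* 4 MB of GS eDRAM, taken as 4096 KB */
    double hi = buffers_kb(640, 480, 24);   /* ~2765 KB, leaving ~1331 KB */
    double lo = buffers_kb(640, 240, 16);   /* ~922 KB, leaving ~3174 KB  */
    printf("640x480x24: %4.0f KB in buffers, %4.0f KB left for textures\n", hi, vram_kb - hi);
    printf("640x240x16: %4.0f KB in buffers, %4.0f KB left for textures\n", lo, vram_kb - lo);
    return 0;
}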
 
Re: ...

Deadmeat said:
Xbox :
1. Has on-chip Z-buffer compression.

No, it does not have Z-buffer compression.

An ideal GPU configuration is an internal back frame + z-buffer, and an external front frame buffer + texture memory.

Why is this "ideal"?

As implemented in GC it's actually more of a hindrance as AFAIK you can't draw into the framebuffer during the time interval you spend copying its contents out to main RAM. You can't do it on any design that doesn't have an extra read port just for this very purpose, which seems cost-inefficient to me. Means you basically burn fillrate for no reason, as there's no inherent ADVANTAGE to copying out the front buffer to main memory. It was just a cost-cutting technique they used because 1T-SRAM takes up a lot of die space.

You mention it as an advantage because it is NOT how GS works...

Compare above to the GS

You only say it's "ideal" because it differs from Sony's setup, that's easy to see. You have no FACTS that say a GC-like setup is any more or less ideal.

1. Must keep all three buffers on-chip (eats 3.5 MB out of 4 MB)

Actually, there should be more than half a meg free. 640*448 (standard PS2 res, I believe), 16-bit everything, double-buffered with Z is less than 2MB; triple-buffered it's a little over 2. Then you could cheat and do a half-height front buffer, or even all buffers, and free up even more. Dunno what math you used though to come up with 3.5 megs, sounds like arithmabogutics to me. ;)
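Spelling that math out (640x448, 16 bits per pixel for color and Z, 1 MB taken as 1024x1024 bytes; a rough check, nothing more):

#include <stdio.h>

int main(void)
{
    const double MB = 1024.0 * 1024.0;
    double buf = 640.0 * 448.0 * 2.0;   /* one 16-bit 640x448 buffer */
    printf("double-buffered + Z (3 buffers): %.2f MB\n", 3.0 * buf / MB); /* ~1.64 MB */
    printf("triple-buffered + Z (4 buffers): %.2f MB\n", 4.0 * buf / MB); /* ~2.19 MB */
    return 0;
}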

2. Has no texture compression.

Well, it has the MDEC, so it does, sort of.

4. Has no external memory.

Um, well, actually the console has 36 megs external memory... Guess you forgot about that. ;)
 
...

1. 640x480x24/8/1000x3 = 2764 KB. That leaves 1332 KB for textures
32 bit pixels and Z-buffer, not 24 bit. Pixel is in RGBT format. 32-bit Zbuffer is needed for a higher accuracy.

that's even a generous setup; some games would look fine at 640x240x16/8/1000x3 = 921 KB
Interlacing? You will never match the image quality of GC and Xbox.

2. Then what would you call CLUT, if not a compression format?
CLUT existed since the 8-bit days of the PC-Engine. The PSX made extensive use of CLUT. Yet no one calls it TC.
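For anyone following along, this is all CLUT is - each texel is a small index into a palette, so it's "compression" only in the sense of indexed color. A minimal, made-up sketch of a 4-bit CLUT fetch (the nibble order and the 16-entry 32-bit palette are just assumptions for the example):

#include <stdint.h>

/* Illustrative 4-bit CLUT ("indexed color") texel fetch: two texels per byte,
   each a 4-bit index into a 16-entry palette of 32-bit colors. */
uint32_t clut4_fetch(const uint8_t *indices, const uint32_t *palette,
                     int width, int x, int y)
{
    int texel = y * width + x;
    uint8_t byte = indices[texel >> 1];                       /* two texels per byte   */
    uint8_t idx  = (texel & 1) ? (byte >> 4) : (byte & 0x0F); /* pick low or high nibble */
    return palette[idx];                                      /* 16-entry CLUT lookup  */
}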

4. There is main mem, it’s just that the memory controller is on the EE die, as opposed to GC and xbox, where it resides on the GPU.
The GS cannot address ("see") main memory; it doesn't exist as far as the GS is concerned. It is up to the developer to swap textures in and out of the GS before rendering each strip.

5. So it’s a manual cache, that’s called a scratchpad-RAM, right? You can find many papers around the web that will tell you that scratchpads are very well suited for multimedia applications.
???

I've read that synchronisation between geometry and textures can be a problem, especially when using the fast MSKPATH3 upload, but allegedly it's "only" a matter of finding the right balance and sending the texture really early on.
Other consoles, even the DC, spare developers this trouble altogether.
 
No, it does not have Z-buffer compression.

NVIDIA XGPU
0.15-micron Process
233 MHz
4 Pixel Pipelines
2 Texels per Pixel Pipeline
8 Texels per Clock Cycle
4 Texture Layers per Rendering Pass (2 clock cycles)
0.93 Gigapixels per Second
1.87 Gigatexels per Second
3.73 Billion Anti-Aliased Samples per Second
Point, Bilinear, Trilinear, Anisotropic, Quadlinear Mip-Map Filtering
Perspective-Correct Texture Mapping
DotProduct3 Bump Mapping (DOT3)
Environment Mapped Bump Mapping (EMBM)
Cubic Environment Mapping (CEM)
Volumetric Textures (3D Textures)
Z, Stencil, Shadow, and Multisampling Buffers
S3TC and DirectX DXT1-DXT5 Texture Compression
Full-Scene Anti-Aliasing (2x, Quincunx, 4x)
32-bit Color (RGBA)
32-bit Z Buffer
Programmable Pixel and Vertex Shading Processors
2 Vertex Pipelines
233 Million Particles per Second
116.5 Million Polygons per Second
Triangle Tessellation
Z-Buffer Compression and Hidden Surface Removal (HSR)
1 Trillion Operations per Second (1000 BOPS)
80 GFLOPS
I forgot to mention HSR - if a developer renders from front to back instead of back to front, like in the pre-Z-buffer days...

Why is this "ideal"?
Because it gives the best balance between high performance and low cost.

As implemented in GC it's actually more of a hindrance as AFAIK you can't draw into the framebuffer during the time interval you spend copying its contents out to main RAM.
How large is each front buffer? 1.2 MB. Do this at 60 FPS and the bandwidth requirement is only 72 MB/s. That's a very small fraction of overall external memory bandwidth. Not to mention that the Flipper would be doing an internal back frame buffer and Z-buffer flush out during this time.

It was just a cost-cutting technique they used because 1T-SRAM takes up a lot of die space.
Exactly, a balance between cost and performance.

You mention it as an advantage because it is NOT how GS works...
GS is a textbook example of "How NOT to design a GPU".

Actually, there should be more than half a meg free. 640*448 (standard PS2 res, I believe), 16-bit everything, double-buffered with Z is less than 2MB; triple-buffered it's a little over 2.
While the competition is doing higher resolution and higher color depth.

Well, it has the MDEC, so it does, sort of.
Historians will debate for decades why SCEI engineers put the texture decompression unit outside of a GPU...

Um, well, actually the console has 36 megs external memory... Guess you forgot about that
The money you can't touch is worthless. The memory GS can't touch is worthless too.
 
It will be very interesting indeed to see how SCEI approaches graphics with PS3's GPU, the Graphics Synthesizer 3 or 'Visualizer'. 8)
 
32 bit pixels and Z-buffer, not 24 bit. Pixel is in RGBT format. 32-bit Zbuffer is needed for a higher accuracy.
Many games on PS2 use a 24-bit color buffer and get by just fine. Unless you are looking at a low-contrast gradient, it's impossible to tell it's not 32-bit anyway. A large number of GC games use a 16-bit color buffer and rely on dithering.

Interlacing? You will never match the image quality of GC and Xbox.
Actually, some of the games with the best image quality today (as long as you play them on a normal, interlaced TV) are the games that use a half-height frame buffer. I'm talking about games like BG:DA, Champions of Norrath, Tomb Raider: AOD and some others.
 
Re: ...

Deadmeat said:
1. 640x480x24/8/1000x3 = 2764 KB. That leaves 1332 KB for textures
32 bit pixels and Z-buffer, not 24 bit. Pixel is in RGBT format. 32-bit Zbuffer is needed for a higher accuracy.
Not even the new ATI and Nvidia cards have a 32-bit z-buffer, so why should a three-year-old console? And 32-bit buffers on a television screen?! That's just not worth it. GC can only do 24-bit anyway, and I highly doubt that Xbox developers waste their precious bandwidth on 32-bit buffers.

that's even a generous setup; some games would look fine at 640x240x16/8/1000x3 = 921 KB
Interlacing? You will never match the image quality of GC and Xbox.
For slow moving games like adventures and survival horror it would not be a problem.

2. Then what would you call CLUT, if not a compression format?
CLUT existed since the 8-bit days of the PC-Engine. The PSX made extensive use of CLUT. Yet no one calls it TC.
The basic ideas in Block Truncation Coding, which S3TC is based on, are very old too.
Vector Quantization is the principle on which all real-time texture compression in current consumer hardware is based.
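For contrast, here is roughly what a Dreamcast-style 2x2 VQ fetch looks like: a 256-entry codebook of 2x2 texel blocks, with one byte of index per block. The struct and names are illustrative, not taken from any SDK:

#include <stdint.h>

/* Illustrative 2x2 vector-quantized texture fetch: each index selects a
   pre-computed 2x2 block of 16-bit colors from a 256-entry codebook. */
typedef struct { uint16_t texel[2][2]; } vq_block;

uint16_t vq_fetch(const uint8_t *indices, const vq_block *codebook,
                  int width, int x, int y)
{
    int bx = x >> 1, by = y >> 1;                  /* which 2x2 block     */
    uint8_t idx = indices[by * (width >> 1) + bx]; /* one byte per block  */
    return codebook[idx].texel[y & 1][x & 1];      /* pick texel in block */
}

That works out to roughly 2 bits per texel plus a 2 KB codebook, which is where the big space savings come from.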
 
marconelly!:
Even if you allocate all the memory to textures (which is obviously impossible), it would still be only half the memory found in the Xbox, which is something you can easily and realistically allocate for textures on it.
The PS2 game wouldn't even have to compare so much in texture variety, just do better at comparing in texture quality technically to top Xbox games (and also, it has fewer polygons to texture on average than Xbox, to start with). You'd expect to see such a thing here or there, if your theory of main RAM storage space being the big limitation here were true. I'm not saying "exactly match or exceed" Xbox quality either, just compare more favorably. That it doesn't seem to have managed in-game anisotropic filtering and dot-product bump mapping, which even the DC managed to achieve in its abbreviated lifespan, seems to point to the problem being more than memory.
That is before you take texture compression into the story.
Now you're talking. This is one of the things, other than slapping in more main memory, that could stand to be fixed to best improve the situation. Not that such a thing would necessarily be a simple change, though.

As storage space is the limiting factor for the texture budget, and texture compression schemes are for making the most of that budget, the fixed memory footprint of PowerVR VQ textures should make it a really great solution. With all of the space it saves, you can really afford to up the resolution and variety of textures with the extra memory.
 
The PS2 game wouldn't even have to compare so much in texture variety, just do better at comparing in texture quality technically to top Xbox games (and also, it has fewer polygons to texture on average than Xbox, to start with).
I see what you mean, but:
a) There are games that compare well to top-tier Xbox games (one look at SH3, Champions of Norrath, Ghosthunter or Killzone should tell you that much), but none of them can really exceed what is technically possible on Xbox in that regard. 2X the amount of memory is an insurmountable difference, and a much bigger one than the advantage DXTC compression gives it (in many cases, CLUT can get you very far and very close to the space saving you get with hardware TC).

b) Top tier PS2 games really don't have fewer polygons to texture, if anything they are probably really close in that regard.

You'd expect to see such a thing here or there, if your theory of main RAM storage space being the big limitation here were true. I'm not saying "exactly match or exceed" Xbox quality either, just compare more favorably. That it doesn't seem to have managed in-game anisotropic filtering and dot-product bump mapping, which even the DC managed to achieve in its abbreviated lifespan, seems to point to the problem being more than memory.
As I've said, there are games here and there that compare more favorably, so that point is moot. From what I've seen, the DC never managed to have a single game that utilized bumpmapping, and had pretty much one game with aniso textures (but MANY other visual problems in it to counter that advantage). On the other hand, I have yet to verify whether that Stretch Panic PS2 game made by Treasure, who claimed it has DOT3 bumpmapping, really has it or not.
 
Squeek said:
GC can only do 24-bit anyway

Sorry to ask, but can you help me find some evidence to support that? I have recently had cause to doubt this 'fact', but I have had some trouble locating where it became so well known.

This is not a stab at you or anything ;)
 
Re: ...

Squeak said:
Deadmeat said:
1. 640x480x24/8/1000x3 = 2764 KB. That leaves 1332 KB for textures
32 bit pixels and Z-buffer, not 24 bit. Pixel is in RGBT format. 32-bit Zbuffer is needed for a higher accuracy.
Not even the new ATI and Nvidia cards have 32bit z-buffer, so why should a three year old console?

Wait, what is my GF4 telling me when it says it supports D3DFMT_D24S8 then? I'd call that a 32-bit z-buffer - 24 bits for depth, and 8 for stencil.
 
Deadmeat said:
Z-Buffer Compression and Hidden Surface Removal (HSR)

My bad. I had the impression that didn't come until the FX series (was thinking of framebuffer compression), but anyway: Z compression only affects bandwidth, not the size of the Z-buffer. Hence, it doesn't matter in the scope of our discussion.

I forgot to mention HSR - if a developer renders from front to back instead of back to front, like in the pre-Z-buffer days...

It doesn't remove any more surfaces than what the Z-buffer already does (the Z-buffer *IS* hidden surface removal). What it does is eliminate unnecessary texture reads, but this feature is ALSO irrelevant as the GS has its own dedicated texture read port which does not impact framebuffer reads/writes. It is also so fast you might just as well not bother at all with doing front-to-back rendering, and indeed, devs do not. 10x overdraw per frame wouldn't bog down the GS noticeably anyway with the fillrate it has at its disposal. It DOES impact a much more bandwidth-limited system like XBox though.
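For readers who haven't run into the term, "rendering front to back" just means sorting opaque draws by view depth before submitting them, so the Z test can reject covered pixels before their texture reads happen. A made-up sketch (types and fields are purely illustrative):

#include <stdlib.h>

/* Illustrative front-to-back draw ordering: sort opaque objects by view-space
   depth so nearer geometry fills the Z-buffer first. */
typedef struct { float view_z; /* ... mesh, textures, etc. ... */ } draw_item;

static int by_depth(const void *a, const void *b)
{
    float za = ((const draw_item *)a)->view_z;
    float zb = ((const draw_item *)b)->view_z;
    return (za > zb) - (za < zb);   /* nearest first */
}

void submit_opaque(draw_item *items, size_t count)
{
    qsort(items, count, sizeof *items, by_depth);
    /* for (size_t i = 0; i < count; ++i) draw(&items[i]); */
}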

It's one thing knowing a piece of hardware has a feature, another knowing what it is used for and what effect it has on the system. You have a good grasp on the former, but lack significantly in the latter.

Why is this "ideal"?
Because it gives the best balance between high performance and low cost.

But you don't have any figures to back that up do you? You just say that because you don't like the GS or the way it works.

you can't draw into the framebuffer during the time interval you spend copying its contents out to main RAM.
How large is each front buffer? 1.2 MB. Do this at 60 FPS and the bandwidth requirement is only 72 MB/s. That's a very small fraction of overall external memory bandwidth.

It's still an unnecessary hit, which the GS does not have to deal with.

Not to mention that the Flipper would be doing an internal back frame buffer and Z-buffer flush out during this time.

NO, it would not. As I said, without a second read port for the framebuffer copy process, you can't access the on-chip memory while it's being read out to main memory! Buffer clears come ON TOP of the copying overhead.

GS is a textbook example of "How NOT to design a GPU".

:LOL: Have you even opened a textbook on the subject? Somehow, I seriously doubt that.

While competitions are doing higher resolution and higher color depth.

Not always. Do all XBox games run in 32-bit? Not all GC games do.

Historians will debate for decades why SCEI engineers put the texture decompression unit outside of a GPU...

This I also seriously doubt... ;)

I guess they put it where it is because it wouldn't do much good on the GS in its current implementation (its primary purpose being MPEG2 decoding, after all), but you, not knowing much of anything about GPU design, would of course neither understand nor believe that.

The money you can't touch is worthless. The memory GS can't touch is worthless too.

Why would GS need to touch external memory? It eats display lists and spits out pixels as a result. There's no need for it to touch external memory when it is being fed all it needs through the DMAC in the EE.

It's the same as when people bitch the PS2 has no hardware T&L. That's because it wasn't designed that way. It's the same thing here. Different way of accomplishing the same thing. XBox GPU also eats display lists, but needs to touch memory because it doesn't have any on-chip RAM. Doesn't mean it's inherently superior because of it.
 
Hmm, that only partially helps.

You see I was under the impression that the Gamecube's internal framebuffer was limited to 24bits (8:8:8 or 6:6:6:6 mode), but I have recently been told (with some certainty) that it can and does support 32bit modes (8:8:8:8 etc). Any definitive judgement on this?
 
ERP said:
Panajev2001a said:
Mip-mapping... I would believe you if I did not have too many PS2 coders swearing at its implementation in "obscure unt arcane ways".

I said nothing about the quality of the solution.
The issue is the setup for mip mapping, not the GS side; it's just that pretty much every piece of hardware since the Voodoo 1, and a lot before it, has done it for you. You basically have to compute or fudge the parameters. Usually it ends up being the latter.
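To illustrate the sort of "compute or fudge" being described: without per-pixel derivatives you end up estimating a mip level per primitive from distance. The formula and parameter names below are purely illustrative, not from the PS2 libraries:

#include <math.h>

/* Rough per-primitive LOD estimate: pick a mip level from distance instead of
   per-pixel derivatives. Constants and names are made up for the example. */
float mip_level_from_distance(float distance,        /* eye -> object, world units   */
                              float texels_per_unit, /* texture density on the mesh  */
                              float screen_scale)    /* pixels per world unit at d=1 */
{
    /* how many texels land on one pixel at this distance */
    float texels_per_pixel = texels_per_unit * distance / screen_scale;
    float lod = log2f(texels_per_pixel > 1.0f ? texels_per_pixel : 1.0f);
    return lod;  /* quantize/bias as needed before handing it to the hardware */
}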

One thing is not providing a feature (clipping on the GS), and a totally different thing is providing a practically broken one.

I do not like the latter.

I do see your point.
 
I think it has been reiterated millions of times that the bottleneck is the EE MIPS core itself, and most PS2 games, if not all, are EE-bound instead of GS-bound. There are a few documents that provide examples of the performance analyser. I am not certain if the examples were taken from R&C, but R&C screenshots were used. Examine the graphs, and take a look at the readings for the GS.

That's right, the GS is only registering that kind of miniscule activity!

With those kind of readings, you may even be able to hook another EE to the same GS, and the GS can hum along nicely.

I'm just reiterating what every dev has stated. The 'bad guy' in the PS2 system is really the MIPS core. Sure the GS is a bit feature-poor, but it was an intentional 'RISC-like' design approach.

PS: Obviously any dev is welcome to correct any mistakes. :oops:
 
marconelly!:
From what I've seen, DC never managed to have a single game that utilized bumpmapping,
I have a Tomb Raider title here, on Windows CE no less, that certainly uses it, for one.
 
Flipper also performs z-buffer compression, but only when AA is enabled. GO - the pixel copy-out operation to the external frame buffer is done periodically, and I believe there is circuitry for handling that operation, since it's generally known that AA can be performed during the copy-out procedure. So AFAIK it's not really a case of waiting for the entire contents of the buffer to clear while your entire pipeline stalls.
 