Should the SEGA DC and Saturn have launched with these alternative designs?

2bpp textures on DC were limited to very plain, homogeneous textures, with not too many shapes (it's essentially a variant of ordered dither).
Pardon? I developed the 2bpp VQ compressor library for DC and, IMHO, the quality was on par with DXTC. (The 1bpp version, on the other hand, did need simpler textures as it had 4x2 (or was it 2x4?) pixel vectors**)

I was about to post a link to some comparison images, but I see that the web space from my old ISP has been shut down, so I'll have to find a different host (assuming I still have a copy of the data).


** In any case, having 32-dimensional vectors was tricky.
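To make the footnote concrete, here is a rough sketch (my own illustration, not the actual compressor code) of where the 32 dimensions come from: a 4x2 block holds 8 texels, and if each texel contributes its four channels as separate dimensions during codebook training, you end up clustering points in a 32-dimensional space. Treating the channels this way is an assumption; the real compressor may have weighted or transformed the data differently.

Code:
/* Packing a 4x2-texel block into a 32-dimensional training vector for
 * 1bpp VQ codebook generation (illustrative sketch only). */
#include <stdint.h>

#define BLOCK_W 4
#define BLOCK_H 2
#define DIMS    (BLOCK_W * BLOCK_H * 4)   /* 8 texels * 4 channels = 32 */

/* argb: 32-bit texels of the source image, 'pitch' texels per row;
 * (bx, by) selects which 4x2 block to pack. */
static void pack_block(const uint32_t *argb, int pitch,
                       int bx, int by, float vec[DIMS])
{
    int i = 0;
    for (int y = 0; y < BLOCK_H; ++y) {
        for (int x = 0; x < BLOCK_W; ++x) {
            uint32_t t = argb[(by * BLOCK_H + y) * pitch + (bx * BLOCK_W + x)];
            vec[i++] = (float)((t >> 24) & 0xFF);   /* A */
            vec[i++] = (float)((t >> 16) & 0xFF);   /* R */
            vec[i++] = (float)((t >>  8) & 0xFF);   /* G */
            vec[i++] = (float)( t        & 0xFF);   /* B */
        }
    }
}

Codebook training (e.g. an LBG/k-means pass over all such vectors) then has to cluster points in 32 dimensions, where distances are less discriminating and convergence is slower than in the 16-dimensional space of the 2bpp (2x2-texel) mode, which is presumably part of what made it tricky.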
 
Pardon? I developed the 2bpp VQ compressor library for DC and, IMHO, the quality was on par with DXTC.
I already said I remembered wrong, ok? Stop sticking it to me! :oops: ;)

(The 1bpp version, on the other hand, did need simpler textures as it had 4x2 (or was it 2x4?) pixel vectors**)
I never knew this, interesting. Must not have been very applicable outside of a few special cases?
I was about to post a link to some comparison images, but I see that the web space from my old ISP has been shut down, so I'll have to find a different host (assuming I still have a copy of the data).

I was thinking you'd pop by this discussion at some point. If you feel like it, please tell us more about DC VQ. But not so much as to get in trouble. :smile:

** In any case, having 32-dimensional vectors was tricky.
How so? You mean in the compression stage or wrt. HW implementation?
 
Considering the Dreamcast didn't even have to start sacrificing standard definition in its frame buffers, or other aspects of image quality, to make more room for its textures, it wasn't having problems staying ahead in texture quality.

With its indirect memory accesses, the DC VQ scheme required some considerable computation, absorbed by the hardware in that case. PVR-TC algorithms, then, would seem to be more mindful of the hardware cost of implementation.

Even though Naomi 2 represents a whole multiple of silicon over Dreamcast's processors, that's still less than PS2. TBDR is the answer when the question is eDRAM or a large external bus for solving the bandwidth dilemma.
 
Even though Naomi 2 represents a whole multiple of silicon over Dreamcast's processors, that's still less than PS2. TBDR is the answer when the question is eDRAM or a large external bus for solving the bandwidth dilemma.

Not that I'm one to argue, but I do wonder why VF4 had to be scaled back from the Naomi 2 arcade version if this is the case. Was that strictly a memory issue?
 
Not that I'm one to argue, but I do wonder why VF4 had to be scaled back from the Naomi 2 arcade version if this is the case. Was that strictly a memory issue?
That depends on in which ways it was scaled back.

IIRC, the Naomi 1 and 2 arcade boards typically had twice the texture/framebuffer memory (per CLX2) of DC but, AFAIU, that usually meant that on DC developers just used texture compression on a greater percentage of textures.

Naomi 2, OTOH, could handle considerably more geometry since the Elan co-processor offloaded T&L from the SH4.
 
This thread makes me sad to see the DC go away like it did. It's strange though, I kind of don't even consider it in the same generation as the PS2, Xbox, and GC. But considering what it had to offer as far as features go, it was certainly ahead of the PS2, except in pure fillrate and the size of its storage medium. To me it was more or less a "prototype" for what was ahead.
 
Considering the Dreamcast didn't even have to start sacrificing standard definition in its frame buffers, or other aspects of image quality, to make more room for its textures, it wasn't having problems staying ahead in texture quality.
The 512^2 res could just be considered a more efficient resolution than the "standard". With a CRT it puts the most pixels on the length of the screen where they are most discernible.
It's not like DC didn't have issues with memory size. The buffer space for polygons was considerable. Maybe it should have sacrificed a little resolution for 400KB more space?
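For a rough sense of the trade-off being suggested, here is a small back-of-envelope calculation. The 16-bit colour depth and the buffer counts are assumptions for illustration; the 400KB figure above presumably comes from a particular combination.

Code:
/* Rough arithmetic: what dropping horizontal resolution saves in VRAM. */
#include <stdio.h>

int main(void)
{
    const int height = 480, bytes_per_pixel = 2;      /* assume 16-bit colour   */
    for (int buffers = 2; buffers <= 3; ++buffers) {  /* double/triple buffered */
        long wide   = 640L * height * bytes_per_pixel * buffers;
        long narrow = 512L * height * bytes_per_pixel * buffers;
        printf("%d buffers: 640 wide = %ldKB, 512 wide = %ldKB, saved = %ldKB\n",
               buffers, wide / 1024, narrow / 1024, (wide - narrow) / 1024);
    }
    return 0;
}

With two 16-bit buffers the saving is about 240KB, with three about 360KB, so the exact figure depends on the buffer setup assumed.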

With its indirect memory accesses, the DC VQ scheme required some considerable computation, absorbed by the hardware in that case. PVR-TC algorithms, then, would seem to be more mindful of the hardware cost of implementation.
Surely only in the developer-side compression stage, which is "someone else's problem".
The lookup was very fast. Faster than looking up full 24bit values.
But it did need 2KB of dedicated SRAM for the table.
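In software terms the lookup boils down to two dependent reads per texel; the sketch below shows the idea (real hardware also handles twiddled/Morton texel ordering and mipmaps, which are omitted here). Each byte of the index plane selects a codebook entry, and each entry is a 2x2 block of 16-bit texels, so 256 entries x 4 texels x 2 bytes is exactly the 2KB table mentioned above.

Code:
/* Illustrative software equivalent of the DC 2bpp VQ texture fetch. */
#include <stdint.h>

typedef struct { uint16_t texel[2][2]; } vq_entry;       /* one 2x2 block */

/* indices: one byte per 2x2 block, (w/2)*(h/2) bytes in total
 * codebook: the 256-entry, 2KB table
 * (x, y): texel coordinate within a w-wide texture                      */
static uint16_t vq_sample(const uint8_t *indices,
                          const vq_entry codebook[256],
                          int w, int x, int y)
{
    uint8_t idx = indices[(y / 2) * (w / 2) + (x / 2)];  /* first read  */
    return codebook[idx].texel[y & 1][x & 1];            /* second read */
}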

Even though Naomi 2 represents a whole multiple of silicon over Dreamcast's processors, that's still less than PS2. TBDR is the answer when the question is eDRAM or a large external bus for solving the bandwidth dilemma.
"Complete" eDRAM, where everything, buffers and textures, are in the same super fast memoryspace allows you to do some tricks a tilebuffer would never ever allow. But if you can't fit the buffer (which will be the case for a forseeable future) than I guess it's the next best thing.
 
The lost potential of all of the ALUs/TMUs/logic/whatever that could fit inside the massive die area of a display buffer of eDRAM is the cost of being able to do those tricks. If wasting die area wasn't an issue, expand the much, much faster (and more fab friendly) on-die SRAM of the tile buffer and really go to town.
 
The lost potential of all of the ALUs/TMUs/logic/whatever that could fit inside the massive die area of a display buffer of eDRAM is the cost of being able to do those tricks. If wasting die area wasn't an issue, expand the much, much faster (and more fab friendly) on-die SRAM of the tile buffer and really go to town.

What about the lost potential of the poly-sorting hardware and all the bandwidth wasted shuffling the same data over an 800MB/s bus multiple times?

Talking about "wasted potential" is equally rubbish in both cases. There would be no need for the extra ALU and SRAM if you were bottlenecked with the interface.

It can, however, be debated whether the designers at Sony went a bit overboard, and whether the Flipper approach was a better implementation of the idea.
But the Flipper buffers do lose some of the advantages. The memory isn't homogeneous anymore. You can't use buffers as textures and textures as buffers and scale them freely. Render to texture is slower and you can't use the previous frame for anything...

Besides, the space "lost" to eDRAM is not as great as you seem to imagine. Look at this pic of the first-gen GS.
http://www.teamps2.com/psx2/graphics_chip.jpg
 
The 512^2 res could just be considered a more efficient resolution than the "standard". With a CRT it puts the most pixels on the length of the screen where they are most discernible.
What is this 512x512 you are referring to?
The lookup was very fast. Faster than looking up full 24bit values.
But it did need 2KB of dedicated SRAM for the table.
No. Pre-loading a VQ LUT would only work for immediate mode renderers (e.g. as done on the PS2) and, even then, it's a poor method if you only access a small fraction of any texture. For a deferred renderer it would be a very bad way to handle the compressed texture.

No, IIRC, CLX2 used a cache to hold the VQ table and so only loaded data that was actually needed. This is vitally important if there are > 1 textures in a tile.

Also, FWIW, DC developers would probably have limited themselves to 16bpp for non-compressed textures. Certainly higher bit depths were supported, but I just don't think they would have bothered. <shrug>
 
What is this 512x512 you are referring to?

The recommended resolution of the back buffer by Sony.

No. Pre-loading a VQ LUT would only work for immediate mode renderers (e.g. as done on the PS2) and, even then, it's a poor method if you only access a small fraction of any texture. For a deferred renderer it would be a very bad way to handle the compressed texture.

No, IIRC, CLX2 used a cache to hold the VQ table and so only loaded data that was actually needed. This is vitally important if there are > 1 textures in a tile.

Also, FWIW, DC developers would probably have limited themselves to 16bpp for non-compressed textures. Certainly higher bit depths were supported, but I just don't think they would have bothered. <shrug>

OK, so it's a cache instead of "just" an on-die chunk of RAM? How could it know which parts of the table were needed? A hardware scheme to determine that hardly seems worth it just to save loading a few KB.
 
Yeah really.
If the hardware designer has the same target group as the competition, he will match or exceed the competition in the most important ways, texturing here being perhaps the most important of all.

I think this is a little different to what you previously said ("A much newer machine that costs the same should beat or at least equal the old one in all regards." - emphasis mine), and normally yeah I would expect newer and more expensive kit to beat the competition in the most important ways ... but I know that different folks will disagree on what the most important stuff is.

Toshiba thought fill rate was mega important, but Nvidia thought it had to be balanced with features. Both are gaming graphics chips but both have different influences and you can see why things ended up the way they did. I can see why newer consoles don't always exceed slightly older ones and I don't really have a problem with that, as long as the overall result is good.

"Compromise the integrity"?! What exactly do you mean by that? You optimise your engine to get better graphics, what is compromising about that? If you have to push less polygons or do more passes to do that, then so be it.

Or if you decide not to fill all your memory up with 8 bit CLUT textures then that's cool too!

All right, to be honest I was thinking of the 1-bit VQ, which is close to 2bpp in many cases once the codebook is included.
Look, I'm not putting down DC VQ; on the contrary, it's an awesome scheme (only PVR-TC exceeds it in cost/compression ratio) and it would probably have been better if PS2 had something similar.
It is, however, not the reason for the PS2's poor texturing.

It's one of the reasons why DC textures often looked sharper and more colourful though!

Look at the contents of the very thread you linked. There are ways to get close to 2bpp textures that come very close to, or are better than, DC VQ. One of them is luminance compression, at the cost of only one extra simple pass. Sadly some of the most interesting pics are down. Perhaps not surprising, as it was six years ago. :LOL:

I read the whole thread before I linked to it, it was very interesting! I never said that the PS2 couldn't improve on just using 4 or 8 bit CLUT textures for everything, just that 2bpp VQ worked very well (and better than 4 color textures). :p

Again, read the thread you linked.

I did, and it showed that 2bpp VQ would decimate a 4 color texture (and a 16 colour texture too, under most circumstances). Okay, okay, I'll stop with the 2bpp VQ / 4 colour thing now. :D

First off, I never claimed it was straightforward. It requires you to decompress the textures in batches and have them ready, uncompressed, before they are to be loaded to the GS. Sort of like streaming from main memory to itself. That's not straightforward, but with a little care in the game design it can be done. One approach could be invisible portals, like in many PC games.
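A minimal sketch of that batched approach, with every name made up for illustration (this is not a real PS2 SDK API, and the codec is assumed): when the player crosses the trigger for an "invisible portal", the next zone's textures are unpacked from their compressed form in main RAM into an uncompressed pool, so later frames only need a straight upload to the GS.

Code:
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *packed;      /* compressed texture data in main RAM      */
    size_t packed_size;
    size_t raw_size;            /* size once decompressed (e.g. 8-bit CLUT) */
} packed_texture;

typedef struct {
    uint8_t *base;              /* uncompressed texture pool in main RAM    */
    size_t used, capacity;
} texture_pool;

/* Assumed general-purpose codec; returns bytes written. */
size_t decompress(const uint8_t *src, size_t n, uint8_t *dst);

/* Called when the player crosses a portal into a zone: unpack that zone's
 * textures into the pool so rendering only needs a plain DMA upload later. */
static int prefetch_zone(texture_pool *pool,
                         const packed_texture *zone, int count)
{
    for (int i = 0; i < count; ++i) {
        if (pool->used + zone[i].raw_size > pool->capacity)
            return -1;          /* pool full: evict something or fail */
        decompress(zone[i].packed, zone[i].packed_size,
                   pool->base + pool->used);
        pool->used += zone[i].raw_size;
    }
    return 0;
}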

If you have to decompress ahead of time and then put them in a texture pool in main memory pool then I can see why it didn't get used much (at all?). Might work well as a way to reduce loading/streaming times though if you decompressed as you loaded in off the DVD, which in turn could help texture quality in a given scene.

Here is why.
Most textures will be tiled or have a colour bit depth lower than the buffer (unless there are special reasons), and most texels will cover one pixel or more.
If you have any kind of texture management you will load only the relevant MIP levels. And with virtual texturing the texel bitrate per frame will be even lower.
Overdraw is still only about two and most of that is re-tiled textures.

I can see how you can render a frame with a relatively small number of unique texels, but I disagree that using more than a framebuffer's worth of texture data automatically means "you're doing it wrong". Large textures, high levels of filtering and pushing out lod transitions, multitexturing, combining buffers, real-time reflection mapping etc etc mean you can easily end up sampling lots of texture data.
 
Whoa... it's like I just walked through a time-portal, entering this thread.

Lazy8s said:
If wasting die area wasn't an issue, expand the much, much faster (and more fab friendly) on-die SRAM of the tile buffer and really go to town.
Obviously die area mattered; the GS was supposed to ship with at least 8MB of eDRAM at some point, and Sony was unable to manufacture them fast enough as it was during the first six months.
That aside, I sincerely doubt there's anything you could "add" with SRAM other than wasting more die space.
On one hand, GPUs are the most latency tolerant part of your console, so by definition they don't need low latency solutions.
On the other hand, GS RAM was already designed with bandwidth somewhere north of 160GB/s to the page buffers, which in turn provided the publicly advertised 48GB/s of bandwidth at very low latency.
Unlike most other GPUs, the GS is not that latency tolerant (so it needed a low-latency and high-bandwidth solution), but on the upside you got a GPU where state changes were effectively free.
Try and measure what the cost is for a texture-cache flush on a typical GPU (immediate, deferred, it doesn't matter) - on GS it was under 10 cycles as a worst case...

SimonF said:
Also, FWIW, DC developers would probably have limited themselves to 16bpp for non-compressed textures.
Which for RGBA usage never made much sense to me (given that VQ alternatives were always present).

and, even then, it's a poor method if you only access a small fraction of any texture
Unless you're paging palettes from embedded RAM, where bandwidth is 'free' anyway.
 
I think this is a little different to what you previously said ("A much newer machine that costs the same should beat or at least equal the old one in all regards." - emphasis mine), and normally yeah I would expect newer and more expensive kit to beat the competition in the most important ways ... but I know that different folks will disagree on what the most important stuff is.
I don't see the difference. The two main ingredients of videogame graphics, textures and polygons, are not something you compromise on. Both are important, and you'd be a very bad designer if you belittled either. Of course the designers wanted to get the best possible texture performance.

Toshiba thought fill rate was mega important, but Nvidia thought it had to be balanced with features. Both are gaming graphics chips but both have different influences and you can see why things ended up the way they did. I can see why newer consoles don't always exceed slightly older ones and I don't really have a problem with that, as long as the overall result is good.
DC is not slightly older than PS2; one and a half years separate the two, to be precise. On top of that, as you said yourself, DC is less ambitious than PS2: it used less silicon and was generally aiming not at being a PS2 killer, but at getting Sega on an even keel again.
PS2 on the other hand was a design with all stops pulled out, no holds barred in the design phase. Both the main pieces of silicon had a huge amount of research and expertise behind them.
PS2 should have performed much better in the texturing department than it did. I really can't see where the bottleneck was.

And BTW the GS is not exclusively fillrate oriented, that would just be stupid. There are many other features in there that are quite clever and unique for the design.
Or if you decide not to fill all your memory up with 8 bit CLUT textures then that's cool too!
How is that relevant to this discussion, and what is your point?
It's one of the reasons why DC textures often looked sharper and more colourful though!
I read the whole thread before I linked to it, it was very interesting! I never said that the PS2 couldn't improve on just using 4 or 8 bit CLUT textures for everything, just that 2bpp VQ worked very well (and better than 4 color textures). :p
As shown in the thread, you could match and in some cases exceed DC-VQ by blending a monochrome 2-bit texture with a lower-resolution 2- or 4-bit colour texture, getting the same compression ratio.
And besides, I highly doubt texture throughput per frame was the main culprit.
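For the blend itself, a minimal sketch is below. The bit depths and resolution ratio are assumptions for illustration: a full-resolution 2bpp monochrome detail map modulating a quarter-resolution (1/16 area) 4-bit CLUT colour map averages out to roughly 2 + 4/16 = 2.25 bits per displayed texel, in the same ballpark as DC's 2bpp VQ. On PS2 the modulate would be done as a second blend pass.

Code:
#include <stdint.h>

/* lum: 0..255, expanded from the 2-bit full-res detail texture at (x, y)
 * rgb: 24-bit colour sampled (and bilinearly upscaled) from the low-res
 *      colour texture.  The result is a plain modulate of the two.      */
static uint32_t modulate(uint8_t lum, uint32_t rgb)
{
    uint32_t r = (((rgb >> 16) & 0xFF) * lum) / 255;
    uint32_t g = (((rgb >>  8) & 0xFF) * lum) / 255;
    uint32_t b = (( rgb        & 0xFF) * lum) / 255;
    return (r << 16) | (g << 8) | b;
}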

I did, and it showed that 2bpp VQ would decimate a 4 color texture (and a 16 colour texture too, under most circumstances). Okay, okay, I'll stop with the 2bpp VQ / 4 colour thing now. :D
For lower-res textures (~64x64 or 128x128), of which there were quite a few even in DC games, AFAICS DC-VQ actually has a worse compression ratio than palette textures with a shared palette. The LUT takes up 2KB, roughly the same as or more than the texture itself.
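To put rough numbers on that, assuming a full 256-entry codebook (256 x 2x2 texels x 2 bytes = 2048 bytes) and a shared CLUT whose cost isn't charged to any single texture:

Code:
#include <stdio.h>

static long vq_size(int w, int h)    { return (long)(w / 2) * (h / 2) + 2048; }
static long clut4_size(int w, int h) { return (long)w * h / 2; }  /* 4bpp, shared palette */

int main(void)
{
    const int sizes[] = { 64, 128, 256 };
    for (int i = 0; i < 3; ++i) {
        int s = sizes[i];
        printf("%3dx%-3d  2bpp VQ: %5ld bytes   4-bit CLUT: %5ld bytes\n",
               s, s, vq_size(s, s), clut4_size(s, s));
    }
    return 0;
}

At 64x64 the VQ version (3072 bytes) is indeed bigger than the 4-bit CLUT one (2048 bytes); the crossover comes around 128x128, where the fixed 2KB codebook stops dominating.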
If you have to decompress ahead of time and then put them in a texture pool in main memory pool then I can see why it didn't get used much (at all?). Might work well as a way to reduce loading/streaming times though if you decompressed as you loaded in off the DVD, which in turn could help texture quality in a given scene.
Exactly. It would free up main memory for more textures for the scene currently being shown in the game, rather than having to buffer full-size versions of the textures from adjacent scenes.
I can see how you can render a frame with a relatively small number of unique texels, but I disagree that using more than a framebuffer's worth of texture data automatically means "you're doing it wrong". Large textures, high levels of filtering and pushing out lod transitions, multitexturing, combining buffers, real-time reflection mapping etc etc mean you can easily end up sampling lots of texture data.
Pushing out the MIP map boundary to where you get pixel fighting is just plain bad practice. With bilinear it's always best to have the texels at least covering slightly more than one pixel (unless you are doing procedural textures, of course).
Even in a worst-case scenario with reflection mapping, bump/relief mapping and a colour texture on the same model, you should still be well under a 1:1 texel-to-pixel bit ratio.
Large textures are not an exception here. Of course the whole large texture has to be stored somewhere, but the texels actually accessed per frame will still be below the resolution of the screen. And with standard use of compression, MIP mapping and virtualization/clipmapping, the texels that have to be present in VRAM for the current frame amount to a lower data volume than the framebuffer.
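A back-of-envelope version of that argument, where all the inputs are assumptions for illustration (640x480 screen, overdraw of about two, mip mapping keeping unique texels at or below one per covered pixel, 4-bit CLUT textures):

Code:
#include <stdio.h>

int main(void)
{
    const long   pixels   = 640L * 480;     /* ~307k screen pixels        */
    const double overdraw = 2.0;            /* assumed, as argued above   */
    const double tex_bpp  = 4.0;            /* bits per texel (4-bit CLUT)*/

    double covered       = pixels * overdraw;   /* pixels actually shaded */
    double unique_texels = covered;             /* <= 1 texel per pixel   */
    double tex_kb        = unique_texels * tex_bpp / 8.0 / 1024.0;
    double fb_kb         = pixels * 2.0 / 1024.0;   /* 16-bit framebuffer */

    printf("unique texture data ~%.0fKB vs framebuffer %.0fKB\n", tex_kb, fb_kb);
    return 0;
}

Under those assumptions the per-frame texture data (~300KB) comes in well under a 16-bit framebuffer (~600KB); heavier filtering and multitexturing push the first number up, which is the point made in reply below.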
 
...

well, i am not quite familiar with the poly counts pushed by ps2/gamecube/xbox games, but i would like to state that in the right hands the hitachi sh4 could push a lot more geometry than what we were used to from commercial dc games...

I have coded a pipeline in extremely tight sh4 asm that can push around 150k vertices (1 texture + 1 light) per frame at 60 fps...the only problem is that the vertex shader is executed for every vertex being sent to the powervr chip..i have a new pipeline on the drawing board (albeit, free time is scarce) that will work only once on each vertex shared by multiple triangles and performance is expected to get even better, allowing even more headroom for complex effects... The only problem is that the dc's 8mb of vram pose a cumbersome limit, the naomi would be much better with double the amount of vram...

For your consideration, the most geometry heavy game that i have seen on dc is Dead or Alive 2 pushing around 20k vertices per frame...

Another interesting observation i made during my experiments with dreamcast is that the 2x antialiasing mode tends to be fairly cheap, usually consuming 20-25% more than non antialiased rendering...
 
The PowerVR and SuperH architectures had many years of considerable development accumulated behind them by what were obviously the industry's most accomplished engineers, so the development of Sony's processors (EE, GS, Cell) only seemed comparatively large-scale because it was all up front.

Also, the price point and market segment targeted aren't measures of how ambitious a design is.
 
DC is not slightly older than PS2; one and a half years separate the two, to be precise. On top of that, as you said yourself, DC is less ambitious than PS2: it used less silicon and was generally aiming not at being a PS2 killer, but at getting Sega on an even keel again.

I never said the DC was less ambitious. Setting and trying to meet difficult goals is ambitious; both PS2 and DC designers were very ambitious.

And BTW the GS is not exclusively fillrate oriented, that would just be stupid. There are many other features in there that are quite clever and unique for the design.

It has much higher fillrate than the chip in the Xbox though (a newer, more expensive system on a smaller manufacturing process with faster, hotter chips). This idea that you must beat competitors' hardware in all ways is not one that console designers of any generation have gone in for. Different approaches make things more interesting anyway.

How is that relevant to this discussion, and what is your point?

Nice textures take up memory. How much memory you allocate to textures affects how nice they can be!

As shown in the thread, you could match and in some cases exceed DC-VQ by blending a monochrome 2-bit texture with a lower-resolution 2- or 4-bit colour texture, getting the same compression ratio.
And besides, I highly doubt texture throughput per frame was the main culprit.

I expect most PS2 games used 4 and 8 bit palettised textures for most or all of their textures, in which case memory would have been the culprit. I know I keep returning to a common theme here, but I think on the whole it's just this simple, particularly for multi-platform games.

Pushing out the MIP map boundary to where you get pixel fighting is just plain bad practice. With bilinear it's always best to have the texels at least covering slightly more than one pixel (unless you are doing procedural textures, of course).
Even in a worst-case scenario with reflection mapping, bump/relief mapping and a colour texture on the same model, you should still be well under a 1:1 texel-to-pixel bit ratio.
Large textures are not an exception here. Of course the whole large texture has to be stored somewhere, but the texels actually accessed per frame will still be below the resolution of the screen. And with standard use of compression, MIP mapping and virtualization/clipmapping, the texels that have to be present in VRAM for the current frame amount to a lower data volume than the framebuffer.

Combining trilinear and aniso filtering can require several non-shared texels per texture layer. Pushing out mip transition boundaries and using a high level of aniso filtering is a relatively cheap way of greatly increasing IQ. It should be compulsory for racing game makers. Also, hooray for the PC.
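As a rough upper bound on what those filtering modes can ask for per shaded pixel (real hardware shares most of these fetches between neighbouring pixels through the texture cache, so treat these as worst-case figures):

Code:
#include <stdio.h>

int main(void)
{
    const int bilinear  = 4;               /* 2x2 footprint   */
    const int trilinear = 2 * bilinear;    /* two mip levels  */
    for (int aniso = 1; aniso <= 8; aniso *= 2)
        printf("trilinear + %dx aniso: up to %d texel fetches per layer\n",
               aniso, aniso * trilinear);
    return 0;
}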

I could never get into Test Drive Le Mans on the DC, but it was nice to see the DC show off its aniso filtering at least once.
 
well, i am not quite familiar with the poly counts pushed by ps2/gamecube/xbox games, but i would like to state that in the right hands the hitachi sh4 could push a lot more geometry than what we were used to from commercial dc games...

I have coded a pipeline in extremely tight sh4 asm that can push around 150k vertices (1 texture + 1 light) per frame at 60 fps...the only problem is that the vertex shader is executed for every vertex being sent to the powervr chip..i have a new pipeline on the drawing board (albeit, free time is scarce) that will work only once on each vertex shared by multiple triangles and performance is expected to get even better, allowing even more headroom for complex effects... The only problem is that the dc's 8mb of vram pose a cumbersome limit, the naomi would be much better with double the amount of vram...

For your consideration, the most geometry heavy game that i have seen on dc is Dead or Alive 2 pushing around 20k vertices per frame...

Another interesting observation i made during my experiments with dreamcast is that the 2x antialiasing mode tends to be fairly cheap, usually consuming 20-25% more than non antialiased rendering...

Any chance you could release this demo when you're done?
 
The PowerVR and SuperH architectures had many years of considerable development accumulated behind them by what were obviously the industry's most accomplished engineers, so the development of Sony's processors (EE, GS, Cell) only seemed comparatively large-scale because it was all up front.

Also, the price point and market segment targeted aren't measures of how ambitious a design is.

The SH-4 wasn't anywhere near as heavily targeted at SIMD performance as the EE, and it was an off-the-shelf part. The PowerVR chip, while it had great price/performance, had some drawbacks and was not a high-end part even at launch, and it was also an off-the-shelf part.
 
I never said the DC was less ambitious. Setting and trying to meet difficult goals is ambitious; both PS2 and DC designers were very ambitious.
Can't remember if it was you or someone else who said the die sizes of the two main ICs in the DC were significantly smaller than the PS2's, but it's true.
It has much higher fillrate than the chip in the Xbox though (a newer, more expensive system on a smaller manufacturing process with faster, hotter chips). This idea that you must beat competitors' hardware in all ways is not one that console designers of any generation have gone in for. Different approaches make things more interesting anyway.
Well, the difference in texturing fillrate isn't that great, and the NV2a has the double-texturing feature, which makes applying two textures with the same UV setup somewhat faster than doing two passes (the GS has dual context registers and eDRAM, which make it very good at actually using its fillrate, and all kinds of multipass are very fast). So considering the NV2a is a semi off-the-shelf part, it is about as much "better" as you'd expect it to be.
Nice textures take up memory. How much memory you allocate to textures affects how nice they can be!
PS2 had plenty of memory compared to consoles that performed better in the texturing department (DC and GC).
I expect most PS2 games used 4 and 8 bit palettised textures for most or all of their textures, in which case memory would have been the culprit. I know I keep returning to a common theme here, but I think on the whole it's just this simple, particularly for multi-platform games.
But PS2 had double the memory, so space can't be the factor. And still, the memory size would only limit the texture resolution in games where variety was required, so at least a sizeable amount of games on the PS2 should feature large textures, which is not the case.
Combining trilinear and aniso filtering can require several non-shared texels per texture layer. Pushing out mip transition boundaries and using a high level of aniso filtering is a relatively cheap way of greatly increasing IQ. It should be compulsory for racing game makers. Also, hooray for the PC.
But still, unless the case is pathological (you can always make something up), you'll still be integrating from 4-bit textures, and only for surfaces with certain inclinations.
I could never get into Test Drive Le Mans on the DC, but it was nice to see the DC show off its aniso filtering at least once.
Not to burst your bubble, but AFAICR the anisotropic "filtering" was just a fade to another texture set (kind of like RIP mapping).
 