Should the SEGA DC and Saturn have launched with these alternative designs?

For lower-res textures (∼64x64 or 128x128), of which there were quite a few even in DC games, AFAICS DC-VQ actually has a worse compression ratio than palettised textures with a shared palette. The LUT takes up 2 KB, more than or roughly the same as the texture itself.
You don't need the whole codebook. You can have a 64*64 non-mipmapped texture with a 1 KB codebook, for example. That comes to 64*64*2/8 + 1024 = 2048 bytes (2 KB), an even 4:1 compression ratio.
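
To make that arithmetic easy to replay for other sizes, here is a rough sketch in Python. It assumes the usual DC VQ layout of one index byte per 2x2 block of 16-bit texels, so each codebook entry is 8 bytes and a full 256-entry codebook is the 2 KB mentioned above. The function names are made up for illustration.

```python
# Assumption: one index byte per 2x2 texel block, 8 bytes per codebook entry
# (2x2 texels at 16 bits each), so 256 entries make the full 2 KB codebook.
BYTES_PER_ENTRY = 8


def vq_texture_bytes(width, height, codebook_entries=256):
    """Size in bytes of a non-mipmapped VQ texture: indices plus codebook."""
    index_bytes = (width * height) // 4  # same as width*height*2 bits / 8
    return index_bytes + codebook_entries * BYTES_PER_ENTRY


def raw_16bpp_bytes(width, height):
    """Size in bytes of the same texture stored uncompressed at 16 bpp."""
    return width * height * 2


full_codebook = vq_texture_bytes(64, 64)       # indices + 2 KB codebook
half_codebook = vq_texture_bytes(64, 64, 128)  # indices + 1 KB codebook
ratio = raw_16bpp_bytes(64, 64) / half_codebook

print(full_codebook, half_codebook, ratio)
```

With the full codebook the 64x64 case only manages about 2.7:1, which is the point of the complaint; trimming the codebook to 128 entries gets it back to 4:1.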

I have coded a pipeline in extremely tight sh4 asm that can push around 150k vertices (1 texture + 1 light) per frame at 60 fps... the only problem is that the vertex shader is executed for every vertex being sent to the powervr chip. i have a new pipeline on the drawing board (albeit, free time is scarce) that will work only once on each vertex shared by multiple triangles, and performance is expected to get even better, allowing more headroom for complex effects... The other problem is that the dc's 8 MB of vram poses a cumbersome limit; the naomi would be much better with double the amount of vram...
My best so far has been about 100k vertices per frame at 60 fps under similar conditions (1 light, 1 texture). I've also been working on a way to reduce the overhead of reprocessing vertices with identical positions/parameters. I'm planning on using the SH4's indexed cache mode with write-back to force one half of the cache to hover over the TA submission area, and using the cache line flush instruction to turn each cache line into a store queue. When I need to resubmit a recent vertex, I can do another cache line flush on the same data (after doing a write to set the line's dirty bit again, either onto padding or the TA command field) to resend it. It should also be useful for doing front plane clipping.

For your consideration, the most geometry heavy game that i have seen on dc is Dead or Alive 2 pushing around 20k vertices per frame...

DOA2 does closer to 50k vertices per frame from what I've measured...

Not to burst your bubble, but AFAICR the anisotropic "filtering" was just a fade to another textureset (kind of like RIP mapping).
No, it really is anisotropic filtering, done by 4x supersampling the texture.
 
Yeah, pretty amazing what DOA2 did on the DC. I think it is visually unmatched by any other fighter on the console. It wasn't just the geometry, but the lighting and reflections in various areas (e.g. the opera/theater stage). Outstanding work. It makes me wonder why we didn't see more examples like it.

The DC came with a few surprises here and there. Sonic Adventure 2, for example, was also extremely impressive for its time. The textures and framerate were breathtaking. It was one of the few games that may have actually competed with the PS2's offerings. I can't think of any similar game on the PS2.
 
The SH-4 wasn't anywhere near as heavily targeted at SIMD performance as the EE and was an off the shelf part. The PowerVR chip while it had a great price/performance had some drawbacks and was not a high end part, even at launch, and was also an off the shelf part.

I believe that Sega had input during the SH4's design process.

You're dead wrong about the second point though. At launch the PVR2DC was the highest end graphics chip there was: it was more capable than Sega's own Model 3, it decimated the fastest PC GPU, the Voodoo 2 (even in SLI), and it was light years beyond the N64, which launched only 2 years earlier (almost the same time gap as DC to PS2).

And it was not an off the shelf part!

Can't remember if it was you or someone else who said the die sizes of the two main ICs in the DC were significantly smaller than the PS2's, but it's true.

Oh it's true, but I don't think you can measure ambition just by die size.

Not to burst your bubble, but AFAICR the anisotropic "filtering" was just a fade to another textureset (kind of like RIP mapping).

Nope, it's definitely aniso. Confirmed by the developers and you can see it when you play the game.
 
Any chance you could release this demo when you're done?

of course...the only problem is i have given my dreamcast to a friend for overclocking purposes and my Development Box cannot read cd-rs so i do not have a way to burn and test an image :( . I just have to wait a bit till i get it back...
 
My best so far has been about 100k vertices per frame at 60 fps under similar conditions (1 light, 1 texture). I've also been working on a way to reduce the overhead of reprocessing vertices with identical positions/parameters. I'm planning on using the SH4's indexed cache mode with write-back to force one half of the cache to hover over the TA submission area, and using the cache line flush instruction to turn each cache line into a store queue. When I need to resubmit a recent vertex, I can do another cache line flush on the same data (after doing a write to set the line's dirty bit again, either onto padding or the TA command field) to resend it. It should also be useful for doing front plane clipping.

DOA2 does closer to 50k vertices per frame from what I've measured...

well yes, the way to achieve top performance with sh4 is by handling the cache correctly and dual-completing instructions... my solution does both extremely well, so i expect that when i switch to my new pipeline performance will skyrocket (i have some other tricks as well, but let's keep it simple for the time being :))

DOA2 is way below 50k, you can fire up nulldc to check it out. Actually nulldc increases the amount of vertices processed due to the way it restructures data...
 
i got a used one back in 2003 and since then i dedicate around 2 weeks each year writing high performance sh4 assembly using the built in sh4 pipeline simulator, it is such a great tool.

Recently i also found out another cool feature of the powervr2dc. I usually viewed my work on a vga monitor, but recently i connected my devbox to a crt tv and used a flicker-free 640x480@24bpp mode that stores only half the framebuffer in vram in a filtered manner. I didn't notice any discrepancies in image quality and i immediately got one extra megabyte of free vram that can be used to store more textures or geometry :)
 
DOA2 is way below 50k, you can fire up nulldc to check it out. Actually nulldc increases the amount of vertices processed due to the way it restructures data...

That's what I did. I could tell that the program I was using to get the counts wasn't totally reliable, but I wasn't able to get a good reference with anything I could get running. I didn't think it could be that far off.

I believe that Sega had input during the SH4's design process.
On a somewhat related note, I ran across this:
http://www.hotchips.org/archives/hc9/3_Tue/HC9.S7/HC9.7.1.pdf
It's a 1997 presentation on the SH4. It has a nice description of the FP hardware. The original top clock speed was going to be 166 MHz...

Not related, but also cool; a presentation on the N64's RSP (with die shot): http://www.hotchips.org/archives/hc9/3_Tue/HC9.S10/HC9.10.2.pdf
 
I believe that Sega had input during the SH4's design process.

You're dead wrong about the second point though. At launch the PVR2DC was the highest end graphics chip there was: it was more capable than Sega's own Model 3, it decimated the fastest PC GPU, the Voodoo 2 (even in SLI), and it was light years beyond the N64, which launched only 2 years earlier (almost the same time gap as DC to PS2).

And it was not an off the shelf part!
It was an off the shelf part in the sense that it was targeted at the general consumer market. It was released a year later in the form of the slightly upgraded Neon 250, to pretty lukewarm reviews.
But OK, I'd agree and stand corrected that it probably was top of the pops for a short while. The GS, on the other hand, held its own for several years with many of its features.
Oh it's true, but I don't think you can measure ambition just by die size.
Of course not, there is truth to the saying that "good engineering is making for one dollar what any idiot can do for a hundred". But it's also good engineering to be able to use large resources, if you are given them.
The GS was better in almost every respect on paper, it's a mystery to me why it didn't perform better wrt. texturing.
Nope, it's definitely aniso. Confirmed by the developers and you can see it when you play the game.
Sure it's not RIP mapping? That's still real aniso, only precomputed.
 
of course...the only problem is i have given my dreamcast to a friend for overclocking purposes and my Development Box cannot read cd-rs so i do not have a way to burn and test an image :( . I just have to wait a bit till i get it back...

Damn. :(

How fast has your friend got your DC running btw?

On a somewhat related note, I ran across this:
http://www.hotchips.org/archives/hc9/3_Tue/HC9.S7/HC9.7.1.pdf
It's a 1997 presentation on the SH4. It has a nice description of the FP hardware. The original top clock speed was going to be 166 MHz...

Maybe they were intending for it to be passively cooled at 166 MHz. Pity they couldn't squeeze a bit more than 200 MHz out of it for the DC though. Modders have hit 270 MHz with relatively little cooling.

I remember reading a dev's comments about the DC, and he said only about 20% of his CPU time was spent on T&L (or something like that). Perhaps a small boost in clock speed could have led to a disproportionate increase in poly counts (memory permitting).
 
It was an off the shelf part in the sense that it was targeted at the general consumer market. It was released a year later in the form of the slightly upgraded Neon 250, to pretty lukewarm reviews.
But OK, I'd agree and stand corrected that it probably was top of the pops for a short while. The GS, on the other hand, held its own for several years with many of its features.

Neon 250 suffered in the PC space with drivers and poor support for its features. For some reason its maximum performance is listed as 4 million polys/sec, much lower than the DC's 7 million, or clock for clock about half as fast.
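
As a quick sanity check on the "about half as fast" figure, here is a small Python sketch. It assumes the 125 MHz (Neon 250) and 100 MHz (DC) clocks quoted elsewhere in this thread and takes the listed peak polygon rates at face value; the helper name is just for illustration.

```python
def polys_per_clock(polys_per_sec, clock_hz):
    """Listed peak polygons per second divided by core clock."""
    return polys_per_sec / clock_hz


# Assumed inputs: listed peak rates from the post, clocks quoted in the thread.
neon250 = polys_per_clock(4_000_000, 125_000_000)
dreamcast = polys_per_clock(7_000_000, 100_000_000)

# Neon 250 throughput per clock, relative to the DC part.
relative = neon250 / dreamcast
print(round(relative, 3))
```

On those numbers the Neon 250 comes out a little under half the DC's per-clock rate, which matches the rough claim.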

Neon 250 isn't a fair way to judge the DC IMO.

Overall, I don't think the GS aged any better. The day the Xbox and Halo landed the GS looked pretty old, just as something like Metal Gear Solid made the DC look relatively underpowered.

Of course not, there is truth to the saying that "good engineering is making for one dollar what any idiot can do for a hundred". But it's also good engineering to be able to use large resources, if you are given them.
The GS was better in almost every respect on paper, it's a mystery to me why it didn't perform better wrt. texturing.

Fair enough. What was the maximum texture size the GS could handle btw?

Sure it's not RIP mapping? That's still real aniso, only precomputed.

Yeah, pretty sure. TapamN says it was done by supersampling the texture, and I'm not going to argue with him (I seem to remember reading that this was how the DC does it actually). TD Le Mans had a huuuuge draw distance and the road texture seemed surprisingly clean as it went off into the distance.

Wasting precious vram on rip-maps when you're running at 30fps and have a GPU that can do aniso filtering would seem to be a less than optimal choice, especially when you're pushing a lot of polygons per frame.
 
I thought so but couldn't remember specifics. A quick forum search did indeed reveal some of the details.

Far from being a "slightly upgraded" version of the CLX in the DC, Neon 250 was actually a big downgrade and much slower. Sorry Squeak ;).

http://forum.beyond3d.com/showpost.php?p=349365&postcount=3

Here are some explanations given previously of their differences.

JohnH wrote:
Just to clarify some of the (key) differences between the PC and console versions of the series 2 HW,

CLX tile was 32x32, PC part was 32x16.
CLX supported alpha test with HW front->back sorting to deliver massive effective fillrates when it was used; this was not available on the PC part.
CLX tiling was completely handled by HW; the PC part was split 50:50 between HW and SW.
CLX included latency buffering that allowed VQ and palettised textures to run at full rate; this was removed from the PC part, resulting in 50% performance for those formats.
CLX supported 2 or 3 (think it was effectively 3, wasn't it Simon?) external 32 bit memory busses (think in terms of the much hyped NV/ATi crossbars). The PC part utilised a single 64 bit bus, resulting in much greater page break impact.

This all adds up to the PC part being between 50-75% of the performance of the console part in spite of the higher clock rate (125 vs 100 MHz)...

...The differences between CLX and PMX1/Neon250 were all to do with minimising chip area...

Simon F added:
...it was different. For example, VQ textures weren't part of DX so it didn't seem very important to have a high-speed implementation in 250. Similarly, palettised textures were thought to be on their way out so only some 'legacy' support was included.

For the tile sizes, I think Sega themselves requested the 32x32 size because they wanted things like 'extremely fast' opaque overdraw and faster translucency sorting. I think the overdraw in PC games was typically limited (though often higher than some thought) and so this was not quite as important.

Some other features changed because there would have been no support for them in the APIs and even if an extension was included it's sometimes impossible to get a developer to use it. For example, DC had translucency sorting, which was used in the games. Neon250 also had this feature (which I think was even flagged in DX), but when you told the PC developers, I believe the response was frequently either (a) disbelief or (b) "that's nice but we are already sorting all the translucent polygons for cards X,Y, and Z and we couldn't be bothered to disable that for your chip".

(Of course, when Kyro had the translucency sorting removed because no one appeared to be using it, several ex-DC developers said that was a shame. Just shows you can't please everybody.)

As for the busses, IIRC CLX had two semi-independent 32 bit busses to its RAM. I suspect that because PVR250 had to work in a PC environment, where there was a 64bit AGP bus, it made more sense to have these the same width.

Finally, when John said the tiling was all HW in CLX and part-software in PVR250, I think you should take that to mean that the 250 was more programmable.

The DC was top end by 1999 standards and frankly phenomenal by 1998 ones. I don't think there's been another console GPU as impressive for its time, and that includes the PS2's GS and the 360's Xenos. Honourable mention to the PS1 though.

Anyway, it's a great pity that PowerVR stuff isn't in home consoles any more, especially when you think about the likes of nv2a and RSX and some of the suitability issues they've faced.
 
Neon 250 suffered in the PC space with drivers and poor support for its features. For some reason its maximum performance is listed as 4 million polys/sec, much lower than the DC's 7 million, or clock for clock about half as fast.

Neon 250 isn't a fair way to judge the DC IMO.
Well, it was a slightly upgraded version of the same chip: 25 MHz on top and 24 MB of additional RAM.
Edit. Oh I see. :-C Such pity.

Overall, I don't think the GS aged any better. The day the Xbox and Halo landed the GS looked pretty old, just as something like Metal Gear Solid made the DC look relatively underpowered.
Yeah, but mainly because of one single feature: detail texturing with high-res textures. The rest was pretty standard fare. The bump mapping wasn't used enough to make an overall visual difference in the game.
Fair enough. What was the maximum texture size the GS could handle btw?
1024x1024x32 according to GS user's manual.
 
sinektik said:
I have coded a pipeline in extremely tight sh4 asm that can push around 150k vertices (1 texture + 1 light) per frame at 60 fps...
Which has no context for comparing it with games (the same or other platforms).
That aside, in terms of tightly optimized pipelines, PS2 real world was around 550k vert/frame (on a single VU), and Xbox would pass over 1M vert/frame (arguably with the least effort of the 3 platforms) - at 60fps, with the same conditions.
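
For anyone wanting to line these per-frame figures up against the per-second numbers quoted elsewhere in the thread, here is a trivial Python conversion. The labels are only shorthand for the claims made in this thread, not measured results.

```python
FPS = 60

# Per-frame vertex counts as claimed in the thread (1 texture + 1 light).
verts_per_frame = {
    "DC (sinektik's pipeline)": 150_000,
    "PS2 (single VU)": 550_000,
    "Xbox": 1_000_000,
}

# Convert each claim to vertices per second at 60 fps.
verts_per_second = {name: count * FPS for name, count in verts_per_frame.items()}

for name, rate in verts_per_second.items():
    print(f"{name}: {rate / 1e6:.0f}M verts/sec")
```

Note these are vertex rates, so they are not directly comparable with polygon rates without knowing how many vertices are shared per triangle.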

And while we're talking synthetic polygon-drawing scenarios that also get used in games, there's a number of them that will completely break down one or multiple platforms while flying super fast on others. It was hardly a meaningful metric even last generation.

Squeak said:
But PS2 had double the memory, so space can't be the factor
Geometry storage went up at least 4x with PS2, so space remained an issue. And the goalposts (new platforms) had moved by the time PS2 software got past its early mistakes.
 
Dreamcast's design was near perfect but Saturn was a mess. Sega should've held on with the Sega CD for 2 more years from 1994 to 1996 and then release a Saturn with completely different Lockheed Martin hardware and a PowerPC CPU to compete with N64. A custom all-in-one LMC Real3D/100 with combined geometry, graphics and texture processors in one chip would've destroyed the N64, the 3DO M2 and the 3dfx Voodoo1. Sure, Sega would've been much later than the Sony PlayStation, which launched in 1994/1995, but it wouldn't have had the mess of the 1994 Saturn architecture. Yu Suzuki and all 3rd party developers would've been pleased. Sega would then NOT have had to launch Dreamcast in 1998/1999 and could have waited until 2000 or 2001 to compete with PS2 on a more even level in every way. More like NAOMI 2 specs, if not better, using a new-generation Lockheed GPU, if Lockheed had remained in the graphics biz on the PC side. However, ATI would then not have been as strong starting with the R300 generation; they'd only have had ArtX, not Lockheed's engineers. Microsoft would then not have launched Xbox until 2002 or 2003 because they would've been happy being Sega's partner, I think. Okay, I'm going a bit overboard here but that's just my thinking.
 
Which has no context for comparing it with games (the same or other platforms).
That aside, in terms of tightly optimized pipelines, PS2 real world was around 550k vert/frame (on a single VU), and Xbox would pass over 1M vert/frame (arguably with the least effort of the 3 platforms) - at 60fps, with the same conditions.

well, we'll see how it will perform when i find some time to make it even better. for the time being the implementation is not yet finalised and i also need to make one for animated meshes.
 
Dreamcast's design was near perfect but Saturn was a mess. Sega should've held on with the Sega CD for 2 more years from 1994 to 1996 and then release a Saturn with completely different Lockheed Martin hardware and a PowerPC CPU to compete with N64.

The Sega CD was a mess as well. Also, I can't even imagine how much a Lockheed Martin-designed Saturn would have cost Sega back in the day.

I think Sega should have just held on for a couple more years and waited on launching the DC. Though I don't think they expected GPU performance to jump the way it did after the DC's launch.
 
Whoa... it's like I just walked through a time-portal, entering this thread.
But it is 1998... you have just woken from a dream. The real time-portal takes you to a discussion on programming SGI workstations.
Unless you're paging palettes from embedded RAM where bandwidth is 'free' anyway.
I take it the "free" is the same as Kristof's "nothing is free in 3D".

PS2 on the other hand was a design with all stops pulled out, no holds barred in the design phase. Both the main pieces of silicon had a huge amount of research and expertise behind them.
I think that is, possibly, somewhat offensive.
For lower-res textures (∼64x64 or 128x128), of which there were quite a few even in DC games, AFAICS DC-VQ actually has a worse compression ratio than palettised textures with a shared palette. The LUT takes up 2 KB, more than or roughly the same as the texture itself.
I'm sorry but that would be a silly thing to do. The VQ textures did not have to use a full 2kB LUT - You could, and would, use a smaller table for small textures.

Furthermore, it was possible to order a chain of (small) textures, A, B, C, D etc, so that A had a full 2k LUT, of which B re-used some of the VQ entries and introduced new ones, of which C used a portion and etc etc.
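
To give a feel for what chaining buys you, here is a rough Python sketch. It assumes a simple "sliding window" layout where every texture sees 256 consecutive 8-byte entries and each later texture's window starts a fixed number of entries further along; that layout and the function names are assumptions for illustration, not a description of how any particular tool or game actually packed its codebooks.

```python
ENTRY_BYTES = 8  # assumed: 2x2 texels at 16 bits each
WINDOW = 256     # entries addressable by one texture's index bytes (2 KB)


def chained_codebook_bytes(num_textures, new_entries_each):
    """Codebook memory when each texture after the first adds only
    `new_entries_each` fresh entries and re-uses the rest of the previous window."""
    if num_textures == 0:
        return 0
    entries = WINDOW + (num_textures - 1) * new_entries_each
    return entries * ENTRY_BYTES


def separate_codebook_bytes(num_textures):
    """Codebook memory when every texture carries its own full 2 KB LUT."""
    return num_textures * WINDOW * ENTRY_BYTES


chained = chained_codebook_bytes(4, 64)  # A, B, C, D sharing entries
separate = separate_codebook_bytes(4)
print(chained, separate)
```

Under these assumed numbers, four small textures need 3584 bytes of codebook instead of 8192, at the cost of the encoder having to find entries that work across neighbouring textures.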
 
It was an off the shelf part in the sense that it was targeted at the general consumer market. It was released a year later in the form of the slightly upgraded Neon 250, to pretty lukewarm reviews.
Absolutely not. IMG were developing a PC-based arcade chip (ARC1) and from that evolved CLX2 (via, I think, a paper-only CLX1). Certainly aspects of ARC1/CLX2 were used in a PC chip, but CLX2 was not for the general PC market.

Sure it's not RIP mapping? That's still real aniso, only precomputed.
100% certain it was not RIP mapping. Aniso' texturing on CLX2 took 4 samples per pixel. If you were really keen, I think you could also combine it with trilinear (though that would require an extra polygon layer), to give even more samples/pixel.
 