PowerVR 2DC

Anyone have access to or know where to find a complete set of PowerVR2DC specs? How do they compare to the Neon 250 specs listed below? Are Neon250 and PowerVR2DC based on the same core, with the same number of pipelines, transistors, etc., or is there a bit more to their differences?

PowerVR 2 based Neon250 specs are as follows:

Sharky Extreme said:
2D Engine

Full ROP, text and line primitives
Full VGA compatibility
YUV to RGB color space conversion
MPEG2 decode assist (motion compensation acceleration)
Integrated 250MHz DAC (1920x1440x16bpp@65Hz / 1600x1200x32bpp@85Hz)
Color key overlay
Multiple video windows

3D Engine

Tile based reduced bandwidth rendering engine
32-bit floating point Z-buffering calculation function with no performance penalty
Up to 5M polygons/sec (forward facing delivered to the screen)
Fill rate 200-500M pixels/sec (depending on scene complexity)
Full Triangle & Texture Setup
Full polygon setup engine
Bus mastered parameter fetch
Advanced texturing (Bi-linear, Tri-linear, Anisotropic, Bump-mapping)
True color 32 bpp pipeline
Translucency sorting
Image super-sampling/scene anti-aliasing
Per pixel loadable table fog
Specular highlights with offset colors
Alpha + Multipass Blending
Multitexturing support
Color key and alpha blended textures
D3D and OpenGL blend modes
Environment mapping
 
Luminescent said:
Anyone have access to or know where to find a complete set of PowerVR2DC specs? How do they compare to the Neon 250 specs listed below? Are Neon250 and PowerVR2DC based on the same core, with the same number of pipelines, transistors, etc., or is there a bit more to their differences?

--- snip ---

They look pretty much the same. AFAIK PowerVR2DC had some small differences from the PC version, like sprite support (those were just rectangular textured polygons with some quirks).
 
Here are some explanations given previously of their differences.

JohnH wrote:
Just to clarify some of the (key) differences between the PC and console versions of the series 2 HW,

CLX tile was 32x32, PC part was 32x16.
CLX supported alpha test with HW front-to-back sorting to deliver massive effective fillrates when it was used; this was not available on the PC part.
CLX tiling was completely handled by HW; the PC part was a 50:50 split between HW and SW.
CLX included latency buffering that allowed VQ and palettised textures to run at full rate; this was removed from the PC part, resulting in 50% performance for those formats.
CLX supported 2 or 3 (I think it was effectively 3, wasn't it Simon?) external 32-bit memory busses (think in terms of the much-hyped NV/ATi crossbars); the PC part utilised a single 64-bit bus, resulting in a much greater page break impact.

This all adds up to the PC part being between 50-75% of the performance of the console part in spite of the higher clock rate (125 vs 100 MHz)...

...The differences between CLX and PMX1/Neon250 were all to do with minimising chip area...
Simon F added:
...it was different. For example, VQ textures weren't part of DX so it didn't seem very important to have a high-speed implementation in 250. Similarly, palettised textures were thought to be on their way out so only some 'legacy' support was included.

For the tile sizes, I think Sega themselves requested the 32x32 size because they wanted things like 'extremely fast' opaque overdraw and faster translucency sorting. I think the overdraw in PC games was typically limited (though often higher than some thought) and so this was not quite as important.

Some other features changed because there would have been no support for them in the APIs and even if an extension was included it's sometimes impossible to get a developer to use it. For example, DC had translucency sorting, which was used in the games. Neon250 also had this feature (which I think was even flagged in DX), but when you told the PC developers, I believe the response was frequently either (a) disbelief or (b) "that's nice but we are already sorting all the translucent polygons for cards X,Y, and Z and we couldn't be bothered to disable that for your chip".

(Of course, when Kyro had the translucency sorting removed because no one appeared to be using it, several ex-DC developers said that was a shame. Just shows you can't please everybody.)

As for the busses, IIRC CLX had two semi-independent 32 bit busses to its RAM. I suspect that because PVR250 had to work in a PC environment, where there was a 64bit AGP bus, it made more sense to have these the same width.

Finally, when John said the tiling was all HW in CLX and part-software in PVR250, I think you should take that to mean that the 250 was more programmable.
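
As an aside, a rough illustration of the tile-size point above (not from the original posts): the sketch below just counts how many 32x32 versus 32x16 tiles a 640x480 frame breaks into; the per-pixel on-chip storage figure is a placeholder, not a documented CLX/PVR250 value.

Code:
/* Rough illustration only: tile counts for a 640x480 frame with the two
 * tile sizes John mentions. The per-pixel on-chip storage figure is a
 * placeholder assumption, not a documented CLX/PVR250 value. */
#include <stdio.h>

static int tiles_needed(int w, int h, int tw, int th)
{
    return ((w + tw - 1) / tw) * ((h + th - 1) / th);
}

int main(void)
{
    const int w = 640, h = 480;
    const int onchip_bytes_per_pixel = 8; /* placeholder assumption */

    printf("32x32 tiles: %d tile passes, %d bytes of on-chip pixel storage\n",
           tiles_needed(w, h, 32, 32), 32 * 32 * onchip_bytes_per_pixel);
    printf("32x16 tiles: %d tile passes, %d bytes of on-chip pixel storage\n",
           tiles_needed(w, h, 32, 16), 32 * 16 * onchip_bytes_per_pixel);
    return 0;
}

Halving the tile height halves the on-chip pixel storage but doubles the number of tiles (and the per-tile overhead) per frame, which fits John's note that the Neon250 changes were all about minimising chip area.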
 
SimonF said:
Finally, when John said the tiling was all HW in CLX and part-software in PVR250, I think you should take that to mean that the 250 was more programmable.
Anyone care to expand on that statement? Does it mean CLX had a native hardware implementation for tiling while PVR250 relied partly on CPU software-side assistance, or that CLX had a hardwired (non-configurable) tiling implementation while the 250 had a configurable one?
 
Luminescent posted some questions to me, but I thought I might as well answer them in this thread:

Luminescent said:
Sorry to bother you, Simon, but I bugged Kristof about some of the questions below and he was not sure, being that he came on board PowerVR after the DC. After researching a bit about CLX on B3D, I found your bits of information to be most informative and believable, being that you worked on the project. That said, I present to you the following set of questions:

I'll try to answer, but please understand it was a while ago and my memory is fading fast ;-) The information may not be 100% accurate.

In a recent thread about PowerVR2DC, I asked the following:

Luminescent said:
Simon F said:
Finally, when John said the tiling was all HW in CLX and part-software in PVR250, I think you should take that to mean that the 250 was more programmable.
Anyone care to expand on that statement? Does it mean CLX had a native hardware implementation for tiling while PVR250 relied partly on CPU software-side assistance, or that CLX had a hardwired (non-configurable) tiling implementation while the 250 had a configurable one?
Care to explain a little further?
Neon250 actually had a programmable module, i.e. a CPU. It wasn't a Vert. or Pix. shader or anything like that, but enough to move data around, make decisions etc etc before the 3D rendering took place.

The tiling calculations were still done in HW but it may be that it was more flexible than the fully hardwired CLX. I didn't work on the drivers so I can't be certain.
In addition, I read, in the Neon 250 specs listed in the thread above, that it supported a 32-bit floating-point z-buffer. Is it the same for CLX?
I think they were both pretty closish to IEEE float.
Do all internal units of CLX use 32-bit floating point precision (I've read the texture and geometry setup engines do, but I'm not sure about the texture shader, etc.)?
Those bits would be some form of floating point, but things like texture addressing don't need to be anywhere near as precise as IEEE, so they would be smaller.
If there are int units, what range do they typically work with?
The RGBA colour buffers were just 8888.
Doesn't MBX work at FP precision internally (I believe I read it in an ARM or Intel whitepaper)?
Again it will depend on what part of the chip. They will usually be tailored to just have the right amount of precision for the job.

Because CLX is capable of Dot3, I assume that the texture shader sports some sort of combiner unit that can complete a dot product (although I'm not sure about how many components), is this right?
It was a dot product but not DOT3. It used polar coordinates which requires slightly more software set-up. Unfortunately, I changed my mind to go to cartesian coordinates too late for it to be put into CLX <shrug>. Mind you, there were some other nice features in the dot product unit. Go search for the patent if you're really curious.
Is the combiner configurable to allow for other sorts of color blending? How many cycles for a Dot3 instruction and a texture fetch, 2?
a) I guess so, e.g. I always wanted to do anisotropic translucency with the dot unit.
b) 1 cycle for the normal map fetch, dot calculation and blend with the current accumulation buffer. If you wanted the bump to have been applied to another texture that would have cost another cycle (i.e. an earlier triangle)
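
To make the polar-coordinate point a bit more concrete, here is a small sketch (not CLX's actual maths, and the struct layout and function are purely illustrative) of a per-pixel dot-product bump term computed from a normal stored as two angles:

Code:
#include <math.h>
#include <stdio.h>

/* Normal-map texel stored as two angles (purely illustrative layout). */
typedef struct { float elevation, azimuth; } polar_normal;

static float bump_intensity(polar_normal n, const float light_dir[3])
{
    /* Convert the polar normal to cartesian first... */
    float nx = cosf(n.elevation) * cosf(n.azimuth);
    float ny = cosf(n.elevation) * sinf(n.azimuth);
    float nz = sinf(n.elevation);

    /* ...then dot it with the (unit) light direction and clamp at zero. */
    float d = nx * light_dir[0] + ny * light_dir[1] + nz * light_dir[2];
    return d > 0.0f ? d : 0.0f;
}

int main(void)
{
    polar_normal n = { 0.5f, 1.0f };       /* angles in radians */
    float light[3] = { 0.0f, 0.0f, 1.0f }; /* light along +z */
    printf("bump term = %f\n", bump_intensity(n, light));
    return 0;
}

The extra trigonometric conversion is the sort of additional set-up Simon mentions compared with a plain cartesian DOT3.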

The CLX sports two internal 32-bit buffers (FP?) for multipass, right?
They were 8888, and yes they allowed some interesting effects which couldn't be done with other architectures of the day.
This allows for it to maintain color integrity internally, but how many bits does its final framebuffer hold?
Whatever you program it to be. It could be 16 bit, (5:6:5 or 1:5:5:5) or 32. I think it also supported a genuine 24 bit mode (i.e. no wasted bytes).
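
For reference, a small sketch of what those framebuffer modes cost at 640x480; the 5:6:5 and 1:5:5:5 packing helpers follow the common definitions rather than anything stated in the thread.

Code:
#include <stdio.h>
#include <stdint.h>

/* Common 16-bit packings, shown for illustration only. */
static uint16_t pack_565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

static uint16_t pack_1555(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((a >> 7) << 15) | ((r >> 3) << 10) |
                      ((g >> 3) << 5)  |  (b >> 3));
}

int main(void)
{
    const int pixels = 640 * 480;

    printf("16 bpp (5:6:5 or 1:5:5:5): %d bytes per frame\n", pixels * 2);
    printf("24 bpp (packed, no wasted byte): %d bytes per frame\n", pixels * 3);
    printf("32 bpp (8:8:8:8): %d bytes per frame\n", pixels * 4);

    /* Example: mid-grey in the two 16-bit encodings. */
    printf("grey 5:6:5 = 0x%04x, 1:5:5:5 = 0x%04x\n",
           pack_565(128, 128, 128), pack_1555(255, 128, 128, 128));
    return 0;
}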

Finally, after doing some research, I found the following post in which you stated:
Simon F said:
Lazy8s said:
The CLX's maximum sort depth is 60 .
What? You can put as many polygons in (with different depths) as you like (memory permitting on CLX). There is no opaque "depth sort" limit because depth comparisons are done with an internal Z-buffer.

There may or may not (it's too long ago for me to remember) be a limit to how many intersecting layers of translucent polygons it will per-pixel-sort, but you'll hit a practical performance limit, due to fill rate, first.
Are you sure there are no theoretical depth sort limits? If you don't remember, can you at least take a guess at whether there was a limit as to "how many intersecting layers of translucent polygons it will per-pixel-sort."
I don't actually recall there being a stated limit, but I would think it'd be hundreds if there were one. As I said, if you genuinely had a couple of hundred layers of transparent polygons you'd be bogged down with a lack of fill rate first.
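
A back-of-the-envelope version of that fill-rate argument, assuming (purely for illustration) one blended pixel per clock at 100 MHz:

Code:
/* Back-of-the-envelope look at why fill rate runs out long before any
 * per-pixel sorting limit would. The 100 Mpixel/s figure is an assumption
 * (one blended pixel per clock at 100 MHz), not a quoted spec. */
#include <stdio.h>

int main(void)
{
    const double fill_rate    = 100e6;         /* assumed blended pixels/sec */
    const double frame_pixels = 640.0 * 480.0;
    const double fps          = 60.0;

    double layers = fill_rate / (frame_pixels * fps);
    printf("Affordable full-screen translucent layers per frame: %.1f\n",
           layers);
    return 0;
}

In other words, a handful of full-screen layers of blending already consumes the whole frame's texturing budget, so hundreds of overlapping translucent layers would only ever be practical over a small part of the screen.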

Out of curiosity, approximately how many transistors was the CLX made from?
No idea, but I think the chip may have been somewhere between 1 and 1.5 cm^2 (again, it was a while ago). You'd have to then look at the technology and work out how many gates/transistors you could fit in that and round down a bit, etc.
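
The estimate Simon describes is just area times density; in the sketch below both numbers are placeholders for the reader to substitute, not known CLX figures.

Code:
#include <stdio.h>

int main(void)
{
    /* Both values are placeholders, not known CLX figures. */
    const double die_area_mm2        = 125.0; /* "1 to 1.5 cm^2" -> 100-150 mm^2 */
    const double transistors_per_mm2 = 50e3;  /* depends entirely on the process */

    printf("rough estimate: %.1f million transistors\n",
           die_area_mm2 * transistors_per_mm2 / 1e6);
    return 0;
}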
 
Any particular reason why CLX and other PowerVR architectures go for FP internally on everything? Is it just a matter of increased range, or is there something more to it? None of Nvidia's or Ati's DirectX 6, 7, or 8 parts did everything in float format.
 
It's impressive that the architecture lets the chips be smaller while having even higher standards of image quality.
 
Luminescent said:
Any particular reason why CLX and other PowerVR architectures go for FP internally on everything? Is it just a matter of increased range, or is there something more to it?
Well I don't know that everything was float - certainly the accumulation buffer was 'fixed' point.

As for using floating point, one reason is that it's often simply better. Take the depth buffer, for example. Using float (properly) gives you much better useful accuracy for a given number of bits than fixed point.
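
A quick way to see that point (an illustrative aside, not from the original discussion): the sketch below prints the smallest representable step around a few depth values for a 24-bit fixed-point buffer versus a 32-bit float, assuming depth is stored in the 0..1 range, which is an assumption rather than the CLX format.

Code:
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double fixed24_step = 1.0 / ((1 << 24) - 1); /* uniform everywhere */
    const float samples[] = { 0.0001f, 0.01f, 0.5f, 0.99f };

    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        float z = samples[i];
        double float_step = (double)(nextafterf(z, 1.0f) - z);
        printf("z=%-7g  fixed24 step=%.3e  float32 step=%.3e\n",
               (double)z, fixed24_step, float_step);
    }
    return 0;
}

Fixed point spreads its precision evenly, while float concentrates it where the values are small, which is what "better useful accuracy for a given number of bits" amounts to in practice.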
 
A few more questions for Simon:
Well I don't know that everything was float - certainly the accumulation buffer was 'fixed' point.
For clarification's sake (I want to get things straight, at least with what you can recall): all computational units, including polygon setup, texture setup, hidden surface removal, tiling, texture shader, and color blending compute with the FP format at varied precision levels; the 2 accumulation buffers (which I assume were for multipass) and framebuffer store a max of 32 bits in int format. Correct me if I'm wrong.
Whatever you program it to be. It could be 16 bit, (5:6:5 or 1:5:5:5) or 32. I think it also supported a genuine 24 bit mode (i.e. no wasted bytes).
Is there a performance penalty for using 32-bit mode? I ask this because many DC games seem to output final images in 16-bit color. Is there a reason for this?

On another note, what sort of performance hit does CLX take when rendering stencil shadows/shadow volumes with its modifier volumes functionality? Does it take a fillrate hit or require a separate pass for the rendering of the volumes?

Finally, SharkyGames wrote the following about DC in relation to Neon250 which I found strange, being that you indicated they were fairly equal in functionality:
Sharky Games said:
Sega told us that there are differences between the PC version and the DreamCast version of the video chip (beyond just clock speed 100MHz for DreamCast vs. 125MHz for PC), but the only difference they named was that the per-pixel lighting was unique to the DreamCast version.
Perhaps this has to do with the fact that CLX was capable of dot product per-pixel effects which it could blend with textures and perhaps even environment maps (EMBM?, I'm not sure if CLX could pull off something of that nature), allowing it to produce things like per-pixel specular highlights? If so, Neon 250 was not able to do the same?

Do you know of any DC game that makes use of modifier volumes? It seems Sonic Adventure 2 uses them, but I'm not sure. From what I've read, SOA2, DOA2, and Soul Calibur are good showcases of the DC's abilities. Are there any other graphical showcases for DC that you recommend?

Lazy8s said:
It's impressive that the architecture lets the chips be smaller while having even higher standards of image quality.
Impressive indeed. It's a pity we can't see such an architecture at work in the current generation. :(
 
Hey John or Simon...

Increasing the tile size will obviously require more on-chip memory to store it, but what are the benefits of using a larger tile size?

32x16 upped to 32x32 gives you a 32kb tile buffer, right? Does this give some kind of burst-write advantage or something? Or are the benefits saved clipping work?

If so, would PCX-2 have hugely benefited from a larger tile size? ;)

Mind you, most of the CPU time was polygon setup, wasn't it? :rolleyes:
 
Can't give definitive answers, but to add...

Luminescent:
Is there a performance penalty for using 32-bit mode? I ask this because many DC games seem to output final images in 16-bit color. Is there a reason for this?
Was wondering too. Back then, there were lower standards for precision, so they probably hadn't gained enough of an appreciation for 32-bit yet. Perhaps they felt the difference wasn't too noticeable since internal accuracy was fully 32-bit, and that more of the 8 MB should be saved for textures.
On another note, what sort of performance hit does CLX take when rendering stencil shadows/shadow volumes with its modifier volumes functionality? Does it take a fillrate hit or require a separate pass for the rendering of the volumes?
This explains some...

Simon F wrote:
"Modifier volumes are quite a bit more efficient than plain stencils because the (non volume) objects that get modified have "two" shading options for the bits that fall inside and outside of the volumes. This cuts down the amount of fill needed. The volumes themselves take the same sort of fillrate as stencils but this gets done in the ISP (i.e. very high fill rate) part and can carry on in parallel with the texturing. "
Do you know of any DC game that makes use of modifier volumes?
To add some examples, the Crazy Taxi games appear to use them for self-shadowing on the cabs, and Jet Set Radio seems to be a very prominent example with shadows in the environment and on the self-shadowed characters.
It seems Sonic Adventure 2 uses them, but I'm not sure.
It appears so, and it may have been a part of the reason the character self-shadowing didn't translate right to the GameCube version.
 
Luminescent said:
A few more questions for Simon:
Well I don't know that everything was float - certainly the accumulation buffer was 'fixed' point.
For clarification's sake (I want to get things straight, at least with what you can recall): all computational units, including polygon setup, texture setup, hidden surface removal, tiling, texture shader,
Ok those would probably use some form of FP.
and color blending compute with the FP format at varied precision levels;
I think colour blending and texture colour filtering are more likely to have been fixed point.

Whatever you program it to be. It could be 16 bit, (5:6:5 or 1:5:5:5) or 32. I think it also supported a genuine 24 bit mode (i.e. no wasted bytes).
Is there a performance penalty for using 32-bit mode? I ask this because many DC games seem to output final images in 16-bit color. Is there a reason for this?
Performance-wise, there'd be very little difference. Obviously 32-bit > 24 > 16-bit when it comes to bandwidth use, but that's not going to be terribly significant for a 640x480x[50|60] Hz display.

The main reason, I suspect, is that the developers wanted to reserve as much of the graphics memory as possible for textures, and so using a 16-bit mode saves them a bit of space.

Now, because PowerVR's 16-bit format is in general much better than other systems', there's not really a significant loss in quality in going to 16-bit. (IIRC, there's a public demo of this on the PowerVR developer website.)
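
Some rough numbers behind those two points (an illustrative aside): display refresh bandwidth and frame-buffer size at 640x480, 60 Hz for each depth.

Code:
#include <stdio.h>

int main(void)
{
    const double pixels = 640.0 * 480.0;
    const double hz = 60.0;
    const int depths_bytes[] = { 2, 3, 4 };    /* 16, 24, 32 bpp */

    for (unsigned i = 0; i < 3; ++i) {
        int b = depths_bytes[i];
        printf("%2d bpp: %.1f MB/s refresh, %.2f MB per frame buffer\n",
               b * 8, pixels * hz * b / 1e6, pixels * b / (1024.0 * 1024.0));
    }
    return 0;
}

With double buffering, the difference between 16 bpp and 32 bpp buffers comes to a little over 1 MB out of the Dreamcast's 8 MB of video RAM, which is exactly the texture space Simon suggests developers wanted to keep.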


On another note, what sort of performance hit does CLX take when rendering stencil shadows/shadow volumes with its modifier volumes functionality? Does it take a fillrate hit or require a separate pass for the rendering of the volumes?
It looks like someone else has already dug up a quote I made on this so I'll be brief. The modifier volumes are more efficient than stencils because they use fewer passes. You do put them in as separate objects, much like stencils, but you don't have to resubmit the original geometry.
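
A sketch of the submission pattern that description implies; all the function names below are hypothetical stubs, not a real API.

Code:
#include <stdio.h>

/* All bodies are stubs; the point is the submission pattern described
 * above. Names are hypothetical. */
static void draw_scene_lit(void)               { puts("draw geometry (lit)"); }
static void draw_shadow_volumes(void)          { puts("draw volumes -> stencil"); }
static void redraw_scene_shadowed(void)        { puts("redraw geometry (shadowed)"); }
static void draw_scene_two_shade_options(void) { puts("draw geometry, 2 shading options"); }
static void draw_modifier_volumes(void)        { puts("draw modifier volumes (ISP)"); }

int main(void)
{
    puts("Classic stencil shadows:");
    draw_scene_lit();
    draw_shadow_volumes();
    redraw_scene_shadowed();         /* geometry submitted a second time */

    puts("\nCLX modifier volumes:");
    draw_scene_two_shade_options();  /* geometry submitted once */
    draw_modifier_volumes();         /* no geometry resubmission needed */
    return 0;
}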

Finally, SharkyGames wrote the following about DC in relation to Neon250 which I found strange, being that you indicated they were fairly equal in functionality:
Sharky Games said:
Sega told us that there are differences between the PC version and the DreamCast version of the video chip (beyond just clock speed 100MHz for DreamCast vs. 125MHz for PC), but the only difference they named was that the per-pixel lighting was unique to the DreamCast version.
Perhaps this has to do with the fact that CLX was capable of dot product per-pixel effects which it could blend with textures and perhaps even environment maps (EMBM?, I'm not sure if CLX could pull off something of that nature), allowing it to produce things like per-pixel specular highlights? If so, Neon 250 was not able to do the same?
AFAIK, PVR250/Neon250 had the same shading capabilities as CLX. The only differences I can recall are that
  • CLX had a bigger tile size
  • CLX had better support for palette textures
  • CLX had some custom fading tricks that were often used in arcade systems
  • CLX acted as the memory controller for the DC
  • PVR250 had a VGA/2D engine and a PCI interface.
There might be others but it was a long time ago.
Do you know of any DC game that makes use of modifier volumes?
I see some have already been listed further down the page, but one that did use them is "Toy Commander", which I think was a great, but not well known, game.

Dave B(TotalVR) said:
Hey John or Simon...
Increasing the tile size will obviously require more on-chip memory to store it, but what are the benefits of using a larger tile size?.....
Or are the benefits saved clipping work?
There's no clipping involved with tiles. I don't know why people assume this. :? :? :?
 
Simon F said:
The volumes themselves take the same sort of fillrate as stencils but this gets done in the ISP (i.e. very high fill rate) part and can carry on in parallel with the texturing.
I assume the ISP is some sort of ROP processor that could output more than 1 modifier volume op per clock in parallel with the rest of the rasterizer pipeline, thus the term "very high fillrate." Is this correct?

Is there stencil buffer support in CLX aside from the modifier volume functionality? If so, is the stencil buffer shared with the z-buffer?

Is CLX capable of EMBM? Would it be able to use environment mapping in conjunction with its dot capability for a similar effect?

Finally, how many internal raster passes (with accumulation buffer) could CLX complete before sending final output to the framebuffer? Is it theoretically a limitless number of passes?
 
Luminescent said:
Do you know of any DC game that makes use of modifier volumes? It seems Sonic Adventure 2 uses them, but I'm not sure. From what I've read, SOA2, DOA2, and Soul Calibur are good showcases of the DC's abilities. Are there any other graphical showcases for DC that you recommend?

If you're interested in seeing what CLX is capable of, rather than specifically the Dreamcast, you should also take a look at some NAOMI arcade machines or, even better, NAOMI 2 machines like Virtua Fighter X.
 
Simon F said:
There's no clipping involved with tiles. I don't know why people assume this.

I meant the work done dividing the scene up into tiles, working out which tile this vertex is in and blah. Binning?

Anyway, if there is no advantage to larger tile sizes then why isn't it done with a 1x1 tile size :LOL:
 
Dave B(TotalVR) said:
Anyway, if there is no advantage to larger tile sizes then why isn't it done with a 1x1 tile size :LOL:

:LOL:
If you only consider the ISP, I think you are right: a 1x1 tile should be the most efficient way. :)
But would this chip be "well-balanced"?
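
On the "binning" Dave asks about above, here is a minimal sketch of bounding-box tile binning, which is also why no clipping is needed: the triangle is never cut at tile edges, each overlapped tile just gets a reference to it. Tile size and output here are illustrative only, not the CLX format.

Code:
#include <math.h>
#include <stdio.h>

#define TILE_W 32
#define TILE_H 32

typedef struct { float x, y; } vec2;

static void bin_triangle(vec2 a, vec2 b, vec2 c, int tri_index)
{
    /* Screen-space bounding box of the triangle. */
    float minx = fminf(a.x, fminf(b.x, c.x));
    float maxx = fmaxf(a.x, fmaxf(b.x, c.x));
    float miny = fminf(a.y, fminf(b.y, c.y));
    float maxy = fmaxf(a.y, fmaxf(b.y, c.y));

    int tx0 = (int)floorf(minx / TILE_W), tx1 = (int)floorf(maxx / TILE_W);
    int ty0 = (int)floorf(miny / TILE_H), ty1 = (int)floorf(maxy / TILE_H);

    for (int ty = ty0; ty <= ty1; ++ty)
        for (int tx = tx0; tx <= tx1; ++tx)
            /* Real hardware would append a pointer to the triangle's
               parameters to this tile's object list; no clipping occurs. */
            printf("triangle %d -> tile (%d,%d)\n", tri_index, tx, ty);
}

int main(void)
{
    vec2 a = {10, 10}, b = {90, 20}, c = {50, 70};
    bin_triangle(a, b, c, 0);
    return 0;
}

A bounding-box test can over-bin slightly (a tile may reference a triangle that only grazes its box), but each tile's renderer simply rasterises the full triangle and keeps the pixels that land inside the tile.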
 
Luminescent:
From what I've read, SOA2, DOA2, and Soul Calibur are good showcases of the DC's abilities. Are there any other graphical showcases for DC that you recommend?
Dreamcast games I'd also include:

F355 Challenge (60 Hz)
Virtua Tennis (60 Hz)
Samba de Amigo (60 Hz)
SEGA Sports NFL 2K2 and NBA 2K2 (60 Hz)
Ecco the Dolphin (30 Hz)
Jet Set Radio (30 Hz)
Shenmue series (30 Hz) -- for variety in animation and texture
 
This is one informative thread. Thank you, Lazy8s, for bringing this up.

JohnH said:
Just to clarify some of the (key) differences between the PC and console versions of the series 2 HW,

CLX tile was 32x32, PC part was 32x16.
CLX supported alpha test with HW front-to-back sorting to deliver massive effective fillrates when it was used; this was not available on the PC part.
CLX tiling was completely handled by HW; the PC part was a 50:50 split between HW and SW.
CLX included latency buffering that allowed VQ and palettised textures to run at full rate; this was removed from the PC part, resulting in 50% performance for those formats.
CLX supported 2 or 3 (I think it was effectively 3, wasn't it Simon?) external 32-bit memory busses (think in terms of the much-hyped NV/ATi crossbars); the PC part utilised a single 64-bit bus, resulting in a much greater page break impact.

This all adds up to the PC part being between 50-75% of the performance of the console part in spite of the higher clock rate (125 vs 100 MHz)...

...The differences between CLX and PMX1/Neon250 were all to do with minimising chip area...

This has to be one of the best explanations & comparisons of PowerVR2DC and Neon250 that I've read over the years. We've always known the Dreamcast part was more powerful. I remember reading in Next Generation that one of the Sega higher-ups said the Katana version of PowerVR2 would have more processors on chip than the PC version.

Also, I recall reading that the Dreamcast chip had more pixel elements than the PC version. Maybe that is directly related to the 32x32 vs 32x16 spec.

Something hardly anyone mentions is the nice lighting effects in Daytona USA 2001: 'per-pixel volumetric car lighting', as IGN put it. Really nice.

I am curious as to what PowerVR implementation Sega was looking at for its successor to Dreamcast.
 