About the PSP vs NDS graphics chips

jackal256

Newcomer
Been reading up a lot about the DS being a super GBA with better 3D hardware, and the PSP GPU being just a mobile GS with texture compression. Once devs unlocked the 333MHz/166MHz mode, could the PSP have rivaled the PS2 for visuals or not? Ready at Dawn noted its chip was 4MB of eDRAM on a 512-bit bus with 10.5GB/s of bandwidth. Without going into more detail, it seems like it had enough juice to do 480p at 30fps, or 60 if pushed.
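For reference, here's roughly how that bandwidth figure falls out of the quoted numbers (a back-of-the-envelope sketch, assuming one transfer per clock on the 166MHz bus):

```c
/* Back-of-the-envelope check of the quoted eDRAM bandwidth: a 512-bit bus
   moving one transfer per cycle at the unlocked 166 MHz bus clock. */
#include <stdio.h>

int main(void) {
    double bus_bits = 512.0;                      /* quoted bus width   */
    double clock_hz = 166.0e6;                    /* unlocked bus clock */
    double bytes_per_sec = (bus_bits / 8.0) * clock_hz;
    printf("%.1f GB/s\n", bytes_per_sec / 1e9);   /* ~10.6 GB/s, close to the
                                                     10.5 figure quoted */
    return 0;
}
```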

With the DS being touted as a handheld N64, I doubt it could run Conker at 30fps.
 
Its CPU was up to 2.6 GFLOPS vs the 6.2 GFLOPS Emotion Engine.
Its graphics chip could handle 34M polygons vs the Graphics Synthesizer's 75M polygons.
 
Its CPU was up to 2.6 GFLOPS vs the 6.2 GFLOPS Emotion Engine.
Its graphics chip could handle 34M polygons vs the Graphics Synthesizer's 75M polygons.

The Xbox CPU = 2 GFLOPS too, and those are raw specs without any textures/effects. On the PS2 many games peaked around 15M; I highly doubt they couldn't push the PSP to 10~16M.
 
Its CPU was up to 2.6 GFLOPS vs the 6.2 GFLOPS Emotion Engine.
Its graphics chip could handle 34M polygons vs the Graphics Synthesizer's 75M polygons.
Where did you find official specs for the PSP? All I can find are very high-level specs (e.g. CPU frequency, RAM/eDRAM amount and not much more).

The Xbox CPU = 2 GFLOPS too
Per the specification, the Pentium 3 / Celeron A in the console does 4 FLOPs per cycle. At 733MHz that's ~3 GFLOPs.
 
With the DS being touted as a handheld N64, I doubt it could run Conker at 30fps.
Neither could N64, though.

I remember back when the DS came out there was a thread here about its GPU compared to the N64's, and the consensus was that the 64 was more flexible and could be faster, because the DS had a hardware geometry limit that was lower than the 64's. But the 64 almost never reached its potential performance because of limitations elsewhere (bandwidth, texture cache, etc.), while the DS could often actually hit its limits, since its other constraints (limited screen resolution, limited feature set, etc.) didn't get in the way as much.

I bet a passable version of Conker could be made on DS. Some scenes may have to be reworked or turned into FMV but I think it could be done, and it might even run smoother.
 
Neither could N64, though.

I remember back when the DS came out there was a thread here about its GPU compared to the N64's, and the consensus was that the 64 was more flexible and could be faster, because the DS had a hardware geometry limit that was lower than the 64's. But the 64 almost never reached its potential performance because of limitations elsewhere (bandwidth, texture cache, etc.), while the DS could often actually hit its limits, since its other constraints (limited screen resolution, limited feature set, etc.) didn't get in the way as much.

I bet a passable version of Conker could be made on DS. Some scenes may have to be reworked or turned into FMV but I think it could be done, and it might even run smoother.

The N64 was never really used fully; Conker & Rayman 2 could've looked much better if Nintendo had given out better dev code. The N64 could do 650k polys with advanced effects and much better draw distance at 480i. There is one racing game that used the RAM pak that showed the N64 could even come close to the DC. The PSone was outdated even for its time, with only 2MB of RAM and a GPU missing features, so most visually demanding games had to use fake 3D levels or stream the levels in chunks. The N64 in stock mode could do 360k polys, more than 90% of PSone/DS games.

The DS is just a beefed up PSone.
 
What game was that?
He's probably talking about World Driver Championship, which is a great-looking game for N64 and has a "high res" mode that's letterboxed. It does about as well as the worst-looking racing games on Dreamcast, except that in that mode it isn't full screen and the texture filtering looks exactly like you would expect an N64 game to look. It also doesn't use the expansion pak. The high res mode is interesting, and I think it outputs 480i in that mode, but still only renders 240 lines. Performance can suffer as well. I'm not sure there is a Dreamcast racer that isn't 480p, though. There might be one, but I only own about 130ish games for the platform, out of 600+ if I remember correctly, so there are plenty I've never played.
 
Yeah WDC is probably the most impressive realistic racer on the machine. ERP has said Stunt Racer is more impressive because it was their second go with the tech though.

WDC has some issues, like z-fighting from forgoing the z-buffer, and also a lot of input lag.
 
Yeah WDC is probably the most impressive realistic racer on the machine. ERP has said Stunt Racer is more impressive because it was their second go with the tech though.

WDC has some issues, like z-fighting from forgoing the z-buffer, and also a lot of input lag.

It's easier to push a lot of polygons when you do little to no bone animation, skinning, etc. I also think that game even skipped Z-buffering to save bandwidth. Clever.
 
I'm not sure there is a Dreamcast racer that isn't 480p, though. There might be one, but I only own about 130ish games for the platform, out of 600+ if I remember correctly, so there are plenty I've never played.

AFAIK while there are some DC games that will only output at 480i (even with the VGA box trick), the DC hardware could only render at 640 x 480, and all rasterisation was done internally at 24-bit. The GPU gubbins that accelerated binning of polys over tiles could only work at a 640 x 480 number of tiles iirc, and tiles were of a fixed resolution. This was very fast and used more silicon, but was less flexible than the PC Neon 250 way of doing the same calculations which involved the CPU.

I used to have a link to a very interesting SimonF breakdown of the respective chips' features - DC is actually faster than the later PC part in many ways. Opaque poly fillrate for CLX2 was insane for the time and the size and clock of the chip.

WDC is plenty impressive for N64 though. ERP was a legend.
 
AFAIK while there are some DC games that will only output at 480i (even with the VGA box trick), the DC hardware could only render at 640 x 480, and all rasterisation was done internally at 24-bit. The GPU gubbins that accelerated binning of polys over tiles could only work at a 640 x 480 number of tiles iirc, and tiles were of a fixed resolution. This was very fast and used more silicon, but was less flexible than the PC Neon 250 way of doing the same calculations which involved the CPU.

I used to have a link to a very interesting SimonF breakdown of the respective chips' features - DC is actually faster than the later PC part in many ways. Opaque poly fillrate for CLX2 was insane for the time and the size and clock of the chip.

This is a bunch of correct stuff getting a bit mixed up into not quite right.

Internally, all rendering is done to a 32x32 pixel buffer with 24-bit color + 8-bit alpha. All blending occurs at this precision and is only optionally dithered when writing to a 16-bit framebuffer. There are actually two color buffers: it renders to one while the other is written to RAM. After the backbuffer is written, it can be reused as temporary storage for performing multitexture effects.

The video DAC's pixel clock is mostly fixed and is designed to run only at 640x480 at 59.94 FPS. You can divide the clock in half to support NTSC/PAL interlaced, but that's it. It has an option to draw double columns or rows to allow displaying 320x240 (or 640x240 or 320x480) framebuffers. The only other interesting standard resolution supported is 640x400 at 70 FPS on VGA. It's possible to tweak some of the sync timing to make monitors think that a different, higher resolution is being used, but you always end up with a lower actual resolution. The only real use is to trick a monitor into thinking it's getting a widescreen signal, as a kind of automatic anamorphic widescreen switch. It's also possible to extend the display a bit into overscan, but it can cause brightness problems depending on what is displayed.

The actual rendering part of the PVR seems capable of rendering at a resolution of 2048x2048 (or at least various bit fields seem sized so that it can work, I haven't actually tried it). The tile accelerator, the part that generates the lists for the PVR to draw, is limited to a resolution of 1280x480 (i.e. 640x480 with 2x horizontal supersampling). It's probably possible to manually generate command lists on the CPU to bypass this limitation and render at 2048x2048, but I don't really see any practical use for it, outside of some kind of demo-scene style trick.

While the PVR always renders in 32x32 tiles, it's possible to render to a framebuffer that isn't a multiple of 32x32 pixels. There's a clipping function that can be applied when the tile is written to RAM. So if you wanted to render to something like 33x33 (for some bizarre reason) it would internally render four 32x32 tiles, then only write the 33x33 region. This is how low res 240 row rendering is possible, since 240 isn't a multiple of 32. You aren't limited to clipping the right and bottom off, you can clip the top and left edges, too.
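To make the arithmetic concrete, here's a quick sketch of how a non-multiple-of-32 framebuffer works out (illustrative numbers only, not real register values):

```c
/* Illustration of the tile/clip arithmetic described above: the PVR renders
   whole 32x32 tiles, and the write-out clips to the framebuffer size. */
#include <stdio.h>

#define TILE 32

int main(void) {
    int fb_w = 320, fb_h = 240;                  /* e.g. a low-res 240-row mode */
    int tiles_x = (fb_w + TILE - 1) / TILE;      /* 10 tile columns             */
    int tiles_y = (fb_h + TILE - 1) / TILE;      /* 8 tile rows (256 lines)     */
    int clipped = tiles_y * TILE - fb_h;         /* 16 lines discarded on write */
    printf("%dx%d tiles, last tile row clipped by %d lines\n",
           tiles_x, tiles_y, clipped);
    return 0;
}
```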

The PVR has scalers when writing the tile to the framebuffer. There's a 1/2 horizontal downscaler, for supersampling, and a more flexible vertical scaler. The vertical scaler can work at almost any ratio, so it can scale up, to stretch a 480 row render to fullscreen 576 row PAL, or scale down for supersampling. The downscaler is also used for deflickering on interlaced NTSC/PAL. The upscaler seems like it's bilinear, and is limited to a max 2x upscale. The downscaler can work at any ratio, but it just blurs the tile a bit (with user-defined weights) then discards rows. It's not actually possible to do a correct 2x box filter vertical downscale. Three taps are used, but only two coefficients can be specified. One coefficient is used for the center row, and the other coefficient is used for both the top and bottom rows. The PVR correctly handles the blur crossing the top/bottom of a tile.
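If I'm reading that right, the per-row vertical blur before rows are discarded amounts to something like this (my paraphrase; the weight names and normalisation are mine, not the hardware's):

```c
/* Paraphrase of the 3-tap vertical filter described above: one coefficient
   for the centre row, a second coefficient shared by the rows above and
   below. Names and normalisation are made up, not register-accurate. */
static unsigned char filter_row(unsigned char above, unsigned char centre,
                                unsigned char below,
                                float w_centre, float w_side) {
    /* e.g. w_centre = 0.5f, w_side = 0.25f for a typical deflicker blur */
    float v = w_side * above + w_centre * centre + w_side * below;
    return (unsigned char)(v + 0.5f);
}
```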

There seems to be a bug in the official SDK when horizontal supersampling is used. It would turn on the deflicker on VGA, where it's not necessary, making the screen blurrier than it should be. I made a Codebreaker code a while back to fix this. I had to disassemble part of Codebreaker to do it, since normal codes were limited to writing to main RAM, and I wanted a code to disable the blur by modifying the coefficient register, since that would be universal to all games. The only known official games to use supersampling are Ready 2 Rumble Boxing (only the first one, not the sequel), Omikron, Wacky Races, and some Japanese-only Toyota promotional demos.

As for fillrate, I accidentally did a fillrate test a while back. I've been working on improving a Genesis emulator for the Dreamcast by getting it to use the PVR to render the screen, instead of doing it in software. (60 FPS, no frameskip, decent audio) When working on the code to draw the Genesis's background layer/display off handling (generally just a solid full screen color, but some games can change it per line), I made a mistake modifying it. Originally, it was handled by drawing a 320x224 solid color quad. I wanted to change it to drawing 224 quads with a size of 320x1, so I could change the color/depth per line. I forgot to change the height of the quad, so it was drawing 224 quads with a size of 320x224. This added about 5 ms to the render time.

So the first 24 quads were processed unclipped (first quad drawn at row 8, rendering to a 320x256 tile buffer, clipped to a 320x224 framebuffer), so that's 24 quads fully "drawn", or 5376 rows. The remaining quads were partially rendered, each quad drawing one row fewer than the previous. I think that's 24,700-ish rows? So roughly 30,000 rows times 320 columns at 60 FPS is 577 Mpixel/s. Since that only added a bit less than a third of a frame, it looks like about 1.9 Gpixel/s was possible? You could probably get more if you went out of your way to design a best-case benchmark.
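For anyone retracing that, here's the sum as I reconstruct it (the per-quad row counts are my guess at what was meant, but the totals line up with the 577 Mpixel/s and 1.9 Gpixel/s figures):

```c
/* Reconstruction of the accidental fillrate arithmetic described above. */
#include <stdio.h>

int main(void) {
    long rows = 24L * 224L;                    /* 24 quads drawn in full        */
    for (int r = 223; r >= 24; --r)            /* remaining quads, one row fewer
                                                  each time (~24,700 rows)      */
        rows += r;
    double pix_per_frame = rows * 320.0;       /* ~30,000 rows * 320 columns    */
    printf("%.0f Mpix/s at 60 FPS, ~%.1f Gpix/s over the extra 5 ms\n",
           pix_per_frame * 60.0 / 1e6,         /* ~577 Mpixel/s                 */
           pix_per_frame / 0.005 / 1e9);       /* ~1.9 Gpixel/s                 */
    return 0;
}
```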

(Also, I don't draw 224 320x1 quads anymore. It was taking about a third of a millisecond to draw those lines, so now rows with identical colors/depth get merged into a single quad.)
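For what it's worth, a minimal version of that row-merging idea might look something like this (hypothetical types and names, not the emulator's actual code):

```c
/* Hypothetical sketch of merging runs of identical background rows into one
   quad instead of submitting 224 one-row quads. Types/names are made up. */
typedef struct { unsigned color; float depth; } RowAttr;

static void draw_bg_rows(const RowAttr *rows, int count,
                         void (*draw_quad)(int y, int h, RowAttr attr)) {
    int start = 0;
    for (int y = 1; y <= count; ++y) {
        /* flush the current run when the colour/depth changes, or at the end */
        if (y == count || rows[y].color != rows[start].color
                       || rows[y].depth != rows[start].depth) {
            draw_quad(start, y - start, rows[start]);
            start = y;
        }
    }
}
```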
 
This is a bunch of correct stuff getting a bit mixed up into not quite right.
Sounds about right for me. ;) Thank you for your amazing response!

It has an option to draw double columns or rows to allow displaying 320x240 (or 640x240 or 320x480) framebuffers.

Now this bit I actually do remember. I guess this is how many 2D games did 320 x 240 while putting out a native 480p VGA image? And my memory is a bit foggy, but didn't Capcom vs SNK use a 640 x 480 background and 320 x 240 "sprites"?

The actual rendering part of the PVR seems capable of rendering at a resolution of 2048x2048 (or at least various bit fields seem sized so that it can work, I haven't actually tried it). The tile accelerator, the part that generates the lists for the PVR to draw, is limited to a resolution of 1280x480 (i.e. 640x480 with 2x horizontal supersampling). It's probably possible to manually generate command lists on the CPU to bypass this limitation and render at 2048x2048, but I don't really see any practical use for it, outside of some kind of demo-scene style trick.

IIRC, SimonF speculated that it should be possible to bypass the tile accelerator, do that stuff on the CPU, and then output at a different resolution. I think I'd been asking him about a possible 800 x 600 resolution that I'd read that some homebrewer was attempting.

While the PVR always renders in 32x32 tiles, it's possible to render to a framebuffer that isn't a multiple of 32x32 pixels. There's a clipping function that can be applied when the tile is written to RAM. So if you wanted to render to something like 33x33 (for some bizarre reason) it would internally render four 32x32 tiles, then only write the 33x33 region. This is how low res 240 row rendering is possible, since 240 isn't a multiple of 32. You aren't limited to clipping the right and bottom off, you can clip the top and left edges, too.

I didn't know this (or I'd forgotten)!

The PVR has scalers when writing the tile to the framebuffer. There's a 1/2 horizontal downscaler, for supersampling, and a more flexible vertical scaler. The vertical scaler can work at almost any ratio, so it can scale up, to stretch a 480 row render to fullscreen 576 row PAL, or scale down for supersampling. The downscaler is also used for deflickering on interlaced NTSC/PAL. The upscaler seems like it's bilinear, and is limited to a max 2x upscale. The downscaler can work at any ratio, but it just blurs the tile a bit (with user-defined weights) then discards rows. It's not actually possible to do a correct 2x box filter vertical downscale. Three taps are used, but only two coefficients can be specified. One coefficient is used for the center row, and the other coefficient is used for both the top and bottom rows. The PVR correctly handles the blur crossing the top/bottom of a tile.

I don't personally recall any PAL games appearing to be truly fullscreen - there always seemed to be a small black bar top and bottom and the images often felt a little vertically crushed. This made me think PAL games were topping out at 480, though that's just my impression and I was also using VGA a lot.

With the downscaling, am I right in thinking that would be done as you moved off the tile buffer and into the full framebuffer in ram? If so this would mean 480p would take up to ~ 2x the memory of an interlaced image (depending on scaling). I could swear that some games that looked "24-bit" colour on interlaced PAL via RGB (so presumably downscaled / flicker filtered) began to show signs of dithering when switched to VGA mode. It's something I barely noticed on my Trinitron CRT monitors, but I really began to notice ... some kind of fullscreen pattern .... when I used my DC via VGA on LCD monitors. So I've speculated that some games switch to dithering down to 16-bit when using progressive scan 480p.

There seems to be a bug in the official SDK when horizontal supersampling is used. It would turn on the deflicker on VGA, where it's not necessary, making the screen blurrier than it should be. I made a Codebreaker code a while back to fix this. I had to disassemble part of Codebreaker to do it, since normal codes were limited to writing to main RAM, and I wanted a code to disable the blur by modifying the coefficient register, since that would be universal to all games. The only known official games to use supersampling are Ready 2 Rumble Boxing (only the first one, not the sequel), Omikron, Wacky Races, and some Japanese-only Toyota promotional demos.

Well that's a bit of a bummer. I remember someone (might have been SimonF or even yourself) saying that SS only seemed to have about a 20% performance hit for overall frame time, so I wondered why it wasn't used more - particularly for 30 fps games. Perhaps this is part of the answer.

Do you have any insight as to the performance hit for enabling aniso filtering? Test Drive Le Mans used it iirc, and I remember thinking the centre line looked remarkably clear as it disappeared off into the distance ....

As for fillrate, I accidentally did a fillrate test a while back. I've been working on improving a Genesis emulator for the Dreamcast by getting it to use the PVR to render the screen, instead of doing it in software. (60 FPS, no frameskip, decent audio)

Having to use frameskip was why I moved away from Genesis emu on DC! It's awesome that you're still bashing away on a DC emu!

When working on the code to draw the Genesis's background layer/display off handling (generally just a solid full screen color, but some games can change it per line), I made a mistake modifying it. Originally, it was handled by drawing a 320x224 solid color quad. I wanted to change it to drawing 224 quads with a size of 320x1, so I could change the color/depth per line. I forgot to change the height of the quad, so it was drawing 224 quads with a size of 320x224. This added about 5 ms to the render time.

So the first 24 quads were processed unclipped (first quad drawn at row 8, rendering to a 320x256 tile buffer, clipped to a 320x224 framebuffer), so that's 24 quads fully "drawn", or 5376 rows. The remaining quads were partially rendered, each quad drawing one row fewer than the previous. I think that's 24,700-ish rows? So roughly 30,000 rows times 320 columns at 60 FPS is 577 Mpixel/s. Since that only added a bit less than a third of a frame, it looks like about 1.9 Gpixel/s was possible? You could probably get more if you went out of your way to design a best-case benchmark.

(Also, I don't draw 224 320x1 quads anymore. It was taking about a third of a millisecond to draw those lines, so now rows with identical colors/depth get merged into a single quad.)

That's absolutely bonkers for a 100 MP/s console, and done without actually trying to achieve a high figure.

The CLX2 entry on Wikipedia gives a figure for peak opaque fill with a sort depth of 60. Does this "sort depth of 60" refer to the number of polygons the tile accelerator can process per pass, per pixel (or maybe per tile), or something else?

Sorry for the long post, but I still get all excited and nostalgic about the DC.
 
I don't personally recall any PAL games appearing to be truly fullscreen - there always seemed to be a small black bar top and bottom and the images often felt a little vertically crushed. This made me think PAL games were topping out at 480, though that's just my impression and I was also using VGA a lot.
Well, you'd have to allocate space for a 240 KB larger framebuffer (640x480x2x2 -> 640x576x2x2), so I guess most developers didn't plan ahead for that or didn't think it was worth the cost. (Or possibly the SDK didn't support it? But it would be very odd to add a hardware feature like that and not use it.)
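(Quick sanity check on that 240 KB, assuming double-buffered 16-bit framebuffers as above:)

```c
/* Sanity check of the 240 KB figure: double-buffered 16-bit framebuffers,
   640x480 (NTSC/VGA) vs 640x576 (full-height PAL). */
#include <stdio.h>

int main(void) {
    long ntsc = 640L * 480 * 2 * 2;   /* width * height * 2 bytes/px * 2 buffers */
    long pal  = 640L * 576 * 2 * 2;
    printf("%ld KB extra\n", (pal - ntsc) / 1024);   /* 240 KB */
    return 0;
}
```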

With the downscaling, am I right in thinking that would be done as you moved off the tile buffer and into the full framebuffer in ram? If so this would mean 480p would take up to ~ 2x the memory of an interlaced image (depending on scaling). I could swear that some games that looked "24-bit" colour on interlaced PAL via RGB (so presumably downscaled / flicker filtered) began to show signs of dithering when switched to VGA mode. It's something I barely noticed on my Trinitron CRT monitors, but I really began to notice ... some kind of fullscreen pattern .... when I used my DC via VGA on LCD monitors. So I've speculated that some games switch to dithering down to 16-bit when using progressive scan 480p.
The scaling happens as the tile is written to video RAM's frame buffer. So supersampling doesn't grow the frame buffer at all (although some structures used to track the display lists grow a bit).

I don't know if games adjust color depth depending on interlacing. I think it's more likely that games would use a double buffered 16-bit 640x480 frame buffer for both interlaced and progressive. You can really only get away with half height interlaced frame buffers when you can consistently hit 60 FPS.

Well that's a bit of a bummer. I remember someone (might have been SimonF or even yourself) saying that SS only seemed to have about a 20% performance hit for overall frame time, so I wondered why it wasn't used more - particularly for 30 fps games. Perhaps this is part of the answer.
The performance depends. If you're doing a lot of small polygons, and the hardware is bound by triangle setup, the performance hit is small. If fillrate is the bottleneck, then it could double rendering time.

Looking through some tests I made for a PVR driver I'm working on, I saw an overhead of about 25-50% for supersampling for game-like scenes. The lower end was for polygon heavy scenes (~30-45K tris), while the higher end was for polygon light scenes (~5K tris).

Do you have any insight as to the performance hit for enabling aniso filtering? Test Drive Le Mans used it iirc, and I remember thinking the centre line looked remarkably clear as it disappeared off into the distance ....
The "aniso" filter does a 2x2 ordered grid supersample of the texture, so it's pretty inefficient, and quadruples the fillrate cost of any pixels it's applies to (no effect to triangle setup) to push the mipmap transitions back one level. It always does 4 samples, even when magnifying the highest mipmap level of a bilinear filter, where it's completely useless. It's possible to use it to draw antialiased point sampled textures (like if you wanted to rotate or scale a sprite).

Actually, I looked into Le Mans's data files years ago (~10 years?). Almost everything is contained in a ZIP file with a modified header/footer (I guess) that causes most programs to refuse to open it without doing some kind of repair to the file. Most stuff was specified in text files, besides the actual geometry and textures. I remember that some text files listed texture names and specified some kind of LOD value, which sounded more like a mipmap bias than aniso. There was also text in one of the UI files with development goals/progress messages to the dev team, and I think it had files for go-karts? (I've never seen any references to TD Le Mans go-karts anywhere on the internet.)

That's absolutely bonkers for a 100 MP/s console, and done without actually trying to achieve a high figure.
So I tried doing a real opaque fillrate benchmark. For a quad made out of two triangles exactly filling the screen, it was possible to draw 85 layers on a 640x480 framebuffer while maintaining 60 FPS, for 1.56 Gpixel/s. By drawing a single triangle large enough to fill the entire screen, it was possible to draw 169 layers, for 3.1 Gpixel/s. By drawing a true quad (with the "sprite" command, what I was using to draw lines in the accidental benchmark) exactly filling the screen, you get the same performance as the oversized single triangle. The hardware also renders an implicit background plane polygon that's not counted in this. Turning textures on or off has no effect. All polygons drawn were coplanar, I don't know if depth fighting or intersections would reduce performance.
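Retracing those numbers (just the multiplication, nothing clever):

```c
/* The opaque fillrate arithmetic from the benchmark above. */
#include <stdio.h>

int main(void) {
    double pixels = 640.0 * 480.0;                                   /* per layer */
    printf("two-tri quad: %.2f Gpix/s\n", pixels *  85 * 60 / 1e9);  /* ~1.57 */
    printf("big triangle: %.2f Gpix/s\n", pixels * 169 * 60 / 1e9);  /* ~3.11 */
    return 0;
}
```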

If I ever get around to making a demo, I'm going to abuse this to mess with people who try to run it on an emulator. Have fun running at 4K with 150 times overdraw! I'll be sure to draw everything in back-to-front order.

The CLX2 entry on Wikipedia gives a figure for peak opaque fill with a sort depth of 60. Does this "sort depth of 60" refer to the number of polygons the tile accelerator can process per pass, per pixel (or maybe per tile), or something else?
I'm not sure what the sort depth thing is about.

I just remembered, but I think I read somewhere (maybe the Assembler or Sonic Retro forums) that the Neon 250's VQ compression was changed. On the DC, a VQ texture has a codebook with 256 entries of 2x2 tiles, while on the Neon 250, you had 256 entries of 1x1 tiles. So basically it was changed to an 8-bit palettized mode instead of real VQ. (Well, I guess scalar quantization is technically a form of vector quantization, but that's overselling it a bit.)
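For anyone unfamiliar with the format, decoding a DC-style VQ texture is roughly this (a simplified sketch of the 2x2-codebook idea; it ignores the twiddled texel order the real hardware uses, and the names are mine):

```c
/* Simplified sketch of DC-style VQ texture decode: one index byte per 2x2
   block, looked up in a 256-entry codebook of 2x2 texel tiles. Ignores the
   twiddled (Morton) texel order used by the real hardware. */
#include <stdint.h>

void vq_decode(const uint16_t codebook[256][4],  /* 256 entries of 2x2 texels */
               const uint8_t *indices,           /* one byte per 2x2 block    */
               uint16_t *out, int width, int height) {
    for (int by = 0; by < height / 2; ++by) {
        for (int bx = 0; bx < width / 2; ++bx) {
            const uint16_t *tile = codebook[indices[by * (width / 2) + bx]];
            out[(by * 2)     * width + (bx * 2)]     = tile[0];
            out[(by * 2)     * width + (bx * 2) + 1] = tile[1];
            out[(by * 2 + 1) * width + (bx * 2)]     = tile[2];
            out[(by * 2 + 1) * width + (bx * 2) + 1] = tile[3];
        }
    }
}
```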

Also interesting: die shot of the Dreamcast's PVR:
https://www.grafik-feti.de/ftp/Die-...430___Stack-DSC08325-DSC08418_-_ZS-DMap-1.jpg
You can easily see the 32 depth/rasterization units in the bottom left. There are more images in the parent directory.
 
He's probably talking about World Driver Championship, which is a great-looking game for N64 and has a "high res" mode that's letterboxed. It does about as well as the worst-looking racing games on Dreamcast, except that in that mode it isn't full screen and the texture filtering looks exactly like you would expect an N64 game to look. It also doesn't use the expansion pak. The high res mode is interesting, and I think it outputs 480i in that mode, but still only renders 240 lines. Performance can suffer as well. I'm not sure there is a Dreamcast racer that isn't 480p, though. There might be one, but I only own about 130ish games for the platform, out of 600+ if I remember correctly, so there are plenty I've never played.

Yup. If the N64 had proper dev code & used the RAM pak, it could in theory have come close to DC games at 480i. I highly doubt the DS could run the visual pushers the PS1 & N64 had without severe cuts.
 