PSone...what was the avg polygon count for the avg game?

Heh, now I feel like all those claims came from N64 fanboys.
Although I am surprised that the N64 gets the nod here on sound
 
The original PS1 always felt like a machine that tried to do 3D but couldn't really do it well (compared to some of its contemporaries like the N64), and was much better suited to rather advanced 2D games instead.
 
I remember really loving Wipeout, Tekken, Tomb Raider, Crash Bandicoot, Time Crisis and Micro Machines V3 though ... The only 2D game I remember liking was that crazy-fun light gun party game whose name escapes me right now.

Sure, in retrospect it looks terrible, but I totally loved those graphics back then ...
 
This I also doubt very much. The GBA had no hardware 3D acceleration to my knowledge, and a very slow main CPU. Doing 3D math in software is slow as hell unless you have some sort of on-chip vector math unit, like the one in the Sega Dreamcast's SH4.
It's not that slow. Peak poly counts were a nice marketing number, but they weren't really the bottleneck. My GBA engine could render 64k visible textured polys per second; if you count the backface-culled ones as well, it would peak at 128k/s. But the real limitation was the fillrate.

I did all the math in 24:8 fixed point; sin/cos was a 512-entry LUT. The only time I needed division was during perspective projection, where I used a u16 reciprocal LUT of 2048 entries (if I recall correctly). As all memory was either SRAM or ROM, accessing it randomly wasn't much of an issue and the latency was very predictable, so I had no worries about that.
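To make the fixed-point part concrete, here is a minimal sketch of what 24:8 math with those LUTs could look like. Only the 24:8 format, the 512-entry sin table and the 2048-entry u16 reciprocal table come from the description above; the names, ranges and the focal constant are my own guesses.
Code:
#include <stdint.h>

typedef int32_t fx;                    /* 24:8 fixed point            */
#define FX_SHIFT 8

static fx       sinLUT[512];           /* one full turn = 512 steps   */
static uint16_t rcpLUT[2048];          /* rcpLUT[i] ~= 65536 / i      */

static fx fx_mul(fx a, fx b)       { return (fx)(((int64_t)a * b) >> FX_SHIFT); }
static fx fx_sin(uint32_t angle)   { return sinLUT[angle & 511]; }
static fx fx_cos(uint32_t angle)   { return sinLUT[(angle + 128) & 511]; }

/* Perspective projection: the only place a 'division' happens, done as a
   multiply with the reciprocal LUT. Assumes 0 < int(z) < 2048.          */
static void project(fx x, fx y, fx z, int* sx, int* sy)
{
    uint32_t rz = rcpLUT[(z >> FX_SHIFT) & 2047];   /* ~65536 / int(z)   */
    fx xz = (fx)(((int64_t)x * rz) >> 16);          /* x / z, still 24:8 */
    fx yz = (fx)(((int64_t)y * rz) >> 16);          /* y / z, still 24:8 */
    *sx = 120 + ((160 * xz) >> FX_SHIFT);           /* 240x160 screen,      */
    *sy =  80 - ((160 * yz) >> FX_SHIFT);           /* focal 160 is a guess */
}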

For rasterization I used a divide & conquer approach. The ARM instruction set allows quite a few instructions to apply a 'free' shift to an operand (nowadays on phones etc. it costs an extra cycle, but on the GBA it really was free).
Thus interpolating UVs was something like
Code:
UVnew = (UVleft+UVright)>>1
I had a tiny software stack in IWRAM to push one side into, which, again on ARM, was just one instruction, because memory stores/loads with post-increment or pre-decrement addressing come for free:
Code:
*pUVStack++ = UVRight; // one instruction (STR with post-increment)
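Put together, the divide & conquer span walk with that software stack might look roughly like this in C. This is my own reconstruction from the description; the real thing was ARM assembly, and exactly what got pushed is a guess (I track x alongside the UV here just to make the sketch self-contained).
Code:
#include <stdint.h>

typedef struct { int32_t x; uint32_t uv; } Edge;

static Edge uvStack[32];                        /* tiny stack, lived in IWRAM */

static void rasterSpan(int32_t xL, uint32_t uvL, int32_t xR, uint32_t uvR,
                       uint8_t* dst, const uint8_t* tex)
{
    Edge* sp = uvStack;
    if (xR <= xL) return;
    for (;;)
    {
        if (xR - xL <= 1)                       /* down to a single pixel      */
        {
            dst[xL] = tex[uvL];                 /* byte write for clarity only */
            if (sp == uvStack) return;          /* stack empty: span finished  */
            --sp;                               /* pop the sibling half        */
            xL = xR;  uvL = uvR;
            xR = sp->x;  uvR = sp->uv;
            continue;
        }
        sp->x = xR;  sp->uv = uvR;  ++sp;       /* push right half             */
        xR  = (xL + xR) >> 1;                   /* midpoint x                  */
        uvR = (uvL + uvR) >> 1;                 /* midpoint UV; with the packed
                                                   UVs described below you'd
                                                   also clear the guard bit    */
    }
}
The nice part is that the inner loop has no per-pixel divisions or multiplications at all, just adds, shifts and loads/stores.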

Another trick was to store U and V in the same register. There is this old trick to blend two RGBA8 colors together:
Code:
RGBAnew = ((RGBA0&0xfefefefe)>>1)+((RGBA1&0xfefefefe)>>1);
I did a similar thing with UVs, but separated them by one spare bit, so UV would lie in a 32-bit register like
Code:
000000000000000UUUUUUUU0VVVVVVVV
That's why UVnew = (UVleft+UVright)>>1; can interpolate both in one instruction.
There was some &0xfffeffff cleanup afterwards, of course.
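In code, the packing and the one-instruction midpoint look roughly like this. The guard-bit cleanup mask below is what the layout above implies; the exact constant in the original code may have differed.
Code:
#include <stdint.h>

/* Pack U and V (both 0..255) with one guard bit between them, matching the
   layout above: bits 9..16 = U, bit 8 = guard, bits 0..7 = V.              */
static uint32_t packUV(uint32_t u, uint32_t v) { return (u << 9) | v; }

/* The spare bit gives the 9-bit V sum room so it can't spill into the U
   field; after the shift that bit holds the dropped low bit of the U sum,
   so it gets masked off.                                                   */
static uint32_t midUV(uint32_t a, uint32_t b)
{
    return ((a + b) >> 1) & ~0x100u;
}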

To avoid computing a new texture address, textures were interleaved on the U axis (like a texture atlas), which allowed me to fetch a texel directly with pTexture[UVnew];
Textures were stored in ROM. You could manually set up the ROM access speed (wait states), and those homebrew dev cartridges could run at the fastest setting.
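With U sitting 9 bits above V, the packed value is literally U*512 + V, i.e. it already is a byte offset into an atlas with a 512-byte stride per U step, hence the single-load texel fetch. A sketch, assuming an 8bpp atlas; the ROM address is just an example spot in the 0x08000000 cartridge region.
Code:
#include <stdint.h>

/* 8bpp atlas somewhere in cartridge ROM (example address). */
static const uint8_t* const pTexture = (const uint8_t*)0x08100000;

static inline uint8_t fetchTexel(uint32_t packedUV)
{
    return pTexture[packedUV];   /* index = U*512 + V, no extra address math */
}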


The only really fast memory was IWRAM, which has a 32-bit bus. As ARM opcodes are 32 bits wide, that's the only place where your code would run at the full 16 MHz; if you ran from ROM or the 256 KB EWRAM, you'd effectively run at 8 MHz at best (or at 16 MHz with Thumb instructions, but that's not better either). Thus the fastest place for fillrate-critical stuff was already ruled out by the binary code. The second-best place was VRAM: it's 16-bit, but you could not access it in 8-bit units, you had to write 16 bits at a time, which made me snap my rasterization to 2-pixel boundaries.
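That 16-bit constraint is why the inner loop emits two pixels per store, roughly like this; it assumes the 240x160 8bpp paletted mode 4 frame buffer, which is my guess at the video mode used.
Code:
#include <stdint.h>

/* VRAM only takes 16-bit (or 32-bit) accesses, so two 8bpp pixels go out per
   write - hence the 2-pixel snapping. Frame buffer assumed at 0x06000000.   */
#define VRAM     ((volatile uint16_t*)0x06000000)
#define SCREEN_W 240

static inline void writePixelPair(int x, int y, uint8_t left, uint8_t right)
{
    /* x must be even; the two palette indices pack little-endian */
    VRAM[(y * SCREEN_W + x) >> 1] = (uint16_t)left | ((uint16_t)right << 8);
}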

I made a kind of Wipeout/F-Zero hybrid with the engine. It was fully 3D, but the track had no height (similar to the F1 game). I have a backup of it all somewhere, but the only thing online is an early version that I used to attract some artists :)
http://rapsooo.tripod.com/
At the bottom are 3 vids (the first time you click for download, Tripod now seems to forward you to an ad page), recorded with my mighty 320x200 webcam, I think.

There was way more to it, e.g. my own movie codec. Funnily enough, I used what you'd nowadays call a Hadamard transform, but I actually derived it from the DCT and applied it to 4x4 blocks. But MOST of each frame just used motion compensation. No, there was no residual encoding; it was really just either a set of four 4x4 transformed blocks or a memcpy of an 8x8 block from the previous frame. There was also no smart quantization or zigzag encoding; I just stored the upper-left triangle of the transformed block. All the data of a frame had a size limit, which implicitly limited the number of 4x4 Hadamard blocks. Whether a block got a motion vector or the transform was decided by the MSE against the reference block.
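For anyone curious, a 4x4 Hadamard transform is just adds and subtracts, which is what makes it attractive on a CPU without a fast multiplier. Here is a minimal sketch of the forward transform; it's my own code, not the original, and it's unnormalized (a real codec would fold the scaling into the quantizer).
Code:
#include <stdint.h>

/* 4-point Hadamard butterfly: only adds and subtracts. */
static void hadamard4(int16_t b[4])
{
    int16_t a0 = b[0] + b[2], a1 = b[1] + b[3];
    int16_t a2 = b[0] - b[2], a3 = b[1] - b[3];
    b[0] = a0 + a1;  b[1] = a0 - a1;
    b[2] = a2 + a3;  b[3] = a2 - a3;
}

/* Separable 4x4 forward transform: rows first, then columns. */
static void hadamard4x4(int16_t blk[4][4])
{
    int16_t col[4];
    for (int r = 0; r < 4; ++r) hadamard4(blk[r]);
    for (int c = 0; c < 4; ++c)
    {
        for (int r = 0; r < 4; ++r) col[r] = blk[r][c];
        hadamard4(col);
        for (int r = 0; r < 4; ++r) blk[r][c] = col[r];
    }
}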
The frame was then either Huffman or LZ77 encoded (whichever led to the smaller size), as there is a piece of code in the firmware ROM that has decompression routines for those. That essentially saved quite some binary size in IWRAM.
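On the encoder side, the "whichever is smaller" choice is trivial; something like the sketch below, where the 1-byte scheme tag is my own invention and not the original container format (only the decode side has to run on the GBA).
Code:
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { FRAME_HUFFMAN = 0, FRAME_LZ77 = 1 };

/* Given the same frame already compressed both ways, keep whichever came out
   smaller and tag it so the decoder knows which routine to hand it to.      */
static size_t packFrame(const uint8_t* huf, size_t hufLen,
                        const uint8_t* lz,  size_t lzLen,
                        uint8_t* out)
{
    int useHuf = (hufLen <= lzLen);
    size_t bestLen = useHuf ? hufLen : lzLen;
    out[0] = useHuf ? FRAME_HUFFMAN : FRAME_LZ77;   /* 1-byte scheme tag */
    memcpy(out + 1, useHuf ? huf : lz, bestLen);
    return 1 + bestLen;
}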

I could go on forever if someone is not sleeping yet :)

But my point was: polycount wasn't the limiting factor, fillrate was.
 
@rapso Very enlightening! Awesome post, thank you so much! :D


Edit: reading your post fully, most of that stuff you write about is completely over my head, but thanks anyway hehe.
 
yw :)
Btw, at the beginning, when I first thought about it, I did the simple math: 16,780,000 Hz / 240 / 160 / 60 Hz, which means you have slightly more than 7 cycles per pixel, minus one for the frame clear (see the little calculation below). So at first I thought it was a waste of time to even try, and instead made a Comanche/Delta Force-like voxel raycaster. It was really fast, but the challenge was the memory size. 128 Mbit .....
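Written out with the actual ~16.78 MHz (2^24 Hz) clock, that back-of-the-envelope budget is a tiny sanity-check calculation:
Code:
#include <stdio.h>

int main(void)
{
    const double cpuHz  = 16777216.0;            /* 2^24 Hz GBA clock      */
    const double pixels = 240.0 * 160.0 * 60.0;  /* full screen at 60 fps  */
    printf("%.2f cycles per pixel\n", cpuHz / pixels);   /* ~7.28          */
    return 0;
}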
But then Wing Commander was announced,
and Colin McRae.

I think both (or at least one of them) used the "Blue Roses" engine
http://www.raylightgames.com/blueroses-engine/

which claimed at the time to be the fastest one.... I just had to beat that :D
 