"Yes, but how many polygons?" An artist blog entry with interesting numbers

Some titles did make use of VU0, I think; Vice City did its 4.0 DTS mix on it, though a pretty bad mix. The hardware was kinda weak, but the strong market position made it the most utilized system of that generation. I can imagine the OG Xbox wasn't as well utilized, though I must say Unreal Championship 2: The Liandri Conflict was on another level, almost next-gen, better than anything else perhaps in tech, but that's it.

Utilized in terms of size of library, or in terms of system usage? I think even amongst 1st-, 2nd-, and the very best PS2-dedicated 3rd parties, VU0 was fairly underutilized. Would love to hear more examples of how it was used, though.

The availability of DX on Xbox made it much easier to push close to the limits of that system compared to the PS2. Xbox visuals were probably often limited by multiplatform development that dictated workability with the PS2 and Gamecube. Splinter Cell: Chaos Theory was a very special example of a third party doing it right, because the series' home was the Xbox, and its users would expect no less.
 
What do you mean? Do you mean the polycount of assets is increased? Because polygons rendered per frame should be a lot less.
Clipping: when a polygon hits the edge of the view frustum pyramid, it is cut into new polygons which reside within it, which thus increases the amount of visible polygons.
 
I am not sure I understand :p

If a model is halfway out of view and some of its tris get rejected, that is still called culling, not clipping. It is a very granular kind of culling, to be sure, but still culling. So granular, in fact, that depending on how fast or slow that culling is versus simply pushing the geometry through anyway, it might become a non-optimisation: rejecting all those polygons ends up slower than simply rendering them anyway.

Clipping refers to when a single triangle is halfway out of view, so it gets sliced and re-triangulated so that all its verts stay within the screen-space bounds for rasterisation. In practice, the screen coordinate bounds usually extend a bit beyond the screen edge so as to reduce the number of tris that need be clipped (more polys end up culled entirely before having to clip them).
 
Since the clipping is beyond the boundaries of the viewport, isn't it possible, instead of slicing polygons, to actually "vanish" them completely once all their vertices extend beyond the set boundary?
 

Yes. And that is done. And that is called culling. Frustum Culling to be more specific.
But until ALL vertices are completely out of screen space, you can't simply not render the poly without leaving visible holes. That's why it then has to be clipped.
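The cull-vs-clip decision above can be sketched in a toy 2D form. This is an illustrative Sutherland-Hodgman-style clip against a single boundary, not real hardware code; the function name and plane choice are made up for the example:

```python
# Toy 2D version of the cull-vs-clip decision: fully-outside polys vanish
# (culling), partially-outside polys get re-cut (clipping).

def clip_polygon_left(poly, x_min):
    """Clip a polygon against the boundary x >= x_min.
    Returns [] when every vertex is outside (the 'cull' case), the original
    shape when every vertex is inside, and a re-cut polygon otherwise."""
    out = []
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        in1, in2 = x1 >= x_min, x2 >= x_min
        if in1:
            out.append((x1, y1))
        if in1 != in2:  # edge crosses the boundary: emit intersection point
            t = (x_min - x1) / (x2 - x1)
            out.append((x_min, y1 + t * (y2 - y1)))
    return out

tri = [(-2.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
print(clip_polygon_left(tri, 0.0))   # one vert outside: tri becomes a quad
print(clip_polygon_left(tri, 5.0))   # all verts outside: culled, returns []
```

A real pipeline does this against all six frustum planes in homogeneous clip space, but the inside/outside/crossing logic is the same.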
 
Yeah, but the question is: the CPU still has to handle T&L, then sends it to the GPU, which does all that, right? But since the CPU is still burdened with calculating polygons, wouldn't it be a good thing to cull as many as possible before sending them to the GPU?

There are 5 processors involved in PS2 graphics:

The EE (CPU), which does normal CPU things. In this case its job is to do high-level culling of whole batches of polygons. So, decide to draw the whole batch or skip it. Batches to be drawn are linked into a queue of instructions for the VIF.
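That per-batch decision is typically a cheap bounding-volume test. A minimal sketch, assuming a bounding sphere per batch and frustum planes stored as (nx, ny, nz, d) with "inside" meaning n·p + d >= 0 (names and plane format are illustrative, not a real SDK API):

```python
# Sketch of the EE-side decision: per-batch, not per-triangle. A batch is
# drawn or skipped based on a bounding-sphere test against the frustum
# planes; individual off-screen triangles inside a visible batch are
# left for the VU1 to reject later.

def batch_visible(center, radius, planes):
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False        # sphere entirely behind one plane: skip batch
    return True                 # possibly visible: queue it for the VIF

# Two axis-aligned planes standing in for a full 6-plane frustum:
planes = [(0, 0, 1, 0),         # near plane at z = 0, facing +z
          (0, 0, -1, 100)]      # far plane at z = 100
print(batch_visible((0, 0, 50), 5, planes))    # inside -> draw batch
print(batch_visible((0, 0, -20), 5, planes))   # behind near plane -> skip
```

Note the test is conservative: a sphere that straddles a plane is still submitted, which is exactly the case the VU1's fine-grained culling then handles.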

The VIF is a slightly-programmable DMA engine. Basically, its job is to copy data from main memory to the internal memory of the VU. In the process it can do a little rearranging of offsets, strides, and packing/unpacking between the source and destination buffers. The VIF also sends code to the VU1. The VU1 only has 16 KB of memory for code and 16 KB for data, so you have to stream code for specific situations, kind of like switching shaders.

The VU1 receives code and a series of data chunks from the VIF. For each data chunk, the VIF can also start the code at a data-driven address; that's how you can have multiple routines in a single code chunk. The VU1's job is to prepare data for the GIF. The VU has limited (16-bit) general-purpose registers and instructions. Its real power is in vector instructions that can do large volumes of math. The VU handles vertex animation, lighting, UVs (including some of the mipmap math), etc. It also handles culling individual off-screen triangles, and it must manually clip triangles into sub-triangles against the edge of the screen. Basically, it handles everything to do with geometry. It's not a 1-in-1-out vertex setup like vertex shaders; it's blob-of-bytes in, many-triangles-out.

The GIF is another DMA engine. It reads chunks of bytes from the VU1's memory and stuffs them into the registers of the GS. Like the VIF, you point it at a command queue containing a mixture of control bytes and data bytes for it to read through and interpret.

The GS is the rasterizer. It has no instruction set; you control it entirely by setting values in registers (via the GIF). There are registers to define a texture to read from the embedded DRAM, registers to define the framebuffer in the same DRAM, registers to define the blending/Z/interpolator actions (vertex color modes), and a register where you stuff vertex positions. Stuff that one register 3 times (yes, overwriting 2 values) and the GS will rasterize a triangle into the framebuffer according to the state defined in the rest of the registers. Alternatively, there is a triangle strip mode that only requires 3 positions to get the first triangle, then 1 per triangle after that. For a given triangle, the GS can only handle a single texture, a few options for how to incorporate the vertex colors, and a limited blend mode. So, if you want to use two textures, you need to draw the same triangle twice with different configurations. Fortunately, changing GS configs can be done by setting a few registers at a cost of 1 cycle each. So, the VU can transform a dozen triangles once, have the GS rasterize them, switch GS configs, and rasterize them again. That's twice the pixel work for the GS, but it's pretty cheap for the VU.
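That register-driven model can be caricatured in a few lines: there is no "draw triangle" command, only registers, and the third write to the position register is what kicks a triangle out. The class and register names below are invented for illustration; real GS registers and their semantics are more involved.

```python
# Toy model of the GS interface described above: all state lives in
# registers, and writing the position register a third time rasterizes
# a triangle under whatever state is currently set.

class ToyGS:
    def __init__(self):
        self.state = {}          # texture/blend/etc. config registers
        self.verts = []          # positions stuffed into the vertex register
        self.triangles = []      # what got rasterized: (verts, state snapshot)

    def set_reg(self, name, value):
        self.state[name] = value  # ~1 cycle per register on real hardware

    def kick_vertex(self, pos):
        self.verts.append(pos)
        if len(self.verts) == 3:  # third write: rasterize, then reset
            self.triangles.append((tuple(self.verts), dict(self.state)))
            self.verts = []

gs = ToyGS()
gs.set_reg("tex", "diffuse")
for p in [(0, 0), (1, 0), (0, 1)]:
    gs.kick_vertex(p)             # pass 1: base texture
gs.set_reg("tex", "lightmap")     # cheap state change between passes
for p in [(0, 0), (1, 0), (0, 1)]:
    gs.kick_vertex(p)             # pass 2: same triangle, new config
print(len(gs.triangles))          # two passes -> two rasterized triangles
```

This is exactly the two-pass multitexture trick from the paragraph above: same geometry kicked twice, with only a config-register write in between.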


So, that's a long, fun brain dump just to say that the CPU isn't really burdened with calculating polygons; the VU1 is :) Your main intuition is correct, though. The VU can and should do a whole lot of fine-grained culling before sending triangles to the rasterizer. That can explain a lot of the difference in polygon counts measured in the emulators.
 
Thanks for explaining the PS2 a bit more to us :) Do you think Transformers would have been possible on the OG Xbox at the time? The huge draw distances and the amount of trees on display with zero fog were quite amazing at 60 fps. The game had impressive physics also, almost like Havok.
 
What was the VU0 planned for, and what could it have done?

The VU1 was set up to do all of the geometry and animation work to prepare triangles for the rasterizer. I'm not sure what the plan was for the VU0. But, I think the idea was basically "What if we added another VU to help out the CPU with math-heavy work?"

The problem with the VU0 was that there wasn't an obvious good way to schedule streaming results out of it. You could stream data into it easily enough, but it couldn't send data out on its own. It also couldn't tell the CPU when data was ready. So, the official plan was to have the CPU poll to see if the VU0 was idling before triggering a DMA-out operation. It was very difficult to set that up without having the CPU waste time polling with nothing else to do (or the CPU doing lots of other stuff while the VU0 idles waiting for more work).

Late in the PS2's lifetime, my team at High Voltage Software figured out how to make scheduling work out. I think the VU0 would somehow trigger an interrupt and the CPU's interrupt handler would move the DMA pipeline along asynchronously from the CPU's main thread. I expect PS2-specialty shops figured this out way before we did, but they weren't sharing :p

With that in place, moving high-level culling to the VU0 would be an obvious first task. The VU1 doesn't have the room to do the animation keyframe->blended matrices work alongside the vertex transforms, so that would be a great job for the VU0 to offload from the CPU. Physics in general would fit well too. Audio processing would be possible, but there would be multiple hops involved getting data from main RAM -> VU0 RAM -> main RAM -> IOP RAM -> SPU RAM. That's a lot of latency.
 

That would be difficult. The PS2 could handle more raw, basic, simple polygons than the Xbox, but only after a whole lot of work and constraints to appease the hardware. It also technically had more fill rate, but again with a whole lot of constraints. The Xbox had a stronger CPU and a much easier API. The vertex and pixel shaders had way more features than the GS, were much more familiar to PC devs, and they were by no means weak. So, if you want to draw a bunch of single-texture, vertex-lit, alpha-testing polys, then add full-screen "motion blur" as a post-process, the PS2 is actually a much better fit. Throw in lightmaps and it gets a bit harder for the PS2. Normal maps? LOL, no. (Someone figured out how to technically make it work, but it took like 8 passes.) Shadow maps? Hope you are OK with solid black shadows. Etc...
 

Yes, thank you for that explanation; it gives a clearer picture of the PS2. But I was actually asking about the Dreamcast. Even if the GPU does auto-culling of backfaces, hidden surface removal, and clipping in x/y, the CPU still has to calculate all those polygons before sending them to the GPU, right? I was just asking if clipping/culling on the CPU is a good idea before geometry is sent to the GPU, or is the CPU already overloaded with everything else (physics, AI and so on)? As the SDK mentions, backface culling can be CPU costly.
 
Returning to how many polygons. I extracted these from the disc files. The game is Marionette Handler 2 for the Dreamcast. It's a robot fighting sim where you don't control the robots but program their AI. It's a very low-budget title and a sequel. The robots range from 3,100 to 4,700 triangles with no weapons or effects (gun shots, rocket boost bursts). The stages range from 3,000 to 5,200 triangles. The weapons range from 300 to 500 triangles. I guess if you look at it polygon-wise it performs like Dissidia / Dissidia 012 on the PSP, where the characters range from 1,500 to 2,400 triangles (with weapons) and the stages go up to 8,000 triangles.

Robot 1 - 3,923 tris
marionetteh2rob1.jpg



robot 2 - 4,791 tris
marionetteh2rob2.jpg


robot 3 - 4,110 tris
marionetteh2rob3.jpg


Stage 2 - 5,252 tris
marionetteh2stag2.jpg


stage 6 - 4,151 tris
marionetteh2stag6.jpg
 

With the Dreamcast (and most other systems), the CPU does not need to do poly-by-poly work to send geo to the GPU. The CPU just points the GPU at clumps of polys to draw as a batch. The CPU does do some work to make sure a batch is worth trying to draw (determine that it's probably visible vs. definitely entirely invisible), but it does not want to bother with backface culling or anything like that.

The PS2 works the same way if you think of the VU1+GS as a single unit. It's just different because the VU1 is more programmable than GPU geometry hardware ever was up until maybe the new Mesh Shaders (which are basically a modernized take on the same idea as the VU1).
 
Could just be old tools and a lack of established art styles; I mean, the DC didn't have the same big pool of devs working on it compared to the PS1 and PS2. If emulators are to be believed, quite a few games hit far beyond 1 mpps. I took a few comparative shots here. Maybe the DC can't keep up on the same terms as the PS2, but when pushed on its own merits it seems to do just fine.

Star Wars Episode I: Jedi Power Battles Dreamcast = 43,014 tris per frame x 60 fps = 2.58 mpps
starwarsjedi3.jpg

starwarsjedi4.jpg

starwarsjedi5.jpg


Metal Gear Solid 2 prologue: 24,904 tris per frame x 60 fps = 1.5 mpps
mgs2ps21.jpg


Virtua Fighter 4: 40,771 tris per frame x 60 fps = 2.4mpps
vf42.jpg

vf41.jpg



Dead or Alive 2 dreamcast = 47,371 tris per frame @ 60 fps = 2.8 million polygons per second.
doa21.jpg

doa22.jpg


DoA Paradise psp = 58,523 tris per frame x 30 fps = 1.75 million polygons per second.
doaparadise1.jpg

doaparadise2.jpg
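The throughput figures in these comparisons are just arithmetic: triangles per frame times frames per second, in millions. A one-line helper makes the numbers above easy to sanity-check (values reproduced from this thread; the thread rounds 2.84 down to 2.8 and 1.76 down to 1.75):

```python
# mpps = millions of polygons per second, as used throughout this thread.

def mpps(tris_per_frame, fps):
    return tris_per_frame * fps / 1_000_000

print(round(mpps(43_014, 60), 2))   # Jedi Power Battles DC -> 2.58
print(round(mpps(47_371, 60), 2))   # Dead or Alive 2 DC    -> 2.84
print(round(mpps(58_523, 30), 2))   # DoA Paradise PSP      -> 1.76
```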

I meant maybe it took a while to see what did and didn't work when it came to densing up models in the PS1-to-DC/PS2 transition, and the DC just didn't live long enough to reap the benefits. Not to mention companies that got it right from the get-go, like Square, Konami and a few others, refused to pour any meaningful resources into the DC at all. For example, Soul Calibur DC vs. Soul Calibur PSP / Tekken PSP:

Soul Calibur DC, Cervantes vs Siegfried: 28,611 tris per frame x 60 fps = 1.7 mpps
soulcaliburdc1.jpg

soulcaliburdc2.jpg


Soul Calibur: Broken Destiny PSP, Cervantes vs Yoshimitsu = 16,845 tris per frame x 60 fps = 1 mpps
souldestiny2.jpg


Tekken 5 PSP, Lei vs Xiaoyu = 17,231 tris per frame x 60 fps = 1 mpps
tekken52.jpg


Though sometimes, like in Resident Evil 4, the PS2 does an extreme amount of clipping. Here's the PC version (which is based on the PS2 version's assets) vs. the PS2:
PC captured everything in the scene, nothing clipped: 65,629 tris per frame x 30 fps = 1.9 mpps
res4pc1.jpg

res4pc2.jpg


PS2 version: 17,680 tris per frame x 30 fps = 530K pps
rez43.jpg


So that means RE4 (PS2 version), SoulCalibur 2 GCN and the PSP games are all within DC technical reach? Could they run on DC with more modern rendering techniques? And I can't believe DC Jedi Power Battles sports more than 2 million polys; what a waste for a game that ended up looking like an upgraded 32-bit port. Happy new year to everyone!
 
I think, though, that the PSP versions may have fewer polygons than their PS2 counterparts. Then again, the DC does show a lot of polygons, comparable to the PS2. The DOA2 game is an impressive feat.
 

I dream of the day someone could port those PS2/PSP games to DC. Now we know those games are in the DC's tech realm, at least in the polycount department!
 
Skies of Arcadia for the Dreamcast is next. The game is strange. The main characters are around 900 tris without weapons; side characters and low-level monsters are around 500 tris. The later (regular) monsters and bosses range from 3k to close to 8k tris. The stages seem to range from 5k to 50k tris.

Vyse - 938 tris without weapons
skiesvyse.jpg


Final boss - 6,111 tris
skiesbossfinal.jpg


Regular enemy inside shrine - 3,500 tris
skiesenemy1.jpg


Electric giga boss - 7,670 tris
skiesgigase.jpg


Ruins - 24,017 tris
skiesstage1.jpg


ice ruins - 24,443 tris
skiesstage3.jpg


ice ruins pt2 - 52,414 tris
skiesstage4.jpg


ice ruins entrance - 12,498 tris
skiesstage5.jpg


Inside valuan ship - 19,761 tris
skiesstage2.jpg
 