"Yes, but how many polygons?" An artist blog entry with interesting numbers

Mobius1aic · Dec 16, 2019

PSman1700 said:
Some titles did make use of VU0, I think; Vice City did the 4.0 DTS on it, a pretty bad mix though. The hardware was kinda weak, but the strong market position made it the most utilized system of that generation. I can imagine the OG Xbox wasn't as well utilized, though I must say Unreal Championship 2: The Liandri Conflict was on another level, almost next gen, better than anything else perhaps in tech, but that's it.

Utilized in terms of size of library, or utilized in terms of system usage? I think even amongst 1st, 2nd and the very best PS2 dedicated 3rd parties, VU0 was fairly underutilized. Would love to hear more examples as to how it was used though.

The availability of DX on Xbox made it much easier to push close to the limits of that system compared to the PS2. Xbox visuals were probably often limited by multiplatform development that dictated workability with the PS2 and GameCube. Splinter Cell: Chaos Theory was a very special example of a third party doing it right because the series' home was the Xbox, and its users would expect no less.

Nesh · Dec 16, 2019

What was the VU0 planned for, and what could it have done?

Aaron Elfassy · Dec 17, 2019

The PlayStation Blog recently posted a nice quick comparison:
https://blog.us.playstation.com/201...ution-of-5-iconic-playstation-characters/amp/

jlippo · Dec 18, 2019

Nesh said:
What do you mean? Do you mean the polycount of the assets is increased? Because the polygons rendered per frame should be a lot fewer.

Clipping: when a polygon hits the edge of the view frustum pyramid, it is cut into new polygons that reside within it, which increases the number of visible polygons.

Nesh · Dec 18, 2019

jlippo said:
Clipping: when a polygon hits the edge of the view frustum pyramid, it is cut into new polygons that reside within it, which increases the number of visible polygons.

I am not sure I understand

milk · Dec 18, 2019

Nesh said:
I am not sure I understand

If a model is halfway out of view, and some of its tris get rejected, that is still called culling, not clipping. It is a very granular kind of culling, that is for sure, but still. It is so granular that, depending on how fast/slow that culling is, and how fast/slow simply going through the geometry anyway is, it might become a non-optimisation: rejecting all those polygons ends up slower than simply rendering them anyway.

Clipping refers to when a single triangle is halfway out of view, so it gets sliced and re-triangulated so that all its verts stay within the screen-space bounds for rasterisation. In practice, the screen coordinate bounds usually extend a bit beyond the screen edge so as to reduce the number of tris that need to be clipped (more polys end up culled entirely before having to clip them).
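
To make the slicing and re-triangulating concrete, here is a minimal 2D sketch in Python. It assumes a Sutherland-Hodgman style clip against an axis-aligned rectangle followed by a simple fan triangulation; the function names are made up for illustration, and real hardware or VU code also has to deal with depth and the near plane, which this ignores.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def clip_polygon(poly: List[Point], xmin: float, ymin: float,
                 xmax: float, ymax: float) -> List[Point]:
    """Clip a convex polygon against a rectangle (Sutherland-Hodgman)."""

    def clip_edge(points: List[Point],
                  inside: Callable[[Point], bool],
                  intersect: Callable[[Point, Point], Point]) -> List[Point]:
        out: List[Point] = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps around to the last vertex when i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))  # entering the region
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))      # leaving the region
        return out

    def x_cross(x: float) -> Callable[[Point, Point], Point]:
        # Only called when a and b lie on opposite sides of x, so a[0] != b[0].
        return lambda a, b: (x, a[1] + (b[1] - a[1]) * (x - a[0]) / (b[0] - a[0]))

    def y_cross(y: float) -> Callable[[Point, Point], Point]:
        return lambda a, b: (a[0] + (b[0] - a[0]) * (y - a[1]) / (b[1] - a[1]), y)

    edges = [
        (lambda p: p[0] >= xmin, x_cross(xmin)),
        (lambda p: p[0] <= xmax, x_cross(xmax)),
        (lambda p: p[1] >= ymin, y_cross(ymin)),
        (lambda p: p[1] <= ymax, y_cross(ymax)),
    ]
    for inside, intersect in edges:
        if not poly:
            break  # everything was clipped away
        poly = clip_edge(poly, inside, intersect)
    return poly


def fan_triangulate(poly: List[Point]) -> List[Tuple[Point, Point, Point]]:
    """Split a convex polygon into triangles sharing its first vertex."""
    return [(poly[0], poly[i], poly[i + 1]) for i in range(1, len(poly) - 1)]


# One triangle poking out of the left edge of a 100x100 screen.
triangle = [(-20.0, 20.0), (40.0, 20.0), (40.0, 80.0)]
clipped = clip_polygon(triangle, 0.0, 0.0, 100.0, 100.0)
triangles = fan_triangulate(clipped)
```

In this example the clipped shape has four corners, so one input triangle becomes two output triangles, which is why clipping can raise the visible polygon count.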

Nesh · Dec 18, 2019

milk said:
Clipping refers to when a single triangle is halfway out of view, so it gets sliced and re-triangulated so that all its verts stay within the screen-space bounds for rasterisation. In practice, the screen coordinate bounds usually extend a bit beyond the screen edge so as to reduce the number of tris that need to be clipped (more polys end up culled entirely before having to clip them).

Since the clipping is beyond the boundaries of the viewport, isn't it possible, instead of slicing polygons, to actually "vanish" them completely once all their vertices extend beyond the set boundary?

milk · Dec 18, 2019

Nesh said:
Since the clipping is beyond the boundaries of the viewport, isn't it possible, instead of slicing polygons, to actually "vanish" them completely once all their vertices extend beyond the set boundary?

Yes. And that is done. And that is called culling. Frustum culling, to be more specific.
But until ALL vertices are completely out of screen-space, you can't simply not render the poly without leaving visible holes. That's why it then has to be clipped.
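
A rough Python sketch of that decision, under some assumptions: it works in 2D screen space only, the `screen` and `guard` rectangles are hypothetical parameters (the guard rectangle being the slightly larger bounds mentioned earlier), and the rasterizer is assumed to scissor anything drawn between the screen edge and the guard edge.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def classify_triangle(tri: Sequence[Point], screen: Rect, guard: Rect) -> str:
    """Return 'cull', 'draw' or 'clip' for one screen-space triangle."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    sx0, sy0, sx1, sy1 = screen
    gx0, gy0, gx1, gy1 = guard

    # All vertices beyond the same screen edge: nothing can be visible.
    if max(xs) < sx0 or min(xs) > sx1 or max(ys) < sy0 or min(ys) > sy1:
        return "cull"

    # Entirely within the guard band: submit as-is, no slicing needed.
    if min(xs) >= gx0 and max(xs) <= gx1 and min(ys) >= gy0 and max(ys) <= gy1:
        return "draw"

    # Partly visible and poking past the guard band: must be clipped.
    return "clip"


screen = (0.0, 0.0, 640.0, 448.0)
guard = (-200.0, -200.0, 840.0, 648.0)

fully_off = classify_triangle([(-50, 10), (-10, 10), (-30, 60)], screen, guard)
straddles_edge = classify_triangle([(-50, 10), (30, 10), (-30, 60)], screen, guard)
huge = classify_triangle([(-900, 10), (30, 10), (-30, 60)], screen, guard)
```

The first test is conservative: a triangle that misses the screen near a corner without being wholly beyond one edge is not culled here, it just gets drawn or clipped with nothing visible.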

corysama · Dec 28, 2019

Cloofoofoo said:
Yeah, but the question is: the CPU still has to handle T&L and then send it to the GPU, which does all that, right? But since the CPU is still burdened with calculating polygons, wouldn't it be a good thing to cull as many as possible before sending them to the GPU?

There are 5 processors involved in PS2 graphics:

The EE (CPU), which does normal CPU things. In this case its job is to do high-level culling of whole batches of polygons. So, decide to draw the whole batch or skip it. Batches to be drawn are linked into a queue of instructions for the VIF.

The VIF is a slightly-programmable DMA engine. Basically, its job is to copy data from the main memory to the internal memory of the VU. In the process it can do a little rearranging of offsets, strides, packing and unpacking of the source and destination buffers. The VIF also sends code to the VU1. The VU1 only has 16 KB of memory for code and another 16 KB for data. So, you have to stream code for specific situations, kind of like switching shaders.

The VU1 receives code and a series of data chunks from the VIF. For each data chunk, the VIF can also start the code at a data-driven address. That's how you can have multiple routines in a single code chunk. The VU1's job is to prepare data for the GIF. The VU has limited (16 bit) general purpose registers and instructions. Its real power is in vector instructions that can do large volumes of math. The VU handles vertex animation, lighting, UVs (including some of the mipmap math), etc. It also handles culling individual off-screen triangles and it must manually clip triangles into sub-triangles vs. the edge of the screen. Basically, it handles everything to do with geometry. It's not a 1-in-1-out vertex setup like vertex shaders. It's "blob of bytes" in, many-triangles-out.

The GIF is another DMA engine. It reads chunks of bytes from the VU1's memory and stuffs them into the registers of the GS. Like the VIF, you point it at a command queue containing a mixture of control bytes and data bytes for it to read through and interpret.

The GS is the rasterizer. It has no instruction set. You control it entirely by setting values in registers (via the GIF). There are registers to define a texture to read from the embedded DRAM. Registers to define the framebuffer in the same DRAM. Registers to define the blending/Z/interpolator actions (vertex color modes). And, a register where you stuff vertex positions. Stuff that 1 register 3 times (yes, overwriting 2 values) and the GS will rasterize a triangle into the framebuffer according to the state defined in the rest of the registers. Alternatively, there is a triangle strip mode that only requires 3 positions to get the first triangle, then 1 per triangle after that. For a given triangle, the GS can only handle a single texture, a few options for how to incorporate the vertex colors and a limited blend mode. So, if you want to use two textures, you need to draw the same triangle twice with different configurations. Fortunately, changing GS configs can be done by setting a few registers at a cost of 1 cycle each. So, the VU can transform a dozen triangles once, have the GS rasterize them, switch GS configs and rasterize them again. That's twice the pixel work for the GS, but it's pretty cheap for the VU.
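
As a back-of-the-envelope illustration of the strip mode and the draw-it-twice trick, here is a small counting sketch in Python. It is a simplified model, not cycle-accurate, and the function names are invented for this example.

```python
def vertices_for_list(triangles: int) -> int:
    """Independent triangles: 3 vertex writes per triangle."""
    return 3 * triangles


def vertices_for_strip(triangles: int) -> int:
    """Triangle strip: 3 vertices for the first triangle, 1 for each one after."""
    return triangles + 2 if triangles > 0 else 0


def multipass_work(triangles: int, passes: int) -> dict:
    """Rough work split when one strip is transformed once and drawn `passes` times."""
    strip_vertices = vertices_for_strip(triangles)
    return {
        "vu_vertices_transformed": strip_vertices,          # done once on the VU
        "gs_vertices_submitted": strip_vertices * passes,   # re-sent for each pass
        "gs_triangles_rasterized": triangles * passes,      # pixel work scales with passes
    }


dozen_as_list = vertices_for_list(12)
dozen_as_strip = vertices_for_strip(12)
two_textures = multipass_work(12, passes=2)
```

For a dozen triangles that is 36 vertex writes as a list versus 14 as a strip, and a second texture pass doubles the GS side while the VU transform count stays at 14.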

So, that's a long, fun brain-dump just to say that the CPU isn't really burdened with calculating polygons. The VU1 is.

Your main intuition is correct though. The VU can and should do a whole lot of fine-grained culling before sending triangles to the rasterizer. That can explain a lot of the difference in polygon culling measured in the emulators.

PSman1700 · Dec 29, 2019

Thanks for explaining the PS2 a bit more to us.

Do you think Transformers would have been possible on the OG Xbox at the time? The huge draw distances and the number of trees on display with zero fog were quite amazing at 60fps. The game had impressive physics too, almost like Havok.

corysama · Dec 29, 2019

Nesh said:
What was the VU0 planned for, and what could it have done?

The VU1 was set up to do all of the geometry and animation work to prepare triangles for the rasterizer. I'm not sure what the plan was for the VU0. But, I think the idea was basically "What if we added another VU to help out the CPU with math-heavy work?"

The problem with the VU0 was that there wasn't an obvious good way to schedule streaming results out of it. You could stream data into it easily enough. But, it couldn't send data out on its own. It also couldn't tell the CPU when data was ready. So, the official plan was to have the CPU poll to see if the VU0 was idling before triggering a DMA out operation. It was very difficult to set that up without having the CPU waste time polling with nothing else to do (or the CPU doing lots of other stuff while the VU0 idles waiting for more work).

Late in the PS2's lifetime, my team at High Voltage Software figured out how to make scheduling work out. I think the VU0 would somehow trigger an interrupt and the CPU's interrupt handler would move the DMA pipeline along asynchronously from the CPU's main thread. I expect PS2-specialty shops figured this out way before we did, but they weren't sharing.

With that in place, moving high-level culling to the VU0 would be an obvious first task. The VU1 doesn't have the room to do animation keyframe->blended matrices work alongside the vertex transforms. So, that would be a great job for the VU0 to offload from the CPU. Physics in general would fit well too. Audio processing would be possible. But, there would be multiple hops involved getting data from main RAM->VU0 RAM->main RAM->IOP RAM->SPU RAM. That's a lot of latency.

corysama · Dec 29, 2019

PSman1700 said:
Thanks for explaining the PS2 a bit more to us. Do you think Transformers would have been possible on the OG Xbox at the time? The huge draw distances and the number of trees on display with zero fog were quite amazing at 60fps. The game had impressive physics too, almost like Havok.

That would be difficult. The PS2 could handle more raw, basic, simple polygons than the Xbox, but only after a whole lot of work and constraints to appease the hardware. It also technically had more fill rate, but again with a whole lot of constraints. The Xbox had a stronger CPU and a much easier API. The vertex and pixel shaders had way more features than the GS, were much more familiar to PC devs and they were by no means weak. So, if you want to draw a bunch of single-texture, vertex lit, alpha-testing polys then add full-screen "motion blur" as a post-process, the PS2 is actually a much better fit. Throw in lightmaps and it gets a bit harder for the PS2. Normal maps? LOL, no. (Someone figured out how to technically make it work, but it took like 8 passes). Shadow maps? Hope you are OK with solid black shadows. Etc...

PSman1700 · Dec 29, 2019

corysama said:
That would be difficult.

Wouldn't an Xbox version be possible somehow, playing to its strengths?

Cloofoofoo · Dec 30, 2019

corysama said:
There are 5 processors involved in PS2 graphics:

Yes, thank you for that explanation; it gives a clearer picture of the PS2. But I was actually asking about the Dreamcast. Even if the GPU automatically culls backfaces, does hidden surface removal and clips in x and y, the CPU still has to calculate all those polygons before sending them to the GPU, right? I was just asking whether clipping/culling on the CPU is a good idea before the polygons are sent to the GPU, or whether the CPU is already overloaded with everything else (physics, AI and so on). As the SDK mentions, backface culling can be CPU costly.
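
For reference, the per-triangle backface test in question is usually a signed-area check on the transformed screen-space vertices. A minimal Python sketch, assuming counter-clockwise winding means front-facing with y pointing up (conventions vary by platform, and the function names here are made up):

```python
from typing import Tuple

Point = Tuple[float, float]


def signed_area_x2(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (2D cross product of two edges)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_backface(a: Point, b: Point, c: Point) -> bool:
    """True when the triangle winds clockwise (or is degenerate) on screen."""
    return signed_area_x2(a, b, c) <= 0


front = is_backface((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
back = is_backface((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))
```

The test itself is only a few multiplies, but it needs all three vertices already transformed, so doing it on the CPU means doing the per-triangle transform work there too.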

Cloofoofoo · Dec 30, 2019

Returning to how many polygons. I extracted these from the disc files. The game is Marionette Handler 2 for the Dreamcast. It's a robot fighting sim where you don't control the robots but program their AI. It's a very low-budget title and a sequel. The robots range from 3,100 to 4,700 triangles with no weapons or effects (gun shots, rocket boost bursts). The stages range from 3,000 to 5,200 triangles. The weapons range from 300 to 500 triangles. I guess if you look at it polygon-wise it performs like Dissidia / Dissidia 012 on the PSP, where the characters range from 1,500 to 2,400 triangles (with weapons) and the stages go up to 8,000 triangles.

Robot 1 - 3,923 tris

Robot 2 - 4,791 tris

Robot 3 - 4,110 tris

Stage 2 - 5,252 tris

Stage 6 - 4,151 tris

corysama · Jan 1, 2020

Cloofoofoo said:
Yes, thank you for that explanation; it gives a clearer picture of the PS2. But I was actually asking about the Dreamcast. Even if the GPU automatically culls backfaces, does hidden surface removal and clips in x and y, the CPU still has to calculate all those polygons before sending them to the GPU, right? I was just asking whether clipping/culling on the CPU is a good idea before the polygons are sent to the GPU, or whether the CPU is already overloaded with everything else (physics, AI and so on). As the SDK mentions, backface culling can be CPU costly.

With the Dreamcast (and most other systems) the CPU does not need to do poly-by-poly work to send geo to the GPU. The CPU just points the GPU at clumps of polys to draw as a batch. The CPU does do some work to make sure that batch is worth trying to draw (determine that it's probably visible vs. definitely entirely invisible), but it does not want to bother with backface culling or anything like that.
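
A minimal Python sketch of that kind of coarse batch check, assuming each batch carries a bounding sphere and the frustum is given as planes `(nx, ny, nz, d)` with unit normals pointing inward. Both are assumptions for illustration, not how any particular SDK represents them.

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # (nx, ny, nz, d), inward unit normal


def batch_probably_visible(center: Vec3, radius: float,
                           planes: Iterable[Plane]) -> bool:
    """False only when the sphere is entirely behind at least one plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        signed_distance = nx * cx + ny * cy + nz * cz + d
        if signed_distance < -radius:
            return False  # definitely entirely invisible: skip the batch
    return True  # probably visible: hand the whole batch to the GPU


# A stand-in "frustum": the box -1 <= x, y, z <= 1, built from six inward planes.
box_planes = [
    (1.0, 0.0, 0.0, 1.0), (-1.0, 0.0, 0.0, 1.0),
    (0.0, 1.0, 0.0, 1.0), (0.0, -1.0, 0.0, 1.0),
    (0.0, 0.0, 1.0, 1.0), (0.0, 0.0, -1.0, 1.0),
]

inside = batch_probably_visible((0.0, 0.0, 0.0), 0.5, box_planes)
straddling = batch_probably_visible((1.5, 0.0, 0.0), 1.0, box_planes)
far_away = batch_probably_visible((5.0, 0.0, 0.0), 1.0, box_planes)
```

That is six dot products per batch no matter how many polygons are inside it, which is why the CPU can afford it while leaving per-triangle work to the graphics hardware.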

The PS2 works the same way if you think of the VU1+GS as a single unit. It's just different because the VU1 is more programmable than GPU geometry hardware ever was up until maybe the new Mesh Shaders (which are basically a modernized take on the same idea as the VU1).

xaeroxcore · Mar 5, 2020

Cloofoofoo said:
Could just be old tools and a lack of art styles designed for it; I mean, the DC didn't have the same big pool of devs working on it compared to the PS1 and PS2. If emulators are to be believed, quite a few games hit far beyond 1 mpps. I took a few comparative shots here. Maybe the DC can't keep up on the same terms as the PS2, but when pushed on its merits it seems to do just fine.

Star Wars Episode I: Jedi Power Battles (Dreamcast) = 43,014 tris per frame x 60 fps = 2.58 mpps

Metal Gear Solid 2 prologue = 24,904 tris per frame x 60 fps = 1.5 mpps

Virtua Fighter 4 = 40,771 tris per frame x 60 fps = 2.4 mpps

Dead or Alive 2 (Dreamcast) = 47,371 tris per frame x 60 fps = 2.8 mpps

DoA Paradise (PSP) = 58,523 tris per frame x 30 fps = 1.75 mpps

Cloofoofoo said:
I meant maybe it took a while to see what worked and what didn't when it came to making models denser in the transition from PS1 to DC/PS2, and the DC just didn't live long enough to reap the benefits. Not to mention that companies that got it right from the get-go, like Square, Konami and a few others, refused to pour any meaningful resources into the DC at all. For example, Soul Calibur 1 vs Soul Calibur PSP / Tekken PSP:

Soul Calibur (DC), Cervantes vs Siegfried = 28,611 tris per frame x 60 fps = 1.7 mpps

Soul Calibur: Broken Destiny (PSP), Cervantes vs Yoshimitsu = 16,845 tris per frame x 60 fps = 1 mpps

Tekken 5 (PSP), Lei vs Xiaoyu = 17,231 tris per frame x 60 fps = 1 mpps

Though sometimes, like in Resident Evil 4, the PS2 does an extreme amount of clipping. Here's the PC version (which is based on the PS2 version's assets) vs the PS2:
PC version, captured everything in the scene with nothing clipped = 65,629 tris per frame x 30 fps = 1.9 mpps

PS2 version = 17,680 tris per frame x 30 fps = 530K pps

So does that mean RE4 (PS2 version), SoulCalibur 2 (GCN) and the PSP games are all within the DC's technical reach? Could they run on the DC with more modern rendering techniques? And I can't believe DC Jedi Power Battles sports more than 2 million polys; what a waste for a game that ended up looking like an upgraded 32-bit port. Happy new year to everyone!
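
For anyone who wants to check the per-second figures quoted above, they are just tris per frame times frame rate. A quick Python sketch (the helper name and table layout are only for illustration; the numbers are the ones from the quoted post):

```python
def polys_per_second(tris_per_frame: int, fps: int) -> int:
    """Polygons per second from a per-frame count and a frame rate."""
    return tris_per_frame * fps


# (tris per frame, fps) as quoted in the thread
captures = {
    "Jedi Power Battles (DC)": (43_014, 60),
    "Dead or Alive 2 (DC)": (47_371, 60),
    "DoA Paradise (PSP)": (58_523, 30),
    "Resident Evil 4 (PC)": (65_629, 30),
    "Resident Evil 4 (PS2)": (17_680, 30),
}

rates = {name: polys_per_second(tris, fps) for name, (tris, fps) in captures.items()}

for name, pps in rates.items():
    print(f"{name}: {pps / 1_000_000:.2f} mpps")
```

Note how the 30 fps titles can push more triangles per frame than the 60 fps ones and still land at a lower per-second rate.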

Nesh · Mar 6, 2020

xaeroxcore said:
So does that mean RE4 (PS2 version), SoulCalibur 2 (GCN) and the PSP games are all within the DC's technical reach? Could they run on the DC with more modern rendering techniques? And I can't believe DC Jedi Power Battles sports more than 2 million polys; what a waste for a game that ended up looking like an upgraded 32-bit port. Happy new year to everyone!

I think, though, that the PSP versions may have fewer polygons than their PS2 counterparts. Then again, the DC does appear to push polygon counts comparable to the PS2. DOA2 is an impressive feat.

xaeroxcore · Mar 6, 2020

Nesh said:
I think, though, that the PSP versions may have fewer polygons than their PS2 counterparts. Then again, the DC does appear to push polygon counts comparable to the PS2. DOA2 is an impressive feat.

I dream of the day someone ports those PS2/PSP games to the DC. Now we know those games are within the DC's tech realm, at least in the polycount department!

Cloofoofoo · Mar 14, 2020

Skies of Arcadia for the Dreamcast is next. The game is strange. The main characters are around 900 tris without weapons. Side characters and low-level monsters are around 500 tris. The late-game (regular) monsters and bosses range from 3k to close to 8k tris. The stages seem to range from 5k to 50k tris.

Vyse - 938 tris without weapons

Final boss - 6,111 tris

Regular enemy inside shrine - 3,500 tris

Electric giga boss - 7,670 tris

Ruins - 24,017 tris

Ice ruins - 24,443 tris

Ice ruins pt. 2 - 52,414 tris

Ice ruins entrance - 12,498 tris

Inside Valuan ship - 19,761 tris