But VU1 sends polygons only one at a time, right?
Yep. The VU creates a little buffer full of GIF commands and triggers the GIF. The GIF then executes the commands that say "Stuff this value in that GS register" over and over. Some registers control the blend mode. Some the source texture address. Some the framebuffer address. Some the triangle mode (list, strip, sprite).
The fun one is the vertex position register. You stuff all vertex positions into a single register. It looks like you are overwriting a single value over and over. But, the third time you write to that one register, a triangle gets drawn! If the triangle mode register is set to List, you get a triangle every third time you write to the register. If it is set to Strip, you get a triangle on the 3rd vert and then another for every vert you stuff in the register. I think there was a reserved high bit in the position register that would reset the strip.
That shows the really fun bit of the GS design. It is a processor with no instruction set. It only has registers. The GIF pokes at those registers and the GS responds by doing stuff. But, there's no such thing as GS assembly language. The sound processors work the same way.
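To make the "registers only, no instructions" idea concrete, here is a toy Python sketch of the vertex-kick behaviour described above. It is not GS code: the `ToyGS` class, the register names `PRIM` and `XYZ`, and the rule that writing the mode register empties the vertex queue are all simplifications assumed for illustration.

```python
# Toy model only: the register names and behaviour here are simplified
# assumptions for illustration, not the real GS register map.

class ToyGS:
    def __init__(self):
        self.regs = {"PRIM": "list"}   # current triangle mode
        self.queue = []                # vertices held for the next triangle
        self.triangles = []            # triangles "drawn" so far

    def write(self, reg, value):
        """The only interface: poke a value into a named register."""
        if reg == "XYZ":
            self._vertex_kick(value)
        else:
            self.regs[reg] = value
            if reg == "PRIM":
                self.queue.clear()     # assumed: a mode change restarts the primitive

    def _vertex_kick(self, pos):
        self.queue.append(pos)
        if len(self.queue) < 3:
            return                     # not enough vertices yet
        self.triangles.append(tuple(self.queue))
        if self.regs["PRIM"] == "list":
            self.queue.clear()         # list: next triangle needs 3 new verts
        else:
            self.queue.pop(0)          # strip: keep the last two verts


def run(commands):
    """Play a buffer of (register, value) writes into a fresh ToyGS."""
    gs = ToyGS()
    for reg, value in commands:
        gs.write(reg, value)
    return gs.triangles


verts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1)]
list_tris = run([("PRIM", "list")] + [("XYZ", v) for v in verts])
strip_tris = run([("PRIM", "strip")] + [("XYZ", v) for v in verts])
print(len(list_tris), len(strip_tris))   # 2 4
```

The same six position writes produce two triangles in list mode and four in strip mode. Nothing in the command buffer says "draw"; the drawing is a side effect of the third write to the position register.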
Is this multipass you're talking about here?
Yep. That's all there is to multipass.
So you're saying that when the Z buffer isn't needed anymore, you can use its space to store a temp buffer instead?
Also, what is the temp buffer needed for?
Say I'm going to do a full-screen post-process. Let's say I'm going to make the whole screen have a dark red filter in post by multiplying the screen by (192, 128, 128). I need to read the frame buffer, do the multiply using vertex colors and write the result somewhere. I need a place to put the result! I can't just put it back where the framebuffer already is because reading and writing to the same texture simultaneously is unsupported and will cause glitches. So instead, I use the memory that currently holds the depth buffer. It's convenient because it is the right size, it's holding data that won't be needed next frame, and it means I don't have to stomp any useful textures or palettes in GS RAM to make room.
Even if simultaneous read-write did not cause glitches, there are lots of post effects like blurs that require multiple reads from the source to get the right result. The GS can only read once, write once, read once, write once. It can't do lots of reads before it writes a pixel. So, you need some scratch space to work in for those kinds of effects.
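A small CPU-side sketch can show why the scratch space matters. This is plain Python on a one-row "framebuffer", not GS code: the helpers `run_pass` and `blur` are hypothetical, and the 3-tap box blur is just an assumed example of an effect that needs several reads of the source. Each pass obeys the one-read, one-write rule, so the blur is built from three passes that each sample the source at a different offset.

```python
def run_pass(src, dst, offset, overwrite):
    """One pass: every destination pixel reads exactly one source pixel."""
    n = len(src)
    for i in range(n):
        j = min(max(i + offset, 0), n - 1)   # clamp reads at the edges
        sample = src[j] // 3                 # weight each tap by 1/3
        dst[i] = sample if overwrite else dst[i] + sample


def blur(src, dst):
    """3-tap box blur built from three one-read passes."""
    run_pass(src, dst, 0, overwrite=True)    # centre tap initialises dst
    run_pass(src, dst, -1, overwrite=False)  # left neighbour, additive
    run_pass(src, dst, +1, overwrite=False)  # right neighbour, additive


frame = [0, 0, 90, 0, 0]

scratch = [0] * len(frame)
blur(frame, scratch)        # separate scratch buffer: source stays intact

in_place = list(frame)
blur(in_place, in_place)    # same memory used as source and destination

print(scratch)              # [0, 30, 30, 30, 0]
print(in_place)             # smeared and asymmetric: later passes read modified pixels
```

With a separate destination, every pass sees the original frame and the bright pixel spreads evenly to its neighbours. When source and destination are the same memory, the first pass has already changed the pixels that the later passes try to read, so the result is wrong.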