One TMU every two pipes and writing to texture on PS2?

PS2 does no cleverness with Z - the pipeline is fixed length and assuming it's not actually stalling on something, does 8 pixels per clock regardless of most settings. Only fog or trilinear really slow it down by a serious amount.

Z read/write is part of this pipeline, so you can consider it "free". There's a certain amount of bandwidth shared between the frame-buffer and z-buffer that you might want back during a full-screen operation, but there's a hardware bug in the GS that stops you from turning Z off anyway... In practice you can work around this and get the maximum bandwidth regardless.

On some "modern" hardware it's sometimes cheaper to render the whole scene untextured to build a Z-buffer first, and then when doing the more expensive operations you can early-out on the Z value if it fails and only write the visible pixels. PS2 can't do that - it'd help for some heavy multi-pass techniques but on the whole the PS2 has enough fill-rate to cope.
 
JesusClownFox said:
Aaah..I got it. I thought by framebuffer you were referring to the main FB that renders the entire scene.

If I said frame buffer when I meant a render-to-texture buffer (render target) then I'm sorry. I had woken up not long before I made that post, heh. Well, that is my excuse anyway! ;)

So, are all textures in eDRAM seen as mini-framebuffers (or as you call it, render targets), then?

I wouldn't think so. It may well be that buffers (big ones such as frame buffers, or smaller ones such as render-to-texture buffers) have memory-alignment requirements that plain textures do not. That's something a PS2 programmer would have to answer...

If that's the case, is it possible to use the 512-bit texture port in place of the framebuffer read port?

I would think the fb read port is used for alpha blending/fog operations, and quite possibly Z reads... Besides, you wouldn't want to replace it with the texture read port anyway, as the texture port is only half as wide and thus only has half the bandwidth...

I was under the impression that IMRs did a Z check once after rasterization and once after texturing.

Well, texturing is a stage in the rasterization process, so it would only be once anyway...

IMRs do things a bit differently depending on how modern they are. Starting with the GF3 and most later chips, the Z-buffer is checked before texture reads take place, so if the Z-check fails the pixel operation is aborted at that stage and some texture read bandwidth is saved. Previously, Z-checks were done after the pixel was already textured and about to be written out to memory. I would think this newer behaviour is more expensive from a complexity/transistor point of view, or else they'd have done this optimization years earlier, as it is a pretty obvious trick, really.

Needless to say, GS does not do early Z-check.
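
To put the difference in crude pseudo-code (made-up names, neither chip actually looks like this internally, but the ordering is the point):

struct Pixel  { int x, y; float z; };
struct Colour { float r, g, b, a; };

Colour fetch_and_filter_texels(const Pixel& p);              // hypothetical helpers
Colour blend_with_framebuffer(const Pixel& p, const Colour& c);
bool   z_test(const Pixel& p);
void   write_out(const Pixel& p, const Colour& c);

// Old way (and the GS): texture first, Z-check last.
void shade_late_z(const Pixel& p)
{
    Colour c = fetch_and_filter_texels(p);    // bandwidth spent no matter what
    c = blend_with_framebuffer(p, c);
    if (z_test(p))                            // only now do we learn...
        write_out(p, c);                      // ...whether that was wasted
}

// GF3 and later: Z-check before the texture fetch.
void shade_early_z(const Pixel& p)
{
    if (!z_test(p))
        return;                               // hidden pixel: no texels read at all
    Colour c = fetch_and_filter_texels(p);
    c = blend_with_framebuffer(p, c);
    write_out(p, c);
}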

Wouldn't continuous Z-buffering be a bandwidth killer?

Uhm, the Graphics Synth has more bandwidth than any current 3D chip... It can handle it! ;)
 
Thanks for the clarification, from both of you. :)

MrWibble:

In regards to turning off the Z-buffer, it might give back some bandwidth when reading the current frame, but for all the overdrawing it can potentially save, it certainly would not be a very wise compromise.

As for that "modern" hardware you're referring to, certainly you are talking about deferred tile-based architectures like PowerVR, right? Personally, I think there's a tradeoff between cramming in the entire scene at once, tiling it, and doing a single Z-check, rather than doing it in small chunks of triangles with multiple Z-checks. I mean, the memory used in getting the entire scene at once might be more or less the same as what is used for immediate-mode rendering in multiple passes, considering the developer uses a proper front-to-back sorting method.


Guden Oden:

Thinking about it now, it seems that the texture and framebuffer ports each have their own specialised needs, and thus it's inappropriate to mix them around. If the texture read port had been used to read a fullscreen framebuffer, it would've had to pass through the texture fetch/coordinate/filtering stages before it finally reached the final FB blending ops. On the other hand, the FB ports go straight to that final stage, so if you're writing to a texture target, you'd have to do at least a second pass before it can be applied to its coordinates. I assume that this is not a common occurrence in most games, otherwise Sony would've provided a separate "texture write" port. Or you could just have VU0 do whatever special operations on the texture and send it to the eDRAM (but there are more problems there, dealing with external bandwidth and such).

Also, I assume the three framebuffers work like this: one after triangle setup/rasterization, another after texture ops, and a final one after framebuffer ops. Is this correct?

Damn shame about the early Z-test. :( Then again, it really has only been around since the days of DirectX 8.

By the way, I realize that what I've just said is very obvious and n00bish stuff. Just bear with me please. :LOL:
 
JesusClownFox said:
In regards to turning off the Z-buffer, it might give back some bandwidth when reading the current frame, but for all the overdrawing it can potentially save, it certainly would not be a very wise compromise.

Hm, in the case of the GS, all the Z-buffer really does is prevent depth errors when opaque polygons intersect. To eliminate overdraw you need to render in front-to-back order, and there's simply no point in doing that on the GS. It's so fast it does 10 layers of overdraw per pixel PER FRAME without even breaking a sweat. No, on the GS you don't give a f*ck about overdraw :), instead you sort by texture order and just go ahead and render the scene straight up.

You fill pixels blazing fast, but you don't have much space for storing textures or bandwidth to upload them on a frame-by-frame basis, so it's much smarter to do the rendering on a per-texture basis even if it means a lot of overdraw. It's not elegant or efficient, but who cares. It's the end result that counts. ;)

It's rare you're ever going to be fill limited on PS2 anyway as you'll realistically bottleneck somewhere else before that happens.

I mean, the memory used in getting the entire scene at once might be more or less the same as what is used for immediate-mode rendering in multiple passes, assuming the developer uses a proper front-to-back sorting method.

Problem is, efficient 3D engines don't really lend themselves to being proper front-to-back renderers, so you won't ever see the full effect of the bandwidth-saving techniques on IMR-based 3D chips. It's actually faster to have some overdraw on your 3D chip than to bog down the CPU sorting through reams of triangles to eliminate overdraw. That's one of the reasons Unreal-engined games ran pretty slowly in software mode: the engine spent a lot of CPU time killing overdraw before rasterization.

Thinking about it now, it seems that the texture and framebuffer ports each have their own specialised needs, and thus it's inappropriate to mix them around.

I'm pretty sure you CAN'T. They're not likely to be connected through some kind of switchboard, they're likely hardwired to the functional units that use them.

I assume that this is not a common occurrence in most games, otherwise Sony would've provided a separate "texture write" port.

I don't exactly know how common it is, but it's been done, that's for sure, and because the GS is so damn quick at rasterizing, it's a blazing fast operation. You wouldn't need a specific TEXTURE write port anyway, as the GS doesn't know or care whether the buffer it is rendering to will be used as a texture down the road.

Or you could just have VU0 do whatever special operations on the texture and send it to the eDRAM (but there are more problems there, dealing with external bandwidth and such).

Well, if you're doing a procedural texture - a force-field like in Unreal, wood grain, blood floating about like in Silent Hill 3, etc. - you'll probably want to do it on a VU, as the GS is very limited in the maths it can do on a texture. But if you want reflections, for example on a nice shiny car body in a racing game, that's a task for the GS.

Also, I assume the three framebuffers work like this: one after triangle setup/rasterization, another after texture ops, and a final one after framebuffer ops. Is this correct?

Um...no. :)

Usually you have three buffers, yes, but they don't work quite the way you think. :) You have a front buffer that holds the graphics being displayed, a back buffer being rendered into, and a Z-buffer for depth and possibly stencil effects or such.

I'm no expert on the specifics, but the basic 3D pipeline in a traditional polygon renderer like those we have today (ie nothing weird like the voxel engine in the PC game "Outcast", etc), is something like this:

3D transforms + lighting -> backface culling (typically done on EE's VU1, but sometimes programmers skip the culling as the GS is so damn fast you could just draw the backfacing polys anyway) -> send to GS -> poly setup + clipping -> read texels/source/destination alpha values, do blend ops -> Z check -> if success update Z if necessary/write pixel to back-buffer, if fail discard pixel. Keep doing this until the frame is finished, then wait for vertical blank, swap the front buffer with the back buffer, clear the back buffer, clear Z and start over. Basically. There are probably some steps I missed, but like I said, I'm just a dabbler on the specifics here.
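
Or the same thing written out as a rough frame loop (the function names are made up - this is not real PS2 library code, just the shape of it):

#include <vector>

struct Object { /* mesh, textures, matrices... */ };
struct Scene  { std::vector<Object*> objects; };

void transform_and_light(Object*);   // hypothetical helpers
void cull_backfaces(Object*);
void send_to_gs(Object*);
void wait_for_vblank();
void swap_buffers();
void clear_back_buffer();
void clear_z_buffer();

void do_frame(Scene& scene)
{
    for (Object* obj : scene.objects) {
        transform_and_light(obj);        // VU1-style vertex work
        cull_backfaces(obj);             // optional on PS2, as noted above
        send_to_gs(obj);                 // setup, texel/alpha reads, blend ops,
                                         // Z-check, write to back buffer
    }
    wait_for_vblank();
    swap_buffers();                      // back buffer becomes the front buffer
    clear_back_buffer();
    clear_z_buffer();
}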

Of course, there are lots of ifs when it comes to PS2. :) You could have a half-height front buffer to save memory, for example. In that case you'll do a filter blit between back and front buffers instead of a simple swap, to squash down the larger back buffer and get a flicker-fix effect at the same time. Also, a game might not necessarily wait until the end of the frame before it swaps buffers (this happens more and more in games these days as it hides slow-down more easily, but produces a tearing effect, particularly in sideways motion). You could also save the previous back buffer to read from in a blend to produce motion blur, etc... There are many possibilities and techniques.

Damn shame about the early Z-test. :( Then again, it really has only been around since the days of DirectX 8.

It's not actually API-dependent. It's a hardware function that'll work regardless of DX version, or even without an API at all. For example, Gamecube's Flipper has a simple form of early Z-rejection...
 
Guden Oden said:
Hm, in the case of the GS, all the Z-buffer really does is prevent depth errors when opaque polygons intersect.

Can you elaborate on this concept of opaque polygons? Are these triangles that are hidden or just those that have zero alpha?

instead you sort by texture order and just go ahead and render the scene straight up.

You fill pixels blazing fast, but you don't have much space for storing textures or bandwidth to upload them on a frame-by-frame basis, so it's much smarter to do the rendering on a per-texture basis even if it means a lot of overdraw. It's not elegant or efficient, but who cares. It's the end result that counts. ;)

hmm? Could you please word this in a simpler way for this meager layman? :)

I don't exactly know how common it is, but it's been done, that's for sure, and because the GS is so damn quick at rasterizing, it's a blazing fast operation. You wouldn't need a specific TEXTURE write port anyway, as the GS doesn't know or care whether the buffer it is rendering to will be used as a texture down the road.

I know, but it'd be nicer to do "render-to-texture" ops without a second pass. :)

Well, if you're doing a procedural texture - a force-field like in Unreal, wood grain, blood floating about like in Silent Hill 3, etc. - you'll probably want to do it on a VU, as the GS is very limited in the maths it can do on a texture. But if you want reflections, for example on a nice shiny car body in a racing game, that's a task for the GS.

Essentially, these are the same maths used for framebuffer ops, no? Since they're both traveling down the same "write" port.

read texels/source/destination alpha values, do blend ops

"Source," I am assuming, refers to texture coordinates? And is destination alpha the same as regular alpha (transperancy values)? Isn't this part of the color map's RGBA data?

Also, what's the deal with GS not providing any native clipping? Sounds pretty absent-minded to me...
 
Guden Oden said:
3D transforms + lighting -> backface culling (typically done on EE's VU1, but sometimes programmers skip the culling as the GS is so damn fast you could just draw the backfacing polys anyway) -> send to GS -> poly setup + clipping -> read texels/source/destination alpha values, do blend ops -> Z check -> if success update Z if necessary/write pixel to back-buffer, if fail discard pixel.

Just a quick correction:

The GS doesn't do any clipping (I wish!) - you typically do that on VU1 as part of your transform loop. Obviously you avoid doing it on packets of geometry which don't intersect a clip-plane, but if it's necessary, VU1 is usually where it goes.

Some crazy people try doing it on the EE before sending the packets off - but then some people like Britney Spears. For her music.

The GS can do a limited amount of scissoring - but I don't consider that as a clipping replacement, because it's only in 2d (no front-plane stuff, you've always got to clip that manually) and it's a pixel-level operation. Which means that in some cases it's not a lot faster than actually drawing the extra pixels would be - it just stops things from going wrong if your polygons extend off-screen a bit.

The GS side of your pipeline is really (as far as the programmer is concerned) one big atomic operation, so the order doesn't really matter - unless you care about stalls from texture reads on failed pixels... However it misses out alpha-testing (which is another "free" feature, and can actually be pretty useful at times).
 
Guden Oden said:
No, on the GS you don't give a f*ck about overdraw :), instead you sort by texture order and just go ahead and render the scene straight up.

For bitmap-related stuff you're probably right (although I've heard that due to the page-based nature of the eDRAM, it's best to keep the pixel-to-texel ratio at ~1:1, to avoid thrashing of the buffer. So for meshes with large textures, backface culling might be a good idea). It would be nice though, to have a way to cull geometry, that would save a lot of bandwidth and unnecessary setup time.
Maybe VU0 could be put on the job of visibility testing, and then send the results to VU1?
 
JesusClownFox said:
Can you elaborate on this concept of opaque polygons?

Polys that are NOT see-through, ie zero transparency. :) For example, you wouldn't want a poly belonging to the road in a racing game ending up on top of a distant car or such. :)

hmm? Could you please word this in a simpler way for this meager layman? :)

Ok, assume you are in a forest. You've got distant mountains in the background, moss and dirt and rocks on the ground, bark on the trees, leaves on the branches and blue sky with fluffy clouds overhead. If you're an IMR, you'd want to draw the tree-trunks before you draw the mountains, and the leaves before you do the sky, etc. However, actually doing that to avoid overdraw involves a lot of "state changes", ie loading up new textures and setting new values in the 3D chip's registers.

The polys/models you draw might be something like: moss, fir tree, pine tree, rock on the ground, dirt, fir tree, moss, pine tree, moss... etc etc.

However, if you're the GS, you detest this technique, because you're not going to be able to store the moss, dirt, tree, rock etc. textures in on-chip memory, and you'll slow down a lot by repeatedly sending those textures over the bus again and again. So by utilizing the chip's fillrate, you draw ALL the pine trees first, then ALL the fir trees even though you'll be covering some of the pine trees, then ALL the birches despite covering some of the pines and firs, and so on. Now, this is just an example of an extreme brute-force case; in reality there may well be optimizations a programmer can do, but my intent was just to illustrate the procedure. :)
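
If it helps, the "sort by texture" approach boils down to something like this (pseudo-C++ with made-up helper names, purely to illustrate the ordering):

#include <map>
#include <vector>

struct Texture;
struct Mesh { Texture* texture; /* vertices, etc. */ };

void upload_to_edram(Texture*);      // hypothetical: one DMA transfer per texture
void draw(Mesh*);                    // hypothetical: kick the polys to the GS

void draw_texture_sorted(const std::vector<Mesh*>& meshes)
{
    // Bucket the draw list by texture instead of by depth.
    std::map<Texture*, std::vector<Mesh*>> buckets;
    for (Mesh* m : meshes)
        buckets[m->texture].push_back(m);

    // One texture upload per bucket, then burn fillrate on everything
    // that uses it - the Z-buffer sorts out who ends up on top.
    for (auto& [tex, batch] : buckets) {
        upload_to_edram(tex);
        for (Mesh* m : batch)
            draw(m);
    }
}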


but it'd be nicer to do "render-to-texture" ops without a second pass. :)

Well, you'd have to do a 2nd pass anyway, as the scene data that makes up the framebuffer will not be the same as what makes up your recently rendered texture...

Essentially, these are the same maths used for framebuffer ops, no? Since they're both traveling down the same "write" port.

Well, when generating a procedural texture on the GS, you'll essentially be limited to just doing blending operations, and those aren't as full-fledged as on most other 3D chips, for some reason only Sony can answer... :) The VUs have a much greater range of maths ops, and much better precision too (less banding in color gradients, etc).
 
Modern GPUs divide the VRAM into multiple logical buffers. The GS sees it as one whole chunk.
Er - the GS addresses 3 types of buffers (Z, FB, Texture) and with dual contexts, that gives you 6 simultaneously addressed buffers at a given time. Yes, the addresses of these buffers are allowed to overlap arbitrarily, but they are still different "logical buffers".
I was under the impression the NV2a treated the 64MB of main mem in a pretty similar fashion too; I don't think there are any restrictions on where and how you place your buffers? That's kind of the whole idea if you want good "render to texture" flexibility and speed.

MrWibble said:
Which means that in some cases it's not a lot faster than actually drawing the extra pixels would be
IIRC it's a gross pixel rejection, similar to various early Z things on other hw. So it's around an order of magnitude faster than painting the pixels, but obviously still takes some time if the polygon is obscenely big.

Squeak said:
It would be nice though, to have a way to cull geometry, that would save a lot of bandwidth and unnecessary setup time.
Vertex streams composed of triangle strips (which make up 99% of your geometry) are poorly suited to modification. Eliminating vertices that belong to BF polys from strips is both time-consuming and (over)complicates the VU shader loops.
Also, due to the vertex-sharing nature of strips, the actual number of eliminated vertices will be lower than the number of BF polygons, and could vary wildly from one frame to the next.
Still, it can be worth going through the trouble if you are running very complex geometry shaders - but if you need to reduce GS overhead, flagging the vertices and leaving the stream unmodified is the way to go.
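
In rough pseudo-C++, the "flag, don't remove" idea looks something like this (made-up types - on the real thing this lives in the VU1 loop, and IIRC the flag ends up as the per-vertex "don't kick" bit):

struct Vertex { float x, y, z; bool dont_draw; /* + UVs, colour... */ };

// Hypothetical facing test; 'odd' accounts for the winding flip on
// every other triangle in a strip.
bool is_backfacing(const Vertex& a, const Vertex& b, const Vertex& c, bool odd);

void flag_backfaces(Vertex* strip, int count)
{
    for (int i = 2; i < count; ++i) {
        // Triangle i is formed by vertices (i-2, i-1, i).
        bool odd = (i & 1) != 0;
        strip[i].dont_draw = is_backfacing(strip[i - 2], strip[i - 1], strip[i], odd);
        // The vertex itself stays in the stream, so the strip layout and
        // all the shared vertices are left untouched.
    }
}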

Maybe VU0 could be put on the job of visibility testing, and then send the results to VU1?
Refer to MrWibble's comment about Britney Spears for that one. :p
 
Fafalada said:
MrWibble said:
Which means that in some cases it's not a lot faster than actually drawing the extra pixels would be
IIRC it's a gross pixel rejection, similar to various early Z things on other hw. So it's around an order of magnitude faster than painting the pixels, but obviously still takes some time if the polygon is obscenely big.

Actually I think it's done at rasterizer level and depends on which side of the screen your polygon is crossing. So when you're off the bottom it can just throw away the rest of the polygon, but if it's off the top it has to scan-convert stuff until it gets to the visible part. If it's off the right it can stop the current scan-line and start on the next, but if it's off the left you have to step through the pixels until they become visible.

I presume you don't get a hit from the texture unit for non-visible pixels, but I can't quite remember off-hand.

Different combinations give different performance hits - some are very fast, some are surprisingly slow.

There was some info from developer support about it, with one of their presentations showing a diagram with the relative speeds on different parts of the screen... Not sure if it was given out anywhere though, or if it was just something I saw at a conference.
 
Fafalada said:
Squeak said:
It would be nice though, to have a way to cull geometry, that would save a lot of bandwidth and unnecessary setup time.
Vertex streams composed of triangle strips (which make up 99% of your geometry) are poorly suited to modification. Eliminating vertices that belong to BF polys from strips is both time-consuming and (over)complicates the VU shader loops.
Also, due to the vertex-sharing nature of strips, the actual number of eliminated vertices will be lower than the number of BF polygons, and could vary wildly from one frame to the next.
Still, it can be worth going through the trouble if you are running very complex geometry shaders - but if you need to reduce GS overhead, flagging the vertices and leaving the stream unmodified is the way to go.

I didn’t mean culling single strips (although I can see why one could think that, within the context), I meant culling entire meshes. Like if you are rendering a town or a hilly landscape where a lot of large objects occlude others, stuff that portal culling wouldn’t be suited for.


Maybe VU0 could be put on the job of visibility testing, and then send the results to VU1?
Refer to MrWibble's comment about Britney Spears for that one. :p

Surely if you can do collision detection, inverse kinematics and stuff like that on VU1, VU0 should be able to handle it as well, because stuff like that isn’t very memory intensive (4KB data mem on VU0).
If VU0 can handle the above, sphere and bounding box culling should also be well within its capabilities. Then it would "just" be a matter of telling the VU1 what objects not to render?
 
MrWibble said:
I presume you don't get a hit from the texture unit for non-visible pixels, but I can't quite remember off-hand.
This I do recall reading about - being specifically pointed out somewhere that rejection happens before texture fetch and the rest of the pipeline.

Squeak said:
I meant culling entire meshes. Like if you are rendering a town or a hilly landscape where a lot of large objects occlude others, stuff that portal culling wouldn’t be suited for.
I misunderstood you then. Anyway, gross culling has always been a constant area of research; the problem is that most methods that don't rely heavily on precomputed data come with sizeable CPU overhead, so on something like PS2, where you'll sooner have free GPU than CPU time, they are rarely as useful as people might think.

If VU0 can handle the above, sphere and bounding box culling should also be well within its capabilities. Then it would "just" be a matter of telling the VU1 what objects not to render?
I thought you were talking about primitive culling here as well - so the above misunderstanding applies ;)
Anyway, I think you will be hard-pressed to find a PS2 title that "doesn't" use VU0 for gross culling against the view frustum etc. It will rarely be more than macro code, but it's still using VU0.
Actually that's where the curse of the EE cache comes up again - in my code a box check takes 18 cycles (assuming one matrix against many boxes), but the cache-miss penalties for fetching bounding-box data will bring this to an average of 60-80 in real-life code...
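
For reference, the box check itself is nothing magic - in plain C++ it's basically this (a sketch, not my actual VU0 code; the frustum planes are assumed to already be in the same space as the boxes):

struct AABB  { float min[3], max[3]; };   // bounding box corners
struct Plane { float n[3], d; };          // inside if n.x*x + n.y*y + n.z*z + d >= 0

bool box_outside_frustum(const AABB& box, const Plane planes[6])
{
    for (int p = 0; p < 6; ++p) {
        // Pick the corner furthest along the plane normal; if even that
        // corner is behind the plane, the whole box is outside.
        float v[3];
        for (int i = 0; i < 3; ++i)
            v[i] = planes[p].n[i] >= 0.0f ? box.max[i] : box.min[i];

        if (planes[p].n[0] * v[0] + planes[p].n[1] * v[1] +
            planes[p].n[2] * v[2] + planes[p].d < 0.0f)
            return true;      // fully outside this plane - don't render
    }
    return false;             // inside or straddling - send it on to VU1
}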
 
Fafalada said:
This I do recall reading about - being specifically pointed out somewhere that rejection happens before texture fetch and the rest of the pipeline.
PA clearly shows that. Scissored pixels have no impact on the texture buffer.

ciao,
Marco
 
Guden Oden: What does the Z-buffer have to do with collision detection?

Also, the point of my proposed "texture write" port is that the render-to-target ops hit the texture stages before reaching the framebuffer ops, therefore eliminating the need for another pass. :)


MrWibble said:
The GS can do a limited amount of scissoring - but I don't consider that as a clipping replacement, because it's only in 2d (no front-plane stuff, you've always got to clip that manually) and it's a pixel-level operation. Which means that in some cases it's not a lot faster than actually drawing the extra pixels would be - it just stops things from going wrong if your polygons extend off-screen a bit.

The GS side of your pipeline is really (as far as the programmer is concerned) one big atomic operation, so the order doesn't really matter - unless you care about stalls from texture reads on failed pixels... However it misses out alpha-testing (which is another "free" feature, and can actually be pretty useful at times).

Scissoring is a way of cropping the framebuffer to the dimensions of the proper resolution, right? If so, how is it "limited" on the GS?

Also, what exactly is the definition of an "atomic operation"? Does that mean the pipeline is so fast that most major unwanted bandwidth hits are negligible? And what are you referring to with alpha-testing being missed out, and why is that?

Fafalada:

When referring to the buffers in a dual context, are you talking about front and back buffers?
 
Your point about using the texture-write port to avoid a pass makes no sense - you can only draw one thing at a time. You can't render to texture and simultaneously be using that texture in another draw operation.

Scissoring and clipping are slightly nebulous terms that get used in different ways by different people. My own favoured definitions are that clipping is a geometric operation where you take your vertex data and perform a mathematical clipping operation which creates new vertex data as a result. This new data can then be clipped against a different plane, or projected and rendered.

I consider scissoring to be a 2d, pixel-level operation where parts of the polygon being drawn that extend off the screen are rejected by the rasteriser.

You *have* to clip against the near-plane (i.e. polygons that would stick through the screen) or bad things happen when you project them - you simply can't do that in 2D. The PS1's solution (it wasn't really powerful enough to cope with much clipping) was to throw polygons away when they got too close, which is often why stuff tended to disappear if you got too close to it.
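
To make that concrete, clipping one polygon against the near plane is just Sutherland-Hodgman against a single plane - something like this sketch (view-space positions only; real code also interpolates UVs, colours and so on):

#include <vector>

struct Vtx { float x, y, z; };   // plus UVs, colour, etc. in real code

std::vector<Vtx> clip_to_near_plane(const std::vector<Vtx>& poly, float near_z)
{
    std::vector<Vtx> out;
    const int n = static_cast<int>(poly.size());
    for (int i = 0; i < n; ++i) {
        const Vtx& a = poly[i];
        const Vtx& b = poly[(i + 1) % n];
        const bool a_in = a.z >= near_z;
        const bool b_in = b.z >= near_z;
        if (a_in)
            out.push_back(a);
        if (a_in != b_in) {
            // Edge crosses the plane - emit the intersection point.
            const float t = (near_z - a.z) / (b.z - a.z);
            out.push_back({ a.x + t * (b.x - a.x),
                            a.y + t * (b.y - a.y),
                            near_z });
        }
    }
    return out;   // 0, 3 or 4 vertices - genuinely new geometry, as described
}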

This fits with the PS2 definition, where you specify a rectangle within the 2d frame-buffer as the "scissor region" and only pixels within that rectangle get drawn. It is only confusing when reading the Japanese sample code, where the terms scissoring and clipping are interchanged...

When I say "atomic operation" I mean something which is indivisble (even though atoms no longer fit that desctiption themselves!). So it doesn't make any sense to consider the different parts of the pipeline in terms of optimising things, because it all gets done whether you like it or not.

Nothing is really "free" when you consider that it still takes many cycles for each pixel to reach the screen - but thats irrelevant so long as the pipeline is not stalling. It is not blindingly fast (though actually it's pretty damn good) - but it is a fixed cost unless you make it do something bad. Some features are considered "free" because the pipeline does not get any faster if the feature is not used. Thats why I refer to the pipeline as "atomic" - you can't reconfigure it, you can only tell parts of it not to do anything to the pixel.

Alpha-testing is one part of the standard PS2 drawing pipeline and so can be used without incurring a performance penalty. I only brought it up because it was left out of the definition above...

You can turn on some features which add to the pipeline - I think in most cases it just stalls at a particular stage while some complex result is calculated. In the case of fog it halves the fill-rate, as it's presumably re-using the alpha-blend hardware to blend in the fog-colour. For tri-linear it needs an extra texture-blend operation, and potentially a lot more texels.

So you can slow it down, but you can't speed it up...

(and just to jump in for Faf):
The PS2 can maintain two entirely separate drawing contexts, which allow you to alternate between two draw buffers without reconfiguring any registers (you just tell each set of primitives which context to use). The only hiccup is that any textures have to share a single palette buffer.

It's useful for render-to-texture ops because you can point the second context at your texture buffer and leave the first context pointing at the draw-buffer.
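
As a rough sketch of how you'd use that (completely made-up helper names and addresses - in reality you'd be poking the per-context GS registers yourself from a GIF packet):

// Hypothetical helpers - illustration only, not real PS2 library calls.
void set_draw_buffer(int context, unsigned addr, int w, int h);
void set_texture_source(unsigned addr, int w, int h);
void draw_environment_for_reflection(int context);
void draw_shiny_car(int context);

constexpr unsigned BACK_BUFFER_ADDR = 0x00000;   // made-up eDRAM addresses
constexpr unsigned RTT_BUFFER_ADDR  = 0x80000;

void setup_contexts()
{
    set_draw_buffer(1, BACK_BUFFER_ADDR, 640, 448);  // main back buffer
    set_draw_buffer(2, RTT_BUFFER_ADDR,  256, 256);  // render-to-texture target
}

void draw_frame()
{
    // Each primitive says which context it uses, so nothing needs
    // reconfiguring between these calls.
    draw_environment_for_reflection(2);        // rendered into the 256x256 buffer
    set_texture_source(RTT_BUFFER_ADDR, 256, 256);
    draw_shiny_car(1);                         // samples that buffer as a texture
}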
 
Thank you very much, MrWibble and all! I am learning much out of this one thread. :eek:)

When you say "project," do you mean that the current vertex data moves on to the next stage (in this case, rasterization/rendering)?

Alpha-testing, BTW, I imagine is part of the Z-testing process...please correct me if otherwise.

So the way this dual context thing works, is it two front buffers, two back buffers, one depth buffer, and one texture buffer?
 
Projecting is what you do to go from a 3d co-ordinate to a 2d one - in other words it's the process of displaying a 3d world on a 2d device (amongst other things).

I suppose it's kind of what happens as you move stuff into the rendering stage, but it's really just a mathematical term.
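
If it helps to see it as maths: a bare-bones perspective projection boils down to a divide by depth (real pipelines go via a 4x4 matrix and clip space, but this is the gist):

struct Vec3 { float x, y, z; };   // view-space position (z = distance into the screen)
struct Vec2 { float x, y; };      // screen position in pixels

Vec2 project(const Vec3& v, float focal_len, float centre_x, float centre_y)
{
    // Divide by depth: the further away a point is, the closer it lands
    // to the centre of the screen.
    return { centre_x + focal_len * v.x / v.z,
             centre_y - focal_len * v.y / v.z };   // y flipped: screen y grows downward
}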

Alpha-testing is a pretty similar kind of operation to z-testing, but on the PS2 it actually gives you a bit more flexibility - you don't have to reject pixels when they fail; you can have them update only certain buffers instead.

Dual-context is having 2 draw contexts. So you potentially have 2 back-buffers, 2 depth-buffers, and 2 textures (though only one palette). The front buffers have nothing to do with the drawing operations and so aren't relevant to the drawing contexts.

Of course if you really want two front buffers, you can do that too, as the PS2 has a dual-circuit display system...
 
MrWibble said:
and 2 textures (though only one palette).
Minor nit - that's only true when you're using 8bit palette with 32bit color entries.
In all other formats (8bit@16 or 4bit@16/32) you can have multiple palettes also.
 
Fafalada said:
MrWibble said:
and 2 textures (though only one palette).
Minor nit - that's only true when you're using 8bit palette with 32bit color entries.
In all other formats (8bit@16 or 4bit@16/32) you can have multiple palettes also.

'tis true :)

The first time I mentioned it I referred to only having a single palette buffer... I was too busy generally waffling to explain the intricacies :)
 
MrWibble said:
Of course if you really want two front buffers, you can do that too, as the PS2 has a dual-circuit display system...

Interesting...does that mean you can set up a dual display dealie, given the proper cabling?


Also, the complaint about GS not having any native clipping seems kinda silly to me, seeing as how clipping works on 3D-space geometry before it's converted to a 2D raster on the GS. Isn't this typically done on the CPU by the developer for video hardware that doesn't have T&L?
 