Questions regarding Pipelines, TMU's and Textures Per Pass

Brent

Regular
This is a very n00b question. Very basic 3D architecture stuff. I am not totally clear how these work together and what the differences are. Though I know more than the average user, I don't really know technically why or how they are different and what they do. Some questions:

1.) What is a Pipeline in a 3D Graphics chip? The way I always understood it is that it is a fixed path that data flows down. When we say one card has 4 pipelines or one card has 8 pipelines how exactly is it improving performance by increasing pipelines?

2.) What exactly does the Texture Mapping Unit do? Am I right in saying that it applies 1 texture to 1 polygon? Or is it 1 texture to one Pixel? When we say a card has 2 TMU's per Pipeline what are we really saying? That it can apply 2 textures to 2 polygons in a single clock? or just 2 textures to 1 polygon in a single clock?

3.) If the TMU is all about applying how many textures per clock to a polygon then what is the Textures per pass? If a card can do 16 textures per pass what are we doing, applying 16 textures in one clock to a polygon?

Thanks for all your help! If there is an easier way to 'learn me' :p by pointing me to a URL or some documentation I'd greatly appreciate it, or if you can explain it here it would be greatly appreciated.

I've never truly understood these basic things clearly.

Thanks,
Brent
 
1.) What is a Pipeline in a 3D Graphics chip? The way I always understood it is that it is a fixed path that data flows down. When we say one card has 4 pipelines or one card has 8 pipelines how exactly is it improving performance by increasing pipelines?

That’s a slightly loaded question since there are several pipelines that can exist within a 3D chip. If you were to look at a modern-day chip, the whole 3D pipeline would be everything going into the chip (vertex data) and everything going out (pixels). The diagram at the bottom of this page shows the full render pipeline of the P10 chip.

If you take that further then there are generally two distinct (at the moment) elements to the 3D pipeline – the geometry pipeline (anything that deals with vertex data) and the raster pipeline (texture sampling and pixel generation).

What you appear to be talking about though is just the pixel pipeline – when someone says a card has ‘4 pixel pipelines’ that means it has 4 processing pipes that can each generate at best 1 pixel per clock. So if one card has 4 pixel pipes and another has 8 then in theory the one with 8 has the potential to write twice as many pixels per clock. This gets messy due to all kinds of factors, not least the texture sampling rate of each pipeline.
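As a rough sketch of the "in theory" part above, peak pixel output is just pipes times clock. The function name and the 300 MHz clock are made up for illustration and don't describe any particular card.

```python
def peak_pixel_fill_rate(pixel_pipes: int, core_clock_mhz: float) -> float:
    """Theoretical peak in megapixels per second, assuming every pipe
    writes exactly one pixel on every clock (real chips rarely do)."""
    return pixel_pipes * core_clock_mhz


# Hypothetical 300 MHz core clock, chosen only for illustration.
four_pipe = peak_pixel_fill_rate(4, 300.0)
eight_pipe = peak_pixel_fill_rate(8, 300.0)
print(four_pipe, eight_pipe)
```

Doubling the pipes doubles the theoretical number and nothing more; memory bandwidth and texture sampling decide how much of it you actually see.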

2.) What exactly does the Texture Mapping Unit do? Am I right in saying that it applies 1 texture to 1 polygon? Or is it 1 texture to one Pixel? When we say a card has 2 TMU's per Pipeline what are we really saying? That it can apply 2 textures to 2 polygons in a single clock? or just 2 textures to 1 polygon in a single clock?

A texture mapping unit essentially reads information from a texture to derive a colour value and applies that to a pixel (or rather, a pixel is eventually derived from texture sample colour values and other operations as well). It's not necessarily the case that one texture sample = one pixel though: for bilinear filtering, 4 texture samples are taken to generate a colour value for a pixel, and this is further complicated when you add trilinear filtering, as it takes 2 sets of 4 texture samples from two mipmap levels and combines those to derive a colour value. I’ve probably just confused you there so I would urge you to read the following article to understand a little more on filtering:

http://www.beyond3d.com//articles/Anisotropic/index1.php
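To make the "4 texture samples per pixel" point concrete, here is a minimal software sketch of bilinear filtering on a tiny greyscale texture. It assumes texel-space coordinates with texel centres at whole numbers and clamping at the edges; real hardware addressing differs, and the names are illustrative only.

```python
import math


def bilinear_sample(texture, u, v):
    """Blend the 4 texels nearest to (u, v) into one colour value.

    `texture` is a list of rows of grey values; (u, v) are in texel space.
    """
    height = len(texture)
    width = len(texture[0])
    x0 = int(math.floor(u))
    y0 = int(math.floor(v))
    fx = u - x0  # horizontal blend weight
    fy = v - y0  # vertical blend weight

    def texel(x, y):
        # Clamp reads that fall outside the texture to the nearest edge.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return texture[y][x]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


checker = [[0.0, 1.0],
           [1.0, 0.0]]
centre = bilinear_sample(checker, 0.5, 0.5)  # halfway between all 4 texels
```

Sampling exactly on a texel returns that texel unchanged; anywhere in between you get a weighted mix of the four neighbours.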

Now, if it's said a card has two “TMU’s” per pipeline, this means it's capable of reading two different texture maps at the same time in one pixel pipe. If a game uses two textures throughout then this will be optimal for a card with two “TMU’s” per pipe, because each of those TMU’s is going to be in use constantly – conversely, if a card has only one TMU per pipe then it will need at least 2 clock cycles to retrieve the two texture values.
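A back-of-the-envelope sketch of that trade-off: if a pipe needs several clocks to fetch all of a pixel's textures, its pixel output drops accordingly. This is a simplified model (it ignores filtering cost, bandwidth and stalls) and the function name is hypothetical.

```python
import math


def effective_pixels_per_clock(pipes, tmus_per_pipe, textures_per_pixel):
    """Pixels finished per clock when each pipe must sample every texture
    for a pixel before it can move on to the next one."""
    clocks_per_pixel = max(1, math.ceil(textures_per_pixel / tmus_per_pipe))
    return pipes / clocks_per_pixel


dual_texture_4x2 = effective_pixels_per_clock(4, 2, 2)  # 4 pipes, 2 TMUs each
dual_texture_4x1 = effective_pixels_per_clock(4, 1, 2)  # 4 pipes, 1 TMU each
dual_texture_8x1 = effective_pixels_per_clock(8, 1, 2)  # 8 pipes, 1 TMU each
```

In this model a 4x2 design and an 8x1 design come out the same with two textures per pixel, while the 8x1 design pulls ahead when only one texture is used.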

3.) If the TMU is all about applying how many textures per clock to a polygon then what is the Textures per pass? If a card can do 16 textures per pass what are we doing, applying 16 textures in one clock to a polygon?

The term ‘pass’ can mean many things these days, because there are many different methods of rendering something. However, a pass traditionally refers to an extra geometry pass.

Take for instance a card that can only apply two textures per pipe and a game that uses 3 textures – in this instance the card cannot apply all the textures to each pixel in one go, so it (or the application) resorts to multipass rendering. What happens here is that the geometry of the scene to be rendered is sent with all the relevant texture information for the first two textures and the card renders that – it then passes the results to the frame buffer for intermediate storage. When it has completed all the rendering of the first pass, all the geometry is resent, this time with the information for the third texture. The card will then proceed to render this set of geometry, but rather than just passing the results to the frame buffer it reads what's in the frame buffer (as that stores the result of the first two textures), combines that with the current pixel being rendered from the second pass and then writes the combined values from the first and second pass to the frame buffer.

Today's cards are a little more complex though, and even though they have a limited number of texture units per pipe they do have little intermediate storage areas on chip where texture data can be placed, so that they can use this without having to resort to multipass rendering.
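The multipass sequence above can be sketched for a single pixel like this. It assumes the textures are combined by simple multiplication (modulate blending), which is only one of many possible blend modes, and all names are illustrative.

```python
def render_multipass(texture_values, textures_per_pass):
    """Combine one pixel's texture colour values, at most
    `textures_per_pass` per geometry pass.

    Returns (final_colour, number_of_passes).
    """
    framebuffer = None
    passes = 0
    for start in range(0, len(texture_values), textures_per_pass):
        batch = texture_values[start:start + textures_per_pass]
        pass_colour = 1.0
        for value in batch:
            pass_colour *= value  # assumed blend mode: modulate
        if framebuffer is None:
            framebuffer = pass_colour  # first pass: plain write
        else:
            # Later passes: read the stored result, combine, write back.
            framebuffer = framebuffer * pass_colour
        passes += 1
    return framebuffer, passes


# 3 textures on a card limited to 2 per pass -> geometry is sent twice.
colour, passes = render_multipass([0.5, 0.5, 0.5], textures_per_pass=2)
```

The extra cost of the second pass is the resent geometry plus a frame buffer read and write for every pixel it touches.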
 
1) A 3D graphics chip generates pixels by performing a series of operations on incoming data. These operations happen one after the other in a certain order, like an assembly line. A chip with multiple pipelines is like a factory with multiple assembly lines... it can work on multiple pixels in parallel. So a chip with 2 pipelines can generate pixels twice as fast as a chip with 1 pipeline - which means you can get up to twice the frame rate.

2) TMUs or "texture units" refers to the number of textures you can apply to a pixel in a single clock cycle. Each rendering pipeline usually has one or more texture units of its own, so that it can work independently of the other pipelines. While each polygon will have one or more textures assigned to it, you have to sample those textures at each pixel location.

3) The number of texture units tells you how many textures can be sampled at one time (i.e. in one clock cycle). With pixel shader hardware, you can read some textures, then do some math operations to change the color, then read some more textures, and so on. Each pixel shader operation takes one or more clock cycles. In other words, you might sample a total of four textures (for example) to determine the pixel's color, but since they aren't all being sampled at the same time, you would not need four texture units... in fact, you could get away with just one. The number of textures per pass refers to how many total textures you can apply to a pixel before you have to write it out to the frame buffer. You could apply more textures if you wanted, but then you would have to start back at the beginning of the pipeline again and do another rendering pass over the whole image, which obviously takes a lot of time.

So to use the Radeon 9700 as an example, it can only sample one texture at a time for each rendering pipeline, but it can sample a total of 16 textures for each pixel before writing to the frame buffer. It presumably does this by looping the pixel back through the texture unit and applying each new texture one after the other.
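To separate the two limits, here is a small sketch that counts geometry passes and texture-sampling clocks for one pixel. It is an idealised model (one texture per TMU per clock, no stalls), with hypothetical names; the configurations are the ones quoted in this thread.

```python
import math


def texturing_cost(textures, tmus_per_pipe, textures_per_pass):
    """Return (passes, sampling_clocks) needed to apply `textures`
    to one pixel in a single pipe."""
    passes = math.ceil(textures / textures_per_pass)
    clocks = 0
    remaining = textures
    for _ in range(passes):
        in_this_pass = min(remaining, textures_per_pass)
        clocks += math.ceil(in_this_pass / tmus_per_pipe)
        remaining -= in_this_pass
    return passes, clocks


# 1 TMU per pipe, 16 textures per pass (the Radeon 9700 figures above).
loopback_case = texturing_cost(16, tmus_per_pipe=1, textures_per_pass=16)
# 2 TMUs per pipe but only 2 textures per pass, with 3 textures to apply.
multipass_case = texturing_cost(3, tmus_per_pipe=2, textures_per_pass=2)
```

More TMUs reduce the clock count; a higher textures-per-pass limit reduces the pass count. They are independent numbers.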
 
A. Terms that date back to outmoded 3Dfx technology and now all pretty much meaningless :D

The use of the terminology to say '4 pipelines' nowadays could be totally wrong; it might be a single pipeline, but process multiple pixels simultaneously. It would be better to use 'maximum 4 pixels per clock'.

Similarly, the TMU is at best a logical concept nowadays... and the effective number of TMU's could be highly variable depending on filtering mode, or maybe even totally dynamic.
 
Dave, GraphixViolence and Dio Thank you so much for your responses!

It is MUCH clearer to me now, your explanations were easy to read and understand.

Let me see if I have this right then:

1.) Basically there are different 'kinds' of pipelines within the whole 3D pipeline architecture: Geometry and Raster (including Pixel and Texture information). When they say in the .pdf files for new video cards that the card has 4 or 8 Pipelines they are usually referring to the Pixel Pipeline specifically, in which case it would be more accurate to say it can do 4 or 8 Pixels per clock nowadays. Each pipeline can process 1 pixel; the more you have, the more can get done at one time because they are all working in parallel.

2.) So the number of TMU's determines how many textures can be applied to a Pixel in one clock? We are strictly talking about Pixels here and not Polygons right? And are we actually applying a texture to a Pixel or is it just applying texture information like color to a pixel?

3.) And since games typically use a lot more than 2 textures per pixel? or is it polygon? Then having the 2 TMU's per pipeline isn't enough to do everything in one clock? But the 'Textures Per Pass' doesn't actually say it does 16 textures in one clock, this is simply 16 textures for one Pixel? or Polygon? that it can apply before it has to write it out to the frame buffer? But with Pixel Shaders things are done differently.
 
Brent said:
3.) And since games typically use a lot more than 2 textures per pixel? or is it polygon?
Since a poly is made of pixels, they are pretty interchangeable in this context, but as for the number of textures per poly, that is difficult to assess. Sometimes there are no textures on a polygon (eg when doing stencils or, heaven forbid, flat shading :) )
 
1.) What is a Pipeline in a 3D Graphics chip? The way I always understood it is that it is a fixed path that data flows down. When we say one card has 4 pipelines or one card has 8 pipelines how exactly is it improving performance by increasing pipelines?


You seem to have pretty much got this, but in the general case, a pipeline on a silicon chip is basically a chain of stages that are executed when an instruction is given. An instruction is put into the pipeline, and once the first stage has been done and the instruction moves on to the next part of the chain, the start of the chain is often free to accept a new one. So in effect you could have a pipeline that can start 1 instruction every 2 clock cycles, but it actually takes 10 or so clock cycles for instruction A to be resolved, so to speak. It is this delay that can cause problems in some chips (COUGH Pentium 4), because sometimes you have a new instruction whose result depends on another instruction that is still being executed and is only halfway through the pipeline. When this happens the pipeline either has to wait or, if the dependent instruction is a branch, the branch predictor in the chip (as I understand it) makes an educated guess as to which way it will go. If the guess is correct then whoopee doo; if it isn't then the whole pipeline must be cleared and the instructions sent again. Needless to say that costs a lot in performance on some chips with long pipelines (COUGH Pentium 4).
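The difference between latency (about 10 clocks per instruction) and throughput (a new instruction every 2 clocks) can be sketched with a little arithmetic. The defaults are the example figures from the paragraph above, the function names are made up, and stalls and mispredictions are ignored.

```python
def pipelined_cycles(instructions, latency=10, issue_interval=2):
    """Clocks to finish a run of independent instructions when a new one
    can enter the pipeline every `issue_interval` clocks."""
    if instructions == 0:
        return 0
    # The first result appears after `latency` clocks; each later one
    # follows `issue_interval` clocks behind its predecessor.
    return latency + (instructions - 1) * issue_interval


def unpipelined_cycles(instructions, latency=10):
    """Clocks if each instruction had to finish before the next starts."""
    return instructions * latency


overlapped = pipelined_cycles(100)
serial = unpipelined_cycles(100)
```

A flush after a wrong guess throws away the work in flight, so the longer the pipeline, the more clocks each wrong guess costs.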



2.) What exactly does the Texture Mapping Unit do? Am I right in saying that it applies 1 texture to 1 polygon? Or is it 1 texture to one Pixel? When we say a card has 2 TMU's per Pipeline what are we really saying? That it can apply 2 textures to 2 polygons in a single clock? or just 2 textures to 1 polygon in a single clock?

Try not to think of it in terms of polygons once it gets to this stage of the pipeline. The rasterizer basically converts the polygon into pixels. The texture unit reads the depth information of the pixel it is rendering and the co-ordinates of the texture(s) on that pixel/polygon. It then reads a number of samples from the correct mip-map level (a scaled-down version of the texture, used at distance to stop aliasing effects), which in the case of bilinear filtering is the 4 nearest texels in the texture, and blends them together. The texture unit may have to do this a number of times. For instance, in a single pixel pipe there might be 2 TMU's (as on the Voodoo 2 or GF2) and 8 textures to apply to the surface. The pipe has 2 TMU's so it has to apply the textures 2 at a time to the pixel. Typically the geometry information is then re-sent to the graphics card and the card basically renders a translucent polygon in the same position as the one just rendered, with the next lot of textures on. This is a new 'pass' because more source information has to be sent to the GPU. New hardware allows these multiple passes (which waste a lot of bandwidth) to be avoided though; for instance the KYRO II has 2 pipelines each with only 1 TMU, but the card can apply 8 textures (taking up 8 clock cycles) to a surface in a single pass, without the need to send the polygon information again. Nvidia/ATi cards have a similar feature called DX8 loopback.

3.) If the TMU is all about applying how many textures per clock to a polygon then what is the Textures per pass? If a card can do 16 textures per pass what are we doing, applying 16 textures in one clock to a polygon?

I think I sorta already answered that.

For a single pipeline, textures per clock = # of TMU's per pipe. Textures per pass is basically how much the pipeline can do without stopping. 16 textures per pass is great, but it still takes 4 clock cycles if you have 4 TMU's per pipe, or 8 if you have 2 per pipe.

Of course, these numbers of clock cycles are theoretical; there are hundreds of opportunities for the graphics pipeline to be stalled for any number of clocks ;)

Dave
 
OpenGL guy said:
Simon F said:
Sometimes there are no textures on a polygon (eg when doing stencils or, heaven forbid, flat shading :) )
Or even gouraud shading :D
Bon point.
Thank heavens for Edwin Catmull - inventor of texturing. (and Z buffering etc etc etc)
 
I just wanted to say thanks again for everyone's answers, this definitely cleared things up for me :)

I understand them a lot better now

I much appreciate it

Your friend and loyal reader,
Brent
 
I've been interested in this also. In the case of the R300 (the Radeon 9700) it has 8 pipelines, but 1 TMU or texture unit per pipeline, for a total of 8 texture units. But with many loopbacks, you can get up to 16 textures per pipeline? So with 8 pipelines and doing the maximum of 16 textures each you can have 128 textures per pass, correct?

The GPU with the most TMUs currently is the Parhelia, I believe. It has only 4 pipelines, but each pipeline has 4 TMUs, for a total of 16 TMUs. Although since its pipelines are not DX9, it cannot get 16 textures per pipe; it's limited to 8, right? Wouldn't Parhelia with its 16 TMUs (4 per pipe) have an advantage over R300/Radeon 9700 in multitexturing situations where 16 textures per pipe are not needed?

To confuse things further, the Flipper GPU in the Gamecube console, designed by the same team that made Radeon 9700/R300, has 4 pipelines but only 1 texture unit in total. This unit is known as the TEV, and produces 4 textures per clock, 1 for each pipe. It's pretty much the equivalent of a 4:1 configuration: 4 pipes, 1 TMU per pipe. But it also has loopback (not as much as R300 though): Flipper's TEV can loop back up to 8 textures per pipe, though with Flipper's relatively small fillrate, using all the loopbacks would bring the fillrate down below 100M pixels/sec, yet that's fine for television res.

So currently, Parhelia has the most TMUs of any consumer graphics processor. I'd like to see what a full DX9 Parhelia, with 8 pipes, 4 TMUs each and more loopbacks could achieve.

Further down the road is R400; will that move to a 16:1 or 16:2 pipeline/TMU configuration?
 
So the loopbacks allow the pixel to be resent to the texture unit so that another texture may be applied? This would mean taking additional clock cycles and would stall the pipe (the other pixels in the pipe go on a coffee break unless they have loops of their own to go through at a previous stage). Of course the delay of resending a pixel in a new pass would depend on the architecture, but does anyone know what we're talking about? Is it 10 extra cycles? 100? 1000?

Is data first sent as polygons to the geometry pipe and then rasterized into pixels before entering the pixel pipe?

So the mipmaps are stored in the memory of the graphics board and what they do is simply allow the texture unit to sample them to choose a color for a pixel? I never knew how textures were applied before (maybe I still don't).
 
rAvEN^Rd said:
So the mipmaps are stored in the memory of the graphics board and what they do is simply allow the texture unit to sample them to choose a color for a pixel? I never knew how textures were applied before (maybe I still don't).

Mip maps are simply smaller prefiltered versions of the original texture. You usually have the original texture and recursively smaller versions (usually each one quarter = 1/4 of the area of the previous level), i.e. if your texture is 256x256 you have 128x128, 64x64, .. 2x2. When texturing a polygon you then simply choose the mip map that is closest to the polygon in question size-wise. The choice of mip map level to be used can be made once per polygon or for each of the polygon's pixels individually.

Quality can be improved by using the two closest mip map levels (that is, two prefiltered textures). The two samples found from the textures are linearly interpolated and the result is used as the texel. Also, when sampling the chosen texture (either normal or mip mapped) an additional bi-linear interpolation can be used. In bi-linear interpolation the four closest texture values are interpolated to find a "smooth average". When using two mip map levels with interpolation between them, plus bi-linear interpolation for each mip map sample, we have so-called tri-linear mip mapping ;)
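A quick sketch of the mip chain and the "closest level" choice. Running the chain all the way down to 1x1 and picking the level from a texels-per-pixel ratio are assumptions made for this example; the function names are illustrative.

```python
import math


def mip_chain(width, height):
    """List the (width, height) of each mip level, halving each side
    until 1x1 is reached (assumed stopping point)."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


def select_mip_level(texels_per_pixel, level_count):
    """Pick the nearest level for a given ratio of texture texels to
    screen pixels along one axis (1.0 means the full-size texture fits)."""
    level = int(math.log2(max(texels_per_pixel, 1.0)) + 0.5)
    return min(level, level_count - 1)


chain = mip_chain(256, 256)
far_away = select_mip_level(4.0, len(chain))  # texture shrunk 4x on screen
```

Each level has a quarter of the texels of the one before, so the whole chain only adds about a third to the memory used by the original texture.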
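Putting those pieces together, a minimal sketch of tri-linear filtering: one bi-linear sample from each of the two nearest mip levels, then a linear blend between them. The compact `bilinear` helper is included only so the block runs on its own; mapping normalised coordinates onto `size - 1` is a simplification rather than how hardware addresses texels, and all names are illustrative.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def bilinear(level, u, v):
    """Blend the 4 nearest texels of a square grid; u and v are in [0, 1]."""
    size = len(level)
    x = u * (size - 1)
    y = v * (size - 1)
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, size - 1)
    y1 = min(y0 + 1, size - 1)
    top = lerp(level[y0][x0], level[y0][x1], x - x0)
    bottom = lerp(level[y1][x0], level[y1][x1], x - x0)
    return lerp(top, bottom, y - y0)


def trilinear(mip_levels, u, v, lod):
    """Blend bi-linear samples from the two mip levels around `lod`
    (0.0 = full-size texture, 1.0 = first smaller level, and so on)."""
    last = len(mip_levels) - 1
    lod = min(max(lod, 0.0), float(last))
    lower = int(lod)
    upper = min(lower + 1, last)
    return lerp(bilinear(mip_levels[lower], u, v),
                bilinear(mip_levels[upper], u, v),
                lod - lower)


# Two-level chain: a 2x2 texture and its 1x1 average.
levels = [[[0.0, 1.0],
           [0.0, 1.0]],
          [[0.5]]]
blended = trilinear(levels, 1.0, 0.0, 0.5)  # halfway between the levels
```

That is 8 texel reads for one filtered colour value, which is why filtering mode affects how many "TMUs" a chip effectively has.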
 