Old 29-Mar-2004, 02:48   #1
Bahadir
Member
 
Join Date: Mar 2004
Location: Australia
Posts: 97
Default pipeline architecture

Hi all,

Firstly, apologies if this question has been raised before. I know the basic concepts of pipelining (i.e. prefetch, decode, execute, store and so on), but in terms of graphics pipelines I hear from people that the NV40 pipeline architecture is 16x1 and the R420 is 12x1.

What exactly do these numbers mean? For example, the Inquirer.org site recently claimed that NV40 might be 32x0. What does the 0 mean in this case? I'm a bit confused by the naming convention used in NxY pipelining.
Thanks!

-Bahadir
Old 29-Mar-2004, 03:07   #2
Fodder
Stealth Nerd
 
Join Date: Jul 2003
Location: Sunny Melbourne
Posts: 1,112
Default

Take a 4x2 architecture. It has 4 pixel pipelines, so it can work on 4 pixels per clock. Each pipeline has two TMUs (texture mapping units), so it can do two texture operations on each pixel per clock. The 32x0 mode refers to a different sort of calculation where nothing is actually rendered: not incredibly useful, but not completely useless either.
__________________
Human Rights [X---------|----------] Robert Menzies
Old 29-Mar-2004, 03:09   #3
arjan de lumens
Senior Member
 
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
Default

For pixel pipelines, NxY is usually taken to mean 'capable of rendering N pixels per clock cycle, with Y textures applied to each pixel'. So 8x2, for example, implies 8 pixels per clock, with 2 textures applied to each pixel. (If you want to apply more than Y textures, you can do so, but at the cost of rendering fewer pixels per clock cycle.)

Y=0, as in 32x0, implies that the chip supports a mode where it can render 32 pixels per clock cycle, but only if you turn off texturing. (In the case of Nvidia, there is usually the added condition that you must write only Z values, not color values, to the framebuffer. This extra condition tends to lead to endless terminology confusion and funny terms like 'zixels'.)
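
A minimal sketch of the NxY rule, under a simplifying assumption that is not taken from any particular chip: a pipeline applies up to Y textures per clock, and any extra textures cost whole additional clocks. The function name is hypothetical.

```python
import math

def pixels_per_clock(n, y, textures):
    """Effective pixels per clock for an NxY design applying `textures` textures.

    Simplified model (an assumption): each pipeline applies up to Y textures
    per clock, and any extra textures cost whole additional clocks.
    """
    if textures == 0:
        return float(n)          # untextured pixels run at the full N per clock
    if y == 0:
        raise ValueError("an Nx0 mode cannot apply textures")
    clocks_per_pixel = math.ceil(textures / y)
    return n / clocks_per_pixel

print(pixels_per_clock(8, 2, 2))   # 8.0: both textures fit in one clock
print(pixels_per_clock(8, 2, 4))   # 4.0: four textures need two clocks per pixel
print(pixels_per_clock(32, 0, 0))  # 32.0: the untextured 32x0 case
```

Under this model an 8x2 part applying three textures also drops to 4 pixels per clock, because the third texture still costs a full extra clock.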
Old 29-Mar-2004, 04:32   #4
Bahadir
Member
 
Join Date: Mar 2004
Location: Australia
Posts: 97
Default

Thanks for the replies.
So basically an NxY architecture means it is capable of rendering N pixels with Y texture units applied to them in parallel?

So if pixel x belongs to texture S, and pixel y belongs to texture T, a 2x2 architecture should be able to render them in one pass?

Also, with 32x0, are you implying that it will only process 32 pixels in one pass if Z depth values are written?
Old 29-Mar-2004, 04:43   #5
Fodder
Stealth Nerd
 
Join Date: Jul 2003
Location: Sunny Melbourne
Posts: 1,112
Default

Quote:
Originally Posted by Bahadir
So if pixel x belongs to texture S, and if pixel y belongs to texture T, with a 2x2 architecture should be able to render it in one pass?
Yes, but I think it's better to consider the textures as belonging to the pixels.
Quote:
Also, with 32x0, are you implying that it will only process 32 pixels in one pass if z depth values are written?
If only Z values are written.
__________________
Human Rights [X---------|----------] Robert Menzies
Old 29-Mar-2004, 04:54   #6
arjan de lumens
Senior Member
 
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
Default

Quote:
Originally Posted by Bahadir
thanks for the replies.
So basically a NxY architecture means it is capable of rendering N pixels with Y texture units applied to it in parallel?
Not quite: it is capable of rendering N pixels per clock cycle with Y textures applied to every one of those N pixels.
Quote:
So if pixel x belongs to texture S, and if pixel y belongs to texture T, with a 2x2 architecture should be able to render it in one pass?
It's more like this: if you have two textures, S and T, and you wish to apply both textures at the same time to both pixel x and pixel y, then a 2x2 architecture will be able to do that in one clock cycle.
Quote:
Also, with 32x0, are you implying that it will only process 32 pixels in one pass if z depth values are written?
Basically yes.
Old 29-Mar-2004, 05:18   #7
DemoCoder
Regular
 
Join Date: Feb 2002
Location: California
Posts: 4,732
Default

It's a shadow acceleration mode. If doing shadow buffers, it allows you to fill the buffer at up to 2x the fillrate. If doing stencil shadow volumes, it allows you to write stencils at up to 2x the fillrate.
Old 29-Mar-2004, 05:26   #8
Bahadir
Member
 
Join Date: Mar 2004
Location: Australia
Posts: 97
Default

Quote:
Originally Posted by arjan de lumens
Not quite: it is capable of rendering N pixels per clock cycle with Y textures applied to every one of those N pixels.
It's more like this: if you have two textures, S and T, and you wish to apply both textures at the same time to both pixel x and pixel y, then a 2x2 architecture will be able to do that in one clock cycle.
So basically this will be rendered in one pass only if the 2 texture coords point to the same pixel?
Old 29-Mar-2004, 05:41   #9
arjan de lumens
Senior Member
 
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
Default

Quote:
Originally Posted by Bahadir
Quote:
Originally Posted by arjan de lumens
Not quite: it is capable of rendering N pixels per clock cycle with Y textures applied to every one of those N pixels.
It's more like this: if you have two textures, S and T, and you wish to apply both textures at the same time to both pixel x and pixel y, then a 2x2 architecture will be able to do that in one clock cycle.
So basically this will be rendered in one pass only if the 2 texture coords point to the same pixel?
No. Practically all renderers available today use a technique called 'reverse texturing' where you start out with the pixel (x,y) coordinates and from them compute one or more sets of texture (s,t) coordinates, not the other way around. So you should probably think of the pixel as pointing to texture coordinates, not the texture coordinates pointing to the pixel.

With this understanding, a 2x2 renderer starts by picking (x,y) coordinates for two pixels, and then, for each of the two pixels, computes two sets of {s,t} texture coordinates and then looks up the associated texture data from two texture maps. All this once per clock cycle.
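
The 2x2 case can be sketched roughly as follows. This is a toy model, not how any real chip is built: the pixel-to-(s,t) mapping is a plain normalisation of screen position shared by both textures, the two samples are combined by addition, and all names are hypothetical.

```python
def sample_nearest(texture, s, t):
    """Nearest-texel lookup; (s, t) are in [0, 1)."""
    height, width = len(texture), len(texture[0])
    return texture[int(t * height)][int(s * width)]

def render_clock(pixels, textures, screen_size):
    """One 'clock' of an NxY renderer: N = len(pixels), Y = len(textures).

    Reverse texturing: start from each pixel's (x, y) and derive (s, t).
    The mapping used here is a stand-in for real per-texture interpolation.
    """
    screen_w, screen_h = screen_size
    out = {}
    lookups = 0
    for (x, y) in pixels:
        colour = 0
        for texture in textures:
            s, t = x / screen_w, y / screen_h   # pixel -> texture coordinates
            colour += sample_nearest(texture, s, t)
            lookups += 1
        out[(x, y)] = colour
    return out, lookups

tex_s = [[1, 2], [3, 4]]
tex_t = [[10, 20], [30, 40]]
result, lookups = render_clock([(0, 0), (3, 3)], [tex_s, tex_t], (4, 4))
print(result)    # {(0, 0): 11, (3, 3): 44}
print(lookups)   # 4 lookups: 2 pixels x 2 textures
```

Note that the loop never asks which pixel a texel belongs to; it always goes from the pixel to the texture.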
Old 29-Mar-2004, 05:49   #10
Bahadir
Member
 
Join Date: Mar 2004
Location: Australia
Posts: 97
Default

thanks
Old 29-Mar-2004, 06:48   #11
SiliconAbyss
Junior Member
 
Join Date: Mar 2004
Location: Canada
Posts: 75
Send a message via ICQ to SiliconAbyss
Default

Thanks from me too; that helped clear up the muddy image I had in my mind regarding pipeline architectures.
Old 29-Mar-2004, 10:17   #12
Dio
Senior Member
 
Join Date: Jul 2002
Location: UK
Posts: 1,758
Default

I should point out that this isn't a 'real' view of how things now work internally, but a legacy method from the 3dfx days. It's still used now as a convenience method for explaining it simply.
Old 29-Mar-2004, 10:32   #13
retsam
Junior Member
 
Join Date: Apr 2003
Posts: 32
Default

Here is a question that might expand on Bahadir's. Say we have 16x1: in order to change state to 32x0, does the pipeline have to be purged before it reconfigures itself from one to the other? And what sort of performance penalty are we talking about in going from 16x1 to 32x0 if the pipelines have to be purged? Do we even know how long the pipes are on these chips?

rets
Old 29-Mar-2004, 10:54   #14
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,678
Default

I would suspect that it would "switch modes" only when there's a change in some global rendering variables, such as whether or not to output color. So yes, the pipelines would definitely have to be flushed. This should, however, be a tiny performance hit, as you should only do this a couple of times per frame.
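
To put a rough scale on "tiny", here is a back-of-the-envelope sketch. Every number in it (clock speed, frame rate, pipeline depth, flush count) is a made-up assumption chosen only to show the arithmetic, not a figure for any real chip.

```python
def flush_overhead(clock_hz, fps, pipeline_depth_cycles, flushes_per_frame):
    """Fraction of a frame's clock cycles lost to pipeline flushes."""
    cycles_per_frame = clock_hz / fps
    lost_cycles = pipeline_depth_cycles * flushes_per_frame
    return lost_cycles / cycles_per_frame

# All four inputs are hypothetical, chosen only to show the scale.
overhead = flush_overhead(clock_hz=400e6, fps=60,
                          pipeline_depth_cycles=1000, flushes_per_frame=2)
print(f"{overhead:.4%}")   # 0.0300%
```

Even if the assumed depth were ten times larger, a couple of flushes per frame would still cost well under one percent in this model.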
Old 29-Mar-2004, 11:04   #15
pmac
Junior Member
 
Join Date: Feb 2004
Posts: 15
Default

Quote:
Originally Posted by Dio
I should point out that this isn't a 'real' view of how things now work internally, but a legacy method from the 3dfx days. It's still used now as a convenience method for explaining it simply.
I don't suppose you could explain to us how the internal structure of modern GPUs can best be described then, by any chance?

Especially the upcoming technology.
Old 29-Mar-2004, 11:09   #16
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 12,951
Default

Search for the term "Quad pipeline" here.

Basically, pipelines from DX8 onwards really operate on a quad of pixels: a 2x2 pixel section from a triangle. The reason for this is that some instructions have dependencies on neighbouring pixels.

The upshot of this is that a "4x1" pipeline is actually a "single quad" pipeline. Radeon 9800, having an "8x1" pipeline, is operating on two quads at any one time. A modern 2-pixel pipeline is still working on a quad, but doing it over two (or more) cycles.
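
The relationship between pipe count and quads can be sketched like this. It assumes pipes are only ever grouped to work on whole quads, which is a simplification; the function name is hypothetical.

```python
import math

QUAD_PIXELS = 4  # a quad is a 2x2 block of pixels

def quad_throughput(pixel_pipes):
    """Return (quads in flight per clock, clocks needed per quad).

    Assumes pipes are only ever grouped to work on whole quads;
    real hardware may schedule differently.
    """
    if pixel_pipes >= QUAD_PIXELS:
        return pixel_pipes // QUAD_PIXELS, 1
    return 1, math.ceil(QUAD_PIXELS / pixel_pipes)

for pipes in (2, 4, 8):
    print(pipes, quad_throughput(pipes))
# 2 (1, 2)
# 4 (1, 1)
# 8 (2, 1)
```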
__________________
Expand. Accelerate. Dominate.
Old 29-Mar-2004, 12:39   #17
Dio
Senior Member
 
Join Date: Jul 2002
Location: UK
Posts: 1,758
Default

It's all a question of bottlenecks inside the engine. You have various things you need to be able to do per clock cycle, and if there are any you don't have enough of, that limits your performance (a 'bottleneck'):
- buffer reads
- generate any interpolated values
- execute pixel shader instructions
- generate sampler addresses
- look up textures
- buffer writes
There might be other bottlenecks too.

Saying it's an '8x1' implies various per-clock values:
- 8 Z reads
- 8 colour reads
- 8 interpolators
- 8 pixel shader instructions
- 8 sampler addresses
- 8 texture reads
- 8 Z writes
- 8 colour writes

Because 8x1 doesn't give much information, we see confusing things like '8x1/16x0', which means '8x1, but can do 16 Z reads and 16 Z writes'. It's a useful 'quick fix' but it has limited relevance to how things work. (At least, I think that's what people who use it mean. I don't actually know; it seems to be a bit of a woolly term!)
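
One way to picture the bottleneck view is a toy model in which throughput is set by whichever resource runs out first. The resource names and the '8x1/16x0' capacities below are illustrative assumptions based on the lists above, not real hardware figures.

```python
def bottleneck_rate(capacity, demand):
    """Throughput is set by the scarcest resource (the bottleneck).

    capacity: units of each resource available per clock.
    demand:   units of each resource one pixel needs.
    Returns (pixels per clock, name of the limiting resource).
    """
    rates = {name: capacity[name] / need
             for name, need in demand.items() if need > 0}
    limiter = min(rates, key=rates.get)
    return rates[limiter], limiter

# Hypothetical '8x1/16x0' part: 8 of most things, but 16 Z reads and writes.
capacity = {"z_read": 16, "z_write": 16, "colour_write": 8,
            "shader_op": 8, "texture_read": 8}

textured = {"z_read": 1, "z_write": 1, "colour_write": 1,
            "shader_op": 1, "texture_read": 1}
z_only = {"z_read": 1, "z_write": 1}
two_textures = dict(textured, texture_read=2)

print(bottleneck_rate(capacity, textured))      # (8.0, 'colour_write')
print(bottleneck_rate(capacity, z_only))        # (16.0, 'z_read')
print(bottleneck_rate(capacity, two_textures))  # (4.0, 'texture_read')
```

When several resources tie, the function simply reports the first one it meets, so the limiter name is only a hint.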

It may be more complex still. There may be dependencies on renderer state, the numbers may not be integers, or different parts of the pipeline may share resources with other parts of the engine (e.g. you will see that the first list says 'buffer reads' and the second explicitly separates them into Z and colour; there's no reason that necessarily has to be the case). As Dave says, there's also granularity: there may be a smallest chunk of data that can be processed.

The further you get into GPU performance, the more bottleneck and bubble analysis starts to take over your life.
Old 29-Mar-2004, 12:50   #18
Dio
Senior Member
 
Join Date: Jul 2002
Location: UK
Posts: 1,758
Default

Since Chalnoth and I both touched on it, I'd better briefly mention bubbles: a bubble is what you get when anything further down the pipe has to wait for some event higher up the pipe.

An extreme example is when the application reads the back buffer: the pipeline has to be completely flushed so that you can guarantee all rendering operations to that back buffer have completed.

A lot of effort and quite a bit of silicon go into avoiding bubbles.
Old 29-Mar-2004, 13:17   #19
Bahadir
Member
 
Join Date: Mar 2004
Location: Australia
Posts: 97
Default

Quote:
Originally Posted by DaveBaumann
Search for the term "Quad pipeline" here.

Basically, pipelines from DX8 onwards really operate on a quad of pixels: a 2x2 pixel section from a triangle. The reason for this is that some instructions have dependencies on neighbouring pixels.

The upshot of this is that a "4x1" pipeline is actually a "single quad" pipeline. Radeon 9800, having an "8x1" pipeline, is operating on two quads at any one time. A modern 2-pixel pipeline is still working on a quad, but doing it over two (or more) cycles.
This doesn't make sense! How can an "8x1" pipeline such as a Radeon 9800 work on 2 quads? I thought quads were broken down into triangles, so 1 quad -> 2 triangles, and therefore 2 quads would make 4 triangles.

So how does "8x1" fit into the picture?
I'm confused :?
Old 29-Mar-2004, 13:31   #20
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 12,951
Default

We're not talking "geometry quads" here, but pixel quads. Once you've gone through the geometry setup and evaluated the triangle to screen space, the triangle is split up into 2x2 pixel regions (or quads), which are then processed by a "quad" of pixel pipelines.
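
A minimal sketch of that splitting step, assuming the quads are aligned to even screen coordinates (an assumption made here for illustration) and starting from a made-up list of covered pixels rather than a real rasteriser:

```python
from collections import defaultdict

def group_into_quads(covered_pixels):
    """Group covered (x, y) pixels into 2x2 screen-aligned quads.

    Returns {quad origin: sorted list of covered pixels in that quad}.
    Quads on a triangle edge may be only partly covered.
    """
    quads = defaultdict(list)
    for x, y in covered_pixels:
        origin = (x - x % 2, y - y % 2)   # top-left pixel of the 2x2 block
        quads[origin].append((x, y))
    return {origin: sorted(pixels) for origin, pixels in quads.items()}

# Pixels covered by a small, made-up right triangle.
covered = [(0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 2)]
for origin, pixels in sorted(group_into_quads(covered).items()):
    print(origin, pixels)
# (0, 0) [(0, 0), (0, 1), (1, 1)]
# (0, 2) [(0, 2), (1, 2)]
# (2, 2) [(2, 2)]
```

Six covered pixels end up spread over three quads here, which shows why small or thin triangles can leave part of a quad pipeline idle in this model.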
__________________
Expand. Accelerate. Dominate.
Old 29-Mar-2004, 13:31   #21
Dio
Senior Member
 
Join Date: Jul 2002
Location: UK
Posts: 1,758
Default

Blame our crappy terminology. 'Quads' in this context are 'quad-pixels', not 'quadrilaterals'. I'm not a big fan of it either, but we're stuck with it.
Old 29-Mar-2004, 13:33   #22
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 12,951
Default

snap
__________________
Expand. Accelerate. Dominate.
Old 29-Mar-2004, 16:11   #23
demalion
Senior Member
 
Join Date: Feb 2002
Location: CT
Posts: 2,024
Default

A search on the term "proxel" (the smiley in that post is a link, BTW) will drop you into the middle of some long discussions of this that go into a lot of detail from the different perspective of trying to understand and describe some architectures when this complexity was being exhibited. That term is one I made up (and no one else uses, so don't learn it :P) to try to discuss pixel shading with some relation to the more easily understood pixel and texel terms, and the discussions touch on the topic question from many angles.

If you find the discussions confusing instead of revealing, just disregard.
Old 30-Mar-2004, 04:04   #24
Rookie
Junior Member
 
Join Date: Feb 2002
Posts: 83
Default

Quote:
Originally Posted by DemoCoder
It's a shadow acceleration mode. If doing shadow buffers, it allows you to fill the buffer at up to 2x the fillrate. If doing stencil shadow volumes, it allows you to write stencils at up to 2x the fillrate.
Is that the only benefit of a 32x0 pipeline?

BTW, how many textures are used in popular games? As far as I know, Quake uses 3 textures in some places and Serious Sam uses three textures; how about UT2004, Far Cry or Painkiller? Is quad texturing widely used now?
__________________
Rong"Rookie"Huo
Former Beyond3d Boys,Err,not Bit boys
Old 30-Mar-2004, 04:13   #25
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,678
Default

The number of textures used will vary from surface to surface, and will vary widely depending upon the game.

For a complex surface, I would suspect 4 textures would be a minimum.

Anyway, the hypothetical "32x0" that we're talking about here would have nothing to do with textures, but rather with rendering z and stencil data. Such an architecture would accelerate an initial z-pass (something that is necessary for shadow volumes, but is also helpful in allowing hardware to not have to render anything that will later be covered up), as well as stencil shadow volume rendering.

It may also be possible for such an architecture to accelerate other shadowing techniques, but that would depend upon the hardware implementation.
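
As a rough illustration of why an initial z-pass helps, here is a toy software model, not a description of any hardware. It counts how many fragments reach the expensive colour pass with and without a Z-only first pass; the function name and the fragment data are hypothetical.

```python
def shade_count(fragments, z_prepass):
    """Count fragments that reach the (expensive) colour pass.

    fragments: list of (pixel, depth) in submission order; smaller depth is nearer.
    """
    zbuf = {}
    if z_prepass:
        # Pass 1: Z only, no colour. This is the work an Nx0 mode would speed up.
        for pixel, depth in fragments:
            if depth < zbuf.get(pixel, float("inf")):
                zbuf[pixel] = depth
        # Pass 2: shade only the fragment that matches the final depth.
        return sum(1 for pixel, depth in fragments if depth == zbuf[pixel])
    shaded = 0
    for pixel, depth in fragments:
        if depth < zbuf.get(pixel, float("inf")):   # ordinary depth test
            zbuf[pixel] = depth
            shaded += 1
    return shaded

# Three surfaces over the same pixel, drawn back to front (worst case).
fragments = [((5, 5), 0.9), ((5, 5), 0.5), ((5, 5), 0.1)]
print(shade_count(fragments, z_prepass=False))  # 3
print(shade_count(fragments, z_prepass=True))   # 1
```

The z-pass itself still touches every fragment, which is why doing it at a higher rate than the colour pass would be attractive.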