Old 15-Oct-2003, 08:44   #1
991060
Member
 
Join Date: Jul 2003
Location: Beijing
Posts: 640
Default pixel shader pipeline parallelism restriction problem.

I've just read a whitepaper named "Xbox Pixel Shader Performance", which you can find in the latest Xbox SDK.
Here are some quotes:
Quote:
First, on any one clock, the four pixel shader pipelines can only work on pixels belonging to a single triangle. This means that if you were to draw, say, four 1-pixel triangles in a row, it would take a minimum of 4 clocks, and three of the pixel shader pipelines would remain idle during each clock, unable to work in parallel with the one active pixel shader pipeline because there are no other pixels to work on in the current triangle.
Quote:
The restriction is that the four pixel shader pipelines can draw to only one quad on any given clock. No matter whether 1, 2, 3, or 4 pixels of a given quad are covered by a triangle, that quad will tie up all four pixel shader pipelines while it's being drawn.
Quote:
the two pixel-shader-pipeline parallelism restrictions can be rolled into a single sentence: The four pixel shader pipelines can draw in parallel only to a single quad of a single triangle.
I'm wondering whether such restrictions also apply to current PC GPU products, such as NV3X and R3XX.
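A rough sketch of the arithmetic in the quotes above. It assumes a one-clock shader, quads aligned to even pixel coordinates, and one clock per (triangle, quad) pair; the function name and the pixel lists are made up for illustration and are not taken from the whitepaper.

```python
def clocks_needed(triangles):
    """Count clocks when each clock can serve one 2x2 quad of one triangle.

    triangles: list of lists of (x, y) pixels covered by each triangle.
    """
    clocks = 0
    for pixels in triangles:
        # Every distinct 2x2 quad touched by this triangle costs one clock,
        # no matter how many of its four pixels are actually covered.
        quads = {(x // 2, y // 2) for x, y in pixels}
        clocks += len(quads)
    return clocks


# Four 1-pixel triangles in a row, as in the first quote.
four_tiny = [[(0, 0)], [(1, 0)], [(2, 0)], [(3, 0)]]
# One triangle that exactly fills a single quad.
one_block = [[(0, 0), (1, 0), (0, 1), (1, 1)]]

print(clocks_needed(four_tiny))  # 4
print(clocks_needed(one_block))  # 1
```

Both cases shade four pixels, but the tiny triangles need four clocks because pixels from different triangles can never share a clock under this model.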
Old 15-Oct-2003, 08:52   #2
991060
Member
 
Join Date: Jul 2003
Location: Beijing
Posts: 640
Default

And, if such restrictions do exist, where do they come from?

I think the single-triangle restriction is due to the fact that the PS pipelines need data which is interpolated across the triangle, so shading pixels from two triangles at once doesn't make sense. Please correct me here if I'm wrong.

So what about the single-quad restriction? According to the "NV30 inside" article published on 3dcenter.org, NV30 does have such a restriction. What about R300/350? They have 8 pixel shader pipelines, so 2 quads at one time?
Old 15-Oct-2003, 09:27   #3
Simon F
Tea maker
 
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,379
Default

Quote:
Originally Posted by 991060
And, if such restrictions do exist, where do they come from?
Probably the need to do 2x2 pixels at a time in order to implement the "d/dx" and "d/dy" instructions. (Those aren't the correct names, but I'm too lazy to look them up.)
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson

"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay
Old 15-Oct-2003, 09:37   #4
Hyp-X
Irregular
 
Join Date: Feb 2002
Posts: 1,170
Default

Quote:
Originally Posted by 991060
what about R300/350? they have 8 Pixel shader pipeline, so 2 quads at one time?
Yes, 2 quads.
The 2 quads are processed independently, can take different processing time or belong to different triangles.
There's a checkerboard pattern of 16x16 tiles: one of the units processes the "black" tiles, the other the "white" tiles.
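One way to picture the checkerboard, as a minimal sketch. The exact mapping used by the hardware isn't given in this thread, so the formula below (sum of tile coordinates modulo 2) and the name `owning_unit` are assumptions chosen only to produce a black/white pattern.

```python
TILE = 16  # tile edge in pixels, per the 16x16 figure above


def owning_unit(x, y, tile=TILE):
    """Return 0 or 1: which quad unit owns screen pixel (x, y).

    Assumed mapping: tiles alternate like chessboard squares.
    """
    return ((x // tile) + (y // tile)) % 2


# Print the owner of each tile in a 64x64 pixel region.
for ty in range(4):
    print(" ".join(str(owning_unit(tx * TILE, ty * TILE)) for tx in range(4)))
```

Every pixel belongs to exactly one unit, so the two units never touch the same screen location.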
Old 15-Oct-2003, 10:04   #5
991060
Member
 
Join Date: Jul 2003
Location: Beijing
Posts: 640
Default

Thanks Simon F and Hyp-X; those comments are helpful, though I don't quite understand the "16x16 pattern" thing.

Another question: with the advent of DFC (dynamic flow control), it's quite possible that different pixels within a single quad need different processing times. Is it safe to say that the processing time a quad needs is that of its slowest pixel? If so, I think parallelism drops as more pixel shader pipelines reside in a single GPU, because the probability that all pixels processed at one time need equal processing time decreases very quickly as pipelines are added. Is it possible for IHVs to design their hardware to assign each pixel an independent pipeline rather than assigning 4 pipelines to a quad? How efficient would that approach be?
Old 15-Oct-2003, 10:17   #6
Dio
Senior Member
 
Join Date: Jul 2002
Location: UK
Posts: 1,758
Default

These kinds of things are known as 'granularity losses' (the smallest chunk that can be worked on is larger than the smallest chunk of useful data that could be desired).

It's rarely significant on very small triangles - these tend to be vertex limited. In theory it could be somewhat of a problem on long (100s of pixels) and skinny (~1 pixel) triangles, but I've never actually seen a problem case.
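To put a number on the long, skinny case, here is a small sketch that measures what fraction of the four lanes does useful work. The pixel lists are hand-made stand-ins for rasterizer output, and `lane_efficiency` is a hypothetical helper, not anything from a real driver or SDK.

```python
def lane_efficiency(pixels):
    """Covered pixels divided by lanes spent (4 per touched 2x2 quad)."""
    covered = set(pixels)
    quads = {(x // 2, y // 2) for x, y in covered}
    return len(covered) / (4 * len(quads))


sliver = [(x, 0) for x in range(200)]                    # 200 pixels long, 1 pixel thick
block = [(x, y) for x in range(20) for y in range(10)]   # same pixel count, compact

print(lane_efficiency(sliver))    # 0.5
print(lane_efficiency(block))     # 1.0
print(lane_efficiency([(0, 0)]))  # 0.25
```

Under these assumptions the sliver wastes half the lanes, which is the granularity loss described above; whether that ever matters in practice is a separate question.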

I don't like the terminology used below. 'The four pixel pipelines...' is a misnomer because it implies independence. It is one quad pipeline. (Regular readers will have heard this rant before).
Old 15-Oct-2003, 10:28   #7
Simon F
Tea maker
 
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,379
Default

Quote:
Originally Posted by 991060
Thanks Simon F and Hyp-X; those comments are helpful, though I don't quite understand the "16x16 pattern" thing.
My guess is that it's probably just ATI's way of avoiding concurrency issues. For example, imagine you have two polys that overlap at a pixel, P. If you had independent pipeline blocks that could write to any pixel, you wouldn't want a situation where the first poly runs a slow shader on Pipeline A and its result then overwrites the value already written to P by the second poly, which happened to run a fast shader on Pipeline B.
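The hazard can be sketched with a toy timing model. Everything here is an assumption made for illustration: each pipeline is an in-order queue, a fragment's write lands when its shader finishes, and the two assignment policies (`round_robin`, `by_tile`) are hypothetical.

```python
def final_owner(fragments, assign):
    """Return {pixel: poly} after all writes land in completion order.

    fragments: list of (poly, pixel, shader_cost) in submission order.
    assign(index, pixel) -> pipeline id.
    """
    busy_until = {}  # pipeline id -> time at which it becomes free
    writes = []      # (finish_time, submission_index, pixel, poly)
    for i, (poly, pixel, cost) in enumerate(fragments):
        pipe = assign(i, pixel)
        finish = busy_until.get(pipe, 0) + cost
        busy_until[pipe] = finish
        writes.append((finish, i, pixel, poly))
    framebuffer = {}
    for _finish, _i, pixel, poly in sorted(writes):
        framebuffer[pixel] = poly  # later completions overwrite earlier ones
    return framebuffer


P = (5, 5)
# The first poly uses a slow shader, the second a fast one; both cover P.
frags = [("first", P, 10), ("second", P, 1)]


def round_robin(i, pixel):
    return i % 2  # any pipeline may write any pixel


def by_tile(i, pixel):
    return ((pixel[0] // 16) + (pixel[1] // 16)) % 2  # pixel decides the pipeline


print(final_owner(frags, round_robin)[P])  # first  (stale result wins)
print(final_owner(frags, by_tile)[P])      # second (submission order preserved)
```

With a fixed pixel-to-pipeline mapping, both fragments queue on the same pipeline, so the later poly's write always lands last.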

Quote:
Another question: with the advent of DFC (dynamic flow control), it's quite possible that different pixels within a single quad need different processing times. Is it safe to say that the processing time a quad needs is that of its slowest pixel? If so, I think parallelism drops as more pixel shader pipelines reside in a single GPU, because the probability that all pixels processed at one time need equal processing time decreases very quickly as pipelines are added.
Yes, this is true in theory, but unlikely to be a problem in practice. If each pixel in the block were to do completely different things, then it's likely to alias like b*ggery.
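The "slowest pixel sets the pace" point works out as follows in a minimal sketch. It assumes the pixels of a group run in strict lock-step and that the per-pixel costs are known instruction counts; the numbers are invented.

```python
def lockstep_utilisation(costs):
    """Return (group_time, utilisation) for pixels executed in lock-step.

    costs: per-pixel instruction counts for one group (e.g. one quad).
    """
    group_time = max(costs)  # everyone waits for the slowest pixel
    useful = sum(costs)      # work that was actually needed
    return group_time, useful / (group_time * len(costs))


print(lockstep_utilisation([10, 10, 10, 10]))  # (10, 1.0)
print(lockstep_utilisation([10, 2, 2, 2]))     # (10, 0.4)
```

A quad whose pixels take the same path loses nothing; one divergent pixel drags the other three lanes along with it.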

Quote:
Originally Posted by Dio
I don't like the terminology used below. 'The four pixel pipelines...' is a misnomer because it implies independence. It is one quad pipeline. (Regular readers will have heard this rant before).
Trouble is "quad" is often used for 4 sided polys. Perhaps we should call this a "pixel quartet"?
Old 15-Oct-2003, 11:04   #8
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

The restriction is most probably needed to calculate texture gradients, like the way dsx/dsy work. These gradients are then used to compute the mipmap level.

This way you can apply whatever transformation you like to the texture coordinates and the mipmap level will still be calculated correctly. This avoids the overblur and aliasing that would otherwise occur with 'linear' mipmap level interpolation. That latter method also requires more operations, especially when using adaptive anisotropic filtering. Should save some silicon...
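As a sketch of the idea: take the texture coordinates of the pixels in one 2x2 quad, difference them against their neighbours, and derive a mip level from the larger gradient. The LOD formula below is the common textbook one (log2 of the longest per-pixel step in texel units) and is an assumption here, not a description of any particular chip; `quad_lod` is a hypothetical name.

```python
import math


def quad_lod(uv, tex_size):
    """Mip level for a quad from finite differences of its texture coords.

    uv: dict mapping quad positions (0,0), (1,0), (0,1) to (u, v) in [0, 1].
    tex_size: texture edge length in texels (square texture assumed).
    """
    u00, v00 = uv[(0, 0)]
    u10, v10 = uv[(1, 0)]
    u01, v01 = uv[(0, 1)]
    # Differences between neighbouring pixels, scaled to texel units.
    dudx, dvdx = (u10 - u00) * tex_size, (v10 - v00) * tex_size
    dudy, dvdy = (u01 - u00) * tex_size, (v01 - v00) * tex_size
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0


# Texture coordinates advance 2 texels per screen pixel on a 256-texel map.
step = 2 / 256
uv = {(0, 0): (0.0, 0.0), (1, 0): (step, 0.0), (0, 1): (0.0, step)}
print(quad_lod(uv, 256))  # 1.0
```

Because the differences are taken after the shader has computed the coordinates, any transformation applied to them is picked up automatically, which is why all the pixels of the quad have to be evaluated together.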
Old 15-Oct-2003, 11:56   #9
Dio
Senior Member
 
Join Date: Jul 2002
Location: UK
Posts: 1,758
Default

Quote:
Originally Posted by Simon F
Trouble is "quad" is often used for 4 sided polys. Perhaps we should call this a "pixel quartet"?
We're stuck with quads, I fear...

If we can't get away from TMU's and pixel pipelines, anything you or I choose to do is not likely to make much difference
Old 15-Oct-2003, 12:06   #10
991060
Member
 
Join Date: Jul 2003
Location: Beijing
Posts: 640
Default

Thanks for all the replies, I'm beefed up by visiting here :P
Old 15-Oct-2003, 13:23   #11
JohnH
Member
 
Join Date: Mar 2002
Location: UK
Posts: 570
Default

Quote:
Originally Posted by Dio
Quote:
Originally Posted by Simon F
Trouble is "quad" is often used for 4 sided polys. Perhaps we should call this a "pixel quartet"?
We're stuck with quads, I fear...

If we can't get away from TMU's and pixel pipelines, anything you or I choose to do is not likely to make much difference
Quads could disappear at the point that the majority of apps are heavily using dynamic flow control making dsx/dsy approach to lod calc somewhat less useful than it is today...

John.
Old 15-Oct-2003, 22:54   #12
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

Quote:
Originally Posted by JohnH
Quads could disappear at the point that the majority of apps are heavily using dynamic flow control making dsx/dsy approach to lod calc somewhat less useful than it is today...
Do you know of any efficient alternative?

The only method I know that really makes gradient calculations independent is to fully shade 3 texture coordinates per pixel (arranged in a 1x1 pixel triangle). The cost of this is of course comparable to 3x supersampling but with shader analysis it could be reduced a lot?
Old 16-Oct-2003, 11:07   #13
JohnH
Member
 
Join Date: Mar 2002
Location: UK
Posts: 570
Default

Quote:
Originally Posted by Nick
Quote:
Originally Posted by JohnH
Quads could disappear at the point that the majority of apps are heavily using dynamic flow control making dsx/dsy approach to lod calc somewhat less useful than it is today...
Do you know of any efficient alternative?

The only method I know that really makes gradient calculations independent is to fully shade 3 texture coordinates per pixel (arranged in a 1x1 pixel triangle). The cost of this is of course comparable to 3x supersampling but with shader analysis it could be reduced a lot?
Worst case, you need to provide shader code to generate dsx and dsy for your specific shader function. These might be simple, e.g. just based on A, B from a plane equation, but could equally be something rather lardy. The key is that if you do these sorts of things, the HW isn't going to be able to do the work directly anymore.

This is of course the price you pays...
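A minimal sketch of what supplying your own derivatives might look like. It assumes the texture coordinate is f(u) = sin(u), with u = A*x + B*y + C coming from a plane equation; the choice of sin and all the function names are made up for illustration.

```python
import math


def analytic_gradient(A, B, C, x, y):
    """d/dx and d/dy of sin(A*x + B*y + C) via the chain rule.

    No neighbouring pixel is needed: A and B come from the plane equation.
    """
    u = A * x + B * y + C
    return math.cos(u) * A, math.cos(u) * B


def quad_gradient(A, B, C, x, y):
    """The same gradient estimated quad-style, by differencing neighbours."""
    def f(px, py):
        return math.sin(A * px + B * py + C)

    return f(x + 1, y) - f(x, y), f(x, y + 1) - f(x, y)


print(analytic_gradient(0.01, 0.02, 0.3, 4, 7))
print(quad_gradient(0.01, 0.02, 0.3, 4, 7))
```

The two estimates agree closely for a smooth function like this one, but the analytic version has to be written by hand for each shader function.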

John.
Old 16-Oct-2003, 21:46   #14
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,877
Default

Quote:
Originally Posted by Dio
Quote:
Originally Posted by Simon F
Trouble is "quad" is often used for 4 sided polys. Perhaps we should call this a "pixel quartet"?
We're stuck with quads, I fear...

If we can't get away from TMU's and pixel pipelines, anything you or I choose to do is not likely to make much difference
Hmm... *thinks a bit*
Does that mean R500 is less advanced than NV50?
Or am I just reading too much in that sentence?

Or maybe am I overestimating the NV50? I doubt that though.


Uttar
Old 17-Oct-2003, 01:17   #15
nelg
Senior Member
 
Join Date: Jan 2003
Location: Toronto
Posts: 1,557
Default

Uttar, I am going to ask that Dave bans you from B3D until you produce that editorial
__________________
on my way to becoming dark matter..........
Old 17-Oct-2003, 09:35   #16
Dio
Senior Member
 
Join Date: Jul 2002
Location: UK
Posts: 1,758
Default

Quote:
Originally Posted by Uttar
Quote:
Originally Posted by Dio
Quote:
Originally Posted by Simon F
Trouble is "quad" is often used for 4 sided polys. Perhaps we should call this a "pixel quartet"?
If we can't get away from TMU's and pixel pipelines, anything you or I choose to do is not likely to make much difference
Or am I just reading too much in that sentence?
I meant 'If we can't escape the archaic uses of terminology like TMU's and pixel pipelines...'

Does that clarify things?
Old 17-Oct-2003, 15:55   #17
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,877
Default

Quote:
Originally Posted by nelg
Uttar, I am going to ask that Dave bans you from B3D until you produce that editorial
ROFL!

Hey, if I'm being slower than with most of my other writing, it's because I want this to be high quality and very informative.
It's not that I'm not working on it. It's just that there is a lot more "overhead" ( NV30 anyone? ) than usual.

That includes a professional technical writer correcting it in his free time and sources commenting on it, to make sure I ain't making any major mistakes and that the overall message is indeed correct.

Now, if one of the sources just told me "Err, no, all this just doesn't seem to be true at all", I'd just scrap the whole thing and start again with other goals. No kidding.

7 full A4 pages in Times New Roman, size 12, written already. Current goal is around 10 pages.

ETA is 20-25 October. Before releasing it, I also need to make sure it's released before or after an official launch, to make sure its existence isn't forgotten due to discussions about I don't know what awfully boring fall refresh :P

Dio: Lol, okay. I did read too much in that sentence I guess


Uttar
