Shading Instructions and Rasterizer !

DavidGraham · Dec 29, 2009

Rys said:
Depends what you call a core. On a current generation NV GPU, a core is made up of three blocks, each one running the same instruction each on a set of pixels (eight) in a given clock, in lock step. But that's three potentially different instructions in a clock per core there. Previous unified hardware was two, Fermi is two.

For ATI, the core (ATI call it a SIMD) will process 16 pixels per clock, with potentially 5 different instructions per pixel in that clock, but the same 5 instructions for each pixel, in lock step.

I am sorry I got a little confused ! what is lock step ?

And by cores you mean like CUDA cores (I know it sounds cheesy) , or the whole block of 8 shader processors ? I am guessing the latter .

As for lower precision, yes, FP16 and FP24 have variously been the common floating point precisions in programmable hardware in recent years for pixel shading, with FP32 mandatory in recent generations. There are various other precisions at work across the chip depending on what's being computed (both very high and quite low), but the generally programmable logic in a modern unified architecture is heavily 32-bit, integer or float.

I got that part very well, thank you a lot, sir.

MDolenc · Dec 29, 2009

The progression of precision in "pixel shaders" went from fewer than 8 bits per component (integer), to 8 bits per component (integer), to more than 8/10 bits per component (integer) and 16-bit floating point per component, to 32-bit integer/float per component and beyond (double-precision floating point in DX11).

And yes, each core can do a single mul, add, (4D) dot product, multiply add,... per clock. Though that's technically not entirely correct. Instructions are pipelined, so results won't be immediately available. If you were to issue a single add you'd get the result back after, say, four clocks. However, you can issue a new instruction every clock and get the result of instruction (n-4) every clock.
Block/sub-block of cores (depending on what exactly you mean by that) is more or less helping to schedule everything efficiently/cheaply.
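
The latency-versus-throughput point above can be sketched as a toy simulation. This is only an illustration, not a model of any real GPU: the four-clock depth is taken from the "say four clocks" example, and `run_pipeline` and the instruction names are hypothetical.

```python
from collections import deque

LATENCY = 4  # assumed pipeline depth in clocks, from the "say four clocks" example


def run_pipeline(instructions, latency=LATENCY):
    """Issue one instruction per clock; return (clock, instruction) for each result."""
    in_flight = deque()  # entries are (clock at which the result is ready, instruction)
    completed = []
    pending = list(instructions)
    clock = 0
    while pending or in_flight:
        # Retire the oldest instruction if its result is ready this clock.
        if in_flight and in_flight[0][0] == clock:
            completed.append((clock, in_flight.popleft()[1]))
        # Issue a new instruction every clock while any remain.
        if pending:
            in_flight.append((clock + latency, pending.pop(0)))
        clock += 1
    return completed


results = run_pipeline([f"add{i}" for i in range(6)])
for clock, name in results:
    print(f"clock {clock}: result of {name}")
```

The first result only shows up at clock 4, but after that one result arrives per clock, which is the sense in which the unit still does "one instruction per clock".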

Rys · Dec 29, 2009

DavidGraham said:
I am sorry I got a little confused ! what is lock step ?

It's just a different way to say "in sync". So all of the units are executing a MUL in the same clock, for different pixels, for example.
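
As a rough software analogy for lock step (an assumption-laden sketch, with a hypothetical `execute_in_lock_step` helper and a lane count of eight borrowed from the sub-block size mentioned earlier), every lane applies the same operation in the same step, each to its own pixel's data:

```python
import operator

LANES = 8  # assumed width: one sub-block of eight ALUs


def execute_in_lock_step(op, operands_a, operands_b):
    """Apply the same operation to every lane's own operands in one step."""
    assert len(operands_a) == len(operands_b) == LANES
    return [op(a, b) for a, b in zip(operands_a, operands_b)]


pixels_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
pixels_b = [0.5] * LANES

# One "clock": every lane performs a MUL, each on a different pixel.
products = execute_in_lock_step(operator.mul, pixels_a, pixels_b)
print(products)
```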

DavidGraham said:
And by cores you mean like CUDA cores (I know it sounds cheesy) , or the whole block of 8 shader processors ? I am guessing the latter .

Correct, the latter. NVIDIA abuse the word core when they say CUDA core. On a GT200 for example, with 240 ALUs, 24 to a core (3 sub blocks of 8, as discussed previously), there are 10 cores. For ATI, each modern thing we'd call a core (usually) has 80 ALUs, so 20 cores for their Cypress chip.
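
The core counts above are simple division. In the sketch below the GT200 figures come from the post; the 1600-ALU total for Cypress is not stated in the thread and is supplied here as an assumption, as is reading "80 ALUs" as 16 pixels times 5 instruction slots.

```python
# GT200 figures are from the post above.
GT200_ALUS = 240
GT200_ALUS_PER_CORE = 3 * 8      # three sub-blocks of eight ALUs

# Assumption: Cypress total ALU count, not given in the thread.
CYPRESS_ALUS = 1600
CYPRESS_ALUS_PER_CORE = 16 * 5   # 16 pixels per clock, 5 instruction slots per pixel

gt200_cores = GT200_ALUS // GT200_ALUS_PER_CORE
cypress_cores = CYPRESS_ALUS // CYPRESS_ALUS_PER_CORE
print(gt200_cores, cypress_cores)
```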

DavidGraham · Dec 29, 2009

MDolenc said:
The progression of precision in "pixel shaders" went from fewer than 8 bits per component (integer), to 8 bits per component (integer), to more than 8/10 bits per component (integer) and 16-bit floating point per component, to 32-bit integer/float per component and beyond (double-precision floating point in DX11).

I see that it was very gradual .. thanks for the detail .

And yes, each core can do a single mul, add, (4D) dot product, multiply add,... per clock. Though that's technically not entirely correct. Instructions are pipelined, so results won't be immediately available. If you were to issue a single add you'd get the result back after, say, four clocks. However, you can issue a new instruction every clock and get the result of instruction (n-4) every clock.

Aha , I see , now I understand that concept differently , thank you guys a lot .

Now back to those question please :
Question 1: Since the GPU handles many vertex position values, these values must relate to fixed values on an axis (i.e., the X, Y or Z axis). Does the GPU need global coordinates that are stored in memory, or does it just process the values and then compare them to screen coordinates?

Question 2: Concerning interpolation, suppose we have a big triangle whose 3 vertices are all red. During color interpolation for the pixels inside that triangle, the pixel in the middle would have a color of faint red (due to the process of interpolation), which is wrong! Wouldn't it be easier if the vertex color was just copied over to all the pixels inside the triangle?

Question 3 : what is "lock step" Rys was referring to ? is it the same thing as pipelining ?

Rys · Dec 29, 2009

DavidGraham said:
Question 2: Concerning interpolation, suppose we have a big triangle whose 3 vertices are all red. During color interpolation for the pixels inside that triangle, the pixel in the middle would have a color of faint red (due to the process of interpolation), which is wrong! Wouldn't it be easier if the vertex color was just copied over to all the pixels inside the triangle?

The pixel colour would be red too.

DavidGraham · Dec 29, 2009

Rys said:
It's just a different way to say "in sync". So all of the units are executing a MUL in the same clock, for different pixels, for example.

Aha , I get it now .

Correct, the latter. NVIDIA abuse the word core when they say CUDA core. On a GT200 for example, with 240 ALUs, 24 to a core (3 sub blocks of 8, as discussed previously), there are 10 cores. For ATI, each modern thing we'd call a core (usually) has 80 ALUs, so 20 cores for their Cypress chip

They do abuse these concepts a lot indeed, I was very confused about what's going on , until you guys came and enlightened me , I really can't express my feelings right now .. talk about pure scientific excitement !

We can ignore question number 3 then, and move on to numbers 1 and 2: the whereabouts of global coordinates (do they even exist?) and the use of color interpolation, which could give unwanted results.

DavidGraham · Dec 29, 2009

Rys said:
The pixel colour would be red too.

How ? Interpolation is a way of finding intermediate points , and ..wait .. if the points are the same , then there wouldn't be a need for intermediate points ! that is right ! how stupid could I be ? I am sorry , you are right , I must have confused it with something else . :???:

Then what about Global Coordinates , are they necessary , I mean should they be established the first thing before rendering ?

Rys · Dec 29, 2009

If I understand you correctly with regards to question one, there's a final projection of the geometry to the screen before it's drawn, which gives you your screen-space xyz to work with during rasterisation and shading. That screen space would be your global coords.

As for question two, I guess the hardware would optimise away the calculation of the interpolant if it's not needed (vertex colour attributes are constant across the triangle in this case), but I'm not actually sure. Hardware guys, does this happen?

MDolenc · Dec 29, 2009

DavidGraham said:
Question 1: Since the GPU handles many vertex position values, these values must relate to fixed values on an axis (i.e., the X, Y or Z axis). Does the GPU need global coordinates that are stored in memory, or does it just process the values and then compare them to screen coordinates?

I'm not sure exactly what you mean here... Everything the GPU needs is hidden in three matrices: "world", "view" and "projection", which get multiplied into one matrix on the CPU. That matrix is passed to the GPU, which then multiplies incoming vertex positions by it. Note that this is the simplest of cases, and even vertex shaders can get VERY complex if you add animation and/or per-vertex lighting into the mix.
However, this is just "boring" linear algebra. When vertices come into the pipeline they are in some arbitrary local space (object space) where each object has its origin at its center (for example). After they get multiplied (transformed) by this matrix they end up in screen space.
Why would some global coordinates be needed?
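
A minimal sketch of that flow, in plain Python rather than shader code. Everything here is an illustrative assumption: the helper names, the OpenGL-style column-vector convention, and the particular world, view and projection values. The sketch stops at normalized device coordinates; mapping those to actual pixels is a further viewport step.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective matrix (camera looks down -Z)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0, 0, -1, 0],
    ]


world = translation(2.0, 0.0, 0.0)   # place the object in the scene
view = translation(0.0, 0.0, -5.0)   # camera sits 5 units back along +Z
projection = perspective(90.0, 1.0, 1.0, 100.0)

# Done once per object on the CPU: one combined matrix.
wvp = matmul(projection, matmul(view, world))

# Done per vertex on the GPU: a single matrix-vector multiply.
vertex_object_space = [0.0, 1.0, 0.0, 1.0]
clip = transform(wvp, vertex_object_space)
ndc = [c / clip[3] for c in clip[:3]]  # perspective divide
print(ndc)
```

Multiplying the three matrices together up front means each vertex costs one matrix-vector multiply instead of three.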

DavidGraham said:

Question 2: Concerning interpolation, suppose we have a big triangle whose 3 vertices are all red. During color interpolation for the pixels inside that triangle, the pixel in the middle would have a color of faint red (due to the process of interpolation), which is wrong! Wouldn't it be easier if the vertex color was just copied over to all the pixels inside the triangle?

Yeah, but how do you know that all three colors are the same? It would be expensive to check and since 90% of the time you're really dealing with different values also redundant. And as it was already pointed out the center would still be red.
The kicker here, however, is that these values (vertex colors, texture coordinates) are not actually interpolated per se. The rasterizer determines how fast the value is changing (the gradient) over a given triangle and simply adds this change to the starting value for each rasterized pixel as it "walks" over the triangle.
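
Here is a rough sketch of that gradient walk for one attribute along one row of pixels. It is a simplification under stated assumptions: the attribute varies linearly in screen space (no perspective correction), and `attribute_gradients`, `walk_scanline` and the example triangle are hypothetical.

```python
def attribute_gradients(p0, p1, p2, a0, a1, a2):
    """Return (da/dx, da/dy) for an attribute that varies linearly over the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)  # twice the signed area
    dadx = ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / det
    dady = ((a2 - a0) * (x1 - x0) - (a1 - a0) * (x2 - x0)) / det
    return dadx, dady


def walk_scanline(start_value, dadx, count):
    """Step across one row of pixels, adding the x gradient once per pixel."""
    values = []
    value = start_value
    for _ in range(count):
        values.append(value)
        value += dadx
    return values


p0, p1, p2 = (0, 0), (8, 0), (0, 8)
red0, red1, red2 = 1.0, 0.0, 0.5

dadx, dady = attribute_gradients(p0, p1, p2, red0, red1, red2)
row_start = red0 + dady * 2           # value at x = 0 on the row y = 2
row = walk_scanline(row_start, dadx, 6)
print(row)
```

If all three vertex values are equal, both gradients come out as zero and the walk simply repeats the starting value, so no special "are they all the same?" check is needed.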

Simon F · Dec 30, 2009

@David:
For your "question 2", if you have the 3 vertices of a triangle, A, B & C, that have some attribute, say, the red colour component, Ra, Rb & Rc, then at any pixel in the triangle you will have

Code:

Rpixel = w1 * Ra + w2 * Rb + w3 * Rc

where
w1 + w2 + w3 = 1
and all ws are between 0 and 1.
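
One common way to obtain such weights is from ratios of triangle areas (barycentric coordinates). The sketch below assumes 2D screen-space positions; `barycentric_weights`, `interpolate` and the example triangle are hypothetical names and values chosen for illustration.

```python
def barycentric_weights(p, a, b, c):
    """Return (w1, w2, w3) for point p relative to triangle vertices a, b, c."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)  # twice the signed area
    w1 = ((bx - px) * (cy - py) - (cx - px) * (by - py)) / area
    w2 = ((cx - px) * (ay - py) - (ax - px) * (cy - py)) / area
    w3 = 1.0 - w1 - w2
    return w1, w2, w3


def interpolate(weights, ra, rb, rc):
    """Rpixel = w1 * Ra + w2 * Rb + w3 * Rc."""
    w1, w2, w3 = weights
    return w1 * ra + w2 * rb + w3 * rc


a, b, c = (0, 0), (8, 0), (0, 8)
weights = barycentric_weights((2, 2), a, b, c)

varying_red = interpolate(weights, 1.0, 0.0, 0.5)   # vertices differ
constant_red = interpolate(weights, 1.0, 1.0, 1.0)  # all three vertices fully red
print(weights, varying_red, constant_red)
```

Because the weights always sum to 1, a triangle whose three vertices share the same red value gets exactly that value at every pixel.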

Davros · Dec 30, 2009

Hang on, what does w represent?

Simon F · Dec 31, 2009

The w values, in effect, give the position on the surface of the triangle.

DavidGraham · Dec 31, 2009

Rys said:
If I understand you correctly with regards to question one, there's a final projection of the geometry to the screen before it's drawn, which gives you your screen-space xyz to work with during rasterisation and shading. That screen space would be your global coords.

You did understand me correctly , thanks for the answer .:smile:

Another Question :
In your article (http://www.beyond3d.com/content/reviews/51/2), in which you reviewed the GT200 architecture, you mentioned that each cluster contains 3x8 FP32 scalar ALUs and 3x8 FP32 scalar interpolators. Are those the Special Function Units?

DavidGraham · Dec 31, 2009

MDolenc said:
I'm not sure exactly what you mean here... Everything the GPU needs is hidden in three matrices: "world", "view" and "projection", which get multiplied into one matrix on the CPU. That matrix is passed to the GPU, which then multiplies incoming vertex positions by it. Note that this is the simplest of cases, and even vertex shaders can get VERY complex if you add animation and/or per-vertex lighting into the mix.

So the Global Matrix (supposedly) is a combination of 3 things : World View and Projection , is that right ?

When vertices come into the pipeline they are in some arbitrary local space (object space) where each object has its origin at its center (for example). After they get multiplied (transformed) by this matrix they end up in screen space.

So each object has its own coordinates, and after being transformed it ends up in screen space, which we could call the global coordinates, right?

If so , then are the screen coordinates the same as projection coordinates ?

The kicker here, however, is that these values (vertex colors, texture coordinates) are not actually interpolated per se. The rasterizer determines how fast the value is changing (the gradient) over a given triangle and simply adds this change to the starting value for each rasterized pixel as it "walks" over the triangle.

I am sorry, but could you please elaborate more on this subject? I didn't get it.

DavidGraham · Dec 31, 2009

Simon F said:
@David:
For your "question 2", if you have the 3 vertices of a triangle, A, B & C, that have some attribute, say, the red colour component, Ra, Rb & Rc, then at any pixel in the triangle you will have

Code:

Rpixel = w1 * Ra + w2 * Rb + w3 * Rc

And since Ra = Rb = Rc = R, then:

Rpixel = (w1 + w2 + w3) * R
But I didn't get what that is supposed to do. What is the relation between the surface of the triangle (w) and the pixel color?

rpg.314 · Dec 31, 2009

DavidGraham said:
And since Ra = Rb = Rc = R, then:

Rpixel = (w1 + w2 + w3) * R
But I didn't get what that is supposed to do. What is the relation between the surface of the triangle (w) and the pixel color?

Ra=Rb=Rc is a very special case. It is far more common to have all three different.

DavidGraham · Dec 31, 2009

rpg.314 said:
Ra=Rb=Rc is a very special case. It is far more common to have all three different.

I still don't get it , why is it such a rare case , is it that strange to have one little red triangle ?
I am serious !

bridgman · Dec 31, 2009

I guess there are two main reasons :

1. Apps rarely draw triangles in isolation - they use triangles to model real world objects which usually have varying colors across their surfaces, and those varying colors usually result in the vertices of each triangle being different as well.

2. Even if you were drawing a "solid red thing", lighting effects (performed in the vertex shader) would frequently result in having different brightness levels for each vertex.

If I may offer (repeat) some advice, you'll have a much easier time understanding the hardware once you have a bit more familiarity with the software. Picking up a decent book on OpenGL and actually running some simple programs on your PC will make all this (especially the coordinate space stuff) seem a lot more clear.

DavidGraham · Dec 31, 2009

bridgman said:
I guess there are two main reasons :

1. Apps rarely draw triangles in isolation - they use triangles to model real world objects which usually have varying colors across their surfaces, and those varying colors usually result in the vertices of each triangle being different as well.

2. Even if you were drawing a "solid red thing", lighting effects (performed in the vertex shader) would frequently result in having different brightness levels for each vertex.

Thanks , I think I can settle with this explanation .

If I may offer (repeat) some advice, you'll have a much easier time understanding the hardware once you have a bit more familiarity with the software. Picking up a decent book on OpenGL and actually running some simple programs on your PC will make all this (especially the coordinate space stuff) seem a lot more clear.

Advice Appreciated :smile:, I am willing to do just that , but I am under the impression that being good in software (designing and writing code) doesn't necessarily mean understanding things , and by that I mean the deep-gut understanding , which gives you the ability to visualize what is going on in your mind .

Simon F · Jan 2, 2010

Rys said:
If I understand you correctly with regards to question one, there's a final projection of the geometry to the screen before it's drawn, which gives you your screen-space xyz to work with during rasterisation and shading. That screen space would be your global coords.

Personally, I would not describe screen space as being the same as global coordinates. To me, global coordinates are used to assemble all the models etc into the same coordinate space and for doing, say, lighting calculations and perhaps collision detection.

Screen coordinates are in pixel dimensions and are used for rasterisation.
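
To make "pixel dimensions" concrete, here is a hedged sketch of the last mapping from normalized device coordinates to screen coordinates. The `ndc_to_screen` name, the 1920x1080 target and the top-left origin with y growing downwards are all assumptions; APIs differ on the exact convention.

```python
def ndc_to_screen(x_ndc, y_ndc, width, height):
    """Map normalized device coordinates in [-1, 1] to pixel coordinates.

    Assumes the origin is the top-left corner and y grows downwards.
    """
    x_pixel = (x_ndc + 1.0) * 0.5 * width
    y_pixel = (1.0 - y_ndc) * 0.5 * height
    return x_pixel, y_pixel


centre = ndc_to_screen(0.0, 0.0, 1920, 1080)
top_left = ndc_to_screen(-1.0, 1.0, 1920, 1080)
bottom_right = ndc_to_screen(1.0, -1.0, 1920, 1080)
print(centre, top_left, bottom_right)
```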
