Shading Instructions and Rasterizer!

It depends what you call a core. On a current-generation NV GPU, a core is made up of three blocks, each one running the same instruction on a set of eight pixels in a given clock, in lock step. But that's potentially three different instructions per clock per core. Previous unified hardware was two blocks; Fermi is two.

For ATI, the core (ATI calls it a SIMD) will process 16 pixels per clock, with potentially 5 different instructions per pixel in that clock, but the same 5 instructions for each pixel, in lock step.
I am sorry, I got a little confused! What is lock step?

And by cores you mean like CUDA cores (I know it sounds cheesy), or the whole block of 8 shader processors? I am guessing the latter.

As for lower precision, yes, FP16 and FP24 have variously been the common floating-point precisions in programmable hardware in recent years for pixel shading, with FP32 mandatory in recent generations. There are various other precisions at work across the chip depending on what's being computed (both very high and quite low), but the generally programmable logic in a modern unified architecture is heavily 32-bit, integer or float.
I got that part very well, thank you a lot, sir.
 
Progression of precision in "pixel shaders" went from sub-8 bits per component (integer), to 8 bits per component (integer), to more than 8/10 bits per component (integer) and 16-bit floating point per component, to 32-bit integer/float per component and beyond (double-precision floating point in DX11).

And yes, each core can do a single mul, add, (4D) dot product, multiply-add, ... per clock. Though that's technically not entirely correct: instructions are pipelined, so results won't be immediately available. If you were to issue a single add, you'd get the result back after, say, four clocks. However, you can issue a new instruction (n) every clock and get the result of instruction (n-4) every clock.
A block/sub-block of cores (depending on what exactly you mean by that) is more or less there to help schedule everything efficiently/cheaply.
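The latency-versus-throughput point above can be sketched as a toy model (the 4-clock latency is the hypothetical figure from the example, not a real chip's number):

```python
# Toy model of a pipelined ALU: results take 4 clocks to come back
# (hypothetical figure from the discussion), but a new instruction
# can still be issued every clock.
LATENCY = 4

# Issue 8 independent adds, one per clock, and record when each completes.
completion_clock = {}
for issue_clock in range(8):
    completion_clock[f"add_{issue_clock}"] = issue_clock + LATENCY

# The first result arrives at clock 4, then one more every clock after
# that, so 8 instructions finish by clock 11 rather than clock 8 * 4 = 32.
print(completion_clock["add_0"], completion_clock["add_7"])
```

So latency only shows up once; after the pipeline fills, throughput is one result per clock.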
 
I am sorry, I got a little confused! What is lock step?
It's just a different way to say "in sync". So all of the units are executing a MUL in the same clock, for different pixels, for example.

And by cores you mean like CUDA cores (I know it sounds cheesy), or the whole block of 8 shader processors? I am guessing the latter.
Correct, the latter. NVIDIA abuse the word core when they say CUDA core. On a GT200 for example, with 240 ALUs, 24 to a core (3 sub-blocks of 8, as discussed previously), there are 10 cores. For ATI, each modern thing we'd call a core (usually) has 80 ALUs, so 20 cores for their Cypress chip.
 
Progression of precision in "pixel shaders" went from sub-8 bits per component (integer), to 8 bits per component (integer), to more than 8/10 bits per component (integer) and 16-bit floating point per component, to 32-bit integer/float per component and beyond (double-precision floating point in DX11).
I see that it was very gradual. Thanks for the detail.
And yes, each core can do a single mul, add, (4D) dot product, multiply-add, ... per clock. Though that's technically not entirely correct: instructions are pipelined, so results won't be immediately available. If you were to issue a single add, you'd get the result back after, say, four clocks. However, you can issue a new instruction (n) every clock and get the result of instruction (n-4) every clock.
Aha, I see, now I understand that concept differently. Thank you guys a lot.


Now back to those questions, please:
Question 1: Since the GPU handles many vertex position values, these values must relate to fixed values on an axis (i.e., the X, Y or Z axis). Does the GPU need global coordinates that are stored in memory, or does it just process values and then compare them to screen coordinates?

Question 2: Concerning interpolation, suppose we have a big triangle whose 3 vertices have the color red. During color interpolation for the pixels inside that triangle, the pixel in the middle would have a color of faint red (due to the process of interpolation), which is wrong! Wouldn't it just be easier if the vertex color was copied over to all the pixels inside the triangle?


Question 3: What is the "lock step" Rys was referring to? Is it the same thing as pipelining?
 
Question 2: Concerning interpolation, suppose we have a big triangle whose 3 vertices have the color red. During color interpolation for the pixels inside that triangle, the pixel in the middle would have a color of faint red (due to the process of interpolation), which is wrong! Wouldn't it just be easier if the vertex color was copied over to all the pixels inside the triangle?
The pixel colour would be red too.
 
It's just a different way to say "in sync". So all of the units are executing a MUL in the same clock, for different pixels, for example.
Aha, I get it now.
Correct, the latter. NVIDIA abuse the word core when they say CUDA core. On a GT200 for example, with 240 ALUs, 24 to a core (3 sub-blocks of 8, as discussed previously), there are 10 cores. For ATI, each modern thing we'd call a core (usually) has 80 ALUs, so 20 cores for their Cypress chip.
They do abuse these concepts a lot indeed. I was very confused about what was going on until you guys came and enlightened me. I really can't express my feelings right now .. talk about pure scientific excitement!

We can ignore question number 3 then, and move on to numbers 1 and 2: the whereabouts of global coordinates (do they even exist?) and the use of color interpolation, which could give unneeded results.
 
The pixel colour would be red too.
How? Interpolation is a way of finding intermediate points, and .. wait .. if the points are the same, then there wouldn't be a need for intermediate points! That is right! How stupid could I be? I am sorry, you are right, I must have confused it with something else. :???:

Then what about global coordinates, are they necessary? I mean, should they be established first thing before rendering?
 
If I understand you correctly with regards to question one, there's a final projection of the geometry to the screen before it's drawn, which gives you your screen-space xyz to work with during rasterisation and shading. That screen space would be your global coords.

As for question two, I guess the hardware would optimise away the calculation of the interpolant if it's not needed (vertex colour attributes are constant across the triangle in this case), but I'm not actually sure. Hardware guys, does this happen?
 
Question 1: Since the GPU handles many vertex position values, these values must relate to fixed values on an axis (i.e., the X, Y or Z axis). Does the GPU need global coordinates that are stored in memory, or does it just process values and then compare them to screen coordinates?

I'm not sure what exactly you mean here... Everything the GPU needs is hidden in three matrices: "world", "view" and "projection", which get multiplied into one matrix on the CPU, which then gets passed to the GPU, and then the GPU multiplies incoming vertex positions by this matrix. Note that this is the simplest of cases, and even vertex shaders can get VERY complex if you add animation and/or per-vertex lighting into the mix.
However, this is just "boring" linear algebra. :) When vertices come into the pipeline they are in some arbitrary local space (object space) where each object has its origin in its center (for example). And after they get multiplied (transformed) by this matrix they end up in screen space.
Why would some global coordinates be needed?
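The pipeline described above can be sketched with NumPy (all matrix values here are hypothetical; a real application would build the projection from its API or math library):

```python
# Sketch: combine world, view and projection matrices on the CPU, then
# transform a vertex with the single combined matrix, as described above.
import numpy as np

world = np.eye(4)
world[:3, 3] = [2.0, 0.0, 0.0]           # move the object +2 on X

view = np.eye(4)
view[:3, 3] = [0.0, 0.0, -5.0]           # camera 5 units back from origin

near, far = 1.0, 100.0
projection = np.array([                   # simple perspective projection
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, (far + near) / (near - far), 2 * far * near / (near - far)],
    [0.0, 0.0, -1.0, 0.0],
])

wvp = projection @ view @ world           # one combined matrix

vertex = np.array([0.0, 0.0, 0.0, 1.0])   # object-space position (homogeneous)
clip = wvp @ vertex                       # what the vertex stage outputs
ndc = clip[:3] / clip[3]                  # perspective divide
print(ndc)                                # normalised device coordinates
```

The vertex never needs any "global coordinates" stored anywhere: it starts in object space and the one matrix multiply carries it all the way to the screen-oriented space.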

Question 2: Concerning interpolation, suppose we have a big triangle whose 3 vertices have the color red. During color interpolation for the pixels inside that triangle, the pixel in the middle would have a color of faint red (due to the process of interpolation), which is wrong! Wouldn't it just be easier if the vertex color was copied over to all the pixels inside the triangle?
Yeah, but how do you know that all three colors are the same? It would be expensive to check, and since 90% of the time you're really dealing with different values, also redundant. And as was already pointed out, the center would still be red.
The kicker here, however, is that these values (vertex colors, texture coordinates) are not actually interpolated per se. The rasterizer determines how fast the value is changing (the gradient) over a given triangle and simply adds this change to the starting value for each rasterized pixel as it "walks" over the triangle.
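The "walking" idea can be illustrated with a one-scanline sketch (a hypothetical helper, not how any particular chip does it): the per-pixel change is computed once, then each pixel costs only a single add.

```python
# Sketch: incremental attribute interpolation along one scanline.
# The gradient (change per pixel) is computed once per span; the
# rasterizer then just keeps adding it while walking the pixels.
def walk_scanline(attr_left, attr_right, x_left, x_right):
    """Return the attribute value at each pixel on the span."""
    gradient = (attr_right - attr_left) / (x_right - x_left)
    value = attr_left
    out = []
    for _x in range(x_left, x_right):
        out.append(value)
        value += gradient        # one add per pixel, no per-pixel division
    return out

# Red rising from 0.0 to 1.0 across a 4-pixel span:
print(walk_scanline(0.0, 1.0, 0, 4))   # [0.0, 0.25, 0.5, 0.75]
```

If the attribute is constant across the triangle, the gradient is simply zero and every pixel gets the starting value.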
 
@David:
For your "question 2", if you have 3 vertices of the triangle, A, B & C, that have some attribute, say, the red colour component, Ra, Rb & Rc, then at any pixel in the triangle you will have
Code:
Rpixel = w1 * Ra + w2 * Rb + w3 * Rc
where
w1 + w2 + w3 = 1
and all ws are between 0 and 1.
 
The w values, in effect, give the position on the surface of the triangle.
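The formula above can be sketched directly (the weights are the barycentric coordinates of the pixel; the numbers below are made up for illustration):

```python
# Sketch of the formula above: Rpixel = w1*Ra + w2*Rb + w3*Rc,
# where w1 + w2 + w3 = 1 and each w is between 0 and 1.
def interpolate(Ra, Rb, Rc, w1, w2, w3):
    assert 0.0 <= min(w1, w2, w3) and max(w1, w2, w3) <= 1.0
    return w1 * Ra + w2 * Rb + w3 * Rc

# Different vertex reds: a pixel near vertex A is weighted towards Ra.
near_a = interpolate(1.0, 0.0, 0.5, 0.8, 0.1, 0.1)

# All three vertices the same red: since the weights sum to 1, every
# pixel comes out exactly that red, so there is no "faint red" middle.
centre = interpolate(1.0, 1.0, 1.0, 0.25, 0.5, 0.25)
print(centre)   # 1.0
```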
 
If I understand you correctly with regards to question one, there's a final projection of the geometry to the screen before it's drawn, which gives you your screen-space xyz to work with during rasterisation and shading. That screen space would be your global coords.
You did understand me correctly, thanks for the answer. :smile:

Another question:
In your article, http://www.beyond3d.com/content/reviews/51/2, in which you reviewed the GT200 architecture, you mentioned that each cluster contains 3x8 FP32 scalar ALUs and 3x8 FP32 scalar interpolators! .. Are those the Special Function Units?
 
I'm not sure what exactly you mean here... Everything the GPU needs is hidden in three matrices: "world", "view" and "projection", which get multiplied into one matrix on the CPU, which then gets passed to the GPU, and then the GPU multiplies incoming vertex positions by this matrix. Note that this is the simplest of cases, and even vertex shaders can get VERY complex if you add animation and/or per-vertex lighting into the mix.
So the global matrix (supposedly) is a combination of 3 things: world, view and projection, is that right?
When vertices come into the pipeline they are in some arbitrary local space (object space) where each object has its origin in its center (for example). And after they get multiplied (transformed) by this matrix they end up in screen space.
So each object has its own coordinates, and after being transformed, it ends up in screen space, which we could call the global coordinates, right?

If so, then are the screen coordinates the same as projection coordinates?
The kicker here, however, is that these values (vertex colors, texture coordinates) are not actually interpolated per se. The rasterizer determines how fast the value is changing (the gradient) over a given triangle and simply adds this change to the starting value for each rasterized pixel as it "walks" over the triangle.
I am sorry, but could you please elaborate more on this subject? I didn't get it. :cry:
 
@David:
For your "question 2", if you have 3 vertices of the triangle, A, B & C, that have some attribute, say, the red colour component, Ra, Rb & Rc, then at any pixel in the triangle you will have
Code:
Rpixel = w1 * Ra + w2 * Rb + w3 * Rc
And since Ra = Rb = Rc, then:

Rpixel = (w1 + w2 + w3) * R.
But I didn't get what that is supposed to do? What is the relation between the position on the surface of the triangle (the w values) and the pixel color?
 
And since Ra = Rb = Rc, then:

Rpixel = (w1 + w2 + w3) * R.
But I didn't get what that is supposed to do? What is the relation between the position on the surface of the triangle (the w values) and the pixel color?

Ra=Rb=Rc is a very special case. It is far more common to have all three different.
 
I guess there are two main reasons:

1. Apps rarely draw triangles in isolation - they use triangles to model real world objects which usually have varying colors across their surfaces, and those varying colors usually result in the vertices of each triangle being different as well.

2. Even if you were drawing a "solid red thing", lighting effects (performed in the vertex shader) would frequently result in having different brightness levels for each vertex.

If I may offer (repeat) some advice, you'll have a much easier time understanding the hardware once you have a bit more familiarity with the software. Picking up a decent book on OpenGL and actually running some simple programs on your PC will make all this (especially the coordinate space stuff) seem a lot more clear.
 
I guess there are two main reasons:

1. Apps rarely draw triangles in isolation - they use triangles to model real world objects which usually have varying colors across their surfaces, and those varying colors usually result in the vertices of each triangle being different as well.

2. Even if you were drawing a "solid red thing", lighting effects (performed in the vertex shader) would frequently result in having different brightness levels for each vertex.
Thanks, I think I can settle for this explanation.

If I may offer (repeat) some advice, you'll have a much easier time understanding the hardware once you have a bit more familiarity with the software. Picking up a decent book on OpenGL and actually running some simple programs on your PC will make all this (especially the coordinate space stuff) seem a lot more clear.
Advice appreciated :smile:, I am willing to do just that, but I am under the impression that being good at software (designing and writing code) doesn't necessarily mean understanding things, and by that I mean the deep gut-level understanding which gives you the ability to visualize what is going on in your mind.
 
If I understand you correctly with regards to question one, there's a final projection of the geometry to the screen before it's drawn, which gives you your screen-space xyz to work with during rasterisation and shading. That screen space would be your global coords.
Personally, I would not describe screen space as being the same as global coordinates. To me, global coordinates are used to assemble all the models etc into the same coordinate space and for doing, say, lighting calculations and perhaps collision detection.

Screen coordinates are in pixel dimensions and are used for rasterisation.
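To make the distinction concrete, here is a sketch of the viewport transform that takes post-projection coordinates (NDC, ranging -1..1) to pixel-dimension screen coordinates. Conventions vary by API; this sketch assumes a top-left screen origin:

```python
# Sketch: NDC (-1..1 on each axis) to screen/pixel coordinates.
def ndc_to_screen(x_ndc, y_ndc, width, height):
    x = (x_ndc + 1.0) * 0.5 * width
    y = (1.0 - y_ndc) * 0.5 * height   # flip Y: screen origin is top-left
    return x, y

# The centre of NDC space lands in the middle of a 1280x720 screen:
print(ndc_to_screen(0.0, 0.0, 1280, 720))   # (640.0, 360.0)
```

So "global" (world) coordinates, clip/NDC coordinates and screen coordinates are three distinct spaces, and only the last one is in pixel units.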
 