Shading Instructions and Rasterizer!

Hi guys, first I would like to tell you how much I appreciate this forum; it is full of experts and a good chunk of info.

I am an amateur in the computer and 3D world and I am interested in both, so I have a few mysterious questions I would like answered. Most of them are targeted at the very basics:

1 - The shading instruction, or any kind of instruction for that matter, is just a Copy or Move or Go To instruction? From what I know, computer instructions are no more than this; they are called instructions because they redirect memory information to specific locations to get it processed, for example if the data is to be added the instruction is called 'Add', or to get it saved in another memory location (which would be called a 'Store' instruction). Am I right?

2 - In the graphics pipeline I see a piece of hardware called the Rasterizer! From the very little that I know, Vertex Setup happens first, where the GPU receives a good chunk of vertices and the shader cores do matrix calculations on them, and then it starts interpolating colors in the process known as Rasterization, which -I assume- is handled by that Rasterizer. The thing here is that this bugger is only capable of rasterizing one triangle per clock cycle, which is very low!

My questions here:
- Does rasterizing occur after Vertex Setup or before it?
- Rasterizing a triangle means interpolating colors between its vertices to generate pixels .. right?
- If so, then the very low output of the Rasterizer will prevent the use of many shaders, as we will have a limited number of pixels to perform lighting calculations on in the shader cores ..

3 - About the Texture Mapping Units: I understand they are a different kind of hardware, completely different from shader cores. I also know that they are composed of a texture filtering unit and hardware for fetching addresses, and I also understand that they do their filtering by means of interpolation. So the components of a TMU are a hardware interpolator and an address fetch unit?

Then what is the difference between them and the Special Function Units? They are able to do interpolation too, so why not share resources between the two?

I am sorry if my questions are too long. I would appreciate getting the answers, and as a general rule I would really like the answers to start simple at first and then build up from that.

Thanks a ton!
 
1. I don't know exactly what you mean, but instructions are divided into several groups: arithmetic, input/output and control flow. Since we're talking shading, an example of an IO instruction would be a texture fetch. Note that the graphics pipeline currently doesn't really allow you to arbitrarily output stuff to memory.
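
To make those three groups a bit more concrete, here is a rough sketch in plain C++ rather than a real GPU instruction set; the little array standing in for a texture is purely made up for illustration:
Code:
#include <cstdio>

// Illustrative stand-in for texture memory (not a real GPU resource).
static const float fake_texture[4] = {0.1f, 0.4f, 0.7f, 1.0f};

float tiny_shader(int texel_index, float light, float ambient) {
    // IO: a load from memory, playing the role of a texture fetch.
    float color = fake_texture[texel_index];

    // Arithmetic: a multiply and an add (the bread and butter of shading).
    float shaded = color * light + ambient;

    // Control flow: a branch, e.g. clamping the result.
    if (shaded > 1.0f)
        shaded = 1.0f;
    return shaded;
}

int main() {
    printf("%f\n", tiny_shader(2, 1.5f, 0.05f));
    return 0;
}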

2. I think that by "vertex setup" you actually mean triangle setup, right? Vertex setup would IMO describe loading vertices onto the GPU, which however is part of the pipeline called input assembly. So triangle setup is actually part of the rasterization process, and it is actually the first thing that happens. Generally, triangle setup will first generate a list of pixels which fall inside a given triangle, and for those pixels texture coordinates and such will be interpolated and pixel shaders executed.
One triangle per clock might seem low; however, you should also realize that current high-end hardware is able to draw 10M triangles per frame (and reject even more) at fairly high frame rates. And since triangles are usually larger than one pixel, you'll be able to feed quite a few (pixel) shader cores.
To rasterize more than one triangle per clock you would need to figure out what to do when two triangles that are being rasterized in parallel overlap.
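
Just as an illustration of the "generate a list of pixels which fall inside a given triangle" step, here is a toy C++ edge-function test over a small pixel grid. Real hardware works on tiles, in fixed point, and with proper fill rules, so treat this as a sketch only:
Code:
#include <cstdio>

// Signed area of the triangle (a, b, c); its sign tells us which side of
// edge ab the point c lies on.
float edge(float ax, float ay, float bx, float by, float cx, float cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}

int main() {
    // A small triangle in pixel coordinates (counter-clockwise).
    float x0 = 1, y0 = 1, x1 = 8, y1 = 2, x2 = 3, y2 = 7;

    // Walk a 10x10 pixel grid and list the pixels whose centers are inside.
    for (int py = 0; py < 10; ++py) {
        for (int px = 0; px < 10; ++px) {
            float cx = px + 0.5f, cy = py + 0.5f;  // pixel center
            float w0 = edge(x1, y1, x2, y2, cx, cy);
            float w1 = edge(x2, y2, x0, y0, cx, cy);
            float w2 = edge(x0, y0, x1, y1, cx, cy);
            // Inside if the pixel center is on the same side of all three edges.
            if (w0 >= 0 && w1 >= 0 && w2 >= 0)
                printf("pixel (%d, %d) is covered\n", px, py);
        }
    }
    return 0;
}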

3. There are two kinds of interpolation happening in the TMU. One is texture coordinate interpolation; this part is already done in the general purpose shader core on the ATI DX11 generation of GPUs. Filtering is the second use of interpolation, but this one still has dedicated hardware. The reason for this is that it rarely needs full precision and is very critical to performance.
And a special function unit is more or less a TMU.
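
For a feel of what the filtering interpolation does, here is a software bilinear filter sketch in C++ (single-channel texture, clamp addressing, made-up data); dedicated TMU hardware also deals with formats, mipmaps, anisotropy and lower-precision arithmetic:
Code:
#include <cmath>
#include <cstdio>

// A tiny 4x4 single-channel "texture", just for illustration.
static const float tex[4][4] = {
    {0.0f, 0.1f, 0.2f, 0.3f},
    {0.1f, 0.2f, 0.3f, 0.4f},
    {0.2f, 0.3f, 0.4f, 0.5f},
    {0.3f, 0.4f, 0.5f, 1.0f},
};

static float texel(int x, int y) {
    // Clamp-to-edge addressing.
    if (x < 0) x = 0; if (x > 3) x = 3;
    if (y < 0) y = 0; if (y > 3) y = 3;
    return tex[y][x];
}

// Bilinear filter: blend the four nearest texels with weights given by the
// fractional part of the sample position.
float sample_bilinear(float u, float v) {
    float x = u * 4.0f - 0.5f;   // convert normalized coords to texel space
    float y = v * 4.0f - 0.5f;
    int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
    float fx = x - x0, fy = y - y0;

    float top    = texel(x0, y0)     * (1 - fx) + texel(x0 + 1, y0)     * fx;
    float bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx;
    return top * (1 - fy) + bottom * fy;
}

int main() {
    printf("%f\n", sample_bilinear(0.6f, 0.6f));  // somewhere between texels
    return 0;
}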
 
DavidGraham, if you download the R6xx_R7xx_3D.pdf document from http://www.x.org/docs/AMD/ that might give you a better picture of the overall flow. Look at the picture on page 8 and read page 7 as a start.

The process starts when the driver sends a list of vertices and information about how those vertices should be grouped to form triangles. The connectivity information can either be implicit (e.g. the driver sends groups of three vertices, each forming a triangle) or explicit (the driver sends a list of vertices plus a separate list of indices saying, for example, "the first triangle uses vertex #3, #9 and #5").
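
A toy C++ example of the two forms, with made-up vertex data, might look like this:
Code:
#include <cstdio>

struct Vertex { float x, y, z; };

int main() {
    // Implicit connectivity: every consecutive group of three vertices
    // forms one triangle (a "triangle list" with no indices).
    Vertex tri_list[6] = {
        {0, 0, 0}, {1, 0, 0}, {0, 1, 0},   // triangle 0
        {1, 0, 0}, {1, 1, 0}, {0, 1, 0},   // triangle 1 (repeats two vertices)
    };

    // Explicit connectivity: a shared vertex pool plus an index list that
    // says which vertices form each triangle, so vertices can be reused.
    Vertex pool[4] = { {0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {1, 1, 0} };
    unsigned indices[6] = { 0, 1, 2,   1, 3, 2 };

    for (int t = 0; t < 2; ++t) {
        const Vertex& a = pool[indices[t * 3 + 0]];
        const Vertex& b = pool[indices[t * 3 + 1]];
        const Vertex& c = pool[indices[t * 3 + 2]];
        printf("indexed triangle %d: (%g,%g) (%g,%g) (%g,%g)\n",
               t, a.x, a.y, b.x, b.y, c.x, c.y);
    }
    (void)tri_list;  // unused beyond illustration
    return 0;
}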

There are normally at least two different shader programs loaded into the GPU for each drawing operation - a vertex shader and a pixel shader.

The GPU runs a copy of the vertex shader program on each vertex independently, although many copies of the program run at the same time to get decent performance. The vertex shader program usually includes a lot of vector multiplies & adds to perform transformation, scaling, lighting etc...

The output of the vertex shader program is usually position information plus a heap of vertex attributes. The GPU then combines the processed vertices with the original connectivity information to form primitives (this step is called Primitive Assembly) and then breaks those primitives (usually triangles) up into groups of pixels for further processing. This is what the diagram on page 8 calls Scan Conversion.

The next step is to interpolate vertex parameters across the triangle (dedicated hardware on 6xx/7xx, done in the shader core on Evergreen) and then run a copy of the pixel shader program on each pixel, giving each copy of the shader program the appropriate interpolated values including texture addresses for each active texture along with depth information.
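
A minimal sketch of that interpolation step in C++, assuming simple screen-space barycentric weights (ignoring perspective correction): each per-vertex attribute is blended with the pixel's weights before the pixel shader sees it.
Code:
#include <cstdio>

// Signed area of triangle (a, b, c); used to build barycentric weights.
float edge(float ax, float ay, float bx, float by, float cx, float cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}

int main() {
    // Triangle vertices in screen space, each with one attribute (say, a
    // texture coordinate u) to keep the example small.
    float x0 = 1, y0 = 1, u0 = 0.0f;
    float x1 = 8, y1 = 2, u1 = 1.0f;
    float x2 = 3, y2 = 7, u2 = 0.5f;

    float px = 4.0f, py = 3.0f;  // some pixel center inside the triangle

    // Barycentric weights: sub-triangle areas divided by the full area.
    float area = edge(x0, y0, x1, y1, x2, y2);
    float w0 = edge(x1, y1, x2, y2, px, py) / area;
    float w1 = edge(x2, y2, x0, y0, px, py) / area;
    float w2 = edge(x0, y0, x1, y1, px, py) / area;

    // Interpolated attribute handed to the pixel shader for this pixel.
    float u = w0 * u0 + w1 * u1 + w2 * u2;
    printf("interpolated u at (%g, %g) = %f\n", px, py, u);
    return 0;
}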

The pixel shader program then runs instructions like texture fetches and various math operations and outputs the result to the rest of the hardware pipeline (DB/CB on a 6xx/7xx part).
 
I don't know exactly what you mean, but instructions are divided into several groups: arithmetic, input/output and control flow. Since we're talking shading, an example of an IO instruction would be a texture fetch. Note that the graphics pipeline currently doesn't really allow you to arbitrarily output stuff to memory.

I got it now:
- Arithmetic is like Add or Multiply, which means the hardware does the actual processing on the data.
- Input/Output is like Load/Store, which means data is copied to a memory location from a previous location; the locations could be of any kind: RAM, cache, registers.
- Control flow is like branching and conditional jumps, and also interrupting and halting execution.

Right?

3. There are two kinds of interpolation happening in the TMU. One is texture coordinate interpolation; this part is already done in the general purpose shader core on the ATI DX11 generation of GPUs. Filtering is the second use of interpolation, but this one still has dedicated hardware. The reason for this is that it rarely needs full precision and is very critical to performance.
And a special function unit is more or less a TMU.

Then what is the purpose of texture coordinate interpolation? Isn't it a form of filtering too?

I know that filtering involves bilinear or trilinear interpolation, or even linear; its purpose is to scale textures to fit the variable sizes of 3D objects. It is my understanding that when interpolation is done, it works on both coordinates and colors.

If so, then why is texture coordinate interpolation done? To be specific, why is it a separate process and for what purpose?
 
The output of the vertex shader program is usually position information plus a heap of vertex attributes. The GPU then combines the processed vertices with the original connectivity information to form primitives (this step is called Primitive Assembly) and then breaks those primitives (usually triangles) up into groups of pixels for further processing. This is what the diagram on page 8 calls Scan Conversion.

Since you mentioned vertex attributes and position, I would like to know the reason for using the extra w parameter. I understand that a location can be specified by X, Y and Z values, so why add a fourth one?

Also concerning lighting: what is the most common method used nowadays, vertex or pixel lighting? By that I mean whether the GPU performs lighting on vertices or pixels, or both.

The process of lighting itself is a little confusing; it involves something called a normal. What is a normal exactly? By definition it means a perpendicular vector, but what really annoys me here is that vectors are a hard concept to grasp, not from the mathematical view but from the hardware view. I know vertices are stored in a 4-way system inside the memory, using 32 bits or more; they go through the shader cores and undergo several processing steps that change the 4-way values into new ones, which in turn change the vertex screen position and attributes.

If so, how does a vector fit into the previous description, and how does all of this relate to the concept of a "normal"?

The pixel shader program then runs instructions like texture fetches and various math operations and outputs the result to the rest of the hardware pipeline (DB/CB on a 6xx/7xx part).
You summed this part up perfectly, thanks a lot :).
 
DavidGraham said:
I understand that a location can be specified by X, Y and Z values, so why add a fourth one?
Surprisingly, it makes the math easier(!). It lets you express most transforms using simple matrix multiplication. 4 is also a much nicer number for computers than 3 is.

Also concerning lighting: what is the most common method used nowadays, vertex or pixel lighting? By that I mean whether the GPU performs lighting on vertices or pixels, or both.
Usually, there's a combination of both being used: the VS computes part of the lighting equation, and the PS/FS evaluates it per-fragment.

it involves something called a normal. What is a normal exactly?
The object rendered rarely matches the polygonal representation. For example, to draw a sphere, you'd render a lot of little triangles. However, lots of little triangles are not the same as a sphere, and the lighting would be all wrong if you lit those little triangles as-is. It would look like those cheap Christmas ornaments.

So in addition to representing the sphere with a bunch of little triangles, surface normals are provided. However, the normals are computed such that they would really be normals from the original sphere model, and not just the mesh of triangles. With the real normal information, lighting for the sphere can be computed much more accurately.

Note that normals are not always generated from the original geometry. Sometimes they are generated from the triangle meshes themselves via some weighted averaging of the triangles' actual normals. This is not how normals should be computed, but the differences are usually very hard to spot.
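
A rough sketch of that fallback in C++, using plain unweighted averaging of the adjacent face normals (one of several possible weightings):
Code:
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 add(Vec3 a, Vec3 b)   { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Face normal of a triangle: cross product of two edges.
Vec3 face_normal(Vec3 a, Vec3 b, Vec3 c) {
    return normalize(cross(sub(b, a), sub(c, a)));
}

int main() {
    // Two triangles sharing an edge, like a very coarse piece of a surface.
    Vec3 v0 = {0, 0, 0}, v1 = {1, 0, 0}, v2 = {0, 1, 0}, v3 = {1, 1, 0.5f};

    Vec3 n_a = face_normal(v0, v1, v2);  // flat triangle
    Vec3 n_b = face_normal(v1, v3, v2);  // tilted triangle

    // Vertex normal for the shared vertices: average (then renormalize) the
    // normals of the faces that touch them.
    Vec3 shared = normalize(add(n_a, n_b));
    printf("averaged normal: %f %f %f\n", shared.x, shared.y, shared.z);
    return 0;
}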
 
One of your questions was about how vectors are handled; they just get represented as a set of 3 or 4 numbers as well. The shader program or hardware knows the context of each set of numbers it gets, so it can treat the numbers as part of a vector, part of a vertex, part of a pixel, or just a number as appropriate.

If you think of a modern GPU as "something that does math really fast on short vectors", plus some dedicated hardware for things like scan conversion/rasterization and texture filtering, where shader programs provide a lot of the graphics-specific behaviour, that gives you a pretty good idea of how they work and why things like GPGPU are interesting.

re: texture coordinate interpolation vs filtering, the interpolation of texture coordinates is what lets each pixel pick up a different value from the texture. The application usually provides texture coordinates for each vertex, then the hardware interpolates those values and provides a different set of coordinates for each pixel to the pixel shader program. That is an example of how identical copies of a pixel shader program can produce different results for each pixel.
 
Since you mentioned vertex attributes and position, I would like to know the reason for using the extra w parameter. I understand that a location can be specified by X, Y and Z values, so why add a fourth one?

Surprisingly, it makes the math easier(!). It lets you express most transforms using simple matrix multiplication. 4 is also a much nicer number for computers than 3 is.

"4 is a much nice number..." has got to be the worst explanation ever.:???: (The sphere as a triangle mesh explanation was good though)

@David: The 4-dimensional vectors are there because the system uses "homogeneous coordinates". (Hmm, that link was perhaps a bit too heavy - maybe try here for a nicer intro.) You'll encounter them in a standard course on Linear Algebra.

In essence, these allow you to combine all the many arbitrary rotations, scales and translations needed to animate objects into a single transformation matrix. They also allow you to do most of the operations needed for the 3D-to-2D projection efficiently.
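
A small C++ sketch of why the fourth component helps, assuming the usual convention of w = 1 for positions: a rotation and a translation (which is not a linear operation on plain 3D vectors) fold into one 4x4 matrix-vector multiply.
Code:
#include <cmath>
#include <cstdio>

// out = m * v for a 4x4 matrix and a 4-component (homogeneous) vector.
void mul(const float m[4][4], const float v[4], float out[4]) {
    for (int r = 0; r < 4; ++r)
        out[r] = m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] + m[r][3] * v[3];
}

int main() {
    float angle = 3.14159265f / 2.0f;  // 90 degrees about the Z axis
    float c = std::cos(angle), s = std::sin(angle);

    // Rotation and translation folded into a single 4x4 matrix; the
    // translation lives in the last column and only takes effect because
    // the position carries w = 1.
    float rotate_then_translate[4][4] = {
        { c, -s, 0, 5 },   // also translate +5 in X
        { s,  c, 0, 0 },
        { 0,  0, 1, 0 },
        { 0,  0, 0, 1 },
    };

    float position[4] = { 1, 0, 0, 1 };  // a point at (1, 0, 0), w = 1
    float result[4];
    mul(rotate_then_translate, position, result);

    // Expected: the point rotates to (0, 1, 0) and then shifts to (5, 1, 0).
    printf("%f %f %f %f\n", result[0], result[1], result[2], result[3]);
    return 0;
}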
 
Worst ever? Oh please, you give me too much credit. I can come up with a lot of worse ones.
 
Usually, there's a combination of both being used: the VS computes part of the lighting equation, and the PS/FS evaluates it per-fragment.
You mean fragment shader? What is the difference between it and a pixel shader?
The object rendered rarely matches the polygonal representation. For example, to draw a sphere, you'd render a lot of little triangles. However, lots of little triangles are not the same as a sphere, and the lighting would be all wrong if you lit those little triangles as-is. It would look like those cheap Christmas ornaments.

So in addition to representing the sphere with a bunch of little triangles, surface normals are provided. However, the normals are computed such that they would really be normals from the original sphere model, and not just the mesh of triangles. With the real normal information, lighting for the sphere can be computed much more accurately.
Yes! You mean the difference between flat shading and Phong shading, right?
 
One of your questions was about how vectors are handled; they just get represented as a set of 3 or 4 numbers as well. The shader program or hardware knows the context of each set of numbers it gets, so it can treat the numbers as part of a vector, part of a vertex, part of a pixel, or just a number as appropriate.
You nailed it man, such a relief; I can finally rest knowing that vectors have the same representation as vertices. :D

Question: since the GPU handles many vertex position values, these values must relate to fixed values on an axis (i.e. the X, Y or Z axis). Does the GPU need global coordinates that are stored in memory, or does it just process values and then compare them to screen coordinates?

re: texture coordinate interpolation vs filtering, the interpolation of texture coordinates is what lets each pixel pick up a different value from the texture. The application usually provides texture coordinates for each vertex, then the hardware interpolates those values and provides a different set of coordinates for each pixel to the pixel shader program. That is an example of how identical copies of a pixel shader program can produce different results for each pixel.
I get it now :smile:

Another question concerning interpolation: suppose we have a big triangle whose 3 vertices have the color red. During color interpolation for the pixels inside that triangle, the pixel in the middle would have a color of faint red (due to the process of interpolation), which is wrong! Wouldn't it just be easier if the vertices' color was copied over to all the pixels inside the triangle?
 
@David: The 4-dimensional vectors are there because the system uses "homogeneous coordinates". (Hmm, that link was perhaps a bit too heavy - maybe try here for a nicer intro.) You'll encounter them in a standard course on Linear Algebra.
I tried to grasp the concept but failed :cry: , it must be due to my limited knowledge of mathematics (I am a biology student - and yes, I am on the organic side, not the synthetic one :p) .. but all I could understand is that homogeneous coordinates help alleviate the problem of vectors that have the value of zero (which could mess up matrix multiplication operations) by introducing a fourth axis (w) which always has a value bigger than 0.
 
You mean fragment shader? What is the difference between it and a pixel shader?

The difference between a skedyule and shidule.

IOW, DX calls this same thing a "pixel shader" and OGL calls it a "fragment shader".

Yes! You mean the difference between flat shading and Phong shading, right?

Sort of, but not really.

Flat shading and Phong shading are shading models. Shading models tell you how to calculate the color of pixels/fragments. What was alluded to in that post was how doing more math can give you more accurate visuals. You can also have more accurate models applied more coarsely (say, once per vertex instead of once per pixel), giving less accurate results overall.
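
To make that coarseness point concrete, here is a toy C++ comparison with a simple Lambert (N dot L) term and made-up normals: lighting evaluated once per vertex and then interpolated, versus the normal interpolated and lighting evaluated per pixel.
Code:
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}
// A simple diffuse (Lambert) term: clamp(N . L, 0, 1).
float lambert(Vec3 n, Vec3 l) { return std::max(0.0f, dot(normalize(n), l)); }

int main() {
    Vec3 light = {0, 0, 1};                         // light shining along +Z
    Vec3 n0 = {1, 0, 0.2f}, n1 = {-1, 0, 0.2f};     // two vertex normals

    // Pretend this pixel sits halfway between the two vertices.
    float t = 0.5f;

    // Per-vertex: light at each vertex, then interpolate the *result*.
    float per_vertex = (1 - t) * lambert(n0, light) + t * lambert(n1, light);

    // Per-pixel: interpolate the *normal*, then light at the pixel.
    Vec3 n = { (1 - t) * n0.x + t * n1.x,
               (1 - t) * n0.y + t * n1.y,
               (1 - t) * n0.z + t * n1.z };
    float per_pixel = lambert(n, light);

    // The two disagree because lighting is not a linear function of the normal.
    printf("per-vertex: %f   per-pixel: %f\n", per_vertex, per_pixel);
    return 0;
}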
 
I tried to grasp the concept but failed :cry: , it must be due to my limited knowledge of mathematics (I am a biology student - and yes, I am on the organic side, not the synthetic one :p) .. but all I could understand is that homogeneous coordinates help alleviate the problem of vectors that have the value of zero (which could mess up matrix multiplication operations) by introducing a fourth axis (w) which always has a value bigger than 0.

It seems you are trying to have a go at both 3D hardware and 3D software/rendering at the same time. A commendable goal, indeed. Assuming you are equally interested in both, I'd however suggest understanding 3D rendering/software/APIs first, as then you will be able to understand 3D hardware and, more importantly, the "why" of 3D hardware.

The point of going to 4D vectors to represent 3D data is so that

1) 3D matrix-vector multiplication (e.g. for rotation), 3D vector addition (e.g. for translation) and perspective correction (i.e. farther objects looking smaller) can be combined into a single 4D matrix-vector multiplication,

2) this combination is useful because doing one 'generic' thing many times is often simpler in hardware than doing many specialized things.

Initially, there was no 3D hardware and all this was done in highly optimized software. Slowly, more involved bits of the overall process began to migrate to hardware (aka the evolution of GPUs) to increase overall speed. This process was complete by the time of the Geforce 256, when it became possible to implement (almost?) the entire graphics pipeline in hardware.

At this time GPUs could do one and only one thing: 3D graphics. And on top of that, they could pretty much do it in only one way (i.e. they were fixed function).

From this point onwards, GPUs started becoming more flexible, allowing programmers to specify certain sequences of instructions (sugarcoated in nice shading languages like GLSL and HLSL) and dropping the older fixed-function hardware. This process was complete by the time of the Geforce 6 series, which did all the "present day shading work" in software, but this time on the GPU.

HTH.
 
The difference between a skedyule and shidule.
IOW, DX calls this same thing a "pixel shader" and OGL calls it a "fragment shader".

Thanks for the clarification.

Flat shading and Phong shading are shading models. Shading models tell you how to calculate the color of pixels/fragments. What was alluded to in that post was how doing more math can give you more accurate visuals. You can also have more accurate models applied more coarsely (say, once per vertex instead of once per pixel), giving less accurate results overall.
Aha! I see.

1) 3D matrix-vector multiplication (e.g. for rotation), 3D vector addition (e.g. for translation) and perspective correction (i.e. farther objects looking smaller) can be combined into a single 4D matrix-vector multiplication,

2) this combination is useful because doing one 'generic' thing many times is often simpler in hardware than doing many specialized things.
That was truly helpful .. thanks man.
So it is just a mathematical way to combine different operations .. now I get it.

From this point onwards, GPUs started becoming more flexible, allowing programmers to specify certain sequences of instructions (sugarcoated in nice shading languages like GLSL and HLSL) and dropping the older fixed-function hardware. This process was complete by the time of the Geforce 6 series, which did all the "present day shading work" in software, but this time on the GPU.

So those modern shader cores are more flexible than the old fixed-function vertex and pixel processors?

I always thought that the old pixel processors were small multipliers and the vertex processors were small adders .. is that true?

If not, then how come those modern shader cores have the ability to do multiply and add operations at the same time? Is that possible in hardware? (I mean having small processors that contain adders and multipliers on the same die.) If so, then how would this "hardware" know whether it needs to do multiply or add operations?

Also, could you have a look at these:

Question: since the GPU handles many vertex position values, these values must relate to fixed values on an axis (i.e. the X, Y or Z axis). Does the GPU need global coordinates that are stored in memory, or does it just process values and then compare them to screen coordinates?

Another question concerning interpolation: suppose we have a big triangle whose 3 vertices have the color red. During color interpolation for the pixels inside that triangle, the pixel in the middle would have a color of faint red (due to the process of interpolation), which is wrong! Wouldn't it just be easier if the vertices' color was copied over to all the pixels inside the triangle?

I know I am asking a lot, thanks in advance ..
 
The difference between a skedyule and shidule.
IOW, DX calls this same thing a "pixel shader" and OGL calls it a "fragment shader".
Technically there's a difference, since the rasterised triangle primitive might not cover the entire pixel. It's potentially a fragment, hence the name.
 
So those modern shader cores are more flexible than the old fixed-function vertex and pixel processors?

I always thought that the old pixel processors were small multipliers and the vertex processors were small adders .. is that true?

If not, then how come those modern shader cores have the ability to do multiply and add operations at the same time? Is that possible in hardware? (I mean having small processors that contain adders and multipliers on the same die.)
No... To do a vertex transform you need both multipliers and adders (well, dot products). You also need those for pixel shaders/multitexturing. The difference in old hardware was that pixel shaders/multitexturing could use lower precision, while vertex shaders had to be 32-bit floating point even back then.
Another advantage of today's hardware is that it has unified shaders, which means that the hardware can assign shader cores to vertex shading, pixel shading, ... depending on the load.

If so, then how would this "hardware" know whether it needs to do multiply or add operations?
Well, that depends on what the program/shader says it wants.
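
As a small illustration in C++: a 4-component dot product, the workhorse of vertex transforms, is just a chain of multiply-adds, and it is the compiled shader's instruction stream that tells the ALU which operation to run each step:
Code:
#include <cstdio>

// A 4-component dot product written out as explicit multiply-add steps,
// the way a shader compiler might schedule it onto MAD/FMA units.
float dot4(const float a[4], const float b[4]) {
    float acc = a[0] * b[0];        // MUL
    acc = a[1] * b[1] + acc;        // MAD (multiply-add)
    acc = a[2] * b[2] + acc;        // MAD
    acc = a[3] * b[3] + acc;        // MAD
    return acc;
}

int main() {
    // One row of a transform matrix dotted with a homogeneous position:
    // exactly what a vertex transform does four times per vertex.
    float row[4] = { 1, 0, 0, 5 };
    float pos[4] = { 2, 3, 4, 1 };
    printf("%f\n", dot4(row, pos));  // 1*2 + 0*3 + 0*4 + 5*1 = 7
    return 0;
}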
 
No... To do a vertex transform you need both multipliers and adders (well, dot products). You also need those for pixel shaders/multitexturing. The difference in old hardware was that pixel shaders/multitexturing could use lower precision, while vertex shaders had to be 32-bit floating point even back then.
I see; by lower precision you mean 16-bit?

There is something itching me though .. can each core do a single MUL or ADD per clock, or is it the whole block/sub-block of cores?
 
Depends what you call a core. On a current generation NV GPU, a core is made up of three blocks, each one running the same instruction on a set of (eight) pixels in a given clock, in lock step. But that's three potentially different instructions in a clock per core there. Previous unified hardware was two; Fermi is two.

For ATI, the core (ATI call it a SIMD) will process 16 pixels per clock, with potentially 5 different instructions per pixel in that clock, but the same 5 instructions for each pixel, in lock step.

As for lower precision, yes, FP16 and FP24 have variously been the common floating point precisions for pixel shading in programmable hardware in recent years, with FP32 mandatory in recent generations. There are various other precisions at work across the chip depending on what's being computed (both very high and quite low), but the generally programmable logic in a modern unified architecture is heavily 32-bit, integer or float.
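
As a rough software analogy of that lock-step behaviour, here is a C++ sketch: one instruction sequence applied to a batch of 16 pixels, where the per-pixel data differs but the control is identical (real hardware also handles divergence with masks, which this ignores):
Code:
#include <cstdio>

const int WIDTH = 16;  // pixels processed together, in lock step

int main() {
    // Per-pixel inputs differ...
    float color[WIDTH], light[WIDTH];
    for (int i = 0; i < WIDTH; ++i) {
        color[i] = i / 15.0f;
        light[i] = 0.5f;
    }

    // ...but every pixel in the batch executes the same instruction at the
    // same time. Each loop below stands in for one SIMD instruction issued
    // across all 16 lanes.
    float result[WIDTH];
    for (int i = 0; i < WIDTH; ++i)  // "MUL" across the batch
        result[i] = color[i] * light[i];
    for (int i = 0; i < WIDTH; ++i)  // "ADD" across the batch
        result[i] = result[i] + 0.1f;

    for (int i = 0; i < WIDTH; ++i)
        printf("lane %2d -> %f\n", i, result[i]);
    return 0;
}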
 