Ok. Part 2.
Bigger is better.
There are different approaches to rendering 3D scenes. You have a whole collection of objects and rules; you need to put them in the right place, transform them and render them. That is called scene management. And when you have assembled your scene, you send it all to the graphics chip, together with all the rules, then sit back and watch the show. Right?
Well, that would be the ultimate goal. But it isn't as easy as that. For starters, while DirectX 9 class graphics chips can do quite a lot of generic processing, they can't do it all, and there is still no easy way to apply things like physics rules or to start the appropriate sound at the right place and moment. Further, the memory needed to contain all that is limited. More on that later.
If we look at the functions that can be executed by the GPU (Graphics Processing Unit, the chip), we see that there isn't a chip in existence that can do everything in the specs of the API (Application Programming Interface, the thing programmers use to make it do what they want). So you have to mix and match functions according to what the current hardware offers to get the result you want.
There are basically two different ways to handle that. The first, as used by DirectX, is to flag what can be done by the hardware and discard the rest. The other, as used by OpenGL, is to emulate everything the hardware cannot do in software. Both have their pros and cons: while just about anything will run on OpenGL, it might be VERY slow if most of it is emulated by the CPU (Central Processing Unit, the processor on your motherboard). With DirectX, it will all run fast, but it might look totally different from what you expected.
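To make that concrete: on the DirectX side you ask the driver up front what the chip can do and pick a code path accordingly. A minimal sketch in C++ (the function name and the fallback policy are mine, but GetDeviceCaps and the caps structure are the real DirectX 9 API):

    #include <d3d9.h>

    // Ask the driver for the hardware capabilities and check whether
    // the chip supports pixel shader model 2.0; if not, the renderer
    // should fall back to a simpler technique instead of just hoping.
    bool SupportsPixelShader2(IDirect3D9* d3d)
    {
        D3DCAPS9 caps;
        if (FAILED(d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps)))
            return false;
        return caps.PixelShaderVersion >= D3DPS_VERSION(2, 0);
    }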
Even so, a lot of work isn't done by the GPU as expected, but by the drivers, which use the CPU to do their work. And some things might run on the GPU, but too slowly to be of any practical use.
Which brings us to the central point: size matters.
Now that we have a basic platform that can do just about anything you want with the current generation of graphics chips, you want it done fast, so that at least 30 fps are shown. That leaves a budget of roughly 33 milliseconds per frame.
How much memory do we want the GPU to have? As much as possible? Sure! But that memory is only there, right next to the GPU, so that it can be accessed faster than the memory on your motherboard. If we could get at that memory fast enough, we wouldn't need any memory next to the GPU at all. Speed is everything.
If we create a very nice and huge 3D world, we would like to hand it all to the GPU and let it sort out the rendering. But it might be too much to fit into its memory, and it would take way too long to render a frame. So we need to clip it. That means we remove everything that isn't visible, and what's left is sent to the GPU and rendered.
To do that, we need a way to store the locations of all the objects and determine which parts are visible: a scene graph. Which is a pretty hard thing to do, as an arbitrary number of 3D objects of all sizes at random locations doesn't fit neatly into a grid.
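As a rough illustration of the idea (the types and the test are a sketch of mine, not any particular engine): give every node a bounding sphere that encloses it and its children, and reject whole subtrees that fall outside the view frustum.

    #include <vector>

    struct Vec3  { float x, y, z; };
    struct Plane { Vec3 n; float d; };   // dot(n, p) + d = 0, normal points into the view volume

    struct Node {
        Vec3 center;                     // bounding sphere around this node and all its children
        float radius;
        std::vector<Node*> children;
    };

    float Dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    // Collect every node whose bounding sphere touches the view frustum.
    // A node completely outside one plane is skipped together with its
    // whole subtree, so most of the world is never even visited.
    void Cull(const Node* n, const Plane frustum[6], std::vector<const Node*>& visible)
    {
        for (int i = 0; i < 6; ++i)
            if (Dot(frustum[i].n, n->center) + frustum[i].d < -n->radius)
                return;                  // fully outside: reject this subtree
        visible.push_back(n);
        for (Node* c : n->children)
            Cull(c, frustum, visible);
    }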
All of this boils down to basically two things: very fast communication and fast pixel rendering. Which translates to very fast RAM and as many pipelines as you can get.
When we talk about fast memory, we talk about two things: the time it takes to get the data from memory (latency), and the total amount of data you can get in a certain time (bandwidth).
The CPU in your computer basically runs a single sequence of instructions, so it wants very low latency, as waiting for the next piece of data might stall the whole processor. A GPU, on the other hand, executes a large number of simple tasks in parallel (the calculation of all the pixels in a frame), so it can start on the next pixel while the first one is waiting for its data to arrive. It can use all the data that arrives, and it will only stall if its whole capacity is spent waiting for data. Therefore, it depends on bandwidth.
The "size" part in the RAM is therefore not the total size of the RAM, but the amount of data that arrives at the GPU every second. And this goes a long way: a bit of superfast memory offers better possibilities than a very large amount of slow memory. The size of the bandwith is everything.
RAM is, compared to a CPU or GPU, terribly slow. And it doesn't help that you have to push all that data through wires, from the RAM chips to the GPU and back (the bus). Those wires are embedded in the graphics board as copper traces, and the chips need pins to connect to those traces. So making an ultra-wide bus would be prohibitively expensive when designing the board and chips: the chips would need very many pins (and become huge), and all those traces have to travel through the board to all the chips, so the board would become very thick and very expensive to make.
To solve that, they use DDR (Double Data Rate) RAM. That is memory that transfers data on both the rising and the falling edge of the clock signal, sending double the amount of data over the same bus. And there is even quad-pumped DDR nowadays, which theoretically quadruples the bandwidth.
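The arithmetic is straightforward (the figures here are made up for illustration): bandwidth = bus width x clock rate x transfers per clock. A 256-bit bus at 500 MHz, double-pumped, delivers 32 bytes x 500 million x 2 = 32 GB/s; quad-pumping the same bus would double that again, without adding a single pin or copper trace.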
To counter all the waiting and stalling (latency), caches are used. Those are very fast buffers on the GPU itself that hold data for immediate use. So, why not put all the memory on the same chip as the GPU?
The cost of a chip is determined mostly by two things: the size of the die (the rectangle that contains the millions and millions of microscopic transistors) and the fraction of chips that are not defective after manufacturing (the yield). If you make very large chips, very many of them will be defective, while only a small number can be produced at once (they come off a slice of silicon roughly the size of a CD). Which makes the few remaining ones extremely expensive.
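A common first-order model (Poisson-distributed defects; the numbers are illustrative) shows how brutal this is: with defect density D and die area A, the yield is roughly e^(-D x A). At 0.5 defects per square centimeter, a 1 cm^2 die yields about 61%, while a 4 cm^2 die yields about 14%. And the big die gives you only a quarter as many candidates per wafer to begin with.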
So, in one case, bigger isn't better!
Let's recap:
- More functionality (so it runs in hardware) is better
- More general programmability (shaders, for better visuals) is better
- More bandwidth (the fastest memory and the widest bus) is better
- More pipelines (so more can be done at the same time) is better
- A bigger chip would be better, if it wasn't so very expensive