Graphics hardware has come a long way in the last 10 years: from the original SGI cards that required a separate cabinet, through fixed-function pipelines and SIMD, towards MIMD. So, where are we heading?
At first, the challenge was fitting everything onto one chip, of course. Once the original SGI cabinet fitted on a single card, the focus shifted towards speed and features: more options to produce better graphics, and more speed to increase the number of objects, which produces better graphics as well.
But a fixed-function pipeline is like a factory. Like a car factory, it went from "any color, as long as it's black" to "which accessories do you want with which model?". And at that point it hit the wall.
The pipeline model let you decide whether to use each option and let you pick from the available models, but it offered no way to add something that wasn't in stock, or for which there was no machine to produce it.
You had your basic building blocks and the limited ways in which they could be connected, and that was it. Although plenty of clever tricks were devised to do additional things, like using textures to pre-compute certain effects.
The things that improved the result and were valued by designers were added as new machines and process steps. But it was still just an assembly line churning out as many models as possible, built from predefined parts.
The clever hacks that were incorporated, mostly the many things you could do with textures, inspired the use of shaders, which could compute the values from those textures on the fly and enabled new ways to transform the view of the model.
Yea, DUH!
Ok. As we saw with the evolution of CPUs, at some point it becomes unfeasible to just keep adding new functions, especially once you realize that you can do it all, and faster as well, by combining fewer functions that run faster. RISC.
That has already happened with GPUs, as most current ones emulate the old fixed-function pipeline in software. And it seems almost inevitable that they continue going RISC. But they have hit the wall again.
This time, the problem is SIMD. Quads. Anything can be produced, as long as you take it in multiples of four. If you only want one, throw three away. But it is not feasible to reduce the quads to single pipes, as you would produce less than half the number of pixels for the same number of transistors.
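To make that waste concrete, here is a minimal sketch; the coverage numbers are made up, and the four-lanes-per-quad cost is an assumption about a generic SIMD rasterizer, not any particular GPU. A thin triangle that touches several 2x2 quads still pays for four pixel lanes in each of them:

    #include <cstdio>

    int main() {
        // Hypothetical coverage: how many pixels one thin triangle actually
        // covers inside each of the six 2x2 quads it touches. Every touched
        // quad still occupies all four SIMD lanes.
        int coveredPerQuad[] = { 4, 4, 3, 2, 1, 1 };
        int quads = sizeof(coveredPerQuad) / sizeof(coveredPerQuad[0]);

        int usefulLanes = 0;
        for (int i = 0; i < quads; ++i)
            usefulLanes += coveredPerQuad[i];

        int issuedLanes = quads * 4;   // four lanes per quad, covered or not
        std::printf("issued %d lanes, %d useful -> %.0f%% wasted\n",
                    issuedLanes, usefulLanes,
                    100.0 * (issuedLanes - usefulLanes) / issuedLanes);
        return 0;
    }

The smaller the triangles get, the worse that ratio becomes, which is exactly where the quad model starts to hurt.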
The vertex processors are MIMD, and they would gain a lot less from SIMD: they produce single instances anyway. You could use SIMD to produce a lot of identical units, but that is less desirable, because even if the objects are essentially the same, you want them to look individual. When creating a crowd, for example, you want every member to be unique.
You could fix that by breaking up the objects you want to duplicate and using different colors and models that you fit together. Essentially, you end up building a fixed-function pipeline for generating objects...
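As a toy illustration of giving otherwise identical copies individual parameters (a simpler stand-in for the mix-and-match approach above; the hash, the tint and the height are all invented for the example), the only thing that differs between copies here is the instance index:

    #include <cstdint>
    #include <cstdio>

    // A cheap integer hash to turn an instance index into pseudo-random bits.
    static std::uint32_t mix(std::uint32_t x) {
        x ^= x >> 16; x *= 0x7feb352dU;
        x ^= x >> 15; x *= 0x846ca68bU;
        return x ^ (x >> 16);
    }

    int main() {
        // Five instances of the same mesh; the index drives an invented tint
        // and height, so identical copies still come out looking individual.
        for (std::uint32_t i = 0; i < 5; ++i) {
            std::uint32_t h = mix(i);
            float tint   = (h & 0xffU) / 255.0f;
            float height = 1.6f + ((h >> 8) & 0xffU) / 255.0f * 0.4f;
            std::printf("instance %u: tint %.2f, height %.2fm\n",
                        (unsigned)i, tint, height);
        }
        return 0;
    }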
So, the main tension for both types of shaders is producing lots of identical units versus producing fewer, unique ones.
While the functionality of the VS and PS will be merged, the roles of the two units are quite distinct, even though they perform the same operations if you break those operations down far enough and disregard the different storage and sorting requirements. And those last two are the interesting part.
We have buffers filled with vertices, triangles, textures, heightmaps, cubemaps, offset maps, lightmaps, Z-maps, pixels, shaders, fragments and lots of other intermediate objects, all of which exist to support the assembly-line model. Most of them have no use in themselves; you just need them to prop up the current model of creating graphics.
Now, there are more ways to render a scene than the most common one, such as tile-based rendering and ray-tracing. And most ways to improve on the brute-force, immediate-mode approach of vertices plus a rasterizer require a representation of the whole scene in order to do something more clever.
Instead of just processing loose vertices, fitting them together for rasterization and churning out pixels as fast as possible, you can assemble them all into triangles and a full scene first. Essentially, you work with surfaces, and produce the scene directly from them, instead of with vertices, pixels and all the intermediate values (like Z).
If you have all the surfaces, you can do ray-tracing. Or you can do real anti-aliasing, by adding up the colors of all the subpixels where triangle edges divide a pixel.
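A minimal sketch of that kind of "real" anti-aliasing at a single edge pixel, assuming we know exactly which fraction of the pixel each surface covers (the surfaces and coverages below are invented): the final color is simply the coverage-weighted sum.

    #include <cstdio>

    // A "surface fragment" at one edge pixel: the fraction of the pixel it
    // covers, plus its color. Both values are invented for this example.
    struct Surface { float coverage, r, g, b; };

    int main() {
        Surface parts[] = {
            { 0.65f, 0.9f, 0.2f, 0.1f },   // foreground triangle: 65% of the pixel
            { 0.35f, 0.1f, 0.3f, 0.8f },   // background surface: the other 35%
        };

        // With exact coverage known, the final color is just the weighted sum.
        float r = 0, g = 0, b = 0;
        for (const Surface& s : parts) {
            r += s.coverage * s.r;
            g += s.coverage * s.g;
            b += s.coverage * s.b;
        }
        std::printf("pixel color: %.3f %.3f %.3f\n", r, g, b);
        return 0;
    }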
To do all that and more, we need a clever way to store all those surfaces, so we can look up all the relevant (sub)pixels as fast as possible. And we need descriptors for things like textures, transparency, fog and light sources (intensity).
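One possible, deliberately simple way to store the surfaces for fast lookup is a uniform grid over the scene, where each cell lists the triangles that overlap it; a query then only touches one cell instead of the whole scene. This is just one of many spatial structures, sketched here with placeholder data:

    #include <cstdio>
    #include <vector>

    int main() {
        // A 4x4 grid of cells laid over the scene; each cell lists the indices
        // of the triangles that overlap it. All sizes and contents are
        // placeholders.
        const int W = 4, H = 4;
        std::vector<std::vector<int>> cells(W * H);

        // Pretend triangle 7 overlaps the cells at (x=1,y=2) and (x=2,y=2).
        cells[2 * W + 1].push_back(7);
        cells[2 * W + 2].push_back(7);

        // Query: which surfaces might affect a (sub)pixel falling in cell (2,2)?
        // Only that one cell is inspected, not the whole scene.
        for (int tri : cells[2 * W + 2])
            std::printf("candidate triangle %d\n", tri);
        return 0;
    }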
Also, if we use whole models, we need to know what is solid. So wouldn't it be easier to describe not the solids, but the open space those solids enclose? For CGI, any scene is effectively indoors!
Done right, this gives us free collision detection, real anti-aliasing and easy ray-tracing.
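As a rough sketch of why collision detection comes for free with an open-space description, assume the open space is stored as a set of convex cells (plain axis-aligned boxes here, purely for illustration): anything that is no longer inside any cell has, by definition, hit a solid.

    #include <cstdio>

    // The open space as a handful of convex cells; axis-aligned boxes keep the
    // example short. All coordinates are arbitrary.
    struct Box { float min[3], max[3]; };

    static bool inside(const Box& b, const float p[3]) {
        for (int i = 0; i < 3; ++i)
            if (p[i] < b.min[i] || p[i] > b.max[i]) return false;
        return true;
    }

    int main() {
        Box openSpace[] = {                      // two rooms joined by a corridor
            { {  0, 0, 0 }, { 10, 3, 10 } },
            { { 10, 0, 4 }, { 20, 3,  6 } },
        };
        float player[3] = { 10.5f, 1.0f, 5.0f }; // currently in the corridor

        bool inOpenSpace = false;
        for (const Box& b : openSpace)
            if (inside(b, player)) inOpenSpace = true;

        std::printf("%s\n", inOpenSpace ? "still in open space"
                                        : "collided with a solid");
        return 0;
    }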
So, instead of rendering (or discarding) all the individual vertices and pixels, we would calculate a single object, consisting of a single mesh that fills all the available open space. And we only render that single object.
That way, the bottleneck of quads and brute-force processing becomes moot. And as the CPU increasingly becomes the bottleneck for pre-calculating effects like water, displacement mapping and whatever else, we can do all of those things at the same time we calculate the single mesh!
That also makes it quite easy to develop a unified shader and create a real standardized way to render graphics.
What do you think?