GraphixViolence said:
Ilfirin said:
It won't look as nice, but by pre- and post-processing each step you can get a close approximation of the data. Infinitely long shaders might be extremely slow and not very pretty on DX8-class hardware, but they work, and that's all that really matters from a programming point of view. 320x240 looks really ugly too, but that's what you are going to have to run Doom III at if you have a GF1. Same scenario here: it will run on your DX8 cards, but it probably won't be very nice to look at.
I don't think it's that simple... I'm sure there are some effects that simply wouldn't work unless you actually wrote a new fall-back shader, and possibly even created new textures, to run on older hardware. For example, what if you wanted to use high dynamic range environment maps in your scene? Without support for floating-point texture formats (or at least integer formats greater than 8 bits per channel, which also require DX9), your hardware wouldn't even be able to read in the textures, so you would have to create new ones. There's no way a compiler could ever do those kinds of things for you, even if you were willing to accept an "ugly" image.
In this case you would simply convert the HDR texture map to a normal 32-bit INT texture map at load time and use that wherever you would normally use the HDR map.
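Something like this, in rough pseudocode (a minimal sketch; the 'scale' exposure constant and the names are just for illustration):
// load-time conversion of a float HDR map to a plain 8-bit-per-channel INT map
// 'scale' is an assumed exposure constant, chosen per texture
for (int i = 0; i < numTexels; i++) {
    float4 c = hdrTexels[i] * scale;      // range-compress the useful values into [0,1]
    intTexels[i] = clamp(c, 0, 1) * 255;  // clamp and quantize to 8 bits per channel
}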
The problem arises with the maximum number of simultaneous textures. If you were decomposing a long shader into a bunch of small ps1.0 shaders, you would be writing the intermediate steps to the render target, which would be fed as input into the next stage. This isn't just for the entire shader, but for the lifetime of each variable. So if you have only 1 temporary variable that you change throughout the length of the shader and then use at the output (e.g. Out = x * Tex0), you would first have to evaluate 'x' to its final value, which might take many, many passes, output that to a render target, and feed that render target back in as an input to the rest of the shader, which would be evaluated the same way to a final value... actually, let me give a very crappy example (pseudo code):
float4 Tex0 = lookup(Texture0); // sample the base texture
float x = dot(L, H);            // specular term
for (int i = 0; i < 16; i++) {
    x *= x;                     // repeated squaring: after 16 iterations, x = x^(2^16)
}
Out = x * Tex0;
x will be calculated over a few passes on ps1.0 hardware, to the final value of x^(2^16) = x^65536; the output of this stage is a grayscale texture that is then modulated with the original texture.
In this case, the shader could very well be decomposed into multi-passed DX8 shaders, but only 1 pass would be required for the actual directly related work (modulating a texture with a specular value), whereas it might take quite a few passes to calculate 'x'. This shader would decompose to:
*Stage 0 - calculate x*
Over the course of many passes, calculate the final value of x and store it in texture 'TexX'.
*Stage 1 - calculate final image*
input: Texture0, TexX;
float4 Tex0 = lookup(Texture0);
float4 x = lookup(TexX); // the final value of x, computed by Stage 0
Out = Tex0 * x;
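To make "over the course of many passes" concrete, each pass of Stage 0 would be a trivial shader along these lines (a sketch in the same pseudocode; in practice you would ping-pong between two render targets, since a pass can't read and write the same one):
input: TexX; // render target written by the previous pass (pass 0 just writes dot(L, H))
float4 x = lookup(TexX);
Out = x * x; // one squaring per pass; after 16 such passes TexX holds x^(2^16)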
What was the point of this? I'm getting to that. This was with only 1 variable updated over the course of the shader - I am not at all sure how well (if at all) it would work when you have a whole bunch of variables with the lifetime of the entire shader (by lifetime I mean the time between when a variable is initialized and when it is assigned its final value). It should work... but it would require a whole bunch of extra passes. Another example:
float4 Tex0 = lookup(Texture0);
float x = dot(L, H);
float y = dot(L, H);
float z = dot(L, H);
float w = dot(L, H);
// each loop squares its variable 16 times, as before
for (int i = 0; i < 16; i++) {
    x *= x;
}
for (int i = 0; i < 16; i++) {
    y *= y;
}
for (int i = 0; i < 16; i++) {
    z *= z;
}
for (int i = 0; i < 16; i++) {
    w *= w;
}
Out = x * y * z * w * Tex0;
(I have no idea why anyone would want to do this, it's just for example's sake.)
This would decompose the same way, but you get a problem at the end - 5 textures (x, y, z, w, Tex0), while DX8.0 cards only support 4 simultaneous textures. Hence you would have to decompose it even further and evaluate (x*y*z*w) in one pass, then (Result * Tex0) in another, as sketched below.
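In the same notation as the stages above, that last step becomes:
*Pass A - combine the intermediates*
input: TexX, TexY, TexZ, TexW; // 4 textures: just fits on DX8.0
Out = lookup(TexX) * lookup(TexY) * lookup(TexZ) * lookup(TexW);
*Pass B - final image*
input: Texture0, TexA; // TexA = the render target written by Pass A
Out = lookup(Texture0) * lookup(TexA);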
My point (sorry for going all roundabout... I'm very tired right now) is that there is very little (if any) relation between the number of instructions in the long shader and the number of passes times the number of instructions in the short shaders. You might end up with more passes of a few instructions each than there are instructions in the long shader. In other words: things might decompose in very roundabout ways into a lot of passes; a card that can execute a 512-instruction shader might take a lot more than 2 passes to execute a 1024-instruction shader.
Just kinda showing one bad side of what I have been talking about in this thread... now for the other (what GraphixViolence has been talking about).
Given the example shaders above, it would be very bad if all the intermediate results for calculating the variables were stored in INT format and rounded off. Say (taking the first shader) dot(L, H) = 2, so x starts at 2. x SHOULD end up equal to 2^65536, or about 2.0035e+19728, but with a 32-bit INT format (8 bits per channel) what would actually be written is 255, the channel's clamped maximum. Quite a big difference, eh? And hence the second problem: you might end up with DRASTICALLY different results on cards that don't have sufficient output color depth - to the point where you don't even want it running on those cards (in this example even 128-bit floating point would still be very much insufficient). But it would still run.
With these downsides, why would one still want this rather than waiting for hardware, you might ask? Well, the shaders shown here are examples of the rarest of cases. You are never going to need to square a value 16 times in a row, and even if you did, you would still have the problem of insufficient precision on DX9-class hardware... or any hardware for many, many decades (if ever).
If a hardware company decided to go down the path of executing unlimited-length shaders in hardware TODAY (if they started 6-12 months ago, that's a different story), we wouldn't see that card for at least 1.5 years, and then we would have to wait 2-4 years for the LCD (lowest common denominator) to rise to that level before we could target the card. That's 3.5 years minimum and as much as 5.5 years. Given the development time of games today, if you started working on your game the day the card was released, by the time you released your game that card would probably be the LCD; but that is still a lot of unneeded waiting.
In short:
We need the flexibility NOW, across everything from DX8 cards to DX9 and beyond - BUT just because we have the flexibility doesn't mean we have to be careless and generally stupid when writing the shaders.
**** This post intentionally neglects cube maps, 3D textures, and anything beyond DX8-level pixel shader functionality.
**** In case this hasn't been established: the output of this process wouldn't be a shader, but something similar to a D3DX effect file.
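To give a rough idea of that last point, the generated output might look loosely like a .fx technique with one pass per decomposed stage (purely illustrative; the pass names and comments are made up):
technique DecomposedShader {
    pass Stage0_Pass0 { /* write dot(L, H) to TexX */ }
    pass Stage0_Pass1 { /* TexX = TexX * TexX */ }
    // ...14 more squaring passes...
    pass Stage1 { /* Out = lookup(Texture0) * lookup(TexX) */ }
}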