Panajev2001a
Veteran
The rest depends on where the bottleneck is: if you are using an OpenGL or DirectX kind of pipeline and you are badly bottlenecked by Fragment Operations
Eh?
That’s not a given; it’s up to the developer what they do with the resources available to them. There’s nothing inherently fragment limited unless the developer chooses for things to be that way.
Of course, Dave. I was just following your example of being Fragment Processing limited (I was not asserting that those pipelines are inherently Fragment Processing limited). I specified two traditional rendering pipelines because if I had gone the REYES-like route the discussion would have changed a bit too much.
You still have to sample those textures though, and as far as we can see so far the only dedicated hardware is down in the Pixel Engine in the Visualiser.
That is true too, unless we plan to sample them with some dedicated APUs, yes... in software.
Alternatively, you can transmit in the Apulet the part of the Fragment Shader that has no dependency on texture sampling: execute the pixel program until it needs the texture sample, then send the pixel program together with the sample to be executed by another APU on the Broadband Engine.
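To make the split concrete, here is a minimal sketch of cutting a pixel program at its first texture sample. Everything in it is an assumption for illustration: the (opcode, args) tuples, the "TEX" opcode name and the function split_at_first_sample are hypothetical, not anything from the actual Apulet format.

```python
def split_at_first_sample(program):
    """Split a pixel program at its first texture sample.

    `program` is a list of (opcode, args) tuples (a hypothetical encoding).
    Returns (prefix, sample, remainder):
      prefix    - instructions with no dependency on a texture sample
      sample    - the first "TEX" instruction, or None if there is none
      remainder - everything after it, to be run once the sample is available
    """
    for index, (opcode, _args) in enumerate(program):
        if opcode == "TEX":
            return program[:index], program[index], program[index + 1:]
    # No texture sample at all: the whole program can run on one unit.
    return program, None, []


pixel_program = [
    ("MUL", ("r0", "v0", "c0")),
    ("ADD", ("r1", "r0", "c1")),
    ("TEX", ("r2", "r1", "tex0")),
    ("MAD", ("r3", "r2", "r0", "c2")),
]
prefix, sample, remainder = split_at_first_sample(pixel_program)
```

The prefix would run first, the sample would be fetched wherever sampling is cheapest, and the remainder plus the sampled value would be handed to another unit.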
Not really, or at least not in the implementations that it’s been put to so far. The 9800 has 96 instruction slots, which could potentially be executed in about 60 cycles (best case), and that is more of a bottleneck than passing anything out to the external F-Buffer memory with the bandwidth available to the 9800 (and generally bandwidth has scaled with performance, and future hardware will have larger instruction counts). So the F-Buffer itself doesn’t necessarily slow anything down, just the length of the shader in the first place.
I was not saying that the F-buffer by itself was slowing anything down. It is, as you say, the fact that the shader is very long. If you did not have the F-buffer and you did not want to break the Shader manually into multiple rendering passes, that Shader might crash or might not be executed at all, as it would exceed the instruction slot limit.
The F-buffer in that case saves the day, and it is transparent to the shaders you write.
I wonder how costly Texture Sampling would be for a unit like an APU: even for Vertex Programs, if they wanted to do Texture Look-ups in there they would need to sample the textures.
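As a rough sketch of what that transparency amounts to, assume a shader is just a list of per-fragment operations and the F-buffer is a FIFO of intermediate fragment values. The names split_into_passes and run_with_fbuffer are hypothetical, and real hardware also has to respect which registers are live across a pass boundary, which this toy version ignores.

```python
from collections import deque


def split_into_passes(program, slot_limit):
    """Cut a long shader into chunks that each fit the instruction slot limit."""
    return [program[i:i + slot_limit] for i in range(0, len(program), slot_limit)]


def run_with_fbuffer(program, fragments, slot_limit):
    """Run every pass over every fragment, carrying results in a FIFO.

    Each fragment's intermediate value is written to the FIFO at the end of a
    pass and read back, in the same order, at the start of the next one.
    """
    fbuffer = deque(fragments)
    for chunk in split_into_passes(program, slot_limit):
        next_fbuffer = deque()
        while fbuffer:
            value = fbuffer.popleft()
            for operation in chunk:
                value = operation(value)
            next_fbuffer.append(value)
        fbuffer = next_fbuffer
    return list(fbuffer)


# A toy 200-instruction shader against the 96-slot limit quoted above.
long_shader = [lambda x: x + 1] * 200
passes = split_into_passes(long_shader, 96)
results = run_with_fbuffer(long_shader, [0, 10], 96)
```

The shader author only ever writes long_shader; the split into three passes happens underneath.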
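For a feel of the work involved, here is a minimal sketch of one bilinear sample done entirely in software, assuming a single-channel texture stored as a list of rows, normalised coordinates and clamp-to-edge addressing. The function bilinear_sample is hypothetical and says nothing about how an APU would actually do it.

```python
import math


def bilinear_sample(texture, u, v):
    """Bilinearly filter `texture` (a list of rows of floats) at (u, v) in [0, 1]."""
    height = len(texture)
    width = len(texture[0])

    # Map normalised coordinates to texel space, with texel centres at +0.5.
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0
    fy = y - y0

    def texel(ix, iy):
        # Clamp-to-edge addressing.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texture[iy][ix]

    # Four texel fetches and three linear interpolations per sample.
    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


gradient = [[0.0, 1.0],
            [0.0, 1.0]]
centre = bilinear_sample(gradient, 0.5, 0.5)
```

Even this simplest case is address arithmetic, four memory reads and three lerps per channel, before any mip-mapping or format decoding.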