Mintmaster
Veteran
Chalnoth said:
Also remember that multiple passes are more expensive in terms of memory bandwidth, not to mention some of the processing has to be repeated with additional passes. So the 3 passes with PS 3.0 is most likely faster than the 4 passes with MRT + PS 2.0.

I'm not sure you guys completely understand the technique.

DemoCoder said:
Wait a second, that's not right. First, the PS 2.0 method is rendering 2 different frames (the F and B fog frames), so not only are you saving state changes and geometry bandwidth, you're also saving pixel shader execution and fillrate.
The PS 2.0 method has culling enabled for the F and B frames (opposite orientation each time), while the PS 3.0 method leaves culling off and decides on the fly whether to add or subtract. In the end, both methods draw the EXACT same number of pixels, so there is no memory bandwidth advantage and no shader execution advantage.
Geometry won't make any significant difference either, because it's pointless to use a high-poly model (fewer than ~50 pixels per vertex) for volumetric fog. You just can't see that kind of detail, since the edges of each volume fade out. There may be a very small cost for culling the extra polys, but that's it.
Render state changes are minimal: you change the D3DRS_CULLMODE and D3DRS_BLENDOP render states, each twice per frame. Absolutely unnoticeable.
The only tangible advantage is that with one less buffer you save a couple of megs of video memory, and that's due to the floating-point blending, not PS 3.0. But you can do the additive and subtractive blending together in the same buffer with an RGB texture too, if you clear the buffer to 0x80808000 instead of all zeros. So this advantage is gone as well.
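The mid-gray clear trick can be sanity-checked with a quick simulation (plain Python; the depth values and surface counts here are hypothetical, not from the thread). Back faces of the fog volume blend additively, front faces subtractively, and the 0x80 bias keeps intermediate results from clamping at zero when subtraction happens first:

```python
# Simulate 8-bit fixed-point blending into a channel cleared to 0x80 (0.5)
# rather than 0. Back faces add their depth, front faces subtract; the bias
# gives headroom so a subtraction landing before its matching addition
# doesn't saturate at zero and lose fog thickness.

def blend(dest, src, op):
    """Saturated 8-bit add or subtract, in the spirit of
    D3DBLENDOP_ADD / D3DBLENDOP_REVSUBTRACT."""
    if op == "add":
        return min(255, dest + src)
    return max(0, dest - src)

pixel = 0x80  # buffer cleared to mid-gray instead of all zeros

back_faces  = [40, 25]  # hypothetical quantized depths of back-facing fog polys
front_faces = [30, 15]  # hypothetical quantized depths of front-facing fog polys

# Worst case for a zero-cleared buffer: all subtractions arrive first.
for d in front_faces:
    pixel = blend(pixel, d, "sub")
for d in back_faces:
    pixel = blend(pixel, d, "add")

fog_thickness = pixel - 0x80  # remove the bias
print(fog_thickness)          # sum(back) - sum(front) = 65 - 45 = 20
```

With a clear to zero, the first subtraction would clamp to 0 and the final thickness would come out wrong; the 0x80 bias is what makes the single-buffer approach work.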
Chalnoth said:
I doubt one instruction will be very meaningful for performance.

When you're talking about a single-instruction shader to draw the fog surfaces (that single frc command, which I just successfully tried, or a simple texture lookup to encode), one extra instruction will halve performance. I'd call that meaningful.
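To make the frc idea concrete, here is a sketch of that kind of encode/decode in plain Python. The two-channel split and the scale factors are my assumptions for illustration, not necessarily the exact encoding in the thread; in the shader the encode step is one frc over a pre-scaled vector, e.g. frc(d * float2(1, 256)):

```python
import math

def encode(d):
    """Split a [0, 1) depth into a coarse and a fine channel.
    Shader equivalent: frc(d * float2(1, 256)) -- a single instruction.
    (Since d < 1, frc(d * 1) is just d itself.)"""
    return (math.modf(d * 1.0)[0], math.modf(d * 256.0)[0])

def decode(r, g):
    """Recover the depth: coarse part from r in 1/256 steps,
    refined by the fine channel g."""
    coarse = math.floor(r * 256.0) / 256.0
    return coarse + g / 256.0

d = 8091 / 65536   # a depth value already quantized to 16 bits
r, g = encode(d)
print(decode(r, g) == d)
```

This ignores the real-world complications of storing r and g in quantized 8-bit channels under additive blending, which is exactly where the precision and dithering questions below come from, but it shows why the encode itself costs only one instruction.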
Chalnoth said:
Are the added memory bandwidth requirements of the 32-bit blend cheaper than the added instructions for dithering in the 16-bit blend?

You can use a single-channel 32-bit buffer (D3DFMT_R32F), which the R300 supports right now. I'm not sure how dithering is exposed, though; maybe a noise texture, or a noise generator. Either way, it's less bandwidth than trying to use multiple channels in an FP16 format, and it will be faster than anything that requires pixel shader work for encoding or dithering.
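A back-of-envelope comparison makes the bandwidth point clearer. The resolution, overdraw, and read-plus-write model below are hypothetical numbers of my own, not figures from the thread:

```python
# Rough destination-buffer traffic for the fog pass, assuming each blended
# fog surface reads and writes the destination pixel once.
WIDTH, HEIGHT, SURFACES = 1024, 768, 4  # hypothetical target and overdraw

def blend_mb(bytes_per_pixel):
    """Megabytes of blend traffic per frame for the fog buffer."""
    return WIDTH * HEIGHT * SURFACES * 2 * bytes_per_pixel / 2**20

print(blend_mb(4))  # D3DFMT_R32F: one 32-bit float channel
print(blend_mb(8))  # D3DFMT_A16B16G16R16F: four FP16 channels
```

Whatever the exact numbers, a four-channel FP16 target moves twice the bytes of a single-channel 32-bit one, which is the trade being weighed against extra shader instructions for dithering.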
Look, I'm not saying FP16 sucks; I'm just saying FP32 blending has its practical real-time uses when FP16 doesn't cut it. Real games often need larger z-buffer ranges than demos, and you can clearly see artifacts in both the NV demo and the DXSDK sample. This technique can create some very nice shadows in volumetric lights by reusing the same stencil outlines used for stencil shadowing. It's a very nice technique, and I hope developers use it to create some more atmosphere.