New volumetric fogging demo

Humus

I've uploaded a new volumetric fogging demo. It raytraces through the fog (which is stored in a 3D texture), updating the fogging in each iteration, and also takes shadows into account.


Grab at the usual place.
 
Nice. I'm working on a new implementation of the depth-peeled volumetric shadows I did a while back, but this time without the depth peeling :) I must post some pics...

Are you using an in-scatter based approach to calculating the illumination?
 
Nice sauna, Humus!

It totally kills my 7800GS, and I didn't get playable FPS as I went down from 1600x1200 4xAA (5 FPS) to 1024x768 NoAA (12 FPS).

Maybe it has something to do with my driver settings? Or is this just very demanding? Maybe my CPU? 2.4@2.8 GHz P4C HT on. Forceware 91.47.
 
Are you using an in-scatter based approach to calculating the illumination?

It's basically just a shadow mask stored in a volume texture. I don't compute any scattering of any sort. I guess you could improve quality a bit by adding some uniform scatter into that volume.
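
For anyone who wants the idea in code form, here is a minimal CPU-side sketch of a fixed-step march through a fog volume with a shadow mask. It is not the demo's shader: `march_fog`, `sample_nearest`, the flat-list volume layout, the grayscale colors and the `absorption` constant are all assumptions made for illustration, and shadowed fog simply contributes black here.

```python
import math

STEPS = 40  # fixed iteration count, matching the loop length mentioned in the thread


def sample_nearest(volume, size, p):
    """Nearest-texel lookup into a size^3 volume stored as a flat list; p is in [0, 1)^3."""
    x, y, z = (min(size - 1, max(0, int(c * size))) for c in p)
    return volume[(z * size + y) * size + x]


def march_fog(density, shadow, size, eye, surface, fog_color, surface_color,
              absorption=4.0):
    """Blend fog over surface_color, walking from the surface back toward the eye.

    density and shadow are hypothetical size^3 volumes (shadow: 1 = lit, 0 = shadowed).
    Colors are single grayscale floats to keep the sketch short.
    """
    color = surface_color
    step_len = math.dist(eye, surface) / STEPS
    for i in range(STEPS):
        t = 1.0 - (i + 0.5) / STEPS  # sample midpoints, back to front
        p = tuple(e + (s - e) * t for e, s in zip(eye, surface))
        d = sample_nearest(density, size, p)
        lit = sample_nearest(shadow, size, p)
        a = min(1.0, d * absorption * step_len)  # opacity of this slice of fog
        # Lit fog adds fog_color; shadowed fog only darkens what is behind it.
        color = color * (1.0 - a) + fog_color * lit * a
    return color
```

Each iteration does two volume lookups (density and shadow mask), which is the same per-step cost shape discussed further down the thread.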

Nice sauna, Humus!

It totally kills my 7800GS, and I didn't get playable FPS as I went down from 1600x1200 4xAA (5 FPS) to 1024x768 NoAA (12 FPS).

Maybe it has something to do with my driver settings? Or is this just very demanding? Maybe my CPU? 2.4@2.8 GHz P4C HT on. Forceware 91.47.

It's very demanding on the GPU but nearly zero on CPU. I don't know what's reasonable performance for a 7800GS, but I'm getting 51FPS at 1024x768 NoAA and 21FPS at 1600x1200 4xAA on my X1800XL AIW.
 
Solid 17-18fps at 1920*1200 on my x1900gt.
Also 21fps at 1600*1200.
AA seems to have no effect on FPS (low geometry?)
 
Didn't want to work on my GeForce FX 5900.
Firstly because the Cg 1.6 compiler wanted a statement for the prez fragment shader (so I stuck one in),
but now when I run it nothing gets displayed.
Also the memory usage is very high, >1.2GB and still climbing after ~30 secs, so I killed it (both times I ran it).
 
Nice.

24 FPS at 1920x1200 and 6xAA on a X1900XT. The AA setting had no effect, same framerate at 0x as 6x.
 
The shader just has a for loop doing 40 iterations. No early exit, so the whole thing can be unrolled trivially.

The real performance culprit (in the shader anyway) is the 82 texture reads (81 of which are 3D textures). I haven't checked the memory usage, but if you're spilling out of video memory, then that's no good either.
 
AA seems to have no effect on FPS (low geometry?)

Yeah, it's low geometry. I see about 1-2% performance hit of 6x AA. The expensive part is the long fragment shader, which I expect to overshadow the cost of resolving the multisample buffer.

Is the shader using branching? That's quite a bit of a performance hit on NV cards ;)

Nice demo btw!

It's static so it should be unrolled. I tried to use dynamic branching, but could not get rid of banding where it would take a different number of iterations. I tried a few approaches such as starting with the fractional part as the first iteration, but could not get rid of the banding so ultimately I decided to use static branching. It's unfortunate, because I got a fairly decent performance increase by using branching, especially when viewing surfaces up close where the average loop count went down significantly. With a working dynamic branching solution performance could probably be twice as high on average.
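
To make the trade-off concrete, here is a rough sketch of the two loop styles on a 1D ray. It is only one possible reading of "starting with the fractional part as the first iteration"; `static_march`, `dynamic_march` and the plain density accumulation are hypothetical simplifications, and the sketch does not reproduce the banding seen on the GPU.

```python
MAX_STEPS = 40


def static_march(density, length):
    """Always MAX_STEPS iterations; the step length stretches with the ray."""
    step = length / MAX_STEPS
    fog = 0.0
    for i in range(MAX_STEPS):
        fog += density((i + 0.5) * step) * step
    return fog


def dynamic_march(density, length, step):
    """Fixed step length; a short first step covers the fractional remainder.

    Returns (fog, iterations). Short rays need few iterations, which is where
    the speedup would come from. Rays longer than MAX_STEPS * step are
    truncated in this sketch.
    """
    whole = min(int(length // step), MAX_STEPS)
    frac = max(0.0, length - whole * step)
    fog = density(0.5 * frac) * frac  # fractional first iteration
    pos = frac
    for _ in range(whole):
        fog += density(pos + 0.5 * step) * step
        pos += step
    return fog, whole + 1
```

With a fixed step length, neighbouring pixels whose rays differ slightly in length can land on different iteration counts, which is the kind of boundary where a visible step could show up.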

The real performance culprit (in the shader anyway) is the 82 texture reads (81 of which are 3D textures). I haven't checked the memory usage, but if you're spilling out of video memory, then that's no good either.

Memory usage isn't an issue. There are two 3D textures: one is 128x128x128, which is 2MB, and one is 64x64x64, which is 256KB. Lookups in these textures didn't seem to be that big of an issue on my system anyway. Using smaller textures did not affect performance; simplifying the math did. In the final version each loop iteration costs 4 ALU instructions on ATI and 2 volume lookups at two cycles each, meaning that it's a 1:1 ALU/TEX ratio, which is about ideal for X1800. X1900 will be TEX limited though.
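
A quick back-of-the-envelope check of those numbers, as a sketch. The one-byte-per-texel format and the one-cycle-per-ALU-instruction cost are assumptions inferred from the figures quoted above, not something stated about the demo; `volume_bytes` is a hypothetical helper.

```python
def volume_bytes(size, bytes_per_texel=1):
    """Memory footprint of a cubic size^3 volume texture (no mipmaps assumed)."""
    return size ** 3 * bytes_per_texel


big = volume_bytes(128)   # 128^3 texels at 1 byte each
small = volume_bytes(64)  # 64^3 texels at 1 byte each

# Per loop iteration: 4 ALU instructions (assumed 1 cycle each)
# versus 2 volume lookups at 2 cycles each.
alu_cycles = 4 * 1
tex_cycles = 2 * 2

print(big // (1024 * 1024), "MB,", small // 1024, "KB")
print("ALU:TEX =", alu_cycles, ":", tex_cycles)
```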
 
What software did you use to create this, Humus? How long did it take you? (I know you're probably still tweaking, but in its current state?)
 
MSVC 6.0. I had the map since before, but I created it in UnrealEd some time in the past.

Actual implementation took I guess something like 5 hours spread over a few days, if we don't count all the time spent on getting it up and running in Linux, which included reinstalling Linux because my previous installation broke and fixing a bunch of warnings that GCC 4.1.1 gave in the framework that GCC 3.x didn't.
 
I do my demos without any influence from my fellow coworkers, if that's what you meant. I just keep doing what I did before I joined ATI.
 
Hmm, something's wrong with NV cards, because I'm getting 5-10 fps at 1600x1200 on a 7800 GTX 512.
 
Looks like there's a compiler bug. The static loop was turned into a dynamic one (thus incurring the full cost of dynamic branching), because the compiler couldn't statically unroll the loop without a huge performance drop: it ended up with some 25 registers used(!) when far fewer are really needed. Oddly enough, this also happens if you manually unroll the loop.

I'm not sure if the fix made it in time for the next driver release or not, but it's definitely in the code tree now.
 