3 new GDC presentations from NVidia

Chalnoth said:
Also remember that multiple passes are more expensive in terms of memory bandwidth, not to mention some of the processing has to be repeated with additional passes. So the 3 passes with PS 3.0 is most likely faster than the 4 passes with MRT + PS 2.0.
DemoCoder said:
Wait a second, that's not right. First, the PS2.0 method is rendering 2 different frames (F and B fog frames), so not only are you saving state change, and geometry bandwidth, you're also saving pixel shader execution and fillrate.
I'm not sure if you guys completely understand the technique.

The PS 2.0 method has culling enabled for the F and B frames (different orientation each time), but the PS 3.0 method doesn't, deciding whether to add or subtract on the fly. In the end, both methods draw the EXACT same number of pixels. There is no memory bandwidth advantage and no shader execution advantage.

Geometry won't make any significant difference, because it's useless to use a high poly model (pixels/vertices < 50) for volumetric fog. You just can't see that kind of detail since the edges of each volume fade out. There may be a very small speed penalty for having to cull extra polys, but that's it.

Renderstate changes are minimal: You change the D3DRS_CULLMODE and D3DRS_BLENDOP renderstates, each twice per frame. Absolutely unnoticeable.
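
For concreteness, the whole toggle amounts to something like this (a rough D3D9-style sketch; DrawFogVolumes is just a placeholder for the fog-volume draw calls, and the exact cull/blend pairing depends on how the depth accumulation is set up):

```cpp
#include <d3d9.h>

void DrawFogVolumes(IDirect3DDevice9* dev);   // placeholder for the actual draw calls

void AccumulateFogDepths(IDirect3DDevice9* dev)
{
    // Accumulate raw depth values into the render target with additive blending.
    dev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    dev->SetRenderState(D3DRS_SRCBLEND,  D3DBLEND_ONE);
    dev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);

    // Pass 1: back faces only, added into the buffer.
    dev->SetRenderState(D3DRS_CULLMODE, D3DCULL_CW);
    dev->SetRenderState(D3DRS_BLENDOP,  D3DBLENDOP_ADD);
    DrawFogVolumes(dev);

    // Pass 2: front faces only, subtracted out of the buffer (dest - src).
    dev->SetRenderState(D3DRS_CULLMODE, D3DCULL_CCW);
    dev->SetRenderState(D3DRS_BLENDOP,  D3DBLENDOP_REVSUBTRACT);
    DrawFogVolumes(dev);

    // D3DRS_CULLMODE and D3DRS_BLENDOP each change exactly twice per frame.
}
```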

The only tangible advantage is that with one less buffer you save a couple of megabytes of video memory, and that is due to the floating-point blending, not PS 3.0. But you can do the additive and subtractive blending together in the same buffer with an RGB texture as well if you clear the buffer to 0x80808000 instead of all zeros. So this advantage is gone as well.
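
In D3D9 terms the biased clear is just a different clear colour (sketch; how the bias maps onto the encoding's channels depends on the sample):

```cpp
#include <d3d9.h>

// Clear the accumulation target to a mid-range bias instead of zero, so the
// subtractive blends have headroom below the starting value; the decode pass
// subtracts the bias back out at the end.
void ClearFogAccumulation(IDirect3DDevice9* dev)
{
    dev->Clear(0, NULL, D3DCLEAR_TARGET, 0x80808000, 1.0f, 0);
}
```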

Chalnoth said:
I doubt one instruction will be very meaningful for performance.
When you're talking about a single instruction shader to draw the fog surfaces (that single frc command which I just successfully tried, or a simple texture lookup to encode), one extra instruction will halve performance. I'd call that meaningful.

Chalnoth said:
Are the added memory bandwidth requirements of the 32-bit blend cheaper than the added instructions for dithering in the 16-bit blend?
You can use a single channel 32-bit buffer (D3DFMT_R32F), which R300 supports right now. I'm not sure how dithering is exposed, though. Maybe a noise texture? A noise generator? Either way, it's less bandwidth than trying to use multiple channels in a FP16 format, and will be faster than anything requiring pixel shaders for encoding or dithering.
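
Creating that single-channel target is trivial (sketch; assumes the device exposes D3DFMT_R32F as a render target, and blending to it is a separate caps question):

```cpp
#include <d3d9.h>

// A one-channel 32-bit float accumulation texture for the depth sums.
IDirect3DTexture9* CreateDepthAccumTexture(IDirect3DDevice9* dev, UINT width, UINT height)
{
    IDirect3DTexture9* tex = NULL;
    dev->CreateTexture(width, height, 1,
                       D3DUSAGE_RENDERTARGET,
                       D3DFMT_R32F,
                       D3DPOOL_DEFAULT,
                       &tex, NULL);
    return tex;   // NULL on failure
}
```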



Look, I'm not saying FP16 sucks, I'm just saying FP32 blending has its practical real-time uses when FP16 doesn't cut it. Real games often need larger z-buffer ranges than demos, and you can clearly see artifacts in both the NV demo as well as the DXSDK sample. This technique can create some very nice shadows in volumetric lights by using the same stencil outlines used for stencil shadowing. It's a very nice technique that I hope developers use to create some more atmosphere.
 
Mintmaster said:
Geometry won't make any significant difference, because it's useless to use a high poly model (pixels/vertices < 50) for volumetric fog. You just can't see that kind of detail since the edges of each volume fade out. There may be a very small speed penalty for having to cull extra polys, but that's it.
We're not talking about volumetric fog here. We're talking about using a similar technique on solid objects for translucency. So the polycount may well be quite high.

And doing the front and back faces together will still make the algorithm more numerically stable.

Anyway, I'd still have to see this in action to concede that FP32 would be worth it, so that I can see exactly what is required to eliminate banding in the final image.
 
As far as I can tell, the NV40 supports floating point Z buffers.

Mint, you're right: in this particular example, there is no inherent shader-clock advantage. You could even use static PS branching as long as you sort the geometry by back/front facing. However, in other algorithms that fake per-pixel conditionals by using stencil, z, or other multipass-blend decision techniques, there would be an advantage.

Seems to me that smart HW could even treat the face register as a kind of constant boolean and get static branch like performance since it is guaranteed not to change per pixel or per quad.
 
Mintmaster said:
Chalnoth said:
I doubt one instruction will be very meaningful for performance.
When you're talking about a single instruction shader to draw the fog surfaces (that single frc command which I just successfully tried, or a simple texture lookup to encode), one extra instruction will halve performance. I'd call that meaningful.
Unless, of course, there are other overheads which mean that the first N instructions are, for all intents and purposes, free <shrug>
 
Chalnoth said:
We're not talking about volumetric fog here. We're talking about using a similar technique on solid objects for translucency. So the polycount may well be quite high.
Chalnoth, we are talking about volumetric fog or light shafts with this particular technique. Using this technique for translucency is not a good idea, because the light from translucent objects is not related to the thickness of the object in the viewing direction, but rather the thickness in the light direction integrated (with some function) along the viewing direction. Since the primary contribution will be from light scattering closest to the viewer, using the depth map scattering technique is best.

Just look at the PDF. The volume fog technique gives you an "X-Ray effect". All the pretty translucency images are from the depth-map technique, where I16 (or better) is the best format for storing depth since you have no need for increased accuracy close to the light, nor does high range do anything for you.

Obviously, we are not talking about translucent materials like marble for this particular technique. Only highly transmissive mediums, like fog or ambient dust hit by light (aka light shafts), will look good with this technique.

Chalnoth said:
And doing the front and back faces together will still make the algorithm more numerically stable.
How so? It's exactly the same operation: Read from framebuffer, add (or subtract), and write to framebuffer. There's no magic combination of additions and subtractions before the result goes to the framebuffer. There may be a write cache, but that will affect a negligible number of pixels.

You're clearly not well versed on the specifics of some of these graphics techniques, so why are you arguing such details so strongly?

Chalnoth said:
Anyway, I'd still have to see this in action to concede that FP32 would be worth it, so that I can see exactly what is required to eliminate banding in the final image.
It doesn't take a simulation to see how the results will be affected. Each step in depth (clearly visible in the screenshots of that PDF) will be divided by 13 more mantissa bits, i.e. visually indistinguishable from continuity. You can hack the SDK sample yourself and change the generated lookup texture size to 1024 or 512 and see how drastic a change even one bit makes. It's in the ComputeStepTexture() function.
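
If you'd rather poke at it outside the shader, a generic version of that kind of step lookup looks something like this (not the SDK sample's actual ComputeStepTexture(), and the exponential falloff is just a stand-in for whatever fog function the sample uses; kSteps would be 4096 for 12 bits, or 1024/512 to exaggerate the banding):

```cpp
#include <cmath>
#include <vector>

// Build an N-entry thickness -> fog-density lookup.  Every entry is one visible
// "step" in the final image, so fewer entries means coarser banding.
std::vector<float> BuildFogLookup(int kSteps, float density)
{
    std::vector<float> lut(kSteps);
    for (int i = 0; i < kSteps; ++i)
    {
        float thickness = float(i) / float(kSteps - 1);   // quantized accumulated thickness
        lut[i] = 1.0f - std::exp(-density * thickness);   // stand-in exponential fog falloff
    }
    return lut;
}
```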
 
Mintmaster said:
Chalnoth, we are talking about volumetric fog or light shafts with this particular technique.
I wasn't. I still don't care much for polygonal fog. But the translucency algorithm is similar, and should have many of the same characteristics.

Mintmaster said:
How so? It's exactly the same operation: Read from framebuffer, add (or subtract), and write to framebuffer. There's no magic combination of additions and subtractions before the result goes to the framebuffer. There may be a write cache, but that will affect a negligible number of pixels.
In integer, this is true. In floating-point, it is not. Floating point will be more accurate if the numbers being added to and subtracted from the framebuffer are as close as possible to the value currently stored in the framebuffer, which essentially means that if you do all your additions and then all of your subtractions, it's going to be less accurate. Front-to-back ordering should also make the technique more stable.
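
Here's a toy illustration of what I mean (C++, emulating FP16-style rounding after every blend; it ignores FP16's range limits and denormals, and the depth values are exaggerated so the rounding shows up):

```cpp
#include <cmath>
#include <cstdio>

// Round x to roughly FP16 precision: 1 implicit + 10 stored mantissa bits.
float ToHalfPrecision(float x)
{
    int e;
    float m = std::frexp(x, &e);              // x = m * 2^e, m in [0.5, 1)
    m = std::round(m * 2048.0f) / 2048.0f;    // keep 11 significant bits
    return std::ldexp(m, e);
}

int main()
{
    const float front[3] = { 100.0f, 101.0f, 102.0f };   // front-face depths
    const float back[3]  = { 100.2f, 101.2f, 102.2f };   // back-face depths
    // True accumulated thickness = 3 * 0.2 = 0.6

    // All additions first, then all subtractions: the running sum grows large,
    // so the framebuffer rounds away most of the fractional thickness.
    float grouped = 0.0f;
    for (float b : back)  grouped = ToHalfPrecision(grouped + b);
    for (float f : front) grouped = ToHalfPrecision(grouped - f);

    // Front and back handled together: the running sum never exceeds roughly one
    // depth value, so less is rounded away.
    float interleaved = 0.0f;
    for (int i = 0; i < 3; ++i)
    {
        interleaved = ToHalfPrecision(interleaved + back[i]);
        interleaved = ToHalfPrecision(interleaved - front[i]);
    }

    std::printf("grouped = %g, interleaved = %g, exact = 0.6\n", grouped, interleaved);
    return 0;
}
```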

Mintmaster said:
It doesn't take a simulation to see how the results will be affected. Each step in depth (clearly visible in the screenshots of that PDF) will be divided by 13 more mantissa bits, i.e. visually indistinguishable from continuity. You can hack the SDK sample yourself and change the generated lookup texture size to 1024 or 512 and see how drastic a change even one bit makes. It's in the ComputeStepTexture() function.
But what about dithering? Or FP16 blending? I can't test either of these.
 
Chalnoth said:
Mintmaster said:
Chalnoth, we are talking about volumetric fog or light shafts with this particular technique.
I wasn't. I still don't care much for polygonal fog. But the translucency algorithm is similar, and should have many of the same characteristics.
The "translucency algorithm" may also use the idea of object thickness, but there is no alpha blending and no adding/subtracting. It's used for materials with more light scattering and absorption, like marble or plastic. Once lights exits the object, it's intensity is so reduced and is so scattered that it's re-entry into the object on the same initial path is inconsequential to the appearance.

What they do there is similar to shadow mapping. Render the scene from the light's point of view, and when rendering the object from the camera, use this value to light it. In shadow mapping it's a simple yes/no comparison with the distance of that pixel from the light, but here the difference is used to determine the colour. Totally unrelated to the volume fog technique, and alpha blending plays no role.
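
In rough pseudo-C++ the per-pixel idea is something like this (the names and the exponential attenuation are mine, not the paper's):

```cpp
#include <cmath>

struct Color { float r, g, b; };

// d_in:  depth of the surface nearest the light, read from the light-space depth map
// d_out: depth of the point being shaded, measured from the same light
// sigma: extinction coefficient of the material (assumed constant here)
Color TransmittedLight(Color lightColor, float d_in, float d_out, float sigma)
{
    float thickness = d_out - d_in;              // distance the light travelled inside the object
    float t = std::exp(-sigma * thickness);      // attenuation from that thickness
    return { lightColor.r * t, lightColor.g * t, lightColor.b * t };
}
```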

Chalnoth said:
Mintmaster said:
How so? It's exactly the same operation: Read from framebuffer, add (or subtract), and write to framebuffer. There's no magic combination of additions and subtractions before the result goes to the framebuffer. There may be a write cache, but that will affect a negligible number of pixels.
In integer, this is true. In floating-point, it is not. Floating point will be more accurate if the numbers being added to and subtracted from the framebuffer are as close as possible to the value currently stored in the framebuffer, which essentially means that if you do all your additions and then all of your subtractions, it's going to be less accurate. Front-to-back ordering should also make the technique more stable.
This is all irrelevant. The worst case scenario (many overlapping polygons) is still there for concave objects, unless you sort every triangle in software. The best case scenario is already quite poor, and that's what I'm arguing anyway. Even when all 12 bits are used without allowing for overlap (i.e. just one front face and one back face), it's not enough for a decent picture, and what you're describing won't help this best case. FP16 has only 10 mantissa bits, so you'll hardly get any better than 12-bit integer.

Dithering helps, but just think back to the days of 16-bit rendering. Dithering was especially distracting in alpha blended fog or smoke. If we're resorting to dithering, I think that's more than enough proof that more precision is desired and will make a visual difference for this technique.
 
Mintmaster said:
The "translucency algorithm" may also use the idea of object thickness, but there is no alpha blending and no adding/subtracting.
That's part of the calculation, not all of it. That's a very rough approximation for single-scattering. Absorption seemed to be handled pretty much identically to the fog technique.

Mintmaster said:
This is all irrelevant. The worst case scenario (many overlapping polygons) is still there for concave objects, unless you sort every triangle in software. The best case scenario is already quite poor, and that's what I'm arguing anyway. Even when all 12 bits are used without allowing for overlap (i.e. just one front face and one back face), it's not enough for a decent picture, and what you're describing won't help this best case. FP16 has only 10 mantissa bits, so you'll hardly get any better than 12-bit integer.
I seriously doubt it.

First of all, the worst case scenario you describe shouldn't come up often in practice. As far as I know, most objects that overlap themselves, when rendered, will typically not render all front-facing surfaces before all back-facing surfaces (or vice versa).

And as for FP16 only having 10 mantissa bits, the 12 bits for the integer format would have to be stretched across all possible depths, and so you'd have to be dealing with objects approximately 1/4th the size of the world to use more than 10 significant bits (or perhaps more: the limited size of integer formats may require you drop accuracy further to ensure no overflow).
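
For reference, the raw step sizes look like this if you just normalize depth to [0,1] (a simplifying assumption; it ignores how either format would actually be scaled in the technique):

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // I12 quantizes the whole [0,1] range uniformly; FP16's step size depends on
    // the magnitude of the stored value (10 stored + 1 implicit mantissa bits).
    std::printf("I12  step anywhere in [0,1):   %g\n", 1.0 / 4096.0);
    std::printf("FP16 step in [0.5,  1):        %g\n", std::ldexp(1.0, -11));
    std::printf("FP16 step in [0.25, 0.5):      %g\n", std::ldexp(1.0, -12));
    std::printf("FP16 step in [0.125,0.25):     %g\n", std::ldexp(1.0, -13));
    return 0;
}
```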

However, if you render front and back faces in the same pass, you are much more likely to retain most of the accuracy of FP16, particularly in the near-field where artifacts should be more noticeable.

Mintmaster said:
Dithering helps, but just think back to the days of 16-bit rendering. Dithering was especially distracting in alpha blended fog or smoke. If we're resorting to dithering, I think that's more than enough proof that more precision is desired and will make a visual difference for this technique.
The main thing to notice here is that color dithering dithered among 3 separate channels, and so artifacts became very distracting. With just one channel, there will be no telltale discoloration, and so the artifacts should be much less likely to show themselves.
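
For what it's worth, single-channel ordered dithering can be as simple as this (a generic 4x4 Bayer pattern, just to illustrate; I'm not claiming this is what NVidia's demo does):

```cpp
#include <cmath>

// Quantize a [0,1] value to 'levels' steps, offsetting the threshold per pixel
// with a 4x4 Bayer pattern so the quantization error turns into high-frequency
// noise instead of visible bands.
float DitheredQuantize(float value, int x, int y, int levels)
{
    static const int bayer[4][4] = {
        {  0,  8,  2, 10 },
        { 12,  4, 14,  6 },
        {  3, 11,  1,  9 },
        { 15,  7, 13,  5 }
    };
    float offset = (bayer[y & 3][x & 3] + 0.5f) / 16.0f;            // in (0,1)
    return std::floor(value * (levels - 1) + offset) / (levels - 1);
}
```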
 
Chalnoth said:
Mintmaster said:
The "translucency algorithm" may also use the idea of object thickness, but there is no alpha blending and no adding/subtracting.
That's part of the calculation, not all of it. That's a very rough approximation for single-scattering. Absorption seemed to be handled pretty much identically to the fog technique.
Can you explain this to me? I already showed you why it is physically stupid to model a light ray entering and exiting a translucent marble-like surface and then re-entering it on the same initial path. If you look at the GDC paper, the technique you're talking about starts on page 53. On page 55 di is stored in the depth map, and do is determined in the vertex shader of the final pass. Pages 57 and 58 show the results of this technique. This "very rough approximation" is all that is used for those statue renderings.

Even for multiple scattering models in marble-like materials, the volume-fog technique is useless. Nothing on page 59 has anything to do with that technique. The best way to render these types of materials, esp. where multiple scattering is significant, is using spherical harmonics, because the lighting response is low frequency.

Read the translucency pdf again. You're not making sense on this point.

Chalnoth said:
Mintmaster said:
This is all irrelevant. The worst case scenario (many overlapping polygons) is still there for concave objects, unless you sort every triangle in software. The best case scenario is already quite poor, and that's what I'm arguing anyway. Even when all 12 bits are used without allowing for overlap (i.e. just one front face and one back face), it's not enough for a decent picture, and what you're describing won't help this best case. FP16 has only 10 mantissa bits, so you'll hardly get any better than 12-bit integer.
I seriously doubt it.

First of all, the worst case scenario you describe shouldn't come up often in practice. As far as I know, most objects that overlap themselves, when rendered, will typically not render all front-facing surfaces before all back-facing surfaces (or vice versa).
True, that's why I called it a worst case. The bigger thing to take away from my post is that banding is not caused by multiple layers (although it can be made worse). Read my next paragraph...
Chalnoth said:
However, if you render front and back faces in the same pass, you are much more likely to retain most of the accuracy of FP16, particularly in the near-field where artifacts should be more noticeable.
You offered me an explanation of how FP numbers add and subtract better if no intermediate sum gets large, but this is a moot point for the common case where there is only one front and one back surface. If this case shows a lot of banding with 12-bit integer (even when you don't allow for overlap), multiple layers will be even worse. PS 3.0 + FP16 blending will help cancel out the latter, but not the former, which occurs more commonly.
Chalnoth said:
And as for FP16 only having 10 mantissa bits, the 12 bits for the integer format would have to be stretched across all possible depths, and so you'd have to be dealing with objects approximately 1/4th the size of the world to use more than 10 significant bits (or perhaps more: the limited size of integer formats may require you drop accuracy further to ensure no overflow).
They don't have to be 1/4 the size of the world. They just have to be further away than 1/4 of the world's size. You can't do any bounding volume tricks either, because solid object intersection is only taken into account in the final pass, at least in the SDK sample's smarter solution. NVidia's paper suggests reading and comparing the depth texture when drawing each fog surface, and you'd have to scale and bias the depth texture (differently per volume) to match the depths of your bounded fog volume. That's too much extra work per pixel compared to the SDK sample, IMO. However, you told me before that this isn't what you were talking about, so I'm not sure where this conclusion of yours comes from.

I agree that FP16 will look somewhat better than I12 on closer objects, but not a whole lot. The problem is we know I12 is pretty crappy right now.

Your point about dithering is well taken, although when thinking about 16-bit coloured fog I never envisioned colour artifacts. Fluctuation in luminance was enough to distract me. I find it most bothersome in motion, not in static screenshots. I have to admit, though, that NVidia seems to have done a good job with the dithering, and hopefully it's free.
 