3 new GDC presentations from NVidia

Well, the ++ is a C++ programming reference, so I'd say it makes sense in a developer document.

Anyway, I thought the Real-Time Animated Transparency document was just amazing. Now that's an application of Microsoft's volumetric fog demo that I'd really like to see.
 
NVSDK 7.0 already included the fog demo, accompanied by a document explaining the implementation details.

I think the technique was first mentioned in an article named "Rendering Objects as Thick Volumes" in ShaderX2.
 
bloodbob said:
512 samples looks nice, pity about the 4 second time to calculate the data for one frame.

It's a preprocessing technique (if you were talking about ambient occlusion).
 
maven said:
bloodbob said:
512 samples looks nice, pity about the 4 second time to calculate the data for one frame.

It's a preprocessing technique (if you were talking about ambient occlusion).

Indeed, but it's also pretty good going considering that the model consists of ~150,000 triangles, according to the full GPU Gems article.
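For anyone curious what that preprocess is actually computing, here's a toy Python sketch of per-point Monte Carlo ambient occlusion with hemisphere sampling. The sphere occluders and all function names here are illustrative assumptions, not the GPU Gems implementation:

```python
import math
import random

def ray_hits_sphere(o, d, center, radius):
    # standard ray-sphere intersection test: any hit with t > 0?
    oc = [a - b for a, b in zip(o, center)]
    b = sum(a * c for a, c in zip(oc, d))
    c = sum(a * a for a in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return False
    root = math.sqrt(disc)
    return (-b - root) > 1e-6 or (-b + root) > 1e-6

def ambient_occlusion(point, normal, occluders, samples=512, rng=None):
    # Monte Carlo AO: fraction of hemisphere directions not blocked.
    rng = rng or random.Random(0)
    unoccluded = 0
    for _ in range(samples):
        # uniform random direction, flipped into the hemisphere around `normal`
        d = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        if sum(a * b for a, b in zip(d, normal)) < 0.0:
            d = [-c for c in d]
        if not any(ray_hits_sphere(point, d, c, r) for c, r in occluders):
            unoccluded += 1
    return unoccluded / samples
```

With ~512 visibility tests per surface point against a ~150,000 triangle mesh, it's easy to see where the seconds-per-frame preprocessing cost comes from.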

You can also extend the GPU scheme to higher-order occlusion as well, as I show in a recent article:
http://www.gamasutra.com/features/20040319/hill_01.shtml
 
Well ain't that something.

I was just talking about the volume fog technique in the other thread and how it's made easier and faster with higher precision formats.

Chalnoth, this proves my point about depth needing higher precision. Even with FP16 available, they're still encoding depth into RGB channels. In the ps 1.x version they used a texture lookup to do the decoding, and the 4096 texture size limit restricted the precision that can be had this way. I guess they used the frc instruction here on differently scaled depths from the vs (can any coder here confirm that?). In that case, encoding is not such a big deal, I guess, and decoding just needs a dot product.
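As a sanity check on that guess, here's a small Python model of frc-based packing of a depth into three 8-bit channels, with a dot-product decode. The 255-based scale factors are an assumption (the commonly used packing scheme), not necessarily the exact constants the demo uses:

```python
import math

def encode_depth(d):
    # frac() of differently scaled copies of d, as per-channel frc
    # in the shader would give (d is assumed to be in [0, 1))
    enc = [math.modf(d * s)[0] for s in (1.0, 255.0, 65025.0)]
    # subtract the bits already carried by the finer channels, so each
    # 8-bit channel holds only its own slice of the value
    enc[0] -= enc[1] / 255.0
    enc[1] -= enc[2] / 255.0
    # quantize the way an 8-bit-per-channel render target would
    return [round(c * 255) / 255 for c in enc]

def decode_depth(rgb):
    # decoding really is just a dot product
    weights = (1.0, 1.0 / 255.0, 1.0 / 65025.0)
    return sum(c * w for c, w in zip(rgb, weights))
```

Three 8-bit channels recover roughly 24 bits of depth precision, which is the whole point of bothering with the encode.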

Democoder, you're totally right about what they're saying about MRT. The only reason you save a pass in this technique is that the ps 3.0 vFace register allows you to choose between additive and subtractive blending based on the face direction. It's 3 passes vs. 4, not 3 vs. 5. Given that volume fog is far more fillrate intensive (though less so than a crapload of alpha layers) than geometry intensive, you may even see a performance decrease from the extra instruction needed to merge the passes; it's really just a matter of being slightly more convenient.
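To make the pass-merging concrete, here's a CPU-side Python sketch of the two approaches to computing fog thickness along a view ray. The `is_front` flag stands in for what vFace tells the pixel shader, and the function names are made up for illustration:

```python
def thickness_two_sided_passes(front_depths, back_depths):
    # PS 2.0 style: render front faces and back faces in separate passes,
    # accumulating each sum with additive blending, then subtract
    return sum(back_depths) - sum(front_depths)

def thickness_single_pass(faces):
    # PS 3.0 style: one two-sided pass; a vFace-like flag picks the sign
    # per fragment (the extra cmp/select being discussed)
    total = 0.0
    for depth, is_front in faces:
        total += -depth if is_front else depth
    return total
```

For a closed fog volume the two give the same answer; the single-pass version just pays one extra select per fragment in exchange for skipping a state change and a second geometry submission.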

Rather misleading "++", especially considering how honest NV usually is. ;)
 
maven said:
bloodbob said:
512 samples looks nice, pity about the 4 second time to calculate the data for one frame.

It's a preprocessing technique (if you were talking about ambient occlusion).

Sure, but if it were faster we could use this technique accurately on animated models.
 
First of all, please forgive my extremely limited technical knowledge of 3D Hardware and Software...

In the Real-Time Animated Transparency slides on page 29, they talk about how PS3.0 can do 3 passes whereas PS2.0 takes 5 passes to accomplish the same thing (6 passes for PS1.3).

My Question:

Is it possible for PS3.0 to render a PS2.0 image in 3 passes or is the 3 pass rule limited to PS3.0 written code only?

I apologize if this doesn't make sense, but I really have very limited knowledge.

Thanks,
Dave
 
Mintmaster said:
Rather misleading "++", especially considering how honest NV usually is. ;)
Huh? How is that misleading? In programming terms, ++ adds one, so that line read to me as "only a little bit faster". I thought it looked rather modest.
 
Well, I guess it depends on how you see it. I see it as "ps 2.0 is faster than ps 1.3", and then "ps 3.0 is faster++ than ps 2.0".

The more misleading part is where they show the passes. You'd think ps 3.0 makes this technique 66% (5/3-1) faster, but it may actually be slower. First of all, MRT is available in PS 2.0, so there's no advantage there, making it 4 passes. Second, collapsing front and back face rendering only saves a state change and geometry bandwidth in a fillrate heavy operation, i.e. negligible gains. But you need another instruction in the shader to choose between addition and subtraction, as I mentioned above. I don't think you can do this for free given that you need to do the depth encoding as well, but it might be possible if NV40 can do parallel ops like R300.

Anyway, forget about that. Do you see now why 32-bit blends can be useful? Why volume fog has plenty of advantages?
 
Mintmaster said:
Second, collapsing front and back face rendering only saves a state change and geometry bandwidth in a fillrate heavy operation, i.e. negligible gains. But you need another instruction in the shader to choose between addition and subtraction, as I mentioned above.

Wait a second, that's not right. First, the PS2.0 method is rendering 2 different frames (F and B fog frames), so not only are you saving state changes and geometry bandwidth, you're also saving pixel shader execution and fillrate. Let's say the pixel shader for the F/B frames is N instructions long. At the end of rendering two frames, you've written twice as many pixels and executed 2N instructions.

With the PS3.0 shader, you're going to have < 2N instructions, because of common subexpressions between the front and back facing shaders. The addition of one instruction, even if you use predication/CMP (no dynamic branch), will still end up cheaper, because you don't need to do another SRT call and set up another scene, nor do you duplicate the execution of the instructions common to the back/front faces.

So the 3.0 method saves

a) state changes
b) geometry bandwidth, vertex shader cycles
c) shader cycles

-DC
p.s. I get about 33fps on the fog demo under D3D on a Radeon 9700 PRO
 
Mintmaster said:
The more misleading part is where they show the passes. You'd think ps 3.0 makes this technique 66% (5/3-1) faster, but it may actually be slower. First of all, MRT is available in PS 2.0, so there's no advantage there, making it 4 passes. Second, collapsing front and back face rendering only saves a state change and geometry bandwidth in a fillrate heavy operation, i.e. negligible gains.
Actually, it should also help with numerical stability, which should improve effective precision. Also remember that multiple passes are more expensive in terms of memory bandwidth, not to mention some of the processing has to be repeated with additional passes. So the 3 passes with PS 3.0 is most likely faster than the 4 passes with MRT + PS 2.0.

But you need another instruction in the shader to choose between addition and subtraction, as I mentioned above. I don't think you can do this for free given that you need to do the depth encoding as well, but it might be possible if NV40 can do parallel ops like R300.
I doubt one instruction will be very meaningful for performance.

Anyway, forget about that. Do you see now why 32-bit blends can be useful? Why volume fog has plenty of advantages?
It depends. Are the added memory bandwidth requirements of the 32-bit blend cheaper than the added instructions for dithering in the 16-bit blend?
 
Umm, I really would have hoped that the dithering is done on the frame buffer output rather than on intermediate textures. Are you guys sure we're dithering the float buffer and not the output buffer?
 
bloodbob said:
Umm, I really would have hoped that the dithering is done on the frame buffer output rather than on intermediate textures. Are you guys sure we're dithering the float buffer and not the output buffer?
It has to be dithering of the float buffer, because we're not talking about color dithering. Anyway, don't let the bad looks of color dithering mar your view of this. The float buffer in question has only one channel, and so there won't be anything akin to the miscoloration artifacts seen with color dithering. I think it'd be pretty challenging to discern a dithered image from a higher-resolution image, but if you really want to get rid of the dithering artifacts, you can simply apply a blur filter after dithering. That may look better anyway.
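A rough Python model of the idea, assuming simple random (rather than ordered) dithering of a one-channel buffer, followed by the optional blur; the function names and parameters are illustrative, not anything from the demo:

```python
import random

def dither_quantize(values, levels=256, rng=None):
    # quantize a single-channel buffer to `levels` steps, adding
    # sub-quantum noise so the rounding error averages out across pixels
    rng = rng or random.Random(0)
    out = []
    for v in values:
        noisy = v * (levels - 1) + rng.uniform(-0.5, 0.5)
        q = min(levels - 1, max(0, round(noisy)))
        out.append(q / (levels - 1))
    return out

def box_blur(values, radius=1):
    # cheap post-dither blur over a 1-D buffer
    n = len(values)
    return [sum(values[max(0, i - radius):min(n, i + radius + 1)]) /
            (min(n, i + radius + 1) - max(0, i - radius))
            for i in range(n)]
```

Averaged over neighbouring pixels, the dithered values recover precision below the quantization step, which is why a dithered single-channel buffer is hard to tell apart from a genuinely higher-precision one.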
 
Wow, I would have thought float 16 would have been good enough there. I didn't want to go off and redownload the pdf again.
 
Helevitia said:
...

My Question:

Is it possible for PS3.0 to render a PS2.0 image in 3 passes or is the 3 pass rule limited to PS3.0 written code only?
...

If I understand your question correctly...

The PS 3.0 and PS 2.0 pass counts they discuss apply to just this shader technique (animated transparency done the way described), and just to the specific "PS 3.0" and "PS 2.0" methods presented. The relationship between the pass counts isn't a general rule about PS 2.0 and PS 3.0 code and images.

As far as the 5 pass PS 2.0 technique applying to PS 2.0 in general, the PS 2.0 method they discuss doesn't seem to use all the features that PS 2.0 itself can be used with. For example, another "PS 2.0" technique could be more like the "PS 3.0" method mentioned, since MRTs can be used with PS 2.0.

"PS 2.0" is a label that can apply to a wide variety of things, including a good portion of the things that can be called "PS 3.0". PS 2.0, unlike PS 3.0, doesn't require all those things.

PS 3.0 (and all those things in combination) can allow implementing something in the same number of passes as PS 2.0, or fewer passes in specific circumstances, but never requires more passes. That is the only rule about the number of passes between PS 2.0 and PS 3.0.

One way to look at this is that PS 2.0 is a good thing, and PS 3.0 is a good thing that might be easier to describe and also require less work from your hardware (which can make it easier to achieve and/or run faster). For now, the only rule is that such savings are possible; how much you actually gain will vary from technique to technique, and from graphics chip to graphics chip.
 