NV40 floating point performance?

Mintmaster said:
Imagine playing a game when all of a sudden, your character's clothes fly off! (Hehe, the next TR :oops: )

Yeah, in TR the character's legs were flying off...
 
Chalnoth said:
I don't think that has anything to do with blending, Mintmaster.
Aren't you talking about alpha blending? Blending the pixel shader output with the framebuffer?

Take a look at the water demo from Humus. When you right click, he fixes the water height to some position. It works for the purposes of demonstrating the water simulation, but it's not very realistic.

A better way is to raise the water in front of the moving object, and lower the water behind it. You need blending to make this relative change in the water surface. Well, you need blending to do it efficiently.

Imagine a helicopter or Harrier jump-jet over the water. You want it to disturb the water, but not eliminate any larger waves currently passing through it.

You then take your simulation texture and add the height to the vertices (whereas Humus' demo creates bump maps). FP16 should be enough for water simulation, but may not be enough for other things.
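
In GL terms, the disturbance pass boils down to something like this (a rough sketch of the idea, not Humus' code; drawDisturbanceQuad() is a made-up helper that renders a small quad whose shader outputs a signed height offset):

Code:
    /* accumulate disturbances into the height texture instead of
       overwriting it -- this is the "relative change" part */
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);             /* dest = dest + src           */
    drawDisturbanceQuad(bowX, bowY, +h);     /* raise water ahead of object */
    drawDisturbanceQuad(sternX, sternY, -h); /* lower it behind (signed
                                                output, hence FP blending)  */
    glDisable(GL_BLEND);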


For another, unrelated example of where higher-precision blending is needed, look at the volume fog example from the DirectX SDK. They go through quite a bit of trouble to get 12 bits of precision using 8-bit channels, and say it's just about enough to make the technique usable. FP16 blending would make it a lot easier to get similar precision, but 10 bits of mantissa + sign + implied bit is just barely equivalent. More precision would be nice to get a better value for the fog depth (just like 8-bit shadow maps make for crappy shadows) and to be able to handle more layers. This technique would work well for light shafts as well.
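
To put rough numbers on that comparison (my arithmetic, not from the SDK docs): a 12-bit integer spread over [0,1] steps by 1/4096 ≈ 0.00024 everywhere. FP16, with 10 stored mantissa bits plus the implied bit, steps by 2^-11 ≈ 0.00049 in [0.5,1) and 2^-12 in [0.25,0.5), getting finer toward zero. So FP16 is a little coarser than 12-bit integer near full scale and finer below 0.5, which is where "just barely equivalent" comes from.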
 
If 12 bits integer is enough precision, I don't see why FP16 wouldn't be enough. I'm sure it will be interesting to try and break FP16 blending when the NV40 comes out.

Also, it seems like the error accumulation would have to be pretty significant there. It didn't seem like many alpha blends were actually being done for the final result. If you could calculate the accumulated error with FP16, you could subtract the average error from each blend, dramatically increasing numerical stability (you couldn't do this so easily with an integer format, as the average error is probably not an integer).
 
Hey Uttar, weren't you the one that was saying a few months ago that they fixed the register issues? Now you're saying it's not fixed? :)
 
AndrewM said:
Hey Uttar, weren't you the one that was saying a few months ago that they fixed the register issues? Now you're saying it's not fixed? :)

AFAIK Uttar has always maintained that the register issues would remain to some extent and would not be completely fixed.
 
Chalnoth, that's just the thing: 12 bits is barely enough, and the maximum possible (using tricks) with DX8 hardware. That means dividing the space between the near and far planes into only 4096 intervals, and no volumes can overlap. If you want to play it safe and assume some 10 backfaces can be at the same place at one time, that's only about 400 intervals. Not very accurate at all, especially for thin regions like shafts of light.


In your second paragraph, which of my examples are you referring to as "there"? The volumetric fog? That is ALL alpha blends. Remember we're adding the depth value of all back faces, then subtracting the depth of all front faces. The final number can easily be less than 0.1% of the size of the sums.
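
The structure of the technique is roughly this (a sketch of the general shape, not the SDK sample's actual code; the shader that writes eye-space depth as the fragment value is assumed, not shown):

Code:
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);                      /* plain accumulation */
    glEnable(GL_CULL_FACE);

    glCullFace(GL_FRONT);                             /* pass 1: back faces */
    drawFogVolumes();                                 /*   add their depth  */

    glBlendEquationEXT(GL_FUNC_REVERSE_SUBTRACT_EXT); /* dest - src         */
    glCullFace(GL_BACK);                              /* pass 2: front faces*/
    drawFogVolumes();                                 /*   subtract depth   */

    /* each pixel now holds the total fog thickness along its ray */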

Whenever you're dealing with depth, be it with z-buffering, shadow mapping, or this technique, high precision makes a noticeable difference. It's not a stability issue; it just makes for a much better rendering.


If you were referring to the simulations, error accumulation is not the main problem. It's just the resolution, i.e. the smallest step any element can move in the simulation. I don't know if you've done any physics simulation with gravity, but it's similar to the vibration or resonance you get from using too large a step size. The water simulation won't (at least IMHO) suffer from this, but cloth simulations and other things might.
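
Here's a toy illustration of that resolution limit (my own example; to_fp16() is a crude stand-in for FP16 storage that ignores denormals and exponent clamping):

Code:
    #include <math.h>
    #include <stdio.h>

    /* round to 11 significant bits: FP16's 10 stored + 1 implied */
    static float to_fp16(float x)
    {
        int e;
        float m = frexpf(x, &e);              /* m in [0.5, 1)    */
        return ldexpf(roundf(m * 2048.0f) / 2048.0f, e);
    }

    int main(void)
    {
        float h = 1.0f;                       /* stored height    */
        float v = 0.0001f;                    /* per-step motion  */
        for (int i = 0; i < 10; ++i)
            h = to_fp16(h + v);               /* rounds back down */
        printf("%g\n", h);                    /* prints 1: all of
                                                 the motion is lost */
        return 0;
    }

The step is below FP16's resolution near 1.0 (about 0.001), so the element never moves at all, no matter how many frames you run.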


Anyway, you're the one complaining about FP24 not being enough for textures in the vertex shader, and the difference between FP24 and FP16 is far more noticeable than that between FP32 and FP24.
 
Mint, there is already a demo of Verlet integration for water simulation that uses FP16 blends, and it works fine. Sure, there will be instances where it's not enough, but you can always find pathological cases. FP16 blending and filtering is a huge step forward. Developers have been asking for it.
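
For reference, the per-texel update in that kind of demo is something like this (my paraphrase of the standard Verlet heightfield scheme, not that demo's actual code):

Code:
    /* one timestep: h_prev/h_cur/h_next are W*H height arrays,
       c folds in (wave speed * dt)^2, damp < 1 bleeds off energy */
    void water_step(const float *h_prev, const float *h_cur, float *h_next,
                    int W, int H, float c, float damp)
    {
        for (int y = 1; y < H - 1; ++y)
            for (int x = 1; x < W - 1; ++x) {
                int i = y * W + x;
                float lap = h_cur[i - 1] + h_cur[i + 1] + h_cur[i - W]
                          + h_cur[i + W] - 4.0f * h_cur[i];
                /* position Verlet: velocity is implicit in (cur - prev) */
                h_next[i] = h_cur[i] + damp * (h_cur[i] - h_prev[i]) + c * lap;
            }
    }

On the GPU each array is a texture and the whole loop is one quad pass.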

If you need more, you can always use an FP32 render target and do the blending yourself. You can also use pack/unpack and/or MRT + numerical analysis techniques to increase your precision.

Moreover, this is all idle speculation. We don't know if FP32 blending is possible or not. It's possible it's supported, but at 4x the cost or more, for example.


BTW, NV40 supports FP z-buffer.
 
DemoCoder, don't get me wrong, I think FP16 filtering in NV40 is absolutely awesome. I meant to say so in my last post but forgot. I think that's the single thing lacking in the R3xx/NV3x that's holding back wider adoption of HDR in games.

If you look at my post, I said FP16 will be fine for water simulation. I was just trying to explain to Chalnoth why blending is very useful for simulation.

Using a render target is fine, but if you have a bunch of disturbances, you need to copy from the FP32 buffer to a temporary texture, then use that as an input when drawing the disturbance, then repeat for each disturbance. That's much less efficient than blending if you have a lot of disturbances going on.
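
Per frame that looks something like this (pseudo-GL sketch; drawHeightWithDisturbance() is a made-up helper that redraws the surface with one disturbance folded in):

Code:
    /* without blending: one copy + one full pass per disturbance */
    for (int i = 0; i < numDisturbances; ++i) {
        /* snapshot the FP32 height buffer into the temp texture */
        glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, W, H);
        drawHeightWithDisturbance(i);   /* reads snapshot, writes buffer */
    }
    /* with blending this collapses to numDisturbances additive quads
       and no copies at all */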

In my second example, volumetric fog, this is basically impossible. Getting more precision through multiple channels is a real pain, too. In the DX8 solution, the texture size limitation of 4096 was the limiting factor in gaining more precision, and that hasn't changed. Doing it with calculations instead is also really hard.

This was the quote I was replying to:
Chalnoth said:
Personally, I'm not entirely sure that it will be important to optimize FP32 blending for the sake of games, and for anything else, if it's not realtime, who cares about a few % performance?

Just showing him there are some applications, even pertinent to gaming, waiting for FP32 or I16 blending. I was disappointed R300 didn't have the latter, but you can't have it all.
 
VVukicevic said:
Chalnoth said:
Considering there is no GLX_render_texture extension, I guess that would be kinda hard. I would think that's more a function of Linux rather than nVidia's drivers.

Well, there is GLX_ATI_render_texture -- it's supported by ATI's fglrx drivers, but the spec isn't available from the extension registry. nvidia's driver supports GLX_SGIX_pbuffer, which kinda-sorta gets you there as well.

Yeah, I've tried the pbuffer approach; the performance wasn't great (slower than doing it on the CPU, which defeats the point).
 
Chalnoth said:
If 12 bits integer is enough precision, I don't see why FP16 wouldn't be enough.

fp16 doesn't have a 12 bit integer part. it has one sign bit and a 5 bit exponent, leaving a 10 bit mantissa (11 significant bits with the implied one). it can represent more values, but differently distributed. it is definitely (as 12bit int is, too) a rather low-end limit, and fp32 would be much better.
 
davepermen said:
fp16 doesn't have a 12 bit integer part. it has one sign bit and a 5 bit exponent, leaving a 10 bit mantissa (11 significant bits with the implied one). it can represent more values, but differently distributed. it is definitely (as 12bit int is, too) a rather low-end limit, and fp32 would be much better.
I know this. But FP16 supports a vastly greater dynamic range, which one should be able to make use of.
 
Mintmaster said:
If you look at my post, I said FP16 will be fine for water simulation. I was just trying to explain to Chalnoth why blending is very useful for simulation.
I still don't see why you'd want to use greater than FP16 when your final output is going to be of lower precision anyway. At least, I don't see why you'd want to do it for blending.

And as far as depth is concerned, as long as your depth values aren't clamped to [0,1], there should be no problem getting more accuracy out of FP16 than you'd get from 12 bit integer.
 
Chalnoth said:
I still don't see why you'd want to use greater than FP16 when your final output is going to be of lower precision anyway. At least, I don't see why you'd want to do it for blending.

*Cough* I guess we should go back to using 8-bit ints since the final output is going to be 8-bit components?

FP16 blends are fine for colour values, but I believe Mintmaster is looking to use the GPU as a general-purpose processor, not just a graphics processor. So you say: if you need 32-bit precision, do the blends in a pixel shader. That's great if A) you've got the support in the drivers, which you might not have on Linux. B) There can be massive fillrate issues if you're doing lots of small blends on the framebuffer. Say you're doing particle tracing (used in some GI methods): you're doing lots of small writes to the screen, and if you have to do a pass for every particle to blend in the pixel shader (unless you partition the screen, etc.), the whole thing is going to crawl to a stop. This is where FP blends are nice, though in this case you could use FP16, because the exact value doesn't matter; other simulations might need accurate values.
 
As VV pointed out, GLX_SGIX_pbuffer is supported. From what I understand, this extension is used to essentially create a read/write buffer whose primary purpose is multipass. It doesn't create a texture, and so the pbuffer cannot be used with filtering, but it should be more than enough if you have to write out FP32 data for reading later.

Framebuffer blends are more of a performance optimization than a flexibility benefit. FP16 blending is excellent for games, as it will dramatically improve the ability of games to properly support high dynamic range. It should also be good enough for pretty much any color data.

FP32 blending, of course, might be good for data that isn't color data, but I don't think it would be worth spending the extra transistors (in particular you probably won't want to bother using the standard blending operations on most types of non-color data that you would want to output at higher precision). Framebuffer blending is just a performance optimization, and you can do it in the pixel shader if you need that functionality.
 
AndrewM said:
Hey Uttar, weren't you the one that was saying a few months ago that they fixed the register issues? Now you're saying it's not fixed? :)
Nope, I never said that. In fact, I was the original source of the fact that FX16/FP16 exist in the NV4x architecture. See: http://www.notforidiots.com/GPURW.php?product=NV40&submit=1

- Supports FP32, FP16 and FX16 natively. Whether there is any performance difference between FP16 and FX16 is unknown, and whether there are any truly non-FP32 units is also unknown.

Uttar
 
Chalnoth said:
davepermen said:
fp16 doesn't have a 12 bit integer part. it has one sign bit and a 5 bit exponent, leaving a 10 bit mantissa (11 significant bits with the implied one). it can represent more values, but differently distributed. it is definitely (as 12bit int is, too) a rather low-end limit, and fp32 would be much better.
I know this. But FP16 supports a vastly greater dynamic range, which one should be able to make use of.

the dynamic range doesn't really give you more precision. in the end, you have 3 bits more precision, and an uneven distribution.

fp16 is great for storage, but for calculations it would be great if it were all fp32. the world is trained to use fp32 everywhere, so blending should support it too.. well then, another gen of hw failing :D
 
The shaders calculate at FP32 precision; the storage is just FP16. Yes, framebuffer blends are presumably done at FP16 and not with higher internal precision (but we don't know this yet). In any case, doing your calculations at FP32 but storing at FP16 is just as bad. Subsequent passes on the simulation will have truncated you down to FP16, the same as if you moved an FP32 result into an FP16 register.
 
you will have high precision calculations, but low precision combination in between passes. it's still better than everything at 16bit, but support for 32bit blending would be great anyway. i do understand that this means twice the bandwidth, and more complex blending units. but to solve these issues, we can just wait a year or so :D
 
What makes you think the NV40 only supports FP16 in the shaders? The only step we are speculating to be FP16 is the "fixed function" framebuffer blend. NV40 definitely supports FP32 in the shaders, and runs it faster than the NV3x did.
 
davepermen said:
Chalnoth said:
I know this. But FP16 supports a vastly greater dynamic range, which one should be able to make use of.
the dynamic range doesn't really give you more precision. in the end, you have 3 bits more precision, and an uneven distribution.
The point I was trying to make was that in such a shader, the primary noticeable problem would be banding. One should be able to make use of the extra dynamic range to optimize precision.

Here's one way you might do it:

1. Place an upper limit on the size of each fog object.
2. Normalize the depth value such that the above upper limit is set to 1. This should maximize the precision available.
3. Do the front-facing and back-facing calculations for each object before moving onto the next.

Point number 3 may only be feasible given something akin to the double-sided stencil features we've seen previously, but the above should maximize the available precision and prevent banding on smoothly-sloped fog surfaces.

One may also be able to do some logarithmic renormalization of the distance to optimize the precision, but I haven't thought about precisely how you would go about that. It shouldn't be that hard.
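
As a sketch of step 2 (plain C for what the shader would compute per fragment; max_obj_depth is the step-1 upper bound, which I'm assuming gets passed in as a constant):

Code:
    /* while accumulating: rescale so the thickest possible fog object
       maps to 1.0, spending the mantissa bits on the range we care about */
    float encode_depth(float eye_z, float max_obj_depth)
    {
        return eye_z / max_obj_depth;
    }

    /* once, when resolving the accumulated buffer to a fog factor */
    float decode_thickness(float accumulated, float max_obj_depth)
    {
        return accumulated * max_obj_depth;
    }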

Personally, though, I don't think I like the idea of using geometry for volumetric fog.
 