Graphics_Krazy said:
It's curious to note that NVIDIA still advises using FP16 as much as possible, even with the NV40. Doesn't sound like they've fixed their FP32 performance this go around.
DemoCoder said:
What makes you think the NV40 only supports FP16 in the shaders? The only step we are speculating to be FP16 is the "fixed function" framebuffer blend. NV40 definitely supports FP32 in the shaders, and runs it faster than the NV3x did.
The Baron said:
Why do people assume that PS3.0 support = FP32? There is no increase in minimum full precision in PS3.0.
991060 said:
nutball said:
Does DX9 even support blending into FP render targets? ISTR reading a Microsoft presentation around NV30/R300 launch-time that said it didn't.
Last time I checked the spec, NO.
LeGreg said:
PS3.0 is FP32 quasi-IEEE minimum in full precision (still FP16 minimum in _pp, that didn't change). See the DX9 shader spec.
Where? I have glanced over the DX9 shader spec a number of times, and never seen this distinction. It's certainly not under the "Pixel Shader Differences" page at MSDN.
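To put rough numbers on the full-precision versus _pp distinction being argued about here: FP32 carries a 24-bit significand and FP16 an 11-bit one, so the relative rounding step differs by about four orders of magnitude. A minimal sketch (the constants below are just the significand widths):

Code:
#include <cstdio>
#include <cmath>

int main() {
    // FP32: 1 implicit + 23 stored significand bits; FP16 (_pp): 1 + 10.
    double eps32 = std::ldexp(1.0, -23); // ~1.19e-7
    double eps16 = std::ldexp(1.0, -10); // ~9.77e-4
    std::printf("FP32 relative step: %g\nFP16 relative step: %g\n", eps32, eps16);
}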
Chalnoth said:
I still don't see why you'd want to use greater than FP16 when your final output is going to be of lower precision anyway. At least, I don't see why you'd want to do it for blending.
Mintmaster said:
If you look at my post, I said FP16 will be fine for water simulation. I was just trying to explain to Chalnoth why blending is very useful for simulation.
Chalnoth said:
Framebuffer blending is just a performance optimization, and you can do it in the pixel shader if you need that functionality.
Again, the volume fog demo is virtually impossible without blending. For each triangle, you'd have to copy the area being rendered into a temporary texture, and then use it as a texture input when drawing it. With all the renderstate changes for each volume fog polygon, software rendering would probably be faster.
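A one-pixel sketch of why the volume fog demo leans on blending: back faces of each fog volume add their depth into an FP render target, front faces subtract theirs, and what remains is the distance the view ray travels inside the fog. The fragment counts and depth values below are made up for illustration; the back-minus-front formulation is the commonly described approach, not a quote from the demo source.

Code:
#include <cstdio>

int main() {
    float target = 0.0f;                  // FP render target, cleared
    float backDepths[]  = {10.0f, 7.5f};  // back-facing fog-volume fragments
    float frontDepths[] = {4.0f, 6.0f};   // front-facing fragments
    for (float d : backDepths)  target += d; // additive blend
    for (float d : frontDepths) target -= d; // reverse-subtract blend
    std::printf("fog thickness at this pixel: %g\n", target); // 7.5
}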
Mintmaster said:
As for your suggestions with the volume fog, nice try (honestly), but they are rather pointless. First of all, normalizing a value for FP calculations does nothing - the whole point behind FP numbers in computers is sort of an automatic normalization of the span of the mantissa bits.
When you're adding, though, for maximum accuracy you want to add numbers that have the same order of magnitude. By limiting volume size, you're maximizing the amount of accuracy at least within each volume. Of course, with this simple implementation you'll have decreasing accuracy at larger distances, which is where the logarithmic renormalization idea may help.
Mintmaster said:
Your idea of doing each object separately is also a bad one, not only because it'll require a lot of renderstate changes, but also because it breaks their solution for the situation where an object is in the fog (I'm assuming the depth values used in your idea are relative to the centre of the object, or else you're back to the original problem of subtracting two large numbers and getting a small one).
Actually, I didn't assume that. I merely assumed that the most important area to have accuracy was in the near range.
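The "subtracting two large numbers and getting a small one" problem is easy to demonstrate, and it also shows why the same-order-of-magnitude point matters for addition: a small offset applied near 10000 falls below FP32's rounding step there, while the same offset near 1 survives. The specific values are illustrative:

Code:
#include <cstdio>

int main() {
    // FP32's spacing near 10000 is ~0.001, so a 0.0003 offset rounds away;
    // near 1.0 the spacing is ~1.2e-7, so the same offset is preserved.
    float farBack  = 10000.0f, farFront  = 10000.0f + 0.0003f;
    float nearBack = 1.0f,     nearFront = 1.0f + 0.0003f;
    std::printf("difference near 10000: %g\n", farFront - farBack);   // 0
    std::printf("difference near 1:     %g\n", nearFront - nearBack); // ~0.0003
}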
Mintmaster said:
Finally, there are a lot of things to like about geometric volume fog as opposed to layered, alpha-blended volume fog:
1. You don't need to sort in software, or figure out what the slices look like. You just need the volumes themselves, which are very easy to animate.
2. All you get in the end is the thickness of fog in front of each object. You can texture this or do whatever you want (see the sketch after this list).
3. It likely has higher performance, due to the fillrate demands of quality layered fog.
4. It has better quality wrt banding - especially when objects are located inside it - provided you have enough precision.
1. Sorting should be trivial.
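On point 2: once the thickness is accumulated, shading it is one short step. A sketch with an exponential falloff (the falloff function and density value are assumptions for illustration; a 1D lookup texture would work just as well):

Code:
#include <cstdio>
#include <cmath>

int main() {
    float thickness = 7.5f;  // accumulated fog depth at this pixel
    float density   = 0.2f;  // assumed fog density
    float fogAmount = 1.0f - std::exp(-density * thickness);
    std::printf("fog blend factor: %g\n", fogAmount); // ~0.78
}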
Mintmaster said:
Again, none of these specifics really matter. All I'm saying is that FP32 blending is not pointless for the realtime graphics used in gaming, although as usual developers will take their time using it. NV40's FP16 blending is a huge step forward, but higher-precision blending, even if it's just I16, would eventually be nice.
I never said it was useless. My point was that FP32 blending would be a performance optimization, and if it wouldn't be used very much, then the transistors would be better spent elsewhere. I see FP32 as a format that is to be used primarily for data that is not color data, and thus the standard blending functions are much less likely to be useful than they are for color data.
Mintmaster said:
In fact, I think NV40 might have I16 blending, judging by that paper's comparison (although they maimed the I16 shot quite badly, making me think it could be a shot at R3xx's I16 format).
Maimed it? I'm not so sure. Notice the caption of 200,000:1 dynamic range. That's vastly above FX16's capabilities, and so it would obviously look pretty bad.
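A back-of-the-envelope check on that comparison (treating dynamic range as largest over smallest nonzero representable magnitude, and assuming normalized FP16): a 16-bit integer format spans 65535:1, well short of 200,000:1, while FP16 spans roughly a billion to one.

Code:
#include <cstdio>

int main() {
    double i16Range  = 65535.0;                   // 65535 : 1
    double fp16Range = 65504.0 / 6.103515625e-5;  // max finite / min normal (2^-14)
    std::printf("I16  dynamic range: %.0f:1\n", i16Range);
    std::printf("FP16 dynamic range: %.0f:1\n", fp16Range);
}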
3dcgi said:
AndrewM said:
Hey Uttar, weren't you the one that was saying a few months ago that they fixed the register issues? Now you're saying it's not fixed?
For those that don't know, there will likely always be issues with register usage. Just as adding more cache to a CPU will improve performance in some cases, adding more registers will improve shader performance in some cases. There is probably some point where register usage is generally not a problem, though, and the bottleneck shifts elsewhere. Worst-case shaders might be long, with a lot of texture fetches. As more pixel threads fill the pipe, the registers will get used up.
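A toy model of the cache analogy (the 256-entry register file and the per-pixel counts are illustrative assumptions, not NV40's real figures): the more temporaries a shader needs, the fewer pixel threads fit in flight, and fewer threads in flight means less latency hiding for those texture fetches.

Code:
#include <cstdio>

int main() {
    int registerFile = 256; // assumed total temp registers shared by the pipe
    for (int regsPerPixel : {2, 4, 8, 16})
        std::printf("%2d regs/pixel -> %3d pixels in flight\n",
                    regsPerPixel, registerFile / regsPerPixel);
}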
davepermen said:
Chalnoth said:
I never said it was useless. My point was that FP32 blending would be a performance optimization, and if it wouldn't be used very much, then the transistors would be better spent elsewhere. I see FP32 as a format that is to be used primarily for data that is not color data, and thus the standard blending functions are much less likely to be useful than they are for color data.
There are two situations where FP32 blending is useful:
1) To finally have a full solution where everything works the same way. Very useful for doing much, much more complex renderings, too, not only fully realtime ones. Think of 3dsmax running entirely in realtime, and then you press render and it renders at, say, 1 fps. For this, it needs to have everything in FP32 to get good, high-quality results that are determinable, estimatable, and therefore usable. For the first time, GPUs could be used to accelerate rendering everywhere.
I don't see why this is a problem. You can always emulate blending in the shader, as stated above. For offline rendering, the performance hit would be much less of an issue.
davepermen said:
2) As you said, non-colour data. People have for some years now been following the dream of shaders, imagining all sorts of things that could be done with those very powerful, very efficient streaming data processors. There's just a problem: precision. We could use the hw to process geometry or audio, to raytrace, to do tons of funny things. There's just one issue: we have to tweak here and there to do this and that in a way that doesn't lose too much precision. GPGPU shows quite a few simple things that are doable; there is much more.
My point was that we're talking only about blending here. Blending is a very specific mathematical operation, an operation that may not be meaningful for many other data types. Transistors would have to be expended to support FP32 blending, and if there aren't many situations under which it would be used, why would the optimization be worth it?
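What "emulate blending in the shader" amounts to at a single pixel: the destination value must first be made readable (the copy-to-texture step objected to earlier), after which the blend itself is ordinary FP32 shader arithmetic. A minimal sketch with assumed values:

Code:
#include <cstdio>

int main() {
    float dst = 0.25f;                  // value already in the render target
    float src = 0.75f, srcAlpha = 0.5f; // incoming fragment
    float copyOfDst = dst;              // stands in for the copy-to-texture step
    dst = src * srcAlpha + copyOfDst * (1.0f - srcAlpha); // blend done in FP32
    std::printf("blended value: %g\n", dst); // 0.5
}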
nutball said:
Chalnoth said:
Support for FP blending in no way increases the flexibility of the processor. It is a performance optimization. My point was that we're talking only about blending here. Blending is a very specific mathematical operation, an operation that may not be meaningful for many other data types.
You keep repeating that FP32 blending is a performance optimisation. When a "performance optimisation" makes possible something that was previously impossible, I'd say it's more of an "enabling technology" than a "performance optimisation", wouldn't you?
It doesn't make anything possible that was previously impossible.
newpos = oldpos + velocity * time
See? That's a blend!
That's a step forward for your argument. But I don't think blending would be a big help in this case. That is, I would typically expect that each vertex would only be updated once per frame. Unlike the fog data example used previously, this could easily be done by using FP32 render-to-texture.
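Spelling out the mapping being claimed: with the old position held in an FP32 target and the velocity arriving as the source value, the fixed-function blend equation result = src * srcFactor + dst * dstFactor performs the integration step, with the timestep supplied as a constant blend factor. A one-component sketch with made-up numbers:

Code:
#include <cstdio>

int main() {
    float oldpos   = 2.0f; // dst: position stored in the FP32 target
    float velocity = 3.0f; // src: incoming fragment value
    float time     = 0.1f; // srcFactor: constant blend factor (the timestep)
    float newpos = velocity * time + oldpos * 1.0f; // src*factor + dst*ONE
    std::printf("newpos = %g\n", newpos); // 2.3
}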