NV40: Why doesn't MSAA work with FP Blending?

LeStoffer

Yes indeed, why doesn't MSAA work with FP Blending on the NV40?

Of course the simplest explanation could be that nVidia just decided that since the memory bandwidth (and storage) requirements will go up by quite a bit, MSAA should be a no-go.

But that explanation doesn’t sit that well with me since nVidia tends to provide brand new features foremost and performance next. And 800x600 with twice FP Blending and 2xMSAA doesn’t sound impossible to me anyway.
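Here is a rough Python sketch of the storage side of the argument. It assumes 4 bytes per sample for RGBA8, 8 bytes per sample for an FP16 RGBA target, no framebuffer compression, and ignores the depth buffer. The helper names are hypothetical, and the figures are back-of-the-envelope numbers, not measured NV40 data.

```python
# Back-of-the-envelope color buffer sizes for a multisampled render target.
# Assumptions: RGBA8 = 4 bytes/sample, FP16 RGBA = 8 bytes/sample,
# no framebuffer compression, depth/stencil ignored.

BYTES_PER_SAMPLE = {"RGBA8": 4, "RGBA16F": 8}


def color_buffer_bytes(width: int, height: int, samples: int, fmt: str) -> int:
    """Bytes needed to store every color sample of a width x height target."""
    return width * height * samples * BYTES_PER_SAMPLE[fmt]


def blend_traffic_bytes(width: int, height: int, samples: int, fmt: str,
                        overdraw: float = 1.0) -> int:
    """Rough per-frame blend traffic: each blended sample is read once and written once."""
    return int(2 * overdraw * color_buffer_bytes(width, height, samples, fmt))


if __name__ == "__main__":
    for fmt in ("RGBA8", "RGBA16F"):
        for samples in (1, 2):
            size = color_buffer_bytes(800, 600, samples, fmt)
            print(f"{fmt:8s} {samples}x MSAA: {size / 2**20:.2f} MiB")
```

At 800x600 this works out to about 7.32 MiB of color storage for 2x FP16, versus about 1.83 MiB for a plain 1x RGBA8 target. Doubling the bytes per sample and doubling the sample count quadruples the storage, and blend traffic scales the same way.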

So what gives? I can’t see any reason beyond the bandwidth (and some storage) constraints since MSAA is done well before the FP Blending stage anyway.

Beyond3d's NV40 preview said:
Although most of the pipeline operations work under the OpenEXR format, at present the FSAA multisampling scheme does not.

Maybe it is just a decision within the drivers for now?
 
My understanding is that FP blending is an incredibly expensive feature (transistor wise), so it's most likely related to transistor cost.

Or it's a limitation of the NV40's output logic. Does NV40 support MSAA on any output format wider than 32 bits?
 
ERP said:
My understanding is that FP blending is an incredibly expensive feature (transistor wise), so it's most likely related to transistor cost.

Yes, I thought about that too. But since MSAA and FP blending aren't performed at the same stage in the pipeline (before and after the PS units), I would assume they don't share the same logic.

But maybe there is some reuse with regard to write/read to the back buffer (for MSAA) and writing to the 'blend' buffer? :?
 
ERP said:
My understanding is that FP blending is an incredibly expensive feature (transistor wise), so it's most likely related to transistor cost.

100% right. Maybe FP RTs with FSAA will be in the next VPU...

Thomas
 
Is it just FP blending that doesn't work with MSAA? I thought it was FP rendering in general that doesn't work with MSAA on all DX9 cards, but of course I could be wrong.

BTW, does anyone know how the RTHDRIBL demo does FSAA?
 
It does seem odd that two seemingly unrelated rendering functions, such as MSAA and FP filtering, interfere with each other in NV40. Aside from requiring floating-point precision, what makes performing MSAA on integer blended surfaces different from performing AA on float blended surfaces?
 
Doesn't it depend largely if the blending occurs before or after the multisampling?

I was thinking about this and I'm not certain you can do it before. How does it determine the color of the destination pixel before replication? All 4 samples need not have the same color because of varying occlusion.

That means they would have to replicate the FP blending logic 2 or 4 times to support MSAA, or at least reuse the same logic several times per pixel, which would complicate the output section.
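A minimal Python sketch of the point above, with hypothetical helper names: the shader output is shared by the whole pixel, but each sample keeps its own destination color, and the coverage mask decides which samples are touched. Standard "over" alpha blending and 4 samples per pixel are assumptions for illustration, and plain Python floats stand in for FP16 values.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]


def blend_over(src: Color, src_alpha: float, dst: Color) -> Color:
    """Classic alpha blend per channel: dst' = src * a + dst * (1 - a)."""
    return tuple(s * src_alpha + d * (1.0 - src_alpha) for s, d in zip(src, dst))


def blend_multisampled_pixel(samples: List[Color], coverage: List[bool],
                             src: Color, src_alpha: float) -> List[Color]:
    """Apply one shaded fragment to the covered samples of a pixel.

    The source color is shared (shaded once per pixel), but every covered
    sample is blended against its *own* destination value, so the blend math
    runs once per covered sample. Uncovered samples keep their old color.
    """
    return [blend_over(src, src_alpha, dst) if covered else dst
            for dst, covered in zip(samples, coverage)]


if __name__ == "__main__":
    # Edge pixel: earlier triangles left two different colors in the samples.
    samples = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 4.0), (0.0, 0.0, 4.0)]
    coverage = [True, False, True, True]  # new triangle covers 3 of 4 samples
    print(blend_multisampled_pixel(samples, coverage, (0.0, 2.0, 0.0), 0.5))
```

The three covered samples each need their own multiply-add against a different destination value, which is why the blend logic has to be replicated per sample or looped over several clocks.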
 
FP blending and filtering will be orthogonal with all other currently available features in future architectures.
 
LeStoffer said:
So what gives? I can't see any reason beyond the bandwidth (and some storage) constraints since MSAA is done well before the FP Blending stage anyway.
It's not. MSAA is a feature that affects operation throughout the whole pipeline, from triangle setup and other early stages to blending, framebuffer compression and downsampling.

Supporting MSAA on FP16 render targets would require changes to the last three parts mentioned. And while it doesn't seem to require big or difficult changes, I don't think it is trivial or cheap either.
Of course it would be nice to have, but slow, so NVidia obviously thought it wasn't worth the effort for this generation.
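To make the downsampling stage concrete, here is a minimal Python sketch of a box-filter resolve. It assumes each pixel's samples are simply averaged and the result is stored back at FP16 precision, emulated with the standard `struct` module's half-precision `"e"` format. The function names are hypothetical. For an FP16 target this averaging needs floating-point adders rather than the 8-bit integer arithmetic that is enough for RGBA8, which is one of the changes mentioned above.

```python
import struct
from typing import List


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def resolve_pixel(samples: List[float]) -> float:
    """Box-filter resolve of one channel of one pixel; result stored as FP16."""
    return to_fp16(sum(samples) / len(samples))


def resolve_image(image: List[List[List[float]]]) -> List[List[float]]:
    """image[y][x] holds the per-sample values of one channel for pixel (x, y)."""
    return [[resolve_pixel(px) for px in row] for row in image]


if __name__ == "__main__":
    # One row, two pixels with 2x MSAA: an edge pixel and an interior pixel.
    print(resolve_image([[[0.0, 3.0], [1.5, 1.5]]]))
```

The FP16 rounding step is where the stored result can differ from a full-precision average; 0.1, for instance, is stored as 0.0999755859375.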
 
Mintmaster said:
Is it just FP blending that doesn't work with MSAA? I thought it was FP rendering in general that doesn't work with MSAA on all DX9 cards, but of course I could be wrong.

BTW, does anyone know how the RTHDRIBL demo does FSAA?

It doesn't use the HW FSAA; it uses simple supersampling. ShaderMark v2.1 will use HW FSAA with 16-bit floating-point blending (ARGB16F) on NV4x HW and 16-bit integer blending (ARGB16) on R3xx and R4xx HW.

But I don't think games will use this technique, because you have to render the scene twice, once into an x8r8g8b8 FSAA render target and once into a 16-bit HDR render target (where you do all the HDR calculations), and then blend them together.

http://www.tommti-systems.de/temp/hdr_r3xx.png
http://www.tommti-systems.de/temp/hdr_nv4x.png

Thomas
 
My first guess would be that high-performance FP16 with FSAA would require framebuffer compression, and nVidia has not updated their framebuffer compression routines to work with FP16. This is the only thing I can think of that would directly require more transistors with a FP16 framebuffer.
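As an illustration of the kind of compression meant here, this is a generic Python sketch of a lossless MSAA color compression check: when every sample in each pixel of a tile matches (the common case in triangle interiors), the tile can be stored as one value per pixel. This is a textbook-style scheme chosen for illustration, not NVIDIA's actual design, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, ...]


def compress_tile(tile: Sequence[Sequence[Sample]]):
    """tile[i] lists the samples of pixel i.

    Returns ("compressed", one color per pixel) when every pixel's samples are
    identical, otherwise ("raw", all samples).
    """
    if all(all(s == px[0] for s in px) for px in tile):
        return ("compressed", [px[0] for px in tile])
    return ("raw", [list(px) for px in tile])


def stored_bytes(kind: str, pixels: int, samples: int, bytes_per_sample: int) -> int:
    """Storage for a tile in the given mode (the per-tile flag bits are ignored)."""
    per_pixel = 1 if kind == "compressed" else samples
    return pixels * per_pixel * bytes_per_sample
```

Under these assumptions, a compressed 16-pixel tile with 4x MSAA on an FP16 RGBA target needs 128 bytes instead of 512, so the bandwidth saved is larger in absolute terms than for RGBA8. The comparison is format-agnostic in Python, but in hardware the compare and pack logic is tied to the sample format.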
 
Chalnoth said:
My first guess would be that high-performance FP16 with FSAA would require framebuffer compression, and nVidia has not updated their framebuffer compression routines to work with FP16

Hint: Read
 
The only things I can think of are the drivers just haven't implemented this feature or the memory organization with MSAA doesn't work well with FP16 blending.
 
3dcgi said:
The only things I can think of are the drivers just haven't implemented this feature or the memory organization with MSAA doesn't work well with FP16 blending.
Or FP16 is not a displayable format.

-FUDie
 
Nah. I'm sure it's more because they decided it wasn't worth the transistor cost this generation. I expect that soon MSAA with FP16 will be possible (especially considering that Dave was told so by nVidia....if you'll read a few posts up).
 
Chalnoth said:
Nah. I'm sure it's more because they decided it wasn't worth the transistor cost this generation. I expect that soon MSAA with FP16 will be possible (especially considering that Dave was told so by nVidia....if you'll read a few posts up).
If it's not displayable, then you wouldn't be able to downsample the FP16 AA buffer in the RAMDAC. Nothing you said contradicts what I said, yet you say I am wrong. :rolleyes:

-FUDie
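A minimal Python sketch of what a scan-out-time downsample of an FP16 buffer would have to add, assuming the display path expects 8-bit unsigned-normalized channels: average the samples, then clamp and quantize. The function names are hypothetical. The clamp is the catch: HDR values above 1.0 all saturate to 255, so an FP16 image normally needs a tone-mapping pass before it is displayable.

```python
def quantize_unorm8(x: float) -> int:
    """Clamp to [0, 1] and convert to an 8-bit unsigned-normalized value."""
    x = min(max(x, 0.0), 1.0)
    return round(x * 255.0)


def scanout_channel(samples) -> int:
    """Downsample one channel of one pixel, then convert it for display."""
    return quantize_unorm8(sum(samples) / len(samples))


if __name__ == "__main__":
    print(scanout_channel([0.0, 0.4]))  # ordinary LDR value survives
    print(scanout_channel([2.0, 8.0]), scanout_channel([1.0, 1.0]))  # HDR detail clips
```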
 
No, but downsampling via the RAMDAC is but one option available in the GeForce series of processors for finding the final color value of a particular pixel when using FSAA. A downsample should happen any time you attempt to read from the framebuffer (other than for blending, of course), for example when you read the framebuffer in as a texture in the next rendering pass (tone mapping in this case).

I don't really think that's a major obstacle, as the hardware could easily use the FP filtering hardware to do the downsampling.
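The filtering idea can be checked with a few lines of Python: a bilinear fetch taken exactly halfway between four texels weights each by 1/4, which is the same as a 4-sample box resolve. This assumes the four samples of a pixel are laid out as a 2x2 block of texels when the buffer is read as a texture. The function names are hypothetical, and nothing here describes how NV40 actually lays out its samples.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def bilinear(t00: float, t10: float, t01: float, t11: float,
             fx: float, fy: float) -> float:
    """Standard bilinear interpolation between four texels with weights fx, fy."""
    return lerp(lerp(t00, t10, fx), lerp(t01, t11, fx), fy)


def box_resolve(t00: float, t10: float, t01: float, t11: float) -> float:
    """Plain average of four samples, i.e. a 4x box-filter resolve."""
    return (t00 + t10 + t01 + t11) / 4.0


if __name__ == "__main__":
    # Sampling at fx = fy = 0.5 gives each texel weight 0.25.
    print(bilinear(1.0, 3.0, 5.0, 7.0, 0.5, 0.5), box_resolve(1.0, 3.0, 5.0, 7.0))
```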
 
Not sure how NV40 implements MSAA, but inside triangles the throughput can be high, since the same data is written to all "MSAA sub-pixels". These sub-pixels still need full Z-checking and blending, though, so quite possibly they simply do not have enough throughput with MSAA. Perhaps they don't have the required number of blending units when using MSAA, and no loopback capability to do the blending over multiple clocks?

K-
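The throughput argument above can be put in numbers with a small Python sketch. It assumes, purely hypothetically, that each blend unit handles one sample per clock and that all covered samples of a pixel must be blended; the unit counts are illustrative, not NV40 specifications, and the function name is made up for this example.

```python
import math


def blend_clocks_per_pixel(samples_covered: int, blend_units: int,
                           loopback: bool = True) -> int:
    """Clocks needed to blend one pixel's covered samples.

    Hypothetical model: each blend unit processes one sample per clock. With
    loopback the hardware iterates over the extra samples; without it, having
    more samples than units is unsupported.
    """
    if samples_covered > blend_units and not loopback:
        raise ValueError("more samples than blend units and no loopback")
    return math.ceil(samples_covered / blend_units)


if __name__ == "__main__":
    for samples in (1, 2, 4):
        print(samples, "samples:", blend_clocks_per_pixel(samples, 2), "clock(s) with 2 units")
```

With loopback, extra samples simply cost extra clocks; without it, the combination is either supported at full rate or not at all, which matches the "no MSAA with FP16 blending" outcome.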
 
Xmas said:
LeStoffer said:
So what gives? I can't see any reason beyond the bandwidth (and some storage) constraints since MSAA is done well before the FP Blending stage anyway.
It's not. MSAA is a feature that affects operation throughout the whole pipeline, from triangle setup and other early stages to blending, framebuffer compression and downsampling.

Thanks, I forgot about downsampling probably going on so late in the process on the GeForce cards (old 3dfx RAMDAC trick?).

But it is still a bit elusive to me why the full MSAA process (including the final downsampling) can't be completed on each of the two FP blending targets first, after which the two FP 'images' are blended together.

I'm missing something, but what? :oops:
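The math of that reordering can be checked in a small Python sketch: a box resolve is a plain average, so it commutes with any blend that is a fixed weighted sum of two images, but not with a nonlinear step such as tone mapping. The Reinhard curve x / (1 + x) is used only as an example nonlinearity, blend weights are assumed constant across the pixel, and the helper names are hypothetical. This says nothing about what NV40 hardware allows; it only shows where resolve-then-combine gives the same answer as combine-then-resolve.

```python
def resolve(samples):
    """Box-filter resolve: the average of a pixel's samples."""
    return sum(samples) / len(samples)


def blend_images(a, b, wa, wb):
    """Fixed-weight blend of two sample lists: wa * a + wb * b."""
    return [wa * x + wb * y for x, y in zip(a, b)]


def reinhard(x):
    """Example nonlinear tone-mapping curve, mapping [0, inf) into [0, 1)."""
    return x / (1.0 + x)


if __name__ == "__main__":
    a = [0.0, 0.0, 8.0, 8.0]  # edge pixel: half the samples very bright
    b = [1.0, 1.0, 1.0, 1.0]
    # Linear combine: same result in either order.
    print(resolve(blend_images(a, b, 0.5, 0.5)), 0.5 * resolve(a) + 0.5 * resolve(b))
    # Nonlinear step: the order matters.
    print(resolve([reinhard(s) for s in a]), reinhard(resolve(a)))
```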
 