FP blending in R420?

Chalnoth said:
That's a very specific case, and you'd have to cite an example for your point to be of any use. I seriously doubt that many of the "old" unsupported texture addressing modes of either the NV4x or R4xx are terribly useful at all.
I mentioned no specific hardware here - I was just giving an example.

And which old unsupported modes are you talking about? "Wrap" for instance? :oops:

Anyway, it's not 'very specific' at all - people have used these addressing modes and will continue to do so. If this were not the case then we could remove all support for them.

Certain cards' inability to use certain (otherwise supported) formats under D3D seems to be due to a lack of some basic features when using these formats - would this somehow make these formats "not worth a crumpet" to developers if they could be exposed? I suspect not.

Does the fact that the hardware in question lacks support for required features make that hardware's support for those modes 'not worth a crumpet' in the eyes of certain people here? I suspect not as well.
 
My first post, my bazillionth thread that I've read :) Just a snippet I encountered for you folks bickering on the merits of HDR requiring post-blending:

Per John Carmack on the ARB2 path in D3...
so I have added a few little
tweaks to the current codebase on the ARB2 path:

High dynamic color ranges are supported internally, rather than with
post-blending. This gives a few more bits of color precision in the final
image, but it isn't something that you really notice.
If I'm reading this right, he's suggesting (and it makes sense) that calculating the HDR values in-process rather than in a post-process blending stage results in a slightly more accurate final image.

Which says to me two things:
#1. Doom 3 will support some form of HDR when in ARB2 render mode
#2. X800's will have no problem running it in "native mode" rather than some form of hack

I am also assuming two more things:
#1. This is Carmack, we all know that many programmers and developers will likely use a similar method to his.
#2. If a floating point framebuffer was previously understood as the only "proper" way to do HDR, I think Carmack is a big enough uber-geek to dispel that myth now.

Obviously there are still problems with doing HDR calcs without a floating point framebuffer, which have been discussed in this thread (real time HDR reflections being a very notable one), but it seems that HDR on R3xx or R4xx hardware is less of a "hack" and more of a methodology change.

Pardon my interruption; let the flaming, bickering and logical fallacies continue ;)

Edit - I apologize, I found that quote in the same plan where he talked about dropping vendor-specific rendering paths and going to only ARB / ARB2.
 
Chalnoth said:
3dcgi said:
I think the possible drawback is a bad product cycle, NV30.
The NV40 has shown pretty clearly that FP32/FP16 had nothing to do with the poor floating-point performance of the NV3x line.
NV40 has nothing to do with my comment. I was referring to the fact that supporting two floating-point precision formats is more complex than supporting a single format. Thus, I believe it was part of the reason NV30 was late, and it contributed to the slower-than-expected performance.
 
I'm not sure that supporting multiple FP rendering modes caused the NV30 any sort of delay; I'm betting that, having targeted FP32 rendering pipelines, it wasn't very difficult to add the trivial number of extra transistors needed to allow for "half" precision.

I'm sure that NVIDIA had their heart in the right place with the NV30, they simply weren't expecting ATI to have such a compelling product during that cycle. I believe that NV has obviously learned from that stumble (as any good company should) and has rolled up their sleeves to produce the NV40.

I'm not convinced that either product is "obviously" superior, but I am convinced that both sides (meaning the hardcore ATI and NV fanatics -- I'm not going to mention names) are trying too hard to push their hardware this round. Last product cycle, I think it's safe to say that ATI held the crown. This product cycle is simply too new to crown anyone yet; we're just in the intro stages right now.
 
Albuquerque said:
My first post, my bazillionth thread that I've read :) Just a snippet I encountered for you folks bickering on the merits of HDR requiring post-blending:

Per John Carmack on the ARB2 path in D3...
so I have added a few little
tweaks to the current codebase on the ARB2 path:

High dynamic color ranges are supported internally, rather than with
post-blending. This gives a few more bits of color precision in the final
image, but it isn't something that you really notice.
If I'm reading this right, he's suggesting (and it makes sense) that calculating the HDR values in-process rather than in a post-process blending stage results in a slightly more accurate final image.

Which says to me two things:
#1. Doom 3 will support some form of HDR when in ARB2 render mode
#2. X800's will have no problem running it in "native mode" rather than some form of hack

I am also assuming two more things:
#1. This is Carmack, we all know that many programmers and developers will likely use a similar method to his.
#2. If a floating point framebuffer was previously understood as the only "proper" way to do HDR, I think Carmack is a big enough uber-geek to dispel that myth now.

Obviously there are still problems with doing HDR calcs without a floating point framebuffer, which have been discussed in this thread (real time HDR reflections being a very notable one), but it seems that HDR on R3xx or R4xx hardware is less of a "hack" and more of a methodology change.

Pardon my interruption; let the flaming, bickering and logical fallacies continue ;)

Edit - I apologize, I found that quote in the same plan where he talked about dropping vendor-specific rendering paths and going to only ARB / ARB2.

I want a link. I think that was before he had a 6800U. No wonder he disabled post-blending. Notice he didn't mention the precision of the blending. :)

Unreal 3 will use FP blending for HDR. Obviously, Tim Sweeney wouldn't use FP blending unless its advantages outweighed the advantages (are there any?) of the alternative HDR method.
http://www.totalvideogames.com/pages/articles/index.php?article_id=5765

Being the first game engine to feature Shader Model 3.0 support (enabled only by the nVidia GeForce 6 series) to allow for a host of visual effects including high dynamic-range lighting and FP16 blending, it’s likely that Unreal Engine 3.0 will become the de-facto standard for first-person-shooter titles in 2005 and beyond.
 
UE3 does use 64-bit (FP16) color for HDR values, but I haven't found a specific quote where Sweeney (or anyone else from his development staff) has specifically cited the FP16 framebuffer as the reason. More on that in a minute...

Carmack's quote was taken from one of his normal .plan files, which you can find mirrored on any number of sites. He did mention floating point framebuffers and blending thereof, but it was in a completely separate paragraph (and far removed) from where he mentioned HDR effects.

Furthermore, and somewhat back to the UE3 discussion, Carmack's revelation of "higher precision" while rendering HDR effects in-process rather than post-process will certainly hold true on current hardware if you think about it: R420s would calculate HDR values in 96-bit color and then down-convert to FX16 at the framebuffer, while NV40s would calculate HDR values in 128-bit color and then down-convert to FP16.

The difference? By doing multiple accumulations or blends with FP16 as your limit, you can still run into visible anomalies. Doing it internally at a much higher precision (24-bit, or even 32-bit) and downsampling only at the end results in several bits' worth of extra accuracy. Obviously the result will be slightly different on differing hardware, following this general order: NV30 < R3xx = R4xx < NV40.
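That precision argument can be sketched numerically. This is a made-up illustration (the blend weights and values here are mine, not from any game or the thread): rounding every intermediate blend result to FP16, as an FP16 framebuffer would, accumulates more rounding error than blending internally at FP32 and converting to FP16 once at the end.

```python
import numpy as np

# Made-up example: ten 10%-weight blend passes of a bright HDR
# value (4.0) onto a dark base (0.001).
base, layer, alpha = 0.001, 4.0, 0.1

# Path A: every intermediate result is rounded to FP16, as when
# blending repeatedly in an FP16 framebuffer.
fb = np.float16(base)
for _ in range(10):
    fb = np.float16(float(fb) * (1 - alpha) + layer * alpha)

# Path B: blend internally at FP32, convert to FP16 only once.
acc = np.float32(base)
for _ in range(10):
    acc = np.float32(float(acc) * (1 - alpha) + layer * alpha)
final = np.float16(acc)

# Exact reference computed in double precision.
ref = 0.9 ** 10 * base + layer * (1 - 0.9 ** 10)
print(float(fb), float(final), ref)
```

With only a handful of blends the FP16-per-step error stays within a few ulps of the final format; it grows with the number of accumulation passes, which is exactly where the internal-precision approach pays off.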

However, as has already been discussed, FP framebuffers still obviously have their place and will likely end up being the preferred method of HDR (as I have already mentioned, specifically surrounding HDR reflections and performance options). As soon as framebuffers can grow to support the same precision as the internal rendering pipes, we will have a great solution.

In the same .plan where Carmack mentioned HDR and floating point framebuffers (and also made mention of removing the NV30, R200 and other such vendor-specific rendering paths and going to only ARB + ARB2), he also mentioned that he was simply too far into the codebase to want to "hack it up" to support FP framebuffers during this round. I'm very sure the support will be there for his next new-technology release :)
 
The OpenEXR standard is actually fairly new and has not been around for many years. Since January 2003 to be exact. After R300. ILM developed it internally before that though.

Depends on which FP16 format... :p Rhythm & Hues (and others using CinéPaint) have been using an unsigned 8e8 format for quite a while now too... Anyways I imagine it's only a matter of time before the s5e10 format migrates into the standard C library as a standard type.

R200 and other such vendor-specific rendering paths and going to only ARB + ARB2),

How's he going to get rid of vendor specific paths for the R200 and NV2x (or regComb NV10)? There's no cross vendor ARB or EXT extension for fragment programs for that hardware...
 
archie4oz said:
The OpenEXR standard is actually fairly new and has not been around for many years. Since January 2003 to be exact. After R300. ILM developed it internally before that though.

Depends on which FP16 format... :p Rhythm & Hues (and others using CinéPaint) have been using an unsigned 8e8 format for quite a while now too... Anyways I imagine it's only a matter of time before the s5e10 format migrates into the standard C library as a standard type.
Seriously doubt it. Since it's not native to any CPU (no common CPU at least), you'd have to do conversions every time it was written to or read from memory, which would be very slow.

-FUDie
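For the curious, here is roughly what that per-access software conversion looks like; a minimal sketch of decoding an s5e10 bit pattern into a native float (function name and structure are mine, following the IEEE-style half layout being discussed):

```python
def half_to_float(h: int) -> float:
    """Decode a 16-bit s5e10 bit pattern the way a software
    conversion layer would have to on every read from memory."""
    sign = -1.0 if (h >> 15) & 1 else 1.0
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0:                       # zero / subnormal
        return sign * frac * 2.0 ** -24
    if exp == 0x1F:                    # infinity / NaN
        return sign * float("inf") if frac == 0 else float("nan")
    return sign * (1 + frac / 1024.0) * 2.0 ** (exp - 15)

# 0x3C00 encodes 1.0; 0x4248 encodes 3.140625 (nearest half to pi)
print(half_to_float(0x3C00), half_to_float(0x4248))
```

Every load and store of such a type would pay this shift-and-mask cost on a CPU without native half support, which is the overhead being objected to.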
 
He rewrote the ARB2 path for in-level support of "hints" for vertex programs; I'm assuming these will be based off the exposed driver caps. To that point, he specifically mentioned that "we will require a modern driver for advanced rendering modes", which makes me assume that NV and ATI have both agreed to dole out some GL2 / GLSlang-enabled drivers. Hell, I think this exists now...

Cards or drivers not exposing these caps will fall back to a generalized ARB format that will either require multiple passes or will drop features based on driver caps. In extreme circumstances, he also left the "NV10" rendering mode in, which is essentially a DX5-esque fallback: basically flat texturing, no lighting, etc., for TNTs and ATI 7500s. :p

As for anything more specific than the above; I couldn't tell ya. Throw him an email :)
 
Seriously doubt it. Since it's not native to any CPU (no common CPU at least), you'd have to do conversions every time it was written to or read from memory, which would be very slow.

Ummm, yeah.... Sorta like long longs on non-64-bit processors or extended-precision floats on any non-x86 hardware? :rolleyes: (or quad-precision floats on anything)...
 
archie4oz said:
Seriously doubt it. Since it's not native to any CPU (no common CPU at least), you'd have to do conversions every time it was written to or read from memory, which would be very slow.
Ummm, yeah.... Sorta like long longs on non-64-bit processors or extended-precision floats on any non-x86 hardware? :rolleyes: (or quad-precision floats on anything)...
Those are useful formats because they provide more precision than the HW may support, not less. What's the point of doing emulation for a format with less precision than the HW can support?

-FUDie
 
Storage, primarily... Dunno why it seems all that unfathomable to you; it's pretty common in image processing to deal with all sorts of data formats that are non-native to CPU hardware (and sometimes hardware will gain native support for them, e.g. AltiVec's pixel formats)...
 
archie4oz said:
Storage, primarily... Dunno why it seems all that unfathomable to you; it's pretty common in image processing to deal with all sorts of data formats that are non-native to CPU hardware (and sometimes hardware will gain native support for them, e.g. AltiVec's pixel formats)...
You'd still want to do all the computations in FP32 (at least) to get HW acceleration. If you want to store it in FP16 in a file, then use an image conversion library. No need at all for this to be in libc, like I said before.

-FUDie
 
Albuquerque: The problem is, John Carmack didn't say calculating HDR internally was better than FP16 post-blending; he just said it was better than post-blending. Notice John Carmack didn't mention the precision of the post-blending. As I said, this quote was made before JC had a 6800U, so the post-blending could not have been FP16 precision. Thus, JC must have been referring to integer post-blending when he said internal HDR was better than post-blending.
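To make concrete what gets lost with integer post-blending (the value here is made up, purely illustrative): an 8-bit integer framebuffer clamps an over-bright HDR result to 1.0 on write, so there is nothing above white left to blend with, while an FP16 target preserves the value.

```python
import numpy as np

hdr = 3.5  # over-bright shader output; 1.0 = display white

# Writing to an 8-bit integer framebuffer: clamp to [0,1], quantize to 0..255.
int8_fb = round(min(hdr, 1.0) * 255) / 255.0

# Writing to an FP16 framebuffer: the over-bright value survives
# for later blend passes.
fp16_fb = float(np.float16(hdr))

print(int8_fb, fp16_fb)  # 1.0 vs 3.5
```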
 
I seriously doubt that FP16 will ever prove inadequate for framebuffer blending of color data, considering that if you develop your algorithm intelligently, you're never going to do more than a few blends. You're not going to do, for example, 100 blends before the final render in a game.
 
Um, because it won't be realtime?

If you want to do that much blending, you'd find a way to do it all within the pixel shader, to avoid what would become an extreme memory bandwidth penalty.
 
Chalnoth said:
Um, because it won't be realtime?

If you want to do that much blending, you'd find a way to do it all within the pixel shader, to avoid what would become an extreme memory bandwidth penalty.
If you need to combine multiple objects via blending, what choice do you have? Doing it in the pixel shader won't help you as it means you're reading the destination as a texture anyway. The pixel shader can't combine the results of multiple fragments before writing to the framebuffer.

-FUDie
 
OT!!
Albuquerque do you work on blowing things up or can you say? :) I hear Albuquerque is nice. One of my best friends just moved there and he's in love with the place.
 