HDR+AA...Possible with the R520?

The layman's guide for laymen

FP16, 24, and 32 in the pixel (fragment) shaders have been pretty well covered by the 6800U reviews (and maybe the 3DM05 reviews and B3D's TR:AoD screenshot analysis). The same principles apply to FP and the current FX8 buffers as they do to fragment precision: higher precision means fewer rounding errors, and as shaders get more complex, rounding errors become more prevalent (or, at least, easier to see). The greater range that typically goes hand in hand with greater precision (although perhaps not with FP10) is also important as it relates to FP buffers and HDR (High Dynamic Range): it allows for brighter whites and darker blacks without washing out intermediate colors.
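To make the precision/range trade-off concrete, here is a back-of-the-envelope sketch using the commonly cited bit layouts -- FP16 as s10e5, ATI's FP24 as s16e7, IEEE FP32 as s23e8. The exact maximums depend on bias and special-case handling, so treat the numbers as rough orders of magnitude, not a spec:

```python
# Rough comparison of fragment float formats: more mantissa bits -> smaller
# rounding step per operation; more exponent bits -> wider range before
# values clamp or blow up. Layouts assumed: (mantissa bits, exponent bits).
formats = {
    "FP16 (s10e5)": (10, 5),
    "FP24 (s16e7)": (16, 7),
    "FP32 (s23e8)": (23, 8),
}

for name, (mant, exp) in formats.items():
    step = 2.0 ** -mant                # relative rounding step near 1.0
    max_val = 2.0 ** (2 ** (exp - 1))  # rough order of magnitude of the largest value
    print(f"{name}: rounding step ~{step:.1e}, max value ~{max_val:.1e}")
```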

As others have mentioned, just think of it the same way the 16-bit vs. Voodoo 3 "22-bit" vs. 32-bit argument played out in the days before 32-bit became universal and shaders took center stage. Higher precision costs more in terms of either transistors or performance, so initially there is a trade-off between better performance with lower precision/IQ and worse performance with higher precision/IQ. Then manufacturing advances make supporting the higher precision mundane, and we move on to the next advancement in GPU capability (and thus higher precision requirements to avoid artifacts).

Perfect, first post on a new page. I hope, in trying to save others some retyping, I'm not leading US too far astray.

Actually, this quote from JC sums it up quite succinctly:

Quinn said:
John Carmack said:
This wasn't much of an issue even a year ago, when we were happy to just cover the screen a couple times at a high frame rate, but real-time graphics is moving away from just "putting up wallpaper" to calculating complex illumination equations at each pixel. It is not at all unreasonable to consider having twenty textures contribute to the final value of a pixel. Range and precision matter.

Range and precision is what John Carmack wants. Range and precision is what the GeForceFX delivers.
We're moving from multi-pass textures to multi-pass shaders, and from low-range display buffers to high-range ones.
 
wireframe said:
Acert93 said:
As an NV40 owner your point would be more relevant if the NV40 could even do FP16 in a modern game at solid framerates. It cannot, and thus it is one of those overblown features that takes a refresh/new gen to be usable.

I don't see how this is true when I can run both Far Cry and Splinter Cell Chaos Theory at 1280*1024 with HDR enabled on the 6800 Ultra. It may not be bragging-rights FPS territory, but it is certainly playable.

See below... there is a difference between what is playable for you and the commonly accepted standard of playable for hardcore gamers shovelling out $450+ for a GPU.

And then I thought you agreed because you say this:
And with that I must say I am happy the couple of games that use HDR (FC, SC:CT) seem to be able to output stable frame rates with HDR enabled.

Is that supposed to imply that only just now, with G70, is this possible?

Correct, I was implying that HDR is now playable on the G70 and was not playable on the NV40 at 1280x1024.

As for "playable" that depends how you define playable.

While it may be acceptable to your tastes (I would not question that), most gamers who throw down $450+ for a GPU expect ~60fps in an FPS. Whether that lines up with your tastes or not is not really the question (if you can tolerate lower framerates, that is a good thing!); it is more a matter of "how do we evaluate this". Many people cringe at the idea of a fast-paced FPS with a sub-60fps framerate, especially if we are going to see dips below 30fps, or even worse, into the teens.

Overall, the stats and the reviewers suggest that Far Cry (a fast-paced First Person Shooter) is not very playable with HDR at 1280x1024.

No HDR vs. HDR in Far Cry 1.3 @ 1280x1024 (fps):
108 -> 43
93 -> 40
93 -> 41
108 -> 41

And that is with no AA and no AF.

And while ~40fps may sound reasonable, my experience with Far Cry is that it exhibits a fairly jumpy framerate, so while the average may be 40fps, the minimums are much lower. You can usually guesstimate that the minimum framerate will be about half of the average, and the benchmarks here and here back up the fact that Far Cry does exhibit an unstable frame rate that drops to about half of the average at the lows. Basically, Far Cry has a lot of hitches, so a higher framerate is needed to maintain the sense of stability and fluidity. Resolution and IQ are only 2/3 of the believability trifecta--a choppy frame rate not only affects immersion, it hinders gameplay. Some games can get away with 30-40fps because their frame rate does not fluctuate much; Far Cry (from my experience) is not one of those.

Considering it is a First Person Shooter, a *stable* framerate in the *60fps* range is what I would consider playable--at least on a $450+ GPU. That does not mean you cannot enjoy it at lower frame rates (and I am glad you can), but by and large this framerate would be considered an issue by a hardcore gamer who shovels out this kind of cash for a GPU.

So at 1280x1024 with HDR enabled we are seeing drops into the teens and an average of only 40fps. I would not consider that a good tradeoff for 2 reasons:

1. You could go up to 1600x1200 with 4x AA and 16x AF and get far better IQ (AA/AF affect more of the screen area in most situations) *and* a better framerate (~50fps). At 1280x1024 with 4x AA and 16x AF we are looking at ~65fps (depending on the map, e.g. Volcano, Training, and Research).

~40 with HDR
vs.
~65 with 4xAA and 16x AF

Considering the hitching issues, getting a cleaner image (minus HDR) at a 60% better framerate would seem a better choice for a FPS.

2. Quality. A picture says a thousand words. While HDR is absolutely beautiful in Far Cry at times, at other times the HDR effect is more of a distraction than an IQ boost.

When looking at the first two sets of pictures, ask yourself: When did the moon become a mini-Sun?

http://www.firingsquad.com/hardware/far_cry_1.3/images/18.jpg
http://www.firingsquad.com/hardware/far_cry_1.3/images/17.jpg

http://www.firingsquad.com/hardware/far_cry_1.3/images/15.jpg
http://www.firingsquad.com/hardware/far_cry_1.3/images/16.jpg

http://www.hardocp.com/images/articles/1098809904DJ7a6BZMfd_4_6_l.jpg
http://www.hardocp.com/images/articles/1098809904DJ7a6BZMfd_4_5_l.jpg

Overall, it is not only my opinion that Far Cry is not very playable with HDR on at 1280x1024:

http://www.xbitlabs.com/articles/video/display/farcry13_7.html
Wow! The performance impact is just unbelievable and is more than 50%. High dynamic range requires tremendous memory bandwidth and quite some additional computing power from graphics processor and its memory subsystems, even high-end graphics cards of today cannot handle that load. Therefore, we should expect games to acquire that technique only when the next-generations of graphics processors will get 24 – 32 pixel pipelines and advanced memory interfaces.

http://www.firingsquad.com/hardware/far_cry_1.3/page19.asp
The biggest downside to HDR is its performance hit, performance is roughly sliced in half once HDR is enabled, even with the mighty GeForce 6800 Ultra. As a result, the highest playable resolution for most of you will probably be 1024x768.

So while HDR makes for a beautiful image at times (no doubt!!), it is still in its infancy. On the other hand, AA and AF always give an IQ boost that cleans up the image. Is 1024x768 with HDR really better than 1600x1200 w/ 4xAA and 16xAF? Considering the image issues with HDR at times, and the fact that a game like Far Cry has a lot of geometry and a lot of aliasing (not to mention huge open areas that AF really helps clean up), I personally would have to give a big negative on that.

SC:CT also sees its framerate cut almost in half, with an average of ~30fps with HDR on at 1280x1024 according to FiringSquad. An average of 30fps, even in a stealth FPS, is really low once you begin to consider dips.

Essentially you are reducing jaggedness, not by sampling the geometry more to "fill in the blanks," but by taking the whole image in a "full scene lit context" so that those jaggies become less pronounced.

Maybe in the future. But no one wants a blurry mess and for right now HDR is not taking away the jaggies in either game. Even with HDR enabled the foliage still appears very rough and jagged in most situations. Enabling AA really smooths out the image and gives it a clean look.

But again, ideally, we should not have to choose. AA + HDR (implemented in a fashion that improves IQ) would be the ideal. Until then it is the lesser of two evils, and in my opinion AA gives a better IQ boost than HDR in the two games that support it, especially if that means higher resolutions, better frame rates, and AF.
 
wireframe said:
The moment I read about the FP10 FB on R520 I started laughing. I really want to read what all those people who complained that NV40 only had FP16 FB support will say now. Remember all those comments all around the Web? How FP16 is not "real HDR" and you really need FP32? Now watch as with sudden smoothness FP10 becomes "just right". LOL.

Of course I am not saying FP10 is not good enough. I am talking strictly about all the online noise about it.
I don't think the more knowledgeable people were complaining in that manner. I had a debate once with Chalnoth because he was effectively saying FP32 blending is useless, but I never put down FP16.

Personally, I think this was the biggest mistake in R4xx. They should have had high precision blending of some sort. 16-bit integer blending in 4 ROPs at the very least.


Zengar said:
Floating-point numbers with 10 bit precision...

Puuhh... I remember you guys flaming nvidia's FX12 precision, which is about 8 times more precise :)

And a 7-bit mantissa is even worse than plain 8-bit RGB encoding! Plus, we have tonemapping, lots of scaling, etc. Do you really think 7 bits are enough?
Remember that the GeforceFX often used FX12 precision internally. That's a lot different. Furthermore, I don't think you could even preserve that info in the framebuffer, as I doubt it supported any format like A16R16G16B16, so you were stuck with 8 bits per channel externally. Finally, the GeforceFX had no high precision filtering, so normal maps could be no better than 8 bits per channel.
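For reference, the "about 8 times" figure falls out of comparing quantization steps near 1.0. This assumes FX12 is a 12-bit fixed-point format covering roughly [-2, 2) and that FP10 carries a 7-bit mantissa -- reasonable readings of the public info, but assumptions nonetheless:

```python
# Quantization step near 1.0: NV3x's FX12 fixed point vs. a 10-bit float
# with a 7-bit mantissa (layouts assumed, see above).
fx12_step = 4.0 / 2 ** 12   # 12 bits spread over roughly [-2, 2) -> 1/1024
fp10_step = 2.0 ** -7       # relative step of a 7-bit mantissa near 1.0 -> 1/128

print("FX12 step:", fx12_step)                      # 0.0009765625
print("FP10 step:", fp10_step)                      # 0.0078125
print("ratio: %.0fx" % (fp10_step / fx12_step))     # 8x
```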


Democoder, you're right - FP10 is a hack. But that's what realtime graphics are all about, really. A [-32,32] range should get you fairly decent-looking HDR effects - much better than the bloom found in Tron 2.0 or XBOX games. In any case, I don't see it being a problem for users to be able to toggle between backbuffer formats, especially if R520 supports FP16 blending also.
 
Mint,
HDR scenes have contrast ratios anywhere from 1000:1 to 1,000,000:1 (sunlit outdoor scenes with shadows); [-32,32] just doesn't seem enough.
 
Democoder and Chalnoth,

Is there a way to simulate ATI's FP10 implementation in 3D Studio or another application?

It would be nice to see the differences between FP16/FP10 side-by-side.
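In the absence of a ready-made tool, one could presumably approximate it offline by quantizing an HDR image to a guessed FP10 layout (say 1 sign, 3 exponent, 6 mantissa bits, i.e. the s6e3 idea floated later in this thread) and to FP16, then comparing. A rough sketch of that idea -- the quantize_float helper and the assumed bit layouts are purely illustrative, not ATI's actual hardware behaviour:

```python
import numpy as np

def quantize_float(x, mant_bits, max_value):
    """Crudely emulate writing x to a narrow float render target: clamp the
    range, then round the significand to mant_bits stored bits (denormals
    and exact hardware rounding are ignored)."""
    x = np.clip(x, -max_value, max_value)
    m, e = np.frexp(x)                                   # x = m * 2**e, 0.5 <= |m| < 1
    m = np.round(m * 2 ** (mant_bits + 1)) / 2 ** (mant_bits + 1)
    return np.ldexp(m, e)

hdr = np.random.rand(4, 4).astype(np.float32) * 40.0     # toy HDR values, up to ~40

fp16 = quantize_float(hdr, mant_bits=10, max_value=65504.0)  # IEEE-half-like
fp10 = quantize_float(hdr, mant_bits=6,  max_value=32.0)     # guessed s6e3-style layout

print("max abs error, FP16:", np.abs(hdr - fp16).max())
print("max abs error, FP10:", np.abs(hdr - fp10).max())      # includes saturation above 32
```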
 
DemoCoder said:
Mint,
HDR scenes have contrast ratios anywhere from 1000:1 to 1,000,000:1 (sunlit outdoor scenes with shadows); [-32,32] just doesn't seem enough.

What [-32,32]? 32 binary digits? Or 64? Floating or fixed point?
 
DemoCoder said:
Mint,
HDR scenes have contrast ratios anywhere from 1000:1 to 1,000,000:1 (sunlit outdoor scenes with shadows); [-32,32] just doesn't seem enough.
That's why you're right about it being a hack. I don't expect truly photorealistic scenes, but I think the step in image quality from 8-bit integer per channel to FP10 will be far greater than that from FP10 to FP32, even though the latter requires four times the bandwidth.

Remember that the only reason for an HDR framebuffer (on any mainstream display, at least) is post-processing such as tone mapping, bloom, streaks, lens flares, etc. These effects can easily be weighted to fit a 0-32 range instead, i.e. maximum contrast of 8192:1 (Using Xmas' s6e3 w/ denorm theory). Sure, if you have 50+ bright objects blended on top of each other then the range will saturate, but that's a pretty pathological case. Furthermore, saturation for HDR effects is not objectionable at all, since that's what happens with cameras anyway.
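For anyone wondering where 8192:1 comes from: assuming the s6e3 layout (1 sign, 6 mantissa, 3 exponent bits) with denormals and an exponent bias of 3 -- all guesswork about FP10, not a published spec -- the largest value sits just under 32 and the smallest non-zero denormal is 1/256:

```python
# s6e3 with denormals, bias of 3 (assumed): values top out just under 2^5 = 32
MANT_BITS, EXP_BITS, BIAS = 6, 3, 3

max_exp = (2 ** EXP_BITS - 1) - BIAS                 # 4
largest = (2 - 2.0 ** -MANT_BITS) * 2.0 ** max_exp   # 31.75
smallest = 2.0 ** (1 - BIAS - MANT_BITS)             # smallest denormal, 2^-8 = 1/256

print("largest:", largest, "smallest non-zero:", smallest)
print("contrast, nominal 32 cap: %d:1" % round(32 / smallest))   # 8192:1
```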

Another point is that you don't have to have an accurate absolute value for the light intensity at each point. You can scale everything so that all relevant data fits into this range. The scaling factor can change much like the delay of an iris, e.g. the rthdribl demo.

As an example of how you can get quite good results without a 1,000,000:1 range, look at ATI's Debevec demo. They use a 16-bit integer format for the environment, I believe, because it's filtered. That's effectively a 0-256 range. Chopping a few high-order bits and readjusting the scaling won't make that much difference.
 
Yeah, but the point of HDR is that the compression/scaling happens during tone mapping and that the buffer can record the full dynamic range.

Let me give you an analogy.

Film cameras have a contrast ratio of about 128:1, so any given photographed scene can only capture this much. But since one can adjust exposure and aperture on a camera, one can capture a much greater range, albeit only about 7 stops at once.

But film *print* (what's used to play back your movie) must be able to store the entire range that any of the negatives could have been exposed at, because analog film projectors can't dynamically adjust their iris/light source (this info is not recorded by analog cameras on 35mm film for the projector playback to use). So Kodak film-quality *print* stock has to be HDR to hold the entire range of possible negative exposure settings. In this case, the projector/human eye is the "tone mapper". The adjustment of the camera aperture/exposure settings is the "scaling" stuff, but note that it takes significant effort to get exposure right! It's difficult to "develop".

In the case of HDR buffers, the HDR exists not just to prevent saturation, but to preserve local contrast differences. That's why tone mapping algorithms can be so complex, because the challenge is to model how the human eye views local contrast as well as global contrast. See http://www.cs.virginia.edu/~gfx/pubs/tonemapGPU/

Once you start doing your scaling and chopping in shaders to fit it into the framebuffer, you are defeating the purpose of tone mapping and are, in effect, doing a pre-tone-map range compression yourself. These tricks to fit the range into a non-true-HDR buffer are not likely to account for all of the aspects of real tone map range compression.
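To make concrete what the tone map stage does with the stored range, here is a minimal global operator in the spirit of the classic L/(1+L) curve -- an illustration only, not what any shipping game or the linked paper actually uses; real operators also handle local contrast:

```python
import numpy as np

def tonemap_global(hdr_rgb, exposure=1.0):
    """Scale by an exposure ('iris') factor, compress luminance with the
    L / (1 + L) curve, then gamma-encode for an 8-bit display."""
    scaled = hdr_rgb * exposure
    lum = 0.2126 * scaled[..., 0] + 0.7152 * scaled[..., 1] + 0.0722 * scaled[..., 2]
    compressed = lum / (1.0 + lum)                          # maps [0, inf) into [0, 1)
    ldr = scaled * (compressed / np.maximum(lum, 1e-6))[..., None]
    return np.clip(ldr, 0.0, 1.0) ** (1.0 / 2.2)

# Pixels spanning a 1000:1 range keep their ordering after compression --
# exactly the information a narrow, pre-compressed buffer can lose.
pixels = np.array([[0.01, 0.01, 0.01], [1.0, 0.9, 0.8], [10.0, 9.0, 8.0]])
print(tonemap_global(pixels))
```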


Besides the dynamic range objection to FP10, there is the mantissa accuracy objection. And finally, there is the "developer headache" objection, just like there was with FP16 and determining which shaders don't need more precision.

I think FP16 framebuffers, when using proper tone mapping algorithms, will not only look better than FP10, they will exhibit fewer artifacts and be much easier to code for, and hence be much more likely to be used.
 
I applaud ATI if R520 will have HDR + AA.

That would take the industry in a good direction.

However, if NV's G70 is able to run HDR at a higher resolution than R520 due to more pipelines, AA might not make a significant difference as far as the end user is concerned (unless, of course, the end user has a low-resolution display, in which case AA will end up being much more important).

Course, there will be lots of "But my R520 can do HDR with AA!" and "G70 runs HDR faster and at the higher resolution it looks better!"

This sort of thing (seeing if R520 supports HDR + AA) does make the wait for August/September more difficult though.
 
Unknown Soldier said:
Could someone(Dave, Chalnoth?) please write an article about FP16, FP24, FP32 for Shaders and FP10, FP16 for HDR.
I'm neither Dave nor Chalnoth, but where would you like such an article to start? From the term "floating point" itself? :devilish:
 
Either no one's noticed my attempted answer, or I didn't make any terrible mistakes. As my water glass is currently half full, I suggest he start with my reply. :)
 
How many transistors was it that you saved with FP24 vs. FP32? I remember 25%, if that's correct.
Taking the whole die, with the AA engine, Z-cull, etc., how big a percentage does that translate into?
10-15% or so, maybe?

With shaders in the 200-instruction range at most, would we really see a difference in quality that equals the speed you would get by spending those transistors (per my guesstimate) on things like HDR with AA? I don't think so.

In all honesty, could not the whole "DX9 era" of cards have held to FP24 and spent resources like this on the above?

For the record, I understand nVidia's stance, as they spin the gaming cards into Quadro workstation cards that HAVE a use for FP32.
 
overclocked said:
For the record, I understand nVidia's stance, as they spin the gaming cards into Quadro workstation cards that HAVE a use for FP32.
ATI does the same. Besides, I do think FP24 is a limiting factor for long shaders, and especially when you do texture reads dependent on complex math.
 
Xmas: speaking of which, are there any screenshot examples of FP24 precision banding? When you say "long shaders," what's the instruction-count ballpark you're referring to?
 