Writing an update article to the Editor's Day coverage

Jakub

The number of updates, corrections and interesting comments to add is getting absurd.

Two last issues I'd like to clear up:

a). Supposedly the F-Buffer allows R350/R360 class hardware to run without an instruction limit, at least in OpenGL 2.0. Is it the miracle cure-all, and if so, is it something PS 3.0/DX10 will be able to take advantage of?

b). Brandon brought to my attention the interesting ShaderMark results (scroll down and you'll see a lot of failures on NVIDIA's part). These results might mirror Valve's problems with NVIDIA hardware, I thought, so I brought it up with NVIDIA; here's the answer I got:

Brian Burke said:
Floating point render targets make HDR implementation easier and more developer-friendly; however, it is possible to achieve high dynamic range using 16-bit (and even 8-bit) integer render targets, with some extra shader (and developer) legwork; notice how the high dynamic range effects in Masaki Kawase's RTHDRIBL demo run correctly on GeForceFX hardware, despite running with the same set of texture formats and render targets that are available to Shadermark. In fact, one of the primary HDR texture storage formats used in off-line rendering (Radiance's RGBE) is just 8-bits per component.

Here is a description and download link to said demo.

Is ShaderMark flawed then?
 
More importantly, you should ask them "How come the NV3x does not support floating point render targets?" ;)
 
Jakub, I'm curious what exactly was the question you asked BB? Did you ask about FP formats, or did BB throw that in there as a preemptive defensive measure? :) I think the answer you sought was in his first sentence, but he was probably compelled (as PR manager) to throw in some tangential stuff to obfuscate the main issue.

I'm curious why he mentioned integer values specifically, though. Will the FX use FP16 (as Valve indicated in B3D's interview) or FX12 for its version of HDR? FX12 may offer more precision than FP16, but it doesn't natively offer as much range. Will using FX12 be slower than FP16 due to the extra math involved in extracting more range out of an integer value, or does the kind of HDR BB is talking about not require as much range?

B3D has known about rthdribl for a long time, though I'm curious to know what more knowledgeable people here have to say about BB's assertion that it achieves HDR using the same formats as Shadermark. Is this just a question of DX9 not allowing for HDR on the FX using DX9's default commands, and rthdribl (and HL2) coding in custom work-arounds? I'm still not sure if FPRTs are integral to DX9 or not, and that's the essence of my Q--is the FX fully DX9 if it doesn't support FPRTs?
 
Pete, you keep trying to give me too much credit and then holding me accountable for it :) I didn't ask about FPRTs or FP formats, simply because I'm not a computer engineer, 3D programmer or hardware nut. All I have to work on are my deductions from empirical evidence, and the little theory that I know of.

My question was simply why NVIDIA cards did not perform the relevant ShaderMark tests: is it a problem with ShaderMark, NVIDIA drivers or NVIDIA hardware?

I got that answer in return.

Now guys, come on, I'd appreciate real answers, not "you're asking the wrong question, grasshopper." It's humiliating enough to have to ask for help in corrections to my own article, but I did it. I don't see why I should be the subject of further abuse. So please, if you have a real answer or real correction, feel free to share it with me. If this thread and my ignorance somehow annoy you, nobody's forcing you to read it and reply.
 
I sure hope you didn't take my response as one of abuse. If you did, you're reading far too much into it.

It seems you have an open line of communication with Nvidia. I've never seen a direct answer from them on why they don't support FPRT; I've only read what other people have deduced the issue to be. Even though I have no doubts about the correctness of the outside parties' analysis, I'm interested in seeing how Nvidia handles answering the question. I doubt they'd answer it directly and concisely; more likely they'd just parrot back more PR rhetoric.
 
Pete said:
B3D has known about rthdribl for a long time, though I'm curious to know what more knowledgeable people here have to say about BB's assertion that it achieves HDR using the same formats as Shadermark. Is this just a question of DX9 not allowing for HDR on the FX using DX9's default commands, and rthdribl (and HL2) coding in custom work-arounds?
It's certainly not a workaround. What BB was referring to, I think, was that both applications are offered the same set of driver caps to work with. RTHDRIBL goes for something that's supported by the hw/driver combo while Shadermark doesn't and skips the test.

It's not the same technique. It's a similar effect, but accomplished through different means. It's not Shadermark's fault that it wants certain capabilities to render the effect. As the man himself stated, doing HDR lighting is easier if you have FP render targets. Shadermark exclusively follows this easier approach and doesn't offer a fallback path (as would be expected from a production renderer).

Jakub said:
Is ShaderMark flawed then?
IMO no. That would be similar to stating that ATI's treasure chest demo is flawed because it can't display all effects on Geforce 3. It's a tech demo, of sorts, for a specific hardware capability (PS1.4). It's completely reasonable to fail if the feature that is about to be demonstrated isn't supported by the hardware.

That also doesn't imply that either of these two are 'biased' applications. The treasure chest demo runs just fine on GeforceFX (because they have PS1.4). That's just fair. Shadermark's HDR tests will certainly run on NVIDIA hardware as soon as the FP render target support is provided.

For a more real-world application, let's take UT2k3 and the cube map vs Kyro situation. There is a fallback, because it works and there's a market for it, but it still doesn't look the same, and it doesn't work the same way. You just can't do what you can't do ;)

In the meantime, tb could be asked to implement different versions of the HDR tests that would run with the GeforceFX feature set. That IMO would open up another can of worms, because Shadermark is also a benchmarking tool and the equal workload assumption would be broken.

I hope this makes sense :D
 
NVIDIA doesn't support FPRT because GeFFX hardware doesn't comply with current MS FPRT requirements.

MS is not very clear in regard to the requirements for DX9 compliance. MS doesn't say whether FPRTs are required, recommended or optional. But in the recommended specs of DX9 they do say what the default floating point texture format should be, so it seems that FPRT is required or recommended, but not optional.

David Kirk said at the Editor's Day that FPRT support is not a priority for NVIDIA, but that they'll work on it when developers ask for it.


Pete -> FX12 and FP16 precision is the same (10 bits)
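
To put some numbers on that (a rough sketch of my own, assuming FX12 is the usual signed fixed-point format covering roughly [-2, 2) and FP16 is the s10e5 half float): both carry about 10 bits of mantissa, so the real difference is range, not precision.

Code:
#include <stdio.h>

// Rough comparison, assuming FX12 = 12-bit signed fixed point over [-2, 2)
// and FP16 = half float with a 10-bit mantissa and 5-bit exponent (bias 15).
int main(void)
{
    // FX12: 12-bit two's complement value interpreted as raw / 1024
    double fx12_step = 1.0 / 1024.0;      // smallest step, everywhere
    double fx12_max  = 2047.0 / 1024.0;   // largest value, just under 2.0

    // FP16: the step size scales with the exponent, the range is much larger
    double fp16_step = 1.0 / 1024.0;      // step size around 1.0
    double fp16_max  = 65504.0;           // largest finite half-float value

    printf("FX12: step %g, max %g\n", fx12_step, fx12_max);
    printf("FP16: step near 1.0 %g, max %g\n", fp16_step, fp16_max);
    return 0;
}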
 
F-Buffer isn't a cure-all, it is a way to address one limitation you discussed. In combination with things like significantly increased functionality, more performance, and developer innovation...it can allow some interesting real-time effects. It is just that nVidia tried to pretend that larger instruction count was a cure-all.

DX 10 and PS 3.0 (which, AFAIK, are not associated items, BTW) are more than just increased instruction count.

For some development work and experimentation, for moving real-time shaders closer to off-line rendering shaders, and for simplifying the set of limitations developers have to worry about, it is pretty useful, though.

As for rthdribl...AFAIK it doesn't offer the same output quality for full functionality, and suffers a performance penalty for working around the limitation. This would seem to relate to why doing HDR in this way in addition to other significant pixel shader usage might pose quite a significant challenge to developers.

As for RGBE, take a look. It seems to use a fourth component as a common exponent for the R, G, and B components, which seems like a form of compression. At first glance, this achieves an increased range, while maintaining precision, at the price of: lesser precision as the color varies from grey scale values, extra math instructions to encode into and decode out of the format, and tying up the alpha component when it could be used for some other purpose.
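
To illustrate the kind of shader/developer legwork involved, here is a sketch of the classic Radiance-style packing (my own illustration, not anything taken from the demo):

Code:
#include <math.h>

// Radiance-style RGBE packing: R, G and B share one exponent stored in the
// fourth byte, so the alpha channel is no longer available for anything else.
void rgb_to_rgbe(float r, float g, float b, unsigned char rgbe[4])
{
    float maxc = r > g ? (r > b ? r : b) : (g > b ? g : b);
    if (maxc < 1e-32f) {
        rgbe[0] = rgbe[1] = rgbe[2] = rgbe[3] = 0;
    } else {
        int e;
        float scale = frexpf(maxc, &e) * 256.0f / maxc;
        rgbe[0] = (unsigned char)(r * scale);
        rgbe[1] = (unsigned char)(g * scale);
        rgbe[2] = (unsigned char)(b * scale);
        rgbe[3] = (unsigned char)(e + 128);   // biased shared exponent
    }
}

void rgbe_to_rgb(const unsigned char rgbe[4], float *r, float *g, float *b)
{
    if (rgbe[3] == 0) {
        *r = *g = *b = 0.0f;
    } else {
        float scale = ldexpf(1.0f, (int)rgbe[3] - (128 + 8));
        *r = rgbe[0] * scale;
        *g = rgbe[1] * scale;
        *b = rgbe[2] * scale;
    }
}

The shared exponent is exactly why precision drops as the channels drift apart in magnitude, and why destination alpha is gone.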

Also, remember that storage and calculation are different...the final output gets stored in an 8-bit format at the end anyway, but the higher precision and range were important at key points in the process before that.

Addition:

Actually, I found a comment from following the link:

Additionary, D3DFMT_A16B16G16R16F and D3DFMT_A16B16G16R16 texture formats that can be rendered are highly recommended. If the current Direct3D driver doesn't support any floating point texture formats, Depth Of Field and Halo effects will not work. And if doesn't support any high-precision formats (at least 16-bit floating point or integer per component), both of the image quality and the frame rate will be down :(

It shows the issues for a developer, and how mentioning RGBE as a solution, and saying that it is demonstrated by this demo, is...a bit of a misdirection. It does show that an fp16 intermediate format is suitable for what it tries to do (you lose effects if it isn't floating point, and performance and quality if it is less than 16 bits per component), though I don't know the demo's implementation details in between the passes to that storage format.
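
As an aside, the check an application makes before relying on one of those formats as a render target is simple enough; a minimal sketch (mine, not taken from Shadermark or the demo) looks something like this:

Code:
#include <d3d9.h>

// Ask D3D9 whether D3DFMT_A16B16G16R16F can be used as a render target
// texture; a test like Shadermark's HDR one would skip when this fails.
bool SupportsFP16RenderTarget(IDirect3D9 *d3d)
{
    HRESULT hr = d3d->CheckDeviceFormat(
        D3DADAPTER_DEFAULT,
        D3DDEVTYPE_HAL,
        D3DFMT_X8R8G8B8,          // current display mode format
        D3DUSAGE_RENDERTARGET,
        D3DRTYPE_TEXTURE,
        D3DFMT_A16B16G16R16F);
    return SUCCEEDED(hr);
}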
 
BRiT said:
I sure hope you didn't take my response as one of abuse. If you did, you're reading far too much into it.

It seems you have an open line of communication with Nvidia. I've never seen a direct answer from them on why they don't support FPRT; I've only read what other people have deduced the issue to be. Even though I have no doubts about the correctness of the outside parties' analysis, I'm interested in seeing how Nvidia handles answering the question. I doubt they'd answer it directly and concisely; more likely they'd just parrot back more PR rhetoric.
Your remark I took very seriously; it was someone else's flippant "you are asking the wrong questions" comment that irritated me. I mean, great, what's that supposed to mean? Here I am, humbling myself, asking for help in order to help my readers, and what I got in return was a snide comment simply telling me that I'm still doing the wrong thing, without any hint about what the right thing might be. :)

Do I have an open line of communication with NVIDIA? No more than anybody else. They answer what they want and how they want it. It's business. I can ask Brian about FPRTs, but I'll get either a PR machine answer or no answer at all.
 
Jakub said:
b). Brandon brought to my attention the interesting ShaderMark results (scroll down and you'll see a lot of failures on NVIDIA's part). These results might mirror Valve's problems with NVIDIA hardware, I thought, so I brought it up with NVIDIA; here's the answer I got:

Brian Burke said:
Floating point render targets make HDR implementation easier and more developer-friendly; however, it is possible to achieve high dynamic range using 16-bit (and even 8-bit) integer render targets, with some extra shader (and developer) legwork; notice how the high dynamic range effects in Masaki Kawase's RTHDRIBL demo run correctly on GeForceFX hardware, despite running with the same set of texture formats and render targets that are available to Shadermark. In fact, one of the primary HDR texture storage formats used in off-line rendering (Radiance's RGBE) is just 8-bits per component.
No one said that you must support floating point render targets to take advantage of HDR effects. However, if you use floating point render targets, then HDR comes naturally.

RGBE... I assume the "E" is exposure (actually, it's the exponent, I checked at this link). This means that you are losing the alpha channel to hold the exponent of the pixel. That's a problem if your application is using destination alpha. Also, this 8-bit RGBE doesn't have the precision (or range) of a 16-bit (or greater) floating point surface. (I say range because the ratio of two non-zero channels is limited to a minimum of 1/256.) In fact, I'd say that RGBE is pretty much irrelevant to the discussion (see below).
Is ShaderMark flawed then?
Not at all. It's a piece of software that was designed to use certain features. If those features are not available, it will not run. Could the author have rewritten it to work on other platforms? Tough to say as we can't be sure how the features are being used. If you need greater precision then you have to go with floating point render targets. If you need HDR and destination alpha, then RGBE is out of the question. It all depends on your needs.

Edit: I should have said "If you need greater precision, then you'll have to use a format that offers more bits." Not all high precision formats are floating point, but floating point formats have the advantage of offering a larger range.
 
demalion said:
Actually, I found a comment from following the link:

Additionary, D3DFMT_A16B16G16R16F and D3DFMT_A16B16G16R16 texture formats that can be rendered are highly recommended. If the current Direct3D driver doesn't support any floating point texture formats, Depth Of Field and Halo effects will not work. And if doesn't support any high-precision formats (at least 16-bit floating point or integer per component), both of the image quality and the frame rate will be down :(
It shows the issues for a developer, and how mentioning RGBE as a solution, and saying that it is demonstrated by this demo, is...a bit of a misdirection. It does show that an fp16 intermediate format is suitable for what it tries to do (you lose effects if it isn't floating point, and performance and quality if it is less than 16 bits per component), though I don't know the demo's implementation details in between the passes to that storage format.
I think the quote is saying that you need at least FP16 or 16-bit integer, not that you'll lose effects if you switch to integer. Never mind, the depth of field/halo effect just clicked.
 
Jakub said:
The number of updates, corrections and interesting comments to add is getting absurd.

Two last issues I'd like to clear up:

a). Supposedly the F-Buffer allows R350/R360 class hardware to run without an instruction limit, at least in OpenGL 2.0. Is it the miracle cure-all, and if so, is it something PS 3.0/DX10 will be able to take advantage of?

b). Brandon brought to my attention the interesting ShaderMark results (scroll down and you'll see a lot of failures on NVIDIA's part). These results might mirror Valve's problems with NVIDIA hardware, I thought, so I brought it up with NVIDIA; here's the answer I got:

Brian Burke said:
Floating point render targets make HDR implementation easier and more developer-friendly; however, it is possible to achieve high dynamic range using 16-bit (and even 8-bit) integer render targets, with some extra shader (and developer) legwork; notice how the high dynamic range effects in Masaki Kawase's RTHDRIBL demo run correctly on GeForceFX hardware, despite running with the same set of texture formats and render targets that are available to Shadermark. In fact, one of the primary HDR texture storage formats used in off-line rendering (Radiance's RGBE) is just 8-bits per component.

Here is a description and download link to said demo.

Is ShaderMark flawed then?

A) R350/360 don't have a problem with PS2.0 instruction limits that I'm aware of; aren't they actually like 65k???

B) It can be done, but it requires multiple renders at different brightnesses, you're going to lose FPS, and it's a pain in the rear to implement. I guess you could do a rather limited implementation using only 16-bit components.
 
bloodbob said:
A) R350/360 don't have a problem with PS2.0 instruction limits that I'm aware of; aren't they actually like 65k???

B) It can be done, but it requires multiple renders at different brightnesses, you're going to lose FPS, and it's a pain in the rear to implement. I guess you could do a rather limited implementation using only 16-bit components.
A). OK, so I've heard 64, 96, and 160 so far... and now 65k :) This just gets easier with every passing hour.

B). I thought the problem was that NVIDIA didn't support floating point render targets, not the actual nature of 16- vs 24-bit precision?
 
R300 limit is crap, but :) I'll try to go find some tech specs for you, Jakub, as I could be wrong.

*UPDATE* Looks like my bad, it only supports 65k VERTEX shader instructions :/
 
Can someone familiar with OpenGL confirm these floating point render target restrictions by NVIDIA?

Lev Povalahev said:
no filtering (in fact, only GL_NEAREST or GL_NEAREST_MIPMAP_NEAREST)
no alpha test
no alpha blend
no logic op
no dither

Additionally, NVIDIA allows floating point textures to be used only with NV_texture_rectangle (this means no mipmaps at all here) and fragment programs.
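
Piecing that together (and with the caveat that I can't test this myself), I gather that creating such a texture would look roughly like the sketch below, using the NV_texture_rectangle enums plus a float internal format from what I believe is the NV_float_buffer extension:

Code:
#include <GL/gl.h>
#include <GL/glext.h>   // GL_TEXTURE_RECTANGLE_NV, GL_FLOAT_RGBA16_NV, ...

// Untested sketch of what the restrictions above amount to: a float texture
// on NV3x has to be a rectangle texture (so no mipmaps), can only use
// GL_NEAREST filtering, and is sampled from fragment programs. Actually
// rendering into it additionally means going through a float pbuffer.
GLuint MakeFloatTexture(int width, int height)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_RECTANGLE_NV, tex);
    glTexParameteri(GL_TEXTURE_RECTANGLE_NV, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_RECTANGLE_NV, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_RECTANGLE_NV, 0, GL_FLOAT_RGBA16_NV,
                 width, height, 0, GL_RGBA, GL_FLOAT, 0);
    return tex;
}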
Also, "no logic op" seems like a very general restriction. Care to give an example of what it could entail? Right now it seems like almost anything...
 
Jakub,

One point that I would like you to readdress, if possible, is not one of a technical nature. In your initial article you said that Huang made a comment along the lines of "ATI is obsessed with NVIDIA and has engineers looking at their IQ problems and then they tell people at Shader Day".

I'd just like to point out that this was not the point of ATI's Shader Day at all, and ATI did no such thing there. The only people that made any claims about NVIDIA issues and/or IQ at Shader Day were in fact Valve, and given the surprise from the ATI personnel about the presentation Valve actually gave, they clearly weren't expecting it at all.

In fact, in one sense it's a real shame that Valve made this presentation, because it gave a rather stilted perception of what Shader Day was all about, as this is all any of the journos (myself included) actually wrote about. The fact of the matter is that the entire Shader Day was actually an educational day looking at what pixel shaders are and the types of things they can achieve. ATI themselves didn't make any claims about NVIDIA, other than posting up some performance numbers of long shaders and the difference in IQ between FP16 and FP24.

Also, there is another point that I raised in the other thread which you might like to consider, given the comments from Gearbox about the support they received from NVIDIA. That might raise a slightly different perspective that could be worth looking into for a follow-up article.
 