Futuremark: 3DMark06

Rys said:
Is it just me or is the Perlin noise one just a (very) long < 512 instruction shader that'd compile as a pixelshader 2.0 test?
Are you suggesting it could be a multipass PS2.0 test? The texture and arithmetic instruction counts are way over the PS2.0 limits; I haven't bothered to sit down and check what the register usage is like either.
 
Joe DeFuria said:
Using that logic, it makes no sense to report a complete score for certain SM 3.0 NV parts that do not support floating point blending because part of the tests won't run.

And yet, a complete score is in fact given.

This is the problem that I have...it's not consistent.
I think you are wrong here. Doing FP16 blending in a pixel shader is so trivial that even I could probably write the shader after fiddling with the DX documentation for an hour or two. And the performance penalty surely isn't that great; I'd guess 5-10% at most.
Adding MSAA with a pixel shader, on the other hand, is not so simple. I wonder whether it can be done at all, and how? You could probably do SSAA, but that would not really be comparable and performance would suck.

So what Futuremark has done here reflects what most developers would have done: add the trivial fallback and ignore the complicated one. It's not like we will see FP16 AA on current nVidia cards in any forthcoming game. Will Not Happen. (Unless someone comes up with a very clever trick no one has thought of yet, but I doubt it.)

So, as I said in a previous post, the absence of HDR/SM3.0 AA/AF scores with current nVidia cards should not be seen as unfair, but as a boon. The X1x00 cards simply have an important feature that the current nVidia cards don't have.

That said, the absence of an HDR/SM3.0 AA/AF score for current nVidia cards hints that future cards will support FP16 AA. So I guess in two or three months this whole affair will be a non-issue anyway.
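The "trivial fallback" described above, doing the FP16 blend in the pixel shader instead of the blend hardware, can be sketched roughly like this. This is a minimal illustration with made-up names, not code from 3DMark06 or any driver: the shader reads the previous framebuffer contents as a texture and does the blend arithmetic itself, ping-ponging between two render targets.

```python
# Hypothetical sketch of shader-based FP16 blending (illustrative names).
def shader_blend(src_rgb, src_a, dst_rgb):
    """Standard src-over blend done with ALU instructions instead of the ROP."""
    return tuple(s * src_a + d * (1.0 - src_a) for s, d in zip(src_rgb, dst_rgb))

def render_pass(target, prev_target, fragments):
    # The shader samples the previous contents (prev_target, bound as a
    # texture) and writes the blended result to the other render target.
    for (x, y), (rgb, a) in fragments.items():
        target[(x, y)] = shader_blend(rgb, a, prev_target.get((x, y), (0.0, 0.0, 0.0)))
    return target
```

The extra cost is one texture fetch plus a couple of ALU instructions per fragment, which is consistent with the small penalty guessed at above.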
 
Chalnoth said:
Right, so either the NV4x will produce a score that is artificially high or low, depending upon how the comparison is done.

I guess, given the circumstances, the N/A score is best. It simply says that, as far as Futuremark knows, you won't be able to use HDR and AA together on current Nvidia cards. It would be different if Futuremark had used an AA algorithm in shaders; then a score would be worth giving.

An ATI fan should be quite happy with 3DMark06 ... it states that HDR, the much-advertised "Nvidia-only" SM3.0 feature, is just unusable in real life. Or rather, Nvidia owners have to play games twice: first with decent IQ, then with HDR. Or vice versa.
 
Since PCF/Fetch4 cannot be used for the "advanced shadowing" algorithm of the SM3/HDR tests (3 and 4), does DST/PCF have much of a future?

It seems to me that DST/PCF/Fetch4 might end up like stencil shadows: a feature that's used by 2 or 3 game engines and is then "forgotten" as not good enough.

Though I presume that it's the hardware-PCF/Fetch4 that's at issue here, because DSTs are always going to be needed, however fancy the shadow filtering technique. Is that correct?

I'm not clear on whether CSM is used in all four tests. Presumably this is independent of the technique for fetching shadow samples and/or filtering them, so I assume it's in all four tests.

Jawed
 
I'm not sure that it "can't be used"; I'm looking at a test application now where it is used, along with a 12-tap random sample (equating to 48 samples in total), and the shadow quality is very good. Given that the performance for a single sample is roughly the same as 4 samples with PCF/Fetch4, this is probably what developers will use anyway (this same point was brought up with 3DMark05, so I'm not sure what the logic is behind changing it). I think ATI are peeved because this can be combined with dynamic branching such that the branch test does just a single sample of the depth map where a pixel is fully in or out of shadow, and only applies the higher-tap sampling where it detects the edge of a shadow. That results in a performance improvement on ATI hardware, and can also result in IQ improvements, since you can spend more on sampling the shadow edges if you know you aren't wasting a lot of processing where pixels are fully in or out of shadow.

I'm assuming, here, that 3DMark06's shadowing mechanism doesn't use dynamic branching anyway.
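The dynamic-branching trick described above can be sketched roughly as follows. This is illustrative only, not code from 3DMark06 or the test application; the cheap four-corner agreement test used to detect penumbra regions is an assumption on my part (real implementations vary), as are all the names.

```python
# Hypothetical sketch: one cheap test, full kernel only at shadow edges.
def shadow_factor(depth, sample, uv, taps, texel=0.001):
    # Cheap test: 4 widely spaced taps. If they all agree, branch out early
    # (this is the work skipped over fully lit / fully shadowed areas).
    corners = [sample((uv[0] + s * texel, uv[1] + t * texel))
               for s in (-1, 1) for t in (-1, 1)]
    lit = [depth < d for d in corners]
    if all(lit):
        return 1.0       # fully lit, no filtering needed
    if not any(lit):
        return 0.0       # fully shadowed, no filtering needed
    # Penumbra: run the full (e.g. 12-tap) kernel and average.
    hits = sum(1 for dx, dy in taps if depth < sample((uv[0] + dx, uv[1] + dy)))
    return hits / len(taps)
```

The branch granularity matters here: this only pays off if whole groups of neighbouring pixels take the cheap path together, which is exactly the hardware difference being argued about in this thread.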
 
Thanks !

Man, I'm beginning to understand the intricacies of today's graphics hardware ... (the link given by Jawed in the "fetch4 - important?" topic, the Siggraph Shading Course 2006 PDF, did help a lot.)

I'd better leave before it's too late. :)
 
"Can't be used" was meant very much in the sense of "it offers no performance gain, and is therefore pointless". There's no point in fetching four samples and discarding three if fetching one sample is an option.

Now, as to your comments about DB and filtering only where there is likely to be a penumbra: well, I have to say this was always the foundation for my suspicions about 3DMk06 using DB. It is clearly a technique that heavily favours ATI hardware because of the inadequacy of the NV implementation (rather than it being absent), and one that is part of DX9 to boot. It's at the root of my assertion that FM copped out big time. Pathetic and unimaginative.

Soft shadowing is clearly the banner case for per-pixel DB.

Jawed
 
Fetching 4 samples (and at multiple random locations) will always be a quality gain.

I think their point is that, given there are two paths already there for many things, why not two paths for the shadowing?
 
Dave Baumann said:
Fetching 4 samples (and at multiple random locations) will always be a quality gain.
But if your intention is to use a sparse filtering kernel, then four contiguous samples anywhere in the kernel means it's no longer sparse.

Jawed
 
4 taps per sparse sample is going to be better quality than just single-tap sparse samples (and not that different in performance).
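The two kernels being debated can be sketched like this: N purely sparse single taps versus the same N sparse positions each expanded to a 2x2 PCF/Fetch4 quad, i.e. 4 taps per fetch at no extra fetch cost. The positions, texel size and function names are made up for illustration.

```python
# Hypothetical sketch: expanding a sparse kernel into 2x2 quads.
def quad(center, texel):
    """The 2x2 footprint a single PCF/Fetch4 fetch covers."""
    cx, cy = center
    return [(cx, cy), (cx + texel, cy), (cx, cy + texel), (cx + texel, cy + texel)]

def expand_kernel(sparse_positions, texel=1.0 / 512):
    # Same number of fetches as the sparse kernel, but 4x the taps;
    # each quad's taps are contiguous, which is Jawed's objection above.
    taps = []
    for p in sparse_positions:
        taps.extend(quad(p, texel))
    return taps
```

This makes both sides of the argument visible: Dave's point is that the extra taps are nearly free, Jawed's is that four contiguous taps per position mean the kernel is no longer strictly sparse.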
 
A tap and a sample are the same thing. Otherwise I'm missing something...

Code:
// Look up the randomized rotation for this pixel
float2 rot = BX2( tex2Dlod(RotSampler,
    float4(vPos.xy * g_vTexelOffset.xy, 0, 0)) );

float2 rotOff;
float percentInLight = 0;

for (int i = 0; i < 12; i++)  // Loop over taps
{
    // Rotate tap for this pixel location and scale relative to center
    rotOff.x =  rot.r * quadOff[i].x + rot.g * quadOff[i].y;
    rotOff.y = -rot.g * quadOff[i].x + rot.r * quadOff[i].y;
    float2 offsetInTexels = g_fSampRadius * rotOff;

    // Sample the shadow map
    float shadowMapVal = tex2Dlod(ShadowSampler,
        float4(projCoords.xy + (g_vTexelOffset.xy * offsetInTexels.xy), 0, 0)).x;

    // Determine whether this tap is in light
    float inLight = (dist < shadowMapVal);

    // Accumulate
    percentInLight += inLight;
}

Jawed
 
Does anybody know where I can download videos of these things running? I REALLY wanna see the new canyon run, and also that snow one... But damn, I haven't got the hardware...

I'm just a poor addicted graphics whore that needs my next fix!!

please help :)
 
Jawed said:
A tap and a sample are the same thing. Otherwise I'm missing something...
Yes and no. With PCF, the 4 taps, the depth compares and the averaging are all a single operation at roughly the same cost as a single sample, so using multiples of those is likely to result in better quality output. With Fetch4, 4 taps is one sample: the cost of fetching the 4 taps is the same as a single sample, but the compare and average have to be done in the shader, which will probably end up being negligible overall. The point being: given that 4 taps per sample is more or less the same cost as just 1 tap per sample, why not do both that and sparse sampling?
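The difference Dave describes can be sketched like this. Both paths produce the same filtered value; the question is only whether the compares and the average happen inside the sampler (PCF) or in shader instructions (Fetch4). This is an illustration of the concept, not real hardware or driver behaviour; `texels` stands for the 2x2 quad of shadow-map depths one fetch covers.

```python
# Hypothetical sketch: hardware PCF vs Fetch4 + shader filtering.
def pcf_sample(texels, ref_depth):
    # Hardware PCF: compare and average happen inside the sampler;
    # the shader receives one filtered visibility value.
    return sum(1.0 for d in texels if ref_depth < d) / 4.0

def fetch4_sample(texels, ref_depth):
    # Fetch4: the sampler returns the 4 raw depths; the shader does the
    # compares and the average itself (a few extra ALU instructions).
    depths = list(texels)                                  # one fetch, four taps
    compares = [1.0 if ref_depth < d else 0.0 for d in depths]
    return sum(compares) / 4.0
```

Since the fetch cost dominates and the extra ALU work is small, the two end up at roughly the same cost per sample, which is the premise of the "why not 4 taps per sparse sample" argument.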
 
Demirug said:
How do you come to this conclusion? The Perlin noise shader is a 3.0 shader.
As Nick asks, I think it's multipassable on PS2.0 hardware. I don't see anything in the shader (although I only looked quickly) that would stop it running on that class of hardware, since it doesn't seem to have any dynamic flow control or other PS3.0-specific constructs. Presumably that was so a "here, look what PS3.0 buys you in this very long multipass PS2.0 shader" comparison/test could be done.

You're the expert! :D
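The multipass idea being discussed can be sketched naively: chop the long arithmetic stream into chunks that fit a PS2.0 instruction limit, writing intermediate results to a render target between passes. The limit constant and the chunking below are simplified assumptions for illustration; real splitting would also have to respect data dependencies, register limits and texture instruction counts.

```python
# Hypothetical sketch: splitting a long shader into PS2.0-sized passes.
PS20_MAX_ARITH = 64  # base ps_2_0 arithmetic slot count (simplified)

def split_into_passes(instructions, limit=PS20_MAX_ARITH):
    """Greedily chunk an instruction list into per-pass segments.

    Each pass would render to an FP render target that the next pass
    reads back as a texture, standing in for live registers.
    """
    return [instructions[i:i + limit] for i in range(0, len(instructions), limit)]
```

For a several-hundred-instruction Perlin noise shader this yields a handful of passes, which is exactly the kind of PS3.0-vs-multipass-PS2.0 comparison test being suggested.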
 
Jawed said:
Soft shadowing is clearly the banner case for per-pixel DB.
Soft shadowing might be a banner case for DB, but hardly for per-pixel DB. With shadows you usually have large contiguous areas that are completely in or out. In fact it is one of those rare cases where NVidia's DB can be a huge performance gain despite its large granularity.

Dave Baumann said:
I'm not sure that it "can't be used"; I'm looking at a test application now where it is used, along with a 12-tap random sample (equating to 48 samples in total), and the shadow quality is very good. Given that the performance for a single sample is roughly the same as 4 samples with PCF/Fetch4, this is probably what developers will use anyway (this same point was brought up with 3DMark05, so I'm not sure what the logic is behind changing it). I think ATI are peeved because this can be combined with dynamic branching such that the branch test does just a single sample of the depth map where a pixel is fully in or out of shadow, and only applies the higher-tap sampling where it detects the edge of a shadow. That results in a performance improvement on ATI hardware, and can also result in IQ improvements, since you can spend more on sampling the shadow edges if you know you aren't wasting a lot of processing where pixels are fully in or out of shadow.
Did they explain how they detect edges? Taking a smaller number of samples first and checking whether they're all in or out?
That technique would help NVidia as well (they presented it in 2004), though likely not as much.
 
Rys said:
As Nick asks, I think it's multipassable on PS2.0 hardware. I don't see anything in the shader (although I only looked quickly) that would stop it running on that class of hardware, since it doesn't seem to have any dynamic flow control or other PS3.0-specific constructs. Presumably that was so a "here, look what PS3.0 buys you in this very long multipass PS2.0 shader" comparison/test could be done.

You're the expert! :D

Now I understand what you mean. From a first look I would say you are right. I am currently trying to add a new plugin to the DirectX Tweaker that can save the HLSL code to a file if the app uses D3DX to compile it at runtime. If we can get the HLSL code for this shader, we can at least check whether it compiles for NV3X/R4XX.
 
Dave Baumann said:
I'm assuming, here, that 3DMark06's shadowing mechanism doesn't use dynamic branching anyway.

If I look at the shader code with the comments left in, I can see that sometimes the shadow texture is only used in one branch path.
 