Cowboy X said: I cannot be the only one who remembers the large-scale cheating done in titles that weren't even in the DX9 weak point of the NV30.
Like what, specifically?
Rys said: Is it just me, or is the Perlin noise one just a (very) long <512-instruction shader that'd compile as a pixel shader 2.0 test?
Are you suggesting it as a multipass PS2.0 test? The texture and arithmetic instruction count is way over the PS2.0 limit; I haven't bothered to sit and check what the register usage is like either.
Joe DeFuria said: Using that logic, it makes no sense to report a complete score for certain SM 3.0 NV parts that do not support floating-point blending, because part of the tests won't run.
And yet, a complete score is in fact given.
This is the problem that I have... it's not consistent.
I think you are wrong here. Doing FP16 blending with a pixel shader is so trivial that even I could probably write the shader after fiddling with the DX documentation for an hour or two. And the performance penalty surely isn't that great; I'd guess about 5-10% at most.
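For concreteness, a minimal sketch of what emulating FP16 blending in a pixel shader could look like under D3D9, where the current render target can't be read directly, so the usual approach is to ping-pong between two FP16 textures. The sampler and helper names here are made up for illustration:

sampler PrevAccum;   // FP16 texture holding everything blended so far

float4 EmulatedBlendPS(float2 screenUV : TEXCOORD0) : COLOR
{
    // screenUV is assumed to be this fragment's position in render-target UV space
    float4 src = ComputeSourceColour(screenUV);   // hypothetical: the pass's normal output
    float4 dst = tex2D(PrevAccum, screenUV);      // previously accumulated value
    return src + dst * (1.0f - src.a);            // e.g. a premultiplied-alpha "over" blend in ALU
    // The result is written to a second FP16 target; the two targets swap each blended pass
}

The extra cost is one texture read per pixel plus the render-target management between blended batches.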
Pete said: Damien took care of the former for you. Check out Hanners' EB article for the latter. (I referenced his #s a page or three back: 25 and 17% hit on a GF6 and GF7, respectively.)
Thanks.
Chalnoth said: Right, so either the NV4x will produce a score that is artificially high or low, depending upon how the comparison is done.
Hubert said: Thanks!
Man, I begin to understand the intricacies of today's graphics hardware... (the link given by Jawed in the "fetch4 - important ?" topic, the Siggraph Shading Course 2006 PDF, did help a lot.)
I'd better leave before it's too late.
It's actually Siggraph 2005.
Dave Baumann said: Fetching 4 samples (and multiple random locations) will always be a quality gain.
But if your intention is to use a sparse filtering kernel, then four contiguous samples anywhere in the kernel means it's no longer sparse.
// BX2 is assumed to be the usual expand macro: BX2(x) = 2*x - 1, mapping [0,1] to [-1,1]
// Look up the per-pixel rotation (a cos/sin pair stored in a small rotation texture)
float2 rot = BX2( tex2Dlod(RotSampler,
    float4(vPos.xy * g_vTexelOffset.xy, 0, 0) ).rg );

float2 rotOff;                 // rotated tap offset
float2 offsetInTexels;         // tap offset scaled to the sampling radius
float  percentInLight = 0.0f;  // accumulated visibility over all taps

for (int i = 0; i < 12; i++)   // Loop over taps
{
    // Rotate the tap for this pixel location and scale relative to the center
    rotOff.x =  rot.r * quadOff[i].x + rot.g * quadOff[i].y;
    rotOff.y = -rot.g * quadOff[i].x + rot.r * quadOff[i].y;
    offsetInTexels = g_fSampRadius * rotOff;

    // Sample the shadow map
    float shadowMapVal = tex2Dlod(ShadowSampler,
        float4(projCoords.xy + (g_vTexelOffset.xy * offsetInTexels.xy), 0, 0)).r;

    // Determine whether this tap is in light (1.0) or in shadow (0.0)
    float inLight = (dist < shadowMapVal);

    // Accumulate
    percentInLight += inLight;
}
// Average over the 12 taps to get a [0,1] visibility term (final step assumed)
percentInLight /= 12.0f;
Jawed said: A tap and a sample are the same thing. Otherwise I'm missing something...
Yes and no. With PCF, the 4 taps, the depth compare and the averaging are all a single operation, roughly the same cost as a single sample, so using multiples of those is likely to result in better quality output. With Fetch4, 4 taps is one sample: the cost of fetching the 4 taps is the same as a single sample, but the compare and average have to be done in the shader, which will probably end up being negligible overall. The point being, given that 4 taps per sample is more or less the same cost as just 1 tap per sample, why not do it and use sparse sampling as well?
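To make the cost argument concrete, a minimal sketch, assuming Fetch4 is exposed so that a single lookup returns the four neighbouring depth texels in .rgba; the sampler and variable names reuse the code excerpt above:

// Hardware PCF: the compare and 4-tap average happen inside the sampler,
// so one lookup already returns a filtered visibility value.
// Fetch4: one lookup returns four raw depths; compare and average in the shader:
float4 depths4    = tex2Dlod(ShadowSampler, float4(projCoords.xy, 0, 0));
float4 inLight4   = step(dist, depths4);                           // per-tap depth compare
float  visibility = dot(inLight4, float4(0.25, 0.25, 0.25, 0.25)); // average the four taps

That is only a couple of ALU instructions on top of the fetch, which is the "negligible overall" part of the argument.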
Demirug said: How do you come to this conclusion? The Perlin noise shader is a 3.0 shader.
As Nick asks, I think it's multipassable on PS2.0 hardware, and I don't see anything in the shader (although I just looked quickly) that would stop it being run on that class of hardware; primarily so a "here, look what PS3.0 buys you in this very long multipass PS2.0 shader" comparison/test could be done, since it doesn't seem to have any dynamic flow control or other PS3.0-specific constructs.
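For concreteness, a minimal sketch of the multipass idea being described, not 3DMark06's actual shader: split a too-long shader into two ps_2_0 passes and carry the intermediate result through a floating-point render target. All names here are made up for illustration:

sampler NoiseInputs;    // whatever textures the noise evaluation reads
sampler Intermediate;   // FP render target written by pass 1, read back by pass 2

// Pass 1: the first chunk of the arithmetic, kept under the ps_2_0 instruction limits
float4 PerlinPass1(float2 uv : TEXCOORD0) : COLOR
{
    float4 partial = tex2D(NoiseInputs, uv);
    // ...first part of the octave sum goes here...
    return partial;     // stored in the intermediate FP target
}

// Pass 2: pick up the stored result and finish the computation
float4 PerlinPass2(float2 uv : TEXCOORD0) : COLOR
{
    float4 partial = tex2D(Intermediate, uv);
    // ...remaining octaves and the final combine go here...
    return partial;
}

This kind of split only works because, as noted above, the shader has no dynamic flow control: every intermediate value can be written out and read back at a fixed point in the computation.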
Jawed said: Soft shadowing is clearly the banner case for per-pixel DB.
Soft shadowing might be a banner case for DB, but hardly for per-pixel DB. With shadows you usually have large contiguous areas that are completely in or out of shadow. In fact, it is one of those rare cases where NVIDIA's DB can be a huge performance gain despite its large granularity.
Dave Baumann said: I'm not sure that it "can't be used"; I'm looking at a test application now where it is used, along with a 12-tap random sample (equating to 48 samples in total), and the shadow quality is very good. Given that the performance for a single sample is roughly the same as 4 samples with PCF/Fetch4, this is probably what developers will use anyway (this same point was brought up with 3DMark05, so I'm not sure what the logic is behind changing it). I think ATI are peeved because this can be combined with dynamic branching such that the branch test just does a single sample of the depth map, in or out of the shadow, but only applies the higher-tap sampling when it's detected to be at the edge of a shadow map (which results in a performance improvement on ATI hardware, and can also result in IQ improvements, since you could spend more on just sampling the shadow edges if you know you aren't going to waste a lot of processing when it's fully in or out of shadow).
Did they explain how they detect edges? Taking a smaller number of samples first and checking whether they're all in or out?
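As one possible answer to that question, a sketch of an edge test (an assumption for illustration, not necessarily what the test application does): take a single Fetch4-style lookup, and only fall through to the expensive kernel when the four depths disagree. Sampler and variable names reuse the code excerpt above:

// One lookup, assumed to return the four neighbouring depths (Fetch4-style)
float4 depths4    = tex2Dlod(ShadowSampler, float4(projCoords.xy, 0, 0));
float4 inLight4   = step(dist, depths4);
float  visibility = dot(inLight4, float4(0.25, 0.25, 0.25, 0.25));

// If all four taps agree, the pixel is fully lit (1.0) or fully shadowed (0.0)
// and the expensive kernel is skipped; only pixels near a shadow edge branch
// into the full loop, which is where per-pixel dynamic branching pays off.
[branch]
if (visibility > 0.0f && visibility < 1.0f)
{
    // ...near a shadow edge: run the 12-tap rotated-disc loop from the
    // excerpt above and use its averaged result as the visibility instead...
}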
Rys said: As Nick asks, I think it's multipassable on PS2.0 hardware, and I don't see anything in the shader (although I just looked quickly) that would stop it being run on that class of hardware; primarily so a "here, look what PS3.0 buys you in this very long multipass PS2.0 shader" comparison/test could be done, since it doesn't seem to have any dynamic flow control or other PS3.0-specific constructs.
You're the expert!
Dave Baumann said: I'm assuming, here, that 3DMark06's shadowing mechanism doesn't use dynamic branching anyway.