Futuremark: 3DMark06

Discussion in 'Graphics and Semiconductor Industry' started by trinibwoy, Dec 23, 2005.

  1. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Like what, specifically?
     
  2. Neeyik

    Neeyik Homo ergaster
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,231
    Likes Received:
    45
    Location:
    Cumbria, UK
    Are you suggesting it could be run as a multipass PS2.0 test? The texture and arithmetic instruction counts are way over the PS2.0 limits; I haven't bothered to sit and check what the register usage is like either.
     
  3. N00b

    Regular

    Joined:
    Mar 11, 2005
    Messages:
    698
    Likes Received:
    114
    I think you are wrong here. Doing FP16 blending with a pixel shader is so trivial that even I could probably write the shader after fiddling with the DX documentation for an hour or two. And the performance penalty surely isn't that great; I'd guess 5-10% at most.
    Adding MSAA with a pixel shader, on the other hand, is not so simple. I wonder if it can be done, and how? You could probably do SSAA, which would not really be comparable, and performance would suck.
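    The blending fallback described here amounts to binding the previous FP16 render target as a texture and doing the blend arithmetic in the shader instead of in the fixed-function blender. A CPU-side sketch of that arithmetic (Python, illustrative names only; not Futuremark's code):

```python
import struct

def to_fp16(x):
    """Round a float to binary16 storage precision (what an FP16
    render target keeps per channel); 'e' is the half-float format."""
    return struct.unpack('e', struct.pack('e', x))[0]

def shader_blend_fp16(dst, src, src_alpha):
    """Emulate fixed-function 'over' blending in the pixel shader:
    read the previous FP16 target as a texture, blend manually,
    and write the rounded result back out (ping-ponging targets)."""
    return to_fp16(to_fp16(src) * src_alpha + to_fp16(dst) * (1.0 - src_alpha))
```

    The extra cost is one texture read and a couple of ALU instructions per pixel, which is why the fallback is considered cheap.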

    So what Futuremark has done here reflects what most developers would have done: add the trivial fallback and ignore the complicated one. It's not like we will see FP16 AA on current nVidia cards in any forthcoming game. Will Not Happen. (Unless someone comes up with a very clever trick no one has thought of yet, but I doubt it.)

    So, as I said in a previous post, the absence of HDR/SM3.0 AA/AF scores with current nVidia cards should not be seen as unfair, but as a boon. The X1x00 cards simply have an important feature that the current nVidia cards don't have.

    That said, the absence of an HDR/SM3.0 AA/AF score for current nVidia cards hints that future cards will support FP16 AA. So I guess in two or three months this whole affair will be a non-issue anyway.
     
  4. N00b

    Regular

    Joined:
    Mar 11, 2005
    Messages:
    698
    Likes Received:
    114
    Thanks.
     
  5. Hubert

    Newcomer

    Joined:
    Sep 16, 2003
    Messages:
    151
    Likes Received:
    0
    Location:
    Transsylvania
    I guess, given the circumstances, the NA score is best. It simply says: as far as we (Futuremark) know, you won't be able to use HDR and AA together on Nvidia cards. It would be different if Futuremark used an AA algorithm in shaders; then a score would be worth giving.

    An ATI fan should be quite happy with 3DMark06 ... it states that HDR, the much-advertised Nvidia-only SM3.0 feature, is just unusable in real life. Or Nvidia owners have to play games twice: first with decent IQ, second with HDR. Or vice versa.
     
    #565 Hubert, Jan 21, 2006
    Last edited by a moderator: Jan 21, 2006
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    Since PCF/Fetch4 cannot be used for the "advanced shadowing" algorithm of the SM3/HDR tests (3 and 4), does DST/PCF have much of a future?

    It seems to me that DST/PCF/Fetch4 might end up like stencil shadows: a feature that's used by 2 or 3 game engines and is then "forgotten" as not good enough.

    Though I presume that it's the hardware-PCF/Fetch4 that's at issue here, because DSTs are always going to be needed, however fancy the shadow filtering technique. Is that correct?

    I'm not clear on whether CSM is used in all four tests. Presumably this is independent of the technique for fetching shadow samples and/or filtering them, so I presume it's in all four tests.

    Jawed
     
  7. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    I'm not sure that it "can't be used"; I'm looking at a test application now where it is used, along with a 12-tap random sample (equating to 48 samples in total), and the shadow quality is very good. Given that the performance for a single sample is roughly the same as 4 samples with PCF/Fetch4, this is probably what developers will use anyway (this same point was brought up with 3DMark05, so I'm not sure what the logic is behind changing it). I think ATI are peeved because this can be combined with dynamic branching, such that the branch test just does a single sample of the depth map in or out of the shadow, and only applies the higher-tap sampling when the pixel is detected to be at the edge of a shadow. That results in a performance improvement on ATI hardware, and it can also result in IQ improvements, since you can spend more on sampling just the shadow edges if you know you aren't going to waste a lot of processing where a pixel is fully in or out of shadow.

    I'm assuming, here, that 3DMark06's shadowing mechanism doesn't use dynamic branching anyway.
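    The early-out described above can be sketched in scalar code. This is only a guess at the general shape (Python, hypothetical names), not 3DMark06's or ATI's actual shader; one common variant probes a few taps and falls back to the full kernel only when they disagree:

```python
def shadow_factor(kernel_depths, receiver_depth, probe_taps=4):
    """Dynamic-branching early-out for shadow filtering (a sketch):
    probe a few taps first; if they all agree the pixel is fully lit
    or fully in shadow, skip the expensive kernel and only pay for
    the full 12-tap filter in the penumbra.
    Returns (fraction in light, number of taps actually taken)."""
    probes = kernel_depths[:probe_taps]
    lit = [d > receiver_depth for d in probes]   # per-tap depth compare
    if all(lit):
        return 1.0, probe_taps                   # fully lit: early out
    if not any(lit):
        return 0.0, probe_taps                   # fully shadowed: early out
    # Penumbra: evaluate the whole sparse kernel
    hits = sum(d > receiver_depth for d in kernel_depths)
    return hits / len(kernel_depths), len(kernel_depths)
```

    Since penumbra pixels are usually a small fraction of the screen, most pixels take the cheap path, which is where the performance win comes from.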
     
  8. Hubert

    Newcomer

    Joined:
    Sep 16, 2003
    Messages:
    151
    Likes Received:
    0
    Location:
    Transsylvania
    Thanks !

    Man, I begin to understand the intricacies of today's graphics hardware ... (the link given by Jawed in "fetch4 - important ?" topic, Siggraph Shading Course 2006 pdf. did help a lot )

    I better leave until it's too late. :)
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    "Can't be used" was meant very much in the sense that "it offers no performance gain, and is therefore pointless". There's no point in fetching four samples and discarding three, if fetching one sample is an option.

    Now, as to your comments about DB and filtering only where there is likely to be a penumbra - well, I have to say this was always the foundation for my suspicions about 3DMk06 using DB. It is clearly a technique that heavily favours ATI hardware because of the inadequacy of the NV implementation (rather than it being absent), and one that is part of DX9 to boot. It's at the root of my assertion that FM copped out big time. Pathetic and unimaginative.

    Soft shadowing is clearly the banner case for per-pixel DB.

    Jawed
     
  10. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Fetching 4 samples (at multiple random locations) will always be a quality gain.

    I think their point is that, given there are two paths already there for many things, why not two paths for the shadowing?
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    It's actually Siggraph 2005

    http://www.ati.com/developer/SIGGRAP...Course_ATI.pdf

    Jawed
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    But if your intention is to use a sparse filtering kernel, then four contiguous samples anywhere in the kernel means it's no longer sparse.

    Jawed
     
    #572 Jawed, Jan 21, 2006
    Last edited by a moderator: Jan 21, 2006
  13. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    4 taps per sparse sample is going to be better quality than just single-tap sparse samples (and not that different in performance).
     
  14. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    A tap and a sample are the same thing. Otherwise I'm missing something...

    Code:
    // Look up rotation for this pixel
    float2 rot = BX2( tex2Dlod(RotSampler,
                      float4(vPos.xy * g_vTexelOffset.xy, 0, 0)) );

    for (int i = 0; i < 12; i++) // Loop over taps
    {
        // Rotate tap for this pixel location and scale relative to center
        rotOff.x = rot.r * quadOff[i].x + rot.g * quadOff[i].y;
        rotOff.y = -rot.g * quadOff[i].x + rot.r * quadOff[i].y;
        offsetInTexels = g_fSampRadius * rotOff;

        // Sample the shadow map
        float shadowMapVal = tex2Dlod(ShadowSampler,
            float4(projCoords.xy + (g_vTexelOffset.xy * offsetInTexels.xy), 0, 0));

        // Determine whether tap is in light
        inLight = (dist < shadowMapVal);

        // Accumulate
        percentInLight += inLight;
    }
    Jawed
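    For reference, the rotOff lines in the snippet above are a standard 2x2 rotation of each kernel offset by a per-pixel angle, with rot holding (cos a, sin a) fetched from the rotation texture. A CPU-side Python equivalent of just that step (illustrative only):

```python
def rotate_tap(tap, rot):
    """Rotate a kernel offset by a per-pixel angle, matching the
    rotOff computation in the shader snippet: rot = (cos(a), sin(a)),
    tap = (x, y) offset from the kernel center."""
    c, s = rot
    return (c * tap[0] + s * tap[1],
            -s * tap[0] + c * tap[1])
```

    Rotating the whole kernel by a per-pixel angle turns the fixed tap pattern into the "12-tap random sample" Dave mentions, trading banding artifacts for noise.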
     
  15. kyetech

    Regular

    Joined:
    Sep 10, 2004
    Messages:
    532
    Likes Received:
    0
    Does anybody know where I can download videos of these things running? I REALLY wanna see the new canyon run, and also that snow one... But damn, I haven't got the hardware...

    I'm just a poor addicted graphics whore that needs my next fix!!

    please help :)
     
  16. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Yes and no. With PCF, the 4 taps, the depth compare and the averaging are all a single operation, roughly the same cost as a single sample - so using multiples of those is likely to result in a better quality output. With Fetch4, 4 taps is one sample; the cost of fetching the 4 taps is the same as a single sample, but the compare and average have to be done in the shader, which will probably end up being negligible overall. The point being: given that 4 taps per sample cost more or less the same as just 1 tap per sample, why not do it in combination with sparse sampling?
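    The difference being described can be written out explicitly: hardware PCF returns the compare-and-average of a 2x2 footprint in one fetch, while Fetch4 returns the four raw depths and leaves the compare and average to shader instructions. A Python sketch of the equivalent arithmetic (illustrative names, not any vendor's code):

```python
def pcf_tap(depths_2x2, receiver_depth):
    """What hardware PCF returns for one fetch: each of the 4 depths
    in the 2x2 footprint is compared against the receiver depth and
    the boolean results averaged, all as a single operation."""
    return sum(d > receiver_depth for d in depths_2x2) / 4.0

def fetch4_tap(depths_2x2, receiver_depth):
    """Fetch4 returns the 4 raw depths in one fetch; the compare and
    average then cost a few shader instructions, but the result is
    the same bilinear-footprint shadow term."""
    comparisons = [1.0 if d > receiver_depth else 0.0 for d in depths_2x2]
    return sum(comparisons) / 4.0
```

    Either way, one fetch yields a 4-tap shadow term, which is why sparse kernels built from these taps cost about the same as single-tap kernels.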
     
  17. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
    As Nick asks, I think it's multipassable on PS2.0 hardware, and I don't see anything in the shader (although I only looked quickly) that would stop it being run on that class of hardware, since it doesn't seem to have any dynamic flow control or other PS3.0-specific constructs. That would primarily let a "here, look what PS3.0 buys you over this very long multipass PS2.0 shader" comparison/test be done.

    You're the expert! :grin:
     
  18. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    Soft shadowing might be a banner case for DB, but hardly for per-pixel DB. With shadows you usually have large contiguous areas that are completely in or out. In fact it is one of those rare cases where NVidia's DB can be a huge performance gain despite its large granularity.

    Did they explain how they detect edges? Taking a smaller number of samples first and checking whether they're all in or out?
    That technique would help NVidia as well (they presented it in 2004), though likely not as much.
     
  19. Demirug

    Veteran

    Joined:
    Dec 8, 2002
    Messages:
    1,326
    Likes Received:
    69
    Now I understand what you are saying. At first glance I would say you are right. I am currently trying to add a new plugin to the DirectX Tweaker that can save the HLSL code to a file if the app uses D3DX to compile it at runtime. If we can get the HLSL code for this shader, we can at least check whether it compiles for NV3X/R4XX.
     
  20. Demirug

    Veteran

    Joined:
    Dec 8, 2002
    Messages:
    1,326
    Likes Received:
    69
    If I look at the shader code with the comments left in, I can see that sometimes the shadow texture is only used in one branch path.
     
