Your Wish List: Things that didn't make D3D10

Discussion in 'Architecture and Products' started by Reverend, May 8, 2007.

  1. Demirug

    Veteran

    Joined:
    Dec 8, 2002
    Messages:
    1,326
    Likes Received:
    69
    Yes, there is some support for tessellation in DX9, but it doesn't go very far. D3D10 doesn't support it anymore. The R600 tessellator should be able to support the DX9-style tessellation, but as this thread is about D3D10 I didn't include this in my first answer.
     
  2. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,859
    Likes Received:
    2,276
    Why would they remove it? It seems like a good feature to have. Or can you do the same stuff with geometry shaders, or am I misunderstanding what they are for?

    Another Q: in the R600 refresh, do you think they will remove that tessellation unit and maybe replace it with something else, render back ends maybe?

    ps: found another quote
    "Microsoft is pushing hard to make tessellation a requirement of the next DirectX (DirectX 10.1 or DirectX 11 or whatever they end up calling it), so ATI may be a little ahead of the curve here."
     
    #22 Davros, May 16, 2007
    Last edited by a moderator: May 17, 2007
  3. Demirug

    Veteran

    Joined:
    Dec 8, 2002
    Messages:
    1,326
    Likes Received:
    69
    The DX9 tessellation was fixed-function in style and was never really supported. Therefore it was logical that Microsoft removed it. You can use the geometry shader to do tessellation, and I don't know what the R600 can do better with its dedicated unit.
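
    For example, here is a minimal one-level subdivision sketch (names made up; real adaptive tessellation would need more work, e.g. recursion via stream-out):

    Code:
    // Split each input triangle into four by its edge midpoints --
    // one subdivision level, done entirely in the geometry shader.
    struct V { float4 posH : SV_Position; };

    void emitTri(inout TriangleStream<V> s, float4 a, float4 b, float4 c)
    {
        V v;
        v.posH = a; s.Append(v);
        v.posH = b; s.Append(v);
        v.posH = c; s.Append(v);
        s.RestartStrip();
    }

    [maxvertexcount(12)]
    void SubdivGS(triangle V tri[3], inout TriangleStream<V> stream)
    {
        float4 m01 = 0.5 * (tri[0].posH + tri[1].posH);
        float4 m12 = 0.5 * (tri[1].posH + tri[2].posH);
        float4 m20 = 0.5 * (tri[2].posH + tri[0].posH);

        emitTri(stream, tri[0].posH, m01, m20);
        emitTri(stream, tri[1].posH, m12, m01);
        emitTri(stream, tri[2].posH, m20, m12);
        emitTri(stream, m01, m12, m20);
    }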

    As I am not sure how big this unit is, it may or may not be interesting to remove it.
     
  4. KindDragon

    Newcomer

    Joined:
    May 17, 2007
    Messages:
    7
    Likes Received:
    0
    Full support for stereo rendering, not just a stereo backbuffer.
     
  5. santyhammer

    Newcomer

    Joined:
    Apr 22, 2006
    Messages:
    85
    Likes Received:
    2
    Location:
    Behind you
    For DX10.1 I hope:
    - 64-bit FP precision support (that includes Z-buffer and stencil)
    - Full (32/64-bit) floating-point texture filtering
    - Cubemap arrays
    - Minimum required AA caps
    - Improved SLI/multi-GPU synchronization/coordination routines
    - "Jumbo" 32/64-bit floating-point texture support for GPGPU computing, with the corresponding sampler. Basically what I want is the possibility to allocate 768MB in a 1D texture like CUDA. I'm not sure, but isn't the maximum DX10 texture size 4096, and 8192 for DX10.1?

    For DX11 I could use:

    - A fully programmable per-pixel blend shader with full R/W support.

    - A fully customizable and programmable AA shader.

    - Second-depth Z-buffer support (for SSS, shadow bias, simple transparency sorting, etc.)

    - Multiple texture fetches in one call, something like ATI's Fetch4. For example, imagine I want to get the 17x17 neighboring pixels around a point in a cubemap. In code:

    Code:
    // Hypothetical intrinsic (it doesn't exist): fetch the texel hit by
    // myVec3, plus its 17x17 neighborhood, in a single call.
    void myPS()
    {
        float multifetchedValues[17][17] = texCUBEMultiFetched(cubeSampler, myVec3, 17, 17);
    }
    
    That "texCUBEMultiFetched" will perform a simple cubemap texel fetch. Then it gets the 17x17 surrounding samples in the fetched cube face. A special case must be implemented in case the neighbors uses a different cube face of course. A tex1D/2D version could be useful too.

    This could be used for penumbra shadows, PCF, etc...
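
    For instance, something like this could implement brute-force PCF on top of it (pure pseudocode, since the intrinsic doesn't exist; "pcfShadow" is a made-up name):

    Code:
    // Hypothetical usage of the proposed intrinsic above: average a 17x17
    // depth neighborhood for percentage-closer filtering.
    float pcfShadow(samplerCUBE shadowSampler, float3 dir, float receiverDepth)
    {
        float samples[17][17] = texCUBEMultiFetched(shadowSampler, dir, 17, 17);
        float lit = 0.0;
        for (int i = 0; i < 17; i++)
            for (int j = 0; j < 17; j++)
                lit += (samples[i][j] >= receiverDepth) ? 1.0 : 0.0;
        return lit / (17.0 * 17.0); // fraction of unshadowed samples
    }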

    - Much more advanced texture compression based on wavelets (like JPEG 2000/WMP, but with block decoding). This can be questionable, especially with GDDR prices coming down, but... well, some graphics cards can decode MPEG in HW, so this could be possible.

    - A good solution for alpha-blended transparency (like an A-buffer, blah blah) because all the current methods are lacking or too slow, like depth peeling.

    - A simple raycast HLSL instruction would help too and could be the start of raytracing. I heard NVIDIA is working on a demo showing this. Something like creating an acceleration structure when you load a mesh (VB+IB) into VRAM, then doing optimized ray-triangle tests in local space with an HLSL instruction called "raycast" inside the vertex/geometry/pixel shader.
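
    In made-up syntax, just to illustrate the idea (nothing like "raycast", "RayHit" or "myMeshAccel" exists):

    Code:
    // Hypothetical intrinsic and types, as wished above. myMeshAccel would
    // be the acceleration structure built when the VB+IB was loaded.
    struct RayHit { float dist; uint triId; float2 bary; };

    float4 ShadowRayPS(float3 posL : POSITION, float3 lightPosL : LIGHTPOS) : SV_Target
    {
        float3 dir = normalize(lightPosL - posL);
        RayHit hit = raycast(myMeshAccel, posL, dir);

        // In shadow if something is hit before reaching the light.
        float lit = (hit.dist >= distance(lightPosL, posL)) ? 1.0 : 0.0;
        return float4(lit.xxx, 1.0);
    }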

    Just my 2 cents.
     
    #25 santyhammer, May 18, 2007
    Last edited by a moderator: May 19, 2007
  6. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    You can easily do single pass stereo rendering in DX10 with the GS.
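
    For example, a minimal sketch (assuming a Texture2DArray render target with one slice per eye; the matrix names are made up):

    Code:
    // Replicate each triangle into two render-target array slices, one per
    // eye, by writing SV_RenderTargetArrayIndex from the geometry shader.
    cbuffer StereoCB
    {
        float4x4 gLeftViewProj;   // assumed app-supplied per-eye matrices
        float4x4 gRightViewProj;
    };

    struct GSIn  { float4 posW : POSITION; };
    struct GSOut
    {
        float4 posH  : SV_Position;
        uint   slice : SV_RenderTargetArrayIndex; // selects the eye's slice
    };

    [maxvertexcount(6)]
    void StereoGS(triangle GSIn tri[3], inout TriangleStream<GSOut> stream)
    {
        for (uint eye = 0; eye < 2; ++eye)
        {
            float4x4 viewProj = (eye == 0) ? gLeftViewProj : gRightViewProj;
            for (uint v = 0; v < 3; ++v)
            {
                GSOut o;
                o.posH  = mul(tri[v].posW, viewProj);
                o.slice = eye;
                stream.Append(o);
            }
            stream.RestartStrip();
        }
    }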

    Why would you want 64-bit Z-buffers? Well, DX10 has 64-bit depth-stencil surfaces (D32_FLOAT_S8X24_UINT), it's just that 24 of the bits are not used. Are you hoping for something like 56-bit depth and 8-bit stencil?
     
  7. santyhammer

    Newcomer

    Joined:
    Apr 22, 2006
    Messages:
    85
    Likes Received:
    2
    Location:
    Behind you
    Nope, really! I want more than 8 bits for stencil... let's say 32 Z + 32 stencil. I think basically the stencil should be able to work with 32-bit object IDs... 256 IDs are definitely not much, hehe.
    Also I could use a 64-bit Z-buffer (double precision) with no stencil :grin: (for example for large camera frustums or more accurate shadow buffers).
    And now that we're talking about the Z-buffer, I could use that second-depth Z-buffer for SSS and to mitigate shadow biasing problems too!

    I think double precision is coming in DX10.1, though.
     
    #27 santyhammer, May 19, 2007
    Last edited by a moderator: May 19, 2007
  8. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    It is true that as you begin using stencil for things it wasn't really made for, D32_S32 or even D32_S8S8S8S8 with a multiplexer (or fully decoupling depth and stencil) might have some uses. I'm not convinced those are very important, but I'm sure if you really wanted to, you could think of some cool stuff there.

    In the end, none of this makes sense today. However, within the next 5 years, they will suddenly make a lot of sense when the programmable shader core replaces the ROPs completely... (No, AMD, you can't claim you were forward-looking by up to 5 years, sorry! ;))
     
  9. SuperCow

    Newcomer

    Joined:
    Sep 12, 2002
    Messages:
    106
    Likes Received:
    4
    Location:
    City of cows
    Unless you also get the ability to output stencil from the pixel shader (which is likely to come with a performance impact), supporting this number of stencil bits for object IDs would imply you're rendering each of those objects in its own separate call, which isn't good for batch performance.
    Btw, there is a D32_S8 format in D3D10 (32-bit depth, 8-bit stencil).

    Why not? However, currently the lack of depth precision often comes more from poor utilization of projection matrices than from the "limited" bit precision of depth buffers. I suppose a space rendering engine (with planets and spaceships, etc.) might benefit from 64-bit depth without the hassle of having to partition the depth range.
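
    To make that concrete (a textbook identity, not specific to any particular API): a standard projection maps eye-space depth z to roughly z_ndc = (f / (f - n)) * (1 - n / z), with n and f the near and far planes. For f >> n, half of all representable depth values land between n and 2n, so pushing the near plane from, say, 0.01 to 1.0 buys far more effective precision than extra bits ever would.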
     
    #29 SuperCow, May 19, 2007
    Last edited by a moderator: May 20, 2007
  10. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Just use an int32 texture and dynamic branching for "early out". I seriously doubt a hardware-implemented stencil buffer would be any faster than that on modern hardware, particularly if you're outputting stencil values from the shader. Hell, I can't even get early-stencil to work properly in many *normal* cases!
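
    Something like this, roughly (a minimal sketch; "gLightMask", "gLightId" and the pass structure are made up for illustration):

    Code:
    // Per-light full-screen pass: reject pixels this light doesn't touch by
    // branching on a per-pixel ID stored in an integer texture, rather than
    // relying on hardware early-stencil.
    Texture2D<int> gLightMask;      // assumed to be written by an earlier pass

    cbuffer LightCB { int gLightId; };

    float4 MaskedLightPS(float4 posH : SV_Position) : SV_Target
    {
        int mask = gLightMask.Load(int3(posH.xy, 0));

        [branch]
        if (mask != gLightId)
            discard;                // the "early out"

        // ...expensive lighting computation would go here...
        return float4(1, 1, 1, 1);
    }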

    Stencil buffers are useful for a few "tricks" (like stencil routing - particularly when combined with MSAA!), but honestly many of the things that people use them for can be performed just as efficiently with a normal texture nowadays. The same can't quite be said for blending operations (for which you can do similar read-modify-write cycles using stencil if you want to) due to double-buffering issues, but I suspect that will eventually be the case as well.

    I'd actually rather have *less* fixed hardware like depth and stencil and more programmable stuff :) Depth still makes sense IMHO due to the commonality of its use and the semi-complex data structure that it implements, but stencil is already becoming questionable.
     
  11. santyhammer

    Newcomer

    Joined:
    Apr 22, 2006
    Messages:
    85
    Likes Received:
    2
    Location:
    Behind you
    Ok, what about something like a "blend shader" stage with the ability to read and write? That could be nice!

    Basically, it is a fourth shader stage that would sit after the fragment shader.
    It takes the various bits of data generated by the fragment shader plus whatever is in the current fragment of the render target, and then outputs the result to that fragment. It could be a simple pass-through shader or perhaps something more complex...

    There were some thoughts about this in the OpenGL forums, but it's going to be hard to implement, especially due to speed problems reading values already "in use".
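
    In made-up HLSL-ish syntax, just to show what I mean (none of this exists; "SV_ShaderOutput" and "SV_DestColor" are invented for the sketch):

    Code:
    // Purely hypothetical blend stage: "dest" is the current render-target
    // value made readable, "src" is what the pixel shader just produced.
    float4 MyBlendShader(float4 src  : SV_ShaderOutput,
                         float4 dest : SV_DestColor) : SV_Target
    {
        // Classic alpha blend as the pass-through-ish case; any arithmetic
        // on src/dest would be allowed here.
        return src * src.a + dest * (1.0 - src.a);
    }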

    ps: Edited my prev post to add more crazy ideas :p
     
    #31 santyhammer, May 19, 2007
    Last edited by a moderator: May 19, 2007
  12. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Can you give a little bit more info on that? I'm very curious about what doesn't work there, and whether you have any idea why! :)
     
  13. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    I was using stencil for a while with deferred shading to stencil out light volumes (it works nicely with Z-buffering). However, after some benchmarking I realized that while the stencil was *working*, it wasn't actually making anything faster than just shading the whole screen. I spoke to NVIDIA about it and they jokingly suggested that I rename my app to "Doom3.exe" ;) Basically, early-stencil seems to work for exactly the case that Doom3's rendering path uses and pretty much nothing else, even in cases where it should theoretically work as there are no data dependencies.

    With respect to a "blend" stage, it would certainly be useful, but indeed read-modify-write cycles are difficult and somewhat expensive to implement in a programmable manner. The hardware people can probably explain more...

    Many things can certainly be done with the current blend modes though, and more if we had bit-wise blending operations for integer textures.
     
  14. SuperCow

    Newcomer

    Joined:
    Sep 12, 2002
    Messages:
    106
    Likes Received:
    4
    Location:
    City of cows
    I believe what you're telling us, but that makes no sense at all! If you have a lot of volume lights to apply, then the cost of shading your scene should be much higher with a fullscreen pass per light compared to marking the volume areas with stencil and only shading those for each light. Of course this depends on your shader complexity, but overall this should be true (even if you use dynamic branching to reject out-of-range pixels during the shading passes). You're not using insanely tessellated volumes (spheres?) for the volume lights, are you? (On a unified architecture this may take some of the power you wanted for pixel shading.)

    With regard to your comment that textures "can" do the same thing as stencil: yes, they probably can (especially on D3D10), but you cannot expect the same level of performance as stencil buffering. The stencil test is part of the pipeline and has dedicated hardware optimizations (like early stencil testing, as you mentioned - it's supposed to work :)), whereas an int texture will need to be written to and fetched like any other texture (both phases require dedicated shader instructions and consume precious color bandwidth).
     
  15. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Oh, definitely it should have been faster - that's why I was using it ;) I ended up just projecting the light volumes' bounding boxes and using a scissor test on the GPU in that implementation, which was fast enough. Note that stencil was *entirely* broken on ATI/OpenGL at the time, and I still don't think early stencil works properly in that demo.

    As I mentioned, when I spoke to NVIDIA their response was that making early stencil work is touch and go. Honestly, I don't think it tends to work in many cases other than the exact path of shadow volumes.

    Oh of course not - seriously, early stencil reject was just not working... it was happening *after* the shader.

    I dunno, it seems to me that hardware is getting pretty general and most of the specific API functionality is implemented in a general way in the driver anyway. This is particularly true when you look at the design and flexibility that you get with something like CTM (particularly) or even CUDA. Maybe not this generation, but I don't see a need for a fixed-function stencil buffer in the long run.
     
  16. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    NVIDIA hardware seems to be a lot more sensitive with stencil. On ATI hardware you should not have any trouble with early-out stencil. In fact, in many cases you'll probably see better performance that way than with dynamic branching. On R600 it should be even better, as it has hierarchical stencil as well, unlike previous generations, which could only reject in the early-Z stage. I haven't revisited this topic with R600, but my gut feeling is that early-out with stencil should be better than ever.
     
  17. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Yeah, but that's all entirely beside the point, since ATI has quite possibly the most terrible MRT implementation in OpenGL (which is what this app used) that I've ever worked with :( Because of that simple fact, no ATI hardware could even touch NVIDIA 6's and 7's, let alone 8's.

    Cool, although like I said, I don't care that much about stencil. It can be useful for a few algorithms, but IMHO it's a bit of a hold-over from fixed-function days that's only still there in hardware because of shadow volumes, which I also don't care for ;)
     
  18. pocketmoon66

    Newcomer

    Joined:
    Mar 31, 2004
    Messages:
    163
    Likes Received:
    9
    In DX you need to set the stencil state to D3DSTENCILOP_KEEP to allow early stencil. See http://forum.beyond3d.com/showthread.php?p=286194#post286194

    Could it be the same in OGL?
     
  19. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Yes, it is the same in OGL, but I did that and every other thing they asked, and still no early stencil :(
     
  20. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    It's important to clear the stencil buffer every frame, not just once, even if you completely fill it again and again.
     