Direct3D feature levels discussion

Discussion in 'Rendering Technology and APIs' started by DmitryKo, Feb 20, 2015.

  1. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    417
    Likes Received:
    475
    Do they really, in practical applications, or does that only apply to synthetic benchmarks that deliberately skip LOD for models in order to artificially trigger bottlenecks in the geometry pipeline?

    I mean, sure, you can use a 2 million triangle character model. And then boast about how you still managed to squeeze it down to only a few hundred thousand non-rejected triangles, so that the impact was reduced far enough to allow any form of multi pass.

    Or you could just have provided a lower resolution model in the first place for slower hardware.
     
    iroboto likes this.
  2. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    306
    Likes Received:
    345
    As explained in another thread, when we enable tessellation or geometry shaders there is additional overhead incurred in the hardware pipeline, and a similar principle applies to the mesh shading pipeline. The mesh shading pipeline comes in two stages, the amplification shader and the mesh shader, and the overhead of running amplification shaders isn't trivial. A vertex-shading-only pipeline (no tess or GS in this case) with a trivial amount of work could potentially win in terms of performance when we consider the fact that we only have one stage ...
     
    iroboto likes this.
  3. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,480
    Likes Received:
    417
    You can't just enable Mesh Shaders, provide it with the same content, and expect improved performance. For static content that can have vertex reuse performed offline, Mesh Shaders are a great fit and may be able to outperform the traditional pipeline. I say "may" because it really depends on the bottleneck. If you're limited by VS performance, Mesh Shaders will only be better if you can reduce the number of vertices.
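
    To make "vertex reuse performed offline" concrete, here is a minimal sketch of a greedy meshlet builder in C++. It is not taken from any shipping tool: the names `Meshlet` and `build_meshlets` are made up for illustration, and the default limits (64 vertices, 126 triangles) are assumed values, not something stated in this thread. Each meshlet stores every vertex it touches once, so a mesh shader would only have to process the unique vertices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One meshlet: a small list of unique vertices plus triangles indexing into it.
// (Hypothetical layout for illustration only.)
struct Meshlet {
    std::vector<std::uint32_t> vertices;      // indices into the original vertex buffer
    std::vector<std::uint8_t>  local_indices; // 3 per triangle, indexing into `vertices`
};

// Greedy split of an indexed triangle list into meshlets.
// max_vertices / max_triangles are assumed limits, not values from any spec.
std::vector<Meshlet> build_meshlets(const std::vector<std::uint32_t>& indices,
                                    std::size_t max_vertices = 64,
                                    std::size_t max_triangles = 126) {
    assert(indices.size() % 3 == 0);
    assert(max_vertices >= 3 && max_vertices <= 256 && max_triangles >= 1);

    std::vector<Meshlet> meshlets;
    Meshlet current;

    for (std::size_t t = 0; t < indices.size(); t += 3) {
        // Upper bound on how many vertices this triangle would add to the meshlet.
        std::size_t new_vertices = 0;
        for (std::size_t k = 0; k < 3; ++k) {
            const auto& v = current.vertices;
            if (std::find(v.begin(), v.end(), indices[t + k]) == v.end()) ++new_vertices;
        }

        // Start a new meshlet if this triangle would exceed either limit.
        const bool full =
            current.vertices.size() + new_vertices > max_vertices ||
            current.local_indices.size() / 3 + 1 > max_triangles;
        if (full) {
            meshlets.push_back(std::move(current));
            current = Meshlet{};
        }

        // Append the triangle, reusing vertices already present in the meshlet.
        for (std::size_t k = 0; k < 3; ++k) {
            auto& v = current.vertices;
            auto it = std::find(v.begin(), v.end(), indices[t + k]);
            if (it == v.end()) {
                v.push_back(indices[t + k]);
                it = v.end() - 1;
            }
            current.local_indices.push_back(static_cast<std::uint8_t>(it - v.begin()));
        }
    }
    if (!current.local_indices.empty()) meshlets.push_back(std::move(current));
    return meshlets;
}
```

    For a quad made of two triangles sharing an edge (indices 0,1,2, 2,1,3), this yields one meshlet with 4 unique vertices instead of 6 vertex references. Real tools also reorder triangles for locality first, which this sketch skips.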

    If you want a GPU driven approach, like Sebbbi promoted, where compute was used for high level culling instead of the CPU then the combination of Amplification and Mesh Shaders should be a good fit.

    I'm using generalizations here. We'll really need to wait and see real content to know if expectations like mine match reality.

    To really know if Amplification/Mesh Shaders are better we need optimized versions of both pipelines for comparison. As Rys said, the Mesh Shader off case for the 3dmark test isn't as fast as the traditional pipeline could go given a different implementation.
     
    tinokun, HLJ, fellix and 5 others like this.
  4. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    306
    Likes Received:
    345
    I heard that the Direct3D team is exploring exposing in D3D12 an equivalent of Vulkan's DEVICE_LOCAL and HOST_VISIBLE memory type. On AMD HW, this memory type amounts to 256MB of VRAM. On AMD HW with resizable BAR, this memory type includes the entire VRAM.

    On D3D12, we don't have explicit access to this particular memory type like we do on Vulkan. When we create memory heaps on D3D12, there are 4 heap types we can specify: DEFAULT, UPLOAD, READBACK, or CUSTOM. If we create a DEFAULT, UPLOAD, or READBACK heap, its other properties, such as the CPU page property and the memory pool, must be set to UNKNOWN, which makes it impossible to express that memory type. When we create a CUSTOM heap we don't know whether the driver will use this memory type, so that case isn't ideal either.
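
    The heap rule above can be sketched as a small self-contained check. The enums below are stand-ins that mirror D3D12_HEAP_TYPE, D3D12_CPU_PAGE_PROPERTY and D3D12_MEMORY_POOL so the snippet compiles without the Windows SDK; the function names are hypothetical. The assumption that a CUSTOM heap must set both properties to something other than UNKNOWN is my reading of the API, so check it against the documentation.

```cpp
#include <cassert>

// Stand-ins for the D3D12 enums (not the real headers).
enum class HeapType { Default, Upload, Readback, Custom };
enum class CpuPageProperty { Unknown, NotAvailable, WriteCombine, WriteBack };
enum class MemoryPool { Unknown, L0, L1 };

struct HeapProperties {
    HeapType type;
    CpuPageProperty cpu_page;
    MemoryPool pool;
};

// Non-CUSTOM heaps must leave both properties UNKNOWN (as described in the post);
// CUSTOM heaps are assumed to require both to be specified.
inline bool follows_heap_rule(const HeapProperties& p) {
    const bool both_unknown =
        p.cpu_page == CpuPageProperty::Unknown && p.pool == MemoryPool::Unknown;
    const bool both_specified =
        p.cpu_page != CpuPageProperty::Unknown && p.pool != MemoryPool::Unknown;
    return p.type == HeapType::Custom ? both_specified : both_unknown;
}

// True when the description asks for CPU-visible video memory, i.e. the rough
// analogue of Vulkan's DEVICE_LOCAL | HOST_VISIBLE memory type.
inline bool asks_for_cpu_visible_vram(const HeapProperties& p) {
    return p.pool == MemoryPool::L1 &&
           (p.cpu_page == CpuPageProperty::WriteCombine ||
            p.cpu_page == CpuPageProperty::WriteBack);
}
```

    An UPLOAD heap with the pool set to L1 fails `follows_heap_rule`, which is exactly the limitation described. A CUSTOM heap with WriteCombine and L1 passes this check, but whether a given driver actually accepts or honors it is not something this sketch can tell you.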

    I suspect a part of the reason why AMD are seeing performance gains with resizable BAR is because their drivers are placing the UPLOAD heaps in VRAM. I assume it would be helpful information for the driver if we could extend our UPLOAD heaps to have other explicitly defined heap properties such as specifying our memory pool to be D3D12_MEMORY_POOL_L1.
     
    BRiT, Malo and DegustatoR like this.
  5. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    522
    Likes Received:
    758
    I guess a few exe renaming tests might show whether it's a default driver behavior with resizable BAR or there are driver profiles per game (it doesn't look like there are any profiles since there are games which perform slower with rBAR on AMD).
    It would also be interesting to know whether this driver promotion stuff works for the default 256 MB rBAR size (it would require some video memory management for the limited space, but might still capture a good fraction of the benefit).
     
    Dictator likes this.
  6. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    306
    Likes Received:
    345
    Here's a deeper explanation provided by an AMD devtech engineer with respect to Vulkan. AMD also recommends that you shouldn't use the full 256 MB BAR, since "implicit resources" can take up that memory as well, so you'll want to leave 33% available for the driver.
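
    As a quick worked example of that 33% guideline, here is a tiny C++ helper. The function name is made up for illustration, and treating "256 MB" as 256 MiB is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Bytes the application should plan to use after leaving `reserve_percent`
// of the heap to the driver. (Hypothetical helper, integer arithmetic.)
std::uint64_t app_budget_bytes(std::uint64_t heap_bytes, unsigned reserve_percent) {
    assert(reserve_percent <= 100);
    return heap_bytes * (100 - reserve_percent) / 100;
}
```

    With a 256 MiB heap and a 33% reserve this works out to 179,851,755 bytes, or roughly 171 MiB for the application.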
     
    Krteq and BRiT like this.
  7. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    881
    Likes Received:
    1,022
    Location:
    55°38′33″ N, 37°28′37″ E
    AMD released beta Adrenalin driver 27.20.21001.7005 for Windows Insider Preview "Cobalt" on Windows Update. Here are the new features reported in build 21354, compared to the current Adrenalin 21.3.2:
    Code:
    Hardware-accelerated scheduler : Disabled, DXGK_FEATURE_SUPPORT_EXPERIMENTAL (1)
    HighestShaderModel : D3D12_SHADER_MODEL_6_6 (0x0066)
    ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE | DRIVER_MANAGED_CACHE | ??? (127) (0b0111'1111)
    AtomicInt64OnTypedResourceSupported : 1
    AtomicInt64OnGroupSharedSupported : 1
    DerivativesInMeshAndAmplificationShadersSupported : 1
    
    Note that shader models 6.6 and experimental 6.7 are supported; experimental support for hardware-accelerated graphics scheduler is available, but the feature is not enabled by recent Windows Insider builds.

    Shader Cache Support reports two new flags, 0x20 and 0x40. No idea what they mean, as Microsoft hasn't yet released the Windows SDK/WDK for Cobalt.
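
    For anyone who wants to check the bitmask by hand, here is a minimal C++ sketch that splits the reported value 127 into the five flags named in the dump and whatever is left over. The table uses the bit values 0x01 through 0x10 for those five flags; the helper names are hypothetical, and the two remaining bits are deliberately left unnamed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

struct FlagName {
    std::uint32_t bit;
    const char*   name;
};

// The five D3D12_SHADER_CACHE_SUPPORT_* flags named in the dump above.
const FlagName kKnownFlags[] = {
    {0x01, "SINGLE_PSO"},
    {0x02, "LIBRARY"},
    {0x04, "AUTOMATIC_INPROC_CACHE"},
    {0x08, "AUTOMATIC_DISK_CACHE"},
    {0x10, "DRIVER_MANAGED_CACHE"},
};

// Bits set in `value` that do not correspond to any known flag.
std::uint32_t unknown_bits(std::uint32_t value) {
    std::uint32_t known = 0;
    for (const FlagName& f : kKnownFlags) known |= f.bit;
    return value & ~known;
}

// Human-readable list of the flags set in `value`.
std::string describe(std::uint32_t value) {
    std::string out;
    for (const FlagName& f : kKnownFlags) {
        if (value & f.bit) {
            if (!out.empty()) out += " | ";
            out += f.name;
        }
    }
    const std::uint32_t rest = unknown_bits(value);
    if (rest != 0) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "unknown(0x%X)", static_cast<unsigned>(rest));
        if (!out.empty()) out += " | ";
        out += buf;
    }
    return out;
}
```

    `unknown_bits(127)` returns 0x60, i.e. exactly the 0x20 and 0x40 bits mentioned above.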
     
    #1127 DmitryKo, Apr 8, 2021
    Last edited: Apr 9, 2021
  8. Pinstripe

    Newcomer

    Joined:
    Feb 24, 2013
    Messages:
    142
    Likes Received:
    124
    Will this hardware-accelerated scheduler also help to alleviate Nvidia's driver overhead issue?
     
  9. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,206
    Likes Received:
    1,601
    Location:
    msk.ru/spb.ru
    It is enabled on Pascal+ in current Windows release since last year.
    I was actually wondering why nobody bothered to check how it is influencing CPU limitations on GFs.
    But from what I can tell the differences are fairly minor, and you'd generally be better off with it being off since it does introduce some compatibility issues.
     
  10. Pinstripe

    Newcomer

    Joined:
    Feb 24, 2013
    Messages:
    142
    Likes Received:
    124
    That's the same as HAGS? Okay.

    Disappointing it does so little in practice.
     
  11. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,206
    Likes Received:
    1,601
    Location:
    msk.ru/spb.ru
    I may be wrong here but I think that it's more about future h/w and API changes than for what's currently available.
     
  12. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    881
    Likes Received:
    1,022
    Location:
    55°38′33″ N, 37°28′37″ E
    Yes, that's exactly what they said in the DirectX Blog post on hardware accelerated GPU scheduling.
     
    DegustatoR likes this.
  13. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    306
    Likes Received:
    345
    I wonder if there are any plans from Microsoft to make PSO creation simpler or expose bindless buffers, because some Vulkan implementations are clearly ahead of D3D12 in these aspects ...

    Dynamic states: EXT_extended_dynamic_state, EXT_color_write_enable, EXT_vertex_input_dynamic_state, EXT_extended_dynamic_state2 (unreleased/dynamic blend states ?)

    Bindless: KHR_buffer_device_address & bindless texture handles (future)

    It would be a massive turnaround if Vulkan somehow ended up being the easier API to use compared to D3D12 on some driver implementations. Exposing more dynamic states would make pipeline state management simpler and would help reduce pipeline/shader compilation times: since the HW is capable of making these state changes very cheaply, drivers wouldn't need to constantly compile hundreds or even thousands of unique pipelines and would instead only need to compile several dozen. (fewer pipelines = lower compilation times)

    D3D12 also exposes GPU virtual addresses, but compared to Vulkan we can't do anything interesting with them, like building complex data structures such as linked lists. Bindless is mostly 'implicit' on D3D12, and the only way we can ever pass GPU VAs to the shaders is through root descriptors, where they're accessed from the very limited space of the root signature (64 DWORDs), whereas buffer references in Vulkan can be stored anywhere in memory.
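
    To put numbers on how tight the root signature is, here is a small C++ sketch of the 64 DWORD budget. It uses the documented costs of 1 DWORD per descriptor table, 1 DWORD per 32-bit root constant, and 2 DWORDs per root descriptor (a 64-bit GPU VA). The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRootSignatureLimitDwords = 64;

// Hypothetical summary of what a root signature contains.
struct RootSignatureLayout {
    std::size_t descriptor_tables;     // 1 DWORD each
    std::size_t root_constants_32bit;  // 1 DWORD each
    std::size_t root_descriptors;      // 2 DWORDs each (64-bit GPU virtual address)
};

std::size_t cost_in_dwords(const RootSignatureLayout& l) {
    return l.descriptor_tables + l.root_constants_32bit + 2 * l.root_descriptors;
}

bool fits(const RootSignatureLayout& l) {
    return cost_in_dwords(l) <= kRootSignatureLimitDwords;
}

// How many GPU VAs can still be passed as root descriptors once the given
// tables and constants have taken their share of the budget.
std::size_t max_root_descriptors(std::size_t descriptor_tables,
                                 std::size_t root_constants_32bit) {
    const std::size_t used = descriptor_tables + root_constants_32bit;
    if (used >= kRootSignatureLimitDwords) return 0;
    return (kRootSignatureLimitDwords - used) / 2;
}
```

    Even with nothing else in the root signature, that leaves room for at most 32 buffer addresses per draw or dispatch, which is the limitation being contrasted with buffer references that can live anywhere in memory.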
     
    Remij, PSman1700 and DavidGraham like this.
  14. Remij

    Newcomer

    Joined:
    May 3, 2008
    Messages:
    232
    Likes Received:
    385
    ^that would be sweet.

    I'm excited for Game Stack Live on the 20th+21st. We should get a good look into DirectStorage, Shader Model 6.6, the Velocity Architecture, and improvements to DX12U dev tools.
     