Direct3D feature levels discussion

I'm a bit confused about this. Has it actually been explicitly stated that Nvidia's new neural rendering technologies in their RTX kit are Blackwell only as a hard requirement? From what I can tell, the developer page has a single mention of Blackwell, and it reads more like a marketing tie-in.

A performance requirement is a different matter, since it would apply just as much to simply supporting cooperative vectors in DirectX on all architectures.
 
Has it actually been explicitly stated that Nvidia's new neural rendering technologies in their RTX kit are Blackwell only as a hard requirement?
No. If anything we have the opposite statements thus far. But we don't know what impact the lack of Blackwell's h/w optimization will have on performance of any of these new features.
 
Turing is going to age like fine wine again if Neural Texture Decompression gets used, and RTX Geometry accelerating RT performance. Crazy architecture, just crazy.
Imagine buying a 2080 Ti in 2018, spending top dollar, and in 2025 you can still play everything. Its performance is on par with the upcoming 5060 (?), the VRAM amount is still decent, and it's compatible with DLSS 4 and FSR 3 frame generation. You could keep playing on it almost until 2027.
 
Turing is going to age like fine wine again if Neural Texture Decompression gets used, and RTX Geometry accelerating RT performance. Crazy architecture, just crazy.

More like crazy developer relations and marketing. In an alternate universe none of this stuff gets used if Nvidia doesn’t push it aggressively. The idea that Alan Wake 2 is already being updated with RTX geometry is just nuts.
 
Turing is going to age like fine wine again if Neural Texture Decompression gets used, and RTX Geometry accelerating RT performance. Crazy architecture, just crazy.
The 1080 Ti gets all the praise, but the 2080 Super and 2080 Ti will last even longer. The 1080 Ti became obsolescent circa 2023, lasting six years. The 2080 Super and 2080 Ti will last until the PS6 launches. That's a whole decade. I expect the 4090 and 4080 Super to last until the PS7 too.
 
Turing is going to age like fine wine again if Neural Texture Decompression gets used, and RTX Geometry accelerating RT performance. Crazy architecture, just crazy.
When you compare Turing vs RDNA1 (2070 Super vs 5700 XT), you can find this:
  • In Alan Wake 2, the 2070 Super is 60% faster because of its mesh shader support.
  • In Avatar and Star Wars Outlaws, the 2070 Super is 20% faster because of ray tracing acceleration (hardware on 2070 Super vs software on 5700 XT).
  • In Indiana Jones and Metro Exodus EE, you can't even run those on the 5700 XT.
The 1080 Ti gets all the praise, but the 2080 Super and 2080 Ti will last even longer. The 1080 Ti became obsolescent circa 2023, lasting six years. The 2080 Super and 2080 Ti will last until the PS6 launches. That's a whole decade. I expect the 4090 and 4080 Super to last until the PS7 too.
Similarly, the 2080 Super is 38% faster than the 1080 Ti in Avatar and Outlaws, and the 2080 Ti is 60% faster.
In Alan Wake 2, the 2070 Super (not the 2080 Super) is 40% faster than the 1080 Ti. The 2080 Super and 2080 Ti are probably 60% and 80% faster respectively, and you still can't run Indiana Jones and Metro Exodus EE on the 1080 Ti.
 
RTX Mega Geometry support for Vulkan has dropped today. It works through the proprietary VK_NV extensions. The GitHub links include samples for animated clusters, tessellation clusters, LOD clusters, and partitioned TLAS.

 
Blackwell shows the same feature support matrix as Lovelace.
Just one difference in either normal or exp mode:

MaxGPUVirtualAddressBitsPerProcess : 44 (was 40)
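
For scale, here is what going from 40 to 44 bits means in bytes. This is only a quick sketch in Python, assuming the reported value is a plain count of address bits, so the addressable range is 2^bits bytes. The function name is made up for illustration.

```python
def va_space_tib(bits: int) -> float:
    """Size of a virtual address space with the given number of bits, in TiB."""
    return 2 ** bits / 2 ** 40  # 1 TiB = 2^40 bytes


old_limit = va_space_tib(40)  # the "was 40" per-process value
new_limit = va_space_tib(44)  # per-process value reported on the RTX 5080

print(f"40 bits: {old_limit:g} TiB, 44 bits: {new_limit:g} TiB, "
      f"ratio {new_limit / old_limit:g}x")
# prints: 40 bits: 1 TiB, 44 bits: 16 TiB, ratio 16x
```

So the per-process range grows by 16x, while MaxGPUVirtualAddressBitsPerResource stays at 40 bits (1 TiB) in the dumps that follow.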

Normal results:

Code:
Direct3D 12 feature checker (March 2024) by DmitryKo (x64) (Agility SDK v613)
https://forum.beyond3d.com/posts/1840641/

Windows 10X version 24H2 (build 26100.2894 ge_release) x64

ADAPTER 0
"NVIDIA GeForce RTX 5080"
VEN_10DE, DEV_2C02, SUBSYS_F3201569, REV_A1
Dedicated video memory : 15889.0 MB (16660824064 bytes)
Total video memory : 48533.8 MB (50891380736 bytes)
BIOS string : Version98.3.3b.0.50
Video driver version : 32.0.15.7202
WDDM version : KMT_DRIVERVERSION_WDDM_3_2 (3200)
Virtual memory model : GPUMMU
Hardware-accelerated scheduler : Enabled, DXGK_FEATURE_SUPPORT_STABLE (2)
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PIXEL_BOUNDARY (3)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY (1)
Maximum feature level : D3D_FEATURE_LEVEL_12_2 (0xc200)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 0
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 40
MaxGPUVirtualAddressBitsPerProcess : 44
Adapter Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_8 (0x0068)
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 32
TotalLaneCount : 10752
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_2 (3)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | DRIVER_MANAGED_CACHE | SHADER_CONTROL_CLEAR | SHADER_SESSION_DELETE (115) (0b0111'0011)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY | VIDEO_DECODE | VIDEO_PROCESS | VIDEO_ENCODE (127) (0b0111'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_3 (3)
BarycentricsSupported : 1
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_1_1 (11)
AdditionalShadingRatesSupported : 1
PerPrimitiveShadingRateSupportedWithViewportIndexing : 1
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 16
BackgroundProcessingSupported : 1
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_0_9 (90)
UnalignedBlockTexturesSupported : 1
MeshShaderPipelineStatsSupported : 1
MeshShaderSupportsFullRangeRenderTargetArrayIndex : 1
AtomicInt64OnTypedResourceSupported : 1
AtomicInt64OnGroupSharedSupported : 1
DerivativesInMeshAndAmplificationShadersSupported : 0
WaveMMATier : D3D12_WAVE_MMA_TIER_NOT_SUPPORTED (0)
VariableRateShadingSumCombinerSupported : 1
MeshShaderPerPrimitiveShadingRateSupported : 1
AtomicInt64OnDescriptorHeapResourceSupported : 1
DisplayableTexture : 0
DisplayableTexture.SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_0 (0)
MSPrimitivesPipelineStatisticIncludesCulledPrimitives : 1
EnhancedBarriersSupported : 1
RelaxedFormatCastingSupported : 1
UnrestrictedBufferTextureCopyPitchSupported : 1
UnrestrictedVertexElementAlignmentSupported : 1
InvertedViewportHeightFlipsYSupported : 1
InvertedViewportDepthFlipsZSupported : 1
TextureCopyBetweenDimensionsSupported : 1
AlphaBlendFactorSupported : 1
AdvancedTextureOpsSupported : 1
WriteableMSAATexturesSupported : 1
IndependentFrontAndBackStencilRefMaskSupported : 1
TriangleFanSupported : 1
DynamicIndexBufferStripCutSupported : 1
DynamicDepthBiasSupported : 1
GPUUploadHeapSupported : 1
NonNormalizedCoordinateSamplersSupported : 1
ManualWriteTrackingResourceSupported : 0
RenderPassesValid : 1
MismatchingOutputDimensionsSupported : 1
SupportedSampleCountsWithNoOutputs : 31
PointSamplingAddressesNeverRoundUp : 1
RasterizerDesc2Supported : 1
NarrowQuadrilateralLinesSupported : 1
AnisoFilterWithPointMipSupported : 1
MaxSamplerDescriptorHeapSize : 4080
MaxSamplerDescriptorHeapSizeWithStaticSamplers : 2048
MaxViewDescriptorHeapSize : 1000000
ComputeOnlyCustomHeapSupported : 0
ComputeOnlyWriteWatchSupported : 1
RecreateAtTier : D3D12_RECREATE_AT_TIER_NOT_SUPPORTED (0)
WorkGraphsTier : D3D12_WORK_GRAPHS_TIER_1_0 (10)
ExecuteIndirectTier : D3D12_EXECUTE_INDIRECT_TIER_1_1 (11)
SampleCmpGradientAndBiasSupported : 1
ExtendedCommandInfoSupported : 1
Predication.Supported : 1
HardwareCopy.Supported : 1
Metacommands enumerated : 15
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], CopyTensor [3][1][31], MVN (Mean Variance Normalization) [67][1][6], GEMM (General matrix multiply) [67][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [91][5][6], MVN (Mean Variance Normalization) [91][5][6], Pooling [56][3][4], Direct Storage [4][0][11], GEMM (General matrix multiply) [91][5][6], MHA (Multi-Head Attention) [299][13][16], MHA (Multi-Head Attention) [321][14][17], MVN (Mean Variance Normalization) [92][5][6], QuantizedGEMM (Quantized General matrix multiply) [431][21][22], DSR_SUPERRES_METACOMMAND [11][5][49]

Experimental results:

Code:
Direct3D 12 feature checker (March 2024) by DmitryKo (x64) (Agility SDK v613)
https://forum.beyond3d.com/posts/1840641/

Windows 10X version 24H2 (build 26100.2894 ge_release) x64
Checking for experimental features SM6 TR4 WAVEMMA

ADAPTER 0
"NVIDIA GeForce RTX 5080"
VEN_10DE, DEV_2C02, SUBSYS_F3201569, REV_A1
Dedicated video memory : 15889.0 MB (16660824064 bytes)
Total video memory : 48533.8 MB (50891380736 bytes)
BIOS string : Version98.3.3b.0.50
Video driver version : 32.0.15.7202
WDDM version : KMT_DRIVERVERSION_WDDM_3_2 (3200)
Virtual memory model : GPUMMU
Hardware-accelerated scheduler : Enabled, DXGK_FEATURE_SUPPORT_STABLE (2)
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PIXEL_BOUNDARY (3)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY (1)
Maximum feature level : D3D_FEATURE_LEVEL_12_2 (0xc200)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_4 (4)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 0
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 40
MaxGPUVirtualAddressBitsPerProcess : 44
Adapter Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_9 (0x0069)
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 32
TotalLaneCount : 10752
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_2 (3)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | DRIVER_MANAGED_CACHE | SHADER_CONTROL_CLEAR | SHADER_SESSION_DELETE (115) (0b0111'0011)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY | VIDEO_DECODE | VIDEO_PROCESS | VIDEO_ENCODE (127) (0b0111'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_3 (3)
BarycentricsSupported : 1
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_1_1 (11)
AdditionalShadingRatesSupported : 1
PerPrimitiveShadingRateSupportedWithViewportIndexing : 1
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 16
BackgroundProcessingSupported : 1
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_0_9 (90)
UnalignedBlockTexturesSupported : 1
MeshShaderPipelineStatsSupported : 1
MeshShaderSupportsFullRangeRenderTargetArrayIndex : 1
AtomicInt64OnTypedResourceSupported : 1
AtomicInt64OnGroupSharedSupported : 1
DerivativesInMeshAndAmplificationShadersSupported : 0
WaveMMATier : D3D12_WAVE_MMA_TIER_1_0 (10)
VariableRateShadingSumCombinerSupported : 1
MeshShaderPerPrimitiveShadingRateSupported : 1
AtomicInt64OnDescriptorHeapResourceSupported : 1
DisplayableTexture : 0
DisplayableTexture.SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_0 (0)
MSPrimitivesPipelineStatisticIncludesCulledPrimitives : 1
EnhancedBarriersSupported : 1
RelaxedFormatCastingSupported : 1
UnrestrictedBufferTextureCopyPitchSupported : 1
UnrestrictedVertexElementAlignmentSupported : 1
InvertedViewportHeightFlipsYSupported : 1
InvertedViewportDepthFlipsZSupported : 1
TextureCopyBetweenDimensionsSupported : 1
AlphaBlendFactorSupported : 1
AdvancedTextureOpsSupported : 1
WriteableMSAATexturesSupported : 1
IndependentFrontAndBackStencilRefMaskSupported : 1
TriangleFanSupported : 1
DynamicIndexBufferStripCutSupported : 1
DynamicDepthBiasSupported : 1
GPUUploadHeapSupported : 1
NonNormalizedCoordinateSamplersSupported : 1
ManualWriteTrackingResourceSupported : 0
RenderPassesValid : 1
MismatchingOutputDimensionsSupported : 1
SupportedSampleCountsWithNoOutputs : 31
PointSamplingAddressesNeverRoundUp : 1
RasterizerDesc2Supported : 1
NarrowQuadrilateralLinesSupported : 1
AnisoFilterWithPointMipSupported : 1
MaxSamplerDescriptorHeapSize : 4080
MaxSamplerDescriptorHeapSizeWithStaticSamplers : 2048
MaxViewDescriptorHeapSize : 1000000
ComputeOnlyCustomHeapSupported : 0
ComputeOnlyWriteWatchSupported : 1
RecreateAtTier : D3D12_RECREATE_AT_TIER_NOT_SUPPORTED (0)
WorkGraphsTier : D3D12_WORK_GRAPHS_TIER_1_0 (10)
ExecuteIndirectTier : D3D12_EXECUTE_INDIRECT_TIER_1_1 (11)
SampleCmpGradientAndBiasSupported : 1
ExtendedCommandInfoSupported : 1
Predication.Supported : 1
HardwareCopy.Supported : 1
Metacommands enumerated : 15
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], CopyTensor [3][1][31], MVN (Mean Variance Normalization) [67][1][6], GEMM (General matrix multiply) [67][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [91][5][6], MVN (Mean Variance Normalization) [91][5][6], Pooling [56][3][4], Direct Storage [4][0][11], GEMM (General matrix multiply) [91][5][6], MHA (Multi-Head Attention) [299][13][16], MHA (Multi-Head Attention) [321][14][17], MVN (Mean Variance Normalization) [92][5][6], QuantizedGEMM (Quantized General matrix multiply) [431][21][22], DSR_SUPERRES_METACOMMAND [11][5][49]
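
The two dumps are long, so it is easier to diff them than to read them side by side. A rough Python sketch, assuming the "Name : Value" layout used above (lines without " : ", such as the Adapter Node and Metacommands lines, are skipped). The function names are hypothetical and the sample text is a short excerpt of the dumps, not the full output.

```python
def parse_report(text: str) -> dict[str, str]:
    """Collect 'Name : Value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diff_reports(a: str, b: str) -> dict[str, tuple]:
    """Fields whose values differ between two reports, as (a, b) pairs."""
    fa, fb = parse_report(a), parse_report(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(fa.keys() | fb.keys())
        if fa.get(key) != fb.get(key)
    }


normal = """\
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
MaxGPUVirtualAddressBitsPerProcess : 44
HighestShaderModel : D3D12_SHADER_MODEL_6_8 (0x0068)
WaveMMATier : D3D12_WAVE_MMA_TIER_NOT_SUPPORTED (0)
"""

experimental = """\
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_4 (4)
MaxGPUVirtualAddressBitsPerProcess : 44
HighestShaderModel : D3D12_SHADER_MODEL_6_9 (0x0069)
WaveMMATier : D3D12_WAVE_MMA_TIER_1_0 (10)
"""

for name, (before, after) in diff_reports(normal, experimental).items():
    print(f"{name}: {before} -> {after}")
```

On this excerpt it reports HighestShaderModel, TiledResourcesTier and WaveMMATier, which are the fields that, as far as I can tell, change between the normal and experimental runs.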
 

MS has released DX12 Agility 1.615 and introduced a preview of Agility 1.716.0 with a bunch of new features:

  • Application Specific Driver State: helps with issues caused by “app detect” behavior.
  • RecreateAt GPUVA: a useful feature for tooling developers that simplifies capturing and replaying GPU workloads.
  • Runtime Bypass: From the beginning, D3D12 has been about maximizing gaming performance by reducing overhead introduced by graphics drivers and the D3D runtime. The Runtime Bypass feature removes the overhead of the runtime entirely by enabling the application to call directly into the driver for many APIs. This feature is managed by the runtime and graphics driver, which allows it to be enabled by default for all D3D12 applications, i.e. any existing or new D3D12 app will see improved performance without any code changes.
  • Shader hash bypass
  • Tight Alignment of Resources: simplifies alignment restrictions across the ecosystem.
  • Multiple video features to provide more control to apps using the D3D12 Video Encode API to reduce latency and improve quality:
    • Encode subregion notifications
    • Encode output stats
    • Encode GPU texture input map
    • Encode GPU texture/CPU buffer dirty maps/rects
    • Encode GPU texture/CPU buffer motion vector hints
  • Please note that in this preview, mesh nodes is disabled, but it will return in a future preview and/or retail release.
 
  • Runtime Bypass From the beginning D3D12 has been about maximizing gaming performance by reducing overhead introduced by graphics drivers and D3D runtime. The Runtime Bypass feature removes the overhead of the runtime entirely by enabling the application to call directly into the driver for many APIs. This feature is managed by the runtime and graphics driver which allows it to be enabled by default for all D3D12 applications i.e. any existing or new D3D12 app will see improved performance without any code changes.

    Are we consoles now? This sounds interesting.
 