I haven't seen too many responses that are critical of the programming paradigm, but mostly just about the lack of debugging tools.
I assume this is more of a programming paradigm than something hardware related, so I don't think PS has anything to do with it.
I doubt it, unless AMD exposed something similar on PlayStation. The reaction from game devs makes it appear as if it's something new.
The paradigm itself isn't new and has been on the roadmaps (so to speak) for quite some time now.
I'd say that it's less interesting for UMA h/w like consoles but it may end up freeing up some CPU cycles there as well.
It's possible that the PS API already makes it available, or that it could be added, or that it's not a good fit; who knows. This is a DX thing though.
Sounds like it would help getting better utilisation out of the XSX gpu.
Timothy is complaining about the fact that the solution isn't more explicit and doesn't pivot hard enough to AMD HW despite the reception by his former colleagues over there ...
D3D12 Work Graphs are a little bit more powerful than CUDA device graphs. CUDA graphs have restrictions, such as memcpy nodes that can't be used with CUDA arrays, which isn't ideal for Nanite-style producer-consumer queue work compaction ...
Launching compute kernels from device has been available in CUDA since Kepler.
CUDA Graphs have also been available for a while. Not sure how it compares to the DX Compute Graphs, but the concept seems to be the same.
Differences might lie in the interaction with graphics, but this has not been explored in the DX api yet.
What happened to the DirectX installers that came with every game? Why can't they fulfill this duty?
One more piece of feedback from a developer:
"This is great, but adopting new Agility SDK features is difficult when the DirectX Shader Compiler binaries have multiple conflicting linkage requirements, meaning you can't migrate an app from the Windows SDK version of DXC to the more recent GitHub release..."
Do you mean producer-consumer queues? They have been supported in CUDA since A100, so you can allocate and pin a part of the L2 cache for these queues for the bandwidth savings.
D3D12 Work Graphs shine over CUDA graphs in allowing implementations to efficiently pass registers from the producer to the consumer, which translates to significant memory bandwidth savings ...
In Work Graphs you have a black-box data passing mechanism between nodes. You don't need to (and should not) do it manually via something like UAV constructs. A hardware implementation can pass the data in any way it wants: registers, extra LDS, caches.
Hardware isn't self-serving. It appreciably simplifies development of some types of pipelines for ISVs. Binning, reduction and occupancy related constructs benefit especially (runtime performance and development performance).
Or it solves a problem which doesn't exist on their h/w.
Registers are private per thread, and LDS is too small, leaving caches as the only feasible solution for the task. However, the lack of restrictions and explicitness raises questions about its speed in practice. It remains to be seen whether this approach will, for example, save bandwidth.
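To make the "black-box data passing" point above a bit more concrete, here is a minimal CPU-side analogy in C++. It is not D3D12 or HLSL code, and every name in it (`Record`, `RecordQueue`, `producerNode`, `consumerNode`, `runGraph`) is hypothetical. The only thing it tries to show is the shape of the contract: node code emits and receives records, while the storage behind the channel is left to the implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical record type passed from a producer node to a consumer node.
struct Record {
    std::uint32_t tileId;
    std::uint32_t triangleCount;
};

// Stand-in for the runtime-owned channel between two nodes. Node code sees
// only emit()/tryReceive(); where the records live is an implementation detail.
class RecordQueue {
public:
    void emit(const Record& r) { storage_.push_back(r); }
    bool tryReceive(Record& out) {
        if (storage_.empty()) return false;
        out = storage_.front();
        storage_.pop_front();
        return true;
    }
    bool empty() const { return storage_.empty(); }

private:
    std::deque<Record> storage_;  // a GPU could use registers, LDS or caches here
};

// Producer node: bins work and emits one record per non-empty tile.
inline void producerNode(const std::vector<std::uint32_t>& trianglesPerTile,
                         RecordQueue& out) {
    for (std::uint32_t tile = 0; tile < trianglesPerTile.size(); ++tile) {
        if (trianglesPerTile[tile] > 0) {
            out.emit(Record{tile, trianglesPerTile[tile]});
        }
    }
}

// Consumer node: drains whatever the producer emitted and reduces it.
inline std::uint64_t consumerNode(RecordQueue& in) {
    std::uint64_t total = 0;
    Record r{};
    while (in.tryReceive(r)) {
        total += r.triangleCount;
    }
    return total;
}

// Runs the two-node "graph" once and returns the reduced result.
inline std::uint64_t runGraph(const std::vector<std::uint32_t>& trianglesPerTile) {
    RecordQueue queue;
    producerNode(trianglesPerTile, queue);
    const std::uint64_t total = consumerNode(queue);
    assert(queue.empty());  // the consumer must have drained every record
    return total;
}
```

Swapping the `std::deque` for any other container would not change `producerNode` or `consumerNode`, which is the property the posts above are arguing about; whether real hardware turns that freedom into bandwidth savings is exactly the open question raised in the thread.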
Seems like @DmitryKo's feature checker is refusing to work with the latest Agility SDK lib (it's working fine with 1.710 ...
This is by design: the developer has to use external symbols to specify the exact version of the Agility SDK to be loaded at runtime. Since the latest version of the tool was built with D3D12SDKVersion=710 embedded in the executable file, the OS will only load a matching D3D12Core.dll version 1.710, and throw an error if you replace it with any other version (BTW they used to allow higher versions of the DLL, but this was changed with the transition to SDK 6xx/7xx). Replacing D3D12Core.dll with a new version will not gain you anything, because the source code needs to be updated to use the new structures defined in the latest Agility SDK header files. Even though I made some changes to support Agility SDK 711, WaveMMA reporting only works on Radeon RX 7000 (RDNA3) and my current card is RX 5700 XT, so I cannot test it unless AMD implements it on RDNA1 cards (or I get a very good deal on a Radeon RX 7600).
BTW the beta AMD Adrenalin driver 23.10.01.14 (build 31.0.21001.14018) now supports a few new features in the Agility SDK 1.710 and 1.706/1.606 even on the Radeon RX 5700 XT.
This is how the new features are reported by the current redistributable WARP library 1.0.5 (DLL version 10.0.25321.1003).
Code:
NonNormalizedCoordinateSamplersSupported : 1
ManualWriteTrackingResourceSupported : 0
RenderPassesValid : 1
MismatchingOutputDimensionsSupported : 1
SupportedSampleCountsWithNoOutputs : 31
PointSamplingAddressesNeverRoundUp : 1
RasterizerDesc2Supported : 1
NarrowQuadrilateralLinesSupported : 1
AnisoFilterWithPointMipSupported : 1
MaxSamplerDescriptorHeapSize : 2097152
MaxSamplerDescriptorHeapSizeWithStaticSamplers : 2097152
MaxViewDescriptorHeapSize : 2097152
ComputeOnlyCustomHeapSupported : 0
And this is a report by the Adrenalin driver 23.3.1 (build 31.0.14037.1007) on my Radeon 5700 XT (AMD did not release an Agility SDK 1.710.0-specific driver yet):
EnhancedBarriersSupported : 1
RelaxedFormatCastingSupported : 1
DynamicIndexBufferStripCutSupported : 1
DynamicDepthBiasSupported : 1
GPUUploadHeapSupported : 1
NonNormalizedCoordinateSamplersSupported : 1
MismatchingOutputDimensionsSupported : 1
SupportedSampleCountsWithNoOutputs : 29
PointSamplingAddressesNeverRoundUp : 1
RasterizerDesc2Supported : 1
NarrowQuadrilateralLinesSupported : 1
AnisoFilterWithPointMipSupported : 1
MaxSamplerDescriptorHeapSize : 67108864
MaxSamplerDescriptorHeapSizeWithStaticSamplers : 67108864
MaxViewDescriptorHeapSize : 33554432
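For reference, the version pinning discussed above is done with two exported constants, `D3D12SDKVersion` and `D3D12SDKPath`, which are looked up in the application executable. A minimal sketch follows. The value 710 matches the build discussed in this thread, the `.\D3D12\` folder is only an example location for D3D12Core.dll, and the `AGILITY_EXPORT` macro with its non-Windows branch is an assumption added so the snippet compiles anywhere. `agilityVersionMatches` and `selfCheck` are hypothetical helpers that mirror the exact-match rule as described in the post above; they are not the OS loader's actual logic.

```cpp
#include <cassert>

#if defined(_WIN32)
#define AGILITY_EXPORT __declspec(dllexport)
#else
// Assumption: only here so the sketch also builds with GCC/Clang off Windows.
#define AGILITY_EXPORT __attribute__((visibility("default")))
#endif

extern "C" {
// Agility SDK version the executable was built against (1.710 -> 710).
AGILITY_EXPORT extern const unsigned int D3D12SDKVersion = 710;
// Folder, relative to the executable, that holds D3D12Core.dll (example path).
AGILITY_EXPORT const char* D3D12SDKPath = ".\\D3D12\\";
}

// Hypothetical helper mirroring the rule described in the thread: a
// D3D12Core.dll is accepted only if its SDK version equals the embedded one.
inline bool agilityVersionMatches(unsigned int dllSdkVersion) {
    return dllSdkVersion == D3D12SDKVersion;
}

// Quick self-check using the two SDK versions named in the thread.
inline void selfCheck() {
    assert(agilityVersionMatches(710));   // matching DLL: would be loaded
    assert(!agilityVersionMatches(711));  // newer DLL dropped in: rejected
}
```

This is why dropping a newer DLL next to an older build does not help: the number is baked into the executable, so picking up a newer SDK means rebuilding the tool against the newer headers.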
Has anyone run @DmitryKo's utility on an RDNA3 card btw?
It is likely a copy of RDNA2 feature wise but just to be sure.
I've just got myself a Radeon RX 7600 (RDNA3) card, and there are no major new feature options compared to RDNA2, except for experimental WaveMMA and D3D12_WORK_GRAPHS_TIER_0_1, if you look at the most recent report for RX 6800 posted by CarstenS back in November 2020; his comparison with the Nvidia RTX series remains valid as well.
Maximum feature level : D3D_FEATURE_LEVEL_12_2 (0xc200)
BarycentricsSupported : 1
RaytracingTier : D3D12_RAYTRACING_TIER_1_1 (11)
PerPrimitiveShadingRateSupportedWithViewportIndexing : 1
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 8
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_1_0 (100)
MeshShaderPipelineStatsSupported : 1
WaveMMATier : D3D12_WAVE_MMA_TIER_1_0 (10)
VariableRateShadingSumCombinerSupported : 1
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY (1)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY (0)
PSSpecifiedStencilRefSupported : 1
MaxGPUVirtualAddressBitsPerResource : 44
MaxGPUVirtualAddressBitsPerProcess : 44
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_1 (1)
AdditionalShadingRatesSupported : 0
ShadingRateImageTileSize : 8
BackgroundProcessingSupported : 0
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_1_0 (100)
WaveMMATier : D3D12_WAVE_MMA_TIER_1_0 (10)
MeshShaderPerPrimitiveShadingRateSupported : 0
MSPrimitivesPipelineStatisticIncludesCulledPrimitives : 0
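Comparing two of these reports by eye gets tedious, so here is a small, hypothetical C++ sketch that diffs them. It assumes only the `Name : value` line layout visible in the listings in this thread; `parseReport` and `differingFields` are illustrative names and are not part of the feature checker. Lines without a ` : ` separator are skipped, and if a name appears twice the later value wins.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Report = std::map<std::string, std::string>;

// Parses "Name : value" lines; lines without " : " are ignored.
inline Report parseReport(const std::string& text) {
    Report report;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t sep = line.find(" : ");
        if (sep == std::string::npos) continue;
        report[line.substr(0, sep)] = line.substr(sep + 3);
    }
    return report;
}

// Returns the names (in alphabetical order) of fields that are present in
// both reports but carry different values.
inline std::vector<std::string> differingFields(const Report& a, const Report& b) {
    std::vector<std::string> names;
    for (const auto& entry : a) {
        const auto it = b.find(entry.first);
        if (it != b.end() && it->second != entry.second) {
            names.push_back(entry.first);
        }
    }
    return names;
}

// Usage with a few lines copied from the RX 7600 and Arc A770 reports.
inline std::vector<std::string> exampleDiff() {
    const Report rx7600 = parseReport(
        "BarycentricsSupported : 1\n"
        "WaveMMATier : D3D12_WAVE_MMA_TIER_1_0 (10)\n"
        "ShadingRateImageTileSize : 8\n");
    const Report arcA770 = parseReport(
        "BarycentricsSupported : 0\n"
        "WaveMMATier : D3D12_WAVE_MMA_TIER_NOT_SUPPORTED (0)\n"
        "ShadingRateImageTileSize : 8\n");
    const std::vector<std::string> diff = differingFields(rx7600, arcA770);
    assert(diff.size() == 2);  // ShadingRateImageTileSize is identical
    return diff;
}
```

Fields that exist in only one report are deliberately left out, since a missing line can simply mean the tool version or SDK differs rather than that the hardware lacks the feature.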
Direct3D 12 feature checker (July 2023) by DmitryKo (x64) (Agility SDK v711)
Windows 10X version 22H2 (build 22621.1928 ni_release) x64
ADAPTER 0
"Intel(R) Arc(TM) A770 Graphics"
VEN_8086, DEV_56A0, SUBSYS_10208086, REV_08
Dedicated video memory : 16256.0 MB (17045651456 bytes)
Total video memory : 24412.4 MB (25598205952 bytes)
BIOS string : Intel Video BIOS
Video driver version : 31.0.101.4314
WDDM version : KMT_DRIVERVERSION_WDDM_3_1 (3100)
Virtual memory model : GPUMMU
Hardware-accelerated scheduler : Disabled, DXGK_FEATURE_SUPPORT_ALWAYS_OFF (0)
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_TRIANGLE_BOUNDARY (2)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_THREAD_GROUP_BOUNDARY (2)
Maximum feature level : D3D_FEATURE_LEVEL_12_2 (0xc200)
DoublePrecisionFloatShaderOps : 0
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 1
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_1 (1)
MaxGPUVirtualAddressBitsPerResource : 44
MaxGPUVirtualAddressBitsPerProcess : 48
Adapter Node 0: TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_7 (0x0067)
WaveOps : 1
WaveLaneCountMin : 8
WaveLaneCountMax : 32
TotalLaneCount : 16384
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_2 (3)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_1 (1)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE | SHADER_CONTROL_CLEAR | SHADER_SESSION_DELETE (111) (0b0110'1111)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY (15) (0b0000'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_2 (2)
BarycentricsSupported : 0
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_1_1 (11)
AdditionalShadingRatesSupported : 1
PerPrimitiveShadingRateSupportedWithViewportIndexing : 1
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 8
BackgroundProcessingSupported : 1
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_0_9 (90)
UnalignedBlockTexturesSupported : 1
MeshShaderPipelineStatsSupported : 1
MeshShaderSupportsFullRangeRenderTargetArrayIndex : 1
AtomicInt64OnTypedResourceSupported : 0
AtomicInt64OnGroupSharedSupported : 0
DerivativesInMeshAndAmplificationShadersSupported : 0
WaveMMATier : D3D12_WAVE_MMA_TIER_NOT_SUPPORTED (0)
VariableRateShadingSumCombinerSupported : 1
MeshShaderPerPrimitiveShadingRateSupported : 1
AtomicInt64OnDescriptorHeapResourceSupported : 1
DisplayableTexture : 0
DisplayableTexture.SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_0 (0)
MSPrimitivesPipelineStatisticIncludesCulledPrimitives : 1
EnhancedBarriersSupported : 1
RelaxedFormatCastingSupported : 1
UnrestrictedBufferTextureCopyPitchSupported : 1
UnrestrictedVertexElementAlignmentSupported : 1
InvertedViewportHeightFlipsYSupported : 1
InvertedViewportDepthFlipsZSupported : 1
TextureCopyBetweenDimensionsSupported : 1
AlphaBlendFactorSupported : 1
AdvancedTextureOpsSupported : 0
WriteableMSAATexturesSupported : 0
IndependentFrontAndBackStencilRefMaskSupported : 1
TriangleFanSupported : 1
DynamicIndexBufferStripCutSupported : 1
DynamicDepthBiasSupported : 1
GPUUploadHeapSupported : 1
NonNormalizedCoordinateSamplersSupported : 1
ManualWriteTrackingResourceSupported : 0
RenderPassesValid : 1
MismatchingOutputDimensionsSupported : 0
SupportedSampleCountsWithNoOutputs : 1
PointSamplingAddressesNeverRoundUp : 0
RasterizerDesc2Supported : 1
NarrowQuadrilateralLinesSupported : 0
AnisoFilterWithPointMipSupported : 0
MaxSamplerDescriptorHeapSize : 2048
MaxSamplerDescriptorHeapSizeWithStaticSamplers : 2048
MaxViewDescriptorHeapSize : 1000000
ComputeOnlyCustomHeapSupported : 0
ComputeOnlyWriteWatchSupported : 1
Experimental.WorkGraphsTier : D3D12_WORK_GRAPHS_TIER_NOT_SUPPORTED (0)
Metacommands enumerated : 11
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], GEMM (General matrix multiply) [67][1][6], Pooling [44][1][4], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [91][5][6], MVN (Mean Variance Normalization) [91][5][6], Pooling [56][3][4], LSTM (Long Short-Term Memory) [252][10][13], DStorageCustom Metacommand [4][0][11], [1][0][9], [4][0][11]
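The `SupportedSampleCountsWithNoOutputs` values in the reports above (31 for WARP, 29 for the RX 5700 XT, 1 for the Arc A770) are easier to compare once decoded. The sketch below assumes the field is a bitmask in which the value of each set bit is a supported sample count (1, 2, 4, 8, 16); treat that reading as an assumption rather than a statement of the API contract. `decodeSampleCounts` and `checkThreadValues` are hypothetical helpers, not part of the feature checker.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits a sample-count bitmask into the individual counts it contains,
// in ascending order. Assumes each set bit's value is itself a sample count.
inline std::vector<std::uint32_t> decodeSampleCounts(std::uint32_t mask) {
    std::vector<std::uint32_t> counts;
    // bit wraps to 0 after the top bit, which also terminates the loop.
    for (std::uint32_t bit = 1; bit != 0 && bit <= mask; bit <<= 1) {
        if (mask & bit) counts.push_back(bit);
    }
    return counts;
}

// The three values that appear in the reports in this thread.
inline void checkThreadValues() {
    assert((decodeSampleCounts(31) == std::vector<std::uint32_t>{1, 2, 4, 8, 16}));  // WARP
    assert((decodeSampleCounts(29) == std::vector<std::uint32_t>{1, 4, 8, 16}));     // RX 5700 XT
    assert((decodeSampleCounts(1) == std::vector<std::uint32_t>{1}));                // Arc A770
}
```

Under that assumption, the only difference between WARP and the RX 5700 XT is the missing 2x entry, while the Arc A770 driver reports just the single-sample case.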