Direct3D feature levels discussion

Indeed, I believe it's the tiling modes available that's the sticking point, but I can't really go into any more detail.

DX11.2 won't allow 3D Tiled Resources at all. You could attempt to emulate one by creating a 2D TextureArray Tiled Resource, effectively using the 'slices' dimension as a 'Z' equivalent. However, with this you lose the ability to filter trilinearly between slices. There is also the restriction that 2D Tiled Resources can't have both slices and mips; it's one or the other on 11.2.
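
To illustrate what losing hardware filtering between slices means in practice, here is a minimal C++ sketch of the arithmetic a shader would have to do by hand: pick the two neighbouring array slices and blend them. It assumes texel centres at (i + 0.5) / sliceCount and clamp addressing; the names `SliceBlend`, `computeSliceBlend` and `blendSlices` are made up for this example and are not part of any API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Which two array slices to sample and how to blend them, for a
// normalized depth coordinate w in [0, 1]. Hypothetical helper type.
struct SliceBlend {
    int   slice0;  // lower slice index
    int   slice1;  // upper slice index (clamped to the last slice)
    float weight;  // blend factor toward slice1
};

SliceBlend computeSliceBlend(float w, int sliceCount) {
    assert(sliceCount > 0);
    // Assumption: slice centres sit at (i + 0.5) / sliceCount.
    float z = w * static_cast<float>(sliceCount) - 0.5f;
    // Clamp addressing at both ends of the slice range.
    z = std::max(0.0f, std::min(z, static_cast<float>(sliceCount - 1)));
    int lower = static_cast<int>(std::floor(z));
    int upper = std::min(lower + 1, sliceCount - 1);
    return {lower, upper, z - static_cast<float>(lower)};
}

// Blend two samples (one bilinear fetch per slice) into the final value.
float blendSlices(float sample0, float sample1, float weight) {
    return sample0 + (sample1 - sample0) * weight;
}
```

So each filtered lookup costs two texture fetches plus a lerp, instead of the single fetch a real 3D texture would need.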
 
I believe that 3d tiled resources could be supported by all GCN hardware, if the API allowed tile sizes with z=1. This would have some legit use cases (sparse volumes NOT being one of them).

One additional issue with volume tiled resources in DX12 is that the DXT/BC formats don't have 3d block sizes (DXT block size is always 4x4x1). This means that a BC3/5/6H/7 tile (also 1 byte/pixel, but stored as 16-byte 4x4x1 blocks) cannot keep the aforementioned 64x32x32 shape: 64KB holds 4096 blocks, which gives a flattened footprint such as 64x64x16 texels. That is awkward (x=4*z). ASTC full profile has 3d block sizes, such as 3x3x3 (and float/HDR formats), making it a much better compressed format for 3d tiled resources. Unfortunately DX12 doesn't yet support ASTC. There was preliminary documentation released (regarding the ASTC LDR profile), but ASTC support was cut from the final version. I hope we get ASTC full profile support in DX 12.2, since a 3x3x3 float/HDR format would be perfect for sparse distance fields.
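
A quick way to sanity-check any proposed tile footprint is to count the bytes it occupies and compare against 64KB. The C++ sketch below does that for block-compressed formats. The helper `tileBytes` is hypothetical, and the 64x64x16 BC footprint and the 48x48x48 ASTC 3x3x3 footprint are assumptions (16-byte blocks arranged 16x16x16), not figures confirmed by anyone in this thread.

```cpp
#include <cassert>
#include <cstddef>

struct Extent3D { std::size_t x, y, z; };

// Bytes occupied by a tile of `texels` size when stored as compression
// blocks of `block` texels, each taking `bytesPerBlock` bytes.
// Uncompressed formats can be modelled as 1x1x1 blocks.
std::size_t tileBytes(Extent3D texels, Extent3D block, std::size_t bytesPerBlock) {
    assert(block.x > 0 && block.y > 0 && block.z > 0);
    // The footprint must be a whole number of blocks on every axis.
    assert(texels.x % block.x == 0);
    assert(texels.y % block.y == 0);
    assert(texels.z % block.z == 0);
    std::size_t blocks = (texels.x / block.x) * (texels.y / block.y) * (texels.z / block.z);
    return blocks * bytesPerBlock;
}
```

With this, `tileBytes({64, 32, 32}, {1, 1, 1}, 1)`, `tileBytes({64, 64, 16}, {4, 4, 1}, 16)` and `tileBytes({48, 48, 48}, {3, 3, 3}, 16)` all come out at 65536 bytes. Only the 3x3x3 block case yields a cubic footprint, which is the point about ASTC being a better fit for volumes.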
 
GCN hardware should support virtual memory for any resource (64KB pages). As far as I understand, DX 12.1 volume tiled resources means special support for a 3d swizzle layout (the tiles are cubes, not slices with z=1).
Correct, the main constraint of tiled resources (to make them portable and usable) is that it defines 2D/3D tile shapes for each 64KB chunk based on the bpp. "Standard swizzle" further defines the layout of the data inside that 64KB.
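
For reference, the per-bpp tile shapes can be written down as a small table and checked against the 64KB size. This is a C++ sketch only: the shapes below are my recollection of the published Direct3D tile-shape tables and should be verified against the current SDK documentation, and the names `TileShape`, `shapeBytes` and `findShape` are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// One tile shape: texel format width in bits, and tile extent in texels.
struct TileShape { int bitsPerTexel; int w, h, d; };

constexpr std::size_t kTileBytes = 64 * 1024;

// Assumed 2D tile shapes (depth 1) for uncompressed formats.
constexpr TileShape kTileShapes2D[] = {
    {8, 256, 256, 1}, {16, 256, 128, 1}, {32, 128, 128, 1},
    {64, 128, 64, 1}, {128, 64, 64, 1},
};

// Assumed 3D (volume) tile shapes for uncompressed formats.
constexpr TileShape kTileShapes3D[] = {
    {8, 64, 32, 32}, {16, 32, 32, 32}, {32, 32, 32, 16},
    {64, 32, 16, 16}, {128, 16, 16, 16},
};

// Size in bytes of one tile of the given shape.
constexpr std::size_t shapeBytes(const TileShape& s) {
    return static_cast<std::size_t>(s.w) * s.h * s.d * (s.bitsPerTexel / 8);
}

// Linear search by texel width; returns nullptr when no shape matches.
const TileShape* findShape(const TileShape* table, std::size_t count, int bitsPerTexel) {
    assert(table != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        if (table[i].bitsPerTexel == bitsPerTexel) return &table[i];
    }
    return nullptr;
}
```

Every entry multiplies out to exactly 65536 bytes, which is the portability constraint described above: the shape changes with bpp, the byte size never does.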
 
Volume Textures, 3D Textures and Texture3D are synonyms.

What Volume *Tiled* Resources introduces is the ability for 3D Textures to be used as Tiled Resources (aka Sparse Textures, aka Partially Resident Textures), and it's that that GCN has never been reported to support.
Fully agree - I was just trying to say that the MSDN documentation on Feature levels and Volume Tiled Resources is not up to date with the latest public Windows SDK, where this capability no longer exists as a separate option and has been merged into the Tiled Resource Tiers.

Hence Tiled Resource Tier 3 (Tiled Texture3D AKA Volume Tiled Resources) is not really a requirement on Feature level 12_0 - it only requires Tier 2 (Tiled Buffer and Tiled Texture2D, with some additional refinements over Tier 1 as described in D3D12_TILED_RESOURCES_TIER).
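
The tier relationship can be sketched in a few lines of C++. The enum below is a local stand-in that mirrors the values of D3D12_TILED_RESOURCES_TIER as I understand them (0 to 3), so the snippet compiles without the Windows SDK. The function names are hypothetical. On a real device the tier would be read from the TiledResourcesTier field returned by ID3D12Device::CheckFeatureSupport with D3D12_FEATURE_D3D12_OPTIONS.

```cpp
#include <cassert>

// Local mirror of D3D12_TILED_RESOURCES_TIER (assumed values 0..3).
enum class TiledResourcesTier {
    NotSupported = 0,
    Tier1 = 1,
    Tier2 = 2,
    Tier3 = 3,
};

// Convert a raw caps value into the local enum, rejecting unknown values.
TiledResourcesTier tierFromRaw(int raw) {
    assert(raw >= 0 && raw <= 3);
    return static_cast<TiledResourcesTier>(raw);
}

// Tiled Texture3D (volume tiled resources) needs Tier 3.
bool supportsTiledTexture3D(TiledResourcesTier tier) {
    return tier >= TiledResourcesTier::Tier3;
}

// Feature level 12_0 only asks for Tier 2 as far as tiling is concerned.
bool satisfiesFeatureLevel12_0Tiling(TiledResourcesTier tier) {
    return tier >= TiledResourcesTier::Tier2;
}
```

A Tier 2 part therefore passes the 12_0 tiling requirement while still reporting no support for tiled 3D textures.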


We had a similar case when Max McMullen gave the final requirements for resource binding tiers in his GDC 2015 presentation this March (http://channel9.msdn.com/Events/GDC/GDC-2015/Advanced-DirectX12-Graphics-and-Performance 6:40-9:30), but the online MSDN docs were only updated in late May 2015 to reflect the changes.


VolumeTiledResourcesSupported was a caps-bit in an older never-public SDK
Thank you for confirming this...
 
Interesting, so is the smem literally wasted when running 3D? Or is there some capacity to use parts of it on a per-wavefront basis within a CU depending on what type of threads are running?
GCN ISA is publicly available. Transformed vertices (VS->PS) appear in the LDS. The vertex interpolation instructions (emitted by the compiler at the beginning of each pixel shader) take inputs from LDS.

Drobot's presentation (slide 39->) provides detailed information about fetching vertex shader outputs from LDS in pixel shader:
http://michaldrobot.files.wordpress.com/2014/05/gcn_alu_opt_digitaldragons2014.pptx
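
As a rough model of what those interpolation instructions compute, here is a C++ sketch of the two-step evaluation, following my reading of V_INTERP_P1_F32 and V_INTERP_P2_F32 in the public ISA document. The struct and function names are made up for illustration, and which vertex each barycentric weight belongs to is an assumption.

```cpp
#include <cassert>

// Per-primitive attribute data as it would sit in LDS: the value at
// vertex 0 plus the two edge deltas. Hypothetical layout for this sketch.
struct AttributePlane {
    float p0;   // attribute at vertex 0
    float p10;  // attribute at vertex 1 minus vertex 0
    float p20;  // attribute at vertex 2 minus vertex 0
};

AttributePlane makePlane(float v0, float v1, float v2) {
    return {v0, v1 - v0, v2 - v0};
}

// First step (modelled on V_INTERP_P1_F32): partial = P10 * I + P0.
float interpP1(const AttributePlane& plane, float i) {
    return plane.p10 * i + plane.p0;
}

// Second step (modelled on V_INTERP_P2_F32): result = P20 * J + partial.
float interpP2(float partial, const AttributePlane& plane, float j) {
    return plane.p20 * j + partial;
}
```

With I = J = 0 the result is the vertex 0 value, and with (I, J) = (1, 0) or (0, 1) it lands on vertex 1 or vertex 2, so the pair of instructions is just a barycentric blend of the three vertex outputs.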
 
Interesting, I have a feeling that would introduce some non-trivial scheduling constraints (which I guess is known about GCN), but it's not an unreasonable design. Anyway, I'll move any followup on that to a separate thread as we are far afield of the topic now ;)
 
I just downloaded the latest AMD GPU PerfStudio, which supports D3D12 and the new Fury. They added the shader bytecode under the GCN 1.2 GPUs group, so it looks like we really do not have any ISA changes..
 
It has some changes. Source: http://amd-dev.wpengine.netdna-cdn..../07/AMD_GCN3_Instruction_Set_Architecture.pdf

  • VGPR Indexing for VALU instructions.
  • New Instructions
    – Scalar Memory Writes.
    – S_CMP_EQ_U64, S_CMP_NE_U64.
    – 16-bit floating point VALU instructions.
    – “SDWA” – Sub Dword Addressing allows access to bytes and words of VGPRs in VALU instructions.
    – “DPP” – Data Parallel Processing allows VALU instructions to access data from neighboring lanes.
    – V_PERM_B32.
    – DS_PERMUTE_RTN_B32, DS_BPERMUTE_RTN_B32.
  • Removed Instructions
    – V_MAC_LEGACY_F32.
    – V_CMPS* - now supported by V_CMP with the “clamp” bit set to 1.
    – V_MULLIT_F32.
    – V_{MIN, MAX, RCP, RSQ}_F32.
    – V_{LOG, RCP, RSQ}_CLAMP_F32.
    – V_{RCP, RSQ}_CLAMP_F64.
    – V_MUL_LO_I32 (it’s functionally identical to V_MUL_LO_U32).
    – All non-reverse shift instructions.
    – LDS and Memory atomics: MIN, MAX and CMPSWAP on F32 and F64 data.
The biggest change is that GCN gen1 and gen2 didn't support register indexing. Try writing a local array inside a function and indexing that array with a dynamic index. The old compiler should emit super bad code to emulate the indexing operator, while the new gen3 compiler should emit just a single instruction.
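
To make the pattern concrete, here is a CPU-side C++ analogue of the test case: a local array indexed by a runtime value, next to one possible emulation using a compare-and-select chain. This is only an illustration of the idea. It is not what any AMD compiler is known to emit, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Direct form: a local array indexed by a value only known at run time.
// With register indexing, the array can stay in registers.
float weightIndexed(std::uint32_t i) {
    assert(i < 4);
    const float weights[4] = {0.1f, 0.2f, 0.3f, 0.4f};
    return weights[i];
}

// Emulated form: without register indexing, one option is to test the
// index against every slot and select the matching value.
float weightSelectChain(std::uint32_t i) {
    assert(i < 4);
    float result = 0.1f;
    result = (i == 1) ? 0.2f : result;
    result = (i == 2) ? 0.3f : result;
    result = (i == 3) ? 0.4f : result;
    return result;
}
```

The select chain grows with the array size, which is why a dynamically indexed local array is a good probe for whether the compiler is using the new indexing support.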

Float LDS atomics were likely removed because HLSL doesn't expose them. Most of the other removed instructions are not used either.

I don't believe the HLSL compiler emits any of the new instructions, except for fp16 stuff. Try writing OpenCL 2.1 sub group prefix sum and you should get something new, assuming the compiler is upgraded.
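
For anyone unfamiliar with the sub group prefix sum, this is roughly the data movement involved, simulated on the CPU in C++ for a 64-lane wavefront. Each step reads a value from a lane a fixed distance below, which is the kind of neighbouring-lane access DPP is described as providing. The simulation ignores any hardware restrictions on which lanes can exchange data, and `Wave` and `inclusivePrefixSum` are names invented for this sketch.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWaveSize = 64;  // lanes per wavefront on GCN
using Wave = std::array<int, kWaveSize>;

// Inclusive prefix sum across all lanes in log2(64) = 6 steps
// (Hillis-Steele scan). Each step adds the value held `offset` lanes below.
Wave inclusivePrefixSum(Wave lanes) {
    for (std::size_t offset = 1; offset < kWaveSize; offset *= 2) {
        Wave shifted{};  // lanes with no source lane read zero
        for (std::size_t lane = offset; lane < kWaveSize; ++lane) {
            shifted[lane] = lanes[lane - offset];
        }
        for (std::size_t lane = 0; lane < kWaveSize; ++lane) {
            lanes[lane] += shifted[lane];
        }
    }
    return lanes;
}
```

If every lane starts with 1, lane n ends up holding n + 1. On hardware without cross-lane access, each of those six shifts would typically go through LDS instead.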
 
That's the GCN 1.2 manual, the same architecture revision as the R9 285 (codename "Tonga").

GCN.png
 
AMD renamed the GCN versions some time ago...
GCN 1.0 = GCN Gen1
GCN 1.1 = GCN Gen2
GCN 1.2 = GCN Gen3

Tonga (285) and Fiji (Fury & Fury X) are both GCN Gen3 parts. So register indexing and other good stuff should be visible in compiled shader microcode, assuming AMD have had time to implement those in their shader compiler.
 
Yep, thanks to Anandtech, people are still confused about this.
No one should be confused. When we said that Fury was GCN 1.2, we meant it.

As for "GCN 1.2", that is a matter of AMD's doing. Ever since GCN 1.1 we kept asking them for a name, and they kept declining. Same thing with the Tonga launch: they would acknowledge that it's new, but they wouldn't name it. AMD PR doesn't want to call too much attention to how different parts are at different architectural levels, which is hamstringing their ability to offer consistent and meaningful names.
 
No one should be confused. When we said that Fury was GCN 1.2, we meant it.

As for "GCN 1.2", that is a matter of AMD's doing. Ever since GCN 1.1 we kept asking them for a name, and they kept declining. Same thing with the Tonga launch: they would acknowledge that it's new, but they wouldn't name it. AMD PR doesn't want to call too much attention to how different parts are at different architectural levels, which is hamstringing their ability to offer the same old GCN 1.0 under Pitcairn re-brands.
Fixed.
I understand that no one is able to offer only the newer revisions while stuck on the same crappy 28nm node, but continuing to offer that GPU... Well, it is limiting from my point of view...
 
No one should be confused. When we said that Fury was GCN 1.2, we meant it.
Apart from the fact that you've been told that there is no such thing, you are confusing shader architecture with GPU architecture.

As for "GCN 1.2", that is a matter of AMD's doing. Ever since GCN 1.1 we kept asking them for a name, and they kept declining. Same thing with the Tonga launch: they would acknowledge that it's new, but they wouldn't name it. AMD PR doesn't want to call too much attention to how different parts are at different architectural levels, which is hamstringing their ability to offer consistent and meaningful names.
It has a name, it's Tonga. Fiji has a name, it's Fiji. Get over yourself.
 
PC games use HLSL, or GLSL, so I don't think they can specifically use those instructions anyway. The compiler takes care of the details, I assume.
 