DirectX 12: The future of it within the console gaming space (specifically the XB1)

I think for MSAA the hardware must somehow decide how to allocate lanes (threads) for a triangle which consists of multiple and single fragments. It still needs to form quads. So does it have a puzzle-solver ALU? What are the shapes of 100% efficient triangles, and what is the shape of 50% efficient ones? If they are simple grid-positioned, then triangle alignment (in the sense, is the long side touching the lattice or the diagonal side) matters for performance.
Not sure what you mean by multiple and single fragments, but you can conceptually treat it as: the rasteriser launches four threads (a quad) for each aligned 2x2 pixel area which has at least one sample covered. Triangle edge fragments are only special in the sense that their multisample mask isn't all ones. (There are exceptions to this, but that's the basic idea).
 
There is nothing complex in MSAA pixel shader invocations. As long as at least one sampling point is inside a triangle in a 2x2 pixel quad, the pixel shader is executed 4 times (once per pixel). It doesn't matter how many samples are covered by the triangle. The result of each pixel shader invocation is replicated to each subsample of that pixel (based on the triangle coverage mask and per sample depth/stencil tests).
 
So you're telling me the shading rate (number of invoked pixel shader instances) of MSAA and super-sampling is identical? I expected the hardware to be smarter ...

[Attached image: MSAA.png]


Okay, okay, I always fall into the same trap. This is edge-selective super-sampling.
 
With 4x MSAA, the shading rate is 4 samples per pixel from a single fragment (work item). When the colour has been computed and it's written to the render target, the hardware knows which of the four samples actually fell within the portion of the triangle that hit the pixel. All 4 samples are the same colour, but only a subset of them are actually hit by the triangle.
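
To make the "one colour, subset of samples" point concrete, here is a minimal Python sketch. The function names and the bitmask layout (bit i set means sample i is covered) are assumptions for illustration, not how any particular GPU stores coverage, and depth/stencil tests are left out.

```python
def write_msaa_pixel(samples, colour, coverage_mask):
    """Write one shaded colour into the covered samples of a single pixel.

    samples       -- current per-sample colours (4 entries for 4x MSAA)
    colour        -- result of the single pixel shader invocation
    coverage_mask -- hypothetical bitmask, bit i set if sample i is covered
    """
    return [colour if (coverage_mask >> i) & 1 else old
            for i, old in enumerate(samples)]


def resolve_pixel(samples):
    """Average the samples of one pixel down to the final colour."""
    return sum(samples) / len(samples)


# Background is 0.0; a triangle edge covers samples 0 and 1 only.
pixel = write_msaa_pixel([0.0, 0.0, 0.0, 0.0], 1.0, 0b0011)
print(pixel, resolve_pixel(pixel))
```

The shader ran once for the pixel, yet the resolved value still reflects that only half of the samples were covered.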

In super sampling there are no samples. Four pixels need to be computed. After those 4 pixels have been written to the render target a special resolve shader comes along (after the entire rendering pass has been completed) and averages each quad of pixels down to a single pixel.
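
As a rough illustration of that resolve step, assuming 4x SSAA means a 2x2 block of rendered pixels per output pixel, a greyscale image stored as nested lists, and a plain box filter (real resolves may weight differently):

```python
def resolve_ssaa(image, factor=2):
    """Average each factor x factor block of a supersampled image.

    image  -- list of rows of greyscale values; width and height are
              assumed to be multiples of `factor`
    factor -- supersampling factor per axis (2 gives 4x SSAA)
    """
    height, width = len(image), len(image[0])
    resolved = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor)
                     for dx in range(factor)]
            row.append(sum(block) / len(block))
        resolved.append(row)
    return resolved


# A 4x2 supersampled image resolves to a 2x1 final image.
print(resolve_ssaa([[1, 1, 0, 0],
                    [1, 1, 0, 2]]))
```

Every value in the input had to be shaded separately, which is where the extra cost over MSAA comes from.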

4x MSAA is therefore 4x faster to shade than 4x SSAA. But there is a bandwidth cost as well as a memory allocation cost, so it's slower than non-MSAA.

Since the GPU can't pixel shade less than a quad (texture gradients only exist at the quad level), a quad of pixels enters the pixel shader as a single unit. If you have a triangle that hits only a single pixel, then 4x MSAA and 4x SSAA have the same throughput (well, the resolve pass required by SSAA will make it slower in the end). Obviously, the vast majority of triangles are larger than a single pixel, which is why MSAA wins.

The loss of performance from single-pixel triangles also occurs when MSAA is off. It's not a bug in MSAA, it's a side effect of texturing-optimised immediate renderers that work on quads of pixels as the base unit of computation.
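
A small Python sketch of why that happens, with MSAA off. It assumes a simplified rasteriser that tests pixel centres with edge functions and ignores the top-left fill rule; the function names are made up for the example.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: which side of edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered_pixels(tri, width, height):
    """Pixels whose centre lies inside the triangle (no fill rule)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    pixels = []
    for y in range(height):
        for x in range(width):
            cx, cy = x + 0.5, y + 0.5
            w0 = edge(x1, y1, x2, y2, cx, cy)
            w1 = edge(x2, y2, x0, y0, cx, cy)
            w2 = edge(x0, y0, x1, y1, cx, cy)
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside:
                pixels.append((x, y))
    return pixels


def quad_utilisation(pixels):
    """Useful pixel shader lanes divided by lanes launched (4 per 2x2 quad)."""
    quads = {(x // 2, y // 2) for x, y in pixels}
    return len(pixels) / (4 * len(quads)) if quads else 0.0


tiny = [(0.1, 0.1), (1.2, 0.1), (0.1, 1.2)]   # hits one pixel centre
large = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]  # covers 36 pixel centres
print(quad_utilisation(covered_pixels(tiny, 8, 8)))
print(quad_utilisation(covered_pixels(large, 8, 8)))
```

The tiny triangle uses one lane out of four, while the large one keeps most lanes busy; only the quads along its diagonal edge waste work.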

It's why very small triangles are literally a waste of time. AMD recommends triangle sizes of about 8-16 pixels area as a loose minimum.
 
To be fair, DX was also awful at first.
Yes, I still remember reading John Carmack's statements saying that he wasn't going to waste his time on it after trying the first version of DirectX. Those words have been ringing in my head ever since I was a kid. Carmack was like a god back then, which is why I still remember them.
 
Yep, Quake first supported Speedy3D (aka Redline), after the software renderer of course,
although he hated Speedy3D as well.
We're getting off topic here.
 
I've never heard of Speedy3D. Was it an early version of DirectX?
 
http://www.slideshare.net/liquidboy...dias-maxwell-architecture-presented-by-nvidia

Translated by Google:
1. Lin Nan, Director of Developer Technology, NVIDIA China: NVIDIA GPUs and the new generation of DirectX on Windows

2. Talk outline: NVIDIA Maxwell architecture goals; new hardware features supported in DirectX 12; other new Maxwell features; new effects technologies; outlook

3. NVIDIA Maxwell architecture goals: a new architecture on the existing 28 nm node that significantly improves performance and performance per watt; designed to provide the best support for DirectX 12; designed with a focus on new graphics features: real-time global illumination (GI) in complex dynamic scenes, higher-quality programmable anti-aliasing (AA), working set memory management, scalable 2D graphics acceleration; the best target hardware platform for DirectX 12

4. Talk outline: NVIDIA Maxwell architecture goals; new DirectX 12 hardware features: Conservative Rasterization, Raster Ordered Views, Tiled Resources; other new features supported by Maxwell; new effects technologies; outlook

5. Direct3D 12 API: the latest high-performance graphics API; closer to the hardware, with more direct hardware access; runs on the various hardware platforms Microsoft supports; supported by a range of excellent development tools; supported by all major hardware vendors

6. DirectX 12 features: improved CPU efficiency of the new API: parallel rendering on multi-core CPUs, with the overhead on each core effectively reduced; improved GPU efficiency: new indexing features make resource binding more flexible, more efficient data management and transfer, a more direct job scheduling model; a variety of new hardware features; demo at the NVIDIA booth

7. DX12 conservative rasterization: conventional rasterization produces no output if the pixel center is not covered by the triangle; conservative rasterization outputs every pixel the triangle touches. Application examples: raytraced shadows, voxelization. Supported by second-generation Maxwell GPUs: GTX 960/970/980 / Titan X

8. DX12 Raster Ordered Views (ROVs): without ROVs, pixel shader output is unordered; ROVs guarantee a specific order for pixel shader output. Can be used to solve many problems: efficient, flexible programmable blending; order-independent transparency (OIT); deferred rendering algorithms for transparency; arbitrary packing of G-buffer data. Supported by second-generation Maxwell GPUs. Ordered output carries a performance cost, and depth complexity affects performance

9. DX12 Tiled Resources: tiled 2D and 3D textures and arrays use virtual memory to store and use massive amounts of data. DX11.2 provides 2D tiled resources; DX12 adds 3D tiled resources. Applications: texture streaming / clipmaps, adaptive shadow maps, sparse multi-resolution rendering, sparse voxel grids. Supported by second-generation Maxwell GPUs. Performance depends on the specific application

10. Talk outline: NVIDIA Maxwell architecture goals; new DirectX 12 hardware features; other new features supported by Maxwell: Multi-Projection acceleration, programmable anti-aliasing; new effects technologies; outlook

11. Maxwell multi-projection acceleration: Maxwell supports a fast geometry shader and fast viewport multi-casting. These can accelerate: voxelization, cube-map rendering, cascaded shadow maps, multi-resolution rendering. Supported by second-generation Maxwell GPUs

12. Maxwell programmable anti-aliasing: new advanced multi-sampling features: the pixel shader can specify the positions of all sub-pixel samples; the pixel shader can find out whether each sub-pixel sample passed the depth test. These enable a series of new anti-aliasing techniques: Multi-frame sampled AA, Aggregate G-Buffer AA, Accumulative AA. Supported by second-generation Maxwell GPUs. Higher-quality, faster anti-aliasing

13. Talk outline: NVIDIA Maxwell architecture goals; new DirectX 12 hardware features; other new features supported by Maxwell; new effects technologies: Raytraced Shadows, Sparse fluid simulation, Voxel Based Global Illumination, Aggregate G-Buffer AA, VR Direct

14. Raytraced shadows: use raytracing to generate finely detailed shadows; accelerated with DX12 conservative rasterization

15. The traditional shadow mapping SM = 8K x 8K (256 MB)

16. raytracing shadows SM = 3K x 3K (36 MB) PM = 1K x 1K x 64 (256 MB)

17. Raytraced shadows, hardware considerations: minimum hardware requirement: D3D feature level 11; recommended: D3D feature level 12 (conservative rasterization can significantly improve speed). Which games will use raytraced shadows: games that need very sharp shadows; in-game it may be classed as a "high" or "very high" graphics setting; already integrated into Unreal Engine 4

18. Sparse fluid simulation: fluid simulation often requires a lot of data; previous algorithms were limited by memory; using DX12 tiled 3D resources, only the grid cells that contain fluid are stored and computed, which saves memory and reduces computation time

19. sparse fluid simulation demo

20. Sparse fluid simulation, hardware considerations: minimum and recommended hardware requirement: D3D feature level 12; because it lacks tiled 3D resources, D3D feature level 11 can only handle small-scale fluid simulation. Games that would use sparse fluid simulation: large-scale, interactive smoke / flame / water effects; in-game it may be classed as a "high" or "very high" graphics setting

21. Voxel-based global illumination (VXGI): simulates light propagation according to the laws of physics to provide the most realistic lighting; substantial performance boost on Maxwell; takes advantage of several new hardware features for acceleration: conservative rasterization, multi-projection acceleration, tiled resources

22. forums.unrealengine.com, user "rabellogp" no indirect lighting

23. forums.unrealengine.com, user "rabellogp" lightmap (static)

24. forums.unrealengine.com, user "rabellogp" VXGI (dynamic)

25. VXGI, hardware considerations: minimum hardware requirement: D3D feature level 11; recommended: high-end DX12 feature level hardware (can use several new features for acceleration). In-game use: detailed dynamic lighting; may be classed as a "very high" graphics setting; already integrated into Unreal Engine 4

26. Aggregate G-Buffer antialiasing (AGAA): designed specifically for deferred rendering, which is now the mainstream rendering method; compared with MSAA it looks better and uses less memory; AGAA 2A is roughly comparable to 32x MSAA; uses Maxwell's programmable anti-aliasing features. Faster: only visible sub-pixel samples are processed. Better: sample positions within the pixel are adjusted to improve the aggregation result

27. AGAA demo

28. AGAA, hardware considerations: minimum and recommended hardware requirement: Maxwell GPUs (NVIDIA GTX 960/970/980 / Titan X). In-game use: can replace MSAA for better and faster antialiasing

29. NVIDIA VR Direct: the collective name for NVIDIA's VR technologies, both software and hardware, used to: improve VR rendering performance; reduce response latency; accelerate volume rendering

30. VR Direct: reduce latency. Key factors for avoiding motion sickness: frame rate > 90 per second; screen response time to head rotation < 20 ms. Head turn to screen change ≤ 20 ms. Scott W. Vincent, Franklin Heijnen

31. VR Direct: Timewarp. How can the response time be kept within 20 ms? It requires "Timewarp" technology provided by the hardware and driver: just before the GPU outputs each image to the display, the picture is adjusted for the current headset position

32. VR Direct: Timewarp. Timewarp can effectively bring the delay to 20 ms or less; it is also effective for games running at frame rates below 60. NVIDIA GPUs that support Timewarp: some 500 and 600 series GPUs, all 700 and 900 series GPUs; future GPUs will have better support

33. VR Direct: VR SLI API around the same set of rendering instructions

34. VR Direct: VR SLI. One rendering command is sent and the images for both eyes are drawn at the same time; improves frame rate, but does little to reduce latency; with two cards, performance is close to twice that of a single card; viewpoint-independent work, such as shadow casting, is done repeatedly. Hardware requirement: NVIDIA GeForce 500 series and above

35. Q & A
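
The memory figures on slides 15 and 16 can be sanity-checked with a few lines of Python. This assumes 4 bytes per texel, K = 1024 and MB meaning 2^20 bytes; none of that is stated in the slides, but those assumptions reproduce all three numbers.

```python
BYTES_PER_TEXEL = 4  # assumption: e.g. one 32-bit value per texel
K = 1024             # assumption: "8K" means 8 * 1024 texels


def footprint_mb(*dims, bytes_per_texel=BYTES_PER_TEXEL):
    """Memory footprint in MB (2**20 bytes) of a map with the given dimensions."""
    texels = 1
    for d in dims:
        texels *= d
    return texels * bytes_per_texel / (1024 * 1024)


print(footprint_mb(8 * K, 8 * K))  # traditional shadow map (slide 15)
print(footprint_mb(3 * K, 3 * K))  # raytraced shadows, SM (slide 16)
print(footprint_mb(K, K, 64))      # raytraced shadows, PM (slide 16)
```

So under these assumptions the raytraced approach trades a much smaller SM for a PM that costs as much as the traditional 8K x 8K shadow map.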
 
Cyan, if you're interested in the early days and history of 3D on the PC:
http://www.techspot.com/article/650-history-of-the-gpu/

PS: An i5-4690T isn't going to get 2.7 million draw calls with DX12, since it only runs at 45 W ;)
Sure I am interested! Thanks, excellent stuff. They even mention obscure, little-known hardware like Project Talisman, and it's also nice to know how ATi started in the GPU business, among many other things. A must read.
 
That VR Direct stuff sounds like it might be similar to what Sony is doing with Morpheus?
I think everyone is doing the same. AMD's LiquidVR is the exact same as VR Direct (plus the possibility of direct-to-VR output, bypassing the OS).
 