liquidboy
Regular
Not really sure if this is a standard thing in modern GPUs these days?!
Also, can anyone think of an application for this? If not for performance, what other scenario would benefit from going off-chip?!
Tessellation
Describes the advantages and disadvantages of onchip and offchip tessellation. Applicable only to Hull Shaders and Geometry Shaders.
Onchip tessellation mode
By default, the D3D11 driver always configures the GPU for onchip tessellation mode. In onchip tessellation mode, all of the data for input and output control points and per-patch constants is stored in LDS (local data share), including the constant factors. Because LDS is memory internal to the GPU, no additional memory traffic is generated, and access to the data is guaranteed to be low-latency.
However, because this data is needed by the DS threads generated by tessellating the patches in a threadgroup, all such DS threads will need to run in the same CU that ran the VS and HS threads for the same threadgroup. This poses a severe limitation to the GPU’s ability to load-balance work amongst the 12 available CUs, especially when the tessellation factors are high.
With high tessellation factors, a single threadgroup might generate many waves of DS threads, and the LDS memory used will be blocked for any other use, including other threadgroups which might otherwise be able to run in the CU. (LDS is also used for threadgroup-local data in compute shaders, for PS interpolant values, and for onchip GS mode.)
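To get a feel for how quickly "many waves" adds up, here is a rough back-of-the-envelope sketch. It assumes a quad domain with one uniform integer tessellation factor f (so about (f+1)^2 domain points per patch) and 64 threads per wave. Both are assumptions for illustration rather than figures from the text above, the function name is made up, and real hardware counts will differ.

```python
import math

WAVE_SIZE = 64  # assumed threads per wave; not stated in the text above


def estimate_ds_waves(patches, tess_factor):
    """Rough count of DS waves for one threadgroup.

    Assumes a quad domain with a uniform integer tessellation factor,
    which gives an f x f grid with (f + 1)^2 vertices per patch.
    """
    points_per_patch = (tess_factor + 1) ** 2
    return math.ceil(patches * points_per_patch / WAVE_SIZE)
```

Under these assumptions, a single patch at factor 64 already needs 67 waves, and in onchip mode all of them have to run on the one CU that holds the patch data in LDS.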
Offchip tessellation mode
Offchip tessellation is an option that enables the use of non-LDS memory with hull and domain shaders.
The GPU's offchip tessellation mode is enabled by specifying the flag D3D11X_TESSELLATION_OFFCHIP. This mode uses the same amount of LDS as the onchip mode, but the HS also writes all output control points, tessellation factors, and per-patch constants to a memory buffer.
A heuristic is then used to run some DS waves in the same CU in onchip mode, to read the data from LDS, and to run other DS waves in other CUs, using offchip mode and reading the data from memory. This allows the GPU to release the threadgroup’s LDS memory before all DS waves are finished, or even launched, and it also allows the GPU to load-balance DS waves better across all CUs.
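The load-balancing effect can be illustrated with a toy model. This is only a sketch: the actual heuristic is not described in the text, so the "pick the least-loaded CU, prefer the home CU on ties" rule below is a stand-in, the round-robin assignment of threadgroups to home CUs is assumed, and `max_cu_load` is a hypothetical name.

```python
NUM_CUS = 12  # CU count quoted in the text above


def max_cu_load(waves_per_group, offchip):
    """Return the largest number of DS waves any single CU ends up running."""
    load = [0] * NUM_CUS
    for group, waves in enumerate(waves_per_group):
        # CU that ran the VS/HS threads for this threadgroup (assumed round-robin)
        home = group % NUM_CUS
        if not offchip:
            # Onchip: every DS wave must stay on the home CU to read LDS
            load[home] += waves
            continue
        for _ in range(waves):
            # Toy stand-in heuristic: least-loaded CU, home CU wins ties
            target = min(range(NUM_CUS), key=lambda cu: (load[cu], cu != home))
            load[target] += 1
    return max(load)
```

For example, with one heavily tessellated threadgroup and three light ones, `max_cu_load([67, 1, 1, 1], offchip=False)` gives 67 (one CU does nearly all the work), while `offchip=True` gives 6. The model ignores the extra memory reads that offchip waves pay for, which is exactly why a real speedup is not guaranteed.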
The advantage of doing tessellation off-chip is that LDS memory is freed up for other graphics purposes. Whether this actually improves performance depends heavily on the title's code; an improvement is not guaranteed.