Is 4GB enough for a high-end GPU in 2015?

My bad, you are right. I should spend more time with the profiler I guess xD
As for Direct3D 12: an extra render target is needed for MSAA, since D3D12 supports only the flip presentation model, which unfortunately does not directly support multi-sampling :\
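To illustrate what that extra render target means in practice: you render into a multi-sampled target and then resolve it into the single-sampled flip-model back buffer before presenting. A minimal D3D12 sketch (resource names such as msaaTarget are placeholders, not from any particular engine):

Code:
#include <d3d12.h>

// Resolve an MSAA render target into the (non-MSAA) flip-model back buffer.
void ResolveMsaaToBackBuffer(ID3D12GraphicsCommandList* cmdList,
                             ID3D12Resource* msaaTarget,  // e.g. 4x MSAA render target
                             ID3D12Resource* backBuffer,  // swap-chain buffer (no MSAA)
                             DXGI_FORMAT format)          // e.g. DXGI_FORMAT_R8G8B8A8_UNORM
{
    D3D12_RESOURCE_BARRIER barriers[2] = {};
    barriers[0].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barriers[0].Transition.pResource   = msaaTarget;
    barriers[0].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barriers[0].Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    barriers[0].Transition.StateAfter  = D3D12_RESOURCE_STATE_RESOLVE_SOURCE;
    barriers[1].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barriers[1].Transition.pResource   = backBuffer;
    barriers[1].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barriers[1].Transition.StateBefore = D3D12_RESOURCE_STATE_PRESENT;
    barriers[1].Transition.StateAfter  = D3D12_RESOURCE_STATE_RESOLVE_DEST;
    cmdList->ResourceBarrier(2, barriers);

    // Collapse the samples into the back buffer.
    cmdList->ResolveSubresource(backBuffer, 0, msaaTarget, 0, format);

    // The back buffer must be in PRESENT state before IDXGISwapChain::Present.
    D3D12_RESOURCE_BARRIER toPresent = barriers[1];
    toPresent.Transition.StateBefore = D3D12_RESOURCE_STATE_RESOLVE_DEST;
    toPresent.Transition.StateAfter  = D3D12_RESOURCE_STATE_PRESENT;
    cmdList->ResourceBarrier(1, &toPresent);
}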
Another "bad" news of D3D12 is that ASTC compression support will not supported in this version of DX12, I guess it will probably added in a next minor iteration of DX12, maybe focused on mobile hardware too (Qualcomm where are you?)

ASTC is already optional in DX12; besides, you already have other compression formats to work with. The real benefit of ASTC is outside of the D3D universe imo.
 
Tell me more!

You allocate a base-resolution buffer (say 2560x1440), render everything, see it's too slow, and the next frame you render to a smaller scissor (say 1920x1080) using the same buffer; for display you scale it up. That way you don't need to re-allocate. Some buffers you don't need at full resolution, and some buffers you don't need for the whole time. So you can recycle (alias) a full-resolution buffer to store e.g. 4 half-resolution buffers. It's fakey custom memory management under the DX11 banner.
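A minimal D3D11 sketch of that idea (function name and sizes are illustrative): the render target stays allocated at the base resolution, and only the viewport/scissor shrink when the frame rate drops, so nothing is ever re-allocated.

Code:
#include <d3d11.h>

// Render this frame into only a sub-rectangle of the full-size render target.
void SetDynamicResolution(ID3D11DeviceContext* ctx, UINT renderW, UINT renderH)
{
    D3D11_VIEWPORT vp = {};
    vp.Width    = static_cast<float>(renderW);   // e.g. 1920 instead of 2560
    vp.Height   = static_cast<float>(renderH);   // e.g. 1080 instead of 1440
    vp.MaxDepth = 1.0f;
    ctx->RSSetViewports(1, &vp);

    D3D11_RECT scissor = { 0, 0, static_cast<LONG>(renderW), static_cast<LONG>(renderH) };
    ctx->RSSetScissorRects(1, &scissor);
}

// At display time, the upscale pass samples only the rendered sub-rectangle, e.g.:
// uv_scaled = uv * float2(renderW / 2560.0f, renderH / 1440.0f);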
 
Interesting: I thought this was done for consoles, but not for PCs. (It completely invalidates benchmark results BTW. ;) )
Is this really a common technique?
 
ASTC is already optional in DX12; besides, you already have other compression formats to work with. The real benefit of ASTC is outside of the D3D universe imo.
ASTC support has been retired from the latest SDK: https://msdn.microsoft.com/en-us/li...790(v=vs.85).aspx?f=255&MSPPError=-2147217396
ASTC support will be added in the future (my guess is in "Windows Redstone").
The real benefit of ASTC is for the mobile world; however, finer control over texture compression can benefit PC games too by reducing the amount of texture data to download.

Tell me more!
You can use aliasing barriers in Direct3D 12 to do that: https://msdn.microsoft.com/en-us/library/dn899226.aspx
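Roughly, it works by creating placed resources that overlap on the same heap and recording an aliasing barrier before switching from one to the other. A hedged sketch (heap/resource names are hypothetical; the heap is assumed to be created with flags that allow render targets):

Code:
#include <d3d12.h>

// Create two overlapping placed resources and record an aliasing barrier
// before the memory is reused for the second one.
void AliasHalfResOverFullRes(ID3D12Device* device, ID3D12Heap* heap,
                             ID3D12GraphicsCommandList* cmdList,
                             const D3D12_RESOURCE_DESC& fullResDesc,
                             const D3D12_RESOURCE_DESC& halfResDesc,
                             ID3D12Resource** fullRes, ID3D12Resource** halfRes)
{
    // Same heap, same offset: the two resources share the same memory.
    device->CreatePlacedResource(heap, 0, &fullResDesc,
                                 D3D12_RESOURCE_STATE_RENDER_TARGET, nullptr,
                                 IID_PPV_ARGS(fullRes));
    device->CreatePlacedResource(heap, 0, &halfResDesc,
                                 D3D12_RESOURCE_STATE_RENDER_TARGET, nullptr,
                                 IID_PPV_ARGS(halfRes));

    // ... render with *fullRes, then tell the GPU the memory now backs *halfRes.
    // (After the barrier the newly aliased target must be cleared/discarded
    // before use.)
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_ALIASING;
    barrier.Aliasing.pResourceBefore = *fullRes;
    barrier.Aliasing.pResourceAfter  = *halfRes;
    cmdList->ResourceBarrier(1, &barrier);
}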
 
Interesting: I thought this was done for consoles, but not for PCs. (It completely invalidates benchmark results BTW. ;) )
Is this really a common technique?

I had the same thought. If any game does this without making it clear, it makes benchmarks almost entirely useless, but I think people would notice.
 
I'm curious what the memory load of realtime GI techniques like LPVs and voxel cone tracing is. I'm guessing it will push current game graphics past 4GB requirements.
 
Volume tiled resources (Tier 3) should help implement more efficient GI algorithms; I'd guess NVIDIA VXGI has or will have such a back-end. Tiled Resources Tier 3 is currently supported only by Maxwell 2.
Other capabilities that could help reach a new level of quality and efficiency are Tiers 2 and 3 of conservative rasterization (which currently are not supported by any GPU). Microsoft claims Tier 2 and Tier 3 support "CPU-based algorithm acceleration (such as voxelization)," while Tier 1 does not. All games I know of that use some sort of voxelization technique suffer a lot from being CPU bound.
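A quick way to see which of those tiers a given GPU actually reports is CheckFeatureSupport on the D3D12 options struct; a small sketch assuming an already-created device:

Code:
#include <d3d12.h>
#include <cstdio>

void PrintFeatureTiers(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                              &options, sizeof(options))))
    {
        // Tier 3 tiled resources = volume (3D) tiled resources.
        printf("Tiled resources tier:            %d\n", (int)options.TiledResourcesTier);
        // Tiers 2/3 are the ones Microsoft ties to voxelization acceleration.
        printf("Conservative rasterization tier: %d\n", (int)options.ConservativeRasterizationTier);
    }
}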

I honestly think that when the hardware is mature enough to support and run such features decently, 4GB of VRAM could really become a limitation in "AAA" games. But that's a personal opinion. In the next few years 4GB should probably be enough for most major games, especially considering how hard developers try to use small buffers and structs for those techniques.
 
Gigabyte GeForce GTX 960 G1 Gaming 4GB review - VRAM Analysis 2GB vs 4GB
Other capabilities that could help reach a new level of quality and efficiency are Tiers 2 and 3 of conservative rasterization (which currently are not supported by any GPU). Microsoft claims Tier 2 and Tier 3 support "CPU-based algorithm acceleration (such as voxelization)," while Tier 1 does not. All games I know of that use some sort of voxelization technique suffer a lot from being CPU bound.

I thought CR was supported by Maxwell 2 ...

Maxwell 2 will offer full support for these forthcoming features, and of these features the inclusion of volume tiled resources and conservative rasterization is seen as being especially important by NVIDIA, particularly since NVIDIA is building further technologies off of them.

http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/4
 
Interesting: I thought this was done for consoles, but not for PCs. (It completely invalidates benchmark results BTW. ;) )
Is this really a common technique?
Dynamic res was done for the console versions of Rage and Brink. I'm not sure they took that out of the engine for PC, but it probably isn't likely that it'd activate on most PCs anyway.
 
Dynamic res was done for the console versions of Rage and Brink. I'm not sure they took that out of the engine for PC, but it probably isn't likely that it'd activate on most PCs anyway.
DXGI 1.3, along with Direct3D 11.2, also introduced support for swap-chain up-scaling, but I don't know how it behaves with such techniques; plus - and more importantly - it works only with Store apps and XAML.
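For reference, that DXGI 1.3 path is IDXGISwapChain2::SetSourceSize: the swap chain is created once at the maximum resolution, and each frame you tell DXGI to present (and scale up) only the sub-region you actually rendered. A minimal sketch:

Code:
#include <dxgi1_3.h>

// Present only the top-left renderW x renderH sub-region, scaled up by DXGI.
void PresentDynamicRes(IDXGISwapChain2* swapChain, UINT renderW, UINT renderH)
{
    // renderW/renderH must be <= the size the swap chain was created with.
    swapChain->SetSourceSize(renderW, renderH);
    swapChain->Present(1, 0);
}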
 
One additional aspect for the 4-GiB topic: games are largely streaming based nowadays, [...]
Unfortunately that is not true for many modern titles currently. They load vast amounts of textures into memory just to avoid streaming artifacts or problems related to storage speed. Other games use a combination of non-streaming and uncompressed high resolution textures. Examples include Titanfall, Watch_Dogs, Thief, Daylight, Call Of Duty Ghosts, Wolfenstein, Evolve, Ryse, Far Cry 4, Assassin's Creed Unity, The Evil Within, Lords Of The Fallen, Shadow Of Mordor, Dying Light and GTA V. Most of these games require 3GB of VRAM to run at the highest texture levels; the last four require more than 3GB. Note that increasing MSAA levels or resolution above 1080p will necessitate a further increase in the required amount of VRAM.
 
Streaming does not mean you don't have a high local graphics load anymore, but that you dynamically stream into local memory what's currently being used. And, with maybe a few exceptions, most of the titles you listed did away with the concept of small levels that are completely hosted in local graphics memory.
 
I'm just as pumped for Star Citizen as the next guy, but this is a mid-2015 card and you are talking about a game that will be lucky to make it out of alpha in 2016, let alone a full release.
To me that proves the need for larger VRAM now. If we're looking at 2016 for really utilizing these huge capacities, then I'd rather not have to upgrade next year again.
 
To me that proves the need for larger VRAM now. If we're looking at 2016 for really utilizing these huge capacities, then I'd rather not have to upgrade next year again.
But you're under the assumption that 8 gigs would give you the desired performance and that you wouldn't need a faster card as well as 8 gigs.

I believe we will need both faster cards and 8 gigs compared to what we have now. That is why I'm waiting till the next micron drop.
 
Unfortunately that is not true for many modern titles currently. They load vast amounts of textures into memory just to avoid streaming artifacts or problems related to storage speed. Other games use a combination of non-streaming and uncompressed high resolution textures.
In both cases the developers are needlessly wasting GPU memory.

If you are afraid of texture popping from HDD streaming, you can load your assets to main RAM in a lossless compressed format (LZMA or similar, in addition to DXT). Decompress the data when a texture region becomes visible and stream it to GPU memory using tiled resources.
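A rough sketch of that flow under D3D11.2 (DecompressLZMA and the tile coordinates are hypothetical placeholders; the destination tile is assumed to have been mapped to the tile pool with UpdateTileMappings beforehand):

Code:
#include <d3d11_2.h>
#include <cstdint>
#include <vector>

// Hypothetical helper: unpack one LZMA block back into raw DXT tile data.
std::vector<uint8_t> DecompressLZMA(const std::vector<uint8_t>& packed);

void StreamTileWhenVisible(ID3D11DeviceContext2* ctx,
                           ID3D11Resource* tiledTexture,
                           UINT tileX, UINT tileY, UINT mip,
                           const std::vector<uint8_t>& lzmaBlockInRam)
{
    // 1. Assets live in main RAM as DXT blocks that are additionally LZMA-packed.
    //    Decompress one 64 KiB tile worth of DXT data on demand.
    std::vector<uint8_t> dxtTile = DecompressLZMA(lzmaBlockInRam);

    // 2. Upload it straight into the tiled-resource region that just became visible.
    D3D11_TILED_RESOURCE_COORDINATE coord = {};
    coord.X = tileX;
    coord.Y = tileY;
    coord.Subresource = mip;

    D3D11_TILE_REGION_SIZE region = {};
    region.NumTiles = 1;

    ctx->UpdateTiles(tiledTexture, &coord, &region, dxtTile.data(), 0);
}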

Uncompressed textures (no DXT compression in addition to ZIP/LZMA) are just a stupid idea in the vast majority of use cases. You just waste a lot of bandwidth (performance) and memory for no visible gain. With normalization/renormalization the quality is very good for material properties and albedo/diffuse, and BC5 actually beats uncompressed R8G8 in quality for normal maps (the Crytek paper about this method is a good read). The BC7 format in DX11 gives you extra quality compared to BC3 with no extra runtime cost.

Most games are not using BC7 yet on PC, because the developer needs to also support DX10 GPUs, and duplicate assets would double the download size. Tiled resources need DX11.2, and DX11.2 unfortunately needs Windows 8.1. This is not yet a broad enough audience. These problems will fix themselves in a few years. In addition, DX12 adds async copy queues and async compute, allowing faster streaming with less latency (much reduced texture popping).
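The async copy queue mentioned there is simply a second D3D12 command queue of type COPY, so streaming uploads can run on the dedicated copy engine in parallel with graphics work. A minimal sketch:

Code:
#include <d3d12.h>
#include <wrl/client.h>

// Create a dedicated copy queue; streaming uploads recorded on COPY command
// lists are submitted here, independently of the graphics queue.
Microsoft::WRL::ComPtr<ID3D12CommandQueue> CreateCopyQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;   // maps to the DMA/copy engine
    Microsoft::WRL::ComPtr<ID3D12CommandQueue> copyQueue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));
    return copyQueue;
}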

Hopefully these new features will stop the brute-force memory wasting seen in some PC games. Everything we have seen so far could have easily been implemented using less than 2GB of video memory (even at 4K), if the memory usage was tightly optimized.
 
In both cases the developers are needlessly wasting GPU memory.

If you are afraid of texture popping from HDD streaming, you can load your assets to main RAM in a lossless compressed format (LZMA or similar, in addition to DXT). Decompress the data when a texture region becomes visible and stream it to GPU memory using tiled resources.

Uncompressed textures (no DXT compression in addition to ZIP/LZMA) are just a stupid idea in the vast majority of use cases. You just waste a lot of bandwidth (performance) and memory for no visible gain. With normalization/renormalization the quality is very good for material properties and albedo/diffuse, and BC5 actually beats uncompressed R8G8 in quality for normal maps (the Crytek paper about this method is a good read). The BC7 format in DX11 gives you extra quality compared to BC3 with no extra runtime cost.

Most games are not using BC7 yet on PC, because the developer needs to also support DX10 GPUs, and duplicate assets would double the download size. Tiled resources need DX11.2, and DX11.2 unfortunately needs Windows 8.1. This is not yet a broad enough audience. These problems will fix themselves in a few years. In addition, DX12 adds async copy queues and async compute, allowing faster streaming with less latency (much reduced texture popping).

Hopefully these new features will stop the brute-force memory wasting seen in some PC games. Everything we have seen so far could have easily been implemented using less than 2GB of video memory (even at 4K), if the memory usage was tightly optimized.

With every one of your posts I get more excited for DX12!
 
[...] BC5 actually beats uncompressed R8G8 in quality for normal maps (the Crytek paper about this method is a good read).

Not sure what you're referring to. R8G8 is better than BC5 whenever your average angle is above ~12 degrees. Otherwise BC5 is only better if your encoder can recognize the 8.6 fixed-point precision possible with the hardware, or, if the encoder doesn't, when it happens to be better by chance, as the hardware does 8.6 anyway.
 