Is 4GB enough for a high-end GPU in 2015?

Discussion in 'Architecture and Products' started by Albuquerque, Jun 9, 2015.

  1. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Modern GPUs (and already the last-gen consoles) do not truncate the BC5 interpolated value pair to 8 bits after decompression. This allows more precision than 8-bit uncompressed data. However, the BC3 alpha channel is truncated to 8 bits after decompression (so this particular format never exceeds 8-bit uncompressed in quality).

    An 8-bit uncompressed channel provides 256 different values. This is not enough for large, smoothly curved, highly specular surfaces (such as car hoods), especially in physically based lighting pipelines. You can see some minor banding in the highlights.

    BC5 stores two 8-bit palette endpoints per channel for each 4x4 tile. Each texel selects a palette entry with a 3-bit interpolation index (8 different values). On a large, smoothly curved surface the palette endpoints are very close to each other. In the areas with the most notable banding the endpoints differ by one. In this case the 3-bit interpolation gives you 6 extra values between the 8-bit endpoints. This produces quality higher than 10-bit uncompressed channels. A Crytek SIGGRAPH presentation from a few years ago describes the benefits of this normal texture compression method.
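
    As a rough sketch of the arithmetic above, the snippet below builds the 8-entry palette of one BC5 channel for two endpoints that differ by one and measures the spacing between entries. The function name `bc4_unorm_palette` is made up for illustration, and the code assumes the decoder keeps interpolated values at full precision (real hardware uses some fixed-point width), so treat it as an upper bound rather than a description of any particular GPU.

```python
from fractions import Fraction


def bc4_unorm_palette(e0, e1):
    """Palette of one BC4/BC5 unsigned channel block in the e0 > e1 mode.

    Returns the two endpoints followed by the six interpolated values,
    as exact fractions in [0, 1]. Hypothetical helper, not a real API.
    """
    if not (0 <= e1 < e0 <= 255):
        raise ValueError("this sketch only covers the 8-value mode (e0 > e1)")
    a, b = Fraction(e0, 255), Fraction(e1, 255)
    palette = [a, b]
    for i in range(1, 7):
        # Weights 6/7..1/7 on the first endpoint, 1/7..6/7 on the second.
        palette.append(((7 - i) * a + i * b) / 7)
    return palette


palette = bc4_unorm_palette(128, 127)   # endpoints one 8-bit step apart
levels = sorted(palette)
step = levels[1] - levels[0]            # spacing between neighbouring entries
ten_bit_step = Fraction(1, 1023)        # spacing of a 10-bit unorm channel
```

    With endpoints 128 and 127 the spacing comes out as 1/1785 of the full range, which is finer than the 1/1023 step of a 10-bit channel.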

    Rough areas suffer some LSB loss from BC5, but often this is impossible to see with the naked eye, since rough areas (by definition) do not have smooth highlights (banding is not possible). Also, if you use Toksvig mapping or some other specular AA method, it will smooth your highlights in rough areas, further hiding this particular issue. In the end, BC5 with a Crytek-style texture compressor beats uncompressed R8G8 in quality and needs half the bandwidth and half the memory.
     
  2. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,019
    Likes Received:
    115
    Interesting. I always thought BC4/5 was just like the alpha channel of BC3. The WGF specs do indeed mention that the precision is higher, though. According to the specs, higher precision for BC3 alpha would be allowed as well, just not required.
     
  3. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    869
    Likes Received:
    277
    Correct, they use about 8.6+ fixed point.

    Not unconditionally, and not the way you mean it. If the angle is above 12 degrees (which is 8/255 BTW) you technically get more bits for the interpolated values than in the R8G8 case, but they don't lead to more precise interpolated values per se, as you have fewer bins than R8G8 in the first place and you're not free to choose them. They might be better, but they are more likely worse. It depends on the content.

    That's the <12 degree case.

    No. Anton just describes that if you happen to have a BC5 compressor which supports float->BC5 compression, then "by chance" you get better normals from the 8.6+ interpolation. He doesn't propose writing an actual coder which knows anything about it. Which means that by chance a tile could also end up worse. You can also use 8-bit source material and have a similar "chance" of getting better or worse results.

    I wrote a BC5 compressor that actually knows about the higher precision, and is likely the best after pure brute force. It also found its way into the CryEngine. Feel free to check if you can get even better normal maps. :)

    It has half the bandwidth, but it isn't unconditionally better, BC5 is lossy after all.
     
  4. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    869
    Likes Received:
    277
    No, BC4 and BC5 behave differently from the BC3 alpha channel, even if the block coding is identical. I think they didn't want to change the specs for DXT5 "after the fact" and break old hardware.
     
  5. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    I am talking about the "exhaustive" compute shader compressor that operates on 16-bit (per channel) source normals. This compressor also doesn't choose the best value simply by minimizing the (squared) error of the 2 channels; it decodes the 2-channel BC5 compressed data block back to normalized 3D vectors and compares them against the original normal vectors. Maybe you are talking about an earlier presentation? I am not 100% sure that the presentation I was referring to was held at SIGGRAPH (it might have been an earlier one).

    Or maybe the compressor I am talking about is your compressor :)

    But the conclusion stands: BC5 is an excellent normal map compression format, assuming you have a good compressor. With 16-bit source data it often looks better than R8G8 and is only half the price. BC5 is also good for storing material properties (roughness, specular, emissive, etc). It has a nice "pseudo float" property: channels (and texture regions) that are filled with values close to zero get more precision. This is especially useful for non-linear data.
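
    The error metric described in this post (compare reconstructed 3D normals instead of the two stored channels) could look roughly like the sketch below. It is not the Crytek compressor: the helper names, the z >= 0 tangent-space convention and the candidate list are all assumptions made for illustration, and a real compressor would search endpoints and indices for a whole 4x4 block.

```python
import math


def reconstruct_normal(x, y):
    """Rebuild a unit normal from two stored channels in [-1, 1].

    Assumes a tangent-space normal with z >= 0 (hypothetical convention).
    """
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)


def angular_error_deg(original, decoded_xy):
    """Angle in degrees between the original unit normal and the decoded one."""
    n = reconstruct_normal(*decoded_xy)
    dot = sum(a * b for a, b in zip(original, n))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def pick_best(original, candidates):
    """Choose the (x, y) candidate whose reconstructed normal is closest in 3D."""
    return min(candidates, key=lambda xy: angular_error_deg(original, xy))


original = (0.6, 0.0, 0.8)                      # already unit length
candidates = [(0.5, 0.0), (0.6, 0.1), (0.62, 0.0)]
best = pick_best(original, candidates)
```

    Ranking candidates by angle rather than by per-channel squared error matters because the same error in x or y changes the reconstructed z by different amounts depending on how steep the normal is.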
     
    #65 sebbbi, Jun 16, 2015
    Last edited: Jun 16, 2015
  6. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    869
    Likes Received:
    277
    It should be "Reaching the speed of light" from SIGGRAPH 2010, yes. Such a brute-force compressor is/was not practical for a production pipeline; it simply takes too long. Anton simply stated how to achieve the maximum possible quality with BC5, although he missed the problem of 255/2: only signed or 254-valued unsigned formats can represent perfect up-vectors, regardless of whether the normal is encoded in 2 or 3 channels.
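
    The 255/2 problem can be checked with a few lines of exact arithmetic. The sketch below assumes the usual mapping of an unsigned code v to v/255*2-1, a hypothetical 254-valued convention that only uses codes 0..254, and an 8-bit signed normalized format where -128 clamps to -1; the function names are made up for this example.

```python
from fractions import Fraction


def decode_unorm8(v):
    """Unsigned 8-bit code (0..255) mapped to [-1, 1]."""
    return Fraction(v, 255) * 2 - 1


def decode_unorm8_254(v):
    """Hypothetical convention that only uses codes 0..254."""
    return Fraction(v, 254) * 2 - 1


def decode_snorm8(v):
    """Signed 8-bit normalized code (-128..127); -128 clamps to -1."""
    return Fraction(max(v, -127), 127)


# An up-vector needs x = y = 0 exactly. Is there any code that decodes to 0?
unorm_has_zero = any(decode_unorm8(v) == 0 for v in range(256))
closest_to_zero = min(abs(decode_unorm8(v)) for v in range(256))
zero_254 = decode_unorm8_254(127)
zero_snorm = decode_snorm8(0)
```

    In the plain 0..255 mapping zero would need the code 127.5, so the closest representable values are off by 1/255, while the other two conventions hit zero exactly.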

    Back then I thought the criterion for maximum quality was plain and obvious (I've been in signal processing and data compression for 20 years), but it took me a while to get an algorithm done which is pretty much within 99.5% (or more) of brute force in 0.5% (or less) of the time. So, in a way I'd say yes. :)

    Yes, ofc, no doubt about that. It's just not the silver bullet. Not to mention the interpolation problems arising from the parallel projection.

    Indeed. Interestingly, for smooth textures it's possible to store sRGB values in the black domain with a smaller bin size than sRGB8, which in turn invalidates the white-compression of sRGB, giving back space there as well. You need to live with the typical block artefacts in high-variance areas though. Verrry useful for low-frequency lightmaps. (BC4 has no sRGB permutation, so no chance to get the best of both.)
     
  7. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    The original reason why DXT5 (now called BC3) was decompressed to RGBA8 was that this gave full-rate filtering and occupied 32 bits per pixel in the texture cache. Back then the texture caches stored uncompressed texture data (DXT blocks were decompressed to the cache). Nowadays GPU L1 caches contain compressed data. This gives a 4x improvement in storage capacity for BC formats at a small extra cost for filtering (just a few extra transistors for fixed-function palette interpolation and bit shuffling). Improved cache utilization is a good reason why nobody should ignore the BC compressed formats on modern GPUs.
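
    The 4x figure follows from the block sizes. A minimal sketch of that arithmetic, using the standard 4x4 block sizes (8 bytes for BC1/BC4, 16 bytes for BC3/BC5/BC7); the helper names are illustrative only:

```python
# Bytes per 4x4 block for the block-compressed formats.
BLOCK_BYTES = {"BC1": 8, "BC3": 16, "BC4": 8, "BC5": 16, "BC7": 16}
# Bits per texel for the uncompressed formats mentioned in the thread.
UNCOMPRESSED_BITS = {"RG8": 16, "RGBA8": 32}
TEXELS_PER_BLOCK = 4 * 4


def bits_per_texel(fmt):
    """Storage cost of one texel in the given format."""
    if fmt in BLOCK_BYTES:
        return BLOCK_BYTES[fmt] * 8 // TEXELS_PER_BLOCK
    return UNCOMPRESSED_BITS[fmt]


def capacity_gain(compressed, decoded):
    """How many more texels fit in a cache that holds compressed data."""
    return bits_per_texel(decoded) / bits_per_texel(compressed)


bc3_gain = capacity_gain("BC3", "RGBA8")   # cache holds BC3 instead of RGBA8
bc5_gain = capacity_gain("BC5", "RG8")     # BC5 compared with uncompressed RG8
```

    BC3 and BC7 cost 8 bits per texel against 32 for RGBA8, which gives the factor of four; the same arithmetic gives the factor of two for BC5 against RG8 mentioned earlier in the thread.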

    NVIDIA's old GPUs decoded DXT1 (BC1) to R5G5B5A1 (16 bpp) in their texture cache. Other GPUs (including NVIDIA nowadays) decode this to RGBA8, as all modern GPUs have full-rate filtering for RGBA8. BC5 was originally created by ATI and was called 3Dc. BC5 was decoded to the texture cache as RG16 (at least on ATI hardware). All 32 bpp normalized formats have traditionally been full rate on all ATI cards, so there was no performance loss compared to RG8 decoding. The ATI texture cache also didn't provide any storage gains for textures with less than 32 bpp. I believe these are the reasons why BC5 (also known as 3Dc, ATI2 and DXN) decoded to more bits, and this is still a big advantage of this format.
     
    #67 sebbbi, Jun 16, 2015
    Last edited: Jun 16, 2015
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Yes. Some time ago I was looking for an optimal format for storing a distance field. I was bummed not to find BC4_sRGB in the DX11 format list. I don't understand why only the lousy-quality RGB triplets get sRGB support. Even the BC3 alpha channel doesn't get sRGB (it is linear while the RGB is in gamma space).

    BC7 is nice because it has sRGB support and higher quality than BC3 (at the same cost). It is the best sRGB option available. Unfortunately, BC7 compressors are still way slower than the others. You definitely don't want to reconvert all your BC7 textures for each new data revision.
     
    #68 sebbbi, Jun 16, 2015
    Last edited: Jun 16, 2015
  9. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    602
    Likes Received:
    320
    Isn't TexConv of DirectXTex fast enough (gpu accelerated)? https://directxtex.codeplex.com/wikipage?title=Texconv
    AMD also provided a new Compress lib recently: http://developer.amd.com/tools-and-sdks/graphics-development/amdcompress/
     
  10. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    A little OT, but at least I stumbled upon a review (not new; I had somehow missed it) that has allayed my fears that buying a 4GB GTX 960 was completely useless.
    http://www.gamersnexus.net/guides/1888-evga-supersc-4gb-960-benchmark-vs-2gb/Page-2
    Yes, there are games where there is zero difference between 2/4GB. There are games where 4GB gets an advantage which is useless anyway because the average frame rate is still unplayable.
    And there are also games where, at playable frame rates, there is noticeably less stutter with the 4GB card :)
     
  11. revan

    Newcomer

    Joined:
    Nov 9, 2007
    Messages:
    55
    Likes Received:
    18
    Location:
    look in the sunrise ..will find me
    "Are 4GB HBM enough?
    Fiji uses the new and much more advanced HBM memory type, but is limited to 4,096 MB. AMD asserts that four gigabytes are enough for future games. The fact that the Radeon R9 290X and Radeon R9 390(X) already come with eight gigabytes should not contradict this argument. According to AMD, Fiji handles memory significantly more economically than the GPUs attached to GDDR5. This statement was followed up in numerous games.

    Memory Usage
    If you look at the memory usage in different games, you quickly notice that the Radeon R9 Fury X actually behaves differently from, for example, the Radeon R9 390X and the Radeon R9 290X. While the GDDR5 graphics cards allocate, for example, seven gigabytes of memory in Assassin's Creed Unity, the Radeon R9 Fury X stays at around 3,950 MB in the same game, narrowly under the four-gigabyte limit, without any performance differences at first glance.

    Even more significant is the new treatment of memory in Call of Duty: Advanced Warfare and Middle-earth: Shadow of Mordor. In the first-person shooter, the HBM-based graphics card uses only 3.1 gigabytes, while other four-gigabyte cards fill their full memory. And in Middle-earth, which is actually decried as a VRAM eater, the allocation also stays a little lower at 3,771 megabytes. "

    http://www.computerbase.de/2015-06/amd-radeon-r9-fury-x-test/11/

    "Memory is full" is not the same with "memory is required" it seems..
     
  12. snc

    snc
    Newcomer

    Joined:
    Mar 6, 2013
    Messages:
    198
    Likes Received:
    97
  13. ArcticCircle

    Newcomer

    Joined:
    Dec 14, 2012
    Messages:
    197
    Likes Received:
    42
    Location:
    Finland
  14. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    602
    Likes Received:
    320
    Yeah, WDDM 1.x is such Jurassic :lol2:
     
  15. snc

    snc
    Newcomer

    Joined:
    Mar 6, 2013
    Messages:
    198
    Likes Received:
    97
    link "4gb is enough" :)
     
  16. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    602
    Likes Received:
    320
    Thanks WDDM 1.x for that u.u. It starts to stutter at more like ~3.1 - 3.2GB. Considering that only a small part of the VRAM pool is reserved for the OS (certainly not ~800MB), all this should be caused by external memory fragmentation.
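
    To illustrate what external fragmentation means here, the toy model below treats a 4 GB pool as 4,096 slots of 1 MB with every other 64 MB chunk in use. This is purely an assumed layout for illustration; it says nothing about how WDDM or any driver actually places resources in VRAM.

```python
def largest_free_run(heap):
    """Length of the longest run of free slots (False = free, True = used)."""
    best = run = 0
    for used in heap:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


SLOTS = 4096            # hypothetical 4 GB pool in 1 MB slots
CHUNK = 64              # assumed size of each allocation, in slots

# Every other 64 MB chunk is occupied, the rest is free.
heap = [(i // CHUNK) % 2 == 0 for i in range(SLOTS)]

free_total = heap.count(False)          # free memory in MB
largest_hole = largest_free_run(heap)   # largest contiguous free region in MB
fits_128mb = largest_hole >= 128        # can a 128 MB resource be placed?
```

    In this layout half of the pool is free, yet no resource larger than 64 MB can be placed without moving or evicting something, which is the kind of situation that would show up as stutter well below the nominal capacity.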
     
  17. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,683
    Likes Received:
    3,757
    Location:
    Pennsylvania
    Then why does a 980ti not experience the same issues?
     
  18. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    602
    Likes Received:
    320
    I don't know; bad drivers, or just a different (and worse) memory management implementation. A good starting point for investigating this issue could be a GPUView session with GTA5 on an AMD Fury X (another interesting detail would be the GPU pre-emption granularity exposed to DXGI).
    You can compare memory management in DX 11 (and prior versions) to how Java handles unreferenced objects in memory, where the garbage collector is implementation-defined by the JVM (the driver): the efficiency of (de)allocation and heap (de)fragmentation is almost entirely (but not totally) hidden from the developer. The manual memory management of D3D12 and the WDDM 2.0 memory reservation model (which is more "console-like") will give developers the tools to solve these issues.
     
  19. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    For full-screen applications, preemption granularity and engine dependencies don't matter that much: the GPU is exclusive to you and, e.g., DWM won't try to preempt you.
     
  20. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    602
    Likes Received:
    320
    I know, that's called "exclusive" mode... I was just curious to see if there are any changes from the other GCN GPUs, especially considering the presentation mode changes that will affect Windows 10 (full-screen included).
    Also, does anyone know if there are issues in windowed mode too?
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.