Will GPUs with 4GB VRAM age poorly?

Discussion in 'Architecture and Products' started by DavidGraham, Jul 10, 2016.

  1. Esrever

    Regular

    Joined:
    Feb 6, 2013
    Messages:
    846
    Likes Received:
    647
Judging by MS's render of the Scorpio, it will have 12 GB of RAM total, so 9 GB usable for games seems reasonable if the same OS reservation is kept. 8 GB won't be enough then if 4 GB isn't enough now.
     
  2. gamervivek

    Regular

    Joined:
    Sep 13, 2008
    Messages:
    805
    Likes Received:
    320
    Location:
    india
3GB cards are aging poorly, though not all of them.

     
    Lightman likes this.
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
Doom has settings beyond Ultra that should, in theory, kill the 7970, though.
     
    BRiT likes this.
  4. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
Strangely, with OpenGL, 3GB Kepler cards deliver 70+% more performance, so it's probably the same situation as with Mantle, meaning the close-to-the-metal nature of these APIs necessitates the use of bigger amounts of VRAM.

However, it seems HD 7000 cards are not affected by this, so the question now becomes: how is it that a low-level API, which is supposed to give more performance and more control over the hardware, causes such massive negative scaling compared to the old automated API?

     
    #64 DavidGraham, Jul 14, 2016
    Last edited: Jul 14, 2016
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Maybe NVidia's driver for those old cards isn't working correctly?
     
    Heinrich04 likes this.
  6. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    Or Nvidia has put a lot of effort into optimizing for specific games on their hardware at the driver level.

Devs make choices that have to work on all platforms, and some of these choices will be suboptimal for Nvidia hardware, some for AMD and Intel. DX12 doesn't allow the same level of interception: substituting buffer layouts/formats and shader code on the fly.

    Drivers have always been Nvidia's strong suit.

    Cheers
     
    Heinrich04 and Michellstar like this.
  7. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
I like how my four-year-old HD 7970 OC (which is basically an R9 280X) inches ahead of the much more modern R9 380X, which is equipped with 33% more memory, and is within striking distance of the lesser Hawaii-based models, almost reaching the GTX 970. And, of course, how my shiny new 220-EUR RX 480 is faster than a GTX 980. :)
     
    Heinrich04 and Lightman like this.
  8. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
Could be. Or the developer didn't care to put low-level optimizations for Kepler in DOOM.

Other Vulkan games work just fine on Kepler.
Highly unlikely; they would have to do it for a lot of games and for a lot of architectures.
     
  9. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    If he were doing that he would have linked the computerbase benchmarks with the 4GB fury beating even a 1070 by a fair margin.
    https://www.computerbase.de/2016-07/doom-vulkan-benchmarks-amd-nvidia/
Regardless, the standardized use of ASTC and even procedural texturing will help a lot. Add NVLink, or whatever AMD is using to reach CPU/system memory, and it's even less of an issue.
     
  10. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
The Fury X beating a 1070 has nothing to do with memory; in fact, according to the charts from gamegpu, it only just reached 980 Ti levels of performance, even with the help of the massive boost from Vulkan. Meaning it had abysmal performance to begin with.
     
  11. smw

    smw
    Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    113
    Likes Received:
    43
But as was pointed out in the other thread, gamegpu used a setting that precludes async compute from working.
     
  12. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
Yes, advanced modern streaming techniques help a lot. However, virtual/sparse textures or tiled/reserved resources (did I miss some extra nomenclature?) are still uncommon in a lot of modern games (we can thank Terascale and pre-Broadwell GPUs for that, I guess... plus the lack of DirectX 11.2 support under Windows 7).
If we talk about things like VR + MSAA, bandwidth waste due to texture streaming could become a serious issue under multi-GPU.
PS: wasn't 64K the minimum granularity under Windows?

EDIT: there are also games using dynamic detail scaling techniques to deal with "small" VRAM pools, like the last couple of Total War titles (which is the reason I never consider such games pure benchmarks if the related option is enabled).
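To put that 64K granularity in perspective, a quick back-of-the-envelope sketch (illustrative arithmetic only, not from any spec beyond the 64 KB page size) of how many texels one mapped page covers:

```python
# Illustrative arithmetic: how big a square tile one mapped page covers,
# assuming square tiles and the given bytes-per-texel.
def tile_side(page_bytes, bytes_per_texel):
    texels = page_bytes // bytes_per_texel
    return int(texels ** 0.5)

# 64 KB pages at 4 bytes/texel (e.g. a 32-bit format): 128x128 tiles.
print(tile_side(64 * 1024, 4))   # -> 128
# A hypothetical 16 KB page would give 64x64 tiles, 4x finer granularity.
print(tile_side(16 * 1024, 4))   # -> 64
```

So any sparsely touched resource (a shadow map, a voxel structure) pays for a full 128x128 region per texel it actually needs, which is where the waste comes from.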
     
    #72 Alessio1989, Jul 14, 2016
    Last edited: Jul 14, 2016
    Heinrich04 likes this.
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
That's exactly what the Game Ready drivers are. Well, the games part. Architectures seem to be a much muddier facet.
     
  14. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
Software indirection (in virtual texturing) is practically free nowadays. RedLynx Trials games and id Software games already ran at a locked 60 fps on last-gen consoles. Software indirection is just a couple of extra ALU instructions, and the indirection texture read is super cache-optimal (as 128x128 pixels read the same texel). It is close to a 100% L1 hit rate.

Software indirection also results in a tightly packed "physical address space". This is great if you use deferred texturing and need to store the "physical UV" in your g-buffer.

As you said, hardware PRT also has too big a page size. 16 KB would be much better. Software indirection doesn't have this problem.

Also, regarding virtual/sparse shadow mapping: you need to refresh the mapping every frame. The GPU goes through the depth buffer and determines shadow map page visibility (and mip). Current APIs don't have UpdateTileMappingsIndirect. You can't change the hardware PRT mappings without a CPU roundtrip, and that would either stall the GPU or add one frame of latency -> shadow glitches. Thus software indirection is currently the only possibility.
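The indirection step described above can be sketched roughly like this (Python standing in for shader code; the page sizes, atlas layout, and function names are all illustrative, not from any real engine):

```python
# Software-indirection lookup for virtual texturing: one table read that
# is shared by every texel of a 128x128 page, plus a few adds/multiplies.
PAGE_SIZE = 128      # virtual pages are 128x128 texels
PHYS_PAGES_X = 16    # the physical atlas is a grid of pages, 16 wide

def physical_uv(indirection, vx, vy):
    """Translate a virtual texel coordinate to a physical atlas coordinate.

    All 128x128 texels of a page hit the same indirection entry, which is
    why the read is so cache-friendly; the rest is a couple of ALU ops.
    """
    page = (vx // PAGE_SIZE, vy // PAGE_SIZE)   # which virtual page
    phys = indirection[page]                    # indirection "texture" read
    px = (phys % PHYS_PAGES_X) * PAGE_SIZE + vx % PAGE_SIZE
    py = (phys // PHYS_PAGES_X) * PAGE_SIZE + vy % PAGE_SIZE
    return px, py

# Virtual page (1, 0) is resident in physical page 5 of the atlas.
ind = {(0, 0): 0, (1, 0): 5}
print(physical_uv(ind, 130, 7))   # -> (642, 7)
```

Because resident pages pack tightly into the atlas, the resulting "physical UV" is small and dense, which is what makes it cheap to store in a g-buffer for deferred texturing.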
     
  15. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
16 KB pages and indirect tile updates.
That's a good request for Santa Claus living in Redmond :lol2:
     
    pharma, Razor1 and sebbbi like this.
  16. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
I talked with Ola Olsson (his paper: http://www.cse.chalmers.se/~uffe/ClusteredWithShadows.pdf) at Siggraph last year, and we discussed this problem. He was talking about virtual shadow mapping for local lights, and I had one slide about our sunlight virtual shadow mapping. They use hardware PRT and it stalls everything. The page mapping changes are also horribly slow on OpenGL. These problems are acceptable for research papers/demos, but you can't afford the stall in real games. Everyone doing research on this topic agrees that indirect page table updates are needed; otherwise hardware PRT loses many use cases.
     
    #76 sebbbi, Jul 16, 2016
    Last edited: Jul 16, 2016
    Alessio1989 and Heinrich04 like this.
  17. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
Comparisons before and after Game Ready drivers don't show much of a performance impact in most cases, and the driver notes don't claim as much either. These drivers are directed towards bug fixing and support of visual features.
     
  18. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
Well, I can confirm Rise of the Tomb Raider is extremely sensitive to texture quality settings: just switching to the Very High preset cuts my frame rate in half despite not running out of memory, and running the game on High texture settings can boost fps by anywhere from 33% to 70%. Normally, in most games, this doesn't happen. As texture settings affect VRAM more than anything else, they don't interfere with fps unless you run out of VRAM, in which case hitching and stuttering appear.

Also, not every area in the game is taxing on the hardware. Only wide, open areas are (like the Soviet base level or the forest level). Those are the ones affected more by the texture settings, and they are also the ones that show the widest gap between NV and AMD hardware performance.

Edit: NV recommends the use of a 12GB GPU @ Very High; they say VRAM utilization can reach 10GB over prolonged sessions:
http://www.geforce.com/whats-new/guides/rise-of-the-tomb-raider-graphics-and-performance-guide
     
    #78 DavidGraham, Aug 19, 2016
    Last edited: Aug 20, 2016
  19. pMax

    Regular

    Joined:
    May 14, 2013
    Messages:
    327
    Likes Received:
    22
    Location:
    out of the games
Why? A page fault will force you to load from disk anyway, so where's the catch? Are you talking about dGPU systems with some textures held in CPU RAM?
I don't exactly see the point with HSA, except of course if the HDD doesn't have the 64K read sequentially at hand.

The CPU trap triggered by the GPU causes so much latency that you lose a frame? Again, are you talking about Windows or consoles?
     
  20. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
No, I was talking about the hardware PRT (tiled resources) API. On PC you don't have any other way to do sparse GPU virtual memory mappings. The page size is 64 KB in PC DirectX. Virtual memory is useful for data streaming, but it is also very useful for sparse resources, such as sparse shadow maps and volumetric data structures (sparse voxels). 64 KB page mapping granularity is too coarse for these purposes.

HDD sector size has no meaning for virtual texturing. All modern virtual texturing systems compress data separately in a different variable-bitrate format for HDD (either lossless LZMA-style or lossy wavelet/JPG-style). Data is transcoded at page load. HDD data chunks do not have a 1:1 mapping to GPU pages. Usually HDD pages are grouped into macro pages to reduce seeking.
There is no CPU trap in the PC DirectX API. You need to manually put page misses into an append buffer, read back the page miss buffer on the CPU (next frame), and call UpdateTileMappings to update the virtual memory page table. This is acceptable for texture streaming (virtual texturing), but it is not acceptable for dynamic GPU-generated data, such as sparse (virtual) shadow maps. We need an UpdateTileMappingsIndirect API: the GPU writes new tile mappings to a UAV and the indirect call updates the mappings accordingly.
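That round trip can be modeled as a small frame loop (Python sketch; the function names stand in for the real D3D12 calls and only the one-frame latency is modeled, not the API):

```python
# Sketch of the CPU-roundtrip page-miss loop described above.
# 'gpu_render' and the resident-set update stand in for the append buffer
# and UpdateTileMappings; names and data are illustrative only.

def gpu_render(frame, resident, wanted_pages):
    """Render one frame; append every non-resident page it touches
    to a miss buffer (the UAV append buffer from the post)."""
    return [p for p in wanted_pages[frame] if p not in resident]

def run_frames(wanted_pages):
    resident, miss_buffer, misses_per_frame = set(), [], []
    for frame in range(len(wanted_pages)):
        # CPU: read back LAST frame's miss buffer and map those pages now
        # (the UpdateTileMappings step). This is the one-frame lag.
        resident.update(miss_buffer)
        # GPU: render with whatever is resident; record fresh misses.
        miss_buffer = gpu_render(frame, resident, wanted_pages)
        misses_per_frame.append(len(miss_buffer))
    return misses_per_frame

# Frame 1 wants page B before the CPU has seen frame 0's misses:
# pages always arrive one frame late, hence the shadow glitches.
print(run_frames([['A'], ['A', 'B'], ['A', 'B']]))  # -> [1, 1, 0]
```

An indirect update would collapse the CPU step into the same GPU submission, which is exactly what UpdateTileMappingsIndirect would buy.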
     
    #80 sebbbi, Aug 19, 2016
    Last edited: Aug 19, 2016
    Malo and BRiT like this.