> 3GB cards are aging poorly, not all of them though.

Strangely, with OpenGL, 3GB Kepler cards deliver 70+% more performance, so it's probably the same situation with Mantle, meaning the close-to-the-metal nature of these APIs necessitates larger amounts of VRAM.
> Maybe NVidia's driver for those old cards isn't working correctly?

Could be. Or the developer didn't bother putting low-level optimization for Kepler into DOOM.
> Or Nvidia has put a lot of effort into optimizing for specific games on their hardware at the driver level.

Highly unlikely, they would have to do it for a lot of games and for a lot of architectures.
Cheers
> Dude what're you doing? You're destroying the narrative!
> Please stahp!

If he were doing that he would have linked the computerbase benchmarks with the 4GB Fury beating even a 1070 by a fair margin.
> FuryX beating a 1070 has nothing to do with memory, and in fact, according to the charts from gamegpu, it just reached 980Ti levels of performance even with the help of the massive boost from Vulkan. Meaning it had abysmal performance to begin with.

But as was pointed out in the other thread, gamegpu used a setting that precludes Async Compute from working.
> Of course you could crank up shadow/texture resolution to exceed even 12 GB card limits. But this mostly benefits 1440p and 4K. Gains are minimal at 1080p. Remember that we are talking about 4 GB cards = mainstream (all high end cards nowadays are 8 GB+). Brute force scaling all settings (especially shadow map resolution) to maximum has a huge performance impact (for mainstream cards), but a very small image quality improvement (especially at 1080p).

Yes, advanced modern streaming techniques help a lot. However, Virtual/Sparse textures or Tiled/Reserved resources (did I miss some extra nomenclature?) are still uncommon in a lot of modern games (we can thank Terascale and pre-Broadwell GPUs for that, I guess... plus the lack of DirectX 11.2 support under Windows 7).
Modern shadow mapping algorithms don't need limitless memory to look good. Virtual shadow mapping needs only roughly as many shadow map texels as there are pixels on the screen for a pixel-perfect 1:1 result. A single 4k * 4k shadow tile map is enough for pixel-perfect 1080p (= 32 MB). Algorithms using conservative rasterization need even less texel resolution to reach pixel-perfect quality (https://developer.nvidia.com/content/hybrid-ray-traced-shadows).
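For reference, the arithmetic behind the 32 MB figure as a minimal sketch. It assumes 16 bits per shadow texel (e.g. a D16 depth format), which is an assumption here, not something stated above:

[code]
// Back-of-the-envelope check of the "4k * 4k tile map = 32 MB" figure.
// Assumption (not stated in the post): 16 bits per shadow map texel (e.g. D16).
#include <cstdio>

int main()
{
    const unsigned long long texels        = 4096ULL * 4096ULL;  // one 4k * 4k shadow tile map
    const unsigned long long bytesPerTexel = 2;                  // 16-bit depth, assumed
    const unsigned long long totalBytes    = texels * bytesPerTexel;

    // 1080p output: roughly one shadow texel per screen pixel is what virtual
    // shadow mapping needs for a pixel-perfect result.
    const unsigned long long screenPixels = 1920ULL * 1080ULL;

    std::printf("tile map size: %llu MB\n", totalBytes / (1024 * 1024));  // -> 32 MB
    std::printf("tile map texels per 1080p screen pixel: %.2f\n",
                double(texels) / double(screenPixels));                   // ratio ~8
    return 0;
}
[/code]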
Similarly, a texture resolution increase doesn't grow GPU memory cost without limit. Modern texture streaming technologies calculate the required pixel density for each object (GPU mipmapping is guaranteed not to touch more detailed data). Only objects very close to the camera will require more memory. Each 2x2 increase in texture resolution halves the distance needed to load the highest mip level. At 1080p a properly working streaming system will not load the highest mip textures most of the time (fewer screen pixels = the highest mips are needed much less frequently). Of course there's some overhead in larger textures, but it is not directly related to the texture asset data size.

There are also systems that incur (practically) zero extra memory cost for added content or added texture resolution. Virtual texturing reaches close to 1:1 memory usage (loaded texels to output screen pixels). You could texture every single object in your game with a 32k * 32k texture and still run the game on a 2 GB graphics card. Loading times would not slow down either. But your game would be several terabytes in size, so it would be pretty hard to distribute it.
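To make the "2x2 resolution increase halves the distance" point concrete, here is a rough sketch of the streaming-side mip selection math. The simplified pinhole camera model and all the constants are illustrative, not from any particular engine:

[code]
// Rough sketch of streaming-side mip selection. Doubling the texture resolution
// halves the distance at which its top mip (mip 0) actually has to be resident.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <initializer_list>

// How many texels of this texture land under one screen pixel for a surface of
// 'worldSize' meters viewed head-on at 'distance' meters.
float TexelsPerPixel(float textureRes, float worldSize, float distance,
                     float screenHeightPx, float fovY)
{
    const float pixelsCovered = (worldSize / (2.0f * distance * std::tan(fovY * 0.5f)))
                                * screenHeightPx;            // projected size in pixels
    return textureRes / std::max(pixelsCovered, 1.0f);       // texels per screen pixel
}

// Most detailed mip level the streaming system needs to keep resident.
int RequiredMip(float texelsPerPixel)
{
    return std::max(0, (int)std::floor(std::log2(std::max(texelsPerPixel, 1.0f))));
}

int main()
{
    // For a 2 m wide surface at 1080p: the 2048 texture's mip 0 is only needed at
    // about half the distance at which the 1024 texture's mip 0 is needed, etc.
    for (float res : {1024.0f, 2048.0f, 4096.0f})
        for (float dist : {1.0f, 2.0f, 4.0f, 8.0f})
        {
            const float tpp = TexelsPerPixel(res, 2.0f, dist, 1080.0f, 1.0f /*~57 deg*/);
            std::printf("res %4.0f  dist %4.1f m  -> most detailed resident mip: %d\n",
                        res, dist, RequiredMip(tpp));
        }
    return 0;
}
[/code]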
> Highly unlikely, they would have to do it for a lot of games and for a lot of architectures.

That's exactly what the Game Ready drivers are. Well, the games part. The architectures part seems to be a much muddier facet.
> Yes, advanced modern streaming techniques help a lot. However, Virtual/Sparse textures or Tiled/Reserved resources (did I miss some extra nomenclature?) are still uncommon in a lot of modern games (we can thank Terascale and pre-Broadwell GPUs for that, I guess... plus the lack of DirectX 11.2 support under Windows 7).

Software indirection (in virtual texturing) is practically free nowadays. RedLynx Trials games and id Software games already ran at a locked 60 fps on last-gen consoles. Software indirection is just a couple of extra ALU instructions, and the indirection texture read is super cache optimal (a 128x128 pixel block reads the same texel), so it is close to a 100% L1 hit.
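For anyone unfamiliar with how that indirection works, here is a CPU-side sketch of the per-pixel math the shader performs. The page size (128x128 texels), the table layout and the struct names are illustrative assumptions, not the exact values RedLynx or id use:

[code]
// CPU-side sketch of the software indirection step of virtual texturing.
#include <cstdint>

struct PageEntry            // one texel of the indirection (page table) texture
{
    uint16_t physPageX;     // page coordinates inside the physical texture cache
    uint16_t physPageY;
    uint8_t  mipScale;      // 0 = resident at full detail, k = coarser fallback page
};

struct Float2 { float x, y; };

// Translate a virtual UV into a physical-cache UV. In the shader this is the
// "couple of extra ALU" plus one indirection texture fetch: a whole 128x128 texel
// page shares a single PageEntry, so neighbouring pixels read the same indirection
// texel, which is why that read is almost always an L1 hit.
Float2 VirtualToPhysicalUV(Float2 virtualUV,
                           const PageEntry* pageTable, int pageTableSize,
                           int physCachePagesPerSide)   // pages per side of the cache
{
    const int px = (int)(virtualUV.x * pageTableSize);          // which virtual page
    const int py = (int)(virtualUV.y * pageTableSize);
    const PageEntry e = pageTable[py * pageTableSize + px];     // the indirection read

    const float scale   = (float)(1 << e.mipScale);             // coarser page fallback
    const float inPageU = virtualUV.x * pageTableSize / scale - (float)(px >> e.mipScale);
    const float inPageV = virtualUV.y * pageTableSize / scale - (float)(py >> e.mipScale);

    // Physical cache UV = page origin + offset within the page.
    return { (e.physPageX + inPageU) / physCachePagesPerSide,
             (e.physPageY + inPageV) / physCachePagesPerSide };
}
[/code]

A real implementation also stores a small border around each physical page so bilinear/anisotropic filtering doesn't bleed across page boundaries; that detail is omitted here.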
If we talk about things like VR + MSAA, bandwidth waste due to texture streaming could become a serious issue under multi-GPU.
PS: wasn't 64K the minimum granularity under Windows?
> 16K and indirect tile update.

I talked with Ola Olsson (his paper: http://www.cse.chalmers.se/~uffe/ClusteredWithShadows.pdf) at Siggraph last year and we discussed this problem. He was talking about virtual shadow mapping for local lights, and I had one slide about our sunlight virtual shadow mapping stuff. They use hardware PRT and it stalls everything. The page mapping changes are also horribly slow on OpenGL. These problems are acceptable for research papers/demos, but you can't afford the stall in real games. Everyone doing research on this topic agrees that an indirect page table update is needed. Otherwise hardware PRT loses many use cases.
This is a good request for the Santa Claus living in Redmond.
> That's exactly what the Game Ready drivers are. Well, the games part. The architectures part seems to be a much muddier facet.

Comparisons before and after Game Ready drivers don't show much of a performance impact in most cases, and the driver notes don't claim as much either. These drivers are directed towards bug fixing and support for visual features.
> As you said, hardware PRT also has too big a page size. 16 KB would be much better. Software indirection doesn't have this problem.

Why? A page fault will force you to load from disk anyway, so where's the problem? Are you talking about dGPU systems with some textures held in CPU RAM?
> You can't change the hardware PRT mappings without a CPU roundtrip, and that would either stall the GPU or add one frame of latency.

Is the CPU trap triggered by the GPU really causing enough latency to lose a frame? Again, are you talking about Windows or consoles?
> Why? A page fault will force you to load from disk anyway, so where's the problem? Are you talking about dGPU systems with some textures held in CPU RAM?

No, I was talking about the hardware PRT (tiled resources) API. On PC you don't have any other way to do sparse GPU virtual memory mappings. The page size is 64 KB in PC DirectX. Virtual memory is useful for data streaming, but it is also very useful for sparse resources, such as sparse shadow maps and volumetric data structures (sparse voxels). 64 KB page mapping granularity is too coarse for these purposes.
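To put numbers on the granularity argument, here is the arithmetic assuming a 32-bit texel format (e.g. D32_FLOAT for a shadow map). The 16 KB figure is the hypothetical finer page size being wished for, not an existing API option:

[code]
// Worked numbers: what the page mapping granularity means in texels.
#include <cstdio>

int main()
{
    const int pageBytes64K  = 64 * 1024;   // PC DirectX tiled resource tile size
    const int pageBytes16K  = 16 * 1024;   // hypothetical finer page size
    const int bytesPerTexel = 4;           // e.g. D32_FLOAT shadow map

    const int texelsPerPage64K = pageBytes64K / bytesPerTexel;  // 16384 -> a 128x128 tile
    const int texelsPerPage16K = pageBytes16K / bytesPerTexel;  //  4096 -> a  64x64 tile

    // For a sparse shadow map or sparse voxel structure, every partially-needed
    // tile must be mapped in full, so the mapping granularity directly sets the
    // minimum memory committed per touched region.
    std::printf("64 KB page: %d texels (128x128) committed per touched region\n",
                texelsPerPage64K);
    std::printf("16 KB page: %d texels (64x64) committed per touched region\n",
                texelsPerPage16K);
    return 0;
}
[/code]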
I don't exactly see the point with HSA, except, of course, if the HDD doesn't have the 64K read sequentially at hand.
> Is the CPU trap triggered by the GPU really causing enough latency to lose a frame? Again, are you talking about Windows or consoles?

There is no CPU trap in the PC DirectX API. You need to manually put page misses into an append buffer, read back the page miss buffer on the CPU (next frame) and call UpdateTileMappings to update the virtual memory page table. This is acceptable for texture streaming (virtual texturing), but it is not acceptable for dynamic GPU-generated data, such as sparse (virtual) shadow maps. We need an UpdateTileMappingsIndirect API: the GPU writes new tile mappings to a UAV and the indirect call changes the mappings accordingly.
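Here is a skeleton of that CPU round trip as it looks today (D3D12 flavour; the D3D11.2 call is analogous). Everything around the call (device/heap/fence setup, the shader that appends misses, the readback copy) is omitted, and the PageMiss struct and helper names are made up for illustration:

[code]
// Skeleton of the per-frame page-miss resolve described above.
#include <d3d12.h>
#include <cstdint>
#include <vector>

struct PageMiss { uint32_t tileX, tileY, mipLevel; };   // what the GPU appends to the UAV

// Called once per frame, one frame after the GPU wrote the miss buffer.
void ResolvePageMisses(ID3D12CommandQueue* queue,
                       ID3D12Resource* tiledTexture,            // sparse/virtual texture
                       ID3D12Heap* tilePoolHeap,                // physical tile pool
                       const std::vector<PageMiss>& misses,     // read back from the UAV
                       std::vector<UINT>& freeHeapTiles)        // simple free list
{
    for (const PageMiss& miss : misses)
    {
        if (freeHeapTiles.empty())
            break;                                   // real code would evict a tile here

        const UINT heapTile = freeHeapTiles.back();
        freeHeapTiles.pop_back();

        D3D12_TILED_RESOURCE_COORDINATE coord = {};
        coord.X = miss.tileX;
        coord.Y = miss.tileY;
        coord.Subresource = miss.mipLevel;

        D3D12_TILE_REGION_SIZE region = {};
        region.NumTiles = 1;                         // map one 64 KB tile

        const D3D12_TILE_RANGE_FLAGS rangeFlags = D3D12_TILE_RANGE_FLAG_NONE;
        const UINT rangeTileCount = 1;

        // This is the CPU round trip described above: the page table update has to
        // be issued from the CPU on the queue, so GPU-generated mappings are always
        // at least a frame late (or the GPU has to be stalled while waiting for it).
        queue->UpdateTileMappings(
            tiledTexture,
            1, &coord, &region,
            tilePoolHeap,
            1, &rangeFlags, &heapTile, &rangeTileCount,
            D3D12_TILE_MAPPING_FLAG_NONE);

        // After mapping, the texel data for this tile still has to be copied in
        // before the shader can sample it.
    }
}
[/code]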