Will GPUs with 4GB VRAM age poorly?

HDD data chunks do not have a 1:1 mapping to GPU pages. Usually HDD pages are grouped into macro pages to reduce seeking.
In most cases I'd agree with this. However, that SSG card AMD showed off would seem to indicate that doesn't always need to be the case.
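
On the macro-page point above, here is a tiny Python sketch of the idea, with an arbitrary 1 MB macro-page size as the only assumption: scattered small reads that fall into the same macro page are served by one sequential HDD read instead of one seek each.

Code:
# Coalesce small chunk reads into "macro pages" so one HDD seek serves many requests.
MACRO_PAGE = 1 * 1024 * 1024   # assumed macro-page size: 1 MB

def macro_pages(read_offsets):
    """Map the byte offsets of requested chunks to the macro pages that must be fetched."""
    return sorted({offset // MACRO_PAGE for offset in read_offsets})

# Five scattered chunk reads collapse into two sequential macro-page reads (two seeks).
requests = [10_000, 600_000, 900_000, 1_200_000, 1_500_000]
print(macro_pages(requests))   # -> [0, 1]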
 
In most cases I'd agree with this. However, that SSG card AMD showed off would seem to indicate that doesn't always need to be the case.
That SSG card doesn't yet use memory-mapped files or flat addressing. The SSD is a separate memory pool, and you need to manually copy data to/from it.

Quote:
AMD indicated that it is not using the NAND pool as a memory-mapped space currently, but it can do so, and it is a logical progression of the technology. The company also commented that some developers are requesting a flat memory model.
 
Would adding a secondary, slower pool of RAM be a good solution?

Say 4 GB of GDDR at 7 Gbps and then 4 GB at 5 Gbps?


Anyway, I wouldn't buy a 3 GB or a 4 GB card today unless it was going into an HTPC or was a temporary card that would be replaced within a year.
 
Would adding a secondary, slower pool of RAM be a good solution?

Say 4 GB of GDDR at 7 Gbps and then 4 GB at 5 Gbps?
Commonly in games the GPU accesses pretty much the same memory regions (cache lines) on two consecutive frames. The camera and objects must move smoothly to create the illusion of movement (animation), so most of the vertex and texture data accessed is the same as was accessed one frame earlier. All temporary data such as render targets stays at the same memory addresses for the whole application lifetime. So it is safe to assume that less than 10% of the data set (the cache lines accessed) changes per frame (at 60 fps).

Let's assume our high-end GPU has 300 GB/s of bandwidth and we are rendering at 60 fps. Typically achievable GDDR bandwidth is around 80% of the maximum, so we can access 240 GB per second, or 4 GB per frame. However, not all memory accesses are unique. Most of the bandwidth-intensive data, such as render targets, is accessed multiple times every frame, and only part of the frame is memory bandwidth bound (full memory bandwidth is only utilized when the GPU is bandwidth bound). So it is safe to assume that no more than 2 GB of unique data can be accessed per frame (on this 300 GB/s GPU).

A GPU with 2 GB of fast HBM2 (as a last-level cache) and slow DDR as main memory would likely work just fine. With the numbers above, the DDR only needs to provide 2 GB * 10% = 200 MB of new data per frame, so we need 60 fps * 200 MB = 12 GB/s of DDR bandwidth. A slow single-channel DDR setup would be just fine. It's worth remembering that these calculations assume we are competing with 300 GB/s of bandwidth. If you want to match a 600 GB/s unified memory system (running a game with 2x the bandwidth requirements and 2x the unique data), you'd need 4 GB of cache and 24 GB/s of main memory bandwidth (dual-channel LPDDR/DDR). All numbers are of course rough estimates, but it is clear that we need another level in the memory hierarchy soon (the new Xeon Phi already has it). It is a big waste to have 16+ GB of ultra-fast memory if you can only access a tiny fraction of it every frame.
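
To sanity-check the arithmetic, here is a minimal Python sketch of the same numbers; the 80% efficiency, the 50% unique-access share and the 10% per-frame working-set change are the rough assumptions used above, not measured values.

Code:
# Back-of-the-envelope numbers for a hypothetical 300 GB/s GPU with a small fast cache.
PEAK_BW_GBPS = 300.0         # advertised GDDR bandwidth of the example GPU
EFFICIENCY = 0.80            # typically achievable fraction of the peak
FPS = 60.0
UNIQUE_FRACTION = 0.5        # assume at most half of the accessed bytes are unique
WORKING_SET_CHANGE = 0.10    # assume <10% of the accessed cache lines change per frame

achievable_bw = PEAK_BW_GBPS * EFFICIENCY                   # 240 GB/s
accessed_per_frame = achievable_bw / FPS                    # 4 GB accessed per frame
unique_per_frame = accessed_per_frame * UNIQUE_FRACTION     # ~2 GB of unique data
new_data_per_frame = unique_per_frame * WORKING_SET_CHANGE  # ~0.2 GB = 200 MB of new data
required_slow_bw = new_data_per_frame * FPS                 # 12 GB/s needed from the slow pool

print(f"Achievable bandwidth:  {achievable_bw:.0f} GB/s")
print(f"Accessed per frame:    {accessed_per_frame:.1f} GB")
print(f"Unique per frame:      {unique_per_frame:.1f} GB")
print(f"New data per frame:    {new_data_per_frame * 1000:.0f} MB")
print(f"Required slow-pool BW: {required_slow_bw:.0f} GB/s")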
 
Commonly in games the GPU accesses pretty much the same memory regions (cache lines) on two consecutive frames. [...] It is a big waste to have 16+ GB of ultra-fast memory if you can only access a tiny fraction of it every frame.
Wouldn't it be better to have two different memory pools, making it easier to place bandwidth-critical resources in the faster pool? Or do you think this could simply be managed by the driver? Also, what about games that need a high frame rate, like VR (90+ fps) and online shooters? Moreover, could this create additional headaches in multi-GPU configurations?
 
Wouldn't it be better to have two different memory pools, making it easier to place bandwidth-critical resources in the faster pool? Or do you think this could simply be managed by the driver? Also, what about games that need a high frame rate, like VR (90+ fps) and online shooters? Moreover, could this create additional headaches in multi-GPU configurations?
I was suggesting a fully automated last-level memory cache. 64-byte cache line granularity would of course be too expensive; maybe it could work with bigger cache lines, or residency could simply be tracked at page granularity. Basically the GPU page faults on each memory read that is not in the fast memory, and those pages are immediately copied from the slow memory to the fast memory. If this GPU had fast, latency-optimized access to main memory (faster than PCIe), the stalls caused by page faults would be short enough to handle by pre-emption, parallel work queues, and simply by having more threads in flight at once. So it would look like one huge continuous memory space to the programmer (and you could use CPU pointers directly, as it would look like UMA).
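
A rough sketch of what that page-granularity residency could look like, assuming a made-up 64 KB page size and a simple LRU policy; it only models which pages are resident in the fast memory, not the pre-emption or latency hiding.

Code:
from collections import OrderedDict

PAGE_SIZE = 64 * 1024            # assumed residency granularity: 64 KB pages
FAST_MEM_BYTES = 2 * 1024**3     # 2 GB of HBM acting as the last-level cache

class PageCache:
    """Automatic residency: fault pages from slow to fast memory, evict least recently used."""
    def __init__(self, capacity_bytes):
        self.capacity_pages = capacity_bytes // PAGE_SIZE
        self.resident = OrderedDict()   # page index -> True, ordered by recency
        self.faults = 0

    def access(self, address):
        page = address // PAGE_SIZE
        if page in self.resident:
            self.resident.move_to_end(page)         # hit: refresh recency
        else:
            self.faults += 1                        # miss: "page fault", copy page from slow memory
            if len(self.resident) >= self.capacity_pages:
                self.resident.popitem(last=False)   # evict the least recently used page
            self.resident[page] = True

cache = PageCache(FAST_MEM_BYTES)
for addr in (0, 100, PAGE_SIZE * 5, PAGE_SIZE * 5 + 8, 0):
    cache.access(addr)
print(f"Page faults: {cache.faults}, bytes migrated: {cache.faults * PAGE_SIZE // 1024} KB")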
 
Would adding a secondary, slower pool of RAM be a good solution?

Say 4 GB of GDDR at 7 Gbps and then 4 GB at 5 Gbps?
Even if it could work, how much money would you save?

The price difference between 7 Gbps and 5 Gbps memory is much smaller than the difference between 4 GB and 8 GB.
 
Nvidia GeForce GTX 1060 3GB vs 6GB review
Is three gigs of VRAM enough for top-tier 1080p60 gameplay?
Going back to our GTX 1080 review, we were pleasantly surprised to see how well the old GTX 780 Ti held up on our modern benchmarking suite bearing in mind its 3GB of VRAM. The new GTX 1060 3GB has the same amount of memory but an additional two generations' worth of memory compression optimisations - the end result is that three gigs is indeed enough for top-tier 1080p60 gameplay - as long as you stay away from memory hogs like MSAA (which tends to kill frame-rate) along with 'HQ/HD' texture packs and extreme resolution texture options. By and large, the visual impact of these options at 1080p is rather limited anyway - generally speaking, they're designed for 4K screens.
http://www.eurogamer.net/articles/digitalfoundry-2016-nvidia-geforce-gtx-1060-3gb-vs-6gb-review

Edit: Added video link from review.
 
What I don't get is why people always throw memory compression into a discussion that should be about memory capacity. Memory compression is about reducing bandwidth requirements, not storage requirements. What reduces storage is, for example, texture compression, which is (mainly) a function of the content side of things, or tessellation, which with proper use could minimize the PCIe transfer requirements for geometry. IIRC it was even conceived with that in mind (Endless City), not for high-poly roadblocks (Crysis 2).
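
A minimal sketch with rough numbers to illustrate the distinction; the 4K texture, BC1's 4 bits per texel and the 1080p RGBA8 render target are just illustrative assumptions.

Code:
# Block texture compression (e.g. BC1) genuinely shrinks the allocation, while
# framebuffer/delta colour compression does not: the full-size surface stays
# allocated so that any compressed tile can still expand to its full footprint.

def surface_bytes(width, height, bits_per_texel):
    return width * height * bits_per_texel // 8

uncompressed = surface_bytes(4096, 4096, 32)   # RGBA8 texture: 32 bits per texel
bc1 = surface_bytes(4096, 4096, 4)             # BC1 texture: 4 bits per texel (8:1 vs RGBA8)
render_target = surface_bytes(1920, 1080, 32)  # RGBA8 render target: bandwidth compression
                                               # may reduce traffic, but not this allocation

print(f"4096x4096 RGBA8 texture: {uncompressed / 2**20:.0f} MiB")
print(f"4096x4096 BC1 texture:   {bc1 / 2**20:.0f} MiB")
print(f"1920x1080 RGBA8 render target (with or without DCC): {render_target / 2**20:.1f} MiB")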
 
What I don't get is why people always throw memory compression into a discussion that should be about memory capacity. Memory compression is about reducing bandwidth requirements, not storage requirements.
Because they don't know what they are talking about, and because memory compression does in some cases pop up as requiring extra memory (if, say, the TMUs can't read what the ROPs wrote, the surface has to be decompressed by the driver).
 
BF1 now joins the bunch: 4K gameplay is unsustainable on the Fury X, and the RX 480 8GB actually has higher fps than the Fury X!
[Benchmark charts: Battlefield 1 at 3840x2160 (bf1_3840_11.png, bf1_3840_12.png)]

http://gamegpu.com/action-/-fps-/-tps/battlefield-1-open-beta-test-gpu
 
What I don't get is why people always throw memory compression into a discussion that should be about memory capacity. Memory compression is about reducing bandwidth requirements, not storage requirements. [...]

Normally, Digital Foundry's better than this, but they've kinda been slipping some the last... I think four years? So, mistakes like this get made more often by them. :(
 
That's a shame, because I really was under the impression that DF actually knew what they were talking about and this maybe was just some incidental twist in otherwise thorough articles.
 
By and large they're still accurate, but it's about an 80-90% hit rate when it used to be higher than that. If you want to see some good recent material from them, check out their DF Retro vids. They're pretty damned awesome.
 
Normally, Digital Foundry's better than this, but they've kinda been slipping some the last... I think four years? So, mistakes like this get made more often by them. :(

Oof. It's weird too since he also mentioned the driver's memory management in the one spot then... derp.
 