Well, so the math:
1920x1080 x 4 (RGBA) x 4 (MSAA) = 122 MB.
3840x2160 x 4 (RGBA) x 4 (MSAA) = 488 MB.
Difference: 366 MB. Multiply by 3 for triple buffering: 1098 MB.
Add another 366 MB for the Z-buffer, and you're at ~1.5 GB extra with identical textures.
I'm sure there are other buffers (G-buffer, stencil, whatever) for stuff that I don't know anything about, and my 3x multiplier is probably pessimistic (maybe they resolve the MSAA buffer down to a non-MSAA one at the end of the frame?), but you get the idea. And some expert will probably point out all my mistakes...

I am pretty sure your calculations are wrong...
> I am pretty sure your calculations are wrong...

Yes. I had a factor of 4 extra!
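For reference, here is the same arithmetic with that extra factor of 4 removed, as a small self-contained C++ check (the inputs are exactly the ones from the original post):

```cpp
// Sanity check of the frame buffer math above: 4 bytes per texel (RGBA8),
// 4x MSAA (one stored color sample per MSAA sample), triple buffering,
// plus a 32-bit Z-buffer, as in the original post.
#include <cstdio>

int main() {
    const double MiB = 1024.0 * 1024.0;
    const long long fhd = 1920LL * 1080 * 4 * 4;   // 1080p color target, 4x MSAA
    const long long uhd = 3840LL * 2160 * 4 * 4;   // 4K color target, 4x MSAA

    std::printf("1080p: %.1f MiB, 4K: %.1f MiB\n", fhd / MiB, uhd / MiB);  // ~31.6 / ~126.6
    std::printf("delta: %.1f MiB, x3 buffers: %.1f MiB\n",
                (uhd - fhd) / MiB, (uhd - fhd) * 3 / MiB);                 // ~94.9 / ~284.8
    std::printf("plus Z-buffer delta: %.1f MiB extra total\n",
                (uhd - fhd) * 4 / MiB);                                    // ~379.7
    return 0;
}
```

With the correction, 4K costs roughly 0.37 GiB more than 1080p for these buffers, not ~1.5 GB.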
> Let's remove the MSAA to simplify the calculation. Anyway, the mask pattern is not a fixed constant (i.e. 2x, 4x, 8x) to multiply the frame buffer by.

They are a fixed constant. We're talking about memory allocation, not BW usage.
> Let's also remove anything related to frame-buffer compression (like AMD GCN 1.2 delta compression) to simplify the calculation even more.

Ignoring compression is exactly the right thing to do, because compression only saves BW. It doesn't save on memory usage.
> Now let's look at a 4K frame buffer queue:
> - at 3840x2160 we have 8,294,400 texels;
> - with an r8g8b8a8 frame buffer format we multiply the texels by 4 bytes (32 bits) and get a 33,177,600-byte buffer, i.e. 32,400 KiB (kibibytes) or ~32 MiB (mebibytes);
> - 32 MiB across a triple frame buffer queue is ~95 MiB.

Correct.
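Those numbers check out; here is the same arithmetic expressed as compile-time checks:

```cpp
// The 4K queue arithmetic above, verified at compile time (C++11).
static_assert(3840LL * 2160 == 8294400LL, "texels at 4K");
static_assert(8294400LL * 4 == 33177600LL, "bytes for one r8g8b8a8 buffer");
static_assert(33177600LL / 1024 == 32400LL, "KiB per buffer");
// 3 buffers: 97,200 KiB = ~94.9 MiB, i.e. the "~95 MiB" above.
int main() { return 0; }
```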
> Depth and stencil buffers are implementation dependent; on GCN they are separate (no matter the format used)...

Yes, that's why I didn't include them in my calculation.
> ...and the depth buffer should be a 32-bit buffer (I don't remember how the stencil buffer is backed); on GeForce (at least on Fermi and Kepler) they should be packed together.

There's 1 Z value per fragment. I believe Z-buffers are 32 bits per pixel, so it's basically the same as your RGBA8888, except that you don't need to keep the Z-buffer 3 times for the triple buffering. But the multiplication factor remains for MSAA.
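As a sketch of what the common depth formats cost at 4K with 4x MSAA (bytes per sample taken from the DXGI format definitions; the actual layout is implementation dependent, as noted above):

```cpp
// Rough depth/stencil memory at 4K with 4x MSAA for common DXGI formats.
// Real GPUs add padding/metadata, so treat these as lower bounds.
#include <cstdio>

int main() {
    const double MiB = 1024.0 * 1024.0;
    const double samples = 3840.0 * 2160.0 * 4;  // pixels x 4x MSAA

    struct Fmt { const char* name; int bytesPerSample; };
    const Fmt fmts[] = {
        { "D32_FLOAT (depth only)",     4 },  // 32-bit float depth
        { "D24_UNORM_S8_UINT (packed)", 4 },  // 24-bit depth + 8-bit stencil
        { "D32_FLOAT_S8X24_UINT",       8 },  // 32-bit depth, stencil padded to 64 bits
    };
    for (const Fmt& f : fmts)
        std::printf("%-28s %7.1f MiB\n", f.name, samples * f.bytesPerSample / MiB);
    return 0;
}
```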
> The rest is for all the other buffers (index buffers, vertex buffers, textures, samplers, descriptors, constant buffers, structured buffers and so on). Textures in games are always block compressed; the format and the compression rate vary...

All of those should be resolution independent, so they don't matter if you want to calculate the delta between one resolution and the other.
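To make the block-compression point concrete: BC formats store a fixed byte count per 4x4 texel block, so sizes are exact. A sketch for a hypothetical 4096x4096 texture (BC1 and BC7 at their standard 8 and 16 bytes per block):

```cpp
// Block-compressed texture sizes: BC1 packs each 4x4 texel block into
// 8 bytes (0.5 B/texel), BC7 into 16 bytes (1 B/texel). A full mip
// chain adds roughly a third on top of the base level.
#include <cstdio>

int main() {
    const double MiB = 1024.0 * 1024.0;
    const long long w = 4096, h = 4096;
    const long long blocks = (w / 4) * (h / 4);

    std::printf("uncompressed RGBA8: %5.1f MiB\n", w * h * 4 / MiB);   // 64.0 MiB
    std::printf("BC1:                %5.1f MiB\n", blocks * 8 / MiB);  //  8.0 MiB
    std::printf("BC7:                %5.1f MiB\n", blocks * 16 / MiB); // 16.0 MiB
    return 0;
}
```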
Star Citizen @1080p at the highest settings can need 3-3.5 GB depending on the environment and the number of ships in the area. Ships with 7 material layers have been demoed.
Chris Roberts has said Star Marine will require even more memory at the highest settings than Star Citizen does.
That's not even talking about 3-4K settings for these games.
Ninja re-edit.
To be clear (which I forgot to put in my previous post), for now (this year and probably next) 4 GB isn't going to kill you; consoles currently give devs more than that amount. But as soon as they give dev apps more RAM, like 5-6 GB, you'll be swapping textures in over PCI-Express 3.0 on an AMD 4 GB Fury (or whatever), and that will kill your framerate. For an immediate purchase and usage, though, it's not going to be too limiting; frame buffers even at 4K probably aren't going to put you over.
6-8 GB (depending on your bus width) is really better if you intend to keep your card around for a few years or more.
Honestly, I'll be surprised if more than 1% of games in 2-3 years effectively require more than 4 GB of memory. We're still going to be mostly limited by console requirements, which means games that exceed that are going to be exceedingly rare. Unlike PCs, consoles also have to fit what would be system RAM and VRAM into the same pool of memory. So it's unlikely a game on console will exceed 4 GB of VRAM usage.
> Yes, you can run into limitations with carefully manufactured tests now, but then what does it get you? Even the Titan X with 12 GB of memory can't even reach 30 FPS in a situation where it can actually use more than 4 GB of memory (http://www.anandtech.com/show/9306/the-nvidia-geforce-gtx-980-ti-review/13).

Considering how well the 295 X2 is doing, the test was not very capacity demanding. At least as far as average framerate goes, but that is not going to capture the low points caused by data swapping. So this didn't really prove anything.
> Remember when 2 GB cards launched and 1 GB cards were considered idiotic to buy, because even though no games used 2 GB on the video card it was a way to future-proof? And by the time 2 GB was actually used in games, all those 2 GB cards that launched back then were too slow to actually run those games with max IQ? Yeah, that future-proofing worked really well.

I don't know exactly which cards you are talking about. I agree that the ideal capacity is determined by the GPU's abilities, so we should state which ones we are talking about to be able to judge. Obviously there are SKUs with too much memory that won't be used, but are you saying there aren't also those that suffer from too little? The 970 affair showed that a card can get to 3.5 gigs rather easily, so why shouldn't we expect significantly faster chips to use over 4?
> Of course, perception is always different from reality. The perception that more than 4 GB will make a current video card future-proof will discourage people from getting the Radeon Fury. The reality is that by the time it's a serious limitation, it wouldn't be fast enough to run those games to the satisfaction of anyone needing more than 4 GB. Likewise, by the time the Titan X can take advantage of the mass of memory it has in games, it'll be too slow to satisfy the people that bought it in the first place.

If perception always differs, how do you assess reality? The capacity of the Titan X was probably not determined by gaming needs. The 980 Ti's 6 GB looks like the sweet spot to me. Chips as powerful as GM200 and Fiji will definitely find good cases for more than 4 GB. It won't be always, and unfortunately probably only after the reviewers able to measure it have turned their attention to other products, but it will happen.
> They are a fixed constant. We're talking about memory allocation, not BW usage.

On SSAA the mask pattern should be a fixed constant. MSAA is sort of an optimization of SSAA, and the mask pattern is a fixed constant only in a hypothetical worst case, except for the depth/z-buffer. MJP wrote a good article about MSAA and other anti-aliasing methods: https://mynameismjp.wordpress.com/2012/10/24/msaa-overview/
> On SSAA the mask pattern should be a fixed constant. [...] MJP wrote a good article about MSAA and other anti-aliasing methods: https://mynameismjp.wordpress.com/2012/10/24/msaa-overview/

That's a great article, but it doesn't contradict my claim: you always need to reserve as much memory as the worst case.
> That's a great article, but it doesn't contradict my claim: you always need to reserve as much memory as the worst case.

My bad, you are right. I should spend more time with the profiler, I guess xD
> That's a great article, but it doesn't contradict my claim: you always need to reserve as much memory as the worst case.

It will be interesting to see if Fiji supports memory paging, and whether it can be used effectively or not. In that case, reserving is not a burden at all.
> It will be interesting to see if Fiji supports memory paging, and whether it can be used effectively or not. In that case, reserving is not a burden at all.

How common is the case where you allocate a frame buffer and don't render the full image? I don't think that ever happens, and that'd be the only case where paging wouldn't result in a ridiculous drop in performance.
> It will be interesting to see if Fiji supports memory paging [...]

All WDDM 2.0 devices must support VRAM paging, since they must support virtual memory: https://msdn.microsoft.com/en-us/library/dn894173.aspx
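For illustration, a minimal sketch of how an application can cooperate with that paging model through the D3D12 residency API (Evict/MakeResident); the 64 MiB buffer is made up for the example and error handling is omitted:

```cpp
// Minimal D3D12 residency sketch (Windows, link with d3d12.lib).
// Evict() tells the OS the heap backing a resource may be paged out of
// VRAM; MakeResident() pages it back in (synchronously) before reuse.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    // A hypothetical 64 MiB buffer in video memory (default heap).
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_DEFAULT;

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = 64ull * 1024 * 1024;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    ComPtr<ID3D12Resource> buffer;
    device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &desc,
                                    D3D12_RESOURCE_STATE_COMMON, nullptr,
                                    IID_PPV_ARGS(&buffer));

    // Idle for a while? Let the memory manager page it out under pressure...
    ID3D12Pageable* pageables[] = { buffer.Get() };
    device->Evict(1, pageables);

    // ...and bring it back before the GPU touches it again.
    device->MakeResident(1, pageables);
    return 0;
}
```

Under WDDM 2.0 the memory manager can also demote residency on its own under pressure; the explicit calls just let the application steer what gets paged.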