Well, so the math:
1920x1080 x 4 (RGBA) x 4 (MSAA) = 122 MB.
3840x2160 x 4 (RGBA) x 4 (MSAA) = 488 MB.
Difference: 366 MB. Multiply by 3 for triple buffering: 1098 MB.
Add another 366 MB for the Z-buffer, and you're at ~1.5 GB extra with identical textures.
I'm sure there are other buffers (G-buffer, stencil, whatever) for stuff that I don't know anything about, and my 3x multiplier is probably pessimistic (maybe they resolve the MSAA buffer down to a non-MSAA one at the end of the frame?), but you get the idea. And some expert will probably point out all my mistakes...

I am pretty sure your calculations are wrong...
> I am pretty sure your calculations are wrong...

Yes. I had a factor of 4 extra!
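For reference, here is the same arithmetic with that extra factor of 4 removed, as a small self-contained C++ check (the inputs are exactly the ones from the original post):

```cpp
// Sanity check of the frame buffer math above: 4 bytes per texel (RGBA8),
// 4x MSAA (one stored color sample per MSAA sample), triple buffering,
// plus a 32-bit Z-buffer, as in the original post.
#include <cstdio>

int main() {
    const double MiB = 1024.0 * 1024.0;
    const long long fhd = 1920LL * 1080 * 4 * 4;   // 1080p color target, 4x MSAA
    const long long uhd = 3840LL * 2160 * 4 * 4;   // 4K color target, 4x MSAA

    std::printf("1080p: %.1f MiB, 4K: %.1f MiB\n", fhd / MiB, uhd / MiB);  // ~31.6 / ~126.6
    std::printf("delta: %.1f MiB, x3 buffers: %.1f MiB\n",
                (uhd - fhd) / MiB, (uhd - fhd) * 3 / MiB);                 // ~94.9 / ~284.8
    std::printf("plus Z-buffer delta: %.1f MiB extra total\n",
                (uhd - fhd) * 4 / MiB);                                    // ~379.7
    return 0;
}
```

With the correction, 4K costs roughly 0.37 GiB more than 1080p for these buffers, not ~1.5 GB.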
> Let's remove the MSAA to simplify the calculation. Anyway, the mask pattern is not a fixed constant (i.e. 2x, 4x, 8x) to multiply the frame buffer by.

They are a fixed constant. We're talking about memory allocation, not BW usage.
> Let's also remove anything related to frame-buffer compression (like AMD GCN 1.2 delta compression) to simplify the calculation even more.

Ignoring compression is exactly the right thing to do, because compression only saves BW. It doesn't save on memory usage.
> Now let's look at a 4K frame buffer queue:
> - at 3840x2160 we have 8,294,400 texels;
> - with an r8g8b8a8 frame buffer format we multiply the texels by 4 bytes (32 bits) and get a 33,177,600-byte buffer, i.e. 32,400 KiB (kibibytes) or ~32 MiB (mebibytes);
> - 32 MiB across a triple frame buffer queue is ~95 MiB.

Correct.
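Those numbers check out; here is the same arithmetic expressed as compile-time checks:

```cpp
// The 4K queue arithmetic above, verified at compile time (C++11).
static_assert(3840LL * 2160 == 8294400LL, "texels at 4K");
static_assert(8294400LL * 4 == 33177600LL, "bytes for one r8g8b8a8 buffer");
static_assert(33177600LL / 1024 == 32400LL, "KiB per buffer");
// 3 buffers: 97,200 KiB = ~94.9 MiB, i.e. the "~95 MiB" above.
int main() { return 0; }
```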
> Depth and stencil buffers are implementation dependent; on GCN they are separate (no matter the format used)...

Yes, that's why I didn't include them in my calculation.
> ...and the depth buffer should be a 32-bit buffer (I don't remember how the stencil buffer is backed); on GeForce (at least on Fermi and Kepler) they should be packed together.

There's 1 Z value per fragment. I believe Z-buffers are 32 bits per pixel, so it's basically the same as your RGBA8888, except that you don't need to keep the Z-buffer 3 times for the triple buffering. But the multiplication factor remains for MSAA.
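As a sketch of what the common depth formats cost at 4K with 4x MSAA (bytes per sample taken from the DXGI format definitions; the actual layout is implementation dependent, as noted above):

```cpp
// Rough depth/stencil memory at 4K with 4x MSAA for common DXGI formats.
// Real GPUs add padding/metadata, so treat these as lower bounds.
#include <cstdio>

int main() {
    const double MiB = 1024.0 * 1024.0;
    const double samples = 3840.0 * 2160.0 * 4;  // pixels x 4x MSAA

    struct Fmt { const char* name; int bytesPerSample; };
    const Fmt fmts[] = {
        { "D32_FLOAT (depth only)",     4 },  // 32-bit float depth
        { "D24_UNORM_S8_UINT (packed)", 4 },  // 24-bit depth + 8-bit stencil
        { "D32_FLOAT_S8X24_UINT",       8 },  // 32-bit depth, stencil padded to 64 bits
    };
    for (const Fmt& f : fmts)
        std::printf("%-28s %7.1f MiB\n", f.name, samples * f.bytesPerSample / MiB);
    return 0;
}
```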
> The rest is for all the other buffers (index buffers, vertex buffers, textures, samplers, descriptors, constant buffers, structured buffers and so on). Textures in games are always block compressed; the format and the compression rate vary...

All of those should be resolution independent, so they don't matter if you want to calculate the delta between one resolution and the other.
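To make the block-compression point concrete: BC formats store a fixed byte count per 4x4 texel block, so sizes are exact. A sketch for a hypothetical 4096x4096 texture (BC1 and BC7 at their standard 8 and 16 bytes per block):

```cpp
// Block-compressed texture sizes: BC1 packs each 4x4 texel block into
// 8 bytes (0.5 B/texel), BC7 into 16 bytes (1 B/texel). A full mip
// chain adds roughly a third on top of the base level.
#include <cstdio>

int main() {
    const double MiB = 1024.0 * 1024.0;
    const long long w = 4096, h = 4096;
    const long long blocks = (w / 4) * (h / 4);

    std::printf("uncompressed RGBA8: %5.1f MiB\n", w * h * 4 / MiB);   // 64.0 MiB
    std::printf("BC1:                %5.1f MiB\n", blocks * 8 / MiB);  //  8.0 MiB
    std::printf("BC7:                %5.1f MiB\n", blocks * 16 / MiB); // 16.0 MiB
    return 0;
}
```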
Star Citizen @1080p at the highest settings can need 3-3.5 GB depending on the environment and the number of ships in the area. Ships with 7 material layers have been demoed.
Chris Roberts has said Star Marine will require even more memory at the highest settings than Star Citizen does.
That's not even talking about 3-4K settings for these games.
Ninja re-edit.
To be clear (which I forgot to put in my previous post), for now (this year and probably next) 4 GB isn't going to kill you; consoles currently give devs more than that amount. But as soon as they give dev apps more RAM, like 5-6 GB, you'll be swapping textures in over PCI-Express 3.0 on an AMD 4 GB Fury (or whatever), and that will kill your framerate. For an immediate purchase and usage, though, it's not going to be too limiting; frame buffers even at 4K probably aren't going to put you over.
6-8 GB (depending on your bus width) is really better if you intend to keep your card around for a few years or more.
Honestly, I'll be surprised if more than 1% of games in 2-3 years effectively require more than 4 GB of memory. We're still going to be mostly limited by console requirements, which means games that exceed that are going to be exceedingly rare. Unlike PCs, consoles also have to fit what would be system RAM and VRAM into the same pool of memory. So it's unlikely a game on console will exceed 4 GB of VRAM usage.
> Yes, you can run into limitations with carefully manufactured tests now, but then what does it get you? Even the Titan X with 12 GB of memory can't even reach 30 FPS in a situation where it can actually use more than 4 GB of memory (http://www.anandtech.com/show/9306/the-nvidia-geforce-gtx-980-ti-review/13).

Considering how well the 295 X2 is doing, the test was not very capacity demanding. At least as far as average framerate goes, but that is not going to capture the low points caused by data swapping. So this didn't really prove anything.
> Remember when 2 GB cards launched and 1 GB cards were considered idiotic to buy, because even though no games used 2 GB on the video card it was a way to future-proof? And by the time 2 GB was actually used in games, all those 2 GB cards that launched back then were too slow to actually run those games with max IQ? Yeah, that future-proofing worked really well.

I don't know exactly which cards you are talking about. I agree that the ideal capacity is determined by the GPU's abilities, so we should state which ones we are talking about to be able to judge. Obviously there are SKUs with too much memory that won't be used, but are you saying there aren't also those that suffer from too little? The 970 affair showed that a card can get to 3.5 gigs rather easily, so why shouldn't we expect significantly faster chips to use over 4?
> Of course, perception is always different from reality. The perception that more than 4 GB will make a current video card future-proof will discourage people from getting the Radeon Fury. The reality is that by the time it's a serious limitation, it wouldn't be fast enough to run those games to the satisfaction of anyone needing more than 4 GB. Likewise, by the time the Titan X can take advantage of the mass of memory it has in games, it'll be too slow to satisfy the people that bought it in the first place.

If perception always differs, how do you assess reality? The capacity of the Titan X was probably not determined by gaming needs. The 980 Ti's 6 GB looks like the sweet spot to me. Chips as powerful as GM200 and Fiji will definitely find good cases for more than 4 GB. It won't be always, and unfortunately probably only after the reviewers able to measure it have turned their attention to other products, but it will happen.
> They are a fixed constant. We're talking about memory allocation, not BW usage.

On SSAA the mask pattern should be a fixed constant. MSAA is sort of an optimization of SSAA, and the mask pattern is a fixed constant only in a hypothetical worst case, except for the depth/z-buffer. MJP wrote a good article about MSAA and other anti-aliasing methods: https://mynameismjp.wordpress.com/2012/10/24/msaa-overview/
> On SSAA the mask pattern should be a fixed constant. [...] MJP wrote a good article about MSAA and other anti-aliasing methods: https://mynameismjp.wordpress.com/2012/10/24/msaa-overview/

That's a great article, but it doesn't contradict my claim: you always need to reserve as much memory as the worst case.
> That's a great article, but it doesn't contradict my claim: you always need to reserve as much memory as the worst case.

My bad, you are right. I should spend more time with the profiler, I guess xD
> That's a great article, but it doesn't contradict my claim: you always need to reserve as much memory as the worst case.

It will be interesting to see if Fiji supports memory paging, and whether it can be used effectively or not. In that case, reserving is not a burden at all.
> It will be interesting to see if Fiji supports memory paging, and whether it can be used effectively or not. In that case, reserving is not a burden at all.

How common is the case where you allocate a frame buffer and don't render the full image? I don't think that ever happens, and that'd be the only case where paging wouldn't result in a ridiculous drop in performance.
> It will be interesting to see if Fiji supports memory paging [...]

All WDDM 2.0 devices must support VRAM paging, since they must support virtual memory: https://msdn.microsoft.com/en-us/library/dn894173.aspx
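For illustration, a minimal sketch of how an application can cooperate with that paging model through the D3D12 residency API (Evict/MakeResident); the 64 MiB buffer is made up for the example and error handling is omitted:

```cpp
// Minimal D3D12 residency sketch (Windows, link with d3d12.lib).
// Evict() tells the OS the heap backing a resource may be paged out of
// VRAM; MakeResident() pages it back in (synchronously) before reuse.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    // A hypothetical 64 MiB buffer in video memory (default heap).
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_DEFAULT;

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = 64ull * 1024 * 1024;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    ComPtr<ID3D12Resource> buffer;
    device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &desc,
                                    D3D12_RESOURCE_STATE_COMMON, nullptr,
                                    IID_PPV_ARGS(&buffer));

    // Idle for a while? Let the memory manager page it out under pressure...
    ID3D12Pageable* pageables[] = { buffer.Get() };
    device->Evict(1, pageables);

    // ...and bring it back before the GPU touches it again.
    device->MakeResident(1, pageables);
    return 0;
}
```

Under WDDM 2.0 the memory manager can also demote residency on its own under pressure; the explicit calls just let the application steer what gets paged.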