Memory requirements for AA?

Alistair

Can anyone tell me how to calculate the memory requirement for different levels of AA?

Is it No of pixels x colour depth x something else?

Alternatively, what's the best that can be achieved with 64, 128 and 256 MB cards?


A.
 
Pixel width x pixel height x color depth x AA setting / 8 / 1024 / 1024 = MB used for the back buffer, plus pixel width x pixel height x color depth / 8 / 1024 / 1024 for the front buffer.

1600 x 1200 x 32 x 6 / 8 / 1024 / 1024 = 43.9MB for the back buffer, then 1600x1200x32 / 8 / 1024 / 1024 = 7.3MB for the front buffer, so your total is 51.2MB.

So it is possible to run 16x12x32x6 on a 64MB board, although performance would likely be horrible due to only having 12.8MB left outside of the framebuffer (5.5MB if you are tripple buffering).
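For reference, here's that arithmetic as a small program you can plug other resolutions and AA levels into (the function name and layout are just for illustration, and Z/stencil isn't included yet):

```c
#include <stdio.h>

/* Rough framebuffer memory estimate, following the formula above:
   back buffer  = width * height * bytes per pixel * AA samples,
   front buffer = width * height * bytes per pixel (display resolution). */
double fb_megabytes(int width, int height, int bits_per_pixel,
                    int aa_samples, int color_buffers)
{
    double bytes_per_pixel = bits_per_pixel / 8.0;
    double back  = (double)width * height * bytes_per_pixel * aa_samples;
    double front = (double)width * height * bytes_per_pixel;
    /* One AA-sized back buffer plus (color_buffers - 1) display-sized buffers. */
    return (back + (color_buffers - 1) * front) / (1024.0 * 1024.0);
}

int main(void)
{
    /* 1600x1200, 32bpp, 6x AA, double buffered: prints ~51.3MB,
       matching the ~51.2MB total above. */
    printf("%.1f MB\n", fb_megabytes(1600, 1200, 32, 6, 2));
    /* Triple buffered: one more display-sized buffer, ~58.6MB. */
    printf("%.1f MB\n", fb_megabytes(1600, 1200, 32, 6, 3));
    return 0;
}
```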
 
:oops: Forgot Z/W. Is there any way to calculate how much RAM that would take, though? Do the current boards always hit optimal Z compression rates?
 
Typically, lossless compression means that even if you normally get, say, 2:1 you still need to allocate enough mem for 1:1 as there are always occasions where it just fails to do anything for you.

John
 
JohnH said:
Typically, lossless compression means that even if you normally get, say, 2:1 you still need to allocate enough mem for 1:1 as there are always occasions where it just fails to do anything for you.

John

Example?
 
dominikbehr said:
every color buffer needs a depth buffer.
Meaning Ben's total would be roughly doubled (assuming a 24- or 32-bit Z-buffer)?

BTW, why does everyone on this board spell triple wrong? :)
 
And doesn't (purely) postprocessor/RAMDAC based AA require the front buffer to be the same size as the back buffer?
 
Pete said:
Meaning Ben's total would be roughly doubled (assuming a 24- or 32-bit Z-buffer)?

Yes, you need another 43.9MB, and the total becomes 95.1MB. So the 64MB cards are out, and 128MB cards have only 32.9MB left for textures.

demalion said:
And doesn't (purely) postprocessor/RAMDAC based AA require the front buffer to be the same size as the back buffer?

Yes, although I don't know any chip other than VSA-100 doing this.

By the way, the memory requirement for AA on tilers can be very different. It is possible for a tiler to use the same amount of frame buffer memory with and without AA.
 
pcchen said:
demalion said:
And doesn't (purely) postprocessor/RAMDAC based AA require the front buffer to be the same size as the back buffer?

Yes, although I don't know any chip other than VSA-100 doing this.

I assumed this was also the case with GFFX 2x AA.
 
So there is actually a use for 256 MB boards, but only with big textures, high res and a high degree of AA?

Does anyone think they're worth the extra cash over 128 MB cards, for things available this year?

(Deus Ex 2 is about the only must have I can think of, in my currently jaded state...)
 
K.I.L.E.R said:
JohnH said:
Typically, lossless compression means that even if you normally get, say, 2:1 you still need to allocate enough mem for 1:1 as there are always occasions where it just fails to do anything for you.

John

Example?

Just think about typical entropy or run-length encoding techniques: throw random data at them and they normally fail very badly. These techniques also tend to be very bad for random access.

The other thing to consider is that most techniques currently used are aimed at reducing BW, not footprint. For example, consider how some IHVs might currently be compressing the FB when MSAA is enabled: each pixel on the screen requires either one or four (for 4x) colours to represent it, depending on whether a polygon edge crosses it or not. These might be stored as a plane of base ARGB pixels plus a separate plane for the other three potential ARGB values. You maintain a separate 1 BPP plane that indicates whether each pixel holds 1 or 4 samples (you could do this by stealing one bit from one of your ARGB channels), which you use to determine whether you need to access just one sample or all four. This allows you to use 1/4 the BW on pixels that don't lie on a polygon edge, but you still need to allocate the full amount of memory up front, as if you supplied a screen full of single-pixel polygons every pixel would then lie on an edge...
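Roughly, something like this (the layout, names and field sizes are purely illustrative, not any IHV's actual format):

```c
#include <stdint.h>

/* Illustrative 4x MSAA colour layout: a base plane with one ARGB value per
   pixel, a second plane holding the other three samples, and a 1 bit-per-pixel
   flag plane saying whether a pixel actually needs all four samples. */
typedef struct {
    int       width, height;
    uint32_t *base;   /* width * height ARGB values (sample 0)              */
    uint32_t *extra;  /* width * height * 3 ARGB values (samples 1..3)      */
    uint8_t  *flags;  /* (width * height + 7) / 8 bytes: 1 = edge pixel     */
} msaa4x_fb;

int is_edge_pixel(const msaa4x_fb *fb, int x, int y)
{
    int i = y * fb->width + x;
    return (fb->flags[i >> 3] >> (i & 7)) & 1;
}

/* Fetch the samples for one pixel. Interior pixels touch only the base
   plane (1/4 of the bandwidth); edge pixels read all four samples. Both
   planes are still allocated in full up front, which is the point above:
   this saves bandwidth, not footprint. */
int read_samples(const msaa4x_fb *fb, int x, int y, uint32_t out[4])
{
    int i = y * fb->width + x;
    out[0] = fb->base[i];
    if (!is_edge_pixel(fb, x, y)) {
        out[1] = out[2] = out[3] = out[0];
        return 1;                 /* one memory access was enough */
    }
    out[1] = fb->extra[i * 3 + 0];
    out[2] = fb->extra[i * 3 + 1];
    out[3] = fb->extra[i * 3 + 2];
    return 4;
}
```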

John.
 
stevem said:
pcchen said:
demalion said:
And doesn't (purely) postprocessor/RAMDAC based AA require the front buffer to be the same size as the back buffer?

Yes, although I don't know any chip other than VSA-100 doing this.

I assumed this was also the case with GFFX 2x AA.

AFAIK yes for 2xRGMS/Quincunx on NV25-NV3x.
 
FX 5900 Ultra 256 MB and 9800 Pro 256 MB here we come :)

Hmm, I wonder how the SS modes will run on the NV35; without them the chip can't produce more than 4x AA, which to me seems like a pretty sad situation.
 
Alistair said:
So there is actually a use for 256 MB boards, but only with big textures, high res and a high degree of AA?

Does anyone think they're worth the extra cash over 128 MB cards, for things available this year?

(Deus Ex 2 is about the only must have I can think of, in my currently jaded state...)
Textures aren't everything. As polygon counts increase there's also a need to store lots of vertex data. My own little terrain demo started running out of memory even before I could upload textures. Of course, I was using a highly inefficient method that could do with some optimization... Still, with 3/4 million vertices I hit a limit on my Radeon 9700...
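As a very rough back-of-envelope (the 32-byte vertex format here is an assumption for illustration, not the demo's actual layout):

```c
#include <stdio.h>

/* How much memory 3/4 million vertices might take, assuming a 32-byte
   vertex: position (3 floats) + normal (3 floats) + one texcoord set
   (2 floats). Index data and any driver-side copies come on top of this. */
int main(void)
{
    const double vertex_count     = 750000.0;
    const double bytes_per_vertex = 32.0;
    double vb_mb = vertex_count * bytes_per_vertex / (1024.0 * 1024.0);
    printf("vertex buffer alone: %.1f MB\n", vb_mb);  /* ~22.9 MB */
    return 0;
}
```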
 
pcchen said:
Pete said:
Meaning Ben's total would be roughly doubled (assuming a 24- or 32-bit Z-buffer)?

Yes, you need another 43.9MB, and the total becomes 95.1MB. So the 64MB cards are out, and 128MB cards have only 32.9MB left for textures.
Um, once rendering is completed for a frame, there is no longer a need for a z-buffer. So, for double buffering, there is only a need for two color buffers and one z-buffer.

There may be implementations that attempt to optimize memory bandwidth usage by interleaving the frame and z-buffers at the expense of extra memory usage, but I don't see why it would be necessary to do so.

There is also the additional problem of when the framebuffer downsampling is to be done. If the hardware decides to downsample at buffer swap, then the front buffer never needs to be any higher-resolution than the display resolution. If, however, the hardware does the downsampling at scanout, then both color buffers need to be the same size.

In other words, there is some freedom in memory size requirements of FSAA in order to optimize memory bandwidth usage.

Minimum:
(# of buffers - 1 + number of samples * 2) * height * width * bit depth

Medium:
(# of buffers + 1) * (number of samples) * height * width * bit depth

Maximum:
(# of buffers) * (number of samples * 2) * height * width * bit depth

Note that the *2 in the minimum and maximum cases is for the z-buffer.

So, minimum memory requirements for 6x FSAA at 1600x1200x32 (triple buffering):
102.5MB

Medium memory requirements (same as above, no copied z-buffer):
175.8MB

Maximum memory requirements (duplicate z-buffer for each color buffer):
263.7MB
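As a cross-check, a tiny program with the same assumptions (4 bytes per colour value and 4 bytes per Z value) reproduces those three figures:

```c
#include <stdio.h>

/* The three FSAA memory cases above, in MB.
   buffers = number of colour buffers, samples = AA samples,
   4 bytes per colour value and 4 bytes per Z value assumed. */
double mb(double planes, int w, int h)
{
    return planes * w * h * 4.0 / (1024.0 * 1024.0);
}

int main(void)
{
    int w = 1600, h = 1200, buffers = 3, samples = 6;

    /* Minimum: downsample at swap, one shared sample-sized Z buffer,
       front buffers at display resolution. */
    double min = mb(buffers - 1 + samples * 2, w, h);
    /* Medium: all colour buffers at sample size, one shared Z buffer. */
    double med = mb((double)(buffers + 1) * samples, w, h);
    /* Maximum: a sample-sized Z buffer per colour buffer. */
    double max = mb((double)buffers * samples * 2, w, h);

    printf("min %.1f MB, medium %.1f MB, max %.1f MB\n", min, med, max);
    /* Prints: min 102.5 MB, medium 175.8 MB, max 263.7 MB */
    return 0;
}
```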

Just fyi, it appears ATI downsamples at buffer swap, and nVidia downsamples at scanout. This is likely the cause of the larger performance hits of FSAA at high resolutions for the FX. It's either running out of texture memory, or beginning to enable downsampling at buffer swap, reducing memory bandwidth performance.
 
Using lossy compression it should be possible to do even high-res AA within a 64MB buffer (without the use of 64-bit or 128-bit framebuffers).

Z-buffer: 1600x1200x32 x 4xAA = 30000 KB

The Z-buffer cannot be compressed, because this would give a lot of artifacts (the 8-bit stencil is not compressed either).

Framebuffers:

2 x 1600x1200x32 x 4xAA / (1.5 x 2) = 20000 KB

It should be possible to compress the framebuffers lossily without "too much" visible artifacting. Even without 4xAA it should be possible to compress the framebuffer 1.5x without a problem. With 4xAA I think an additional 2x compression should be possible.

So here I need < 50 MB for 4xAA even with the front buffer at high res. If the front buffer is smaller, the memory requirement is even lower, at < 45 MB.



Is this technique useful at all? IMHO yes, but only for low-end systems. I would like to see an embedded graphics controller like the nForce2 or the R300 using 128-bit 200MHz DDR main memory and a 128-bit 64MB DDR framebuffer (using multichip modules like in notebooks) on the back side of the mainboard. This would give a really nice and fast system.
 
Unless you guarantee a fixed compressed block size, you'll find it hard to save memory, as you need random access into it.

John.
 
Chalnoth said:
pcchen said:
Pete said:
Meaning Ben's total would be roughly doubled (assuming a 24- or 32-bit Z-buffer)?

Yes, you need another 43.9MB, and the total becomes 95.1MB. So the 64MB cards are out, and 128MB cards have only 32.9MB left for textures.
Um, once rendering is completed for a frame, there is no longer a need for a z-buffer. So, for double buffering, there is only a need for two color buffers and one z-buffer.

Of course, that was considered in my number:

back color buffer: 1600x1200x6x4 = 43.94MB
depth buffer: 1600x1200x6x4 = 43.94MB
downsampled front buffer: 1600x1200x4 = 7.32MB

Total = 43.94 + 43.94 + 7.32 = 95.2MB
 
JohnH said:
Unless you guarantee a fixed compressed block size, you'll find it hard to save memory, as you need random access into it.

There are several ways to do lossy fixed-ratio Z compression. One algorithm is to assume a limited number of fragments inside a pixel, and that all fragments inside a pixel are flat. So you just record their center Z value and their dz/dx and dz/dy values. You can use quite low precision for dz/dx and dz/dy, such as 8 bits. The stencil value is also the same for each fragment. You also need to record the coverage mask of each fragment inside a pixel. If you limit the number of fragments inside a pixel to three, you can compress quite well if you have 16 or more subsamples. Of course, if you have more than three fragments inside a pixel you'll get artifacts.
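A sketch of what such a fixed-size compressed pixel might look like for three fragments and 16 subsamples (the field widths are just one plausible choice, not a description of any real hardware):

```c
#include <stdint.h>

/* Illustrative fixed-ratio compressed Z/stencil block for one pixel:
   up to three planar fragments, 16 subsamples. */
typedef struct {
    uint32_t center_z[3];  /* full-precision Z at the pixel center         */
    int8_t   dzdx[3];      /* low-precision depth slope in x per fragment  */
    int8_t   dzdy[3];      /* low-precision depth slope in y per fragment  */
    uint8_t  stencil[3];   /* one stencil value per fragment               */
    uint16_t coverage[3];  /* 16-bit subsample coverage mask per fragment  */
    uint8_t  frag_count;   /* 1..3; more than 3 fragments -> artifacts     */
} compressed_zpixel;
/* ~28 bytes of payload (padded to 32 by the compiler) versus
   16 subsamples * 4 bytes = 64 bytes uncompressed: a fixed 2:1 ratio. */
```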
 