The math

rAvEN^Rd

Help me do the math, guys! I snatched this from another thread.

"Well even 1024x768x32 w/4X takes up 40.5MB for the framebuffer."

How is this calculated?
 
back buffer:
1024 horz * 768 vert * 4 samples/pixel * 4 bytes per pixel (32-bit color) * 2 (24-bit Z-buffer, 8-bit stencil) = 25MB

front buffer:
1024 horz * 768 vert * 4 bytes per pixel = 3MB

total = 28MB

I don't know where the other 12 comes from. Triple buffering?
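The breakdown above can be checked with a few lines of Python. This is a minimal sketch assuming, as the calculation does, a back buffer that stores 32-bit color plus packed 24-bit Z/8-bit stencil for every sample and a resolved (downsampled) front buffer; `buffer_bytes` is a hypothetical helper name. It gives 27 MiB, about 28.3 MB in decimal units, so the 28 figure holds either way and the missing 12 is still unexplained.

```python
MIB = 2**20  # binary megabyte (MiB)

def buffer_bytes(width, height, bytes_per_sample, samples=1):
    """Bytes for one buffer holding `samples` entries per pixel."""
    return width * height * bytes_per_sample * samples

# Back buffer: 4 samples per pixel, each with 4 bytes of color (RGBA8)
# plus 4 bytes of packed depth/stencil (24-bit Z + 8-bit stencil).
back = buffer_bytes(1024, 768, bytes_per_sample=4 + 4, samples=4)

# Front buffer: one resolved 32-bit color value per pixel.
front = buffer_bytes(1024, 768, bytes_per_sample=4)

total = back + front
print(back / MIB, front / MIB, total / MIB)  # 24.0 3.0 27.0
print(round(total / 10**6, 1), "MB")         # 28.3 MB
```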
 
rAvEN^Rd said:
Help me do the math, guys! I snatched this from another thread.

"Well even 1024x768x32 w/4X takes up 40.5MB for the framebuffer."

How is this calculated?

Incorrectly! :)

10x7x32bpp is 3MBytes. If you say that 4x AA requires 6 times the storage (1 front buffer, 1 back buffer, and 4 AA sample buffers; really crude, but...), then that's 18MBytes. Even if you add in a 32-bit Z-buffer, you're still only at 21MBytes.

I don't know how anyone would get that figure.

Edit: OK, looking at DemoCoder's calcs, I guess you could have a Z-buffer per sample buffer. That would add another 9MBytes (four 3MByte Z-buffers instead of one), taking you to 30. Still not 40.5.
 
DemoCoder said:
I don't know where the other 12 comes from. Triple buffering?

That's the same as another front buffer, just another 3MBytes.

btw: your first line should be 24MBytes, not 25.

(1024*768*4) = 3MBytes
3MBytes * 4 * 2 = 3MBytes * 8 = 24MBytes

Sorry, being pedantic. :p
 
Depends on whether you use 2^10 or 10^3 as your k-byte or m-byte divisor. I have long used 2^10, but I have noticed many people use 10^3, including industry spec sheets for hard drives, so I just roughly divided by (10^3)^2 = 10^6 :(
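Both numbers in this exchange are right for their own divisor. A minimal sketch converting the same back-buffer byte count both ways; the helper names `to_mb` and `to_mib` are hypothetical. At this scale the two units differ by roughly 5% (2^20 / 10^6 = 1.048576).

```python
def to_mb(n_bytes):
    """Decimal megabytes (10^6 bytes), as on hard drive spec sheets."""
    return n_bytes / 10**6

def to_mib(n_bytes):
    """Binary megabytes (2^20 bytes), the traditional memory unit."""
    return n_bytes / 2**20

back_buffer = 1024 * 768 * 4 * 4 * 2  # 25,165,824 bytes
print(f"{to_mb(back_buffer):.1f} MB")    # 25.2 MB
print(f"{to_mib(back_buffer):.1f} MiB")  # 24.0 MiB
```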
 
DemoCoder said:
Depends on whether you use 2^10 or 10^3 as your k-byte or m-byte divisor. I have long used 2^10, but I have noticed many people use 10^3, including industry spec sheets for hard drives, so I just roughly divided by (10^3)^2 = 10^6 :(

HDD manufacturers have been known to use the 10^3 divisor (unlike everybody else in the IT industry) for the sole reason of making their products look better in marketing terms. Dunno who first started that, but they all followed eventually.
 
It depends on what architecture you are talking about.

For tilers (Kyro), there will be no difference in framebuffer size between AA and no AA.
10x7x32 double buffered, any number of samples: 6 MiB (+3 MiB optional Z-Buffer)
However, the on-chip tile buffer is not unlimited in size, so you get smaller tiles with AA enabled. This increases memory requirements a bit (triangle lists).

IMRs with RAMDAC downsampling (Voodoo4/5, GF4Ti) do not store the downsampled image in memory. So both front and back buffers (and Z too) are sample buffers.
10x7x32 double buffered, s samples: 9 MiB * s
4 samples: 9 MiB * 4 = 36 MiB = 37.75 MB

Other IMRs store the downsampled image. Front and back buffers are "low res", AA buffers hold color and Z/stencil per sample.
10x7x32 double buffered, s samples: 6 MiB + 6 MiB * s

I don't know how much mem Matrox' FAA takes.

ps: 1 MiB = 2^20 bytes, 1 MB = 10^6 bytes
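The three cases above can be written as small size functions. This is a minimal sketch assuming 32-bit color, 32-bit Z/stencil per sample and double buffering; the function names are hypothetical, and it ignores both the extra triangle-list memory a tiler needs with AA and Matrox's FAA.

```python
MIB = 2**20

def tiler(w, h, bpp=4, with_z=False):
    # Samples are resolved in the on-chip tile buffer, so memory only
    # holds front and back color at display resolution, whatever the
    # sample count.
    size = 2 * w * h * bpp
    if with_z:
        size += w * h * 4  # optional external 32-bit Z-buffer
    return size

def imr_dac_downsample(w, h, samples, bpp=4, zbpp=4):
    # Front, back and Z/stencil all hold every sample; the RAMDAC
    # resolves the samples on scan-out.
    return (2 * bpp + zbpp) * w * h * samples

def imr_resolve_to_buffer(w, h, samples, bpp=4, zbpp=4):
    # Low-res front and back buffers, plus a multisample buffer
    # holding color and Z/stencil for every sample.
    return 2 * bpp * w * h + (bpp + zbpp) * w * h * samples

for name, size in [
    ("tiler", tiler(1024, 768)),
    ("tiler + Z", tiler(1024, 768, with_z=True)),
    ("IMR, DAC downsampling", imr_dac_downsample(1024, 768, 4)),
    ("IMR, resolve to buffer", imr_resolve_to_buffer(1024, 768, 4)),
]:
    print(f"{name}: {size / MIB:.0f} MiB = {size / 10**6:.2f} MB")
```

With 4 samples the IMR cases come out at 36 MiB and 30 MiB respectively, while the tiler stays at 6 MiB (9 MiB with Z).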
 
Just a thought:

If you make the assumption that images are updated at the screen refresh rate, then it'd be more bandwidth-efficient to do the downsampling in the DAC feed.

If hardware were to take that approach, then the back and front buffers would both be 4x the display resolution.
 
I'm reasonably certain that all current renderers downsample at the DAC feed, so the front buffer is the same size as the backbuffer, making the math:

1024 * 768 (# of pixels on-screen) * 4 (32-bit color) * 4 (4 samples per pixel) * 3 (front buffer, back buffer, z-buffer) = 36MB

(Quick note: divide by 1024 twice to go from bytes to megabytes)
 
I'm reasonably certain that all current renderers downsample at the DAC feed,

Is this the same as what 3DFX did back in the V5 (combining the output of the T-buffers in the DAC)? If so, it's just been implemented in the GF4, as per the comments from David Kurts.
 
Chalnoth said:
I'm reasonably certain that all current renderers downsample at the DAC feed,
I'm reasonably certain that NOT all current renderers do this :)
jb said:
is this the same as what 3DFX did back in the V5 (combining the output of the T-buffers in the DAC)?
Not quite. AFAIAA, the T buffer was done as N*M separate X*Y framebuffers as opposed to a single (N*X) * (M*Y) framebuffer.
 
Simon F said:
If you make the assumption that images are updated at the screen refresh rate, then it'd be more bandwdith efficient to do the downsampling in the DAC feed.
Actually, DAC downsampling is more bandwidth efficient even before you reach a 1:1 fps/refresh ratio.
When the framerate is higher than (s-1)/(s+1) times the refresh rate (where s is the number of samples), DAC downsampling saves bandwidth (e.g. 1:3 for 2 samples, 3:5 for 4 samples).
This means, however, that the more samples you take, the closer your framerate needs to get to the refresh rate to benefit from DAC downsampling.
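The break-even ratio can be reproduced with a simple traffic count. This is a minimal sketch under an assumed model that counts only display-related accesses, one per sample or pixel: resolving at swap reads s samples and writes one pixel per frame, then the DAC reads one pixel per refresh; DAC downsampling reads s samples per refresh. Setting r*s = f*(s+1) + r gives f/r = (s-1)/(s+1). The function names are hypothetical.

```python
from fractions import Fraction

def resolve_traffic(fps, refresh, samples):
    """Per-pixel accesses/second when downsampling at buffer swap.

    Each frame reads `samples` values and writes one resolved pixel;
    each refresh the DAC then reads one resolved pixel.
    """
    return fps * (samples + 1) + refresh

def dac_traffic(refresh, samples):
    """Per-pixel accesses/second when the DAC downsamples on scan-out."""
    return refresh * samples

def break_even_ratio(samples):
    """fps/refresh ratio above which DAC downsampling uses less bandwidth."""
    return Fraction(samples - 1, samples + 1)

for s in (2, 4):
    print(s, break_even_ratio(s))  # prints "2 1/3", then "4 3/5"
```

For example, with 4 samples at 85 Hz, 60 fps is above the 3/5 ratio and the DAC approach wins (340 vs 385 accesses per pixel per second), while 40 fps is below it and resolving at swap wins (285 vs 340).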
 
Simon F said:
Not quite. AFAIAA, the T buffer was done as N*M separate X*Y framebuffers as opposed to a single (N*X) * (M*Y) framebuffer.
Well, no big difference here, as this only affects where the samples are stored in memory. I'm not sure whether GF3/4 put the samples one after another in one buffer (= a linear memory area) or use separate buffers for each sample position. I guess this depends on what the memory controller can do best.
 
Simon F said:
I'm reasonably certain that NOT all current renderers do this :)

Based on Quincunx/4x9 screenshots, all GeForce4 cards certainly do. Other than that, it's simply a matter of looking at the maximum FSAA resolutions supported. You'll most likely find that they conform to a full-size front buffer.

Additionally, downsampling at buffer swap would cause a significant delay during the swap, which could play havoc with VSYNC and may slow down overall processing (downsampling at buffer swap has to happen all at once, and it's very hard to continue processing while it's happening). Downsampling at DAC readout, by contrast, is a continuous process that will not cause the rest of the video card to pause.

Anyway, if you want to show me current video cards (other than the Kyro series; they do FSAA internally) that downsample at buffer swap, find me a video card that supports higher FSAA resolutions than its memory capacity would otherwise imply.
 
DemoCoder said:
back buffer:
1024 horz * 768 vert * 4 samples/pixel * 4 bytes per pixel (32-bit color) * 2 (24-bit Z-buffer, 8-bit stencil) = 25MB

front buffer:
1024 horz * 768 vert * 4 bytes per pixel = 3MB

total = 28MB

I don't know where the other 12 comes from. Triple buffering?

Hmm, well, you said *2 for a 32-bit Z/Stencil buffer; that should be *4. Which makes

1024*768*4*4*4=~48MB

which is over the mark. But let's take out the stencil buffer and make it

1024*768*4*4*3=37748736 bytes=36MB

Plus the front buffer,

1024*768*4*4=12582912 bytes=12MB

36 + 12 = 48MB. Something's amiss here.

My math would be somewhat different, but anyway...

I dunno. Hmm. Who was it that said 1024x768x32 @4x was 40.5MB? It just doesn't add up. o_O
 
Tagrineth said:
I dunno. Hmm. Who was it that said 1024x768x32 @4x was 40.5MB? It just doesn't add up. o_O


Actually, it was me... I just did it in my head; it was meant as an off-the-top-of-my-head thing. I didn't put much more than 5 seconds of thought into it. I knew that a 1024x768x32 framebuffer was an even 9M. Took it times four and a half (no Z on the front buffer).

AFAIK, you only need one Z-buffer. When you're done drawing one frame, you're done with it and can reuse it for the next.

Ergo: 1024x768x(32bit+32bit+32bitZ)=9M


I crunched it in my head a bit after all the hubbub and came to the conclusion that I was completely wrong. You don't need to double-buffer the back buffer or the front buffer; seems kind of obvious when one says it out loud. And the front buffer wouldn't be half an entire regular buffer, but 3M. My bad. :)
 
Tagrineth said:
DemoCoder said:
back buffer:
1024 horz * 768 vert * 4 samples/pixel * 4 bytes per pixel (32-bit color) * 2 (24-bit Z-buffer, 8-bit stencil) = 25MB


Hmm, well, you said *2 for a 32-bit Z/Stencil buffer; that should be *4.

No, it's still *2. One color buffer is 1024x768 pixels. Each pixel takes 4 bytes (RGBA). The Z+stencil takes another 4 bytes per pixel. That's 4 * 2 = 8 bytes per pixel. The *2 means "2 times the 4 bytes per pixel".

Obviously, you don't need destination alpha, or stencil all the time, and you can use 16-bit W-buffering, but I'm talking about the worst case.
 
Only the framebuffer is duplicated for double/triple buffering, though. The z-buffer/stencil buffer are only needed during rendering, and thus are only needed for the currently-active back buffer.

In the end, this is basically what you have:

(size of z-buffer) + (size of stencil buffer) + (size of frame buffer) * (number of frame buffers)

Since 32-bit color usually uses 24-bit Z with 8-bit stencil, it's generally easier just to combine the two, and multiply the size of one framebuffer by 3 for double buffering, or 4 for triple buffering.
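The formula above, as a minimal sketch assuming no anti-aliasing, 32-bit color and a packed 24-bit Z/8-bit stencil buffer shared by whichever back buffer is active; `total_bytes` is a hypothetical name. Because color and Z/stencil are both 4 bytes per pixel, the result is exactly 3 or 4 times one framebuffer.

```python
MIB = 2**20

def total_bytes(w, h, num_color_buffers, color_bpp=4, depth_stencil_bpp=4):
    """(Z/stencil size) + (color buffer size) * (number of color buffers).

    Only the color buffers are duplicated for double/triple buffering;
    a single packed Z + stencil buffer serves the active back buffer.
    """
    pixels = w * h
    return pixels * depth_stencil_bpp + pixels * color_bpp * num_color_buffers

print(total_bytes(1024, 768, 2) / MIB)  # double buffering: 9.0
print(total_bytes(1024, 768, 3) / MIB)  # triple buffering: 12.0
```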
 
Yes, and that's exactly what my calculations show. Nowhere in my post do I say that stencil/Z are duplicated for the front buffer. The calculations I gave for the back buffer are correct.

1024 * 768 * (1R + 1B + 1G + 1A + 3Z + 1S) * #AA samples

= 1024 * 768 * 8 * 4 = 24MB
 