720p 64bit framebuffer size? Educated guesses here please

EasyRaider said:
Titanio said:
I don't think you'd be anti-aliasing your framebuffer either ;)

Uh, what else would you be anti-aliasing? Texture samples only?

Be patient with me, I'm still figuring this out ;)

I guess you'd need to keep the z-value for every sample you're taking with anti-aliasing? Which effectively would result in the same bandwidth as an "anti-aliased" z-buffer?

That still wouldn't double your total requirement beyond 32-bit framebuffers though, unless your z-precision had to match your colour precision.
 
Z-precision does not have to match colour precision, to make that clear.

With both super- and multi-sampling, you need to keep both colour and depth value for every sample, although compression helps a lot here, especially with multisampling.
 
Maybe I should elaborate a bit on my question.
Of course, what I'm getting at is how much bandwidth Xenos actually saves with its EDRAM buffer, and how much bandwidth PS3 has to set aside for a full back buffer similar to the 360's (with z-buffer, stencil and 4x AA).
I chose 720p 64bit 4x AA because I get the impression that's what's going to be the standard on x360.
I don't know about the 10R 10G 10B 2A 32bit format. Won't any alpha be too horribly banded to be useful?

Does the z-buffer need to have more bitdepth than 32bit? If not then that needs to be factored in as well.

Has nVidia's backbuffer compression been improved beyond the theoretical 4x, real-world 2x?
 
Squeak said:
I chose 720p 64bit 4x AA because I get the impression that's what's going to be the standard on x360.
I don't know about the 10R 10G 10B 2A 32bit format. Won't any alpha be too horribly banded to be useful?

Well, I got the impression the FP10 32bit format will be widely used. AFAIK, very few games use framebuffer alpha anyway.

Does the z-buffer need to have more bitdepth than 32bit? If not then that needs to be factored in as well.

Today we have 24b z/8b stencil. 32bit depth buffer could be useful, I think. If stencil is needed in addition to that, then I suppose we would be looking at 40 or 48 bits total. But I would expect that to be a rare case.
 
EasyRaider said:
Well, I got the impression the FP10 32bit format will be widely used. AFAIK, very few games use framebuffer alpha anyway.
Well, I guess then my question is about a 32bit buffer instead of 64bit.
 
Backbuffer = resolution*(colour+Z+stencil+FSAA)*overdraw

resolution = 1280*720 = 921600 pixels
colour = 64bit = 8 bytes/pixel
Z = 24bit = 3 bytes/pixel
stencil = 8bit = 1 byte/pixel

FSAA,

4*SSAA ~ f(colour, Z) ~ 4*(8+3) byte samples/pixel ~ 44 bytes /pixel*
4*MSAA ~ f(Z) ~ 4*(3) byte samples/pixel ~ 12 bytes/pixel

*worst case with SSAA as f(colour, Z) for comparison.

overdraw ~ 5*

*TBDR would be nice with a fat '0'! I believe the overdraw can vary quite a lot... any guesstimates on min-max in typical games?

Backbuffer = 921600*(8+3+1+12)*5 = 110592000 bytes/frame
= 110592000/(1024)^2 = ~105 MBytes/frame

=3.16 GB/s @ 30 FPS
=6.32 GB/s @ 60 FPS


Well that's my guestimate without compression! :)

EDIT:

That's per frame! I could be wrong though! :p

EDIT2:

Sorry, that's MBytes/frame!
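For anyone who wants to vary the inputs, here is a minimal Python sketch of the same formula. The function and constant names are made up for illustration; the values are the ones from the post (64-bit colour, 24-bit Z, 8-bit stencil, 12 extra bytes for 4x MSAA, overdraw of 5), and nothing here models compression, early-Z or caching. It prints the rate in both 1024-based and 1000-based units, since the two differ by a few percent; the rounded 3.16 figure appears to come from taking the 1024-based MBytes/s and dividing by 1000.

```python
# Inputs taken from the post; names are illustrative only.
WIDTH, HEIGHT = 1280, 720
COLOUR_BYTES = 8           # 64-bit colour
Z_BYTES = 3                # 24-bit depth
STENCIL_BYTES = 1          # 8-bit stencil
MSAA_BYTES = 4 * Z_BYTES   # the post's 4x MSAA term: 12 bytes/pixel
OVERDRAW = 5               # guessed average overdraw


def backbuffer_bytes_per_frame(width, height, per_pixel_bytes, overdraw):
    """Uncompressed bytes touched per frame: resolution * bytes/pixel * overdraw."""
    return width * height * per_pixel_bytes * overdraw


per_pixel = COLOUR_BYTES + Z_BYTES + STENCIL_BYTES + MSAA_BYTES
frame_bytes = backbuffer_bytes_per_frame(WIDTH, HEIGHT, per_pixel, OVERDRAW)

print(f"{frame_bytes} bytes/frame = {frame_bytes / 1024**2:.2f} MBytes/frame")
for fps in (30, 60):
    per_second = frame_bytes * fps
    print(f"{fps} FPS: {per_second / 1024**3:.2f} GB/s (1024-based), "
          f"{per_second / 1000**3:.2f} GB/s (1000-based)")
```

Changing OVERDRAW or swapping in the SSAA term (44 bytes/pixel) shows how sensitive the result is to those two guesses.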
 
Jaws, I think what you are showing for the MSAA case IS what vendors refer to as "color compression", and it is not accurate, since some of those pixels will contain polygon edges, forcing writes of the specific color fragments to the backbuffer. Your SSAA example more correctly describes the "compressionless" case. It also seems you are storing 5 z-samples rather than 4. And AFAIK, Z and stencil (if used) are also always stored together, not separately as you describe, but lossless Z+stencil compression can be as good as 4:1.

Edited: corrected my compression ratio error.
 
What is overdraw in these calculations? Multipass rendering? Jaws' final bandwidth figures also aren't taking into account multipass reads/writes. As I understand it, the EDRAM in Xenos allows the same data to be processed several times really quickly without bandwidth gobbling, whereas the same effects without eDRAM are going to impact bandwidth noticeably.

If not, why is there a question of RSX not being able to cope with AA and HDR? Seems it can comfortably fit it all in if Jaws' calculations are right. What more needs to be factored in?
 
Shifty Geezer said:
If not, why is there a question of RSX not being able to cope with AA and HDR? Seems it can comfortably fit it all in if Jaws' calculations are right. What more needs to be factored in?

Forget RSX, what about the NV40 with ~35GB/s? Or a 6600GT (which has ~16GB/s)?

With the numbers above it would appear that even low end cards like the 6600GT should be able to do this stuff with ease. Yet the real world is telling us this is not the case :(

These cards can be bandwidth limited at higher resolutions without AA or HDR, and the effect of enabling these is pretty well documented (although, to be fair, the performance hit is not all bandwidth related). Still, the NV40 takes some decent hits with 2x and 4x AA in modern games like HL2, Doom3, FarCry, and BF2. (BF2, e.g., drops from 90fps to 72fps when 4xAA is applied at 1024x768, from 70fps to 56fps at 1280x1024, and from 56fps to 42fps at 1600x1200... I picked BF2 because I had some stats handy and because the game is SERIOUSLY jaggy without AA.) I think ultimately bandwidth is not the only issue; the design of the chip matters too. A chip designed with a lot of bandwidth, in anticipation of a certain feature (like AA) being standard, will include enough logic to ensure it can utilize that bandwidth for that purpose and hit that feature without a performance hit. If that is not the case, that bandwidth is wasted.
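To put the BF2 numbers above on a common scale, a small Python sketch can convert each before/after pair into a percentage drop and into extra milliseconds per frame. The frame rates are the ones quoted in the post; the helper name `aa_cost` is hypothetical, and this says nothing about whether the cost is bandwidth or something else.

```python
# (fps without AA, fps with 4xAA) per resolution, as quoted in the post.
bf2_results = {
    "1024x768": (90, 72),
    "1280x1024": (70, 56),
    "1600x1200": (56, 42),
}


def aa_cost(fps_off, fps_on):
    """Return (percentage fps drop, extra milliseconds per frame) for enabling AA."""
    drop_pct = (fps_off - fps_on) / fps_off * 100
    extra_ms = 1000 / fps_on - 1000 / fps_off
    return drop_pct, extra_ms


for resolution, (fps_off, fps_on) in bf2_results.items():
    drop_pct, extra_ms = aa_cost(fps_off, fps_on)
    print(f"{resolution}: {drop_pct:.0f}% fewer fps, +{extra_ms:.2f} ms per frame")
```

The percentage drop is similar at the first two resolutions, but the added time per frame grows with resolution, which is the more direct measure of the extra work.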

I don't have an answer for this thread (outside the number crunching we do know that many games can be bandwidth limited at higher resolutions even without AA), but the chart here in this thread is at least interesting and may contribute some information:

http://www.beyond3d.com/forum/viewtopic.php?t=24125
 
Acert93 said:
Shifty Geezer said:
If not, why is there a question of RSX not being able to cope with AA and HDR? Seems it can comfortably fit it all in if Jaws' calculations are right. What more needs to be factored in?

Forget RSX, what about the NV40 with ~35GB/s? Or a 6600GT (which has ~16GB/s)?

With the numbers above it would appear that even low end cards like the 6600GT should be able to do this stuff with ease. Yet the real world is telling us this is not the case :(

Since when did higher bandwidth give you higher fillrate as well?
 
What is overdraw in these calculations? Multipass rendering? Jaws' final bandwidth figures also aren't taking into account multipass reads/writes. As I understand it, the EDRAM in Xenos allows the same data to be processed several times really quickly without bandwidth gobbling, whereas the same effects without eDRAM are going to impact bandwidth noticeably.
What does the eDRAM store that enables it to do that? Besides which, overdraw is a matter of different polygons that write to the same pixels, so reprocessing the same data won't help you. It's more of a fillrate concern than a bandwidth concern, but it does mean repeated framebuffer writes if the pixels pass all tests.

What's wrong with drawing front to back?
Alpha blending is not order-independent. And actually, in those cases, you need the overdraw, and it can quite easily go way higher than a factor of 5.
 
Shifty Geezer said:
As I understand it, the EDRAM in Xenos allows the same data to be processed several times really quickly without bandwidth gobbling, whereas the same effects without eDRAM are going to impact bandwidth noticeably.
Xenos can never be framebuffer bandwidth bound.

Alstrong said:
What's wrong with drawing front to back?
Nothing, actually it's the right thing to do.
 
Vysez said:
Xenos can never be framebuffer bandwidth bound.
32 GB/s is not exactly unlimited bandwidth, even with free AA.
If someone tries to run at too high a resolution or do a lot of alpha blending, they could become fillrate and/or bandwidth limited.
 
Squeak said:
32 GB/s is not exactly unlimited bandwidth, even with free AA.
If someone tries to run at too high a resolution or do a lot of alpha blending, they could become fillrate and/or bandwidth limited.
Just not true. With the parent die limited to 8 pixels/clock, 32 GB/s is enough.

500 MHz * 8 pixels/clock * 8 bytes/pixel = 32 GB/s.
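The one-line calculation above is easy to check and to vary. A small Python sketch, using only the figures quoted in the post (500 MHz, 8 pixels/clock, 8 bytes/pixel) and hypothetical function names, computes the peak rate and, going the other way, the per-pixel byte budget a given link allows:

```python
CLOCK_HZ = 500_000_000     # 500 MHz, as quoted
PIXELS_PER_CLOCK = 8       # parent die output limit, as quoted
BYTES_PER_PIXEL = 8        # bytes sent per pixel, as quoted


def peak_bytes_per_second(clock_hz, pixels_per_clock, bytes_per_pixel):
    """Peak traffic if every clock emits the maximum number of pixels."""
    return clock_hz * pixels_per_clock * bytes_per_pixel


def byte_budget_per_pixel(link_bytes_per_second, clock_hz, pixels_per_clock):
    """Bytes per pixel a link can carry at the peak pixel rate."""
    return link_bytes_per_second / (clock_hz * pixels_per_clock)


peak = peak_bytes_per_second(CLOCK_HZ, PIXELS_PER_CLOCK, BYTES_PER_PIXEL)
print(f"peak = {peak / 1e9:.0f} GB/s")
print(f"budget = {byte_budget_per_pixel(32e9, CLOCK_HZ, PIXELS_PER_CLOCK):.0f} bytes/pixel")
```

The argument only holds if the data sent per pixel never exceeds that 8-byte budget, which is exactly what the following replies dispute.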
 
Just not true. With the parent die limited to 8 pixels/clock, 32 GB/s is enough.
You're assuming z compression always yields 4:1, which I don't think is the case. Remember, with 4xAA you are transferring up to 8 color and 32 z values per clock to the daughter die. Or, during a Z-only pass, 64 z values per clock. This is sent along with probably a couple of other bits describing to which subsamples the color and z values apply; these might be handled by separate control lines or perhaps attached to the z value itself, I'm not sure. This bandwidth may also be affected as the tile is copied back out to system memory. So it "could" be bandwidth limited in rare instances. Unless I'm wrong and Z always yields 4:1 and, like the original leak described, there is a separate bus for the resolve; both of which I think are unlikely.
 
Rockster said:
Just not true. With the parent die limited to 8 pixels/clock, 32 GB/s is enough.
You're assuming z compression always yields 4:1, which I don't think is the case. Remember, with 4xAA you are transferring up to 8 color and 32 z values per clock to the daughter die. Or, during a Z-only pass, 64 z values per clock. This is sent along with probably a couple of other bits describing to which subsamples the color and z values apply; these might be handled by separate control lines or perhaps attached to the z value itself, I'm not sure. This bandwidth may also be affected as the tile is copied back out to system memory. So it "could" be bandwidth limited in rare instances. Unless I'm wrong and Z always yields 4:1 and, like the original leak described, there is a separate bus for the resolve; both of which I think are unlikely.

No, he's right.

In the case of Xenos it does, because you're compressing pre-framebuffer-blend, not post-framebuffer-blend. This is the reason they split the dies where they did, and why the blending logic is on the daughter die.

Xenos cannot be limited by its EDRAM bus; sure, it can get fillrate limited, but not bandwidth limited.
 
To answer the original post, Dave's Xenos article already correctly describes framebuffer size so I won't re-hash that. As for bandwidth there are too many if statements to come to any meaningful conclusion since it varies wildly between apps and the numbers are significantly changed by early-Z, Z-compression, cache, color compression, overdraw, sort order, polygon size, and number of translucent pixels.

i.e. Basic framebuffer bandwidth = unknown number of fragments * (IF fragment passes early Z tests THEN: IF Z value is in cache, read from there, ELSE read compressed Z values from buffer; IF Z test passes THEN check blending: IF blending, read color, blend and write color, ELSE just write color)
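The nested IFs above can be written out as a rough Python sketch. Everything here is an assumption made for illustration: the `Fragment` fields, the default byte sizes, and a single flat Z compression ratio. It follows the expression as written, so Z writes are not counted because the expression does not mention them.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    passes_early_z: bool   # survives early-Z rejection
    z_in_cache: bool       # Z value already cached, no buffer read needed
    passes_z_test: bool    # survives the full depth test
    blended: bool          # needs a colour read before the colour write


def framebuffer_bytes(fragments, colour_bytes=4, z_bytes=4, z_compression=1.0):
    """Sum buffer traffic per fragment, following the IF chain in the post."""
    total = 0.0
    for frag in fragments:
        if not frag.passes_early_z:
            continue                            # rejected early: no traffic
        if not frag.z_in_cache:
            total += z_bytes / z_compression    # read (compressed) Z from buffer
        if not frag.passes_z_test:
            continue                            # hidden: no colour traffic
        if frag.blended:
            total += colour_bytes               # read destination colour
        total += colour_bytes                   # write colour
    return total


sample = [
    Fragment(False, False, False, False),  # killed by early Z
    Fragment(True, True, True, False),     # opaque, Z cached
    Fragment(True, False, True, True),     # blended, Z read from buffer
    Fragment(True, False, False, False),   # fails the Z test after a Z read
]
print(framebuffer_bytes(sample))
print(framebuffer_bytes(sample, z_compression=4.0))
```

The result depends entirely on the mix of fragments fed in, which is the point of the post: without real per-application data for that mix, the total cannot be pinned down.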

The number of reads and writes per pixel isn't fixed, but it is needed to understand bandwidth consumption. I'm not sure how to come up with reasonable values for those components, and it's probably more useful to do real-world analysis. "ATI's calculations lead to a colour and z bandwidth demand of around 26-134GB/s at 8 pixels with 4x Multi-Sampling AA enabled at High Definition TV resolutions." I don't have data with which to argue that.
 