256MB Graphics Cards

When do you think 256MB cards will be necessary?

  • Next 6 months
  • Next 12 months
  • When DoomIII Ships!!
  • 256MB?? This is getting silly...

  Total voters: 133
Dave H said:
That calculation doesn't seem to count the memory needed for the Z-buffer. With a 32-bit multisampled Z-buffer, it becomes
1600 * 1200 * ( 6*8 + 2*4 ) = ~103 MBytes.

Thanks for the correction. Aren't most Z-buffers still 24-bit, though? (That'd get you to ~92MB.)
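Just to make the arithmetic above easy to play with, here's a rough Python sketch of the footprint calculation (my own throwaway helper - the layout assumptions, per-sample colour + Z/stencil plus a double-buffered display buffer, are simply a reading of the numbers quoted):

Code:
def framebuffer_bytes(width, height, samples, color_bytes=4, z_bytes=4,
                      display_buffers=2):
    # Each sample carries a colour and a Z/stencil value; the front/back
    # display buffers are stored at single-sample resolution.
    per_pixel = samples * (color_bytes + z_bytes) + display_buffers * color_bytes
    return width * height * per_pixel

MB = 1024 * 1024
print(framebuffer_bytes(1600, 1200, 6) / MB)             # ~102.5 MB with 32-bit Z/stencil
print(framebuffer_bytes(1600, 1200, 6, z_bytes=3) / MB)  # ~91.6 MB with 24-bit Z, no stencil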

I'm not sure if the 8-bit stencil buffer has to be replicated for each sample set...I'd think it would be.

OT but related: how exactly are framebuffer/z-compression working such that they save bandwidth yet don't save memory usage? Conversely, wouldn't hierarchical-Z require more memory space (in the same way that mipmaps take up more memory but reduce bandwidth utilization)?

There are several explanations of this in these forums from the past, and some concepts outlined by ATI people in scattered places. Sorry, this computer and my connection (using a modem now, haven't configured the DSL yet) are so slow that it's too annoying for me to do the search for you right now, but IIRC the keywords "compression", "bandwidth", and "memory" should help you find the comments (I'd guess it was sireric and OpenGL guy who made the comments I recall, so perhaps search by post and look for threads matching their names).

EDIT: I just realized that it might be necessary to look at this thread to understand some of what I say above.

My brief explanation of why, if not how: what if you update the buffer and store it compressed? If so, what happens when you update the buffer for that screen position again? What if the new data can't be compressed into the same space? How do you manage the overflow that results? How do you maintain predictable alignment and addressing for each screen position so that the buffer can be randomly accessed?

IIRC, the general indication is that IHVs haven't discovered how to do the above efficiently with a lossless compression scheme yet. We had an interesting discussion primarily about z buffer compression. For framebuffer compression, well, I think Matrox FAA might be considered to be one method of dealing with some of the above issues if it actually saves storage space, though the current implementation seems to have issues.

As for the how of color compression on the R300: if several samples are the same color, you send the color once and have it replicated to the appropriate number of places. Possibilities might be some flexibility in the addressing controller that lets one data value be sent to multiple locations (an actual way of doing the above), or, as I proposed in another post, some sort of "all or nothing" 1-bit mask per pixel indicating whether all the sample colors are the same; if the bit is set, only the first color sample is ever read, and it was the only one written in the first place (a conceptual way of doing the above).

I'm pretty sure ATI engineers could conceivably have come up with something a bit different from the above ;), but since (to my limited knowledge) these approaches appear feasible, there is a chance they didn't have to.
I'll note that the first fits pretty well, to my mind, with the idea of a limited number of samples being allowed, and something about addressing concerns tickles my memory with regard to the discussions I mentioned.
Don't take the above as anything but a general indication of the issues and my own theories on how they might be resolved, though...search out the discussions for better answers!
 
Althornin said:
Nagorak said:
Aren't we getting to the point where, even with full FSAA, almost half the 256 MB of memory would be completely unused most of the time? I think this is jumping the gun, just a little bit...

6xFSAA at 1600x1200 uses how much memory?
(on a 9700)?

It would be too slow to play with that resolution anyway! There's no point adding more memory when the rest of the card would bog down, long before it became useful.
 
Ok, with 8X+ antialiasing, pure framebuffer memory requirements for IMRs ARE indeed getting silly.
8 times the memory just to do polygon edge antialiasing? Bleh, what a waste. Bring out the tilers.
 
I'm not sure if the 8-bit stencil buffer has to be replicated for each sample set...I'd think it would be.

Probably would be. I didn't even think of it. I'm a little slow today...

EDIT: I just realized that it might be necessary to look at this thread to understand some of what I say above.

Ouch, and on Valentine's Day too. :cry: Good luck. My thoughts are with you. ;)

or as I proposed in another post some sort of "all or nothing" 1 bit mask used to represent that either all the sample colors are the same for a pixel or they are not, and if set, only the first color sample is read at all, and it was the only one written in the first place (conceptual way of doing the above).

:idea:

Of course: you just skip writing to and reading from all sub-sample framebuffers but one. But of course you need to "skip over" that memory position to keep everything aligned, hence no savings in space.

Brilliant. Why couldn't I think of that?! (Of course the bit mask will also need to be saved to memory, so at one bit per pixel that's roughly another 234k per frame to add to the calculation. I think.)
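Here's a toy Python sketch of that flag idea, purely to illustrate the point (the names and structure are made up, not how R300 actually works): storage for every sample stays allocated, but most of the reads and writes can be skipped.

Code:
SAMPLES = 6

class Pixel:
    def __init__(self):
        self.samples = [0] * SAMPLES   # space for all samples is always reserved
        self.all_same = True           # the extra 1-bit "all or nothing" mask

    def write(self, colors):
        if len(set(colors)) == 1:
            self.samples[0] = colors[0]   # only one location touched -> bandwidth saved
            self.all_same = True
        else:
            self.samples = list(colors)   # worst case: every sample written
            self.all_same = False

    def read(self):
        if self.all_same:
            return [self.samples[0]] * SAMPLES   # one fetch, replicated on read-back
        return list(self.samples)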

Hmm...ok; that answer was satisfying enough (proof-of-concept) that I can go to bed in peace and do a proper forum search re: z-compression tomorrow.

Thanks! :)
 
Guys,

you've discounted the fact that it's desirable to have geometry reside in video memory as well.

And memory per vertex amount is growing all of the time due to the increased amount of texture coordinates etc.

I could throw out a figure off the top of my head and say a game released this year might use about 20 MB of vertex buffers.

With 64 bytes/vertex (again, off the top of my head) this would be only about 325,000 vertices for, let's say, a level (spread over maybe 5-15 min of gameplay). That's not a feat at all for any card to push... So my estimate could actually be pretty low, and you'd need much more space for vertex buffers.
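For what it's worth, here's how that back-of-the-envelope figure falls out in Python (the 64-byte vertex layout is just one plausible guess on my part, not anything from the post):

Code:
# One possible 64-byte vertex layout (pure assumption for illustration).
POSITION  = 3 * 4        # float3
NORMAL    = 3 * 4        # float3
TANGENT   = 4 * 4        # float4
TEXCOORDS = 2 * 2 * 4    # two float2 UV sets
COLOR     = 4            # packed RGBA
PADDING   = 64 - (POSITION + NORMAL + TANGENT + TEXCOORDS + COLOR)
bytes_per_vertex = POSITION + NORMAL + TANGENT + TEXCOORDS + COLOR + PADDING  # 64

buffer_bytes = 20 * 1024 * 1024
print(buffer_bytes // bytes_per_vertex)   # ~327,680 vertices fit in the 20 MB budget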
 
And memory per vertex amount is growing all of the time due to the increased amount of texture coordinates etc.

Yup, it's certainly quite interesting to see the size of the vertex buffering in 3DM03.
 
DaveBaumann said:
And memory per vertex amount is growing all of the time due to the increased amount of texture coordinates etc.

Yup, it's certainly quite interesting to see the size of the vertex buffering in 3DM03.

Vertex compression is becoming more and more important, and everybody should be using it if they are using vertex shaders (in most cases it's completely free speed-wise, with no visual loss in quality).

If 3DM03 isn't using any then that's a bit crap :(

I'll just plug my vertex compression chapters in ShaderX1 and 2 here then in case they are reading :)
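For anyone curious what that kind of vertex compression looks like, here's a minimal Python/numpy sketch of one common trick - quantising positions to 16 bits against the mesh bounds and rescaling per vertex (my own illustration, not necessarily the schemes from the ShaderX chapters):

Code:
import numpy as np

def compress_positions(positions):
    # Quantise float3 positions to uint16 against the mesh bounding box.
    lo, hi = positions.min(axis=0), positions.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / 65535.0, 1.0)   # avoid divide-by-zero on flat axes
    quantised = np.round((positions - lo) / scale).astype(np.uint16)  # 6 bytes/vertex vs 12
    return quantised, lo, scale   # lo/scale become vertex shader constants

def decompress_positions(quantised, lo, scale):
    # The multiply-add the vertex shader would do per vertex.
    return quantised.astype(np.float32) * scale + lo

verts = (np.random.rand(1000, 3) * 100.0).astype(np.float32)
q, lo, scale = compress_positions(verts)
print(np.abs(decompress_positions(q, lo, scale) - verts).max())  # error <= half a step per axis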
 
Dave H said:
edit - Ante P: you wrote that the calculation I quoted above is *not* counting triple buffering? Then what's the factor of 3 for? (Apologies in advance if I'm missing something stupid...)

double buffering of course
 
Dave H said:
Thanks for the correction. Aren't most Z-buffers still 24-bit, though? (That'd get you to ~92MB.)

OT but related: how exactly are framebuffer/z-compression working such that they save bandwidth yet don't save memory usage?

Perhaps he's counting the stencil buffer too?

As for compression: since the compression ratio isn't constant, you still need to reserve as much memory as you'd need without compression.
(Sorry if that was a dumb answer.)
 
Nagorak said:
It would be too slow to play with that resolution anyway! There's no point adding more memory when the rest of the card would bog down, long before it became useful.

You have obviously not used an R300?
I play with those settings (+16x aniso) in a handful of games.
An R350 would increase that "handful" to many more, methinks.
 
demalion said:
I'm not sure if the 8-bit stencil buffer has to be replicated for each sample set...I'd think it would be.

Stencil needs to be multisampled just like the zbuffer, otherwise you wouldn't get antialiasing at the edges of your stencil masks.
 
And then throw in high precision MRT.
1600x1200x4x128/8 = 117MB
That's for four 4x32bit fp buffers, no AA.
And that's in addition to the usual FB.
Is it possible to get AA on MRTs? In that case 6*117MB = ouch.

It wouldn't be useful in real time though. :)
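Spelling that MRT arithmetic out in the same back-of-the-envelope style as earlier in the thread:

Code:
width, height = 1600, 1200
targets, channels, bytes_per_channel = 4, 4, 4   # four 4x32-bit FP render targets
mrt_bytes = width * height * targets * channels * bytes_per_channel
print(mrt_bytes / 2**20)       # ~117 MB, no AA
print(6 * mrt_bytes / 2**20)   # ~703 MB if 6x AA on MRTs were possible - ouch indeed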
 
Dave H said:
OT but related: how exactly are framebuffer/z-compression working such that they save bandwidth yet don't save memory usage?

This may have been answered earlier, but here goes:
I believe that since it is paramount to keep the compression lossless, compression ratios cannot be guaranteed (I have no or very little technical knowledge about this, but it stands to reason). Thus, memory must be allocated for the worst-case scenario, i.e. no compression at all.

Naturally, in practice there is "always" some compression, so bandwidth is saved, but the saved space is nevertheless not available for other data.
 
Basic said:
Is it possible to get AA on MRTs? In that case 6*117MB = ouch.

I'll respond to that by a direct quote from the DX9 SDK, in the "Multiple Render Target" subtitle:
No antialiasing is supported.


Uttar

EDIT: I'd like to say I'm not impressed by MuFu's comment about R350 6x being as fast as R300 4x.
According to Anand, the R300 is between 11% and 18% slower at 6x than at 4x - that's probably because Z & color compression become even more efficient.
And that means you'd only need 18% faster RAM to get that. Really not impressive, IMO.
Assuming that Anand figure was for an optimal case, and considering a 21% performance drop and thus 21% faster RAM, you'd get 375MHz RAM (i.e. roughly 21% above the R300's 310MHz memory clock).
That's exactly what The Inquirer had predicted on the 4th of February:
even though we hear that the memory will actually work at 375 MHz only, or should we say 750 MHz in DDR mode.
So, as you see, nothing new :)
 
Tahir said:
Yeah I hear from my sources 256MB is gonna be an option on R350.

No, really. :idea:

I know it could be considered egotistical to quote oneself (haw haw), but I just wanted everyone to know I was joking about my sources... I don't have any - hehe. :D
 
Hmmm - an idea: 6x multisampling requires 6x the amount of memory per buffer. With compression, you can reduce the amount of memory that needs to be accessed by a large amount - most of the time. This would imply that in the multisample buffer, there will be large amounts of memory that are almost never accessed - wouldn't it be possible to map these memory areas out to AGP memory and that way free up lots of onboard memory?
 
arjan de lumens said:
Hmmm - an idea: 6x multisampling requires 6x the amount of memory per buffer. With compression, you can reduce the amount of memory that needs to be accessed by a large amount - most of the time. This would imply that in the multisample buffer, there will be large amounts of memory that are almost never accessed - wouldn't it be possible to map these memory areas out to AGP memory and that way free up lots of onboard memory?

Seems suboptimal to me. AGP has way too high latency...
Might not be as much of a problem as I think, but I doubt it'd do miracles anyway.

Uttar
 
Uttar said:
EDIT: I'd like to say I'm not impressed by Mufu's comment about R350 6x being as fast as R300 4x
According to Anand, the R300 is between 11% and 18% slower at 6x than at 4x - that's probably because Z & Color compression become even more efficient.
And that means you'd only need 18% faster RAM to get that. Really not impressive, IMO.

Man, I think you speculate a little too quantitatively sometimes! I meant to say "as useable" as the 4x mode, sorry - corrected now. It'll depend on final clockspeeds anyway and that's pretty much a marketing call, what with there being no real competition. :-\

256MB is definitely an option for R350, Tahir. It's an "option" for R300 and NV30 as well though - doesn't really mean much.

MuFu.
 
Uttar said:
Seems suboptimal to me. AGP has way too high latency...
Might not be as much of a problem as I think, but I doubt it'd do miracles anyway.
Still I would think it's a better idea than AGP texturing, especially given that framebuffer accesses have much more predictable/prefetchable access patterns than texture map lookups and therefore are much less sensitive to latency.
 