256MB Graphics Cards

When do you think 256MB cards will be necessary?

  • Next 6 months
  • Next 12 months
  • When DoomIII Ships!!
  • 256MB?? This is getting silly...

  Total voters: 133
Dave H said:
That calculation doesn't seem to count the memory needed for the Z-buffer. With a 32-bit multisampled Z-buffer, it becomes
1600 * 1200 * ( 6*8 + 2*4 ) = ~103 MBytes.

Thanks for the correction. Aren't most Z-buffers still 24-bit, though? (That'd get you to ~92MB.)
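Just to make the arithmetic above easy to play with, here's a rough Python sketch of the footprint calculation (my own throwaway helper - the layout assumptions, per-sample colour + Z/stencil plus a double-buffered display buffer, are simply a reading of the numbers quoted):

Code:
def framebuffer_bytes(width, height, samples, color_bytes=4, z_bytes=4,
                      display_buffers=2):
    # Each sample carries a colour and a Z/stencil value; the front/back
    # display buffers are stored at single-sample resolution.
    per_pixel = samples * (color_bytes + z_bytes) + display_buffers * color_bytes
    return width * height * per_pixel

MB = 1024 * 1024
print(framebuffer_bytes(1600, 1200, 6) / MB)             # ~102.5 MB with 32-bit Z/stencil
print(framebuffer_bytes(1600, 1200, 6, z_bytes=3) / MB)  # ~91.6 MB with 24-bit Z, no stencil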

I'm not sure if the 8-bit stencil buffer has to be replicated for each sample set...I'd think it would be.

OT but related: how exactly are framebuffer/z-compression working such that they save bandwidth yet don't save memory usage? Conversely, wouldn't hierarchical-Z require more memory space (in the same way that mipmaps take up more memory but reduce bandwidth utilization)?

There are several explanations of this in these forums from the past, and some concepts outlined by ATI people in scattered places. Sorry, this computer and my connection (using a modem now, haven't configured the DSL yet) are so slow that it's too annoying for me to do the search for you right now, but IIRC the keywords "compression", "bandwidth", and "memory" should help you find the comments (I'd guess it was sireric and OpenGL guy who made the comments I recall, so perhaps search by post and look for threads matching their names).

EDIT: I just realized that it might be necessary to look at this thread to understand some of what I say above.

My brief explanation of why, if not how: what if you update the buffer and store it compressed? If so, what happens when you update the buffer for that screen position again? What if the new data can't be compressed into the same space? How do you manage the overflow that results? How do you maintain predictable alignment and addressing for each screen position so that the buffer can be randomly accessed?

IIRC, the general indication is that IHVs haven't discovered how to do the above efficiently with a lossless compression scheme yet. We had an interesting discussion primarily about z buffer compression. For framebuffer compression, well, I think Matrox FAA might be considered to be one method of dealing with some of the above issues if it actually saves storage space, though the current implementation seems to have issues.

As for the how of color compression on the R300: if several samples are the same color, you send the color once and have it replicated to the appropriate number of places. Possibilities might be some flexibility in the addressing controller that lets one data value be sent to multiple locations (an actual way of doing the above), or, as I proposed in another post, some sort of "all or nothing" 1-bit mask per pixel indicating whether all the sample colors are the same; if the bit is set, only the first color sample is ever read, and it was the only one written in the first place (a conceptual way of doing the above).

I'm pretty sure ATI engineers could conceivably have come up with something a bit different from the above ;), but since (to my limited knowledge) these approaches appear feasible, there is a chance they didn't have to.
I'll note that the first fits pretty well, to my mind, with the idea of a limited number of samples being allowed, and something about addressing concerns tickles my memory with regard to the discussions I mentioned.
Don't take the above as anything but a general indication of the issues and my own theories on how they might be resolved, though...search out the discussions for better answers!
 
Althornin said:
Nagorak said:
Aren't we getting to the point where, even with full FSAA, almost half the 256 MB of memory would be completely unused most of the time? I think this is jumping the gun, just a little bit...

6xFSAA at 1600x1200 uses how much memory?
(on a 9700)?

It would be too slow to play with that resolution anyway! There's no point adding more memory when the rest of the card would bog down, long before it became useful.
 
Ok, with 8X+ antialiasing, pure framebuffer memory requirements for IMRs ARE indeed getting silly.
8 times the memory just to do polygon edge antialiasing? Bleh, what a waste. Bring out the tilers.
 
I'm not sure if the 8-bit stencil buffer has to be replicated for each sample set...I'd think it would be.

Probably would be. I didn't even think of it. I'm a little slow today...

EDIT: I just realized that it might be necessary to look at this thread to understand some of what I say above.

Ouch, and on Valentine's Day too. :cry: Good luck. My thoughts are with you. ;)

or as I proposed in another post some sort of "all or nothing" 1 bit mask used to represent that either all the sample colors are the same for a pixel or they are not, and if set, only the first color sample is read at all, and it was the only one written in the first place (conceptual way of doing the above).

:idea:

Of course: you just skip writing to and reading from all sub-sample framebuffers but one. But of course you need to "skip over" that memory position to keep everything aligned, hence no savings in space.

Brilliant. Why couldn't I think of that?! (Of course the bit mask will also need to be saved to memory, so at one bit per pixel that's roughly another 234k per frame to add to the calculation. I think.)
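Here's a toy Python sketch of that flag idea, purely to illustrate the point (the names and structure are made up, not how R300 actually works): storage for every sample stays allocated, but most of the reads and writes can be skipped.

Code:
SAMPLES = 6

class Pixel:
    def __init__(self):
        self.samples = [0] * SAMPLES   # space for all samples is always reserved
        self.all_same = True           # the extra 1-bit "all or nothing" mask

    def write(self, colors):
        if len(set(colors)) == 1:
            self.samples[0] = colors[0]   # only one location touched -> bandwidth saved
            self.all_same = True
        else:
            self.samples = list(colors)   # worst case: every sample written
            self.all_same = False

    def read(self):
        if self.all_same:
            return [self.samples[0]] * SAMPLES   # one fetch, replicated on read-back
        return list(self.samples)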

Hmm...ok; that answer was satisfying enough (proof-of-concept) that I can go to bed in peace and do a proper forum search re: z-compression tomorrow.

Thanks! :)
 
Guys,

you've discounted the fact that it's desirable to have geometry reside in video memory as well.

And memory per vertex amount is growing all of the time due to the increased amount of texture coordinates etc.

I could throw out a figure off the top of my head and say a game released this year might use about 20 MB of vertex buffers.

With 64 bytes/vertex (again, off the top of my head) this would be only about 325,000 vertices for, let's say, a level (spread over maybe 5-15 min of gameplay). That's not a feat at all for any card to push... So my estimate could actually be pretty low, and you'd need much more space for vertex buffers.
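For what it's worth, here's how that back-of-the-envelope figure falls out in Python (the 64-byte vertex layout is just one plausible guess on my part, not anything from the post):

Code:
# One possible 64-byte vertex layout (pure assumption for illustration).
POSITION  = 3 * 4        # float3
NORMAL    = 3 * 4        # float3
TANGENT   = 4 * 4        # float4
TEXCOORDS = 2 * 2 * 4    # two float2 UV sets
COLOR     = 4            # packed RGBA
PADDING   = 64 - (POSITION + NORMAL + TANGENT + TEXCOORDS + COLOR)
bytes_per_vertex = POSITION + NORMAL + TANGENT + TEXCOORDS + COLOR + PADDING  # 64

buffer_bytes = 20 * 1024 * 1024
print(buffer_bytes // bytes_per_vertex)   # ~327,680 vertices fit in the 20 MB budget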
 
And memory per vertex amount is growing all of the time due to the increased amount of texture coordinates etc.

Yup, it's certainly quite interesting to see the size of the vertex buffering in 3DM03.
 
DaveBaumann said:
And memory per vertex amount is growing all of the time due to the increased amount of texture coordinates etc.

Yup, it's certainly quite interesting to see the size of the vertex buffering in 3DM03.

Vertex compression is becoming more and more important, and everybody should be using it if they are using vertex shaders (in most cases it's completely free speed-wise, with no visual loss in quality).

If 3DM03 isn't using any then that's a bit crap :(

I'll just plug my vertex compression chapters in ShaderX1 and 2 here then in case they are reading :)
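For anyone curious what that kind of vertex compression looks like, here's a minimal Python/numpy sketch of one common trick - quantising positions to 16 bits against the mesh bounds and rescaling per vertex (my own illustration, not necessarily the schemes from the ShaderX chapters):

Code:
import numpy as np

def compress_positions(positions):
    # Quantise float3 positions to uint16 against the mesh bounding box.
    lo, hi = positions.min(axis=0), positions.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / 65535.0, 1.0)   # avoid divide-by-zero on flat axes
    quantised = np.round((positions - lo) / scale).astype(np.uint16)  # 6 bytes/vertex vs 12
    return quantised, lo, scale   # lo/scale become vertex shader constants

def decompress_positions(quantised, lo, scale):
    # The multiply-add the vertex shader would do per vertex.
    return quantised.astype(np.float32) * scale + lo

verts = (np.random.rand(1000, 3) * 100.0).astype(np.float32)
q, lo, scale = compress_positions(verts)
print(np.abs(decompress_positions(q, lo, scale) - verts).max())  # error <= half a step per axis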
 
Dave H said:
edit - Ante P: you wrote that the calculation I quoted above is *not* counting triple buffering? Then what's the factor of 3 for? (Apologies in advance if I'm missing something stupid...)

double buffering of course
 
Dave H said:
Thanks for the correction. Aren't most Z-buffers still 24-bit, though? (That'd get you to ~92MB.)

OT but related: how exactly are framebuffer/z-compression working such that they save bandwidth yet don't save memory usage?

Perhaps he's counting the stencil buffer too?

As for compression: since the compression ratio isn't constant, you still need to reserve as much memory as you'd need without compression.
(Sorry if that was a dumb answer.)
 
Nagorak said:
It would be too slow to play with that resolution anyway! There's no point adding more memory when the rest of the card would bog down, long before it became useful.

You have obviously not used an R300?
I play with those settings (+16x aniso) in a handful of games.
An R350 would increase that "handful" to many more, methinks.
 
demalion said:
I'm not sure if the 8-bit stencil buffer has to be replicated for each sample set...I'd think it would be.

Stencil needs to be multisampled just like the zbuffer, otherwise you wouldn't get antialiasing at the edges of your stencil masks.
 
And then throw in high precision MRT.
1600x1200x4x128/8 = 117MB
That's for four 4x32bit fp buffers, no AA.
And that's in addition to the usual FB.
Is it possible to get AA on MRTs? In that case 6*117MB = ouch.

It wouldn't be useful in real time though. :)
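Spelling that MRT arithmetic out in the same back-of-the-envelope style as earlier in the thread:

Code:
width, height = 1600, 1200
targets, channels, bytes_per_channel = 4, 4, 4   # four 4x32-bit FP render targets
mrt_bytes = width * height * targets * channels * bytes_per_channel
print(mrt_bytes / 2**20)       # ~117 MB, no AA
print(6 * mrt_bytes / 2**20)   # ~703 MB if 6x AA on MRTs were possible - ouch indeed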
 
Dave H said:
OT but related: how exactly are framebuffer/z-compression working such that they save bandwidth yet don't save memory usage?

This may have been answered earlier, but here goes:
I believe that since it is paramount to keep the compression lossless, compression ratios cannot be guaranteed (I have no or very little technical knowledge about this, but it stands to reason). Thus, memory must be allocated for the worst-case scenario, i.e. no compression at all.

Naturally, in practice there is "always" some compression, so bandwidth is saved, but the saved space is nevertheless not available for other data.
 
Basic said:
Is it possible to get AA on MRTs? In that case 6*117MB = ouch.

I'll respond to that by a direct quote from the DX9 SDK, in the "Multiple Render Target" subtitle:
No antialiasing is supported.


Uttar

EDIT: I'd like to say I'm not impressed by MuFu's comment about R350 6x being as fast as R300 4x.
According to Anand, the R300 is between 11% and 18% slower at 6x than at 4x - that's probably because Z & color compression become even more efficient.
And that means you'd only need 18% faster RAM to get that. Really not impressive, IMO.
Assuming that Anand figure was for an optimal case, and considering a 21% performance drop and thus 21% faster RAM, you'd get 375MHz RAM (i.e. roughly 21% above the R300's 310MHz memory clock).
That's exactly what The Inquirer had predicted on the 4th of February:
even though we hear that the memory will actually work at 375 MHz only, or should we say 750 MHz in DDR mode.
So, as you see, nothing new :)
 
Tahir said:
Yeah I hear from my sources 256MB is gonna be an option on R350.

No, really. :idea:

I know it could be considered egotistical to quote oneself (haw haw), but I just wanted everyone to know I was joking about my sources... I don't have any - hehe. :D
 
Hmmm - an idea: 6x multisampling requires 6x the amount of memory per buffer. With compression, you can reduce the amount of memory that needs to be accessed by a large amount - most of the time. This would imply that in the multisample buffer, there will be large amounts of memory that are almost never accessed - wouldn't it be possible to map these memory areas out to AGP memory and that way free up lots of onboard memory?
 
arjan de lumens said:
Hmmm - an idea: 6x multisampling requires 6x the amount of memory per buffer. With compression, you can reduce the amount of memory that needs to be accessed by a large amount - most of the time. This would imply that in the multisample buffer, there will be large amounts of memory that are almost never accessed - wouldn't it be possible to map these memory areas out to AGP memory and that way free up lots of onboard memory?

Seems suboptimal to me. AGP has way too high latency...
Might not be as much of a problem as I think, but I doubt it'd do miracles anyway.

Uttar
 
Uttar said:
EDIT: I'd like to say I'm not impressed by Mufu's comment about R350 6x being as fast as R300 4x
According to Anand, the R300 is between 11% and 18% slower at 6x than at 4x - that's probably because Z & Color compression become even more efficient.
And that means you'd only need 18% faster RAM to get that. Really not impressive, IMO.

Man, I think you speculate a little too quantitatively sometimes! I meant to say "as useable" as the 4x mode, sorry - corrected now. It'll depend on final clockspeeds anyway and that's pretty much a marketing call, what with there being no real competition. :-\

256MB is definitely an option for R350, Tahir. It's an "option" for R300 and NV30 as well though - doesn't really mean much.

MuFu.
 
Uttar said:
Seems suboptimal to me. AGP has way too high latency...
Might not be as much of a problem as I think, but I doubt it'd do miracles anyway.
Still I would think it's a better idea than AGP texturing, especially given that framebuffer accesses have much more predictable/prefetchable access patterns than texture map lookups and therefore are much less sensitive to latency.
 