XBOX 2 GRAPHICS DETAILS EMERGE .. not much actually

How else can the Xbox 2 pull off 4 X AA at 720p? There is only so much eDRAM they can put on the GPU and only so much overall system bandwidth available from a unified memory architecture.
 
Good question, and one that I don't really have an answer for. I'm confident ATI has things covered though, with a next-gen Hyper Z and other algorithms.

and hopefully more than 10 MB eDRAM....
 
Brimstone said:
How else can the Xbox 2 pull off 4 X AA at 720p? There is only so much eDRAM they can put on the GPU and only so much overall system bandwidth available from a unified memory architecture.
Considering that current cards already do this, I don't see why it would be so unprecedented for future cards to do the same (without going to TBDR).
 
Current cards don't have a 10MB+ embedded frame/back buffer though ...

I suppose they could cut the screen up, drop to 32-bit colour when writing out to the buffer or ... just not use 4x AA.

Seriously, the vast majority of Xbox 2 owners, in the entire lifetime of the Xbox 2, won't have HDTV. Spending all those resources doing 1280 x 768 buffers with 4 x AA and 8 or 16 times aniso seems like a potentially large misallocation of the console's resources.

Just using a standard 1280 x 768 buffer with aniso would allow them to make great use of HDTVs, and also use the same buffer to create the equivalent of a 640 x 480 supersampled image using the rumoured "high quality video scaler". Everyone's a winner, nothing's really going to waste and the game can be optimised for rendering at one resolution. Good for VGA too.
 
Inane_Dork said:
Brimstone said:
How else can the Xbox 2 pull off 4 X AA at 720p? There is only so much eDRAM they can put on the GPU and only so much overall system bandwidth available from a unified memory architecture.
Considering that current cards already do this, I don't see why it would be so unprecedented for future cards to do the same (without going to TBDR).


Computer graphics cards have ample amounts of bandwidth because of the dedicated memory bus. A console with a UMA isn't the same thing. After the eDRAM is blown out, only so much bandwidth is going to be available.
 
Brimstone said:
How else can the Xbox 2 pull off 4 X AA at 720p? There is only so much eDRAM they can put on the GPU and only so much overall system bandwidth available from a unified memory architecture.

IIRC from that Xenon diagram, you're not forced to use that eDRAM (we're not even sure it was embedded DRAM; IIRC, it was enhanced DRAM) to store your framebuffers.
 
Brimstone said:
Computer graphics cards have ample amounts of bandwidth because of the dedicated memory bus. A console with a UMA isn't the same thing. After the eDRAM is blown out, only so much bandwidth is going to be available.
Do you realize how high the cache hit frequency would be with a 10 MB framebuffer cache? A 10 MB cache would drastically drop the memory bandwidth demands of a 44 MB framebuffer (64 bit color, 32 bit Z, 1280x720, 4x MSAA). This is especially the case if you compress the color and Z buffers (which ATi does).
 
Inane_Dork said:
Do you realize how high the cache hit frequency would be with a 10 MB framebuffer cache? A 10 MB cache would drastically drop the memory bandwidth demands of a 44 MB framebuffer (64 bit color, 32 bit Z, 1280x720, 4x MSAA). This is especially the case if you compress the color and Z buffers (which ATi does).
A 10 MB cache COULD, not would, drastically drop the memory bandwidth requirements. If most of your current working set doesn't fit in the eDRAM you're actually gonna trash a lot of bandwidth. I believe current texture caches are doing pretty well on modern GPUs. eDRAM makes sense as a very big L2 texture cache and/or as a back buffer, if you can split the rendering process so that your buffers fit completely in the eDRAM.

ciao,
Marco
 
nAo:

Didn't Baumann say something about R500 being able to tile the screen a while back? If so then 10MB just might be more than enough.
 
akira888 said:
nAo:

Didn't Baumann say something about R500 being able to tile the screen a while back? If so then 10MB just might be more than enough.
Dunno what Dave wrote, but tiling is a way to split your data working set :)
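
As a rough illustration of what tiling buys here, the sketch below estimates how many equal-sized screen tiles are needed before one tile's color and Z samples fit in a given amount of eDRAM. The function name `tiles_needed`, the uncompressed layout, and the reading of "10 MB" as 10 x 2^20 bytes are assumptions made for illustration, not details of any real R500 design.

```python
import math

def tiles_needed(width, height, color_bits, z_bits, samples, edram_bytes):
    """Lower bound on the number of equal-sized tiles so that one tile fits in eDRAM.

    Assumes no compression and that the frame can be split evenly (hypothetical).
    """
    bytes_per_sample = (color_bits + z_bits) // 8
    frame_bytes = width * height * samples * bytes_per_sample
    return math.ceil(frame_bytes / edram_bytes)

EDRAM = 10 * 2**20  # assumed: "10 MB" taken as 10 MiB

print(tiles_needed(1280, 720, 64, 32, 4, EDRAM))  # 64-bit color, 4x MSAA -> 5
print(tiles_needed(1280, 720, 32, 32, 4, EDRAM))  # 32-bit color, 4x MSAA -> 3
```

Under these assumptions a handful of tiles is enough even for the heaviest format discussed in this thread, and compression would only lower the count.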
 
nAo said:
A 10 MB cache COULD, not would, drastically drop the memory bandwidth requirements. If most of your current working set doesn't fit in the eDRAM you're actually gonna trash a lot of bandwidth.
And what application, exactly, would produce such behavior? A game? I can't think of why a game's working set of the framebuffer would be larger than 1/4 of the screen at one time. Keep in mind that 1/4 figure is completely without compression.
 
Inane_Dork said:
nAo said:
A 10 MB cache COULD, not would, drastically drop the memory bandwidth requirements. If most of your current working set doesn't fit in the eDRAM you're actually gonna trash a lot of bandwidth.
And what application, exactly, would produce such behavior? A game? I can't think of why a game's working set of the framebuffer would be larger than 1/4 of the screen at one time. Keep in mind that 1/4 figure is completely without compression.
Just factor together MSAA, FP render targets (probably there will be no compression here), Z-buffer, and so on.
Maybe tiling to 1/4 of the screen will be enough most of the time, but we're not talking about a huge cache here, 'just' a very fast/wide DRAM.

ciao,
Marco
 
nAo said:
Just factor together MSAA, FP render targets (probably there will be no compression here), Z-buffer, and so on.
I pretty much did.

1280x720 screen
64 bits of color
32 bits of Z
4x MSAA

Do the math and you come up with 44 MB, like I used. I'm not trying to skirt the issue. Factor in how much color and depth compression ATi can do with 4x MSAA and one-quarter of the screen is about the worst case scenario.
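
For anyone who wants to check the arithmetic, here is a minimal sketch of it. The helper name `framebuffer_bytes` is made up for this example, sizes are uncompressed, and "MB" is taken as decimal megabytes (10^6 bytes), which is what makes the 44 figure come out.

```python
def framebuffer_bytes(width=1280, height=720, samples=4, color_bytes=8, z_bytes=4):
    """Uncompressed multisampled framebuffer size, split into color and Z parts."""
    n_samples = width * height * samples
    return {
        "color": n_samples * color_bytes,              # 64-bit color per sample
        "z": n_samples * z_bytes,                      # 32-bit Z per sample
        "total": n_samples * (color_bytes + z_bytes),
    }

sizes = framebuffer_bytes()
print(sizes["total"] / 1e6)      # 44.2368 MB for the whole frame
print(sizes["total"] / 4 / 1e6)  # 11.0592 MB for one quarter of the screen
```

Note that an uncompressed quarter of the screen comes to about 11 MB, slightly more than 10 MB, so the one-quarter estimate leans on at least a little compression.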

Or look at it this way. Cards already have caches for the depth buffer. That must make most to all cases faster. Why would adding eDRAM slow it down?
 
Brimstone said:
Computer graphics cards have ample amounts of bandwidth because of the dedicated memory bus. A console with a UMA isn't the same thing. After the eDRAM is blown out, only so much bandwidth is going to be available.

You don't seem to realize a console with UMA isn't going to NEED as much bandwidth to begin with.

Just for starters, screen update speed will be 60 frames/sec absolute max, it will NEVER exceed that. Second, games will generally run in lower resolutions than on a PC. Third, it is presumed games will use lots more, and longer/more complicated shaders, thus lifting further emphasis off the bandwidth-burning rasterization side of 3D rendering.
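
To put a rough number on how a frame-rate cap bounds framebuffer traffic, here is a back-of-the-envelope sketch. The function name, the overdraw factor of 3, and the assumption of one read plus one write per touched sample are all hypothetical values chosen for illustration; real figures depend on the game and on compression.

```python
def framebuffer_traffic_gb_per_s(width, height, samples, bytes_per_sample,
                                 fps, overdraw, accesses_per_sample=2):
    """Uncompressed framebuffer traffic in GB/s (10^9 bytes per second).

    overdraw: average number of times each sample is touched per frame (assumed).
    accesses_per_sample: 2 models one read plus one write per touch (assumed).
    """
    per_frame = width * height * samples * bytes_per_sample * overdraw * accesses_per_sample
    return per_frame * fps / 1e9

# 1280x720, 4x MSAA, 12 bytes per sample (64-bit color + 32-bit Z), overdraw of 3
print(framebuffer_traffic_gb_per_s(1280, 720, 4, 12, fps=60, overdraw=3))  # roughly 15.9
print(framebuffer_traffic_gb_per_s(1280, 720, 4, 12, fps=30, overdraw=3))  # roughly 8.0
```

Since the result scales linearly with frame rate and resolution, capping both puts a ceiling on this part of the bandwidth budget, whatever the real overdraw turns out to be.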
 
As far as the framebuffer is concerned there is no real working set with the way most games work ... there is some spatial coherency, but no temporal coherency (i.e. the odds of a location being accessed are entirely independent of whether it is in cache or not ... unlike general-purpose processing, where there tends to be a huge amount of temporal coherency). To get temporal coherency in framebuffer access you need tiling.
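
A toy simulation can make the point about temporal coherency concrete. Everything in it is an assumption for illustration: the framebuffer is a 32x32 grid of blocks, the cache is LRU and holds a quarter of them, and each "primitive" touches a 2x2 group of blocks at a random screen position. Real GPUs do not necessarily work this way.

```python
import random
from collections import OrderedDict

GRID = 32            # framebuffer is GRID x GRID blocks (hypothetical granularity)
CACHE_BLOCKS = 256   # cache holds one quarter of the framebuffer

def lru_hit_rate(accesses, capacity):
    """Replay block accesses through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(accesses)

def submission_order_accesses(num_prims, seed=0):
    """Each toy primitive covers a 2x2 group of blocks at a random position."""
    rng = random.Random(seed)
    accesses = []
    for _ in range(num_prims):
        x, y = rng.randrange(GRID - 1), rng.randrange(GRID - 1)
        for dy in (0, 1):
            for dx in (0, 1):
                accesses.append((y + dy) * GRID + (x + dx))
    return accesses

def tile_of(block):
    """Index of the screen quadrant (tile) that a block belongs to."""
    half = GRID // 2
    x, y = block % GRID, block // GRID
    return (y // half) * 2 + (x // half)

untiled = submission_order_accesses(20000)
# Stable sort: bin the accesses per tile, keeping submission order within a tile.
tiled = sorted(untiled, key=tile_of)

print("untiled hit rate:", lru_hit_rate(untiled, CACHE_BLOCKS))
print("tiled hit rate:  ", lru_hit_rate(tiled, CACHE_BLOCKS))
```

In this model the untiled hit rate should land near the cache-to-framebuffer size ratio, while the tiled run misses only on the first touch of each block, because one tile's blocks fit entirely in the cache.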

Now maybe with a virtualized memory system you could store the framebuffer inside eDRAM and use external memory as overflow ... and maybe fragmentation wouldn't be significant enough to take away the advantage of compression ... and maybe this could fit enough of the framebuffer in there for it to be advantageous.

Present GPUs have caches because they need to accumulate the spatially coherent writes for compression, otherwise they'd still just have FIFOs.
 
Couldn't you easily move the not-often-accessed parts of the Z/color buffer from eDRAM out to normal RAM? Presumably Z-buffer compression for MSAA works by exploiting the fact that most pixels don't contain polygon edges, so you only need to store one Z value instead of 4 (in the case of 4x FSAA).
So you'd basically allocate a non-FSAA Z-buffer in eDRAM, and the rest of it (3 times as large) in normal RAM. You'd also need 3 bits per pixel to flag whether the 3 subpixels have the same Z value as the first subpixel in the eDRAM buffer (these 3 bits might be in on-die cache).
So you WILL need to access the normal RAM, but only for the pixels that contain polygon edges, which should be far less frequent than accesses to the eDRAM.
Color can be handled in just the same way, and you have just cut the required eDRAM size by a factor of 4.
Simple, isn't it (it is so simple I'm just running off to the patent office :devilish:).

mczak
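
The bookkeeping in the post above can be sketched as a small data structure. This is only a toy model of the idea as described: the class name `SplitSampleBuffer`, the plain Python lists standing in for eDRAM, the dict standing in for normal RAM, and the access counter are all made up for illustration, and it treats per-sample values generically (for example color).

```python
class SplitSampleBuffer:
    """First sample of each pixel in 'eDRAM'; the other samples in 'main RAM'
    only for pixels where they differ from the first (hypothetical model)."""

    def __init__(self, width, height, samples=4, clear_value=0):
        self.width = width
        self.samples = samples
        self.base = [clear_value] * (width * height)   # sample 0, lives in "eDRAM"
        self.flags = [0] * (width * height)            # bit s-1 set: sample s differs from sample 0
        self.extra = {}                                 # pixel index -> samples 1..n-1, in "main RAM"
        self.main_ram_accesses = 0

    def write(self, x, y, values):
        i = y * self.width + x
        self.base[i] = values[0]
        mask = 0
        for s in range(1, self.samples):
            if values[s] != values[0]:
                mask |= 1 << (s - 1)
        self.flags[i] = mask
        if mask:
            self.extra[i] = list(values[1:])
            self.main_ram_accesses += 1
        else:
            self.extra.pop(i, None)  # stale samples dropped; no traffic modelled for this

    def read(self, x, y):
        i = y * self.width + x
        if self.flags[i] == 0:
            return [self.base[i]] * self.samples  # served from "eDRAM" alone
        self.main_ram_accesses += 1
        return [self.base[i]] + self.extra[i]

buf = SplitSampleBuffer(4, 4)
buf.write(1, 1, [7, 7, 7, 7])   # interior pixel: all samples equal
buf.write(2, 1, [7, 7, 9, 9])   # edge pixel: samples differ
print(buf.read(1, 1), buf.read(2, 1), buf.main_ram_accesses)
```

Only the edge pixel touches the dict that stands in for normal RAM, which is the saving the post is after. Whether this holds up for Z in particular is a separate question.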
 
You need to store all the Z-values all the time (otherwise intersecting surfaces won't work correctly). You can get away with reading fewer values though, which is basically what the hierarchical Z-buffer does.

Storing the colour in separate full-pixel/subpixel buffers is possible, but you get a little extra latency.
 