Anti-aliasing Without EDRAM in NVidia's PS3 GPU...

MfA said:
API, developer inertia, compatibility. On the PC the direct mode renderers are what the APIs and software are designed for; when your competition sets the terms you compete on, things are not always easy.

Oh, I didn't know there was so much dependence on the API/software. I thought it was mostly just a hardware implementation of how to handle geometry processing.
 
SimonF said:
Reading back how? If you mean with software then you'll see the system grind to a halt.
I think the better question would be "reading back where". From my perspective, every Z-read that my code explicitly initiates is "software" :p

That said, I'd argue that a good two-thirds of post-processing effects manipulate the Z-buffer in some manner or another. On some architectures even more than that (like when you are forced to invert the Z-buffer manually because the Z-test only supports one-way comparators).

Anyway - to me it's just plain common sense that each tile can output a Z-buffer after it's been completed. Just make sure the output also lands in unified graphics memory, and we've got all that's needed for a layer-rendering abstraction, complete with all the post-processing fun 8)
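
Roughly the kind of Z-based post-processing I mean, as a minimal CPU-side sketch (assuming the resolved Z output has landed somewhere readable in unified memory; the names, depth range and fog maths are illustrative, not any particular hardware's):

/* Minimal sketch of a Z-dependent post-process, assuming the tile (or
 * whole frame) has already written its Z-buffer out to unified memory
 * where a later pass can read it. All names and constants are
 * illustrative placeholders. */
#include <stdint.h>

#define NEAR_PLANE 1.0f
#define FAR_PLANE  1000.0f

/* Recover eye-space distance from a normalised [0,1] depth value. */
static float linearize_depth(float z)
{
    return (NEAR_PLANE * FAR_PLANE) /
           (FAR_PLANE - z * (FAR_PLANE - NEAR_PLANE));
}

/* Blend a fog colour over the colour buffer, weighted by depth. */
void depth_fog_pass(uint32_t *color, const float *depth,
                    int width, int height, uint32_t fog_rgb)
{
    for (int i = 0; i < width * height; ++i) {
        float dist = linearize_depth(depth[i]);
        float f = dist / FAR_PLANE;          /* 0 = near, 1 = far */
        if (f > 1.0f) f = 1.0f;

        /* Lerp each 8-bit colour channel toward the fog colour. */
        uint32_t c = color[i];
        uint32_t out = c & 0xFF000000u;      /* keep the top byte as-is */
        for (int shift = 0; shift <= 16; shift += 8) {
            uint32_t src = (c >> shift) & 0xFF;
            uint32_t fog = (fog_rgb >> shift) & 0xFF;
            uint32_t mix = (uint32_t)(src + f * ((float)fog - (float)src));
            out |= (mix & 0xFF) << shift;
        }
        color[i] = out;
    }
}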
 
nAo said:
2) If the NVIDIA GPU does not have a big pool of embedded memory, then performance-wise R500 could be faster when AA is on.
That would be assuming that you're GPU bandwidth bound.
There are a lot of possible rendering scenarios, especially on an embedded platform, where you can have MSAA "for free".

MfA said:
This isn't the same situation as with the poorly conceived GS. ATI's eDRAM chip is unlikely to be lacking features ...
Why wouldn't eDRAM be included at the expense of something else, in this case? (if the XeGPU indeed has eDRAM)
For instance, ERP thinks that even ArtX's Flipper is paying a big price, transistor-wise, for its eDRAM:
http://www.beyond3d.com/forum/viewtopic.php?p=445308#445308
 
The percentage of the die it takes up has gone down ... sure, if you don't put it in you could put in more pipelines, but I doubt they would actually change the pipelines themselves much. Plus they don't need to worry about framebuffer compression, which saves a few transistors (but not a huge deal).

Without eDRAM you might be optimized for the long shaders everyone raves about, but you can't solve every problem with a longer shader. Dynamic inter-object shadows, volumetric effects, particle systems ...

Of course I'd still rather see a tiler.
 
IMO you'd still want compression for the stuff in eDRAM - the information computed for compression (specifically: which nearby samples have identical colors) can be used to optimize downsampling and blending.
 
The downsampling is so little work, why worry about it? How does knowing if a neighbouring sample has the same color help in blending?
 
Well, say a fragment (single color) covers 3 samples inside a pixel which is currently fully covered by an opaque fragment (4 samples with identical colors). Doesn't that mean you can do the color blend once (for all 3 samples) instead of 3 times?

Maybe for 4xAA this doesn't give you much, but I'm still hoping for 8x or 16x one of these days...
 
psurge said:
Well, say a fragment (single color) covers 3 samples inside a pixel which is currently fully covered by an opaque fragment (4 samples with identical colors). Doesn't that mean you can do the color blend once (for all 3 samples) instead of 3 times?
That's correct.
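
A minimal sketch of that idea, under illustrative assumptions (the structures, the all-samples-equal flag and the plain source-alpha "over" blend are placeholders, not any shipping compression scheme):

/* If the compression metadata says all existing samples in a pixel share
 * one colour, an incoming translucent fragment covering several of those
 * samples only needs one blend, with the result replicated. */
#include <stdint.h>
#include <stdbool.h>

#define SAMPLES_PER_PIXEL 4

typedef struct { float r, g, b, a; } Color;

typedef struct {
    Color samples[SAMPLES_PER_PIXEL];
    bool  all_samples_equal;   /* the kind of flag a colour-compression scheme tracks */
} Pixel;

static Color blend_over(Color src, Color dst)
{
    Color out;
    out.r = src.r * src.a + dst.r * (1.0f - src.a);
    out.g = src.g * src.a + dst.g * (1.0f - src.a);
    out.b = src.b * src.a + dst.b * (1.0f - src.a);
    out.a = src.a + dst.a * (1.0f - src.a);
    return out;
}

/* coverage_mask has one bit per sample the incoming fragment covers. */
void blend_fragment(Pixel *p, Color frag, unsigned coverage_mask)
{
    if (p->all_samples_equal) {
        /* One blend, result broadcast to every covered sample. */
        Color blended = blend_over(frag, p->samples[0]);
        for (int s = 0; s < SAMPLES_PER_PIXEL; ++s)
            if (coverage_mask & (1u << s))
                p->samples[s] = blended;
    } else {
        /* Fall back to blending each covered sample individually. */
        for (int s = 0; s < SAMPLES_PER_PIXEL; ++s)
            if (coverage_mask & (1u << s))
                p->samples[s] = blend_over(frag, p->samples[s]);
    }
    /* A real compressor would re-derive its metadata here; this sketch
     * just updates the flag conservatively. */
    p->all_samples_equal = (coverage_mask == (1u << SAMPLES_PER_PIXEL) - 1)
                           ? p->all_samples_equal : false;
}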
 
Unless you are doing it to save power, blending would actually need to be a potential bottleneck for this to matter ... now that might make sense for multisampling, but I don't think it does outside of that. As for multisampling, that could be handled in a more lowbrow way than the compression schemes being used now.
 
I agree that if you're not multisampling it's pointless. The idea also doesn't have much merit if a low number of samples is being used (2x, maybe 4x).

Were you thinking that higher sample counts (16X) should be dropped in favor of some kind of super-sampling (MSAA doesn't address in-shader aliasing)?
 
16x? Even with external DRAM that is a bit optimistic; with eDRAM you would need virtualized memory and be able to spill to external DRAM ... without tiling, that is.

I'm pretty sure no one will do 16x multisampling.
 
MfA said:
16x? Even with external DRAM that is a bit optimistic; with eDRAM you would need virtualized memory and be able to spill to external DRAM ... without tiling, that is.

I'm pretty sure no one will do 16x multisampling.

I'm hoping for 6x "temporal" from the Xenon. I use 6x2 in every game I can. The games really do show a big improvement over regular 6x. Perhaps ATI can perfect it so that 6x3 would work, giving you somewhere between 12x and 18x quality. On my X800 XT PE the flickering gets bad at 6x3, though, so I would think it would take a lot of work. Perhaps a hybrid 4x mode from NVIDIA, with 2x multisampling and 2x supersampling.
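
The mechanism behind "6x2" is just alternating the MSAA sample pattern from frame to frame so the eye averages the two patterns together; a minimal sketch (the offsets below are made-up placeholders, not ATI's actual sparse patterns):

typedef struct { float x, y; } SampleOffset;   /* offsets within a pixel, in [0,1) */

#define SAMPLES  6
#define PATTERNS 2

static const SampleOffset patterns[PATTERNS][SAMPLES] = {
    /* pattern A (even frames) - placeholder positions */
    { {0.10f, 0.30f}, {0.40f, 0.90f}, {0.70f, 0.15f},
      {0.25f, 0.60f}, {0.85f, 0.45f}, {0.55f, 0.80f} },
    /* pattern B (odd frames) - placeholder positions, offset from A */
    { {0.20f, 0.50f}, {0.60f, 0.05f}, {0.90f, 0.70f},
      {0.05f, 0.85f}, {0.45f, 0.25f}, {0.75f, 0.55f} },
};

/* Returns the sample pattern the rasteriser should use this frame. */
const SampleOffset *select_aa_pattern(unsigned frame_index)
{
    return patterns[frame_index % PATTERNS];
}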
 
Personally, I can't tell the difference between 6x MSAA and 6x temporal AA (on the R350). The difference would be even harder to distinguish on a TV, I think.
 
MfA said:
16x? Even with external DRAM that is a bit optimistic; with eDRAM you would need virtualized memory and be able to spill to external DRAM ... without tiling, that is.

I'm pretty sure no one will do 16x multisampling.

R500 seems to use its EDRAM to hold the frame buffer (colour + Z per pixel) but not to hold any AA sample data (slightly simplistic description).

R500 appears to use system RAM to store AA samples. The EDRAM/blending/filtering part of R500 has a latency-hiding buffer designed to feed a pipeline of pixel operations performed against the framebuffer.

In other words, Xbox 360's GPU will necessarily spill AA sample data over into system RAM.

The combination of:

- pipelined blending/filtering operations (minimal latency)
- blending/filtering hardware being on the same chip as the frame buffer

is what gives R500 its extremely high speeds. Well, that's my interpretation.

(Lots of simplifications in this - I just want to point out that, as far as I can tell, Xbox 360 can't do all AA within EDRAM, as a comparison point for possible NVidia architectures.)
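
Some back-of-the-envelope numbers behind that point, as a minimal sketch; the 10 MB pool size, the 720p target and 4 bytes each for colour and Z are assumptions for illustration, not confirmed specs:

/* Does a full multisampled framebuffer fit in a small eDRAM pool?
 * All sizes below are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    const long width = 1280, height = 720;      /* assumed 720p target       */
    const long bytes_per_sample = 4 + 4;        /* 32-bit colour + 32-bit Z  */
    const long edram_bytes = 10L * 1024 * 1024; /* assumed 10 MB eDRAM pool  */

    for (long samples = 1; samples <= 4; samples *= 2) {
        long fb = width * height * samples * bytes_per_sample;
        printf("%ldx: %ld.%01ld MB framebuffer -> %s\n",
               samples, fb / (1024 * 1024),
               (fb % (1024 * 1024)) * 10 / (1024 * 1024),
               fb <= edram_bytes ? "fits in eDRAM"
                                 : "spills (or needs tiling)");
    }
    return 0;
}

At these assumed sizes only the 1x case (about 7 MB) fits; 2x and 4x overflow the pool, which is the spill/tiling issue being discussed.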

Jawed
 