Ailuros said: MSAA requires a 4-fold increase in z-buffer memory access and a 4-fold increase in framebuffer access (neither of which affects a tiler). It also requires a much smaller increase in texture bandwidth and fillrate. There will be an ocean of difference in performance drop between a tiler and an IMR for MSAA.
For 2x/4x samples per pixel and by today's standards I do have some doubts. Today's architectures get 2xMSAA virtually "for free", and that's because they're only capable of 2 samples per cycle. 4x or higher is achieved via sample loops. For very high sample densities the above would be more in line IMHO, something like 16x or higher.
Ailuros said: As for framebuffer compression, is there any reason a PVR card couldn't implement this?
I don't think there are any reasons that speak against it; I just don't think a TBDR needs it, especially for MSAA. When it comes to antialiasing and the future, I'd prefer IHVs to move to more "exotic" algorithms; irrespective of architecture, it could eventually be possible to get to high sample densities with minimal framebuffer and bandwidth requirements. If the implementation were a piece of cake we'd obviously be there already. With all the hardware space complex shaders take up, though, and the resources required for R&D, I'm afraid that IHVs have set other priorities for the time being.
Ailuros said: Plus, it's not about free memory, it's about memory bandwidth. The only thing that doesn't increase 4x is the texture access, which is why MSAA is faster and preferable to SSAA.
I don't see why memory footprint isn't a consideration. Let's suppose NV40 were capable of combining float blending HDR with MSAA; do you really think that the 256MB framebuffer would be sufficient for, let's say, 16xMSAA with 64bpp HDR at a high resolution, even if the bandwidth were sufficient?
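To put a rough number on that question, here is a minimal Python sketch of the arithmetic. The 1600x1200 resolution is an assumption (the post only says "a high resolution"), the helper name is made up for illustration, and the figure covers a single uncompressed multisampled color buffer only, with no z-buffer and no resolved buffers.

```python
MIB = 2**20

def multisample_color_bytes(width, height, bytes_per_sample, samples):
    """Footprint of one multisampled color buffer, assuming every sample is stored."""
    return width * height * bytes_per_sample * samples

# Assumed resolution; 64bpp HDR means 8 bytes per sample.
hdr_16x = multisample_color_bytes(1600, 1200, 8, 16)
print(f"16x MSAA, 64bpp, 1600x1200: {hdr_16x / MIB:.1f} MiB of a 256 MiB card")
```

Under these assumptions the color buffer alone comes to roughly 234 MiB, which leaves next to nothing of a 256MB card for the z-buffer, textures and geometry.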
Indeed, I'd better just add that I am basing this 4 fold increase on using 2x2 MSAA.
Indeed, an IMR is gonna need this compression more though if they plan on using SSAA.
Speaking of HDR, I've got a 6800, that can do it, right? Where do I find a demo of this? Did I hear somewhere that you can enable it in Doom 3 or some other program?
Ah, yeah, that's right, he was the PR guy. Slightly less education required for that job....
Dave B(TotalVR) said: Speaking of HDR, I've got a 6800, that can do it, right? Where do I find a demo of this?
Dave B(TotalVR) said: I downloaded Timbury, slowest-ass demo I have ever run on any graphics card. It's like 1 fps all the way through. I thought it was supposed to show off Nvidia cards?
Huh. No problems running it here (GeForce 6800). Not blazing-fast, but about 15-20 fps or so.
LeGreg said: There was also a demo by ATI (Debevec-like) but I've lost track of whether it can run on NV hardware or not (no technical reason not to).
Tommti's 3DAnalyzer has an option that allows NV hardware to run most, if not all, ATI demos. I seem to remember being able to run all of the X800 launch demos.
Dave B(TotalVR) said: Indeed, an IMR is gonna need this compression more though if they plan on using SSAA.
1. Since it takes longer to produce the multiple pixel samples for SSAA, the hardware doesn't really need to improve memory bandwidth accesses over no AA when supersampling.
Dave B(TotalVR) said: In fact SSAA is 4x the fillrate and bandwidth requirements throughout the pipeline.
Which is the exact point I was attempting to make: supersampling AA multiplies the fillrate and bandwidth requirements by the same amount, meaning it requires no more bandwidth per pixel rendered than normal rendering. Now, there are two other factors at work here, of course: z-buffer compression improves bandwidth usage, and downsampling must be done at some point. But I claim that the combination of these two effects results in only a small change in overall efficiency.
Dave B(TotalVR) said: Also, from what you say framebuffer compression is going to be more effective under MSAA and less effective under SSAA, so surely the bandwidth is more of a concern under SSAA?
Certainly, but the reduction in fillrate usage is more significant than the reduction in bandwidth usage. All that you're saying here is that SSAA is slow compared to MSAA, but both end up being fairly close in terms of fillrate/bandwidth ratios, though MSAA does need a bit more bandwidth.
Erm, there's something rather wrong in the above. Here:
zeckensack said:
Dave B(TotalVR) said: I was more thinking about memory footprint. How much RAM does a triple-buffered 1280x1024 32-bit 4x SSAA framebuffer take up? 60MB
That number is missing either the z-buffer (would be 80MB total then) or a downsampled frontbuffer (5MB extra). But in any case, supersampling takes the exact same amount of memory as multisampling. Note that multisampling color compression saves bandwidth, but not footprint.
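The figures quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes 4 bytes per color sample, a 32-bit z/stencil value per sample, and that "MB" means 2^20 bytes; the function names are hypothetical.

```python
MIB = 2**20

def buffer_bytes(width, height, bytes_per_sample, samples=1):
    """Footprint of one buffer that stores every sample uncompressed."""
    return width * height * bytes_per_sample * samples

def footprint_mib(width=1280, height=1024, samples=4):
    """Break the 4x AA footprint down into the pieces named in the thread."""
    color = 3 * buffer_bytes(width, height, 4, samples)  # three 4x color buffers
    depth = buffer_bytes(width, height, 4, samples)      # assumed 32-bit z per sample
    front = buffer_bytes(width, height, 4)               # one downsampled frontbuffer
    return {
        "color": color / MIB,
        "color_plus_depth": (color + depth) / MIB,
        "downsampled_front": front / MIB,
    }

for name, mib in footprint_mib().items():
    print(f"{name}: {mib:.0f} MiB")
```

The three values come out at 60, 80 and 5 MiB, matching the numbers in the posts. The same footprint applies whether the four samples per pixel come from supersampling or multisampling.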
Dave B(TotalVR) said: Also, in terms of SSAA remember downsampling has to be performed at some point. I'm not sure how they do it on IMRs but it's gonna add more bandwidth overhead.
Same thing with multisampling. There are basically two choices:
1) Downfilter to frontbuffer when flipped
2) Downfilter on scanout ("on the fly")
Dave B(TotalVR) said: In fact SSAA is 4x the fillrate and bandwidth requirements throughout the pipeline.
Correct. But note that the bandwidth per "pixel" (or sample) doesn't increase at all -- more likely it decreases, as Chalnoth pointed out.
Dave B(TotalVR) said: Also, from what you say framebuffer compression is going to be more effective under MSAA and less effective under SSAA, so surely the bandwidth is more of a concern under SSAA?
No. The primary concern with supersampling is fillrate. Fillrate demands scale linearly with the number of SS samples. Bandwidth requirements scale slightly slower. I.e. you're not going to be bandwidth limited because of supersampling. If you are, you already were without AA. Of course rendering will be slower, but the balance between fill and bandwidth demands won't change much.
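The "balance" argument can be illustrated with a deliberately crude Python model. Everything in it is an assumption made for illustration: 8 bytes of framebuffer traffic per stored sample (color plus z), no compression, no texture traffic, no downsampling cost, and an overdraw of 1. It is not a description of any real chip.

```python
def frame_cost(pixels, samples, mode, bytes_per_sample=8):
    """Return (shaded_fragments, framebuffer_bytes) for one full-screen pass.

    mode is "none", "ssaa" or "msaa"; this is a toy model, not hardware data.
    """
    if mode == "none":
        return pixels, pixels * bytes_per_sample
    if mode == "ssaa":
        # Every sample is shaded and stored: fill and traffic both scale by `samples`.
        return pixels * samples, pixels * samples * bytes_per_sample
    if mode == "msaa":
        # One shaded fragment per pixel, but every sample is still stored (uncompressed).
        return pixels, pixels * samples * bytes_per_sample
    raise ValueError(f"unknown mode: {mode}")

def bytes_per_fragment(pixels, samples, mode):
    fragments, traffic = frame_cost(pixels, samples, mode)
    return traffic / fragments

pixels = 1280 * 1024
for mode in ("none", "ssaa", "msaa"):
    print(mode, bytes_per_fragment(pixels, 4, mode))
```

In this model SSAA leaves the bytes-per-shaded-fragment ratio unchanged relative to no AA, while uncompressed MSAA multiplies it by the sample count. That is the gap that color and z compression exist to narrow on an IMR.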
____
On TBDRs, you don't need a larger external framebuffer, so neither bandwidth nor footprint requirements should increase at all. This applies to both supersampling and multisampling.
Dave B(TotalVR) said: Thing is, there is a lot more spare fillrate on IMRs than there is bandwidth. How close do they get to their peak fillrate? When you are rendering the same scene but much bigger, how much more efficient is the shading part of your pipeline going to become?
Um, no. There are many factors today that ensure that IMRs are becoming more and more fillrate-limited. Anisotropic filtering, for one, requires a good amount of fillrate, but not as much memory bandwidth. Long shaders require lots of fillrate but little memory bandwidth. Framebuffer compression, z-buffer compression, and early depth check methods further improve memory bandwidth efficiency.
Um, no. There are many factors today that ensure that IMRs are becoming more and more fillrate-limited. Anisotropic filtering, for one, requires a good amount of fillrate, but not as much memory bandwidth. Long shaders require lots of fillrate but little memory bandwidth.
But doesn't a deferred renderer also increase the efficiency of the hardware, allowing it to speed ahead when a traditional renderer of the same clock speed and pipelines hits that fillrate wall?
jvd said: But doesn't a deferred renderer also increase the efficiency of the hardware, allowing it to speed ahead when a traditional renderer of the same clock speed and pipelines hits that fillrate wall?
Assuming there's any fillrate wall to hit. We're moving further away from any such wall at the moment.
Would a deferred renderer of the same configuration as an X800 XT have almost double the effective fillrate in the average game? Perhaps more if overdraw becomes large.
Probably not. Depends upon the rendering algorithm. Any game which does an initial z pass (a pretty smart thing to do with the longer shaders that many new games today use) would have pretty much the same effective fillrate whether rendered with a deferred renderer or an immediate mode renderer.
Assuming there's any fillrate wall to hit. We're moving further away from any such wall at the moment.
There are many factors today that ensure that IMRs are becoming more and more fillrate-limited.
Probably not. Depends upon the rendering algorithm. Any game which does an initial z pass (a pretty smart thing to do with the longer shaders that many new games today use) would have pretty much the same effective fillrate whether rendered with a deferred renderer or an immediate mode renderer.
I don't know how efficient the IMRs have become.
jvd said: Last example we've seen is the GeForce 2 vs the Kyro
Right, and the GeForce2 series was quite possibly the most bandwidth-limited of all time (particularly the GeForce2 GTS).