PowerVR 5?

Ailuros said:
MSAA requires a 4-fold increase in z-buffer memory access and a 4-fold increase in framebuffer access (neither of which affects a tiler). It also requires a much smaller increase in texture bandwidth and fillrate. There will be an ocean of difference in performance drop between a tiler and an IMR for MSAA.

For 2x/4x samples per pixel and today's standards I do have some doubts. Today's architectures get 2xMSAA virtually "for free", and that's because they're only capable of 2 samples per cycle. 4x or higher is achieved via sample loops. For very high sample densities the above would be more in line IMHO, something like 16x or higher.


Indeed, I'd better just add that I am basing this 4 fold increase on using 2x2 MSAA :rolleyes:



Ailuros said:
As for framebuffer compression, is there any reason a PVR card couldn't implement this?

I don't think there are any reasons that speak against it; I just don't think a TBDR needs it, especially for MSAA. When it comes to antialiasing and the future I'd prefer IHVs to move to more "exotic" algorithms; irrespective of architecture, it could eventually be possible to get to high sample densities with minimal framebuffer and bandwidth requirements. If the implementation were a piece of cake we'd obviously be there already. With all the hardware space complex shaders take up, though, and the resources required for R&D, I'm afraid that IHVs have set other priorities for the time being.

Indeed, an IMR is gonna need this compression more though if they plan on using SSAA.


Ailuros said:
Plus, it's not about free memory, it's about memory bandwidth. The only thing that doesn't increase 4x is the texture access, which is why MSAA is faster than and preferable to SSAA.

I don't see why memory footprint isn't a consideration. Let's suppose NV40 would be capable of combining float blending HDR with MSAA; do you really think that the 256MB framebuffer would be sufficient for let's say 16xMSAA with 64bpp HDR in a high resolution, if the bandwidth would be sufficient?
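To put rough numbers on that footprint question, here is a minimal sketch. It assumes an illustrative 1600x1200 resolution (the post only says "high resolution"), uncompressed buffers, 8 bytes of colour per sample for 64bpp HDR, and a 32-bit z/stencil value per sample. The function name and the z-buffer size are assumptions for illustration, not figures from the thread.

```python
MIB = 1024 * 1024


def msaa_footprint_bytes(width, height, samples, color_bytes, depth_bytes):
    """Uncompressed multisampled colour buffer plus z-buffer, in bytes."""
    pixels = width * height
    color = pixels * samples * color_bytes   # one colour value per sample
    depth = pixels * samples * depth_bytes   # one z value per sample
    return color + depth


# Assumed resolution; 16 samples and 64bpp colour come from the post.
color_only = msaa_footprint_bytes(1600, 1200, 16, color_bytes=8, depth_bytes=0)
total = msaa_footprint_bytes(1600, 1200, 16, color_bytes=8, depth_bytes=4)

print(f"colour only: {color_only / MIB:.1f} MiB")
print(f"colour + z:  {total / MIB:.1f} MiB")
```

Under these assumptions the multisampled colour buffer alone comes close to a 256MB card's entire memory before textures, geometry or a front buffer are counted, which is the point of the question.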


Well, I thought we were talking about a tiler here, I had also completely forgotten about HDR.

Speaking of HDR, I've got a 6800, that can do it, right? Where do I find a demo of this? Did I hear somewhere that you can enable it in Doom 3 or some other program?
 
Indeed, I'd better just add that I am basing this 4 fold increase on using 2x2 MSAA.

Well personally I've asked a couple of times if and when we're going to see single cycle 4xMSAA on graphics accelerators. I was impressed when I saw that the Mali100 can do it. With 4 cycles it achieves 16xMSAA.


Indeed, an IMR is gonna need this compression more though if they plan on using SSAA.

W/o being entirely sure, I think the benefits of colour compression with SSAA are relatively small.

Speaking of HDR, I've got a 6800, that can do it, right? Where do I find a demo of this? Did I hear somewhere that you can enable it in Doom 3 or some other program?

Far Cry, patch 1.3. A rather quirky implementation IMHO, if you want to judge the improvements HDR can deliver. HDR seems like an afterthought to me in FC; there are spots where it really shows why we need better lighting, and there are spots where it's not only too much, but the overall aliasing it adds on textures can get disgusting.

I have high hopes for it, yet I wouldn't suggest judging it by Crytek's implementation.

Ah, yeah, that's right, he was the PR guy. Slightly less education required for that job....

Slightly less was another good one ROFL :LOL:
 
Dave B(TotalVR) said:
Speaking of HDR, I've got a 6800, that can do it, right? Where do I find a demo of this?

Far Cry with patch 1.3, and also try to download Timbury from the Nvidia site, that's the demo that was demonstrated at the 6800 launch.

There was also a demo by ATI (Debevec-like) but I've lost track of whether it can run on NV hardware or not (no technical reason not to).
 
I downloaded Timbury, slowest-ass demo I have ever run on any graphics card. It's like 1 fps all the way through. I thought it was supposed to show off Nvidia cards?
 
Dave B(TotalVR) said:
I downloaded Timbury, slowest-ass demo I have ever run on any graphics card. It's like 1 fps all the way through. I thought it was supposed to show off Nvidia cards?
Huh. No problems running it here (GeForce 6800). Not blazing-fast, but about 15-20 fps or so.
 
LeGreg said:
There was also a demo by ATI (Debevec-like) but I've lost track of whether it can run on NV hardware or not (no technical reason not to).
Tommti's 3DAnalyzer has an option that allows NV hardware to run most, if not all, ATI demos. I seem to remember being able to run all of the X800 launch demos.
 
Dave B(TotalVR) said:
Indeed, an IMR is gonna need this compression more though if they plan on using SSAA.
1. Since it takes longer to produce the multiple pixel samples for SSAA, the hardware doesn't really need to improve memory bandwidth accesses over no AA when supersampling.
2. Z-buffer compression improves automatically with any form of antialiasing enabled, as triangles automatically get larger compared to pixels.
3. Framebuffer compression is completely dependent upon the fact that in multisampling, multiple pixel samples will contain the same color very frequently. Obviously this won't happen with supersampling.

So, in the end, supersampling ensures that memory bandwidth usage is even less of a concern than in the cases with multisampling or with no AA.
 
I was more thinking about memory footprint.

How much ram does a triple buffered 1280x1024 32bit 4x SSAA framebuffer take up?


60MB

Also, in terms of SSAA, remember downsampling has to be performed at some point. I'm not sure how they do it on IMRs but it's gonna add more bandwidth overhead. In fact SSAA is 4x the fillrate and bandwidth requirements throughout the pipeline.

Also, from what you say framebuffer compression is going to be more effective under MSAA and less effective under SSAA so surely the bandwidth is more of a concern under SSAA?
 
Dave B(TotalVR) said:
I was more thinking about memory footprint.
How much ram does a triple buffered 1280x1024 32bit 4x SSAA framebuffer take up?


60MB
That number is missing either the z buffer (would be 80MB total then) or a downsampled frontbuffer (5MB extra).
But in any case, supersampling takes the exact same amount of memory as multisampling. Note that multisampling color compression saves bandwidth, but not footprint.
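The figures in this exchange can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming uncompressed buffers, 1MB = 2^20 bytes, and a 32-bit z value per sample (the 80MB figure implies this, but the thread does not state it). The helper name `buffer_mb` is made up for illustration.

```python
MB = 1024 * 1024  # the thread's "MB" figures match binary megabytes


def buffer_mb(width, height, bytes_per_sample, samples=1):
    """Size in MB of one uncompressed buffer."""
    return width * height * bytes_per_sample * samples / MB


W, H, SAMPLES = 1280, 1024, 4

color_4x = buffer_mb(W, H, 4, SAMPLES)  # one 32-bit colour buffer at 4x
z_4x = buffer_mb(W, H, 4, SAMPLES)      # assumed 32-bit z per sample
front_1x = buffer_mb(W, H, 4)           # downsampled front buffer

triple_buffered = 3 * color_4x          # the "60MB" answer
with_z = triple_buffered + z_4x         # the "80MB total" with a z buffer

print(triple_buffered, with_z, front_1x)
```

The same function gives identical sizes for 4x multisampling, since colour compression changes how much of the buffer is touched, not how much is allocated.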
Dave B(TotalVR) said:
Also, in terms of SSAA, remember downsampling has to be performed at some point. I'm not sure how they do it on IMRs but it's gonna add more bandwidth overhead.
Same thing with multisampling. There are basically two choices:
1) Downfilter to frontbuffer when flipped
2) Downfilter on scanout ("on the fly")
Dave B(TotalVR) said:
In fact SSAA is 4x the fillrate and bandwidth requirements throughout the pipeline.
Correct. But note that the bandwidth per "pixel" (or sample) doesn't increase at all -- more likely it decreases, as Chalnoth pointed out.
Dave B(TotalVR) said:
Also, from what you say framebuffer compression is going to be more effective under MSAA and less effective under SSAA so surely the bandwidth is more of a concern under SSAA?
No. The primary concern with supersampling is fillrate. Fillrate demands scale linearly with the number of SS samples. Bandwidth requirements scale slightly slower. I.e. you're not going to be bandwidth limited because of supersampling. If you are, you already were without AA. Of course rendering will be slower, but the balance between fill and bandwidth demands won't change much.

____
On TBDRs, you don't need a larger external framebuffer, so both bandwidth and footprint requirements shouldn't increase at all. Applies to supersampling and multisampling.
 
Dave B(TotalVR) said:
In fact SSAA is 4x the fillrate and bandwidth requirements throughout the pipeline.
Which is the exact point I was attempting to make: supersampling AA multiplies the fillrate and bandwidth requirements by the same amount, meaning it requires no more bandwidth per pixel rendered than normal rendering. Now, there are two other factors at work here, of course: z-buffer compression improves bandwidth usage, and downsampling must be done at some point. But I claim that the combination of these two effects results in a small change in overall efficiency.

Also, from what you say framebuffer compression is going to be more effective under MSAA and less effective under SSAA so surely the bandwidth is more of a concern under SSAA?
Certainly, but the reduction in fillrate usage is more significant than the reduction in bandwidth usage. All that you're saying here is that SSAA is slow compared to MSAA, but both end up being fairly close in terms of fillrate/bandwidth ratios, though MSAA does need a bit more bandwidth.
 
zeckensack said:
Dave B(TotalVR) said:
I was more thinking about memory footprint.
How much ram does a triple buffered 1280x1024 32bit 4x SSAA framebuffer take up?


60MB
That number is missing either the z buffer (would be 80MB total then) or a downsampled frontbuffer (5MB extra).
But in any case, supersampling takes the exact same amount of memory as multisampling. Note that multisampling color compression saves bandwidth, but not footprint.
Erm, there's something rather wrong in the above. Here:
Total usage = 1280x1024x4(bytes per pixel) * (total number of buffers)
Therefore:
Total usage = 1280x1024x4 * (4 for the front buffer + 2 for two back buffers + 4 for the z-buffer)

...which totals to 50MB. Now, the hardware may wish to do the downsampling at scanout, but in that case it will also typically just use double buffering, which leads to a total of 60MB. But all this is immaterial. Most cards capable of such a thing are 256MB cards anyway, and so it's just not a big deal.
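Evaluating the formula exactly as posted gives the quoted totals. This is a minimal sketch; the 4 + 4 + 4 split for the double-buffered scanout case is an assumption (the post only gives the 60MB result), and `total_mb` is a hypothetical helper.

```python
MB = 1024 * 1024
BASE = 1280 * 1024 * 4  # bytes in one 1x, 32-bit buffer (5MB)


def total_mb(buffer_multiples):
    """Sum buffers expressed as multiples of one 1x 32-bit buffer."""
    return BASE * sum(buffer_multiples) / MB


as_posted = total_mb([4, 2, 4])     # multiples as listed in the post
scanout_case = total_mb([4, 4, 4])  # assumed: 4x front, 4x back, 4x z

print(as_posted, scanout_case)
```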
 
zeckensack said:
Dave B(TotalVR) said:
I was more thinking about memory footprint.
How much ram does a triple buffered 1280x1024 32bit 4x SSAA framebuffer take up?


60MB
That number is missing either the z buffer (would be 80MB total then) or a downsampled frontbuffer (5MB extra).
But in any case, supersampling takes the exact same amount of memory as multisampling. Note that multisampling color compression saves bandwidth, but not footprint.
Dave B(TotalVR) said:
Also, in terms of SSAA, remember downsampling has to be performed at some point. I'm not sure how they do it on IMRs but it's gonna add more bandwidth overhead.
Same thing with multisampling. There are basically two choices:
1) Downfilter to frontbuffer when flipped
2) Downfilter on scanout ("on the fly")
Dave B(TotalVR) said:
In fact SSAA is 4x the fillrate and bandwidth requirements throughout the pipeline.
Correct. But note that the bandwidth per "pixel" (or sample) doesn't increase at all -- more likely it decreases, as Chalnoth pointed out.
Dave B(TotalVR) said:
Also, from what you say framebuffer compression is going to be more effective under MSAA and less effective under SSAA so surely the bandwidth is more of a concern under SSAA?
No. The primary concern with supersampling is fillrate. Fillrate demands scale linearly with the number of SS samples. Bandwidth requirements scale slightly slower. I.e. you're not going to be bandwidth limited because of supersampling. If you are, you already were without AA. Of course rendering will be slower, but the balance between fill and bandwidth demands won't change much.

____
On TBDRs, you don't need a larger external framebuffer, so both bandwidth and footprint requirements shouldn't increase at all. Applies to supersampling and multisampling.

Well first off, I missed the z-buffer deliberately because I said 'framebuffer'. Also, the bandwidth may decrease slightly (due to cache efficiency etc.) with SSAA but the requirements aren't going to be hugely different per pixel. Thing is, there is a lot more spare fillrate on IMRs than there is bandwidth. How close do they get to their peak fillrate? When you are rendering the same scene but much bigger, how much more efficient is the shading part of your pipeline going to become?
 
Dave B(TotalVR) said:
Thing is, there is a lot more spare fillrate on IMRs than there is bandwidth. How close do they get to their peak fillrate? When you are rendering the same scene but much bigger, how much more efficient is the shading part of your pipeline going to become?
Um, no. There are many factors today that ensure that IMRs are becoming more and more fillrate-limited. Anisotropic filtering, for one, requires a good amount of fillrate, but not as much memory bandwidth. Long shaders require lots of fillrate but little memory bandwidth. Framebuffer compression, z-buffer compression, and early depth check methods further improve memory bandwidth efficiency.

You were right about making that statement back in the days of the GeForce2. Not anymore. Cards like the GeForce 6600 GT show this pretty conclusively.
 
Um, no. There are many factors today that ensure that IMRs are becoming more and more fillrate-limited. Anisotropic filtering, for one, requires a good amount of fillrate, but not as much memory bandwidth. Long shaders require lots of fillrate but little memory bandwidth.
But doesn't a deferred renderer also increase the efficiency of the hardware, allowing it to speed ahead when a traditional renderer with the same clock speed and pipelines hits that fillrate wall?

Would a deferred renderer with the same configuration as an X800 XT effectively have almost double the fillrate in the average game? Perhaps more if overdraw becomes large.
 
jvd said:
But doesn't a deferred renderer also increase the efficiency of the hardware, allowing it to speed ahead when a traditional renderer with the same clock speed and pipelines hits that fillrate wall?
Assuming there's any fillrate wall to hit. We're moving further away from any such wall at the moment.

Would a deferred renderer with the same configuration as an X800 XT effectively have almost double the fillrate in the average game? Perhaps more if overdraw becomes large.
Probably not. Depends upon the rendering algorithm. Any game which does an initial z pass (a pretty smart thing to do with the longer shaders that many new games today use) would have pretty much the same effective fillrate whether rendered with a deferred renderer or an immediate mode renderer.
 
Assuming there's any fillrate wall to hit. We're moving further away from any such wall at the moment.

Well, you just finished saying this:

There are many factors today that ensure that IMRs are becoming more and more fillrate-limited.

Probably not. Depends upon the rendering algorithm. Any game which does an initial z pass (a pretty smart thing to do with the longer shaders that many new games today use) would have pretty much the same effective fillrate whether rendered with a deferred renderer or an immediate mode renderer.
I don't know how efficient the IMRs have become.

The last example we've seen is the GeForce 2 vs. the Kyro.
 
jvd said:
The last example we've seen is the GeForce 2 vs. the Kyro.
Right, and the GeForce2 series was quite possibly the most bandwidth-limited of all time (particularly the GeForce2 GTS).
 
Chalnoth said:
jvd said:
The last example we've seen is the GeForce 2 vs. the Kyro.
Right, and the GeForce2 series was quite possibly the most bandwidth-limited of all time (particularly the GeForce2 GTS).

Hence why I've said I don't know how efficient they have become compared to the deferred renderers.


Though, looking into the future, high-speed RAM is not going to be significantly faster than it is now. I don't think we will see more than 700MHz RAM by the end of the year. Due to this I still think deferred renderers will have an advantage because of any slight or substantial savings they can net over traditional renderers.
 