X1600 MSAA Fillrates

Dave Baumann

Following the discussion over the fillrates of X1800 from our review, I've just done the same numbers from X1600, adding in Colour + Z, and I'm now not at all convinced that ATI's colour/Z fillrate and MSAA interaction are understood.

Here are the numbers from X1600 XT:

Code:
(M Pixel/s)  Colour   Z Fill   Colour + Z
No AA        2237.2   4716.1   2239.8
2x           2073.7   4142.3   1995.7
4x           2025.8   2219.6   1754.1
6x           1998.2   1517.5   1489.9

For comparison here are the numbers from G70 taken from our article:

Code:
(M Pixel/s)  Colour   Z Fill 
No AA        6555.7   12094.7 
2x           6555.7   6555.7 
4x           3410.0   3412.5

First off, it appears to be the case that X1600 maintains its double-pumped Z even with AA enabled (unlike G70, but like Xenos). Even with both Z and Colour writes enabled, the colour fill does not appear to be linked to the number of Z samples required for the level of AA...
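
One way to read the two tables is to look at the ratio of Z fill to colour fill at each AA level. A minimal sketch in Python, using only the figures posted above (the dictionary and function names are just for illustration):

```python
# Measured fill rates in M Pixel/s as (colour, Z), copied from the tables above.
x1600xt = {
    "No AA": (2237.2, 4716.1),
    "2x": (2073.7, 4142.3),
    "4x": (2025.8, 2219.6),
    "6x": (1998.2, 1517.5),
}
g70 = {
    "No AA": (6555.7, 12094.7),
    "2x": (6555.7, 6555.7),
    "4x": (3410.0, 3412.5),
}

def z_to_colour(table):
    """Return the Z fill / colour fill ratio for each AA mode."""
    return {mode: z / colour for mode, (colour, z) in table.items()}

for name, table in (("X1600 XT", x1600xt), ("G70", g70)):
    for mode, ratio in z_to_colour(table).items():
        print(f"{name:9s} {mode:6s} Z/colour = {ratio:.2f}")
```

A ratio near 2 suggests Z is still running at double rate. On these numbers X1600 XT holds roughly 2.0 at 2x, while G70 drops to 1.0.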
 
Actually .. it seems I found something wrong with those scores.

http://www.beyond3d.com/reviews/ati/x1800xf/index.php?p=05

Code:
Card       Core Clock (MHz)  Fill-rate (Mp/s)  Texture Fill-rate (Mt/s)  Triangle (Mtris/s)  Memory Clock (MHz)  Memory Bandwidth (GB/s)
X1800 XT   625               10000             10000                     938                 750                 48.0
X1800 XL   500               8000              8000                      750                 500                 32.0

http://www.beyond3d.com/reviews/ati/r520/index.php?p=15

Code:
Card       Core Clock (MHz)  Fill-rate (Mp/s)  Texture Fill-rate (Mt/s)  Triangle (Mtris/s)  Memory Clock (MHz)  Memory Bandwidth (GB/s)
X1800 XT   625               10000             10000                     1250                750                 48.0
X1800 XL   500               8000              8000                      1000                500                 32.0

The XT and XL scores in the XF review are either both correct or both incorrect. If those scores are incorrect .. what are the real rates of the X1800XT XF?



Dave?

US
 
Dave Baumann said:
First off, it appears to be the case that X1600 maintains its double-pumped Z even with AA enabled (unlike G70, but like Xenos). Even with both Z and Colour writes enabled, the colour fill does not appear to be linked to the number of Z samples required for the level of AA...
Would someone be so kind as to explain what implications this has for the hardware? Is this related to orthogonalizing(tm) the ROPs to allow for AAing FP buffers?

I finally bothered to work up the R3D D3 AA #s (OMGWTFBBQORLY?), in anticipation of Dave's:

Code:
4xAA hit   10x7   12x10   16x12
6800 GT    19%    30%     34%
6800 GS    27%    38%     41%
6800       31%    39%     41%
6600 GT    46%    50%     53%
X800 GTO   32%    32%     34%
X1600XT    15%    16%     16%
X800 GT    34%    35%     38%
So, do these #s confirm Z continues to be double-time with 4xAA? Riddick shows a perf. hit in line with the other cards, but maybe it's not stencil-bound.

Unknown, those appear to be geometry setup rates, which are separate (orthogonal, even :)) from the fillrates being discussed. Dunno which is right. Dave used 2 tris/ck for the R520 review, and 1.5 for the XF one. Maybe he got updated info. For comparison, apparently G70 can set up 2tris/ck and NV4x, 1.5.
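
For what it's worth, the two sets of triangle numbers do fall straight out of core clock times triangles per clock. A quick sketch in Python (the function name is made up for illustration; the clocks and per-clock rates are the ones quoted above):

```python
def setup_rate_mtris(core_mhz, tris_per_clock):
    """Triangle setup rate in Mtris/s = core clock (MHz) * triangles per clock."""
    return core_mhz * tris_per_clock

# Core clocks in MHz from the two review tables.
cards = {"X1800 XT": 625, "X1800 XL": 500}

for name, mhz in cards.items():
    for tris_per_clock in (2, 1.5):
        rate = setup_rate_mtris(mhz, tris_per_clock)
        print(f"{name} at {tris_per_clock} tris/ck: {rate} Mtris/s")
```

At 2 tris/ck this gives the 1250 and 1000 from the R520 review; at 1.5 it gives 937.5 (rounded to 938 in the table) and 750 from the XF one.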
 
I find it very hard to untangle what data is actually being written in these fill-rate tests.

I believe Z-fill rate is:
  • No AA - 4 bytes per ROP per clock (or 8 bytes if double-pumped)
  • 2xAA - 8 bytes per ROP per clock
  • 4xAA - 16 bytes in two passes, 8 bytes per ROP per clock
Unfortunately, according to this scheme, 7800GTX (and 6800U) cannot have a Z fill of ~12000MP/s, because 4 bytes per Z exceeds the available bandwidth - there's only enough bandwidth for 3 bytes per Z.
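
A rough sketch of that arithmetic in Python. The 38.4GB/s figure is my assumption for 7800GTX (256-bit bus at 1200MHz effective), not something taken from the tests, and the function names are just for illustration:

```python
Z_FILL_MPIX = 12094.7          # measured G70 No AA Z fill, M Pixel/s
ASSUMED_BANDWIDTH_GBS = 38.4   # assumption: 7800GTX memory bandwidth in GB/s

def required_bandwidth_gbs(fill_mpix, bytes_per_pixel):
    """Bandwidth in GB/s needed to write bytes_per_pixel at the given fill rate."""
    return fill_mpix * 1e6 * bytes_per_pixel / 1e9

def bytes_per_pixel_available(bandwidth_gbs, fill_mpix):
    """Bytes per pixel the given bandwidth can sustain at the given fill rate."""
    return bandwidth_gbs * 1e9 / (fill_mpix * 1e6)

needed = required_bandwidth_gbs(Z_FILL_MPIX, 4)
available = bytes_per_pixel_available(ASSUMED_BANDWIDTH_GBS, Z_FILL_MPIX)
print(f"4 bytes per Z needs {needed:.1f} GB/s")
print(f"{ASSUMED_BANDWIDTH_GBS} GB/s allows {available:.2f} bytes per Z")
```

Under that assumption, 4 bytes per Z would need about 48GB/s, while the available bandwidth only covers a little over 3 bytes per Z.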

Perhaps X1600XT's theoretical Z fill should be double what it currently is (~9400MP/s) but the required bandwidth (28GB/s if 3 bytes per Z?) isn't available. Therefore the halving we should see going from No AA to 2xAA is concealed by the shortfall in bandwidth.
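
The same sum for X1600XT, again only as a sketch. The 22.08GB/s is my assumption for the card's memory bandwidth (128-bit bus at 1380MHz effective), and the names are hypothetical:

```python
DOUBLED_Z_FILL_MPIX = 9400.0    # the ~9400MP/s hypothesised above
ASSUMED_BANDWIDTH_GBS = 22.08   # assumption: X1600XT memory bandwidth in GB/s

def required_bandwidth_gbs(fill_mpix, bytes_per_pixel):
    """Bandwidth in GB/s needed to write bytes_per_pixel at the given fill rate."""
    return fill_mpix * 1e6 * bytes_per_pixel / 1e9

needed = required_bandwidth_gbs(DOUBLED_Z_FILL_MPIX, 3)
shortfall = needed - ASSUMED_BANDWIDTH_GBS
print(f"3 bytes per Z at {DOUBLED_Z_FILL_MPIX:.0f} MP/s needs {needed:.1f} GB/s")
print(f"shortfall against assumed bandwidth: {shortfall:.1f} GB/s")
```

If the assumed bandwidth is about right, the ~28GB/s requirement can't be met, which would be consistent with the shortfall hiding the expected halving.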

Jawed
 
Jawed said:
Unfortunately, according to this scheme, 7800GTX (and 6800U) cannot have a Z fill of ~12000MP/s, because 4 bytes per Z exceeds the available bandwidth - there's only enough bandwidth for 3 bytes per Z.
Umm, frame/Z-buffer compression?
 
Dunno.

As far as I can tell this test is a good indicator of stencil shadow volume performance, which requires that every slot of the stencil is filled. I can't see how it's possible to take a short cut (compression) in writing data to stencil. But I'm woolly on the subject...

I have a nagging feeling that the Z fill test actually works on a 16-bit Z and 8-bit stencil. Dunno.

It would be nice to get a definitive statement on the data format of the back buffer for each of these tests.

Jawed
 
I thought it would be useful to post the fill-rates in terms of the theoretical colour fill rate.

X1600XT

b3d41.gif


X1800XL

b3d42.gif


7800GTX

b3d43.gif
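
For anyone wanting to redo this kind of normalisation, here's a minimal sketch in Python for X1600XT. The theoretical colour fill of 2360MP/s is my assumption (4 ROPs at 590MHz); the measured numbers are Dave's from the first post, and the names are just for illustration:

```python
ASSUMED_ROPS = 4            # assumption: X1600XT ROP count
ASSUMED_CORE_MHZ = 590      # assumption: X1600XT core clock in MHz
THEORETICAL_COLOUR = ASSUMED_ROPS * ASSUMED_CORE_MHZ  # M Pixel/s

# Measured X1600XT fill rates in M Pixel/s, from the first post.
measured = {
    "No AA": {"Colour": 2237.2, "Z": 4716.1, "Colour + Z": 2239.8},
    "2x":    {"Colour": 2073.7, "Z": 4142.3, "Colour + Z": 1995.7},
    "4x":    {"Colour": 2025.8, "Z": 2219.6, "Colour + Z": 1754.1},
    "6x":    {"Colour": 1998.2, "Z": 1517.5, "Colour + Z": 1489.9},
}

def normalise(table, theoretical):
    """Express each measured rate as a fraction of the theoretical colour fill."""
    return {
        mode: {test: rate / theoretical for test, rate in tests.items()}
        for mode, tests in table.items()
    }

normalised = normalise(measured, THEORETICAL_COLOUR)
for mode, tests in normalised.items():
    row = "  ".join(f"{test}: {frac:.0%}" for test, frac in tests.items())
    print(f"{mode:6s} {row}")
```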


Jawed
 
Jawed said:
It would be nice to get a definitive statement on the data format of the back buffer for each of these tests.
I'd second that. ATI and Nvidia cards default to different back buffer and, much more importantly, Z-buffer formats. Whereas Nvidia uses D24S8 by default, Radeons go for a lockable 16-bit format.

Dave?
 
As a matter of interest I fiddled around with Dolenc's fillrate tester (no idea of version) and found that varying the Z buffer format didn't make any appreciable difference in performance on my AIW X800XT (with a pile of other open apps and several tracks from Zion Train pumping along for extra speed).

Variations were within 5%. Running at 1024x768 85Hz.

I presume this tester is making an effort to get round compression techniques by making every pixel and Z/stencil different. I only say that cos of the pretty colours displayed, every pixel being a different colour. Dunno though.

Presumably Dave is using some other tester to get the Colour+Z fill-rates he reported for X1600XT.

Jawed
 
Quasar said:
I'd second that. ATI and Nvidia cards default to different back buffer and, much more importantly, Z-buffer formats. Whereas Nvidia uses D24S8 by default, Radeons go for a lockable 16-bit format.

Dave?
In D3D at least, the application specifies what Z buffer format to use, not the driver. Same goes for the backbuffer format.
 
Thanks, Nick.

Quick and dirty results with Catalyst 5.11, 1024x768, 32-bit D24S8, average of 5 runs:

Code:
No AA
                  Color Fill : 5697.542 M-Pixel/s
                      Z Fill : 7801.908 M-Pixel/s
              Color + Z Fill : 4359.728 M-Pixel/s
 
2xAA
                  Color Fill : 5391.023 M-Pixel/s
                      Z Fill : 7103.306 M-Pixel/s
              Color + Z Fill : 3704.913 M-Pixel/s
 
4xAA
                  Color Fill : 5062.861 M-Pixel/s
                      Z Fill : 3798.53 M-Pixel/s
              Color + Z Fill : 2905.143 M-Pixel/s

Lots of apps open but this time with Tindersticks - slower paced music, but curiously slightly faster fillrates than yesterday.

This benchmark draws multiple quads to cover the screen whereas I think Dolenc's draws a single screen-covering quad.
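
To put those results on the same footing as the earlier tables, here's a quick sketch in Python of how much of the No AA rate each test keeps (names are just for illustration, numbers are the ones above):

```python
# Fill rates in M-Pixel/s from the run above.
results = {
    "No AA": {"Color": 5697.542, "Z": 7801.908, "Color + Z": 4359.728},
    "2xAA":  {"Color": 5391.023, "Z": 7103.306, "Color + Z": 3704.913},
    "4xAA":  {"Color": 5062.861, "Z": 3798.53,  "Color + Z": 2905.143},
}

def relative_to_no_aa(table):
    """Return each rate as a fraction of the matching No AA rate."""
    base = table["No AA"]
    return {
        mode: {test: rate / base[test] for test, rate in tests.items()}
        for mode, tests in table.items()
    }

rel = relative_to_no_aa(results)
for mode, tests in rel.items():
    row = "  ".join(f"{test}: {frac:.1%}" for test, frac in tests.items())
    print(f"{mode:6s} {row}")
```

On these figures Z fill keeps about 91% of its rate at 2xAA and roughly halves at 4xAA, while colour fill only loses about 11% at 4xAA.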

Jawed
 