9800/9700 Z-fill question.

So, seems to me like the summary is:

4xAA: no cards have fast Z writes
2xAA: GeForceFX and Radeon 9x00 have speed gain (more for the GeForceFX). No speed gain for GeForce 6800.
No AA: GeForceFX and GeForce 6800 have a speed gain (almost 2x). Radeons have no speed gain.

I'm baffled by the 4xAA results. It's a commonly used AA mode, and I'm not sure why there's no optimisation for it on any of the cards. Any technical reason for it? (I can't really see it.)

I can see ATI gaining no benefit from this in benchmarks, since No-AA and 4xAA are the common settings. 2xAA is rarely benchmarked. It'd be interesting to see some 2x benchmark results for Doom3.
 
I'm baffled by the 4xAA results. It's a commonly used AA mode, and I'm not sure why there's no optimisation for it on any of the cards. Any technical reason for it? (I can't really see it.)

Bandwidth.
 
Mostly internal bandwidth (from ROP to tile cache), and die space for the ROPs. External bandwidth too, but compression does a good job of reducing the difference between 2x and 4x. Pixels that take only one clock in the shader aren't common enough to justify the cost any more. Two ROPs per pipe are enough.
 
Yep. If you simply look at the multitexturing results, you'll note that at more than two textures per pixel there is no performance drop from 2x or 4x FSAA. This shows that if a pixel takes more than one clock to output (at 4x FSAA, it takes two clocks), there's no problem as long as the pixel takes at least that long to process.
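
A rough way to picture that argument, as a sketch rather than a description of any real chip: the cost per pixel is whichever is larger, the texturing/shading clocks or the ROP output clocks. The function name and the one-clock-per-texture figure are illustrative assumptions; the two ROP clocks at 4x FSAA come from the post above.

```python
def clocks_per_pixel(shader_clocks, rop_clocks):
    """Clocks a pipe spends per pixel when shading and ROP output overlap."""
    return max(shader_clocks, rop_clocks)

# Assumed ROP output cost per pixel: 1 clock at No AA and 2x, 2 clocks at 4x.
ROP_CLOCKS = {"NoAA": 1, "2xAA": 1, "4xAA": 2}

for textures in (1, 2, 3, 4):
    # Assumption: one clock of texturing per texture layer.
    costs = {mode: clocks_per_pixel(textures, rop)
             for mode, rop in ROP_CLOCKS.items()}
    print(textures, "texture(s):", costs)
```

In this toy model the 4x penalty only shows up in the single-texture case; once the pixel takes two or more clocks to process, the ROP cost is hidden.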
 
It should be noted that the 5900XT (mine) on the second graph is running at 5900 ultra speeds (hence it beating the 5900).

I'm rather glad that the Z fill boost is still apparent on the NV35 at 2xAA.
I was ruling out the use of AA in Doom3 completely, but it looks like I should get reasonable results with 2x at least.

It's unfortunate that NV40 class cards don't enjoy the same Z boost with AA, but they seem plenty fast enough anyway :)
 
Chalnoth said:
Now I just wish the 6800 non-Ultra had 256MB of RAM. With the high texture requirements of some games (UT2k4, Far Cry, in particular), it really needs it.
Yeah, but then the 6800 wouldn't be such a bargain. I think ATI desperately needs to release a 128MB X800PRO to compete with the 6800, or a 500MHz eight-pipe card to replace the 9800XT.

Heck, even 256MB 9800PROs cost nearly $300. I don't see how you could market a 256MB 6800NU. The GT is so much more for just a bit more money.
 
Man, the first page of this thread has got to be the longest (in terms of how many lines) that I've ever seen here. :D

Now all we need are some X800 results, both XT and PRO.
 
Well, Mintmaster, just keep in mind that the 6800 non-Ultra is limited far more by its amount of memory than its fillrate. I think a 256MB non-Ultra would fit well at around $350 msrp.

But, I think I'm going to keep my non-Ultra for the time being. It's really pretty good, and only rarely do I have to drop below 1280x960 with 4x FSAA. I'll just wait until I get a PCI Express platform before changing video cards again.
 
Code:
FillrateBenchmark(tm) 2004 - "easy benchmark series"

    Benchmark Main Program Version: FRB_V092
    Benchmark Date/Time : 23/07/2004 12:23:19 PM

                     System Information
-----------------------------------------------------------
        CPU : Intel(R) Pentium(R) 4 CPU 2.40GHz
        GFX : RADEON X800 XT Platinum Edition
         OS : Microsoft Windows XP
   Settings : 1024x768  32 bits  D24S8  No AA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 8281.6 FPS
                  Color Fill : 5551.581 M-Pixel/s
                      Z Fill : 8229.225 M-Pixel/s
              Color + Z Fill : 4836.871 M-Pixel/s
              Single Texture : 5538.998 M-Pixel/s
  Single Texture Alpha Blend : 3279.107 M-Pixel/s
               Dual Textures : 3384.803 M-Pixel/s
             Triple Textures : 2355.521 M-Pixel/s
               Quad Textures : 1774.191 M-Pixel/s
    1 Floating Poing Texture : 3576.064 M-Pixel/s
              Render to Self : 4634.706 M-Pixel/s
               PS 1.1 Simple : 4736.208 M-Pixel/s
               PS 1.4 Simple : 4731.175 M-Pixel/s
               PS 2.0 Simple : 4731.175 M-Pixel/s
            PS 2.0 PP Simple : 4731.175 M-Pixel/s
     Customized Pixel Shader : 6029.732 M-Pixel/s
              PS 2.0 Complex : (Unsupported)
           PS 2.0 PP Complex : (Unsupported)
     PS 2.0 Massive Register : (Unsupported)
  PS 2.0 PP Massive Register : (Unsupported)
 PS 2.0 Sincos Procedure Tex : (Unsupported)
   PS 2.0 Per-Pixel Lighting : (Unsupported)
-----------------------------------------------------------
    * End of FillrateBenchmark Result

FillrateBenchmark(tm) 2004 - "easy benchmark series"

    Benchmark Main Program Version: FRB_V092
    Benchmark Date/Time : 23/07/2004 12:26:27 PM

                     System Information
-----------------------------------------------------------
        CPU : Intel(R) Pentium(R) 4 CPU 2.40GHz
        GFX : RADEON X800 XT Platinum Edition
         OS : Microsoft Windows XP
   Settings : 1024x768  32 bits  D24S8  2x FSAA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 83507.2 FPS
                  Color Fill : 4648.127 M-Pixel/s
                      Z Fill : 7547.23 M-Pixel/s
              Color + Z Fill : 3752.225 M-Pixel/s
              Single Texture : 4318.456 M-Pixel/s
  Single Texture Alpha Blend : 3017.382 M-Pixel/s
               Dual Textures : 3311.823 M-Pixel/s
             Triple Textures : 2297.64 M-Pixel/s
               Quad Textures : 1751.541 M-Pixel/s
    1 Floating Poing Texture : 3545.865 M-Pixel/s
              Render to Self : 4592.553 M-Pixel/s
               PS 1.1 Simple : 4006.399 M-Pixel/s
               PS 1.4 Simple : 4001.366 M-Pixel/s
               PS 2.0 Simple : 4001.366 M-Pixel/s
            PS 2.0 PP Simple : 4001.366 M-Pixel/s
     Customized Pixel Shader : 4645.611 M-Pixel/s
              PS 2.0 Complex : (Unsupported)
           PS 2.0 PP Complex : (Unsupported)
     PS 2.0 Massive Register : (Unsupported)
  PS 2.0 PP Massive Register : (Unsupported)
 PS 2.0 Sincos Procedure Tex : (Unsupported)
   PS 2.0 Per-Pixel Lighting : (Unsupported)
-----------------------------------------------------------
    * End of FillrateBenchmark Result

FillrateBenchmark(tm) 2004 - "easy benchmark series"

    Benchmark Main Program Version: FRB_V092
    Benchmark Date/Time : 23/07/2004 12:28:43 PM

                     System Information
-----------------------------------------------------------
        CPU : Intel(R) Pentium(R) 4 CPU 2.40GHz
        GFX : RADEON X800 XT Platinum Edition
         OS : Microsoft Windows XP
   Settings : 1024x768  32 bits  D24S8  4x FSAA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 83251.2 FPS
                  Color Fill : 3422.552 M-Pixel/s
                      Z Fill : 3840.305 M-Pixel/s
              Color + Z Fill : 2254.858 M-Pixel/s
              Single Texture : 3185.993 M-Pixel/s
  Single Texture Alpha Blend : 2322.806 M-Pixel/s
               Dual Textures : 2821.089 M-Pixel/s
             Triple Textures : 2113.929 M-Pixel/s
               Quad Textures : 1703.726 M-Pixel/s
    1 Floating Poing Texture : 3085.33 M-Pixel/s
              Render to Self : 4598.635 M-Pixel/s
               PS 1.1 Simple : 2946.918 M-Pixel/s
               PS 1.4 Simple : 2941.885 M-Pixel/s
               PS 2.0 Simple : 2944.401 M-Pixel/s
            PS 2.0 PP Simple : 2941.885 M-Pixel/s
     Customized Pixel Shader : 3425.069 M-Pixel/s
              PS 2.0 Complex : (Unsupported)
           PS 2.0 PP Complex : (Unsupported)
     PS 2.0 Massive Register : (Unsupported)
  PS 2.0 PP Massive Register : (Unsupported)
 PS 2.0 Sincos Procedure Tex : (Unsupported)
   PS 2.0 Per-Pixel Lighting : (Unsupported)
-----------------------------------------------------------
    * End of FillrateBenchmark Result
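
To make the three runs easier to compare, here is a small sketch that expresses a few of the rows as a fraction of the No AA rate. The figures are copied from the results above; the dictionary layout and function name are just for illustration.

```python
# Figures copied from the three X800 XT PE runs above (M-Pixel/s).
results = {
    "Color Fill":     {"NoAA": 5551.581, "2xAA": 4648.127, "4xAA": 3422.552},
    "Z Fill":         {"NoAA": 8229.225, "2xAA": 7547.23,  "4xAA": 3840.305},
    "Color + Z Fill": {"NoAA": 4836.871, "2xAA": 3752.225, "4xAA": 2254.858},
    "Quad Textures":  {"NoAA": 1774.191, "2xAA": 1751.541, "4xAA": 1703.726},
}

def relative_to_no_aa(row):
    """Return each mode's rate as a fraction of the No AA rate."""
    base = row["NoAA"]
    return {mode: rate / base for mode, rate in row.items()}

for test, row in results.items():
    ratios = relative_to_no_aa(row)
    print(f"{test:>15}: 2x = {ratios['2xAA']:.2f}, 4x = {ratios['4xAA']:.2f}")
```

Z fill holds at roughly 92% of the No AA rate at 2x and falls to roughly 47% at 4x, while quad texturing stays within about 4% in every mode.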
 
Chalnoth said:
Well, Mintmaster, just keep in mind that the 6800 non-Ultra is limited far more by its amount of memory than its fillrate. I think a 256MB non-Ultra would fit well at around $350 msrp.
Really? You're spending $350 on a video card, and you won't pony up $50 (only 14% more dough) for 43% more fillrate in the GT?

Generally, things in the electronics business scale the other way. Cost increases faster than performance. No offence, but only the uneducated and illogical would buy a 256MB 6800NU for $350.
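
For anyone wanting to check the 14% and 43% figures, here is a quick sketch. The pipe counts and core clocks (12 pipes at 325MHz for the 6800NU, 16 pipes at 350MHz for the GT) are my assumptions about the reference specs, not numbers from this thread, and the $350 price is the hypothetical one being discussed.

```python
def fillrate_mpix(pipes, core_mhz):
    """Theoretical peak fill rate in M-Pixel/s: one pixel per pipe per clock."""
    return pipes * core_mhz

# Assumed reference specs; the 256MB 6800NU at $350 is hypothetical.
nu = {"price": 350, "fill": fillrate_mpix(12, 325)}
gt = {"price": 400, "fill": fillrate_mpix(16, 350)}

extra_cost = gt["price"] / nu["price"] - 1
extra_fill = gt["fill"] / nu["fill"] - 1
print(f"{extra_cost:.1%} more money for {extra_fill:.1%} more fillrate")
```

Under those assumptions the GT costs a little over 14% more and has between 43% and 44% more theoretical fillrate.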
 
Mintmaster said:
Chalnoth said:
Well, Mintmaster, just keep in mind that the 6800 non-Ultra is limited far more by its amount of memory than its fillrate. I think a 256MB non-Ultra would fit well at around $350 msrp.
Really? You're spending $350 on a video card, and you won't pony up $50 (only 14% more dough) for 43% more fillrate in the GT?

Generally, things in the electronics business scale the other way. Cost increases faster than performance. No offence, but only the uneducated and illogical would buy a 256MB 6800NU for $350.

Well, there is a 6800NU with 256MB of GDDR3 memory @ 1GHz. For 50 dollars less you lose fillrate, but not bandwidth. /shrug

For some reason I am finding the 6800NU is not as bandwidth limited as I originally guessed. Overclocking my memory from 700 to 900MHz didn't yield the performance gains I expected or hoped for.
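
To put that overclock in perspective, here is a rough peak-bandwidth calculation. It assumes a 256-bit memory bus and treats 700 and 900 as effective (DDR) data rates in MHz; neither detail is stated in the post.

```python
def memory_bandwidth_gbs(effective_mhz, bus_bits=256):
    """Peak memory bandwidth in GB/s for a given effective data rate."""
    bytes_per_transfer = bus_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9

stock = memory_bandwidth_gbs(700)        # assumed stock memory speed
overclocked = memory_bandwidth_gbs(900)  # overclocked memory speed
gain = overclocked / stock - 1
print(f"{stock:.1f} GB/s -> {overclocked:.1f} GB/s (+{gain:.0%})")
```

That is close to 29% more peak bandwidth, so a much smaller gain in frame rate does suggest something other than memory bandwidth is the limit in those tests.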
 
ET said:
I'm baffled by the 4xAA results. It's a commonly used AA mode, and I'm not sure why there's no optimisation for it on any of the cards. Any technical reason for it? (I can't really see it.)
I think you're sort of reading the results incorrectly. The 2xAA optimization means 4xAA rasterization is now twice the speed of supersampling, and 4xAA on FX chips is full speed, minus the double speed Z write. In many cases, you're slowed down by bandwidth, multiple textures, trilinear filtering, shaders, geometry, etc. It's not worth the transistors if it's not the weakest link most of the time.

Here's a summary of pixels written per clock (ideal):
Code:
       NoAA    2xAA    4xAA
Chip   C  Z    C  Z    C  Z
---------------------------
R300   8  8    8  8    4  4
R420  16 16   16 16    8  8
NV35   4  8    4  8    4  4
NV40  16 32   16 16    8  8
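
The same table as data, in case anyone wants to play with it. This is only a sketch; the dictionary layout and the helper name are made up for illustration. It picks out the modes where Z-only writes run at twice the colour rate.

```python
# Ideal pixels written per clock, copied from the table above: (colour, Z).
PIXELS_PER_CLOCK = {
    "R300": {"NoAA": (8, 8),   "2xAA": (8, 8),   "4xAA": (4, 4)},
    "R420": {"NoAA": (16, 16), "2xAA": (16, 16), "4xAA": (8, 8)},
    "NV35": {"NoAA": (4, 8),   "2xAA": (4, 8),   "4xAA": (4, 4)},
    "NV40": {"NoAA": (16, 32), "2xAA": (16, 16), "4xAA": (8, 8)},
}

def double_speed_z_modes(chip):
    """List the AA modes in which Z writes run at twice the colour rate."""
    return [mode for mode, (colour, z) in PIXELS_PER_CLOCK[chip].items()
            if z == 2 * colour]

for chip in PIXELS_PER_CLOCK:
    print(chip, double_speed_z_modes(chip))
```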
 
Mintmaster said:
ET said:
I'm baffled by the 4xAA results. It's a commonly used AA mode, and I'm not sure why there's no optimisation for it on any of the cards. Any technical reason for it? (I can't really see it.)
I think you're sort of reading the results incorrectly. The 2xAA optimization means 4xAA rasterization is now twice the speed of supersampling, and 4xAA on FX chips is full speed, minus the double speed Z write. In many cases, you're slowed down by bandwidth, multiple textures, trilinear filtering, shaders, geometry, etc. It's not worth the transistors if it's not the weakest link most of the time.

Here's a summary of pixels written per clock (ideal):
Code:
       NoAA    2xAA    4xAA
Chip   C  Z    C  Z    C  Z
---------------------------
R300   8  8    8  8    4  4
R420  16 16   16 16    8  8
NV35   4  8    4  8    4  4
NV40  16 32   16 16    8  8


Mintmaster, regarding that table, are you talking about the ROPs? Because with No AA, wouldn't the GeForce 6800NU be

12/24, 12/12, 8/8?
 
I mean exactly what I said - max pixels (not samples) written to the framebuffer per clock. I never included 6800NU in there (I was talking about 6800U when I said NV40), but you're right.
 
Mintmaster said:
I mean exactly what I said - max pixels (not samples) written to the framebuffer per clock. I never included 6800NU in there (I was talking about 6800U when I said NV40), but you're right.

Sorry, I wasn't sure, since the thread started with a 6800NU, so I just needed you to clarify. Thanks, of course :)
 
Mintmaster said:
Here's a summary of pixels written per clock (ideal):
Code:
       NoAA    2xAA    4xAA
Chip   C  Z    C  Z    C  Z
---------------------------
R300   8  8    8  8    4  4
R420  16 16   16 16    8  8
NV35   4  8    4  8    4  4
NV40  16 32   16 16    8  8

Might be interesting to get blends in there as well.

Xmas said:
So I assume X800 Pro is 12/12 12/12 6/6. Anyone able to confirm this?

Test rig is in an NV configuration at the moment, so I can't test - but looking at the Z rates at the bottom of this page that would be correct.
 
DaveBaumann said:
Mintmaster said:
Here's a summary of pixels written per clock (ideal):
Code:
       NoAA    2xAA    4xAA
Chip   C  Z    C  Z    C  Z
---------------------------
R300   8  8    8  8    4  4
R420  16 16   16 16    8  8
NV35   4  8    4  8    4  4
NV40  16 32   16 16    8  8

Might be interesting to get blends in there as well.

Hmm, I thought you already had the facts on this, Dave?

NV40: Half the fillrate = 8 pixels (no MSAA possible).
 
No, that's FP blending (which runs at 4 pixels) - it's been suggested that there are only 8 integer blending ops per cycle. I've just run 3DMark2001SE's single texturing fill-rate test in 16 bit and it just achieves a little over half its theoretical rate, but I'm not sure that's bandwidth limited since it's so close to its 32-bit rate (32-bit: 3481.7, 16-bit: 3586.5 MT/s)
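
A quick check of the "little over half" figure, as a sketch only. The post doesn't say which card was tested, so the 16 pipes at 400MHz below (a 6800 Ultra configuration) is an assumption; the two measured rates are the ones quoted above.

```python
def fraction_of_peak(measured_mtexels, pipes, core_mhz):
    """Measured single-texture rate as a fraction of pipes * clock."""
    return measured_mtexels / (pipes * core_mhz)

PIPES, CORE_MHZ = 16, 400  # assumed 6800 Ultra, not stated in the post
rate_32bit, rate_16bit = 3481.7, 3586.5  # MT/s, from the post

print(f"32-bit: {fraction_of_peak(rate_32bit, PIPES, CORE_MHZ):.1%} of peak")
print(f"16-bit: {fraction_of_peak(rate_16bit, PIPES, CORE_MHZ):.1%} of peak")
print(f"16-bit gain over 32-bit: {rate_16bit / rate_32bit - 1:.1%}")
```

Under that assumption both modes land in the mid-50% range of the theoretical rate, and halving the colour depth buys only about 3%.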
 
DaveBaumann said:
No, that's FP blending (which runs at 4 pixels) - it's been suggested that there are only 8 integer blending ops per cycle. I've just run 3DMark2001SE's single texturing fill-rate test in 16 bit and it just achieves a little over half its theoretical rate, but I'm not sure that's bandwidth limited since it's so close to its 32-bit rate (32-bit: 3481.7, 16-bit: 3586.5 MT/s)

Okay, sorry to have misunderstood you... Is the above true for both NV40 and R420? And any reason to think that blending performance would ever be the bottleneck if so?
 