9800/9700 Z-fill question.

LeStoffer said:
Okay, sorry to have misunderstood you... Is the above true for both NV40 and R420? And any reason to think that blending performance would ever be the bottleneck if so?

In truth, I can't remember having run 3DMark2001SE's fill-rate test in 16-bit on any R300+ board. However, 9800 PRO has a theoretical rate of 3040 MP/s and achieves 2025.1MP/s in 32-bit - which is well above half. I would assume R420 to have the same layout, given the lineage similarities.

As for bottlenecks: other than the 3DMark test, blending is probably going to be used most in, what, alpha situations and multipass situations.

Xmas said:
Which card/clocks is that?

That was a 400/550 NV40 Ultra. Its theoretical rate would be 6800MP/s.

EDIT: [Spaz] I've just noticed the fill-rate app used in this thread (which I can't seem to download now) has a single texture alpha blend and it appears to be consistently half the colour fill rate on NV40's.
 
Mintmaster said:
In many cases, you're slowed down by bandwidth, multiple textures, trilinear filtering, shaders, geometry, etc. It's not worth the transistors if it's not the weakest link most of the time.

Several people said that here in response to my post. But doesn't Doom3 use a separate Z pass that is made without any pixel writing? Wasn't it said that NVIDIA cards will work better with it because of this? Has the discussion of this feature switched context while I wasn't looking?
 
DaveBaumann said:
That was a 400/550 NV40 Ultra. Its theoretical rate would be 6800MP/s.
6400MP/s
AFAIK the 3DMark2k1 single texturing test draws 64 layers. Interestingly, if it were only eight passes, your results would make more sense, as
(1*6400 + 7*3200) / 8 = 3600
but
(1*6400 + 63*3200) / 64 = 3250
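
The two averages above can be reproduced with a short sketch. It assumes, as the post does, that the first layer runs at the full 6400 MP/s and every later layer at half that; the function and parameter names are made up for illustration. Note that it mirrors the post's plain arithmetic mean of rates, not a time-weighted average.

```python
def average_fill_rate(layers, first_rate=6400.0, blended_rate=3200.0):
    """Arithmetic mean of per-layer rates, as in the post above.

    Assumes one layer at first_rate and the remaining layers at blended_rate.
    """
    return (first_rate + (layers - 1) * blended_rate) / layers


print(average_fill_rate(8))   # 3600.0
print(average_fill_rate(64))  # 3250.0
```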
 
ET said:
Several people said that here in response to my post. But doesn't Doom3 use a separate Z pass that is made without any pixel writing? Wasn't it said that NVIDIA cards will work better with it because of this? Has the discussion of this feature switched context while I wasn't looking?
Well, let's say you have 4:1 compression on Z/stencil, that's 8 bits that need to be read and written per pixel in a Z-only pass. If you have 32 Z-ROPs, that's 2*32*8 = 512 bits per clock. There's not much to gain by increasing the number of ROPs.
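
A rough sketch of that arithmetic, for anyone who wants to vary the inputs. The 32-bit Z/stencil size matches D24S8; the 256-bit DDR bus used for comparison is my assumption about the NV40 and is not stated in the post, and the sketch ignores that core and memory clocks differ.

```python
def z_traffic_bits_per_clock(z_rops, bits_per_z=32, compression=4):
    """Bits moved per clock in a Z-only pass: one read plus one write per pixel."""
    bits_per_pixel = bits_per_z // compression  # 32 / 4 = 8 bits after compression
    return 2 * z_rops * bits_per_pixel          # factor 2 = read + write


# Assumption (not from the post): 256-bit bus, two transfers per memory clock.
ASSUMED_BUS_BITS_PER_CLOCK = 256 * 2

needed = z_traffic_bits_per_clock(32)
print(needed, ASSUMED_BUS_BITS_PER_CLOCK)  # 512 512
```

Under these assumptions 32 Z-ROPs already ask for everything such a bus could deliver per clock, which would be why more ROPs buy little.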
 
DaveBaumann said:
That was a 400/550 NV40 Ultra. Its theoretical rate would be 6800MP/s.

EDIT: [Spaz] I've just noticed the fill-rate app used in this thread (which I can't seem to download now) has a single texture alpha blend and it appears to be consistently half the colour fill rate on NV40's.
But it's still above half, even if only slightly.
 
Here are my numbers:

1x AA
Code:
FillrateBenchmark(tm) 2004 - "easy benchmark series"

    Benchmark Main Program Version: FRB_V092
    Benchmark Date/Time : 04.02.2005 23:23:07

                     System Information
-----------------------------------------------------------
        CPU : AMD Athlon(tm) XP 2200+
        GFX : NVIDIA GeForce 6600 GT 
         OS : Microsoft Windows XP
   Settings : 1024x768  32 bits  D16  No AA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 7257,6 FPS
                  Color Fill : 1897,503 M-Pixel/s
                      Z Fill : 3797,523 M-Pixel/s
              Color + Z Fill : 1832,072 M-Pixel/s
              Single Texture : 1905,053 M-Pixel/s
  Single Texture Alpha Blend : 1509,949 M-Pixel/s
               Dual Textures : 1905,053 M-Pixel/s
             Triple Textures : 1336,305 M-Pixel/s
               Quad Textures : 958,8179 M-Pixel/s
    1 Floating Poing Texture : 1905,053 M-Pixel/s
              Render to Self : 1497,576 M-Pixel/s
               PS 1.1 Simple : 1902,536 M-Pixel/s
               PS 1.4 Simple : 1907,57 M-Pixel/s
               PS 2.0 Simple : 1905,053 M-Pixel/s
            PS 2.0 PP Simple : 1905,053 M-Pixel/s
     Customized Pixel Shader : 1892,47 M-Pixel/s
              PS 2.0 Complex : (Unsupported)
           PS 2.0 PP Complex : (Unsupported)
     PS 2.0 Massive Register : (Unsupported)
  PS 2.0 PP Massive Register : (Unsupported)
 PS 2.0 Sincos Procedure Tex : (Unsupported)

2x AA
Code:
                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 3673,6 FPS
                  Color Fill : 1902,536 M-Pixel/s
                      Z Fill : 1892,47 M-Pixel/s
              Color + Z Fill : 1799,356 M-Pixel/s
              Single Texture : 1905,053 M-Pixel/s
  Single Texture Alpha Blend : 1905,053 M-Pixel/s
               Dual Textures : 1877,37 M-Pixel/s
             Triple Textures : 1336,305 M-Pixel/s
               Quad Textures : 958,8179 M-Pixel/s
    1 Floating Poing Texture : 1894,987 M-Pixel/s
              Render to Self : 1482,687 M-Pixel/s
               PS 1.1 Simple : 1751,541 M-Pixel/s
               PS 1.4 Simple : 1754,058 M-Pixel/s
               PS 2.0 Simple : 1754,058 M-Pixel/s
            PS 2.0 PP Simple : 1751,541 M-Pixel/s
     Customized Pixel Shader : 1905,053 M-Pixel/s

4x AA
Code:
-----------------------------------------------------------
           FrameBuffer Clear : 1548,8 FPS
                  Color Fill : 961,3345 M-Pixel/s
                      Z Fill : 958,8179 M-Pixel/s
              Color + Z Fill : 961,3345 M-Pixel/s
              Single Texture : 958,8179 M-Pixel/s
  Single Texture Alpha Blend : 958,8179 M-Pixel/s
               Dual Textures : 958,8179 M-Pixel/s
             Triple Textures : 958,8179 M-Pixel/s
               Quad Textures : 840,5386 M-Pixel/s
    1 Floating Poing Texture : 958,8179 M-Pixel/s
              Render to Self : 1496,737 M-Pixel/s
               PS 1.1 Simple : 958,8179 M-Pixel/s
               PS 1.4 Simple : 961,3345 M-Pixel/s
               PS 2.0 Simple : 961,3345 M-Pixel/s
            PS 2.0 PP Simple : 956,3013 M-Pixel/s
     Customized Pixel Shader : 958,8179 M-Pixel/s
 
Everyone else is doing it....

Might be interesting to compare my numbers with aths', as I have a pretty similar system. Don't ask me what happened with the AA'ed buffer clears. I'm also not sure why his 6600GT is so competitive with mine in alpha blends, as I thought the 6600 could only perform half as many blends per clock as the 9700. Obviously the GT is clocked much higher, but then its bandwidth is also lower. Bandwidth may not make a difference with a single texture, though. NV's 2xAA sample pattern and double-loop 4xAA make a nice showing in the numbers. Finally, why am I whipping him in the shader department?

No real difference b/w D16 and D24S8, BTW.

Code:
    FillrateBenchmark(tm) 2004 - "easy benchmark series"

    Benchmark Main Program Version: FRB_V092
    Benchmark Date/Time : 2/4/2005 9:37:10 PM

                     System Information
-----------------------------------------------------------
        CPU : AMD Athlon(TM) XP 2400+
        GFX : RADEON 9700 PRO
         OS : Microsoft Windows 2000
   Settings : 1024x768  32 bits  D16  No AA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 4864 FPS
                  Color Fill : 2461.218 M-Pixel/s
                      Z Fill : 2156.711 M-Pixel/s
              Color + Z Fill : 1761.608 M-Pixel/s
              Single Texture : 2415.919 M-Pixel/s
  Single Texture Alpha Blend : 1600.546 M-Pixel/s
               Dual Textures : 1278.424 M-Pixel/s
             Triple Textures : 843.0551 M-Pixel/s
               Quad Textures : 634.1788 M-Pixel/s
    1 Floating Poing Texture : 1250.741 M-Pixel/s
              Render to Self : 2303.931 M-Pixel/s
               PS 1.1 Simple : 2420.952 M-Pixel/s
               PS 1.4 Simple : 2423.469 M-Pixel/s
               PS 2.0 Simple : 2443.601 M-Pixel/s
            PS 2.0 PP Simple : 2425.986 M-Pixel/s
     Customized Pixel Shader : 2481.35 M-Pixel/s
              PS 2.0 Complex : (Unsupported)
           PS 2.0 PP Complex : (Unsupported)
     PS 2.0 Massive Register : (Unsupported)
  PS 2.0 PP Massive Register : (Unsupported)
 PS 2.0 Sincos Procedure Tex : (Unsupported)
   PS 2.0 Per-Pixel Lighting : (Unsupported)
-----------------------------------------------------------
    * End of FillrateBenchmark Result

Code:
   Settings : 1024x768  32 bits  D16  2x FSAA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 61184 FPS
                  Color Fill : 1585.447 M-Pixel/s
                      Z Fill : 2010.749 M-Pixel/s
              Color + Z Fill : 1132.462 M-Pixel/s
              Single Texture : 1527.566 M-Pixel/s
  Single Texture Alpha Blend : 1517.499 M-Pixel/s
               Dual Textures : 1172.727 M-Pixel/s
             Triple Textures : 800.2731 M-Pixel/s
               Quad Textures : 609.013 M-Pixel/s
    1 Floating Poing Texture : 1162.661 M-Pixel/s
              Render to Self : 2237.032 M-Pixel/s
               PS 1.1 Simple : 1469.684 M-Pixel/s
               PS 1.4 Simple : 1452.068 M-Pixel/s
               PS 2.0 Simple : 1462.134 M-Pixel/s
            PS 2.0 PP Simple : 1469.684 M-Pixel/s
     Customized Pixel Shader : 1577.897 M-Pixel/s

Code:
   Settings : 1024x768  32 bits  D16  4x FSAA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 60953.6 FPS
                  Color Fill : 1442.002 M-Pixel/s
                      Z Fill : 1235.642 M-Pixel/s
              Color + Z Fill : 843.0551 M-Pixel/s
              Single Texture : 1389.154 M-Pixel/s
  Single Texture Alpha Blend : 1381.604 M-Pixel/s
               Dual Textures : 1155.111 M-Pixel/s
             Triple Textures : 805.3064 M-Pixel/s
               Quad Textures : 614.0461 M-Pixel/s
    1 Floating Poing Texture : 1155.111 M-Pixel/s
              Render to Self : 2239.968 M-Pixel/s
               PS 1.1 Simple : 1321.206 M-Pixel/s
               PS 1.4 Simple : 1316.172 M-Pixel/s
               PS 2.0 Simple : 1313.656 M-Pixel/s
            PS 2.0 PP Simple : 1321.206 M-Pixel/s
     Customized Pixel Shader : 1436.969 M-Pixel/s
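
For comparing the two sets of results, here is a minimal sketch of a parser. It assumes each result line has the form "name : value unit" as in the listings above, and it accepts both the comma decimals in aths' output and the period decimals in mine. The function name and the trimmed sample strings are only for illustration.

```python
import re

# One result line: "<name> : <number> <unit>", unit is M-Pixel/s or FPS.
RESULT_LINE = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*(?P<value>[\d.,]+)\s*(?:M-Pixel/s|FPS)\s*$"
)


def parse_results(text):
    """Return {test name: float}, skipping headers and '(Unsupported)' lines."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            # Treat a comma as a decimal separator (no thousands separators occur).
            results[match["name"]] = float(match["value"].replace(",", "."))
    return results


# Trimmed samples from the no-AA runs posted above.
aths_6600gt = """
                  Color Fill : 1897,503 M-Pixel/s
                      Z Fill : 3797,523 M-Pixel/s
  Single Texture Alpha Blend : 1509,949 M-Pixel/s
"""
pete_9700pro = """
                  Color Fill : 2461.218 M-Pixel/s
                      Z Fill : 2156.711 M-Pixel/s
  Single Texture Alpha Blend : 1600.546 M-Pixel/s
"""

a = parse_results(aths_6600gt)
p = parse_results(pete_9700pro)
for name in a:
    print(f"{name:>28}: 6600 GT / 9700 PRO = {a[name] / p[name]:.2f}")
```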
 
Re: Everyone else is doing it....

Pete said:
Might be interesting to compare my numbers with aths', as I have a pretty similar system. Don't ask me what happened with the AA'ed buffer clears. I'm also not sure why his 6600GT is so competitive with mine in alpha blends, as I thought the 6600 could only perform half as many blends per clock as the 9700.
I don't trust those numbers, to speak freely. There is another fillrate tester at http://files.skenegroup.net/benchmark/FillrateTester.zip whose numbers I can understand better.

Pete said:
Obviously the GT is clocked much higher, but then its bandwidth is also lower. Bandwidth may not make a difference with a single texture, though.
NV's 2xAA sample pattern and double-loop 4xAA make a nice showing in the numbers. Finally, why am I whipping him in the shader department?
Bandwidth is not the big problem for the 6600 GT, since it only limits quite old games. If a game uses more textures (with high-quality filtering) and/or long arithmetic pixel shaders, the 6600 GT is bound by GPU clock rate.

With its 8 Z-testing units, the card can render only 2 px/clk with 4x MSAA enabled, which can limit the pixel output of simple pixel shaders. Very simple pixel shaders are limited even without MSAA: despite its 8 pixel pipes, the NV43 can write only up to 4 px/clk (2 px/clk with alpha blending).
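
As a rough check, those per-clock limits can be turned into peak rates and set against my measured numbers from earlier in the thread. This is only a sketch: the 500 MHz core clock is an assumption (it is not stated in the thread), and the 8 px/clk figure for Z-only fill is inferred from the 8 Z-testing units, not quoted.

```python
CORE_MHZ = 500  # assumption: 6600 GT core clock, not stated in the thread

# Pixels per clock: the first two are quoted in the post, the last two are derived.
PIXELS_PER_CLOCK = {
    "Color Fill": 4,                  # NV43 writes up to 4 px/clk
    "Single Texture Alpha Blend": 2,  # 2 px/clk with alpha blending
    "Z Fill": 8,                      # inferred: 8 Z-testing units, one sample each
    "Color Fill (4x AA)": 8 / 4,      # 8 Z units shared by 4 samples per pixel
}

# Measured M-Pixel/s from the 6600 GT listings (no AA, and 4x AA for the last entry).
MEASURED = {
    "Color Fill": 1897.503,
    "Single Texture Alpha Blend": 1509.949,
    "Z Fill": 3797.523,
    "Color Fill (4x AA)": 961.3345,
}


def peak_mpix(px_per_clock, core_mhz=CORE_MHZ):
    """Theoretical peak in M-Pixel/s for a given per-clock limit."""
    return px_per_clock * core_mhz


for test, ppc in PIXELS_PER_CLOCK.items():
    peak = peak_mpix(ppc)
    print(f"{test:>28}: peak {peak:6.0f}, measured {MEASURED[test]:8.1f}, "
          f"ratio {MEASURED[test] / peak:.2f}")
```

Under the assumed clock, the alpha blend result is the one that lands above its quoted limit.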

As far as I know, in pixel shader work the 6600 is faster than a 9700/9800 by about the same ratio as its GPU clock rate is higher. In some cases, using the _PP flag can give some additional nice speed gains.

Since it targets the midrange market, the 6600 is not built to deliver the highest pixel output under all circumstances, but it looks like the card can take a large workload without losing much performance. On the other hand, 1280x960 already pushes the 6600 GT to its limits: with 4x AA and 8x AF enabled, the framerate is OK but not outstanding. (My old 5900 XT is clearly beaten anyway.)
 
There is another fillrate tester at http://files.skenegroup.net/benchmark/FillrateTester.zip which numbers I can understand better.

True. The PS results in MDolenc's fillrate tester seem way more realistic.

I can't say, though, that I get unrealistic comparative results from noAA up to 4xAA:

unlocked 6800nonU@350/435MHz:

noAA:

Code:
                     System Information
-----------------------------------------------------------
        CPU : AMD Athlon(tm) XP 3000+
        GFX : NVIDIA GeForce 6800
         OS : Microsoft Windows XP
   Settings : 1024x768  32 bits  D24S8  No AA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 16076.8 FPS
                  Color Fill : 5458.467 M-Pixel/s
                      Z Fill : 10086.46 M-Pixel/s
              Color + Z Fill : 4497.132 M-Pixel/s
              Single Texture : 5440.851 M-Pixel/s
  Single Texture Alpha Blend : 2780.823 M-Pixel/s
               Dual Textures : 2785.856 M-Pixel/s
             Triple Textures : 1874.854 M-Pixel/s
               Quad Textures : 1414.319 M-Pixel/s
    1 Floating Poing Texture : 2785.856 M-Pixel/s
              Render to Self : 4086.301 M-Pixel/s
               PS 1.1 Simple : 5025.615 M-Pixel/s
               PS 1.4 Simple : 5028.132 M-Pixel/s
               PS 2.0 Simple : 5023.099 M-Pixel/s
            PS 2.0 PP Simple : 5020.582 M-Pixel/s
     Customized Pixel Shader : 5463.501 M-Pixel/s

2xAA:

Code:
                     System Information
-----------------------------------------------------------
        CPU : AMD Athlon(tm) XP 3000+
        GFX : NVIDIA GeForce 6800
         OS : Microsoft Windows XP
   Settings : 1024x768  32 bits  D24S8  2x FSAA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 8384 FPS
                  Color Fill : 5018.065 M-Pixel/s
                      Z Fill : 5010.516 M-Pixel/s
              Color + Z Fill : 4776.474 M-Pixel/s
              Single Texture : 4834.355 M-Pixel/s
  Single Texture Alpha Blend : 2498.967 M-Pixel/s
               Dual Textures : 2534.198 M-Pixel/s
             Triple Textures : 1693.66 M-Pixel/s
               Quad Textures : 1273.391 M-Pixel/s
    1 Floating Poing Texture : 2529.165 M-Pixel/s
              Render to Self : 4080.638 M-Pixel/s
               PS 1.1 Simple : 4326.005 M-Pixel/s
               PS 1.4 Simple : 4328.521 M-Pixel/s
               PS 2.0 Simple : 4331.039 M-Pixel/s
            PS 2.0 PP Simple : 4336.072 M-Pixel/s
     Customized Pixel Shader : 4992.899 M-Pixel/s

4xAA:


Code:
                     System Information
-----------------------------------------------------------
        CPU : AMD Athlon(tm) XP 3000+
        GFX : NVIDIA GeForce 6800
         OS : Microsoft Windows XP
   Settings : 1024x768  32 bits  D24S8  4x FSAA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 3366.4 FPS
                  Color Fill : 2780.823 M-Pixel/s
                      Z Fill : 2785.856 M-Pixel/s
              Color + Z Fill : 2529.165 M-Pixel/s
              Single Texture : 2785.856 M-Pixel/s
  Single Texture Alpha Blend : 2519.099 M-Pixel/s
               Dual Textures : 2783.34 M-Pixel/s
             Triple Textures : 1869.821 M-Pixel/s
               Quad Textures : 1411.803 M-Pixel/s
    1 Floating Poing Texture : 2775.791 M-Pixel/s
              Render to Self : 4280.917 M-Pixel/s
               PS 1.1 Simple : 2775.791 M-Pixel/s
               PS 1.4 Simple : 2773.274 M-Pixel/s
               PS 2.0 Simple : 2775.791 M-Pixel/s
            PS 2.0 PP Simple : 2770.757 M-Pixel/s
     Customized Pixel Shader : 2780.823 M-Pixel/s
 
No real difference b/w D16 and D24S8, BTW.

I get a difference:

D16:

Code:
        CPU : AMD Athlon(tm) XP 3000+
        GFX : NVIDIA GeForce 6800
         OS : Microsoft Windows XP
   Settings : 1024x768  32 bits  D16  2x FSAA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 6476.8 FPS
                  Color Fill : 5390.52 M-Pixel/s
                      Z Fill : 5450.918 M-Pixel/s
              Color + Z Fill : 3789.973 M-Pixel/s
              Single Texture : 5111.179 M-Pixel/s
  Single Texture Alpha Blend : 2743.075 M-Pixel/s
               Dual Textures : 2783.34 M-Pixel/s
             Triple Textures : 1874.854 M-Pixel/s
               Quad Textures : 1411.803 M-Pixel/s
    1 Floating Poing Texture : 2778.307 M-Pixel/s
              Render to Self : 4072.25 M-Pixel/s
               PS 1.1 Simple : 4741.241 M-Pixel/s
               PS 1.4 Simple : 4748.791 M-Pixel/s
               PS 2.0 Simple : 4748.791 M-Pixel/s
            PS 2.0 PP Simple : 4746.275 M-Pixel/s
     Customized Pixel Shader : 5398.069 M-Pixel/s

D24S8:

Code:
                    System Information
-----------------------------------------------------------
        CPU : AMD Athlon(tm) XP 3000+
        GFX : NVIDIA GeForce 6800
         OS : Microsoft Windows XP
   Settings : 1024x768  32 bits  D24S8  2x FSAA

                      Benchmark Result
-----------------------------------------------------------
           FrameBuffer Clear : 8384 FPS
                  Color Fill : 5018.065 M-Pixel/s
                      Z Fill : 5010.516 M-Pixel/s
              Color + Z Fill : 4776.474 M-Pixel/s
              Single Texture : 4834.355 M-Pixel/s
  Single Texture Alpha Blend : 2498.967 M-Pixel/s
               Dual Textures : 2534.198 M-Pixel/s
             Triple Textures : 1693.66 M-Pixel/s
               Quad Textures : 1273.391 M-Pixel/s
    1 Floating Poing Texture : 2529.165 M-Pixel/s
              Render to Self : 4080.638 M-Pixel/s
               PS 1.1 Simple : 4326.005 M-Pixel/s
               PS 1.4 Simple : 4328.521 M-Pixel/s
               PS 2.0 Simple : 4331.039 M-Pixel/s
            PS 2.0 PP Simple : 4336.072 M-Pixel/s
     Customized Pixel Shader : 4992.899 M-Pixel/s
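
To put a number on the difference, here is a small sketch that computes the percentage change from D16 to D24S8 for a few of the 2x FSAA results above. The values are copied from the two listings; the helper name is made up for illustration.

```python
# Selected 2x FSAA results (M-Pixel/s) copied from the two listings above.
d16 = {
    "Color Fill": 5390.52,
    "Z Fill": 5450.918,
    "Color + Z Fill": 3789.973,
    "Single Texture Alpha Blend": 2743.075,
}
d24s8 = {
    "Color Fill": 5018.065,
    "Z Fill": 5010.516,
    "Color + Z Fill": 4776.474,
    "Single Texture Alpha Blend": 2498.967,
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {name: percent_change(d16[name], d24s8[name]) for name in d16}
for name, change in changes.items():
    print(f"{name:>28}: {change:+.1f}% going from D16 to D24S8")
```

In these runs most tests drop a little with D24S8, while Color + Z Fill goes the other way.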
 