AMD vs. Nvidia: Per Unit/clock performance of recent architectures

CarstenS

Starting from an off-topic discussion here, I'd like to use this thread to gain more insight into what the individual units in recent AMD and Nvidia GPU architectures can do per clock, essentially trying to assess their efficiency: first of all per clock, but also per mm² if data is available.

For starters, I have run MDolenc's fillrate tester with recent drivers on a GTX 280 and an HD 4870 (same resolution, same formats, Vista x64) at approximately the same clock rates for the GPU/TMU domain.

Code:
Fillrate Tester
--------------------------
Display adapter: ATI Radeon HD 4800 Series (HD4870 1G)
Driver version: 7.14.10.643
Display mode: 1600x1200 A8R8G8B8 60Hz
Z-Buffer format: D24S8
--------------------------

FFP - Pure fillrate - 11613.801758M pixels/sec
FFP - Z pixel rate - 35630.382813M pixels/sec
FFP - Single texture - 11591.027344M pixels/sec
FFP - Dual texture - 11590.750000M pixels/sec
FFP - Triple texture - 7828.766113M pixels/sec
FFP - Quad texture - 5901.380371M pixels/sec
PS 1.1 - Simple - 11592.215820M pixels/sec
PS 1.4 - Simple - 11592.044922M pixels/sec
PS 2.0 - Simple - 11592.227539M pixels/sec
PS 2.0 PP - Simple - 11592.182617M pixels/sec
PS 2.0 - Longer - 11591.844727M pixels/sec
PS 2.0 PP - Longer - 11592.151367M pixels/sec
PS 2.0 - Longer 4 Registers - 11591.844727M pixels/sec
PS 2.0 PP - Longer 4 Registers - 11591.825195M pixels/sec
PS 2.0 - Per Pixel Lighting - 7674.044922M pixels/sec
PS 2.0 PP - Per Pixel Lighting - 7674.179688M pixels/sec

Code:
Fillrate Tester
--------------------------
Display adapter: NVIDIA GeForce GTX 280 (@756-/1.512/1.296)
Driver version: 7.15.11.8206
Display mode: 1600x1200 A8R8G8B8 60Hz
Z-Buffer format: D24S8
--------------------------

FFP - Pure fillrate - 22968.218750M pixels/sec
FFP - Z pixel rate - 94500.023438M pixels/sec
FFP - Single texture - 23033.056641M pixels/sec
FFP - Dual texture - 23006.001953M pixels/sec
FFP - Triple texture - 17711.222656M pixels/sec
FFP - Quad texture - 13282.576172M pixels/sec
PS 1.1 - Simple - 23028.386719M pixels/sec
PS 1.4 - Simple - 23037.162109M pixels/sec
PS 2.0 - Simple - 23028.552734M pixels/sec
PS 2.0 PP - Simple - 23030.312500M pixels/sec
PS 2.0 - Longer - 18745.587891M pixels/sec
PS 2.0 PP - Longer - 18746.000000M pixels/sec
PS 2.0 - Longer 4 Registers - 18329.902344M pixels/sec
PS 2.0 PP - Longer 4 Registers - 18329.849609M pixels/sec
PS 2.0 - Per Pixel Lighting - 7094.438477M pixels/sec
PS 2.0 PP - Per Pixel Lighting - 7095.013672M pixels/sec

If I find the time today, I'll repeat this with a higher resolution and a memory-overclocked HD 4870. After all, we want the individual units to shine, don't we?
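To put the two listings side by side, a small sketch like the following may help. It is only an illustration: the helper names `parse` and `ratios` are made up for this example, and only a handful of lines from the two listings are included. Since the clocks of the two runs were only approximately matched, the ratios are a rough guide rather than exact per-clock figures.

```python
import re

# A few lines copied verbatim from the two Fillrate Tester listings above.
RADEON = """\
FFP - Pure fillrate - 11613.801758M pixels/sec
FFP - Z pixel rate - 35630.382813M pixels/sec
FFP - Quad texture - 5901.380371M pixels/sec
PS 2.0 - Longer - 11591.844727M pixels/sec
PS 2.0 - Per Pixel Lighting - 7674.044922M pixels/sec
"""

GEFORCE = """\
FFP - Pure fillrate - 22968.218750M pixels/sec
FFP - Z pixel rate - 94500.023438M pixels/sec
FFP - Quad texture - 13282.576172M pixels/sec
PS 2.0 - Longer - 18745.587891M pixels/sec
PS 2.0 - Per Pixel Lighting - 7094.438477M pixels/sec
"""

# "<test name> - <rate>M pixels/sec"; the test name may itself contain " - ".
LINE = re.compile(r"^(?P<name>.+?) - (?P<rate>[\d.]+)M pixels/sec$")


def parse(text):
    """Return {test name: rate in Mpixels/s} for every result line in text."""
    results = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            results[match.group("name")] = float(match.group("rate"))
    return results


def ratios(numerator, denominator):
    """Return numerator/denominator for every test present in both runs."""
    return {
        name: numerator[name] / denominator[name]
        for name in numerator
        if name in denominator
    }


for name, ratio in ratios(parse(GEFORCE), parse(RADEON)).items():
    print(f"{name:30s} GTX 280 / HD 4870 = {ratio:.2f}")
```

A ratio above 1 means the GTX 280 run was faster in that test, below 1 that the HD 4870 run was faster.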
 
*post reserved*

I intend to use parts of the results of MDolenc's fillrate tester, parts of the Archmark results, and the texturing fillrate tests from 3DMark03 and 3DMark06. Both RV770 and GT200 will be run with changed clock rates, so as to supply 3.6 bytes per texel per clock for each texture filter (quad) present.

Before I start switching back and forth between both architectures: does anyone have ideas which other tests (preferably run under Windows XP) might provide useful results with respect to the performance of texture units?

edit1:
System: C2D E8500 @ 3.8 GHz (400x9.5), DDR2-800 5.5.5.12, Windows XP SP3, GeForce 185.68 Beta, Catalyst 9.4 WHQL
Cards:
GeForce GTX 280 (Zotac AMP) @ 576/1404/1296 MHz (should equate to 3.6 theoretical bytes per theoretical texel)
Radeon HD 4870 1G (Powercolor) @ 750/843.75 MHz [as reported in RT] (should equate to 3.6 theoretical bytes per theoretical texel)
--
Theoretical numbers at the clocks mentioned above for the whole chips (GT200 / RV770), in G units per second (GPixels/s, GTexels/s, GB/s):
Color-Fill:  18.432 /  12.000 (mostly INT formats)
Tex-Fill:    46.080 /  30.000 (mostly INT formats)
Z-Fill:     147.456 /  48.000 (mostly INT formats)
AA-Fill:     73.728 /  48.000 (mostly INT formats)
Bandwidth:  165.888 / 108.000
--
Available bytes per unit per clock (GT200 / RV770):
ROP: 9 / 9
TEX: 3.6 / 3.6
--
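For anyone who wants to check the arithmetic behind these theoretical numbers, here is a rough sketch. The clocks are the ones listed above; the unit counts (32 ROPs, 80 TMUs and a 512-bit GDDR3 bus for GT200; 16 ROPs, 40 TMUs and a 256-bit GDDR5 bus for RV770) are the stock configurations and are not stated in the post itself. The Z and AA multipliers per ROP are simply chosen so that the totals listed above come out, so treat them as assumptions. The `Chip` class and its method names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    core_mhz: float           # ROP/TMU domain clock used in the test
    mem_mhz: float            # memory clock used in the test
    rops: int                 # assumed stock ROP count
    tmus: int                 # assumed stock texture filter unit count
    bus_bits: int             # assumed stock memory bus width
    transfers_per_clock: int  # 2 for GDDR3, 4 for GDDR5 at the reported clock
    z_per_rop: int            # assumption: Z samples per ROP per clock
    aa_per_rop: int           # assumption: AA samples per ROP per clock

    def color_fill(self):
        """Theoretical color fill in Mpixels/s."""
        return self.rops * self.core_mhz

    def tex_fill(self):
        """Theoretical texel fill in Mtexels/s."""
        return self.tmus * self.core_mhz

    def z_fill(self):
        """Theoretical Z fill in Msamples/s."""
        return self.color_fill() * self.z_per_rop

    def aa_fill(self):
        """Theoretical AA sample fill in Msamples/s."""
        return self.color_fill() * self.aa_per_rop

    def bandwidth(self):
        """Theoretical memory bandwidth in MB/s."""
        return self.mem_mhz * self.transfers_per_clock * self.bus_bits / 8

    def bytes_per_rop_clock(self):
        return self.bandwidth() / self.color_fill()

    def bytes_per_tmu_clock(self):
        return self.bandwidth() / self.tex_fill()


GT200 = Chip("GT200", 576.0, 1296.0, 32, 80, 512, 2, 8, 4)
RV770 = Chip("RV770", 750.0, 843.75, 16, 40, 256, 4, 4, 4)

for chip in (GT200, RV770):
    print(
        f"{chip.name}: color {chip.color_fill():.0f}, tex {chip.tex_fill():.0f}, "
        f"Z {chip.z_fill():.0f}, AA {chip.aa_fill():.0f}, "
        f"bandwidth {chip.bandwidth():.0f} MB/s, "
        f"{chip.bytes_per_rop_clock():.1f} B/ROP/clk, "
        f"{chip.bytes_per_tmu_clock():.1f} B/TMU/clk"
    )
```

With these inputs both chips end up at the same bandwidth budget per ROP and per texture unit per clock, which is the point of the clock changes.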

Colorfill (% of theoretical, GT200 / RV770)
-------------
Archmark   99.50 / 97.95
MDolenc    98.48 / 95.88
GZ easy    81.83 / 96.40

Z-Fill (% of theoretical, GT200 / RV770)
--------------
Archmark   95.09 / 101.84 (sic!)
MDolenc    52.32 / 44.66
GZ easy    19.67 / 67.13

Tex-Fill, Single (% of theoretical, GT200 / RV770)
------------
Archmark   33.01 / --
MDolenc    39.37 / 38.39
GZ easy    32.78 / 30.41
3DM03      36.77 / 38.22
3DM06      37.12 / 38.92

Tex-Fill, Multi/Quad (% of theoretical, GT200 / RV770)
------------
Archmark   69.34 / --
MDolenc    94.69 / 78.00 (97.50)
GZ easy    85.16 / 70.12 (87.65)
3DM03      95.26 / 76.87 (96.08)
3DM06      98.86 / 79.71 (99.63)

So it looks like pixel fill and Z go to Nvidia (at least without MSAA) and texturing goes to AMD, as long as we use the maximum number of texture interpolators (32 in RV770) rather than the number of filtering units as the base (numbers in brackets for multi-texturing).
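The bracketed RV770 numbers can be reproduced by rebasing the percentages from the filtering units to the 32 interpolators. A quick sketch of that conversion follows; the count of 40 filtering units is an assumption taken from the stock RV770 configuration, and the function name `rebase` is made up for this example.

```python
FILTER_UNITS = 40    # assumption: stock RV770 texture filtering units
INTERPOLATORS = 32   # texture interpolators in RV770, as stated in the post


def rebase(percent, from_units=FILTER_UNITS, to_units=INTERPOLATORS):
    """Convert an efficiency measured against from_units to one against to_units."""
    return percent * from_units / to_units


# RV770 multi-texturing results (% of the filter-unit peak) from the table.
multi_tex = {"MDolenc": 78.00, "GZ easy": 70.12, "3DM03": 76.87, "3DM06": 79.71}

for test, percent in multi_tex.items():
    print(f"{test:8s} {percent:6.2f} % of filter peak = "
          f"{rebase(percent):6.2f} % of interpolator peak")
```

The results agree with the bracketed values in the table to within rounding in the second decimal.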

edit2:
After completing the additional GZ Easy fillrate tests, it seems that this test and the NV architecture bear no love for each other: on average, GT200 scores subpar in it.

What is evident, though, is that RV770 is apparently also limited to 24 GTexels/s in multitexturing, which is news to me, since I thought those tests were designed to measure filtering and not addressing throughput.
 
Hm, why not jump to FillrateBenchmark 2004 (latest version is 0.92)?
The only drawback is the resolution limit of 1024*768.
 
Done, but I'm afraid there are no radically new insights gained from this test, apart from GT200 scoring subpar in it. :)
 
It is sad that the geometry and texturing tests in Archmark have been broken on Radeon drivers since R600. :(
 