PCGH - Pixelshader-Shootout: RV770 vs. GT200

Jawed

This is a rather fascinating comparison of the way pixel shaders from selected games run on the newest GPUs:

http://www.pcgameshardware.com/aid,654394/Reviews/Pixelshader-Shootout_RV770_vs_GT200/&page=1

We have normalized the results to a Geforce 7900 GT, putting its result for each measured shader at an abstract 1. From these relative performances across thousands of shaders, we show the averaged means per game. So if a card has a score of, say, 5, it means that the average over all shaders tested for a single game is five times as high as on a Geforce 7900 GT.
There are lots of subtleties in the testing method, so it's best to read the entire article.
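
As a rough illustration of the scoring described in that quote, here is a minimal sketch: divide each shader's throughput by the reference card's throughput, then take the plain mean per game. The function names and all the numbers are made up for illustration; this is not PCGH's tool or data.

```python
from statistics import mean

def relative_scores(card_results, reference_results):
    """Per-shader throughput of a card divided by the reference card's throughput."""
    return {shader: card_results[shader] / reference_results[shader]
            for shader in reference_results}

def game_score(card_results, reference_results):
    """Unweighted arithmetic mean of the per-shader ratios (reference card = 1.0)."""
    return mean(relative_scores(card_results, reference_results).values())

# Hypothetical throughputs (say, Mpixels/s) for three shaders from one game.
reference = {"shader_a": 100.0, "shader_b": 400.0, "shader_c": 50.0}  # stands in for the 7900 GT
card = {"shader_a": 500.0, "shader_b": 800.0, "shader_c": 400.0}

print(game_score(reference, reference))  # 1.0 by construction
print(game_score(card, reference))       # (5 + 2 + 8) / 3 = 5.0
```

Note that a score of 5 here hides a spread from 2x to 8x across the individual shaders.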

The overall aim is to try to isolate the mathematical throughput, GFLOPs if you like, of each GPU. They are trying to remove the influence of texturing, and with unified GPUs, the influence of the vertex workload.

I wish that the numerical values for each point on these graphs were given - it's rather fiddly trying to interpolate the numbers. With the numerical values it'd be easier to make comparisons of scaling - e.g. it appears that GT200 is gaining absolutely nothing from the prodigal MUL.

One of the key things I'm wary of in these results relates to texture coordinate interpolation. All the NVidia GPUs do interpolation in the ALUs - but these tests appear to remove texture coordinate interpolation from all tested shaders, by using a "small texture" (presumably repeated fetches from constant coordinates). Other vertex attributes are, presumably, still present. Maybe the data confirms or denies my suspicion, but so far I haven't worked this out.

Jawed
 
Neat article, but there are some flaws.

In addition to textures consuming no BW, the most important issue is that they are all single cycle bilinear RGBA8 fetches. Another big factor is that all shaders in a game are given the same importance, regardless of how much they are actually used in a game. They are weighted by relative improvement over a 7900GT, meaning any shader that the 7900GT was bad at gets the most relevance. Finally, any shader with dynamic branching will not be characterized well without the original texture and vertex data.
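
To make the weighting point concrete, here is a small sketch comparing the plain mean of per-shader ratios with a usage-weighted mean. The ratios and the usage shares are invented purely for illustration; nobody in the article measured how often each shader runs.

```python
def unweighted_mean(ratios):
    """Every shader counts equally, as in the article's per-game score."""
    return sum(ratios) / len(ratios)

def usage_weighted_mean(ratios, weights):
    """Each shader counts in proportion to a (hypothetical) usage share."""
    return sum(r * w for r, w in zip(ratios, weights)) / sum(weights)

# Hypothetical speedups over the reference card; the last shader is one
# the reference card happens to be bad at.
ratios = [2.0, 2.0, 2.0, 10.0]
# Hypothetical share of frame time per shader, in percent.
usage = [30, 30, 39, 1]

print(unweighted_mean(ratios))             # 4.0
print(usage_weighted_mean(ratios, usage))  # 208 / 100 = 2.08
```

One rarely used shader is enough to double the headline score in this made-up case.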

Having said that, there are some interesting things to note. Aside from a couple games, RV670 holds up quite well to G80 (the lack of trilinear/aniso is probably a big reason). There are a number of tests where RV770 is less than twice as fast as RV670, suggesting that those games have a lot of shaders that are short enough to be fillrate limited on RV770.

BTW, Jawed, I think you're wrong about the interpolation. They are small textures, but the vertex shader still outputs the attributes specified by the shader, and they probably vary across the screen like most fullscreen quad renders.
 
In addition to textures consuming no BW, the most important issue is that they are all single cycle bilinear RGBA8 fetches.
They are just trying to measure ALU utilisation, so all other latencies/throughputs are ignored.

There are a number of tests where RV770 is less than twice as fast as RV670, suggesting that those games have a lot of shaders that are short enough to be fillrate limited on RV770.
Hmm, didn't even think of that. I suppose if you felt like drudging through thousands of shaders you could put them through GPUSA and remove all of them that are less than 10 ALU instruction groups (so as not to be fillrate limited on RV770) - even with GPUSA's command line option that'd be a tedious task :LOL:
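
A sketch of that filtering step, for what it's worth. The threshold of 10 falls out if you assume 5 ALU slots per instruction group and 16 colour pixels per clock from the ROPs on RV770; both of those figures are my assumptions, while the 10 SIMDs and 80 MADs per SIMD are quoted elsewhere in this thread. The shader names and group counts are hypothetical, and the sketch does not call GPUSA - it assumes the counts have already been extracted somehow.

```python
SIMDS = 10                    # from the thread
MADS_PER_SIMD = 80            # from the thread
VLIW_WIDTH = 5                # assumption: 5 ALU slots per instruction group
COLOUR_PIXELS_PER_CLOCK = 16  # assumption: RV770 colour fillrate per clock

# Instruction groups the ALUs can retire per clock across the whole chip.
groups_per_clock = SIMDS * MADS_PER_SIMD // VLIW_WIDTH        # 160
# Shorter shaders than this finish faster than the ROPs can write pixels.
min_groups = groups_per_clock // COLOUR_PIXELS_PER_CLOCK      # 10

def alu_bound(shaders, threshold=min_groups):
    """Keep only shaders long enough not to be fillrate limited."""
    return {name: groups for name, groups in shaders.items() if groups >= threshold}

# Hypothetical shader name -> ALU instruction group count.
shaders = {"sky": 3, "water": 42, "hud": 1, "skin": 10}

print(min_groups)                  # 10
print(sorted(alu_bound(shaders)))  # ['skin', 'water']
```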

BTW, Jawed, I think you're wrong about the interpolation. They are small textures, but the vertex shader still outputs the attributes specified by the shader, and they probably vary across the screen like most fullscreen quad renders.
I was kinda hoping that the data would contain a clue on this subject.

Anyway, you saying that has made me wonder if the huge margin between X1800XT and 7900GT is indicating interpolation on 7900GT's shaders. It's running at half the utilisation of X1800XT :oops:

Jawed
 
They are just trying to measure ALU utilisation, so all other latencies/throughputs are ignored.
I suppose, but when playing games you will need more TF cycles and thus less ALU usage.

Anyway, you saying that has made me wonder if the huge margin between X1800XT and 7900GT is indicating interpolation on 7900GT's shaders. It's running at half the utilisation of X1800XT :oops:
Not per mm2. ;)

Looking at percent of total GFLOPs is a useless metric, particularly because the 7900GT uses part of it for texture calcs whereas the X1800XT's GFLOPs figure in that review doesn't include TA ability or the extra ADD. In an apples-to-apples comparison, the X1800XT has 93% of the 7900GT's math ability.
 
Shader-wise, is it not true to say the ATI 4000 series is like NetBurst (lots of shaders that do very little) and the NV 200 series is like the Athlon (fewer, but they do so much more)?
 
Not per mm2. ;)
No, and now they've swapped roles, with NVidia having the most expensive scheduling hardware.

Looking at percent of total GFLOPs is a useless metric, particularly because the 7900GT uses part of it for texture calcs whereas the X1800XT's GFLOPs figure in that review doesn't include TA ability or the extra ADD. In an apples-to-apples comparison, the X1800XT has 93% of the 7900GT's math ability.
Yeah, agreed, it's a bit fake when only counting MADs - and there'll never be agreement on how to define the theoretical capability of a GPU anyway.

Though it does appear to indicate that texture coordinate interpolation is retained in the shaders under test.

Hmm, I wonder if they could be persuaded to test 9600GT, which has ~ same colour fillrate as HD4850. Though I suppose there's still a question over how much of the fillrate limited games are colour-fillrate as opposed to Z-fillrate limited. Hmm.

Jawed
 
Shader-wise, is it not true to say the ATI 4000 series is like NetBurst (lots of shaders that do very little) and the NV 200 series is like the Athlon (fewer, but they do so much more)?
Or you can look at it the other way around. GT200 has 10 highly clocked SIMD processors that can't do much per clock (24 MAD), whereas RV770 has 10 SIMD processors that are smaller and lower clocked but have higher IPC (80 MAD). ;)
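
The per-clock arithmetic behind that comparison, as a quick sketch. The SIMD counts and MADs per SIMD are the figures from the post above; the shader clocks (1296 MHz for GTX 280, 750 MHz for HD 4870) are my assumption and are not stated in this thread.

```python
def mads_per_clock(simds, mads_per_simd):
    """Total MADs the chip can issue per shader clock."""
    return simds * mads_per_simd

def mad_gflops(mads, shader_clock_mhz):
    """Peak MAD-only GFLOPs, counting each MAD as two floating-point ops."""
    return mads * 2 * shader_clock_mhz / 1000.0

gt200 = mads_per_clock(10, 24)   # 240 MADs per clock
rv770 = mads_per_clock(10, 80)   # 800 MADs per clock

# Assumed shader clocks in MHz (not from the thread).
gt200_gflops = mad_gflops(gt200, 1296)
rv770_gflops = mad_gflops(rv770, 750)

print(gt200, rv770)                            # 240 800
print(round(rv770_gflops / gt200_gflops, 2))   # 1.93
```

So the higher clock claws back some of the per-clock gap, but on paper RV770 still has nearly twice the MAD rate under these assumptions.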
 
However you want to look at it, it seems to me that AMD has this time (finally) found a nice balance between ALUs and control logic.
 
If the low points for the ATI GPUs represent fillrate-limited scenarios then these can only be Z-only fillrate scenarios, can't they?

If so, then with essentially no texturing memory bandwidth, it seems that NVidia's ROPs are speeding up massively (the insane Z capability is showing off).

In a game like NFS:MW, 8800Ultra is almost 3x the speed of 7900GT. Colour fillrate is 2x and bandwidth is 2.5x, so it seems like Z rate.

GTX280 is ~4.8x faster than 7900GT. Colour fillrate is 2.7x and bandwidth is 3.3x.

Anno 1701 is very similar, except GTX280 is even further ahead. TD:U appears to favour 8800U more than GTX280.

Also I'm wondering what happens to "driver optimisations" when the rendering workload just loses texturing.

Jawed
 
If the low points for the ATI GPUs represent fillrate-limited scenarios then these can only be Z-only fillrate scenarios, can't they?
Why do you say that? There's no Z-only rendering here. They're shader tests run on fullscreen quads.

Remember that each data point on the graph involves many shaders. They're not all fillrate limited on RV770. If you're wondering why the 8800U beats the 4850 twice, it's due to the color fillrate.

In a game like NFS:MW, 8800Ultra is almost 3x the speed of 7900GT. Colour fillrate is 2x and bandwidth is 2.5x, so it seems like Z rate.
Or, for example, 60% of the shaders are fillrate limited and 40% are ALU limited, with the latter running 4x as fast on the 8800U as on the 7900 GT (very possible).

Hence the 8800U scoring 3x the 7900GT, and the 4870 scoring only 1.5x the 3870.
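
The arithmetic behind that example, sketched out. Since the score is a mean of per-shader ratios, a mix of shader classes simply blends their speedups. The 60/40 split and the 4x come from the example above and the 2x colour fillrate from the quoted post; the RV770 versus RV670 numbers (about 1x fillrate, about 2.4x ALU) are my own rough assumptions.

```python
def blended_speedup(classes):
    """Mean speedup over shader classes given as (fraction_of_shaders, speedup)."""
    return sum(fraction * speedup for fraction, speedup in classes)

# 8800 Ultra vs 7900 GT: fillrate-limited shaders gain 2x, ALU-limited gain 4x.
ultra_vs_7900 = blended_speedup([(0.6, 2.0), (0.4, 4.0)])

# HD 4870 vs HD 3870 (assumed): fillrate roughly unchanged, ALU roughly 2.4x.
hd4870_vs_hd3870 = blended_speedup([(0.6, 1.0), (0.4, 2.4)])

print(round(ultra_vs_7900, 2))     # 2.8, i.e. roughly 3x
print(round(hd4870_vs_hd3870, 2))  # 1.56, i.e. roughly 1.5x
```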
 
However you want to look at it, it seems to me that AMD has this time (finally) found a nice balance between ALUs and control logic.
It's not so much balance as it is efficient design. RV670 is pretty balanced, IMO, but each functional unit is just not as fast and/or as small as it is on RV770. If RV770 was 350 mm2 but still had the same "balance", it wouldn't be nearly as impressive.
 
Why do you say that? There's no Z-only rendering here. They're shader tests run on fullscreen quads.
The clarity of morning makes me agree. I was thinking that they would have retained the Z-only shaders and the corresponding render state from the game.

Remember that each data point on the graph involves many shaders. They're not all fillrate limited on RV770. If you're wondering why the 8800U beats the 4850 twice, it's due to the color fillrate.

Or, for example, 60% of the shaders are fillrate limited and 40% are ALU limited, with the latter running 4x as fast on the 8800U as on the 7900 GT (very possible).
I was thinking something similar about the hiding effects of the averaging last night but wanted to go to bed...

Jawed
 
Hang on, according to wiki it has 800 SPs and the GTX 280 has 240 SPs.
Sure, but that's just a marketing description. It's like saying Cell has 32 SPs as opposed to 8 SPUs.

In any case, having a higher clock and lower IPC is the hallmark of netburst, not the athlon. Your analogy is backwards.
 
Hi there - sorry for not responding earlier to the thread about our article but I've been quite busy at work. :(

Nevertheless, I'd like to comment on several things:
- The "full results" were not given, because I simply can't think of any sane way to present the (absolute) fillrates of about 5,000 shaders (including some unpublished results which were not representative, probably because of MRT stuff only being rendered in one channel on G80 and up) per card. :)

If you'd like to dig through that data though, PM me and I'll mail you the spreadsheet - you can then go and filter out the fillrate-bound shaders yourself ;) :D

- About "by using a "small texture" (presumably repeated fetches from constant coordinates)" I must confess I am not sure - I'll have to check back with the author of our tool. Actually I think that it's a real texture, not constants. Same thing for the Z-pass (but then, I am not sure if any of the games use a pre-Z-pass at all [in fact I do not think so]).

- 9600GT: Maybe I can borrow one from work and run the tests over the weekend. You'd have to be content with it being thrown in the spreadsheet though. ;)

- "They are weighted by relative improvement over a 7900GT, meaning any shader that the 7900GT was bad at gets the most relevance." True - but this is why we have included a reference to the X1800 XT there - the two cards do not differ so much from one another (on average) that it'd be a major problem.


Any other comments, suggestions or questions? I'd be happy to discuss the article further and learn some more things from the discussion!
 
Hi there - sorry for not responding earlier to the thread about our article but I've been quite busy at work. :(
It's a nice article, even if it's quite difficult to draw information out of the data! The fillrate bottleneck makes things quite difficult.

The "full results" were not given, because I simply can't think of any sane way to present the (absolute) fillrates of about 5,000 shaders (including some unpublished results which were not representative, probably because of MRT stuff only being rendered in one channel on G80 and up) per card. :)
I meant the data points on the graphs. e.g. instead of trying to work out if a datapoint is 3.4 or 3.6, just having the number makes life a lot easier :smile:

If you'd like to dig through that data though, PM me and I'll mail you the spreadsheet - you can then go and filter out the fillrate-bound shaders yourself ;) :D
Worth a try...

- 9600GT: Maybe I can borrow one from work and run the tests over the weekend. You'd have to be content with it being thrown in the spreadsheet though. ;)
In for a penny, in for a pound!

Jawed
 
I meant the data points on the graphs. e.g. instead of trying to work out if a datapoint is 3.4 or 3.6, just having the number makes life a lot easier :smile:

Easy enough:
Code:
                           HD2900XT  HD3870  HD4870  HD4850  GTX260  GTX280  8800 Ultra  7900 GT
Theoretical GFLOPs (MADD)      2.75    2.87    6.94    5.79    2.76    3.60        2.24     1.00
GRAW 2                         1.98    2.09    3.08    2.58    6.45    7.97        4.36     1.00
Rainbow Six: Vegas             3.53    3.39    5.56    4.65    6.28    7.91        5.08     1.00
Call of Duty 4                 3.79    3.74    7.66    6.40    5.31    6.82        3.78     1.00
Gothic 3                       3.78    3.92    6.99    5.83    4.80    6.18        3.61     1.00
Anno 1701                      2.17    2.27    3.92    3.28    4.52    5.71        3.16     1.00
Test Drive Unlimited           2.86    2.98    4.00    3.34    4.39    5.54        3.58     1.00
Call of Juarez                 2.58    2.65    4.52    3.77    4.23    5.39        3.24     1.00
Race Driver Grid               3.06    3.19    6.23    5.21    3.95    5.02        2.78     1.00
Oblivion                       2.63    2.74    5.06    4.22    3.92    4.97        2.95     1.00
Stalker                        1.81    1.86    2.62    2.19    3.89    4.91        3.12     1.00
Age of Conan                   2.78    2.85    4.31    3.60    3.77    4.77        2.71     1.00
NfS Most Wanted                2.02    2.09    3.10    2.59    3.77    4.75        2.83     1.00
NfS Carbon                     2.12    2.21    3.90    3.26    3.48    4.35        2.56     1.00
Average                        2.70    2.77    4.69    3.92    4.52    5.71        3.37     1.00
(rounded values)
 
Awesome, so in terms of the achieved scaling as a percentage of theoretically expected scaling, for HD4870 in comparison with HD3870:

Code:
GRAW 2                61%
Rainbow Six: Vegas    68%
Call of Duty 4        85%
Gothic 3              74%
Anno 1701             71%
Test Drive Unlimited  56%
Call of Juarez        71%
Race Driver Grid      81%
Oblivion              76%
Stalker               58%
Age of Conan          63%
NfS Most Wanted       61%
NfS Carbon            73%
Average               70%
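
For anyone wanting to redo this for other card pairs, a minimal sketch of the calculation: achieved scaling (HD4870 score over HD3870 score) divided by theoretical scaling (ratio of the MADD GFLOPs row). The numbers are the rounded values posted above, so the last digit can wobble; the function name is just for illustration.

```python
# Theoretical MADD GFLOPs relative to the 7900 GT, from the posted results.
THEORETICAL = {"HD3870": 2.87, "HD4870": 6.94}

# Game: (HD3870 score, HD4870 score), from the posted results.
RESULTS = {
    "GRAW 2": (2.09, 3.08),
    "Rainbow Six: Vegas": (3.39, 5.56),
    "Call of Duty 4": (3.74, 7.66),
    "Gothic 3": (3.92, 6.99),
    "Anno 1701": (2.27, 3.92),
    "Test Drive Unlimited": (2.98, 4.00),
    "Call of Juarez": (2.65, 4.52),
    "Race Driver Grid": (3.19, 6.23),
    "Oblivion": (2.74, 5.06),
    "Stalker": (1.86, 2.62),
    "Age of Conan": (2.85, 4.31),
    "NfS Most Wanted": (2.09, 3.10),
    "NfS Carbon": (2.21, 3.90),
    "Average": (2.77, 4.69),
}

def scaling_efficiency(old_score, new_score, old_theoretical, new_theoretical):
    """Achieved speedup as a fraction of the theoretically expected speedup."""
    return (new_score / old_score) / (new_theoretical / old_theoretical)

for game, (hd3870, hd4870) in RESULTS.items():
    pct = 100 * scaling_efficiency(hd3870, hd4870,
                                   THEORETICAL["HD3870"], THEORETICAL["HD4870"])
    print(game.ljust(22) + str(round(pct)) + "%")
```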

Jawed
 
Any ideas why R6:V and CoD4 are slower on HD3870 than HD2900XT? Is bandwidth really coming into play? Particularly with CoD4, which seems to be fairly arithmetic intensive, I'd expect bandwidth to be practically ruled out as a factor.

Some kind of colour compression effect - these games are producing relatively incompressible pixels?

Jawed
 