Xbit have their GT200/RV770 Review online - interesting results


CarstenS
04-Aug-2008, 10:27
Here it is:
http://www.xbitlabs.com/articles/video/display/geforce-gtx200-theory.html

Interesting parts include:
http://www.xbitlabs.com/articles/video/display/geforce-gtx200-theory_12.html#sect0
As usual, they measured the power consumption of the graphics card alone.

And their theoretical part:
http://www.xbitlabs.com/articles/video/display/geforce-gtx200-theory_15.html#sect0
There, GTX200 receives a fair and square slapping by RV770. But judging from my own testing results, GTX200 gets better as resolution increases.

For example, at 640x480 all Radeons since HD2900 are sometimes twice as fast, whereas at 2560x1600, GTX280 can keep up with HD 4870 most of the time (winning about as many tests as it loses).

Kaotik
04-Aug-2008, 13:21
Related to their power consumption figures: according to hardware-infos.com, Catalyst 8.8 will bring the fixes/improvements/whatever we've been waiting for in the PowerPlay features of the HD 4800s.
A machine with an idle consumption of 211W dropped to 174W when Cat 8.8 was installed.

http://translate.google.com/translate?u=http%3A%2F%2Fwww.hardware-infos.com%2Fnews.php%3Fnews%3D2269&hl=fi&ie=UTF8&sl=de&tl=en

CarstenS
04-Aug-2008, 13:27
Yeah, that was for a machine with CrossFire installed. Another German site (more credible IMO than hw-infos) also says the alleged 8.8 beta doesn't do squat wrt power consumption. (http://www.forumdeluxx.de/forum/showthread.php?p=9740982&posted=1#post9740982)

Well, we shall see and I certainly hope the best for AMD.

AnarchX
04-Aug-2008, 13:40
Did not NV promise 25W idle consumption for GTX 200 or is this a future target? :???:

CarstenS
04-Aug-2008, 14:22
Did not NV promise 25W idle consumption for GTX 200 or is this a future target? :???:
No, actually they did.

wishiknew
04-Aug-2008, 15:08
Was the article taken down? No go for me.

CarstenS
04-Aug-2008, 15:16
Works like a charm - at least from where I'm sitting.

Jawed
04-Aug-2008, 15:37
And their theoretical part:
http://www.xbitlabs.com/articles/video/display/geforce-gtx200-theory_15.html#sect0
There, GTX200 receives a fair and square slapping by RV770. But judging from my own testing results, GTX200 gets better as resolution increases.
The XBitMark dynamic branching tests are a complete whitewash, which is pretty surprising given the difference in batch sizes.

The overall feel of the article seems to be that NVidia's ALU+TEX architecture is a dead-end as it scales so poorly.

Anyway, we know from other tests (ixbt's GS tests, 3DMark Vantage's GPU Cloth) that there are some big advantages for NVidia's architecture. And obviously there are some games that are considerably faster on GTX280.

Jawed

Mintmaster
04-Aug-2008, 17:17
It's about damn time that XBitLabs got a GT200 based card!

The XBitMark dynamic branching tests are a complete whitewash, which is pretty surprising given the difference in batch sizes.
I think it has more to do with the shaders used in the branches. The wood shader and Bump+spec/Bump+spec+refl are already way faster on RV770, and I'd expect the Mandelbrot shader to do even better.

A factor of 2 with batch size isn't that important. It makes a difference for really branchy code, but I doubt that this test has anything like that.
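The point about batch size can be illustrated with a toy model. The sketch below assumes a batch that contains pixels from both sides of a branch has to execute both sides for every pixel in it, and compares a 32-wide batch with a 64-wide one (the "factor of 2" discussed here). The instruction costs, the stripe patterns and the function names are made up for illustration; real hardware scheduling is far more involved.

```python
def simd_efficiency(taken, batch_size, cost_if=20, cost_else=4):
    """Fraction of issued ALU slots that do useful work.

    taken      -- per-pixel branch outcome (True = 'if' side)
    batch_size -- number of pixels that execute in lockstep
    cost_if / cost_else -- instruction counts of the two sides (made-up values)
    """
    useful = sum(cost_if if t else cost_else for t in taken)
    issued = 0
    for start in range(0, len(taken), batch_size):
        batch = taken[start:start + batch_size]
        if any(batch):          # at least one pixel needs the 'if' side
            issued += len(batch) * cost_if
        if not all(batch):      # at least one pixel needs the 'else' side
            issued += len(batch) * cost_else
    return useful / issued


def stripes(n, run):
    """n pixels whose branch outcome flips every `run` pixels."""
    return [(i // run) % 2 == 0 for i in range(n)]


for label, run in (("coherent", 192), ("branchy", 48)):
    pixels = stripes(384, run)
    for batch_size in (32, 64):
        eff = simd_efficiency(pixels, batch_size)
        print(f"{label:9s} batch={batch_size:2d} efficiency={eff:.2f}")
```

In this model both batch sizes are equally efficient when the branch outcome changes rarely, and the smaller batch only pulls ahead once the outcome flips at a granularity close to the batch width.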

There's something seriously wrong with the GTX 280 in this review. There's no reason for it not to be 30% faster than the GTX 260 in pixel shader benchmarks.

Jawed
04-Aug-2008, 19:21
I think it has more to do with the shaders used in the branches. The wood shader and Bump+spec/Bump+spec+refl are already way faster on RV770, and I'd expect the Mandelbrot shader to do even better.
Wouldn't the pure math of Mandelbrot show a smaller difference? Hmm, I've just realised that RV770 has a far higher transcendental throughput than GT200. So prolly not.

GT200 has 30 SMs, each of which has 2 transcendental units. So 60 Ts at 1300MHz is 78G instructions per second, assuming 1 T per clock. RV770 has 160 Ts at 750MHz, which is 120G instructions per second, 54% more :shock:
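The arithmetic in the paragraph above can be checked with a few lines. This is just a sketch that uses the unit counts and clocks quoted in the post and the same assumption of one transcendental instruction per unit per clock; the function name is made up.

```python
def transcendental_rate_g(units: int, clock_mhz: float) -> float:
    """Peak transcendental instructions per second, in billions,
    assuming one instruction per unit per clock."""
    return units * clock_mhz / 1000.0


# Figures as quoted in the thread.
gt200 = transcendental_rate_g(units=30 * 2, clock_mhz=1300)  # 30 SMs x 2 units each
rv770 = transcendental_rate_g(units=160, clock_mhz=750)

advantage_pct = (rv770 / gt200 - 1.0) * 100.0

print(f"GT200: {gt200:.0f} G instructions/s")
print(f"RV770: {rv770:.0f} G instructions/s")
print(f"RV770 advantage: {advantage_pct:.0f}%")
```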

Looking at these tests, I'm starting to wonder if NVidia's texture cache system has reached the end of the road. 10 clusters <-> 8 L2s appears to be completely clogging up, e.g. the Factored BRDF test runs only 1% faster on GTX280 than on 9800GTX. Now I really don't know what that shader's doing (why can't we look at these shaders?) but 1% for a shader that at worst is TEX bound seems incredible.

Prolly the driver?

A factor of 2 with batch size isn't that important. It makes a difference for really branchy code, but I doubt that this test has anything like that.
Hmm, well I suppose if there's no nest depth then there's not enough incoherency for GT200 to show an advantage.

Anyone know of a test that does nested DB?

Jawed

Kaotik
05-Aug-2008, 01:34
Yeah, that was for a machine with CrossFire installed. Another German site (more credible IMO than hw-infos) also says the alleged 8.8 beta doesn't do squat wrt power consumption. (http://www.forumdeluxx.de/forum/showthread.php?p=9740982&posted=1#post9740982)

Well, we shall see and I certainly hope the best for AMD.

There's apparently more than one "8.8 beta" set floating around, some of which might and some of which might not contain PowerPlay fixes?

According to a user on another forum, there are a couple of sets being spread here:
http://www.ati-forum.de/allgemein/news/p4409-catalyst-8-8-beta-aufgetaucht-catalyst-8-8-beta-found/#post4409
(the newer one mentions "maybe without powerplay function")
Then there's apparently a yet newer set found via guru3d (scroll down further; I think the links are after the screenshots):
http://forums.guru3d.com/showthread.php?p=2795279

I haven't tested any of these personally, just going by what some users have said about the sets.

fellix
05-Aug-2008, 06:59
PowerPlay in that release refers to CrossFire setups. Single cards are already fine with that.

By the way, the leaked 8.52.2 driver significantly boosts PS4 Texturing scores in D3DRightMark 2.0 (POM & Fur), on my 4870.

CarstenS
05-Aug-2008, 07:34
PowerPlay in that release refers to CrossFire setups. Single cards are already fine with that.

By the way, the leaked 8.52.2 driver significantly boosts PS4 Texturing scores in D3DRightMark 2.0 (POM & Fur), on my 4870.

That's good to hear - the HD 4800s were really lacking there. Do you mean a percentage or a factor increase btw?

fellix
05-Aug-2008, 07:48
Two-fold to be exact, both with and without 4xSSAA. The rest of the sub-tests also got minor speedups. :shock:

The GPU Cloth feature test in Vantage is now 2 FPS faster on average. POM seems smoother, too.

I've read reports of an additional FPS increase for CF setups in Crysis, so the 4870 X2 looks set to do nicely once Cat 8.8 WHQL is out, I guess. ;)

Mintmaster
05-Aug-2008, 10:51
Wouldn't the pure math of Mandelbrot show a smaller difference? Hmm, I've just realised that RV770 has a far higher transcendental throughput than GT200. So prolly not.
I guess you don't know your Mandelbrot! It's a pure MADD test, which is RV770's biggest strength. However, I forgot that it's quite dependent, so it'll only get maybe 50% utilization. My bad.

Looking at these tests, I'm starting to wonder if NVidia's texture cache system has reached the end of the road. 10 clusters <-> 8 L2s appears to be completely clogging up, e.g. the Factored BRDF test runs only 1% faster on GTX280 than on 9800GTX. Now I really don't know what that shader's doing (why can't we look at these shaders?) but 1% for a shader that at worst is TEX bound seems incredible.
I guess that's a possibility, but it seems like a rather bizarre flaw to me. You don't see this in games or Shadermark. My guess is that they tested wrong or got a faulty card.

Anyone know of a test that does nested DB?
There were some tests from PowerVR doing voxel rendering or something that really stressed DB. Another crazy DB test is the Quaternion Julia Set raytracer (http://www.devmaster.net/forums/showthread.php?t=4448).

Jawed
05-Aug-2008, 12:17
I guess you don't know your Mandelbrot! It's a pure MADD test, which is RV770's biggest strength. However, I forgot that it's quite dependent, so it'll only get maybe 50% utilization. My bad.
That wasn't my point. Bump+specular+reflection runs 77% faster on HD4870 and wood is 46% faster. In pure math terms, Mandelbrot could only be 28% faster on HD4870, so Mandelbrot would slow things down in comparison with GTX280.

I just put Humus's Mandelbrot into GPUSA - 41% utilisation. So GTX280 would run Mandelbrot alone ~2.3x faster.

I guess that's a possibility, but it seems like a rather bizarre flaw to me. You don't see this in games or Shadermark. My guess is that they tested wrong or got a faulty card.
Looking at the two Mandelbrot DB tests, it's notable that the second "+ 10 textures" is faster on GT200, 10.6%, while on 9800GTX it's slower, 83.7%. Similarly on RV770, the second test is slower, 87.2%. So, why does adding what may be 10 simple surface textures make the shader run faster on GT200?

Going back into the mists of time, look at X1800XT and 7800GTX:

http://www.xbitlabs.com/articles/video/display/geforce7800gtx512_9.html#sect0

there's something about this test that appears to favour ATI (though you might argue this is just an example of ATI's better latency-hiding with out of order ALU and TEX scheduling). X1900XTX and X1950XTX v 7900GTX and 7950GTX:

http://www.xbitlabs.com/articles/video/display/ati-x1950xtx_10.html#sect2

There were some tests from PowerVR doing voxel rendering or something that really stressed DB. Another crazy DB test is the Quaternion Julia Set raytracer (http://www.devmaster.net/forums/showthread.php?t=4448).
Looks great. I even got the cg to compile in GPUSA without any editing, which was a pleasant surprise. Yes, quite a bit of DB in there.

Sadly it won't execute, I get a "bad argument -unroll" error from cgc, after I downloaded cg.dll, cgGL.dll and glut32.dll.

Jawed

CarstenS
05-Aug-2008, 14:08
There's apparently more than one "8.8 beta" set floating around, some of which might and some of which might not contain PowerPlay fixes?

According to a user on another forum, there are a couple of sets being spread here:
http://www.ati-forum.de/allgemein/news/p4409-catalyst-8-8-beta-aufgetaucht-catalyst-8-8-beta-found/#post4409
(the newer one mentions "maybe without powerplay function")
Then there's apparently a yet newer set found via guru3d (scroll down further; I think the links are after the screenshots):
http://forums.guru3d.com/showthread.php?p=2795279

I haven't tested any of these personally, just going by what some users have said about the sets.
I've downloaded them all (I think) and they're all giving me the same build number: 8.52.2-080722a-066081E
D3D: 0604
OGL: 7869

Or did I miss one?

Kaotik
05-Aug-2008, 14:24
I've downloaded them all (I think) and they're all giving me the same build number: 8.52.2-080722a-066081E
D3D: 0604
OGL: 7869

Or did I miss one?

Might it be that the older links have all been updated to the newest set as well? There are too many reports saying there are at least 2 different ones.

Mintmaster
06-Aug-2008, 02:00
That wasn't my point. Bump+specular+reflection runs 77% faster on HD4870 and wood is 46% faster. In pure math terms, Mandelbrot could only be 28% faster on HD4870, so Mandelbrot would slow things down in comparison with GTX280.
You're assuming full use of the second MUL, but you'll have more ADDs than MULs when iterating the Mandelbrot set. I would say that in pure math terms the 4870 is twice as fast as the GTX 280.

Once you take VLIW utilization into account, though, we're back to parity. Like I said before, I forgot about this, so I was wrong about it explaining higher perf in the DB tests.
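The back-of-the-envelope comparison in this exchange can be reproduced with a short sketch. The clocks (750MHz and 1300MHz) and the rough 50% VLIW utilisation estimate come from earlier posts in the thread; the ALU lane counts (800 for HD 4870, 240 for GTX 280) are not stated in the thread and are assumed here from the cards' published specifications. Function and variable names are made up.

```python
def peak_gflops(alus: int, clock_mhz: float, flops_per_clock: int) -> float:
    """Peak GFLOPS for `alus` lanes, each doing `flops_per_clock` per clock."""
    return alus * clock_mhz * flops_per_clock / 1000.0


# Assumed lane counts (not given in the thread); clocks as quoted in the thread.
HD4870_ALUS, HD4870_MHZ = 800, 750
GTX280_ALUS, GTX280_MHZ = 240, 1300

hd4870 = peak_gflops(HD4870_ALUS, HD4870_MHZ, 2)          # MAD = 2 flops
gtx280_mad_mul = peak_gflops(GTX280_ALUS, GTX280_MHZ, 3)  # MAD + MUL = 3 flops
gtx280_mad = peak_gflops(GTX280_ALUS, GTX280_MHZ, 2)      # second MUL unused

ratio_with_mul = hd4870 / gtx280_mad_mul   # the "28% faster" case
ratio_mad_only = hd4870 / gtx280_mad       # the "twice as fast" case

vliw_utilisation = 0.5                     # rough estimate from the thread
ratio_effective = hd4870 * vliw_utilisation / gtx280_mad

print(f"HD4870 vs GTX280, MAD+MUL counted: {ratio_with_mul:.2f}x")
print(f"HD4870 vs GTX280, MAD only:        {ratio_mad_only:.2f}x")
print(f"HD4870 at 50% VLIW utilisation:    {ratio_effective:.2f}x")
```

Under these assumptions the three ratios line up with the "28% faster", "twice as fast" and "back to parity" figures in the discussion.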

Looking at the two Mandelbrot DB tests, it's notable that the second "+ 10 textures" is faster on GT200, 10.6%, while on 9800GTX it's slower, 83.7%. Similarly on RV770, the second test is slower, 87.2%. So, why does adding what may be 10 simple surface textures make the shader run faster on GT200?
I was talking about GTX 280 vs. GTX 260. The results just don't make sense.

For the 10 texture stuff, IMO it's not worth analyzing without seeing the shader code.

Sadly it won't execute, I get a "bad argument -unroll" error from cgc, after I downloaded cg.dll, cgGL.dll and glut32.dll.
I think I converted it to HLSL at one point, but that computer is down right now. I'll keep you posted.

Jawed
06-Aug-2008, 12:31
You're assuming full use of the second MUL, but you'll have more ADDs than MULs when iterating the Mandelbrot set. I would say that in pure math terms the 4870 is twice as fast as the GTX 280.
Worse, I forgot NVidia is MAD+MUL and was thinking of it as just MAD. In the assembly for ATI there's only one MUL instruction, with a pile of MADs and 2 ADDs.

Once you take VLIW utilization into account, though, we're back to parity. Like I said before, I forgot about this, so I was wrong about it explaining higher perf in the DB tests.
Yeah, that sounds right.

I was talking about GTX 280 vs. GTX 260. The results just don't make sense.
That's 3.4% for Factored BRDF.

For the 10 texture stuff, IMO it's not worth analyzing without seeing the shader code.
Maybe the answer lies here:


ATI Catalyst:
Catalyst A.I.: Standard
Mipmap Detail Level: High Quality
High Quality AF: On
Wait for vertical refresh: Always Off
Enable Adaptive Anti-Aliasing: On/Quality
Method: Multi-sampling
Temporal Anti-Aliasing: Off
Other settings: default

Nvidia GeForce:
Texture filtering – Quality: High quality
Texture filtering – Trilinear optimization: Off
Texture filtering – Anisotropic sample optimization: Off
Vertical sync: Force off
Antialiasing - Gamma correction: On
Antialiasing - Transparency: Multisampling
Other settings: default

It's infuriating that no-one else has access to these synthetics.

I think I converted it to HLSL at one point, but that computer is down right now. I'll keep you posted.
I wonder if it's worth me tangling with it in VS2008...

Jawed

ShaidarHaran
06-Aug-2008, 14:22
There goes Xbit again, enabling transparency adaptive AA on Radeons and disabling it on Geforces.

Does anyone know why they test like this?

Jawed
06-Aug-2008, 14:26
There goes Xbit again, enabling transparency adaptive AA on Radeons and disabling it on Geforces.
Transparency MSAA is turned on on NVidia. What I don't understand is if "Quality" on ATI means supersampling, or if it means something else.

Jawed

Kaotik
06-Aug-2008, 14:34
Transparency MSAA is turned on on NVidia. What I don't understand is if "Quality" on ATI means supersampling, or if it means something else.

Jawed

Quality in CCC would be supersampling; it has "Performance" and "Quality" options, while multisampling has a slider from smooth to sharp.

fellix
06-Aug-2008, 14:37
Depends on which driver release you are looking at.
In the Cat 8.8 beta only the Performance/Quality slider is available, with no MS/SS option to pick.

When I examined it in FSAA Viewer, Performance AdAA = half of the MSAA sample count, while Quality AdAA = 1:1 match of the active MSAA mode.
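The observation above can be written down as a tiny lookup. This is only a sketch of the behaviour reported for the Cat 8.8 beta, not a description of how the driver works; the function name and the mode strings are made up for illustration.

```python
def adaptive_aa_samples(msaa_samples: int, mode: str) -> int:
    """Adaptive AA sample count as reported in the post above.

    'quality'     -> matches the active MSAA mode 1:1
    'performance' -> half of the MSAA sample count
    """
    if mode == "quality":
        return msaa_samples
    if mode == "performance":
        return msaa_samples // 2
    raise ValueError(f"unknown Adaptive AA mode: {mode!r}")


for msaa in (4, 8):
    for mode in ("performance", "quality"):
        print(f"{msaa}x MSAA, {mode}: {adaptive_aa_samples(msaa, mode)} samples")
```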

ShaidarHaran
06-Aug-2008, 14:37
Transparency MSAA is turned on on NVidia. What I don't understand is if "Quality" on ATI means supersampling, or if it means something else.

Jawed

LOL, doh! Sorry folks, nothing to see here. Move along.

Kaotik
06-Aug-2008, 15:46
Depends on which driver release you are looking at.
In the Cat 8.8 beta only the Performance/Quality slider is available, with no MS/SS option to pick.

When I examined it in FSAA Viewer, Performance AdAA = half of the MSAA sample count, while Quality AdAA = 1:1 match of the active MSAA mode.

They used Catalyst 8.6, which has the option to pick between super- and multisampling, and in which the "Quality" setting is only offered under supersampling.

CarstenS
07-Aug-2008, 09:25
We have observed differences between XP and Vista. The latter lets you choose between Quality and Performance, while the former offers Multi- and Supersampling.

However, in our testing under Vista x64 with CoD4, both options produced identical visuals and fps, indicating that only one option worked. Judging from a comparison with Nvidia's TMSAA and TSSAA, it looked a lot like multisampling was used on the HD4800.

Kaotik
07-Aug-2008, 12:30
We have observed differences between XP and Vista. The latter lets you choose between Quality and Performance, while the former offers Multi- and Supersampling.

However, in our testing under Vista x64 with CoD4, both options produced identical visuals and fps, indicating that only one option worked. Judging from a comparison with Nvidia's TMSAA and TSSAA, it looked a lot like multisampling was used on the HD4800.

Hum? Vista x64 + HD3850 + Cat 8.6 gives options for at least Multi- and Supersampling, with Performance & Quality options for Super and a slider from Smooth to Sharp for Multisampling.

Jawed
07-Aug-2008, 12:38
They've now published their game benchmarks:

http://www.xbitlabs.com/articles/video/display/geforce-gtx200-games.html

I think the use of transparency MSAA is really throwing in a wild card, when comparing these results to other sites. Some might argue the texture quality settings are doing so too.

There are a few occasions when GTX260 is slower than 9800GTX: Crysis 1920, HL2:EP2, Lost Planet min framerates, Oblivion almost always.

Jawed

fellix
07-Aug-2008, 13:14
Weird Russians -- just bear one thing in mind about it. :D

CarstenS
07-Aug-2008, 13:35
Hum? Vista x64 + HD3850 + Cat 8.6 gives options for at least Multi- and Supersampling, with Performance & Quality options for Super and a slider from Smooth to Sharp for Multisampling.
With the official 8.7ish Catalysts everything's back to normal, yes. But there was a horrible mess with all the different 8.6-Cats including various Betas floating around.

With our 8.6 driver (and I honestly don't remember now which was which), we only had access to one of these sets of settings - Perf/Qual/Slider or MS/SS - depending on the OS.

AlexV
07-Aug-2008, 13:49
http://www.hardocp.com/article.html?art=MTUzMiwsLGhlbnRodXNpYXN0

The bit in yellow under the "Adaptive AA" header should clear things up.

Jawed
07-Aug-2008, 13:57
http://www.hardocp.com/article.html?art=MTUzMiwsLGhlbnRodXNpYXN0

The bit in yellow under the "Adaptive AA" header should clear things up.
So that means it's impossible to benchmark ATI and NVidia comparatively with any kind of transparent texture antialiasing.

Jawed

CarstenS
07-Aug-2008, 19:01
http://www.hardocp.com/article.html?art=MTUzMiwsLGhlbnRodXNpYXN0

The bit in yellow under the "Adaptive AA" header should clear things up.

Reminds me painfully of the discussion about tying texture filter reductions to game fixes (ironically called "AI"). According to AMD it would have been too confusing for the user to...blablabla.

Same thing here. Plus, the driver routine obviously does not decide in the best interest of the gamer: in Call of Duty 4, chain-link fences, for example, were (a month or so ago) left untouched compared to forced TSSAA - I'd have to check & verify this for current drivers though.


edit:
But thanks anyway for providing this clarifying link, MTDE!

ChrisRay
07-Aug-2008, 20:15
Sounds similar to Nvidia's improved transparency MS. But it requires app-specific support. You also have to use transparency MS for it to work.

Jawed
12-Aug-2008, 13:23
Looking at these tests, I'm starting to wonder if NVidia's texture cache system has reached the end of the road. 10 clusters <-> 8 L2s appears to be completely clogging up, e.g. the Factored BRDF test runs only 1% faster on GTX280 than on 9800GTX. Now I really don't know what that shader's doing (why can't we look at these shaders?) but 1% for a shader that at worst is TEX bound seems incredible.
Here's another one of these anomalies:

http://www.computerbase.de/artikel/hardware/grafikkarten/2008/test_ati_radeon_hd_4870_x2/6/#abschnitt_theoretische_benchmarks

Make sure to expand the D3D Rightmark 2.0 section of the page, right at the bottom. In that set of results you'll find vertex texture tests 1 and 2. In test 1, GTX260 is 95% of the performance of 8800GTX. In test 2 GTX260 is the same performance as 8800GTX.

GTX260 and 8800GTX have practically the same core clock: 576/575MHz.

In both tests GTX280 is 33% faster than GTX260, so the test is consistent on GT200. Yet this is not the case when comparing G92 GPUs, i.e. in test 1 9800GTX+ is 7% faster than 8800GTS-512, but in test 2 it's 13% faster.

I wonder if GT200's 32-wide batch is relevant here (instead of 16-wide in G92), having an effect on the way ALU and TEX work is scheduled from the batches in flight. If so, could that imply that pixel shader texturing in the factored BRDF test is somehow affected by poor scheduling of batches?

Jawed