
View Full Version : Futuremark: 3DMark06



N00b
20-Jan-2006, 09:59
The thing I don't understand with the new 3DMark06 is the ATI camp complaining that current nVidia cards don't get a score with AA on.
If I were to tease some nVidia user I would certainly say something like "Dude, your card isn't even good enough to get a score with AA on. It's missing an important feature! nVidia suxxxx!" ;-)

Hubert
20-Jan-2006, 10:03
Yes, but we are chivalrous; we don't like to win without a good fight. :)
Nvidia cards can't pick up the glove we throw, but what's the point? The fun is in fighting. Competing, to be politically correct. :)

N00b
20-Jan-2006, 10:42
Is there any site that has 3DMark06 feature test scores for the 1800XT and the 7800 GTX (512)? Or a comparison with hardware shadow mapping disabled?

Bouncing Zabaglione Bros.
20-Jan-2006, 11:49
And why is this? Because ATi is not favoured. Whenever ATi is not favoured we get the most long-winded threads, where a court is summarily set up and the "injustice" to ATi is gone over in such minute detail that only paranoia can be thought to be in the heads of the adjudicators. In swoop the cardinals in their red (how appropriate) gowns: "Everybody expects the B3D inquisition". Fear and surprise is our .....

It's not that at all. There are just so many weird discrepancies and choices, and they seem to favour Nvidia. Come on, a "forward looking test" with no SM3.0 branching, no parallax mapping, no AA/AF? Even places where Nvidia cards get no score rather than a bad score, where the exact opposite happens for ATI cards? Nvidia cards get an advantage from their specific non-DX features, but ATI cards don't?

The cards from ATI/Nvidia are simply not being treated with the same level of objectivity, and that is what is being queried. It's not tribalism, it's frustration at what should be a level playing field being so far tilted that 3DMark06 is pretty useless for comparing performance between the two main chip suppliers, even though it claims to be an even and honest test of capabilities. 3DMark06 is now just a marketing tool, rather than an objective testbench of a card's capabilities and performance.

Ragemare
20-Jan-2006, 12:24
Yes, they can. Do you think they will?

I think there's more chance of them not even RUNNING AA tests, since a score isn't generated for nVidia cards (thus, no interesting and pretty bar charts to show), than calculating what the score "would have been".

Which, if I were a cynical type, I would say is exactly what nVidia would prefer.

I don't know, maybe if they were harassed enough one or two might.

Subtlesnake
20-Jan-2006, 12:31
It's not that at all. There are just so many weird discrepancies and choices, and they seem to favour Nvidia. Come on, a "forward looking test" with no SM3.0 branching, no parallax mapping, no AA/AF? Even places where Nvidia cards get no score rather than a bad score, where the exact opposite happens for ATI cards? Nvidia cards get an advantage from their specific non-DX features, but ATI cards don't?

The cards from ATI/Nvidia are simply not being treated with the same level of objectivity, and that is what is being queried. It's not tribalism, it's frustration at what should be a level playing field being so far tilted that 3DMark06 is pretty useless for comparing performance between the two main chip suppliers, even though it claims to be an even and honest test of capabilities. 3DMark06 is now just a marketing tool, rather than an objective testbench of a card's capabilities and performance.
But dynamic branching has been confirmed, in this very thread.

Also, when dealing with mandatory features, the benchmark treats ATI and Nvidia equally. The 6200 doesn't support floating point blending, so the SM3.0 tests aren't run and the card receives a lower score, as with the ATI X800 class of hardware.

Finally, 3D Mark does take advantage of ATI "specific non-DX features", like fetch4.

Cowboy X
20-Jan-2006, 13:08
Hehe, definitely Joe.

In the meantime, let's look at this.

http://www.pcper.com/article.php?aid=199&type=expert&pid=6


Seems like it's not CPU limited here at all.

Looking at the individual tests, SM 3.0 ATi cards seem to do better respectively, possibly due to the increased branch performance.

But I thought that there was minimal branching in this benchmark.

NocturnDragon
20-Jan-2006, 13:19
But dynamic branching has been confirmed, in this very thread.
Yes, but it has not been confirmed HOW it's used, and how much impact it has,
and judging from the benchmarks it has no impact whatsoever.
If it was used how future games will use it (a year from now?) nvidia cards would crawl!

Also, when dealing with mandatory features, the benchmark treats ATI and Nvidia equally. The 6200 doesn't support floating point blending, so the SM3.0 tests aren't run and the card receives a lower score, as with the ATI X800 class of hardware.
Not really so! If that were the case, FM would have created a ping-pong shader to emulate the lack of floating point blending on the 6200, and it would have created a PS to emulate the lack of AA when FP blending is enabled for cards that don't support it.
That would have been treating hardware equally. If the hardware doesn't support a feature, implement it with PS! For all features on all hardware.
OR they could make ATI not run the PS3 tests because of no FP texture filtering (OK, that could never have been a valid decision, but it still makes my point!)

Read my previous post. Why penalize ATI for its decision not to support HW filtering, while not penalizing Nvidia for not supporting hardware AA or FP blending?

Finally, 3D Mark does take advantage of ATI "specific non-DX features", like fetch4.

Sure after the 05 only supported the Nvidia one!
Bear in mind that fetch4 is available on every single channel texture format, while in the test is only used in the PS2.0 tests... why not in the other ones? (maybe because nvidia PCF wouldn't work?)

And what about 3dc? what about Rendering to buffer array?
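
For readers unfamiliar with the "ping-pong" fallback mentioned in this post, here is a minimal CPU-side sketch in Python. It only illustrates the general idea (two buffers swap roles each pass, and the blend is computed by the "shader" instead of fixed-function hardware). It is not Futuremark's code, and every name in it is made up for the example.

```python
# Hypothetical sketch of ping-pong blending: when hardware cannot blend into
# a floating point render target, each pass samples the previous result as a
# texture and writes the blended value into a second buffer.

def additive_blend(dst, src):
    """The blend a pixel shader would compute by hand (additive, ONE/ONE)."""
    return dst + src

def render_with_ping_pong(layers, width):
    """Accumulate several HDR layers without fixed-function blending."""
    read_buf = [0.0] * width   # plays the role of the sampled texture
    write_buf = [0.0] * width  # plays the role of the current render target
    for layer in layers:
        for x in range(width):
            write_buf[x] = additive_blend(read_buf[x], layer[x])
        # Swap roles: what was just written is the input of the next pass.
        read_buf, write_buf = write_buf, read_buf
    return read_buf

# Two made-up HDR layers, three pixels wide.
layers = [[0.5, 2.0, 4.0], [1.5, 0.25, 8.0]]
result = render_with_ping_pong(layers, 3)
```

The price of such a fallback is an extra buffer and an extra read per blended layer, which is presumably why it would run slower than native FP blending.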

Blastman
20-Jan-2006, 13:23
I think a good question would be whether the dynamic branching is in there just for show. It’s likely the R520 would be a good portion faster than the G70 if it was really doing something.

HDR should be an option (like AA/AF) and not on by default -- all cards could then be tested with AA. AA is pretty well a given for good IQ on high end cards. HDR can hardly be seen as forward looking when one has to give up AA to get it (on most cards out there that currently support it).

I’m wondering why there is no SM2.0b support -- this really dismisses the X800 series cards compared to the NV6 series. I read on one site that shaders were to be limited to 512 instructions or less -- so why not have an equivalent SM2.0b? We might get some idea if SM3.0 and the dynamic branching helps performance or is there just for show. Supporting SM2.0b would also mean one doesn’t have to pull some number out of a rabbit’s hat for the SM2.0b X800 cards overall score -- ie. … multiply SM2.0 score by 0.75 as a pull-down to lower the overall score if SM3.0 isn’t supported. One wonders how that pull-down number of 0.75 was arrived at.

Score-wise it looks like in 3DMark06 an X1600XT will beat an X800 XL. Yet in one of Xbit's latest roundups the XL beat the X1600XT in every game benchmark -- 18 games total, and by quite a large margin (typically 50% faster) in most of the benches to boot. SM3.0 isn’t going to do much performance-wise for the X1600XT unless a lot of dynamic branching is used. It’s a similar story with the X1600XT vs a 6800GS: the XT looks like it matches the GS in 3DMark06 but is quite a bit slower in almost all the games out there.

At least in 3DMark05 there was some semblance of reality in the scores, and one could expect DX9 performance reasonably close to what the scores indicated on various cards -- but in 06 there is no semblance of reality in the respective performance of many cards. And since 3DMark06 is so far off the mark in so many cases, it doesn’t seem very useful as a benchmark.
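
The pull-down discussed in this post amounts to a one-line scaling. A minimal Python sketch, assuming only what the post states (the SM2.0 score of a card without SM3.0 support is multiplied by 0.75); the function name and the sample score are invented, and the real 3DMark06 formula may contain further terms.

```python
PULL_DOWN = 0.75  # factor quoted in the post; its derivation is not given there

def graphics_score_without_sm3(sm2_score):
    """Scale the SM2.0 score of a card that cannot run the SM3.0 tests."""
    return sm2_score * PULL_DOWN

# 2000.0 is a made-up SM2.0 score, used purely for illustration.
example = graphics_score_without_sm3(2000.0)
```

With a dedicated SM2.0b path, as the post suggests, the second half of the score could be measured instead of being inferred from a fixed factor.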

N00b
20-Jan-2006, 13:28
Yes, but it has not been confirmed HOW it's used, and how much impact it has,
and judging from the benchmarks it has no impact whatsoever.
If it was used how future games will use it (a year from now?) nvidia cards would crawl!
Yeah. Right. :roll:
I seriously doubt that a year from now there will be a single game where current nVidia cards will crawl and ATI ones will fly.

NocturnDragon
20-Jan-2006, 13:28
Score-wise it looks like in 3DMark06 an X1600XT will beat an X800 XL. Yet in one of Xbit's latest roundups the XL beat the X1600XT in every game benchmark -- 18 games total, and by quite a large margin (typically 50% faster) in most of the benches to boot. SM3.0 isn’t going to do much performance-wise for the X1600XT unless a lot of dynamic branching is used. It’s a similar story with the X1600XT vs a 6800GS: the XT looks like it matches the GS in 3DMark06 but is quite a bit slower in almost all the games out there.

That's probably due to the X1600's 3-to-1 pixel shader to texture unit ratio, which is not really fully used yet in current games.
We might have more information on that when the X1900 benchmarks are shown.

inefficient
20-Jan-2006, 13:32
It's not that at all. There are just so many weird discrepancies and choices, and they seem to favour Nvidia. Come on, a "forward looking test" with no SM3.0 branching, no parallax mapping, no AA/AF? Even places where Nvidia cards get no score rather than a bad score, where the exact opposite happens for ATI cards? Nvidia cards get an advantage from their specific non-DX features, but ATI cards don't?

The cards from ATI/Nvidia are simply not being treated with the same level of objectivity, and that is what is being queried. It's not tribalism, it's frustration at what should be a level playing field being so far tilted that 3DMark06 is pretty useless for comparing performance between the two main chip suppliers, even though it claims to be an even and honest test of capabilities. 3DMark06 is now just a marketing tool, rather than an objective testbench of a card's capabilities and performance.

You're getting too caught up in the fact that the GTX beats the X1800XT here. Just look at the frame rates of the tests! These tests ARE forward looking. They are NOT meant for this generation of video cards. The fact is that ALL current cards run these tests badly.

A GTX512 can't even get 20fps average on game3 at the standard res with no AA and no AF. And on the few cards that currently do support HDR+FSAA the frame rates are so low it makes little sense to run the HDR tests on these cards. The XT runs the test at under 15fps and the XL at under 10fps.

These tests were designed to target upcoming hardware, and both ATI and NVIDIA had input on what went into them.

And seriously - do you really think that Nvidia needed to pay off futuremark to sell more GTX cards? Those cards have no problem selling themselves.

NocturnDragon
20-Jan-2006, 13:36
Yeah. Right. :roll:
I seriously doubt that a year from now there will be a single game where current nVidia cards will crawl and ATI ones will fly.
I probably used the wrong word; I meant crawling compared to ATI ones, not in an absolute sense.
Anyway, I bet that the various 6800s won't be that fast in high-end games sold a year or two from now!
But we are still talking about Futuremark.
And you have to agree with me that dynamic branching will be used a lot in the future. And I'm pretty sure the next Nvidia card (not the upcoming refresh) will have no problem with that!

If the test was using DB in a heavy way (which I'm pretty sure is how it will be used in the future), Nvidia cards would really be slower than the ATI ones.

Here is a link to remind you of the speed difference.
http://www.xbitlabs.com/images/video/radeon-x1000/x1800/Xbitmark_x18.gif

AlexV
20-Jan-2006, 13:56
By the time branch-intensive shaders become a de facto standard, certainly one, but possibly two other 3DMarks will have been released. Even though I don't favour either side, I also don't encourage being selectively blind.

3DMark was never meant to be a showcase for far-future tech, more for things coming up rather soonish, within a year's timeframe. This "Dynamic Branching will rock da world" yadda is very reminiscent of the "DX9.1 for FX goodness" and "Screw SM3.0, 3Dc iz da shiznit". Not in the X1800XT's lifetime, or the GTX's. It's a very important feature, true, but it's not something you'd want to rely heavily on if your game/engine was coming out in the following 18 months, IMO. And as for flying/crawling... mehh, I doubt that will happen... the FX sucked ass badly, and it only crawled near the end of its lifecycle. Devs aren't IHV demo-makers, they target a large audience.

Bouncing Zabaglione Bros.
20-Jan-2006, 14:01
You're getting too caught up in the fact that the GTX beats the X1800XT here. Just look at the frame rates of the tests! These tests ARE forward looking. They are NOT meant for this generation of video cards. The fact is that ALL current cards run these tests badly.


Nope, my points will still be true in a week when X1900XT arrives, and in a few months when G71 arrives. Futuremark are not treating companies equally, and this stands out because some of the decisions they made on what to support and what not to support look pretty bizarre in light of what we'll be seeing in the next year on our PCs.

Cowboy X
20-Jan-2006, 14:12
By the time branch-intensive shaders become a de facto standard, certainly one, but possibly two other 3DMarks will have been released. Even though I don't favour either side, I also don't encourage being selectively blind.

3DMark was never meant to be a showcase for far-future tech, more for things coming up rather soonish, within a year's timeframe. This "Dynamic Branching will rock da world" yadda is very reminiscent of the "DX9.1 for FX goodness" and "Screw SM3.0, 3Dc iz da shiznit". Not in the X1800XT's lifetime, or the GTX's. It's a very important feature, true, but it's not something you'd want to rely heavily on if your game/engine was coming out in the following 18 months, IMO. And as for flying/crawling... mehh, I doubt that will happen... the FX sucked ass badly, and it only crawled near the end of its lifecycle. Devs aren't IHV demo-makers, they target a large audience.

Not meaning to go off track, but the FX sucked early on in many titles and used all manner of low-quality hacks to appear competitive. And then by the time of the next-gen cards (NV40) everyone happily abandoned the FX and relegated it to DX8 at best.

N00b
20-Jan-2006, 14:34
I probably used the wrong word; I meant crawling compared to ATI ones, not in an absolute sense.
Anyway, I bet that the various 6800s won't be that fast in high-end games sold a year or two from now!
But we are still talking about Futuremark.
And you have to agree with me that dynamic branching will be used a lot in the future. And I'm pretty sure the next Nvidia card (not the upcoming refresh) will have no problem with that!

If the test was using DB in a heavy way (which I'm pretty sure is how it will be used in the future), Nvidia cards would really be slower than the ATI ones.

Here is a link to remind you of the speed difference.
http://www.xbitlabs.com/images/video/radeon-x1000/x1800/Xbitmark_x18.gif
I think I understood perfectly well what you meant. And still I disagree. Thanks for posting the link to the Xbitlabs charts, it will help me to make my point clear.

If you look at the chart, you will notice that nVidia (7800 GTX) is ahead of ATI (1800XT) in 11 of the 17 shader tests. If you look at the branching tests, you will notice that these tests are not very realistic, meaning you will probably never ever see a shader like that in a game. The use of branching in these tests is artificially high, so branching will have a heavy impact on the score. In a shader used in a real game, even a year from now, you will not have as much branching. Even in two years not every shader in every game will use heavy dynamic branching, because some/most shaders will not require it. So while the use of dynamic branching in future games will give ATI a boost, it will be a modest one, and current ATI X1x00 cards will not suddenly be twice as fast as current nVidia cards.

And, last but not least, DX10 will be here soon. So there will surely be a 3DMark07, and it will probably arrive in early 2007. So the scope of 3DMark06 is to give a forecast of games that will come out in the next year. I'm absolutely convinced that heavy branching will not be as widely used in upcoming games' shaders as you suggest.

Hubert
20-Jan-2006, 14:40
Yeah. Right. :roll:
I seriously doubt that a year from now there will be a single game where current nVidia cards will crawl and ATI ones will fly.

I agree here, you can't dismiss dynamic branching's share in 3DMark 2006 just because Nvidia cards don't suck running 3DMark 2006. :)

But if there won't be games that use DB, that won't be because developers avoid it without reason. There must be a good reason not to use such a technique, with so much potential performance- and quality-wise, and that reason might be the difficulty of implementation or very small gains for the effort. And of course a good reason might be that a large part of the cards on the market can't use it well. So your statement that there won't be games which crawl on Nvidia products might be true not because they are so good, but simply because no one will develop games which would crawl on 60% (dunno, just a number) of the cards out there.

Subtlesnake
20-Jan-2006, 14:48
Yes, but it has not been confirmed on HOW it's used, and how much impact it has
That's correct, but you can't use that logic to claim there's no impact. In synthetic pixel shader tests the X1800 is significantly slower than the 7800 GTX - now so far that difference hasn't translated into real world gaming performance, but that doesn't mean the same is true for 3D Mark 2006. Maybe the X1800 is being significantly helped by the dynamic branching.

Not really so! If that were the case, FM would have created a ping-pong shader to emulate the lack of floating point blending on the 6200
Well, floating point blending is a requirement. I can understand this seems somewhat arbitrary, but if floating point blending is a real world requirement too then their decision is sensible.

and it would have created a PS to emulate the lack of AA when FP blending is enabled for cards that don't support it.
I wasn't aware you could fully simulate AA using shaders.

Read my previous post. Why penalize ATI for its decision not to support HW filtering, while not penalizing Nvidia for not supporting hardware AA or FP blending?
Nvidia is being penalised, because Futuremark is saying "your card isn't compatible with our SM3.0 tests". With the hardware FP filtering situation, on the other hand, they're giving ATI a very efficient fallback.

Now it's presumed that developers will use the fallback, so the test will be an accurate reflection of the performance difference between ATI and Nvidia hardware.

Sure after the 05 only supported the Nvidia one!
Fetch4 is only present in the X1000 series.

Bear in mind that fetch4 is available on every single channel texture format, while in the test is only used in the PS2.0 tests... why not in the other ones? (maybe because nvidia PCF wouldn't work?)

And what about 3dc? what about Rendering to buffer array?
According to Nick, neither would work:

"Due to the sampling method in the HDR/SM3.0 graphics tests, we weren't able to use neither FETCH4 or PCF in those tests. It simply wouldn't have worked due to the rotated grid we use."

on 3dc:

"3Dc would have increased the package by 2x"

jb
20-Jan-2006, 14:57
"3Dc would have increased the package by 2x"

Not only do I doubt it would increase the package by x2 (sure, it will be bigger), but who cares how big the package is anyway, as a test should be constrained by the features, not the size! This was a very, very weak excuse.

inefficient
20-Jan-2006, 15:02
Not only do I doubt it would increase the package by x2 (sure, it will be bigger), but who cares how big the package is anyway, as a test should be constrained by the features, not the size! This was a very, very weak excuse.

Seriously? You really think adding another 600MB justifies proving that one card is faster than another at 3dc? :roll:

This is just one feature! One feature that nearly no-one is even using.

Dave Baumann
20-Jan-2006, 15:04
Errr, this is just compression of normals - how many normal maps are there? It would add the size of the compressed normal map(s), or the normals could be compressed on install.

jb
20-Jan-2006, 15:11
Seriously? You really think adding another 600MB justifies proving that one card is faster than another at 3dc? :roll:

This is just one feature! One feature that nearly no-one is even using.

It was more of a counterpoint. And I really doubt it's 600MB more in file size unless you have data to back that up :) Not including 3Dc because it's not used that much is a much better reason than that it makes the size too big....

Neeyik
20-Jan-2006, 15:59
Errr, this is just compression of normals - how many normal maps are there? It would add the size of the compressed normal map(s), or the normals could be compressed on install.
The latter would require uncompressed normal maps to be in the package though; either solution is going to result in a bigger download package. As to how many normal maps there are in 06, it's anyone's guess but I do know that GT2 in 3DMark03 required 140MB of uncompressed normal maps. One could make some speculative guesses as to how much more one of the tests in 06 requires.

Dave Baumann
20-Jan-2006, 16:22
The latter would require uncompressed normal maps to be in the package though; either solution is going to result in a bigger download package.
Eh? Uncompressed normals are presumably already there, so compressing on install would result in the same download size now (save for a small conversion routine).

Neeyik
20-Jan-2006, 16:39
I was presuming that the normal maps in 06 are already compressed (DXT5) in the download package or is this not the case?

mczak
20-Jan-2006, 17:27
I was presuming that the normal maps in 06 are already compressed (DXT5) in the download package or is this not the case?
3dmark06 thus uses green/alpha channel of DXT5 textures for normal maps? In this case 3Dc wouldn't improve performance probably, as it has same bandwidth requirements / memory footprint (though IIRC such dxt5 normal maps need one instruction more in the pixel shader than 3Dc would, so a small performance improvement might be there). Though the quality of the normal maps would be better...
It still would not necessarily mean you'd need two times the space, for the download package normal maps could be stored in 3Dc only with the same size as the current dxt5 textures (*). Then you can create the DXT5 version of it upon installation, half the values you can put in the dxt5 alpha channel without any recompression, the other half needs to be decompressed/recompressed to the green channel. There should be no loss in quality there, or maybe a very very slight loss compared to when you'd generated it from the uncompressed maps. Obviously, the other way around (3Dc created from DXT5) would be pointless as far as quality is concerned.
(*) not quite true for zipped packages. Since the red/blue components are unused and thus presumably always 0, those DXT5 textures should probably compress somewhat better. Though this only affects 10 bits out of 128, so the potential gain isn't that big.
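
The storage argument above can be checked with a little arithmetic. This Python sketch relies on two facts about the formats (DXT5 and 3Dc both pack a 4x4 texel block into 128 bits, and a two-channel normal map leaves the third component to be rebuilt in the shader). The function names and the 1024x1024 example size are assumptions for illustration; nothing here is taken from 3DMark06 itself.

```python
import math

BLOCK_BYTES = 16  # DXT5 and 3Dc both use 128 bits per 4x4 texel block
BLOCK_DIM = 4

def compressed_size(width, height):
    """Bytes for one mip level in either DXT5 or 3Dc (same block size)."""
    blocks_x = (width + BLOCK_DIM - 1) // BLOCK_DIM
    blocks_y = (height + BLOCK_DIM - 1) // BLOCK_DIM
    return blocks_x * blocks_y * BLOCK_BYTES

def uncompressed_size(width, height, bytes_per_texel=4):
    """Bytes for one mip level stored as plain 8-bit RGBA."""
    return width * height * bytes_per_texel

def reconstruct_normal(x, y):
    """Rebuild z from the two stored components (x, y in [-1, 1]),
    assuming a unit-length tangent-space normal with z >= 0."""
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)

# Hypothetical 1024x1024 normal map.
dxt5_or_3dc = compressed_size(1024, 1024)
raw_rgba = uncompressed_size(1024, 1024)
flat = reconstruct_normal(0.0, 0.0)
```

Under these assumptions a 3Dc copy of a normal map is exactly as large as its DXT5 green/alpha copy, so shipping only one of the two and converting at install time would not double the download.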

DemoCoder
20-Jan-2006, 18:14
I find the griping over the lack of parallax mapping to be really pathetic. Hey, they aren't testing physics on the GPU either (where ATI will presumably win big if you read the GPGPU papers), it's all a conspiracy! I don't see people crying that they should have spherical harmonics/PRT or other "future/uncommon today" effects, because those won't show an ATI advantage. Likewise, 3Dc isn't going to improve ATI's performance in 3DMark06, only its IQ, since it already uses DXT5.

This is the same situation as the vertex heavy shadow volume arguments of past benchmarks. Those favored ATI and nVidia fans were griping the tests were unrealistic. Now ATI f*nb*ys are upset over the results. If '06 showed nVidia soundly losing, none of these red herrings would even be swimming around.

mrcorbo
20-Jan-2006, 18:57
Yeah, I find it pretty funny how similar the tone here is now to that when NV30 was showing so poorly in '03. Right down to the unshakable belief that with future games their favored architecture is going to show its "real" performance.

I'm not comparing the technology of NV30 to R520, because R520 is a much stronger design than NV30 was. But it sure feels like I've gone into the way-back machine back to the 3DMark 2003 days.

Now WRT the complaints about Nvidia's parts getting no score when AA is enabled: what's wrong with just comparing the SM2.0 scores and including the SM3.0 results with either a 0 for the Nvidia cards or just not even including them in the results because they don't support it? I mean, the overall 3DMark score in '06 is a pretty poor way to compare different cards anyway, IMO. With the CPU score included it has become more of a platform result, and any difference between 2 cards while using the same CPU is actually going to be lessened in the overall 3DMark score because you are getting the exact same score for the CPU.

Basically you are complaining about the inability to do something that it is inadvisable to do in the first place.
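
The dilution effect described in this post can be shown with a toy calculation. The Python sketch below assumes the overall score is some weighted harmonic mean of a graphics score and a CPU score; the weights and all sample scores are invented for illustration and are not Futuremark's published formula.

```python
def overall_score(graphics, cpu, w_graphics=0.85, w_cpu=0.15):
    """Weighted harmonic mean of two sub-scores (hypothetical weights)."""
    return (w_graphics + w_cpu) / (w_graphics / graphics + w_cpu / cpu)

cpu = 1000.0                       # same made-up CPU score for both systems
card_a, card_b = 2000.0, 3000.0    # made-up graphics scores

graphics_gap = card_b / card_a     # 1.5, i.e. card B is 50% faster
overall_gap = overall_score(card_b, cpu) / overall_score(card_a, cpu)
```

With these made-up numbers the gap between the two overall scores comes out around 1.33 instead of 1.5, because both systems carry the identical CPU term.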

dizietsma
20-Jan-2006, 19:12
What bothers most people is that Nvidia's slight lead without AA is well known, but things are just the opposite with AA enabled, based on real-life tests (games).
But 3DMark2006 simply can't compare competing IHVs' cards with AA enabled. This totally nullifies Ati's effort put into optimising bandwidth usage. So, whatever the reasons are, this synthetic benchmark simply can't test competing products. Because few people will bother doing separate game tests (mostly reviewers), most people will just run it, get their score and an idea about their system's capabilities.

"This totally nullifies Ati's effort put into optimising bandwidth usage"

and

"Because few people will bother doing separate game tests (mostly reviewers), most people will just run it, get their score and an idea about their system's capabilities."

For your second point

I can assure you that most people, presumably gamers, when assessing their system's capabilities will run the standard test, see 25fps, and then decide not to apply AA/AF on top of that just to decrease their fps further. This is a theoretical test, not a practical one.

On your first point

No, because the SM3 tests where AA cannot be applied for nvidia are heavily gpu biased and not bandwidth limited at all I think. The SM2 tests might be but then the capability of each card can be measured in turn.

Chalnoth
20-Jan-2006, 19:14
Well, it makes no sense to report a complete score with AA on an NV4x, because part will be run with AA, and part either won't be run with AA, or not at all.

It may make some sense, however, to compare the SM2 with AA scores between the NV4x and ATI hardware. I believe this score is reported when 3DMark06 is run with AA enabled.

Joe DeFuria
20-Jan-2006, 19:16
Well, it makes no sense to report a complete score with AA on an NV4x, because part will be run with AA, and part either won't be run with AA, or not at all.

Using that logic, it makes no sense to report a complete score for certain SM 3.0 NV parts that do not support floating point blending because part of the tests won't run.

And yet, a complete score is in fact given.

This is the problem that I have...it's not consistent.

DemoCoder
20-Jan-2006, 19:21
Well, if they changed that (report no score for SM3.0 parts without blending) would that satisfy you? If they made it consistent, by still not reporting a complete score AA on an NV4x, would you be happy?

dizietsma
20-Jan-2006, 19:23
It's not that at all. There are just so many weird discrepancies and choices, and they seem to favour Nvidia. Come on, a "forward looking test" with no SM3.0 branching, no parallax mapping, no AA/AF? Even places where Nvidia cards get no score rather than a bad score, where the exact opposite happens for ATI cards? Nvidia cards get an advantage from their specific non-DX features, but ATI cards don't?

The cards from ATI/Nvidia are simply not being treated with the same level of objectivity, and that is what is being queried. It's not tribalism, it's frustration at what should be a level playing field being so far tilted that 3DMark06 is pretty useless for comparing performance between the two main chip suppliers, even though it claims to be an even and honest test of capabilities. 3DMark06 is now just a marketing tool, rather than an objective testbench of a card's capabilities and performance.

Then why, when 3DMark03 did a mainly single-textured game for GT1 (when most of the current games did multitexturing and future games did shading), did nobody on this forum create a big stink and a 30-page thread to defend Nvidia, when Nvidia got so upset they left the Futuremark program?

And why, also, has a writer from another web site ever had to defend a negative review of Nvidia? Never is my answer! It constantly happens when it is ATi who is having the negative review or loses in some benchmark.

Although Dave's reviews are so neutral it is a marvel (even though I guess I know he has a leaning too), people on the forum just cannot do likewise.

Hence why I wear my green hat, just to try and balance things up.

It's a lonely cause, but being lonely means I do not have to shower :D

Chalnoth
20-Jan-2006, 19:33
Using that logic, it makes no sense to report a complete score for certain SM 3.0 NV parts that do not support floating point blending because part of the tests won't run.
Well, there's a lot of cards that won't run the SM3 tests. I think that Futuremark's logic was simply that it made sense to sacrifice the comparability of the benchmark a bit in order to get it to run on more hardware.

There wasn't much of any reason to make such a compromise on the AA situation, as it's a non-default setting.

Lux_
20-Jan-2006, 19:48
I think that Futuremark's logic was simply that it made sense to sacrifice the comparability of the benchmark a bit in order to get it to run on more hardware.
The benchmarks should maximise comparability, don't you think? To run on more hardware - that's what games are about :)

Joe DeFuria
20-Jan-2006, 19:50
Well, there's a lot of cards that won't run the SM3 tests.

Right. So treat them all the same.

There wasn't much of any reason to make such a compromise on the AA situation, as it's a non-default setting.

I submit that one "small" reason to make the same compromise with AA is that practically everyone who cares about this benchmark runs these cards with AA enabled at some level...

Look at it this way: FM had a decision to make on handling the AA scores one way or the other... if it's kind of a toss-up as to which way they should do it... why choose the way that is NOT CONSISTENT with the non-AA approach?

Chalnoth
20-Jan-2006, 19:54
I submit that one "small" reason to make the same compromise with AA is that practically everyone who cares about this benchmark runs these cards with AA enabled at some level...
Right, so either the NV4x will produce a score that is artificially high or low, depending upon how the comparison is done.

Much better, if you ask me, to just compare the SM2 AA scores and be done with it. The only thing not reporting a full score does is it doesn't allow you to search for projects on the ORB. It doesn't prevent benchmark sites from reporting scores.

trinibwoy
20-Jan-2006, 19:55
I submit that one "small" reason to make the same compromise with AA is that practically everyone who cares about this benchmark runs these cards with AA enabled at some level...

Whoa there pardner - I think you're quite wrong. 3dmark comparisons in the vast, vast majority of cases are at default settings.

Chalnoth
20-Jan-2006, 20:00
Can you even enable AA without paying for 3DMark? I can't test it until this afternoon, but given that you can't change any other settings...

Joe DeFuria
20-Jan-2006, 20:02
Whoa there pardner - I think you're quite wrong. 3dmark comparisons in the vast, vast majority of cases are at default settings.

Whoa there....read what I wrote.

I said that people use the CARDS with AA enabled (you know, when they use their cards to play games). Not that they run the benchmark with AA on the majority of the time.

Of course, whatever default FM decides on, that's going to be the "most run" for that benchmark.

Joe DeFuria
20-Jan-2006, 20:05
Much better, if you ask me, to just compare the SM2 AA scores and be done with it.

Look, I don't really have a problem with it one way or the other....there are pros and cons to each approach. Either just manually compare the SM2 AA scores (and don't produce an overall score), or produce an overall score using their formulas to get a "pseudo comparison." Just be consistent about it! (Have I not used the word "consistent" enough? ;) )

trinibwoy
20-Jan-2006, 20:05
Whoa there....read what I wrote.

Ah, now I get your meaning. But your wording above isn't exactly clear in that respect. And with regard to the comparison to games, 3dmark06 does exactly what the game does with Nvidia cards and HDR+AA - it doesn't run at all !! :wink:

Joe DeFuria
20-Jan-2006, 20:07
Well, if they changed that (report no score for SM3.0 parts without blending) would that satisfy you? If they made it consistent, by still not reporting a complete score with AA on an NV4x, would you be happy?

Yes, I would. Am I not clear on that?

Xmas
20-Jan-2006, 21:38
It's not that at all. There are just so many weird discrepancies and choices, and they seem to favour Nvidia. Come on, a "forward looking test" with no SM3.0 branching, no parallex mapping, no AA/AF? Even places where Nvidia cards get no score rather than a bad score, where the exact opposite happens for ATI cards? Nvidia cards get advantage from their specific non-DX features, but ATI cards don't?
Some valid points, but how does the absence of parallax mapping favour NVidia? Sometimes I get the impression that some people look at the branching advantage R520 has over G70 and from there extrapolate an advantage in arithmetic- and texture-heavy shaders that just isn't there. Then they expect R520 to perform better than G70 in "PS3.0" and if that expectation isn't met the benchmark must be crap. While of course that is a possibility (and I won't comment on 3DMark06 because I haven't seen it yet), there's also the point that G70 does indeed do several things faster than R520.

Doesn't ATI get a better score from their specific non-DX feature (fetch4)?

Yes, but it has not been confirmed HOW it's used or how much impact it has,
and judging from the benchmarks it has no impact whatsoever.
If it were used the way future games will use it (a year from now?) nvidia cards would crawl!
You're jumping to a conclusion and then expecting the tests to support it. Not exactly scientific method.

Bear in mind that fetch4 is available on every single channel texture format, while in the test it is only used in the PS2.0 tests... why not in the other ones? (maybe because nvidia PCF wouldn't work?)
Because it doesn't do any good for sparsely sampled filter kernels, especially as you cannot efficiently index vector components.


For those who are interested in how "CPU-limited" 3DMark06 is: use a null renderer (e.g. DXTweaker, 3D-Analyze).

Pete
20-Jan-2006, 21:46
I'd agree with this Pete except that gamers do have the option of putting up the screen resolution instead ( I am assuming they have a good enough monitor ). Indeed, Futuremark themselves have put the default screen resolution up and yet again left out AA in the standard test.
And I'm assuming that all current cards can just as easily, if not moreso, use AA rather than bump the res. Seems to me FM settled on 12x10 and not 12x9 or even 720p b/c the first matches with the most common LCD res. LCD users would likely prefer AA to higher res, just like CRT users might prefer 2xAA rather than a notch higher res with no AA.

the issue here is that people think the scoring is not fair for non-standard tests.
I don't know, I think HDR is becoming more standard, so why not the option to use AA with it? And if one can't, why not let that be reflected in the score--a score, rather than N/A? I'm going to completely read FM's whitepaper and reviewer's guide before I mouth off anymore, tho, to be fair to Nick & Co. (better late than never).

To me this is a bit strange because for the last few months/years this forum has tended to pour scorn on Futuremark's bench, its scoring and the way the IHVs use the scores to sell cards, and to say that anybody who buys a card based on this is tending towards being a bit daft. But now it seems this is of the utmost importance.

| I put on my green tinged pro futuremark hat |

And why is this ? because ATi is not favoured.
Well, that's one way to look at it, and quite a few ppl consider B3D ATI's last bastion of hope/hype. But you could argue that 3DM03 and 05 were partially tilted toward ATI. NV cheated on the first with the FX vs. ATI's 9-series, but they didn't need to do so (detectably) with the 6-series, whose shader power outgunned the X-series. And tho 05 had the DST and PCF brouhaha, its vertex setup limitation seemed to help ATI in relation to its weaker pixel shaders (see X1600 up with the 256-bit big boys, but falling behind on most games).

Remember that poor bloke from Anandtech that came over here and went grey haired before he had to leave saying he had better things to do
Well, to be fair, I haven't seen Derek in Ars' or even AT's forums in quite some time. :) Unfortunately, ppl get vocal everywhere, and ultimately his time is better spent reviewing than arguing (see Brent). That's not to say he can't just read the forums, disregard most posts, and maybe pick up a tip or two.

Is there any site that has 3DMark06 feature tests scores for the 1800XT and the 7800 GTX (512)? Or a comparison with hardware shadow mapping disabled?
Damien took care of the former (http://www.behardware.com/news/7951/futuremark-releases-3dmark-06.html) for you. Check out Hanners' EB article for the latter. (I referenced his #s a page or three back: 25 and 17% hit on a GF6 and GF7, respectively).

no AA/AF?
Wait, 06 goes back to no AF by default? Didn't 05 and maybe even 03 use 4xAF? Has FM given a reason for this (e.g., too many texture accesses otherwise, most games start with 1xAF, etc.)?

Chalnoth
20-Jan-2006, 21:52
Doesn't ATI get a better score from their specific non-DX feature (fetch4)?
From what I'm gathering from this thread, Fetch4 is not currently used on most ATI hardware (notably the R520) because most ATI hardware doesn't support Fetch4 at the precision that Futuremark is asking (24 bit).

Also bear in mind that none of the SM3 benchmarks use either PCF or Fetch4. And in the SM2 benchmarks where some ATI hardware can use Fetch4, nVidia hardware is always using PCF (since nVidia supports PCF at the precision 3DMark is asking).

Dave Baumann
20-Jan-2006, 21:57
DST24 was implemented at the same time as the boards that implemented Fetch4 - i.e. if the ATI hardware supports Fetch4, that hardware will also support 24 bit depth texture formats.

ANova
20-Jan-2006, 22:04
From what I'm gathering from this thread, Fetch4 is not currently used on most ATI hardware (notably the R520) because most ATI hardware doesn't support Fetch4 at the precision that Futuremark is asking (24 bit).

Also bear in mind that none of the SM3 benchmarks use either PCF or Fetch4. And in the SM2 benchmarks where some ATI hardware can use Fetch4, nVidia hardware is always using PCF (since nVidia supports PCF at the precision 3DMark is asking).

Indeed, and this is where the problems lie. It's simply not an accurate benchmark because the cards are running differently. Nvidia is using the PCF optimization in 24 bit while ATI is running without any optimizations in 32 bit. Now you have to give Futuremark some credit, the X1800 simply does not support D24X8 nor fetch4 for some reason (I guess timing), so there's little Futuremark could have done, but I would still be interested in the results of both cards running 16 bit DST without any optimizations. Either that or both cards running with R32F and all possible optimizations.

Neeyik
20-Jan-2006, 22:09
For those who are interested in how "CPU-limited" 3DMark06 is: use a null renderer (e.g. DXTweaker, 3D-Analyze).
I've only tried a P4 3GHz system with a 6600 GT so far, but the SM2.0 tests seem remarkably CPU-limited. The HDR tests are a little better, with the first one being much less CPU bound than the second. Odd...

rwolf
20-Jan-2006, 22:10
Yeah, I find it pretty funny how similar the tone here is now to that when NV30 was showing so poorly in '03. Right down to the unshaking belief that with future games their favored architecture is going to show its "real" performance.

I'm not comparing the technology of NV30 to R520, because R520 is a much stronger design than NV30 was. But it sure feels like I've gone into the way-back machine back to the 3DMark 2003 days.

Now WRT the complaints about Nvidia's parts getting no score when AA is enabled: what's wrong with just comparing the SM2.0 scores and including the SM3.0 results with either a 0 for the Nvidia cards or just not even including them in the results because they don't support it? I mean, the overall 3DMark score in '06 is a pretty poor way to compare different cards anyway IMO. With the CPU score included it has become more of a platform result and any difference between 2 cards while using the same CPU is going to actually be lessened in the overall 3DMark score because you are getting the exact same score for the CPU.

Basically you are complaining about the inability to do something that it is unadvisable to do in the first place.

Only back then NV30 performed in games the same as it did in 3DMark. Which is not the case in this situation.
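
To put a rough number on the quoted point that a shared CPU score shrinks the gap between two cards in the overall result, here is a minimal Python sketch. The weighted-harmonic-mean form, the weights 1.7 and 0.3, and the sub-scores are illustrative assumptions, not taken from Futuremark's whitepaper; `overall_score` and `relative_gap` are hypothetical helpers.

```python
def overall_score(graphics, cpu, w_gfx=1.7, w_cpu=0.3):
    """Weighted harmonic mean of a graphics and a CPU sub-score.

    The weights are illustrative assumptions, not Futuremark's published formula.
    """
    return (w_gfx + w_cpu) / (w_gfx / graphics + w_cpu / cpu)


def relative_gap(a, b):
    """Fractional lead of score a over score b."""
    return (a - b) / b


# Two hypothetical cards tested on the same CPU (made-up sub-scores).
cpu = 1000.0
card_a_gfx, card_b_gfx = 2400.0, 2000.0

gfx_gap = relative_gap(card_a_gfx, card_b_gfx)
total_gap = relative_gap(overall_score(card_a_gfx, cpu),
                         overall_score(card_b_gfx, cpu))

print(f"graphics gap: {gfx_gap:.1%}, overall gap: {total_gap:.1%}")
```

Under these assumptions the overall gap comes out smaller than the graphics-only gap, which is the effect described in the quote; the exact size depends entirely on the weights chosen.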

Neeyik
20-Jan-2006, 23:12
For anybody who is interested, I've collated shader dumps from all of the tests (bar the batch tests) from 3DMark06:

http://www.neeyik.info/fmark/06shaders.rar

Each folder in the rar file contains the vertex and pixel shaders from the respective tests, as caught by 3DAnalyze. The SM2.0 and HDR folders contain quite a lot of shaders because in the case of the SM2.0 tests, I ran them twice: default and then without HW DST. The same applies for the HDR tests but this time default and then with software FP filtering.

A quick glance at some of the longer pixel shaders in the HDR tests shows that a couple of them are using if not equal to...else with a bucket load of instructions that can be skipped; the shader particle test is also using flow control in its vertex shader that performs the vertex texturing. Oh and the Perlin noise test is also one hell of a PS!

Rys
20-Jan-2006, 23:42
Is it just me or is the Perlin noise one just a (very) long < 512 instruction shader that'd compile as a pixelshader 2.0 test?

Unknown Soldier
20-Jan-2006, 23:45
Well, if they changed that (report no score for SM3.0 parts without blending) would that satisfy you? If they made it consistent, by still not reporting a complete score with AA on an NV4x, would you be happy?

Actually, Joe has a point.

Think about it. Every review site on the planet when reviewing graphics cards, will benchmark the cards using no AA+AF as well as adding AA+AF results. Check any site for the last 4 years and you'll see it's the standard.

FutureMark have been in this industry for quite a while and for them to still not include AA+AF results as a final score is worrisome. Of course, you can do so with the advanced and professional editions, but the basic edition can't do so.

Of course since the program is used to analyse a range of cards, this is most probably not convenient atm.

Back to the matter at hand: Futuremark could maybe have done more, but they don't think so, and that's their prerogative as developer.

mrcorbo
21-Jan-2006, 00:01
Only back then NV30 performed in games the same as it did in 3DMark. Which is not the case in this situation.

Actually, IIRC, at the time 3DMark 2003 was released, the 5800 Ultra was even or performed better than the 9700 pro in most games w/o AA/AF. I think NVidia had convinced developers as well as consumers to wait to get serious about DX9 until NV30 was released because it was going to be such a great product. So, at the time it came out, 3DMark '03 was really the only indication of how bad a DX9 implementation NV30 really was. It wasn't until later on that the predictions made by '03 were validated by actual games.

Chalnoth
21-Jan-2006, 00:17
Actually, Joe has a point.

Think about it. Every review site on the planet when reviewing graphics cards, will benchmark the cards using no AA+AF as well as adding AA+AF results. Check any site for the last 4 years and you'll see it's the standard.

FutureMark have been in this industry for quite a while and for them to still not include AA+AF results as a final score is worrisome. Of course, you can do so with the advanced and professional editions, but the basic edition can't do so.
Well, again, it'd be ridiculous to do comparisons in this way with the full score. You'd want to break down the score and only compare the SM2 FSAA results between the two IHV's. But that's what you can do now.

Unknown Soldier
21-Jan-2006, 00:45
Hence the

Of course since the program is used to analyse a range of cards, this is most probably not convenient atm.

;)

US

mrcorbo
21-Jan-2006, 03:40
I would hope that a competent reviewer would choose effective and meaningful over convenient.

Cowboy X
21-Jan-2006, 03:54
Actually, IIRC, at the time 3DMark 2003 was released, the 5800 Ultra was even or performed better than the 9700 pro in most games w/o AA/AF. I think NVidia had convinced developers as well as consumers to wait to get serious about DX9 until NV30 was released because it was going to be such a great product. So, at the time it came out, 3DMark '03 was really the only indication of how bad a DX9 implementation NV30 really was. It wasn't until later on that the predictions made by '03 were validated by actual games.

I cannot be the only one who remembers the large scale cheating done in titles that weren't even in the dx 9 weak point of the NV30 .

Demirug
21-Jan-2006, 07:00
Is it just me or is the Perlin noise one just a (very) long < 512 instruction shader that'd compile as a pixelshader 2.0 test?

How do you come to this conclusion? The Perlin noise shader is a 3.0 shader.

Chalnoth
21-Jan-2006, 07:04
I cannot be the only one who remembers the large scale cheating done in titles that weren't even in the dx 9 weak point of the NV30 .
Like what, specifically?

Neeyik
21-Jan-2006, 08:17
Is it just me or is the Perlin noise one just a (very) long < 512 instruction shader that'd compile as a pixelshader 2.0 test?
Are you suggesting it as being a multipass PS2.0 test? - the texture and arithmetic instruction count is way over the PS2.0 limit; I haven't bothered to sit and check what the register usage is like either.

N00b
21-Jan-2006, 08:50
Using that logic, it makes no sense to report a complete score for certain SM 3.0 NV parts that do not support floating point blending because part of the tests won't run.

And yet, a complete score is in fact given.

This is the problem that I have...it's not consistent.
I think you are wrong here. Doing FP16 blending with a pixel shader is so trivial even I could probably write the shader after fiddling with the DX documentation for an hour or two. And the performance penalty surely isn't that great, I guess about 5-10% max.
Adding MSAA with a pixel shader on the other hand is not so simple. I wonder if it can be done and how? You could probably do SSAA, which would not really be comparable and performance would suck.

So what Futuremark has done here reflects what most developers would have done. Add the trivial fallback and ignore the complicated one. It's not like we will see FP16 AA with current nVidia cards in any forthcoming game. Will Not Happen. (Unless someone comes up with a very clever trick no one has thought of yet, but I doubt it)

So, as I said in a previous post, the absence of HDR/SM3.0 AA/AF scores with current nVidia cards should not be seen as unfair, but as a boon. The X1x00 cards simply have an important feature that the current nVidia cards don't have.

That said, the absence of a HDR/SM3.0 AA/AF score for current nVidia cards hints that future cards will support FP16 AA. So I guess in two or three months this whole affair will be a non-issue anyway.

N00b
21-Jan-2006, 08:55
Damien took care of the former (http://www.behardware.com/news/7951/futuremark-releases-3dmark-06.html) for you. Check out Hanners' EB article for the latter. (I referenced his #s a page or three back: 25 and 17% hit on a GF6 and GF7, respectively).
Thanks.

Hubert
21-Jan-2006, 11:43
Right, so either the NV4x will produce a score that is artificially high or low, depending upon how the comparison is done.


I guess, given the circumstances, the N/A score is best. It simply says that, as far as we (FutureMark) know, you won't be able to use HDR and AA together on Nvidia cards. It would be different if FutureMark had used an AA algorithm in shaders; then a score would be worth giving.

An Ati fan should be quite happy with 3DMark 2006 ... it states that the much-advertised Nvidia-only SM 3.0 feature, HDR, is just unusable in real life. Or, Nvidia owners have to play games twice: first with decent IQ, second with HDR. Or vice versa.

Jawed
21-Jan-2006, 11:44
Since PCF/Fetch4 cannot be used for the "advanced shadowing" algorithm of the SM3/HDR tests (3 and 4), does DST/PCF have much of a future?

It seems to me that DST/PCF/Fetch4 might end-up like stencil shadows, a feature that's used for 2 or 3 game engines and is then "forgotten" as not good enough.

Though I presume that it's the hardware-PCF/Fetch4 that's at issue here, because DSTs are always going to be needed, however fancy the shadow filtering technique. Is that correct?

I'm not clear on whether CSM is used in all four tests. Presumably this is independent of the technique for fetching shadow samples and/or filtering them, so I presume it's in all four tests.

Jawed

Dave Baumann
21-Jan-2006, 11:53
I'm not sure that it "can't be used"; I'm looking at a test application now where it is used, along with a 12-tap random sample (equating to 48 samples in total) and the shadow quality is very good - given that the performance for a single sample is roughly the same as 4 samples with PCF/Fetch4 then this is probably what developers will use anyway (this same point was brought up with 3DMark05, so I'm not sure what the logic is behind changing it). I think ATI are peeved because this can be combined with dynamic branching such that the branch test just does a single sample of the depth map in or out of the shadow, but only applies the higher tap sampling when it's detected to be at the edge of a shadow map (which results in a performance improvement on ATI hardware, and can also result in IQ improvements since you could spend more on just sampling the shadow edges if you know you aren't going to waste a lot of processing when it's fully in or out of shadow).

I'm assuming, here, that 3DMark06's shadowing mechanism doesn't use dynamic branching anyway.

Hubert
21-Jan-2006, 12:05
Thanks !

Man, I begin to understand the intricacies of today's graphics hardware ... (the link given by Jawed in "fetch4 - important ?" topic, Siggraph Shading Course 2006 pdf. did help a lot )

I better leave until it's too late. :)

Jawed
21-Jan-2006, 12:19
"Can't be used" was meant very much in the sense that "it offers no performance gain, and is therefore pointless". There's no point in fetching four samples and discarding three, if fetching one sample is an option.

Now, as to your comments about DB and filtering only where there is likely to be a penumbra - well I have to say this was always the foundation for my suspicions against 3DMk06 using DB. It is clearly a technique that heavily favours ATI hardware because of the inadequacy of the NV implementation (rather than it being absent), and one that is part of DX9 to boot. It's at the root of my assertion that FM copped-out big time. Pathetic and unimaginative.

Soft shadowing is clearly the banner case for per-pixel DB.

Jawed

Dave Baumann
21-Jan-2006, 12:22
Fetching 4 samples (and multiple random locations) will always be a quality gain.

I think their point is that given there are two paths already there for many things, why not two paths for the shadowing?

Jawed
21-Jan-2006, 12:29
Thanks !

Man, I begin to understand the intricacies of today's graphics hardware ... (the link given by Jawed in "fetch4 - important ?" topic, Siggraph Shading Course 2006 pdf. did help a lot )

I better leave until it's too late. :)
It's actually Siggraph 2005

http://www.ati.com/developer/SIGGRAP...Course_ATI.pdf

Jawed

Jawed
21-Jan-2006, 12:32
Fetching 4 samples (and multiple random locations) will always be a quality gain.
But if your intention is to use a sparse filtering kernel, then four contiguous samples anywhere in the kernel means it's no longer sparse.

Jawed

Dave Baumann
21-Jan-2006, 12:44
4 taps per sparse sample is going to be better quality than just single tap sparse samples (and not that different in performance).

Jawed
21-Jan-2006, 13:02
A tap and a sample are the same thing. Otherwise I'm missing something...


// Look up rotation for this pixel
float2 rot = BX2( tex2Dlod(RotSampler,
                  float4(vPos.xy * g_vTexelOffset.xy, 0, 0) ));

for(int i=0; i<12; i++) // Loop over taps
{
    // Rotate tap for this pixel location and scale relative to center
    rotOff.x =  rot.r * quadOff[i].x + rot.g * quadOff[i].y;
    rotOff.y = -rot.g * quadOff[i].x + rot.r * quadOff[i].y;
    offsetInTexels = g_fSampRadius * rotOff;

    // Sample the shadow map
    float shadowMapVal = tex2Dlod(ShadowSampler,
        float4(projCoords.xy + (g_vTexelOffset.xy * offsetInTexels.xy), 0, 0));

    // Determine whether tap is in light
    inLight = ( dist < shadowMapVal );

    // Accumulate
    percentInLight += inLight;
}
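
For readers without a shader compiler to hand, the same rotated sparse-kernel idea can be sketched on the CPU in Python. This is only an illustration: the 4-tap `TAPS` table, the treatment of `rot.r`/`rot.g` as the cosine and sine of a per-pixel angle, and the clamped integer lookup are assumptions made for the example, not details taken from the presentation.

```python
import math

# Hypothetical 4-tap kernel (the shader excerpt loops over 12 taps in quadOff).
TAPS = [(0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]


def rotate_taps(taps, angle):
    """Rotate every tap offset by a per-pixel angle (same matrix as rotOff)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * y, -s * x + c * y) for x, y in taps]


def percent_in_light(shadow_map, px, py, dist, angle, radius):
    """Fraction of rotated taps whose stored depth is farther than the receiver."""
    height, width = len(shadow_map), len(shadow_map[0])
    rotated = rotate_taps(TAPS, angle)
    lit = 0
    for ox, oy in rotated:
        # Scale by the sample radius and clamp to the map (a clamped point sample).
        x = min(max(int(round(px + radius * ox)), 0), width - 1)
        y = min(max(int(round(py + radius * oy)), 0), height - 1)
        lit += 1 if dist < shadow_map[y][x] else 0
    return lit / len(rotated)


# Left half holds a near occluder (depth 0.2); right half is open (depth 1.0).
shadow_map = [[0.2] * 4 + [1.0] * 4 for _ in range(8)]
print(percent_in_light(shadow_map, 1, 4, 0.5, 0.3, 1.0))  # well inside the shadow
print(percent_in_light(shadow_map, 6, 4, 0.5, 0.3, 1.0))  # well inside the light
```

Varying `angle` per pixel is what turns the structured banding of a fixed kernel into less noticeable noise near the shadow boundary.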


Jawed

kyetech
21-Jan-2006, 13:04
Does anybody know where I can download videos of these things running... I REALLY wanna see the new canyon run, and also that snow one.... But damn, I haven't got the hardware....

Im just a poor addicted graphics whore that needs my next fix !!

please help :-)

Dave Baumann
21-Jan-2006, 13:19
A tap and a sample are the same thing. Otherwise I'm missing something...
Yeah and no. With PCF, 4 taps, the depth compare and the averaged value is all a single operation and roughly the same cost as a single sample - so using multiples of those is likely to result in a better quality output. With Fetch 4, 4 taps is a sample; the cost of fetching the 4 taps is the same as a single sample but the compare and average has to be done in the shader, which will probably end up being negligible overall. The point being, given that 4 taps per sample is more or less the same cost as just 1 tap per sample, then why not do it and sparse sampling?

Rys
21-Jan-2006, 13:31
How do you come to this conclusion? The Perlin noise shader is a 3.0 shader.
As Nick asks, I think it's multipassable on PS2.0 hardware and I don't see anything in the shader (although I just looked quickly) that would stop it being run on that class of hardware, primarily so a "here, look what PS3.0 buys you in this very long multipass PS2.0 shader" comparison/test could be done, since it doesn't seem to have any dynamic flow control or other PS3.0-specific construction.

You're the expert! :grin:

Xmas
21-Jan-2006, 13:45
Soft shadowing is clearly the banner case for per-pixel DB.
Soft shadowing might be a banner case for DB, but hardly for per-pixel DB. With shadows you usually have large contiguous areas that are completely in or out. In fact it is one of those rare cases where NVidia's DB can be a huge performance gain despite its large granularity.

I'm not sure that it "can't be used"; I'm looking at a test application now where it is used, along with a 12-tap random sample (equating to 48 samples in total) and the shadow quality is very good - given that the performance for a single sample is roughly the same as 4 samples with PCF/Fetch4 then this is probably what developers will use anyway (this same point was brought up with 3DMark05, so I'm not sure what the logic is behind changing it). I think ATI are peeved because this can be combined with dynamic branching such that the branch test just does a single sample of the depth map in or out of the shadow, but only applies the higher tap sampling when it's detected to be at the edge of a shadow map (which results in a performance improvement on ATI hardware, and can also result in IQ improvements since you could spend more on just sampling the shadow edges if you know you aren't going to waste a lot of processing when it's fully in or out of shadow).
Did they explain how they detect edges? Taking a smaller number of samples first and checking whether they're all in or out?
That technique would help NVidia as well (they presented it in 2004), though likely not as much.
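
The "take a few samples first, then decide" idea asked about above can be sketched in Python as follows. It is a hypothetical illustration of the general early-out technique, not the method used by ATI's test application or by 3DMark06; the 1D tap layouts and the `make_in_light` helper are made up for the example.

```python
def shadow_factor(in_light, probe_taps, full_taps):
    """Return (light fraction, number of taps evaluated).

    A small probe kernel is evaluated first. Only if the probe taps disagree
    (the pixel is probably in a penumbra) is the larger kernel evaluated.
    """
    probe = [in_light(t) for t in probe_taps]
    if all(probe):                      # fully lit: skip the expensive path
        return 1.0, len(probe_taps)
    if not any(probe):                  # fully shadowed: skip it as well
        return 0.0, len(probe_taps)
    full = [in_light(t) for t in full_taps]
    return sum(full) / len(full), len(probe_taps) + len(full_taps)


def make_in_light(center):
    """Hypothetical 1D scene: everything at position > 0 is lit."""
    return lambda offset: center + offset > 0.0


PROBE_TAPS = [-1.0, -0.5, 0.5, 1.0]
FULL_TAPS = [(2 * i + 1) / 12 - 1.0 for i in range(12)]  # 12 evenly spaced taps

for center in (10.0, -10.0, 0.0):
    print(center, shadow_factor(make_in_light(center), PROBE_TAPS, FULL_TAPS))
```

How much this saves on real hardware depends on branch granularity: the expensive path is only skipped when every pixel in a hardware batch takes the same early-out, which is the granularity point raised earlier in this post.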

Demirug
21-Jan-2006, 13:46
As Nick asks, I think it's multipassable on PS2.0 hardware and I don't see anything in the shader (although I just looked quickly) that would stop it being run on that class of hardware, primarily so a "here, look what PS3.0 buys you in this very long multipass PS2.0 shader" comparison/test could be done, since it doesn't seem to have any dynamic flow control or other PS3.0-specific construction.

You're the expert! :grin:

Now I understand what you mean. At first glance I would say you are right. I'm currently trying to add a new plugin to the DirectX Tweaker that can save the HLSL code to a file if the app uses D3DX to compile it at runtime. If we can get the HLSL code for this shader we can at least check whether it compiles for NV3X/R4XX.

Demirug
21-Jan-2006, 13:49
I'm assuming, here, that 3DMark06's shadowing mechanism doesn't use dynamic branching anyway.

If I look at the shader code with the comments left in, I can see that sometimes the shadow texture is only used in one branch path.

Jawed
21-Jan-2006, 13:49
Yeah and no. With PCF, 4 taps, the depth compare and the averaged value is all a single operation and roughly the same cost as a single sample - so using multiples of those is likely to result in a better quality output. With Fetch 4, 4 taps is a sample; the cost of fetching the 4 taps is the same as a single sample but the compare and average has to be done in the shader, which will probably end up being negligible overall. The point being, given that 4 taps per sample is more or less the same cost as just 1 tap per sample, then why not do it and sparse sampling?
Why you won't do this is that four contiguous taps in your sparse filter are meaningless.

I'm talking about shadow map filtering after PCF - where PCF is a technique that's predicated on taking contiguous samples from the shadow map, and a technique that's hard to tweak for quality. Which is why 3DMk06 doesn't use PCF in graphics tests 3 and 4.

The technique presented by ATI in the Siggraph presentation (as well as other places) uses a sparse-sampled kernel in preference to a large-density PCF filtering technique.

From page 18 of the presentation linked above:



• Grid-based PCF kernel needs to be fairly large to eliminate aliasing
– Particularly in cases with small detail popping in and out of the underlying hard shadow.

• Irregular sampling allows us to get away with fewer samples
– Error is still present, only the error is “unstructured” and thus less noticeable
– Per-pixel spatially varying rotation of kernel is used to provide even more variation.


Multiple-contiguous sample taps don't make sense in a sparse-sampling kernel. At least, not as far as I can see.

Jawed

Jawed
21-Jan-2006, 13:57
Soft shadowing might be a banner case for DB, but hardly for per-pixel DB. With shadows you usually have large contiguous areas that are completely in or out. In fact it is one of those rare cases where NVidia's DB can be a huge performance gain despite its large granularity.
It's why the X1k material showed a tree (with real geometrical branches) being shadowed when discussing DB performance.

Jawed

Xmas
21-Jan-2006, 14:38
Yeah and no. With PCF, 4 taps, the depth compare and the averaged value is all a single operation and roughly the same cost as a single sample - so using multiples of those is likely to result in a better quality output. With Fetch 4, 4 taps is a sample; the cost of fetching the 4 taps is the same as a single sample but the compare and average has to be done in the shader, which will probably end up being negligible overall. The point being, given that 4 taps per sample is more or less the same cost as just 1 tap per sample, then why not do it and sparse sampling?
Saying compare and average are negligible overall is almost like saying shadow mapping itself is negligible overall, which we know it is not.
Shadow mapping only consists of three operations: sample, compare, average. And then the result is multiplied with the light color/intensity and passed on to the light interaction part of the shader.
PCF combines those three into a single operation (though I'm not convinced it's single cycle [like point sampling is], which would be bandwidth limited anyway). Fetch4 accelerates sampling but does nothing to the compare and average steps.

Comparing four samples is a vec4 sub followed by a vec4 cmp, and averaging multiple samples is an add4/dp4 cascade (unweighted, which in this case is fine).
So if Futuremark wanted to use fetch4 for the PS3.0 tests in 3DMark06 they would have had the controversial choice of taking x < 16 fetch4 samples to somehow match the average quality of 16 point samples, doing less texture sampling and more arithmetic.

Taking 16 fetch4 samples instead of 16 point samples would have increased quality but also the workload by 12 sub4, 12 cmp4 and 12 add4. In any case, they would have done the same for PCF, which actually means that, relatively speaking, ATI is better off with Futuremark not using Fetch4/PCF in the PS3.0 test at all.
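
The sample/compare/average split and the "12 extra vec4 ops" count above can be made concrete with a small Python model. It is a simplified sketch under stated assumptions: `fetch4` here returns an unweighted 2x2 block, the PCF stand-in averages without bilinear weights, and point samples are assumed to be packed four to a vec4 before comparing. None of the function names correspond to a real API.

```python
def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def fetch4(depth, x, y):
    """Model of a Fetch4-style lookup: the 2x2 texel block at (x, y) in one fetch."""
    h, w = len(depth), len(depth[0])
    xs = (clamp(x, 0, w - 1), clamp(x + 1, 0, w - 1))
    ys = (clamp(y, 0, h - 1), clamp(y + 1, 0, h - 1))
    return [depth[yy][xx] for yy in ys for xx in xs]


def compare_and_average(taps, dist):
    """The shader-side work Fetch4 leaves over: a vec4 compare, then an average."""
    lit = [1.0 if dist < t else 0.0 for t in taps]
    return sum(lit) / len(lit)


def pcf_sample(depth, x, y, dist):
    """Stand-in for hardware PCF: fetch, compare and average as one operation."""
    return compare_and_average(fetch4(depth, x, y), dist)


def extra_vec4_compares(num_samples):
    """Extra vec4 compares when each of N samples is a fetch4 instead of a point sample."""
    point_sample_compares = num_samples // 4   # four point samples packed per vec4
    fetch4_compares = num_samples              # one vec4 compare per fetch4
    return fetch4_compares - point_sample_compares


depth = [[1.0, 1.0, 0.2, 0.2] for _ in range(4)]
print(pcf_sample(depth, 1, 0, 0.5))    # block straddles the occluder edge
print(extra_vec4_compares(16))
```

With 16 samples the model gives 16 vec4 compares for fetch4 against 4 for packed point samples, which is where the figure of 12 additional operations of each kind comes from.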

Dave Baumann
21-Jan-2006, 15:37
When I said the compare / average for Fetch4 was likely to be fairly negligible I think we're looking at about 2 cycles on RV530 style hardware, some of which will be hidden by the instruction scheduling.

Hubert
21-Jan-2006, 16:04
"This totally nullifies Ati's effort put in optimising bandwith usage"

and

"Because few people will bother doing separate game tests (mostly reviewers) most people will just run it, got their score and an idea about their systems capabilities."

For your second point

I can assure you that most people, presumably gamers, when assessing their systems capabilities will run the standard test and see 25fps and then not decide to apply AA/AF on top of that just to decrease their fps further. This is a theoretical test, not a practical one.

On your first point

No, because the SM3 tests where AA cannot be applied for nvidia are heavily gpu biased and not bandwidth limited at all I think. The SM2 tests might be but then the capability of each card can be measured in turn.

Well, I begin to live with the idea that 3DMark 2006 is just as is. :)
Looking on the bright side, X1800 users can test not just how their card will suck big time running future games, but they also can test how much worse it will perform with AA enabled. :)

Kanyamagufa
21-Jan-2006, 18:28
Could we get a Mac version in the next release? :wink:

You know you want to.

Jawed
21-Jan-2006, 18:38
Do we have a consensus that shadowing in graphics tests 3 and 4 is a level playing field for ATI and NVidia hardware?

Jawed

digitalwanderer
21-Jan-2006, 19:09
I think their point is that given there are two paths already there for many things, why not two paths for the shadowing?
I don't understand why not either, Nick do you know? :-|

Demirug
21-Jan-2006, 19:11
Could we get a Mac version in the next release? :wink:

You know you want to.

As soon as MacOS X supports Direct3D and you have an Intel Mac.

Kanyamagufa
21-Jan-2006, 19:19
As soon as MacOS X supports Direct3D and you have an Intel Mac.

Heh, hell froze over once already this year, if we're lucky it may just happen again. :wink:

Xmas
21-Jan-2006, 19:58
When I said the compare / average for Fetch4 was likely to be fairly negligible, I think we're looking at about 2 cycles on RV530 style hardware, some of which will be hidden by the instruction scheduling.
Don't get me wrong, I do agree that Fetch4 and PCF should be used when available. However, it wouldn't have been as easy as saying "let's enable fetch4", because even if the three additional samples per fetch cost only half a cycle, that's going to be 8 cycles per pixel for a 16-sample sparse kernel. So Futuremark would have had to adjust the number of samples to somehow get comparable quality in all three paths. Which isn't always trivial.

btw, I also think that the decision to use either F4/PCF or a 4-tap rotated kernel in the PS2.0 test is somewhat poor in terms of comparable quality. However I'm not sure there's a better one. 3-tap maybe?
And the point sampling can be hidden by a sufficiently arithmetically complex shader, too.
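
The cost estimate above can be tabulated with a short Python sketch. The half-cycle figure is the hypothetical one from the post, not a measured number, and the function name is made up for illustration.

```python
def fetch4_kernel_cost(num_fetches, extra_cycles_per_fetch=0.5):
    """Return (total depth taps, extra ALU cycles per pixel) for a kernel built
    from Fetch4 fetches, each returning four taps."""
    taps = 4 * num_fetches
    extra_cycles = num_fetches * extra_cycles_per_fetch
    return taps, extra_cycles


for n in (4, 8, 16):
    taps, cycles = fetch4_kernel_cost(n)
    print(f"{n:2d} fetches -> {taps:2d} taps, ~{cycles:g} extra cycles per pixel")
```

The 16-fetch row reproduces the 8 cycles per pixel quoted above; the smaller rows show the kind of trade-off Futuremark would have had to tune to get comparable quality across paths.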

Cowboy X
22-Jan-2006, 03:27
Like what, specifically?

Brilinear , 3d murk , clip planes ............. . And just generally poor iq , not because the cards couldn't look better , but because looking better wouldn't have been competitive .

Chalnoth
22-Jan-2006, 04:35
Brilinear , 3d murk , clip planes ............. .
Okay, I don't quite see how you'd call the first cheating. I also have no idea what you're talking about with "3d murk," but the third was, from what I remember, only done in 3DMark. You specifically were attempting to call attention to cheating for non-DX9 games.

mrcorbo
22-Jan-2006, 18:02
Yet more irony:

3DMark 2006 will probably show X1900 to greater advantage than any other available benchmark.

digitalwanderer
22-Jan-2006, 18:09
Yet more irony:

3DMark 2006 will probably show X1900 to greater advantage than any other available benchmark.
Not true...."there is another". :cool:

neliz
22-Jan-2006, 20:51
Not true...."there is another". :cool:

the son of fudo?

mrcorbo
23-Jan-2006, 00:40
Not true...."there is another". :cool:

Nice tease. :D

would have been even better with the dramatic pauses "There....is.....an...oth....errrr."

Chalnoth
23-Jan-2006, 01:33
Dude, don't rip on Yoda. He was dying, man!

neliz
23-Jan-2006, 08:28
Nice tease. :D

would have been even better with the dramatic pauses "There....is.....an...oth....errrr."


So what is different between the XT and XTX besides the few extra clocks?

Junkstyle
23-Jan-2006, 08:43
I was wondering if the 3dmark rep could be so kind in explaining why rendering an entire 3d scene frame by frame using only the CPU is relevant to any kind of game. Thanks.

neliz
23-Jan-2006, 08:57
I was wondering if the 3dmark rep could be so kind in explaining why rendering an entire 3d scene frame by frame using only the CPU is relevant to any kind of game. Thanks.

That results in a relative CPU performance for that workload.
Like every benchmark, if they're benchmarking something you don't use, it's irrelevant to you.

The problem with 3dmark06 is that by this time next month it will already be an obsolete benchmark.

Neeyik
23-Jan-2006, 09:06
I was wondering if the 3dmark rep could be so kind in explaining why rendering an entire 3d scene frame by frame using only the CPU is relevant to any kind of game. Thanks.
It's not - all rendering is done by the graphics card, unlike in 03/05 where the vertex shaders were done on the CPU.

bigmouse
23-Jan-2006, 13:31
I'm a user of Geforce 6200. I know that this display card can't run the HDR/SM3.0 demos of 3dmark06 by default. But is there a way to make it run the HDR/SM3.0 demos by adding parameters?

Neeyik
23-Jan-2006, 14:03
You can't - the HDR/SM3.0 tests use FP16 blending, which the 6200 simply cannot do.

Cowboy X
23-Jan-2006, 15:07
Okay, I don't quite see how you'd call the first cheating. I also have no idea what you're talking about with "3d murk," but the third was, from what I remember, only done in 3DMark. You specifically were attempting to call attention to cheating for non-DX9 games.

I considered the bri-linear 'optimisations' present in those days to be cheats, since they visibly (sometimes significantly) lowered quality while boosting speed. But here is why I thought them to truly be cheats:

1/ Not done transparently to users or people like us who follow things like this .

2/ Lies , smoke and mirrors hindered us getting to the bottom of these bug/optimisations .

3/ Sliders and quality modes in the drivers that blatantly were not working or purposely did less than what they were supposed to do .

Importantly bri-linear as it is now called was applied across the board in the nv drivers and affected all DX games not just DX 9 ones . What made matters worse was that in many cases FX hardware would have done quite well without the optimisations especially in their highend ( 5800 5800U 5900 5900U ) with DX 8 and DX 7 and hybrid games ( Unreal Tournament 2003 etc ) . But because they would likely lose the benchmark or at least not look as good as the ATI high end offerings FX users were subjected to subterfuge and poor image quality for benchmarketing purposes .

Some people still don't think that bri-linear as it was applied then is/was a cheat or bad optimisation , but I do . If you don't , no problem we can agree to disagree .

Mariner
23-Jan-2006, 17:43
Ooh, look. Another article from The Inquirer entitled, 3DMark06 confuses Fuad (http://www.theinquirer.net/?article=29188)

Goodness, I wish he'd get a clue. :lol:

Razor1
23-Jan-2006, 17:46
Ooh, look. Another article from The Inquirer entitled, 3DMark06 confuses Fuad (http://www.theinquirer.net/?article=29188)

Goodness, I wish he'd get a clue. :lol:


LOL its good that he knows he is confused :lol:

Jawed
23-Jan-2006, 18:05
Actually I think it's a good point, overall.

Indeed, maybe it would have been better if FM had created 3DMk06 as a purely SM3 benchmark.

It would have better-reflected the future prospects of the available GPUs - even if we're still a long way from seeing a game with intensive use of SM3.

Oh, and isn't it more fun to link to:

http://www.chipzilla.com/?article=29188

Jawed

mrcorbo
23-Jan-2006, 19:00
Ooh, look. Another article from The Inquirer entitled, 3DMark06 confuses Fuad (http://www.theinquirer.net/?article=29188)

Goodness, I wish he'd get a clue. :lol:

Grrrrr. He actually posted pictures showing the actual breakdown of scores:
1600 XT SM2.0 883, SM3.0 896.
X850 SM2.0 1153, SM3.0 N/A.

IMO any intelligent person should be able to figure out that you need to use the SM2.0 and SM3.0 scores independently in '06 when comparing video cards. The final 3DMark score is completely unsuitable for this, not only because of this issue (the penalty on SM2.0-only cards), but because of the CPU score being factored in. What the 3DMark score does do is indicate that in future games, in order to support their full featureset with the best performance, a SM3.0 capable card and a multi-core processor will be needed. Any arguments with this? The means they chose to do this was making the lack of these factor negatively into the 3DMark score.

This benchmark is just a tool. And when it comes to how well reviewers use it I'll refer to the old saying, "Blame the craftsman, not the tools."

Edited for readability

ANova
25-Jan-2006, 19:29
Grrrrr. He actually posted pictures showing the actual breakdown of scores:
1600 XT SM2.0 883, SM3.0 896.
X850 SM2.0 1153, SM3.0 N/A.

IMO any intelligent person should be able to figure out that you need to use the SM2.0 and SM3.0 scores independently in '06 when comparing video cards. The final 3DMark score is completely unsuitable for this, not only because of this issue (the penalty on SM2.0-only cards), but because of the CPU score being factored in. What the 3DMark score does do is indicate that in future games, in order to support their full featureset with the best performance, a SM3.0 capable card and a multi-core processor will be needed. Any arguments with this? The means they chose to do this was making the lack of these factor negatively into the 3DMark score.

This benchmark is just a tool. And when it comes to how well reviewers use it I'll refer to the old saying, "Blame the craftsman, not the tools."

Edited for readability

In this case the best way to compare them would be by disabling the SM3 tests, since both were run on the same system with the same CPU. Fuad makes a good point, the majority of people look at the final score, not the individual scores and this gives SM3 capable cards an artificial advantage that will not translate to real world games. Not to mention the different paths ATI and nvidia cards are running regardless of whether or not they're SM3 capable. 3dmark06 is fun to look at, but otherwise completely pointless imo.

mrcorbo
26-Jan-2006, 00:08
In this case the best way to compare them would be by disabling the SM3 tests, since both were run on the same system with the same CPU. Fuad makes a good point, the majority of people look at the final score, not the individual scores and this gives SM3 capable cards an artificial advantage that will not translate to real world games. Not to mention the different paths ATI and nvidia cards are running regardless of whether or not they're SM3 capable. 3dmark06 is fun to look at, but otherwise completely pointless imo.

Why bother changing the settings? The damn score is right there. It takes no additional effort to get these results, that's what drives me crazy. It shows either true ignorance or willful ignorance, because laziness isn't a possible excuse.

The majority of people don't really factor into it. Any idiot can do a run of 3DMark. But a competent reviewer should be able to analyze the results and comprehend what they mean. You can make the same comparisons WRT performance as you could make using '05. You just have to go about it differently.

Finally, there will be effects introduced in games that will be SM3.0-only. There is going to be a performance advantage for multi-core processors. And IMO it is proper for a benchmark that is aiming to be an indicator of how capable your system is of running these future titles to take this into account in some way. How else could they quantify the fact that a card may be equivalent in raw power to a newer design, but won't be capable of running at the same settings because of missing features? How else could they indicate how your GPU may be stalled while D3D fights with the AI and physics threads for CPU time?

The gripe about using shader code that favors Nvidia hardware is the only one that I really don't dispute, because I can't. This is something that I will leave to the experts to argue. I will ask this, though. Given the default settings that 3DMark '06 runs at (no AA or AF) are the results that we have seen so far that far off what can be proven (i.e. current shader-heavy games)?

ANova
26-Jan-2006, 05:21
Why bother changing the settings? The damn score is right there. It takes no additional effort to get these results, that's what drives me crazy. It shows either true ignorance or willful ignorance, because laziness isn't a possible excuse.

The majority of people don't really factor into it. Any idiot can do a run of 3DMark. But a competent reviewer should be able to analyze the results and comprehend what they mean. You can make the same comparisons WRT performance as you could make using '05. You just have to go about it differently.

And this is the problem. Look around various forums, those that have a "Post your 3dmark06" thread only post the overall score, then complain when their X850 XT and Athlon 64 3500+ is being outperformed by an X1600 XT and Pentium D 3.0 GHz. This benchmark has changed things, but most people aren't aware of the changes and what they mean. Furthermore unless you buy it you cannot fiddle with the settings to compare to each other. Yes a competent reviewer should be able to analyze the results, but the fact is not all reviewers are as knowledgeable in this area as others.

Finally, there will be effects introduced in games that will be SM3.0-only. There is going to be a performance advantage for multi-core processors. And IMO it is proper for a benchmark that is aiming to be an indicator of how capable your system is of running these future titles to take this into account in some way. How else could they quantify the fact that a card may be equivalent in raw power to a newer design, but won't be capable of running at the same settings because of missing features? How else could they indicate how your GPU may be stalled while D3D fights with the AI and physics threads for CPU time?

There's a fundamental problem with this. Firstly, the primary benefits to SM3 over SM2 are barely or not even being used in this benchmark. There really is very little difference between the two; most games (including 3dm06) are simply choosing to equate HDR with SM3 because SM3 capable cards also have the features to more easily implement it. Any shader in SM3 can also be done in SM2; it may simply run a little slower depending on its complexity and whether or not it makes use of dynamic branching. It's up to the developers to decide. Considering DirectX 10 is around the corner (end of this year) I doubt SM3 will have any significant life, and within a year non DX10 games will probably fall back to SM2 rather than SM3 since there are more cards in the mainstream for the former. Not to mention some SM3 cards are too slow to make any use of things like dynamic branching. Just because one may not be able to run HDR or a shader catered specifically toward SM3 doesn't mean the card that does support these features should receive a much higher score. That does not equate to performance; that equates to extra visuals.

As far as multi-core processors, while they will most definitely help in future titles that make use of all the cores I'd say the results given are exaggerated being that it can double the cpu score. It is doubtful that we'll see anywhere near that significant of a performance increase in anything other than synthetic benchmarks.

The gripe about using shader code that favors Nvidia hardware is the only one that I really don't dispute, because I can't. This is something that I will leave to the experts to argue. I will ask this, though. Given the default settings that 3DMark '06 runs at (no AA or AF) are the results that we have seen so far that far off what can be proven (i.e. current shader-heavy games)?

Any results that you get cannot be taken seriously since both cards aren't running the same settings and optimizations. If you compare the GTX 512 to the X1900 XTX the latter only beats the former by a small amount. In games like FEAR and AOE3 which are indicative of where future titles are going, ie. much more shader heavy, we are seeing a much bigger real world performance difference between the two. I'd hardly call that a future looking benchmark, which is rather self contradictory since 3dmark07 will likely be replacing it by the end of this year or beginning of next year.

mrcorbo
26-Jan-2006, 19:01
And this is the problem. Look around various forums, those that have a "Post your 3dmark06" thread only post the overall score, then complain when their X850 XT and Athlon 64 3500+ is being outperformed by an X1600 XT and Pentium D 3.0 GHz. This benchmark has changed things, but most people aren't aware of the changes and what they mean. Furthermore unless you buy it you cannot fiddle with the settings to compare to each other. Yes a competent reviewer should be able to analyze the results, but the fact is not all reviewers are as knowledgeable in this area as others.

No fiddling necessary. Again, the results you need to compare are right there when you run the default benchmark. And you can't really trust an incompetent reviewer to give you a good review regardless of what benchmarks they run. It's the craftsman not the tools....



most games (including 3dm06) are simply choosing to equate HDR with SM3 because SM3 capable cards also have the features to more easily implement it

Yup. :)

Considering DirectX 10 is around the corner (end of this year) I doubt SM3 will have any significant life, and within a year non DX10 games will probably fall back to SM2 rather than SM3 since there are more cards in the mainstream for the former.

I disagree with this. Over the next 8 months I think we will see a significant uptake in the numbers of SM3.0 hardware and thus much more use of HDR and other features that may be best implemented on SM3.0 capable hardware, even if it isn't strictly the SM3.0 compliance that enables them.

As far as multi-core processors, while they will most definitely help in future titles that make use of all the cores I'd say the results given are exaggerated being that it can double the cpu score. It is doubtful that we'll see anywhere near that significant of a performance increase in anything other than synthetic benchmarks.

You're probably right. The hit in the '06 results does kinda drive the point home, though. Maybe they just wanted to make sure that people "got it".



Any results that you get cannot be taken seriously since both cards aren't running the same settings and optimizations. If you compare the GTX 512 to the X1900 XTX the latter only beats the former by a small amount. In games like FEAR and AOE3 which are indicative of where future titles are going, ie. much more shader heavy, we are seeing a much bigger real world performance difference between the two.

I couldn't find a single review that supports this for these two games. With no AA/AF enabled the performance differences are roughly the same (in the 10% range) as the difference in the '06 scores. Maybe you can find some links. If you want to compare with AA/AF results you would have to get the SM2.0 results from both cards in '06 with AA/AF enabled and compare that.

tEd
26-Jan-2006, 22:24
I'm sure it has been mentioned in this thread before somewhere but i couldn't find a clear answer.

For what is DB exactly used in 3dmark06?

thx

Chalnoth
26-Jan-2006, 22:34
Probably soft shadows.

Neeyik
26-Jan-2006, 22:57
Although I've not sat through the several hundred pixel shader dumps, the longer ones from the HDR test look something like this:


ps_3_0

def c6 , -0.018729299306869507000000, 0.074261002242565155000000, 1.570728778839111300000000, 10000.000000000000000000000000

def c7 , 0.000000000000000000000000, 1.000000000000000000000000, 0.031250000000000000000000, 0.062500000000000000000000

def c8 , 2.000000000000000000000000, -1.000000000000000000000000, 1.000000000000000000000000, -0.212114393711090090000000

def c9 , 0.416087001562118530000000, -0.303380995988845830000000, 0.135195001959800720000000, 0.220419004559516910000000

def c10 , -0.183682993054389950000000, 0.077253997325897217000000, -0.252817988395690920000000, -0.237764000892639160000000

def c11 , -0.054127000272274017000000, 0.662913024425506590000000, -0.031250000000000000000000, 0.318309873342514040000000

def c12 , -0.486135989427566530000000, 0.397747993469238280000000, -0.397747993469238280000000, 3.000000000000000000000000

def c13 , 0.574523985385894780000000, -0.062500000000000000000000, -0.574523985385894780000000, 0.108253002166748050000000

def c14 , -0.625000000000000000000000, -0.750000000000000000000000, 0.875000000000000000000000, 0.187500000000000000000000

def c15 , 1.000000000000000000000000, 1.001000046730041500000000, -0.797193884849548340000000, 0.014567226171493530000000

def c16 , 0.636619746685028080000000, -1.009999990463256800000000, -1.120000004768371600000000, 0.000100009805464651440000

def c17 , 0.500000000000000000000000, 1.000000000000000000000000, 0.159154936671257020000000, 16.000000000000000000000000

dcl_texcoord0 v0.xy
dcl_texcoord1 v1.xy
dcl_texcoord2 v2.xyz
dcl_texcoord3 v3.xyz
dcl_texcoord4 v4.xyz
dcl_texcoord5 v5.xyz
dcl_texcoord6 v6.xyz
dcl_texcoord7 v7.xyz
dcl v4096.xy
dcl_2d s0
dcl_2d s1
dcl_2d s2
dcl_2d s3
dcl_2d s4
dcl_cube s5
dcl_cube s6
texld r0 , v0.xyxx , s3
mad_pp r1.xyz , c8.xxxx , r0.wyzw , c8.yyyy
nrm_pp r2.xyz , v2
dp3_pp r4.x , r1 , v3
dp3_pp r4.y , r1 , v4
dp3_pp r4.z , r1 , v5
dp3_pp r0.x , r2 , v3
dp3_pp r0.y , r2 , v4
dp3_pp r0.z , r2 , v5
dp3_pp r0.w , -r0 , r4
add_pp r0.w , r0.wwww , r0.wwww
dp3_sat_pp r2.w , r1 , r2
mad_pp r0.xyz , r4 , -r0.wwww , -r0
texld_pp r1 , r0 , s6
mad_pp r0.w , r2.wwww , c6.xxxx , c6.yyyy
add_pp r0.z , -r2.wwww , c8.zzzz
mad_pp r0.w , r0.wwww , r2.wwww , c8.wwww
rsq_pp r0.z , r0.zzzz
mad_pp r0.w , r0.wwww , r2.wwww , c6.zzzz
rcp_pp r0.z , r0.zzzz
mul_pp r0.w , r0.wwww , r0.zzzz
mad r0.xy , r0.wwww , c16.xxxx , c16.yzzw
mul r0.xy , r0 , r0
rcp r0.x , r0.xxxx
rcp r0.y , r0.yyyy
add r0.w , -r0.xxxx , c6.wwww
mul r1.w , r0.wwww , c16.wwww
add r2.w , r0.yyyy , c15.zzzz
texld r0 , v0.xyxx , s2
mov r6.z , c4.xxxx
add r2.z , -r6.zzzz , c5.xxxx
mad r2.w , r2.wwww , -c15.wwww , c15.xxxx
mad_pp r5.w , r0.wwww , r2.zzzz , c4.xxxx
mul_pp r3 , r0.xyzz , c3.xyzz
mul_pp r2.z , r5.wwww , r5.wwww
add_pp r0 , r3 , c8.yyyy
mad_pp r4.w , r5.wwww , -r2.zzzz , c8.zzzz
mad r2 , r2.wwww , r0 , c8.zzzz
add_pp r0.xy , -r4.wwww , c15
mad_sat_pp r1.w , r0.yyyy , r1.wwww , r0.xxxx
add_pp r5.w , -r5.wwww , c8.zzzz
texld_pp r0 , r4 , s5
mul_pp r0 , r3 , r0.xyzz
mul_pp r1 , r1.xyzz , r1.wwww
mul_pp r0 , r5.wwww , r0
mul r2 , r2 , r1
mul_pp r0 , r4.wwww , r0
texld r1 , v1.xyxx , s4
mul r2 , r2 , r1.wwww
mad_pp r0 , r0 , r1.xxxx , r2
cmp r2.w , -v6.zzzz , c7.xxxx , c7.yyyy
mul r1.xy , v4096 , c7.zzzz
texld_pp r1 , r1 , s1
if_ne r2 , -r2.wwww // ******* NOTE ******
dsx r2 , v7.xyxy
dsy r3 , v7.xyxy
add r2 , abs r2 , abs r3
mov r3.w , c2.xxxx
mad_pp r2 , r2 , r3.wwww , c1.xyxy
mul_pp r4 , r1.zwxy , r2
mad r2 , r4 , c14.zzww , v7.xyxy
texldl r1 , r2.xyxy , s0
texldl r5 , r2.zwzw , s0
mad r2 , r4.zwzw , c17.xxyy , v7.xyxy
texldl r3 , r2.xyxy , s0
texldl r2 , r2.zwzw , s0
mov r1.y , r5.xxxx
mov r1.z , r3.xxxx
mov r1.w , r2.xxxx
mul_pp r2 , r4 , c9.xxyy
add r1 , r1 , -v7.zzzz
mad_pp r2 , r4.zwxy , c9.zzww , r2
cmp_pp r1 , r1 , c7.yyyy , c7.xxxx
add r3 , r2 , v7.xyxy
texldl r2 , r3.xyxy , s0
texldl r5 , r3.zwzw , s0
mul_pp r3 , r4 , c10.xxyy
mad_pp r3 , r4.zwxy , c10.zzww , r3
mov r2.y , r5.xxxx
add r3 , r3 , v7.xyxy
texldl r5 , r3.xyxy , s0
texldl r3 , r3.zwzw , s0
mov r2.z , r5.xxxx
mov r2.w , r3.xxxx
add r3 , r2 , -v7.zzzz
mul_pp r2 , r4 , c11.xxyy
cmp_pp r3 , r3 , c7.yyyy , c7.xxxx
mad_pp r2 , r4.zwxy , c11.zzyy , r2
dp4_pp r6.w , r3 , c7.wwww
add r2 , r2 , v7.xyxy
texldl r3 , r2.xyxy , s0
texldl r5 , r2.zwzw , s0
mul_pp r2 , r4 , c12.xxyy
mad_pp r2 , r4.zwxy , c12.xxzz , r2
mov r3.y , r5.xxxx
add r5 , r2 , v7.xyxy
texldl r2 , r5.xyxy , s0
texldl r5 , r5.zwzw , s0
mov r3.z , r2.xxxx
mul_pp r2 , r4 , c13.xxyy
mov r3.w , r5.xxxx
mad_pp r2 , r4.zwxy , c13.zzww , r2
add r3 , r3 , -v7.zzzz
add r5 , r2 , v7.xyxy
texldl r2 , r5.xyxy , s0
texldl r5 , r5.zwzw , s0
mov r2.y , r5.xxxx
mad r4 , r4 , c14.xxyy , v7.xyxy
texldl r5 , r4.xyxy , s0
texldl r4 , r4.zwzw , s0
mov r2.z , r5.xxxx
mov r2.w , r4.xxxx
cmp_pp r3 , r3 , c7.yyyy , c7.xxxx
add r2 , r2 , -v7.zzzz
dp4 r3.w , r3 , c7.wwww
cmp_pp r2 , r2 , c7.yyyy , c7.xxxx
add_pp r3.w , r6.wwww , r3.wwww
dp4 r2.w , r2 , c7.wwww
dp4 r1.z , r1 , c7.wwww
add_pp r1.w , r3.wwww , r2.wwww
add_pp r6.w , r1.zzzz , r1.wwww
texld r1 , v0.xyxx , s3
mad_pp r1.xyz , c8.xxxx , r1.wyzw , c8.yyyy
nrm_pp r5.xyz , v6
dp3_sat_pp r5.w , r1 , r5
nrm_pp r2.xyz , v2
add_pp r2.w , -r5.wwww , c8.zzzz
mad_pp r1.w , r5.wwww , c6.xxxx , c6.yyyy
rsq_pp r2.w , r2.wwww
mad_pp r1.w , r1.wwww , r5.wwww , c8.wwww
rcp_pp r2.w , r2.wwww
mad_pp r1.w , r1.wwww , r5.wwww , c6.zzzz
add_pp r4.xyz , r5 , r2
mul_pp r1.w , r2.wwww , r1.wwww
nrm_pp r3.xyz , r4
mad r4.xy , r1.wwww , c16.xxxx , c16.zyzw
dp3_sat_pp r4.z , r1 , r3
mul r3.xy , r4 , r4
dp3_sat_pp r1.y , r1 , r2
rcp r2.x , r3.xxxx
rcp r2.y , r3.yyyy
mad_pp r1.w , r1.yyyy , c6.xxxx , c6.yyyy
add_pp r1.z , -r1.yyyy , c8.zzzz
mad_pp r1.w , r1.wwww , r1.yyyy , c8.wwww
rsq_pp r1.z , r1.zzzz
mad_pp r1.w , r1.wwww , r1.yyyy , c6.zzzz
rcp_pp r1.y , r1.zzzz
add r1.z , r2.xxxx , c15.zzzz
mul_pp r1.w , r1.wwww , r1.yyyy
mul r1.z , r1.zzzz , c15.wwww
mad r1.xy , r1.wwww , c16.xxxx , c16.yzzw
add r1.w , -r2.yyyy , c6.wwww
mul r1.xy , r1 , r1
mul r1.w , r1.zzzz , r1.wwww
rcp r1.x , r1.xxxx
rcp r1.y , r1.yyyy
mul r1.z , r1.wwww , c16.wwww
add r1.w , -r1.xxxx , c6.wwww
mul r1.w , r1.zzzz , r1.wwww
add r1.z , -r6.zzzz , c5.xxxx
texld r3 , v0.xyxx , s2
mad_pp r4.w , r3.wwww , r1.zzzz , c4.xxxx
mul r3.w , r1.wwww , c16.wwww
add_pp r6.z , -r4.wwww , c8.zzzz
add r1.z , r1.yyyy , c15.zzzz
rcp_pp r1.w , r6.zzzz
mad r6.y , r1.zzzz , -c15.wwww , c15.xxxx
mul_pp r1.w , r1.wwww , c12.wwww
pow r2.z , r4.zzzz , r1.wwww
add r2.w , r1.wwww , c8.zzzz
mul_pp r1 , r6.wwww , c0.xyzz
mul r2.w , r2.zzzz , r2.wwww
mul_sat r2.w , r2.wwww , c17.zzzz
mul_pp r4.z , r4.wwww , r4.wwww
mul r2 , r1 , r2.wwww
mad_pp r6.w , r4.wwww , -r4.zzzz , c8.zzzz
add_pp r5.xy , -r6.wwww , c15
mul_pp r4 , r3.xyzz , c3.xyzz
mad_sat r5.y , r5.yyyy , r3.wwww , r5.xxxx
add_pp r3 , r4 , c8.yyyy
mul r2 , r2 , r5.yyyy
mad r3 , r6.yyyy , r3 , c8.zzzz
mul_sat_pp r5.z , r5.zzzz , c17.wwww
mul r3 , r2 , r3
texld r2 , v1.xyxx , s4
mul_pp r1 , r1 , r4
mul_pp r1 , r6.zzzz , r1
mul_pp r1 , r6.wwww , r1
mul r3 , r3 , r2.wwww
mul_pp r1 , r1 , c11.wwww
mul_pp r2.w , r5.wwww , r5.zzzz
mad_pp r1 , r1 , r2.xxxx , r3
mad oC0 , r1 , r2.wwww , r0
else
mov oC0 , r0
endif


There are a few like this in the HDR tests.
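
For anyone who doesn't want to trace the dump by hand, here is a rough Python paraphrase of what the marked branch appears to do. This is a reading of the assembly, not Futuremark's source: the names are hypothetical, colours are collapsed to single floats, v6 is assumed to be a light vector, and the lighting math inside the branch is reduced to one multiply.

```python
def shade_pixel(base_color, v6_z, shadow_depths, ref_depth, lit_term):
    """Simplified model of the if/else in the ps_3_0 dump above."""
    # cmp r2.w, -v6.z, c7.x, c7.y : the flag is 1.0 only when v6.z > 0
    flag = 0.0 if -v6_z >= 0.0 else 1.0
    if flag == 0.0:
        # else path: mov oC0, r0 (skip all shadow taps and the lighting math)
        return base_color
    # 16 texldl taps from s0, each compared against v7.z with add + cmp,
    # then summed by dp4/add with weight c7.w = 0.0625 (1/16)
    assert len(shadow_depths) == 16
    shadow = sum(0.0625 * (1.0 if d - ref_depth >= 0.0 else 0.0)
                 for d in shadow_depths)
    # final mad: lit contribution scaled by the shadow factor, plus base colour
    return base_color + lit_term * shadow


# Pixel with v6.z <= 0 takes the cheap path and ignores the shadow map.
print(shade_pixel(0.2, -1.0, [0.0] * 16, 0.5, 1.0))
# Pixel with half of its taps passing the depth test gets half the lit term.
print(shade_pixel(0.2, 1.0, [0.9] * 8 + [0.1] * 8, 0.5, 1.0))
```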

Xmas
26-Jan-2006, 23:07
That is indeed quite a big if.

Chalnoth
26-Jan-2006, 23:10
My, that if statement can conceivably skip most of the shader.

Edit:
Doh! I blame my missing the reply on an officemate to whom I explained the issue.

ERK
27-Jan-2006, 01:05
Sorry... Could somebody please remind me, do G70-class chips use _pp?
Thanks,
ERK

Chalnoth
27-Jan-2006, 01:11
Sorry... Could somebody please remind me, do G70-class chips use _pp?
Thanks,
ERK
Yes. But the performance difference remains much smaller than it was for the NV3x. All NV4x (G7x-included) still gain from reduced register pressure, and also have the capability to execute partial precision normalization for free.

ERK
27-Jan-2006, 01:14
Thanks, Chal. Seeing all those _PPs in that shader reminded me that there may be some performance help from such for 7800GTX in 3DMark06, but perhaps as you say it is small.

Jawed
27-Jan-2006, 01:17
Yes.

There's the register bandwidth problem to contend with in G70 - maximum of four FP32 registers as operands in any one clock - which will get in the way of certain combinations of instructions, the obvious one being dual-issued MADs:

MAD r0, r1, r2, r3
MAD r2, r1, r4, r5

can't be issued if all the source registers are FP32s - but it's fine if they're all FP16s (or some mixture of FP32/FP16, since two FP16s actually "fit" into one FP32).

Jawed
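
A toy Python model of the operand limit described above, purely as an illustration. It assumes a shared source register is read only once and that an FP16 register costs half an FP32 one; neither detail is confirmed here, and the function name is made up.

```python
def can_dual_issue(sources_a, sources_b, precision, limit=4.0):
    """Check whether two instructions fit the per-clock operand budget.

    sources_a / sources_b: source register names of each instruction.
    precision: dict mapping register name to "fp32" or "fp16".
    limit: operand budget in FP32-register equivalents (4 per the post above).
    """
    distinct = set(sources_a) | set(sources_b)
    # FP32 costs 1.0, FP16 costs 0.5 because two FP16s fit into one FP32
    cost = sum(1.0 if precision[r] == "fp32" else 0.5 for r in distinct)
    return cost <= limit


mad_a = ("r1", "r2", "r3")   # MAD r0, r1, r2, r3
mad_b = ("r1", "r4", "r5")   # MAD r2, r1, r4, r5
regs = ("r1", "r2", "r3", "r4", "r5")

all_fp32 = {r: "fp32" for r in regs}
all_fp16 = {r: "fp16" for r in regs}
print(can_dual_issue(mad_a, mad_b, all_fp32))  # five FP32 operands exceed the budget
print(can_dual_issue(mad_a, mad_b, all_fp16))  # 2.5 FP32 equivalents fit
```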

ERK
27-Jan-2006, 01:17
Follow up question for the experts. How much does _PP help the 7800GTX in the shader above?
For any willing to speculate...

poly-gone
27-Jan-2006, 04:55
Follow up question for the experts. How much does _PP help the 7800GTX in the shader above?
For any willing to speculate...
Not too much, I guess, at least to a lesser extent than the NV4x cards. The NV3x series needed it badly, the NV4x slightly (though in some cases it did help) but hopefully doesn't matter too much for the G70. More than the _pp gain, nVIDIA cards gain performance from using lookup tables over straight math. Also, the DST thing is a big performance gainer.

poly-gone
27-Jan-2006, 05:07
Probably soft shadows.
Probably, but it wouldn't make a difference if they're using 8 test samples (which is probably why they're using DB! Mwahahahaha). I don't see the water using DB either (from what's given in their whitepaper - 2 scrolling normal maps and 4 Gerstner wave functions). The Heterogenous Fog is another likely candidate, though they clearly mention that they're able to get away with just 5 samples (which wouldn't require DB).

Chalnoth
27-Jan-2006, 05:31
Probably, but it wouldn't make a difference if they're using 8 test samples (which is probably why they're using DB! Mwahahahaha).
I believe Nick mentioned somewhere in this thread that they use a custom 16-sample pattern.

poly-gone
27-Jan-2006, 05:43
I believe Nick mentioned somewhere in this thread that they use a custom 16-sample pattern.
Yes, that can be done by encoding the offsets in a 3D volume map. You use 8 test samples to check if the pixel is fully shadowed (then exit the shader quickly), otherwise fetch the remaining 8 samples to soften the edges. It doesn't require DB though, if you skip the testing.
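
A minimal Python sketch of that two-stage scheme, assuming a hypothetical sample(offset) callback that returns the stored shadow-map depth. Exiting early when all 8 test taps are lit (not only when all are shadowed) is an added assumption.

```python
def soft_shadow(sample, offsets, ref_depth):
    """Return (light factor, taps taken) for a 16-offset kernel with an 8-tap test."""
    def lit(offset):
        return 1.0 if sample(offset) >= ref_depth else 0.0

    test_sum = sum(lit(o) for o in offsets[:8])
    if test_sum == 0.0 or test_sum == 8.0:
        # All test taps agree: treat the pixel as fully shadowed or fully lit
        # and skip the remaining taps (this is where the branch pays off).
        return test_sum / 8.0, 8
    # Test taps disagree: fetch the remaining 8 to soften the edge.
    rest_sum = sum(lit(o) for o in offsets[8:])
    return (test_sum + rest_sum) / 16.0, 16


offsets = [(i % 4, i // 4) for i in range(16)]
# Stored depth is 1.0 everywhere: every tap passes, so only 8 taps are taken.
print(soft_shadow(lambda o: 1.0, offsets, ref_depth=0.5))
# Depth depends on x: the test taps disagree, so all 16 taps are taken.
print(soft_shadow(lambda o: 1.0 if o[0] < 2 else 0.0, offsets, ref_depth=0.5))
```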

mongoled
27-Jan-2006, 08:20
Slightly off topic, but could someone explain the reason Intel dual core CPU's are providing a significant boost to SM2.0 and HDR/SM3.0 scores in comparison to AMD dual core CPU's? I just saw the review over at AMDZone

http://www.amdzone.com/modules.php?op=modload&name=Sections&file=index&req=viewarticle&artid=229&page=3

Breakdown of results:

SM2.0
P4 840D+7800GT: 1458
FX-60+7800GT: 1266

P4 840D + X1800XL: 1251
FX-60 + X1800XL: 1178

HDR/SM3.0
P4 840D + 7800GT: 1451
FX-60 + 7800GT: 1264

P4 840D + X1800XL: 1317
FX-60 + X1800XL: 1212

The CPU scores show a completely different picture

CPU Score
P4 840D + 7800GT: 1416
FX-60 + 7800GT: 1891

P4 840D + X1800XL: 1388
FX-60 + X1800XL: 1863

-EDIT-

silly mistake :)

Chalnoth
27-Jan-2006, 08:34
You've got a typo in your CPU score results, mongoled :)

But yeah, that definitely seems very strange to me.

ANova
27-Jan-2006, 09:17
You've got a typo in your CPU score results, mongoled :)

But yeah, that definitely seems very strange to me.

Probably has to do with the chipsets, Intel's tend to be very good and fast.

Chalnoth
27-Jan-2006, 09:29
Probably has to do with the chipsets, Intel's tend to be very good and fast.
Both used the nForce4. So it more likely has something to do with hyperthreading. But what, I don't know.

mongoled
27-Jan-2006, 11:21
You've got a typo in your CPU score results, mongoled :)

But yeah, that definitely seems very strange to me.
Thanks for that, sorted it out. Still interested to see if someone else can shed more light on this as the difference in the scores is quite obvious. Would the CPU's be doing work with regards to SM2.0 and SM3.0?

Neeyik
27-Jan-2006, 14:07
Thread closed for a minute to do some post separating.

Edit: Threads pertaining to the discussion about SSAA and MSAA have now been moved to here:

http://www.beyond3d.com/forum/showthread.php?t=27848

If anyone can think of a better title for the thread though, please let me know! Don't forget that this thread is for discussing Futuremark's 3DMark06.

Fox5
27-Jan-2006, 15:59
Both used the nForce4. So it more likely has something to do with hyperthreading. But what, I don't know.

Weren't both cpus used dual core cpus?

Chalnoth
27-Jan-2006, 18:01
Weren't both cpus used dual core cpus?
Yes. But that doesn't necessarily mean that hyperthreading wouldn't have had an effect. It is, after all, the primary advantage available for the P4.

OpenGL guy
27-Jan-2006, 20:34
Yes, that can be done by encoding the offsets in a 3D volume map. You use 8 test samples to check if the pixel is fully shadowed (then exit the shader quickly), otherwise fetch the remaining 8 samples to soften the edges.
Just because 8 samples lie within, or out of, the shadow doesn't mean all 16 will! A better solution is to generate an edge mask so that only pixels within the edge mask get all 16 samples. Pixels outside the edge mask only need 1 sample to determine whether they are shadowed or not.
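
A rough CPU-side sketch of the two-stage sampling being argued about here, in Python rather than shader code. The tap layout, the function names (`make_map`, `lit`, `shadow_factor`) and the choice to exit early on "all lit" as well as "all shadowed" are assumptions made for illustration, not anything taken from 3DMark06 or a driver. It takes 8 outer test taps, exits early if they all agree, and otherwise fetches 8 more.

```python
# Illustrative model only: the shadow map is a 2D list of depths seen from the light.
# Tap offsets (in texels) are hypothetical; a real shader would read them from a
# texture such as the 3D volume map mentioned in the quote.
OUTER_OFFSETS = [(-2, -2), (0, -2), (2, -2), (-2, 0), (2, 0), (-2, 2), (0, 2), (2, 2)]
INNER_OFFSETS = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def make_map(width, height, occluder_texels, occluder_depth=0.2, far_depth=1.0):
    """Build a toy shadow map with an occluder covering the given (x, y) texels."""
    return [[occluder_depth if (x, y) in occluder_texels else far_depth
             for x in range(width)] for y in range(height)]


def lit(shadow_map, x, y, receiver_depth):
    """Return 1 if the receiver is lit at texel (x, y), else 0. Coordinates are clamped."""
    height, width = len(shadow_map), len(shadow_map[0])
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return 1 if receiver_depth <= shadow_map[y][x] else 0


def shadow_factor(shadow_map, x, y, receiver_depth):
    """Return (fraction of taps lit, number of taps fetched)."""
    outer = sum(lit(shadow_map, x + dx, y + dy, receiver_depth)
                for dx, dy in OUTER_OFFSETS)
    if outer == 0 or outer == len(OUTER_OFFSETS):
        # Early out: the 8 test taps agree, so the other 8 are never fetched.
        return outer / len(OUTER_OFFSETS), len(OUTER_OFFSETS)
    inner = sum(lit(shadow_map, x + dx, y + dy, receiver_depth)
                for dx, dy in INNER_OFFSETS)
    return (outer + inner) / 16, 16


# A half-plane occluder: pixels away from the edge take 8 taps, the edge takes 16.
half_plane = make_map(9, 9, {(x, y) for x in range(4) for y in range(9)})
# A single-texel occluder that falls between the outer taps of pixel (4, 4).
speck = make_map(9, 9, {(5, 4)})
```

With the half-plane map, `shadow_factor(half_plane, 4, 4, 0.5)` fetches all 16 taps while pixels well inside or outside the shadow stop at 8. With the speck map, the 8 outer taps around (4, 4) all come back lit and the early-out fires even though an inner tap would have been shadowed, which is the case the objection above is about.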

Geo
27-Jan-2006, 20:38
More Fetch4/3dm06 dramatoid:

http://techreport.com/onearticle.x/9324

Moloch
27-Jan-2006, 20:44
Quack:!: ;)


edited so it's clear I'm joking.

Farid
27-Jan-2006, 21:03
More Fetch4/3dm06 dramatoid:

http://techreport.com/onearticle.x/9324
I think it's time for Ati to admit that they lost the IQ crown in favor of the technologically superior nVIDIA™ cards.

This bug on a particular surface in a particular test of 3DMARKS 2006™ is a clear indication that Ati are not only cheating with their drivers, but are also deceiving their customers, their friends and all of humankind, plus Rys (since he's not exactly a part of humankind, he's like an evolved hamster according to some folks).

Here's another piece of unrelated and inconclusive anecdotal evidence to go with the TR one: the other day I saw an ati logo in a magazine, and the logo clearly had an image quality issue, like color dithering or something.
You know what that means? Ati cheats with their Drivers. Yeah, even the paper drivers!

I think it's time to call for a general boycott of all Ati products. I'm starting an Online Petition right away!

And before someone points out that it's maybe a simple anecdotal driver bug, and that the TR wanted a few cheap clicks, let me tell you this, you folks of little faith... You're probably right.

Jawed
27-Jan-2006, 21:22
Funny that, don't we already have Hanners's word that DF24/Fetch4 isn't actually working on X1900XT and is only working on X1600XT (no comments either way for X1300)?

Jawed

Chalnoth
27-Jan-2006, 21:25
More Fetch4/3dm06 dramatoid:

http://techreport.com/onearticle.x/9324
That looks like precision problems. Interesting. Well, these were beta drivers for the X1900, so no conclusion should be drawn for at least two weeks.

Geo
27-Jan-2006, 21:27
Funny that, don't we already have Hanners's word that DF24/Fetch4 isn't actually working on X1900XT and is only working on X1600XT (no comments either way for X1300)?

Jawed

Well, I asked him that: http://www.elitebastards.com/forum/viewtopic.php?t=13550

Xmas
27-Jan-2006, 21:31
Just because 8 samples lie within, or out of, the shadow doesn't mean all 16 will! A better solution is to generate an edge mask so that only pixels within the edge mask get all 16 samples. Pixels outside the edge mask only need 1 sample to determine whether they are shadowed or not.
Just because a pixel isn't inside the edge mask doesn't mean it's not a penumbra pixel. If you take 8 "outer" samples first, the results are likely very similar to those of an edge mask. And generating an edge mask needs sampling from the shadow map, too.

Jawed
27-Jan-2006, 21:48
In (corrected link):

http://www.ati.com/developer/SIGGRAPH05/ShadingCourse_ATI.pdf

page 21 it says:

Shadow map edge map must be dilated to at least the width of the filtering kernel

It seems to me that the edge map encompasses the entire penumbra, rather than being "1 pixel" thick along "edges".

Jawed
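
A small sketch of what a dilated edge mask could look like, to make the "encompasses the entire penumbra" reading concrete. This is an assumption-laden toy, not the method from the ATI course notes: `edge_mask` and `taps_needed` are hypothetical names, the input is a per-texel occluded/not-occluded grid, and `radius` stands in for however far the filtering kernel reaches in texels.

```python
def edge_mask(occluded, radius):
    """Mark every texel that has a texel of the opposite state within `radius`.

    `occluded` is a 2D list of bools. The result is equivalent to finding the
    shadow edge and dilating it by `radius` texels (square neighbourhood).
    """
    height, width = len(occluded), len(occluded[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if occluded[ny][nx] != occluded[y][x]:
                        mask[y][x] = True
    return mask


def taps_needed(mask, full_taps=16):
    """Total shadow-map fetches if masked texels take `full_taps` and the rest take 1."""
    return sum(full_taps if inside else 1 for row in mask for inside in row)


# One row of 10 texels, the left five occluded.
row = [[x < 5 for x in range(10)]]
narrow = edge_mask(row, 1)
wide = edge_mask(row, 2)
```

With `radius=1` only the two texels touching the edge are masked; with `radius=2` the mask grows to four texels, so the band widens with the kernel rather than staying one texel thick. In this toy row `taps_needed(wide)` comes to 70 fetches against 160 for sampling all 16 taps everywhere.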

Moloch
28-Jan-2006, 03:10
I think it's time for Ati to admit that they lost the IQ crown in favor of the technologically superior nVIDIA™ cards.

This bug on a particular surface in a particular test of 3DMARKS 2006™ is a clear indication that Ati are not only cheating with their drivers, but are also deceiving their customers, their friends and all of humankind, plus Rys (since he's not exactly a part of humankind, he's like an evolved hamster according to some folks).

Here's another piece of unrelated and inconclusive anecdotal evidence to go with the TR one: the other day I saw an ati logo in a magazine, and the logo clearly had an image quality issue, like color dithering or something.
You know what that means? Ati cheats with their Drivers. Yeah, even the paper drivers!

I think it's time to call for a general boycott of all Ati products. I'm starting an Online Petition right away!

And before someone points out that it's maybe a simple anecdotal driver bug, and that the TR wanted a few cheap clicks, let me tell you this, you folks of little faith... You're probably right.
Well said mate :grin:

Xmas
28-Jan-2006, 12:55
It seems to me that the edge map encompasses the entire penumbra, rather than being "1 pixel" thick along "edges".
That's not what I meant. As you are using a finite-resolution shadow map, you can miss "single pixel penumbras". It's almost the same as with taking 8 "outer" samples first.

btw, your link doesn't work.

Jawed
28-Jan-2006, 14:12
I corrected the link - copy and pasting from another posting used the truncated-display version :cry:

I can't figure out what a single pixel penumbra is. Is that a penumbra that should be one pixel wide? Or do you mean penumbras due to particles? :oops:

Jawed

Xmas
28-Jan-2006, 14:20
I mean a penumbra that is just (less than) a single pixel wide. Both approaches have the same problem: sampling gaps.

Jawed
28-Jan-2006, 14:42
Thanks.

What's puzzling me is whether these algorithms are attempting to vary the penumbra width as the depth-ratio: light-blocker/light-receiver varies. The edge map on page 21 sort of looks like it varies in thickness.

It seems to me that the kernel should vary in diameter as the ratio changes. I'm not sure if that would help with the < single pixel penumbra problem, though.

Jawed
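
For reference, one common way to make the kernel follow the blocker/receiver depth ratio is the similar-triangles estimate used in percentage-closer soft shadows. Nothing in this thread says the algorithms under discussion do this; the sketch below only shows the arithmetic, with `penumbra_width` as a hypothetical helper and all depths measured from the light.

```python
def penumbra_width(receiver_depth, blocker_depth, light_width):
    """Similar-triangles penumbra estimate for an area light.

    width = (receiver_depth - blocker_depth) * light_width / blocker_depth
    Depths are distances from the light; blocker_depth must be positive.
    """
    if blocker_depth <= 0:
        raise ValueError("blocker_depth must be positive")
    return (receiver_depth - blocker_depth) * light_width / blocker_depth


# Receiver touching the blocker: hard edge. Receiver far behind it: wide penumbra.
contact = penumbra_width(10.0, 10.0, 2.0)
distant = penumbra_width(10.0, 5.0, 2.0)
```

The estimate shrinks to zero where the receiver meets the blocker, so a kernel sized this way would be narrowest exactly where sub-pixel penumbras occur. That suggests it changes where the sampling gaps show up rather than removing them.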

Geo
04-Mar-2006, 17:22
Hexus is up with a long, thoughtful analysis: http://www.hexus.net/content/item.php?item=4599

There are many interesting points, technical and, err "political" (for lack of a better term) to point at, but from a summarization pov what really struck me as "the money shot" is about mid-article:


Can you reasonably engineer a faithful and unbiased "Gamer's Benchmark" under those conditions? I'd argue that you reasonably can't. The compromises you have to make would be too polarised, and I think that's entirely visible with 3DMark06.

digitalwanderer
04-Mar-2006, 17:36
A great write up, nicely done Rys! :D

Kanyamagufa
04-Mar-2006, 21:29
A great write up, nicely done Rys! :D

QFT - Very good article Rys. The evil git in me would love to see FM's reaction to it.