FarCry Performance Revisited: ATI Strikes Back with SM 2.0b

If I were a betting person (which I ain't), my money would be on NV40 being faster. I base this solely on the use of Perspective Shadow Maps.

Assuming Futuremark use the built-in depth texture sampling and PCF on NV hardware, emulating it at the same quality on ATI hardware is going to require a lot of extra shader ops.

Which I guess brings up another issue: is using a DX feature unique to one manufacturer, one that significantly accelerates rendering of a key feature in the benchmark, fair?
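
To give a feel for the cost, here's a rough sketch of what that emulation might look like (hand-written PS 2.0-style HLSL, not tested; the sampler and uniform names are made up). On NV hardware the whole filtered tap is a single tex2Dproj into the depth texture; on hardware without depth textures you end up writing depth to a colour target and doing the fetches, compares and weighting yourself:

Code:
// NV path: depth texture with hardware compare + PCF -- one instruction:
//   float lit = tex2Dproj(ShadowMap, shadowCoord);

// Emulated path: depth stored in a colour texture (e.g. a float format),
// compare and bilinear weighting done in the shader.
sampler2D ShadowMap;          // depth written out as colour
float2    ShadowMapSize;      // e.g. (1024, 1024)

float ShadowTapEmulated(float4 shadowCoord)
{
    float2 uv    = shadowCoord.xy / shadowCoord.w;     // projective divide
    float  depth = shadowCoord.z  / shadowCoord.w;     // receiver depth (bias omitted)

    float2 texel = 1.0 / ShadowMapSize;
    float2 f     = frac(uv * ShadowMapSize - 0.5);     // bilinear weights

    // four point fetches, four compares (1 = lit, 0 = shadowed)
    float s00 = step(depth, tex2D(ShadowMap, uv).r);
    float s10 = step(depth, tex2D(ShadowMap, uv + float2(texel.x, 0)).r);
    float s01 = step(depth, tex2D(ShadowMap, uv + float2(0, texel.y)).r);
    float s11 = step(depth, tex2D(ShadowMap, uv + texel).r);

    // blend the compare results the way bilinear filtering would
    return lerp(lerp(s00, s10, f.x), lerp(s01, s11, f.x), f.y);
}

That's roughly four fetches plus a pile of ALU ops per tap, versus one instruction on the hardware path.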
 
ERP said:
If I were a betting person (which I ain't), my money would be on NV40 being faster. I base this solely on the use of Perspective Shadow Maps.

Assuming Futuremark use the built-in depth texture sampling and PCF on NV hardware, emulating it at the same quality on ATI hardware is going to require a lot of extra shader ops.

Which I guess brings up another issue: is using a DX feature unique to one manufacturer, one that significantly accelerates rendering of a key feature in the benchmark, fair?
I think it depends on what the benchmark is demonstrating. Game development conditions? A technique? In the case of the former, emulating a feature that other hardware has may not be particularly fair -- unless it is deemed necessary. It doesn't make sense for a game developer to force one video card to be slower without giving the option of an alternative technique. In the case of demonstrating a technique, it is totally fair, as it shows how fast various hardware can render the effect at the same quality.
 
I don't know about the average PC developer, but if I were implementing PSM on both pieces of hardware (and FWIW I think it's probably the right way to go), I would implement it using NV's depth textures and emulate it on ATI hardware. I might give an option (or just default) to reduce the quality on ATI cards.

In a lot of ways it's just as unfair to not use the feature that NV provides, since it's there just for this purpose.

The real question is what 3DMark is trying to simulate (and I think it's game-like environments), so it comes down to what the Futuremark developers think subsequent games will do, and how they will implement it. Multiple paths for cards with depth texture support and those without, or just ignore the depth texture support and have a single path for both sets of cards? I can see valid arguments either way.
 
Bouncing Zabaglione Bros. said:
Luminescent said:
If clocked equally, NV40 would open up a can on R420.

But they don't clock equally, do they? Probably one of the reasons why ATI left out SM3.0 was so that they would have that clock speed advantage.

So when ATI is forced to move to SM3.0, who do you guys think will have the upper hand? That will be interesting to see, since we won't have this precision / feature support disparity that we do now. Unless Nvidia decides to support SM3.0b or some shit and increase their transistor count even more. :rolleyes:
 
trinibwoy said:
Bouncing Zabaglione Bros. said:
Luminescent said:
If clocked equally, NV40 would open up a can on R420.

But they don't clock equally, do they? Probably one of the reasons why ATI left out SM3.0 was so that they would have that clock speed advantage.

So when ATI is forced to move to SM3.0, who do you guys think will have the upper hand? That will be interesting to see, since we won't have this precision / feature support disparity that we do now. Unless Nvidia decides to support SM3.0b or some shit and increase their transistor count even more. :rolleyes:

I think ATI has a better hardware team than Nvidia.
 
trinibwoy said:
So when ATI is forced to move to SM3.0, who do you guys think will have the upper hand? That will be interesting to see, since we won't have this precision / feature support disparity that we do now. Unless Nvidia decides to support SM3.0b or some shit and increase their transistor count even more. :rolleyes:

Depends on whether ATI drop to a smaller process for SM3.0, which would be another advantage.

My point was there is nothing useful in "if NV40 was clocked the same as R420", because it isn't. It's just as pointless as replying to that statement with "if R420 was ten times faster". Fact is that NV40 can't clock as high as R420 because it is a bigger, more complex chip, mostly due to the extra SM3.0 transistors.

Do SM3.0's efficiencies offset the clockspeed penalty of designing your chip with those extra transistors? Doesn't seem to be the case so far.
 
maosee said:
SM2b is for cheerleaders.

So what's SM3.0? Is that for male cheerleaders?


Anyway, if the NV40 was clocked as fast as the X800 XT PE, they'd both be so bandwidth-limited it wouldn't even be funny.

Just look at the X800 Pro vs the 6800 GT. The Pro is 12x1 at 475MHz vs the 6800 GT at 350MHz (?) 16x1.

The 6800 GT is slightly faster. But is that because it's more efficient, or because it has a 50MHz (100MHz effective) memory speed advantage?

I would suspect the R420 series to be slightly slower clock for clock when SM3.0 and PS2.0b plus instancing are taken into account.

But not so much that the NV40 would kick the R420's butt.
 
Ostsol said:
Before DX9 there was simply no FPxx in the programmable pixel pipeline. Everything was some sort of fixed point precision. PS1.x cards had 8 bits of mantissa precision, plus the sign bit. ATI's implementation of PS1.4 allowed the same precision, but more range. Instead of a range of [-1,1], they had [-8,8].
I think you'll find that ATI's implementation of PS1.4 had additional precision as well as range - remember that a major advance of PS1.4 was generalised dependent texture lookups. If a PS1.4 implementation only had 8 bits of precision you would get no bilinear filtering on a 256x256 texture on a dependent read, and wouldn't even be able to address all the texels in a 512x512 texture individually, which wouldn't exactly be great...
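To put rough numbers on the dependent-read point (back-of-the-envelope only): 8 bits of fixed-point precision gives 2^8 = 256 distinct coordinate values per axis. That's exactly one value per texel of a 256x256 map, with nothing left over for the fractional position that bilinear filtering needs, and only half of the 512 texel positions along one axis of a 512x512 map. Which is why the dependent-read path has to carry more than 8 bits.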
 
andypski said:
Ostsol said:
Before DX9 there was simply no FPxx in the programmable pixel pipeline. Everything was some sort of fixed point precision. PS1.x cards had 8 bits of mantissa precision, plus the sign bit. ATI's implementation of PS1.4 allowed the same precision, but more range. Instead of a range of [-1,1], they had [-8,8].
I think you'll find that ATI's implementation of PS1.4 had additional precision as well as range - remember that a major advance of PS1.4 was generalised dependent texture lookups. If a PS1.4 implementation only had 8 bits of precision you would get no bilinear filtering on a 256x256 texture on a dependent read, and wouldn't even be able to address all the texels in a 512x512 texture individually, which wouldn't exactly be great...
True... If I still had my Radeon 8500, I'd experiment with this.
 
ERP said:
I don't know about the average PC developer, but if I were implementing PSM on both pieces of hardware (and FWIW I think it's probably the right way to go), I would implement it using NV's depth textures and emulate it on ATI hardware. I might give an option (or just default) to reduce the quality on ATI cards.

This seems to be the general trend. Carmack is going to use PCF, and most probably UE3.0 will also use PCF (if anyone has any exact info on UE3.0's shadowing methods, please post).
I think the question should not be "How much slower is ATi at emulating PCF?", but rather "How long until ATi has its own PCF or other shadow mapping hardware functionality?".
Since shadow maps are most probably going to replace stencil shadows in the next generation of engines, this seems to be what ATi should be doing. The problem could be that NVIDIA's PCF is patented.

The real question is what 3DMark is trying to simulate (and I think it's game-like environments), so it comes down to what the Futuremark developers think subsequent games will do, and how they will implement it. Multiple paths for cards with depth texture support and those without, or just ignore the depth texture support and have a single path for both sets of cards? I can see valid arguments either way.

Hard to say what 3DMark is going to do, since their policy has generally been to make the rendering as 'neutral' as possible, not using any kind of vendor-specific features. This hasn't been completely representative of games in the past (*cough*Doom3*cough*), and it may be even less representative in this case. The question is, do they think it's worth an exception? I wouldn't be surprised if they didn't.
 
Scali said:
ERP said:
I don't know about the average PC developer, but if I were implementing PSM on both pieces of hardware (and FWIW I think it's probably the right way to go), I would implement it using NV's depth textures and emulate it on ATI hardware. I might give an option (or just default) to reduce the quality on ATI cards.

This seems to be the general trend. Carmack is going to use PCF...
ERP, Scali: Am I missing something?
Carmack said (http://www.gamedev.net/community/forums/topic.asp?topic_id=266373):
With shadow buffers, the new versions that I've been working with, there's a few things that have changed since the time of the original Doom 3 specifications. One thing is that we have fragment programs now, so we can do pretty sophisticated filtering on there, and that turns out to be the key critical thing. Even if you take the built-in hardware percentage closer filtering [PCF], and you render obscenely high resolution shadow maps (2000x2000 or more than that), it still doesn't look good. In general, it's easy to make them look much worse than the stencil shadow volumes when you're in that basic kind of hardware-only level filtering on it. You end up with all the problems you have with biases, and pixel grain issues on there, and it's just not all that great. However, when you start to add a bit of randomized jitter to the samples, you have to take quite a few samples to make it look decent, it changes the picture completely. Four randomized samples is probably going to be our baseline spec for normal kind of shipping quality on the next game. That looks pretty good.
Carmack seems to have spent a LOT of time with shadow maps, and is saying built-in PCF doesn't look very good. I sort of wonder if PCF + multiple jittered samples might look a bit better than jittered samples without PCF, but probably not (I'm sure Carmack thought of that and tried it).

In any case, the shadow master himself wants to use 4 jittered samples instead for the baseline, so I don't think you guys are right on this one. However, it does seem like pretty minimal effort is needed to do PCF in hardware, so I do wish ATI had included it.
 
So what's SM3.0? Is that for male cheerleaders?
Yep, cheerleaders with "a little somethin' extra", ifyaknowutimean.
I have nothing productive to add to this conversation.
 
You use the "free" PCF in conjunction with jittered samples.

I have no idea what Carmack is intending for his next-generation engine, but multiplying the number of samples you have by 4 for free isn't something I'd ignore. It's hard to say from the quote out of context, but I'd guess Carmack is suggesting using 4 jittered samples in addition to the hardware PCF, for a total of 16 samples.

Shadow maps in general have all sorts of issues that are hard to solve in the general case; PCF just helps with solving one of them. They do, however, suck slightly less than shadow volumes IMO.
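
Something along these lines is what I have in mind -- just a sketch (HLSL-ish, untested; the jitter offsets and names are invented). Each tex2Dproj into an NV depth texture already comes back bilinear-filtered by the hardware PCF, so 4 jittered taps give roughly 16 effective samples:

Code:
sampler2D ShadowMap;        // depth texture: hardware does compare + 4-tap PCF per fetch
float2    JitterScale;      // jitter radius in shadow-map UV space

float ShadowJitteredPCF(float4 shadowCoord, float2 jitter[4])
{
    float lit = 0.0;
    for (int i = 0; i < 4; i++)
    {
        float4 c = shadowCoord;
        c.xy += jitter[i] * JitterScale * c.w;   // offset in post-divide UV space
        lit += tex2Dproj(ShadowMap, c);          // hardware PCF supplies the inner 4 taps
    }
    return lit * 0.25;
}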
 
When shadow mapping with multiple jittered samples, using bilinear filtered PCF versus point sampled PCF makes surprisingly little difference in appearance. Here are some images from some tests I did.

16-sample point-sampled PCF:
[Image: 16SamplePointPCF.JPG]

16-sample bilinear-filtered PCF:
[Image: 16SampleBilinearPCF.JPG]

Before taking a look at the differences, I was inclined to think that using bilinear filtered PCF on the jittered samples would look significantly better, and that it was a mistake on ATI's part not to include hardware support for it. Now I am not so sure that bilinear filtered PCF is very useful, because it makes no visible difference when using jittered sampling, and because shadow maps sampled with just bilinear PCF look ugly compared to ones sampled with jittered PCF.
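
For reference, the point-sampled variant is roughly along these lines -- a sketch of the idea only, not the exact shader behind the shots above (the names and the jitter table are invented). Each of the 16 jittered taps is a single fetch and compare, with no per-tap bilinear weighting:

Code:
sampler2D ShadowMap;       // depth stored in a colour texture
float2    JitterScale;     // jitter radius in shadow-map UV space

float ShadowJitteredPoint(float4 shadowCoord, float2 jitter[16])
{
    float2 uv    = shadowCoord.xy / shadowCoord.w;
    float  depth = shadowCoord.z  / shadowCoord.w;

    float lit = 0.0;
    for (int i = 0; i < 16; i++)
        lit += step(depth, tex2D(ShadowMap, uv + jitter[i] * JitterScale).r);

    return lit / 16.0;
}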
 
Bouncing Zabaglione Bros. said:
Luminescent said:
If clocked equally, NV40 would open up a can on R420.

But they don't clock equally, do they? Probably one of the reasons why ATI left out SM3.0 was so that they would have that clock speed advantage.
Just giving a one-sided fact ;). I know from a consumer's standpoint it's quite insignificant, although I believe the 6800 Ultra Extreme wins a good number of SM 2.0 benchmarks against the XT PE without SM 3.0 optimized programming.
 
Luminescent said:
although I believe the 6800 Ultra Extreme wins a good number of SM 2.0 benchmarks against the XT PE without SM 3.0 optimized programming.
Going by this, I wonder if there is any significance with the new bolded text :?: ;)
 