Which path will NV40 use in Doom3?

I'd rather concentrate on the IQ than on what path each card is using.

Take the new Futuremark as an example:

The HLSL shaders are dynamically built and runtime compiled using the most optimal compilation target for the installed hardware. Of course, all compilations produce the same rendering.

Then of course there's the question of what you consider to be the same rendering :) But I think it's clear that we're moving away from cards using the same paths.
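For illustration, a minimal sketch of what that runtime compilation looks like with D3DX (my own sketch, not Futuremark's code; the helper function and its name are made up):

Code:
#include <d3dx9.h>

// Hypothetical helper: compile one piece of HLSL against the best
// pixel shader profile the installed card reports. Not Futuremark's
// actual code, just the mechanism they describe.
IDirect3DPixelShader9* CompileForInstalledHardware(
    IDirect3DDevice9* device, const char* hlslSrc, UINT srcLen)
{
    // e.g. "ps_2_0" on an R3xx, "ps_2_a" on an NV3x, "ps_3_0" on NV40
    const char* profile = D3DXGetPixelShaderProfile(device);

    ID3DXBuffer* code = NULL;
    ID3DXBuffer* errors = NULL;
    // Same HLSL in, different compilation target per card; the
    // compiled results are expected to render identically.
    if (FAILED(D3DXCompileShader(hlslSrc, srcLen, NULL, NULL, "main",
                                 profile, 0, &code, &errors, NULL))) {
        if (errors) errors->Release();
        return NULL;
    }

    IDirect3DPixelShader9* shader = NULL;
    device->CreatePixelShader(
        (const DWORD*)code->GetBufferPointer(), &shader);
    code->Release();
    return shader;
}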
 
Mordenkainen said:
Thanks Nick and Dave.

radar: here's my opinion on why the general path should be used: because benchmarks are always artificial. Even HardOCP's are artificial. They say people don't "play" 3DMark03; I say we don't play "UT flyby" or "Q3 Crunch" either. We don't even play their "FRAPS run".

What's worse, IMHO, is that they change image quality options, rez, etc. to "hit the sweet spot". What makes them think that what they consider adequate matches my experience? And perhaps I'd prefer to drop down a rez rung rather than reduce AA (or vice-versa). On top of that, how can they say they test how gamers play when a lot of people are going to have a CPU/RAM/mobo/whatever different from what they test?

What this all means is that, unless you happen to agree with their IQ-vs-fps standards, their benchmark numbers are next to useless. And since they can't test every single game gamers are prone to play, this also means (IMHO) that striving to "mimic gamers" is bound to fail, no matter what.

The other way is to go for "you have to be this tall" benchmarks. Then you compare fps & IQ. Like Nick said, this is the only way to compare video cards' power: give them an equal (or as similar as possible) workload and judge the fps and IQ results.

I don't read a p/review to know how many fps I'm going to get in game X. I read them to know which card is better overall so I can make an informed decision. I want to know worst-case fps scenarios, because the only way out from there is up, not "sweet spot" scenarios that may clash with what I'm interested in.

But of course, there will be people who disagree. :shrug:

You can still make an informed decision. You can get best, worst and average case figures for each card/path, determine IQ for the resolutions that interest you, then compare that data to what the competition can do. How each card delivers the results is irrelevant so long as IQ is up to scratch (nothing missing that should be there) and no cheating is going on.
 
Maybe it's already been mentioned (if so, I didn't notice) but how is the ARB2 path going to give an apples-to-apples benchmarking scenario when one board will render the game at FP32 and the other at FP24?
 
John Reynolds said:
Maybe it's already been mentioned (if so, I didn't notice) but how is the ARB2 path going to give an apples-to-apples benchmarking scenario when one board will render the game at FP32 and the other at FP24?

That is true. And how about the upcoming Far Cry patch, which will use SM3.0 for the NV4X and SM2.0 for the R420?

And yes, this has been mentioned before. Not in this thread but the last time we had this discussion :)
 
Bjorn said:
And how about the upcoming Far Cry patch, which will use SM3.0 for the NV4X and SM2.0 for the R420?

Haven't these issues already presented themselves anyway? How many people have been taking Far Cry benchmarks without realising they were really looking at PS1.1 on the FX and PS2.0 on ATI? And do you know what differences there are in the render paths for Halo, even when you select the PS2.0 path? There are differences between ATI and NVIDIA there too (NVIDIA couldn't do the predator effects, for example).

At least with the latest FC patch it appears they are aligning NV4x and ATI somewhat closer, in that at least they are rendering mathematical equivalents (well, the game requests it; gawd knows what is actually rendered), but the SM3.0 path is only used to improve performance, which IMO is a valid comparison (this is the equivalent of, say, SS:SE using two textures per pass on a GF2 / V3 and four textures per pass on a GF3/4 / R8500).

In terms of usage I think the majority of differences will just be mathematically equivalent shaders but, for instance, some shaders may be rolled up and branched in SM3.0 with multiple unrolled shaders provided for SM2.0 - of course, this is assuming SM3.0 branching is actually quicker than providing the unrolled shaders.
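As a toy sketch of that rolled-vs-unrolled idea (a made-up example, not code from Far Cry or anything shipping): the same multi-light sum written once with a loop for ps_3_0, and with the bound baked in so it fully unrolls for ps_2_0.

Code:
// Toy illustration only. One looped shader for ps_3_0 versus one
// fixed-count shader per light total for ps_2_0; identical maths.
static const char* g_loopedPS = // compile with profile "ps_3_0"
    "int    g_numLights;                                        \n"
    "float3 g_lightDir[4];                                      \n"
    "float4 g_lightColor[4];                                    \n"
    "float4 main(float3 n : TEXCOORD0) : COLOR                  \n"
    "{                                                          \n"
    "    float4 c = 0;                                          \n"
    "    for (int i = 0; i < g_numLights; i++)   // real loop   \n"
    "        c += g_lightColor[i] *                             \n"
    "             saturate(dot(normalize(n), g_lightDir[i]));   \n"
    "    return c;                                              \n"
    "}                                                          \n";

static const char* g_twoLightPS = // compile with profile "ps_2_0"
    "float3 g_lightDir[2];                                      \n"
    "float4 g_lightColor[2];                                    \n"
    "float4 main(float3 n : TEXCOORD0) : COLOR                  \n"
    "{                                                          \n"
    "    float4 c = 0;                                          \n"
    "    for (int i = 0; i < 2; i++)  // unrolled at compile    \n"
    "        c += g_lightColor[i] *                             \n"
    "             saturate(dot(normalize(n), g_lightDir[i]));   \n"
    "    return c;                                              \n"
    "}                                                          \n";
// The app keeps one shader and just sets g_numLights on SM3.0, but
// has to manage a family of shaders (one per light count) on SM2.0.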

What really concerns me for SM3.0 testing is having fixed tests that have branches but drivers replacing them with unrolled shaders internally because they know the path the benchmark will take - you may think you are testing SM3.0 branching when you are not at all (which is why we need tests where we can vary the branching points).

Mordenkainen said:
Hmm? I was under the impression you are always running under one path or another (either general paths such as ARB/ARB2 or chip-dedicated ones like nv10/nv20/nv30/r200/parhelia).

In terms of what you ask the game to do, you are, but that doesn't mean that the drivers aren't detecting that particular shader and replacing it with their own internal one. Each one of the different "paths" (which, as I said, pretty much only relates to different shader code for the unified lighting model) may well be detected and replaced with a single piece of code within the drivers.
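To illustrate what "detected and replaced" means mechanically, a purely hypothetical sketch (no vendor's real driver code, just the mechanism under discussion):

Code:
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical driver-side shader substitution: hash the shader
// bytecode the application hands over, and on a match with a known
// game shader silently hand back a hand-tuned internal one.
struct ShaderBlob { const std::uint32_t* tokens; std::size_t count; };

static std::uint64_t HashShader(const ShaderBlob& s)
{
    std::uint64_t h = 14695981039346656037ull;  // FNV-1a basis
    for (std::size_t i = 0; i < s.count; ++i) {
        h ^= s.tokens[i];
        h *= 1099511628211ull;                  // FNV-1a prime
    }
    return h;
}

// Known application shaders mapped to internal replacements.
static std::unordered_map<std::uint64_t, ShaderBlob> g_replacements;

const ShaderBlob& SelectShader(const ShaderBlob& fromApp)
{
    auto it = g_replacements.find(HashShader(fromApp));
    return (it != g_replacements.end()) ? it->second : fromApp;
}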
 
Like Dave said, questions about varied rendering paths and other optimizations done by the developer are fine. I don't see a point in trying to dodge that reality. If you want to benchmark a game, then do a real-world test; otherwise, guess why synthetics were made?

Abrash wrote much of the assembly for Quake, IIRC. I believe the function is Carmack's work. I didn't look at the source of the function at all (I instantly knew it when I saw the name); if it uses ASM, that's likely Mike's work.
 
DaveBaumann said:
What really concerns me for SM3.0 testing is having fixed tests that have branches but drivers replacing them with unrolled shaders internally because they know the path the benchmark will take - you may think you are testing SM3.0 branching when you are not at all (which is why we need tests where we can vary the branching points).
If a driver can replace a shader with dynamic branching with another one with static or no branching at all..then:
1) the branch wasn't dynamic at all..so the driver is doing a good job
2) or the branch is really dynamic but it takes just one path all the time..well, in this case that shader test was fucked up from the start
3) I know you're thinking about cheating..but (theoretically) the driver can't cheat on a shader with real dynamic behaviour..at least I don't think you want the driver to switch shaders on a per-pixel basis :)

If Nalu is shaded with a single PS3.0 shader for real..then no application that resembles a test like that can be cheated via shader replacement, IMHO.

Each one of the different "paths" (which, as I said, pretty much only relates to different shader code for the unified lighting model) may well be detected and replaced with a single piece of code within the drivers.
So is the driver going to split one drawindexedprimitive() call into several calls, batching different geometry with different shaders?
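Just to make the question concrete, a hypothetical sketch of what that split would entail (illustrative D3D9, not real driver code):

Code:
#include <d3d9.h>
#include <vector>

// Hypothetical: what "splitting one DrawIndexedPrimitive" would mean.
struct Batch {
    IDirect3DPixelShader9* replacementShader;
    UINT startIndex;   // first index of this sub-range
    UINT triCount;     // triangles drawn with this shader
};

void SplitDraw(IDirect3DDevice9* device, UINT numVerts,
               const std::vector<Batch>& batches)
{
    // The application asked for ONE draw over the whole mesh; the
    // driver would have to re-batch the geometry per shader and
    // issue several smaller draws in its place.
    for (std::size_t i = 0; i < batches.size(); ++i) {
        device->SetPixelShader(batches[i].replacementShader);
        device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                                     numVerts,
                                     batches[i].startIndex,
                                     batches[i].triCount);
    }
}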

ciao,
Marco
 
nAo – note that I'm talking about fixed tests. If there are known benchmarks that utilise a fixed path but use some branching for SM3.0, then it's possible just to replace the shaders for that test with unrolled shaders, because you know the path it is going to take and you know which code would need to operate. It depends on whether the costs of branching are more or less than the costs of state changes for different shaders.

So is the driver going to split one drawindexedprimitive() call in several calls, batching different geometry with different shaders?

Umm, I was talking about the Doom3 shaders there.
 
nAo said:
If a driver can replace a shader with dynamic branching with another one with static or no branching at all..then:
1) the branch wasn't dynamic at all..so the driver is doing a good job
2) or the branch is really dynamic but it takes just one path all the time..well, in this case that shader test was fucked up from the start
3) I know you're thinking about cheating..but (theoretically) the driver can't cheat on a shader with real dynamic behaviour..at least I don't think you want the driver to switch shaders on a per-pixel basis :)

And if the ps3.0 dynamic branching shader is replaced with one doing much less work, but producing similar output? (like a brilinear-type optimization)
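A toy example of what I mean (mine, tuned by eye; not an actual known cheat): the application asks for a full power-function specular, and a replacement computes a cheaper curve with a roughly similar-looking highlight.

Code:
// Toy illustration of "much less work, similar output" -- the shader
// analogue of brilinear. The numbers in the replacement are tuned by
// eye and purely hypothetical.
static const char* g_appShader =
    "float4 main(float3 n : TEXCOORD0,                       \n"
    "            float3 h : TEXCOORD1) : COLOR               \n"
    "{                                                       \n"
    "    // what the application asked for: pow(N.H, 64)     \n"
    "    return pow(saturate(dot(normalize(n),               \n"
    "                            normalize(h))), 64);        \n"
    "}                                                       \n";

static const char* g_replacementShader =
    "float4 main(float3 n : TEXCOORD0,                       \n"
    "            float3 h : TEXCOORD1) : COLOR               \n"
    "{                                                       \n"
    "    float d = saturate(dot(normalize(n),                \n"
    "                           normalize(h)));              \n"
    "    // cheap scale/bias/square: similar-ish tight lobe  \n"
    "    d = saturate(16 * (d - 0.9375));                    \n"
    "    return d * d;                                       \n"
    "}                                                       \n";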
 
John Reynolds said:
Maybe it's already been mentioned (if so, I didn't notice) but how is the ARB2 path going to give an apples-to-apples benchmarking scenario when one board will render the game at FP32 and the other at FP24?

True, but JC himself mentions in his last .plan that the ARB2 path is apples-to-apples.

John Carmack said:
This is unfortunate, because when you do an exact, apples-to-apples comparison using exactly the same API, the R300 looks twice as fast, but when you use the vendor-specific paths, the NV30 wins.

And hasn't the NV40 shown that the difference in performance between FP32 and FP16 had less to do with the actual precision and more to do with a lack of registers? I mean, don't some shaders run faster in FP32 precision than in FP16 on the NV40? And if that's true, then a path that (painfully) demonstrates hardware flaws seems more in line with informing consumers so they can make informed purchases.

radar said:
You can still make an informed decision. You can get best, worst and average case figures for each card/path, determine IQ for the resolutions that interest you, then compare that data to what the competition can do. How each card delivers the results is irrelevant so long as IQ is up to scratch (nothing missing that should be there) and no cheating is going on.

That's assuming the review provides benchmark runs for every single path. JC mentions several new tweaks he's put into the ARB2 mode. If you test one card using an IHV-dedicated path you're not testing the card under the full workload. The difference might be minor, but it might not be. Considering the performance difference between the nv30 and arb2 paths, I think that's a pretty significant piece of information to have.

Dave said:
In terms of what you ask the game to do, you are, but that doesn't mean that the drivers aren't detecting that particular shader and replacing it with their own internal one.

Well, as long as the shader output is mathematically equal to the original shader's, I think everyone can agree that's okay (let's forget about that "games that aren't 'optimised' don't benefit" argument for a minute). More telling, however, is that considering the game will probably only be released after both the nv40 and r420 are in stores, why wouldn't they give those shaders to id to put into the game itself?
 
DaveBaumann said:
nAo – note that I'm talking about fixed tests. If there are known benchmarks that utilise a fixed path but use some branching for SM3.0, then it's possible just to replace the shaders for that test with unrolled shaders, because you know the path it is going to take and you know which code would need to operate. It depends on whether the costs of branching are more or less than the costs of state changes for different shaders.
I know you're speaking about fixed tests and I disagree with your opinion :)
To me a fixed test is a test that executes the same code each time (i.e. a shader test in a benchmark suite).
Executing the same code all the time doesn't mean taking the same path all the time.
If the Nalu demo were a benchmark, how could a cheating driver replace that single shader, used with a single draw call, with N shaders?
Even a much simpler test, like a PS3.0 shader that renders the Mandelbrot set and uses dynamic branching to exit early (|z| > 2) from the main loop, could be unreplaceable (i.e. the original branching shader would be faster than a replacement in the general case) if the number of iterations per pixel were distributed as a bell curve with a huge sigma. (One would have to choose the right area of the complex plane to get such a distribution, but the benchmark suite itself could tweak the early-out condition.)
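Roughly, a sketch of the shader I mean (hypothetical, just to illustrate):

Code:
// Sketch of the Mandelbrot-style PS3.0 test: the per-pixel iteration
// count varies widely, and dynamic branching breaks out of the loop
// as soon as |z| > 2.
static const char* g_mandelPS = // compile with profile "ps_3_0"
    "float4 main(float2 c : TEXCOORD0) : COLOR               \n"
    "{                                                       \n"
    "    float2 z = 0;                                       \n"
    "    int i;                                              \n"
    "    for (i = 0; i < 64; i++)                            \n"
    "    {                                                   \n"
    "        z = float2(z.x * z.x - z.y * z.y,               \n"
    "                   2 * z.x * z.y) + c;   // z = z^2 + c \n"
    "        if (dot(z, z) > 4)    // |z| > 2: dynamic exit  \n"
    "            break;                                      \n"
    "    }                                                   \n"
    "    return i / 64.0;                                    \n"
    "}                                                       \n";
// An unrolled replacement has to pay for all 64 iterations on every
// pixel, so if iteration counts are spread widely (the bell curve
// with a huge sigma) the honest branching shader wins in general.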
Of course cheating is possible..but as I see it now, with PS3.0 it's much more viable to build a test suite where an IHV will have a hard time cheating just with shader replacement.

ciao,
Marco
 
AlphaWolf said:
And if the ps3.0 dynamic branching shader is replaced with one doing much less work, but producing similar output? (like a brilinear-type optimization)
everything is possible, but entropy is not for free ;)
 
John Reynolds said:
Maybe it's already been mentioned (if so, I didn't notice) but how is the ARB2 path going to give an apples-to-apples benchmarking scenario when one board will render the game at FP32 and the other at FP24?
Some variables in an experiment are inherently out of one's control, but that doesn't mean one shouldn't strive to ensure that everything that can be monitored and controlled is monitored and controlled. As far as Quake benchmarking is concerned, everyone has been happy enough to ignore the precision differences between the likes of the NV2x and R2xx ;).
 
Executing the same code all the time doesn't mean taking the same path all the time.
If the Nalu demo were a bechmark and if that single shader could be replaced with N shaders, how can a cheating driver replace that single shader used with a single draw call?

Something like that can't be used as a benchmark since, if a different path is taken each run, you'll have differing amounts of branching per test.

However, an easy use for a game scenario may be to provide a single lighting shader but branch for different properties dependent on the room / environment you are in (yes, developers are looking at doing this). In the fixed benchmark scenario it will be very easy to replace the single shader with multiple shaders per lighting property (or even just what's in that benchmark scenario), which sets up the possibility of reviewers thinking they are testing true SM3.0 scenarios when they are not in reality.
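A hedged sketch of that single-shader approach (illustrative only; I'm not saying any shipping engine does exactly this):

Code:
// Illustrative only: one lighting shader branched on a per-room
// property. For SM2.0 (or a cheating driver behind a fixed
// benchmark) the branch gets folded out into separate shaders.
static const char* g_roomLightPS = // compile with profile "ps_3_0"
    "bool    g_fogEnabled;          // set per room/environment   \n"
    "float4  g_fogColor;                                          \n"
    "sampler diffuseMap : register(s0);                           \n"
    "float4 main(float2 uv : TEXCOORD0,                           \n"
    "            float3 n  : TEXCOORD1,                           \n"
    "            float3 l  : TEXCOORD2) : COLOR                   \n"
    "{                                                            \n"
    "    float4 c = tex2D(diffuseMap, uv)                         \n"
    "             * saturate(dot(normalize(n), normalize(l)));    \n"
    "    if (g_fogEnabled)             // per-property branch     \n"
    "        c = lerp(c, g_fogColor, 0.5);                        \n"
    "    return c;                                                \n"
    "}                                                            \n";

Behind a fixed benchmark the driver knows the value of g_fogEnabled for the whole run, so swapping in pre-folded shaders is trivial - and the reviewer never actually exercises the branching they think they're measuring.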
 
Neeyik said:
As far as Quake benchmarking is concerned, everyone has been happy enough to ignore the precision differences between the likes of the NV2x and R2xx ;).

And they should, because the precision differences don't cause any differences in IQ. And of course, that means comparisons between any different paths in Doom 3 are valid as long as the IQ is the same.
 
DaveBaumann said:
Something like that can't be used as a benchmark since, if a different path is taken each run, you'll have differing amounts of branching per test.
No Dave, this test shader (we're in a gedankenexperiment!) takes, on a given frame N, the same sequence of paths every time we run it.
In the Nalu case this sequence is determined by the strip ordering and even by the order in which the GPU rasterizes pixels. You can build a completely reproducible test, but a driver can't cheat just by replacing the shader.

In the fixed benchmark scenario it will be very easy to replace the single shader with multiple shaders per lighting property (or even just what's in that benchmark scenario), which sets up the possibility of reviewers thinking they are testing true SM3.0 scenarios when they are not in reality.
A game developer may be lazy; a benchmark writer can't be ;)
In the case you pictured here I doubt a shader replacement would improve performance, because that shader takes the same path all the time, and that shouldn't be a problem at all for PS3.0 hardware (but we have to test it to know better..)

ciao,
Marco
 
Another option with Doom3 and Doom3 engine games is to bench using all available paths if possible. A comparison or statement of IQ differences could let us know if any real IQ differences result. Now, doesn't the Radeon 3xx/2xx use a 5-bit bilinear filter while the NV4x/3x/2x use 8 bits? How can you account for those differences? FP24 to FP32? Meaning apples-to-apples comparisons are like comparing red apples to green apples in the end.
 
noko said:
Another option with Doom3 and Doom3 engine games is to bench using all available paths if possible. A comparison or statement of IQ differences could let us know if any real IQ differences result.
Almost certainly this is something we would do upon the release of Doom 3 and/or a Doom 3 benchmark. I can imagine that we would want to release an article on how it all looks, works and whatnot.
 
Neeyik said:
noko said:
Another option with Doom3 and Doom3 engine games is to bench using all available paths if possible. A comparison or statement of IQ differences could let us know if any real IQ differences result.
Almost certainly this is something we would do upon the release of Doom 3 and/or a Doom 3 benchmark. I can imagine that we would want to release an article on how it all looks, works and whatnot.

I really hope this happens!
 