1. Why does 3DM2K3 render characters 11 times/frame when no game in the world will do this?
First, the question is slightly confused: geometry is rendered, while "characters" (or more specifically, anything that uses skeletal animation) are skinned. If we're just talking about rendering geometry, then unless this is a corny way of saying "Doom3 will be out of this world", the premise of the question is incorrect. (Incidentally, that may just have been the worst joke I've ever made.)
Doom3, when using its PS 1.1-equivalent path, will also render geometry 11 times in any scene with 5 lights. That's what happens when you have to multipass: you re-render the geometry on every pass. One z-buffer pass plus two passes per light equals 11 for a 5-light scene. That's how it works.
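The pass arithmetic can be sketched directly (assuming, as described above, one z-buffer fill pass plus two rendering passes per light on a PS 1.1-class path):

```python
def ps11_passes(lights):
    """Number of geometry passes on a PS 1.1-class path:
    one z-buffer fill pass, then two rendering passes per light."""
    return 1 + 2 * lights

# 5 lights -> 11 passes, matching the figure in the quote below
print(ps11_passes(5))
```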
Second, it doesn't render or skin anything 11 times/frame under normal conditions. Here's the Nvidia quote being referred to:
[quote=Nvidia marketing, http://www.xbitlabs.com/news/story.html?id=1045073804]In a scene with five lights, for example, each object gets re-skinned 11 times.[/quote]
Unfortunately, the example is basically meaningless, since the average scene is much closer to 2 lights than 5. A simpler way to gauge the real impact is to look at the rendered poly counts, which Futuremark has provided in the White Paper. GT2 renders an average of 250,000 polys/frame using the PS 1.1 path, and 150,000/frame with the PS 1.4 path; the numbers for GT3 are 580,000 and 240,000 respectively. These figures are averages: some scenes will have more lights (and thus a larger disparity in the number of passes), and some fewer. But in general, because multipassing means rendering the same geometry more times, the amount of geometry rendered goes up 66-100% when moving from PS 1.4 to PS 1.1.
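Where that 66-100% range comes from can be sketched as follows, assuming (as with Doom3's R200 path) that PS 1.4 collapses the two per-light passes into one, so a PS 1.4 path runs 1 + L passes for L lights versus 1 + 2L on PS 1.1:

```python
def geometry_increase(lights):
    """Fractional increase in geometry rendered when moving from a
    PS 1.4 path (1 z pass + 1 pass per light) to a PS 1.1 path
    (1 z pass + 2 passes per light). Assumes Doom3-style per-light
    pass structure; illustrative sketch only."""
    ps14_passes = 1 + lights
    ps11_passes = 1 + 2 * lights
    return (ps11_passes - ps14_passes) / ps14_passes

# 2 lights -> ~66.7% more geometry; the increase approaches (but
# never reaches) 100% as the light count grows.
for n in (2, 5, 20):
    print(n, round(geometry_increase(n) * 100, 1))
```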
As for the amount of vertex skinning, it will also increase 66-100%, but from smaller numbers. After all, skinning is only needed for skeletally animated characters, not world geometry. And skinning is a light-workload vertex shader operation, comparable to transformation, which is done on all vertices anyway. Doom3 does use the CPU for skinning, and this presumably improves performance, particularly on hardware that isn't capable of PS 1.4. But, as Futuremark points out, it takes up CPU time that could better be spent on AI, physics, etc.
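To see why skinning is such a light per-vertex operation, note that it's just a weighted blend of bone transforms applied to each vertex. A minimal two-bone matrix-palette sketch (illustrative only; the matrices, weights, and function name here are made up, not Futuremark's or id's implementation):

```python
def skin_vertex(v, bones, weights):
    """Matrix-palette skinning: transform vertex v = (x, y, z) by each
    bone's 3x4 matrix (rotation columns + translation column) and blend
    the results by the per-vertex bone weights."""
    out = [0.0, 0.0, 0.0]
    for matrix, w in zip(bones, weights):
        for i in range(3):
            row = matrix[i]
            out[i] += w * (row[0]*v[0] + row[1]*v[1] + row[2]*v[2] + row[3])
    return out

# Two example bone matrices: identity, and a translation of +2 in x.
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shifted  = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]

# Equal weights -> vertex lands halfway between the two transforms.
print(skin_vertex((1.0, 0.0, 0.0), [identity, shifted], [0.5, 0.5]))
# -> [2.0, 0.0, 0.0]
```

Per vertex that's a handful of multiply-adds per bone, which is why it's comparable in cost to the transform every vertex gets anyway.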
In the end the only way to really answer this question is to examine Nvidia's assertion:
[quote=Nvidia marketing, http://www.xbitlabs.com/news/story.html?id=1045073804]This approach creates such a serious bottleneck in the vertex portion of the graphics pipeline that the remainder of the graphics engine (texturing, pixel programs, raster operations, etc.) never gets an opportunity to stretch its legs.[/quote]
Are GT2 and GT3 really seriously vertex bottlenecked? Not really, no.
One quick way to see this is to look at the graphs in the new 3DMark03 performance writeup here at B3D. The graphs will seem a bit unusual if you're used to looking at fps graphs: instead of using framerate as the y-axis variable, they use achieved pixel fillrate, i.e. fps * resolution. (The x-axis variable is resolution.) This style of graph is extremely useful for finding out at a glance what the bottleneck is in any given situation.
First let's see what being vertex shader bottlenecked looks like. As you can see, we get a bunch of straight lines radiating out from the origin. That's because vertex shader workload doesn't change with resolution, so if it's always the bottleneck, your framerate stays constant no matter what you do to the resolution, and thus fillrate rises linearly with resolution.
Conversely, if you're completely pixel fillrate limited, you get a horizontal straight line; that's because increasing the resolution won't increase your fillrate because, well, you're already fillrate limited like I said. This effect can be seen with the 9500 at high resolutions on several tests including GT2.
Normally, if you lower resolution enough you eventually end up geometry limited (i.e. diagonal straight line), and if you raise resolution enough you eventually end up fillrate limited (horizontal straight line), so a "well-balanced" game or benchmark is one that follows an arc: steeper in the low resolutions, flatter in the high resolutions. In the middle of such an arc, you're neither exclusively vertex nor fillrate limited.
Having said that, let's look at the results for GT2. All the cards follow a nice arc, except the aforementioned 9500, which becomes fillrate limited above 1024*768. The GF4 cards in particular (and that's what we're really concerned with here: whiny GF4 owners) scale very nicely, although the arc can be difficult to see since the scale is smaller down there.
Let's break out the numbers a little bit by comparing each card to its 640*480 performance. The percentages represent the framerate at the given resolution as a percent of framerate at 640*480. Remember that if GT2 were completely vertex shader limited as Nvidia charges, all the numbers would be 100%. (As they are, more or less, if you do this analysis on the Vertex Shader test results.)
Code:
% of 640*480 fps
         800    1024   1280   1600
9700P:   78.6   56.3   38.4   28.6
9700:    78.9   56.8   38.4   28.7
9500P:   75.3   52.7   34.4   24.9
4600:    76.2   57.8   41.1   30.8
4200:    82.0   62.5   43.0   31.3
In general, the 4600 is hardly more vertex shader limited than the 9700 Pro, which is to say, not very much at all. The 4200 is a bit more vertex limited, just as the 9500 Pro is a bit less so. Then again, this is to be expected, as the GF4s have to process more geometry on account of having to run more passes. But all 5 cards are pretty close in scaling characteristics, and none of them is anywhere near approaching a situation where, as Nvidia puts it, "the remainder of the graphics engine (texturing, pixel programs, raster operations, etc.) never gets an opportunity to stretch its legs."
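The same conclusion falls out if you run the numbers from the table above: a purely vertex-limited card would sit at 100% at every resolution, and none of these come anywhere close.

```python
# Scaling data from the table above: % of 640*480 fps retained at
# 800*600, 1024*768, 1280*1024, and 1600*1200.
scaling = {
    "9700P": [78.6, 56.3, 38.4, 28.6],
    "9700":  [78.9, 56.8, 38.4, 28.7],
    "9500P": [75.3, 52.7, 34.4, 24.9],
    "4600":  [76.2, 57.8, 41.1, 30.8],
    "4200":  [82.0, 62.5, 43.0, 31.3],
}

for card, pcts in scaling.items():
    # Every card has lost well over half of its 640*480 framerate by
    # 1600*1200, so none of them is pinned against a vertex bottleneck.
    print(card, "retains", pcts[-1], "% of its 640*480 fps at 1600*1200")
```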
As regards the skinning issue, remember that the skinning workload represents only a portion of the overall geometry workload. And note that the test appears quite bandwidth limited (look at how much the 9700 beats the 9500 Pro by; they're exactly the same except for bandwidth), which further hurts the GF4s because multipassing also takes quite a bit more bandwidth (an extra write to and read from the framebuffer).
For more evidence of the same thing, we can look at this comparison of a 9700 Pro with PS 1.4 enabled/disabled in the drivers. GT2 and GT3 performance each go up about 23% with PS 1.4 over PS 1.1. That's significant, of course, but consider: first, the geometry load increases 66-100% with PS 1.1 only; second, only part of that performance hit is due to geometry (the rest comes from the extra bandwidth multipassing requires); and third, only part of the geometry hit is due to skinning. We're talking about a pretty minor effect here.
All in all, if skinning were moved to the CPU, the GF4 might see perhaps a 5-10% performance increase on GT2/3 relative to PS 1.4 capable cards (but probably on the lower end of that). Meanwhile, you'd be making 3DMark03 less of a GPU and more of a CPU benchmark--which is contrary to one of its stated aims with the new version--and you'd probably be hurting GT2/3 performance on future GPUs.
2. Why do processor scores differ when the exact same setups are used, with the only difference being the graphics card? Case in point: GFFX Ultra CPU scores vs. 9700 Pro CPU scores, anywhere from 100-200 points of difference.
First off, the difference is 40-50 points, not 100-200. Anyway, the most likely reason for this is that the GFfx's drivers are not as efficient. (Which is somewhat to be expected; after all, it is a new architecture.) Remember, drivers run on the CPU, so they're competing with the software vertex shading and everything else for CPU time.
3. Granted, PS 1.4 is DX8 and a subset of DX9, but when the only games that use it are those that ship with ATI cards, why use it?
Doom3 is shipping with ATI cards? Kickass!!
(Note: technically D3 doesn't use PS 1.4 or PS 1.1 because it is written in OpenGL; however, the R200 path uses exactly PS 1.4 functionality, and the NV20 path uses exactly PS 1.1 functionality.)
More generally, per-pixel unified bump-mapped specular and diffuse lighting with stencil shadows (a la Doom3) cannot be done in one pass with PS 1.1, PS 1.2, or PS 1.3. Any game with D3-style lighting is going to use PS 1.4 functionality.
Why not use PS 2.0? Because it's not necessary for the effect, and the installed base of PS 1.4-capable cards is a superset of the installed base of PS 2.0-capable cards. And while almost any PS 1.4 effect can be replicated using PS 1.1-1.3 and 2 or 3 rendering passes, a PS 2.0 effect generally can't be replicated with any PS 1.x shader. In this case, the only benefit to moving to PS 2.0 is the use of higher-precision floating point for some of the lighting calculations; indeed, Doom3 makes use of this, offering "minor quality improvements" in exchange for "a slight speed [dis]advantage".
Surely many similar games will also offer PS 2.0 versions of the effect, but since that amounts to running what is essentially a PS 1.4 shader with FP precision for a couple of calculations, the performance and image quality will differ only slightly from straight PS 1.4. More to the point, all such games will offer a PS 1.4 path (and presumably a fall-back PS 1.1 path) until consumers with DX8 cards aren't worth supporting at all (probably not for 2.5+ years).
Considering 3DMark03 is meant to simulate games released ~1.5 years from now, rather than games available today, the choice to feature unified per-pixel lighting with stencil shadowing on 2 of 4 tests seems very sensible: it is very likely to be the most important rendering technique used in the next generation of graphics-intensive games. Once that choice has been made, the decision to heavily use PS 1.4 has also already been made.
If someone is more concerned with how cards perform running games that are available today, they should benchmark them with those games. Duh.