Futuremark's technical response

FM did nVidia a favor by not going CPU VS in 2 & 3 > I see it now.

If FM went CPU VS in 2 & 3: the PS1.4 of the 8500's would have made the GF4 Ti's look REAL bad w/their PS1.1, correct? Then the allegations of 'optimization' would fly & on face value would have credence.

Actually...no. :) But it's easy to come to that conclusion if you don't understand the difference between PS 1.4 and PS 1.1.

What is the difference between them? It's not that PS 1.4 is "faster" or "more powerful" in the sense of getting more done per instruction and therefore requiring fewer cycles in the pixel shader to achieve the same effect. Rather, it's that PS 1.4 programs can be significantly longer, so an effect that takes 2 (or sometimes 3) PS 1.1 programs can be done in a single PS 1.4 program.

Now, this doesn't reduce the workload for the pixel shader: the 1 PS 1.4 program that does the work of 2 PS 1.1 programs also takes (roughly) twice as long as each PS 1.1 program. Instead, it reduces the workload on the geometry engine (including vertex shaders), and reduces bandwidth utilization.

To see why, take a look at what happens when you perform the effect using 2 PS 1.1 programs instead; let's call them program A and program B. When it comes time for the GPU to render a poly to which the effect is applied, it will fetch the vertex coordinates, transform them, run any vertex shader programs to adjust those vertices, light them, and then, for each pixel in the interior of the polygon, run program A and write the result to the framebuffer. Then it will go on rendering all the other polys in the scene until it is done. Then it will start a second pass, and for any polys that are not finished rendering--like this one, which still needs program B to be applied--it will have to repeat the process again: fetch the vertices, run vertex shaders, and render again, this time running program B on the results from program A, and finally writing these final values out to the framebuffer.

With PS 1.4, you can do the effect with a single program. So you save the task of reading in the geometry again, running any vertex programs (including T&L) again, and writing intermediate values to the framebuffer only to read them back for the second pass. Same amount of work in the pixel shaders, much less work in the rest of the GPU.
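To make that bookkeeping concrete, here is a minimal sketch of where the single-pass savings come from. The unit costs and the vertex/pixel counts are made-up placeholders of mine, not measurements of any real card; the point is only the shape of the accounting.

```python
# Rough back-of-the-envelope model of two-pass (PS 1.1) vs one-pass (PS 1.4)
# rendering of the same effect. All unit costs and counts below are made up
# purely to show where the savings come from; they aren't measurements of any GPU.

def frame_cost(passes, num_verts, num_pixels,
               vertex_cost=1.0,    # vertex fetch + T&L + vertex shading, per vertex, per pass
               total_ps_cost=2.0,  # total pixel shader math per pixel for the whole effect
               fb_cost=0.5):       # cost of one framebuffer read or write, per pixel
    vertex_work = passes * num_verts * vertex_cost         # geometry reprocessed every pass
    pixel_work  = num_pixels * total_ps_cost               # same total shader math either way
    fb_traffic  = num_pixels * fb_cost * (passes           # one write per pass...
                                          + (passes - 1))  # ...plus read-back of temp results
    return vertex_work + pixel_work + fb_traffic

verts, pixels = 50_000, 1024 * 768
print("PS 1.1, two passes:", frame_cost(2, verts, pixels))
print("PS 1.4, one pass:  ", frame_cost(1, verts, pixels))
# The pixel shader term is identical in both; the entire difference comes from the
# duplicated vertex work and the extra framebuffer traffic of the second pass.
```

The same toy model extends to the 3-pass PS 1.1 path used in GT2/GT3: the pixel shader term stays roughly constant while the geometry and bandwidth terms keep climbing with the pass count.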

So in some sense, Nvidia is right to complain about vertex shaders in that the GF4's inability to use the one-pass PS 1.4 rendering path does indeed increase its vertex shading workload. Of course, they're completely wrong and very disingenuous to imply that Futuremark could have used a different PS level instead: it's impossible to implement bump-mapped per-pixel specular and diffuse lighting in a single pass in PS 1.1, 1.2 or 1.3. About the best you can do is what Carmack has done in the Doom3 engine: 1 pass in PS 1.4, 2 passes in PS 1.1, and 5 passes for a DX7 GPU that only has fixed-function pixel pipelines. The rendering style used in GT2 and GT3 is a bit more complex: it takes 1 pass in PS 1.4, 3 in PS 1.1, and cannot be done at all on a DX7 card (presumably; or perhaps FM just didn't bother coding a fallback because the performance would be so absurdly bad).

So is Nvidia right when they assert "This approach creates such a serious bottleneck in the vertex portion of the graphics pipeline that the remainder of the graphics engine (texturing, pixel programs, raster operations, etc.) never gets an opportunity to stretch its legs"? No, absolutely not.

As Futuremark suggests, this fact is easily seen by looking at the scaling factors on the various cards as the resolution is increased. Luckily the Tech Report review has the data we need. Now, if the only bottleneck on rendering this scene was pixel throughput (in this case, the pixel shaders), then you would expect the scores to scale linearly with the number of pixels onscreen; i.e., you would expect the score @1280*1024 to be exactly .6x the score @1024*768, and the score @1600*1200 to be exactly .4096x the score @1024*768. If, on the other hand, the only bottleneck, even at 1600*1200, was the vertex shader workload, then lowering the resolution wouldn't change the fps one bit. Similarly, if the vertex shaders are the bottleneck at 1024*768, increasing the resolution won't lower the fps until pixel shading becomes enough of a burden to shift the bottleneck away from vertex shading, and indeed the drop in performance at higher resolutions will be very slight.
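For the arithmetic-minded, here's a small sketch of that expectation in plain Python. The helper function is mine and the fps values you'd feed it are illustrative; the real inputs are the Tech Report scores.

```python
# If pixel throughput (here, the pixel shaders) were the only bottleneck, fps would
# scale with 1/(pixel count). The expected scaling factors quoted above fall straight
# out of the resolutions:
base_pixels = 1024 * 768
for w, h in [(1280, 1024), (1600, 1200)]:
    print(f"{w}x{h}: expected score = {base_pixels / (w * h):.4f} x the 1024x768 score")
# -> 0.6000 and 0.4096

# The "deviation from a pure pixel bottleneck" numbers below are then just how far the
# measured score sits above that prediction. (This helper is my own; plug in the Tech
# Report fps figures to reproduce the percentages listed below.)
def deviation_from_linear(base_fps, fps, base_pixels, pixels):
    expected = base_fps * base_pixels / pixels   # perfectly pixel-limited prediction
    return (fps / expected - 1) * 100            # % faster than the prediction

# e.g. deviation_from_linear(fps_1024x768, fps_1600x1200, 1024 * 768, 1600 * 1200)
```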

So we can use those results from TR to tell us how far these GPUs deviate from a perfect "100% pixel throughput bottleneck" on GT2. Each percentage represents the amount by which the actual score is faster than the theoretical score assuming linear scaling with resolution, using the 1024*768 scores as the base:

9700 Pro:
1280*1024 - 13.46%
1600*1200 - 23.31%

9500 Pro:
1280*1024 - 9.55%
1600*1200 - 14.63%

GF4 Ti4600:
1280*1024 - 13.76%
1600*1200 - 22.07%

The GF4 is just as pixel-limited as the 9700 Pro! It is slightly less so than the 9500 Pro, but nothing to complain about. And remember, the deviation we're measuring here covers every non-pixel bottleneck combined; the portion of it due only to the extra vertex skinning is smaller still. Conversely, if you look at the graphs in any reviews here at B3D--in which the x-axis represents pixel throughput rather than fps, in order to facilitate exactly this sort of comparison--you'll notice that GT2 is much more pixel-limited than many 3d games on the market (which should be expected, because a game is more likely to be CPU-limited than a synthetic 3d benchmark, 3DMark01 notwithstanding).

Another demonstration of the same result comes from this comparison of a 9700 Pro with PS 1.4 support enabled and disabled in the drivers. The GT2 result increases by 22.5% when PS 1.4 is enabled; again, only a portion of that is due to any drop in vertex shader workload (probably more is due to the drop in required bandwidth), and only a portion of that portion is due to vertex skinning (as opposed to T&L).
 
Dave H,

Thank you so very much for explaining that. I even understood 90% in the 1st read thru. :D I'll be reading it many more times too.

Hmmmm. <Trying to put all that has happened into perspective again>

Thanks again Dave H. & I'll definitely keep an eye open for your posts. 8) 8)

just me
 
Dave H,

To those of us who know very little about 3d technology, your post was very illuminating. Some of us lurkers appreciate clear explanations…
 
Someone more familiar with GameCube can probably verify this, but isn't GameCube also close to "having PS1.4 support"?

If true, wouldn't that imply that many GameCube developers would have PS1.4 support in the PC versions of their games? (Though I dunno how many games come across that port line; may not be many.)

As someone brought the 1.1/1.3 & XBOX argument up in this fight...
 
Doomtrooper said:
There will be nothing posted by [H], I can guarantee it. The problem with the internet hardware websites is that 98% of them are not technically inclined enough to form opinions of their own relating to graphics cards; some .pdf shows up from an IHV and they post it as fact on their front page.

Never was the power Nvidia has over websites as apparent as it was last week.
What did you say?

FutureMark Rebuttal:
The guys over at FutureMark have posted their thoughts on many of the arguments flying about this week. This is surely something you will want to take the time to read if synthetic benchmarks concern you.


While there are many very good and professional arguments and analysis published, there are also several misconceptions and erroneous allegations that really need to be addressed.

The PDF is available at this FutureMark link.

Also worth checking out is ExtremeHardware's article on this, which contains some words directly from ATI. We have been asked for specific statements from ATI and are looking forward to seeing those next week.
[H]
 
What did you say?

He was referring to my statement about hoping that Tom's Hardware and [H] would do articles on the responses from FutureMark. I said:

Well, I can only hope and assume that Lars will post another article on Tom's hardware about this response. It's also extremely prudent that [H] address this response as well, since it more or less addresses all the "issues" that [H] brought up in their 3DMark article.

I don't see any indication at all that [H] is going to consider FutureMark's (or ATI's) responses, and use them to test the validity of its current opinion. I'm not saying they won't, but there's no indication that they will. And that's what we want to see.

Between FutureMark and ATI, they address pretty much every issue that [H] and nVidia raised.

Is it to [H]'s satisfaction? Why or why not?
 
Joe DeFuria said:
I don't see any indication at all that [H] is going to consider FutureMark's (or ATI's) responses, and use them to test the validity of its current opinion. I'm not saying they won't, but there's no indication that they will. And that's what we want to see.

Between FutureMark and ATI, they address pretty much every issue that [H] and nVidia raised.

Is it to [H]'s satisfaction? Why or why not?
We have been asked for specific statements from ATI and are looking forward to seeing those next week.
 
Katsa said:
Someone more familiar with GameCube can probably verify this, but isn't GameCube also close to "having PS1.4 support"?

If true, wouldn't that imply that many gamecube developers would have PS1.4 support in the PC versions of the games (though dunno how many come across that port line, may not be many)

As someone brought the 1.1/1.3 & XBOX argument up in this fight...

No, Flipper (GameCube's GPU) has something known as TEV, which is a sort of programmable fragment pipeline (allowing dependent texture reads and ALU ops on fragments), but it has nothing to do with DirectX shaders.
 
Ichneumon said:
Game Test 2 and 3 don't change much between CPU vs GPU skinning because primarily they are Pixel Shader limited. Moving the skinning to the CPU would only make it MORE pixel shader limited.

No.

All game tests are heavy vertex shader tests. Test 1 is only limited by the vertex shader. Test 2,3 and 4 are vs limited in low resolutions, but at higher resolutions they are limited by fillrate. Pixel Shader performance has very low impact on these tests (on the radeon 9700 pro)

http://www.tommti-systems.com/main-Dateien/reviews/3dmark03/3dmark03.html

Thomas
 
tb said:
Ichneumon said:
Game Test 2 and 3 don't change much between CPU vs GPU skinning because primarily they are Pixel Shader limited. Moving the skinning to the CPU would only make it MORE pixel shader limited.

No.

All game tests are heavy vertex shader tests. Test 1 is only limited by the vertex shader. Test 2,3 and 4 are vs limited in low resolutions, but at higher resolutions they are limited by fillrate. Pixel Shader performance has very low impact on these tests (on the radeon 9700 pro)

http://www.tommti-systems.com/main-Dateien/reviews/3dmark03/3dmark03.html

Thomas

I'm not sure that's what your tests show, but then again, I have a hard time understanding quite a bit on that page (not only because I don't understand German that well).
 
tb said:
Ichneumon said:
Game Test 2 and 3 don't change much between CPU vs GPU skinning because primarily they are Pixel Shader limited. Moving the skinning to the CPU would only make it MORE pixel shader limited.

No.

All game tests are heavy vertex shader tests. Test 1 is only limited by the vertex shader.

"Only"? So the test results don't change as resolution increases? That is counter to the results I've seen. Of course if you reduce vertex shader performance enough, changing resolution won't have an effect, but how is that saying anything new or useful for the comparison?

Test 2,3 and 4 are vs limited in low resolutions, but at higher resolutions they are limited by fillrate. Pixel Shader performance has very low impact on these tests (on the radeon 9700 pro)

Really? The data doesn't look like that to me (though, perhaps I missed something in the translation?). If you lower the resolution enough, of course they won't be as fillrate or pixel shader limited, and vertex shading limitations (which haven't changed) have a greater impact, but the very idea that changing resolution changes performance drastically points out that vertex shading is not the most significant limitation of the workload as that situation is changed.
When Ichy is talking about moving the vertex skinning to the CPU as per nvidia's complaint, he is, AFAICS, rightly pointing out that this reduces the workload on the vertex shading performance, so I assume your "No." isn't to that part of his response?

BTW, I may have misread, but some of your reasoning based on low resolution testing looks like you are assuming changing resolution doesn't have a corresponding impact on pixel shading workload...?

Whatever you mean, some test data for a 9500 (non pro) and a 9000 Pro for comparison might yield some information on relative shader performance gains from enhanced functionality in those tests (maybe with Hyper Z features turned off).
 
demalion said:
tb said:
Ichneumon said:
Game Test 2 and 3 don't change much between CPU vs GPU skinning because primarily they are Pixel Shader limited. Moving the skinning to the CPU would only make it MORE pixel shader limited.

No.

All game tests are heavy vertex shader tests. Test 1 is only limited by the vertex shader.

"Only"? So the test results don't change as resolution increases? That is counter to the results I've seen. Of course if you reduce vertex shader performance enough, changing resolution won't have an effect, but how is that saying anything new or useful for the comparison?

Test 2,3 and 4 are vs limited in low resolutions, but at higher resolutions they are limited by fillrate. Pixel Shader performance has very low impact on these tests (on the radeon 9700 pro)

Really? The data doesn't look like that to me (though, perhaps I missed something in the translation?). If you lower the resolution enough, of course they won't be as fillrate or pixel shader limited, and vertex shading limitations (which haven't changed) have a greater impact, but the very idea that changing resolution changes performance drastically points out that vertex shading is not the most significant limitation of the workload as that situation is changed.
When Ichy is talking about moving the vertex skinning to the CPU as per nvidia's complaint, he is, AFAICS, rightly pointing out that this reduces the workload on the vertex shading performance, so I assume your "No." isn't to that part of his response?

BTW, I may have misread, but some of your reasoning based on low resolution testing looks like you are assuming changing resolution doesn't have a corresponding impact on pixel shading workload...?

Whatever you mean, some test data for a 9500 (non pro) and a 9000 Pro for comparison might yield some information on relative shader performance gains from enhanced functionality in those tests (maybe with Hyper Z features turned off).

My No, was to this part of the message "Game Test 2 and 3 don't change much between CPU vs GPU skinning because primarily they are Pixel Shader limited"

I think they are more vertex shader and fillrate limited than pixel shader calculation speed limited. The limitation shifts from the vertex shader to fillrate (and a little bit of pixel shader) as you increase the resolution. Test 1 is vertex shader limited most of the time, but fillrate comes into play at some very high resolutions. Tests 2, 3 and 4 are not that heavily limited by the vertex shader. Fillrate is the main limitation in these tests (2, 3, 4), and the pixel shader has much less impact.

Sorry, don't have a radeon 9500 / 9500 pro :(

Thomas
 
The whole thing:

Synthetic benchmarks also offer us a lot of help in designing future hardware. Relying solely on existing game benchmarks in this case would leave us in danger of producing new products that run old games well, but run new games poorly.

I must say that I don't really buy this argument. Not fully, at least.
I mean, ATI, like other IHVs, must have access to most developers' coming-tech engines: Doom3, the new Unreal tech and stuff like that. Surely they get reports back from them and rely more on those types of things than on 3DMark.
 
Evildeus said:
We have been asked for specific statements from ATI and are looking forward to seeing those next week.

LIKE I SAID,

They have not made any mention if they actually plan to comment on ATI's / FutureMark's response or not.

Again, the point is, [H] made several comments (parroting nVidia's complaints) about 3DMark. FutureMark and ATI basically ADDRESSED all of those complaints in their rebuttals.

So the question is...is [H] satisfied with the explanations? And more importantly, if not...WHY NOT?

Well, they have ATI's response.

Right...and no indication about why they aren't satisfied with that response...assuming they aren't.

Yeah, here's a quote from [H] (my emphasis added):

Notice that in their opening paragraph ATI points out that, "We believe that using synthetic benchmarks such as 3DMark03 in combination with real applications and game benchmarks provides the most complete picture of the overall performance and value of graphics hardware." This is certainly saying that 3DMark03 should never be relied upon as a stand alone tool for evaluation.

Well, excuse me, but DUH. :rolleyes: Is that supposed to be some sort of revelation by [H]? Everyone has always agreed that by using ANY single benchmark or tool, you can't get the whole story. This goes for 3DMark, Quake, Doom, Serious Sam, Code Creatures, et al.

And again, this is no admission from ATI or Futuremark that the score is useless, which is what [H] is claiming.
 
Evildeus said:
Well, they have ATI's response. I do like this quote :)

Synthetic benchmarks also offer us a lot of help in designing future hardware.

http://www.hardocp.com/article.html?art=NDMx

"ATI on 3DMark03 : ATI comes to out to be the first graphics card company with an official statement on the benchmark that was released this week."

So, has HardOCP already admitted publicly that they received commentary from nvidia? If so, the above can maybe be taken in context. If not, the above is a pretty gross distortion and abuse of the term "official" to give the impression that nvidia hasn't had a chance to "tell their side" yet.
That is assuming, to give them the benefit of the doubt, your definition of "official <vendor> response" excludes the nvidia whitepaper.

Personally, I don't recall having seen any mention that the reasoning presented in the 3dmark03 article reflected an outside source, and it looked to me like it was passed off as HardOCP's independently achieved conclusions. In that light, the above looks like a pretty strong indication of a deep bias, where "HardOCP = nvidia", and "Outside opinion that bears the burden of proof = ATI". The first equation is where I see a very large problem. :-?
 
well, we don't = nvidia

nvidia made some good points, so did futuremark and ati

i agree with points from all of them

i think it's important to take in and evaluate what everyone involved has said and draw input from that...
 