New 3DMark03 Patch 330

Gunhead said:
demalion said:
My early take (from Page 5, haven't caught up yet):

It seems clear that ATI is doing a "bad optimization", i.e., a benchmark specific one.

When you catch up, what's your take on the 4X AA result [p. 10 of this thread] that was lower with build 320?

That's a bit tricky. The simplest explanation that occurs to me is that there is a 3DMark03 path that doesn't handle memory allocation (on board the card) in quite the same way. Another explanation is that the code substitution has some significant overhead. For example, if the optimization was done by a low-level optimizer that was "switched on" for 3DMark03 because it is known to be successful for that set of shaders, and the 4xAA test is simply more thoroughly bandwidth limited (if the leaves and grass are polygons, color compression would break down a bit, which would make a lot of sense), there might be a consistent improvement with the low-level optimizer turned off: a shift in bottlenecks. To check this, scaling CPU speed and performing multiple runs (to rule out error, etc.) might provide some insight.
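To make the "multiple runs" idea concrete, here's a minimal sketch (Python) of how one might separate a real 320-vs-330 difference from run-to-run noise. The scores are invented placeholders, not measurements from this thread.

Code:
from statistics import mean, stdev

# Hypothetical repeated 4xAA runs; placeholder numbers only.
runs_320 = [2896, 2908, 2901, 2890]   # build 320 (app-specific optimizations active)
runs_330 = [2951, 2938, 2960, 2947]   # build 330 (those optimizations defeated)

delta = mean(runs_330) - mean(runs_320)
noise = max(stdev(runs_320), stdev(runs_330))

print(f"mean delta {delta:+.0f} points; run-to-run spread ~{noise:.0f} points")
if abs(delta) > 3 * noise:
    print("difference looks real -> worth probing further with CPU/AGP clock scaling")
else:
    print("difference is within noise -> need more runs before theorizing")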

Also, if more efficient AGP bus utilization is a primary optimization for 3.4 (I think they've been doing this in previous driver sets, don't know about 3.4 yet), scaling AGP speed might reveal whether there is a specific 3dmark03 code path that behaves a bit differently for GT 4. However, that would lead to new questions, not immediate answers.

Now that ATI has admitted optimising for 3DM03, was it a poorly conducted optimisation (seeing as it doesn't work when AA is on)? Or did ATI intentionally optimise for non-AA speed (for the FM ORB), allowing AA speed to suffer? Or was it simply an optimisation to render 3DM03 in a smarter way, not directly shooting for an increase in the score (so non-AA happened to get faster and AA happened to get slower), but just doing it smarter, the way they'll optimise in driver patches for the most popular games?

Many unanswered questions. If the code substitution is actually light work (as I'd expect simple substitution to be), it seems likely they got a bit sloppy and left some general optimizations turned off for GT 4, since the re-ordered instructions provided more of a benefit. If it wasn't just a bit of sloppiness, it could even be that the driver architecture for the optimizations offered in 3.4 doesn't lend itself to a shader detection and substitution mechanism, and had to be circumvented.

I'm confused.

Technical answers from ATI would be helpful for me too.

Can somebody with a R9800P repeat HotHardware's AA tests?
...

Yeah, it would be helpful to make sure it isn't just error before testing my whacky theories. :LOL:
 
Lezmaka said:
Was gonna say something about bandwidth and the 9800 scores, but then realized it shouldn't have an effect.

But while looking at hothardware, I saw something else...

I'm assuming the numbers at hothardware are correct. That means with 4xAA and 4xAF and using patch 330, the 5900 is still slightly faster than the 9800 Pro.

5900U - 3098
9800P - 2940

Anyone else see that or did some smoke from the people in the apartment downstairs drift up into mine? lol

I just ran it at 4x FSAA/8xAF and got 3114 with my 9800P (stock configuration) clocked at 445/365.x--running build 330 of the '03 software with the Cat 3.4s. (5774 with 0xFSAA/AF in 330; see my earlier post for specifics.) I'll go back and re-run it with 4xAF and come back and add in the results.

Edit: picked up one (1) point by dropping from 8xAF to 4x AF for an aggregate score of 3115....*chuckle* (BTW, running all texture settings at HQ.)

Interesting that my score at 0xFSAA/0xAF is ~1,100+ points higher than Futuremark's and HotHardware's 5900U scores running 330, but my 4xFSAA (4x/8xAF) scores are only slightly higher than the 5900U's. IMO, nVidia's doing something funky with its FSAA modes which is positively affecting fps performance at the expense of IQ. These particular "optimizations" would have nothing to do with 3DMark at all, but rather would be general "optimizations" visible in all software. It's a possibility I can't dismiss without looking at some detailed comparison of 4xFSAA/4xAF screenshots from both cards.
 
WaltC said:
I just ran it at 4x FSAA/8xAF and got 3114 with my 9800P (stock configuration) clocked at 445/365.x.
So did you run your 9800P at stock speeds (380/340), or OC'ed (445/365)? Your scores may indicate the former, but your prose seems to be badly optimized. ;)

BTW, both Hexus.net and NVNews noticed blurriness in 4xAA screenshots, but NVNews said the screens weren't indicative of visible IQ (i.e., what you see on the monitor is superior to what's seen in screens). I'm waiting for a little hard-hitting investigative reporting in B3D's review, when Dave gets around to it (first this 3DM03 mess, then I think he has a 256MB 9800P in the wings, then maybe the 5900U).
 
kyleb said:
Brent did just that not too long ago, and his results were interesting, to say the least:

http://www.hardocp.com/article.html?art=NDcyLDY=

You're right! I thought he did a very good job with that review and had actually read it earlier. If we can ever move past the 3DMark03 atrocities committed in the Dets, we might be able to take a hard look at what's going on with their FSAA modes--I think Brent's just scratched the surface and raised some provocative questions. I'd very much like to know what they're doing with the post filter and FSAA--they were admittedly doing something with it on nv30 which they have declined to talk about in even general detail, citing "trade secret" restrictions they have imaginatively employed to avoid discussing the fundamentals behind it. Maybe we'll get the scoop someday...
 
Pete said:
So did you run your 9800P at stock speeds (380/340), or OC'ed (445/365)? Your scores may indicate the former, but your prose seems to be badly optimized. ;)

BTW, both Hexus.net and NVNews noticed blurriness in 4xAA screenshots, but NVNews said the screens weren't indicative of visible IQ (i.e., what you see on the monitor is superior to what's seen in screens). I'm waiting for a little hard-hitting investigative reporting in B3D's review, when Dave gets around to it (first this 3DM03 mess, then I think he has a 256MB 9800P in the wings, then maybe the 5900U).

Heh-Heh....I see your point...;) What I meant by "stock configuration" was that the card is running right out of the box (ROOB) with the standard fan, etc.--I did nothing to it except advance the clocks--yep, the scores were at 445/365.x. Sorry for the haze.... :D
 
WaltC said:
IMO, nVidia's doing something funky with its FSAA modes which is positively affecting fps performance at the expense of IQ.

Or maybe it might have something to do with having 25% more raw bandwidth? :rolleyes:

I know, I know; there have been reports of suspicious blurriness at 4xAA with the NV35 samples, and those bear checking out. (Of course Anand reported blurriness at all settings for his NV35 sample, so maybe it's just a DAC problem or something.) And obviously Nvidia has proven they don't deserve the benefit of our doubt.

But the simple fact is that the 5900U should have better performance than the 9800P at high MSAA levels. Sometimes the obvious explanation is also the correct one...
 
Dave H said:
WaltC said:
IMO, nVidia's doing something funky with its FSAA modes which is positively affecting fps performance at the expense of IQ.

Or maybe it might have something to do with having 25% more raw bandwidth? :rolleyes:

I know, I know; there have been reports of suspicious blurriness at 4xAA with the NV35 samples, and those bear checking out. (Of course Anand reported blurriness at all settings for his NV35 sample, so maybe it's just a DAC problem or something.) And obviously Nvidia has proven they don't deserve the benefit of our doubt.

But the simple fact is that the 5900U should have better performance than the 9800P at high MSAA levels. Sometimes the obvious explanation is also the correct one...

I thought about that...but what's the old saying, "Bandwidth without pixels-per-clock is like...a car without gasoline".....or something?....;) With my 9800P at 445/365.x I'm getting more than 5900U performance with less bandwidth...so I'm not convinced that, even at a 445MHz core, the bandwidth I have is insufficient to support it. It may be in certain situations...I'm just not entirely sure of that at the moment.

But the main thing is that nVidia admitted to using the post filter with its QC-2x FSAA modes on nv30--grudgingly admitted it--when it had to explain to some sites why those FSAA modes couldn't be grabbed with the standard screen-grabbing software they had. nVidia literally said "It's a trade secret" and slammed the lid closed on any further explanation of what it was doing--didn't even offer a sketch of the general fundamentals. I found that strange--you can't give away a trade secret by describing its fundamentals. It's kind of like how it's well known that "The Colonel" at Kentucky Fried Chicken (may he RIP) used "11 secret herbs and spices" on his chicken--the ingredients and the recipes would be the "trade secret", not the mere revelation that he was using "11 secret herbs and spices".....*chuckle* (Haven't a clue as to why I thought of that example!)

By way of a more pertinent example, when 3dfx used post-filter blending to achieve its 16/22-bit mode (16 bits plus post-filter blending to approximate 22-bit output with 16-bit performance), the company was all over the place with general descriptions of what it was doing and why. (The post filter wasn't used for FSAA, and it was turned off entirely on the V5 in 32-bit mode.) The product which premiered 3dfx's post-filter use was the V3--which didn't even do FSAA at all. Anyway, a general description of what it was doing with the post filter was never considered by 3dfx a "trade secret" it had to hide, but rather a feature it wanted to promote.

OK, I have absolutely nothing against nVidia employing the post filter as part of its FSAA--I'd just like to know what it's doing and how they are using it in that fashion--in general terms--they don't need to furnish me with schematics...;) Since it is known it was used for FSAA in the nv30 it's probably a given it's being used for nv35 FSAA--nVidia should be up front and forthcoming as to what FSAA modes it's being used in. I can only think of one reason they'd prefer not to talk about it--and that's not a complimentary reason. So obviously I'd prefer it if they were candid about it instead of avoiding the issue with spurious talk about "trade secrets." Heck, they've already admitted to using the post filter for FSAA with nv30--what, do they think ATi doesn't know what a post filter is???? *chuckle*

It'd just be nice to have some solid information to discuss instead of having to reverse-engineer their stuff to find things out.

So, maybe the bandwidth has a part to play, but I also think the post filter might, too. It's just too bad nVidia feels it has to be so close-mouthed all the time. But maybe, as we've seen with their Detonators--they have a *reason* they'd rather not discuss these things candidly. That's why it needs to be thoroughly checked out.

Again--I have nothing against them using the post filter for FSAA in some capacity. I'd just like a little information about it, is all.

Edit: You know, when you look at how the 9800P clobbers the 5900U when both cards are set to their manufacturer-mandated maximum IQ settings (8xFSAA/8xAF for 5900U and 6x FSAA/16xAF for the 9800P), it's remarkable (I think) that the 9800P produces demonstrably better image quality while running on average ~2x faster than the 5900U (according to results compiled in the [H] 5900U review). And I'm talking about a 9800P not overclocked like mine, but running at the standard 380MHz core. I find those results very revealing, as I think they indicate the product is not using the post filter with 8x FSAA, and that this is where we see the advantage of the R350's 8x1 organization over the nv35's 4x2, among other things. Their QC-4x FSAA performance just seems quite a bit different from their 8x FSAA performance, such that use of the post filter for the QC-4x FSAA modes might be a factor.
 
Dave H said:
But the simple fact is that the 5900U should have better performance than the 9800P at high MSAA levels. Sometimes the obvious explanation is also the correct one...

Then again they might have more bandwidth but they have less efficient compression so that might make it almost even steven..?
 
Walt-

What is it with you and these "post filter" conspiracy theories? There is really nothing suspicious or untoward about using a post filter: a post filter is capable of doing anything that a filter on the GPU can do; the only difference is its location, both physically on the chip and schematically in the rendering process. Under the simplest method of using a post filter to filter MSAA subsamples, you just trade less bus utilization (because you save the work of transferring the frontbuffer to the GPU for filtering and transferring the filtered data back to serve as the backbuffer) for a larger memory footprint (because for n-way MSAA your unfiltered backbuffer still has n sub-samples for every pixel). Although, the last time we talked this over Demalion did a much more detailed analysis which IIRC seemed to show that the post filter version actually consumed more bandwidth in the >2x MSAA case, although I forget why. (I had to have it explained to me slowly back then, too...) But in any case, that's the basic idea; and it should be obvious that the same filtering function can be used in a post filter as in the normal way of doing things.
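To put rough numbers on that tradeoff, here's a minimal back-of-the-envelope sketch (Python). The resolution, colour depth, frame rate and refresh rate are all illustrative assumptions, not NV35 or R350 specifics, and which approach costs more memory traffic flips depending on those assumptions.

Code:
# Two ways to filter 4x MSAA subsamples: a conventional resolve pass on the
# GPU versus filtering at scan-out in a post filter. Placeholder numbers only.
BYTES_PER_PIXEL = 4        # assume 32-bit colour
WIDTH, HEIGHT = 1024, 768  # assumed resolution
SAMPLES = 4                # 4x MSAA
FPS = 60                   # assumed rendered frame rate
REFRESH_HZ = 85            # assumed monitor refresh rate

pixels = WIDTH * HEIGHT
multisample_buf = pixels * SAMPLES * BYTES_PER_PIXEL  # n sub-samples per pixel
resolved_buf = pixels * BYTES_PER_PIXEL               # one filtered sample per pixel

# Conventional path: each frame, read all subsamples and write a resolved copy;
# scan-out then reads only the small resolved buffer every refresh.
conventional = (multisample_buf + resolved_buf) * FPS + resolved_buf * REFRESH_HZ

# Post-filter path: no resolve pass, but scan-out must read every subsample on
# each refresh, and the whole multisample buffer must stay resident for display.
post_filter = multisample_buf * REFRESH_HZ

MB = 1024 * 1024
print(f"conventional resolve: ~{conventional / MB:.0f} MB/s traffic, "
      f"{resolved_buf / MB:.1f} MB displayable footprint")
print(f"post filter:          ~{post_filter / MB:.0f} MB/s traffic, "
      f"{multisample_buf / MB:.1f} MB displayable footprint")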

In other words, the question of whether NV35 is using a post filter or not has absolutely no bearing on the IQ of their MSAA implementation. (Ok, there is one difference, namely that with a post filter a normal screenshot will not match what's shown on the screen. And, if you want to put on your tinfoil hats, it could certainly be possible that a special screenshot program could incorporate a different (i.e. higher quality) filter than what the actual post filter does. But this is really pushing it.)

Moving on...

WaltC said:
I thought about that...but what's the old saying, "Bandwidth without pixels-per-clock is like...a car without gasoline".....or something?....

In general, yes. But turning on MSAA only uses up bandwidth, not pixel fillrate. I'm well aware that the 9800P and 5900U tend to be even (or perhaps with a slight advantage to the 9800P) in non-AA situations, thanks. But turning up the MSAA will eventually lead to a bandwidth race, which the 5900U would be expected to win.
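As a quick sanity check on that, here is a minimal sketch using the public reference memory clocks (assumed here rather than measured in this thread); both cards use 256-bit DDR buses.

Code:
def bandwidth_gbs(mem_clock_mhz, bus_bits=256, ddr=2):
    # peak theoretical memory bandwidth in GB/s
    return mem_clock_mhz * 1e6 * ddr * (bus_bits / 8) / 1e9

gf_5900u = bandwidth_gbs(425)   # 5900 Ultra: 425 MHz DDR (850 MHz effective)
r_9800p  = bandwidth_gbs(340)   # 9800 Pro:   340 MHz DDR (680 MHz effective)

print(f"5900U ~{gf_5900u:.1f} GB/s, 9800P ~{r_9800p:.1f} GB/s, "
      f"ratio {gf_5900u / r_9800p:.2f}x")   # ~27.2 vs ~21.8 GB/s, ~1.25x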

WaltC said:
You know, when you look at how the 9800P clobbers the 5900U when both cards are set to their manufacturer-mandated maximum IQ settings (8xFSAA/8xAF for 5900U and 6x FSAA/16xAF for the 9800P)...

Yes, we all know that supersampling is slower than multisampling. But thanks for bringing it to everyone's attention once again. :rolleyes:

Ante P said:
Then again they might have more bandwidth but they have less efficient compression so that might make it almost even steven..?

Well sure, that's possible. Or maybe R350's memory controller is simply more efficient than NV35's. Or maybe it will turn out that NV35's memory bus is really 128-bit wide QDR. (Oh, I'm so clever! :p ) And so on.

The fact is, we have two cards which are apparently very evenly matched when AA is turned off, but the 5900U appears to generally have a moderate performance advantage over the 9800P with 4xMSAA turned on. WaltC thinks this is evidence that the NV35 is "doing something funky" with its MSAA implementation. But it's no such thing: this behavior is exactly what we would expect when both cards have similar no-AA performance and the 5900U has ~25% more bandwidth.

Is that proof that the NV35 isn't cheating when it comes to AA? No, of course not. There could very well be some limitation in NV35 which prevents it from achieving its expected MSAA performance, which is then being covered up by some cheat or other. But there is just no evidence that that's the case.

NV35's strong performance with 4xMSAA is expected, not anomalous, and thus cannot be construed as evidence of wrongdoing.
 
With all the fuss about ATI & NV... what about Matrox, SiS, and Trident? They probably don't have the manpower to do clip-plane tweaking in benchmarks?
 
I VERY much doubt that Matrox has been doing any benchmark optimisations given their current situation.

Possibly R0M @ Matrox has looked at getting 3dMark to run across 3 screens :) but this I rather doubt as well :p
 
Code:
                Build  GT1   GT2  GT3
Matrox Parhelia 3.2.0  78.7  3.5  4.5
                3.3.0  78.9  3.5  4.5

Xabre 400       3.2.0  43.7  2.9  2.4
                3.3.0  43.5  2.9  2.4
 
Wow, this thread is very long. I have read most of it, but I hope I'm not repeating anyone.

Look at this (someone posted it earlier):

[attached benchmark chart: IMG0006295.gif]


Does this benchmark show what the FX can do? Are ATI's shaders really 240% faster than NVIDIA's?
Or did FM address some cheating but not get it all? What guarantee do we have that neither company is cheating in 330 now? With the lousy score the 5900U gets now, my 9700P scores higher, and I really doubt that my card is faster in most games.


Edit:
For me this whole stupid affair has shown that benchmarks can't be trusted, no matter which company you deal with.
The whole benchmarking industry should change. Reviewers should benchmark more games, use more Fraps, make their own demos, and that kind of thing.
With all the different technologies involved, the only thing a benchmark in UT2003 tells you is how fast the card is in UT2003--not anything about performance in D3D games in general.
Thanks, NVIDIA, for crushing my illusion that benchmarks really showed anything.
And benchmarkers should really start paying more attention to consistency in framerates, to minimum framerates, and to image quality. Average framerates are totally useless.
A short story: my old GF4 had a minimum framerate in GT1 of 3DMark2001SE of 24 or something. My new 9700 Pro has some strange stutter in that test, giving it a minimum of 10fps or something. The result, of course, is that the 9700 Pro has a much higher average framerate in that test. If it were a real game, I would not want to see 10fps on my screen.
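To illustrate the point, here's a minimal sketch (with invented frame-time traces, not real GF4/9700 numbers) of how a card can "win" on average fps while having a far worse minimum:

Code:
# Two invented frame-time traces in milliseconds per frame.
steady  = [41.7] * 100               # a constant ~24 fps, like the GF4 example
stutter = [8.0] * 95 + [100.0] * 5   # mostly very fast, but with a few 10 fps stalls

def fps_stats(frame_times_ms):
    avg = len(frame_times_ms) * 1000.0 / sum(frame_times_ms)  # true average fps
    worst = 1000.0 / max(frame_times_ms)                      # minimum instantaneous fps
    return avg, worst

for name, trace in (("steady", steady), ("stutter", stutter)):
    avg, worst = fps_stats(trace)
    print(f"{name:7s} avg {avg:5.1f} fps, minimum {worst:5.1f} fps")
# The stuttering trace has the higher average but feels far worse to play.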
 
Your card is definitely faster, in fp-shader based games at least. You're lucky to have such an engineering marvel at your fingertips ;).
 
binmaze said:
10fps on GT1? At which frame? My minimum is around 42 on a 1.6 @ 2.13 machine. I don't see any stuttering either.

It's probably a bug or something. Right after the car shoots down the first plane "thing".

But that is beside the point. The point was that minimum framerates are much more important :)
 
Galilee said:
It's probably a bug or something. Right after the car shoots down the first plane "thing".
God. I thought it was 3DMark03. LOL. Sorry about that. I've never played 3DMark01SE. Actually, I'd never run 3DMark03 either till this cheat fiasco. I'd better go check 3DMark01.
 
Dave H said:
Walt-

What is it with you and these "post filter" conspiracy theories? There is really nothing suspicious or untoward about using a post filter: a post filter is capable of doing anything that a filter on the GPU can do; the only difference is its location, both physically on the chip and schematically in the rendering process. Under the simplest method of using a post filter to filter MSAA subsamples, you just trade less bus utilization (because you save the work of transferring the frontbuffer to the GPU for filtering and transferring the filtered data back to serve as the backbuffer) for a larger memory footprint (because for n-way MSAA your unfiltered backbuffer still has n sub-samples for every pixel).

"Conspiracy theory?" *chuckle* I don't recall objecting to nVidia using the post filter in any way they choose--what I object to is them refusing to describe it in any terms whatsoever, claiming it is a "trade secret" so incredibly valuable that even mentioning it in a generally descriptive way would endanger the "trade secret." That's a perfectly valid point.

Look, you claim to know what they are doing and how--will you link me to a source from nVidia corroborating your comments? It would be much appreciated. *chuckle* The only "conspiracy" I see here is people talking about it in the absence of any released information on the subject from nVidia...For a subject nVidia has quite literally clammed up on--you seem to know a lot about it...;)

BTW, the "trade secret" thing is not my invention--that's a quote of the words nVidia used when it declined to talk about what it was doing any further. Your questions are better directed toward nVidia than me...most assuredly...;) I'm only guilty of wanting to hear it from the horse's mouth...

Dave H said:
Although, the last time we talked this over Demalion did a much more detailed analysis which IIRC seemed to show that the post filter version actually consumed more bandwidth in the >2x MSAA case, although I forget why. (I had to have it explained to me slowly back then, too...) But in any case, that's the basic idea; and it should be obvious that the same filtering function can be used in a post filter as in the normal way of doing things.

How can *anything* on the subject be "obvious" when nVidia refuses to talk about it...? I mean we can do conjecture and spin hypotheses all day long--doesn't mean they're correct or even pertinent. I'm just saying it would be nice for nVidia to declassify this "trade secret" (nVidia's words, not mine) and talk about it, explain its benefits, drawbacks if any, etc. That seems an entirely reasonable request.

Dave H said:
In other words, the question of whether NV35 is using a post filter or not has absolutely no bearing on the IQ of their MSAA implementation. (Ok, there is one difference, namely that with a post filter a normal screenshot will not match what's shown on the screen. And, if you want to put on your tinfoil hats, it could certainly be possible that a special screenshot program could incorporate a different (i.e. higher quality) filter than what the actual post filter does. But this is really pushing it.)

Sorry, but I think if there were no differences relative to something--be it performance, image quality, or both--there'd be no reason for nVidia to start using it for FSAA with nv3x in the first place. So it's clear to me there are several gaps that need filling in. It's fine if you prefer to fill in the pieces with your own explanations--I wouldn't deprive you of that...;) But I'd much prefer hearing it from nVidia.

Dave H said:
In general, yes. But turning on MSAA only uses up bandwidth, not pixel fillrate. I'm well aware that the 9800P and 5900U tend to be even (or perhaps with a slight advantage to the 9800P) in non-AA situations, thanks. But turning up the MSAA will eventually lead to a bandwidth race, which the 5900U would be expected to win.

Interesting that you seem to overlook compression here...as I recall that was one of nv30's most overrated, mis-hyped "features"...;)

Dave H said:
Yes, we all know that supersampling is slower than multisampling. But thanks for bringing it to everyone's attention once again. :rolleyes:

That's funny--so SS is not only a lot slower than MS, it produces a much inferior level of image quality, too? That's what you're saying...? Because that's what you get when you compare the two. That's exactly why I think the example is so telling (if you have the eyes to see it...;))
Not only is 8xFSAA on the 5900U *much slower* than R9800P's 6x FSAA, it's also visibly inferior in the IQ it generates. Again, that's what makes the comparison noteworthy.
 