Another ATIvsNVIDIA review from gamersdepot

Anybody else a little perturbed by the manner of that article? :?

Maybe it's just me... I don't really disagree with anything they say, it's just a bit "in yer face".

MuFu.
 
Maybe it's just me... I don't really disagree with anything they say, it's just a bit "in yer face".

I don't personally like the tone and bluntness myself, but up until recently it hasn't always been gallant and tactful from the other side either, has it?
 
What? You don't think NVidia asked for this? They have misled the press and their customers with lies and cheats. Sooner or later someone is going to tell it the way it is.

Maybe NV is running out of money and couldn't buy him off. ;)
 
MuFu said:
Anybody else a little perturbed by the manner of that article? :?

Maybe it's just me... I don't really disagree with anything they say, it's just a bit "in yer face".

MuFu.

I agree it was a little on the aggressive side but right now I feel the same way with all NV's PR tactics of late.
 
Sure, they asked for it. I just think H/W websites should always rise above the kind of craptalk you find on forums, whether it is solidly based on fact or not.

MuFu.
 
vrecan said:
http://www.gamersdepot.com/hardware/video_cards/ati_vs_nvidia/dx9_desktop/001.htm

What I found interesting was the comments from carmack at the very beginning that as soon as you use pixel shaders the nv3x slows to a crawl! Good to see NV's pr is slowly starting to crumble.

That's not what Carmack said ("crawl"). He said ARB2 was slow, but he has alternate shaders written to handle the situation.

Not to excuse the FP32 NV30 performance, but let's not put fanboy terminology in Carmack's mouth.
 
First Kyle and now this. Maybe they are pissed at being an unwitting third party to deception. :? They do, after all, have a responsibility to their readers.
 
DemoCoder said:
vrecan said:
http://www.gamersdepot.com/hardware/video_cards/ati_vs_nvidia/dx9_desktop/001.htm

What I found interesting was the comments from carmack at the very beginning that as soon as you use pixel shaders the nv3x slows to a crawl! Good to see NV's pr is slowly starting to crumble.

That's not what Carmack said ("crawl"). He said ARB2 was slow, but he has alternate shaders written to handle the situation.

Not to excuse the FP32 NV30 performance, but let's not put <bleep> terminology in Carmack's mouth.

GD: John, we've found that NVIDIA hardware seems to come to a crawl whenever Pixel Shader's are involved, namely PS 2.0..

Have you witnessed any of this while testing under the Doom3 environment?

"Yes. NV30 class hardware can run the ARB2 path that uses ARB_fragment_program, but it is very slow, which is why I have a separate NV30 back end that uses NV_fragment_program to specify most of the operations as 12 or 16 bit instead of 32 bit."

By him saying YES, that means he agrees, unless I am insane?
 
I tend to agree, MuFu, but that's part of why B3D stands out to me. But I try to settle for honesty and accuracy, which seems to be evident at least.

Of course, another way of looking at it is that they just like to slam one IHV or another for whatever reason, and reality just so happened to coincide this time and hide that. But judging that would require looking at how reality related to past instances of such strong commentary, if there are any.
 
Interesting side note: AFAIK this is the first explicit confirmation that Doom3's NV30 path will be making extensive use of FX12 precision, not just FP16. Not that this is much of a surprise--at this point, we've known for a while that was the only likely way NV30-34 could manage to be competitive with corresponding ATI parts. (Although PS 2.0-equivalent OpenGL tests have been non-existent enough that some may have still hoped it was due to the NV_fragment_program compiler being magically better than the ARB_f_p compiler and the various PS 2.0 compilers. Let this be another nail in the coffin of the "still unoptimized drivers" theory.)

Moreover, it provides some much-needed context for the cryptic comment that NV35 (specifically) would likely be using the ARB2 path when all was said and done: since NV35 ditched its FX12 units, it should indeed be equally fast running ARB_fragment_program with fastest (i.e. FP16) precision hints. NV30-34, of course, will not be.

Of course I trust Carmack to have limited FX12 usage to those effects where the extra dynamic range of FP is least likely to be noticeable. OTOH--and unlike with FP16, which really ought not to make any difference for the short shaders used by Doom3--I doubt the FX12 usage will go completely unnoticed.

Still, the fact that the NV30 path will indeed be using FP16 in the spots where Carmack thought it necessary could leave IQ several steps ahead of the R200 path. I suppose my main question is how the two precisions are likely to be mixed, particularly because the majority of fragment processing workload would seem to be in one lighting shader applied to almost every visible surface. Does it make any sense to mix precisions in that shader--say, to do the diffuse lighting calcs in FX12 and the specular in FP16? Is that legit in NV_f_p, and would it give you the performance and quality one might ideally expect?
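
For anyone who wants to see the FX12 vs. FP16 trade-off in numbers, here is a rough sketch. It assumes FX12 is signed fixed point with 10 fractional bits covering roughly [-2, 2), and that FP16 behaves like IEEE half precision (s10e5); both are assumptions about the formats, not something confirmed in this thread. The helper names `to_fx12` and `to_fp16` are made up for illustration.

```python
import struct

FX12_STEP = 1.0 / 1024.0  # assumed: 10 fractional bits

def to_fx12(x):
    """Quantize to an assumed 12-bit signed fixed-point format, range [-2, 2)."""
    q = round(x / FX12_STEP)
    q = max(-2048, min(2047, q))  # clamp to the 12-bit two's-complement range
    return q * FX12_STEP

def to_fp16(x):
    """Round-trip a value through IEEE 754 half precision."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

if __name__ == "__main__":
    # A dim term, a mid-range term, and an over-bright term.
    for value in (0.0004, 0.5, 3.0):
        print(value, to_fx12(value), to_fp16(value))
```

Under those assumptions the dim value collapses to zero in FX12 and the over-bright one clamps just below 2, while FP16 keeps both. That is the "extra dynamic range" in question, and it suggests why range-limited terms would be the natural candidates for FX12.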
 
Dave H said:
Moreover, it provides some much-needed context for the cryptic comment that NV35 (specifically) would likely be using the ARB2 path when all was said and done: since NV35 ditched its FX12 units, it should indeed be equally fast running ARB_fragment_program with fastest (i.e. FP16) precision hints. NV30-34, of course, will not be.

http://www.beyond3d.com//previews/nvidia/nv35/index.php?p=8

* NVIDIA guards the internal shader architecture very closely, and limits the details that it give out publicly, leaving us to trying to guess what's really going on inside. NVIDIA states that for NV35 they have moved to totally floating point units in the shader pipeline, whereas NV30 featured one FP32 unit per pipe, and two FX12 integer (DX8) units per pipeline. Now, given the main issue with implementing fully floating point units previously was that of the number transistors required to do it (and hence the silicon die size costs) we can assume that removing eight FX12 units and replacing them with eight FP32 units would still require quite a transistor difference, and yet there is only a quoted addition of 5Million more transistor in NV35, which represents only a 4% difference. Given this 5 Million transistors also includes doubling the internal memory bus width from 128-bit to 256-bit it would seem unlikely from these figures that NV35 utilises a total of 12 FP32 units. For now, we are still left guessing as to its exact internal structure.
 
Dave H said:
Moreover, it provides some much-needed context for the cryptic comment that NV35 (specifically) would likely be using the ARB2 path when all was said and done: since NV35 ditched its FX12 units, it should indeed be equally fast running ARB_fragment_program with fastest (i.e. FP16) precision hints. NV30-34, of course, will not be.
How will the P.R. department spin this if the results show that the 5800ultra > 5900ultra in Doom3? :oops: They sure overlooked the issue of different paths and its implications when they said...
Since NVIDIA is not part in the FutureMark beta program (a program which costs of hundreds of thousands of dollars to participate in) we do not get a chance to work with Futuremark on writing the shaders like we would with a real applications developer. We don't know what they did, but it looks like they have intentionally tried to create a scenario that makes our products look bad. This is obvious since our relative performance on games like Unreal Tournament 2003 and Doom 3 shows that the GeForce FX 5900 Ultra is by far the fastest graphics on the market today.

my bold
 
BTW, does anyone remember the press release where nVidia quoted [H] saying that the 5900ultra outperformed the 9800pro in Doom3? I could not find it on their site.
 
John Reynolds said:
Dave H said:
Moreover, it provides some much-needed context for the cryptic comment that NV35 (specifically) would likely be using the ARB2 path when all was said and done: since NV35 ditched its FX12 units, it should indeed be equally fast running ARB_fragment_program with fastest (i.e. FP16) precision hints. NV30-34, of course, will not be.

http://www.beyond3d.com//previews/nvidia/nv35/index.php?p=8

* NVIDIA guards the internal shader architecture very closely, and limits the details that it give out publicly, leaving us to trying to guess what's really going on inside. NVIDIA states that for NV35 they have moved to totally floating point units in the shader pipeline, whereas NV30 featured one FP32 unit per pipe, and two FX12 integer (DX8) units per pipeline. Now, given the main issue with implementing fully floating point units previously was that of the number transistors required to do it (and hence the silicon die size costs) we can assume that removing eight FX12 units and replacing them with eight FP32 units would still require quite a transistor difference, and yet there is only a quoted addition of 5Million more transistor in NV35, which represents only a 4% difference. Given this 5 Million transistors also includes doubling the internal memory bus width from 128-bit to 256-bit it would seem unlikely from these figures that NV35 utilises a total of 12 FP32 units.

In addition to numerous hints from developers, web writers and Nvidia themselves, there's plenty of concrete evidence suggesting that NV35 has identical NV_fragment_program performance with FX12 and FP16. (For example, Uttar's hacked Dawn demos.) That does not in any way imply they've gone from 1 full-featured PS 2.0+ FP32 and 2 PS 1.3-functionality FX12 units per pipe to 3 full PS 2.0+ FP32 units. Not even close.

For all we (or at least I) know, each NV35 pipe could have replaced its 2 FX12 units with only 1 FP32 unit (perhaps capable of 2 FX12 ops to retain similar PS 1.1-1.3 performance-per-clock). (EDIT: there are, after all, 23 + 1 fixed point bits.) Or with 2 FP16 units (after all, what would be the point of having more than one FP32 unit when you can only use two FP32 temp registers at full speed anyways??).

That German article based on the Nvidia patent seemed to suggest that the new FP unit(s) only had ADD, MUL, and MAD functionality, leaving the other PS 2.0+ ops for the pre-existing full FP32 unit; that sounds extremely likely to me.

There's the possibility that NV30 kept a separate set of physical registers for PS 1.1-1.3 shaders, and those were removed from NV35 along with the FX12 register combiners, saving more transistors. Indeed, we have no idea how many of NV30's 130 million (IIRC) were "dark transistors", implementing broken functionality that may have been removed from NV35 to save space.

There are a gazillion ways NV35 could have dumped its FX12 units and still had such a small transistor budget increase over NV30 (even with the extra-wide memory controller).

For now, we are still left guessing as to its exact internal structure.

Exactly. All we appear to know for a fact is that NV35 has identical FP16 and FX12 performance. :p
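
As a quick sanity check on the quoted numbers, the arithmetic can be sketched like this. The 5 million and 4% figures come from the B3D quote, and the 130 million is only the IIRC figure recalled above, so treat both results as rough; the function names are invented for this example.

```python
def implied_base(added, percent):
    """Transistor count for which `added` transistors equal `percent` percent."""
    return added * 100 / percent

def percent_increase(added, base):
    """Percentage that `added` represents relative to `base`."""
    return added * 100 / base

ADDED = 5_000_000  # quoted increase from NV30 to NV35

if __name__ == "__main__":
    # Base count implied by "5 million is a 4% difference".
    print(implied_base(ADDED, 4))                           # 125000000.0
    # Same increase measured against the recalled 130 million.
    print(round(percent_increase(ADDED, 130_000_000), 2))   # 3.85
```

Either way the increase is about 4%, so the point stands whichever base count turns out to be right.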
 
JC's quote is interesting, if only as a tease. We now know the NV30 path uses FX12 as well as FP16, but the question remains: How much faster will the NV3x be compared to its R3x0 counterpart with final Doom 3 code? (I'm still not taking the AT and [H] numbers as fact, though I suspect nV may remain faster using the NV30 codepath.)

I'm not so much worried about IQ, as JC must have coded D3 with non-FP limits in mind. I imagine FP shading will add more speed than IQ over older, INT/FX cards. Right? :)
 
That German article based on the Nvidia patent seemed to suggest that the new FP unit(s) only had ADD, MUL, and MAD functionality, leaving the other PS 2.0+ ops for the pre-existing full FP32 unit; that sounds extremely likely to me.

Did it? I didn't see it (I mentioned it in the thread).
 
Pete said:
I'm not so much worried about IQ, as JC must have coded D3 with non-FP limits in mind.

True, although don't forget he also made his public feature request for FP16 color soon after he started work on the D3 engine. Certainly with lighting playing such a large role the extra dynamic range should help out in certain circumstances. But you're right that the engine is primarily targeted at FX8 rendering.

I imagine FP shading will add more speed than IQ over older, INT/FX cards. Right? :)

Well, it's not the FP that would add speed, but rather the ability to use a shader to do the lighting in one pass that adds speed.

JC has said that the 9700 runs the R200 path slightly faster than the R300 path. Both use a shader to implement the lighting model in a single pass; the R200 path uses PS 1.4 functionality via ATI_fragment_shader, while the R300 path uses PS 2.0 functionality via ARB_fragment_program. I don't think it is completely settled why the R200 path is faster. It could be due to not using FP (but would this matter on an R300, which has FP24 pipes throughout? Perhaps if FP values are ever read from or written to the graphics DRAM); to ATI_fragment_shader being for some reason slightly more efficient than ARB_fragment_program on R300 (or perhaps it was so back when Carmack made his .plan update); or to the R300 path enabling a few extra features.

Clearly the reason to prefer the R300 path to the R200 is IQ; I believe at least a large part of the IQ benefit is from the use of FP color, but, again, perhaps other features are involved as well.

Meanwhile, for NV30-34, clearly the use of FX12 is speeding things up greatly, as one would expect given their architecture. Just as clearly, the use of FP color must have some positive IQ benefit, because otherwise the NV30 path would be entirely FX12, which it is certainly not.
 
Dave H said:
Let this be another nail in the coffin of the "still unoptimized drivers" theory.
The problem with existing FX drivers, as indicated by recent articles, is one of scheduling. The mythical 50.x are, apparently, far better in this regard. Also, apparently, some revisions are already in the hands of OEMs/devs for testing. They still contain some apparent hand tweaks, which are to be made less apparent on release... apparently...:)
 
I think I feel vindicated in my frequent assertions over the past year that "scheduling is hard" and "better compilers are needed" :) If you go back to my first messages on the NV30, I predicted instruction scheduling would be the bottleneck. Some responded that such reordering is trivial given PS2.0 architecture, but now we see that really depends on your architecture.

I think I'll go out on a limb and say that when 3.0 arrives with dynamic branches, good optimizations in the driver will be one of the deciding factors in performance, and I'll bet that the first few drivers for the NV40/R420 will have a fraction of the true performance that will later be delivered in updates.
 