Chalnoth said:
So, I think that Valve spending 5x the development time was a mistake. I think they were attempting to tweak the assembly without full knowledge of the hardware. I think it took a lot of time, and wasn't terribly productive.
You don't think Nvidia DevRel gave Valve all the resources they possibly could to make the most anticipated bleeding-edge game of the year run decently on their products?
Of course Valve had full knowledge of the underlying hardware. The problem is likely that, when you're using HLSL, that knowledge doesn't buy you enough when you're trying to target NV3x.
Chalnoth said:
Let me just say that the NV3x generation is not a generation that should be programmed in assembly.
At the current time I think we can say exactly the opposite: your best hope of successfully targeting NV3x in DX9 is to use PS 2.0 assembly rather than HLSL, even though you still can't hope for better than mediocre performance. The most pressing cause of poor NV3x fragment shader performance is clearly the extremely tight restriction on register usage. Clever programming in PS 2.0 assembly can address this issue to the fullest extent possible, although it seems very unlikely that most shaders of any complexity can reasonably be written without overstepping the bounds of NV3x's 4 FP16 or 2 FP32 full-speed registers. Meanwhile, any architecture-neutral HLSL compiler is going to use far more temp registers, which is necessary to enable most of the optimizations compilers typically perform. An HLSL compiler targeted at generating optimal PS 2.0 code for NV3x could conceivably do much better, but probably still not as well as a determined human assembly programmer.
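To make the register-pressure point concrete, here is a rough sketch in PS 2.0 assembly of the kind of difference I mean. The shader is made up (a toy two-texture modulate-and-add, nothing from any actual game), and the two orderings are alternative versions of the same body, not one program:

ps_2_0
dcl t0.xy
dcl t1.xy
dcl_2d s0
dcl_2d s1

// Version 1, "compiler-style": fetch everything first, then combine.
// At the mul, r0 and r1 are still live and r2 comes into use, so
// three temps are in flight at once.
texld r0, t0, s0
texld r1, t1, s1
mul   r2, r0, c0
mad   r2, r1, c1, r2
mov   oC0, r2

// Version 2, hand-packed: consume each fetch as soon as it lands and
// reuse r0, so no more than two temps are ever live at once, inside
// the two-FP32-register budget quoted above.
texld r0, t0, s0
mul   r0, r0, c0
texld r1, t1, s1
mad   r0, r1, c1, r0
mov   oC0, r0

Obviously a toy case; the point is just that once a shader has more than a handful of simultaneously-live values, no amount of hand-scheduling can squeeze it under so small a register budget.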
The second cause of NV3x's poor PS performance--at least in the case of NV30, NV31 and NV34--is of course the inability to make any use whatsoever of the FX12 units in a PS 2.0 shader. But this is a fundamental limitation of the API itself, and neither HLSL nor PS 2.0 assembly can circumvent it. The only "solution" here (other than to rewrite the game in OpenGL) is for Nvidia's drivers to cheat and generate FX12 machine instructions anyway, either by special-casing shaders from known high-profile games, or by some sort of general optimization (ack!).
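To be concrete about what the API will and won't let you say: the lowest PS 2.0 can go is the partial-precision hint, which on NV3x maps to FP16; there is simply no way to spell "FX12" in the language. A trivial made-up example:

ps_2_0
dcl t0.xy
dcl_2d s0

texld  r0, t0, s0
// _pp is only a hint that FP16 precision is acceptable; it is the
// lowest precision PS 2.0 can express. Nothing in the instruction set
// maps to NV3x's FX12 fixed-point units, which is the API limitation
// described above. (OpenGL's NV_fragment_program does expose them,
// via X-suffixed instructions such as MULX.)
mul_pp r0, r0, c0
mov    oC0, r0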
Third, there are likely some scheduling tricks (beyond those needed to keep the register count as low as possible) that could help NV3x performance a bit. Of course these are just as accessible to a human assembly programmer who knows the relevant performance characteristics of the hardware (and, again, it is inconceivable that Valve would not) as they are to a good optimizing compiler; and, again, the only chance that a compiler will generate such code is if it is specifically targeted at NV3x to the exclusion of other architectures.
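Purely as a made-up illustration of the kind of trick I mean (not a claim about how NV3x actually schedules anything): if texture fetches have latency that independent math can cover, an ordering like this beats issuing the two fetches back to back:

ps_2_0
dcl t0.xy
dcl t1.xy
dcl_2d s0
dcl_2d s1

// Hypothetical: start the first fetch, then do independent ALU work
// (scaling the second set of coordinates) while it completes, rather
// than stalling on back-to-back texlds.
texld r0, t0, s0
mul   r1, t1, c2      // independent work overlapping the fetch above
texld r1, r1, s1      // dependent read through the scaled coordinates
mul   r0, r0, c0
mad   r0, r1, c1, r0
mov   oC0, r0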
So I'm not sure what you mean by suggesting that programming in HLSL (or Cg) can buy NV3x performance it can't otherwise achieve when programmed in PS 2.0 assembly. Unless, that is, you're suggesting shaders be shipped in uncompiled Cg form and compiled at runtime, without passing through PS 2.0 as an intermediate step. (I'm fairly certain HLSL does not have this ability; it is always compiled to PS 2.0 assembly at compile time.) In that case, Nvidia's drivers could presumably circumvent the DX9 spec by issuing FX12 instructions as discussed above, now with the extra context of having the full Cg code rather than the intermediately-compiled assembly version. Indeed, this ability to potentially circumvent the DX9 spec is presumably the reason Nvidia pushed Cg so hard to the exclusion of MS's HLSL.

Fortunately, that push has failed in the marketplace, and it's clear that the vast majority of DX9 games will have their shaders written in HLSL, not Cg or PS 2.0 assembly. Nor, given the performance of Cg and HLSL in PCChen's (IIRC) synthetic tests or in TR:AOD, does it seem Nvidia even got around to taking much advantage of the nefarious possibilities inherent in Cg; on average it does no better than HLSL, and sometimes worse.
There is apparently much room for improvement in Nvidia's NV3x-targeted HLSL compiler, and there may be significant optimizations still waiting to be had in their runtime assembly compiler as well (although by that stage, working from already-compiled PS 2.0 assembly, it will often be too late to salvage decent performance on NV3x). It just seems unlikely that any of this will go very far toward making up the huge gulf between NV3x and R3x0 in PS 2.0 performance.