R360 != 0.13 process ?

DaveBaumann said:
The majority of the shader performances with NV35 suggest that the configuration has not significantly changed from NV30.

http://www.beyond3d.com/previews/nvidia/nv35/index.php?p=8
DirectX8. FP processing isn't required here, anyway.

Despite my objections to 3DMark as a whole, if I remember correctly, there were significant quality problems in this benchmark on the FX 5800. That means integer was being used.

Regardless, we all know that the FX 5800 is not well-suited for use in DirectX. It does, however, work to spec when using nVidia's OpenGL extensions. It would probably be best to use nVidia's OpenGL shader extensions, then, if you're attempting to compare the hardware. At the very least, Microsoft's DirectX 9 shaders are different enough from nVidia's OpenGL shaders, and therefore from the hardware, that significant compilation between the assembly and the machine code guarantees that the differences in the architectures aren't going to be readily apparent.
 
Chalnoth said:
1. I don't expect replacement shader codes. I expect optimized shaders in the games themselves. This is the way that software is going. OpenGL is just beginning to support architecture-specific compiling, and I expect (well, mostly hope, I suppose) DirectX to have architecture-specific compiling soon as well. This should take care of most of the optimization.

This concept alone scares the bejeezus out of me. What's the point of having a standard API (well, two of them) if the dev teams then have to optimise for specific hardware?

Chalnoth said:
It would probably be best to use nVidia's OpenGL shader extensions, then, if you're attempting to compare the hardware.
Again I'd respectfully disagree. Simply compare the hardware using the most common APIs and software engines around. If Nvidia's DirectX performance is poor it should be highlighted accordingly.
 
Chalnoth, give it up. You can't win the argument. If you expect optimizations in the games themselves, expect them for ATi Radeon cards as well. If it's optimized for both, you can expect one of the following scenarios:

1. Equal quality for both, but horrible performance on GeForce FX 5900 Ultra compared to the Radeon 9800 Pro.

2. Equal performance for both, but severely degraded image quality for the GeForce FX 5900 Ultra to reach 9800 Pro levels of speed.

3. Degraded image quality for both, with horrible performance on the GeForce FX 5900 Ultra compared to the 9800 Pro.


Sorry, but these are the only 3 things that can happen. If you don't realize this, then you're only lying to yourself.
 
Chalnoth said:
DaveBaumann said:
The majority of the shader performances with NV35 suggest that the configuration has not significantly changed from NV30.

http://www.beyond3d.com/previews/nvidia/nv35/index.php?p=8
DirectX8. FP processing isn't required here, anyway.

Well, as the text is suggesting, given the silicon budgets for full FP processing it's highly unlikely that all FX units would have been replaced with an equal number of FP units. The Splinter Cell test, and all the DX8 tests, indicate that DX8 shading is still in line in terms of performance, meaning there is an equal number of shader (FX or FP) units in total, so it's unlikely that all 8 FX units were replaced with FP units with only a 5M transistor difference. This is also evidenced by the fact that there is no order-of-magnitude increase in FP shading performance – quite the opposite in most cases, as the later tests show.

Despite my objections to 3DMark as a whole, if I remember correctly, there were significant quality problems in this benchmark on the FX 5800. That means integer was being used.

Wrong.

The 330 patch was used, which forced the 44.03 drivers into using Futuremark's shaders, which call for DX9 FP operation. The artefacts were there because (again, as the later tests show) the drivers were forcing the 5800 into executing these shaders at FP16, whereas the 5900 executes them at FP32.
 
Having the shader compiler in the drivers could be a nightmare, because it will lead to unpredictable results.

For example, I could write a shader that compiles to, say, 60 instructions using the D3DX shader compiler. I know that I can distribute this shader in its HLSL form because I know that it will work with all PS2.0 hardware.

Now, if we start to get driver-based compilers, then my shader might get compiled to, say, 65 instructions by a theoretical compiler in some company's drivers. What happens then is that the shader will no longer work because it's too long, as the limit the hardware can handle is 64 instructions. This then causes a support nightmare for all involved.
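To make the arithmetic concrete, here is a minimal C++ sketch of that failure mode. The helper function and the 60/65 instruction counts are hypothetical; the 64-instruction figure is just the hardware limit quoted above.

```cpp
#include <cstdio>

// Hardware limit quoted in the post for PS2.0-class parts.
const int kInstructionLimit = 64;

// Pretend results of compiling the *same* HLSL shader with two different
// compilers: the standard D3DX compiler and a hypothetical driver-resident one.
struct CompileResult {
    const char* compiler;
    int instructionCount;
};

bool fitsOnHardware(const CompileResult& r) {
    return r.instructionCount <= kInstructionLimit;
}

int main() {
    const CompileResult results[] = {
        { "D3DX compiler",                60 },  // within budget: ships and runs everywhere
        { "hypothetical driver compiler", 65 },  // over budget: shader creation fails
    };

    for (const CompileResult& r : results) {
        std::printf("%-30s %d instructions -> %s\n",
                    r.compiler, r.instructionCount,
                    fitsOnHardware(r) ? "OK" : "exceeds the 64-instruction limit");
    }
    return 0;
}
```

The point is that the developer only ever tested against the first row; the second row shows up on an end user's machine, long after the game has shipped.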
 
Colourless said:
Having the shader compiler in the drivers could be a nightmare, because it will lead to unpredictable results.

This was my primary argument against Cg.

It's bad enough that developers have to deal with different driver versions. But throw in another layer of different shader compilers among driver revisions...and it makes it that much more of a problem imo.

There are of course advocates of each vendor having their own compilers...the reasoning being that each vendor knows its hardware best. But I don't see that outweighing the negative of unpredictability. Especially when the "standard compiler" (supplied by Microsoft) seems to do a very good job of general optimization in the first place.
 
Joe DeFuria said:
Especially when the "standard compiler" (supplied by Microsoft) seems to do a very good job of general optimization in the first place.

Exactly. This, to my mind, is critical if we're really going to unify the 3D element of games. Consoles are a perfect example of how a single 3D API can yield great effects even though they are somewhat limited in specs when compared with PC hardware. When poor developers end up having to support multiple versions of both OS and 3D API, results are ultimately going to suffer; shader considerations can only muddy this already dirty water and worsen the gamer's experience. IMHO, of course :)
 
Seiko said:
This concept alone scares the bejeezus out of me. What's the point of having a standard API (well, two of them) if the dev teams then have to optimise for specific hardware?
hardware....specific....compiling....

Chalnoth said:
It would probably be best to use nVidia's OpenGL shader extensions, then, if you're attempting to compare the hardware.
Again I'd respectfully disagree. Simply compare the hardware using the most common APIs and software engines around. If Nvidia's DirectX performance is poor it should be highlighted accordingly.
I was referring to Dave's comments on the difference (or lack thereof) between the shader performance of the 5800 and 5900. The actual hardware differences would be more easily seen in nVidia's OpenGL extensions than they would be anywhere else.
 
surfhurleydude said:
1. Equal quality for both, but horrible performance on GeForce FX 5900 Ultra compared to the Radeon 9800 Pro.

2. Equal performance for both, but severely degraded image quality for the GeForce FX 5900 Ultra to reach 9800 Pro levels of speed.

3. Degraded image quality for both, with horrible performance on the GeForce FX 5900 Ultra compared to the 9800 Pro.

Sorry, but these are the only 3 things that can happen. If you don't realize this, then you're only lying to yourself.
BS. Using FP16 will not result in "severely degraded image quality" for the FX 5900, but should bring the 5900 to similar performance levels. There are many situations where FP32 will be preferable, but for most FP16 will be more than enough. The 9800 Pro doesn't benefit at all from FP16.
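For reference, the difference being argued about here is essentially a type choice in the shader source. A minimal sketch follows (the lighting shader itself is hypothetical, not taken from any shipping title): in HLSL, 'half' asks the compiler to emit partial-precision instructions, which NV3x-class hardware can run at FP16, while hardware with a single internal precision, such as the R3xx at FP24, simply ignores the hint.

```cpp
#include <cstdio>

// Full-precision version: every value is a 32-bit float.
const char* kFullPrecisionSource =
    "float4 main(float3 normal   : TEXCOORD0,\n"
    "            float3 lightDir : TEXCOORD1) : COLOR\n"
    "{\n"
    "    float ndotl = saturate(dot(normalize(normal), normalize(lightDir)));\n"
    "    return float4(ndotl, ndotl, ndotl, 1.0);\n"
    "}\n";

// Partial-precision version: 'half' marks values that tolerate FP16,
// letting the compiler tag the corresponding instructions as partial
// precision for hardware that can exploit it.
const char* kPartialPrecisionSource =
    "half4 main(half3 normal   : TEXCOORD0,\n"
    "           half3 lightDir : TEXCOORD1) : COLOR\n"
    "{\n"
    "    half ndotl = saturate(dot(normalize(normal), normalize(lightDir)));\n"
    "    return half4(ndotl, ndotl, ndotl, 1.0);\n"
    "}\n";

int main() {
    std::printf("FP32 version:\n%s\nFP16 hint version:\n%s",
                kFullPrecisionSource, kPartialPrecisionSource);
    return 0;
}
```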
 
Chalnoth said:
This isn't nVidia propaganda. I've been saying 3DMark is essentially useless since 3DMark2000.
You saying it doesn't make it so.

And finally, the GeForce FX 5900 does better than the Radeon 9800 Pro in 3DMark2003. I still say it's a meaningless benchmark.
Only when nVidia is cheating does the GFFX win.

Not on the 5900. That's what we're talking about here.
Yeah, it sucks even on the 5900. Got a benchmark that proves me wrong? Bring it. Nice job ignoring the fact that your mythical performance depends on an external factor that is not guaranteed....
Except, from what I've heard, it's mostly nVidia spending the time to help developers optimize for their hardware. Besides, the installed-base of nVidia hardware virtually guarantees optimization (particularly at the low-end...nVidia currently has the only low-end DX9 card).
"virtually"
Oh yeah, and even when optimized for, it's still slower than ATI, if you look at, say, TR:AOD (Cg).

BS. Using FP16 will not result in "severely degraded image quality" for the FX 5900, but should bring the 5900 to similar performance levels. There are many situations where FP32 will be preferable, but for most FP16 will be more than enough. The 9800 Pro doesn't benefit at all from FP16.
All I have to do to disprove this is go back a year and look at all your examples that show that FP24 is not enough and FP32 is REQUIRED to be any good. :rolleyes:
 
Chalnoth said:
Seiko said:
This concept alone scares the bejeezus out of me. What's the point of having a standard API (well, two of them) if the dev teams then have to optimise for specific hardware?
hardware....specific....compiling....

Could you elaborate just what this entails? I had assumed (and chances are incorrectly) that a flag of sorts would be set at compile time to indicate whether or not the compiled code would be run on hardware X or Y, similar to the Pentium Pro optimisations of yesteryear. If this is so, then wouldn't that classify as specific optimising? As Colourless mentioned, you'd hardly want the drivers to try and do this on the fly, which really only leaves it in the hands of dev teams? If so then I stand by my point: I'd rather they concentrate on generic API solutions as opposed to hardware specifics.

Then again, I am only just coming to terms with shader coding
:)
 
Seiko said:
Could you elaborate just what this entails? I had assumed (and chances are incorrectly) that a flag of sorts would be set at compile time to indicate whether or not the compiled code would be run on hardware X or Y, similar to the Pentium Pro optimisations of yesteryear. If this is so, then wouldn't that classify as specific optimising? As Colourless mentioned, you'd hardly want the drivers to try and do this on the fly, which really only leaves it in the hands of dev teams? If so then I stand by my point: I'd rather they concentrate on generic API solutions as opposed to hardware specifics.

Then again, I am only just coming to terms with shader coding
:)
Same idea, just that the compiling would be done at runtime. Since shaders are sufficiently short, this compiling shouldn't take a significant amount of time (if it does, the game could cache the compiled programs).

The main problem with, for example, Microsoft's HLSL compiler is that it compiles to the standardized assembly. OpenGL's language will not. It will compile directly to machine language (though it may offer an intermediate compiler that goes to standard shaders; I haven't looked that up just yet). By nixing this intermediate step, it will allow each IHV to have a compiler optimized much more for their own architecture, without having to push developers for optimization as much.
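As a concrete illustration of "compiled at runtime by the video card's drivers", here is a minimal sketch using OpenGL's GLSL entry points. It assumes an OpenGL context has already been created and that an extension loader such as GLEW is available; the error handling is deliberately bare-bones.

```cpp
#include <GL/glew.h>   // any extension loader works; GLEW is used here for brevity
#include <cstdio>

// Hand raw GLSL source straight to the driver. The driver's own compiler
// turns it into machine code for whatever GPU is installed; there is no
// intermediate, standardized assembly step in between.
GLuint compileFragmentShader(const char* source)
{
    GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(shader, 1, &source, nullptr);
    glCompileShader(shader);                     // runtime compile, inside the driver

    GLint ok = GL_FALSE;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[1024];
        glGetShaderInfoLog(shader, sizeof(log), nullptr, log);
        std::fprintf(stderr, "driver compiler rejected shader:\n%s\n", log);
        glDeleteShader(shader);
        return 0;
    }
    return shader;                               // ready to attach to a program object
}
```

The flip side, of course, is exactly what Colourless raised earlier: each vendor's driver compiler may accept, reject, or schedule the same source differently.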
 
Chalnoth said:
Seiko said:
Could you elaborate just what this entails? I had assumed (and chances are incorrectly) that a flag of sorts would be set at compile time to indicate whether or not the compiled code would be run on hardware X or Y, similar to the Pentium Pro optimisations of yesteryear. If this is so, then wouldn't that classify as specific optimising? As Colourless mentioned, you'd hardly want the drivers to try and do this on the fly, which really only leaves it in the hands of dev teams? If so then I stand by my point: I'd rather they concentrate on generic API solutions as opposed to hardware specifics.

Then again, I am only just coming to terms with shader coding
:)
Same idea, just that the compiling would be done at runtime. Since shaders are sufficiently short, this compiling shouldn't take a significant amount of time (if it does, the game could cache the compiled programs).

The main problem with, for example, Microsoft's HLSL compiler is that it compiles to the standardized assembly. OpenGL's language will not. It will compile directly to machine language (though it may offer an intermediate compiler that goes to standard shaders; I haven't looked that up just yet). By nixing this intermediate step, it will allow each IHV to have a compiler optimized much more for their own architecture, without having to push developers for optimization as much.

Thx Chalnoth,
Yes, I see your point, but with such a simplified set of instructions I'd expect a single and unified approach to be acceptable to all. The thought that a switch in instruction ordering should cause such a difference between the cards at this point in time is a little concerning. Sure, I appreciate that a process change will always have an effect, but to think a developer's attempt to tweak one process could be positive for one card and yet negative for another on a large enough and noticeable scale is really worrying. What's it going to be like with PS3/VS3?

It's hard enough to balance state and texture changes so that acceptable performance on differing cards is obtained. Now throw shader specifics into the mix and boy, what a mess. I can see a whole bunch of gamers using one IHV's card singing a game's praises and yet the other group using another IHV's card loathing it due to performance problems occurring more frequently.

I'd also ponder that, as with any programming model, choosing the correct process and initial design is vital if it is to sustain performance. No compiler in the world can make a slow design run fast. Is it going to be the case that developers will have to choose a design path in order to gain performance on a particular card? If so, this really goes back to my fear of developers having to code for a card as opposed to a generic API specification.

I have to be honest and say I hadn't really paid too much attention to the HLSL threads, but I'm beginning to see the importance of clearing this mess up before PS3/VS3 come down the road with yet more instructions and variety.

:rolleyes:
 
Seiko said:
Thx Chalnoth,
Yes, I see your point, but with such a simplified set of instructions I'd expect a single and unified approach to be acceptable to all.
Why is a simplified instruction set supposed to be the best? This is all about two things: high performance, and easy programmability. A very complex instruction set can be extremely high performance, and with an HLSL, it can be just as easy to program for as any other instruction set.

Particularly when things like branching start coming into play, more complex instruction sets will probably begin to pull away.

Anyway, with the ideal HLSL, each shader would be "write once and run anywhere," where each IHV would have their own optimized compiler to work with. Considerations may have to be made for the strengths of each architecture (for example, use of lower-precision data types for the NV3x would be required for good performance), but there shouldn't be any requirement for IHV-specific code. The code would just be compiled at runtime by the video card's drivers.
 
There would be a lot of people out there who would say your thinking is completely backwards. You are arguing for 'CISC'-designed GPUs.
 
Chalnoth said:
Seiko said:
Thx Chalnoth,
Yes, I see your point, but with such a simplified set of instructions I'd expect a single and unified approach to be acceptable to all.
Why is a simplified instruction set supposed to be the best?

Erm, who's saying I want a simplified instruction set? I was saying, or at least trying to say, that with a comparatively small set of instructions currently available, I'm shocked to think that variations in how those are used could cause one card to perform exceptionally well and the other exceptionally poorly.
Of course we require more instructions, but as the possible approaches and specific usage of those increase, aren't we running the risk of seeing the cards' performance vary even more?
 
Colourless said:
There would be a lot of people out there who would say your thinking is completely backwards. You are arguing for 'CISC'-designed GPUs.
No, I'm absolutely not.

What I'm arguing for is a decoupling of shader programs from the hardware. The higher the standard level of programming, the more freedom hardware developers have to depart from conventional, straightforward instruction sets.

CISC became popular because programs were compiled directly to the machine language. Compiling (or writing) directly in assembly is almost as bad. If there had been a standard high-level language implemented early in the lifetime of CISC processors, then we wouldn't have this problem of processors dedicating very significant numbers of transistors to translating CISC instructions into internal RISC-like operations.

By getting programmers to step back to a higher-level language, we will make it easier for hardware developers to innovate.

Side note: when I was stating that a hard-to-program-for assembly language may be higher-performing, I was thinking more along the lines of VLIW, which, I believe, includes instructions that carry information about parallelism and scheduling, things that programmers shouldn't have to deal with.
 
Seiko said:
Erm, who's saying I want a simplified instruction set? I was saying, or at least trying to say, that with a comparatively small set of instructions currently available, I'm shocked to think that variations in how those are used could cause one card to perform exceptionally well and the other exceptionally poorly.
I think the main problem with the NV3x architecture is that it has problems with large numbers of registers, and that it has different functional units for different tasks, which must be used in a specific order for optimal performance.
 
Personally, I really dislike the fact that the NV3x series appears to require optimized shaders for top performance.

For starters, that means you have to hope your game is one of the ones they optimized for, or performance will bite. I also think that optimizations are a way of saying your card doesn't have enough power; otherwise it could compete without them.

As has been said before, the only way to compare optimized Nvidia drivers would be to use optimized ATI drivers as well. Failing that, the only other option for comparing the ability of the two cards is to use non-optimized drivers for both.

Now comes the crux of the synthetic vs. game benchmarks argument. While it is true that synthetic tests cannot be equated to game tests, they can give an idea of two cards' relative capability. If a card does poorly on synthetic shader tests, it is safe to expect that it may have an issue in shader-dependent games. Since the GfFX series does have weaknesses in most synthetic shader tests, it should not be surprising that it may have weaknesses in games using shaders. Other factors may outweigh shader performance, but poor shader performance in benchmarks is not something I would consider a positive.

Every test we have been able to devise, game or synthetic, indicates NV3x cards have weak shaders. If they are all telling the same thing, I would take it as a strong indication that shaders are not the card's strongest point.
 
Chalnoth said:
Regardless, we all know that the FX 5800 is not well-suited for use in DirectX. It does, however, work to spec when using nVidia's OpenGL extensions. It would probably be best to use nVidia's OpenGL shader extensions, then, if you're attempting to compare the hardware. At the very least, Microsoft's DirectX 9 shaders are different enough from nVidia's OpenGL shaders, and therefore from the hardware, that significant compilation between the assembly and the machine code guarantees that the differences in the architectures aren't going to be readily apparent.

Ahh...
So what you're saying is that if we code around the weaknesses inherent in the FX architecture, we get good performance.

If there was no such thing as the R3xx series, then I'm quite sure developers would have grudgingly swallowed that large, irregularly shaped pill.

However, the R3xx series proved that developers shouldn't have to bend over backwards to accommodate botched hardware designs. That is what general graphics APIs are for... I am quite happy to be out of the days of Glide, CIF, and all that proprietary API/custom extension nonsense. You should be too.
 